From: eugene@cse.ucsc.edu (Eugene Miya)
Newsgroups: comp.sys.super
Subject: Re: What features must scientific languages have ...
Date: 15 May 1998 01:49:07 GMT
Organization: UC Santa Cruz CIS/CE
Distribution: inet
Message-ID: <6jg6uj$co4@darkstar.ucsc.edu>
References:
NNTP-Posting-Host: arapaho.cse.ucsc.edu
Summary: Have at this Steve.
Lines: 367

Re: our phone conversations.
Re: your experience on the NA-list with people complaining about C++.

I suggest that you capture this snapshot and separate out the relevant
portions.  I think this is ground covered by many people before.

Must we prioritize?  Unfortunately yes.  I did not think of these in
this priority order [the order in which I thought of them is in
brackets].  Shuffle the order of these, or add or subtract, as you see
fit.  How's this?

REQUIREMENT #1: Performance.  Must be adequate.  [#1]
REQUIREMENT #2: Consistency.  [#4]
REQUIREMENT #3: Convenience.  [#3]
Optional REQUIREMENT #4: Compatibility.  [#2]

I am mindful of the human factor [George Miller's "Magical Number
Seven, Plus or Minus Two" paper, and also Ed Yourdon's comments about
the difficulty of juggling 4 balls at once].  So I add additional
requirements at great risk.

REQUIREMENT #1: Performance.  Must be adequate.  [#1]

This is a semantic requirement.  You can trade away some (a little)
performance, but no one is going to use a machine unless it does
something faster than it can be done by hand.  Ah, so we can relax
performance somewhat, but not too much.  Note that software didn't
even appear as a separate concern until later.  What does this mean?
When you ask or tell or direct the computer to do something, it has
some performance characteristic.

Cost: Expected to be high.

REQUIREMENT #2: Consistency.  [#4]

Science loves determinism.  The Journal of Irreproducible Results and
the Usenet are fine for humor, but heaven help you if your science ends
up there.  This is both a syntactic and a semantic requirement.  You
want reproducibility.  You want your colleagues to be able to reproduce
(repeat/replicate and then apply) your work, to validate and verify
your simulation/analysis.  "Chaos" is just a fad, it lacks elegance,
and the simple stuff will get out (Einstein).

Sequential programming derives a lot of its power from forcing people
to organize and serialize their work.  It gets collected into neat
little bundles like sentences, formulae, paragraphs, equations,
imagery, animation, etc.  Logic is basically sequential.  It gets
especially tough when we have to do dependence analysis.  Random
numbers and non-determinism play a funny part in science.  I think that
a lot of scientists would avoid time-sharing and parallel computing
were it not for #1 and the costs (see the sketch below for how little
it takes to lose bit-for-bit consistency).  Science these days is
cut-throat, so you have an incentive to be consistent.

Cost: Not expected to be high.  Reality: might be high.

You are willing to suffer a little pain to get #1 and #2.  The
trade-off.
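To make that concrete, here is a minimal sketch in C (my illustration,
not part of the original discussion): the same million numbers summed
serially, and then as a two-way split the way a parallel reduction
might group them.  Single-precision rounding makes the two answers
differ, so the "same" program need not reproduce itself across
machines or schedules.

    #include <stdio.h>

    /* Sum 0.1f a million times: once straight through, once as two
       half-sums recombined (a stand-in for a 2-processor reduction).
       Float rounding makes the two groupings disagree. */
    int main(void)
    {
        int i;
        float forward = 0.0f, half1 = 0.0f, half2 = 0.0f;

        for (i = 0; i < 1000000; i++)
            forward += 0.1f;

        for (i = 0; i < 500000; i++) {
            half1 += 0.1f;      /* "processor 1" */
            half2 += 0.1f;      /* "processor 2" */
        }

        printf("serial sum:  %f\n", forward);
        printf("reduced sum: %f\n", half1 + half2);
        return 0;
    }

Same data, same arithmetic, different grouping, different answer.
That is the consistency cost hiding inside requirement #2.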
REQUIREMENT #3: Convenience.  [#3]

Face it: what many scientists want is slavery.  Look at grad student
jokes.  We know about the first "computists": it was Dick Feynman
riding herd over all those women.....  However, if the slave(s) take up
too much time in training [programming], slaves cease being worth the
effort.  This is both a syntactic and a semantic requirement.  Well, if
we can't have slavery, we get to computers.  Everyone knows that
computers are dumb.  Deeper Thought didn't program itself.  People had
to program it.

Cost: Not expected to be high.

Optional REQUIREMENT #4: Compatibility.  [#2]

This is sometimes at odds with performance.  This is a long-term goal.
It is frequently sacrificed.  This gets into the Fortran/OS morass.
This is both a syntactic and a semantic requirement.  Compatibility has
two subissues:

1) Communication: you want to move the program for purposes of upgrade.
   You want your colleagues to also use your program.  Generality.
   Reproducibility.  Good science stuff.

2) Application: you want to take advantage of the body of existing work
   like libraries and other packages.  This can be a big deal.  Do the
   LINPACK guys rewrite a version of LINPACK/BLAS, etc. for every new
   language which comes out?  Are you kidding?  (See the sketch below
   for what reuse across a language boundary looks like.)

Cost: Not expected to be high.
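As an illustration of subissue 2, here is a sketch (mine, not from the
post) of a C program reusing the existing Fortran BLAS rather than
rewriting it.  It assumes the common Unix f77 conventions: external
names get a trailing underscore, and all arguments pass by reference.

    #include <stdio.h>

    /* Fortran-77 BLAS routine DAXPY computes y := a*x + y.
       Everything is passed by reference, per Fortran convention;
       most Unix f77 compilers append an underscore to the name. */
    extern void daxpy_(int *n, double *alpha, double *x, int *incx,
                       double *y, int *incy);

    int main(void)
    {
        int    n = 3, inc = 1;
        double a = 2.0;
        double x[3] = { 1.0,  2.0,  3.0};
        double y[3] = {10.0, 20.0, 30.0};

        daxpy_(&n, &a, x, &inc, y, &inc);          /* y = 2*x + y */

        printf("%g %g %g\n", y[0], y[1], y[2]);    /* 12 24 36 */
        return 0;
    }

Link against the vendor's BLAS (e.g., cc prog.c -lblas; the library
name varies).  The point is the compatibility argument above: decades
of LINPACK/BLAS work stay usable because the calling convention, not
the language, is the contract.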
REQUIREMENT #1: Performance.  Must be adequate.  [#1]
REQUIREMENT #2: Consistency.  [#4]
REQUIREMENT #3: Convenience.  [#3]
Optional REQUIREMENT #4: Compatibility.  [#2]

Compare with Pancake and Bergmark (panel 10, and IEEE Computer 1990):

        Convenience
        Reliability
        Expressiveness
        Compatibility

So note that their ordinal number is also 4.  I think that
expressiveness is part of convenience and that reliability is assumed
in performance.

LIST OF ATTEMPTS (Tarpits and mine fields)
..........................................

Symbol manipulation systems   Limited value.  Has not scaled.  Used in
                              research/development, not production,
                              for performance reasons.
Gibbs                         Unknown.  Probably defunct.
New packages,
Backus FP/FL,
new languages                 Insufficient resources against the
                              installed base.

Using "mathematics" as a basis for scientific languages: it's okay.

PROBLEM: In scientific research, 90% of the time, we as researchers in
any field don't know what we are doing.

        "If we knew what it was we were doing, it would not be
        called research, would it?"  -- Albert Einstein

Ref: the History of Programming Languages [HoPL] conferences: Adele
Goldberg's slide of the S.S. Smalltalk, where the people are building
the ship while attempting to sail it at the same time.

von Neumann was opposed to high-level languages [HOPL-I], but
fortunately he did not oppose Backus.  Von Neumann, as best as can be
ascertained, was a "Real Programmer"[tm], i.e., give him 0s and 1s on
switches to solve his massive thermonuclear codes.  Of course that
relegates most of the rest of us to the category of "wimps."

The real problem many programmers faced (complexity) was usefully using
a homogeneous address space.  They frequently overran array bounds,
branched to wrong places, etc.  This is why we now have programming
languages.  The new problem came with the details: we end up creating
dialects (really, different compiler implementations) which are mostly
compatible but, we later find, not compatible enough.  The programming
language alone is not enough to accomplish the task, and from that we
got operating systems and other environments with inconsistent
character sets (closer to the hardware), floating point (it's called
floating point, but it's a compromise), ways of gathering related data
[take your pick: datasets or files or DBMS systems], naming
conventions, ad nauseam.  This is a problem of notation (syntax).  And
we also have limited means of expressing this problem.

A few words about "non-scientific" requirements
...............................................

Many people would like to trade numeric issues for other features like
character and string processing.  Some of those aspects really are
needed in scientific languages, for all kinds of reasons.

Computer Languages
..................

We need to speak of both the syntax and the semantics of languages.  We
have a few types.  One lesson we have learned so far is that straight
natural languages don't appear to hack it.  Natural language is too
ambiguous and too verbose (e.g., COBOL) for the work required.  [An
interesting side note that I did not realize until recently: the
Enigma ciphering machine has no numbers.]  This despite the fact that
symbolic packages like Mathematica, Maple, and Macsyma attempt to use
single natural-language commands (natural language, if your language is
English).  APL never caught on.

Syntax
......

Simpler is better.  There is an attribute called orthogonality.
Fortran 8x and beyond have attempted to get into array syntax, but this
will not be enough.  Part of the problem comes with the management of
array layout (columns vs. rows or rows vs. columns, etc.).  I'm not
certain how APL handles this.  But I know that the guys who worked on
the SISAL language attempted to be concerned with handling boundaries:
edges, faces, higher-dimensional structures, etc.

Math-like
.........

Knuth's observation on X=X+1.  Knuth noted in a 1985 math journal that
this type of statement is at the heart of much of the confusion: it's
an assignment, not an equation.  It's not math.

In fact, following up on last week's brief discussion, I forgot to note
one critical detail about a program he had to run.  He noted that he
had a program with a loop which had to run 1,000,000,000,000 times.  I
recently attended a CS seminar where someone noted they had a big data
structure of 1,000,000,000 bytes.  These are interesting numbers,
because between them the typical computer runs out of bits: a 32-bit
integer cannot even count to 10^12.  I know DEK doesn't run on a 64-bit
machine.  So the loop decomposition had to be done by hand (a sketch
follows below).  That's computers, not math.
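Here is a minimal sketch in C of that kind of hand decomposition (my
illustration, not Knuth's actual code).  With only 32-bit integers
available, no single counter can reach 10^12 (2^31 - 1 is about
2.1e9), so the trip count is factored into two nested loops:

    #include <stdio.h>

    /* Execute a loop body 10^12 times with 32-bit counters by
       factoring the trip count: 1e12 = 1,000,000 * 1,000,000. */
    #define OUTER 1000000L
    #define INNER 1000000L

    int main(void)
    {
        long   i, j;
        double count = 0.0;   /* doubles hold 1e12 exactly (< 2^53) */

        for (i = 0; i < OUTER; i++)
            for (j = 0; j < INNER; j++)
                count += 1.0;             /* stand-in for the body */

        printf("iterations: %.0f\n", count);
        return 0;
    }

The factoring is trivial here; the trap is any index arithmetic like
i*INNER + j, which itself overflows 32 bits and must also be done
piecewise.  That bookkeeping is exactly the "computers, not math" part.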
Semantics
.........

This is a tough nut to crack.  We have decades of software which use
side effects as features (not bugs).  This is a headache.

What characterizes scientific computing/programming?
....................................................

Emphasis on the numeric (not always) over character/symbol
manipulation.  It can involve particularly large, sometimes unbounded,
sets of data.  It might not involve preexisting sets of data (i.e.,
data or knowledge bases and searching).  Greater emphasis on "analysis"
or simulation or emulation.  It is frequently constrained by
incompletely understood natural or physical phenomena.

George contributed a few comments (which are on panel 18):

  It seems most useful to consider the question of software.  Most
  commentators usually skirt this problem.  Superficially, the software
  used for small computers is roughly the same as that used in
  Supercomputers, with certain exceptions:

  1. Software developed on small or otherwise inadequate computers
     usually shows all sorts of inadequacies, such as too-small tables,
     or other installation and memory limitations.
  2. Software that originates on small computers is generally not
     robust, nor easily expandable.

  ... I suspect that very few people have ever written a program that
  needed to be run on a Supercomputer, so it might be helpful to
  examine some of the characteristics of such programs. ...  Generally
  the control flow in the program is rather straightforward.  The three
  things most needed in a Supercomputer are fast arithmetic, large
  memories, equally fast I/O, and a usable, robust operating
  environment.  {So I can't count ... either.}

  A typical problem may require between 10^14 and 10^17 arithmetic
  operations; clearly, the floating-point unit has to be big and fast,
  and so does the memory (typically each floating-point operation is
  responsible for 24 bytes of memory traffic).

  * Very large memories are needed to accommodate the billions of WORDS
    needed for each data set, and the memory traffic on average has to
    move 3 words (24 bytes: 16 in and 8 out) per floating-point
    operation.

  It is typical that every data point is re-computed every cycle, and
  it is usual that on each cycle each point requires tens of thousands
  of arithmetical operations.  Often there is an enormous gush of
  output.  Number ranges are often too big for anything other than
  floating point.  Notwithstanding, there is always the problem of
  retaining some numerical significance, so multiple-precision
  arithmetic is vital.

  Perhaps you can guess that there is just a limited number of genuine
  Supercomputing applications, but even this number shrinks
  dramatically when we try to see who is willing to pay for developing
  a given problem.

Reference:

%A Stephen Nash, ed.
%T A History of Scientific Computing
%I ACM Press, Addison-Wesley
%C NY
%D 1990
%K book, text, survey, war stories, supercomputing, proceedings
%X Chapter titles:
   Remembrance of Things Past
   The Contribution of J. H. Wilkinson to Numerical Analysis
   The Influence of George Forsythe and His Students
   Howard H. Aiken and the Computer
   Particles in Their Self-Consistent Fields: From Hartree's
     Differential Analyzer to Cray Machines
   Fluid Dynamics, Reactor Computations, and Surface Representations
   The Development of ODE Methods: A Symbiosis between Hardware and
     Numerical Analysis
   A Personal Retrospection of Reservoir Simulation
   How the FFT Gained Acceptance
     Conclusions:
     * Prompt publication of significant achievements is essential.
     * Reviews of old literature can be rewarding.
     * Communication among mathematicians, numerical analysts, and
       workers in a wide range of applications can be fruitful.
     * Do not publish papers in neoclassic Latin.
   Origins of the Simplex Method
   Historical Comments on Finite Elements
   Conjugacy and Gradients
   A Historical Review of Iterative Methods
   Shaping the Evolution of Numerical Analysis in the Computer Age:
     The SIAM Thrust
   Reminiscences on the University of Michigan Summer Schools, the
     Gatlinburg Symposia, and Numerische Mathematik
   The Origin of Mathematics of Computation and Some Personal
     Recollections
   Mathematical Software and ACM Publications
   BIT -- A Child of the Computer
   The Los Alamos Experience, 1943-1954
   The Prehistory and Early History of Computation at the U.S. Bureau
     of Standards
   Programmed Computing at the Universities of Cambridge and Illinois
     in the Early Fifties
   Early Numerical Analysis in the United Kingdom
   The Pioneer Days of Scientific Computing in Switzerland
   The Development of Computational Mathematics in Czechoslovakia and
     the USSR
   The Contribution of Leningrad Mathematicians to the Development of
     Numerical Linear Algebra in the Period 1950-1986

I think in the end, it's all futile.  It seems our lot in life to
suffer.....  Fortran and the oncoming MS.  We are doomed, doomed,
doomed!