From: eugene@cse.ucsc.edu (Eugene Miya)
Newsgroups: comp.sys.super
Subject: Re: What features must scientific languages have ...
Date: 15 May 1998 01:49:07 GMT
Organization: UC Santa Cruz CIS/CE
Distribution: inet
Message-ID: <6jg6uj$co4@darkstar.ucsc.edu>
References:
NNTP-Posting-Host: arapaho.cse.ucsc.edu
Summary: Have at this Steve.
Lines: 367

Re: our phone conversations.
Re: your experience on the NA-list with people complaining about C++.

I suggest that you capture this snapshot and separate out the relevant
portions.  I think this is ground covered by many people before.

Must we prioritize?  Unfortunately yes.  I did not think of these in
this priority order [the order in which I thought of them is in
brackets].  Shuffle the order of these, or add or subtract, as you see
fit.  How's this?

REQUIREMENT #1: Performance.  Must be adequate.  [#1]
REQUIREMENT #2: Consistency.  [#4]
REQUIREMENT #3: Convenience.  [#3]
Optional REQUIREMENT #4: Compatibility.  [#2]

I am mindful of the human factor [George Miller's "Magical Number
Seven, Plus or Minus Two" paper, and also Ed Yourdon's comments about
the difficulty of juggling 4 balls at once].  So I add additional
requirements at great risk.

REQUIREMENT #1: Performance.  Must be adequate.  [#1]

This is a semantic requirement.  You can trade away some (a little)
performance, but no one is going to use a machine unless it does
something faster than it can be done by hand.  Ah, so we can relax
performance somewhat, but not too much.  Note that software didn't
even appear as a separate concern until later.  What does this mean?
When you ask or tell or direct the computer to do something, it has
some performance characteristic.

Cost: Expected to be high.

REQUIREMENT #2: Consistency.  [#4]

Science loves determinism.  The Journal of Irreproducible Results and
the Usenet are fine for humor, but heaven help you if your science ends
up there.  This is both a syntactic and a semantic requirement.  You
want reproducibility.  You want your colleagues to be able to reproduce
(repeat/replicate and then apply) your work, to validate and verify
your simulation/analysis.  "Chaos" is just a fad, it lacks elegance,
and the simple stuff will get out (Einstein).

Sequential programming derives a lot of its power from forcing people
to organize and serialize their work.  It gets collected into neat
little bundles like sentences, formulae, paragraphs, equations,
imagery, animation, etc.  Logic is basically sequential.  It gets
especially tough when we have to do dependence analysis.  Random
numbers and non-determinism play a funny part in science.  I think that
a lot of scientists would avoid time-sharing and parallel computing
were it not for #1 and the costs (see the sketch below for how little
it takes to lose bit-for-bit consistency).  Science these days is
cut-throat, so you have an incentive to be consistent.

Cost: Not expected to be high.  Reality: might be high.

You are willing to suffer a little pain to get #1 and #2.  The
trade-off.
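To make that concrete, here is a minimal sketch in C (my illustration,
not part of the original discussion): the same million numbers summed
serially, and then as a two-way split the way a parallel reduction
might group them.  Single-precision rounding makes the two answers
differ, so the "same" program need not reproduce itself across
machines or schedules.

    #include <stdio.h>

    /* Sum 0.1f a million times: once straight through, once as two
       half-sums recombined (a stand-in for a 2-processor reduction).
       Float rounding makes the two groupings disagree. */
    int main(void)
    {
        int i;
        float forward = 0.0f, half1 = 0.0f, half2 = 0.0f;

        for (i = 0; i < 1000000; i++)
            forward += 0.1f;

        for (i = 0; i < 500000; i++) {
            half1 += 0.1f;      /* "processor 1" */
            half2 += 0.1f;      /* "processor 2" */
        }

        printf("serial sum:  %f\n", forward);
        printf("reduced sum: %f\n", half1 + half2);
        return 0;
    }

Same data, same arithmetic, different grouping, different answer.
That is the consistency cost hiding inside requirement #2.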
REQUIREMENT #3: Convenience.  [#3]

Face it: what many scientists want is slavery.  Look at grad student
jokes.  We know about the first "computists": it was Dick Feynman
riding herd over all those women.....  However, if the slave(s) take up
too much time in training [programming], slaves cease being worth the
effort.  This is both a syntactic and a semantic requirement.  Well, if
we can't have slavery, we get to computers.  Everyone knows that
computers are dumb.  Deeper Thought didn't program itself.  People had
to program it.

Cost: Not expected to be high.

Optional REQUIREMENT #4: Compatibility.  [#2]

This is sometimes at odds with performance.  This is a long-term goal.
It is frequently sacrificed.  This gets into the Fortran/OS morass.
This is both a syntactic and a semantic requirement.  Compatibility has
two subissues:

1) Communication: you want to move the program for purposes of upgrade.
   You want your colleagues to also use your program.  Generality.
   Reproducibility.  Good science stuff.

2) Application: you want to take advantage of the body of existing work
   like libraries and other packages.  This can be a big deal.  Do the
   LINPACK guys rewrite a version of LINPACK/BLAS, etc. for every new
   language which comes out?  Are you kidding?  (See the sketch below
   for what reuse across a language boundary looks like.)

Cost: Not expected to be high.
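As an illustration of subissue 2, here is a sketch (mine, not from the
post) of a C program reusing the existing Fortran BLAS rather than
rewriting it.  It assumes the common Unix f77 conventions: external
names get a trailing underscore, and all arguments pass by reference.

    #include <stdio.h>

    /* Fortran-77 BLAS routine DAXPY computes y := a*x + y.
       Everything is passed by reference, per Fortran convention;
       most Unix f77 compilers append an underscore to the name. */
    extern void daxpy_(int *n, double *alpha, double *x, int *incx,
                       double *y, int *incy);

    int main(void)
    {
        int    n = 3, inc = 1;
        double a = 2.0;
        double x[3] = { 1.0,  2.0,  3.0};
        double y[3] = {10.0, 20.0, 30.0};

        daxpy_(&n, &a, x, &inc, y, &inc);          /* y = 2*x + y */

        printf("%g %g %g\n", y[0], y[1], y[2]);    /* 12 24 36 */
        return 0;
    }

Link against the vendor's BLAS (e.g., cc prog.c -lblas; the library
name varies).  The point is the compatibility argument above: decades
of LINPACK/BLAS work stay usable because the calling convention, not
the language, is the contract.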
REQUIREMENT #1: Performance.  Must be adequate.  [#1]
REQUIREMENT #2: Consistency.  [#4]
REQUIREMENT #3: Convenience.  [#3]
Optional REQUIREMENT #4: Compatibility.  [#2]

Compare with Pancake and Bergmark (panel 10, and IEEE Computer 1990):

        Convenience
        Reliability
        Expressiveness
        Compatibility

So note that their ordinal number is also 4.  I think that
expressiveness is part of convenience and that reliability is assumed
in performance.

LIST OF ATTEMPTS (Tarpits and mine fields)
..........................................

Symbol manipulation systems   Limited value.  Has not scaled.  Used in
                              research/development, not production,
                              for performance reasons.
Gibbs                         Unknown.  Probably defunct.
New packages,
Backus FP/FL,
new languages                 Insufficient resources against the
                              installed base.

Using "mathematics" as a basis for scientific languages: it's okay.

PROBLEM: In scientific research, 90% of the time, we as researchers in
any field don't know what we are doing.

        "If we knew what it was we were doing, it would not be
        called research, would it?"  -- Albert Einstein

Ref: the History of Programming Languages [HoPL] conferences: Adele
Goldberg's slide of the S.S. Smalltalk, where the people are building
the ship while attempting to sail it at the same time.

von Neumann was opposed to high-level languages [HOPL-I], but
fortunately he did not oppose Backus.  Von Neumann, as best as can be
ascertained, was a "Real Programmer"[tm], i.e., give him 0s and 1s on
switches to solve his massive thermonuclear codes.  Of course that
relegates most of the rest of us to the category of "wimps."

The real problem many programmers faced (complexity) was usefully using
a homogeneous address space.  They frequently overran array bounds,
branched to wrong places, etc.  This is why we now have programming
languages.  The new problem came with the details: we end up creating
dialects (really, different compiler implementations) which are mostly
compatible but, we later find, not compatible enough.  The programming
language alone is not enough to accomplish the task, and from that we
got operating systems and other environments with inconsistent
character sets (closer to the hardware), floating point (it's called
floating point, but it's a compromise), ways of gathering related data
[take your pick: datasets or files or DBMS systems], naming
conventions, ad nauseam.  This is a problem of notation (syntax).  And
we also have limited means of expressing this problem.

A few words about "non-scientific" requirements
...............................................

Many people would like to trade numeric issues for other features like
character and string processing.  Some of those aspects really are
needed in scientific languages, for all kinds of reasons.

Computer Languages
..................

We need to speak of both the syntax and the semantics of languages.  We
have a few types.  One lesson we have learned so far is that straight
natural languages don't appear to hack it.  Natural language is too
ambiguous and too verbose (e.g., COBOL) for the work required.  [An
interesting side note that I did not realize until recently: the
Enigma ciphering machine has no numbers.]  This despite the fact that
symbolic packages like Mathematica, Maple, and Macsyma attempt to use
single natural-language commands (natural language, if your language is
English).  APL never caught on.

Syntax
......

Simpler is better.  There is an attribute called orthogonality.
Fortran 8x and beyond have attempted to get into array syntax, but this
will not be enough.  Part of the problem comes with the management of
array layout (columns vs. rows or rows vs. columns, etc.).  I'm not
certain how APL handles this.  But I know that the guys who worked on
the SISAL language attempted to be concerned with handling boundaries:
edges, faces, higher-dimensional structures, etc.

Math-like
.........

Knuth's observation on X=X+1.  Knuth noted in a 1985 math journal that
this type of statement is at the heart of much of the confusion: it's
an assignment, not an equation.  It's not math.

In fact, following up on last week's brief discussion, I forgot to note
one critical detail about a program he had to run.  He noted that he
had a program with a loop which had to run 1,000,000,000,000 times.  I
recently attended a CS seminar where someone noted they had a big data
structure of 1,000,000,000 bytes.  These are interesting numbers,
because between them the typical computer runs out of bits: a 32-bit
integer cannot even count to 10^12.  I know DEK doesn't run on a 64-bit
machine.  So the loop decomposition had to be done by hand (a sketch
follows below).  That's computers, not math.
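Here is a minimal sketch in C of that kind of hand decomposition (my
illustration, not Knuth's actual code).  With only 32-bit integers
available, no single counter can reach 10^12 (2^31 - 1 is about
2.1e9), so the trip count is factored into two nested loops:

    #include <stdio.h>

    /* Execute a loop body 10^12 times with 32-bit counters by
       factoring the trip count: 1e12 = 1,000,000 * 1,000,000. */
    #define OUTER 1000000L
    #define INNER 1000000L

    int main(void)
    {
        long   i, j;
        double count = 0.0;   /* doubles hold 1e12 exactly (< 2^53) */

        for (i = 0; i < OUTER; i++)
            for (j = 0; j < INNER; j++)
                count += 1.0;             /* stand-in for the body */

        printf("iterations: %.0f\n", count);
        return 0;
    }

The factoring is trivial here; the trap is any index arithmetic like
i*INNER + j, which itself overflows 32 bits and must also be done
piecewise.  That bookkeeping is exactly the "computers, not math" part.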
Semantics
.........

This is a tough nut to crack.  We have decades of software which use
side effects as features (not bugs).  This is a headache.

What characterizes scientific computing/programming?
....................................................

Emphasis on the numeric (not always) over character/symbol
manipulation.  It can involve particularly large, sometimes unbounded,
sets of data.  It might not involve preexisting sets of data (i.e.,
data or knowledge bases and searching).  Greater emphasis on "analysis"
or simulation or emulation.  It is frequently constrained by
incompletely understood natural or physical phenomena.

George contributed a few comments (which are on panel 18):

  It seems most useful to consider the question of software.  Most
  commentators usually skirt this problem.  Superficially, the software
  used for small computers is roughly the same as that used in
  Supercomputers, with certain exceptions:

  1. Software developed on small or otherwise inadequate computers
     usually shows all sorts of inadequacies, such as too-small tables,
     or other installation and memory limitations.
  2. Software that originates on small computers is generally not
     robust, nor easily expandable.

  ... I suspect that very few people have ever written a program that
  needed to be run on a Supercomputer, so it might be helpful to
  examine some of the characteristics of such programs. ...  Generally
  the control flow in the program is rather straightforward.  The three
  things most needed in a Supercomputer are fast arithmetic, large
  memories, equally fast I/O, and a usable, robust operating
  environment.  {So I can't count ... either.}

  A typical problem may require between 10^14 and 10^17 arithmetic
  operations; clearly, the floating-point unit has to be big and fast,
  and so does the memory (typically each floating-point operation is
  responsible for 24 bytes of memory traffic).

  * Very large memories are needed to accommodate the billions of WORDS
    needed for each data set, and the memory traffic on average has to
    move 3 words (24 bytes: 16 in and 8 out) per floating-point
    operation.

  It is typical that every data point is re-computed every cycle, and
  it is usual that on each cycle each point requires tens of thousands
  of arithmetical operations.  Often there is an enormous gush of
  output.  Number ranges are often too big for anything other than
  floating point.  Notwithstanding, there is always the problem of
  retaining some numerical significance, so multiple-precision
  arithmetic is vital.

  Perhaps you can guess that there is just a limited number of genuine
  Supercomputing applications, but even this number shrinks
  dramatically when we try to see who is willing to pay for developing
  a given problem.

Reference:

%A Stephen Nash, ed.
%T A History of Scientific Computing
%I ACM Press, Addison-Wesley
%C NY
%D 1990
%K book, text, survey, war stories, supercomputing, proceedings
%X Chapter titles:
   Remembrance of Things Past
   The Contribution of J. H. Wilkinson to Numerical Analysis
   The Influence of George Forsythe and His Students
   Howard H. Aiken and the Computer
   Particles in Their Self-Consistent Fields: From Hartree's
     Differential Analyzer to Cray Machines
   Fluid Dynamics, Reactor Computations, and Surface Representations
   The Development of ODE Methods: A Symbiosis between Hardware and
     Numerical Analysis
   A Personal Retrospection of Reservoir Simulation
   How the FFT Gained Acceptance
     Conclusions:
     * Prompt publication of significant achievements is essential.
     * Reviews of old literature can be rewarding.
     * Communication among mathematicians, numerical analysts, and
       workers in a wide range of applications can be fruitful.
     * Do not publish papers in neoclassic Latin.
   Origins of the Simplex Method
   Historical Comments on Finite Elements
   Conjugacy and Gradients
   A Historical Review of Iterative Methods
   Shaping the Evolution of Numerical Analysis in the Computer Age:
     The SIAM Thrust
   Reminiscences on the University of Michigan Summer Schools, the
     Gatlinburg Symposia, and Numerische Mathematik
   The Origin of Mathematics of Computation and Some Personal
     Recollections
   Mathematical Software and ACM Publications
   BIT -- A Child of the Computer
   The Los Alamos Experience, 1943-1954
   The Prehistory and Early History of Computation at the U.S. Bureau
     of Standards
   Programmed Computing at the Universities of Cambridge and Illinois
     in the Early Fifties
   Early Numerical Analysis in the United Kingdom
   The Pioneer Days of Scientific Computing in Switzerland
   The Development of Computational Mathematics in Czechoslovakia and
     the USSR
   The Contribution of Leningrad Mathematicians to the Development of
     Numerical Linear Algebra in the Period 1950-1986

I think in the end, it's all futile.  It seems our lot in life to
suffer.....  Fortran and the oncoming MS.  We are doomed, doomed,
doomed!