[CSUSB] >> [CompSci] >> /u/faculty/dick/cs320/sebesta/11

CS320 Data Abstraction

	http://www.csci.csusb.edu/dick/cs320/sebesta/11
	http://www.csci.csusb.edu/dick/cs320/sebesta/11.html

Access to FORTRAN

	All UNIX systems have a ready to use FORTRAN 77 compiler called
		f77
	The normal suffix is .f.  However Q/quickie doesn't know
	how to handle FORTRAN unless you say what to do in a Makefile.

Information on Ada

	The complete definition of Ada 83 -- the Language Reference
	Manual or LRM is on file at
		http://www.csci.csusb.edu/dick/cs320/ada/lrm
	Blaze has an Ada 83 compiler by using Q/quickie
		Q foo.ada
	for example and there are lots of samples in
		http://www.csci.csusb.edu/dick/cs320/ada
	Blaze doesn't have Netscape but you can use
		lynx  http://www.csci.csusb.edu/dick/cs320/ada
	to look at the programs (use arrow keys to high-light a file
	and tap Return/Enter key) and to download them
	(select file name and tap the letter 'd' and tap Return/Enter
	 twice).  Tap capital letter Q to quit.

Review Exercises in Chapter 11

	The answers to the Review questions are (hidden?) in the
	chapter so I *do* expect you to know the answers *even*
	if they are about a language you have never used!

	Review Question 13 is highly ambiguous.  You can substitute
	any or all of the following:
	13a.  How does a programmer define a new class of objects in
		C++.

	13b.  How does a program create a new object with a given class
		in C++.

	13c.  How do you declare an object as a member of a C++ class
		that occurs once in the class and is shared by all
		objects of that class?

Problems in Chapter 11

	I do not expect you to do problems 3, 6, 7, & 8 since they are about 
	FORTRAN or modula-2.  I don't plan to set Problems 11,12, 13, 15 
	since they involve implementing a data type that not all people 
	will have met before.  They can be attempted for homework however 
	if you already know what a queue, matrix, or complex number is.


Notes for chapter 11 of Sebesta.

	In a programming language you need to define:
		The syntax of the values and operations.
			XBNF/EBNF/BNF works well.
		The semantics of values and operations that are available
			Map values into known mathematical values
			Map operations into mathematical functions/operators
			The assumptions that are made in the mathematics.
			Any exceptions in the operations that can occur
				(dividing by zero)

	This permits you to work out an IMPLEMENTATION
	and show that it FITS the abstract form.
			See later in chapter 11 for languages that a
			let the programmer can do this.

	A data type is useful because it has 
		A set of values
		A set operations, functions,procedures, constants,...
		A set of assumptions.
		An associated abstract mathematical meaning.

	But to be able to use a data type in a language we need
		A particular implementation (which is usually imperfect).

	Do not confuse the particular implementation
		with the idea it implements.

	For example: The natural numbers
		Has values: 1,2,3,4,...
		Has operations: +*-/,...
		Has assumptions: a+b=b+a,...
		Has a set theory model: 0={}, 1= {0}, 2={0,1}, 3={0,1,2},...
	The assumptions give one form of meaning 
		(called axiomatic semantics) and
	the mathematical model gives another(denotational semantics).

	In C we use either unsigned or int when we need the natural numbers.
		The implementation is incomplete!

	A sample of a definition of a data type is at the end of these
	notes.

11.1 The Concept of Abstraction

Means removing or ignoring something.

What you you see is an abstraction from what is. What is inside a watch is
	hidden or encapsulated.
	======    ============
What you can use (from outside the watch) is the abstraction.

11.2

To some extent in programming the situation is like this:
	Abstract-->implement-->Concrete
We often specify and use an abstraction before we supply the complete
inside information.  Compilers have supported this for 40 years -- see
the History below.  Modern Programming languages have explicit 
support for this approach.

	Theory		Ada		C	C++	Java
        ---------------------------------------------------------------
	Abstraction	Specification	*.h	*.h	interface, abstract
	Details		Body		*.c	*.cc	class

Abstraction and information hiding are important ideas because reliable
systems have to be made from components that are themselves reliable and a
component's reliability depends on its resistance to outside disturbances.
A PC, for example uses a CPU that is an integrated circuited, encapsulated
in a special box with a standard pin layout. The construction process 
avoids changing the internal "wiring" of the CPU. The internal design is 
hidden from the CPU board. The PC designer uses an abstract model of the 
circuit.

Consider Intel CPUs vs AMD, Weitec, NEC, etc cloning the same behavior with
different circuits!

Consider Babbage - work slowed down because no two bolts had the same 
thread!

Consider Roebling - Work on Brooklin Bridge delayed because cables were not
     up to the specified strengths - fraudulent Quality Control
     procedures.

Large pieces of software are also made by combining small "modules".
Small "chips" are easy to connect together - as long as the connections
are described:  interfaces and abstractions state what "connections"
work.

Abstraction, encapsulation and information hiding make modules that are 
harder to misuse.

Unfortunately these ideas are not obvious - it took 30 years for the ideas
to mature. The goal has been to make it difficult to be foolish enough to 
have a problem.  Another difficulty is that what is visible to a programmer
using a module is often hidden from the program that uses the module!

They don't make sense with a toy project. The payoff is in a BIG long term
piece of software...

Section 11.3 Intro to Abstract Data Types

	floating point  
	http://www.csci.csusb.edu/dick/cs320/stacks

Historical Techniques (Not in book...)


Machine code:
In 1951 Wilkes published the first description of a subroutine library and
showed its effectiveness in simplifying the programming task. Each subprogram
can be designed, encoded, filed and reused many times during the life of a
system. To do this a sub-program has to be stored in a code that lets it be
put any where in a program. This is called "object code." Some of the
addresses are in relative addresses and others are stored as labels or names.
Other modules (normally) refer to parts by these names. Thus the set of names
defines the interface. Reliable reuse came from using these names not the
addresses.  This made it possible to rewrite the subprogram and not have to
rewrite the code that called it.

Assembler:
In assembly each label or name becomes an address when loaded. During the
assembly and linking processes names are placed in tables and ultimately
connected with addresses. Most labels are removed from object code in
libraries. After loading ALL labels must have become addresses. This process
is called linking or link editing.

High level languages rely on the link editing process to connect modules
together and so make unwanted connections difficult to make. High level
languages enabled the collection of machine independent libraries of
subroutines.

C:
For example a C program is a collection of files.   There are header files
that are included in other files (*.h).  The code files (*.c) can be
compiled independently into "object" code (*.o).  The last stage of compiling
the whole is to link these *.o files into an executable file.  In essence
the header files give the compiler what it needs to know about other
*.c files when compiling a particular *.c file.

FORTRAN:
In FORTRAN each subprogram is an independently compiled subprogram.
Subprograms share "COMMON" data. COMMON blocks are named and the names are
visible to ALL subprograms. However the structure is hidden from other
subroutines(by the compiler). Thus a COMMON block can be used in ANY component
in ANY way. A programmer can fail to describe a COMMON block consistently and
two programmers can accidently choose the same name for two different common
blocks.  This is makes FORTRAN more unreliable. Subprograms are always global
and have uncheckable interfaces. Constantyne and Myers based the structured
design method of the late 60's on teaching programmers to minimize these
interfaces.

For example to implement a stack of characters in FORTRAN one might write:

	SUBROUTINE PUSH(C)
	COMMON /STACK/ISP,ST
	CHARACTER C, ST(100)
	ISP=ISP+1
	ST(ISP)=C
	RETURN
	END

	SUBROUTINE POP
	COMMON /STACK/ISP
	ISP=ISP-1
	RETURN
	END

	FUNCTION TOP(DUMMY)
	COMMON /STACK/ISP,ST(200)
	CHARACTER TOP
	TOP=ST(ISP)
	RETURN
	END

Can you find the bug?

In COBOL data and subprograms are either hidden (by CALLing a compiled
subprogram) or globally accessible in complicated and unpredictable
ways(PERFORM). The data and the procedures are widely separated in the code
making libraries harder to create.

For example, in the data division's working storage section one would 
write:
	01 STACK.
		03 ST1.
			05 TOP  PIC X.
			05 REST PIC X(99)
		03 ST2 REDEFINES ST1.
			05 TOP99 PIC X(99).
			05 BOTTOM PIC X.
	01 EXTRA. 03 XTRA PIC X(99).

and add to the procedure division:

PUSH. MOVE TOP99 TO EXTRA MOVE NEW TO TOP MOVE EXTRA TO REST.

POP.  MOVE REST TO EXTRA MOVE EXTRA TO STACK.

During the 1960's it was unfashionable to design a language that catered for
independent compilation. The idea of 'module' at this time was a single
subprogram that could be included in a program. Pascal is a typical 
example.

In Pascal there is no independent compilation and the only "module" is the
subprogram (procedure and/or function).  The only way to add a new subprogram
to a Pascal program is to *include* its complete definition in the source code
of the program.
To use 'power' for instance, then (in the official Pascal) the source code for
the subprogram must be copied into the declarations of the client program.

For example, a stack might be defined by:
{Adding two new types:}
        type stype=^snode;
	     snode=record top:char; rest:stype end;

{Adding a new variable:}
        var  stack:stype;

{Adding these subprograms:}
function top(s:stype):char;
begin top:=stack^.top;
end;
procedure push( c:char);
var n:stype;
begin new(n);
        with n do begin top:=c; rest:=stack end;
      stack:=n
end;
procedure pop;....
var old:stype;
begin
	old=stack;
	stack:=old^.rest;
	delete old
end;


Notice that the pre-written subprograms that manipulate a stack refer to it
and so it is part of the client's declarations. Thus the client can do
anything with the stack - and so subvert the designer's assumptions.

Modularization in Standard Pascal is less reliable as a result. This is
equally true of ALGOL 60, ALGOL 68 and BCPL. PL/I combined the features of
FORTRAN, COBOL and ALGOL and added nothing to the understanding of
modularization.

There has been research on 'modular' programming. Parnas published a key paper
that argued that designs that decompose programs into subprograms are
inherently harder to maintain than those where each data structure is placed
with the operations necessary to maintain it and with its design irrelevant to
the programmers using it.
	http://www.acm.org/classics/

At the start of the 1970s C was being designed to support the independent
compilation of collections of variables and subprograms. Kernighan and Ritchie
make it clear that that this was seen as a better kind of module.

A C program consists of a set of separate files that are linked together after
compilation. The person who writes and compiles a given file has the freedom
to permit other programmers to refer to functions or variables or to hide
their names so that they can not be referred to by other programmers 
(accept by modifying the code or using absolute addresses and the like).  
Such a file can be compiled and the hidden names are removed from the 
object file. Some function and variable names are left as labels of 
addresses in that segment of code. These names can be linked with names 
declared (but not defined) in another file. Kernighan and Ritchie give an 
example.

For example a typical C program file (named stack.c) contains data storage and
operations for manipulating the data in a safe way.

/* stack.c -- A stack of characters RJB 1990 */
/* defines a single stack object with functions to access and manipulate it
*/
#include <stdio.h>

#define MAXSTACK 60
/* this defines the hidden structure of the stack*/
static int sp=0; /*stack pointer*/
static char st[MAXSTACK]; /*contents of stack is st[0..sp-1]*/

/*These allow the stack to be manipulated*/

int empty(){return (!sp);}

int full(){return (sp==MAXSTACK-1);}

int top(){return (!empty()?st[sp-1]:EOF);}

void pop(){if(!empty())st[--sp]=0; else...}

void push(c) char c;{if(!full())st[sp++]=c; else...}


Compilation (cc -c stack.c ) generates an object module called stack.o 
which has had all static identifyers replaced by addresses.

C also permitted the separation of the declaration of an identifier (which
tells the compiler what it needs to know) from the definition (which 
provides the code). For example:

A "header file"(stack.h) declares the types of various identifiers:

/* stack.h -- header file for stack object*/
/*non-ANSI!*/
extern int empty(),full(), top();
extern void push(), pop ();

Thus C allows the encapsulation of a data structure in a module so that it
can only be accessed in restricted ways.

For realistic examples see the source code for LISP, Prolog and Smalltalk.
	(~dick/src/{lisp, prolog, smalltalk}) on our servers
or
	http://www.csci.csusb.edu/dick/cs320/prolog/src
and
	http://www.csci.csusb.edu/dick/cs320/lisp/src
Notice an immediate benefit of modularizing - if a change effects only one
module and leaves its specification the same then only that module needs
recompilation.   The newly compiled version can be linked to the old
compiled (*.o) code to give them new program.



However a C program that can refer to a type name can also refer to the 
names that are used to access its components
	- and so the structure of a defined
type can NOT be hidden from a program that uses that type.

For example if a C programmer attempts to create a new data type called 
STACK, so that other programmers can have as many stacks as they want 
(rather than just one) then he or she has to put it in the header file:

/* stacks.h ... A int Stack DATA TYPE in C  (not C++)*/
#define MAXSTACK 60
/*the new type is defined*/
typedef struct stack { int sp; int st[MAXSTACK];} STACK;
/*non-ANSI!*/
extern int empty(),full(), top();
extern void push(), pop ();
/*end of stack.h*/

This declares 'STACK' so that the client can have
#include "stacks.h"
STACK s1,s2,s3;
...
	push(s1,e);push(s2,top(s1)); pop(s1);...

The matching stack.c can be written and compiled independently to
this client module.
/*stacks.c   - stack data type*/
#include "stacks.h"
int empty(STACK s) { return !s.sp;}
....
int top(STACK s) {return s.st[sp-1];}
...
void pop(STACK *s) { (*s).sp--;}
...
/*end of stack.c...*/

compilation
	cc -c stacks.c
This makes
	stack.o

To use stacks.c the client module (client.c) must
	(1) #include "stacks.h"
	(2) be compiled
		cc -c client.c
	(3) be linked
		cc -o prog client.o stacks.o
However the header defines STACK and so additionally permits the client
to do things like this:

        s1.sp=10;
        s2.st[3]=0;

Which implies that the programmer of the STACK can not stop bad things from
happening. A solution is to have a dummy declaration which indicates the 
name and the size, but not the structure of STACK - but this always still 
confers the ability to access the data directly.

There is a way of hiding the structure of the data by using what are
called "opaque pointer types".  In essence this means that the
header file declares the data type to be a "void *" and the
".c" file actually defines the structure.  All the structures are then
handled indirectly by pointers.  For more see
.Set
	C Interfaces and Implementations
	Techniques for creating reusable software
	David R Hanson
	Addison-Wesley 1996
	ISBN 0-201-49841-3

SIMULA (11.5.1) permits the controlled creation and operation of a set of 
stacks and lead to the idea of object oriented programming however this is
via a special run time mechanism rather than a way of packaging at compile
time.

Parnas developed his ideas into the form that it should be possible to 
specify a type in terms of its operations and their properties. Hence, a 
programmer should be able to define, implement, test and distribute a new 
data type - with out giving away the structure and design of the data type.
Programs that use it are not able to do anything other than the operations
that the programmer specifies.

In Modular II(11.5.2), Java, EDISON, C++(11.5.4), and Ada(11.5.3) it is 
possible to compile a package of type, data, and subprogram declarations, 
and then hide some of the names, while keeping others.

Java provides the clearest mechanism for separating implementation
from abstraction.  In Java you can declare "interfaces" that a no
more than lists of function prototypes something like the following:
   public interface Stackable{
	public void push(Object);
	public Object top();
	public void pop();
	public bool empty();
	public int size();
   };

Any new class can provide implementations of this interface:
  public class SequentialStack implements Stackable{

	...

   }
and the compiler makes sure that all the functions are declared.  You
can then declare Objects that a Stackable and set them equal
to "new SequentialStack".  The compiler then lets you apply the functions in
the Stackable interface to this object.

Here are some pointers to Java interfaces:
http://csci.csusb.edu/public/faculty/dick/lab2.html#Interfaces

The next step in abstraction is to think of the data as objects which
are used by sending messages to them (as in simula) and adding the ability
to efficiently define one data type as a special form of another.  C++ is
an attempt to extend C in this direction.  This shows that data abstraction
leads to objects and the so called object oriented languages.

Notice that the development of data abstraction explicitly rejects and 
moves away from that of functional programming which makes all data 
explicit and rejects the idea of a stored (or hidden) state that changes.

There is a religious war between the functionalists and the 
object-oriented.  However the static scoping of Scheme and XLISP
allows the creation of things that are close to abstract data types.
After input of this
	( LET ( (stack () ) )
	      (defun push (X)  (setq stack (cons X stack)))
	      (defun  pop ( )  (setq stack (cdr stack)))
	      (defun  top ( )  (car stack))
	      (defun empty( )  (null stack))
	)
LISP has got a s stack added to it:
	(push 1) (push 2) (top) (pop) (top) .....
The code(LET...) above is in 
	http://www.csci.csusb.edu/dick/cs320/stacks/stack0.lsp

For a fully object orient LISP Stacj.... see
	http://www.csci.csusb.edu/dick/cs320/stacks/stack.lsp

Examples.


The problem of defining a random number generator is a good example of
a problem which requires hidden information. 

Test program
	http://www.csci.csusb.edu/dick/cs320/c/trandom.c
ADT:
	http://www.csci.csusb.edu/dick/cs320/c/random.h
	http://www.csci.csusb.edu/dick/cs320/c/random.c
The ANSI C version is
	http://www.csci.csusb.edu/dick/cs320/c/random.ansi.h
An Ada version in
	http://www.csci.csusb.edu/dick/cs320/ada/random.ads
	http://www.csci.csusb.edu/dick/cs320/ada/random.adb

An interesting attempt to hide the platform on which
a program is running that appears
	http://www.csci.csusb.edu/dick/cs320/lisp/src/oscalls.c
Use your browsers search function to find
	osrand(ival) int ival;

A pair of Prolog random number generators are in
	http://www.csci.csusb.edu/dick/cs320/prolog/nurandom.pl
	http://www.csci.csusb.edu/dick/cs320/prolog/random.pl
They are used in things like
	http://www.csci.csusb.edu/dick/cs320/prolog/story.pl
	http://www.csci.csusb.edu/dick/cs320/prolog/lotto93.pl

The shell has a random number command as well
	man 1 random

My 'cookie' script shows that it is not necessarily to generate
random numbers.... sometimes hashing the data and time is OK:
	/home/bin/cookie aka /usr/local/bin/cookie aka /share/bin/cookie
	http://www.csci.csusb.edu/dick/tools/cookie

Disclaimer.

In practice the random number generators in most platforms are
better than the examples given above.  For example,
the C library provides a pair of packages which are better than the
ones given above.  The following is a silly example:
	http://www.csci.csusb.edu/dick/cs320/c/rap.c

R.T.F.M. 
	man srand
	man srandom
	man 3 random