[CSUSB] >>
[CompSci] >>
/u/faculty/dick/cs320/sebesta/11
CS320 Data Abstraction
http://www.csci.csusb.edu/dick/cs320/sebesta/11
http://www.csci.csusb.edu/dick/cs320/sebesta/11.html
Access to FORTRAN
All UNIX systems have a ready to use FORTRAN 77 compiler called
f77
The normal suffix is .f. However Q/quickie doesn't know
how to handle FORTRAN unless you say what to do in a Makefile.
Information on Ada
The complete definition of Ada 83 -- the Language Reference
Manual or LRM is on file at
http://www.csci.csusb.edu/dick/cs320/ada/lrm
Blaze has an Ada 83 compiler by using Q/quickie
Q foo.ada
for example and there are lots of samples in
http://www.csci.csusb.edu/dick/cs320/ada
Blaze doesn't have Netscape but you can use
lynx http://www.csci.csusb.edu/dick/cs320/ada
to look at the programs (use arrow keys to high-light a file
and tap Return/Enter key) and to download them
(select file name and tap the letter 'd' and tap Return/Enter
twice). Tap capital letter Q to quit.
Review Exercises in Chapter 11
The answers to the Review questions are (hidden?) in the
chapter so I *do* expect you to know the answers *even*
if they are about a language you have never used!
Review Question 13 is highly ambiguous. You can substitute
any or all of the following:
13a. How does a programmer define a new class of objects in
C++.
13b. How does a program create a new object with a given class
in C++.
13c. How do you declare an object as a member of a C++ class
that occurs once in the class and is shared by all
objects of that class?
Problems in Chapter 11
I do not expect you to do problems 3, 6, 7, & 8 since they are about
FORTRAN or modula-2. I don't plan to set Problems 11,12, 13, 15
since they involve implementing a data type that not all people
will have met before. They can be attempted for homework however
if you already know what a queue, matrix, or complex number is.
Notes for chapter 11 of Sebesta.
In a programming language you need to define:
The syntax of the values and operations.
XBNF/EBNF/BNF works well.
The semantics of values and operations that are available
Map values into known mathematical values
Map operations into mathematical functions/operators
The assumptions that are made in the mathematics.
Any exceptions in the operations that can occur
(dividing by zero)
This permits you to work out an IMPLEMENTATION
and show that it FITS the abstract form.
See later in chapter 11 for languages that a
let the programmer can do this.
A data type is useful because it has
A set of values
A set operations, functions,procedures, constants,...
A set of assumptions.
An associated abstract mathematical meaning.
But to be able to use a data type in a language we need
A particular implementation (which is usually imperfect).
Do not confuse the particular implementation
with the idea it implements.
For example: The natural numbers
Has values: 1,2,3,4,...
Has operations: +*-/,...
Has assumptions: a+b=b+a,...
Has a set theory model: 0={}, 1= {0}, 2={0,1}, 3={0,1,2},...
The assumptions give one form of meaning
(called axiomatic semantics) and
the mathematical model gives another(denotational semantics).
In C we use either unsigned or int when we need the natural numbers.
The implementation is incomplete!
A sample of a definition of a data type is at the end of these
notes.
11.1 The Concept of Abstraction
Means removing or ignoring something.
What you you see is an abstraction from what is. What is inside a watch is
hidden or encapsulated.
====== ============
What you can use (from outside the watch) is the abstraction.
11.2
To some extent in programming the situation is like this:
Abstract-->implement-->Concrete
We often specify and use an abstraction before we supply the complete
inside information. Compilers have supported this for 40 years -- see
the History below. Modern Programming languages have explicit
support for this approach.
Theory Ada C C++ Java
---------------------------------------------------------------
Abstraction Specification *.h *.h interface, abstract
Details Body *.c *.cc class
Abstraction and information hiding are important ideas because reliable
systems have to be made from components that are themselves reliable and a
component's reliability depends on its resistance to outside disturbances.
A PC, for example uses a CPU that is an integrated circuited, encapsulated
in a special box with a standard pin layout. The construction process
avoids changing the internal "wiring" of the CPU. The internal design is
hidden from the CPU board. The PC designer uses an abstract model of the
circuit.
Consider Intel CPUs vs AMD, Weitec, NEC, etc cloning the same behavior with
different circuits!
Consider Babbage - work slowed down because no two bolts had the same
thread!
Consider Roebling - Work on Brooklin Bridge delayed because cables were not
up to the specified strengths - fraudulent Quality Control
procedures.
Large pieces of software are also made by combining small "modules".
Small "chips" are easy to connect together - as long as the connections
are described: interfaces and abstractions state what "connections"
work.
Abstraction, encapsulation and information hiding make modules that are
harder to misuse.
Unfortunately these ideas are not obvious - it took 30 years for the ideas
to mature. The goal has been to make it difficult to be foolish enough to
have a problem. Another difficulty is that what is visible to a programmer
using a module is often hidden from the program that uses the module!
They don't make sense with a toy project. The payoff is in a BIG long term
piece of software...
Section 11.3 Intro to Abstract Data Types
floating point
http://www.csci.csusb.edu/dick/cs320/stacks
Historical Techniques (Not in book...)
Machine code:
In 1951 Wilkes published the first description of a subroutine library and
showed its effectiveness in simplifying the programming task. Each subprogram
can be designed, encoded, filed and reused many times during the life of a
system. To do this a sub-program has to be stored in a code that lets it be
put any where in a program. This is called "object code." Some of the
addresses are in relative addresses and others are stored as labels or names.
Other modules (normally) refer to parts by these names. Thus the set of names
defines the interface. Reliable reuse came from using these names not the
addresses. This made it possible to rewrite the subprogram and not have to
rewrite the code that called it.
Assembler:
In assembly each label or name becomes an address when loaded. During the
assembly and linking processes names are placed in tables and ultimately
connected with addresses. Most labels are removed from object code in
libraries. After loading ALL labels must have become addresses. This process
is called linking or link editing.
High level languages rely on the link editing process to connect modules
together and so make unwanted connections difficult to make. High level
languages enabled the collection of machine independent libraries of
subroutines.
C:
For example a C program is a collection of files. There are header files
that are included in other files (*.h). The code files (*.c) can be
compiled independently into "object" code (*.o). The last stage of compiling
the whole is to link these *.o files into an executable file. In essence
the header files give the compiler what it needs to know about other
*.c files when compiling a particular *.c file.
FORTRAN:
In FORTRAN each subprogram is an independently compiled subprogram.
Subprograms share "COMMON" data. COMMON blocks are named and the names are
visible to ALL subprograms. However the structure is hidden from other
subroutines(by the compiler). Thus a COMMON block can be used in ANY component
in ANY way. A programmer can fail to describe a COMMON block consistently and
two programmers can accidently choose the same name for two different common
blocks. This is makes FORTRAN more unreliable. Subprograms are always global
and have uncheckable interfaces. Constantyne and Myers based the structured
design method of the late 60's on teaching programmers to minimize these
interfaces.
For example to implement a stack of characters in FORTRAN one might write:
SUBROUTINE PUSH(C)
COMMON /STACK/ISP,ST
CHARACTER C, ST(100)
ISP=ISP+1
ST(ISP)=C
RETURN
END
SUBROUTINE POP
COMMON /STACK/ISP
ISP=ISP-1
RETURN
END
FUNCTION TOP(DUMMY)
COMMON /STACK/ISP,ST(200)
CHARACTER TOP
TOP=ST(ISP)
RETURN
END
Can you find the bug?
In COBOL data and subprograms are either hidden (by CALLing a compiled
subprogram) or globally accessible in complicated and unpredictable
ways(PERFORM). The data and the procedures are widely separated in the code
making libraries harder to create.
For example, in the data division's working storage section one would
write:
01 STACK.
03 ST1.
05 TOP PIC X.
05 REST PIC X(99)
03 ST2 REDEFINES ST1.
05 TOP99 PIC X(99).
05 BOTTOM PIC X.
01 EXTRA. 03 XTRA PIC X(99).
and add to the procedure division:
PUSH. MOVE TOP99 TO EXTRA MOVE NEW TO TOP MOVE EXTRA TO REST.
POP. MOVE REST TO EXTRA MOVE EXTRA TO STACK.
During the 1960's it was unfashionable to design a language that catered for
independent compilation. The idea of 'module' at this time was a single
subprogram that could be included in a program. Pascal is a typical
example.
In Pascal there is no independent compilation and the only "module" is the
subprogram (procedure and/or function). The only way to add a new subprogram
to a Pascal program is to *include* its complete definition in the source code
of the program.
To use 'power' for instance, then (in the official Pascal) the source code for
the subprogram must be copied into the declarations of the client program.
For example, a stack might be defined by:
{Adding two new types:}
type stype=^snode;
snode=record top:char; rest:stype end;
{Adding a new variable:}
var stack:stype;
{Adding these subprograms:}
function top(s:stype):char;
begin top:=stack^.top;
end;
procedure push( c:char);
var n:stype;
begin new(n);
with n do begin top:=c; rest:=stack end;
stack:=n
end;
procedure pop;....
var old:stype;
begin
old=stack;
stack:=old^.rest;
delete old
end;
Notice that the pre-written subprograms that manipulate a stack refer to it
and so it is part of the client's declarations. Thus the client can do
anything with the stack - and so subvert the designer's assumptions.
Modularization in Standard Pascal is less reliable as a result. This is
equally true of ALGOL 60, ALGOL 68 and BCPL. PL/I combined the features of
FORTRAN, COBOL and ALGOL and added nothing to the understanding of
modularization.
There has been research on 'modular' programming. Parnas published a key paper
that argued that designs that decompose programs into subprograms are
inherently harder to maintain than those where each data structure is placed
with the operations necessary to maintain it and with its design irrelevant to
the programmers using it.
http://www.acm.org/classics/
At the start of the 1970s C was being designed to support the independent
compilation of collections of variables and subprograms. Kernighan and Ritchie
make it clear that that this was seen as a better kind of module.
A C program consists of a set of separate files that are linked together after
compilation. The person who writes and compiles a given file has the freedom
to permit other programmers to refer to functions or variables or to hide
their names so that they can not be referred to by other programmers
(accept by modifying the code or using absolute addresses and the like).
Such a file can be compiled and the hidden names are removed from the
object file. Some function and variable names are left as labels of
addresses in that segment of code. These names can be linked with names
declared (but not defined) in another file. Kernighan and Ritchie give an
example.
For example a typical C program file (named stack.c) contains data storage and
operations for manipulating the data in a safe way.
/* stack.c -- A stack of characters RJB 1990 */
/* defines a single stack object with functions to access and manipulate it
*/
#include <stdio.h>
#define MAXSTACK 60
/* this defines the hidden structure of the stack*/
static int sp=0; /*stack pointer*/
static char st[MAXSTACK]; /*contents of stack is st[0..sp-1]*/
/*These allow the stack to be manipulated*/
int empty(){return (!sp);}
int full(){return (sp==MAXSTACK-1);}
int top(){return (!empty()?st[sp-1]:EOF);}
void pop(){if(!empty())st[--sp]=0; else...}
void push(c) char c;{if(!full())st[sp++]=c; else...}
Compilation (cc -c stack.c ) generates an object module called stack.o
which has had all static identifyers replaced by addresses.
C also permitted the separation of the declaration of an identifier (which
tells the compiler what it needs to know) from the definition (which
provides the code). For example:
A "header file"(stack.h) declares the types of various identifiers:
/* stack.h -- header file for stack object*/
/*non-ANSI!*/
extern int empty(),full(), top();
extern void push(), pop ();
Thus C allows the encapsulation of a data structure in a module so that it
can only be accessed in restricted ways.
For realistic examples see the source code for LISP, Prolog and Smalltalk.
(~dick/src/{lisp, prolog, smalltalk}) on our servers
or
http://www.csci.csusb.edu/dick/cs320/prolog/src
and
http://www.csci.csusb.edu/dick/cs320/lisp/src
Notice an immediate benefit of modularizing - if a change effects only one
module and leaves its specification the same then only that module needs
recompilation. The newly compiled version can be linked to the old
compiled (*.o) code to give them new program.
However a C program that can refer to a type name can also refer to the
names that are used to access its components
- and so the structure of a defined
type can NOT be hidden from a program that uses that type.
For example if a C programmer attempts to create a new data type called
STACK, so that other programmers can have as many stacks as they want
(rather than just one) then he or she has to put it in the header file:
/* stacks.h ... A int Stack DATA TYPE in C (not C++)*/
#define MAXSTACK 60
/*the new type is defined*/
typedef struct stack { int sp; int st[MAXSTACK];} STACK;
/*non-ANSI!*/
extern int empty(),full(), top();
extern void push(), pop ();
/*end of stack.h*/
This declares 'STACK' so that the client can have
#include "stacks.h"
STACK s1,s2,s3;
...
push(s1,e);push(s2,top(s1)); pop(s1);...
The matching stack.c can be written and compiled independently to
this client module.
/*stacks.c - stack data type*/
#include "stacks.h"
int empty(STACK s) { return !s.sp;}
....
int top(STACK s) {return s.st[sp-1];}
...
void pop(STACK *s) { (*s).sp--;}
...
/*end of stack.c...*/
compilation
cc -c stacks.c
This makes
stack.o
To use stacks.c the client module (client.c) must
(1) #include "stacks.h"
(2) be compiled
cc -c client.c
(3) be linked
cc -o prog client.o stacks.o
However the header defines STACK and so additionally permits the client
to do things like this:
s1.sp=10;
s2.st[3]=0;
Which implies that the programmer of the STACK can not stop bad things from
happening. A solution is to have a dummy declaration which indicates the
name and the size, but not the structure of STACK - but this always still
confers the ability to access the data directly.
There is a way of hiding the structure of the data by using what are
called "opaque pointer types". In essence this means that the
header file declares the data type to be a "void *" and the
".c" file actually defines the structure. All the structures are then
handled indirectly by pointers. For more see
.Set
C Interfaces and Implementations
Techniques for creating reusable software
David R Hanson
Addison-Wesley 1996
ISBN 0-201-49841-3
SIMULA (11.5.1) permits the controlled creation and operation of a set of
stacks and lead to the idea of object oriented programming however this is
via a special run time mechanism rather than a way of packaging at compile
time.
Parnas developed his ideas into the form that it should be possible to
specify a type in terms of its operations and their properties. Hence, a
programmer should be able to define, implement, test and distribute a new
data type - with out giving away the structure and design of the data type.
Programs that use it are not able to do anything other than the operations
that the programmer specifies.
In Modular II(11.5.2), Java, EDISON, C++(11.5.4), and Ada(11.5.3) it is
possible to compile a package of type, data, and subprogram declarations,
and then hide some of the names, while keeping others.
Java provides the clearest mechanism for separating implementation
from abstraction. In Java you can declare "interfaces" that a no
more than lists of function prototypes something like the following:
public interface Stackable{
public void push(Object);
public Object top();
public void pop();
public bool empty();
public int size();
};
Any new class can provide implementations of this interface:
public class SequentialStack implements Stackable{
...
}
and the compiler makes sure that all the functions are declared. You
can then declare Objects that a Stackable and set them equal
to "new SequentialStack". The compiler then lets you apply the functions in
the Stackable interface to this object.
Here are some pointers to Java interfaces:
http://csci.csusb.edu/public/faculty/dick/lab2.html#Interfaces
The next step in abstraction is to think of the data as objects which
are used by sending messages to them (as in simula) and adding the ability
to efficiently define one data type as a special form of another. C++ is
an attempt to extend C in this direction. This shows that data abstraction
leads to objects and the so called object oriented languages.
Notice that the development of data abstraction explicitly rejects and
moves away from that of functional programming which makes all data
explicit and rejects the idea of a stored (or hidden) state that changes.
There is a religious war between the functionalists and the
object-oriented. However the static scoping of Scheme and XLISP
allows the creation of things that are close to abstract data types.
After input of this
( LET ( (stack () ) )
(defun push (X) (setq stack (cons X stack)))
(defun pop ( ) (setq stack (cdr stack)))
(defun top ( ) (car stack))
(defun empty( ) (null stack))
)
LISP has got a s stack added to it:
(push 1) (push 2) (top) (pop) (top) .....
The code(LET...) above is in
http://www.csci.csusb.edu/dick/cs320/stacks/stack0.lsp
For a fully object orient LISP Stacj.... see
http://www.csci.csusb.edu/dick/cs320/stacks/stack.lsp
Examples.
The problem of defining a random number generator is a good example of
a problem which requires hidden information.
Test program
http://www.csci.csusb.edu/dick/cs320/c/trandom.c
ADT:
http://www.csci.csusb.edu/dick/cs320/c/random.h
http://www.csci.csusb.edu/dick/cs320/c/random.c
The ANSI C version is
http://www.csci.csusb.edu/dick/cs320/c/random.ansi.h
An Ada version in
http://www.csci.csusb.edu/dick/cs320/ada/random.ads
http://www.csci.csusb.edu/dick/cs320/ada/random.adb
An interesting attempt to hide the platform on which
a program is running that appears
http://www.csci.csusb.edu/dick/cs320/lisp/src/oscalls.c
Use your browsers search function to find
osrand(ival) int ival;
A pair of Prolog random number generators are in
http://www.csci.csusb.edu/dick/cs320/prolog/nurandom.pl
http://www.csci.csusb.edu/dick/cs320/prolog/random.pl
They are used in things like
http://www.csci.csusb.edu/dick/cs320/prolog/story.pl
http://www.csci.csusb.edu/dick/cs320/prolog/lotto93.pl
The shell has a random number command as well
man 1 random
My 'cookie' script shows that it is not necessarily to generate
random numbers.... sometimes hashing the data and time is OK:
/home/bin/cookie aka /usr/local/bin/cookie aka /share/bin/cookie
http://www.csci.csusb.edu/dick/tools/cookie
Disclaimer.
In practice the random number generators in most platforms are
better than the examples given above. For example,
the C library provides a pair of packages which are better than the
ones given above. The following is a silly example:
http://www.csci.csusb.edu/dick/cs320/c/rap.c
R.T.F.M.
man srand
man srandom
man 3 random