Disclaimer. CSUSB and the CS Dept have no responsibility for the content of this page.
Copyright. Richard J. Botting ( Wed Dec 20 09:16:51 PST 2006 ). Permission is granted to quote and use this document as long as the source is acknowledged.
Note Added September 1995
The last section explains a crucial limitation on this theory.
[ Limitations ]
Note Added February 1997
The resolution of these limitations is tackled in a later
paper:
[ rjb9x.Timed.Relations.html ]
Note Added July 1997
An alternative resolution is to associate each statement
in a programming language with two relations rather
than a single relation. The first relation describes
what happens if and when the statement terminates and
the second relation describes when the statement fails
to terminate. There is an excellent paper on the abstract algebra
of relations and their use in program semantics by Roger D.
Maddux,
[ Relation Algebras in math_45_Three_Operators ]
. . . . . . . . . ( end of section Introduction) <<Contents | End>>
Basis
Plus the assumption that the same logic can be used to talk about things of all types - objects, sets, sets of sets, relations, sets of relations, and so on.
The set of all relations between sets A and B is written @(A,B). A relation is typically defined by a formula like the following:
We can also express relations as sets of pairs or as boolean functions depending on the context:
Program & System Specifications
Relations behave like the operations and conditions in a programming language
because each operation in a program(or an object!) defines a relationship
between the current state and the possible future states. If there is always
a single next state then the relation is deterministic. By composing
deterministic steps carefully we get an algorithm. In turn an algorithm can be
coded for input to a suitable translator giving an executable program.
Alternately we can start from a nondeterministic relation that describes a set
of acceptable behaviors and let the implementor selects a particular subset as
an implementation [compare with Kowalski, Hoare 87].
A relation in R:@(T1,T2) can be defined by
Or equivalently by
The inverse /R is the converse relation to R. For example if (_^2) is the 'square' function then /(_^2) is the square root relation. Similarly the arc sine function is written /sin. Further we want /parent_of=child_of. I define:
Given two binary relations R and S then
R; S = The relation between (1st) and (2nd) such that for some z, (1st) R z S (2nd),
R | S = The relation between (1st) and (2nd) such that either (1st) R (2nd) or (1st) S (2nd),
R & S = The relation between (1st) and (2nd) such that both (1st) R (2nd) and (1st)S (2nd).
For R: @(T1,T2), S:@(T2,T3),
In context, a set can also be used as a relation:
Complex situations can be defined quickly using these notations:
Kinship Example 1
male, female::@being.
Relations that change
Relationships change in real systems. Indeed most Z specifications contain
scheme that describe the changes that can occur in a collection of sets and
relations. For example, in a university above a student can add and drop
sections so if L represents the current relation student (1st) is enrolled in section (2nd) and L' the future form of the relation:
Kinship Example 2
Relationships can also be defined in a EBNF like form - complete with
recursion:
Trasitive Closures
The definitions of ancestor and lineal_descendant illustrate the
transitive closure or ancestral relation of parent and child. The
reflexive transitive closure is important and defined as follows:
Example:
The defining property of do is that if something is invariant (or fixed, or
closed) under R then it will not be changed by any number of Rs
Footnote1:
Example:
Where
Example:
We can also define the power of a relation [cf Antoy & Gannon 94] via the following equations:
Similar to the idea of an invariant set of states is the idea of a stable function:
and, so if f in stable(R) then f in stable(do(R)).
Also for all R and f, f is stable under no(R). These ideas are a simple and intuitive way to handle mathematical induction, but have a solid logical base in Principia Mathmatica [Whitehead & Russell].
Program Like Structures: if, while, etc
Relational expressions include structured programs because I define:
I can define an extension of Dijkstra's Guarded Commands(below) and Ada-like control structures as well:
Basic Operations
To make a usable system I need a way to specify simple changes to variables in
a program. The key definition is that of a dynamic predicate. A predicate
is a proposition that has free variables. A dynamic predicate has free
variables with primes (') attached to them. Here is an example:
After the statement, the value of x is equal to the sum of the old values of x and y. Also is unchanged. It could be coded as x:=x+y in Ada/Pascal/... The rule is that "a prime means the new and no prime means the old" [from Ross Ashby 56, compare Part II of Lam & Shankar 90, Hehner 84a & 84b, Guttag & Horning 91, Steward 93]. Formally x'=x+y defines a relation on a structured set with variables x and y. So
guarantees that x is 1 afterwards and also that no other variables change.
means that the value of x will be complemented - its new value will be the result of subtracting its old value from one while all other variables stay the same.
has no primed variable. It means that x has the value one, and does not change - all variables are left untouched. It is a relation that holds between identical states when and only when x has value 1. It filters out all states when x is not 1. It allows computation under a condition. We call it a condition therefore. It can be shown that it is a condition in the sense of Arbib and Manes [87].
Here is a more general statement of the MATHS interpretation of a predicate that contains variables with primes. Consider a simple state space S which declares m variables x1, x2,...x[m] of types X1, X2, ...X[M]:
Let P(y1',y2',y3',...y[n]',x1,x2,x3,...x[m]) be a predicate with n distinct primed variables(y1, y2,...y[n]) which may also appear without primes as xs. For example in x'=x+y, x1=x, x2=y, and y1=x.
The relation defined by P(y1', y2', y3', ...y[n]', x1, x2, x3, ...x[m]) is found by:
A more rigorous definition will be worked out in an appendix to this paper, [ Dynamic Predicates ] To specify a process that solves two simultaneous linear equations we write
To specify a process that solves a quadratic equation we write
On the other hand
specifies that x is left alone and a, b, and c are adjusted to fit the equation.
Dynamic predicates can be used with any type of variable. The Programming Research Group at Oxford University, England has shown how dynamics can be simply specified as changes to sets[Z]. Sets specify the possible data bases. This can be done in MATHS. Here is an example from the Elevator system[Chapter 2 section 4 MATHS in Software Engineering]
ADTs as Sets of Operations
To define an ADT you list declarations, conditions and actions (section 6.7
below). For example
Given a set of documentation that has only declarations and predicates we can define the operations described by the documentation as a set of relations:
So, for example, operations{ x::Nat , x'=x+1, x'=x-1, x'=0, x=0, x<>0} = { rel[u:X, v:X]( v.x=u.x+1 ), rel[u:X, v:X]( v.x=u.x-1 ), rel[u:X, v:X]( v.x=0 ), rel[u:X, v:X]( u.x=0 ), rel[u:X, v:X]( u.x<>0 ) }.
Structures and Programs
Relations are a nondeterministic generalization of structured programming. I
will now show that if-then-else, if-fi, while-do, do-od are special
formulae in the calculus of relations.
Definition of A Non-sequential Process
A Non-Sequential Process is an expression of the relational calculus
using dynamic predicates as the elementary relations.
Definition of a Structured Program
A structured program is a finite expression of the relational calculus
using only composition, union, simple predicates, and dynamic
predicates of the form
as the elementary relations, and the expressions have no exceptions( are total).
The structured programs based on a set of operations are those relational expressions that are regular expressions made out of the operations in the documentation and their negations:
The processes derived from a set of declarations and predicates by using the operators (;), (|), (do), (no), (&), (~), etc are called the non-sequential processes:
By well known arguments it is clear that
Examples include Jan de Bakker's use of the calculus of relations to supply the semantics for Pascal-like program schemata [De Bakker 75] and Hoare's Relational Semantics[Cousot's Theorem 19 (page 856 of Leeuwen 90)]. Hehner's recent book on programming with predicates is very similar.
I will check that my model is close to the Hoare/Dijkstra/Gries model of SP: Consider sequence, selection and iteration, and conditions in turn:
Sequences
Hoare has independently stated axioms for the semicolon in SP that are
those of semiring multiplication[Hoare 88]. Dijkstra's discussion of
'R;S' is equivalent to the my definition of R;S [Dijkstra 76]. Mili et
al write:
Quote
. . . . . . . . . ( end of section Quote) <<Contents | End>>
Iterations
I define defines do(R) by using invariants. The SP 'while' rules
[Hoare] and 'do-od' rules[ Dijkstra 76, Gries, Dijkstra (Ed) 90] are
defined the same way. However SP assumes deterministic elements and some
determinism in the structures, but the calculus of relations does not.
It can shown that, if P is a condition then:
( do (P; R); no P) =do P->R od.
So I define while(P)(R)::= ( do (P; R); no P).
The "terminating condition" ( no P ) is necessary.
Selections
A selection (R | S) gives a choice of two alternatives. It is not an
if-then-else [Knuth 74]. Instead (R | S) indicates a process that can
follow one of two paths, but does not pre-determine which. But, if R and
S are deterministic and P is any condition, the following does pre-
determine the component to obey:
(P; R | no P;S) =if P then R else S fi= if P -> R [] ~P -> S fi.
Conditions
Conditions form a Boolean Algebra[cf Manes & Arbib 86]. Unlike the
Dijkstra/Gries/Hoare theory there are no special guards. If any item in
a sequence is going to be inconsistent then an alternative is taken:
((a; no; c ) | b)= b. Such temporary non-determinism can be coded by
either back-tracking or look-ahead - this is discussed further in
chapter 6 of my unpublished monograph.
So if a "path" or "track" will fail then the computation never started. In the Dijkstra model, computations either succeed or abort. abort is defined to mean that any thing could happen. So I define 'abort' to be the universal relation.
I treat failure as something that the coder will make the program avoid by following an alternative path to be taken. In terms of Scott's theory of Domains, Dijkstra assumes that failure is the 'bottom' (least defined) element of the domain and I assume that failure is the 'top' (overdefined) element of the domain. Related to this difference is Dijkstra's "Law of the Excluded Miracle", Nelson has argued [Nelson 90, and earlier DEC Internal Reports] that this law makes the calculus more complex and less general. Two other, rival, approaches to the meaning of 'failure' have been recently reviewed [Sistla 90].
Several authors have come to contradictory conclusions about the relation between regular expressions and programming [Knuth 74, Harel 79, Sakarovitch 86, ...] . The calculus of regular expression on binary relations is more than a structured programming language - it has full "Angelic nondeterminism." It is a way to express designs that can be implemented with different space/time trade-offs [Examples: Martelli & Montanari 82, Bitner & Reingold 75, Kowalski 79, Survey: Cohen 79]. In Floyd's phrase relations are "non-deterministic, not in the sense of being random, but in the sense of having free will"[ Floyd 67b]. The implementor is left the task of coding "free will" efficiently without losing the correspondence between the problem and solution[ Jackson 75]. Intersections also lead to interesting - not to say intractable - coding problems.
. . . . . . . . . ( end of section Relations and SP) <<Contents | End>>
Conclusion
Thus every program can be specified as a relation but not vice versa.
Lamport's dialogue discusses the difference well:
"This can be done. However, programming languages are constrained by the requirement that programs must be compiled into reasonably efficient code. Because specifications do not have to compiled, specification languages can permit simpler specifications than can be written in programming languages." [p15 Lamport 86]
The calculus of relations using dynamic predicates contains all structured programming formulations so far published. It is not a programming language because the relations are a bridge from the environment to programs that fit it. Thus programs are a proper subset of relational expressions.
Normal Form
Finally, relational expressions follow the calculus of relations which
can be used to simplify sequences into disjunctive normal form, like this:
Typical simplifications are
Here are the key theorems need to prove the existence of the normal form.
These theorems merely show that an intuitive symbolic execution of a
relational expression will lead to a predicate describing the effect of
following one or more paths through a program. Such expressions help
discover properties, simplifications, and errors [Mills, Gries,
Coen-Porisini et al 91, Khanna 91].
Command Mathematics
The title of this section was coined by Prof. M L V Pitteway in the early 1970s
to describe the study of algebraic
manipulation of algorithms. Relations model commands well and the relational operators underlie structured (and indeed unstructured) program.
Further they can express mixtures
of conditions and actions in one relation. A well known enigmatic program is
often written:
( while(n>1) (
if n in even then n'=n/2
else n'=3*n+1
fi
)
).Using relations lets us derive a suggestive new form:
Clean Room
Harlan Mills's internal IBM report[Mills 72] that shows that the theory of
sets leads directly to a mathematical theory of structured programming which
is both complete and provable. He corrects Boehm and Jaccopini's proof that
there exists an equivalent program which is a tree with only the D'-
structures(Ledgard and Marcotti). He shows that it is possible to prove the
correctness of these programs by a tree walk. Finally he shows how top-down
design can be reduced to choosing:
By inspection of Mills's report, this is true. However IBM's derived "Cleanroom" methodology uses functions.
Program Derivation
Relations can make the derivation of programs more logical: (1) state the
properties of the changed variables as a set of equations and then (2) solve
for the primed variables. Similarly, a non-deterministic relationship can
often be refined to a form that is compatible with a declarative programming
system such as Prolog [Compare Lamport 87].
. . . . . . . . . ( end of section Related work) <<Contents | End>>
. . . . . . . . . ( end of section Binary Relationships) <<Contents | End>>
Parnas's work on the Darlington Reactor Shutdown Program in Canada uses a similar calculus to the one presented here[Parnas et al 95]. He points out that when a relation is nondeterministic then there is no way to tell from the relation whether it failed to produce a result, or whether the program has entered an infinite loop. An extreme example is that:
Since this is an important difference it is clear that the calculus of relations by itself ignores an important difference and so is not an adequate model for programming.
The kind of case Parnas et al describe would be written
Now, Consider the intuitive meaning of this program. It is clear that
when x=1 this terminates, but when x<>1 it loops forever. However
but this is a condition and has no loop in it.
Parnas et al's paper also discusses the problems that a program can have different performances but the associated relations will be equal. Parnas suggests that a program can be modeled not as relation, but as two relations: one gives what can happen, and the other indicates when the program will terminate safely under all conditions. For example
if true before loopy guarantees that the loop will not occur.
Hehner has already proposed an alternative way of using the predicate calculus and relations as a model specifications and programs that avoids the problem of infinite loops. It has the advantage that it handles termination, performance, and the analysis of algorithms by a uniform technique. In Hehner's model (Like some of Lamport's) a statement is a relation that always includes an assignment of form
Thus the loopy program above becomes:
There is now a way to state that the program does not loop more than n times:
However this does not get one out of the bind. If P=Q and T is another relation modeling a time constraint then
So that is is still not possible to distinguish an infinite loop from failure by using the calculus of relations alone.
There is clearly more work to be done on how to map programs into relations so as to capture useful properties. We can conclude that except in the most theoretical terms, relations are not programs! As Lamport has observed - Relations are specifications of behavioral properties.
Here are two resolutions: a program statement can be modeled by two relations: [ Relation Algebras in math_45_Three_Operators ] and a program statement can be modeled by a mapping that associates a relation describing the possible behaviors to the resources that are made available to complete the computation, [ rjb9x.Timed.Relations.html ] and [ c.semantics.html ]
. . . . . . . . . ( end of section Limitations) <<Contents | End>>
Modern formal notations from Ada to Z provide means for creating a new type of object by (1) listing components and (2)specifying constraints and
operations on these components. I believe that any set of declarations can be
interpreted as defining a set of labeled tuples. If D is a collection of
declarations, definitions, axioms, theorems and comments then Net{D} is
called a set of documentation. A dollar sign converts it into a set of
tuples. Net{D}=${D} is the set of objects that satisfy D. If
Net{D} is called N then N means the same thing as Net{D}. For
example: (Times, <) in LOSET.
Labeled Tuples
Here are some examples of tuples, sets of tuples and
sets of sets of tuples. You can think of these as records, tables and data
bases in a programming system respectively. You can think of each as a small
spreadsheet that automatically keeps itself up to date whenever any cell
changes. You can also think of them as objects. Bear in mind that programs
are finite implementations however. Formal definition is in
chapter 5 of my monograph.
Tuples look like this: (a=>xa, b=>xb, ...). They are in a type defined by
one of the following ways:
Short forms
Each label in a structure can be used to extract components from tuples with the dot operator of Ada, C, Pascal,..., and Z:
The dot fits with mathematics since '.' projects a vector into a scalar. MATHS formalizes projections as functions mapping structured objects to their components. It has both f(x) and x.f notations [Knuth 74]. So if x in X and X=${a:A,b:B, c::C,... } then
Because projections are maps, they are relations and so have converses. So for example
MATHS also defines projection operations onto lists of elements:
One way to imagine a MATHS structure is to think of it as a record or object
in a high level language or a labeled tuple in a RDBMS. The formal definition
implies a collection of maps. Thus
phones=Net{name,number::Strings}
implies name in phones->String and number in phones->String. Formally,
phones is the universal (free, initial, or universally repelling) object in
the category of types with names and numbers. In other words: all types with
two functions called name and number mapping objects into the Strings, is
modeled by phones, because for any such type there is a function that maps
phones into it without loosing the information about names and numbers.
Therefore phones encodes all sets of objects with names and numbers(details in
Chapter 5 of my monograph).
Subclasses of a structured type
Given the idea of a tuple and a structured type it is easy to allow a notation
for collections of tuple that satisfy a given constraint - for example the set
of phones with prefix "011-" might be written as
Net{name,number::String, number in "011-" String. }
In my notation(MATHS):
.Dangerous_Bend
Sets of Objects
Given a set or class X in @T then we can talk about its subsets (@X). Finite
sets of similarly structured elements are familiar to computer people as
files, tables, and lists and are written finite@X. For example:
Suppose phone_data::= finite@PHONE
where
So if
Further
name(P) = P.name = the names in P ={"John Doe","Joan Roe"},
and
In general a property of the components defines a set of objects:
But a single object can often be identified by specifying enough properties:
Codd has shown that any set of data can be expressed as simple structured tables like the above. Thus we can model all data as tuples. A collection of sets of tuples is a relational data base and forms a category. The objects in an object oriented design also forms a similar category [Freedman 83]. Structures form a conceptual model of the area being studied that is readily implemented using modern technologies such as data bases, logic programming, and objects[Chapter 5 of my monograph].
Functional Dependencies And Keys.
Notice that P above satisfies the requirement that we can look up a number.
Given the items name and place they determine a unique number. We document
this fact by writing:
We can define the phone books as those finite sets in @PHONE that relate a name and place with one number:
Identifiers or keys like (name, place) appear in all computer applications.
Further Codd's work suggests that all data can and should be thought about and
manipulated as a collection of such sets[DATA]
Example - A University
I presented model of a university as a poster session to a joint SIAM/MAA
Southern California Chapter meeting:
[ UNIVERSITY in rjb9Xb.discrete ]
Standard technology and techniques can be used to implement sets and mappings - using (1)files, (2)a DBMS[DATA], or (3)data structures plus OOP[OOP]. An alternative is to treat each entity and relationship as a functor or predicate in a logic programming system[Keller87, PROLOG]. However classes can not be derived from the static analysis above. It should now be extended by a study of the dynamics of the entities and relationships so that object- oriented classes with both attributes and methods can be specified.
. . . . . . . . . ( end of section Structured State Spaces) <<Contents | End>>
Dynamic Predicates
Many properties of predicates don't depend on the universe of discourse. Predicates do not explicitly determine the type of their free variables. They are polymorphic[FP], generic[ADA], or "typically ambiguous" [Russell and Whitehead 63]. This is also true of dynamic predicates - they are often manipulated independently of the types of their component variables. However the formal definition of dynamic predicates maps them into relations on a given set called the state space. The space is determined by the context [ Footnote2 ]
There is a special mechanism (See "with" below) for declaring "local" variables.
If D is the name of a simple piece of documentation then D is a record type, entity type, or class. $(D) is a typical object. @D names the collection of all sets of objects of type D. Adding a predicate P to D specifies a subset of the records (D and P) ∈ @D. These predicates can specify validity condition for a data base, or an invariant property of an object, or a condition in a process or program. A dynamic predicate specifies a change in a record or state as a binary relation @(D,D). So dynamic predicates lead to a superset of structured programs. They can also define, specify, and reason about complex non-deterministic processes:
Definition of Dynamic Predicates
Consider an well formed formula W which contains as free variables names of
components of S with and without primes, where S has simple documentation U.
The primed variables are called dynamic variables and those with no primes
are called static variables. Variables that are inU, but not in W are also
static variables. An occurrence of a dynamic variable without a prime
represents the prior value of the variable. An occurrence of a dynamic
variable with a prime is a later value. A primed and unprimed variable have
the same type. From W construct a predicate P(x, y, z) with three non-
overlapping lists of free variables x[1..l], y[1..m], z[1..n] to replace
the variables in W. The x[i] variables replace those occurrences of a
name of a component with a prime in W - the new values. Each y[i] replaces
an appearance of an x[i] without a prime in W - the old values. Thus
each occurrence of a dynamic variable becomes either an occurrence of an x
or an occurrence of a y. The formal meaning of W is given by
${U, W }::= {(u,v): U><U || u.z=v.z & P(v.x, u.x, v.z) }
Example 1
Let
Example 2
If U = Net{x,y,z::T} and w = P(a,b), a predicate with a,b as free variables
then:
State Spaces with Constraints
If U is not simple then the constraint complicates the above definition,
since some of the apparently static variables can depend on the dynamic ones.
Suppose that U has a basis (a set of variables that can be changed freely),
then z becomes a list of basic variables that are not dynamic in `{(u,v):
U><U || u.z=v.z & P(v.x, u.x, v.z) }. So u.z=v.z` is thus less
stringent, so the non-basic component values are dynamic and have values that
make v lie in U.
Example 3
If U = Net{x,y,z::T. x=f(y,z). } and w = P(a,b) then (y,z) is a basis for U
and:
Block Structure
The form with{D}P introduces temporary variables into P where D is some
documentation and P any simple predicate.
Abuses of Notation
Predicates allow us to avoid thinking in terms of the state of the system
[Dijkstra 89, Denning ed. 89b]. We often treat dynamic and static predicates
as if they were relations. This assumes that a universe of discourse (state
space) is fixed in elsewhere in the documentation. Thus given state space U
and predicate w,
. . . . . . . . . ( end of section Formal Treatment) <<Contents | End>>
Serial Data
Programs can access data from a source, one datum at a time. They need to put
items into a stream of data one at a time. Similar commands send streams of
messages to other programs or data into files. My strings and lists can
model serial streams of data. The string and list expressions in dynamic
predicates mimic reading and writing.
Suppose we are interested in objects in a set A. #A is the set of sequences of objects in A. MATHS defines the following operators (See Chapter 4(section 8), chapter 5 of my monograph, and samples at [ http://csci.csusb.edu/dick/samples/ ] ):
A small technicality arises [ Footnote3 ]
Here is an example Data Directed solution to the problem of replacing double asterisks by a caret symbol[Hoare 78, compare Chandy & Misra 88].
The variables i and o are used for the standard input and output of a program so:
The above looks deceptively like pseudocode, but may demand unusual coding techniques.
. . . . . . . . . ( end of section Appendices) <<Contents | End>>
Footnote1
Originally in Whitehead and Russell's Principia the same form is in Floyd's loop invariant technique
and reminiscent of Kleene closure in grammar theory. This definition relies on using higher order quantifiers[cf
Guerevich 87].
Footnote2
In previous publications, I have permitted the co-domain to merely contain the domain. This
has some unexpected interactions when the state space is not simple.
Footnote3
When one process generates the input to another parallel process it is not clear whether an empty input is merely as yet ungenerated data, or if the generator has
terminated.
. . . . . . . . . ( end of section FOOTNOTES) <<Contents | End>>
Glossary
. . . . . . . . . ( end of section Relations and Programs) <<Contents | End>>
End