[CSUSB] >> [CompSci] >> [Dick Botting] >> [Papers & Essays] >> rjb96b.mth2tex
[Index] || [Contents]
Disclaimer. CSUSB and the CS Dept have no responsibility for it.
This is a Beta Version, read at your own risk.
Copyright. Richard J. Botting(Sun Jul 6 10:03:42 PDT 2003). This paper is being developed for publication. Permission is granted to quote parts of it as long as the source is acknowledged and the author informed.

Contents (index)


    Making Formal Documentation Simpler

    Abstract

    This paper documents some of the properties of a new kind of language. This paper was written in a prototype of this kind of language. It was automatically marked up for publication on the Internet and at this conference. It demonstrates that formal methods and mathematics for software development can be much simpler and usable than previous publications would have us believe.

    The aim is to be able express meaningful ideas and worry about the appearance of the text later. The goal is for something that is readable in raw form. I have been trying out prototype languages. Earlier versions have been used by students. There is an extensive set of test samples. There is a tool for searching documentation for relevant definitions. There is a tool that marks up the text and produces HTML files with internal and external hyperlinks. In time I hope to make a robust set of tools available for a better version of the language used here.

    Introduction

    This paper is about languages for those who do not want to become mathematical typesetters and yet need to use certain kinds of mathematics to help them solve problems. The targetted users are not those who need to express clasical mathematical formulae as part of their research. These people are more than adequately served by \TeX, \LaTeX, and the American Mathematical Society.

    Motivation

    In the process of comparing the different ways that software is developed I was struck by notational problems faced by people trying to use mathematical methods to develop sosftware. Consistently they have shown that mathematics can help the software process. However using mathematics mearnt learning to read strange symbols and then learning to express them in a markup language.

    In most books on "Formal Methods" you will find, in the preface an acknowledgment of the help provided by a guru in finding the right way to mark up the text. I do not believe that we can ask software engineers to wate time doing this.

    The mathematics needed in these "Formal Methods" projects was a small subset of mathematics and logic focussing on logistic, discrete, and algebraic systems rather than the traditional calculi. My research into software development methods indicated that software engineers would soon need a simple ASCII language for documents that include mathematical formulae. Such documents would be sharable on the internet as well as published on paper.

    I decided to see if it was possible to encode discrete mathematics and algebra in ASCII. If successful it would mean that I would not have to teach programmers to read

  1. \forall x, \exists y, . . . and to write
    	\forall x, \exists y, . . .
    when they are thinking For all x and some y ....

    I decided to see if there could exist a single portable computer language that could be used to express logic, data, discrete mathematics, algebras, requirements, specifications, data bases, syntax, semantics, program structures and designs, and code. It could be searched like a data base. Tools would translate them into many different forms. These translations could be browsed on the internet. The languages must not be a markup language. It would be like a programming language because it would have a defined meaning other than the look of the output. Unlike a programming language the meaning would not be a program but a facts statable in simple mathematical terms.

    If such a language could be designed, and if it could be made easy to read and easy to write then it might make so-called formal methods more palatable.

    History

    I started with was an extension of BNF capable of expressing syntax, data, and programs. I had noticed that various forms of BNF are used in a great many different methods and places in softwar development.

    The first notation was used by students. They submitted work via EMail. They prepared formal documentation without having to learn a markup language or any special symbols. This included syntax definitions of most famous programming languages including: Ada, C, COBOL, LISP, Pascal, and Prolog [ CS320] .

    Feedback from the students helped me improve the notation. Since then I have added documentation othe languages and systems including: Algol-60, C++, SML, Java, and Z.

    To make this information more accessible I developed a simple script to search these documents and extract definitions:

     		define pascal loop

    With the arrival of the World Wide Web I developped a rough and ready UNIX script that marks up the raw notation in the HyperText Markup Language(HTML) used on the web. It is called mth2html. This tool translated all the sample language definitions have all be translated into HTML and published on the world wide web: [ Samples ] I also prepared a formal definition of the syntax and semantics of the notation [ Maths ]

    This tool was tested under fire in 1995 when it became necessary to prepare and distribute the equivalent of a text book on the formal semantics of programming languages to a class of graduate students. [ CS620 ]

    One of the striking properties of typical Formal Programming Language texts is that complicated formal defintions are assembled, piece by piece, from simpler ones. The text will comment that definition of a language incorporates all the definitions from a simpler language in it. In HyperText the link should be made explicit.

    I realized that when the name of the simpler specification should act as a formal cross reference to the original. It becme clear that this was very close to the idea of inheritance and extension used in object-oriented programming. The abillity to name collections of mathematical ideas and the reuse the names gives a system that include the techniques used in the Z specification language.

    I found that I had a tool that was a simple way to prepare online handouts. All my syllabi since then have been written out and then automatically marked up for delivery via the world wide web. However the printed version still needed manual markup or a proprietary word processor. In the summer of 1996 an experience with a popular word processor motivated me to find an alternative. I decided to develop a prototype tool to markup my notation into \LaTeX.

    The first working version took about 10 hours work. It is called mth2tex. It is not finished yet. This paper was marked up, from my notation, using this prototype.

    This Paper

    I will briefly describe the aims and goals of the notation - named
    		MATHS
    and some of its features. Then I will describe the changes that occurred in it when preparing this paper for publication.

    . . . . . . . . . ( end of section Introduction) <<Contents | Index>>

    The MATHS Manifesto

    Not A Programming Language.

    MATHS is not for "writing programs." MATHS uses the characters of ASCII to express logical and mathematical formulae. Some formulae can be implemented as systems, objects, data bases, or hardware. It does not depend on any implementation. MATHS describes ideas that are not algorithms. It helps all parts of the software process. MATHS stands for "Multi-purpose Abstractions, Terminology, Heuristics and Symbols."

    Low-cost Improvement

    MATHS is designed to make formal methods a part of the software process. MATHS does not require expensive hardware. MATHS does not need complex software. A laptop computer with an ASCII editor and printer is all that is needed.

    MATHS decreases the overhead of making a "good" documentation "look good." Rigorous specifications in MATHS can be pre-processed, without additional marking up, ready for a text processor to prepare publication quality documentation. Many figures can be generated automatically since many MATHS formulae have a definite graphic representation.

    MATHS minimizes training costs by borrowing from well known notations. It builds on ideas of Computer Science - EBNF, programming languages, data bases.... It is simple to set up local electronic help systems for practitioners.

    MATHS includes features to make documentation and source code reusable. It enables the creation of an electronic repository of ideas, data and code ready for reuse. MATHS lets you refer to any published document or public Internet document and then make use of its content.

    MATHS is not "Artificial Intelligence" but "Intelligence Amplification." It uses the existing skills and knowledge of its users but makes the recording of thoughts a computer aided process.

    Not a Method.

    It will support many methods and fit with any flavor of programming.

    Modular and Reusable.

    The MATHS user can describe complex things piece by piece. Instead of one complex description MATHS makes it possible to define each part of the problem independently of the rest. There are rules for combining the pieces into a whole. Research and practice shows that multiple descriptions are better than a monolithic description.

    MATHS encourages working from what is, to what should be, and then to detailed design. Understanding, documenting, and solving problems replaces 90% of what has been called "programming." Research and practice shows how to use MATHS to express syntactic and semantic structures and translate them into code. Experience shows that such systems are easier to understand, validate, and maintain.

    MATHS shows how to recognize and parse the formal parts of documentation. The formal parts can be organized, summarized, indexed and checked. MATHS documents are compatible with Electronic mail and BBS systems. There can be repositories and encyclopedia of ideas.

    . . . . . . . . . ( end of section The MATHS Manifesto) <<Contents | Index>>

    The Language

    A MATHS document is a sequence of sections. A section can contain subsections or be elementary. An elementary section is a sequence of paragraphs. Each paragraph is either commentary, or a definition, or a declaration, or a well formed formula. Well formed formulae are classified as either assumed (axioms) or a proved result (theorem) or a block. Blocks can express proofs or allow a piece of MATHS text to be named and treated as a mathematical expression of a network of ideas: comments, declarations, definitions, and formulae.

    An elementary section begins with a header line like this:

     		. Section <name>

    A section that is not elementary starts like this:

     		.Open <name>
    and ends like this
     		.Close <name>
    It is helpful to the author to have the closing line echo the the header line - there is evidence that this reduces errors. In a hypertext system all sections become hypertext anchors. In my prototype tool they are also used to generate a contents list .

    Paragraphs are separated by blank lines. Formulae are indicated (at this time) by the pressence of a space or tab at the start of a line. Definitions and declarations contain two adjacent colons(:):

     	(Nat, +, 0, -)::Group.
     	section::= ".Open" name eoln body eoln ".Close" name.
     	URL::= Universal resource Locator.

    A typical block starts with one of:

     		.Set, .List, .Table, .Box, .Net, .Let, ...
    and ends
     		.Close.Set, .Close.List, ...

    There are a few of special directives for lines and paragraphs:

     	.As_is	<a line of ASCII>
     	.See	<internal or URL hypertext link>
     	.Source	<cite offline document

    Definitions and Declarations

    The extensive use of BNF-like notations in software engineering suggests the following syntax
  2. simple_definition::= term definer expression.
  3. definer::= declarer optional(type) equals_sign.
  4. declarer::= colon colon.
  5. colon::ASCII=":".
  6. equals_sign::ASCI="=".

    A declaration is essentially a definition with no expression:

  7. declaration::= term declarer type.

    Semantically a definition is the same as a declaration plus a formula that asserts the equality of the new term and the expression. An expression in a definition can be any mathematical expression. The term can be a name or a function. It is possible to make more complex parameterized definitions as well. The full language: informal introduction, formal semantics, and formal semantics can be found at [ Maths ]

    This definer lexeme is deliberately verbose. It is it easy to recognise definitions and (1) format them nicely, (2) make tehm into hypertext anchors, (3) generate an index of defined terms, and (4) search for them using a program like "grep".

    Syntax is defined by using expressions that can include terminal symbols in double quotes and the following operators on sets of strings:

    		 	Concatenation
    		|	union
    		&	intersection
    		~	complement
     		#	any number of

    I've found it useful to invent a special use for three words in definitions.

     		given, goal, following
    The first two: given and goal indicate formal parameters in a piece of generic documentation. The word following indicates that the term is defined to man the whole following paragraph. Typically this is a block of text descriing a list, set, or algebra. They can also allow the writer to connect a name to a complete document either on the internet or published some where. Here is an example where the term logic is bound to a document on the World Wide Web:
     	logic::=following
     .See http://www/dick/maths/logic_2_Proofs.html
    
    Which, marked up and rendered, becomes:
    
  8. logic::=following [ logic_2_Proofs.html ]

    Expressions

    Formula are easily expressed by using some standard symbols for common operators:
     	Boolean:  	and, or, not, if(_)then(_), iff
    	Numbers:  	*, +, -,^, /,...
     	Relations:	=,<,>,<=,>=, <>, in, ==>, ...
    Here are some samples:
    	P and Q iff Q and P.
    is rendered as:
  9. P and Q iff Q and P.

     	x = (-b+sqrt(b^2-4*a*c))/(2*a)
    is rendered as:
  10. x = (-b+sqrt(b^2-4*a*c))/(2*a)

    Quantifiers

    A universal property is written using a quantifier:
    	for all x:angle( (sin(x))^2 + (cos(x))^2 = 1).
    is rendered as:
  11. for all x:angle( (sin(x))^2 + (cos(x))^2 = 1).

    MATHS has a special set of quantifiers:

     	all, some, one, no, 2, 1..3, ...
    		all:	universal
     		some:	existential
     		one:	unique
     		no:	complement of existential
    		12:	precisely twelve exist
    		1..3:	either one, two or three exist
    The keywords
     		for, For
    introduces a sequence of quantifications. Each quantification looks like this:
  12. quantifier variable : type There are some shorthand formats for the kinds of statements that need to be made. Here is an example. The raw form is:
    	for all x:Real, 0..2 y:Real ( y^2 = x ).
    After marking up, and rendering, we get:
  13. for all x:Real, 0..2 y:Real ( y^2 = x ).

    Formulae

    A well formed formula is rather like a boolean expression that occurs in a piece of documentation:
  14. For all x:human, x in mortal.

    Formulae can be labeled:

  15. (comm): a+b = b+a.
  16. (zero): for all x:G( x+0 = 0).

    Theorems are indicated by a well known symbol for assertions:

     	|-
    Here is a sample:
    	|- blatant assertion with no given proof or name.
    which becomes:
  17. |-blatant assertion with no given proof or name. A theorem should normally be named so that it becomes reusable:
  18. |- (uniqzero): For one z, all x( x+z=x ).

    A named theorem can also indicate its antecedents:

  19. (comm, zero) |- (comm2): x+0 = 0 = 0+x
  20. (logic) |- (and_commutes): if P and Q then Q and P.

    In a hypertext version the lables above are links to the statements they label and the assertion symbol is a link to the proof of the result. A proof is a section that has a block following the rules for constructing proofs [ Proofs ]

    Proof of comm2


    Let
    1. x::G.
    2. 0+x
    3. (comm) |-
    4. = x+0
    5. (zero) |-
    6. = 0,
    7. (Euclid) |- 0+x=0,

    (Close Let )

    Proof of and_commutes


    Let
    1. (1): P and Q.
    2. (1, and1) |- (2): P.
    3. (1, andt2) |- (3): Q.
    4. (3, 2, and3) |- (4): Q and P.

    (Close Let )

    Botice however, that theorems like the above are only training exercises when learning mathematical and logical reasoning. An engineer makes use of theorems rather than proves. Where a mathematician stops, then engineer starts.

    Long definitions

    As mentioned above it is possible to bind a name to a paragraph or a section, or indeed any document. This is allows us to name and reuse formal definitions. The effect is to make formal a very common mathematical idiom: " A group is a monoid with an inverse operator such that..."
  21. Group::= Monoid with following
    1. inverse::Set->Set.
    2. For all x:Set, x op inverse(x) = inverse(x) op x = unit.

    Or...
  22. Monoid::= Semigroup with following .Net unit::Set.
  23. For all x:Set, x op unit = unit op x = x.

(End of Net)

For many examples start with [ Algebras ] in my internet repository of mathematics.

. . . . . . . . . ( end of section Notation) <<Contents | Index>>

Experiment

This paper is the direct result and the test case for developing a translator from my language into \LaTeX. I developed the new tool within a few weeks using the UNIX Philosophy [ Gancarz 95 ] however the result is a bread-board prototype rather than a useful tool.

Only one change had to be made to the language itself. Bibliographic references in hypertext become links into the biliography. Thus hte handling of a internal, internet and non-electronic links is similar: generate a URL. In TeX the internal links and internet links become simple text, but bibliographic references have to be changed drastically. I therefore had to allow for a new directives ".Bibliography" and ".Close.Bibliography".

The hardwat part was trying to find a simple translation for the nested boxes of definitions that a common in documenting complex ideas. In this I was helped by Dr. George Georgiou.

. . . . . . . . . ( end of section Experiemnt) <<Contents | Index>>

Conclusions

This work does no more than show the feasibility and usefulness of a new kind of computer language that is niether a markup language nor a programming language. My research shows that it is possible to write the kind of mathematics needed for software development without using a markup language or an expensive word processor. A meaningful and readable notation can be processed as data and manipulated in many ways.

There already exist a prototype internet readable repository of reusable mathematics and logic ready to be bound to mathematical documentation. In addition there are detailed documentation of sample languages, problems, solutions, and protocols. The language has also been tested on writing papers and handouts [ Papers ] [ CS320 ] [ CS620] .

Future work

The notation needs to be be improved. In particular, while this version of mathematical notation was being developed the syntax of the C family of programming languages became the dominant form used and understood by many software developers. I would like to explore an alternative syntax that is based on the C family rather than my own experience.

The prototype repository of formal systems is full of errors, holes, and less than perfect expressions. This needs detailed review and reconstruction to imporve its reusability. I believe that it is important that these ideas should be freely, rapidly, and simply accessed by software developers.

Given that it has been simple to create a prototype I hope that some industrial strength languages and tools will be developed in the future. The ad hoc set of tools needs to be replaced by a suite of documented components before others can use them. These components will include lexical analyzers, parsers, and generators for different mark up and programming languages.

In time a proof annotator and style checker is also need. I visualize tools that aid the author by providing constructive crticism of what they have said.

. . . . . . . . . ( end of section Conclusions) <<Contents | Index>>



References

  • (Algebras): Botting, R.J.; [ math_31_One_Associative_Op.html ]
  • (CS320): Botting, R.J.; [ http://www.csci.csusb.edu/dick/cs320/ ]
  • (CS620): Botting, R.J.; [ http://www.csci.csusb.edu/dick/cs620/ ]
  • (Maths): Botting, R.J.; [ http://www.csci.csusb.edu/dick/maths/ ]
  • (Papers): Botting, R.J.; [ http://www.csci.csusb.edu/dick/papers/ ]
  • (Proofs): Botting, R.J.; [ logic_2_Proofs.html ]
  • (Samples): Botting, R. J.; [ http://www.csci.csusb.edu/dick/samples/ ]
  • (Gancarz 95): Mike Gancarz, The UNIX Philosophy, Digital Press Newton MA 1995

    . . . . . . . . . ( end of section Making Formal Documentation Simpler) <<Contents | Index>>


    Formulae and Definitions in Alphabetical Order