.Open MATHS as a MetaLanguage
MATHS can be used as a powerful meta-language to define formal and computer languages.
.Contents
. LANGUAGES
The MATHS approach to language definition rests on the observation that all practical languages of any size have been defined by several sets of rules that define and interpret them. Each set of rules is typically a constraint on the possible strings. Typically there is a lexical description, a set of syntax rules, and some informal semantics. Often the basic language is extended by relations that add new strings to the language but provide translation and transformation rules into the basic language. In MATHS we formalise this by assuming that a formal language can be defined by the four sets of rules:
.Table Sets of Rules
.Row Lexicon Lexical Described by a Lexical grammar
.Row Glossary Informal descriptions of elements in the language.
.Row Grammar Grammatical Described by a syntactic grammar
.Row Semantics Meaningful A set of partial maps from grammatical strings into meanings
.Row Defined Definitions map generalised strings into Meaningful strings
.Close.Table
. LANGUAGE_DEFINITION
In essence the language definition task is to create a relation that maps certain recognisable strings of symbols into a set of meanings. The standard approach is to separate the description of the set of strings (syntax) from determining the meaning (semantics). Typically there are two other techniques - a map from strings of characters into strings of 'tokens' representing 'lexemes', and a mapping of certain defined patterns into a standard language by a collection of 'definitions' or 'macros'.
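As an illustration (not part of the MATHS notation itself), the two maps above can be sketched in Python for a tiny language of sums: characters are first mapped into tokens, and token strings are then mapped into meanings. All names here are invented for the example.

```python
# Sketch of the two mappings: characters -> tokens (lexing),
# then tokens -> meaning (parsing and evaluation together).
import re

TOKEN = re.compile(r"\s*(?:(\d+)|(.))")  # a number lexeme or a single symbol

def tokenize(source):
    """Map a string of characters into a string of tokens."""
    tokens = []
    for number, symbol in TOKEN.findall(source):
        tokens.append(("num", int(number)) if number else ("sym", symbol))
    return tokens

def evaluate(tokens):
    """Map token strings matching  num #("+" num)  into a meaning (a sum)."""
    kind, value = tokens[0]
    assert kind == "num", "expected a number"
    total, i = value, 1
    while i < len(tokens):
        assert tokens[i] == ("sym", "+"), "expected '+'"
        kind, value = tokens[i + 1]
        assert kind == "num", "expected a number"
        total, i = total + value, i + 2
    return total

print(evaluate(tokenize("1 + 2 + 39")))  # -> 42
```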
LANGUAGE::=Net{
Alphabet::FiniteSet,
Lexical::$GRAMMAR and Net{terminal:=character_string | name_of_character, source, tokens, lexeme in non_terminal}.
Syntax::@GRAMMAR(Alphabet=>lexeme) and Net{ terminal:= lexeme_name|character_string, tokens in non_terminal}.
Meanings::Sets.
Meaningful::=/Denotations(Meanings),
Denotation::Grammatical<>->Meanings,
Definitions::@#Alphabet->@#Alphabet,
Language::= Standard./do(|(Definitions)),
Standard::= (source(Lexical) & Grammatical & Meaningful),
Grammatical::= &{G.source||G:Syntax},
Since there is no effective procedure for deciding whether an arbitrary language defined by this technique is empty or not, the designer must always provide examples of strings in the language.
Examples:: @Language~{}.
}=::LANGUAGE.
. GRAMMAR
A grammar is a specialized form of formal documentation: a set of definitions of names of sets of strings, where strings are (finite) sequences of terminals. A typical terminal might be a character, a data type, a name of an event, an object in a data base or data structure, a message type, etc. Each term defined in a grammar represents a set of sequences. Typically a non_terminal can define a language, a set of valid or possible files, a set of interesting patterns of events, a strategy for getting particular data from a data base or data structure, a trace of a program, a communications protocol, etc.
A context free grammar has definitions solely in terms of regular expressions (made of #, |, and . alone). These and other Chomskian grammars are defined in Math.Strings.
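A small context free grammar written with these operators maps directly onto code. As a sketch (the grammar is invented for the example), the rules `expression ::= term #("+" term)` and `term ::= "0".."9"` become one Python function per non_terminal in a recursive descent recognizer:

```python
# One function per non_terminal; each returns the next input
# position on success, or None on failure.
def term(s, i):
    """term ::= "0".."9"."""
    return i + 1 if i < len(s) and s[i].isdigit() else None

def expression(s, i=0):
    """expression ::= term #("+" term)."""
    i = term(s, i)
    while i is not None and i < len(s) and s[i] == "+":
        i = term(s, i + 1)
    return i

def recognises(s):
    """A string is in the language iff expression consumes all of it."""
    return expression(s) == len(s)

print(recognises("1+2+3"))  # True
print(recognises("1++2"))   # False
```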
. EXTENSIONS
MATHS is not limited to Chomskian grammars. Formally, any set of simultaneous equations expressed in terms of set theory etc. might define a unique language by using the smallest language that is a fixed point under the map formed by replacing defined terms by their definitions. As long as this substitution tends only to add longer strings to the sets, there is a unique fixed point and so a unique language defined. However there is no reason to suppose that a general set of definitions like this is in any sense efficient, and a wise engineer will select definitions that lead to efficient checkers/parsers.
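The fixed point idea can be sketched concretely: start every defined term at the empty language and repeatedly substitute definitions; the sets only grow, so the iteration converges to the smallest fixed point. The sketch below (names invented for the example) computes just enough of that fixed point to decide which non_terminals define non-empty languages:

```python
# Each rule is (lhs, body); a body is a list of terminals (strings that
# appear as no lhs) and non_terminals (names that do appear as an lhs).
def nonempty_nonterminals(rules):
    """Least-fixed-point iteration: which non_terminals derive a string?"""
    lhss = {lhs for lhs, _ in rules}
    productive = set()
    changed = True
    while changed:  # iterate until the set stops growing
        changed = False
        for lhs, body in rules:
            if lhs not in productive and all(
                    sym in productive or sym not in lhss for sym in body):
                productive.add(lhs)
                changed = True
    return productive

# S derives a string; U never does (U ::= "a" U has no base case).
rules = [("S", ["a", "S"]), ("S", ["b"]), ("U", ["a", "U"])]
print(nonempty_nonterminals(rules))  # {'S'}
```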
A useful technique is to first define a simple context free grammar and then develop from it (by the insertion of tags in sequences and/or pairing the non-terminals with another grammar) a definition of a lexer, translator, parser, or other program.
. Traps
Complexity - The canonical grammar defines exactly the set of terms that the user understands, no more, no less.
Efficiency - the designer may be permitted to change the language to make recognition, translation, and interpretation efficient, but should not search for the most efficient expression of the actual language until a working, correct, and complete version is already running.
Inconsistency - the designer must exhibit a string defined by the system of definitions.
Ambiguity - the designer should not add to or diminish the ambiguity already present in the language (if any).
.See Math.64 Meta Macros
. SEMANTICS
The definition of syntax has been understood and used with success throughout computing since 1964. 'Semantics' is about what well formed strings mean. There is less agreement about how meaning should be defined. Initially natural language was used (COBOL) but even the acknowledged best designers failed to produce clean definitions this way (ALGOL 60's Blemishes). Later attempts map the language strings into a formal subset of a natural language (ALGOL68, thence to Pascal, Ada, Edison,...).
Another method is to define an abstract machine and map statements in the language into changes in the state of the machine (VDM, tested on defining ALGOL 60 & PL/1). A variation of this is to define a virtual machine that can interpret a simplified version of the language, plus a translator from the language into the simplified form. The interpreter can then be implemented in software or hardware (BCPL and OCODE, Pascal and PCODE, Modula-2 and LILITH). The LISP language is unique in being defined solely by a very simple LISP program that interprets LISP.
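The translator-plus-virtual-machine method can be sketched for a tiny expression language: translate the language into a simplified form (stack code), then give the language its meaning by interpreting that code. The instruction names are invented for the example, not taken from OCODE or PCODE.

```python
def translate(tree):
    """(op, left, right) trees or numbers -> a list of stack instructions."""
    if isinstance(tree, int):
        return [("push", tree)]
    op, left, right = tree
    return translate(left) + translate(right) + [(op, None)]

def interpret(code):
    """The virtual machine: a simple stack evaluator."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "+" else a * b)
    return stack.pop()

# (1 + 2) * 3, first translated, then run on the machine.
print(interpret(translate(("*", ("+", 1, 2), 3))))  # 9
```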
The axiomatic approach of Hoare associates statements in a programming language with inference rules allowing properties of the whole to be derived from the parts. Dijkstra's form of this method is to specify, for each elementary step and structure, the weakest pre-condition necessary to guarantee a particular post-condition. Theorists have demonstrated that it is possible to define most programming languages this way, and axiomatic specifications have been published for Pascal - but no well known language definitions have used this technique.
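One well known rule of that calculus, wp(x := E, Q) = Q with every free x replaced by E, can be sketched as follows. The substitution here is purely textual, which is only safe for the simple variable names used in the example; a real tool would work on syntax trees.

```python
import re

def wp_assign(var, expr, post):
    """Weakest precondition of the assignment  var := expr  w.r.t. post:
    substitute (expr) for every whole-word occurrence of var in post."""
    return re.sub(r"\b%s\b" % re.escape(var), "(%s)" % expr, post)

# wp(x := x + 1, x > 0)  =  (x + 1) > 0
print(wp_assign("x", "x + 1", "x > 0"))  # (x + 1) > 0
```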
Growing from early attempts to map the language into a completely formal language (a logic), the approach created by Strachey, Stoy and Scott is to provide 'denotations' that map parts of a programming language into a set of meanings. The meanings have some kind of algebraic and logical structure. The programming language elements and syntactic structures are mapped into elements and expressions in the abstract mathematical space of meanings. It is then necessary to show that there is one and only one meaning for each string in the language. Again this has developed into a feasible approach but has not been applied in any well known language. Scott has recommended using complete partially ordered sets and function spaces to model the meanings of languages, with some success; more recently Manes and Arbib have proposed using categories instead.
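The denotational style can be sketched for a tiny expression language: each syntactic form is given a denotation, here a function from environments (name -> number) to numbers, so the meaning of a phrase is built only from the meanings of its parts. The space of meanings is deliberately simple; names are invented for the example.

```python
def D(tree):
    """Denotation: map a syntax tree into a meaning, Env -> Number."""
    if isinstance(tree, (int, float)):
        return lambda env: tree             # a constant denotes itself
    if isinstance(tree, str):
        return lambda env: env[tree]        # a variable denotes a lookup
    op, left, right = tree
    dl, dr = D(left), D(right)              # meanings of the parts only
    if op == "+":
        return lambda env: dl(env) + dr(env)
    return lambda env: dl(env) * dr(env)

meaning = D(("+", "x", ("*", 2, "y")))      # denotation of  x + 2*y
print(meaning({"x": 1, "y": 3}))            # 7
```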
MATHS can be used as a notation with any of these methods - the symbolic logic is suitable for specifying axioms for example, and the mathematics provides several models of abstract machines.
. DEFINITIONS
An example definition is:
For Q:@States, S:@#Input, Q S::= {q':States || for some q:Q, s:S ( q |--[s] q')}.
This indicates that any string that has two parts, the first representing a set of states and the second a set of strings of inputs, will be replaced by the more complex intensional set on the right of the '::='. A definition indicates a process of analysing the input, matching bound symbols, and forming a replacement string out of the strings that match the bound symbols and the remaining characters as they stand.
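This analyse-match-replace process can be sketched with regular expression groups playing the role of the bound symbols; the definition below ("For x, "twice" x ::= x "+" x") is invented for the example and is not one of the MATHS definitions above.

```python
import re

# A definition: a pattern with one bound symbol (the group) and a
# replacement built from whatever string that symbol matched.
definition = (re.compile(r"twice\s+(\w+)"), r"\1 + \1")

def expand(text, pattern, replacement):
    """Apply one definition repeatedly until nothing more matches;
    the characters outside the match stand as they are."""
    while pattern.search(text):
        text = pattern.sub(replacement, text)
    return text

print(expand("twice y * 3", *definition))  # y + y * 3
```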
Syntax
definition::= "For" bindings "," #(bound_symbol | lexeme~(comma | ":"":=")) ":"":= " expression&(#(bound_symbol | lexeme)), where bound_symbols are those bound in the bindings.
.Road_works_ahead Semantics
??{
Definition:#Alphabet<>->#Alphabet,
Definition=rel[s,...
}?
.Close MATHS as a MetaLanguage