XBNF uses space, bars ('|') and the symbol '::=' as in BNF. It uses '()' as parentheses and '#' to mean 'any number of'. This choice follows more than 10 years of testing other forms in class and on computers. The EBNF symbols '<>{}[]' are used for other purposes.
Here is a comparison of BNF, EBNF and XBNF definitions of a number:
BNF <number>::=<digit>|<number><digit>
Ada EBNF number::= digit {digit}
XBNF number::= N(digit).
You can see which is shorter. XBNF is more powerful than EBNF as well. It is not as short as the theoretical notations for a grammar because it is designed to work in ASCII (no Greek or special symbols) and yet say more.
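To see that the BNF and XBNF forms describe the same strings, here is a small Python sketch (Python is chosen only for illustration; `bnf_number`, `xbnf_number` and `agree_up_to` are hypothetical names). It follows the recursive BNF rule literally, treats N(digit) as the regular expression [0-9]+, and compares the two on every short string over a small alphabet. A finite check like this is evidence, not a proof.

```python
import itertools
import re

DIGITS = "0123456789"

def bnf_number(s):
    """<number>::=<digit>|<number><digit>, followed literally:
    s is one digit, or s is a number followed by one digit."""
    if len(s) == 1:
        return s in DIGITS
    return len(s) > 1 and bnf_number(s[:-1]) and s[-1] in DIGITS

# N(digit) means "one or more digits", which is [0-9]+ as a regular expression.
XBNF_NUMBER = re.compile(r"[0-9]+")

def xbnf_number(s):
    return XBNF_NUMBER.fullmatch(s) is not None

def agree_up_to(n, alphabet="09x"):
    """Compare both definitions on every string of length 0..n over `alphabet`."""
    for length in range(n + 1):
        for chars in itertools.product(alphabet, repeat=length):
            s = "".join(chars)
            if bnf_number(s) != xbnf_number(s):
                return False
    return True
```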
"::=" is read "is defined to be"
"&" is read "and also"
"~" is read as "but not"
"|" is read as "or"
"#" is read as "any number of including none"
"(" is read as "a sequence of"
")" is read as ", end of sequence"
A space between two parts of a sequence can be read as "and then"
"_" is read as a space.
"N(_)" is read as "one or more _" and short for "_ #(_)"
"O(_)" is read as "optional _" and is short for "(|_)"
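The N and O shorthands can be removed mechanically. The following Python sketch, with the hypothetical function `expand_shorthands`, rewrites N(x) as (x #(x)) and O(x) as (|x), expanding nested uses first. It assumes the formula contains no quoted strings, and treats N or O as a shorthand only when it is not the tail of a longer name. When the argument of N is more than a single name it is wrapped in parentheses, because pasting a|b directly in front of #(a|b) would give a | (b #(a|b)) once spaces bind more tightly than bars.

```python
import re

def _matching_paren(text, open_index):
    """Index of the ')' that closes the '(' at open_index."""
    depth = 0
    for i in range(open_index, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")

def expand_shorthands(text):
    """Rewrite N(x) as (x #(x)) and O(x) as (|x), expanding nested uses first."""
    out = []
    i = 0
    while i < len(text):
        starts_macro = (
            text[i] in "NO"
            and text.startswith("(", i + 1)
            # N or O must not be the last letter of a longer name such as ON(
            and (i == 0 or not (text[i - 1].isalnum() or text[i - 1] == "_"))
        )
        if not starts_macro:
            out.append(text[i])
            i += 1
            continue
        close = _matching_paren(text, i + 1)
        body = expand_shorthands(text[i + 2:close])
        if text[i] == "N":
            # Group a compound argument so a '|' inside it cannot escape.
            first = body if re.fullmatch(r"\w+", body) else f"({body})"
            out.append(f"({first} #({body}))")
        else:
            out.append(f"(|{body})")
        i = close + 1
    return "".join(out)
```

For example, expand_shorthands("O(sign) N(digit)") gives "(|sign) (digit #(digit))".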
A Meta-Grammar
Here is a set of definitions that defines an XBNF grammar.
Here is the definition of what an element or terminal looks like: it is actually a C string.
Here is a more complex definition that defines what a defined term can be:
Notice that a defined_term's definition includes the item '...& defined', which indicates a special non-syntactic constraint: a defined_term must be in the set of terms which are defined.
A defined term is made of numbers, words and the underscore character:
Numbers can have one or more digits:
Words are at least two letters long and are correctly spelled:
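As an illustration only, here is a Python sketch of a checker for defined terms. It assumes that a defined term is words and numbers joined by underscores (for example defined_term or level_2); the exact XBNF rules are not reproduced here. "Correctly spelled" is approximated by looking each word up in a caller-supplied word list, and the '& defined' constraint by membership in a set of defined terms. The names `is_defined_term`, `dictionary` and `defined` are hypothetical.

```python
import re

# Assumed shape: defined_term ::= part #("_" part),  part ::= number | word,
# number ::= one or more digits,  word ::= two or more letters.
PART = r"(?:[0-9]+|[A-Za-z]{2,})"
DEFINED_TERM = re.compile(rf"{PART}(?:_{PART})*")

def is_defined_term(name, dictionary, defined):
    """True if name has the assumed syntax, its words are in `dictionary`
    (a stand-in for 'correctly spelled'), and name is in `defined` ('& defined')."""
    if DEFINED_TERM.fullmatch(name) is None:
        return False
    words = [part for part in name.split("_") if not part.isdigit()]
    if not all(word.lower() in dictionary for word in words):
        return False
    return name in defined
```

The syntactic test and the two non-syntactic constraints are kept as separate steps, mirroring the way '& defined' is attached to an otherwise ordinary syntax rule.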
Lexemes
It helps if you define the basic elements of your language (the lexemes)
in a special section: a Lexicon. It also helps to indicate them like this:
name::lexeme="string", purpose.
double_plus::lexeme="++", used to add 1 to a variable.
or
name::lexeme, purpose.
typedef::lexeme, used to introduce type definitions in C++.
In the second form the string is equal to the characters in the name:
typedef::lexeme="typedef".
Doing this lets the string that represents the lexeme be indexed and found by search engines. In practice, many queries of language reference manuals on the web are trying to find the meaning of a lexeme.
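Because each lexicon entry names its string, a tool can build a lookup table from strings to entries. Here is a minimal Python sketch, assuming one entry per line written in exactly one of the two forms above; `build_index` is a hypothetical name, and lines that are not entries are skipped.

```python
import re

# name::lexeme="string", purpose.    or    name::lexeme, purpose.
ENTRY = re.compile(r'(\w+)::lexeme(?:="([^"]*)")?,\s*(.*?)\.?$')

def build_index(lines):
    """Map each lexeme's string to (name, purpose) so it can be found by its spelling."""
    index = {}
    for line in lines:
        m = ENTRY.match(line.strip())
        if m is None:
            continue  # not a lexicon entry
        name, string, purpose = m.groups()
        # Second form: the string is the characters of the name.
        index[string if string is not None else name] = (name, purpose)
    return index
```

A query for "++" then finds the entry double_plus without the reader needing to know its name.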
Lexicon
The following terms are defined here for completeness and
so you can use them in your own documents. Normally
you can take them as given merely by including a link to
this section of this web page:
http://www.csci.csusb.edu/dick/maths/intro_ebnf.html#Lexicon
Note: MATHS/XBNF/EBNF is case sensitive, but it can be used to define case-insensitive languages. The following definitions give some maps that may help to define the syntax of case-insensitive languages.
c | to_upper(c) |
---|---|
a | A |
b | B |
c | C |
... | ... |
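One way to use such a map: a keyword in a case-insensitive language can be written in case-sensitive XBNF by replacing each letter c with (c | to_upper(c)). The Python sketch below is hypothetical (`case_insensitive` is not part of MATHS); it quotes terminals as C strings, handles only ASCII letters, and assumes the keyword contains no quote or backslash characters.

```python
def case_insensitive(keyword):
    """Spell a keyword as an XBNF sequence that ignores the case of ASCII letters."""
    parts = []
    for c in keyword:
        if c.isascii() and c.isalpha():
            # c | to_upper(c): both spellings of the letter are allowed
            parts.append(f'("{c.lower()}"|"{c.upper()}")')
        else:
            # digits, '_' and other symbols must match exactly
            parts.append(f'"{c}"')
    return " ".join(parts)
```

For example, case_insensitive("if") gives ("i"|"I") ("f"|"F").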
If you need to define a language that uses the ASCII code, then [ comp.text.ASCII.html ] gives definitions of characters (including the control characters) by name and purpose.
Precedence of Operators
The definitions of XBNF above
[ A Meta-Grammar ]
imply the following consequences and
interpretations:
Concatenation (space) has a higher precedence than a vertical bar ("|").
In an XBNF formula with both spaces and "|" symbols the bars separate
the sequences. For example, if a, b, c and d are items, then
a b | c d = (a b) | (c d) =either a sequence of a and then b, end of sequence or a sequence of c and then d, end of sequence.
a b | c d is not equal to a (b|c) d.
Indeed
The number sign ("#") has a higher precedence than a space and so always applies to the next item. Thus: #a b = (#a) b, not #(a b).
However, if
The next description
Similarly, if a and b are items then (a | #b ) indicates either
an a
or any number of b's, as does (#b |a ). This, plus the rule relating
spaces and the "|" symbol, means that
(a #b c | #b )=either a single a followed by any number of b's and then one c, or else any number of b's alone.
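The precedence rules can be made concrete with a tiny recursive-descent parser. In this Python sketch (the names `tokenize`, `Parser` and `parse` are hypothetical, and it covers only names, spaces, '#', '|' and parentheses, not strings, N, O, '&' or '~'), each grammar level corresponds to one precedence level: '|' binds most loosely, then the space between items, and '#' applies only to the single item after it. The result is a nested tuple, so a b | c d comes back as ("|", ("seq", "a", "b"), ("seq", "c", "d")).

```python
import re

# Names and the four operator symbols; spaces only separate tokens.
TOKEN = re.compile(r"\s*(?:([A-Za-z0-9_]+)|([#|()]))")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def alternatives(self):  # alt ::= seq #("|" seq)   -- loosest binding
        options = [self.sequence()]
        while self.peek() == "|":
            self.take("|")
            options.append(self.sequence())
        return options[0] if len(options) == 1 else ("|", *options)

    def sequence(self):  # seq ::= #unary   -- may be empty, as in (|a)
        items = []
        while self.peek() not in (None, "|", ")"):
            items.append(self.unary())
        return items[0] if len(items) == 1 else ("seq", *items)

    def unary(self):  # unary ::= "#" unary | atom   -- tightest binding
        if self.peek() == "#":
            self.take("#")
            return ("#", self.unary())
        return self.atom()

    def atom(self):  # atom ::= name | "(" alt ")"
        if self.peek() == "(":
            self.take("(")
            inner = self.alternatives()
            self.take(")")
            return inner
        tok = self.take()
        if tok in ("|", ")"):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

def parse(text):
    p = Parser(text)
    tree = p.alternatives()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return tree
```

For example, parse("a #b c | #b") yields ("|", ("seq", "a", ("#", "b"), "c"), ("#", "b")), which matches the reading given above.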
Parentheses (()) are put around selections within sequences, and around
the bodies of iterations.
(a | b) (c | d)= a sequence of a or b, end of sequence and then a sequence of c or d, end of sequence.
Or informally
the first item is either an a or a b followed by either a c or a d.
Notice that this describes four alternative forms because (a | b) (c | d) = a c | a d | b c | b d.
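The four alternatives are the Cartesian product of the two selections. A small Python sketch (the helper `sequences` is hypothetical) enumerates them:

```python
from itertools import product

def sequences(*choices):
    """Every sequence formed by taking one item from each selection, in order."""
    return [" ".join(items) for items in product(*choices)]

# (a | b) (c | d)
print(sequences(["a", "b"], ["c", "d"]))  # prints ['a c', 'a d', 'b c', 'b d']
```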
#(a | b )=any number of a's and b's in some order.
So, #(a|b) is different from #a | #b because #(a|b) includes alternatives like a b and a a b and b b a that are not permitted by #a | #b.
The following common form indicates a series of pairs:
#(a b )=any number of pairs, where each pair has one a followed by one b.
The #(a b) form implies that each a must be followed by a b. #a #b implies that all the as are followed by all the bs. #(a|b) implies that there is a series of as and bs in some order... a kind of muddled list.
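These differences can be checked by translating each form into a regular expression, where #x becomes x*, and listing the strings of a's and b's that each one accepts. In this Python sketch (the `FORMS` table and `language` function are hypothetical), items are single characters written without spaces, so the XBNF sequence a a b appears as the string "aab".

```python
import itertools
import re

# '#x' is "any number of x, including none", which is x* in a regular expression.
FORMS = {
    "#(a|b)": re.compile(r"(?:a|b)*"),
    "#a | #b": re.compile(r"a*|b*"),
    "#(a b)": re.compile(r"(?:ab)*"),
    "#a #b": re.compile(r"a*b*"),
}

def language(form, max_len):
    """Strings over {a, b} of length <= max_len that one XBNF form accepts."""
    pattern = FORMS[form]
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in itertools.product("ab", repeat=n)
        if pattern.fullmatch("".join(chars))
    }
```

For instance, language("#(a b)", 4) is {"", "ab", "abab"}, while "abab" is not in language("#a #b", 4).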
Shorthand Idioms
Certain patterns appear again and again in practice and XBNF defines
shorthand "macros" for these patterns:
Semantics
Informally each defined term maps into a set of sequences.
Thus XBNF extends set_theory and a theory of sequences.
The structure of these sequences is determined by the definitions
of the defined terms. This is easy to see until one gets doubts about
definitions like this:
The above defines a train in terms of the meaning of a train. The formal semantics tackles and resolves the problem of such recursive definitions; see grammar_theory below.
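A standard way to give a recursive definition a meaning (not necessarily the one used in grammar_theory) is as a least fixed point: start from the empty set and apply the definition repeatedly until no new sequences appear. This Python sketch uses a hypothetical rule, train::= engine | train car, as a stand-in, and discards sequences longer than a bound so the iteration terminates; `least_fixed_point` and `train_step` are hypothetical names.

```python
def least_fixed_point(step, max_len):
    """Kleene iteration: start from the empty set and apply `step` until nothing
    changes. Sequences longer than max_len are dropped so the loop terminates."""
    current = frozenset()
    while True:
        following = frozenset(s for s in step(current) if len(s) <= max_len)
        if following == current:
            return current
        current = following

def train_step(trains):
    """One application of the hypothetical rule  train ::= engine | train car."""
    return {("engine",)} | {t + ("car",) for t in trains}
```

With max_len=3 the iteration passes through {engine}, then adds engine car, then engine car car, and stops; every train in the result starts with an engine, even though the definition mentions train on both sides.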
Acknowledgment
Thanks to Larry Evans <jcampbell3@prodigy....> who pointed out
the broken and bogus links in these notes on Mon May 21.
Also thanks to claudiu <claudiu@romatsa.ro>
on Tue, 23 Oct 2001 who spotted errors and made wise
suggestions.
Problems that remain are dick botting's.
( end of section Syntax (XBNF))
Notes on MATHS Notation
Special characters are defined in
[ intro_characters.html ]
that also outlines the syntax of expressions and a document.
Proofs follow a natural deduction style: they start with assumptions ("Let"), continue to a consequence ("Close Let"), and then discard the assumptions and deduce a conclusion. See [ Block Structure in logic_25_Proofs ] for more on the structure and rules.
The notation also allows you to create a new network of variables and constraints, and give them a name. The schema, formal system, or elementary piece of documentation starts with "Net" and finishes with "End of Net". For more on these ways of defining and reusing pieces of logic and algebra in your documents, see [ notn_13_Docn_Syntax.html ].
For a complete listing of pages in this part of my site by topic see [ home.html ]
Notes on the Underlying Logic of MATHS
The notation used here is a formal language with syntax
and a semantics described using traditional formal logic
[ logic_0_Intro.html ]
plus sets, functions, relations, and other mathematical extensions.
For a more rigorous description of the standard notations see