[CSUSB] >>
[CompSci] >>
/u/faculty/dick/cs320/sebesta/03
CS320 Notes on Syntax and Semantics
Definitions
Syntax
Defines what is legal and so meaningful in a language.
HTML: An HTML document starts with an optional header and
then a body.
LISP: A list is an atom, nil, or a sequence of lists inside
parentheses.
An expression is an atom or list that has an operator
in the first item.
C++: 1+2 is a valid expression.
Syntax also
Describes the parsing of a string of symbols into a
data structure.
Normally a tree: An X has a Y, a Z and a W.
HTML: The document has a body part and a header part.
Document
/ \
Head Body
LISP: Each list becomes a binary tree.
Each expression is a variable or a constant or it
has an operator and some operands.
C++: 1+2 is a valid expression. It has an operator +
and two operands: 1 and 2.
The structure uncovered by parsing input can be shown in a diagram
using the UML.
Semantics
Associates a meaning with structures in the language
HTML:
<IMG SRC="me.gif"> means "let the user see this graphic here"
LISP:
(+ 1 2) means `the result of adding 1 and 2`.
(A 1 2) means `call function A with arguments 1 and 2`.
(+ 1 A) means `the result of adding 1 and the value of A`.
C++:
1+2 means `the result of adding constant 1 to constant 2`
Syntax
Lexemes
A language is a set of strings of lexemes.
LISP: atoms, "(", ")", ","
HTML: tags "<"...">", symbols ("&"...";") and text.
C++: 123, alph, const, =, >>, ++, <=, ...
Token
Internal struct-ure encoding a lexeme
LISP: struct atom{ int length, char chars[],...}
Recognition vs Generation
Equal and Opposite!
Recognition
string->(recognize)->{yes, no}
Given "main(){exit(0);}"
Goal: Produces YES when valid C else NO.
LISP: Is this a valid list: (A B C ( X () ?
HTML: Is this valid: <HTML><head></head><body></body>
Generation
(generator)->Language
Output all valid C programs!
A theory of languages.
Describes what might be written.
Parsing
string->(parser)->Tree Structure
Part of a compiler...
LISP: (A B C)->(parse)-> .
/ \
A .
/ \
B .
/ \
C NIL
FORMAL SYNTAX
Chomski - had thought out a theory of REAL languages:
Context Free Grammars.
ALGOL 60, Zurich.
Backus - "Lets use CFGs..." (Having suffered problems with FORTRAN!)
Naur - "Here is how to use it..."
Committee- "Excellent...."
Algol60 report
http://www.csci.csusb.edu/dick/samples/algol60.syntax.html
Metalanguage
BNF used ::= where Sebesta uses ->
We will use ::= NOT an arrow
Deriving = generating and noticing how we did it.
Parsing = recognize and output structure.
Later language definitions have abandoned the <...>
and allow (....), [....], and {....} as metacharacters.
Each of these is called an EBNF (Extended BNF).
Our Notation XBNF
Why not use the books notation?
Can you figure out the following? (I can't)
<assign>-><var>:=<expression>
<program>->{<stmt_list>}
<comparison>-><simple_expression><relation><simple_expression>
<relation>-><|>|<=|>=|==|!=
<logical_op>->|| | &&
Sebesta's notation is AMBIGUOUS in ASCII. Which is why our "XBNF"
does not use "<" and ">".
Now we don't have this problem in C++:
cout << "statement";
cout << statement;
cout << "<<";
cout << 1+1;
XBNF copies C++ and use C++ strings to indicate lexemes/terminal symbols.
XBNF puts lexemes in quotes.
XBNF versions of Sebesta's examples:
if_statement::="if" logic_expression "then" statement
| "if" logic_expression "then" statement "else" statement.
identifier_list::= identifier | identifier "," identifier_list.
http://www.csci.csusb.edu/dick/maths/intro_ebnf.html
XBNF has some neat extras inspired by EBNF (see later):
O(X)::=`X is optional`.
N(X)::=`one or more X's`.
#(X)::=`zero or more X's`.
L(X)::=`lists of X's`.
L(X,Y)::=`lists of X's separated by Y's`.
Axioms:
O(X)= X|empty.
N(X)= X | X N(X).
#(X)=O(N(X)).
L(X)= X | X "," L(X).
L(X,Y)= X | X Y L(X, Y).
Example 3.1 in XBNF
program ::= "begin" statement_list "end",
statement_list ::= L(statement, ";"),
statement::=variable":="expression,
variable::="A" | "B" | "C",
expression::= variable "+" variable | variable "-" variable | variable.
Notice - no <>, ->
Notice - defined terms are spelled out in full.
Notice - Short hand for lists
statement_list = statement #(";" statement),
because in XBNF
For all x,y, L(x,y) = x #(y x).
On the WWW we use the HyperText Markup Language or HTML:
HTML: HTML_file::= header body.
header::= "<head><title>"document_description"</title></head>".
body::="<body>"HTML_text"</body>"
...
preformated_text::="<pre>"HTML_text"</pre>.
In the LISP lab you'll use this info:
LISP: list::= atom | null | "(" #list ")".
atom::= atom_char #(atom_char).
atom_char::= letter | digit.
Ambiguity
Example 3.2 - Translate...into XBNF
Parse Tree
Recognizers
Ambiguity
Operator Precedence
Associativity
EBNF
[] Option
{} Zero or more of
::= is or can be
OUR XBNF - Uses {}[]<> as in math...
Ada EBNF Our XBNF
[ x ] O( x ) optional x = (x |)
{ x } #( x ) zero or more x
{x,y,z,...} set of x,y,z,...
XBNF
[ x ] subscript x
char..char range of characters,...
A & B Satisfy BOTH A and B at once
A ~ B Satisfy A BUT NOT B
Example - Lisp Lists:
list ::=atom | "NIL" | "(" #list ")",
atom ::= number | non_numeric | string,
number::= digit #digit,
digit ::="0".."9",
...
Example - HTML document
html_document::= "<HTML>" head body.
head::= O( "<head>" O( "<title>" text "</title>" ) "</head>" ).
body::= "<body>" # piece_of_html_text "</body>".
piece_of_html_text ::= heading | image | separator | anchor | ... .
heading ::= "<h1>" text "</h1>" | "<h2>" text "</h2>" | ... .
image ::= "<img src="filename" alt="text" ">".
separator::= "<br>" | "<p>" | "<hr>".
anchor ::= "<a " (hypertext_ref | named_link) ">" text "</a>".
hypertext_ref::= "href=" URL.
named_link::="name="text.
C--:
program::="main(){" #statement "}",
statement::=assignment | declaration,
declaration::= "int" identifier";"
assignment::=identifier"="expression";",
identifier::="a".."z",
expressions::=term #("+" term),
term::=factor #( "*" factor),
factor::="(" expression ")" | identifier | integer,
integer::= N(digit).
Example - LISP expressions
expression::= variable | constant | "(" function # argument ")",
argument::=expression,
variable::=atom,
constant::=string | number | "NIL" | "(" "QUOTE" list ")".
Syntax Graphs
Good for users and beginners.
Designers need the structure of an EBNF meta-language.
Syntax graphs are NOT USED in CS320
Not in final, optional in assigned work and projects.
Attribute Grammars
Skip in CS320
Dynamic Semantics
We will be using UML in CS320.
This material is covered extensively in CS620.
Skip this section.
Revision Exercises
Do not do exercises 10, 11, 12,.... they are on attribute grammars
and semantics.
Not on the final.
Note.
x:=y -- assignment in Algol 60, Algol 68, Pascal, Ada,
and most computer science text books.
C/C++ use 'x=y;' instead.
x=y -- equality relation Algols, Pascal, Ada, Mathematics,
and most CSci text books.
C/C++ uses 'x==y".
x::=y -- definition in BNF, EBNF, XBNF.
Silly Examples
Train::=Loco #Wagon Caboose.
Meal::=#Bite.
Drink::=(Big|Small) Gulp.
RAM::=#byte & #word
Silly Exercise
In the old days, in the Cajon pass, there were two kinds of
train. A train either had one or two engines. The two engined
trains had to be switched to one line and the one engined
trains switch to go over the pass. A one engined train
stated with a loco, then had zero or more wagons, and then
a caboose. A two engined train started with a loco, had
zero or more wagons, and ended with a caboose and another
loco.
Define a train. Use phrases in the above to make nonterminal
The terminals are loco, wagon, caboose.
Serious Exercise
Work out the XBNF syntax of a C function declaration.
Hint: It is quite complex and there are several possible
answers. THINK!
Answer... search in
http://www.csci.csusb.edu/dick/samples/c.syntax.html
Between Syntax and Semantics comes Structure.
When describing the meaning (semantics) of a piece of a language it
helps in a real language to distinguish between the actual concrete
syntax (do you write "begin" or "{"...?) form the abstract syntax
or structures that are involved (a block...)
Semantics
Informal semantics
Incomplete
Inconsistent
incomprehensible
Example - Algol 68, LISP, ...
But popular.
Three classical approaches to Formal Semantics
Operational
Axiomatic
Denotational
We will use the UML to express semantic structures.
The operations that define operational semantics can be put
onto these diagrams.
You don't need to go into the details unless you want to!
See notes below if you want to.
Exercise
Chapter 3 of Sebesta has lots of new words - shown in bold case. Several
of these have been defined in a collection of files in
http://www.csci.csusb.edu/dick/samples/
On the WWW you can look up these definitions using the search commands
embedded in
http://www.csci.csusb.edu/cs320/index.html#searches
Assigned work:
Do not do review questions 9, 10, 11, and 12; or problems 16,17,18.
Do not draw any syntax graphs!
Where the book says C you can use C++
If you don't know Pascal look in
http://www.csci.csusb.edu/dick/samples/pascal.syntax.html
Hints:
http://www.csci.csusb.edu/cs320/sebesta/03.answers
This chapter introduces many new terms. If you can not find their
meaning in the text then follow this link
http://www.csci.csusb.edu/cgi-bin/dick/lookup320
and enter the word into the form to search my repository of terms
for CS320.
Semantics (Optional)
Operational Semantics
Language parts are mapped into OPERATIONS on a virtual machine.
If well chosen it makes implementation trouble free.
JAVA
BCPL
VDL
Became Vienna Design Method(VDM)
for ALL Software engineering!
Axiomatic
Define what can be proved to be true as statements are obeyed.
For example if you know that i is 1 before i++; then
you know what it is afterwards!
Hoare Triples
Given P is true before S then we must have Q afterwards.
Specifies what the statements do
without stating a particular implementation.
The "P" implies a disclaimer that if P is not true
this triple doesn't say what happens.
Dijkstra's Calculus
Given that you want Q to be true after this statement
then the language merely requires you to set up P first.
Precondition: P
Postcondition: Q
To see how assertions can be imbedded in code
see examples of Quick sort in Java at:
http://csci.csusb.edu/public/faculty/dick/qsort0.java
Later version: Do both Pre and Post as a single predicate.
Done By Eric Hehner
Use: x' for new value of x.
"x:=e" is x'=value(e) and for all other y, y'=y
Denotational Semantics
Associates parts of a language with objects in an
abstract mathematical 'domain'.
Actually a "function space" that is also a
"complete Partially Ordered Set"
BOOK
N[[ Part ]] = Denotation of Part.
This Course.
XBNF includes the ability to define functions.
Symbol for set of functions mapping Givens to Goals:
Givens->Goals
Example: cos: Real->Real.
So we can define of functions
that map syntax into semantics.
For x:syntactic_form, meaning(x)::=expression.
For real languages we do it as a series of steps:
(lexical): lexer::=strings->tokens.
(syntax): parser::=tokens->structures.
(semantics): denotations::= structures->meanings.
Binary numbers -syntax and semantics
Translated into Our XBNF:
binary_digit::="0"|"1". -- note these are characters.
binary_number::= binary_digit | binary_number binary_digit.
meanings::=`0,1,2,3,...`::=Positive&Integer.
We now describe a mapping or function from character
strings defined above into the positive integers
0,1,2,3,...
M::denotation=`Meaning of (_)`.
A denotation maps valid syntactic strings into meanings
denotation::= #char->0.. .
M maps each #char into a unique integer,
M must obey the following carefully chosen axioms:
M("0")=0,
M("1")=1.
For b:binary_number, M(b "0")=2*M(b),
For b:binary_number, M(b "1")=1+2*M(b).
Notice: a binary_number is either like this: number digit or
it is a digit. So one of the above rules will apply
to all valid strings.
Thus
M("1101")=1+2*M("110")
=1+2*2*M("11")
=1+2*2*(1+2*M("1"))
=1+4*(1+2*1)=13.
Doing this for a full scale language is not easy...
You need a notation for handling mathematical functions.
For instance
http://www.csci.csusb.edu/dick/maths/intro_function.html
Attribute Grammars -- Not in this course
Idea - each piece of input has some values (attributes) attached to it.
Some types of value go down the parse tree (inherited)
Some values are computed and passed up the tree
(synthesized)