.Open CS320/04 Syntax & Parsing -- Parts of Chapters 3 and 4 .Table .Row Previous 03 Evolution of Main Languages Chapter 2 lab03 HTML lab pages .Row 04 Syntax: grammars, EBNF, parsing Chapter 3 sections 1 to 3 + Chapter 4 sections 1 to 3 + XBNF & $LRM Handouts lab04 BNF on the web .Row Next 05 Semantics: UML UML handout lab05 UML + Graphics .Close.Table . What is important in chapters 3 and 4 In chapter 3 we are most interested in the simplest methods of syntax description -- the Context Free Grammars and the BNFs. You can use syntax description when developing any program with complex input or output! In chapter 4 you need know something about how to input and parse data. All programs have to do this! However, the details are a part of the CSUSB CSCI Compilers classes. Notice that the lexical analyser (lexer) and the parser together convert the input stream of data into a tree structure -- a collection of objects linked together by pointers. When we draw UML diagrams of the semantics of a language we also describing these objects and how they will be associated. State transition diagrams (figure 4.1) are an important part of computer science, but very easy to understand and use. These diagrams express the simplest kind of theoretical computer: the .Key finite state machine or .Key FSA. These are machines with limited memory (finite number of states) and star in several upper division electives in our degrees. The Unified Modeling Language (UML) has a diagram called a .Key UML state machine. Here .See ./lexer.gif is an older version of Figure 4.1 drawn using an UML state machine. Can you figure out what is missing in my UML version when compared to Figure 4.1? More on the UML later. By the way, .Key XML documents have their detailed syntax described in a form of Extended BNF. . Assigned work due (1) Study chapter 3, sections 3.1(intro), 3.2(problem), and 3.3(solution). Study chapter 4, sections 4.1(intro), 4.2(lexers), and 4.3(parsers). However I do not expect you to memorize or duplicate the code in chapter 4. (2) You will use XBNF(eXtreme BNF) in your project work. Read the XBNF handout. You will be using a subset of HyperText Markup Language(HTML) in the Labs. Read the handout describing a subset of the HTML. Your project will be a longer document mixing XBNF, English, and diagrams, like these two handouts. Bring copies of today's handouts on XBNF and HTML to all future classes, and laboratories. Add your own notes and bring them to the final. (4) Study the first 8 Review Questions at the end of chapter 3 and the first 8 questions at the end of chapter 4. (4) Hand in written/typed/emailed copies of 2 Review Questions (any mix of chapters) at start of next class ( 2 points total max). Plus 1 point for being prompt and 1 point for full participation. . Your Project In your project you will be developing a LRM (Language Reference Manual) for a programming language that you will be inventing. What makes a good LRM? Four things: .Set $Comments $Examples $Syntax $Semantics .Close.Set .Key Examples show individual points in the set of possible languages and help a person interpolate (guess) other examples. Here is an example of a C expression .As_is a+1 .Key Comments help the user understand by being informal: .As_is an expression like `foo+bar` adds `foo` to `bar`. A grammar defines the syntax. .Key Syntax expresses precisely the possible forms of something. Grammars are written in a mathematical metalanguage: .As_is additive_expression::= term #(("+" | "-") term). The .Key Semantics eliminates meaningless possibilities and provides meaning to the rest. These are normally expressed informally but in this class we will use the Unified Modeling Language. --(UML) More on the UML later .See ./05.html and here are some sample LRMs for well known languages .See ../samples/algol60.syntax.html .See ../samples/php_intro.html .See ../samples/python.html plus the syntax from other LRMs: .See ../samples/cobol.syntax.html .See ../samples/pascal.syntax.html etc.... Also see .See ../samples/ Further, these four things (comments, examples, syntax, and semantics) need to be intermingled: you should be able to see the reason for a feature(comment), plus an example or two of the feature, and the rules (syntax and semantics) that apply to that feature all on the same page. We will use the UML to also provide a visual summary of the semantics of the language. So for each part of the language that you are defining you will need: .List Some commentary Some examples Some syntax definitions using some kind of BNF Some semantics....mainly English but sometimes a piece of the UML and/or math helps. .Close.List Here is the kind of thing I mean. .Open Mini-Example: LRM for Decimal Numbers (comment): Numbers are expressed using the familiar decimal notation. A string of decimal digits represents a number. The value of the number is calculated in the normal way. (syntax): number ::= digit | digit number. (examples): .As_is 1 .As_is 512 .As_is 8192 digit ::= "0".."9". .As_is 1 The meaning of a digit is an interger in range 0..9 using the normal encoding. (semantics): The meaning of a string of `n` digits `d` = (d[n] d[n-1] ... d[3] d[2] d[1]) is called it's `value`. We can model `value` as a map from syntactically correct numbers (see $number above) into the natural numbers, including 0, Nat0={0,1,2,3,4,...}. The function `value` is defined by: .Image 04.gif Formula value(d)=sum[i:1..n](10^(i-1) * d[i]) So for example, value("8192") = 8*1000 + 1*100 + 9*10 + 2 = 8192. .Close . Examples vs Syntax vs Semantics Note. In the final, in your project, and in class, I will often ask you to give me an example of something. I mean a piece of code that is in the language that shows that you know what it means. If I ask for an example of a C++ for loop then .As_is for(i=1; i<=n; i ++) .As_is v+= d [i-1]; is good. If I ask for the syntax of a C++ for loop I expect something like this for_loop::= "for" "(" initialization ";" condition ";" increment ")" statement. I'm unlikely to ask for formal semantics.... BUT instead I may ask to express a general for loop in simpler terms. This is a form of "Operational Semantics". A good answer would give the general form of for-loop: .As_is for (A; B; C) S and then a simpler equivalent form: .As_is { A; while(B) { .As_is S .As_is C; .As_is } .As_is } Notice that the translation works whatever A,B,C, and S are, as long as the original form is syntactically correct. .Close CS320/04 Syntax & Parsing -- Parts of Chapters 3 and 4 . Class Work .See ./04q.html . Lab Work .See ./lab/04.html . Next .See ./05.html . Project LRM::=http://www/dick/cs320/A.html