
Contents


    Languages

      This set of notes is focused on developing a theory of languages that can be used to define realistic formal languages and to develop tools for manipulating them. As a result, following Michael A Jackson's pioneering methods, they can be used to develop programs that process data (JSP) and Management Information Systems (JSD). I have written a monograph [ ../monograph/ ] to explore this.

      .RoadWorksAhead [click here [socket symbol] Languages and Grammars if you can fill this hole]

      The mathematical model of a language is a set of strings. The strings must be sequences of zero or more symbols taken from a finite set often called an alphabet.

    1. LANGUAGE::=following
      Net

      1. |-STRINGS
      2. Alphabet::Sets,
      3. |-finite Alphabet.
      4. Strings::=#Alphabet.
      5. Languages::=@Strings.

        See also

      6. STRINGS::= See http://cse.csusb.edu/dick/maths/math_62_Strings.html.

        Properties of Languages

          The set of languages for a given alphabet forms a Boolean Algebra with respect to union, intersection, and complement, of course. Moreover, they form a Monoid with respect to concatenation:
        1. For A,B:Languages, A B::Languages= {a ! b . a:A, b:B}.
        2. 1::Languages= {""}, the language whose only element is the empty string.
        3. (above)|-MONOID( Languages, (), 1 ).

          Indeed, with union as addition and concatenation as multiplication we get a semiring (a Ring without additive inverses), since concatenation distributes over union; we do not get one with intersection and concatenation, because concatenation does not distribute over intersection. A small sketch of these laws in code follows this Net.

          [click here [socket symbol] codes, equivalence, etc. if you can fill this hole]

        . . . . . . . . . ( end of section Properties of Languages) <<Contents | End>>


      (End of Net)
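
      The laws above can be checked mechanically. Here is a minimal Python sketch (an illustration only, not part of the MATHS notation) of finite languages as sets of strings, with concatenation, its identity {""}, and distribution over union:

        # Hypothetical sketch: finite languages as Python sets of strings.
        def concat(A, B):
            """Concatenation of two languages: { a ! b . a:A, b:B }."""
            return {a + b for a in A for b in B}

        ONE = {""}       # identity for concatenation: just the empty string
        ZERO = set()     # the empty language: identity for union, annihilator for concatenation

        A, B, C = {"a", "ab"}, {"b", ""}, {"c"}
        assert concat(A, ONE) == A == concat(ONE, A)              # MONOID(Languages, (), 1)
        assert concat(A, ZERO) == ZERO == concat(ZERO, A)
        assert concat(A, B | C) == concat(A, B) | concat(A, C)    # distributes over union
        # ...but not, in general, over intersection:
        X, Y, Z = {"a", "aa"}, {"a"}, {"aa"}
        assert concat(X, Y & Z) != concat(X, Y) & concat(X, Z)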

      Grammar

      A Grammar defines a language and imposes a structure on the strings in that language. Chomsky invented a model of how grammars work. A grammar explicitly describes rules for generating the correct strings of a given language. If well chosen, each string in the language can be generated in essentially only one way. This way of generating the string defines the structure of the string - how it is parsed.

      Grammars -- especially written in various extensions of Backus-Naur Form (BNF) -- are commonly used to define computer languages. Several examples can be found in my [ ../samples/ ] directory.
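
      To make "generating the correct strings" concrete, here is a small hedged Python sketch; the toy grammar (S ::= "a" S "b" | empty, giving the strings a^n b^n) and the depth bound are assumptions made for the example, not anything defined in these notes:

        # Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
        # each alternative being a sequence of terminals and nonterminals.
        GRAMMAR = {
            "S": [["a", "S", "b"], []],    # S ::= "a" S "b" | empty
        }

        def generate(symbols, depth):
            """All terminal strings derivable from 'symbols', expanding each nonterminal at most 'depth' deep."""
            if not symbols:
                yield ""
                return
            head, rest = symbols[0], symbols[1:]
            if head in GRAMMAR:                                # nonterminal: try each alternative
                if depth == 0:
                    return
                for alt in GRAMMAR[head]:
                    for left in generate(alt, depth - 1):
                        for right in generate(rest, depth):
                            yield left + right
            else:                                              # terminal: emit it and carry on
                for right in generate(rest, depth):
                    yield head + right

        print(sorted(set(generate(["S"], 4))))   # ['', 'aaabbb', 'aabb', 'ab']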

    2. GRAMMAR::=
      Net{
      1. Alphabet::Sets=given,
      2. |-finite Alphabet.
      3. productions::Sets.
      4. P::=@productions,
      5. terminals::@#Alphabet=given,
      6. T::= terminals,
      7. non_terminals::Finite_sets Name=given,
      8. N::= non_terminals,
      9. vocabulary::=T|N.
      10. V::=vocabulary.
      11. source::non_terminals=given,
      12. S::=source,

      13. meta_language::=#(production | comment),
      14. M::=meta_language,
      15. comment::=#character~production.
      16. |-P = img(M)&production.

      }=::GRAMMAR.

      Hierarchy of Meta_languages

      Different restrictions on the form of productions define different classes of languages. Many of these have been studied in detail.

      Here is a table of some classic grammatical types. The form of Productions is shown with capital letters (A,B,C) being used for Nonterminals and lower case letters for terminal symbols. Greek letters ( α, β, ...) are for strings of terminals and nonterminals.
      Table
      Name | Form of Productions | Notes
      Regular | A->a, A->a B | Chomsky
      Context Free | A->α | Chomsky
      Conjunctive Grammars | A->α & β & ... | See [Okhotin03] and my own [ ../monograph/03-intersections.html ]
      Boolean Grammars | A->α & β & ... & \not γ & \not δ | See [Okhotin04b]
      Context_sensitive | α A β->α γ β | Chomsky
      Context_dependent | β->α | Chomsky

      (Close Table)
      In the above table the notation x->y indicates a substitution; see [ math_62_Strings.html#Substitution ] for a formal definition.


    3. (theory)|-regular(T) =>> context_free_language(T) =>> para_grammar_language =>> extended_grammar_language.

      No useful languages are context free, so more general forms of grammar are needed. However, CFGs are the commonest form of grammar used in practice. They provide useful information in Language Reference Manuals and give the easiest way of designing compilers and interpreters. They are also the basis of Jackson Structured Programming.

    4. Context_Free_Grammars::=CFG, below:
    5. CFG::=
      Net{
        Extended Backus-Naur Form is equivalent to a context free grammar (CFG). There exist standard tools for handling context free grammars.


      1. |-GRAMMAR(production=>context_free_production).
      2. context_free_production::syntax=non_terminal ":"":""=" regular_expression,
      3. regular_expression::=Regular_Expressions(terminal|non_terminal).

        Note that this is more general than Chomsky's definition of a CFG but describes the same family of languages.


      }=::CFG.

      Lexicons and Lexical Analysis

      Most languages have a basic vocabulary of strings that are each treated as a single symbol (a lexeme). The basic symbols of a language are always defined by a simple context free grammar. Indeed, most lexicons form a regular language. These are best defined using simple descriptions:
       	word :: lexeme, purpose.
      or definitions,
       	name :: lexeme = string, purpose.
       	name :: lexeme = regular_expression, purpose.

      Net
      1. typedef::lexeme, used to introduce type definitions in C/C++/Java/etc.
      2. double_plus::lexeme= "++", used to add 1 to a variable in C/C++/Java etc.
      3. all::lexeme= "all", universal quantifier.
      4. some::lexeme= "some", existential quantifier.
      5. Ada_end::lexeme=ignore_case("end").

      (End of Net)
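
      The lexemes above can be recognised with standard regular expression tools. Here is a hedged Python sketch of a lexical analyser for them; the catch-all "name" and "space" patterns are assumptions added for the example, and real scanners usually prefer the longest match:

        import re

        # Lexeme table: name -> regular expression; earlier entries take priority.
        LEXEMES = [
            ("double_plus", r"\+\+"),          # "++", add 1 to a variable in C/C++/Java
            ("typedef",     r"typedef"),       # introduces type definitions
            ("all",         r"all"),           # universal quantifier
            ("some",        r"some"),          # existential quantifier
            ("Ada_end",     r"(?i:end)"),      # ignore_case("end")
            ("name",        r"[A-Za-z_]\w*"),  # assumed catch-all identifier
            ("space",       r"\s+"),
        ]
        MASTER = re.compile("|".join("(?P<%s>%s)" % (n, p) for n, p in LEXEMES))

        def lex(text):
            """Yield (lexeme_name, matched_string) pairs, skipping white space."""
            for m in MASTER.finditer(text):
                if m.lastgroup != "space":
                    yield m.lastgroup, m.group()

        print(list(lex("typedef x END ++ all")))
        # [('typedef', 'typedef'), ('name', 'x'), ('Ada_end', 'END'), ('double_plus', '++'), ('all', 'all')]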

      Para_Grammars

    6. Para_Grammar::=following,
      Net


      1. |-GRAMMAR(production=>para_production).
      2. para_production::= non_terminal ":"":""=" para_regular_expression,
      3. para_regular_expression::=(("&" | "|") parameters|) item #(("&"|"~") item),
      4. item::= (|hash)(terminal |non_terminal | "(" selection ")" | parameter),
      5. selection::= sequence #("|"sequence),
      6. sequence::= #(item | parameterized),

      7. parameterized::= parameters item,
      8. parameters::=l_bracket binding #(comma binding) r_bracket,
      9. binding::=parameter #(comma parameter) ":"item,
      10. parameter::=variable.

        Parameters are used to indicate correspondences across sequences, etc.

        Example
        Let

        1. dup::=|[w:#Bits ](w w).
        This defines a context sensitive language in a very simple form; a recogniser for it is sketched after this Net.

        (Close Let )

      (End of Net Para_Grammar)
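
      As a concrete check on the dup example, here is a minimal Python recogniser for the context sensitive language { w w . w:#Bits }; it is an illustration only, not a general para-grammar tool:

        def is_dup(s, alphabet="01"):
            """True when s = w w for some string w over the alphabet (here Bits = {0,1})."""
            if any(c not in alphabet for c in s):
                return False
            half, rem = divmod(len(s), 2)
            return rem == 0 and s[:half] == s[half:]

        assert is_dup("")           # w is the empty string
        assert is_dup("0101")       # w = "01"
        assert not is_dup("010")    # odd length
        assert not is_dup("0110")   # the two halves differ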

      Semantics

      The meaning of a grammar depends on how it is used - lexical or syntactical, .... Specialized languages can be constructed by using a grammar with different sets of terminal symbols and by interpreting the structures and tags differently.

      By default you can assume that a sequence is parsed as an array or list (of type #X for some X). Parameterized items become mappings from parameters to content (of type X<>->Y). For more, see [ notn_2_Structure.html ] [ Definitions in notn_14_Docn_Semantics ] [ notn_13_Docn_Syntax.html ] , for an example of how the formal grammar of MATHS maps into a structured object.

      Translation

      A translation can be defined by giving two corresponding grammars. One defines the input, and the other the output. Tags indicate corresponding strings in the two grammars.

      A more sophisticated approach is to define both the input and the output simultaneously. Here each definition does not define a set of strings but a set of pairs of strings (equivalent, therefore, to a relationship between two languages). A simple version of this idea will be found in Aho and Ullman Volume 1.
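
      As a small hedged sketch of this idea, take a pair of corresponding toy grammars (invented for the example): the input grammar E ::= digit | digit "+" E and the output grammar E ::= digit | digit E "+", with the same tags. Each parse then defines a pair (input string, postfix output string):

        def translate(s):
            """Map a sum of single digits, e.g. "1+2+3", to its postfix form, e.g. "123++"."""
            if not s or not s[0].isdigit():
                raise ValueError("expected a digit at %r" % s)
            digit, rest = s[0], s[1:]
            if not rest:
                return digit                             # E ::= digit        ->  E ::= digit
            if rest[0] != "+":
                raise ValueError("expected '+' at %r" % rest)
            return digit + translate(rest[1:]) + "+"     # E ::= digit "+" E  ->  E ::= digit E "+"

        assert translate("1+2+3") == "123++"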

      Parsers

      A parser constructs a data structure that encodes the input sequence. Each non_terminal in the grammar will have its own associated type of object in this data structure. A particular input is encoded as a set of objects which in general will not be a tree but a directed acyclic graph (DAG). In MATHS a simple formal model might be to simultaneously define the abstract and the concrete syntax like this:

      (expression, expressions) ::@#Alphabet><Sets=(f:function e:delimited | e:delimited dot f:function,$ {f:functions, e:expressions}).

      Clearly this involves much redundancy, and makes the definition harder to parse. A more readable convention is to place the abstract syntax immediately after the concrete syntax:

    7. expression::syntax=f:function e:delimited | e:delimited dot f:function,
    8. expression::parsing=$ Net{f:functions, e:expressions}.

      Thus a sequence with tags leads to an object with tags identifying the components in it, plus a special set of pointer tags identifying the object of which this object is a part.

      In (A|B|C|...) only one object will be constructed, but it will be one of the alternative types indicated. The alternatives may or may not have overlapping component types. When the alternatives overlap, the grammar is ambiguous. You should then assume that the earlier alternatives take priority and a parser should look for the first alternative that fits.

      In a concurrent expression A & B & C ... the tags should all be distinct. The object is constructed by all the parsings, and so each concurrent part must have its own distinct tags. Thus each concurrent parsing operates on its own part of the tuple, which can be implemented as a separately locked record, so the exact timing of the concurrent parsing is irrelevant.

      In (A~B) it is assumed that B will be attempted beforehand, and if it fails then its n-tuple is removed and/or replaced by the n-tuple (if any) described by A.
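
      A hedged Python sketch of these conventions: a recursive descent parser for an invented grammar, expression ::= f:function "(" e:expression ")" | e:number, that returns tagged objects (here dictionaries keyed by the tags), trying the earlier alternative first:

        def parse_expression(s):
            """Return (tagged_object, remaining_input) for the toy grammar above."""
            # first alternative: f:function "(" e:expression ")"
            name = "".join(take_while(s, str.isalpha))
            if name and s[len(name):len(name) + 1] == "(":
                inner, rest = parse_expression(s[len(name) + 1:])
                if rest[:1] == ")":
                    return {"f": name, "e": inner}, rest[1:]
            # second alternative: e:number
            digits = "".join(take_while(s, str.isdigit))
            if digits:
                return {"e": int(digits)}, s[len(digits):]
            raise ValueError("no alternative fits at %r" % s)

        def take_while(s, pred):
            for c in s:
                if not pred(c):
                    break
                yield c

        obj, rest = parse_expression("sin(abs(42))")
        print(obj)   # {'f': 'sin', 'e': {'f': 'abs', 'e': {'e': 42}}}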

      Data Directed Program Design

      By extending the concept of simultaneous definitions to an n-ary relation between sets of inputs, outputs, and objects, a formal version of JSP is developed.

      Dynamics

      A grammar can be used to describe the possible and/or significant patterns of events in a software system. This is the basis of JSD and part of SSADM.

      UNIX Regular Expressions

    9. UNIX_RE::=
      Net{
      1. regular_expression::= sequence #(vertical_bar sequence),
      2. sequence::=(circumflex|)regular_item #regular_item (dollar_sign|),
      3. regular_item::=(element |"("regular_expression")")
      4. (star |plus | "{" number(comma number|) "}" |),
      5. element::=char~special_character | back_slash char | "[" set "]",
      6. set::=(circumflex|) #(char | char minus char ),
      7. special_character::=left|back_slash|vertical_bar|star|plus.

      8. There exist standard tools for encoding and interpreting regular expressions - see other.

      }=::UNIX_RE.
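
      The standard tools include the POSIX utilities (grep, sed, awk and their relatives) and regular expression libraries in most programming languages. As a hedged illustration, here is Python's re module exercising the constructs named above (alternation, grouping, anchors, character sets, and counted repetition); note that Python's dialect differs in detail from the grammar given here:

        import re

        pattern = re.compile(r"^(ab|cd)+[0-9]{2,3}$")   # circumflex, selection, plus, set, {m,n}, dollar

        for text in ("abcd12", "ab123", "xy12", "ab1"):
            print(text, bool(pattern.match(text)))
        # abcd12 True, ab123 True, xy12 False, ab1 False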

    Notes on MATHS Notation

    Special characters are defined in [ intro_characters.html ] that also outlines the syntax of expressions and a document.

    Proofs follow a natural deduction style that starts with assumptions ("Let"), continues to a consequence ("Close Let"), and then discards the assumptions and deduces a conclusion. Look here [ Block Structure in logic_25_Proofs ] for more on the structure and rules.

    The notation also allows you to create a new network of variables and constraints. A "Net" has a number of variables (including none) and a number of properties (including none) that connect variables. You can give them a name and then reuse them. The schema, formal system, or an elementary piece of documentation starts with "Net" and finishes "End of Net". For more, see [ notn_13_Docn_Syntax.html ] for these ways of defining and reusing pieces of logic and algebra in your documents. A quick example: a circle might be described by Net{radius:Positive Real, center:Point, area:=π*radius^2, ...}.

    For a complete listing of pages in this part of my site by topic see [ home.html ]

    Notes on the Underlying Logic of MATHS

    The notation used here is a formal language with syntax and a semantics described using traditional formal logic [ logic_0_Intro.html ] plus sets, functions, relations, and other mathematical extensions.

    For a more rigorous description of the standard notations see

  1. STANDARD::= See http://www.csci.csusb.edu/dick/maths/math_11_STANDARD.html

    Glossary

  2. above::reason="I'm too lazy to work out which of the above statements I need here", often the last 3 or 4 statements. The previous and previous but one statements are shown as (-1) and (-2).
  3. given::reason="I've been told that...", used to describe a problem.
  4. given::variable="I'll be given a value or object like this...", used to describe a problem.
  5. goal::theorem="The result I'm trying to prove right now".
  6. goal::variable="The value or object I'm trying to find or construct".
  7. let::reason="For the sake of argument let...", introduces a temporary hypothesis that survives until the end of the surrounding "Let...Close.Let" block or Case.
  8. hyp::reason="I assumed this in my last Let/Case/Po/...".
  9. QED::conclusion="Quite Easily Done" or "Quod Erat Demonstrandum", indicates that you have proved what you wanted to prove.
  10. QEF::conclusion="Quite Easily Faked", -- indicates that you have proved that the object you constructed fitted the goal you were given.
  11. RAA::conclusion="Reductio Ad Absurdum". This allows you to discard the last assumption (let) that you introduced.

End