Summary of Chapter 2


The syntax-directed techniques in this chapter can be used to construct compiler front ends, such as those illustrated in the follow figure:

 translations of a statement

The starting point for a syntax-directed translator is a grammar for the source language. A grammar describes the hierarchical structure of programs. It is defined in terms of elementary symbols called terminals and variable symbols called nonterminals. These symbols represent language constructs. The rules or productions of a grammar consist of a nonterminal called the head or left side of a production and a sequence of terminals and nonterminals called the body or right side of the production. One nonterminal is designated as the start symbol.

In specifying a translator, it is helpful to attach attributes to programming construct, where an attribute is any quantity associated with a construct. Since constructs are represented by grammar symbols, the concept of attributes extends to grammar symbols. Examples of attributes include an integer value associated with a terminal num representing numbers, and a string associated with a terminal id representing identifiers.

A lexical analyzer reads the input one character at a time and produces as output a stream of tokens, where a token consists of a terminal symbol along with additional information in the form of attribute values. In the figure being presented, tokens are written as tuples enclosed between <>. The token <id, “peek”> consists of the terminal id and a pointer to the symbol-table entry containing the string “peek”. The translator uses the table to keep track of reserved words and identifiers that have already been seen.

Parsing is the problem of figuring out how a string of terminals can be derived from the start symbol of the grammar by repeatedly replacing a nonterminal by the body of one of its productions. Conceptually, a parser builds a parse tree in which the root is labeled with the start symbol, each nonleaf corresponds to a production, and each leaf is labeled with a terminal or the empty string ε. The parse tree derives the string of terminals at the leaves, read from left to right.

Efficient parsers can be built by hand, using a top-down (from the root to the leaves of a parse tree) method called predictive parsing. A predictive parser has a procedure for each nonterminal; procedure bodies mimic the productions for nonterminals; and, the flow of control through the procedure bodies can be determined unambiguously by looking one symbol ahead in the input stream.

Syntax-directed translation is done by attaching either rules or program fragments to productions in a grammar. In this chapter, we have considered only synthesized attributes – the value of a synthesized attribute at any node x can depend only on attributes at the children of x, if any. A syntax-directed definition attaches rules to productions; the rules compute attribute values. A translation scheme embeds program fragments called semantic actions in production bodies. The actions are executed in the order that productions are used during syntax analysis.

The result of syntax analysis is a representation of the source program, called intermediate code. Two primary forms of intermediate code are illustrated in the Figure above. An abstract syntax tree has nodes for programming constructs; the children of a node given the meaningful subconstructs. Alternatively, three-address code is a sequence of instructions in which each instruction carries out a single operation.

Symbol tables are data structures that hold information about identifiers. Information is put into the symbol table when the declaration of an identifier is analyzed. A semantic action gets information from the symbol table when the identifier is subsequently used, for example, as a factor in an expression.