Compiler Notes

Notes that I should study/review…

  1. 2013-01-10-Note-18-55
  2. 2013-01-15-Note-19-45
  3. 2013-01-17-Note-19-51
  4. 2013-01-21-Note-20-28
  5. 2013-01-29-Note-18-36
  6. 2013-01-31-Note-19-06
  7. 2013-02-05-Note-14-31
  8. 2013-02-26-Note-15-14
  9. 2013-02-28-Note-18-36
  10. 2013-03-05-Note-21-41
  11. Midterm..
  12. 2013-03-12-Note-21-34
  13. 2013-03-26-Note-21-51
  14. 2013-03-28-Note-08-52
  15. 2013-04-02-Note-09-49
  16. 2013-04-04-Note-10-22

Chapter Summaries:

Summary of Chapter 9

MACHINE-INDEPENDENT OPTIMIZATION

Global Common Subexpressions: An important optimization is finding computations of the same expression in two different basic blocks. If one precedes the other, we can store the result the first time it is computed and use the stored result on subsequent occurrences.

Copy Propagation: A copy statement, u = v, assigns one variable, v, to another, u. In some circumstances, we can replace all uses of u by v, thus eliminating both the assignment and u.

Code Motion: Another optimization is to move a computation outside the loop in which it appears. This change is only correct if the computation produces the same value each time around the loop.

Induction Variables: Many loops have induction variables, variables that take on a linear sequence of values each time around the loop. Some of these are used only to count iterations, and they often can be eliminated, thus reducing the time it takes to go around the loop.

Data-Flow Analysis: A data-flow analysis schema defines a value at each point in the program. Statements of the program have associated transfer functions that relate the value before the statement to the value after. Statements with more than one predecessor must have their value defined by combining the values at the predecessors, using a meet (or confluence) operator.

Data-Flow Analysis on Basic Blocks: Because the propagation of data-flow values within a block is usually quite simple, data-flow equations are generally set up to have two variables for each block, called IN and OUT, that represent the data-flow values at the beginning and end of the block, respectively. The transfer functions for the statements in a block are composed to get the transfer function for the block as a whole.
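
In symbols, if block B consists of statements s1, …, sk with transfer functions f1, …, fk, the transfer function of the block as a whole is the composition

    fB = fk ∘ … ∘ f1,   so that   OUT[B] = fB(IN[B]).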

Reaching Definitions: The reaching-definitions data-flow framework has values that are sets of statements in the program that define values for one or more variables. The transfer function for a block kills definitions of variables that are definitely redefined in the block and adds (“generates”) those definitions of variables that occur within the block. The confluence operator is union, since definitions reach a point if they reach any predecessor of that point.
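
In the usual gen/kill notation, the equations for this forward, union-confluence framework are:

    OUT[B] = genB ∪ (IN[B] − killB)
    IN[B]  = ∪ { OUT[P] : P a predecessor of B }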

Live Variables: Another important data-flow framework computes the variables that are live (will be used before redefinition) at each point. The framework is similar to reaching definitions, except that the transfer function runs backward. A variable is live at the beginning of a block if it is either used before definition in the block or is live at the end of the block and not redefined in the block.
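
The corresponding backward equations, with useB for the variables used before definition in B and defB for the variables defined in B, are:

    IN[B]  = useB ∪ (OUT[B] − defB)
    OUT[B] = ∪ { IN[S] : S a successor of B }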

Available Expressions: To discover global common subexpressions, we determine the available expressions at each point – expressions that have been computed and whose arguments have not been redefined since the last computation. The data-flow framework is similar to reaching definitions, but the confluence operator is intersection rather than union.
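
With e_genB for the expressions computed and not subsequently killed in B, and e_killB for the expressions killed in B, the equations are (IN[entry] = ∅, and OUT of every other block initialized to the set of all expressions):

    OUT[B] = e_genB ∪ (IN[B] − e_killB)
    IN[B]  = ∩ { OUT[P] : P a predecessor of B }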

Abstraction of Data-Flow Problems: Common data-flow problems, such as those already mentioned, can be expressed in a common mathematical structure. The values are members of a semilattice, whose meet is the confluence operator. Transfer functions map lattice elements to lattice elements. The set of allowed transfer functions must be closed under composition and include the identity function.

Monotone Frameworks: A semilattice has a <= relation defined by a <= b if and only if a ∧ b = a. Monotone frameworks have the property that each transfer function preserves the <= relationship; that is, a <= b implies f(a) <= f(b), for all lattice elements a and b and every transfer function f.

Distributive Frameworks: These frameworks satisfy the condition that f(a ∧ b) = f(a) ∧ f(b), for all lattice elements a and b and every transfer function f. It can be shown that the distributive condition implies the monotone condition.

Iterative Solution to Abstract Frameworks: All monotone data-flow frameworks can be solved by an iterative algorithm in which the IN and OUT values for each block are initialized appropriately (depending on the framework), and new values for these variables are repeatedly computed by applying the transfer and confluence operations. This solution is always safe (optimizations that it suggests will not change what the program does), but the solution is certain to be the best possible only if the framework is distributive.
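
A minimal sketch of this iterative algorithm in Python, instantiated for reaching definitions; the flow-graph encoding (dicts of predecessor lists) and the example block names and gen/kill sets are invented for illustration:

    def reaching_definitions(blocks, preds, gen, kill):
        IN = {b: set() for b in blocks}
        OUT = {b: set(gen[b]) for b in blocks}       # safe initialization
        changed = True
        while changed:
            changed = False
            for b in blocks:                         # ideally in depth-first order
                # confluence: union over predecessors
                IN[b] = set().union(*(OUT[p] for p in preds[b]))
                # transfer: OUT = gen ∪ (IN − kill)
                new_out = gen[b] | (IN[b] - kill[b])
                if new_out != OUT[b]:
                    OUT[b] = new_out
                    changed = True
        return IN, OUT

    # Tiny invented example: B1 -> B2 -> B3, plus a back edge B3 -> B2.
    blocks = ["B1", "B2", "B3"]
    preds = {"B1": [], "B2": ["B1", "B3"], "B3": ["B2"]}
    gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}}
    kill = {"B1": {"d2"}, "B2": {"d1"}, "B3": set()}
    IN, OUT = reaching_definitions(blocks, preds, gen, kill)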

The Constant Propagation Framework: While basic frameworks such as reaching definitions are distributive, there are interesting monotone-but-not-distributive frameworks as well. One involves propagating constants by using a semilattice whose elements are mappings from the program variables to constants, plus two special values that represent “no information” and “definitely not a constant.”

Partial-Redundancy Elimination: Many useful optimizations, such as code motion and global common-subexpression elimination, can be generalized to a single problem called partial-redundancy elimination. Expressions that are needed, but are available along only some of the paths to a point, are computed only along the paths where they are not available. The correct application of this idea requires the solution of a sequence of four different data-flow problems plus other operations.

Dominators: A node in a flow graph dominates another if every path to the latter must go through the former. A proper dominator is a dominator other than the node itself. Each node except the entry node has an immediate dominator – that one of its proper dominators that is dominated by all the other proper dominators.
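
A minimal sketch of the straightforward data-flow formulation of dominators, Dom(n) = {n} ∪ the intersection of Dom(p) over predecessors p, assuming every node other than the entry has at least one predecessor (same invented predecessor-list encoding as above):

    def dominators(nodes, preds, entry):
        dom = {n: set(nodes) for n in nodes}         # start from "everything"
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for n in nodes:
                if n == entry:
                    continue
                new = {n} | set.intersection(*(dom[p] for p in preds[n]))
                if new != dom[n]:
                    dom[n] = new
                    changed = True
        return dom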

Depth-First Ordering of Flow Graphs: If we perform a depth-first search of a flow graph, starting at its entry, we produce a depth-first spanning tree. The depth-first order of the nodes is the reverse of a postorder traversal of the tree.
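
A minimal sketch, assuming the flow graph is given as a dict of successor lists:

    def depth_first_order(succ, entry):
        visited, postorder = set(), []
        def dfs(n):
            visited.add(n)
            for s in succ.get(n, []):                # follow tree edges first
                if s not in visited:
                    dfs(s)
            postorder.append(n)                      # record when finished
        dfs(entry)
        return postorder[::-1]                       # reverse postorder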

Classification of Edges: When we construct a depth-first spanning tree, all the edges of the flow graph can be divided into three groups: advancing edges (those that go from ancestor to proper descendant), retreating edges (those from descendant to ancestor), and cross edges (others). An important property is that all the cross edges go from right to left in the tree. Another important property is that of these edges, only a retreating edge has a head lower than its tail in the depth-first order (reverse postorder).

Back Edges: A back edge is one whose head dominates its tail. Every back edge is a retreating edge, regardless of which depth-first spanning tree for its flow graph is chosen.

Reducible Flow Graphs: If every retreating edge is a back edge, regardless of which depth-first spanning tree is chosen, then the flow graph is said to be reducible. The vast majority of flow graphs are reducible; those whose only control-flow statements are the usual loop-forming and branching statements are certainly reducible.

Natural Loops: A natural loop is a set of nodes with a header node that dominates all the nodes in the set and has at least one back edge entering that node. Given any back edge, we can construct its natural loop by taking the head of the edge plus all nodes that can reach the tail of the edge without going through the head. Two natural loops with different headers are either disjoint or one is completely contained in the other; this fact lets us talk about a hierarchy of nested loops, as long as “loops” are taken to be natural loops.
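
A minimal sketch of that construction for a back edge tail -> head (same invented predecessor-list encoding as above):

    def natural_loop(preds, tail, head):
        loop, stack = {head}, [tail]
        while stack:
            m = stack.pop()
            if m not in loop:                        # m reaches tail avoiding head
                loop.add(m)
                stack.extend(preds.get(m, []))
        return loop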

Depth-First Order Makes the Iterative Algorithm Efficient: The iterative algorithm requires few passes, as long as propagation of information along acyclic paths is sufficient; i.e., cycles add nothing. If we visit nodes in depth-first order, any data-flow framework that propagates information forward, e.g., reaching definitions, will converge in no more passes than 2 plus the largest number of retreating edges on any acyclic path. The same holds for backward-propagating frameworks, like live variables, if we visit in the reverse of depth-first order (i.e., in postorder).

Regions: Regions are sets of nodes and edges with a header h that dominates all nodes in the region. The predecessors of any node other than h in the region must also be in the region. The edges of the region are all that go between nodes of the region, with the possible exception of some or all that enter the header.

Regions and Reducible Flow Graphs: Reducible flow graphs can be parsed into a hierarchy of regions. These regions are either loop regions, which include all the edges into the header, or body regions, which have no edges into the header.

Region-Based Data-Flow Analysis: An alternative to the iterative approach to data-flow analysis is to work up and down the region hierarchy, computing transfer functions from the header of each region to each node in that region.

Region-Based Induction Variable Detection: An important application of region-based analysis is in a data-flow framework that tries to compute formulas for each variable in a loop region whose value is an affine (linear) function of the number of times around the loop.

Summary of Chapter 8

CODE GENERATION

Code generation is the final phase of a compiler. The code generator maps the intermediate representation produced by the front end, or, if there is a code-optimization phase, by the code optimizer, into the target program.

Instruction selection is the process of choosing target-language instructions for each IR statement.

Register allocation is the process of deciding which IR values to keep in registers. Graph coloring is an effective technique for doing register allocation in compilers.

Register assignment is the process of deciding which register should hold a given IR value.

A retargetable compiler is one that can generate code for multiple instruction sets.

A virtual machine is an interpreter for a bytecode intermediate language, as produced by compilers for languages such as Java and C#.

A CISC machine is typically a two-address machine with relatively few registers, several register classes, and variable-length instructions with complex addressing modes.

A RISC machine is typically a three-address machine with many registers in which operations are done in registers.

A basic block is a maximal sequence of consecutive three-address statements in which flow of control can enter only at the first statement of the block and can leave only at the last statement, without halting or branching except possibly at that last statement.
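
A minimal sketch of the standard leader-based partitioning; the instruction encoding (tuples whose last element, for 'goto' and 'if', is the index of the target instruction) is invented for illustration:

    def basic_blocks(instrs):
        leaders = {0}                                # rule 1: first instruction
        for i, ins in enumerate(instrs):
            if ins[0] in ("goto", "if"):
                leaders.add(ins[-1])                 # rule 2: any jump target
                if i + 1 < len(instrs):
                    leaders.add(i + 1)               # rule 3: after a jump
        bounds = sorted(leaders) + [len(instrs)]
        return [instrs[a:b] for a, b in zip(bounds, bounds[1:])]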

A flow graph is a graphical representation of a program in which the nodes of the graph are basic blocks and the edges of the graph show how control can flow among the blocks.

A loop in a flow graph is a strongly connected region with a single entry point, called the loop header.

The DAG representation of a basic block is a directed acyclic graph in which the nodes represent the statements within the block, and each child of a node corresponds to the statement that is the last definition of an operand used in that statement.

Peephole optimizations are local code-improving transformations that can be applied to a program, usually through a sliding window.

Instruction selection can be done by a tree-rewriting process in which tree patterns corresponding to machine instructions are used to tile a syntax tree. We can associate costs with the tree-rewriting rules and apply dynamic programming to obtain an optimal tiling for useful classes of machines and expressions.

An Ershov number tells how many registers are needed to evaluate an expression without storing any temporaries.
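
A minimal sketch of the labeling rule on an expression tree: leaves get 1; an interior node gets the larger of its children's labels if they differ, and one more than the common label if they are equal:

    def ershov(node):
        if isinstance(node, str):                    # a leaf operand
            return 1
        _, left, right = node                        # (op, left, right)
        a, b = ershov(left), ershov(right)
        return max(a, b) if a != b else a + 1

    # (a - b) + e * (c + d) can be evaluated with 3 registers:
    assert ershov(("+", ("-", "a", "b"), ("*", "e", ("+", "c", "d")))) == 3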

Spill code is an instruction sequence that stores a value in a register into memory in order to make room to hold another value in that register.

Summary of Chapter 7

RUN-TIME ENVIRONMENTS

Run-time Organization. To implement the abstractions embodied in the source language, a compiler creates and manages a run-time environment in concert with the operating system and the target machine. The run-time environment has static data areas for the object code and the static data objects created at compile time. It also has dynamic stack and heap areas for managing objects created and destroyed as the target program executes.

Control Stack. Procedure calls and returns are usually managed by a run-time stack called the control stack. We can use a stack because procedure calls, or activations, nest in time; that is, if p calls q, then the activation of q is nested within the activation of p.

Stack Allocation. Storage for local variables can be allocated on a run-time stack for languages that allow or require local variables to become inaccessible when their procedure ends. For such languages, each live activation has an activation record (or frame) on the control stack, with the root of the activation tree at the bottom, and the entire sequence of activation records on the stack corresponding to the path in the activation tree to the activation where control currently resides. The latter activation has its record at the top of the stack.

Access to Nonlocal Data on the Stack. For languages like C that do not allow nested procedure declarations, the location for a variable is either global or found in the activation record on top of the run-time stack. For languages with nested procedures, we can access nonlocal data on the stack through access links, which are pointers added to each activation record. The desired nonlocal data is found by following a chain of access links to the appropriate activation record. A display is an auxiliary array, used in conjunction with access links, that provides an efficient short-cut alternative to a chain of access links.

Heap Management. The heap is the portion of the store that is used for data that can live indefinitely, or until the program deletes it explicitly. The memory manager allocates and deallocates space within the heap. Garbage collection finds spaces within the heap that are no longer in use and can therefore be reallocated to house other data items. For languages that require it, the garbage collector is an important subsystem of the memory manager.

Exploiting Locality. By making good use of the memory hierarchy, memory managers can influence the run time of a program. The time taken to access different parts of memory can vary from nanoseconds to milliseconds. Fortunately, most programs spend most of their time executing a relatively small fraction of the code and touching only a small fraction of the data. A program has temporal locality if it is likely to access the same memory locations again soon; it has spatial locality if it is likely to access nearby memory locations soon.

Reducing Fragmentation. As the program allocates and deallocates memory, the heap may get fragmented, or broken into large numbers of small noncontiguous free spaces or holes. The best-fit strategy – allocate the smallest available hole that satisfies a request – has been found empirically to work well. While best fit tends to improve space utilization, it may not be best for spatial locality. Fragmentation can be reduced by combining or coalescing adjacent holes.

Manual Deallocation. Manual memory management has two common failings: not deleting data that cannot be referenced is a memory-leak error, and referencing deleted data is a dangling-pointer-dereference error.

Reachability. Garbage is data that cannot be referenced or reached. There are two basic ways of finding unreachable objects: either catch the transition as a reachable object turns unreachable, or periodically locate all reachable objects and infer that all remaining objects are unreachable.

Reference-Counting Collectors maintain a count of the references to an object; when the count transitions to zero, the object becomes unreachable. Such collectors introduce the overhead of maintaining reference counts and can fail to find “cyclic” garbage, which consists of unreachable objects that reference each other, perhaps through a chain of references.

Trace-Based Garbage Collectors iteratively examine or trace all references to find reachable objects, starting with the root set consisting of objects that can be accessed directly without having to dereference any pointers.

Mark-and-Sweep Collectors visit and mark all reachable objects in a first tracing step and then sweep the heap to free up unreachable objects.
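
A minimal sketch, modeling the heap as a dict from each object to the list of objects it references (an invented encoding):

    def mark_and_sweep(heap, roots):
        marked, work = set(), list(roots)
        while work:                                  # mark: trace from the roots
            o = work.pop()
            if o not in marked:
                marked.add(o)
                work.extend(heap.get(o, []))
        for o in list(heap):                         # sweep: free what is unmarked
            if o not in marked:
                del heap[o]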

Mark-and-Compact Collectors improve upon mark-and-sweep; they relocate reachable objects in the heap to eliminate memory fragmentation.

Copying Collectors break the dependency between tracing and finding free space. They partition the memory into two semispaces, A and B. Allocation requests are satisfied from one semispace, say A, until it fills up, at which point the garbage collector takes over, copies the reachable objects to the other space, say B, and reverses the roles of the semispaces.

Incremental Collectors. Simple trace-based collectors stop the user program while garbage is collected. Incremental collectors interleave the actions of the garbage collector and the mutator, or user program. The mutator can interfere with incremental reachability analysis, since it can change the references within previously scanned objects. Incremental collectors therefore play it safe by overestimating the set of reachable objects; any “floating garbage” can be picked up in the next round of collection.

Partial Collectors also reduce pauses; they collect a subset of the garbage at a time. The best-known partial-collection algorithm, generational garbage collection, partitions objects according to how long they have been allocated and collects the newly created objects more often because they tend to have shorter lifetimes. An alternative algorithm, the train algorithm, uses fixed-length partitions, called cars, that are collected into trains. Each collection step is applied to the first remaining car of the first remaining train. When a car is collected, reachable objects are moved out to other cars, so this car is left with garbage and can be removed from the train. These two algorithms can be used together to create a partial collector that applies the generational algorithm to younger objects and the train algorithm to more mature objects.

Summary of Chapter 6

INTERMEDIATE-CODE GENERATION

Pick an intermediate representation: An intermediate representation is typically some combination of a graphical notation and three-address code. As in syntax trees, a node in a graphical notation represents a construct; the children of a node represent its subconstructs. Three-address code takes its name from instructions of the form x = y op z, with at most one operator per instruction. There are additional instructions for control flow.

Translate expressions: Expressions with built-up operations can be unwound into a sequence of individual operations by attaching actions to each production of the form E -> E1 op E2. The action either creates a node for E with the nodes for E1 and E2 as children, or it generates a three-address instruction that applies op to the addresses for E1 and E2 and puts the result into a new temporary name, which becomes the address for E.
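
A minimal sketch of the second alternative (emitting three-address code); the tuple encoding of parse nodes and the new_temp helper are invented for illustration:

    temps = 0
    code = []

    def new_temp():
        global temps
        temps += 1
        return "t" + str(temps)

    def translate(node):
        # node is either a name (str) or a tuple (op, left, right)
        if isinstance(node, str):
            return node                              # a name is its own address
        op, left, right = node
        a1, a2 = translate(left), translate(right)
        t = new_temp()                               # result goes in a new temporary
        code.append(t + " = " + a1 + " " + op + " " + a2)
        return t

    translate(("+", "a", ("*", "b", "c")))           # emits t1 = b * c, t2 = a + t1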

Check types: The type of an expression E1 op E2 is determined by the operator op and the types of E1 and E2. A coercion is an implicit type conversion, such as from integer to float. Intermediate code contains explicit type conversions to ensure an exact match between operand types and the types expected by an operator.

Use a symbol table to implement declarations: A declaration specifies the type of a name. The width of a type is the amount of storage needed for a name with that type. Using widths, the relative address of a name at run time can be computed as an offset from the start of a data area. The type and relative address of a name are put into the symbol table when its declaration is processed, so the translator can subsequently get them when the name appears in an expression.

Flatten arrays: For quick access, array elements are stored in consecutive locations. Arrays of arrays are flattened so they can be treated as a one-dimensional array of individual elements. The type of an array is used to calculate the address of an array element relative to the base of the array.

Generate jumping code for boolean expressions: In short-circuit or jumping code, the value of a boolean expression is implicit in the position reached in the code. Jumping code is useful because a boolean expression is typically used for control flow, as in if (B) S. Boolean values can be computed by jumping to t = true or t = false, as appropriate, where t is a temporary name. Using labels for jumps, a boolean expression can be translated by inheriting labels corresponding to its true and false exits. The constants true and false translate into a jump to the true and false exits, respectively.

Implementing statements using control flow: Statements can be translated by inheriting a label next, where next marks the first instruction after the code for this statement. The conditional S -> if (B) S1 can be translated by attaching a new label marking the beginning of the code for S1 and passing the new label and S.next for the true and false exits, respectively.

Alternatively, use backpatching: Backpatching is a technique for generating code for boolean expressions and statements in one pass. The idea is to maintain lists of incomplete jumps, where all the jump instructions on a list have the same target. When the target becomes known, all the instructions on its list are completed by filling in the target.
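
A minimal sketch of the three standard primitives (makelist, merge, backpatch) over a list of quadruples whose jump targets start out unfilled; the encoding is invented:

    code = []                                        # entries [op, target]; None = unfilled

    def emit(op, target=None):
        code.append([op, target])
        return len(code) - 1                         # index of the new instruction

    def makelist(i):
        return [i]                                   # a list of one incomplete jump

    def merge(l1, l2):
        return l1 + l2

    def backpatch(lst, target):
        for i in lst:                                # the target is now known
            code[i][1] = target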

Implement records: Field names in a record or class can be treated as a sequence of declarations. A record type encodes the types and relative addresses of the fields. A symbol table object can be used for this purpose.