Compiler Designing W09L1702

Garbage Collection Algorithm I: Mark and Sweep

  • Each object is given a mark bit: if the bit is 1 the object is reachable, if 0 it is not.
  • There are two phases of this algorithm: Mark and Sweep.
  • Mark Phase:
//mark phase
pointers = {roots}
REPEAT
     choose v from pointers
     pointers = pointers - {v}
     if mark(v) == 0 then
          mark(v) = 1
          add every v_i reachable from v to pointers
UNTIL pointers is empty
  • Sweep Phase
//sizeof(p) is known; it gives the size of the object at p
//the scan starts from the bottom of the heap

freeList = Ø
p = bottom of heap
REPEAT
     if mark(p) == 1 then
          mark(p) = 0                                    //reachable: clear the bit for the next collection
     else
          freeList = freeList + block(p to p+sizeof(p)-1) //unreachable: reclaim the block
     p = p + sizeof(p)                                    //advance in either case
UNTIL p >= top of heap
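
A minimal runnable C sketch of both phases, assuming a toy heap of fixed-size objects with at most two outgoing pointers each; the Object layout, HEAP_SIZE, and the explicit worklist are illustrative choices, not part of the lecture. Marking happens when an object enters the worklist, so the worklist never holds an object twice:

#include <stdio.h>
#include <stddef.h>

#define HEAP_SIZE 8
#define MAX_CHILDREN 2

typedef struct Object {
    int mark;                            /* the mark bit */
    int in_use;                          /* allocated (1) or on the free list (0) */
    struct Object *child[MAX_CHILDREN];  /* outgoing pointers */
} Object;

static Object heap[HEAP_SIZE];

/* insert into the worklist; marking on insertion keeps each object
 * in the worklist at most once */
static void push(Object **wl, int *top, Object *o) {
    if (o != NULL && o->mark == 0) { o->mark = 1; wl[(*top)++] = o; }
}

/* mark phase: everything reachable from the roots gets mark = 1 */
static void mark_phase(Object **roots, int nroots) {
    Object *wl[HEAP_SIZE];
    int top = 0;
    for (int i = 0; i < nroots; i++) push(wl, &top, roots[i]);
    while (top > 0) {                      /* until the worklist is empty */
        Object *v = wl[--top];             /* choose v, remove it */
        for (int i = 0; i < MAX_CHILDREN; i++)
            push(wl, &top, v->child[i]);   /* everything reachable from v */
    }
}

/* sweep phase: scan the whole heap; unmarked objects are reclaimed,
 * marked ones get their bit cleared for the next collection */
static void sweep_phase(void) {
    for (int p = 0; p < HEAP_SIZE; p++) {
        if (heap[p].mark == 1)
            heap[p].mark = 0;
        else
            heap[p].in_use = 0;            /* add the block to the free list */
    }
}

int main(void) {
    /* root -> heap[0] -> heap[1]; heap[2] is unreachable garbage */
    heap[0].in_use = heap[1].in_use = heap[2].in_use = 1;
    heap[0].child[0] = &heap[1];
    Object *roots[1] = { &heap[0] };
    mark_phase(roots, 1);
    sweep_phase();
    printf("heap[2] reclaimed: %s\n", heap[2].in_use ? "no" : "yes");
    return 0;
}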

Example.

[II]

  • A subtlety in the mark phase: the pointers set itself needs memory, and in the worst case it can grow as large as the heap, precisely when memory has run out. So instead of keeping an explicit list, the object graph is traversed like a DFS and the return path is encoded by reversing pointers inside the objects themselves; a single register holds the current reverse pointer. In effect, the "pointers" bookkeeping is stored in the objects (see the sketch after this list).
  • Mark and sweep can fragment the memory, so it is important to coalesce adjacent free blocks whenever possible.
  • Objects never need to be moved, so pointers into the heap remain valid.
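
This pointer-reversal traversal is the classic Deutsch-Schorr-Waite idea; below is a C sketch for objects with up to two outgoing pointers. The Node layout and the per-node phase counter are illustrative assumptions (real collectors pack them into the object header). Only prev and cur, two registers' worth of state, are needed:

#include <stdio.h>
#include <stddef.h>

typedef struct Node {
    int mark;                 /* the mark bit */
    int phase;                /* 0: visit left next, 1: visit right next, 2: done */
    struct Node *left, *right;
} Node;

static void dsw_mark(Node *root) {
    Node *prev = NULL;        /* the reversed back pointer (one register) */
    Node *cur = root;
    while (cur != NULL) {
        if (cur->mark == 0) { cur->mark = 1; cur->phase = 0; }
        if (cur->phase == 0) {                 /* descend left */
            cur->phase = 1;
            Node *next = cur->left;
            if (next && next->mark == 0) {
                cur->left = prev;              /* reverse the pointer */
                prev = cur;
                cur = next;
            }
        } else if (cur->phase == 1) {          /* descend right */
            cur->phase = 2;
            Node *next = cur->right;
            if (next && next->mark == 0) {
                cur->right = prev;
                prev = cur;
                cur = next;
            }
        } else {                               /* both children done: climb back */
            if (prev == NULL) break;           /* back at the root: finished */
            Node *up = prev;
            if (up->phase == 1) {              /* we came down via up->left */
                prev = up->left;               /* old back pointer */
                up->left = cur;                /* restore the child pointer */
            } else {                           /* we came down via up->right */
                prev = up->right;
                up->right = cur;
            }
            cur = up;
        }
    }
}

int main(void) {
    /* tiny graph: A -> B, A -> C, C -> B (shared child) */
    Node b = {0, 0, NULL, NULL};
    Node c = {0, 0, &b, NULL};
    Node a = {0, 0, &b, &c};
    dsw_mark(&a);
    printf("marks: a=%d b=%d c=%d\n", a.mark, b.mark, c.mark);
    return 0;
}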

Compiler Designing W09L1701

Automatic Memory Management

  • Automatic memory management frees memory that the program can no longer use, typically when the program runs out of space, so that it can be reused.
  • To free up memory, unreachable objects can be deallocated. Unreachable objects are those which are not referenced by any pointer the program can still follow.
  • To find unreachable objects, start from the root registers, follow all pointers from them, then the pointers stored in those objects, and so on; any object never reached this way can be freed.
 X ← new A;
 Y ← new B;
 X ← Y;     //the object A is now unreachable, so its memory can be freed
 if 1 then X ← new A else ... fi
 //if Y is never used again, the space for B is also unused, although B is still referenced

If a variable is dead, the memory it points to can certainly be reused; if it is live, the memory only might be needed again. Reachability is thus a safe approximation of what is still in use.

  • Most garbage collectors do it this way:
 allocate new objects until the free space runs out
 if the program runs out of space, then run the garbage collector:
      compute all the objects reachable from the root registers
      free the space of every object that is not reachable

Some strategies perform garbage collection before memory actually runs out.
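
A minimal sketch of that strategy in C. The bump allocator over a fixed buffer and the do-nothing gc_collect() stub are placeholders; a real collector would run mark and sweep here and rebuild the free list:

#include <stdio.h>
#include <stddef.h>

#define HEAP_BYTES 1024

static char heap[HEAP_BYTES];
static size_t next_free = 0;

/* Stub collector: a real one would run mark and sweep (see the previous
 * post) and rebuild the free list. This placeholder reclaims nothing. */
static void gc_collect(void) { }

/* bump allocation from a fixed buffer; NULL when the heap is exhausted */
static void *try_alloc(size_t size) {
    if (next_free + size > HEAP_BYTES) return NULL;
    void *p = &heap[next_free];
    next_free += size;
    return p;
}

/* the strategy above: allocate until out of space, then collect and retry */
static void *gc_alloc(size_t size) {
    void *p = try_alloc(size);
    if (p == NULL) {
        gc_collect();              /* free the unreachable objects */
        p = try_alloc(size);       /* retry with the reclaimed space */
    }
    return p;                      /* still NULL: truly out of memory */
}

int main(void) {
    void *a = gc_alloc(512);
    void *b = gc_alloc(1024);      /* triggers a collection; the stub
                                      reclaims nothing, so this stays NULL */
    printf("a=%p b=%p\n", a, b);
    return 0;
}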

  • See video for a better explanation.

Compiler Designing W09L1604

Managing Caches

  • Registers hold the operands being computed on; they are very few but really fast.
  • When something is not found in the cache it is called a cache miss, and cache misses are costly. Nowadays computers use a two-level cache to reduce the cost of cache misses.

  • Handling caches and registers well can improve performance considerably.
  • Usually registers are handled by the compiler and the cache by the programmer.
for(j=1; j<10; j++)
    for(i=1; i<1000000; i++)
          a[i]*=b[i];

The cache has to load the values of a[i] and b[i]. Because the arrays are far larger than the cache, earlier entries are evicted before the inner loop finishes, so when j = 2 the inner loop runs again and the same values have to be loaded all over again. All these cache misses consume a lot of time. Interchanging the loops makes it faster:

for(i=1; i<1000000; i++) 
     for(j=1; j<10; j++)
          a[i]*=b[i];
//sometimes the compiler performs this interchange, but often it is the programmer's responsibility
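
A runnable C harness for trying this on a real machine; the array sizes and the timing method are arbitrary choices, and the measured gap depends on the cache sizes and on the compiler not interchanging the loops itself (compile without aggressive optimization):

#include <stdio.h>
#include <time.h>

#define N 1000000

static double a[N], b[N];

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 1.000001; }

    clock_t t0 = clock();
    for (int j = 1; j < 10; j++)        /* original order: streams both arrays */
        for (int i = 0; i < N; i++)     /* through the cache on every pass */
            a[i] *= b[i];
    clock_t t1 = clock();

    for (int i = 0; i < N; i++)         /* interchanged order: a[i] and b[i] */
        for (int j = 1; j < 10; j++)    /* stay cached for all nine updates */
            a[i] *= b[i];
    clock_t t2 = clock();

    printf("original:     %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("interchanged: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}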

Compiler Designing W09L1603

Spilling

  • Spilling is done when coloring is not possible, i.e. when there are more simultaneously live temporaries than registers.
  • In that case a variable with k or more neighbors is spilled to memory. The whole procedure works like this:

First run the stack-building part of the graph coloring algorithm as far as it will go. If the point is reached where every remaining vertex has k or more neighbors, push one of them on the stack anyway as an optimistic spill candidate and continue with the rest. Colors are then assigned while popping, and when the candidate vertex is popped and placed back in the graph, the hope is that its neighbors have used fewer than k colors; if that is not the case, the vertex is actually spilled and the variable is stored in memory. From then on, at each use the variable must be loaded from memory, and at each definition stored back.

For every load and store a different version of the variable is used, so each version gets a very short live range. In this way spilling shrinks the live ranges and also the number of neighbors, by dividing one node into many (see the rewrite below). Graph coloring can then be applied as before.
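For instance, if a temporary t is spilled to a stack slot fp_t (the slot and all the names here are made up for illustration), each use and each definition of t is rewritten with a fresh temporary, in the style of the IC examples below:

t1 := load fp_t        //fetch t just before a use
y  := t1 + 5           //...the use itself
t2 := y * 2            //a definition of t
store t2, fp_t         //write back just after the definition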

  • Any choice of spill variable is correct, but some choices produce better code. Heuristics used to select one:
    • pick the variable with the most conflicts, so that the most edges are removed;
    • pick a rarely used variable, to keep the number of memory accesses low;
    • avoid variables used in inner loops.
  • Example is explained in the video.

[II]

  • Register allocation is important because the IC can use many registers. On CISC machines it is very complicated; the register allocation discussed here works well for RISC machines.

Compiler Designing W09L1602

Graph Coloring Algorithm

  • As written in the previous post, liveness information is used to build the RIG, and coloring the RIG then produces the mapping function from temporaries to registers.
  • A coloring assigns colors so that no two joined vertices have the same color; a graph is k-colorable if the least total number of colors needed is at most k.
  • If the RIG can be colored with k colors, where k ≤ l and l is the number of machine registers, then the coloring gives a register assignment using no more than l registers.

[II]

  • It is an NP-hard problem, hence some heuristics are used.
  • The key heuristic: a node with fewer than k neighbors can always be colored once the rest of the graph is colored, so such nodes are handled first.
  • Algorithm:
REPEAT
     choose a node having fewer than k neighbors
     PUSH this node on the STACK
     remove the node and its edges from the graph
UNTIL graph is empty

REPEAT
     POP a node from the STACK and put it back in the graph
     give it a color different from the colors assigned to its neighbors
UNTIL stack is empty
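
A runnable C sketch of the two phases over a small RIG stored as an adjacency matrix; the five-node graph, K = 3, and the encoding are illustrative assumptions. This example always finds a low-degree node, so no spilling logic is needed:

#include <stdio.h>
#include <string.h>

#define N 5   /* temporaries (RIG nodes) */
#define K 3   /* machine registers / colors */

static int adj[N][N];    /* adjacency matrix of the RIG */
static int removed[N];   /* 1 while a node is out of the graph */
static int color[N];     /* 0 = uncolored, 1..K = assigned register */

/* degree of v among the nodes still in the graph */
static int degree(int v) {
    int d = 0;
    for (int u = 0; u < N; u++)
        if (!removed[u] && adj[v][u]) d++;
    return d;
}

int main(void) {
    /* an example RIG; this one happens to be 3-colorable */
    int edges[][2] = {{0,1},{0,2},{1,2},{1,3},{2,3},{3,4}};
    for (int e = 0; e < 6; e++)
        adj[edges[e][0]][edges[e][1]] = adj[edges[e][1]][edges[e][0]] = 1;

    /* phase 1: repeatedly push a node with fewer than K neighbors
       (if none existed, a spill candidate would be chosen, as in the
       spilling post above) */
    int stack[N], top = 0;
    for (int n = 0; n < N; n++)
        for (int v = 0; v < N; v++)
            if (!removed[v] && degree(v) < K) {
                removed[v] = 1;
                stack[top++] = v;
                break;
            }

    /* phase 2: pop each node and give it a color its neighbors lack */
    while (top > 0) {
        int v = stack[--top];
        removed[v] = 0;                    /* put the node back */
        int used[K + 1];
        memset(used, 0, sizeof used);
        for (int u = 0; u < N; u++)
            if (!removed[u] && adj[v][u]) used[color[u]] = 1;
        for (int c = 1; c <= K; c++)
            if (!used[c]) { color[v] = c; break; }
        printf("t%d -> r%d\n", v, color[v]);
    }
    return 0;
}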

Example is here.

If a node can be assigned more than one register, the lowest register is used, i.e. the one that entered the graph first while popping from the stack.

Compiler Designing W09L1601

Register Allocation

  • In intermediate code an unlimited (or very large) number of registers can be used. That is not practical, so the temporaries required by the IC are mapped onto the machine's registers; this is clearly a many-to-one mapping. To make it concrete, consider this example,
a := c + d;                //r_a = r_c + r_d
e := a + 2;                //r_a = r_a + #2
x := e - 1                 //r_a = r_a - #1

Instead of using separate registers for all temporaries, the same register is reused, depending on the liveness of the variables.

  • Two temporaries cannot share a register if they are live at the same time.
  • Mapping temporaries to registers in this way is a global optimization of the IC.
  • A graph coloring algorithm is used for this; the graph it is applied to is called the Register Interference Graph (RIG).
  • At each statement the set of live values is computed; if two variables are live at the same time, an edge is drawn between them. After building the graph, vertices with no edge between them can be given the same register (see the sketch below).
  • See this video for example.
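
A small C sketch that builds the RIG for the three-statement example above; the live sets here are written down by hand rather than computed (the next post below covers liveness analysis):

#include <stdio.h>

#define NV 5   /* variables: a, c, d, e, x */
#define NP 3   /* program points: one per statement */

int main(void) {
    const char *name[NV] = {"a", "c", "d", "e", "x"};
    /* live[p][v] = 1 if v is live at point p; read off the example above */
    int live[NP][NV] = {
        {0, 1, 1, 0, 0},   /* at a := c + d : c and d are live */
        {1, 0, 0, 0, 0},   /* at e := a + 2 : a is live        */
        {0, 0, 0, 1, 0},   /* at x := e - 1 : e is live        */
    };
    int adj[NV][NV] = {{0}};

    /* two variables interfere if they are live at the same point */
    for (int p = 0; p < NP; p++)
        for (int u = 0; u < NV; u++)
            for (int v = u + 1; v < NV; v++)
                if (live[p][u] && live[p][v])
                    adj[u][v] = adj[v][u] = 1;

    for (int u = 0; u < NV; u++)
        for (int v = u + 1; v < NV; v++)
            if (adj[u][v])
                printf("%s -- %s\n", name[u], name[v]);
    return 0;
}

Only c and d interfere; a, e and x have no edges at all, which is exactly why a single register r_a serves all three in the example above.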

Compiler Designing W08L1505

Liveness Analysis

  • A variable is live if it holds a value that may be needed in the future. Liveness is measured at each statement, both where control comes in and where it goes out.
  • Copy propagation can result in dead code. See these examples,
X := 3
Y := 3 * X       //X is replaced with 3; the first statement then becomes dead
X := 3           //dead: overwritten before any use
X := 4
Y := X           //the new value is effective now
  • As in the case of copy propagation, liveness depends on information propagated between adjacent statements.
  • Liveness can either be true or false; L(…) = true/false.
  • Rules:
1. L(p,x,out) = ∨ {L(s,x,in) | s is a successor of p}
2. L(s,x,in) = true if s uses x on its RHS, e.g. s is ...:=f(x)
3. L(x:=e,x,in) = false if e does not mention x
4. L(s,x,in) = L(s,x,out) if s neither uses nor defines x

Algorithm:

initially take L(...) = false at every point,
and repeat till all the statements satisfy the above four rules.
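
A runnable C sketch of this fixpoint on a hypothetical four-statement program; encoding each statement as a defined variable, a use set, and a single successor is an illustrative simplification:

#include <stdio.h>

#define NS 4  /* statements */
#define NV 3  /* variables: x=0, y=1, z=2 */

int main(void) {
    const char *vname[NV] = {"x", "y", "z"};
    /* toy program:
       s0: x := 3      s1: y := x + 1
       s2: x := 4      s3: z := x + y      */
    int def[NS]     = {0, 1, 0, 2};        /* variable defined by each stmt */
    int use[NS][NV] = {{0,0,0}, {1,0,0}, {0,0,0}, {1,1,0}};
    int succ[NS]    = {1, 2, 3, -1};       /* single successor; -1 = none */

    int in[NS][NV] = {{0}}, out[NS][NV] = {{0}}; /* start with false everywhere */
    int changed = 1;
    while (changed) {                      /* repeat until the rules hold */
        changed = 0;
        for (int s = NS - 1; s >= 0; s--)  /* backward, for fast convergence */
            for (int v = 0; v < NV; v++) {
                /* rule 1: out = OR of the successors' in */
                int o = (succ[s] >= 0) ? in[succ[s]][v] : 0;
                /* rules 2-4: used -> true; defined -> false; else pass through */
                int n = use[s][v] ? 1 : (def[s] == v ? 0 : o);
                if (o != out[s][v] || n != in[s][v]) changed = 1;
                out[s][v] = o; in[s][v] = n;
            }
    }
    for (int s = 0; s < NS; s++) {
        printf("s%d live-in:", s);
        for (int v = 0; v < NV; v++)
            if (in[s][v]) printf(" %s", vname[v]);
        printf("\n");
    }
    return 0;
}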

Example from minute 7:50 of the video.

  • A liveness value can change at most once, from false to true. Hence false < true.
  • Liveness is a backward analysis while constant propagation is a forward analysis.

Compiler Designing W08L1504

Orderings

  • The values are ordered in such a way that:

⊥ < C < T

  • This can be pictured as three levels: the top level contains T, the bottom level contains ⊥, and the middle level contains all the constants, incomparable with each other.
lub(T, a) = T
lub(⊥, a) = a
lub(a, a) = a
lub(a, b) = T    if a and b are distinct constants
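
A tiny runnable C encoding of this lattice and its lub; the Tag/Val representation is an assumption made for illustration:

#include <stdio.h>

/* the three-level lattice: BOT < constants < TOP */
typedef enum { BOT, CONST, TOP } Tag;
typedef struct { Tag tag; int c; } Val;

static Val lub(Val a, Val b) {
    if (a.tag == TOP || b.tag == TOP) return (Val){TOP, 0};  /* lub(T, a) = T */
    if (a.tag == BOT) return b;                              /* lub(bot, a) = a */
    if (b.tag == BOT) return a;
    if (a.c == b.c) return a;                                /* lub(a, a) = a */
    return (Val){TOP, 0};                                    /* distinct constants */
}

int main(void) {
    Val three = {CONST, 3}, four = {CONST, 4}, bot = {BOT, 0};
    Val r1 = lub(three, bot);   /* = the constant 3 */
    Val r2 = lub(three, four);  /* = T */
    printf("lub(3, bot): tag=%d c=%d\n", r1.tag, r1.c);
    printf("lub(3, 4):   tag=%d (2 = T)\n", r2.tag);
    return 0;
}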

The rules stated in the previous post were:

1. C(x,p_i,out) = T for some i ⇒ C(x,s,in) = T
2. C(x,p_i,out) = c and C(x,p_j,out) = d, c ≠ d ⇒ C(x,s,in) = T
3. C(x,p_i,out) = c or ⊥ for all i ⇒ C(x,s,in) = c
4. C(x,p_i,out) = ⊥ for all i ⇒ C(x,s,in) = ⊥

All these rules can be combined and stated as:
C(x,s,in) = lub { C(x,p,out) | p is a predecessor of s }

In this arrangement each value X_in or X_out can only climb the order ⊥ → c → T, so it changes at most twice. The algorithm can therefore terminate once no element changes any more, and since every statement has one in and one out value, each changing at most twice, the total number of changes is at most four times the number of statements.

Compiler Designing W08L1503

Analysis of Loops

[figure: control-flow graph of the loop]

  • This CFG is for a loop. At the statement Y := 0, the value of X cannot be determined directly: the out-values of both its predecessors, X := 3 and A := 2*X, are needed, but the out-value of A := 2*X in turn depends on the output of Y := 0. The rules are circular.

To solve this, the value ⊥ is initially assumed for that variable along the loop edge. Since lub(⊥, a) = a, the analysis can proceed, and all the values are then updated iteratively until they stabilize.

See the video for example.

Compiler Designing W08L1502

Global Optimization: Constant Propagation

  • Some notation that will be used:
    • T (Top) ⇒ X is not a constant, or its value is not known at this point;
    • c (Constant) ⇒ X has the constant value c;
    • ⊥ (Bottom) ⇒ X is dead.
  • At each node, the value of X is recorded after every statement, including the entry and exit of the node.
  • Data flow analysis algorithms: the analysis of a complicated program can be expressed as a combination of simple rules relating the change in information between adjacent statements.
  • For each statement s, two functions are defined: C(x,s,in) = value of x before s and C(x,s,out) = value of x after s (C for constant information). The rules also refer to a statement's predecessors.
  • If statement s is reached after predecessor statements p_i, i = 1,…,n:
1. C(x,p_i,out) = T for some i ⇒ C(x,s,in) = T
2. C(x,p_i,out) = c and C(x,p_j,out) = d, c ≠ d ⇒ C(x,s,in) = T
3. C(x,p_i,out) = c or ⊥ for all i ⇒ C(x,s,in) = c
4. C(x,p_i,out) = ⊥ for all i ⇒ C(x,s,in) = ⊥
  • Relating the same statement's in and out values:
5. C(x,s,out) = ⊥ if C(x,s,in) = ⊥
6. C(x, x := c, out) = c if c is a constant
7. C(x, x := f(...), out) = T //f(...) is a complicated expression
8. C(x, y := ..., out) = C(x, y := ..., in) //an assignment to y leaves x unchanged
  • The algorithm looks like this (a small C sketch follows):
1. For the entry node, set C(x, entry, in) = T
2. At every other point, set the value to ⊥
3. Repeat till all the points satisfy rules 1-8.
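
A compact runnable C sketch of the whole analysis for one variable x over a diamond-shaped CFG (two branches assigning x := 3 and x := 4 that join); the statement encoding and the CFG are illustrative assumptions. The join computes lub(3, 4) = T, i.e. rule 2:

#include <stdio.h>

typedef enum { BOT, CONST, TOP } Tag;      /* bottom, constant, T */
typedef struct { Tag tag; int c; } Val;

static Val lub(Val a, Val b) {
    if (a.tag == TOP || b.tag == TOP) return (Val){TOP, 0};
    if (a.tag == BOT) return b;
    if (b.tag == BOT) return a;
    return a.c == b.c ? a : (Val){TOP, 0}; /* two different constants */
}

static void show(const char *lbl, Val v) {
    if (v.tag == TOP)      printf("%s = T\n", lbl);
    else if (v.tag == BOT) printf("%s = bot\n", lbl);
    else                   printf("%s = %d\n", lbl, v.c);
}

typedef enum { SET_CONST, NO_EFFECT } Kind; /* rules 6 and 8; rule 7 would set TOP */

#define NS 4
#define MAXP 2

int main(void) {
    /* diamond CFG: s0 (x untouched) branches to s1: x := 3 and
       s2: x := 4, which both flow into s3 (x untouched) */
    Kind kind[NS] = { NO_EFFECT, SET_CONST, SET_CONST, NO_EFFECT };
    int  k[NS]    = { 0, 3, 4, 0 };
    int  pred[NS][MAXP] = { {-1,-1}, {0,-1}, {0,-1}, {1,2} };

    Val in[NS], out[NS];
    for (int s = 0; s < NS; s++)
        in[s] = out[s] = (Val){BOT, 0};          /* step 2: bottom everywhere */

    int changed = 1;
    while (changed) {                            /* step 3: iterate to a fixpoint */
        changed = 0;
        for (int s = 0; s < NS; s++) {
            /* in = lub over predecessors (rules 1-4); entry gets T (step 1) */
            Val i = (pred[s][0] < 0) ? (Val){TOP, 0} : (Val){BOT, 0};
            for (int p = 0; p < MAXP; p++)
                if (pred[s][p] >= 0) i = lub(i, out[pred[s][p]]);
            /* transfer function: rule 6 or rule 8 */
            Val o = (kind[s] == SET_CONST) ? (Val){CONST, k[s]} : i;
            if (i.tag != in[s].tag || i.c != in[s].c ||
                o.tag != out[s].tag || o.c != out[s].c) changed = 1;
            in[s] = i; out[s] = o;
        }
    }
    show("in(s3)", in[3]);                       /* lub(3, 4) = T by rule 2 */
    return 0;
}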

Example from minute 17:53 of the video.