Compiler Designing W09L1705 Part II

Automatic Memory Management Conclusions

  • It prevents very serious bugs but reduces programmer’s control as now programmer don’t know the memory structure and don’t know when memory is deallocated.
  • In real-time applications it can be very inefficient as the pauses should be less frequent or very short.
  • Memory leaks are also possible. X points to an AST and IL is made out of it then this AST is no longer needed, so it’s pointer should be changed to NULL.
  • There are more advanced GCs:
    • Concurrent: program and GC can run simultaneously
    • Generational: Objects that are not usually ccoolected are stored somewhere else and GC focuses on collecting on those objects which are more likely to be freed
    • Run-time: The pause time is restricted
    • Parallel: several collectors can work simultaneously.
Advertisements

Compiler Designing W09L1705

Garbage Collection Algorithm IV: Reference Count

  • A dedicated field is used to store the number of pointers pointing to some object. When this value drops to zero the object can be freed.
  • Each assignment would alter reference count.
X ← Y
rc(p)++            //p is the object pointed by Y
rc(q)--            //q is the object pointed by X
if(rc(q)==0) free q;
X ← Y

Assignment cost is increased because of the overhead to manipulate reference counts

  • This method is easy to implement and there are no pauses during implementation.
  • But this is slow and cannot collect circular structures.

Compiler Designing W09L1704

Garbage Collection Algorithm III: Conservative Collection

  • This algorithm is used by languages like C/C++.
  • In C/C++ it is impossible to identify contents of an object in memory. For example, a node of a list containing data and pointer to next node, can be considered correctly or a binary tree incorrectly.
  • Solution is to consider an address a pointer if it looks like one:
    • If it is aligned and;
    • if it points to a valid data segment in the program.

In this way set of reachables is overestimated. This is a conservative approach so that’s normal.

  • Even now objects cannot be moved, because it is not known that what is considered a pointer is actually a pointer.

Compiler Designing W09L1703

Garbage Collection Algorithm II: Stop and Copy

  • Here memory is divided into two parts: old space(for allocation) and new space(for invoking GC)
  • In old space, heap pointer points always to the next free word. When heap is full, then garbage collector is invoked and it marks all the reachable objects, the reachable values are copied to new space, new space and old space exchanges position. After garbage collection old space would have free space.
  • Copying one value to new space is done using a forward pointer. If the value is copied then it provided with the pointer pointing to its new address and also it shows that the value has been moved, any other object pointing to this object can now update its pointer to former object’s forwarding address.
  • Algorithm: pointer:
    • scan: points to the object in free space which are being scanned to the pointers they contain
    • alloc: it is either equal or ahead of scan, points to the free space in new space.
WHILE scan ≠ alloc do
    let O be the object at the scan pointer
    for each pointer p contained in O, and let p points to O'
          if O' has no forward pointer
                  copy O' at alloc pointer
                  make forward pointer of the old O'
                  change p to point to new address of O'
          else if O' has forward pointer
                  make O to point to address pointed by forward pointer
  • The problem with this algorithm is that the information in the stack is to be updated as the objects are moved which is an expensive task. Moreover, like mark and sweep object’s size is to be known and in addition location of an objects pointer is also needed.
  • It is the fastest, allocation is very cheap just increase the heap pointer. Collection is also very cheap and very beneficial if garbage is very large.
  • Moving objects is not possible in C/C++.

Compiler Designing W09L1702

Garbage Collection Algorithm I: Mark and Sweep

  • Each object is provided with a mark bit, if it 1 then object is reachable if 0, it’s not.
  • There are two phases of this algorithm: Mark and Sweep.
  • Mark Phase:
//mark phase
pointers = {roots} 
REPEAT 
     choose v from pointers
     pointers = pointers - v
     if mark(v) == 0 
          then mark(v) = 1
          insert all v_i those are reachable from v to pointers
TILL pointers is EMPTY
  • Sweep Phase
//sizeof(p) is know, it represents size of an object
//starts from the end of the heap

freeList = Ø
p = bottom of heap
REPEAT 
     if mark(p) == 1 then mark(p) = 0
     else
          freeList = freeList + block(p to sizeof(p)-1)
          p = p + sizeof(p)
TILL p > or = heap_top

Example.

[II]

  • In the case of move phase, memory is allocated for pointers, and pointers can be as large as possible. So instead of making a list a reverse pointer is made and all the nodes from a starting node are traversed like DFS. A register is provided to make a reverse pointer. Hence the pointers is stored in object.
  • Mark and sweep can fragment the memory, so it is important to merge memory blocks whenever possible.
  • No need to move objects.

Compiler Designing W09L1701

Automatic Memory Management

  • The concept is implemented when program runs out of space, in this case memory is freed and used.
  • To free up memory, unreachable objects can be deallocated. Unreachable objects are those which are not referenced by any other pointer.
  • To find those unreachable memory objects, list the root registers and then find all the connections to others pointers, connect to these pointer’s pointers. Free up space for the not connected object.
 X ← new A;
 Y ← new B;
 X ← Y;     //Memory allocated to X cab be freed
 if 1 then X ← new A else ... fi
 //if Y is not used again then space for B is also unused although it is being reference.

if a variable is dead, then memory can surely be used and if it is not then possibly it can be used.

  • Most of the garbage collectors do it in this way:
 Allocate as much as unused space
 if program runs out off space, then run garbage collector
      compute all the reachable objects from the root registers
      free space if not reachable

Some strategy may perform garbage collection before running out of space.

  • See video for a better explanation.

Compiler Designing W09L1604

Managing Caches

  • Registers contain opcode and are very small but really fast.
  • When something is not found in cache it is called cache miss, cache misses are costly. Now-a-days computers use two level cache to reduce the cost of cache miss.

  • Handling cache and registers can improve performance.
  • Usually registers are handles by compiler and cache by programmer.
for(j=1; j<10; j++)
    for(i=1; i<1000000; i++)
          a[i]*=b[i];

cache has to load the value of a[i] and b[i], as the value is very large cache will be full at some time and values would be loaded into start of the cache. When j = 2, 2nd loop would be executed again. Same value would be loaded again. It would consume much time for all the cache misses. Instead interchanging loops can make it faster:

for(i=1; i<1000000; i++) 
     for(j=1; j<10; j++)
          a[i]*=b[i];
//many times this is done by compiler, but it is programmer's responsibility.

Compiler Designing W09L1603

Spilling

  • Spilling is done when coloring is not possible. It is the case when number of temporaries are more than registers.
  • So variable with k or more neighbor is spilled in memory. The whole thing is done like this:

First of all apply graph coloring algorithm’s part when vertices are being stored in stack. Make it as far as it can, if the condition arises that every vertex has k or more neighbors then put one of these in stack and apply graph coloring on the rest. Assign color to each vertex in the rest of the graph. Now pop the vertex and place it in graph in hope that the rest of the graph has used less than k colors, if this is not the case then store the vertex in memory. Now at each use of that variable, it is to be loaded from memory or stored.

For every load and store, different versions of the variables are used. Hence each version gets the very low range for its liveness. In this way spilling reduces this range and also the number of neighbors by dividing one node in many. Now the graph coloring can be applied as before.

  • Any choice of spilling variable is correct but some produces better code, some heuristics used to select are:
    • select variable having most conflicts most of the edges would be removed;
    • used very less, decreases the usage of memory;
    • avoid variables of inner loop.
  • Example is explained in the video.

[II]

  • Register Allocation is important as IC can use many registers. In CISC it is very complicated. The Register Allocation discussed here is good for RISC machines.

Compiler Designing W09L1602

Graph Coloring Algorithm

  • As written in previous post, graph coloring algorithm is used to produce RIGs which in turn produces mapping function from temporaries to registers.
  • In k-colorable graph, two vertices are not joined if they have same color and number of least total colors given to the graph is k.
  • In the mapping function, if the graph made has k colors, then a register assignment won’t have more than k registers. l is number of machine register.

[II]

  • Its a NP-hard problem, hence some heuristics are used.
  • The heuristic is to use the node having neighbous less than k.
  • Algorithm:
REPEAT
     Choose a node having fewer than k neighbors
     PUSH this node in STACK
     remove the node's edges from the graph
UNTIL graph is empty

REPEAT
     POP a node from STACK
     pick a different color from the colors assigned to its neighbors
UNTIL stack is empty

Example is here.

If in the problem, one node can be assigned two register then the lowest register that means that register that entered the graph first while popping from stack is used.

Compiler Designing W09L1601

Register Allocation

  • In intermediate code, unlimited/large number of registers can be used. But this is not a practical case, so the temporaries required by the IC are mapped to registers in machine. This would be clearly a many-one mapping. For making it clear, lets see this example,
a := c + d;                //r_a = r_c + r_d
e := a + 2;                //r_a = r_a + #2
x := e - 1                 //r_a = r_a - #1

instead of using separate registers for all temporaries, depending on the liveness of a variable, same register is used.

  • Two temporaries cannot have same register if they are alive at the same time.
  • The temporary mapping register allocation is a global optimization of IC.
  • Graph coloring algorithm is used for the provision. The graph results is called Register Interference Graph(RIG).
  • At each statement liveness is given to each value, if two variables are live at a time then an edge can be drawn between them. After making the graph, check the vertices having no edges in between, these variables can be then given same register for temporaries.
  • See this video for example.