Compiler Designing W04L0806 & 07 & 08

SLR Parsing Example and Improvements

  • The example in the video itself is sufficient for SLR.
  • Problem in SLR parsing: it has to re-run the viable prefix automaton over the whole stack at every step, since each move changes the stack.

Solution: the DFA states reached so far can be remembered. The stack now stores pairs in this format: <sym_i, state_i>

where state_i is the DFA state after sym_1 sym_2 … sym_i has been processed. At the bottom of the stack, <dummy, startState> is added.

A GOTO table is made showing the DFA transition from state_i to state_j on a grammar symbol A: GOTO[i, A] = j

Moves:

     1. SHIFT x: PUSH <a, x> onto the stack, where a is the next input token and x is the DFA state;

     2. REDUCE X → α: same as before (pop the RHS, then push the LHS with its GOTO state),

     3. ACCEPT

     4. ERROR

An ACTION table is made showing the action to be taken when the stack's top state is state_i and a is the next input token: ACTION[i, a] = action

if state_i contains X → α.aβ and GOTO[i, a] = j then ACTION[i, a] = SHIFT j

if state_i contains X → α. and a ∈ FOLLOW(X) then ACTION[i, a] = REDUCE X → α

if state_i contains S' → S. and a = $ then ACTION[i, a] = ACCEPT

else ACTION[i, a] = ERROR
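As a sketch, the four rules above can be written as a function filling one ACTION-table row per DFA state. The item encoding (head, body, dot), the goto_row dict, and the FOLLOW sets here are illustrative assumptions, not a fixed representation:

```python
def action_row(items, goto_row, follow, terminals, start="S'"):
    """Return {terminal: action} for one DFA state; missing keys mean ERROR."""
    row = {}

    def put(a, act):
        if a in row and row[a] != act:
            raise ValueError("conflict on " + a)   # grammar is not SLR
        row[a] = act

    for head, body, dot in items:
        if dot < len(body) and body[dot] in terminals:
            a = body[dot]
            put(a, ("shift", goto_row[a]))         # X -> alpha . a beta
        elif dot == len(body):
            if head == start:
                put("$", ("accept",))              # S' -> S .
            else:
                for a in follow[head]:             # X -> alpha . and a in FOLLOW(X)
                    put(a, ("reduce", head, body))
    return row
```

Note that an item with the dot before a non-terminal contributes nothing to ACTION; those transitions go into the GOTO table instead.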

  • Algorithm for SLR parsing:
LET I = w$ be the input and j = 0
LET DFA state 1 contain the item S' → .S
INSERT <dummy, 1> into the STACK
REPEAT
     CASE ACTION[top state of the STACK, next input a] OF
          SHIFT k: PUSH <I[j++], k> onto the STACK
          REDUCE X → α: POP |α| pairs and PUSH <X, GOTO[top state of the STACK, X]>
          ACCEPT: halt normally
          ERROR: abort with an error message
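The loop above can be sketched as a table-driven driver. The ACTION and GOTO tables below were worked out by hand for the tiny illustrative grammar S' → S, S → ( S ) | int; reduce entries store the LHS and the RHS length to pop:

```python
# Hand-built SLR tables for S' -> S, S -> ( S ) | int (state 1 is the start).
ACTION = {
    (1, "("): ("shift", 3), (1, "int"): ("shift", 4),
    (2, "$"): ("accept",),
    (3, "("): ("shift", 3), (3, "int"): ("shift", 4),
    (4, "$"): ("reduce", "S", 1), (4, ")"): ("reduce", "S", 1),
    (5, ")"): ("shift", 6),
    (6, "$"): ("reduce", "S", 3), (6, ")"): ("reduce", "S", 3),
}
GOTO = {(1, "S"): 2, (3, "S"): 5}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = [("dummy", 1)]            # <dummy, startState>
    j = 0
    while True:
        state, a = stack[-1][1], tokens[j]
        act = ACTION.get((state, a))
        if act is None:
            return False              # ERROR: empty table cell
        if act[0] == "shift":
            stack.append((a, act[1]))  # PUSH <token, k>
            j += 1
        elif act[0] == "reduce":
            _, lhs, n = act
            del stack[-n:]            # POP the |RHS| pairs
            stack.append((lhs, GOTO[(stack[-1][1], lhs)]))
        else:
            return True               # ACCEPT
```

For example, parse(["(", "int", ")"]) shifts to state 4, reduces S → int, shifts the ')', reduces S → ( S ), and accepts from state 2.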
  • SLR is very simple; LR(1) is what is generally used in bottom-up parsers. In an LR(1) grammar an item is actually a pair, e.g. [T → .int*T, $] means: once T → int*T. has been read, reduce only if the next symbol is $. Carrying the lookahead in the item is more precise than searching FOLLOW sets.
  • Left-recursion is okay for LR grammars.
  • The improved SLR example is here.

Compiler Designing W04L0805

SLR Parsing

  • Suppose there is an LR(0) parser such that:
    • the stack contains α, t is the next input, and on reading α the DFA terminates in state s.
    • Reduce by X → β if s contains the item X → β.
    • Shift if s contains an item X → β.tω (dot before a terminal)
  • An LR(0) parser cannot deal with reduce-reduce conflicts (X → β. and Y → δ.) or shift-reduce conflicts (X → β. and Y → α.tω).
  • For example, if a state contains both E → T. and E → T.+E, there would be a shift-reduce conflict in the LR(0) parser.
  • SLR improves on LR(0) shift-reduce heuristics.
  • A small modification to the LR(0) parser produces SLR:
    1. the stack contains α, t is the next input, and on reading α the DFA terminates in state s.
    2. Reduce by X → β if s contains X → β. and t ∈ FOLLOW(X)
    3. Shift if s contains X → β.tω
  • If any conflicts remain, the grammar is not SLR.
  • The SLR grammars are exactly those where this heuristic detects the handles.
  • No ambiguous grammar is SLR.
  • Precedence declarations can be added for conflict resolution, expanding the set of grammars SLR tools can handle.

Given E → E + E | E * E | ( E ) | int, the items E → E * E. and E → E.+E produce a shift-reduce conflict; it can be resolved by declaring precedence of ‘*’ over ‘+’.

  • Algorithm for the SLR parser:
LET M be the DFA of viable prefixes of G
REPEAT until the configuration S | $ is reached
     At any configuration α | ω, let a be the next input token and I the item set of the state M reaches on α
          SHIFT if X → β.aγ ∈ I
          REDUCE by X → β if X → β. ∈ I and a ∈ FOLLOW(X)
          report a PARSING ERROR if neither move is possible
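The reduce move consults FOLLOW sets. A minimal fixed-point computation, assuming a grammar without ε-productions represented as a dict from head to a list of RHS tuples (an illustrative encoding), might look like:

```python
def first_sets(grammar):
    """FIRST for each non-terminal (no epsilon-productions assumed)."""
    first = {X: set() for X in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                s = body[0]
                new = {s} if s not in grammar else first[s]
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first

def follow_sets(grammar, start):
    """FOLLOW(X): terminals that can appear immediately after X."""
    first = first_sets(grammar)
    follow = {X: set() for X in grammar}
    follow[start].add("$")              # end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, X in enumerate(body):
                    if X not in grammar:
                        continue        # terminals have no FOLLOW set
                    if i + 1 == len(body):
                        new = follow[head]   # X at the end: add FOLLOW(head)
                    else:
                        nxt = body[i + 1]
                        new = {nxt} if nxt not in grammar else first[nxt]
                    if not new <= follow[X]:
                        follow[X] |= new
                        changed = True
    return follow
```

For the grammar E → T + E | T, T → int this yields FOLLOW(E) = {$} and FOLLOW(T) = {+, $}.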

Difference Between LL and LR parsing

At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol.

An LL parse is a left-to-right, leftmost derivation. That is, we consider the input symbols from the left to the right and attempt to construct a leftmost derivation. This is done by beginning at the start symbol and repeatedly expanding out the leftmost nonterminal until we arrive at the target string. An LR parse is a left-to-right, rightmost derivation, meaning that we scan from the left to right and attempt to construct a rightmost derivation. The parser continuously picks a substring of the input and attempts to reverse it back to a nonterminal.

During an LL parse, the parser continuously chooses between two actions:

  1. Predict: Based on the leftmost nonterminal and some number of lookahead tokens, choose which production ought to be applied to get closer to the input string.
  2. Match: Match the leftmost guessed terminal symbol with the leftmost unconsumed symbol of input.

As an example, given this grammar:

  • S → E
  • E → T + E
  • E → T
  • T → int

Then given the string int + int + int, an LL(2) parser (which uses two tokens of lookahead) would parse the string as follows:

Production       Input              Action
---------------------------------------------------------
S                int + int + int    Predict S -> E
E                int + int + int    Predict E -> T + E
T + E            int + int + int    Predict T -> int
int + E          int + int + int    Match int
+ E              + int + int        Match +
E                int + int          Predict E -> T + E
T + E            int + int          Predict T -> int
int + E          int + int          Match int
+ E              + int              Match +
E                int                Predict E -> T
T                int                Predict T -> int
int              int                Match int
                                    Accept

Notice that in each step we look at the leftmost symbol in our production. If it’s a terminal, we match it, and if it’s a nonterminal, we predict what it’s going to be by choosing one of the rules.

In an LR parser, there are two actions:

  1. Shift: Add the next token of input to a buffer for consideration.
  2. Reduce: Reduce a collection of terminals and nonterminals in this buffer back to some nonterminal by reversing a production.

As an example, an LR(1) parser (with one token of lookahead) might parse that same string as follows:

Workspace        Input              Action
---------------------------------------------------------
                 int + int + int    Shift
int              + int + int        Reduce T -> int
T                + int + int        Shift
T +              int + int          Shift
T + int          + int              Reduce T -> int
T + T            + int              Shift
T + T +          int                Shift
T + T + int                         Reduce T -> int
T + T + T                           Reduce E -> T
T + T + E                           Reduce E -> T + E
T + E                               Reduce E -> T + E
E                                   Reduce S -> E
S                                   Accept
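The trace above can be reproduced by a small hand-specialized sketch for this grammar (S → E, E → T + E | T, T → int, inputs of the form int (+ int)*); it greedily reduces int to T and otherwise shifts while input remains. It is hand-tuned to this grammar, not a general LR(1) parser:

```python
def parse(tokens):
    """Return the list of shift/reduce actions, ending in 'Reduce S -> E'."""
    work, rest, steps = [], list(tokens), []
    while work != ["S"]:
        if work and work[-1] == "int":
            work[-1] = "T"                      # always reduce int first
            steps.append("Reduce T -> int")
        elif rest:
            work.append(rest.pop(0))            # more input: keep shifting
            steps.append("Shift")
        elif work[-3:] == ["T", "+", "E"]:
            work[-3:] = ["E"]                   # fold T + E back to E
            steps.append("Reduce E -> T + E")
        elif work and work[-1] == "T":
            work[-1] = "E"                      # rightmost T becomes E
            steps.append("Reduce E -> T")
        elif work == ["E"]:
            work[:] = ["S"]
            steps.append("Reduce S -> E")
        else:
            raise ValueError("stuck at: " + " ".join(work))
    return steps
```

Running it on int + int + int produces exactly the twelve actions in the table above.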

The two parsing algorithms you mentioned (LL and LR) are known to have different characteristics. LL parsers tend to be easier to write by hand, but they are less powerful than LR parsers and accept a much smaller set of grammars than LR parsers do. LR parsers come in many flavors (LR(0), SLR(1), LALR(1), LR(1), IELR(1), GLR(0), etc.) and are far more powerful. They also tend to have much more complex implementations and are almost always generated by tools like yacc or bison. LL parsers also come in many flavors (including LL(*), which is used by the ANTLR tool), though in practice LL(1) is the most widely used.

As a shameless plug, if you’d like to learn more about LL and LR parsing, I just finished teaching a compilers course and have some handouts and lecture slides on parsing on the course website. I’d be glad to elaborate on any of them if you think it would be useful.

This is copied as-is 😛 from this link.

Compiler Designing W04L0804

Valid Items

  • The NFA of viable prefixes from the last post’s video example is converted to a DFA.
  • The states of the DFA are the canonical collection of items, or canonical collection of LR(0) items.
  • An item X → β.γ is valid if S’ →* αXδ → αβγδ by a series of rightmost derivations. As the dot is after β, the viable prefix is αβ.
  • An item I is valid for a viable prefix α if the DFA of viable prefixes terminates in a state containing I after accepting α.
  • In general, an item will be valid for many viable prefixes.
  • For example, T → (.E) is valid for any sequence of ('s, like (( or ((…((
  • Page no 3 of this article.

Compiler Designing W04L0803

Recognizing Viable Prefixes

Algorithm:

1. Add a dummy production S' → S to the grammar G
2. NFA states are the items of G, including the extra production 
                                 //an NFA suffices, because the set of viable prefixes is a regular language
3. For every item E → α.Xβ, where X is any grammar symbol,            //extends the prefix
     add the transition E → α.Xβ →X E → αX.β
4. For every item E → α.Xβ and production X → γ                       //where a new prefix begins
     add the ε-transition E → α.Xβ →ε X → .γ; here X is a non-terminal
                                 //this rule means X hasn't been seen after α yet, but X → γ may produce it
5. In the NFA every state is an accepting state.
6. The start state is S' → .S
  • NFA(stack) = yes/no, reading the stack from bottom to top.
  • For an example implementation, see the video itself from minute 6:50.
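The construction can be sketched directly: states are items, step 3 gives the symbol transitions, and step 4 gives the ε-moves (labelled None below). The grammar S' → S, S → ( S ) | int is an illustrative choice, not the one from the video:

```python
GRAMMAR = {"S'": [("S",)], "S": [("(", "S", ")"), ("int",)]}

def items():
    """Every dot position of every production is an NFA state (step 2)."""
    return [(h, b, d) for h, bs in GRAMMAR.items() for b in bs
            for d in range(len(b) + 1)]

def transitions():
    delta = []
    for head, body, dot in items():
        if dot < len(body):
            X = body[dot]
            # step 3: consume X and move the dot one place right
            delta.append(((head, body, dot), X, (head, body, dot + 1)))
            if X in GRAMMAR:
                # step 4: epsilon-move to X -> .gamma for each production of X
                for gamma in GRAMMAR[X]:
                    delta.append(((head, body, dot), None, (X, gamma, 0)))
    return delta

START = ("S'", ("S",), 0)   # step 6; by step 5 every state is accepting
```

Running the NFA (or the DFA obtained from it by subset construction) over the stack from bottom to top then answers the yes/no question above.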

Compiler Designing W04L0802

Recognizing Handles and Viable Prefixes

  • There is no efficient general algorithm for recognizing handles, but there are good heuristics, and for some grammars the heuristics always guess correctly.
  • Grammar classes form a hierarchy: CFG ⊃ LR(k) ⊃ LALR(k) ⊃ SLR(k).
  • α is a viable prefix if there is an ω such that α | ω is a state of the shift-reduce parser; α is the stack and ω the rest of the input. Consequently, the stack always holds prefixes of the RHS of some production (possibly a complete RHS).
  • A viable prefix is a prefix of a handle; as long as the stack holds a viable prefix, no parsing error has been made so far.
  • Viable prefix:- the prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
  • For any grammar, the set of viable prefixes is a regular language.
  • Computing the viable prefix:
An item is a production with a "." somewhere in its RHS. 
T → ( E ) 
     T → .( E )
     T → (. E )
     T → ( E. )
     T → ( E ).
and if X → ε then the only item is X → .
Items are also called LR(0) items.
If (int) is the input, then ( E | ) is a valid configuration of the shift-reduce parser, because the item T → ( E. ) indicates that a ')' is expected.

Let Stack = Prefix_1 Prefix_2 ... Prefix_n
then Prefix_i will eventually be reduced to X_i, via the production X_i → α_i
and then X_{i-1} → Prefix_{i-1} X_i β
  • Prefix_{k+1} Prefix_{k+2} … Prefix_n reduces to α_k.
  • The link for handle recognition is here.
  • Link for viable prefixes.
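The enumeration of items shown above (T → .( E ), T → (. E ), …) is mechanical; a small sketch, with productions given as a head plus a list of RHS symbols:

```python
def items_of(head, body):
    """All dot positions for one production; an empty body gives X -> ."""
    syms = list(body)
    return [head + " -> " + " ".join(syms[:d] + ["."] + syms[d:])
            for d in range(len(syms) + 1)]
```

For T → ( E ) this yields the four items listed above, and for an ε-production the single item X → . .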

Compiler Designing W04L0801

Handles

  • Concept of bottom-up parsing: shift and reduce. The left part (before the bar) can be implemented by a stack; the top of the stack is at the bar.
  • How to decide when to shift or reduce?
In earlier example of the grammar
E → TX;          X → +E|ε
T → (E)|int Y;   Y → *T|ε

int*int+int is to be parsed, and already the first step requires resolving a shift-reduce choice:
int | * int + int can be reduced to T | * int + int

The handle is defined to deal with this situation: if S →* αXγ → αβγ by rightmost derivations, then β, in the position after α, is a handle of αβγ; that means it is okay to do that reduction.

  • So a reduction is done only when the substring being replaced is a handle.

[II]

  • In bottom-up parsing, handles always appear at the top of the stack, never inside it.
  • When a reduction is done, the symbol just left of the bar (the top of the stack) is a non-terminal; a sequence of shift moves then reaches the next handle.
  • Bottom-up parsing is based on recognizing handles.

Compiler Designing W04L0706

Shift-Reduce Parsing

  • It is the main strategy for implementing bottom-up parsing.
  • Since a reduction reverses a rightmost derivation step (αγβ is reduced to αXβ using X → γ), the tail β is still-unexamined input.
  • So examined and unexamined input are separated by a vertical bar.
  • While making the parse tree, only two actions can be taken: shift and reduce
    • shift: when the string left of the bar cannot be reduced, one more token is read, so the bar shifts to the right.
    • reduce: the string just left of the bar is reduced using a production in reverse.
E → TX;          X → +E|ε
T → (E)|int Y;   Y → *T|ε

and let int*int+int be the string to parse; the steps would look like this:
1. int | * int + int
2. int * | int + int
3. int * int | + int  is reduced to int * T | + int
4. int * T | + int    is reduced to T | + int
5. T + | int
6. T + int |
7. T + E |
8. E|

[II]

  • The left string can be processed using a stack. A shift pushes a terminal onto the stack; for a reduce, the stack is popped until the popped symbols form the RHS of some production, and then that production’s LHS is pushed onto the stack.
  • Shift-reduce conflict:- when it is possible for the parser to both shift and reduce at the same time.

Example is:

if condition 1
     if condition 2
          statements;
     else
          statements;

the problem is deciding which if the else belongs to, since the language doesn’t work on indentation. (This is solved by attaching the else to its closest if.)

  • Reduce-reduce conflict:- when two different productions are both suitable for the reduction.

Compiler Designing W04L04

Using Parsing Table

Predictive Parsing
  • For the leftmost non-terminal X we look at the next input ‘a’ and then at the table cell [X, a]
  • A stack records the frontier of the parse tree: it holds non-terminals yet to be expanded and terminals yet to be matched, and the top of the stack is the leftmost non-terminal or terminal.
  • Reject if an empty cell is encountered during the process; accept if the end of the input is reached with an empty stack.
  • Here is the algorithm of using parse table with the stack:
INITIALIZE stack = <S $> //S is the start symbol and $ represents the end of input
REPEAT
CASE IF top of stack is a non-terminal: <X, rest> //X is a non-terminal
IF T[X, *next] == ABCDE... THEN stack ← <ABCDE... rest>
ELSE error(); 
CASE IF top of stack is a terminal: <t, rest>
IF t == *next++ THEN stack ← <rest>;
ELSE error();
UNTIL stack is EMPTY

The implementation of the algorithm can be seen in the video itself from minute 6:45.
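The algorithm can also be sketched as a table-driven predictive parser for the running grammar E → TX; X → +E | ε; T → (E) | int Y; Y → *T | ε. The LL(1) table below was filled in by hand from the FIRST/FOLLOW sets; an empty list means expand to ε, and a missing cell is the empty cell that triggers rejection:

```python
# Hand-built LL(1) table for E -> T X ; X -> + E | eps ;
# T -> ( E ) | int Y ; Y -> * T | eps.
TABLE = {
    ("E", "int"): ["T", "X"], ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"],   ("X", ")"): [], ("X", "$"): [],
    ("T", "int"): ["int", "Y"], ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"],   ("Y", "+"): [], ("Y", ")"): [], ("Y", "$"): [],
}
NONTERMINALS = {"E", "X", "T", "Y"}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = ["E", "$"]                 # index 0 is the top, as in <S $>
    i = 0
    while stack:
        top = stack.pop(0)
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False           # empty cell: reject
            stack = rhs + stack        # expand the leftmost non-terminal
        else:
            if top != tokens[i]:
                return False           # terminal mismatch
            i += 1                     # match and advance the input
    return i == len(tokens)            # accept: input consumed, stack empty
```

For instance, parse(["int", "*", "int", "+", "int"]) accepts, while parse(["int", "+"]) rejects at the empty cell [E, $].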

Compiler Designing W04L05

Bottom-up Parsing

  • Unlike top-down parsing, which starts from the root, it starts with the leaves and reduces towards the start symbol.
  • It is more general than deterministic top-down parsing, has the same efficiency, and builds on the same ideas as the top-down approach.
  • It doesn’t require the grammar to be left-factored, so a “natural” grammar maintaining the precedence of operators can be used.
  • Instead of productions it uses reductions; a rightmost derivation is traced in reverse.
int * int + int          reduce by T → int
int * T + int            reduce by T → int * T
T + int                  reduce by T → int, then E → T
T + E                    reduce by E → T + E
E
  • Seen in reverse order, each step expands the rightmost non-terminal, i.e. the sequence is a rightmost derivation.