Introduction to Shift-Reduce Parsing

Last Update Time-stamp: "97/06/30 13:50:16 umrigar"

This is a brief intuitive introduction to shift-reduce bottom-up parsing. Any compiler text should provide more details. A elementary introduction to grammars and language analysis is also available.

A shift-reduce parser uses a parse stack which (conceptually) contains grammar symbols. During the operation of the parser, symbols from the input are shifted onto the stack. If a prefix of the symbols on top of the stack matches the RHS of a grammar rule which is the correct rule to use within the current context, then the parser reduces the RHS of the rule to its LHS, replacing the RHS symbols on top of the stack with the nonterminal occurring on the LHS of the rule. This shift-reduce process continues until the parser terminates, reporting either success or failure. It terminates with success when the input is legal and is accepted by the parser. It terminates with failure if an error is detected in the input.

The parser is nothing but a stack automaton which may be in one of several discrete states. A state is usually represented simply as an integer. In reality, the parse stack contains states, rather than grammar symbols. However, since each state corresponds to a unique grammar symbol, the state stack can be mapped onto the grammar symbol stack mentioned earlier.

The operation of the parser is controlled by a couple of tables:

Action Table
The action table is a table with rows indexed by states and columns indexed by terminal symbols. When the parser is in some state s and the current lookahead terminal is t, the action taken by the parser depends on the contents of action[s][t], which can contain four different kinds of entries:
Shift s'
Shift state s' onto the parse stack.
Reduce r
Reduce by rule r. This is explained in more detail below.
Accept
Terminate the parse with success, accepting the input.
Error
Signal a parse error.
Goto Table
The goto table is a table with rows indexed by states and columns indexed by nonterminal symbols. When the parser is in state s immediately after reducing by rule N, then the next state to enter is given by goto[s][N].

The current state of a shift-reduce parser is the state on top of the state stack. The detailed operation of such a parser is as follows:

  1. Initialize the parse stack to contain a single state s0, where s0 is the distinguished initial state of the parser.
  2. Use the state s on top of the parse stack and the current lookahead t to consult the action table entry action[s][t]:
  3. Repeat step (2) until the parser terminates.

For example, consider the following simple grammar

0)	$S:	stmt <EOF> 
1)	stmt:	ID ':=' expr 
2)	expr:	expr '+' ID 
3)	expr:	expr '-' ID 
4)	expr:	ID 

which describes assignment statements like a:= b + c - d. (Rule 0 is a special augmenting production added to the grammar).

One possible set of shift-reduce parsing tables is shown below (sn denotes shift n, rn denotes reduce n, acc denotes accept and blank entries denote error entries):

Parser Tables
 Action TableGoto Table
 ID':=''+''-'<EOF>stmtexpr
0s1    g2 
1 s3     
2    s4  
3s5     g6
4accaccaccaccacc  
5r4r4r4r4r4  
6r1r1s7s8r1  
7s9      
8s10      
9r2r2r2r2r2  
10r3r3r3r3r3  

A trace of the parser on the input a:= b + c - d is shown below:

StackRemaining InputAction
0/$Sa:= b + c - ds1
0/$S 1/a:= b + c - ds3
0/$S 1/a 3/:=b + c - ds5
0/$S 1/a 3/:= 5/b+ c - dr4
0/$S 1/a 3/:=+ c - dg6 on expr
0/$S 1/a 3/:= 6/expr+ c - ds7
0/$S 1/a 3/:= 6/expr 7/+c - ds9
0/$S 1/a 3/:= 6/expr 7/+ 9/c- dr2
0/$S 1/a 3/:=- dg6 on expr
0/$S 1/a 3/:= 6/expr- ds8
0/$S 1/a 3/:= 6/expr 8/-ds10
0/$S 1/a 3/:= 6/expr 8/- 10/d<EOF>r3
0/$S 1/a 3/:= <EOF>g6 on expr
0/$S 1/a 3/:= 6/expr<EOF>r1
0/$S <EOF>g2 on stmt
0/$S 2/stmt<EOF>s4
0/$S 2/stmt 4/<EOF> accept

Each stack entry is shown as a state number followed by the symbol which caused the transition to that state.

Construction of Shift-Reduce Parsing Tables

The general idea of bottom-up parsing is to repeatedly match the RHS of some rule and reduce it to the rule's LHS. To identify the matching RHS's, the parser needs to keep track of all possible rules which may match. This is done by means of the parser state, where each state keeps track of the set of rules the parser may currently be in, and how far along the parser may be within each rule. This idea of states will become clearer if we attempt to build the tables for a small example.

Consider the grammar used in the previous section, repeated below for convenience:

0)	$S:	stmt <EOF> 
1)	stmt:	ID ':=' expr 
2)	expr:	expr '+' ID 
3)	expr:	expr '-' ID 
4)	expr:	ID 

The input must be ultimately reducible to the augmenting nonterminal $S. Hence the parser should initially be in rule 0; more specifically, it should be expecting the stmt in rule 0. To show precisely which symbol is expected in a rule RHS, we define an item to be a rule, along with a position on the RHS specifying the next symbol to be expected in that RHS. We denote an item as a rule with a dot . just before the next expected symbol. Hence, returning to our example, the parser is initially expecting the item

0)	$S:	. stmt <EOF> 

However, if the parser is expecting to see a stmt, it could be at the beginning of any of the rules for stmt. Hence the initial state should include the initial item for stmt. (The process of including these additional induced items is referred to as forming the closure of the state).

State 0
0)	$S:	. stmt <EOF> 
1)	stmt:	. ID ':=' expr 

Now if the parser sees an ID in state 0, then it can move the dot past any ID symbols in state 0. We get a new state; let's call it State 1:

State 1
1)	stmt:	ID . ':=' expr 

If the parser has seen a stmt in state 0, then it can move the dot past any stmt symbols in state 0. We get a new state; let's call it State 2:

State 2
0)	$S:	stmt . <EOF> 

If the parser sees a ':=' in state 1, we get a new state; let's call it State 3:

State 3
1)	stmt:	ID ':=' . expr 

However since the dot is before the nonterminal expr, the parser could be in any of the rules for expr. Hence we need to include the rules for expr in a new state 3:

State 3
1)	stmt:	ID ':=' . expr 
2)	expr:	. expr '+' ID 
3)	expr:	. expr '-' ID 
4)	expr:   . ID 

We continue this process of following all possible transitions out of states until we cannot construct any new states. Completing this construction results in an automaton called a LR(0) machine. The transitions on terminal symbols correspond to shift actions in the parser; the transitions on nonterminal symbols correspond to goto actions in the parser.

Note that the construction guarantees that each state is entered by a unique grammar symbol; that is why we can map a state stack into a symbol stack as mentioned earlier.

For our example grammar, we get the following LR(0) machine:

State 0   	
0)	$S: . stmt  
1)	stmt: . ID ':=' expr 
	GOTO   2 on stmt
	SHIFT 1 on ID

State 1   	
1)	stmt: ID . ':=' expr 
	SHIFT 3 on ':='

State 2   	
0)	$S: stmt .  
	SHIFT 4 on 

State 3   	
1)	stmt: ID ':=' . expr 
2)	expr: . expr '+' ID 
3)	expr: . expr '-' ID 
4)	expr: . ID 
	GOTO   6 on expr
        SHIFT 5 on ID

State 4   	
0)	$S: stmt  . 

State 5   	
4)	expr: ID . 

State 6   	
1)	stmt: ID ':=' expr . 
2)	expr: expr . '+' ID 
3)	expr: expr . '-' ID 
        SHIFT 7 on '+'
        SHIFT 8 on '-'

State 7   	
2)	expr: expr '+' . ID 
        SHIFT 9 on ID

State 8   	
3)	expr: expr '-' . ID 
        SHIFT 10 on ID

State 9   	
2)	expr: expr '+' ID . 

State 10  
3)	expr: expr '-' ID . 

The LR(0) machine defines the shift and goto actions in our parse tables. But what corresponds to the reduce actions in the action table?

We should reduce by a rule when that rule has been completely recognized; i.e., when the state contains a reducing item with the dot at the extreme right. However, if a state has reducing items, under what conditions should the corresponding reductions be applied?

It is worth emphasizing that the only essential difference between SLR(1), LALR(1) and LR(1) parsers is how the decision is made as to when a reduction should be applied. If multiple actions are possible for a particular state and lookahead, then that state has conflicts. The parsers are listed above in order of increasing precision of the information used for the reduce decision; hence it is possible that a SLR(1) parser has conflicts when a LALR(1) parser has none, and a LALR(1) parser has conflicts when a LR(1) parser has none.

Parse tables can often be represented more compactly by designating a particular reduction to be the default reduction for a state. This means that if no other action is possible in a state, the default reduction for that state is applied. This explains why in the example, many rows of the table have reduce entries in almost all their columns.

Default reductions do not affect the correctness of the parse. Their only effect is on error detection: an error may not be detected until after several (useless) reductions have been applied. However, the error is still detected before any additional symbols are shifted onto the parse stack.

Copyright (C) 1997 Zerksis D. Umrigar


Feedback: Please email any feedback to zdu@acm.org.

Up to Parsing Demos Main Page Shift Reduce Parsing Page