Last Update Time-stamp: "97/06/30 13:50:16 umrigar"
This is a brief intuitive introduction to shift-reduce bottom-up parsing. Any compiler text should provide more details. A elementary introduction to grammars and language analysis is also available.
The operation of the parser is controlled by a couple of tables:
[
s][
t]
, which can contain four different kinds of entries: [
s][
N]
. The current state of a shift-reduce parser is the state on top of the state stack. The detailed operation of such a parser is as follows:
[
s][
t]
: [
s'][
N]
onto the stack. The lookahead token is not
changed by this step. For example, consider the following simple grammar
0) $S: stmt <EOF> 1) stmt: ID ':=' expr 2) expr: expr '+' ID 3) expr: expr '-' ID 4) expr: ID
which describes assignment statements like a:= b + c - d
. (Rule 0 is a special augmenting production added to the
grammar).
One possible set of shift-reduce parsing tables is shown below (s
n denotes shift n, r
n denotes reduce n, acc denotes accept
and blank entries denote error entries):
Action Table | Goto Table | ||||||
---|---|---|---|---|---|---|---|
ID | ':=' | '+' | '-' | <EOF> | stmt | expr | |
0 | s1 | g2 | |||||
1 | s3 | ||||||
2 | s4 | ||||||
3 | s5 | g6 | |||||
4 | acc | acc | acc | acc | acc | ||
5 | r4 | r4 | r4 | r4 | r4 | ||
6 | r1 | r1 | s7 | s8 | r1 | ||
7 | s9 | ||||||
8 | s10 | ||||||
9 | r2 | r2 | r2 | r2 | r2 | ||
10 | r3 | r3 | r3 | r3 | r3 |
A trace of the parser on the input a:= b + c - d
is shown below:
Stack | Remaining Input | Action |
---|---|---|
0/$S | a:= b + c - d | s1 |
0/$S 1/a | := b + c - d | s3 |
0/$S 1/a 3/:= | b + c - d | s5 |
0/$S 1/a 3/:= 5/b | + c - d | r4 |
0/$S 1/a 3/:= | + c - d | g6 on expr |
0/$S 1/a 3/:= 6/expr | + c - d | s7 |
0/$S 1/a 3/:= 6/expr 7/+ | c - d | s9 |
0/$S 1/a 3/:= 6/expr 7/+ 9/c | - d | r2 |
0/$S 1/a 3/:= | - d | g6 on expr |
0/$S 1/a 3/:= 6/expr | - d | s8 |
0/$S 1/a 3/:= 6/expr 8/- | d | s10 |
0/$S 1/a 3/:= 6/expr 8/- 10/d | <EOF> | r3 |
0/$S 1/a 3/:= | <EOF> | g6 on expr |
0/$S 1/a 3/:= 6/expr | <EOF> | r1 |
0/$S | <EOF> | g2 on stmt |
0/$S 2/stmt | <EOF> | s4 |
0/$S 2/stmt 4/<EOF> | | accept |
Each stack entry is shown as a state number followed by the symbol which caused the transition to that state.
The general idea of bottom-up parsing is to repeatedly match the RHS of some rule and reduce it to the rule's LHS. To identify the matching RHS's, the parser needs to keep track of all possible rules which may match. This is done by means of the parser state, where each state keeps track of the set of rules the parser may currently be in, and how far along the parser may be within each rule. This idea of states will become clearer if we attempt to build the tables for a small example.
Consider the grammar used in the previous section, repeated below for convenience:
0) $S: stmt <EOF> 1) stmt: ID ':=' expr 2) expr: expr '+' ID 3) expr: expr '-' ID 4) expr: ID
The input must be ultimately reducible to the augmenting nonterminal $S
. Hence the parser should initially be in rule 0; more
specifically, it should be expecting the stmt
in rule 0. To show precisely which symbol is expected in a rule RHS, we define
an item to be a rule, along with a position on the RHS specifying the next symbol to be expected in that RHS. We denote an
item as a rule with a dot .
just before the next expected symbol. Hence, returning to our example, the parser is initially
expecting the item
0) $S: . stmt <EOF>
However, if the parser is expecting to see a stmt
, it could be at the beginning of any of the rules for stmt
. Hence the initial
state should include the initial item for stmt
. (The process of including these additional induced items is referred to as
forming the closure of the state).
State 0 0) $S: . stmt <EOF> 1) stmt: . ID ':=' expr
Now if the parser sees an ID
in state 0, then it can move the dot past any ID
symbols in state 0. We get a new state; let's call
it State 1:
State 1 1) stmt: ID . ':=' expr
If the parser has seen a stmt
in state 0, then it can move the dot past any stmt
symbols in state 0. We get a new state; let's
call it State 2:
State 2 0) $S: stmt . <EOF>
If the parser sees a ':='
in state 1, we get a new state; let's call it State 3:
State 3 1) stmt: ID ':=' . expr
However since the dot is before the nonterminal expr
, the parser could be in any of the rules for expr
. Hence we need to
include the rules for expr
in a new state 3:
State 3 1) stmt: ID ':=' . expr 2) expr: . expr '+' ID 3) expr: . expr '-' ID 4) expr: . ID
We continue this process of following all possible transitions out of states until we cannot construct any new states. Completing this construction results in an automaton called a LR(0) machine. The transitions on terminal symbols correspond to shift actions in the parser; the transitions on nonterminal symbols correspond to goto actions in the parser.
Note that the construction guarantees that each state is entered by a unique grammar symbol; that is why we can map a state stack into a symbol stack as mentioned earlier.
For our example grammar, we get the following LR(0) machine:
State 0 0) $S: . stmt1) stmt: . ID ':=' expr GOTO 2 on stmt SHIFT 1 on ID State 1 1) stmt: ID . ':=' expr SHIFT 3 on ':=' State 2 0) $S: stmt . SHIFT 4 on State 3 1) stmt: ID ':=' . expr 2) expr: . expr '+' ID 3) expr: . expr '-' ID 4) expr: . ID GOTO 6 on expr SHIFT 5 on ID State 4 0) $S: stmt . State 5 4) expr: ID . State 6 1) stmt: ID ':=' expr . 2) expr: expr . '+' ID 3) expr: expr . '-' ID SHIFT 7 on '+' SHIFT 8 on '-' State 7 2) expr: expr '+' . ID SHIFT 9 on ID State 8 3) expr: expr '-' . ID SHIFT 10 on ID State 9 2) expr: expr '+' ID . State 10 3) expr: expr '-' ID .
The LR(0) machine defines the shift and goto actions in our parse tables. But what corresponds to the reduce actions in the action table?
We should reduce by a rule when that rule has been completely recognized; i.e., when the state contains a reducing item with the dot at the extreme right. However, if a state has reducing items, under what conditions should the corresponding reductions be applied?
It is worth emphasizing that the only essential difference between SLR(1), LALR(1) and LR(1) parsers is how the decision is made as to when a reduction should be applied. If multiple actions are possible for a particular state and lookahead, then that state has conflicts. The parsers are listed above in order of increasing precision of the information used for the reduce decision; hence it is possible that a SLR(1) parser has conflicts when a LALR(1) parser has none, and a LALR(1) parser has conflicts when a LR(1) parser has none.
Parse tables can often be represented more compactly by designating a particular reduction to be the default reduction for a state. This means that if no other action is possible in a state, the default reduction for that state is applied. This explains why in the example, many rows of the table have reduce entries in almost all their columns.
Default reductions do not affect the correctness of the parse. Their only effect is on error detection: an error may not be detected until after several (useless) reductions have been applied. However, the error is still detected before any additional symbols are shifted onto the parse stack.
Copyright (C) 1997 Zerksis D. Umrigar
Feedback: Please email any feedback to zdu@acm.org.