Is it possible to easily eliminate this reduce/reduce conflict?
Question:
This is my code for if statements that might have elses or elsifs for the scripting language I’m writing a parser for. I’m actually rather proud of how compact I got it:
@_( 'IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set { elsif_statement } ENDIF',
'IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set { elsif_statement } ELSE NEWLINE statement_set ENDIF',
)
def if_statement(self, p):
expressions = []
expressions.append((p.expr_comp, p[6]))
else_statement = None
if (p[8] == 'else'):
else_statement = p[10]
expressions.extend(p.elsif_statement)
return ('IF', expressions, else_statement)
@_('ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set')
def elsif_statement(self, p):
return (p.expr_comp, p.statement_set)
This generates the following rules:
Rule 64 if_statement -> IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set _3_elsif_statement_repeat ELSE NEWLINE statement_set ENDIF
Rule 65 _3_elsif_statement_repeat -> _3_elsif_statement_items
Rule 66 _3_elsif_statement_repeat ->
Rule 67 _3_elsif_statement_items -> _3_elsif_statement_items _3_elsif_statement_item
Rule 68 _3_elsif_statement_items -> _3_elsif_statement_item
Rule 69 _3_elsif_statement_item -> elsif_statement
Rule 70 if_statement -> IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set _4_elsif_statement_repeat ENDIF
Rule 71 _4_elsif_statement_repeat -> _4_elsif_statement_items
Rule 72 _4_elsif_statement_repeat ->
Rule 73 _4_elsif_statement_items -> _4_elsif_statement_items _4_elsif_statement_item
Rule 74 _4_elsif_statement_items -> _4_elsif_statement_item
Rule 75 _4_elsif_statement_item -> elsif_statement
Rule 76 elsif_statement -> ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set
Unfortunately, it also generates the following conflicts:
Conflicts:
reduce/reduce conflict for ELSIF in state 111 resolved using rule _3_elsif_statement_item -> elsif_statement
rejected rule (_4_elsif_statement_item -> elsif_statement) in state 111
reduce using _3_elsif_statement_item -> elsif_statement with lookahead ELSIF
╭╴
│ elsif_statement ♦ ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set
│ ╰_3_elsif_statement_item╯ ╰elsif_statement───────────────────────────────────────╯
│ ╰_3_elsif_statement_items╯ ╰_3_elsif_statement_item───────────────────────────────╯
│ ╰_3_elsif_statement_items─────────────────────────────────────────────────────────╯
╰╴
reduce using _4_elsif_statement_item -> elsif_statement with lookahead ELSIF
╭╴
│ elsif_statement ♦ ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set
│ ╰_4_elsif_statement_item╯ ╰elsif_statement───────────────────────────────────────╯
│ ╰_4_elsif_statement_items╯ ╰_4_elsif_statement_item───────────────────────────────╯
│ ╰_4_elsif_statement_items─────────────────────────────────────────────────────────╯
╰╴
The code works, but I’m trying to have as few conflicts as possible so that actual problems don’t get lost in the noise. I suspect that part of the problem is the automation used for an optional list of items, but my brain is not catching exactly how this is happening.
Answers:
The problem is that you have two different productions which include ´{ elsif_statement }´. The EBNF expander creates two different non-terminals; it does not attempt to notice that they are the same. That creates a conflict, because the parser can’t know which of the two identical non-terminals to use without looking ahead to the elsif
or end
, which it obviously can’t do.
There are a number of fixes for this. A simple one is to change elsif_statement
, which recognises a single elsif
clause, to instead recognise a sequence of elsif
clauses.
rici’s help was invaluable to me finding a way to even further condense the code:
@_('IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set elsif_statement_list optional_else ENDIF')
def if_statement(self, p):
expressions = []
expressions.append((p.expr_comp, p.statement_set))
else_statement = p.optional_else
expressions.extend(p.elsif_statement_list)
return ('IF', expressions, else_statement)
@_('[ ELSE NEWLINE statement_set ]')
def optional_else(self, p):
return p.statement_set
@_('{ elsif_statement }')
def elsif_statement_list(self, p):
return p.elsif_statement
@_('ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set')
def elsif_statement(self, p):
return (p.expr_comp, p.statement_set)
No reduce conflicts, and I got rid of a number of indexed checks. Incidentally, putting the else
and elsif
constructs into their own non-terminals helped a lot with that. My original statement of it, with more than one rule leading into the function, meant that multiple statement_set
items would mean that under one rule, there would be one p.statement_set
and in another, there would be p.statement_set0
and p.statement_set1
. Since the optional and repetition constructs return None or an empty list if the item isn’t present, I can just check that. And sticking them inside the non-terminal means I can reference p.elsif_statement_list
instead of something like _3_elsif_statement_repeat
or having to use an index when accessing the value (and if you embed multiple symbols into the curly braces instead of a non-terminal, it gets even messier with multiple levels of reference to get the the actual field you want).
Hopefully, this will help someone else out implementing a similar construct.
This is my code for if statements that might have elses or elsifs for the scripting language I’m writing a parser for. I’m actually rather proud of how compact I got it:
@_( 'IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set { elsif_statement } ENDIF',
'IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set { elsif_statement } ELSE NEWLINE statement_set ENDIF',
)
def if_statement(self, p):
expressions = []
expressions.append((p.expr_comp, p[6]))
else_statement = None
if (p[8] == 'else'):
else_statement = p[10]
expressions.extend(p.elsif_statement)
return ('IF', expressions, else_statement)
@_('ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set')
def elsif_statement(self, p):
return (p.expr_comp, p.statement_set)
This generates the following rules:
Rule 64 if_statement -> IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set _3_elsif_statement_repeat ELSE NEWLINE statement_set ENDIF
Rule 65 _3_elsif_statement_repeat -> _3_elsif_statement_items
Rule 66 _3_elsif_statement_repeat ->
Rule 67 _3_elsif_statement_items -> _3_elsif_statement_items _3_elsif_statement_item
Rule 68 _3_elsif_statement_items -> _3_elsif_statement_item
Rule 69 _3_elsif_statement_item -> elsif_statement
Rule 70 if_statement -> IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set _4_elsif_statement_repeat ENDIF
Rule 71 _4_elsif_statement_repeat -> _4_elsif_statement_items
Rule 72 _4_elsif_statement_repeat ->
Rule 73 _4_elsif_statement_items -> _4_elsif_statement_items _4_elsif_statement_item
Rule 74 _4_elsif_statement_items -> _4_elsif_statement_item
Rule 75 _4_elsif_statement_item -> elsif_statement
Rule 76 elsif_statement -> ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set
Unfortunately, it also generates the following conflicts:
Conflicts:
reduce/reduce conflict for ELSIF in state 111 resolved using rule _3_elsif_statement_item -> elsif_statement
rejected rule (_4_elsif_statement_item -> elsif_statement) in state 111
reduce using _3_elsif_statement_item -> elsif_statement with lookahead ELSIF
╭╴
│ elsif_statement ♦ ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set
│ ╰_3_elsif_statement_item╯ ╰elsif_statement───────────────────────────────────────╯
│ ╰_3_elsif_statement_items╯ ╰_3_elsif_statement_item───────────────────────────────╯
│ ╰_3_elsif_statement_items─────────────────────────────────────────────────────────╯
╰╴
reduce using _4_elsif_statement_item -> elsif_statement with lookahead ELSIF
╭╴
│ elsif_statement ♦ ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set
│ ╰_4_elsif_statement_item╯ ╰elsif_statement───────────────────────────────────────╯
│ ╰_4_elsif_statement_items╯ ╰_4_elsif_statement_item───────────────────────────────╯
│ ╰_4_elsif_statement_items─────────────────────────────────────────────────────────╯
╰╴
The code works, but I’m trying to have as few conflicts as possible so that actual problems don’t get lost in the noise. I suspect that part of the problem is the automation used for an optional list of items, but my brain is not catching exactly how this is happening.
The problem is that you have two different productions which include ´{ elsif_statement }´. The EBNF expander creates two different non-terminals; it does not attempt to notice that they are the same. That creates a conflict, because the parser can’t know which of the two identical non-terminals to use without looking ahead to the elsif
or end
, which it obviously can’t do.
There are a number of fixes for this. A simple one is to change elsif_statement
, which recognises a single elsif
clause, to instead recognise a sequence of elsif
clauses.
rici’s help was invaluable to me finding a way to even further condense the code:
@_('IF LPAREN expr_comp RPAREN THEN NEWLINE statement_set elsif_statement_list optional_else ENDIF')
def if_statement(self, p):
expressions = []
expressions.append((p.expr_comp, p.statement_set))
else_statement = p.optional_else
expressions.extend(p.elsif_statement_list)
return ('IF', expressions, else_statement)
@_('[ ELSE NEWLINE statement_set ]')
def optional_else(self, p):
return p.statement_set
@_('{ elsif_statement }')
def elsif_statement_list(self, p):
return p.elsif_statement
@_('ELSIF LPAREN expr_comp RPAREN THEN NEWLINE statement_set')
def elsif_statement(self, p):
return (p.expr_comp, p.statement_set)
No reduce conflicts, and I got rid of a number of indexed checks. Incidentally, putting the else
and elsif
constructs into their own non-terminals helped a lot with that. My original statement of it, with more than one rule leading into the function, meant that multiple statement_set
items would mean that under one rule, there would be one p.statement_set
and in another, there would be p.statement_set0
and p.statement_set1
. Since the optional and repetition constructs return None or an empty list if the item isn’t present, I can just check that. And sticking them inside the non-terminal means I can reference p.elsif_statement_list
instead of something like _3_elsif_statement_repeat
or having to use an index when accessing the value (and if you embed multiple symbols into the curly braces instead of a non-terminal, it gets even messier with multiple levels of reference to get the the actual field you want).
Hopefully, this will help someone else out implementing a similar construct.