How do I add operator precedence to a lark grammar for FOL with Equality?

Question:

How do I modify this grammar so it matches parenthesis that are further away?

?wff: compound_wff
?compound_wff: biconditional_wff
?biconditional_wff: conditional_wff (SPACE? BICONDITIONAL_SYMBOL SPACE? biconditional_wff)*
?conditional_wff: disjunctive_wff (SPACE? CONDITIONAL_SYMBOL SPACE? conditional_wff)*
?disjunctive_wff: conjunctive_wff (SPACE? DISJUNCTION_SYMBOL SPACE? disjunctive_wff)*
?conjunctive_wff: negated_wff (SPACE? CONJUNCTION_SYMBOL SPACE? conjunctive_wff)*
?negated_wff: (NEGATION_SYMBOL SPACE)* atomic_wff
?atomic_wff: predicate
           | term EQUAL_TO term
           | quantified_wff* LEFT_PARENTHESIS SPACE? wff SPACE? RIGHT_PARENTHESIS
?term: function
    | NAME
    | VARIABLE
?predicate: PREDICATE_NAME [LEFT_PARENTHESIS term (COMMA term)* RIGHT_PARENTHESIS]
?function: FUNCTION_NAME LEFT_PARENTHESIS term (COMMA SPACE? term)* RIGHT_PARENTHESIS
?quantified_wff: curly_quantifiers | quantifiers
?curly_quantifiers: quantifier_symbol LEFT_CURLY_BRACE VARIABLE (COMMA SPACE? VARIABLE)* RIGHT_CURLY_BRACE SPACE?
?quantifiers: quantifier_symbol SPACE? VARIABLE (COMMA SPACE? VARIABLE)* SPACE?

SPACE: /s+/
COMMA: ","
EQUAL_TO: "="
LEFT_PARENTHESIS: "("
RIGHT_PARENTHESIS: ")"
LEFT_CURLY_BRACE: "{"
RIGHT_CURLY_BRACE: "}"
quantifier_symbol: UNIVERSAL_QUANTIFIER_SYMBOL | EXISTENTIAL_QUANTIFIER_SYMBOL
UNIVERSAL_QUANTIFIER_SYMBOL: "\forall" | "∀"
EXISTENTIAL_QUANTIFIER_SYMBOL: "\exists" | "∃"
NAME: /[a-t]/ | /[a-t]_[1-9]d*/
VARIABLE: /[u-z]/ | /[u-z]_[1-9]d*/
PREDICATE_NAME: /[A-HJ-Z]/ | /[A-HJ-Z]_[1-9]d*/
FUNCTION_NAME: /[a-z]/ | /[a-z]_[1-9]d*/
NEGATION_SYMBOL: "\neg" | "\lnot" | "¬"
CONJUNCTION_SYMBOL: "\wedge" | "\land" | "∧"
DISJUNCTION_SYMBOL: "\vee" | "\lor" | "∨"
CONDITIONAL_SYMBOL: "\rightarrow" | "\Rightarrow" | "\Longrightarrow" | "\implies" | "→" | "⇒"
BICONDITIONAL_SYMBOL: "\leftrightarrow" | "\iff" | "↔" | "⇔"

I’m trying to parse this using my grammar:

equation

Which in LaTeX is:

exists{x} forall{y} (P(f(x, y)) vee forall{z}(V(z) iff neg R(a) wedge B(a)))

I followed the calculator example and modified my original grammar to add operator precedence which resulted in this. But it’s no longer accepting the input string.

I’m getting this error:

lark.exceptions.UnexpectedCharacters: No terminal matches '' in the current parser context, at line 1 col 35

exists{x} forall{y} (P(f(x, y)) vee forall{z}(V(z) iff neg R(a) wed
                                  ^
Expected one of: 
    * RIGHT_PARENTHESIS

Ideally I want to force requiring parenthesis wherever possible except in the case of in front of a negated atomic_wff without parenthesis. This is to make sure only one parse tree is produced even on explicit ambiguity setting. How do I resolve this issue?

Edit 1

I want to clarify that operator precedence for the same operators should be right associative. So P(a) ∧ Q(a) ∧ R(a) will resolve as P(a) ∧ (Q(a) ∧ R(a))

Edit 2

I have made the lark grammar easier to debug using proper terminals. It now parses the long latex equation, but is still ambiguous for simpler inputs like P(a) ∧ Q(a) ∧ R(a) which produces two parse trees. I still don’t know what I am doing wrong. The grammar is doing right recursion and still failing to produce a single parse tree.

Edit 3

My latest attempted solution works in every case except for giving negation a higher precedence. Any idea?

?wff: compound_wff
compound_wff: biconditional_wff
biconditional_wff: conditional_wff (SPACE? BICONDITIONAL_SYMBOL SPACE? conditional_wff)*
conditional_wff: disjunctive_wff (SPACE? CONDITIONAL_SYMBOL SPACE? disjunctive_wff)*
disjunctive_wff: conjunctive_wff (SPACE? DISJUNCTION_SYMBOL SPACE? conjunctive_wff)*
conjunctive_wff: negated_wff (SPACE? CONJUNCTION_SYMBOL SPACE? negated_wff)*
negated_wff: (NEGATION_SYMBOL SPACE?)* atomic_wff
atomic_wff: predicate
          | term EQUAL_TO term
          | quantified_wff* LEFT_PARENTHESIS SPACE? wff SPACE? RIGHT_PARENTHESIS
term: function
    | NAME
    | VARIABLE
predicate: PREDICATE_NAME [LEFT_PARENTHESIS term (COMMA term)* RIGHT_PARENTHESIS]
function: FUNCTION_NAME LEFT_PARENTHESIS term (COMMA term)* RIGHT_PARENTHESIS
quantified_wff: curly_quantifiers | quantifiers
curly_quantifiers: quantifier_symbol LEFT_CURLY_BRACE VARIABLE (COMMA SPACE? VARIABLE)* RIGHT_CURLY_BRACE SPACE?
quantifiers: quantifier_symbol SPACE? VARIABLE (COMMA SPACE? VARIABLE)* SPACE?

SPACE: /s+/
COMMA: /,s*/
EQUAL_TO: /s*=s*/
LEFT_PARENTHESIS: "("
RIGHT_PARENTHESIS: ")"
LEFT_CURLY_BRACE: "{"
RIGHT_CURLY_BRACE: "}"
quantifier_symbol: UNIVERSAL_QUANTIFIER_SYMBOL | EXISTENTIAL_QUANTIFIER_SYMBOL
UNIVERSAL_QUANTIFIER_SYMBOL: "\forall" | "∀"
EXISTENTIAL_QUANTIFIER_SYMBOL: "\exists" | "∃"
NAME: /[a-t]/ | /[a-t]_[1-9]d*/
VARIABLE: /[u-z]/ | /[u-z]_[1-9]d*/
PREDICATE_NAME: /[A-HJ-Z]/ | /[A-HJ-Z]_[1-9]d*/
FUNCTION_NAME: /[a-z]/ | /[a-z]_[1-9]d*/
NEGATION_SYMBOL: "\neg" | "\lnot" | "¬"
CONJUNCTION_SYMBOL: "\wedge" | "\land" | "∧"
DISJUNCTION_SYMBOL: "\vee" | "\lor" | "∨"
CONDITIONAL_SYMBOL: "\rightarrow" | "\Rightarrow" | "\Longrightarrow" | "\implies" | "→" | "⇒"
BICONDITIONAL_SYMBOL: "\leftrightarrow" | "\iff" | "↔" | "⇔"

%ignore SPACE
%ignore COMMA
%ignore LEFT_CURLY_BRACE
%ignore RIGHT_CURLY_BRACE

So it fails to to produce a single parse tree for ¬P(a) ∧ Q(b).

Asked By: daegontaven

||

Answers:

There are a couple of problems with this grammar.

1. Erroneous whitespace handling

In the cascading precedence rules, each rule imposes the requirement for an additional SPACE following the leftmost symbol, even if the repetition is null. So in the cascade, these SPACEs add up as each cascading level adds it’s own SPACE. (That’s unsatisfiable, because your SPACE token matches any number of consecutive spaces.)

So instead of:

?biconditional: conditional SPACE (biconditional_symbol SPACE conditional)*
?conditional: disjunction SPACE (conditional_symbol SPACE disjunction)*
?disjunction: conjunction SPACE (disjunction_symbol SPACE conjunction)*
?conjunction: negation SPACE (conjunction_symbol SPACE negation)*

you need to move the SPACE inside the repetition group:

?biconditional: conditional (SPACE biconditional_symbol SPACE conditional)*
?conditional: disjunction (SPACE conditional_symbol SPACE disjunction)*
?disjunction: conjunction (SPACE disjunction_symbol SPACE conjunction)*
?conjunction: negation (SPACE conjunction_symbol SPACE negation)*

(I have some doubts about the first two rules. Unlike disjunction and conjunction, implication is not associative, so a formula with two implication operators, P implies Q implies R, needs to be disambiguated, either with parentheses or precedence rules. iff is associative, but I think P iff Q iff R is also a bit dubious. Personally, I’d change the repetition operator to an optionality operator in both of those rules.)

But, frankly, I think you’d be better off just ignoring whitespace, as in the calculator example you claim to have based this grammar on, by removing all the SPACE tokens and adding:

%import common.WS
%ignore WS

(or use WS_INLINE instead if you’d prefer your grammar to treat newlines as significant.)

2. The grammar requires parentheses where they shouldn’t be needed.

(Awaiting clarification from OP)

Answered By: rici

The issue was each wff rule was not matching the antecedent rule in it’s parts. So:

?biconditional_wff: conditional_wff (SPACE? BICONDITIONAL_SYMBOL SPACE? biconditional_wff)*

Should be changed to:

?biconditional_wff: conditional_wff (SPACE? BICONDITIONAL_SYMBOL SPACE? conditional_wff)*

The solution to this issue is to have a grammar like this:

?wff: compound_wff
compound_wff: biconditional_wff
biconditional_wff: conditional_wff (SPACE? BICONDITIONAL_SYMBOL SPACE? conditional_wff)*
conditional_wff: disjunctive_wff (SPACE? CONDITIONAL_SYMBOL SPACE? disjunctive_wff)*
disjunctive_wff: conjunctive_wff (SPACE? DISJUNCTION_SYMBOL SPACE? conjunctive_wff)*
conjunctive_wff: negated_wff (SPACE? CONJUNCTION_SYMBOL SPACE? negated_wff)*
negated_wff: (NEGATION_SYMBOL SPACE?)* atomic_wff
atomic_wff: predicate
          | term EQUAL_TO term
          | quantified_wff* LEFT_PARENTHESIS SPACE? wff SPACE? RIGHT_PARENTHESIS
term: function
    | NAME
    | VARIABLE
predicate: PREDICATE_NAME [LEFT_PARENTHESIS term (COMMA term)* RIGHT_PARENTHESIS]
function: FUNCTION_NAME LEFT_PARENTHESIS term (COMMA term)* RIGHT_PARENTHESIS
quantified_wff: curly_quantifiers | quantifiers
curly_quantifiers: quantifier_symbol LEFT_CURLY_BRACE VARIABLE (COMMA SPACE? VARIABLE)* RIGHT_CURLY_BRACE SPACE?
quantifiers: quantifier_symbol SPACE? VARIABLE (COMMA SPACE? VARIABLE)* SPACE?

SPACE: /s+/
COMMA: /,s*/
EQUAL_TO: /s*=s*/
LEFT_PARENTHESIS: "("
RIGHT_PARENTHESIS: ")"
LEFT_CURLY_BRACE: "{"
RIGHT_CURLY_BRACE: "}"
quantifier_symbol: UNIVERSAL_QUANTIFIER_SYMBOL | EXISTENTIAL_QUANTIFIER_SYMBOL
UNIVERSAL_QUANTIFIER_SYMBOL: "\forall" | "∀"
EXISTENTIAL_QUANTIFIER_SYMBOL: "\exists" | "∃"
NAME: /[a-t]/ | /[a-t]_[1-9]d*/
VARIABLE: /[u-z]/ | /[u-z]_[1-9]d*/
PREDICATE_NAME: /[A-HJ-Z]/ | /[A-HJ-Z]_[1-9]d*/
FUNCTION_NAME: /[a-z]/ | /[a-z]_[1-9]d*/
NEGATION_SYMBOL: "\neg" | "\lnot" | "¬"
CONJUNCTION_SYMBOL: "\wedge" | "\land" | "∧"
DISJUNCTION_SYMBOL: "\vee" | "\lor" | "∨"
CONDITIONAL_SYMBOL: "\rightarrow" | "\Rightarrow" | "\Longrightarrow" | "\implies" | "→" | "⇒"
BICONDITIONAL_SYMBOL: "\leftrightarrow" | "\iff" | "↔" | "⇔"

Also remove the %ignore directives.

%ignore SPACE
%ignore COMMA
%ignore LEFT_CURLY_BRACE
%ignore RIGHT_CURLY_BRACE
Answered By: daegontaven