How do I write the lark grammar for First Order Logic with Equality?

Question:

According to AIMA (Russell & Norvig, 2010) this is the BNF grammar for FOL with Equality:

bnf

How do I convert this to a lark grammar? Specifically, how do I represent n-ary predicates using lark grammar?

Asked By: daegontaven

||

Answers:

I’m going to take this question as asking how to specify the syntax of an application of an identifier to a parenthesised, comma-separated list of terms.

In syntactic terms, that’s similar enough to JSON list syntax to make it worthwhile looking at the first sample grammar (for JSON) in the Lark documentation site. Functions and predicates in your FOL grammar differ from JSON lists: they use round parentheses (()) instead of square brackets ([]) and they need to additionally specify the name of a function or predicate, but the JSON grammar shows how to write a comma-separated list of things, and we can easily apply that exact same grammar syntax:

AtomicSentence
    : Predicate ['(' Term (',' Term)* ')']
    | Term '=' Term

Term: Function '(' Term (',' Term)* ')'
    | Constant
    | Variable

(I left out the rest of the supplied grammar, since it isn’t relevant to the question about Predicates.)

In that grammar syntax, parentheses are used for grouping and * is a postfix operator indicating the Kleene star; that is, "any number of repetitions, including zero, of the operand". Square brackets ([]) are used to enclose an optional syntactic sequence.

That’s not BNF, since BNF doesn’t have syntactic operators like optionality or Kleene Star. It’s an example of what’s often called "Extended BNF" (EBNF), which comes in a huge number of varieties with subtly different syntaxes. But it can be mechanically desugared into BNF; one BNF equivalent for the above would be:

AtomicSentence
    : Predicate '(' TermList ')'
    | Predicate
    | Term '=' Term

Term: Function '(' TermList ')'
    | Constant
    | Variable

TermList
    : Term
    | TermList ',' Term

That grammar does not provide a way to specify the arity (that is, the number of arguments) of each function and predicate, and consequently will generate invalid sentences. The abstract FOL grammar has no evident way of defining new predicates or functions, presumably because functions and predicates are not first-order objects. So every function, predicate and constant must be individually defined in some extra-grammatic way. Thus, the grammar might be considered short-hand for a class of more specific grammars, each with an associated finite set of defined symbols. Those definitions presumably specify the arity of each symbol, as well as indicating which are predicates and which are functions.

In order for the concrete grammar to restrict itself to predicates and functions written with correct arity (that is, with the correct number of arguments), it would need to be modified (for each specific collection of predicates and functions), according to a model like this:

AtomicSentence
    : Predicate/0
    | Predicate/1 '(' TermList/1 ')'
    | Predicate/2 '(' TermList/2 ')'
    | Predicate/3 '(' TermList/3 ')'
    | Predicate/4 '(' TermList/4 ')'
    ...
    | Term '=' Term

Term: Constant
    | Function/1 '(' TermList/1 ')'
    | Function/2 '(' TermList/2 ')'
    | Function/3 '(' TermList/3 ')'
    | Function/4 '(' TermList/4 ')'
    ...
    | Variable

TermList/1 : Term
TermList/2 : TermList/1 ',' Term
TermList/3 : TermList/2 ',' Term
TermList/4 : TermList/3 ',' Term
...
Predicate/0 : "True" | "False"
Predicate/2 : "After" | "Loves"
...
Function/1  : "Mother" | "LeftLeg"
Function/2  : "Sum" | "Product"
...

(The /n suffixes are part of the individual names; their semantic significance is external to the grammar. I take that particular sylistic convention from Prolog; it’s also used by Erlang and some ML derivatives.)
The ellipses represent other concrete definitions, which I didn’t happen to write out; they are not intended to be thought of as lists of arbitrary length. The number of aritys actually defined by the grammar will be limited to the aritys actually used by the specific functions and predicates for which the concrete grammar is being defined. So each concrete grammar produced according to that model will have a finite number of productions.

Note that a context-free grammar is not able to accurately represent a language in which functions are defined with a specific arity and elsewhere used only with exactly the same arity, unless there is a prespecified maximum arity. Grammatical concordance of that form (as with mandatory declaration of used symbols) requires a context-sensitive grammar formalism.

This answer deliberately does not discuss operator precedence (for the operators defined in ComplexSentence), because it’s not part of the original question. Without that specification, the grammar is ambiguous, but there is certainly an operator binding precedence hierarchy, presumably defined in the narrative surrounding the FOL grammar.

Answered By: rici

Here is my lark grammar for a LaTeX parser that only accepts valid WFFs of First-order logic with equality and functions. I’ve left out operator precedence because it causes ambiguity. Instead it is handled by required explicit parenthesis. This rules are adapted from the definitions given in a logic textbook (J., 2012).

wff: atomic_wff | compound_wff
atomic_wff: predicate [left_parenthesis term (comma term)* right_parenthesis] | term equal_to term
compound_wff: left_parenthesis wff right_parenthesis
            | not space wff
            | wff space and space wff
            | wff space nand space wff
            | wff space or space wff
            | wff space xor space wff
            | wff space nor space wff
            | wff space implied_by space? wff
            | wff space implies space wff
            | wff space iff space wff
            | (quantifier left_curly_brace variable right_curly_brace)* space? left_parenthesis wff right_parenthesis
term: function left_parenthesis term (comma term)* right_parenthesis
    | name
    | variable

space: /s+/
comma: ","
equal_to: "="
left_parenthesis: "("
right_parenthesis: ")"
left_curly_brace: "{"
right_curly_brace: "}"
quantifier: universal_quantifier | existential_quantifier | uniqueness_quantifier
universal_quantifier: "\forall"
existential_quantifier: "\exists"
uniqueness_quantifier: "\exists!"
name: /[a-t]/ | /[a-t]_[1-9]d*/
variable: /[u-z]/ | /[u-z]_[1-9]d*/
predicate: /[A-HJ-Z]/ | /[A-HJ-Z]_[1-9]d*/
function: /[a-z]/ | /[a-z]_[1-9]d*/
not: "\neg"
and: "\wedge"
nand: "\uparrow" | "\barwedge"
or: "\vee"
xor: "\veebar"
nor: "\downarrow"
implies: "\rightarrow" | "\Rightarrow" | "\Longrightarrow" | "\implies"
implied_by: "\leftarrow" | "\Leftarrow" | "\Longleftarrow"
iff: "\leftrightarrow" | "\iff"

Sources

  1. J., S. N. J. (2012). Syntax of GPL. In Logic: The laws of truth (pp. 267–268). essay, Princeton University Press.
Answered By: daegontaven