Use lark to analyze reST markup language like sections

Question:

I would like to define a basic grammar to get started with lark. Here is my M(not)WE.

from lark import Lark

GRAMMAR = r"""
?start: _NL* (day_heading)*

day_heading : "==" _NL day_nb _NL "==" _NL+ (paragraph _NL)*
day_nb      : /\d{2}/
paragraph   : /[^\n={2}]+/ (_NL+ paragraph)*
_NL         : /(\r?\n[\t ]*)+/
"""

parser = Lark(GRAMMAR)

tree = parser.parse("""


==
12
==

Bla, bla
Bli, Bli



Blu, Blu


==
10
==


Blo, blo


    """)

print(tree.pretty())

This prints:

start
  day_heading
    day_nb      12
    paragraph
      Bla, bla
      paragraph
        Bli, Bli
        paragraph       Blu, Blu
  day_heading
    day_nb      10
    paragraph   Blo, blo

The tree I want is the following one.

start
  day_heading
    day_nb      12
    paragraph
      line      Bla, bla
      line      Bli, Bli
      line      Blu, Blu
  day_heading
    day_nb      10
    paragraph
      line      Blo, blo

How can I modify my EBNF?

Asked By: projetmbc


Answers:

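Here is a possible answer. In my initial question I misused a recursive rule: paragraph referred to itself, so each new line was nested inside the previous one instead of sitting next to it.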

Replacing _NL with NL would keep the newlines in the tree, because lark filters out terminals whose names start with an underscore.

from lark import Lark

GRAMMAR = r"""
?start: _NL* (day_heading)*

day_heading : "==" _NL day_nb _NL "==" _NL+ (paragraph)+
day_nb      : /\d{2}/

paragraph : (line _NL)+

line : /[^\n={2}]+/
_NL  : /(\r?\n[\t ]*)+/
"""

parser = Lark(GRAMMAR)

tree = parser.parse("""


==
12
==

Bla, bla
Bli, Bli



Blu, Blu


==
10
==


Blo, blo


    """)

print(tree.pretty())

This produces:

start
  day_heading
    day_nb      12
    paragraph
      line      Bla, bla
      line      Bli, Bli
      line      Blu, Blu
  day_heading
    day_nb      10
    paragraph
      line      Blo, blo
Answered By: projetmbc