How to parse reStructuredText in Python?

Question:

Is there a module that can parse reStructuredText into a tree model?

Can Docutils or Sphinx do this?

Asked By: zhangailin

Answers:

Docutils does indeed contain the tools to do this.

What you probably want is the parser at docutils.parsers.rst.

See this page for details on what is involved. There are also some examples at docutils/examples.py – particularly check out the internals() function, which is probably of interest.
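
For a quick look at the resulting tree, a minimal sketch along these lines may help. It assumes the convenience function docutils.core.publish_doctree rather than the bare parser, so the standard transforms are applied as well; the sample text is made up for illustration.

```python
import docutils.core
import docutils.nodes

# Sample input; the text itself is only an illustration.
source = """\
Title
=====

A paragraph with *emphasis*.
"""

# publish_doctree runs the reader, the parser and the standard transforms,
# and returns the root docutils.nodes.document node.
doctree = docutils.core.publish_doctree(source)

# pformat() renders the tree as indented pseudo-XML for inspection.
tree_text = doctree.pformat()
print(tree_text)
```

The returned object is an ordinary node tree, so it can be indexed and iterated like the documents produced by the parser directly.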

Answered By: Gareth Latty

I’d like to expand on the answer from Gareth Latty. “What you probably want is the parser at docutils.parsers.rst” is a good starting point, but what comes next? Namely:

How to parse reStructuredText in Python?

The following works with Python 3.6 and docutils 0.14:

import docutils.frontend
import docutils.nodes
import docutils.parsers.rst
import docutils.utils

def parse_rst(text: str) -> docutils.nodes.document:
    """Parse reStructuredText source into a docutils document tree."""
    parser = docutils.parsers.rst.Parser()
    # Build default settings for the components involved (here, only the parser).
    components = (docutils.parsers.rst.Parser,)
    settings = docutils.frontend.OptionParser(components=components).get_default_values()
    # Create an empty document, then let the parser populate it in place.
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    parser.parse(text, document)
    return document
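
Newer docutils releases deprecate docutils.frontend.OptionParser. As far as I know, docutils 0.19 added docutils.frontend.get_default_settings as the replacement, so a version-tolerant variant might look like the sketch below. The name parse_rst_compat is hypothetical, and the hasattr check is there because I am assuming rather than guaranteeing which versions provide the newer function.

```python
import docutils.frontend
import docutils.nodes
import docutils.parsers.rst
import docutils.utils


def parse_rst_compat(text: str) -> docutils.nodes.document:
    """Parse reStructuredText, preferring the newer settings helper if present."""
    parser_class = docutils.parsers.rst.Parser
    if hasattr(docutils.frontend, "get_default_settings"):
        # Assumed to exist in docutils 0.19 and later.
        settings = docutils.frontend.get_default_settings(parser_class)
    else:
        # Fallback for older releases such as docutils 0.14.
        option_parser = docutils.frontend.OptionParser(components=(parser_class,))
        settings = option_parser.get_default_values()
    document = docutils.utils.new_document("<rst-doc>", settings=settings)
    parser_class().parse(text, document)
    return document


compat_doc = parse_rst_compat("Hello *world*")
print(compat_doc.pformat())
```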

The resulting document can then be processed with a visitor. For example, the following prints every reference in the document:

class MyVisitor(docutils.nodes.NodeVisitor):

    def visit_reference(self, node: docutils.nodes.reference) -> None:
        """Called for "reference" nodes."""
        print(node)

    def unknown_visit(self, node: docutils.nodes.Node) -> None:
        """Called for all other node types."""
        pass

Here’s how to run it (this sample text contains no references, so nothing is printed):

doc = parse_rst('spam spam lovely spam')
visitor = MyVisitor(doc)
doc.walk(visitor)
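
To see the visitor actually find something, the input needs a hyperlink. The sketch below is self-contained, so it parses with docutils.core.publish_doctree instead of parse_rst (a document returned by parse_rst should be walkable the same way). ReferenceCollector is a hypothetical variant of MyVisitor that stores the nodes instead of printing them, and the sample sentence and URL are illustrative.

```python
import docutils.core
import docutils.nodes


class ReferenceCollector(docutils.nodes.NodeVisitor):
    """Hypothetical visitor that collects reference nodes in a list."""

    def __init__(self, document):
        super().__init__(document)
        self.references = []

    def visit_reference(self, node):
        # Called for "reference" nodes only.
        self.references.append(node)

    def unknown_visit(self, node):
        # Ignore every other node type.
        pass


# Sample text with one embedded hyperlink; the wording is illustrative.
text = "See `the docs <https://docutils.sourceforge.io/>`_ for details."
doc = docutils.core.publish_doctree(text)
collector = ReferenceCollector(doc)
doc.walk(collector)

for ref in collector.references:
    # Link text and target URI of each collected reference.
    print(ref.astext(), "->", ref.get("refuri"))
```

Collecting nodes rather than printing them makes the result easier to reuse, for example to check links after the walk.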

Answered By: mbdevpl