Saving nltk drawn parse tree to image file
Question:
Is there any way to save the image drawn by tree.draw() to an image file programmatically? I looked through the documentation, but I couldn’t find anything.
Answers:
I had exactly the same need, and by looking into the source code of nltk.draw.tree I found a solution:
from nltk import Tree
from nltk.draw.util import CanvasFrame
from nltk.draw import TreeWidget
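cf = CanvasFrame()  # opens a Tk window that hosts the canvas
t = Tree.fromstring('(S (NP this tree) (VP (V is) (AdjP pretty)))')
tc = TreeWidget(cf.canvas(), t)
cf.add_widget(tc, 10, 10)  # place the tree at offset (10, 10)
cf.print_to_file('tree.ps')  # write the canvas contents as PostScript
cf.destroy()  # close the window again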
The output file is PostScript; you can convert it to an image file with ImageMagick from a terminal (ImageMagick needs Ghostscript installed to read PostScript):
$ convert tree.ps tree.png
This is a quick-and-dirty solution: it may be inefficient, because it displays the canvas and then destroys it (there may be an option to disable the display, but I couldn’t find one). Please let me know if there is a better way.
To add to Minjoon’s answer, you can change the fonts and colours of the tree to look more like the NLTK .draw() version as follows:
tc['node_font'] = 'arial 14 bold'
tc['leaf_font'] = 'arial 14'
tc['node_color'] = '#005990'
tc['leaf_color'] = '#3F8F57'
tc['line_color'] = '#175252'
[Image: the tree before (left) and after (right) restyling]
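The same options can be kept in one place and applied in a loop. This is only a minimal sketch: it assumes nothing more than that the widget accepts item assignment the way tc does above, and TREE_STYLE and apply_style are hypothetical names, not part of NLTK.

```python
# Option names and values copied from the snippet above.
TREE_STYLE = {
    "node_font": "arial 14 bold",
    "leaf_font": "arial 14",
    "node_color": "#005990",
    "leaf_color": "#3F8F57",
    "line_color": "#175252",
}


def apply_style(widget, style=TREE_STYLE):
    """Set each option on the widget via item assignment, like tc[...] = ... above."""
    for option, value in style.items():
        widget[option] = value
    return widget
```

With the TreeWidget from the first answer this would be called as apply_style(tc) before cf.print_to_file(...).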
Using the nltk.draw.tree.TreeView object to create the canvas frame automatically:
>>> from nltk.tree import Tree
>>> from nltk.draw.tree import TreeView
>>> t = Tree.fromstring('(S (NP this tree) (VP (V is) (AdjP pretty)))')
>>> TreeView(t)._cframe.print_to_file('output.ps')
Then:
>>> import os
>>> os.system('convert output.ps output.png')
[Image: output.png, the rendered tree]
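Note that os.system only returns the command's exit status, so a failed conversion can go unnoticed. Here is a hedged sketch of the same step using subprocess, which raises an error on failure. It assumes ImageMagick (and Ghostscript) are installed and on PATH; build_convert_command and convert_ps_to_png are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(ps_path, png_path, executable="convert"):
    """Return the argument list for an ImageMagick PostScript-to-PNG conversion."""
    return [executable, str(ps_path), str(png_path)]


def convert_ps_to_png(ps_path, png_path=None):
    """Convert a PostScript file to PNG and return the output path."""
    ps_path = Path(ps_path)
    # Default to the same file name with a .png suffix.
    png_path = Path(png_path) if png_path is not None else ps_path.with_suffix(".png")
    # ImageMagick 7 provides `magick`; older installs only have `convert`.
    # `magick` is tried first because on Windows `convert` can resolve to an
    # unrelated system tool.
    executable = shutil.which("magick") or shutil.which("convert")
    if executable is None:
        raise RuntimeError("ImageMagick was not found on PATH")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_convert_command(ps_path, png_path, executable), check=True)
    return png_path
```

Calling convert_ps_to_png('output.ps') would then write output.png next to the input file.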
To save a given NLTK tree to an image file in an OS-agnostic way, I recommend the Constituent-Treelib library, which builds on benepar, spaCy and NLTK. First, install it with: pip install constituent-treelib
Then, perform the following steps:
from nltk import Tree
from constituent_treelib import ConstituentTree
# Define the sentence to be parsed and saved to a file
sentence = "At least nine tenths of the students passed."
# Rather than a raw string, you can also provide an already constructed NLTK tree (this replaces the string above)
sentence = Tree('S', [Tree('NP', [Tree('NP', [Tree('QP', [Tree('ADVP', [Tree('RB', ['At']), Tree('RBS', ['least'])]), Tree('CD', ['nine'])]), Tree('NNS', ['tenths'])]), Tree('PP', [Tree('IN', ['of']), Tree('NP', [Tree('DT', ['the']), Tree('NNS', ['students'])])])]), Tree('VP', [Tree('VBD', ['passed'])]), Tree('.', ['.'])])
# Define the language used to select the underlying benepar and spaCy models
language = ConstituentTree.Language.English
# You can also specify the desired model for the language ("Small" is selected by default)
spacy_model_size = ConstituentTree.SpacyModelSize.Large
# Create the necessary NLP pipeline (required to instantiate a ConstituentTree object)
nlp = ConstituentTree.create_pipeline(language, spacy_model_size)
# If you haven't downloaded the required benepar and spaCy models, you can tell the method to download them automatically
# nlp = ConstituentTree.create_pipeline(language, spacy_model_size, download_models=True)
# Instantiate a ConstituentTree object and pass it the sentence as well as the NLP pipeline
tree = ConstituentTree(sentence, nlp)
# Now you can export the tree to a file (e.g., a PDF)
tree.export_tree("NLTK_parse_tree.pdf", verbose=True)
>>> PDF-file successfully saved to: NLTK_parse_tree.pdf