An error in implementing regex function on a list

Question:

I was trying to implement a regex on a list of grammar tags in python, for finding the tense form of the list of grammar. And I wrote the following code to implement it.

Data preprocessing:

from nltk import word_tokenize, pos_tag
import nltk

text = "He will have been doing his homework." 

tokenized = word_tokenize(text)
tagged = pos_tag(tokenized)
tags = []
for i in range(len(tagged)):
    t = tagged[i]
    tags.append(t[1])
print(tags)

regex formula i.e. to be implemented

grammar = r"""
Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}
Future_Continuous:         {<MD><VB><VBG>}
Future_Perfect:            {<MD><VB><VBN>}
Past_Perfect_Continuous:   {<VBD><VBN><VBG>}
Present_Perfect_Continuous:{<VBP|VBZ><VBN><VBG>}
Future_Indefinite:         {<MD><VB>}
Past_Continuous:           {<VBD><VBG>}
Past_Perfect:              {<VBD><VBN>}
Present_Continuous:        {<VBZ|VBP><VBG>}
Present_Perfect:           {<VBZ|VBP><VBN>}
Past_Indefinite:           {<VBD>}
Present_Indefinite:        {<VBZ>|<VBP>}

Function to implement the regex on the list tags

def check_grammar(grammar, tags):
    cp = nltk.RegexpParser(grammar)
    result = cp.parse(tags)
    print(result)
    result.draw()
 
check_grammar(grammar, tags)

But it returned an error as:

Traceback (most recent call last):
  File "/home/samar/Desktop/twitter_tense/main.py", line 35, in <module>
    check_grammar(grammar, tags)
  File "/home/samar/Desktop/twitter_tense/main.py", line 31, in check_grammar
    result = cp.parse(tags)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 1276, in parse
    chunk_struct = parser.parse(chunk_struct, trace=trace)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 1083, in parse
    chunkstr = ChunkString(chunk_struct)
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 95, in __init__
    tags = [self._tag(tok) for tok in self._pieces]
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 95, in <listcomp>
    tags = [self._tag(tok) for tok in self._pieces]
  File "/home/samar/.local/lib/python3.8/site-packages/nltk/chunk/regexp.py", line 105, in _tag
    raise ValueError("chunk structures must contain tagged " "tokens or trees")
ValueError: chunk structures must contain tagged tokens or trees
Asked By: Samar Pratap Singh

||

Answers:

Your call to the cp.parse() function expects each of the tokens in your sentence to be tagged, however, the tags list you created only contains the tags but not the tokens as well, hence your ValueError. The solution is to instead pass the output from the pos_tag() call (i.e. tagged) to your check_grammar call (see below).

Solution

from nltk import word_tokenize, pos_tag
import nltk

text = "He will have been doing his homework." 
tokenized = word_tokenize(text)
tagged = pos_tag(tokenized)
print(tagged)
# Output
>>> [('He', 'PRP'), ('will', 'MD'), ('have', 'VB'), ('been', 'VBN'), ('doing', 'VBG'), ('his', 'PRP$'), ('homework', 'NN'), ('.', '.')]

my_grammar = r"""
Future_Perfect_Continuous: {<MD><VB><VBN><VBG>}
Future_Continuous:         {<MD><VB><VBG>}
Future_Perfect:            {<MD><VB><VBN>}
Past_Perfect_Continuous:   {<VBD><VBN><VBG>}
Present_Perfect_Continuous:{<VBP|VBZ><VBN><VBG>}
Future_Indefinite:         {<MD><VB>}
Past_Continuous:           {<VBD><VBG>}
Past_Perfect:              {<VBD><VBN>}
Present_Continuous:        {<VBZ|VBP><VBG>}
Present_Perfect:           {<VBZ|VBP><VBN>}
Past_Indefinite:           {<VBD>}
Present_Indefinite:        {<VBZ>|<VBP>}"""


def check_grammar(grammar, tags):
    cp = nltk.RegexpParser(grammar)
    result = cp.parse(tags)
    print(result)
    result.draw()


check_grammar(my_grammar, tagged)

Output

>>> (S
>>>   He/PRP
>>>   (Future_Perfect_Continuous will/MD have/VB been/VBN doing/VBG)
>>>   his/PRP$
>>>   homework/NN
>>>   ./.)

Parse tree result

Answered By: Kyle F Hartzenberg