spaCy Regex "SyntaxError: invalid syntax"

Question:

Hi everyone, I am running this code in spaCy to match with Regex, but I get an error:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_md")
doc1 = nlp("Hello hello hello, how are you?")
doc2 = nlp("Hello, how are you?")
doc3 = nlp("How are you?")
pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
matcher = Matcher(nlp.vocab)
matcher.add("greetings", [pattern])
for mid, start, end in matcher(doc1):
    print(start, end, doc1[start:end])

The error is

pattern = [{"LOWER": {"IN": ["hello", "hi", "hallo"]},"OP": "*",{"IS_PUNCT": True}}]
                                                                                  ^
SyntaxError: invalid syntax

I am following a book called Mastering spaCy and copy-pasted the code from it, and I checked that I did not include any special characters.

Regards

Asked By: Aureon


Answers:

"A pattern added to the Matcher consists of a list of dictionaries." (from the docs)

Your code, written more legibly:

pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
        {"IS_PUNCT": True}
    }
]

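The first dictionary has three entries, but the third is malformed: every entry in a dictionary must have the form key: value, and {"IS_PUNCT": True} stands alone as a single item rather than a key: value pair. That does not fit dictionary syntax, so Python raises the SyntaxError before spaCy ever sees the pattern.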
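To confirm that this is a plain Python syntax problem and not a spaCy one, here is a minimal sketch that needs only the standard library. It parses the pattern as a literal with ast.literal_eval; the names broken, fixed and parses are hypothetical and chosen just for this illustration.

```python
import ast

# The pattern from the question, as source text: the third item inside the
# first dictionary is a bare dict with no "key: value" form.
broken = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*", {"IS_PUNCT": True}}]'

# The same tokens regrouped as two separate dictionaries inside the list.
fixed = '[{"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"}, {"IS_PUNCT": True}]'


def parses(source):
    """Return True if `source` is a valid Python literal, False on a SyntaxError."""
    try:
        ast.literal_eval(source)
        return True
    except SyntaxError:
        return False


print(parses(broken))
print(parses(fixed))
print(len(ast.literal_eval(fixed)))  # number of token descriptions in the list
```

The broken string fails to parse without spaCy being imported at all, which shows the error comes from the dictionary literal itself.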

Along those lines, the docs continue:

"Each dictionary describes one token and its attributes."

A token that, lowercased, is in ["hello", "hi", "hallo"] can never be punctuation, so a single dictionary cannot hold both conditions. You seem to want to match something like "Hi Hi Hello!": two token descriptions, with the first allowing for repetition. This would be matched by something like

pattern = [
    {
        "LOWER": {"IN": ["hello", "hi", "hallo"]},
        "OP": "*",
    },
    { "IS_PUNCT": True }
]
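
For completeness, here is a hedged end-to-end sketch of how the corrected pattern could be run. It assumes spaCy v3 (where Matcher.add takes a key and a list of patterns) and uses spacy.blank("en") instead of en_core_web_md, on the assumption that a blank pipeline is enough because LOWER and IS_PUNCT are lexical attributes that need no trained model. Note that the Matcher has to be created from the vocab before patterns are added; the name spans is introduced only for this illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only) suffices for
# the LOWER and IS_PUNCT attributes used below.
nlp = spacy.blank("en")

# The Matcher must be instantiated with the shared vocab before add().
matcher = Matcher(nlp.vocab)

pattern = [
    # Zero or more greeting tokens, compared case-insensitively...
    {"LOWER": {"IN": ["hello", "hi", "hallo"]}, "OP": "*"},
    # ...followed by exactly one punctuation token.
    {"IS_PUNCT": True},
]
matcher.add("greetings", [pattern])  # spaCy v3 style: key, list of patterns

doc = nlp("Hello hello hello, how are you?")

# Collect (start, end) token offsets for every match the Matcher reports.
spans = [(start, end) for _, start, end in matcher(doc)]
for start, end in spans:
    print(start, end, doc[start:end])
```

Because "*" also allows zero greeting tokens, a lone punctuation mark such as the final "?" matches as well. If at least one greeting is required, "OP": "+" is the operator to use instead.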
Answered By: Amadan