How can I do a non-greedy (backtracking) match with OneOrMore etc. in pyparsing?

Question:

I am trying to parse a partially standardized street address into its components using pyparsing. I want a non-greedy match for a street name that may be N tokens long.

For example:

444 PARK GARDEN LN

Should be parsed into:

number: 444
street: PARK GARDEN
suffix: LN

How would I do this with pyparsing? Here’s my initial code:

from pyparsing import *

def main():
    street_number = Word(nums).setResultsName('street_number')
    street_suffix = oneOf("ST RD DR LN AVE WAY").setResultsName('street_suffix')
    street_name = OneOrMore(Word(alphas)).setResultsName('street_name')

    address = street_number + street_name + street_suffix
    result = address.parseString("444 PARK GARDEN LN")
    print(result.dump())

if __name__ == '__main__':
    main()

But when I try parsing it, the street suffix gets gobbled up by the default greedy matching of OneOrMore, leaving nothing for street_suffix to match.
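To make the failure concrete, here is a minimal sketch of the grammar from the question, written for Python 3. The helper name try_parse is hypothetical and not part of pyparsing. It assumes that pyparsing does not backtrack out of a repetition, so once OneOrMore has consumed LN as a street-name word, the trailing street_suffix has nothing left to match.

```python
from pyparsing import OneOrMore, ParseException, Word, alphas, nums, oneOf

street_number = Word(nums)("street_number")
street_suffix = oneOf("ST RD DR LN AVE WAY")("street_suffix")
# Greedy: every alphabetic word is taken, including the suffix itself.
greedy_name = OneOrMore(Word(alphas))("street_name")
greedy_address = street_number + greedy_name + street_suffix


def try_parse(text):
    """Hypothetical helper: return the token list, or None if parsing fails."""
    try:
        return greedy_address.parseString(text).asList()
    except ParseException:
        return None


print(try_parse("444 PARK GARDEN LN"))  # prints None
```

The parse fails outright instead of returning a street name that includes the suffix, because the suffix expression is reached only after the input is exhausted.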

Asked By: zzz


Answers:

Use the negation operator, ~, as a negative lookahead: before matching each word of street_name, check that the upcoming token is not a street_suffix.

from pyparsing import *

street_number = Word(nums)('street_number')
street_suffix = oneOf("ST RD DR LN AVE WAY")('street_suffix')
street_name = OneOrMore(~street_suffix + Word(alphas))('street_name')

address = street_number + street_name + street_suffix
result = address.parseString("444 PARK GARDEN LN")
print(result.dump())

In addition, you don’t have to call setResultsName; you can simply call the expression with the name, as in the code above. IMHO it leads to a much cleaner grammar definition.
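One caveat with the lookahead approach: oneOf matches literal strings without regard to word boundaries, so a street-name word that merely begins with a suffix (STONE begins with ST) would be rejected by ~street_suffix. The sketch below is one possible way around that, not part of the original answer: it builds the suffix from Keyword expressions, which match whole words only. The names SUFFIXES and parse_address are hypothetical.

```python
from pyparsing import Keyword, MatchFirst, OneOrMore, Word, alphas, nums

# Assumed suffix list, taken from the question.
SUFFIXES = "ST RD DR LN AVE WAY".split()

street_number = Word(nums)("street_number")
# Keyword matches only whole words, so "ST" does not match inside "STONE".
street_suffix = MatchFirst([Keyword(s) for s in SUFFIXES])("street_suffix")
# Negative lookahead: take a word only if it is not a suffix.
street_name = OneOrMore(~street_suffix + Word(alphas))("street_name")
address = street_number + street_name + street_suffix


def parse_address(text):
    """Hypothetical helper: parse one address line into named tokens."""
    return address.parseString(text)


result = parse_address("100 STONE RD")
print(result.asList())  # prints ['100', 'STONE', 'RD']
```

A street name that itself contains a suffix word, such as PARK WAY DR, would still stop at the first suffix it meets, so this remains a sketch for the simple case in the question.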

Answered By: Hooked