Regex: Trying to Grab a Specific Part of a String in Python

Question:

I am pretty new to regex and I am trying to grab part of this string. I want the match to start at the first digit in the string and take everything all the way to the five-digit number at the end. Example below:

import re

string = "['Today is the open house of 1234 High Drive, Denver, COLORADO 80204; open to the Public "

property_address = re.findall('d-ddddd', str(string))

print(property_address)

The code above does not work. I'm a bit confused about how to tell the regex: start at the first digit you find and grab everything until you reach a five-digit sequence.

Thanks for any help or examples.

Asked By: Josh


Answers:

You can use:

import re

s = """
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 aldjfladjfa alsdjflaksjdf 
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 - 1829
aldjfladjfa alsdjflaksjdf  1234 High Drive, Denver, COLORADO 00204 - 1829
aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf 
aldjfladjfa alsdjflaksjdf 1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204 - 1829 aldfjald

"""

p = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'

find_address = re.findall(p, s)

print(find_address)

Prints

['1234 High Drive, Denver, COLORADO 80204', '1234 High Drive, Denver, COLORADO 80204', '1234 High Drive, Denver, COLORADO 00204', '1234 High Drive, 3rd, 4th phone number 1391713917 Denver, COLORADO 00204']
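Applied to the string from the question, the same pattern pulls out just the address. This is a minimal sketch: re.search returns only the first match, which is enough when the input holds a single address; the variable names simply mirror the question.

```python
import re

# The string from the question (including its leading "['").
string = "['Today is the open house of 1234 High Drive, Denver, COLORADO 80204; open to the Public "

# Same pattern as above: a digit 1-9 at a word boundary, then everything
# up to the last five-digit run on the line, plus an optional -NNNN.
p = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'

match = re.search(p, string)
property_address = match.group(0) if match else None
print(property_address)  # 1234 High Drive, Denver, COLORADO 80204
```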

Notes

  • Occasionally there is a hyphen and four more digits after the ZIP code (ZIP+4), so the pattern should allow for that.

\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?:

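  • \b is a word boundary.
  • [1-9] assumes the address starts with a digit from 1 to 9, not 0. If the address can start with 0, use \b[0-9].*[0-9]{5}(?:-[0-9]{4}\b)? instead.
  • .* matches any run of characters except newlines. It is greedy, so it extends to the last five-digit number on the line.
  • [0-9]{5} matches exactly five digits (the ZIP code).
  • (?:-[0-9]{4}\b)? is an optional non-capturing group: if a hyphen and four digits follow the ZIP code, they are included in the match; otherwise the group is skipped. The hyphen must have no spaces around it, which is why 80204 - 1829 in the sample yields only 80204. Using (?:...) rather than (...) matters here, because with a capturing group re.findall would return only the group's contents instead of the whole match.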
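A quick way to see the optional ZIP+4 group at work is to run the pattern on two one-line inputs. The sample addresses are made up for illustration.

```python
import re

p = r'\b[1-9].*[0-9]{5}(?:-[0-9]{4}\b)?'

# ZIP+4 written without spaces: the optional group matches "-1829".
with_plus4 = re.findall(p, "12 Main St, Denver, CO 80204-1829")

# Spaces around the hyphen: the group does not match, so only the
# five-digit ZIP code is kept.
spaced = re.findall(p, "12 Main St, Denver, CO 80204 - 1829")

print(with_plus4)  # ['12 Main St, Denver, CO 80204-1829']
print(spaced)      # ['12 Main St, Denver, CO 80204']
```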

Edge cases

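  • If a single line can contain several addresses, use lazy matching (.*?) instead of greedy matching (.*), so each match stops at the first ZIP code rather than the last. The \b after [0-9]{5} keeps a lazy match from ending in the middle of a longer digit run such as a phone number: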

\b[1-9].*?[0-9]{5}\b(?:-[0-9]{4}\b)?
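A small illustration of the difference, using one made-up line that holds two addresses, run through a greedy variant and the lazy pattern:

```python
import re

# Made-up sample: two addresses on the same line.
line = "1234 High Drive, Denver, COLORADO 80204 and 99 Elm St, Boulder, COLORADO 80302"

greedy = r'\b[1-9].*[0-9]{5}\b(?:-[0-9]{4}\b)?'
lazy = r'\b[1-9].*?[0-9]{5}\b(?:-[0-9]{4}\b)?'

# Greedy .* runs on to the last ZIP code, so both addresses merge into one match.
greedy_matches = re.findall(greedy, line)
# Lazy .*? stops at the first ZIP code, giving one match per address.
lazy_matches = re.findall(lazy, line)

print(greedy_matches)
# ['1234 High Drive, Denver, COLORADO 80204 and 99 Elm St, Boulder, COLORADO 80302']
print(lazy_matches)
# ['1234 High Drive, Denver, COLORADO 80204', '99 Elm St, Boulder, COLORADO 80302']
```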

Code

import re

s = """ aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 aldjfladjfa alsdjflaksjdf 
aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204-1829
aldjfladjfa alsdjflaksjdf  1234 High Drive, Denver, COLORADO 00204-1829
aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf 
aldjfladjfa alsdjflaksjdf 1234 High Drive, 3rd, 4th phone numbers (391) 871-3912 1-391-871-3912 Denver, COLORADO 00204-1829 aldfjald

aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204-1829 aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204-1829 aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204-1829 aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204-1829 aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204 - 1829aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204-1829 aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf aldjfladjfa alsdjflaksjdf 

"""

p = r'\b[1-9].*?[0-9]{5}\b(?:-[0-9]{4}\b)?'

find_address = re.findall(p, s)

print(find_address)


Prints

['1234 High Drive, Denver, COLORADO 80204', '1234 High Drive, Denver, COLORADO 80204-1829', '1234 High Drive, Denver, COLORADO 00204-1829', '1234 High Drive, 3rd, 4th phone numbers (391) 871-3912 1-391-871-3912 Denver, COLORADO 00204-1829', '1234 High Drive, Denver, COLORADO 80204-1829', '1234 High Drive, Denver, COLORADO 80204-1829', '1234 High Drive, Denver, COLORADO 80204-1829', '1234 High Drive, Denver, COLORADO 80204-1829', '1234 High Drive, Denver, COLORADO 80204', '1829aldjfladjfa alsdjflaksjdf 1234 High Drive, Denver, COLORADO 80204-1829']
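The last item in that output shows a side effect of the 80204 - 1829 spacing: the optional group does not match it, so the leftover 1829 becomes the start of the next match. One possible tweak, offered as a sketch rather than as part of the answer above, is to allow optional whitespace around the hyphen with \s*-\s*. The sample text is made up.

```python
import re

# Variant of the lazy pattern (an assumption, not the answer's pattern):
# \s* lets the ZIP+4 hyphen have spaces on either side.
p = r'\b[1-9].*?[0-9]{5}\b(?:\s*-\s*[0-9]{4}\b)?'

s = ("1234 High Drive, Denver, COLORADO 80204 - 1829 more text "
     "1234 High Drive, Denver, COLORADO 80204-1829")

matches = re.findall(p, s)
print(matches)
# ['1234 High Drive, Denver, COLORADO 80204 - 1829', '1234 High Drive, Denver, COLORADO 80204-1829']
```

This only helps when the four digits end at a word boundary. In the 1829aldjfladjfa case from the sample above they are glued to the following letters, so that stray match would remain.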

Answered By: user24714692