checking if year is in the string (4 consecutive digits)
Question:
How can I find out whether a string in a list contains a year (e.g. 1999)? I guess I would check for four consecutive digits, such as: [1-2][0-9][0-9][0-9]
How do I check that against a list piece? Here is what I've tried already:
for piece in reflist:
    if "\d{4}" in piece:
        # Do something

for piece in reflist:
    if re.match('\d{4}', piece):
        print piece + '\n'
Answers:
You want to use re.search() to test for matches anywhere in the input string; re.match() only matches at the start of the string.
To match (recent) years a little more precisely, you could use:
re.search(r'[12]\d{3}', piece)
which would match anything from 1000 through to 2999.
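To illustrate the difference between the two calls, here is a minimal sketch. The contents of `reflist` are made up for the example, since the question does not show them, and the names `year_re`, `matched` and `searched` are just illustrative.

```python
import re

# Hypothetical sample data; the question does not show what reflist contains.
reflist = ["Smith, J. (1999) Some title", "2004 annual report", "no year here"]

year_re = re.compile(r'[12]\d{3}')

# re.match only tests the pattern at the start of each string.
matched = [piece for piece in reflist if year_re.match(piece)]

# re.search scans the whole string for the first match.
searched = [piece for piece in reflist if year_re.search(piece)]
```

With this sample data, `matched` holds only the string that begins with a year, while `searched` also picks up the one with the year in parentheses.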
While both '\d{4}' and r'[12]\d{3}' will match 4 consecutive digits, they will also match the first 4 digits of a larger number like 199999.
To get an occurrence of a year like the OP's example of 1999, wrap the expression with \s, which matches whitespace characters:
r'\s[12]\d{3}\s'
To add on to the answers above: r'\s[12]\d{3}\s' misses the edge case where the 4-digit number appears at the end of the string (because there is no space after the number). To capture those cases, use:
re.search(r'\s[12]\d{3}\s|\s[12]\d{3}$', piece)
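Note that the whitespace-based patterns still require a whitespace character before the number, so a year at the very start of the string, or one wrapped in parentheses, is not found. One possible alternative, not taken from the answers above, is to use lookarounds that only require the four digits not to be adjacent to another digit. The helper name `find_year` is hypothetical, and this is a sketch rather than a definitive solution.

```python
import re

# Assumed alternative: (?<!\d) and (?!\d) reject matches that have a digit
# immediately before or after, without requiring surrounding whitespace.
year_re = re.compile(r'(?<!\d)[12]\d{3}(?!\d)')

def find_year(piece):
    """Return the first standalone 4-digit year in piece, or None."""
    m = year_re.search(piece)
    return m.group(0) if m else None
```

This treats "1999" as a year in "published in 1999", "(1999)" and "1999 was a good year", but not inside a longer number such as 199999. It would also accept digits glued to letters (for example "x2004y"), which may or may not be what you want.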