Match dates using Python regular expressions
Question:
I want to match dates that have the following format:
2010-08-27,
2010/08/27
Right now I don't care whether the date is actually valid, only that it is in the correct format.
Please tell me the regular expression for this.
Thanks
Answers:
You can use the datetime module to parse dates:
import datetime
print datetime.datetime.strptime('2010-08-27', '%Y-%m-%d')
print datetime.datetime.strptime('2010-15-27', '%Y-%m-%d')
output:
2010-08-27 00:00:00
Traceback (most recent call last):
  File "./x.py", line 6, in <module>
    print datetime.datetime.strptime('2010-15-27', '%Y-%m-%d')
  File "/usr/lib/python2.7/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data '2010-15-27' does not match format '%Y-%m-%d'
So catching ValueError will tell you if the date matches:
def valid_date(datestring):
    try:
        datetime.datetime.strptime(datestring, '%Y-%m-%d')
        return True
    except ValueError:
        return False
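To allow for various formats you could either test for all possibilities, or use re to parse out the fields first:
import datetime
import re
def valid_date(datestring):
    try:
        # day, month, year separated by '/', '.' or '-' (e.g. 27-08-2010)
        mat = re.match(r'(\d{2})[/.-](\d{2})[/.-](\d{4})$', datestring)
        if mat is not None:
            # reverse the groups to (year, month, day); datetime raises
            # ValueError if the values do not form a real date
            datetime.datetime(*(map(int, mat.groups()[-1::-1])))
            return True
    except ValueError:
        pass
    return False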
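The other option mentioned above, testing for all possibilities, might look roughly like this. It is only a sketch: the DATE_FORMATS tuple and the function name matches_any_format are made up for illustration, and the two formats are the ones from the question.

```python
import datetime

# Assumed list of accepted layouts (hypothetical name); extend as needed.
DATE_FORMATS = ('%Y-%m-%d', '%Y/%m/%d')

def matches_any_format(datestring, formats=DATE_FORMATS):
    """Return True if datestring parses under at least one format."""
    for fmt in formats:
        try:
            datetime.datetime.strptime(datestring, fmt)
            return True
        except ValueError:
            # this layout did not fit, try the next one
            continue
    return False

print(matches_any_format('2010-08-27'))  # True
print(matches_any_format('2010/08/27'))  # True
print(matches_any_format('2010-15-27'))  # False, there is no month 15
print(matches_any_format('2010/08-27'))  # False, mixed separators
```

Note that strptime also accepts fields without zero padding (for example '2010-8-27'), so this checks that the date is real rather than that the string has an exact shape.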
Use the datetime module. Here is a regex for the sake of knowledge, although you shouldn't use it:
r'\d{4}[-/]\d{2}[-/]\d{2}'
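If you only care about the shape of the string, as the question says, a format-only check could be sketched as follows. The names DATE_FORMAT_RE and looks_like_date are hypothetical; re.fullmatch is used so that the whole string has to match, not just a part of it.

```python
import re

# four digits, a separator, two digits, a separator, two digits
DATE_FORMAT_RE = re.compile(r'\d{4}[-/]\d{2}[-/]\d{2}')

def looks_like_date(text):
    """Return True if the whole string has the YYYY-MM-DD or YYYY/MM/DD shape."""
    return DATE_FORMAT_RE.fullmatch(text) is not None

print(looks_like_date('2010-08-27'))   # True
print(looks_like_date('2010/08/27'))   # True
print(looks_like_date('2010-99-99'))   # True, only the shape is checked
print(looks_like_date('2010/08-27'))   # True, mixed separators are allowed
print(looks_like_date('2010-8-27'))    # False, month is not two digits
print(looks_like_date('x2010-08-27'))  # False, extra leading character
```

This accepts impossible dates such as 2010-99-99, which is why the datetime approach is usually the safer choice once validity matters.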
You can use this code:
import re
# regular expression to match dates in format: 2010-08-27 and 2010/08/27
# date_reg_exp = re.compile(r'(\d+[-/]\d+[-/]\d+)')
Updated regular expression below:
# regular expression to match dates in format: 2010-08-27 and 2010/08/27
# and with mixed separators 2010/08-27
# date_reg_exp = re.compile(r'\d{4}[-/]\d{2}[-/]\d{2}')
# if separators should not be mixed use backreference:
date_reg_exp = re.compile(r'\d{4}(?P<sep>[-/])\d{2}(?P=sep)\d{2}')
# a string to test the regular expression above
test_str= """
fsf2010/08/27sdfsdfsd
dsf sfds f2010/08/26 fsdf
asdsds 2009-02-02 afdf
"""
# the pattern contains a group, so findall() would return only the
# separators; finditer() yields match objects for the full dates
matches = date_reg_exp.finditer(test_str)
# iterates the matches and prints each matched date
for match in matches:
    print(match.group(0))
The dateutil package has a fairly smart date parser. It handles a wide range of date formats.
http://pypi.python.org/pypi/python-dateutil
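A minimal sketch of how that might be used, assuming the third-party python-dateutil package is installed. The wrapper name parse_date is made up for illustration; dateutil.parser.parse is the library's parsing function.

```python
# Requires the third-party package: pip install python-dateutil
from dateutil import parser

def parse_date(text):
    """Return a datetime for text, or None if dateutil cannot parse it."""
    try:
        return parser.parse(text)
    except (ValueError, OverflowError):
        # dateutil raises ValueError (or a subclass) for unparseable input
        return None

for s in ('2010-08-27', '2010/08/27', 'not a date'):
    print(s, '->', parse_date(s))
```

Keep in mind that dateutil is lenient and guesses at many layouts, so it is better suited to reading dates than to checking that a string follows one exact format.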
Use this:
import re
test_str = '''
fsf2010/08/27sdfsdfsd
dsf sfds f2010/08/26 fsdf
asdsds 2009-02-02 afdf
'''
date_regex = re.compile(r'\d{4}[/.-]\d{2}[/.-]\d{2}')
for match in date_regex.findall(test_str):
    print(match)
output:
2010/08/27
2010/08/26
2009-02-02