Check if string has date, any format

Question:

How do I check if a string can be parsed to a date?

  • Jan 19, 1990
  • January 19, 1990
  • Jan 19,1990
  • 01/19/1990
  • 01/19/90
  • 1990
  • Jan 1990
  • January1990

These are all valid dates. If the missing spaces in item #3 (after the comma) and in the last item (between the month and the year) are a concern, they can easily be fixed, if needed, by automatically inserting a space between letters or punctuation and digits.
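A minimal sketch of that normalization using Python's re module. The helper name add_spaces is hypothetical, and it assumes the only missing spaces are between a letter and a digit or directly after a comma:

```python
import re


def add_spaces(s):
    """Hypothetical helper: add the spaces missing from strings like "January1990"."""
    # zero-width match between a letter and a digit, in either order
    s = re.sub(r'(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])', ' ', s)
    # a comma directly followed by a non-space character gets a space after it
    s = re.sub(r',(?=\S)', ', ', s)
    return s


print(add_spaces("January1990"))  # January 1990
print(add_spaces("Jan 19,1990"))  # Jan 19, 1990
```

Strings that already contain the spaces, such as "Jan 19, 1990" or "01/19/90", pass through unchanged.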

But first, the basics:

I tried putting it in an if statement:

if datetime.strptime(item, '%Y') or datetime.strptime(item, '%b %d %y') or datetime.strptime(item, '%b %d %Y')  or datetime.strptime(item, '%B %d %y') or datetime.strptime(item, '%B %d %Y'):

But that’s inside a try-except block, and it keeps failing with something like this:

16343 time data 'JUNE1890' does not match format '%Y'

unless the string meets the first condition in the if statement.

To clarify, I don’t actually need the value of the date; I just want to know whether the string is a date. Ideally, it would’ve been something like this:

if item is date:
    print date
else:
    print "Not a date"

Is there any way to do this?

Asked By: zack_falcon
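The error comes from how the `or` chain is evaluated: datetime.strptime never returns a falsy value. It either returns a datetime or raises ValueError, so the first format that does not match raises out of the whole expression before the remaining formats are tried. A minimal sketch of a fix, wrapping each attempt so that a failure becomes None (the helper names are hypothetical; the formats are the ones from the question):

```python
from datetime import datetime

# Formats taken from the question's if statement
FORMATS = ('%Y', '%b %d %y', '%b %d %Y', '%B %d %y', '%B %d %Y')


def try_strptime(item, fmt):
    """Return the parsed datetime, or None if item does not match fmt."""
    try:
        return datetime.strptime(item, fmt)
    except ValueError:
        return None


def matches_any(item, formats=FORMATS):
    # any() stops at the first format that succeeds, like the original `or`
    # chain, but a non-matching format now yields None instead of raising
    return any(try_strptime(item, fmt) is not None for fmt in formats)


if matches_any("Jan 19 1990"):
    print("Jan 19 1990")
else:
    print("Not a date")
```

Note that none of these formats contain a comma, so "Jan 19, 1990" still returns False until a format such as '%b %d, %Y' is added to the tuple.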


Answers:

The parse function in dateutil.parser can parse many date string formats into a datetime object.

If you simply want to know whether a particular string could represent or contain a valid date, you could try the following simple function:

from dateutil.parser import parse

def is_date(string, fuzzy=False):
    """
    Return whether the string can be interpreted as a date.

    :param string: str, string to check for date
    :param fuzzy: bool, ignore unknown tokens in string if True
    """
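    try:
        parse(string, fuzzy=fuzzy)
        return True

    # dateutil raises ValueError (ParserError, a subclass, in newer versions)
    # for unparseable strings, and OverflowError for absurdly large numbers
    except (ValueError, OverflowError):
        return False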

Then you have:

>>> is_date("1990-12-1")
True
>>> is_date("2005/3")
True
>>> is_date("Jan 19, 1990")
True
>>> is_date("today is 2019-03-27")
False
>>> is_date("today is 2019-03-27", fuzzy=True)
True
>>> is_date("Monday at 12:01am")
True
>>> is_date("xyz_not_a_date")
False
>>> is_date("yesterday")
False

Custom parsing

parse might recognise some strings as dates which you don’t want to treat as dates. For example:

  • Parsing "12" or "1999" returns a datetime object for the current date, with the day (for "12") or the year (for "1999") replaced by the number in the string.

  • "23, 4" and "23 4" will be parsed as datetime.datetime(2023, 4, 16, 0, 0).

  • "Friday" returns the date of the next Friday on or after the current date.
  • Similarly, "August" corresponds to the current date with the month changed to August.
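Because the missing fields are filled in from the current date, results like these change from day to day. parse also accepts a default datetime that supplies the missing fields instead, which makes the behaviour easy to see. A small sketch; the reference date of 1 January 2000 (a Saturday) is an arbitrary choice:

```python
from datetime import datetime

from dateutil.parser import parse

ref = datetime(2000, 1, 1)  # arbitrary reference date; a Saturday

print(parse("12", default=ref))      # 2000-01-12 00:00:00  (day from the string)
print(parse("1999", default=ref))    # 1999-01-01 00:00:00  (year from the string)
print(parse("August", default=ref))  # 2000-08-01 00:00:00  (month from the string)
print(parse("Friday", default=ref))  # 2000-01-07 00:00:00  (next Friday after ref)
```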

Also, parse is not locale-aware, so it does not recognise months or days of the week in languages other than English.

Both of these issues can be addressed to some extent by using a custom parserinfo class, which defines how month and day names are recognised:

from dateutil.parser import parserinfo

class CustomParserInfo(parserinfo):

    # three months in Spanish for illustration
    MONTHS = [("Enero", "Enero"), ("Feb", "Febrero"), ("Marzo", "Marzo")]

An instance of this class can then be used with parse:

>>> parse("Enero 1990")
# ValueError: Unknown string format
>>> parse("Enero 1990", parserinfo=CustomParserInfo())
datetime.datetime(1990, 1, 27, 0, 0)
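If partial strings such as "1990", "August" or "Friday" should be rejected, one heuristic is to parse the string twice with two different default dates: any year, month or day missing from the string is filled in from the default, so the two results differ. This is a sketch under the assumption that a "date" must spell out all three fields, which is stricter than the question's list; is_full_date is a hypothetical name:

```python
from datetime import datetime

from dateutil.parser import parse

# Two defaults that differ in year, month and day (day 2 exists in every month)
_DEFAULT_A = datetime(2000, 1, 1)
_DEFAULT_B = datetime(2001, 2, 2)


def is_full_date(string, fuzzy=False):
    """Hypothetical helper: True only if year, month and day all come from the string."""
    try:
        a = parse(string, default=_DEFAULT_A, fuzzy=fuzzy)
        b = parse(string, default=_DEFAULT_B, fuzzy=fuzzy)
    except (ValueError, OverflowError):
        return False
    # If any date field was taken from the default, the two results disagree
    return a.date() == b.date()


print(is_full_date("Jan 19, 1990"))  # True
print(is_full_date("Jan 1990"))      # False: the day is missing
```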
Answered By: Alex Riley

If you want to parse those particular formats, you can just match against a list of formats:

txt='''
Jan 19, 1990
January 19, 1990
Jan 19,1990
01/19/1990
01/19/90
1990
Jan 1990
January1990'''

import datetime as dt

# candidate formats, tried in order; the first one that parses wins
fmts = ('%Y', '%b %d, %Y', '%B %d, %Y', '%B %d %Y', '%m/%d/%Y', '%m/%d/%y',
        '%b %Y', '%B%Y', '%b %d,%Y')

lines = txt.strip().splitlines()  # strip() drops the empty first line
parsed = []
for e in lines:
    for fmt in fmts:
        try:
            t = dt.datetime.strptime(e, fmt)
        except ValueError:
            continue  # this format doesn't fit; try the next one
        parsed.append((e, fmt, t))
        break

# check that all the cases are handled
success = {t[0] for t in parsed}
for e in lines:
    if e not in success:
        print(e)

for t in parsed:
    print('"{:20}" => "{:20}" => {}'.format(*t))

Prints:

"Jan 19, 1990        " => "%b %d, %Y           " => 1990-01-19 00:00:00
"January 19, 1990    " => "%B %d, %Y           " => 1990-01-19 00:00:00
"Jan 19,1990         " => "%b %d,%Y            " => 1990-01-19 00:00:00
"01/19/1990          " => "%m/%d/%Y            " => 1990-01-19 00:00:00
"01/19/90            " => "%m/%d/%y            " => 1990-01-19 00:00:00
"1990                " => "%Y                  " => 1990-01-01 00:00:00
"Jan 1990            " => "%b %Y               " => 1990-01-01 00:00:00
"January1990         " => "%B%Y                " => 1990-01-01 00:00:00
Answered By: dawg
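The loop in dawg’s answer can be wrapped into the boolean check the question asks for. A minimal sketch; matching_format is a hypothetical helper that reuses that answer’s format list (with the duplicate entry removed) and returns the first format that parses the string, or None:

```python
import datetime as dt

# Candidate formats from the answer above, duplicate removed
FMTS = ('%Y', '%b %d, %Y', '%B %d, %Y', '%B %d %Y', '%m/%d/%Y', '%m/%d/%y',
        '%b %Y', '%B%Y', '%b %d,%Y')


def matching_format(s, fmts=FMTS):
    """Return the first format in fmts that parses s, or None."""
    for fmt in fmts:
        try:
            dt.datetime.strptime(s, fmt)
        except ValueError:
            continue  # not this format; keep looking
        return fmt
    return None


for item in ["Jan 19,1990", "01/19/90", "JUNE1890", "hello"]:
    if matching_format(item):
        print(item)
    else:
        print("Not a date")
```

strptime matches month names case-insensitively, so "JUNE1890" from the question’s error message is accepted through '%B%Y'. Month names are locale-dependent, so this assumes an English locale.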

The popular Python library pandas has a built-in function, pd.to_datetime, that parses dates fairly consistently. With errors='coerce', it returns NaT (rather than raising) for strings that are not dates.

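import pandas as pd

txt='''
Jan 19, 1990
January 19, 1990
Jan 19,1990
01/19/1990
01/19/90
1990
Jan 1990
January1990
19 Jan 1990
this is not date'''

for s in txt.strip().split('\n'):
    # commas become spaces, so "Jan 19,1990" is read as "Jan 19 1990"
    dt = pd.to_datetime(s.replace(',', ' '), errors='coerce')
    # NaT compares unequal to itself, so dt == dt is False for non-dates
    print(dt, dt == dt)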
    
# 1990-01-19 00:00:00 True
# 1990-01-19 00:00:00 True
# 1990-01-19 00:00:00 True
# 1990-01-19 00:00:00 True
# 1990-01-19 00:00:00 True
# 1990-01-01 00:00:00 True
# 1990-01-01 00:00:00 True
# 1990-01-01 00:00:00 True
# 1990-01-19 00:00:00 True
# NaT False
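Instead of the dt == dt trick, pd.isna states the intent directly. A minimal sketch that wraps the loop body into a boolean helper; the name is_date_pd is hypothetical:

```python
import pandas as pd


def is_date_pd(s):
    """Sketch: True if pandas can parse s as a date, False otherwise."""
    # errors='coerce' turns an unparseable string into NaT instead of raising
    ts = pd.to_datetime(s.replace(',', ' '), errors='coerce')
    return not pd.isna(ts)


print(is_date_pd("Jan 19,1990"))       # True
print(is_date_pd("this is not date"))  # False
```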

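A nice thing about pd.to_datetime is that it is vectorized, so the whole list can be passed in one call. In pandas 2.0 and later, also pass format='mixed'; otherwise the format inferred from the first element is applied to all of them, and with errors='coerce' any element in a different format silently becomes NaT.

converted = pd.to_datetime([s.replace(',', ' ') for s in txt.strip().split('\n')], format='mixed', errors='coerce')

To get a boolean mask, call notna() on the result. Since a list input gives back a DatetimeIndex, notna() returns a NumPy boolean array; pass a pd.Series instead to get a boolean Series.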

converted.notna()
Answered By: cottontail