Parse_dates in Pandas

Question:

The following code can’t parse my date column into dates from csv file.

data=pd.read_csv('c:/data.csv',parse_dates=True,keep_date_col = True) 

or

data=pd.read_csv('c:/data.csv',parse_dates=[0]) 

data is like following

date          value 
30MAR1990    140000 
30JUN1990    30000  
30SEP1990    120000  
30DEC1990    34555

What did I do wrong? Please help!

Thanks.

Asked By: user3576212

||

Answers:

You can use the date_parser argument to read_csv

In [62]: from pandas.compat import StringIO

In [63]: s = """date,value 
30MAR1990,140000 
30JUN1990,30000  
30SEP1990,120000  
30DEC1990,34555
"""

In [64]: from pandas.compat import StringIO

In [65]: import datetime

date_parser expects a function that will be called on an array of strings. func calls datetime.datetime.strptime on each string. Check out the datetime module in the python docs for more on the format codes.

In [66]: func = lambda dates: [datetime.datetime.strptime(x, '%d%b%Y') for x in dates]

In [67]: s = """date,value 
30MAR1990,140000 
30JUN1990,30000  
30SEP1990,120000  
30DEC1990,34555
"""

In [68]: pd.read_csv(StringIO(s), parse_dates=['date'], date_parser=func)
Out[68]: 
        date  value 
0 1990-03-30  140000
1 1990-06-30   30000
2 1990-09-30  120000
3 1990-12-30   34555

[4 rows x 2 columns]
Answered By: TomAugspurger

This is a non-standard format, so not caught by the default parser, you can pass your own:

In [11]: import datetime as dt

In [12]: dt.datetime.strptime('30MAR1990', '%d%b%Y')
Out[12]: datetime.datetime(1990, 3, 30, 0, 0)

In [13]: parser = lambda date: pd.datetime.strptime(date, '%d%b%Y')

In [14]: pd.read_csv(StringIO(s), parse_dates=[0], date_parser=parser)
Out[14]:
        date  value
0 1990-03-30  140000
1 1990-06-30   30000
2 1990-09-30  120000
3 1990-12-30   34555

Another option is to use to_datetime after you’ve read in the strings:

df['date'] = pd.to_datetime(df['date'], format='%d%b%Y')
Answered By: Andy Hayden

data=pd.read_csv(‘c:/data.csv’,parse_dates=True,index_col=’Date’)

Answered By: mustang 91

30MAR1990 is in standard format and it can be caught by a default parser if you tell it explicitly the name of the column:

Neither of these will work:

data=pd.read_csv('c:/data.csv',parse_dates=[0]) 
data=pd.read_csv('c:/data.csv',parse_dates=True,keep_date_col = True) 

But if you parse date explicitly it will, as expected.

df = pd.read_csv('c:/data.csv', parse_dates=['date'])

Result:

date                value
1990-03-30 00:00:00 140000
1990-06-30 00:00:00 30000
1990-09-30 00:00:00 120000
1990-12-30 00:00:00 34555

There is no need for writing your own parser!
Simple & elegant!

Answered By: Marcel
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.