Parse_dates in Pandas
Question:
The following code can’t parse my date column into dates from csv file.
data=pd.read_csv('c:/data.csv',parse_dates=True,keep_date_col = True)
or
data=pd.read_csv('c:/data.csv',parse_dates=[0])
data is like following
date value
30MAR1990 140000
30JUN1990 30000
30SEP1990 120000
30DEC1990 34555
What did I do wrong? Please help!
Thanks.
Answers:
You can use the date_parser
argument to read_csv
In [62]: from pandas.compat import StringIO
In [63]: s = """date,value
30MAR1990,140000
30JUN1990,30000
30SEP1990,120000
30DEC1990,34555
"""
In [64]: from pandas.compat import StringIO
In [65]: import datetime
date_parser
expects a function that will be called on an array of strings. func
calls datetime.datetime.strptime
on each string. Check out the datetime
module in the python docs for more on the format codes.
In [66]: func = lambda dates: [datetime.datetime.strptime(x, '%d%b%Y') for x in dates]
In [67]: s = """date,value
30MAR1990,140000
30JUN1990,30000
30SEP1990,120000
30DEC1990,34555
"""
In [68]: pd.read_csv(StringIO(s), parse_dates=['date'], date_parser=func)
Out[68]:
date value
0 1990-03-30 140000
1 1990-06-30 30000
2 1990-09-30 120000
3 1990-12-30 34555
[4 rows x 2 columns]
This is a non-standard format, so not caught by the default parser, you can pass your own:
In [11]: import datetime as dt
In [12]: dt.datetime.strptime('30MAR1990', '%d%b%Y')
Out[12]: datetime.datetime(1990, 3, 30, 0, 0)
In [13]: parser = lambda date: pd.datetime.strptime(date, '%d%b%Y')
In [14]: pd.read_csv(StringIO(s), parse_dates=[0], date_parser=parser)
Out[14]:
date value
0 1990-03-30 140000
1 1990-06-30 30000
2 1990-09-30 120000
3 1990-12-30 34555
Another option is to use to_datetime after you’ve read in the strings:
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y')
data=pd.read_csv(‘c:/data.csv’,parse_dates=True,index_col=’Date’)
30MAR1990
is in standard format and it can be caught by a default parser if you tell it explicitly the name of the column:
Neither of these will work:
data=pd.read_csv('c:/data.csv',parse_dates=[0])
data=pd.read_csv('c:/data.csv',parse_dates=True,keep_date_col = True)
But if you parse date
explicitly it will, as expected.
df = pd.read_csv('c:/data.csv', parse_dates=['date'])
Result:
date value
1990-03-30 00:00:00 140000
1990-06-30 00:00:00 30000
1990-09-30 00:00:00 120000
1990-12-30 00:00:00 34555
There is no need for writing your own parser!
Simple & elegant!
The following code can’t parse my date column into dates from csv file.
data=pd.read_csv('c:/data.csv',parse_dates=True,keep_date_col = True)
or
data=pd.read_csv('c:/data.csv',parse_dates=[0])
data is like following
date value
30MAR1990 140000
30JUN1990 30000
30SEP1990 120000
30DEC1990 34555
What did I do wrong? Please help!
Thanks.
You can use the date_parser
argument to read_csv
In [62]: from pandas.compat import StringIO
In [63]: s = """date,value
30MAR1990,140000
30JUN1990,30000
30SEP1990,120000
30DEC1990,34555
"""
In [64]: from pandas.compat import StringIO
In [65]: import datetime
date_parser
expects a function that will be called on an array of strings. func
calls datetime.datetime.strptime
on each string. Check out the datetime
module in the python docs for more on the format codes.
In [66]: func = lambda dates: [datetime.datetime.strptime(x, '%d%b%Y') for x in dates]
In [67]: s = """date,value
30MAR1990,140000
30JUN1990,30000
30SEP1990,120000
30DEC1990,34555
"""
In [68]: pd.read_csv(StringIO(s), parse_dates=['date'], date_parser=func)
Out[68]:
date value
0 1990-03-30 140000
1 1990-06-30 30000
2 1990-09-30 120000
3 1990-12-30 34555
[4 rows x 2 columns]
This is a non-standard format, so not caught by the default parser, you can pass your own:
In [11]: import datetime as dt
In [12]: dt.datetime.strptime('30MAR1990', '%d%b%Y')
Out[12]: datetime.datetime(1990, 3, 30, 0, 0)
In [13]: parser = lambda date: pd.datetime.strptime(date, '%d%b%Y')
In [14]: pd.read_csv(StringIO(s), parse_dates=[0], date_parser=parser)
Out[14]:
date value
0 1990-03-30 140000
1 1990-06-30 30000
2 1990-09-30 120000
3 1990-12-30 34555
Another option is to use to_datetime after you’ve read in the strings:
df['date'] = pd.to_datetime(df['date'], format='%d%b%Y')
data=pd.read_csv(‘c:/data.csv’,parse_dates=True,index_col=’Date’)
30MAR1990
is in standard format and it can be caught by a default parser if you tell it explicitly the name of the column:
Neither of these will work:
data=pd.read_csv('c:/data.csv',parse_dates=[0])
data=pd.read_csv('c:/data.csv',parse_dates=True,keep_date_col = True)
But if you parse date
explicitly it will, as expected.
df = pd.read_csv('c:/data.csv', parse_dates=['date'])
Result:
date value
1990-03-30 00:00:00 140000
1990-06-30 00:00:00 30000
1990-09-30 00:00:00 120000
1990-12-30 00:00:00 34555
There is no need for writing your own parser!
Simple & elegant!