How to specify time zone information when reading a csv with Pandas

Question:

I have a csv file with a timestamp given in CAT (Central African Time). When I read it in as a pandas dataframe using:

df = pd.read_csv(path, parse_dates=["timestamp"], dayfirst=True)

I get a warning:

C:\Users\..\lib\site-packages\dateutil\parser\_parser.py:1218: UnknownTimezoneWarning: tzname CAT identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
  category=UnknownTimezoneWarning)

This seems to indicate that I need to pass a tzinfos parameter, but as far as I can see it's not listed as an option for read_csv in the pandas documentation. I tried both of:

df = pd.read_csv(path, parse_dates=["timestamp"], dayfirst=True, tzinfos={"CAT": "Etc/GMT+2"})
df = pd.read_csv(path, parse_dates=["timestamp"], dayfirst=True, tzinfos= "Etc/GMT+2")

but I keep getting an error:

TypeError: read_csv() got an unexpected keyword argument 'tzinfos'

For now it's only a warning, and the column is still read in as timezone-naive data, to which I can add the time zone info afterwards with df.timestamp.dt.tz_localize("Etc/GMT+2"). However, the warning says "In a future version, this will raise an exception", which makes me think my code will break later, so I would prefer to fix it now.
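
One caveat about the tz_localize workaround, shown here as a minimal sketch: the POSIX-style "Etc/GMT+N" names have their signs inverted, so "Etc/GMT+2" is UTC-02:00, while CAT (UTC+02:00) corresponds to "Etc/GMT-2" or a named zone such as "Africa/Maputo". The two naive timestamps below are made-up stand-ins for the parsed column.

```python
import pandas as pd

# Hypothetical naive timestamps standing in for the parsed "timestamp" column.
naive = pd.Series(pd.to_datetime(["2017-04-01 00:00:00", "2017-04-01 01:00:00"]))

# POSIX-style names invert the sign: "Etc/GMT+2" is two hours BEHIND UTC.
plus_two = naive.dt.tz_localize("Etc/GMT+2")
# "Etc/GMT-2" is two hours AHEAD of UTC, which matches CAT.
minus_two = naive.dt.tz_localize("Etc/GMT-2")
# A named zone that observes CAT year-round.
maputo = naive.dt.tz_localize("Africa/Maputo")

for label, series in [("Etc/GMT+2", plus_two), ("Etc/GMT-2", minus_two), ("Africa/Maputo", maputo)]:
    hours = series.iloc[0].utcoffset().total_seconds() / 3600
    print(label, "-> UTC offset in hours:", hours)
```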

I tried googling for a solution but all the results seem to do with general datetime conversions, not reading in a csv (I couldn’t figure out how the results translate).

Example of the data

Asked By: 164_user


Answers:

tzinfos is an argument for dateutil’s parser. It cannot be supplied to pd.read_csv (or pd.to_datetime) directly, afaik.

Instead, you can read the csv without parsing the dates, import dateutil's parser, and apply parser.parse to the column with the tzinfos keyword argument. For example:

import pandas as pd
from dateutil import parser, tz

s = pd.Series(["01-Apr-17 12:00:00 AM CAT"])

# use tzfile Africa/Maputo for CAT:
s = s.apply(parser.parse, tzinfos={"CAT": tz.gettz("Africa/Maputo")})

print(s)
# 0   2017-04-01 00:00:00+02:00
# dtype: datetime64[ns, tzfile('/usr/share/zoneinfo/Africa/Maputo')]
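
Applied to an actual csv, the same idea might look like the sketch below. The in-memory csv text, the "value" column, and the row contents are assumptions made up for illustration; only the "timestamp" column name and dayfirst=True come from the question.

```python
import io

import pandas as pd
from dateutil import parser, tz

# Hypothetical csv contents; in practice this would be the file at `path`.
csv_text = (
    "timestamp,value\n"
    "01-Apr-17 12:00:00 AM CAT,1\n"
    "01-Apr-17 01:00:00 AM CAT,2\n"
)

# Read the timestamp column as plain strings (no parse_dates),
# so that dateutil is not called without tzinfos.
df = pd.read_csv(io.StringIO(csv_text))

# Tell dateutil what the abbreviation "CAT" means.
tzinfos = {"CAT": tz.gettz("Africa/Maputo")}

# Extra keyword arguments to Series.apply are forwarded to parser.parse.
df["timestamp"] = df["timestamp"].apply(parser.parse, dayfirst=True, tzinfos=tzinfos)

print(df)
```

Parsing row by row with dateutil is slower than pandas' vectorized parsing, so this is mainly attractive for small to medium files.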
Answered By: FObersteiner

A one-liner that imports only pandas:

import pandas as pd

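# date_parser is only applied to columns listed in parse_dates; it is deprecated since pandas 2.0.
# Replace 'US/Eastern' with the zone of your data, e.g. 'Africa/Maputo' for CAT.
df = pd.read_csv('your_data.csv', parse_dates=['timestamp'], date_parser=lambda x: pd.to_datetime(x).tz_localize('US/Eastern'))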
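
Because date_parser is deprecated as of pandas 2.0, a pandas-only alternative is to read the column as strings, strip the abbreviation, parse with an explicit format, and then localize. This is only a sketch: the sample rows, the "value" column, and the format string "%d-%b-%y %I:%M:%S %p" are assumptions based on the sample value "01-Apr-17 12:00:00 AM CAT" and may need adjusting for the real file.

```python
import io

import pandas as pd

# Hypothetical csv contents; in practice pass the real file path to read_csv.
csv_text = (
    "timestamp,value\n"
    "01-Apr-17 12:00:00 AM CAT,1\n"
    "01-Apr-17 01:00:00 AM CAT,2\n"
)

# Read the timestamps as plain strings.
df = pd.read_csv(io.StringIO(csv_text))

# Drop the trailing abbreviation so no time zone name has to be guessed.
stripped = df["timestamp"].str.replace(" CAT", "", regex=False)

# Parse with an explicit format (assumed here), then attach the zone.
naive = pd.to_datetime(stripped, format="%d-%b-%y %I:%M:%S %p")
df["timestamp"] = naive.dt.tz_localize("Africa/Maputo")

print(df["timestamp"])
```

This only works when every row carries the same abbreviation; a file that mixes several zones would need a mapping per abbreviation instead.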
Answered By: user3673