Creating pandas DatetimeIndex in Dataframe from DST aware datetime objects

Question:

From an online API I gather a series of data points, each with a value and an ISO timestamp. Unfortunately I need to loop over them, so I store them in a temporary list of dicts, create a pandas DataFrame from that, and set the index to the timestamp column (simplified example):

from datetime import datetime
import pandas


input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]

data = []
for timestamp in input_data:
    _date = datetime.fromisoformat(timestamp)

    data.append({'time': _date})

pd_data = pandas.DataFrame(data).set_index('time')

As long as all timestamps share the same UTC offset (same timezone and same DST/non-DST state), everything works fine and I get a DataFrame with a DatetimeIndex which I can work on later.
However, once two different offsets appear in one dataset (as in the example above), my DataFrame only gets a plain Index, which does not support any time-based methods.

Is there any way to make pandas accept timezone-aware datetimes with differing offsets as an index?

Asked By: Timm


Answers:

I’m not aware of a way to use datetimes with differing UTC offsets as the index and still get a DatetimeIndex in pandas. I do have a suggestion that might help, depending on what you need from your data.

Would it be acceptable to convert the datetime objects to the same timezone, or must the original timezone information be retained? If you need the timezone but not necessarily in the index, you can store the original offset in a new column while looping through the dates, or keep a copy of the original timestamp in a new column, so it remains accessible.
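
A minimal sketch of that idea, assuming UTC is an acceptable common timezone. The column name utc_offset and the final pd.to_datetime call are my own additions for illustration, not something from the question:

```python
from datetime import datetime, timezone

import pandas as pd

input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]

rows = []
for timestamp in input_data:
    parsed = datetime.fromisoformat(timestamp)
    rows.append({
        # same instant, expressed in UTC so every row shares one offset
        'time': parsed.astimezone(timezone.utc),
        # original UTC offset kept as a plain string (hypothetical column name)
        'utc_offset': timestamp[-6:],
    })

pd_data = pd.DataFrame(rows)
# make the dtype explicit instead of relying on inference
pd_data['time'] = pd.to_datetime(pd_data['time'], utc=True)
pd_data = pd_data.set_index('time')

print(type(pd_data.index).__name__)
print(pd_data)
```

The index then holds a single timezone, while the offset each timestamp arrived with stays available in its own column.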

Answered By: Bradon Lodwick

  • A pandas datetime column also requires a single offset. A column with mixed offsets will not be converted to a datetime dtype.
  • I suggest not converting the data to a datetime until it’s in pandas.
  • Separate the UTC offset and treat it as a timedelta.
  • to_timedelta requires the format 'hh:mm:ss', so append ':00' to the offset.
  • See Pandas: Time deltas for all the available timedelta operations.
  • pandas.Series.dt.tz_convert
  • pandas.Series.dt.tz_localize
  • Convert to a specific TZ with:
    • If the column is timezone-naive (dtype datetime64[ns] rather than datetime64[ns, UTC]), first use .dt.tz_localize('UTC'), then .dt.tz_convert('US/Pacific')
    • Otherwise df.datetime_utc.dt.tz_convert('US/Pacific')

import pandas as pd

# sample data
input_data = ['2019-09-16T06:44:01+02:00', '2019-11-11T09:13:01+01:00']

# dataframe
df = pd.DataFrame(input_data, columns=['datetime'])

# separate the offset from the datetime and convert it to a timedelta
df['offset'] = pd.to_timedelta(df.datetime.str[-6:] + ':00')

# if desired, create a str with the separated datetime (naive, no offset)
# localizing these naive values to a time zone can raise AmbiguousTimeError for wall-clock times that repeat when DST ends
df['datetime_str'] = df.datetime.str[:-6]

# parse the datetime column into timezone-aware datetimes normalized to UTC
df['datetime_utc'] = pd.to_datetime(df.datetime, utc=True)

# display(df)
                    datetime          offset        datetime_str              datetime_utc
0  2019-09-16T06:44:01+02:00 0 days 02:00:00 2019-09-16 06:44:01 2019-09-16 04:44:01+00:00
1  2019-11-11T09:13:01+01:00 0 days 01:00:00 2019-11-11 09:13:01 2019-11-11 08:13:01+00:00

print(df.info())
[out]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype              
---  ------        --------------  -----              
 0   datetime      2 non-null      object             
 1   offset        2 non-null      timedelta64[ns]    
 2   datetime_str  2 non-null      object             
 3   datetime_utc  2 non-null      datetime64[ns, UTC]
dtypes: datetime64[ns, UTC](1), object(2), timedelta64[ns](1)
memory usage: 192.0+ bytes

# convert to local timezone
df.datetime_utc.dt.tz_convert('US/Pacific')

[out]:
0   2019-09-15 21:44:01-07:00
1   2019-11-11 00:13:01-08:00
Name: datetime_utc, dtype: datetime64[ns, US/Pacific]
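
For the other case from the list above, a timezone-naive column, here is a small sketch. The sample values are hypothetical and are assumed to already represent UTC wall-clock times:

```python
import pandas as pd

# hypothetical naive timestamps that are known to be in UTC
naive = pd.Series(pd.to_datetime(['2019-09-16 04:44:01', '2019-11-11 08:13:01']))

# tz_convert on a naive Series raises a TypeError, so attach UTC first
localized = naive.dt.tz_localize('UTC')

# now the values can be converted to another time zone
pacific = localized.dt.tz_convert('US/Pacific')
print(pacific)
```

tz_localize only attaches a time zone to the existing wall-clock values, whereas tz_convert shifts them to another zone while keeping the same instant.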


Answered By: Trenton McKinney

A minor correction to the question’s wording, which I think is important: what you have are UTC offsets. Telling DST from non-DST would require more information than that, namely a time zone. This matters here because timestamps with UTC offsets (even differing ones) can easily be parsed to UTC:

import pandas as pd

input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]

dti = pd.to_datetime(input_data, utc=True)
# dti
# DatetimeIndex(['2019-09-16 04:44:01+00:00', '2019-11-11 08:13:01+00:00'], dtype='datetime64[ns, UTC]', freq=None)

I prefer to work with UTC, so I’d be fine with that. If, however, you need the date/time in a certain time zone, you can convert it, e.g.:

dti = dti.tz_convert('Europe/Berlin')
# dti
# DatetimeIndex(['2019-09-16 06:44:01+02:00', '2019-11-11 09:13:01+01:00'], dtype='datetime64[ns, Europe/Berlin]', freq=None)
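
To tie this back to the question, the converted DatetimeIndex can be used directly as the DataFrame index. This is only a sketch: the value column and its numbers are made up for illustration, and Europe/Berlin is assumed to be the zone the data came from:

```python
import pandas as pd

input_data = [
    '2019-09-16T06:44:01+02:00',
    '2019-11-11T09:13:01+01:00',
]
values = [1.0, 2.0]  # hypothetical measurements, one per timestamp

# parse everything to UTC first, then convert to a single time zone
index = pd.to_datetime(input_data, utc=True).tz_convert('Europe/Berlin')

df = pd.DataFrame({'value': values}, index=index)
df.index.name = 'time'

# time-based accessors and partial string indexing are available again
print(df.index.hour)
print(df.loc['2019-09'])
```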

Answered By: FObersteiner