How to remove timezone from a Timestamp column in a pandas dataframe

Question:

I read Pandas change timezone for forex DataFrame but I’d like to make the time column of my dataframe timezone naive for interoperability with an sqlite3 database.

The data in my pandas dataframe is already converted to UTC data, but I do not want to have to maintain this UTC timezone information in the database.

Given a sample of the data derived from other sources, it looks like this:

print(type(testdata))
print(testdata)
print(testdata.applymap(type))

gives:

<class 'pandas.core.frame.DataFrame'>
                        time  navd88_ft  station_id  new
0  2018-03-07 01:31:02+00:00  -0.030332          13    5
1  2018-03-07 01:21:02+00:00  -0.121653          13    5
2  2018-03-07 01:26:02+00:00  -0.072945          13    5
3  2018-03-07 01:16:02+00:00  -0.139917          13    5
4  2018-03-07 01:11:02+00:00  -0.152085          13    5
                                     time        navd88_ft     station_id  
0  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
1  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
2  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
3  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   
4  <class 'pandas._libs.tslib.Timestamp'>  <class 'float'>  <class 'int'>   

             new  
0  <class 'int'>  
1  <class 'int'>  
2  <class 'int'>  
3  <class 'int'>  
4  <class 'int'>  

but

newstamp = testdata['time'].tz_convert(None)

gives an eventual error:

TypeError: index is not a valid DatetimeIndex or PeriodIndex

What do I do to replace the column with a timezone naive timestamp?

Asked By: Dave X

||

Answers:

The column must be a datetime dtype, for example after using pd.to_datetime.
Then, you can use tz_localize to change the time zone, a naive timestamp corresponds to time zone None:

testdata['time'].dt.tz_localize(None)

Unless the column is an index (DatetimeIndex), the .dt accessor must be used to access pandas datetime functions.

Answered By: MaFF

When your data contains datetimes spanning different timezones or prior and after application of daylight saving time e.g. obtained from postges database with psycopg2, depending on pandas version you might end up in some of the scenarios where best method of conversion is:

testdata['time'].apply(lambda x: x.replace(tzinfo=None))

Scenarios when this works (note the usage of FixedOffsetTimezone with different offset) while usage of .dt.tz_localize(None) does not:

df = pd.DataFrame([
    datetime.datetime(2018, 5, 17, 21, 40, 20, 775854, 
                      tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)),
    datetime.datetime(2021, 3, 17, 14, 36, 13, 902741, 
                      tzinfo=psycopg2.tz.FixedOffsetTimezone(offset=60, name=None))
])

pd.__version__
'0.24.2'


df[0].dt.tz_localize(None)

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py", line 1861, in objects_to_datetime64ns
    values, tz_parsed = conversion.datetime_to_datetime64(data)
  File "pandas/_libs/tslibs/conversion.pyx", line 185, in pandas._libs.tslibs.conversion.datetime_to_datetime64
ValueError: Array must be all same time zone
pd.__version__
'1.1.2'


df[0].dt.tz_localize(None)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/pandas/core/generic.py", line 5132, in __getattr__
    return object.__getattribute__(self, name)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/accessor.py", line 187, in __get__
    accessor_obj = self._accessor(obj)
  File "/usr/local/lib/python3.8/site-packages/pandas/core/indexes/accessors.py", line 480, in __new__
    raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values

Answered By: Emsi

I know that you mentioned that your timestamps are already in UTC, but just to be defensive, you might as well make your code impervious to the case where timestamps (some or all of them) were in a different timezone. This doesn’t cost anything, and will be more robust:

newcol = testdata['time'].dt.tz_convert(None)

As per the docs:

A tz of None will convert to UTC and remove the timezone information.

This is safer than just dropping any timezone the timestamps may contain.

Answered By: Pierre D

Here is a function that will

  • find all columns with any instance of pd.Timestamp in them
  • convert those columns to dtype datetime (to be able to use the .dt accessor on the Series’)
  • Localize all timestamps with dt.tz_localize(None), which will keep the timeshift relative to UTC
def remove_tz_from_dataframe(df_in):
    df = df_in.copy()
    col_times = [ col for col in df.columns if any([isinstance(x, pd.Timestamp) for x in df[col]])]
    for col in col_times:
        df[col] = pd.to_datetime(
            df[col], infer_datetime_format=True) 
        df[col] = df[col].dt.tz_localize(None) 
    return df
Answered By: kjoeter