remove timezone from timestamp column of pandas dataframe
Question:
I load the data with the timestamp column parsed as pandas datetimes:
df = pd.read_csv('test.csv', parse_dates=['timestamp'])
df
   user                  timestamp  speed
0     2  2016-04-01 01:06:26+01:00   9.76
1     2  2016-04-01 01:06:26+01:00   5.27
2     2  2016-04-01 01:06:26+01:00   8.12
3     2  2016-04-01 01:07:53+01:00   8.81
I want to remove the time zone information from the timestamp column:
df['timestamp'].tz_convert(None)
TypeError: index is not a valid DatetimeIndex or PeriodIndex
Answers:
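For this solution to work, the column must have a datetime dtype. The .dt accessor is needed because calling tz_convert or tz_localize directly on a Series operates on its index, which is what raises the TypeError above: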
df['timestamp'].dt.tz_localize(None)
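To make the difference between the two ways of dropping the time zone concrete, here is a minimal sketch. It assumes a small in-memory frame that mirrors the question's data instead of reading test.csv; the variable names naive_local and naive_utc are illustrative only.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data (built in memory, not read from test.csv).
df = pd.DataFrame({
    "user": [2, 2],
    "timestamp": pd.to_datetime(
        ["2016-04-01 01:06:26+01:00", "2016-04-01 01:07:53+01:00"]
    ),
    "speed": [9.76, 8.81],
})

# tz_localize(None) drops the offset and keeps the local wall-clock time.
naive_local = df["timestamp"].dt.tz_localize(None)

# tz_convert(None) converts to UTC first, then drops the offset.
naive_utc = df["timestamp"].dt.tz_convert(None)

print(naive_local.iloc[0])  # 2016-04-01 01:06:26
print(naive_utc.iloc[0])    # 2016-04-01 00:06:26
```

Which one is appropriate depends on whether the local time or the UTC instant should survive once the offset is gone.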
Given strings in your CSV like "2016-04-01 01:06:26+01:00", I can think of the following options:
import pandas as pd
# Option 1: parse with the offset, then drop it (keeps the local wall time).
# Will only work if *all* your timestamps contain "+hh:mm".
df = pd.read_csv('test.csv', parse_dates=['timestamp'])
df['timestamp'] = df.timestamp.dt.tz_localize(None)
print(df.timestamp.dtype)
datetime64[ns]
# Option 2: strip the "+hh:mm" suffix from the strings before parsing.
df = pd.read_csv('test.csv')
df['timestamp'] = pd.to_datetime(df.timestamp.str.split('+', expand=True)[0])
print(df.timestamp.dtype)
datetime64[ns]
# Option 3: strip the suffix while reading (date_parser is deprecated since pandas 2.0).
df = pd.read_csv('test.csv', parse_dates=['timestamp'],
                 date_parser=lambda x: pd.to_datetime(x.split('+')[0]))
print(df.timestamp.dtype)
datetime64[ns]
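If the offsets differ between rows (for example across a daylight-saving change), one possible variant is to parse everything to UTC first and then drop the time zone. This is a sketch that goes beyond the options above: the sample strings are made up, and the result holds UTC times rather than the original local wall times.

```python
import pandas as pd

# Hypothetical strings with two different UTC offsets.
raw = pd.Series([
    "2016-03-26 01:06:26+00:00",
    "2016-04-01 01:06:26+01:00",
])

# utc=True normalizes every value to UTC, giving a single tz-aware dtype.
parsed = pd.to_datetime(raw, utc=True)

# Drop the (now uniform) UTC time zone to get naive timestamps.
naive = parsed.dt.tz_localize(None)

print(naive.iloc[0])  # 2016-03-26 01:06:26
print(naive.iloc[1])  # 2016-04-01 00:06:26
```

The second value shifts by one hour because it is expressed in UTC; if the local wall time must be preserved instead, stripping the suffix from the strings before parsing is the safer route.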