Convert a Python dataframe date column to seconds
Question:
I am reading a .csv data file using pd.read_csv and I get these first 5 rows from my global dataframe (containing thousands of rows):
                  time  id  time_offset
0  2017-12-01 21:00:00   0          -60
1  2017-12-01 21:01:00   0          -59
2  2017-12-01 21:02:00   0          -58
3  2017-12-01 21:03:00   0          -57
4  2017-12-01 21:04:00   0          -56
I’m not very good at manipulating dates in Python and I haven’t found how to do this manipulation:
- create in my dataframe a new hour column from the existing time column, containing only the hours:minutes:seconds data, which should be: 21:00:00, 21:01:00, 21:02:00, etc…
- then create another column seconds from the newly created hour, containing the number of seconds elapsed since time 0, which should be: 75600 (calculated as 21×3600), 75660 (calculated as 21×3600 + 60), etc…
Any help in sorting this out would be much appreciated.
Answers:
Assigning the datetime series as the index is typically useful. pd.to_datetime() converts it to a usable format.
df.index = pd.to_datetime(df['time'])
df = df.drop('time', axis=1)
- You can use the strftime function (https://strftime.org/) to keep only the time of day:
df['time'] = df.index.strftime("%H:%M:%S")
- Since df.index[0] is the very first time, you can subtract it from the index and use the .seconds attribute:
df['seconds since'] = (df.index - df.index[0]).seconds
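One caveat with the subtraction above: .seconds is only the seconds component of each timedelta, so it starts again from 0 once a full day has passed. The sketch below uses made-up sample rows (the third one is deliberately on the next day) and hypothetical column names to compare .seconds with .total_seconds().

```python
import pandas as pd

# Hypothetical sample: the last row falls on the following day.
df = pd.DataFrame(
    {"id": [0, 0, 0]},
    index=pd.to_datetime(
        ["2017-12-01 21:00:00", "2017-12-01 21:01:00", "2017-12-02 21:01:00"]
    ),
)

# Subtracting the first timestamp gives a TimedeltaIndex.
elapsed = df.index - df.index[0]

# .seconds is only the seconds component of each timedelta (0 to 86399).
df["seconds_component"] = elapsed.seconds

# .total_seconds() also counts the whole days.
df["seconds_total"] = elapsed.total_seconds().astype(int)

print(df)
```

If the data never spans more than 24 hours, both columns hold the same values.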
You can try:
# convert `time` column to datetime (if necessary):
df["time"] = pd.to_datetime(df["time"])
df["hour"] = df["time"].dt.time
df["seconds"] = (
    df["time"].dt.hour * 60 * 60
    + df["time"].dt.minute * 60
    + df["time"].dt.second
)
print(df)
Prints:
                 time  id  time_offset      hour  seconds
0 2017-12-01 21:00:00   0          -60  21:00:00    75600
1 2017-12-01 21:01:00   0          -59  21:01:00    75660
2 2017-12-01 21:02:00   0          -58  21:02:00    75720
3 2017-12-01 21:03:00   0          -57  21:03:00    75780
4 2017-12-01 21:04:00   0          -56  21:04:00    75840
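As a possible alternative to adding up hours, minutes and seconds by hand, you could subtract midnight of the same day from each timestamp. This is only a sketch on a small made-up frame; the name midnight is illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["2017-12-01 21:00:00", "2017-12-01 21:01:00", "2017-12-01 21:02:00"]
        )
    }
)

# dt.normalize() resets each timestamp to 00:00:00 of the same day.
midnight = df["time"].dt.normalize()

# The difference is a timedelta; total_seconds() turns it into a number.
df["seconds"] = (df["time"] - midnight).dt.total_seconds().astype(int)

print(df)
```

For timestamps without fractional seconds this should give the same values as the hour/minute/second arithmetic.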
Example
data = {'time': {0: '2017-12-01 21:00:00', 1: '2017-12-01 21:01:00', 2: '2017-12-01 21:02:00',
3: '2017-12-01 21:03:00', 4: '2017-12-01 21:04:00'},
'id': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0},
'time_offset': {0: -60, 1: -59, 2: -58, 3: -57, 4: -56}}
df = pd.DataFrame(data)
df
                  time  id  time_offset
0  2017-12-01 21:00:00   0          -60
1  2017-12-01 21:01:00   0          -59
2  2017-12-01 21:02:00   0          -58
3  2017-12-01 21:03:00   0          -57
4  2017-12-01 21:04:00   0          -56
Code
Make a timedelta and use dt.total_seconds(). In this example the time column has object dtype (strings), so the time-of-day part can be split off and converted to a timedelta in the following way:
pd.to_timedelta(df['time'].str.split(' ').str[1])
You can then convert the timedelta to seconds by using dt.total_seconds():
s = pd.to_timedelta(df['time'].str.split(' ').str[1]).dt.total_seconds()
s
0 75600.0
1 75660.0
2 75720.0
3 75780.0
4 75840.0
Name: time, dtype: float64
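The string split above relies on the time column still being text. If the column was already parsed as datetimes, for example with the parse_dates argument of pd.read_csv, the .str accessor is not available. A minimal sketch for that case, assuming a comma-separated file (the CSV text below is made up to mirror the question's data):

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = (
    "time,id,time_offset\n"
    "2017-12-01 21:00:00,0,-60\n"
    "2017-12-01 21:01:00,0,-59\n"
)

# parse_dates makes read_csv return the column as datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Format the time of day as text, then reuse the timedelta approach.
df["hour"] = df["time"].dt.strftime("%H:%M:%S")
df["seconds"] = pd.to_timedelta(df["hour"]).dt.total_seconds().astype(int)

print(df)
```

The astype(int) call is optional; without it the column stays float64, as in the output above.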