Convert timedelta64[ns] column to seconds in Python Pandas DataFrame
Question:
A pandas DataFrame column duration contains timedelta64[ns] values as shown. How can you convert them to seconds?
0 00:20:32
1 00:23:10
2 00:24:55
3 00:13:17
4 00:18:52
Name: duration, dtype: timedelta64[ns]
I tried the following
print df[:5]['duration'] / np.timedelta64(1, 's')
but got the error
Traceback (most recent call last):
File "test.py", line 16, in <module>
print df[0:5]['duration'] / np.timedelta64(1, 's')
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 130, in wrapper
"addition and subtraction, but the operator [%s] was passed" % name)
TypeError: can only operate on a timedeltas for addition and subtraction, but the operator [__div__] was passed
Also tried
print df[:5]['duration'].astype('timedelta64[s]')
but received the error
Traceback (most recent call last):
File "test.py", line 17, in <module>
print df[:5]['duration'].astype('timedelta64[s]')
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 934, in astype
values = com._astype_nansafe(self.values, dtype)
File "C:\Python27\lib\site-packages\pandas\core\common.py", line 1653, in _astype_nansafe
raise TypeError("cannot astype a timedelta from [%s] to [%s]" % (arr.dtype,dtype))
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [timedelta64[s]]
Answers:
This works properly in pandas 0.14, the current version at the time of writing, and in later releases:
In [132]: df[:5]['duration'] / np.timedelta64(1, 's')
Out[132]:
0 1232
1 1390
2 1495
3 797
4 1132
Name: duration, dtype: float64
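For readers on a recent pandas, here is a minimal sketch of the same division. The sample frame is hypothetical and only mirrors the durations shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the durations in the question.
df = pd.DataFrame({"duration": pd.to_timedelta(
    ["00:20:32", "00:23:10", "00:24:55", "00:13:17", "00:18:52"])})

# Dividing a timedelta column by a one-second timedelta yields float seconds.
seconds = df["duration"] / np.timedelta64(1, "s")
print(seconds.tolist())

# The same idea works for other units, e.g. minutes.
minutes = df["duration"] / pd.Timedelta(minutes=1)
```

Dividing by a timedelta of another size is a convenient way to get minutes or hours without extra arithmetic.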
Here is a workaround for older versions of Pandas/NumPy:
In [131]: df[:5]['duration'].values.view('<i8')/10**9
Out[131]: array([1232, 1390, 1495, 797, 1132], dtype=int64)
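timedelta64 and datetime64 data are stored internally as 8-byte ints (dtype '<i8'). So the above views the timedelta64 values as 8-byte ints holding nanoseconds and then divides by 10**9 to convert nanoseconds to seconds. Under Python 2, / on integers is integer division; under Python 3, use // to get the same whole-second result.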
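The integer-view trick can be sketched on Python 3 as follows. The sample values are hypothetical, and the explicit cast to nanosecond resolution is an added safety step, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample durations.
durations = pd.Series(pd.to_timedelta(["00:20:32", "00:23:10", "00:24:55"]))

# Pin nanosecond resolution so that dividing by 10**9 is valid.
ns_values = durations.to_numpy().astype("timedelta64[ns]")

# Reinterpret the 8-byte timedeltas as 64-bit ints (nanosecond counts),
# then floor-divide to get whole seconds.
whole_seconds = ns_values.view(np.int64) // 10**9
print(whole_seconds.tolist())
```

This drops any fractional second, so it suits cases where whole seconds are enough.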
Note that you need NumPy version 1.7 or newer to work with datetime64/timedelta64s.
I just realized this is an old thread. Anyway, I am leaving this here in case wanderers like me, who click only on the top 5 search results, end up here.
Make sure that your types are correct.
- If you want to convert a datetime to seconds, sum the seconds contributed by the hour, minute, and second fields of the datetime object. This only makes sense for a duration within a single date.
  - hours: hours x 3600 = seconds
  - minutes: minutes x 60 = seconds
  - seconds: seconds
linear_df['duration'].dt.hour*3600 + linear_df['duration'].dt.minute*60 + linear_df['duration'].dt.second
- If you want to convert a timedelta to seconds, use the one below. (On pandas versions before 2.0 this returns float seconds; from pandas 2.0 on, astype('timedelta64[s]') keeps a timedelta dtype at second resolution, so use .dt.total_seconds() there instead.)
linear_df[:5]['duration'].astype('timedelta64[s]')
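The two cases above can be sketched side by side. The frame and its column names here are hypothetical, and .dt.total_seconds() is used for the timedelta case as an alternative to astype.

```python
import pandas as pd

# Hypothetical frame: one datetime64 column and one timedelta64 column.
demo = pd.DataFrame({
    "start_dt": pd.to_datetime(["2016-12-30 17:47:33", "2016-12-31 09:33:27"]),
    "duration": pd.to_timedelta(["0 days 00:00:14", "2 days 17:44:50"]),
})

# datetime64 column: seconds elapsed since midnight of the same day.
seconds_of_day = (demo["start_dt"].dt.hour * 3600
                  + demo["start_dt"].dt.minute * 60
                  + demo["start_dt"].dt.second)

# timedelta64 column: total length in seconds, days included.
duration_seconds = demo["duration"].dt.total_seconds()

print(seconds_of_day.tolist())
print(duration_seconds.tolist())
```

The .dt.hour, .dt.minute, and .dt.second attributes exist only on datetime columns, which is why checking the dtype first matters.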
I got it to work like this:
start_dt and end_dt columns are in this format:
import datetime
linear_df[:5]['start_dt']
0 1970-02-22 21:32:48.000
1 2016-12-30 17:47:33.216
2 2016-12-31 09:33:27.931
3 2016-12-31 09:52:53.486
4 2016-12-31 10:29:44.611
Name: start_dt, dtype: datetime64[ns]
My duration was in timedelta64[ns] format, obtained by subtracting the start datetime values from the end datetime values.
linear_df['duration'] = linear_df['end_dt'] - linear_df['start_dt']
The resulting duration column looks like this:
linear_df[:5]['duration']
0 0 days 00:00:14
1 2 days 17:44:50.558000
2 0 days 15:37:28.418000
3 0 days 18:45:45.727000
4 0 days 19:21:27.159000
Name: duration, dtype: timedelta64[ns]
Using pandas, I got the duration between the two dates as seconds in float form, which makes it easier to compare or filter durations afterwards.
linear_df[:5]['duration'].astype('timedelta64[s]')
0 14.0
1 236690.0
2 56248.0
3 67545.0
4 69687.0
Name: duration, dtype: float64
In my case, I wanted to get all durations longer than 1 second.
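The answer does not show the filter itself. One possible sketch, assuming a hypothetical linear_df with a timedelta duration column, is:

```python
import pandas as pd

# Hypothetical data: one duration below one second, two above.
linear_df = pd.DataFrame({"duration": pd.to_timedelta(
    ["0 days 00:00:00.500", "0 days 00:00:14", "2 days 17:44:50.558"])})

# Option 1: convert to float seconds, then compare against a number.
seconds = linear_df["duration"].dt.total_seconds()
longer_than_1s = linear_df[seconds > 1]

# Option 2: compare the timedelta column directly with a Timedelta.
same_rows = linear_df[linear_df["duration"] > pd.Timedelta(seconds=1)]

print(longer_than_1s)
```

Both options select the same rows; the second avoids creating an intermediate float column.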
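We can simply use the pandas apply() function:
def get_seconds(time_delta):
    # .seconds is only the seconds component (0 to 86399); whole days are excluded.
    # Use time_delta.total_seconds() for the full duration.
    return time_delta.seconds

def get_microseconds(time_delta):
    # The attribute is named microseconds (no underscore).
    return time_delta.microseconds

time_delta_series = df['duration']
converted_series = time_delta_series.apply(get_seconds)
print(converted_series)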
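A short sketch of why the choice of attribute matters with apply(), using hypothetical durations where one value is longer than a day:

```python
import pandas as pd

# Hypothetical durations; the second one spans more than a day.
durations = pd.Series(pd.to_timedelta(["0 days 00:00:14", "2 days 17:44:50"]))

# .seconds ignores whole days and returns only the leftover seconds.
component = durations.apply(lambda td: td.seconds)

# .total_seconds() returns the full duration, days included.
total = durations.apply(lambda td: td.total_seconds())

print(component.tolist())
print(total.tolist())
```

For durations under one day the two agree, which is why the difference is easy to miss.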
Use the Series dt accessor to get access to the methods and attributes of a datetime (timedelta) series.
>>> s
0 -1 days +23:45:14.304000
1 -1 days +23:46:57.132000
2 -1 days +23:49:25.913000
3 -1 days +23:59:48.913000
4 00:00:00.820000
dtype: timedelta64[ns]
>>>
>>> s.dt.total_seconds()
0 -885.696
1 -782.868
2 -634.087
3 -11.087
4 0.820
dtype: float64
There are other Pandas Series Accessors for String, Categorical, and Sparse data types.
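As a rough illustration of the dt accessor on negative durations, the sketch below builds a hypothetical series from millisecond counts and compares .dt.total_seconds() with the .dt.seconds component.

```python
import pandas as pd

# Hypothetical series built from integer milliseconds: -885.696 s and 0.820 s.
s = pd.Series(pd.to_timedelta([-885696, 820], unit="ms"))

# Signed total duration in seconds, as floats.
total = s.dt.total_seconds()

# The seconds component of the "-1 days +23:45:14.304" representation.
component = s.dt.seconds

print(total.tolist())
print(component.tolist())
```

For negative values the component is counted from the start of the previous day, so only total_seconds() gives the signed length.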
Use the total_seconds() method:
df['durationSeconds'] = df['duration'].dt.total_seconds()
Well, the older answers didn't age well. Here is a simpler solution:
df.duration.dt.total_seconds()