Excel [h]:mm duration to pandas timedelta

Question:

I am importing data from an Excel worksheet where I have a ‘Duration’ field displayed in [h]:mm (so that the total number of hours is shown). I understand that underneath, this is simply number of days as a float.

I want to work with this as a timedelta column or similar in a Pandas dataframe but no matter what I do it’s dropping any hours over 24 (e.g. the days portion).

Excel data (over 24 hours highlighted):

Excel data with duration over 24 hours

Pandas import (1d 7h 51m):

     BATCH_NO             Duration
354      7154             04:36:00
465      7270             06:35:00
466      7271             08:05:00
467      7272             05:54:00
468      7273             09:10:00
472      7277             06:15:00
476      7280             10:23:00
477      7284             06:09:00
499      7313             06:46:00
503      7322             05:27:00
510      7333             14:15:00
515      7335  1900-01-01 07:51:00
516      7338             07:51:00
517      7339             09:00:00
518      7339             05:29:00
519      7339             09:00:00
520      7339             05:29:00
522      7342             12:10:00
525      7343             08:00:00
530      7346             08:25:00

Running a to_datetime conversion simply drops the day (integer) part of the column:

     BATCH_NO  Duration
354      7154  04:36:00
465      7270  06:35:00
466      7271  08:05:00
467      7272  05:54:00
468      7273  09:10:00
472      7277  06:15:00
476      7280  10:23:00
477      7284  06:09:00
499      7313  06:46:00
503      7322  05:27:00
510      7333  14:15:00
515      7335  07:51:00
516      7338  07:51:00
517      7339  09:00:00
518      7339  05:29:00
519      7339  09:00:00
520      7339  05:29:00
522      7342  12:10:00
525      7343  08:00:00
530      7346  08:25:00

I have tried importing by fixing the dtype as float, but only str or object work – dtype={'Duration': str} works.

float gives the error float() argument must be a string or a number, not 'datetime.time' and even with str or object, Python still thinks the column i a datetime.time

Ideally I do not want to change the Excel source data or export to .csv as in intermediate step.

Asked By: ChemEnger

||

Answers:

If I got it correctly, the imported objects are datetime and time with the datetime in Julian calendar.

So you must convert with a custom function:

from datetime import datetime, time, timedelta

def convert(t):
    if isinstance(t, time):
        t = datetime.combine(datetime.min, t)
    delta = t-datetime.min
    if delta.days != 0:
        delta -= timedelta(days=693594)
    return delta

df['Duration'].apply(convert)

Output:


0    0 days 04:36:00
1    0 days 06:35:00
2    0 days 08:05:00
3    0 days 05:54:00
4    0 days 09:10:00
5    0 days 06:15:00
6    0 days 10:23:00
7    0 days 06:09:00
8    0 days 06:46:00
9    0 days 05:27:00
10   0 days 14:15:00
11   1 days 07:51:00  # corrected
12   0 days 07:51:00
13   0 days 09:00:00
14   0 days 05:29:00
15   0 days 09:00:00
...
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.