Group by timestamp and get mean Dataframe

Question:

I generate a CSV file for each week, and the weekly files are then merged into one. The merged CSV, loaded into a DataFrame, looks like this:

  machineId    |       id     |     mean      |   min         |    max
  machine1     |        2     | 00:00:03.47   | 00:00:00.02   |   00:00:06.11
  machine1     |        1     | 00:00:01.30   | 00:00:00.74   |   00:00:01.86
  machine1     |        2     | 00:00:00.35   | 00:00:00.01   |   00:00:00.99
  machine1     |        2     | 00:00:01.63   | 00:00:00.67   |   00:00:02.60
  machine1     |        3     | 00:00:00.66   | 00:00:00.03   |   00:00:01.91

I then want to group the rows that share the same machineId and id and average their values. Here the first, third and fourth rows should be grouped together, and the mean of each column should be calculated.

I already used this method:

df = df.groupby(['machineId', 'id']).agg({'mean': 'mean', 'min': 'mean', 'max': 'mean'})

but there is an error:

TypeError: Could not convert 00:00:03.47 to numeric

Asked By: hubi3012

||

Answers:

To aggregate time values they must be numeric, so first convert the columns to datetimes and then to their native Unix representation (int64 nanoseconds), aggregate, and then convert back to the original format:

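import pandas as pd

cols = ['mean','min','max']

# parse the time strings as datetimes, then cast them to int64 nanoseconds since the epoch
df[cols] = df[cols].apply(pd.to_datetime).astype('int64')

# average the numeric values within each machineId/id group
df = (df.groupby(['machineId','id'], as_index=False)
        .agg({'mean': 'mean','min':'mean','max':'mean'}))

# convert the averaged nanoseconds back to datetimes and format them as time strings
df[cols] = df[cols].apply(lambda x: pd.to_datetime(x).dt.strftime('%H:%M:%S.%f'))
print(df)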

  machineId  id             mean              min              max
0  machine1   1  00:00:01.300000  00:00:00.740000  00:00:01.860000
1  machine1   2  00:00:01.816666  00:00:00.233333  00:00:03.233333
2  machine1   3  00:00:00.660000  00:00:00.030000  00:00:01.910000
Answered By: jezrael
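
A possible alternative, not part of the answer above: since the values look like durations rather than points in time, they could also be parsed with pd.to_timedelta and averaged as float seconds. This is a minimal sketch that assumes every value has the form HH:MM:SS.ff. The sample frame is rebuilt from the table in the question, and the helper format_seconds is a hypothetical name introduced only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "machineId": ["machine1"] * 5,
    "id": [2, 1, 2, 2, 3],
    "mean": ["00:00:03.47", "00:00:01.30", "00:00:00.35", "00:00:01.63", "00:00:00.66"],
    "min": ["00:00:00.02", "00:00:00.74", "00:00:00.01", "00:00:00.67", "00:00:00.03"],
    "max": ["00:00:06.11", "00:00:01.86", "00:00:00.99", "00:00:02.60", "00:00:01.91"],
})

cols = ["mean", "min", "max"]


def format_seconds(seconds):
    """Hypothetical helper: render float seconds as HH:MM:SS.ffffff."""
    total_us = int(round(float(seconds) * 1_000_000))
    whole_seconds, micros = divmod(total_us, 1_000_000)
    hours, remainder = divmod(whole_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{micros:06d}"


# Parse each column as a duration and express it as float seconds.
df[cols] = df[cols].apply(lambda s: pd.to_timedelta(s).dt.total_seconds())

# Average the numeric columns within each machineId/id group.
result = df.groupby(["machineId", "id"], as_index=False)[cols].mean()

# Optional: turn the averaged seconds back into time strings for display.
formatted = result.copy()
for col in cols:
    formatted[col] = formatted[col].map(format_seconds)

print(formatted)
```

Keeping the values as seconds avoids attaching an arbitrary calendar date to each time, and the numeric result frame can be used directly for further calculations before any formatting.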