Grouping rows with the same date in a pandas DataFrame
Question:
I have a CSV file with data from multiple sensors, like this:
YY, mm, dd, HH, MM, sensor, sensorvalue
2018, 1, 1, 00, 00, 1, 0.2
2018, 1, 1, 00, 10, 1, 0
2018, 1, 1, 00, 20, 1, 0.1
2018, 1, 1, 00, 00, 2, 90.1
2018, 1, 1, 00, 10, 2, 90.3
2018, 1, 1, 00, 20, 2, 91.0
2018, 1, 1, 00, 00, 7, 1.5
2018, 1, 1, 00, 10, 7, 1.3
2018, 1, 1, 00, 20, 7, 0.7
I want to transform that into a pandas DataFrame with one column per sensor and the datetime as the index, like this:
date, sensor1value, sensor2value, sensor7value
2018-1-1 00:00, 0.2, 90.1, 1.5
2018-1-1 00:10, 0, 90.3, 1.3
2018-1-1 00:20, 0.1, 91.0, 0.7
Is there an easy way to do that in pandas?
Answers:
You can use pandas apply to loop over the rows, build a datetime for each one, and set it as the index.
Something like:
from datetime import datetime
# Only "sensor" and "sensorvalue" exist in the long-format data; the per-sensor columns appear after pivoting.
df.set_index(df.apply(lambda row: datetime(int(row["YY"]), int(row["mm"]), int(row["dd"]), int(row["HH"]), int(row["MM"])), axis=1)).loc[:, ["sensor", "sensorvalue"]]
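If the row-wise apply gets slow on a large file, a vectorized alternative might look like the sketch below. It is not from the answer above; it assumes the file has exactly the header shown in the question, and it uses an in-memory string (csv_text, a made-up name) in place of the real file. Note that the header has a space after each comma, so the column names would come out as " mm", " dd" and so on unless the reader is told to skip it.

```python
import io

import pandas as pd

# Sample rows copied from the question; a real script would pass a file path instead.
csv_text = """YY, mm, dd, HH, MM, sensor, sensorvalue
2018, 1, 1, 00, 00, 1, 0.2
2018, 1, 1, 00, 10, 1, 0
2018, 1, 1, 00, 00, 2, 90.1
2018, 1, 1, 00, 10, 2, 90.3
"""

# skipinitialspace=True drops the blank after each comma, so the header
# is read as "mm" rather than " mm".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# pd.to_datetime can assemble timestamps from a frame whose columns are
# named year, month, day, hour, minute, so rename the parts first.
parts = df[["YY", "mm", "dd", "HH", "MM"]].rename(
    columns={"YY": "year", "mm": "month", "dd": "day", "HH": "hour", "MM": "minute"}
)
df["Date"] = pd.to_datetime(parts)
```

After this, df["Date"] holds one timestamp per row and can be used as the index or as the pivot key.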
Thanks, Gabriel, for the answer. The lambda function worked well, but the .loc part is not exactly what I needed, so I created a solution using a dict that maps each sensor code to a column name, plus a pivot table:
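from datetime import datetime

# Map each sensor code to the column name it should get in the result.
estParams = {
    1: 'sensor1value',
    2: 'sensor2value',
    7: 'sensor7value'
}
# Build one datetime per row from the separate date and time columns.
date = df.apply(lambda row: datetime(int(row["YY"]), int(row["mm"]), int(row["dd"]), int(row["HH"]), int(row["MM"])), axis=1)
df.insert(0, 'Date', date)
# Assign the result back; this avoids the chained inplace=True pattern, which newer pandas versions warn about.
df["sensor"] = df["sensor"].replace(estParams)
# One row per Date, one column per sensor (pivot_table averages duplicate entries by default).
pivot = df.pivot_table(values='sensorvalue', index='Date', columns='sensor')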
The pivot table:
sensor               sensor1value  sensor2value  sensor7value
Date
2018-01-01 00:00:00           0.2          90.1           1.5
2018-01-01 00:10:00           0.0          90.3           1.3
2018-01-01 00:20:00           0.1          91.0           0.7
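One thing to keep in mind with pivot_table is that it silently averages rows that share the same Date and sensor. If each sensor should have at most one reading per timestamp, DataFrame.pivot may be the safer choice, since it raises an error on duplicates instead. The sketch below is only an illustration under that assumption; long_df, wide and dupes are made-up names, and the values are taken from the question.

```python
import pandas as pd

# Long-format data with the Date column already built (values from the question).
long_df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01 00:00", "2018-01-01 00:10"] * 2),
    "sensor": [1, 1, 2, 2],
    "sensorvalue": [0.2, 0.0, 90.1, 90.3],
})

# pivot() reshapes without aggregating: one row per Date, one column per sensor.
wide = long_df.pivot(index="Date", columns="sensor", values="sensorvalue")
# Rename the integer sensor codes to the column names used in the thread.
wide.columns = [f"sensor{code}value" for code in wide.columns]

# With a repeated (Date, sensor) pair, pivot() raises instead of averaging.
dupes = pd.concat([long_df, long_df.iloc[[0]]], ignore_index=True)
try:
    dupes.pivot(index="Date", columns="sensor", values="sensorvalue")
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
```

Building the column names with an f-string also removes the need for a hand-written mapping dict when every sensor follows the same naming pattern.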