Calculate cycle length, average, etc. in a DataFrame with a DateTime tag
Question:
I have a Dataframe file that looks like below (imagine so many other rows):
DateTime hours uptime Start_stop machine_started Voltage
2022-02-22 07:50:58;269 18,7 43269 221 0 800
2022-02-22 07:50:59;614 18,7 44609 221 0 800
2022-02-22 07:50:59;614 18,7 44609 221 1 800
2022-02-22 07:51:01;915 18,7 46904 221 1 700
2022-02-22 07:51:01;915 18,7 46904 221 1 500
2022-02-22 08:16:20;343 18,7 48523 221 0 800
2022-02-22 08:16:20;638 18,7 48523 221 0 800
2022-02-23 07:51:01;915 18,7 46904 221 1 800
2022-02-23 07:51:03;543 18,7 48523 221 1 1000
2022-02-23 08:16:20;343 18,7 48523 221 0 800
2022-02-23 08:16:20;638 18,7 48523 221 0 800
I need logic that detects when the machine is working: from the moment machine_started == 1, take the rows until the cycle ends (machine_started == 0), then calculate the average of those rows, the standard deviation, and the length of that cycle.
I would like to have the DateTime tag that shows the process start time. As you can see, in this example the process starts two times.
So the final file will contain a number of rows corresponding to the number of cycles in the input file.
Unfortunately I can only think of doing one task at a time, and after finding the first cycle I get stuck and cannot count the rows or perform these calculations.
So in the end I would like to see something like this:
DateTime Average_Voltage standard_deviation length_of_the_cycle
2022-02-22 07:50:59;614 666 xxx YYY
2022-02-23 07:51:01;915 900 xxx YYY
How can this be achieved?
Answers:
First identify the groups that you are interested in and remove all other rows. To get the groups, we can use shift and cumsum. For the filtering, we need to consider whether the first row starts with machine_started 0 or 1; this determines whether we want the odd- or even-numbered groups.
Code for this part:
df['grp'] = df['machine_started'].ne(df['machine_started'].shift(1)).cumsum()
start = df['machine_started'].iloc[0]
df = df.loc[(df['grp'] + start) % 2 == 0]
Intermediate result:
DateTime hours uptime Start_stop machine_started Voltage grp
2 2022-02-22 07:50:59;614 18,7 44609 221 1 800 2
3 2022-02-22 07:51:01;915 18,7 46904 221 1 700 2
4 2022-02-22 07:51:01;915 18,7 46904 221 1 500 2
7 2022-02-23 07:51:01;915 18,7 46904 221 1 800 4
8 2022-02-23 07:51:03;543 18,7 48523 221 1 1000 4
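To see what the shift/cumsum trick does in isolation, here is a minimal sketch on a made-up machine_started sequence (the values are illustrative, not from the question's data):

```python
import pandas as pd

# Each run of equal values gets its own group number:
# a value differs from its predecessor -> True -> cumsum increments.
s = pd.Series([0, 0, 1, 1, 1, 0, 0, 1, 1, 0])
grp = s.ne(s.shift(1)).cumsum()
print(grp.tolist())  # [1, 1, 2, 2, 2, 3, 3, 4, 4, 5]

# First row is 0, so the machine_started == 1 runs land in the
# even-numbered groups; (grp + start) % 2 == 0 keeps exactly those.
start = s.iloc[0]
print(sorted(grp[(grp + start) % 2 == 0].unique()))  # [2, 4]
```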
Now we can groupby and aggregate the information needed. Finally, rename the resulting columns:
df = df.groupby('grp').agg({'DateTime': ['first'], 'Voltage': ['mean', 'std', 'count']}).reset_index(drop=True)
df.columns = ['DateTime', 'Average_Voltage', 'standard_deviation', 'length_of_the_cycle']
Result:
DateTime Average_Voltage standard_deviation length_of_the_cycle
0 2022-02-22 07:50:59;614 666.666667 152.752523 3
1 2022-02-23 07:51:01;915 900.000000 141.421356 2
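Putting both steps together, here is a self-contained version reproducing the result on the question's sample data (only the columns the calculation actually touches; DateTime is kept as a string since it is only carried through with 'first'):

```python
import pandas as pd

# Sample rows from the question (DateTime, machine_started, Voltage only).
df = pd.DataFrame({
    'DateTime': ['2022-02-22 07:50:58;269', '2022-02-22 07:50:59;614',
                 '2022-02-22 07:50:59;614', '2022-02-22 07:51:01;915',
                 '2022-02-22 07:51:01;915', '2022-02-22 08:16:20;343',
                 '2022-02-22 08:16:20;638', '2022-02-23 07:51:01;915',
                 '2022-02-23 07:51:03;543', '2022-02-23 08:16:20;343',
                 '2022-02-23 08:16:20;638'],
    'machine_started': [0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    'Voltage': [800, 800, 800, 700, 500, 800, 800, 800, 1000, 800, 800],
})

# Step 1: number the runs and keep only the machine_started == 1 groups.
df['grp'] = df['machine_started'].ne(df['machine_started'].shift(1)).cumsum()
start = df['machine_started'].iloc[0]
df = df.loc[(df['grp'] + start) % 2 == 0]

# Step 2: one row per cycle, with start time, mean, std and row count.
out = df.groupby('grp').agg(
    {'DateTime': ['first'], 'Voltage': ['mean', 'std', 'count']}
).reset_index(drop=True)
out.columns = ['DateTime', 'Average_Voltage', 'standard_deviation',
               'length_of_the_cycle']
print(out)
```

Note that Series.std uses the sample standard deviation (ddof=1), which is why the two-row cycle yields 141.42 rather than 100.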