Pandas "Formulas" not working as expected
Question:
I am working with data from an accelerometer and trying to get the velocity from the acceleration, on a df that looks like this:
{'T': {0: 0.007719999999999999,
1: 0.016677999999999797,
2: 0.024630999999996697,
3: 0.0325849999999983,
4: 0.040530999999995196},
'Ax': {0: 0.16, 1: 0.28, 2: 0.28, 3: 0.44, 4: 0.57},
'Ay': {0: 8.0, 1: 7.9, 2: 7.87, 3: 7.87, 4: 7.9},
'Az': {0: 3.83, 1: 3.83, 2: 3.79, 3: 3.76, 4: 3.76},
'delta T': {0: 0.00772,
1: 0.008957999999999798,
2: 0.0079529999999969,
3: 0.007954000000001606,
4: 0.007945999999996893}}
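For anyone reproducing this, the dictionary above can be loaded straight into a DataFrame. A minimal sketch, assuming the frame is called df_yt as in the rest of the question, with the values rounded to 6 decimals for readability:

```python
import pandas as pd

# Values copied from the question, rounded to 6 decimals for readability.
data = {
    "T": {0: 0.007720, 1: 0.016678, 2: 0.024631, 3: 0.032585, 4: 0.040531},
    "Ax": {0: 0.16, 1: 0.28, 2: 0.28, 3: 0.44, 4: 0.57},
    "Ay": {0: 8.0, 1: 7.9, 2: 7.87, 3: 7.87, 4: 7.9},
    "Az": {0: 3.83, 1: 3.83, 2: 3.79, 3: 3.76, 4: 3.76},
    "delta T": {0: 0.007720, 1: 0.008958, 2: 0.007953, 3: 0.007954, 4: 0.007946},
}

# pd.DataFrame accepts the nested {column: {index: value}} form directly.
df_yt = pd.DataFrame(data)
print(df_yt)
```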
First, I set the Velocity of X, Y and Z to 0:
df_yt["Vx"] = 0
df_yt["Vy"] = 0
df_yt["Vz"] = 0
And then I entered the first value of each of these columns manually:
df_yt.loc[0,"Vx"] = 0.16*0.007720
df_yt.loc[0,"Vy"] = 8.00*0.007720
df_yt.loc[0,"Vz"] = 3.83*0.007720
I wanted to create a formula that returns the previous element of Vx plus Ax*delta T for the current row. To write the "formulas" for these 3 columns, I assumed it would be something like:
df_yt.loc[1:,"Vx"] = df_yt["Vx"].shift(1) + df_yt["Ax"]*df_yt["delta T"]
df_yt.loc[1:,"Vy"] = df_yt["Vy"].shift(1) + df_yt["Ay"]*df_yt["delta T"]
df_yt.loc[1:,"Vz"] = df_yt["Vz"].shift(1) + df_yt["Az"]*df_yt["delta T"]
This code doesn’t raise any error, but the numbers in the df don’t match what they should be. For example, Vx at index 2 should be 0.005970:
0.003743 + 0.28*0.007953 = 0.005970
I hope someone can help me with this because I don’t know what is causing this mistake and I can’t even understand where the wrong numbers are coming from.
Answers:
Try as follows:
- Use df.mul to multiply each column in ['Ax','Ay','Az'] with delta T along axis 0, and apply df.cumsum.
df_yt[['Vx','Vy','Vz']] = df_yt[['Ax','Ay','Az']].mul(df_yt['delta T'], axis=0).cumsum()
print(df_yt)
          T    Ax    Ay    Az   delta T        Vx        Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.061760  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.003743  0.132528  0.063877
2  0.024631  0.28  7.87  3.79  0.007953  0.005970  0.195118  0.094019
3  0.032585  0.44  7.87  3.76  0.007954  0.009470  0.257716  0.123926
4  0.040531  0.57  7.90  3.76  0.007946  0.013999  0.320490  0.153803
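The reason cumsum works here: "previous velocity plus Ax*delta T" applied row after row, starting from rest, is just the running total of the per-row products. Below is a self-contained sketch of the same two steps, split up so the intermediate result is visible. The names increments and velocity are only illustrative, and the sample values are rounded from the question.

```python
import pandas as pd

# Sample values rounded from the question.
df_yt = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "Ay": [8.0, 7.9, 7.87, 7.87, 7.9],
    "Az": [3.83, 3.83, 3.79, 3.76, 3.76],
    "delta T": [0.007720, 0.008958, 0.007953, 0.007954, 0.007946],
})

# Step 1: per-row velocity increments (each acceleration column times delta T).
increments = df_yt[["Ax", "Ay", "Az"]].mul(df_yt["delta T"], axis=0)

# Step 2: the running total of the increments is the velocity at each row.
velocity = increments.cumsum().rename(columns={"Ax": "Vx", "Ay": "Vy", "Az": "Vz"})

# Attach the new columns to the original frame.
df_yt = df_yt.join(velocity)
print(df_yt[["Vx", "Vy", "Vz"]].round(6))
```

With these inputs, Vx at index 2 comes out at about 0.005970, the value the question expected.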
Incidentally, the problem with your own attempt becomes apparent when you print the values for any of the .shift(1) statements. E.g., if you do:
df_yt["Vx"] = 0
df_yt.loc[0,"Vx"] = 0.16*0.007720
print(df_yt["Vx"].shift(1))
0         NaN
1    0.001235
2    0.000000
3    0.000000
4    0.000000
Name: Vx, dtype: float64
So, in a line such as df_yt.loc[1:,"Vx"] = df_yt["Vx"].shift(1) + df_yt["Ax"]*df_yt["delta T"], per row you are adding: nothing (NaN), 0.001235, and then just zeros after that. I.e. this adds correct values only for the second row (index 1).
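To see where the wrong numbers come from, here is a small sketch that reproduces the attempt for Vx only. The variable rhs is just an illustrative name. I initialise the column with 0.0 instead of 0 so that it is float64 from the start, since newer pandas versions complain about writing a float into an integer column.

```python
import pandas as pd

# Sample values rounded from the question.
df_yt = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "delta T": [0.007720, 0.008958, 0.007953, 0.007954, 0.007946],
})

# Start from 0.0 (a float) so the column dtype is float64 from the outset.
df_yt["Vx"] = 0.0
df_yt.loc[0, "Vx"] = 0.16 * 0.007720

# The right-hand side is computed in full before anything is assigned,
# so shift(1) only ever sees the initial column: [NaN, 0.001235, 0, 0, 0].
rhs = df_yt["Vx"].shift(1) + df_yt["Ax"] * df_yt["delta T"]
df_yt.loc[1:, "Vx"] = rhs

print(df_yt["Vx"].round(6).tolist())
```

Index 1 is right (about 0.003743), but index 2 ends up as just 0.28*0.007953, roughly 0.002227, instead of 0.005970, because the shifted value it was added to was still 0.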
Your calculations are vectorized, not iterative: the whole right-hand side is evaluated once on the current column values, so a row never sees the result that was just calculated for the row before it.
For the input:
          T    Ax    Ay    Az   delta T        Vx       Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.06176  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.000000  0.00000  0.000000
2  0.024631  0.28  7.87  3.79  0.007953  0.000000  0.00000  0.000000
3  0.032585  0.44  7.87  3.76  0.007954  0.000000  0.00000  0.000000
4  0.040531  0.57  7.90  3.76  0.007946  0.000000  0.00000  0.000000
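If you run df_yt["Vx"].shift(1), you will get:
0         NaN
1    0.001235
2    0.000000
3    0.000000
4    0.000000
Therefore your calculation for Vx is actually Ax*delta T plus this shifted column: only index 1 picks up a previous velocity (0.001235), and every later row adds 0 instead of the running total.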
Based on the post "Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?", I would suggest:
for i in range(1, len(df_yt)):
    df_yt.loc[i, 'Vx'] = df_yt.loc[i-1, 'Vx'] + df_yt.loc[i, 'Ax']*df_yt.loc[i, 'delta T']
    df_yt.loc[i, 'Vy'] = df_yt.loc[i-1, 'Vy'] + df_yt.loc[i, 'Ay']*df_yt.loc[i, 'delta T']
    df_yt.loc[i, 'Vz'] = df_yt.loc[i-1, 'Vz'] + df_yt.loc[i, 'Az']*df_yt.loc[i, 'delta T']
Output:
          T    Ax    Ay    Az   delta T        Vx        Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.061760  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.003743  0.132528  0.063877
2  0.024631  0.28  7.87  3.79  0.007953  0.005970  0.195118  0.094019
3  0.032585  0.44  7.87  3.76  0.007954  0.009470  0.257716  0.123926
4  0.040531  0.57  7.90  3.76  0.007946  0.013999  0.320490  0.153803
I know it’s not vectorized, but I hope it helps.
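If it is useful, the row-by-row loop and the mul/cumsum one-liner can be checked against each other. This is a rough sketch on the sample values (rounded from the question). The names acc_cols, vel_cols, looped and vectorized are only illustrative, and the loop writes into a separate frame so the two results can be compared.

```python
import numpy as np
import pandas as pd

# Sample values rounded from the question.
df_yt = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "Ay": [8.0, 7.9, 7.87, 7.87, 7.9],
    "Az": [3.83, 3.83, 3.79, 3.76, 3.76],
    "delta T": [0.007720, 0.008958, 0.007953, 0.007954, 0.007946],
})

acc_cols = ["Ax", "Ay", "Az"]
vel_cols = ["Vx", "Vy", "Vz"]

# Iterative version: seed row 0, then build each row from the previous one.
looped = pd.DataFrame(0.0, index=df_yt.index, columns=vel_cols)
for acc, vel in zip(acc_cols, vel_cols):
    looped.loc[0, vel] = df_yt.loc[0, acc] * df_yt.loc[0, "delta T"]
    for i in range(1, len(df_yt)):
        step = df_yt.loc[i, acc] * df_yt.loc[i, "delta T"]
        looped.loc[i, vel] = looped.loc[i - 1, vel] + step

# Vectorized version: running sum of the per-row increments.
vectorized = df_yt[acc_cols].mul(df_yt["delta T"], axis=0).cumsum()

# True when both approaches agree up to floating-point tolerance.
print(np.allclose(looped.to_numpy(), vectorized.to_numpy()))
```

Both give the same velocities here, so the loop is mainly worth keeping when the update rule cannot be written as a running sum.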