Pandas "Formulas" not working as expected

Question:

I am trying to work with data from an accelerometer, trying to get the velocity from acceleration, on a df that looks like this:

{'T': {0: 0.007719999999999999,
  1: 0.016677999999999797,
  2: 0.024630999999996697,
  3: 0.0325849999999983,
  4: 0.040530999999995196},
 'Ax': {0: 0.16, 1: 0.28, 2: 0.28, 3: 0.44, 4: 0.57},
 'Ay': {0: 8.0, 1: 7.9, 2: 7.87, 3: 7.87, 4: 7.9},
 'Az': {0: 3.83, 1: 3.83, 2: 3.79, 3: 3.76, 4: 3.76},
 'delta T': {0: 0.00772,
  1: 0.008957999999999798,
  2: 0.0079529999999969,
  3: 0.007954000000001606,
  4: 0.007945999999996893}}
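
For anyone who wants to reproduce this, the dict above (which looks like the output of df.to_dict()) can be loaded back into a DataFrame. A minimal sketch, assuming the frame is called df_yt as in the code further down:

```python
import pandas as pd

# Dict copied from the question; outer keys become columns, inner keys the index.
data = {
    "T": {0: 0.007719999999999999,
          1: 0.016677999999999797,
          2: 0.024630999999996697,
          3: 0.0325849999999983,
          4: 0.040530999999995196},
    "Ax": {0: 0.16, 1: 0.28, 2: 0.28, 3: 0.44, 4: 0.57},
    "Ay": {0: 8.0, 1: 7.9, 2: 7.87, 3: 7.87, 4: 7.9},
    "Az": {0: 3.83, 1: 3.83, 2: 3.79, 3: 3.76, 4: 3.76},
    "delta T": {0: 0.00772,
                1: 0.008957999999999798,
                2: 0.0079529999999969,
                3: 0.007954000000001606,
                4: 0.007945999999996893},
}

df_yt = pd.DataFrame(data)
print(df_yt)
```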


First, I set the Velocity of X, Y and Z to 0:

df_yt["Vx"] = 0
df_yt["Vy"] = 0
df_yt["Vz"] = 0

And then I entered the first value of each of these columns manually:

df_yt.loc[0,"Vx"] = 0.16*0.007720
df_yt.loc[0,"Vy"] = 8.00*0.007720
df_yt.loc[0,"Vz"] = 3.83*0.007720

I wanted a formula that, for each row, returns the previous element of Vx plus Ax*delta T of the current row. To write the "formulas" for these 3 columns, I assumed it would be something like:

df_yt.loc[1:,"Vx"] = df_yt["Vx"].shift(1) + df_yt["Ax"]*df_yt["delta T"]
df_yt.loc[1:,"Vy"] = df_yt["Vy"].shift(1) + df_yt["Ay"]*df_yt["delta T"]
df_yt.loc[1:,"Vz"] = df_yt["Vz"].shift(1) + df_yt["Az"]*df_yt["delta T"]

This code doesn’t raise any error, but the numbers in the df don’t match what they should be. For example, the value of Vx at index 2 should be 0.005970:

0.003743 + 0.28*0.007953 = 0.005970

I hope someone can help me with this because I don’t know what is causing this mistake and I can’t even understand where the wrong numbers are coming from.

Asked By: Francisco Barroca


Answers:

Try as follows:

  • Use df.mul to multiply each column in ['Ax','Ay','Az'] with delta T along axis 0, and apply df.cumsum.
df_yt[['Vx','Vy','Vz']] = df_yt[['Ax','Ay','Az']].mul(df_yt['delta T'], 
                                                      axis=0).cumsum()

print(df_yt)

          T    Ax    Ay    Az   delta T        Vx        Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.061760  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.003743  0.132528  0.063877
2  0.024631  0.28  7.87  3.79  0.007953  0.005970  0.195118  0.094019
3  0.032585  0.44  7.87  3.76  0.007954  0.009470  0.257716  0.123926
4  0.040531  0.57  7.90  3.76  0.007946  0.013999  0.320490  0.153803
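
The reason cumsum works here is that the recurrence V[i] = V[i-1] + A[i]*dT[i], with zero initial velocity, unrolls to a running total of the per-row increments A[i]*dT[i]. The sketch below rebuilds the relevant columns from the question and checks the cumsum result against an explicit running sum for Vx. The names acc_cols, vel_cols, increments and expected_vx are only illustrative.

```python
import numpy as np
import pandas as pd

# Acceleration and time-step columns copied from the question.
df_yt = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "Ay": [8.0, 7.9, 7.87, 7.87, 7.9],
    "Az": [3.83, 3.83, 3.79, 3.76, 3.76],
    "delta T": [0.00772, 0.008957999999999798, 0.0079529999999969,
                0.007954000000001606, 0.007945999999996893],
})

acc_cols = ["Ax", "Ay", "Az"]
vel_cols = ["Vx", "Vy", "Vz"]

# Per-row velocity increments A[i] * dT[i], one column per axis.
increments = df_yt[acc_cols].mul(df_yt["delta T"], axis=0)

# Running total of the increments, assuming the velocity starts at 0.
df_yt[vel_cols] = increments.cumsum()

# Cross-check Vx against the recurrence written out as a plain loop.
v = 0.0
expected_vx = []
for a, dt in zip(df_yt["Ax"], df_yt["delta T"]):
    v += a * dt
    expected_vx.append(v)

print(np.allclose(df_yt["Vx"], expected_vx))
```

This only applies when each new value is the previous value plus something that does not itself depend on earlier results. If the update rule needed the previous velocity in any other way, a cumulative sum would no longer be equivalent.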

Incidentally, the problem with your own attempt becomes apparent when you print the values for any of the .shift(1) statements. For example, after running:

df_yt["Vx"] = 0
df_yt.loc[0,"Vx"] = 0.16*0.007720

print(df_yt["Vx"].shift(1))

0         NaN
1    0.001235
2    0.000000
3    0.000000
4    0.000000
Name: Vx, dtype: float64

So, in a line such as df_yt.loc[1:,"Vx"] = df_yt["Vx"].shift(1) + df_yt["Ax"]*df_yt["delta T"], per row you are adding Ax*delta T to: nothing (NaN), 0.001235, and then just zeros after that. I.e. this produces the correct value only for the second row (index 1).
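
To see this concretely, the attempt from the question can be replayed for Vx and compared row by row with the cumulative sum. This is a small sketch using only the Ax and delta T columns; expected and matches are illustrative names.

```python
import numpy as np
import pandas as pd

df_yt = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "delta T": [0.00772, 0.008957999999999798, 0.0079529999999969,
                0.007954000000001606, 0.007945999999996893],
})

# Same setup as in the question: zeros everywhere, first value set by hand.
df_yt["Vx"] = 0.0
df_yt.loc[0, "Vx"] = 0.16 * 0.007720

# The right-hand side is evaluated once, from the column as it is *now*,
# so shift(1) only ever sees the hand-set first value followed by zeros.
df_yt.loc[1:, "Vx"] = df_yt["Vx"].shift(1) + df_yt["Ax"] * df_yt["delta T"]

# Reference result: running total of Ax * delta T.
expected = (df_yt["Ax"] * df_yt["delta T"]).cumsum()
matches = np.isclose(df_yt["Vx"], expected)

print(matches.tolist())  # [True, True, False, False, False]
```

Index 0 matches because it was set by hand, index 1 matches because the shifted value there is the hand-set one, and every later row is just Ax*delta T with nothing carried over.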

Answered By: ouroboros1

Your calculations are vectorized, not iterative: the whole right-hand side is evaluated once from the current column values, so a row never sees the result that was just computed for the row before it.

For the input:

          T    Ax    Ay    Az   delta T        Vx       Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.06176  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.000000  0.00000  0.000000
2  0.024631  0.28  7.87  3.79  0.007953  0.000000  0.00000  0.000000
3  0.032585  0.44  7.87  3.76  0.007954  0.000000  0.00000  0.000000
4  0.040531  0.57  7.90  3.76  0.007946  0.000000  0.00000  0.000000

If you run df_yt["Vx"].shift(1), you will get:

0         NaN
1    0.001235
2    0.000000
3    0.000000
4    0.000000

Therefore your calculation for Vx is actually the shifted column above plus Ax*delta T, row by row: 0.001235 + Ax*delta T at index 1, and 0 + Ax*delta T from index 2 onward.

Based on the post "Is there a way in Pandas to use previous row value in dataframe.apply when previous value is also calculated in the apply?", I would suggest an explicit loop:

for i in range(1, len(df_yt)):
    df_yt.loc[i, 'Vx'] = df_yt.loc[i-1, 'Vx'] + df_yt.loc[i, 'Ax']*df_yt.loc[i, 'delta T']
    df_yt.loc[i, 'Vy'] = df_yt.loc[i-1, 'Vy'] + df_yt.loc[i, 'Ay']*df_yt.loc[i, 'delta T']
    df_yt.loc[i, 'Vz'] = df_yt.loc[i-1, 'Vz'] + df_yt.loc[i, 'Az']*df_yt.loc[i, 'delta T']

Output:

          T    Ax    Ay    Az   delta T        Vx        Vy        Vz
0  0.007720  0.16  8.00  3.83  0.007720  0.001235  0.061760  0.029568
1  0.016678  0.28  7.90  3.83  0.008958  0.003743  0.132528  0.063877
2  0.024631  0.28  7.87  3.79  0.007953  0.005970  0.195118  0.094019
3  0.032585  0.44  7.87  3.76  0.007954  0.009470  0.257716  0.123926
4  0.040531  0.57  7.90  3.76  0.007946  0.013999  0.320490  0.153803
  • I know it’s not vectorized.
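
If an explicit loop is kept, one possible variation is to run it over plain NumPy arrays and assign each column once, rather than writing single cells through .loc. The sketch below is only an illustration under that assumption; integrate is a hypothetical helper name, not a pandas function, and v0 is an assumed initial velocity of 0.

```python
import numpy as np
import pandas as pd

df_yt = pd.DataFrame({
    "Ax": [0.16, 0.28, 0.28, 0.44, 0.57],
    "Ay": [8.0, 7.9, 7.87, 7.87, 7.9],
    "Az": [3.83, 3.83, 3.79, 3.76, 3.76],
    "delta T": [0.00772, 0.008957999999999798, 0.0079529999999969,
                0.007954000000001606, 0.007945999999996893],
})


def integrate(acc, dt, v0=0.0):
    """Hypothetical helper: v[i] = v[i-1] + acc[i] * dt[i], starting from v0."""
    v = np.empty(len(acc))
    prev = v0
    for i in range(len(acc)):
        prev = prev + acc[i] * dt[i]  # uses the value computed one step earlier
        v[i] = prev
    return v


dt = df_yt["delta T"].to_numpy()
for axis in "xyz":
    # Columns Ax/Ay/Az in, columns Vx/Vy/Vz out.
    df_yt[f"V{axis}"] = integrate(df_yt[f"A{axis}"].to_numpy(), dt)

print(df_yt.round(6))
```

The loop body is the same recurrence as above, so the update rule can be changed in one place if it ever needs more than a running sum.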

Hope it helps

Answered By: Ilan Geffen