How to interpolate a value in a dataframe using a custom formula

Question:

How can I apply a formula to interpolate missing values in my entire dataframe? I have already calculated the formula for one row and now I want to apply it to all the rows in my dataframe.

import pandas as pd

df = pd.DataFrame({'x': [2.2, 2.32, 2.38], 'y': [4.9644, None, 4.9738], 'z' : [5,4,2]})

###
missing_value = abs((((df.x[2] - df.x[1]) * (df.y[2] - df.y[0])) - ((df.x[2] - df.x[0]) * df.y[2]))/(df.x[2] - df.x[0]))

# result: missing_value = 4.9706

I want to extend this to my original data to calculate more missing values.

For example:

df = pd.DataFrame({'x': [2.2, 2.32, 2.38, 2.45,4.44,3.21,None, 2.45], 'y': [4.9644, None, 4.9738, 4.456,None, 4.356, None, None] , 'z' : [5,4,2, 1,1,3,4,5]})

I tried this:
import pandas as pd

# create a DataFrame with x and y columns
df = pd.DataFrame({'x': [2.2, 2.32, 2.38], 'y': [4.9644, None, 4.9738]})

# define a function to calculate the missing value in y
def calculate_y(row):
    x0, x1, x2 = row.iloc[0:2, 'x']
    y0, y1, y2 = row.iloc[0:2, 'y']
    return abs((((x2 - x1) * (y2 - y0)) - ((x2 - x0) * y2)) / (x2 - x0))

# apply the function to the DataFrame and save the result in a new column
df['calculated_y'] = df.apply(calculate_y, axis=1)

# print the DataFrame to see the calculated_y column
print(df)
Asked By: chuky pedro


Answers:

Your formula is a linear interpolation of y with respect to x, so use the interpolate method with x as the index:

df.loc[df['y'].isna(), 'y'] = (df.set_index('x')['y']
                                 .interpolate('index')
                                 .set_axis(df.index)
                               )

Output:

      x         y  z
0  2.20  4.964400  5
1  2.32  4.970667  4
2  2.38  4.973800  2
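
To see why interpolate can stand in for the formula, here is a small check (a sketch, not part of the original answer) comparing the question's expression with plain linear interpolation between the two neighbouring points. It uses numpy's np.interp for the comparison; the variable names are only illustrative.

```python
import numpy as np

# Neighbouring points from the question's first example
x0, x1, x2 = 2.2, 2.32, 2.38
y0, y2 = 4.9644, 4.9738

# The question's formula, written with scalars
custom = abs(((x2 - x1) * (y2 - y0) - (x2 - x0) * y2) / (x2 - x0))

# Plain linear interpolation of y at x1 between (x0, y0) and (x2, y2)
linear = float(np.interp(x1, [x0, x2], [y0, y2]))

print(round(custom, 6), round(linear, 6))
```

Both values agree here. Note that the expression inside abs is the negative of the interpolated value, so the two only match when the interpolated y is positive.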

If you have NaNs in x as in the second example, use a second mask to ignore them:

m = df['x'].notna()

df.loc[df['y'].isna()&m, 'y'] = (df[m].set_index('x')['y']
                                  .interpolate('index')
                                  .set_axis(df[m].index)
                                 )

Output:

      x         y  z
0  2.20  4.964400  5
1  2.32  4.970667  4
2  2.38  4.973800  2
3  2.45  4.456000  1
4  4.44  4.356000  1
5  3.21  4.356000  3
6   NaN       NaN  4
7  2.45  4.456000  5
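
The filled values above are consistent with interpolating against the known (x, y) points sorted by x, with any x beyond the largest known x taking the last known y. The sketch below reproduces them with numpy's np.interp; this is an illustration of the arithmetic, not a claim about how pandas implements it internally.

```python
import numpy as np

# Known points from the second example, sorted by x (rows 0, 2, 3, 5)
known_x = np.array([2.2, 2.38, 2.45, 3.21])
known_y = np.array([4.9644, 4.9738, 4.456, 4.356])

# x values of the rows whose y is missing (rows 1, 4, 7)
query_x = np.array([2.32, 4.44, 2.45])

# np.interp holds the end values for queries outside the known range
filled = np.interp(query_x, known_x, known_y)
print(filled.round(6))
```

Row 4 (x = 4.44) lies outside the known range, so it is not a true interpolation; check whether that flat extrapolation is acceptable for your data.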

Custom formula

If you really want to use a custom formula, access the next/previous rows in a vectorized way with shift:

x = df['x']#.fillna(0)
y = df['y']#.fillna(0)


s = abs((((x.shift(-1) - x) * (y.shift(-1) - y.shift())) - ((x.shift(-1) - x.shift()) * y.shift(-1)))/(x.shift(-1) - x.shift()))

df['y'] = df['y'].fillna(s)
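
For reference, here is the same shift-based approach as a self-contained run on the first example. The intermediate names (x_prev, x_next, y_prev, y_next) are introduced here only for readability.

```python
import pandas as pd

df = pd.DataFrame({'x': [2.2, 2.32, 2.38],
                   'y': [4.9644, None, 4.9738],
                   'z': [5, 4, 2]})

x, y = df['x'], df['y']

# Previous and next row values, aligned on the current row
x_prev, x_next = x.shift(), x.shift(-1)
y_prev, y_next = y.shift(), y.shift(-1)

# The question's formula, evaluated for every row at once
s = abs(((x_next - x) * (y_next - y_prev) - (x_next - x_prev) * y_next)
        / (x_next - x_prev))

# Only rows where y is missing take the computed value
df['y'] = y.fillna(s)
print(df)
```

A row is only filled when both of its neighbours have known x and y. The first row, the last row, and runs of consecutive missing values stay NaN because s is NaN there.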
Answered By: mozway