Pandas interpolation to extend data is giving bad results

Question:

I have a dataset of 'DEN' values as a function of 'Z' that goes up to about Z = 435000, and I would like to extend it to Z = 500000. I tried adding a new data point to my pandas DataFrame at Z = 500000 and filling in the NaN values with spline and linear interpolation, but neither gives a good fit. I also tried the other interpolation methods, and none of them worked. This is the output:

[image: plot of the current output]

but it should look something more like this (with some curvature, probably):

[image: sketch of the desired output]

How could I get a better fit? Here is my code:

import pandas as pd
import matplotlib.pyplot as plt

Z = [67.0016860961914, 202.4105987548828, 339.6864929199219, 478.540283203125, 618.98046875, 761.0699462890625, 904.8601684570312, 1050.3714599609375, 1197.6475830078125, 1346.7713623046875, 1498.6524658203125, 1653.7181396484375, 1837.8468017578125, 2079.58544921875, 2355.20263671875, 2638.4990234375, 2929.645751953125, 3229.349853515625, 3617.75537109375, 4104.705078125, 4617.30859375, 5158.63427734375, 5733.37353515625, 6345.8583984375, 6999.3095703125, 7698.5947265625, 8450.8916015625, 9409.703125, 10553.18359375, 11668.6767578125, 12741.990234375, 13772.39453125, 14763.3818359375, 15719.619140625, 16649.482421875, 17554.62109375, 18469.857421875, 19430.984375, 20422.037109375, 21439.216796875, 22486.515625, 23560.853515625, 24665.4140625, 25805.2890625, 26985.13671875, 28204.599609375, 29457.5390625, 30742.876953125, 32064.46875, 33431.6640625, 34851.78515625, 36325.08984375, 37850.5390625, 39429.359375, 41058.421875, 42756.5703125, 44542.91015625, 46396.62890625, 48294.97265625, 50223.65234375, 52059.734375, 53766.05859375, 55550.89453125, 57403.92578125, 59192.73828125, 60936.0234375, 62660.0703125, 64366.703125, 66034.6953125, 67662.28125, 69244.1484375, 70767.453125, 72257.0859375, 73747.015625, 75229.140625, 76701.421875, 78177.2734375, 79667.5859375, 81194.4921875, 82738.984375, 84261.1875, 85742.84375, 87180.859375, 88590.0078125, 89967.6328125, 91287.2109375, 92543.7890625, 93754.3515625, 94940.046875, 96156.3359375, 97419.6953125, 98704.9296875, 100037.609375, 101529.7890625, 103333.3046875, 105506.3125, 107998.2734375, 110736.3828125, 113655.7734375, 116720.3515625, 119921.03125, 123249.9140625, 126694.953125, 130245.1953125, 133898.578125, 137668.90625, 141590.484375, 145714.953125, 150101.5, 154805.265625, 159865.140625, 165296.765625, 171092.75, 177227.46875, 183663.90625, 190363.34375, 197291.578125, 204419.296875, 211723.71875, 219187.59375, 226796.78125, 234539.328125, 242404.578125, 250382.59375, 258463.765625, 266638.875, 274899.125, 
283236.34375, 291643.1875, 300113.1875, 308640.875, 317221.75, 325852.28125, 334530.15625, 343254.53125, 352026.25, 360848.3125, 369726.125, 378668.3125, 387687.25, 396800.21875, 406031.03125, 415412.28125, 424988.75, 434822.46875]

DEN = [2.934534393261856e-09, 3.046047858390466e-09, 3.287511374239216e-09, 3.5445970603120713e-09, 3.818016125478607e-09, 4.121767371856322e-09, 4.556317101389595e-09, 4.9088515474693395e-09, 5.256281188081857e-09, 5.6165689876763736e-09, 5.962251581337341e-09, 6.3753162748980685e-09, 7.051504713473378e-09, 7.8024715577385e-09, 8.947714569274012e-09, 1.0181534726427799e-08, 1.1241577446696738e-08, 1.2603742050032452e-08, 1.4735684672473326e-08, 1.801341120710731e-08, 2.2238978658606356e-08, 2.669004040001255e-08, 4.1519353288776983e-08, 5.4049081654738984e-08, 6.672090790971197e-08, 7.643878774388213e-08, 9.041653470376332e-08, 1.5108118134321558e-07, 2.133168237605787e-07, 2.2585844305922365e-07, 2.736193494001782e-07, 2.0513878951078368e-07, 1.841667227608923e-07, 2.2165528434925363e-07, 1.9528846451066784e-07, 1.9578206433834566e-07, 2.8487770009633095e-07, 4.6579410195590754e-07, 8.215626507990237e-07, 1.3647811556438683e-06, 1.9769688606174896e-06, 2.7490045795275364e-06, 3.741817181435181e-06, 5.50590266357176e-06, 7.5475641097000334e-06, 1.0890928024309687e-05, 1.6043488358263858e-05, 2.4222534193540923e-05, 3.2964235288091004e-05, 4.440318662091158e-05, 6.032012606738135e-05, 8.014062041183934e-05, 0.00011465149145806208, 0.00018679998174775392, 0.0003073820553254336, 0.00041833153227344155, 0.0004992358153685927, 0.0005737273604609072, 0.000637566379737109, 0.0054977829568088055, 0.08780906349420547, 0.5919457674026489, 3.2492551803588867, 11.0116548538208, 26.446338653564453, 38.97719955444336, 49.074031829833984, 57.095115661621094, 65.73072814941406, 74.49889373779297, 82.38655853271484, 85.20193481445312, 87.75443267822266, 89.5878677368164, 87.19244384765625, 83.95445251464844, 80.79202270507812, 79.2406997680664, 85.03714752197266, 106.86959838867188, 131.66307067871094, 173.25582885742188, 192.48924255371094, 259.2572326660156, 283.16607666015625, 445.22332763671875, 874.3299560546875, 1519.5841064453125, 2273.568115234375, 2919.748046875, 
3389.169677734375, 3628.605224609375, 3582.08447265625, 3295.89013671875, 2909.16015625, 2511.64208984375, 2176.353271484375, 1895.889892578125, 1657.74365234375, 1452.5299072265625, 1280.7451171875, 1150.204345703125, 1073.4234619140625, 1064.7406005859375, 1123.1546630859375, 1220.3154296875, 1320.48486328125, 1394.374267578125, 2001.9783935546875, 7956.9150390625, 24130.984375, 50260.73828125, 87648.0, 133744.84375, 179154.34375, 214849.5625, 237765.265625, 249376.375, 252877.265625, 251325.625, 246972.625, 241273.703125, 235049.90625, 228696.953125, 222232.765625, 215427.25, 207914.984375, 199298.984375, 189272.90625, 177776.140625, 165114.734375, 151878.109375, 138702.59375, 126065.34375, 114250.4921875, 103397.0546875, 93629.078125, 85087.1875, 77709.4296875, 71285.0859375, 65632.859375, 60621.0, 56143.3984375, 52112.625, 48511.3671875]

dict = {
    'Z' : Z,
    'DEN': DEN
}

df = pd.DataFrame.from_dict(dict)
df = df.append({'Z':500000}, ignore_index=True)

df2 = df.interpolate(method='spline', order=3,limit=10,limit_direction='both', axis=0)
df3 = df.interpolate(method='linear',limit=10,limit_direction='both', axis=0)

plt.plot(df2['DEN'],df2['Z'])
plt.plot(df3['DEN'],df3['Z'])
plt.show()
Asked By: Billiam

||

Answers:

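The main issue is that Z jumps from about 434k straight to 500k while your DataFrame still has the default integer index (0, 1, 2, ...). interpolate works from the index, not from the Z column: method='linear' treats the rows as equally spaced, and the SciPy-based methods such as 'spline' use the numerical values of the index. So make Z the index of df before interpolating.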
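
The effect of the index can be seen on a toy example (the three-row frame below is made up for illustration and is not the question's data). With the default index, method='linear' places the missing value halfway between its neighbours by row position; with Z as the index, method='index' weights it by the Z distances instead:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the middle DEN value is missing and Z is unevenly spaced.
toy = pd.DataFrame({"Z": [0.0, 1.0, 10.0], "DEN": [0.0, np.nan, 10.0]})

# Default RangeIndex + method='linear': rows are treated as equally spaced.
by_position = toy["DEN"].interpolate(method="linear")

# Z as the index + method='index': the Z spacing is taken into account.
by_z = toy.set_index("Z")["DEN"].interpolate(method="index")

print(by_position.tolist())  # [0.0, 5.0, 10.0]
print(by_z.tolist())         # [0.0, 1.0, 10.0]
```

The same row position gets 5.0 in one case and 1.0 in the other, which is why the extension to Z = 500000 looks wrong until Z is the index.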

Method 1 – Linear extrapolation
A single new data point at Z = 500000 is enough here, because a straight line is fully determined by its end point.

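df = pd.DataFrame.from_dict(dict) # `dict` is the Z/DEN dictionary from the question

df_new = pd.DataFrame({'Z':[500000]})
df = pd.concat([df, df_new], ignore_index=True) # DataFrame.append is deprecated (removed in pandas 2.0); use pd.concat instead
df.set_index('Z', inplace=True) # Interpolate against Z rather than the row position

df1 = df.interpolate(method='spline', order=1, axis=0) # method='spline' requires SciPy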

plt.plot(df1['DEN'], df1.index, label='1 - Linear')
plt.plot(df['DEN'], df.index, label='Initial data') # Initial data as a reference
plt.legend(loc="lower right")

plt.show()

Output:
[image: linear extrapolation plot]

Method 2 – Polynomial extrapolation
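You need to add more than one new data point to get a smooth curve, because the plot joins consecutive points with straight segments. With a single point at Z = 500000, every spline order would be drawn as a straight line from the last measured value.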

df = pd.DataFrame.from_dict(dict)

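last_z = df['Z'].iloc[-1] # 434822.46875

# For example, add data points in steps of 1,000, from the next multiple of 1,000 up to and including 500,000
first_new_z = (int(last_z) // 1_000 + 1) * 1_000 # 435000, strictly above last_z
df_new = pd.DataFrame({'Z' : list(range(first_new_z, 500_000 + 1, 1_000))})
df = pd.concat([df, df_new], ignore_index=True) # DataFrame.append is deprecated (removed in pandas 2.0); use pd.concat instead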

df.set_index('Z', inplace=True)

df1 = df.interpolate(method='spline', order=1, axis=0) # Linear extrapolation
df2 = df.interpolate(method='spline', order=2, axis=0) # Quadratic extrapolation
df3 = df.interpolate(method='spline', order=3, axis=0) # Cubic extrapolation
df4 = df.interpolate(method='spline', order=4, axis=0) # Quartic extrapolation

plt.plot(df1['DEN'], df1.index, label='1 - Linear')
plt.plot(df2['DEN'], df2.index, label='2 - Quadratic')
plt.plot(df3['DEN'], df3.index, label='3 - Cubic')
plt.plot(df4['DEN'], df4.index, label='4 - Quartic')
plt.plot(df['DEN'], df.index, label='Initial data') # Initial data as a reference
plt.legend(loc="lower right")

plt.show()

Output:
[image: polynomial extrapolation plot]
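
If none of the polynomial extrapolations above looks physical, one alternative that is not part of the approach above is to fit only the tail of the data and extrapolate that fit. The sketch below assumes that DEN decays roughly exponentially at the top of the profile, so that log(DEN) is close to linear in Z. That is an assumption about the data, and the helper name extrapolate_log_tail and the default of n_tail = 10 points are illustrative. It uses numpy.polyfit rather than DataFrame.interpolate:

```python
import numpy as np

def extrapolate_log_tail(z, den, z_new, n_tail=10, deg=1):
    """Fit a degree-`deg` polynomial to log(den) over the last `n_tail` points
    and evaluate it at `z_new`. Assumes den > 0 in the tail."""
    z_tail = np.asarray(z[-n_tail:], dtype=float)
    log_den_tail = np.log(np.asarray(den[-n_tail:], dtype=float))
    # Centre and scale Z so the fit stays well conditioned for large Z values.
    z_mean, z_scale = z_tail.mean(), z_tail.std()
    coeffs = np.polyfit((z_tail - z_mean) / z_scale, log_den_tail, deg)
    z_new = np.asarray(z_new, dtype=float)
    return np.exp(np.polyval(coeffs, (z_new - z_mean) / z_scale))

# Synthetic check with a made-up exponential profile (not the question's data).
z_demo = np.linspace(300_000, 430_000, 14)
den_demo = 5.0e4 * np.exp(-(z_demo - 430_000) / 1.0e5)
z_ext = np.arange(440_000, 500_001, 10_000)
den_ext = extrapolate_log_tail(z_demo, den_demo, z_ext)
```

With the question's lists this could be called as extrapolate_log_tail(Z, DEN, z_ext). Because the fit is done in log space the result stays positive, but how far it can be trusted beyond the data depends on whether the exponential-decay assumption holds.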

Answered By: Mattravel