Why doesn't the pandas interpolate method work with consecutive NaN values?
Question:
Let's say I have the following pandas Series:
import numpy as np
import pandas as pd
s = pd.Series([np.nan, np.nan, np.nan, 0, 1, 2, 3])
and I want to apply the pandas interpolate method with method='nearest' to this data.
When I run the code
s.interpolate(method='nearest')
nothing is interpolated: the three leading NaNs are still NaN.
If I change the series to, say, s = pd.Series([np.nan, 1, np.nan, 0, 1, 2, 3]), the same method works.
Do you know how to do the interpolation in the first case?
Thanks!
Answers:
You need a valid value on each side of a NaN to be able to interpolate it; otherwise this would be extrapolation.
As you can see with:
s = pd.Series([np.nan, 1, np.nan, 0, 1, 2, 3])
s.interpolate(method='nearest')
only the intermediate NaNs are interpolated:
0 NaN # cannot interpolate
1 1.0
2 1.0 # interpolated
3 0.0
4 1.0
5 2.0
6 3.0
dtype: float64
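To make the "valid value on each side" rule concrete, here is a minimal dependency-free sketch. It works on plain Python lists with float("nan") standing in for np.nan, and split_nans is a hypothetical helper, not part of pandas. It sorts NaN positions into interior ones (between two valid values, so interpolation is possible) and edge ones (before the first or after the last valid value, which would need extrapolation):

```python
import math

def split_nans(values):
    """Return (interior, edge) lists of NaN positions.

    interior: NaNs with a valid value somewhere to the left AND to the right.
    edge:     NaNs before the first or after the last valid value.
    """
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    interior, edge = [], []
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        # Interior only if it sits strictly between the first and last valid value.
        if valid and valid[0] < i < valid[-1]:
            interior.append(i)
        else:
            edge.append(i)
    return interior, edge

nan = float("nan")
print(split_nans([nan, 1, nan, 0, 1, 2, 3]))    # ([2], [0])
print(split_nans([nan, nan, nan, 0, 1, 2, 3]))  # ([], [0, 1, 2])
```

For the series in the question every NaN lands in the edge group, which is consistent with interpolate(method='nearest') leaving it unchanged.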
As you want the nearest value, a workaround could be to chain bfill (or ffill):
s.interpolate(method='nearest').bfill()
output:
0 1.0
1 1.0
2 1.0
3 0.0
4 1.0
5 2.0
6 3.0
dtype: float64
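A rough plain-Python sketch of what the chained call does, assuming a list input with float("nan") for missing values. nearest_then_bfill is a hypothetical helper, not pandas' implementation. Interior NaNs take the value of the closer valid neighbour (ties go to the left neighbour, which agrees with the 1.0 at index 2 in the output above), and then leading NaNs are back-filled from the first valid value:

```python
import math

def nearest_then_bfill(values):
    """Nearest-neighbour fill for interior NaNs, then back-fill leading NaNs.

    Trailing NaNs (after the last valid value) are left as NaN.
    """
    out = list(values)
    valid = [i for i, v in enumerate(values) if not math.isnan(v)]
    if not valid:
        return out  # nothing to copy from
    # Interior NaNs: choose whichever surrounding valid position is closer.
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            out[i] = values[left] if i - left <= right - i else values[right]
    # Leading NaNs: back-fill from the first valid value.
    for i in range(valid[0]):
        out[i] = values[valid[0]]
    return out

nan = float("nan")
print(nearest_then_bfill([nan, 1, nan, 0, 1, 2, 3]))    # [1, 1, 1, 0, 1, 2, 3]
print(nearest_then_bfill([nan, nan, nan, 0, 1, 2, 3]))  # [0, 0, 0, 0, 1, 2, 3]
```

The second call corresponds to the original series: only the back-fill step does anything, because there are no interior NaNs.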
follow-up
The only problems occur in two cases:
1. s = pd.Series([np.nan, np.nan, np.nan, 0, np.nan, np.nan, np.nan])
2. s = pd.Series([np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan])
In the first case, I want to have 0 everywhere. In the second case, I want to leave s as it is.
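try:
    # Nearest interpolation for interior NaNs, then fill both edges
    s2 = s.interpolate(method='nearest').bfill().ffill()
except ValueError:
    # Fallback if interpolate raises: just propagate whatever valid values exist
    s2 = s.bfill().ffill()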
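The except branch relies only on a back-fill followed by a forward-fill. A plain-Python sketch of those two passes (bfill_ffill is a hypothetical stand-in for s.bfill().ffill() on a list, not pandas code) shows why that fallback matches what the follow-up asks for: a single valid value is spread to every position, and an all-NaN input has nothing to propagate, so it comes back all-NaN:

```python
import math

def bfill_ffill(values):
    """Back-fill, then forward-fill NaNs in a list (mimics s.bfill().ffill())."""
    out = list(values)
    # Back-fill: walk right to left, carrying the next valid value seen.
    nxt = float("nan")
    for i in range(len(out) - 1, -1, -1):
        if math.isnan(out[i]):
            out[i] = nxt
        else:
            nxt = out[i]
    # Forward-fill: walk left to right, carrying the previous valid value.
    prev = float("nan")
    for i in range(len(out)):
        if math.isnan(out[i]):
            out[i] = prev
        else:
            prev = out[i]
    return out

nan = float("nan")
print(bfill_ffill([nan, nan, nan, 0, nan, nan, nan]))  # [0, 0, 0, 0, 0, 0, 0]
print(bfill_ffill([nan] * 7))  # [nan, nan, nan, nan, nan, nan, nan]
```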