Replace NaN with zero or linear interpolation

Question:

I have a dataset with a lot of NaNs and numeric values with the following form:

                  PV_Power
2017-01-01 00:00:00 NaN
2017-01-01 01:00:00 NaN
2017-01-01 02:00:00 NaN
2017-01-01 03:00:00 NaN
2017-01-01 04:00:00 NaN
... ...
2017-12-31 20:00:00 NaN
2017-12-31 21:00:00 NaN
2017-12-31 22:00:00 NaN
2017-12-31 23:00:00 NaN
2018-01-01 00:00:00 NaN

What I need to do is replace each NaN with 0 if it lies between other NaN values, or with the result of interpolation if it lies between numeric values. Any idea how I can achieve that?

Asked By: Kosmylo


Answers:

You could drop the NaN rows and then reindex your DataFrame, filling the restored rows with 0:

idx = df.index
df = df.dropna().reindex(idx, fill_value=0)

or just set values where PV_Power is NaN:

df.loc[pd.isna(df.PV_Power), ["PV_Power"]] = 0
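
Note that both options above set every NaN to 0, including the ones that sit between numeric values, so neither performs any interpolation. A minimal sketch to check this, assuming a small made-up sample frame (the data and the names by_reindex and by_loc are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: seven hourly timestamps with gaps in PV_Power
idx = pd.date_range("2017-01-01 00:00", "2017-01-01 06:00", periods=7)
df = pd.DataFrame(
    {"PV_Power": [np.nan, 4.0, np.nan, np.nan, 5.0, np.nan, np.nan]},
    index=idx,
)

# Option 1: drop the NaN rows, then restore the original index with zeros
by_reindex = df.dropna().reindex(df.index, fill_value=0)

# Option 2: overwrite the NaN cells directly (done on a copy here)
by_loc = df.copy()
by_loc.loc[by_loc["PV_Power"].isna(), "PV_Power"] = 0

# Both give the same result; the interior gaps are 0, not interpolated
print(by_reindex["PV_Power"].tolist())
print(by_reindex.equals(by_loc))
```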
Answered By: xshagg

Use DataFrame.interpolate with limit_area='inside' to interpolate only the NaNs that lie between numeric values, then replace the remaining missing values with 0:

print (df)
                     PV_Power
date                         
2017-01-01 00:00:00       NaN
2017-01-01 01:00:00       4.0
2017-01-01 02:00:00       NaN
2017-01-01 03:00:00       NaN
2017-01-01 04:00:00       5.0
2017-01-01 05:00:00       NaN
2017-01-01 06:00:00       NaN


df = df.interpolate(limit_area='inside').fillna(0)
print (df)
                     PV_Power
date                         
2017-01-01 00:00:00  0.000000
2017-01-01 01:00:00  4.000000
2017-01-01 02:00:00  4.333333
2017-01-01 03:00:00  4.666667
2017-01-01 04:00:00  5.000000
2017-01-01 05:00:00  0.000000
2017-01-01 06:00:00  0.000000
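
For anyone who wants to run this end to end, here is a self-contained sketch of the same two steps. The frame construction and the intermediate names inside and result are assumptions added for illustration; only the interpolate and fillna calls come from the snippet above.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame shown above (construction is assumed)
idx = pd.date_range("2017-01-01 00:00", "2017-01-01 06:00", periods=7, name="date")
df = pd.DataFrame(
    {"PV_Power": [np.nan, 4.0, np.nan, np.nan, 5.0, np.nan, np.nan]},
    index=idx,
)

# Step 1: linear interpolation, restricted to NaNs surrounded by valid values
inside = df.interpolate(limit_area="inside")

# Step 2: the NaNs that remain (leading and trailing runs) become 0
result = inside.fillna(0)

print(result)
```

The default method is linear and treats the rows as equally spaced. If the timestamps are irregular, passing method='time' to interpolate may be a better fit, since it takes the index spacing into account.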
Answered By: jezrael

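You can use fillna(0) and assign the result back (calling it with inplace=True on a selected column may not update df in recent pandas versions):

df['PV_Power'] = df['PV_Power'].fillna(0)

or you can replace the NaN values directly (this needs import numpy as np):

df['PV_Power'] = df['PV_Power'].replace(np.nan, 0)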
Answered By: ZAVERI SIR