How to fill missing timestamp values with the mean value in a pandas DataFrame?
Question:
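I have a large dataset; below is a small piece of it. In my data the value for the :59 second of every minute is missing (here, 12:47:59 is missing). How can I insert the missing row and fill its rpm value with the mean of the surrounding rpm values (the mean of 9.0 and 12.0 in this example)?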
df = pd.DataFrame({ 'Time': ['12:47:56', '12:47:57', '12:47:58', '12:48:00', '12:48:01', '12:48:02', '12:48:03'], 'rpm': [5.5, 7.0, 9.0, 12.0, 16.0, 19.0, 20.0] })
Here is my expected output:
df = pd.DataFrame({ 'Time': ['12:47:56', '12:47:57', '12:47:58', '12:47:59', '12:48:00', '12:48:01', '12:48:02', '12:48:03'], 'rpm': [5.5, 7.0, 9.0, 10.5, 12.0, 16.0, 19.0, 20.0] })
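Before filling anything, it can help to confirm which seconds are actually absent. The following is a minimal sketch, assuming the rows are sorted and all fall on the same day (with only a time given, pd.to_datetime attaches the default date 1900-01-01). It builds every second between the first and last reading with pd.date_range and keeps those not present in the data via Index.difference.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['12:47:56', '12:47:57', '12:47:58', '12:48:00',
             '12:48:01', '12:48:02', '12:48:03'],
    'rpm': [5.5, 7.0, 9.0, 12.0, 16.0, 19.0, 20.0],
})

# Parse the strings; the date part defaults to 1900-01-01 (assumes one day only).
ts = pd.to_datetime(df['Time'], format='%H:%M:%S')

# Every second between the first and last reading.
full = pd.date_range(ts.min(), ts.max(), freq='s')

# Seconds that have no row in the data.
missing = full.difference(pd.DatetimeIndex(ts))
print(missing.strftime('%H:%M:%S').tolist())
```

On the sample data this lists only 12:47:59; on the full dataset it should show whether the gaps really occur once per minute.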
Answers:
Create a DatetimeIndex and add the missing rows with DataFrame.asfreq, then use DatetimeIndex.time and Series.interpolate:
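import pandas as pd

# parse the strings to datetimes and insert one row per second ('s'; newer pandas deprecates 'S')
out = df.set_index(pd.to_datetime(df['Time'], format='%H:%M:%S')).asfreq('s')
#alternative with resample, e.g. by aggregate first value
#out = df.set_index(pd.to_datetime(df['Time'], format='%H:%M:%S')).resample('s').first()
# rebuild the Time column from the index (values are datetime.time objects)
out['Time'] = out.index.time
# linear interpolation: a single missing value becomes the mean of its two neighbors
out['rpm'] = out['rpm'].interpolate()
out = out.reset_index(drop=True)
print(out)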
       Time   rpm
0  12:47:56   5.5
1  12:47:57   7.0
2  12:47:58   9.0
3  12:47:59  10.5
4  12:48:00  12.0
5  12:48:01  16.0
6  12:48:02  19.0
7  12:48:03  20.0
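The question asks for the mean value. For a single missing second, linear interpolation gives exactly the mean of the two neighbors, but with two or more consecutive gaps it produces a straight-line ramp instead. If the mean of the previous and next known readings is wanted in every case, a minimal sketch (an alternative technique, not the answer's method) is to average a forward fill and a backward fill. It also converts Time back to 'HH:MM:SS' strings with strftime, matching the string column in the expected output.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['12:47:56', '12:47:57', '12:47:58', '12:48:00',
             '12:48:01', '12:48:02', '12:48:03'],
    'rpm': [5.5, 7.0, 9.0, 12.0, 16.0, 19.0, 20.0],
})

# rpm as a Series indexed by parsed time (default date 1900-01-01).
s = df.set_index(pd.to_datetime(df['Time'], format='%H:%M:%S'))['rpm']

# Insert one row per second; missing seconds become NaN.
s = s.asfreq('s')

# Fill each NaN with the mean of the previous and next known values.
filled = s.fillna((s.ffill() + s.bfill()) / 2)

out = pd.DataFrame({
    'Time': filled.index.strftime('%H:%M:%S'),  # back to strings
    'rpm': filled.to_numpy(),
})
print(out)
```

A NaN at the very start or end of the data has no neighbor on one side, so it stays NaN with this approach.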