Fill missing dates hourly per group with previous value in certain column using Pandas

Question:

I have the following dataframe (sample):

import pandas as pd

data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 14:00:00', False, 3],
        ['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])

  group                 date  indicator  value
0     A  2022-09-01 10:00:00      False      2
1     A  2022-09-01 14:00:00      False      3
2     B  2022-09-01 13:00:00      False      1
3     B  2022-09-01 16:00:00       True      4

I would like to fill in the missing dates between dates hourly. So each hour that is missing between dates should be filled and the values should be the same as the previous data. Here is the desired output:

data = [['A', '2022-09-01 10:00:00', False, 2], ['A', '2022-09-01 11:00:00', False, 2], 
        ['A', '2022-09-01 12:00:00', False, 2], ['A', '2022-09-01 13:00:00', False, 2], 
        ['A', '2022-09-01 14:00:00', False, 3],
        ['B', '2022-09-01 13:00:00', False, 1], ['B', '2022-09-01 14:00:00', False, 1],
        ['B', '2022-09-01 15:00:00', False, 1], ['B', '2022-09-01 16:00:00', True, 4]]
df_desired = pd.DataFrame(data = data, columns = ['group', 'date', 'indicator', 'value'])

  group                 date  indicator  value
0     A  2022-09-01 10:00:00      False      2
1     A  2022-09-01 11:00:00      False      2
2     A  2022-09-01 12:00:00      False      2
3     A  2022-09-01 13:00:00      False      2
4     A  2022-09-01 14:00:00      False      3
5     B  2022-09-01 13:00:00      False      1
6     B  2022-09-01 14:00:00      False      1
7     B  2022-09-01 15:00:00      False      1
8     B  2022-09-01 16:00:00       True      4

So I was wondering if it is possible to fill the missing dates hourly per group with the previous value in column value using Pandas?

Asked By: Quinten

||

Answers:

You can use:

df['date'] = pd.to_datetime(df['date'])

out = (df
   .groupby('group', as_index=False, group_keys=False)
   .apply(lambda g: g.set_index('date')
                     .reindex(pd.date_range(g['date'].min(),
                                            g['date'].max(),
                                            freq='H'))
                     .ffill(downcast='infer').reset_index()
         
         )
   .reset_index(drop=True)
)

output:

                index group  indicator  value
0 2022-09-01 10:00:00     A      False      2
1 2022-09-01 11:00:00     A      False      2
2 2022-09-01 12:00:00     A      False      2
3 2022-09-01 13:00:00     A      False      2
4 2022-09-01 14:00:00     A      False      3
5 2022-09-01 13:00:00     B      False      1
6 2022-09-01 14:00:00     B      False      1
7 2022-09-01 15:00:00     B      False      1
8 2022-09-01 16:00:00     B       True      4
Answered By: mozway

here is one other way about it

df['date']=pd.to_datetime(df['date'])

df2=(df.set_index('date' )
     .groupby('group',  group_keys=False)
     .apply(lambda x: x.resample('1H').ffill()) 
     .reset_index() )
df2
                   date     group   indicator   value
0   2022-09-01 10:00:00        A    False       2
1   2022-09-01 11:00:00        A    False       2
2   2022-09-01 12:00:00        A    False       2
3   2022-09-01 13:00:00        A    False       2
4   2022-09-01 14:00:00        A    False       3
5   2022-09-01 13:00:00        B    False       1
6   2022-09-01 14:00:00        B    False       1
7   2022-09-01 15:00:00        B    False       1
8   2022-09-01 16:00:00        B    True        4
Answered By: Naveed

One option is with complete from pyjanitor, to expose missing rows:

# pip install pyjanitor
import pandas as pd
import janitor
df['date'] = pd.to_datetime(df['date'])

# build a dictionary to contain the new dates
# the key of the dictionary must exist in the dataframe
new_date = {'date': lambda date: pd.date_range(date.min(), date.max(), freq='H')}

df.complete(new_date, by = 'group').ffill(downcast='infer') 
  group                date  indicator  value
0     A 2022-09-01 10:00:00      False      2
1     A 2022-09-01 11:00:00      False      2
2     A 2022-09-01 12:00:00      False      2
3     A 2022-09-01 13:00:00      False      2
4     A 2022-09-01 14:00:00      False      3
5     B 2022-09-01 13:00:00      False      1
6     B 2022-09-01 14:00:00      False      1
7     B 2022-09-01 15:00:00      False      1
8     B 2022-09-01 16:00:00       True      4
Answered By: sammywemmy
Categories: questions Tags: , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.