Compare two DataFrames and find missing timestamps

Question:

I have the following two dataframes:

df1=

   date                col1
0  2023-01-01 16:00:00 100
1  2023-01-01 16:15:00 120
2  2023-01-01 16:30:00 140
3  2023-01-01 16:45:00 160
4  2023-01-01 17:00:00 200
5  2023-01-01 17:15:00 430
6  2023-01-01 17:30:00 890

df2 =

   date                col2 col3 
0  2023-01-01 16:00:00 100  200
1  2023-01-01 16:15:00 120  400
2  2023-01-01 17:00:00 200  500

df2 is missing some of the timestamps that are present in df1. I can find those timestamps with the following code:

df1[~df1['date'].isin(df2['date'])]

I want to add those missing timestamps to df2 and fill each column with the average of the two preceding rows, where a preceding row may itself be a filled-in value.

So the new df2 should look like this:

df2 =

   date                col2    col3 
0  2023-01-01 16:00:00 100     200
1  2023-01-01 16:15:00 120     400
2  2023-01-01 16:30:00 110     300
3  2023-01-01 16:45:00 115     350
4  2023-01-01 17:00:00 200     500
5  2023-01-01 17:15:00 157.5   425
6  2023-01-01 17:30:00 178.75  462.5
Asked By: Pythoneer

Answers:

A less-than-ideal solution that iterates over the rows:

import numpy as np
import pandas as pd

df1 = [
    ['2023-01-01 16:00:00', 100],
    ['2023-01-01 16:15:00', 120],
    ['2023-01-01 16:30:00', 140],
    ['2023-01-01 16:45:00', 160],
    ['2023-01-01 17:00:00', 200],
    ['2023-01-01 17:15:00', 430],
    ['2023-01-01 17:30:00', 890],
]

df2 = [
    ['2023-01-01 16:00:00', 100,  200],
    ['2023-01-01 16:15:00', 120,  400],
    ['2023-01-01 17:00:00', 200,  500],
]

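df1 = pd.DataFrame(df1, columns=['date', 'col1'])
df2 = pd.DataFrame(df2, columns=['date', 'col2', 'col3'])

# Timestamps present in df1 but absent from df2; keep only the date column
missing = df1[~df1['date'].isin(df2['date'])]
missing = missing.drop(['col1'], axis=1)

# Append the missing rows (col2/col3 are NaN there) and restore time order
merged = pd.concat([df2, missing])
merged.sort_values('date', inplace=True, ignore_index=True)

# Fill each NaN with the mean of the two rows above it. Rows are visited in
# order, so earlier fills feed later ones. Assumes the first two rows are
# never missing; otherwise index-1 / index-2 would wrap around to the end.
for index, row in merged.iterrows():
    for col in ['col2', 'col3']:
        if np.isnan(row[col]):
            # Assign through merged.at so the write reaches the DataFrame
            # itself rather than a temporary Series
            merged.at[index, col] = merged[col].iloc[[index - 1, index - 2]].mean()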

print(merged)

Output:

   date                col2    col3
0  2023-01-01 16:00:00 100.00  200.0
1  2023-01-01 16:15:00 120.00  400.0
2  2023-01-01 16:30:00 110.00  300.0
3  2023-01-01 16:45:00 115.00  350.0
4  2023-01-01 17:00:00 200.00  500.0
5  2023-01-01 17:15:00 157.50  425.0
6  2023-01-01 17:30:00 178.75  462.5
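
The same idea can be wrapped in a reusable function that handles any number of value columns. This is only a sketch under a few assumptions: the name fill_with_mean_of_previous_two and its key parameter are made up for illustration, every timestamp in df2 is assumed to also appear in df1 (a left merge would drop any others), timestamps in df2 are assumed unique, and the first two rows are left untouched if they are missing. Unlike .mean(), the plain average below yields NaN when either of the two preceding values is NaN.

```python
import numpy as np
import pandas as pd


def fill_with_mean_of_previous_two(df1, df2, key="date"):
    """Hypothetical helper: align df2 to df1[key], then fill each NaN with
    the mean of the two rows above it (already-filled values included)."""
    # A left merge on df1's timestamps keeps df1's row order and leaves NaN
    # in every df2 column for timestamps that df2 lacks.
    out = df1[[key]].merge(df2, on=key, how="left")
    for col in out.columns.drop(key):
        values = out[col].to_numpy(dtype=float, copy=True)
        # Start at 2: the first two rows have no two predecessors, so they
        # are left as they are (an assumption, not covered by the question).
        for i in range(2, len(values)):
            if np.isnan(values[i]):
                values[i] = (values[i - 1] + values[i - 2]) / 2
        out[col] = values
    return out


# Usage with the data from the question
df1 = pd.DataFrame({
    "date": ["2023-01-01 16:00:00", "2023-01-01 16:15:00",
             "2023-01-01 16:30:00", "2023-01-01 16:45:00",
             "2023-01-01 17:00:00", "2023-01-01 17:15:00",
             "2023-01-01 17:30:00"],
    "col1": [100, 120, 140, 160, 200, 430, 890],
})
df2 = pd.DataFrame({
    "date": ["2023-01-01 16:00:00", "2023-01-01 16:15:00",
             "2023-01-01 17:00:00"],
    "col2": [100, 120, 200],
    "col3": [200, 400, 500],
})

result = fill_with_mean_of_previous_two(df1, df2)
print(result)
```

The inner loop runs over a plain NumPy array rather than over DataFrame rows, which avoids the per-row overhead of iterrows while keeping the sequential dependency (each fill can rely on the fill before it) that rules out a simple rolling mean.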
Answered By: Guru Stron