Create Multi-Index empty DataFrame to join with main DataFrame

Question:

Suppose I have a DataFrame that can be created with the code below:

import pandas as pd

df = pd.DataFrame(data = {'date':['2021-01-01', '2021-01-02', '2021-01-05','2021-01-02', '2021-01-03', '2021-01-05'],
                          'product':['A', 'A', 'A', 'B', 'B', 'B'],
                          'price':[10, 20, 30, 40, 50, 60]
                         }
                 )
df['date'] = pd.to_datetime(df['date'])

I want to create an empty DataFrame, say main_df, that contains every date between df.date.min() and df.date.max() for each product. On days where the price is NaN, I want to ffill first and then bfill whatever remains. The resulting DataFrame would look like this:

+------------+---------+-------+
|    date    | product | price |
+------------+---------+-------+
| 2021-01-01 | A       |    10 |
| 2021-01-02 | A       |    20 |
| 2021-01-03 | A       |    20 |
| 2021-01-04 | A       |    20 |
| 2021-01-05 | A       |    30 |
| 2021-01-01 | B       |    40 |
| 2021-01-02 | B       |    40 |
| 2021-01-03 | B       |    50 |
| 2021-01-04 | B       |    50 |
| 2021-01-05 | B       |    60 |
+------------+---------+-------+
Asked By: Lopez

Answers:

You can use resample (the last date for product B is changed to 2021-01-06 here):

import pandas as pd

df = pd.DataFrame(data = {'date':['2021-01-01', '2021-01-02', '2021-01-05','2021-01-02', '2021-01-03', '2021-01-06'],
                          'product':['A', 'A', 'A', 'B', 'B', 'B'],
                          'price':[10, 20, 30, 40, 50, 60]
                         }
                 )
df['date'] = pd.to_datetime(df['date'])

df
# Out: 
#          date product  price
# 0  2021-01-01       A     10
# 1  2021-01-02       A     20
# 2  2021-01-05       A     30
# 3  2021-01-02       B     40
# 4  2021-01-03       B     50
# 5  2021-01-06       B     60

# Resample each product to daily frequency and forward-fill the gaps
df.set_index("date").groupby("product")["price"].resample("D").ffill().reset_index()
# Out: 
#   product       date  price
# 0       A 2021-01-01     10
# 1       A 2021-01-02     20
# 2       A 2021-01-03     20
# 3       A 2021-01-04     20
# 4       A 2021-01-05     30
# 5       B 2021-01-02     40
# 6       B 2021-01-03     50
# 7       B 2021-01-04     50
# 8       B 2021-01-05     50
# 9       B 2021-01-06     60

To see which rows were filled by ffill, resample with mean() instead; the inserted dates show up as NaN:

df.set_index("date").groupby("product")["price"].resample("D").mean()
# Out: 
# product  date      
# A        2021-01-01    10.0
#          2021-01-02    20.0
#          2021-01-03     NaN
#          2021-01-04     NaN
#          2021-01-05    30.0
# B        2021-01-02    40.0
#          2021-01-03    50.0
#          2021-01-04     NaN
#          2021-01-05     NaN
#          2021-01-06    60.0
# Name: price, dtype: float64

Note that because the data is grouped by product before resampling and filling, each product gets its own date range (from its own min to its own max). I modified the data to show this.
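
If every product should instead cover the same global range (df.date.min() to df.date.max(), as in the question's expected output), one option is to build the full product/date MultiIndex and reindex onto it. The following is only a minimal sketch under that assumption: it uses the question's original data, and the names all_dates, full_index, and filled are illustrative, not part of the answer above.

```python
import pandas as pd

# The question's original data (product B ends on 2021-01-05)
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05",
                            "2021-01-02", "2021-01-03", "2021-01-05"]),
    "product": ["A", "A", "A", "B", "B", "B"],
    "price": [10, 20, 30, 40, 50, 60],
})

# Every calendar day between the global min and max date
all_dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")

# "Empty" skeleton: one entry per (product, date) combination
full_index = pd.MultiIndex.from_product(
    [df["product"].unique(), all_dates], names=["product", "date"]
)

filled = (
    df.set_index(["product", "date"])["price"]
      .reindex(full_index)                      # missing days become NaN
      .groupby(level="product")                 # fill within each product only
      .transform(lambda s: s.ffill().bfill())   # ffill first, then bfill the rest
      .reset_index(name="price")
)
print(filled)
```

The price column comes back as float because the reindex step introduces NaN; since no NaN remains after filling, it can be cast back with .astype(int) if needed.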

Answered By: user2314737

First, make a pivot table, upsample it to daily frequency with asfreq, and fill the nulls:

df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()

output:

product     A       B
date        
2021-01-01  10.0    40.0
2021-01-02  20.0    40.0
2021-01-03  20.0    50.0
2021-01-04  20.0    50.0
2021-01-05  30.0    60.0

Second, stack the result back into long format, rename the value column, and sort by product (full code):

(df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
 .stack().reset_index().rename(columns={0:'price'})
 .sort_values('product').reset_index(drop=True))

output:

    date        product price
0   2021-01-01  A       10.0
1   2021-01-02  A       20.0
2   2021-01-03  A       20.0
3   2021-01-04  A       20.0
4   2021-01-05  A       30.0
5   2021-01-01  B       40.0
6   2021-01-02  B       40.0
7   2021-01-03  B       50.0
8   2021-01-04  B       50.0
9   2021-01-05  B       60.0
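
For reference, here is a self-contained variant of the pivot approach, offered as a sketch rather than the definitive form. It swaps stack for melt to get back to long format and casts price back to integers; the names wide and long_df are illustrative. One thing to keep in mind is that pivot_table aggregates with the mean by default, so duplicate (date, product) rows would be averaged.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05",
                            "2021-01-02", "2021-01-03", "2021-01-05"]),
    "product": ["A", "A", "A", "B", "B", "B"],
    "price": [10, 20, 30, 40, 50, 60],
})

# Wide table: one row per date, one column per product
wide = df.pivot_table(values="price", index="date", columns="product")

# Insert the missing days, then fill forward and backward per column
wide = wide.asfreq("D").ffill().bfill()

# Back to long format (melt used here in place of stack)
long_df = (
    wide.reset_index()
        .melt(id_vars="date", var_name="product", value_name="price")
        .astype({"price": "int64"})   # safe: no NaN left after filling
)
print(long_df)
```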
Answered By: Panda Kim