Create Multi-Index empty DataFrame to join with main DataFrame


Suppose I have a DataFrame that can be created with the code below:

import pandas as pd

df = pd.DataFrame(data={'date': ['2021-01-01', '2021-01-02', '2021-01-05', '2021-01-02', '2021-01-03', '2021-01-05'],
                        'product': ['A', 'A', 'A', 'B', 'B', 'B'],
                        'price': [10, 20, 30, 40, 50, 60]})
df['date'] = pd.to_datetime(df['date'])

I want to create an empty DataFrame, say main_df, that contains every date in the range (see the table below) for each product. On days where the price is NaN, I want to ffill first and then bfill whatever remains. The resulting DataFrame would look like this:

|    date    | product | price |
| 2021-01-01 | A       |    10 |
| 2021-01-02 | A       |    20 |
| 2021-01-03 | A       |    20 |
| 2021-01-04 | A       |    20 |
| 2021-01-05 | A       |    30 |
| 2021-01-01 | B       |    40 |
| 2021-01-02 | B       |    40 |
| 2021-01-03 | B       |    50 |
| 2021-01-04 | B       |    50 |
| 2021-01-05 | B       |    60 |
Asked By: Lopez



Using resample

import pandas as pd

df = pd.DataFrame(data={'date': ['2021-01-01', '2021-01-02', '2021-01-05', '2021-01-02', '2021-01-03', '2021-01-06'],
                        'product': ['A', 'A', 'A', 'B', 'B', 'B'],
                        'price': [10, 20, 30, 40, 50, 60]})
df['date'] = pd.to_datetime(df['date'])

# Out: 
#          date product  price
# 0  2021-01-01       A     10
# 1  2021-01-02       A     20
# 2  2021-01-05       A     30
# 3  2021-01-02       B     40
# 4  2021-01-03       B     50
# 5  2021-01-06       B     60
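
A minimal sketch of the resample step, assuming the modified data above: index by date, group by product, resample to daily frequency and forward-fill within each product. The name `filled` is only illustrative. It should give the frame shown next.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-05',
             '2021-01-02', '2021-01-03', '2021-01-06'],
    'product': ['A', 'A', 'A', 'B', 'B', 'B'],
    'price': [10, 20, 30, 40, 50, 60],
})
df['date'] = pd.to_datetime(df['date'])

# resample needs a DatetimeIndex, so move 'date' into the index first
filled = (
    df.set_index('date')
      .groupby('product')['price']   # one daily range per product
      .resample('D')
      .ffill()                       # carry the last known price forward
      .reset_index()                 # back to columns: product, date, price
)
print(filled)
```

Selecting `['price']` before resampling keeps the grouping column out of the resampled values, so `reset_index()` returns exactly the three columns.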

# Out: 
#   product       date  price
# 0       A 2021-01-01     10
# 1       A 2021-01-02     20
# 2       A 2021-01-03     20
# 3       A 2021-01-04     20
# 4       A 2021-01-05     30
# 5       B 2021-01-02     40
# 6       B 2021-01-03     50
# 7       B 2021-01-04     50
# 8       B 2021-01-05     50
# 9       B 2021-01-06     60

The rows that were filled by ffill show up as NaN when the same resampling is done without filling:
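
One way to get that view, as a sketch under the same assumptions (modified data, illustrative name `gaps`), is to replace `ffill()` with `asfreq()`, which inserts the missing days but leaves them empty:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-05',
             '2021-01-02', '2021-01-03', '2021-01-06'],
    'product': ['A', 'A', 'A', 'B', 'B', 'B'],
    'price': [10, 20, 30, 40, 50, 60],
})
df['date'] = pd.to_datetime(df['date'])

# asfreq() only upsamples to daily frequency; new days are left as NaN
gaps = (
    df.set_index('date')
      .groupby('product')['price']
      .resample('D')
      .asfreq()
)
print(gaps)

# the NaN rows are exactly the days that ffill would fill in
print(gaps[gaps.isna()])
```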

# Out: 
# product  date      
# A        2021-01-01    10.0
#          2021-01-02    20.0
#          2021-01-03     NaN
#          2021-01-04     NaN
#          2021-01-05    30.0
# B        2021-01-02    40.0
#          2021-01-03    50.0
#          2021-01-04     NaN
#          2021-01-05     NaN
#          2021-01-06    60.0
# Name: price, dtype: float64

Note that by grouping by product before resampling and filling the empty slots, you can have different ranges (from min to max) for each product (I modified the data to showcase this).
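
If you want one common range for every product instead, as in the question's expected output, a possible sketch is to build the full product × date MultiIndex yourself and reindex onto it. This is not part of the resample approach above; `all_days`, `full_index` and `main_df` are illustrative names, and the data is still the modified set.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-02', '2021-01-05',
             '2021-01-02', '2021-01-03', '2021-01-06'],
    'product': ['A', 'A', 'A', 'B', 'B', 'B'],
    'price': [10, 20, 30, 40, 50, 60],
})
df['date'] = pd.to_datetime(df['date'])

# every day between the overall min and max date
all_days = pd.date_range(df['date'].min(), df['date'].max(), freq='D')

# "empty" frame: every (product, date) combination
full_index = pd.MultiIndex.from_product(
    [sorted(df['product'].unique()), all_days], names=['product', 'date']
)

# align the data to the full index; missing combinations become NaN
main_df = df.set_index(['product', 'date']).reindex(full_index)

# fill within each product only: forward first, then backward for leading gaps
price = main_df['price'].groupby(level='product').ffill()
main_df['price'] = price.groupby(level='product').bfill()

main_df = main_df.reset_index()
print(main_df)
```

Filling per product keeps the last price of A from leaking into the first rows of B, which a plain `ffill()` on the whole column would do.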

Answered By: user2314737


Make a pivot table, upsample to daily frequency with asfreq, then fill the nulls:

df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()


product     A       B
2021-01-01  10.0    40.0
2021-01-02  20.0    40.0
2021-01-03  20.0    50.0
2021-01-04  20.0    50.0
2021-01-05  30.0    60.0


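Then stack the result back into long form, name the value column and sort by product and date (full code):

(df.pivot_table('price', 'date', 'product').asfreq('D').ffill().bfill()
   .stack().reset_index(name='price').sort_values(['product', 'date'], ignore_index=True))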


    date        product price
0   2021-01-01  A       10.0
1   2021-01-02  A       20.0
2   2021-01-03  A       20.0
3   2021-01-04  A       20.0
4   2021-01-05  A       30.0
5   2021-01-01  B       40.0
6   2021-01-02  B       40.0
7   2021-01-03  B       50.0
8   2021-01-04  B       50.0
9   2021-01-05  B       60.0
Answered By: Panda Kim