Extrapolation and repetition of data between specific dates

Question:

I have this data:

    start        end        nominal
1   8/23/2021   9/15/2021   10000
2   9/1/2021    9/15/2021   100500
3   9/2/2021    9/15/2021   30000
4   9/3/2021    9/15/2021   2200

I want to transform it into:

date                 1      2       3      4
2021-08-23 00:00:00  10000
2021-08-24 00:00:00  10000
2021-08-25 00:00:00  10000
2021-08-26 00:00:00  10000
2021-08-27 00:00:00  10000
2021-08-28 00:00:00  10000
2021-08-29 00:00:00  10000
2021-08-30 00:00:00  10000
2021-08-31 00:00:00  10000
2021-09-01 00:00:00  10000  100500
2021-09-02 00:00:00  10000  100500  30000
2021-09-03 00:00:00  10000  100500  30000  2200
2021-09-04 00:00:00  10000  100500  30000  2200
2021-09-05 00:00:00  10000  100500  30000  2200
2021-09-06 00:00:00  10000  100500  30000  2200
2021-09-07 00:00:00  10000  100500  30000  2200
2021-09-08 00:00:00  10000  100500  30000  2200
2021-09-09 00:00:00  10000  100500  30000  2200
2021-09-10 00:00:00  10000  100500  30000  2200
2021-09-11 00:00:00  10000  100500  30000  2200
2021-09-12 00:00:00  10000  100500  30000  2200
2021-09-13 00:00:00  10000  100500  30000  2200
2021-09-14 00:00:00  10000  100500  30000  2200
2021-09-15 00:00:00  10000  100500  30000  2200

The date range should start at the earliest date in the "start" column and end at the latest date in the "end" column.

I created a new DataFrame with NaN columns to hold the result.

How can I map the values from "nominal" onto the generated dates between each row's start and end?

I tried iterrows, map, and even pivoting.

Asked By: bella


Answers:

You can create a date_range, explode it and pivot:

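import pandas as pd

# Build one daily DatetimeIndex per row from its start and end dates.
df['date'] = [pd.date_range(a, b) for a, b in zip(df.pop('start'), df.pop('end'))]

out = (df
 .explode('date')   # one row per (original row, day)
 .reset_index()     # the original row label becomes the 'index' column
 # pivot takes keyword arguments only in pandas >= 2.0
 .pivot(index='date', columns='index', values='nominal')
 .reset_index().rename_axis(columns=None)
 )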

Output:

         date        1         2        3       4       5       6
0  2021-08-23  10000.0       NaN      NaN     NaN     NaN     NaN
1  2021-08-24  10000.0       NaN      NaN     NaN     NaN     NaN
2  2021-08-25  10000.0       NaN      NaN     NaN     NaN     NaN
3  2021-08-26  10000.0       NaN      NaN     NaN     NaN     NaN
4  2021-08-27  10000.0       NaN      NaN     NaN     NaN     NaN
5  2021-08-28  10000.0       NaN      NaN     NaN     NaN     NaN
6  2021-08-29  10000.0       NaN      NaN     NaN     NaN     NaN
7  2021-08-30  10000.0       NaN      NaN     NaN     NaN     NaN
8  2021-08-31  10000.0       NaN      NaN     NaN     NaN     NaN
9  2021-09-01  10000.0  100500.0      NaN     NaN     NaN     NaN
10 2021-09-02  10000.0  100500.0  30000.0     NaN     NaN     NaN
11 2021-09-03  10000.0  100500.0  30000.0  2200.0     NaN     NaN
12 2021-09-04  10000.0  100500.0  30000.0  2200.0     NaN     NaN
13 2021-09-05  10000.0  100500.0  30000.0  2200.0     NaN     NaN
14 2021-09-06  10000.0  100500.0  30000.0  2200.0  5700.0     NaN
15 2021-09-07  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
16 2021-09-08  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
17 2021-09-09  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
18 2021-09-10  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
19 2021-09-11  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
20 2021-09-12  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
21 2021-09-13  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
22 2021-09-14  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
23 2021-09-15  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
Answered By: mozway
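
To try the explode-and-pivot approach on the four sample rows from the question, a minimal self-contained sketch could look like the following. It is an illustration rather than part of either answer, and it assumes the frame carries the 1-based index shown in the question.

```python
import pandas as pd

# Sample rows from the question; the 1-based index is an assumption
# taken from how the frame is displayed there.
df = pd.DataFrame(
    {
        "start": ["8/23/2021", "9/1/2021", "9/2/2021", "9/3/2021"],
        "end": ["9/15/2021"] * 4,
        "nominal": [10000, 100500, 30000, 2200],
    },
    index=[1, 2, 3, 4],
)

# One daily DatetimeIndex per row, covering start..end inclusive.
df["date"] = [pd.date_range(a, b) for a, b in zip(df.pop("start"), df.pop("end"))]

out = (
    df.explode("date")   # one row per (original row, day)
    .reset_index()       # the original row label becomes the "index" column
    .pivot(index="date", columns="index", values="nominal")
    .reset_index()
    .rename_axis(columns=None)
)
print(out)
```

With this sample, the result has one row per day from 2021-08-23 to 2021-09-15 and one column per original row, with NaN wherever a day falls before that row's start date.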

You can create the range of dates for each record then explode it and finally reshape your dataframe:

# pivot takes keyword arguments only in pandas >= 2.0
out = (df.assign(date=df.apply(lambda x: pd.date_range(x['start'], x['end']), axis=1))
         .explode('date').reset_index()
         .pivot(index='date', columns='index', values='nominal')
         .rename_axis(columns=None).reset_index())
print(out)

# Output
         date        1         2        3       4       5       6
0  2021-08-23  10000.0       NaN      NaN     NaN     NaN     NaN
1  2021-08-24  10000.0       NaN      NaN     NaN     NaN     NaN
2  2021-08-25  10000.0       NaN      NaN     NaN     NaN     NaN
3  2021-08-26  10000.0       NaN      NaN     NaN     NaN     NaN
4  2021-08-27  10000.0       NaN      NaN     NaN     NaN     NaN
5  2021-08-28  10000.0       NaN      NaN     NaN     NaN     NaN
6  2021-08-29  10000.0       NaN      NaN     NaN     NaN     NaN
7  2021-08-30  10000.0       NaN      NaN     NaN     NaN     NaN
8  2021-08-31  10000.0       NaN      NaN     NaN     NaN     NaN
9  2021-09-01  10000.0  100500.0      NaN     NaN     NaN     NaN
10 2021-09-02  10000.0  100500.0  30000.0     NaN     NaN     NaN
11 2021-09-03  10000.0  100500.0  30000.0  2200.0     NaN     NaN
12 2021-09-04  10000.0  100500.0  30000.0  2200.0     NaN     NaN
13 2021-09-05  10000.0  100500.0  30000.0  2200.0     NaN     NaN
14 2021-09-06  10000.0  100500.0  30000.0  2200.0  5700.0     NaN
15 2021-09-07  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
16 2021-09-08  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
17 2021-09-09  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
18 2021-09-10  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
19 2021-09-11  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
20 2021-09-12  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
21 2021-09-13  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
22 2021-09-14  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
23 2021-09-15  10000.0  100500.0  30000.0  2200.0  5700.0  6050.0
Answered By: Corralien
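
Both answers only produce rows for dates that fall inside at least one start/end interval. If the intervals leave a gap, the continuous range the question asks for (earliest start to latest end) could be restored by reindexing, roughly as sketched below. The two-row sample with a gap is hypothetical and not from the question, and the names `wide` and `full_range` are made up for this illustration.

```python
import pandas as pd

# Hypothetical sample with a gap: no row covers 2021-09-04 or 2021-09-05.
df = pd.DataFrame(
    {
        "start": ["9/1/2021", "9/6/2021"],
        "end": ["9/3/2021", "9/7/2021"],
        "nominal": [100, 200],
    },
    index=[1, 2],
)

# Parse the dates so that min() and max() compare chronologically,
# not as strings.
df["start"] = pd.to_datetime(df["start"], format="%m/%d/%Y")
df["end"] = pd.to_datetime(df["end"], format="%m/%d/%Y")

wide = (
    df.assign(date=df.apply(lambda x: pd.date_range(x["start"], x["end"]), axis=1))
    .explode("date")
    .assign(date=lambda d: pd.to_datetime(d["date"]))  # back to a datetime column
    .reset_index()
    .pivot(index="date", columns="index", values="nominal")
    .rename_axis(columns=None)
)

# Reindex on every calendar day between the earliest start and the latest
# end, so days that no row covers show up as all-NaN rows.
full_range = pd.date_range(df["start"].min(), df["end"].max(), name="date")
out = wide.reindex(full_range).reset_index()
print(out)
```

For the data in the question this step changes nothing, because the first row already spans the whole range.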