Expand Pandas date range
Question:
I have data that looks like this. Each row represents a value of that ID at some date.
ID Date Value
A 2012-01-05 50
A 2012-01-08 100
A 2012-01-10 200
B 2012-07-01 10
B 2012-07-03 20
I need to expand this so that I have rows for all days. The value of each day should be the value of the day before (i.e., think of the data above as updates of values, and the data below as a timeseries of values).
ID Date Value
A 2012-01-05 50
A 2012-01-06 50
A 2012-01-07 50
A 2012-01-08 100
A 2012-01-09 100
A 2012-01-10 200
B 2012-07-01 10
B 2012-07-02 10
B 2012-07-03 20
Currently, I have a solution that amounts to the following:
- Group by ID
- For each group, figure out the min and max date
- Create a pd.date_range
- Iterate simultaneously through the rows and through the date range, filling the values in the date range and incrementing the index-pointer to the rows if necessary
- Append all these date ranges to a final dataframe
It works, but seems like a pretty bad bruteforce solution. I wonder if there’s a better approach supported by Pandas?
Answers:
Using resample
on Date
indexed dataframe with ID
groups and ffill
on value
In [1725]: df.set_index('Date').groupby('ID').resample('1D')['Value'].ffill().reset_index()
Out[1725]:
ID Date Value
0 A 2012-01-05 50
1 A 2012-01-06 50
2 A 2012-01-07 50
3 A 2012-01-08 100
4 A 2012-01-09 100
5 A 2012-01-10 200
6 B 2012-07-01 10
7 B 2012-07-02 10
8 B 2012-07-03 20
Or you can try this one (Notice : this can be used for expend numeric column too ).
df.Date=pd.to_datetime(df.Date)
df=df.set_index(df.Date)
df.set_index(df.Date).groupby('ID')
.apply(lambda x : x.reindex(pd.date_range(min(x.index), max(x.index),freq='D')))
.ffill().reset_index(drop=True)
Out[519]:
ID Date Value
0 A 2012-01-05 50.0
1 A 2012-01-05 50.0
2 A 2012-01-05 50.0
3 A 2012-01-08 100.0
4 A 2012-01-08 100.0
5 A 2012-01-10 200.0
6 B 2012-07-01 10.0
7 B 2012-07-01 10.0
8 B 2012-07-03 20.0
This format is not working. It said code cannot reindex on an axis with duplicate labels.Here Picture
I have data that looks like this. Each row represents a value of that ID at some date.
ID Date Value
A 2012-01-05 50
A 2012-01-08 100
A 2012-01-10 200
B 2012-07-01 10
B 2012-07-03 20
I need to expand this so that I have rows for all days. The value of each day should be the value of the day before (i.e., think of the data above as updates of values, and the data below as a timeseries of values).
ID Date Value
A 2012-01-05 50
A 2012-01-06 50
A 2012-01-07 50
A 2012-01-08 100
A 2012-01-09 100
A 2012-01-10 200
B 2012-07-01 10
B 2012-07-02 10
B 2012-07-03 20
Currently, I have a solution that amounts to the following:
- Group by ID
- For each group, figure out the min and max date
- Create a pd.date_range
- Iterate simultaneously through the rows and through the date range, filling the values in the date range and incrementing the index-pointer to the rows if necessary
- Append all these date ranges to a final dataframe
It works, but seems like a pretty bad bruteforce solution. I wonder if there’s a better approach supported by Pandas?
Using resample
on Date
indexed dataframe with ID
groups and ffill
on value
In [1725]: df.set_index('Date').groupby('ID').resample('1D')['Value'].ffill().reset_index()
Out[1725]:
ID Date Value
0 A 2012-01-05 50
1 A 2012-01-06 50
2 A 2012-01-07 50
3 A 2012-01-08 100
4 A 2012-01-09 100
5 A 2012-01-10 200
6 B 2012-07-01 10
7 B 2012-07-02 10
8 B 2012-07-03 20
Or you can try this one (Notice : this can be used for expend numeric column too ).
df.Date=pd.to_datetime(df.Date)
df=df.set_index(df.Date)
df.set_index(df.Date).groupby('ID')
.apply(lambda x : x.reindex(pd.date_range(min(x.index), max(x.index),freq='D')))
.ffill().reset_index(drop=True)
Out[519]:
ID Date Value
0 A 2012-01-05 50.0
1 A 2012-01-05 50.0
2 A 2012-01-05 50.0
3 A 2012-01-08 100.0
4 A 2012-01-08 100.0
5 A 2012-01-10 200.0
6 B 2012-07-01 10.0
7 B 2012-07-01 10.0
8 B 2012-07-03 20.0
This format is not working. It said code cannot reindex on an axis with duplicate labels.Here Picture