Calculate difference between successive date column with groupby on another column in pandas?
Question:
I have a pandas DataFrame:
import numpy as np
import pandas as pd

data = pd.DataFrame([['Car','2019-01-06T21:44:09Z'],
                     ['Train','2019-01-06T19:44:09Z'],
                     ['Train','2019-01-02T19:44:09Z'],
                     ['Car','2019-01-08T06:44:09Z'],
                     ['Car','2019-01-06T18:44:09Z'],
                     ['Train','2019-01-04T19:44:09Z'],
                     ['Car','2019-01-05T16:34:09Z'],
                     ['Train','2019-01-08T19:44:09Z'],
                     ['Car','2019-01-07T14:44:09Z'],
                     ['Car','2019-01-06T11:44:09Z'],
                     ['Train','2019-01-10T19:44:09Z'],
                    ],
                    columns=['Type', 'Date'])
I need to find the difference between successive dates for each Type, after sorting them by date.
The final data should look like this:
data = pd.DataFrame([['Car','2019-01-06T21:44:09Z',1],
['Train','2019-01-06T19:44:09Z',4],
['Train','2019-01-02T19:44:09Z',0],
['Car','2019-01-08T06:44:09Z',3],
['Car','2019-01-06T18:44:09Z',1],
['Train','2019-01-04T19:44:09Z',2],
['Car','2019-01-05T16:34:09Z',0],
['Train','2019-01-08T19:44:09Z',6],
['Car','2019-01-07T14:44:09Z',2],
['Car','2019-01-06T11:44:09Z',1],
['Train','2019-01-10T19:44:09Z',8],
],
columns=['Type', 'Date','diff'])
Here, for Type Car, min(Date) is 2019-01-05T16:34:09Z, so its diff is 0. The next dates are 2019-01-06T18:44:09Z and 2019-01-06T11:44:09Z, so their diff is 1 day (I am not sure whether the time of day can be included), and so on.
For Type Train, min(Date) is 2019-01-02T19:44:09Z, so its diff is 0; the next date is 2019-01-04T19:44:09Z, so its diff is 2 days.
I tried groupby, but I am not sure how to include the sort on date:
data['diff'] = data.groupby('Type')['Date'].diff() / np.timedelta64(1, 'D')
Answers:
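Convert Date to datetime, then use pandas.DataFrame.groupby with dt.date (group_keys=False keeps the original row index on pandas 2.x, so the result can be assigned back):
df['Date'] = pd.to_datetime(df['Date'])
df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date())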
Output:
Type Date diff
0 Car 2019-01-06 21:44:09+00:00 1 days
1 Train 2019-01-06 19:44:09+00:00 4 days
2 Train 2019-01-02 19:44:09+00:00 0 days
3 Car 2019-01-08 06:44:09+00:00 3 days
4 Car 2019-01-06 18:44:09+00:00 1 days
5 Train 2019-01-04 19:44:09+00:00 2 days
6 Car 2019-01-05 16:34:09+00:00 0 days
7 Train 2019-01-08 19:44:09+00:00 6 days
8 Car 2019-01-07 14:44:09+00:00 2 days
9 Car 2019-01-06 11:44:09+00:00 1 days
10 Train 2019-01-10 19:44:09+00:00 8 days
If you want them to be int, add dt.days:
df['diff'] = df.groupby('Type', group_keys=False)['Date'].apply(lambda x: x.dt.date - x.min().date()).dt.days
Output:
Type Date diff
0 Car 2019-01-06 21:44:09+00:00 1
1 Train 2019-01-06 19:44:09+00:00 4
2 Train 2019-01-02 19:44:09+00:00 0
3 Car 2019-01-08 06:44:09+00:00 3
4 Car 2019-01-06 18:44:09+00:00 1
5 Train 2019-01-04 19:44:09+00:00 2
6 Car 2019-01-05 16:34:09+00:00 0
7 Train 2019-01-08 19:44:09+00:00 6
8 Car 2019-01-07 14:44:09+00:00 2
9 Car 2019-01-06 11:44:09+00:00 1
10 Train 2019-01-10 19:44:09+00:00 8
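The question also asks whether the time of day can be included. Here is a minimal sketch, assuming that fractional days measured from each Type's earliest timestamp are what is wanted. It uses only a few rows from the question, and the column name diff_days is made up for illustration:

```python
import pandas as pd

# A few rows from the question's data.
data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z']],
    columns=['Type', 'Date'])

# Parse the ISO 8601 strings into timezone-aware timestamps.
ts = pd.to_datetime(data['Date'], utc=True)

# Earliest timestamp per Type, broadcast back to every row.
first = ts.groupby(data['Type']).transform('min')

# Elapsed time since the group's first timestamp, as fractional days.
data['diff_days'] = (ts - first) / pd.Timedelta(days=1)
print(data)
```

Dividing a timedelta Series by pd.Timedelta(days=1) keeps the hours and minutes as a fraction of a day, whereas dt.days would drop them.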
- First, convert Date to a plain date and store it in a separate column
- Use a lambda function to subtract the group's minimum date, and get the number of days with dt.days
- Then drop the extra date column
data['Date_date'] = pd.to_datetime(data['Date']).dt.date
data['diff'] = data.groupby(['Type'], group_keys=False)['Date_date'].apply(lambda x: (x - x.min()).dt.days)
data.drop(['Date_date'],axis=1,inplace=True,errors='ignore')
print(data)
Type Date diff
0 Car 2019-01-06T21:44:09Z 1
1 Train 2019-01-06T19:44:09Z 4
2 Train 2019-01-02T19:44:09Z 0
3 Car 2019-01-08T06:44:09Z 3
4 Car 2019-01-06T18:44:09Z 1
5 Train 2019-01-04T19:44:09Z 2
6 Car 2019-01-05T16:34:09Z 0
7 Train 2019-01-08T19:44:09Z 6
8 Car 2019-01-07T14:44:09Z 2
9 Car 2019-01-06T11:44:09Z 1
10 Train 2019-01-10T19:44:09Z 8
Subtract directly, using transform to get each group's minimum date:
s = pd.to_datetime(data['Date']).dt.date
data['diff'] = (s - s.groupby(data.Type).transform('min')).dt.days
Output:
Type Date diff
0 Car 2019-01-06T21:44:09Z 1
1 Train 2019-01-06T19:44:09Z 4
2 Train 2019-01-02T19:44:09Z 0
3 Car 2019-01-08T06:44:09Z 3
4 Car 2019-01-06T18:44:09Z 1
5 Train 2019-01-04T19:44:09Z 2
6 Car 2019-01-05T16:34:09Z 0
7 Train 2019-01-08T19:44:09Z 6
8 Car 2019-01-07T14:44:09Z 2
9 Car 2019-01-06T11:44:09Z 1
10 Train 2019-01-10T19:44:09Z 8
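The answers above all measure days elapsed since each Type's earliest date, which matches the expected output in the question. If the literal difference between successive dates is wanted instead, one possible sketch is to sort by timestamp first and then take a grouped diff. The helper columns ts and day and the result column step_days are names made up for this example:

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06T21:44:09Z'],
     ['Train', '2019-01-06T19:44:09Z'],
     ['Train', '2019-01-02T19:44:09Z'],
     ['Car', '2019-01-08T06:44:09Z'],
     ['Car', '2019-01-06T18:44:09Z'],
     ['Train', '2019-01-04T19:44:09Z'],
     ['Car', '2019-01-05T16:34:09Z'],
     ['Train', '2019-01-08T19:44:09Z'],
     ['Car', '2019-01-07T14:44:09Z'],
     ['Car', '2019-01-06T11:44:09Z'],
     ['Train', '2019-01-10T19:44:09Z']],
    columns=['Type', 'Date'])

# Work on a copy so the original frame only gains the result column.
tmp = data.assign(ts=pd.to_datetime(data['Date'], utc=True))
tmp['day'] = tmp['ts'].dt.normalize()  # midnight of each calendar day

# Sort by the full timestamp, then diff the calendar day within each Type.
step = tmp.sort_values('ts').groupby('Type')['day'].diff()

# The first row of each group has no predecessor (NaT), so use 0 there.
# Assignment aligns on the index, so rows keep their original order.
data['step_days'] = step.dt.days.fillna(0).astype(int)
print(data)
```

With this data, every Train row after the first gets 2, and the three Car rows on 2019-01-06 get 1, 0 and 0 in time order.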
Just to add: I need help with similar data. How can we find the difference between successive months?
Output:
| Type| Date | Month Diff|
|:---- |:------: | -----:|
| Car | 2019-01-06| 0 |
| Car | 2019-03-02| 2 |
| Car | 2019-07-06| 4 |
| Car | 2019-08-23| 1 |
| Car | 2019-11-23| 3 |
| Train | 2020-01-23| 0 |
| Train | 2019-03-23| 2 |
| Train | 2019-09-23| 6 |
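One way to approach the month question is sketched below. It assumes that "month difference" means the difference in calendar months, ignoring the day of the month. The sample uses the Car rows and the two 2019 Train rows from the table, and month_index is a helper name made up for this example:

```python
import pandas as pd

data = pd.DataFrame(
    [['Car', '2019-01-06'], ['Car', '2019-03-02'], ['Car', '2019-07-06'],
     ['Car', '2019-08-23'], ['Car', '2019-11-23'],
     ['Train', '2019-03-23'], ['Train', '2019-09-23']],
    columns=['Type', 'Date'])

dates = pd.to_datetime(data['Date'])

# Number each calendar month consecutively so that subtraction counts months.
month_index = dates.dt.year * 12 + dates.dt.month

# Sort within each Type by date, then diff the month numbers per group.
tmp = data.assign(ts=dates, month_index=month_index).sort_values(['Type', 'ts'])
month_diff = tmp.groupby('Type')['month_index'].diff()

# The first row of each Type has no previous month, so it gets 0.
data['Month Diff'] = month_diff.fillna(0).astype(int)
print(data)
```

Because only the year and month are used, 2019-01-31 to 2019-02-01 counts as one month under this definition.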