Compute session duration on an e-commerce dataset in Python
Question:
I am working with an e-commerce dataset in Python using pandas.
It looks like this:
Timestamp                SessionId
2019-10-23 08:18:14 UTC  1
2019-10-23 08:18:17 UTC  1
2019-10-23 08:18:27 UTC  1
2019-10-15 04:09:18 UTC  2
2019-10-15 04:10:14 UTC  2
I would like to calculate the duration of each session and create a new DataFrame with that information.
How can I do that with pandas?
Answers:
Here is an example of how you might do this:
import pandas as pd
# dummy data
df = pd.DataFrame({
    'Timestamp': ['2019-10-23 08:18:14', '2019-10-23 08:18:17', '2019-10-23 08:18:27', '2019-10-15 04:09:18', '2019-10-15 04:10:14'],
    'SessionId': [1, 1, 1, 2, 2]
})
df['Timestamp'] = pd.to_datetime(df['Timestamp'])  # ensure timestamps are actual datetime objects
# duration of a session = its last timestamp minus its first timestamp
durations = df.groupby('SessionId')['Timestamp'].agg(lambda x: x.max() - x.min()).to_frame().rename(columns={'Timestamp': 'Duration'})
print(durations)  # new DataFrame indexed by SessionId with one Duration column
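If you also want the start and end of each session, or the duration as a plain number of seconds, something along these lines might work. This is only a sketch: it assumes every timestamp ends with the literal " UTC" label shown in the question, and the names `sessions`, `start`, `end` and `DurationSeconds` are made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question, including the trailing " UTC" label
df = pd.DataFrame({
    "Timestamp": [
        "2019-10-23 08:18:14 UTC",
        "2019-10-23 08:18:17 UTC",
        "2019-10-23 08:18:27 UTC",
        "2019-10-15 04:09:18 UTC",
        "2019-10-15 04:10:14 UTC",
    ],
    "SessionId": [1, 1, 1, 2, 2],
})

# Assumption: every value ends in " UTC". Drop the label, parse with an
# explicit format, and mark the result as timezone-aware UTC.
df["Timestamp"] = pd.to_datetime(
    df["Timestamp"].str.replace(" UTC", "", regex=False),
    format="%Y-%m-%d %H:%M:%S",
    utc=True,
)

# Named aggregation: one row per session with its first and last timestamp
sessions = (
    df.groupby("SessionId")["Timestamp"]
      .agg(start="min", end="max")
      .reset_index()
)

# Duration as a timedelta, plus a numeric version in seconds
sessions["Duration"] = sessions["end"] - sessions["start"]
sessions["DurationSeconds"] = sessions["Duration"].dt.total_seconds()

print(sessions)
```

Using the built-in "min" and "max" aggregations instead of a lambda tends to be faster on large datasets, and the numeric seconds column is easier to plot or average than a timedelta.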