Compute session duration on an e-commerce dataset in Python

Question:

I am working with an e-commerce dataset in Python (pandas) that looks like this:

Timestamp                  SessionId
2019-10-23 08:18:14 UTC    1
2019-10-23 08:18:17 UTC    1
2019-10-23 08:18:27 UTC    1
2019-10-15 04:09:18 UTC    2
2019-10-15 04:10:14 UTC    2

I would like to calculate the duration of each session and create a new DataFrame with that information.
How can I do that with pandas?

Asked By: Nick Giannopoulos

||

Answers:

Here is an example of how you might do this:

import pandas as pd

# dummy data
df = pd.DataFrame({
    'Timestamp': ['2019-10-23 08:18:14', '2019-10-23 08:18:17', '2019-10-23 08:18:27', '2019-10-15 04:09:18', '2019-10-15 04:10:14'],
    'SessionId': [1, 1, 1, 2, 2]
})
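df['Timestamp'] = pd.to_datetime(df['Timestamp'])  # ensure timestamps are actual datetime objects

# duration of each session = last timestamp minus first timestamp within the group
durations = (
    df.groupby('SessionId')['Timestamp']
      .agg(lambda x: x.max() - x.min())
      .to_frame()
      .rename(columns={'Timestamp': 'Duration'})
)
print(durations)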
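
A possible variation on the approach above, offered only as a sketch: it keeps the " UTC" suffix from the question's sample rows, uses named aggregation instead of a lambda, and adds the duration in seconds. The column names Start, End, Duration and DurationSeconds are illustrative choices, not part of the original data.

```python
import pandas as pd

# Sample rows copied from the question, including the " UTC" suffix.
df = pd.DataFrame({
    "Timestamp": [
        "2019-10-23 08:18:14 UTC",
        "2019-10-23 08:18:17 UTC",
        "2019-10-23 08:18:27 UTC",
        "2019-10-15 04:09:18 UTC",
        "2019-10-15 04:10:14 UTC",
    ],
    "SessionId": [1, 1, 1, 2, 2],
})

# utc=True gives timezone-aware timestamps in UTC.
df["Timestamp"] = pd.to_datetime(df["Timestamp"], utc=True)

# First and last event per session (column names are assumptions for illustration).
sessions = (
    df.groupby("SessionId")["Timestamp"]
      .agg(Start="min", End="max")
      .reset_index()
)

# Duration as a Timedelta, plus a plain float in seconds for easier arithmetic.
sessions["Duration"] = sessions["End"] - sessions["Start"]
sessions["DurationSeconds"] = sessions["Duration"].dt.total_seconds()

print(sessions)
```

With the sample rows, session 1 spans 13 seconds and session 2 spans 56 seconds. Note that a session with a single event would get a duration of zero under this definition.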
Answered By: RubenB