How to get the growth between two rows

Question:

I’m trying to get the growth (in %) between two values at different period. Here is how my DataFrame looks like:

     sessionSource dateRange activeUsers
0    instagram.com   current           5
1    instagram.com  previous           0
2  l.instagram.com   current          83
3  l.instagram.com  previous          11
4     snapchat.com   current           2
5     snapchat.com  previous           1

What I’m trying to get is:

     sessionSource dateRange activeUsers  Growth
0    instagram.com   current           5     xx%
2  l.instagram.com   current          83     xx%
4     snapchat.com   current           2     xx%

I’m not a Pandas expert, I tried few things but nothing came close to what I need.

Thanks a lot for any help.

Asked By: Omar

||

Answers:

You can use:

(df.sort_values(by=['sessionSource', 'dateRange'],
                ascending=[True, False])
   .groupby('sessionSource', as_index=False)
   .agg({'dateRange': 'first', 'activeUsers': lambda s: s.pct_change().dropna().mul(100)})
 )

Output:

     sessionSource dateRange  activeUsers
0    instagram.com  previous          inf
1  l.instagram.com  previous   654.545455
2     snapchat.com  previous   100.000000
Answered By: mozway

Assuming you literally just need the percent change between current and previous and current/previous are in the correct order, you can just group the data based on the source and get the percent change of the group
.Use the pandas.Series.pct_change() method on the grouped object and you should be good.

# sort values before to make sure the order is maintained
df = df.sort_values(by=["sessionSource", "dateRange"], ascending=False)
df['Growth']= (df.groupby('sessionSource')['activeUsers'].apply(pd.Series.pct_change))
#drop na from the unavailable results and convert to %
df["growth"] = (df["growth"].dropna()*100).round(2)

For ex.(taken from the official documentation and applied on a series):

s = pd.Series([90, 91, 85])
s
0    90
1    91
2    85
dtype: int64

s.pct_change()
0         NaN
1    0.011111
2   -0.065934
dtype: float64

EDIT
As @Omar suggested, I posted a small edit to the code that fully solved his problem(just added manual reordering + converting percentage points into percentages). The main gist is still group_by + pct_change

Answered By: Alex.Kh
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.