How to calculate percentage while matching columns on video_id?
Question:
I am trying to calculate the percentage of users who viewed a specific video versus those who did not. I have managed to calculate the total number of videos and the number of videos viewed by each group.
However, when I try to calculate the percentages, it does not work.
I believe I need to match the story ids, because the columns no longer line up after the calculation. How do I do that?
This is my formula to calculate percentages:
pd.DataFrame(df.status.eq(3).astype(int).groupby(df.story_id).sum() / df['story_id'].value_counts())
However, the results do not make sense; I believe the story_id values were not matched during the calculation.
Answers:
A percentage here is the sum divided by the count, which is the same as the mean, so the solution can be simplified:
print(df)
   story_id  status
0         1       3
1         1       5
2         1       3
3         2       3
4         2       3
5         4       5
6         4       3
7         5       7
df1 = df.status.eq(3).groupby(df.story_id).mean().reset_index(name='perc')
print(df1)
   story_id      perc
0         1  0.666667
1         2  1.000000
2         4  0.500000
3         5  0.000000
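To make the "sum divided by count equals mean" step concrete, here is a minimal, self-contained sketch using the sample data from this answer. It assumes that status == 3 is what marks a row as "viewed", as in the question's formula. The names viewed, by_count, by_mean and perc are only illustrative, and scaling to 0-100 with rounding is an optional extra, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the answer above
df = pd.DataFrame({
    "story_id": [1, 1, 1, 2, 2, 4, 4, 5],
    "status":   [3, 5, 3, 3, 3, 5, 3, 7],
})

# Boolean flag per row; assumption: status == 3 means "viewed"
viewed = df["status"].eq(3)

# Explicit version: number of True flags divided by number of rows,
# both grouped by story_id so the two results share the same index
grouped = viewed.groupby(df["story_id"])
by_count = grouped.sum() / grouped.count()

# Shortcut: the mean of a boolean column is the fraction of True values
by_mean = grouped.mean()

# Optional: express the fraction on a 0-100 scale and return a DataFrame
perc = by_mean.mul(100).round(2).reset_index(name="perc")
print(perc)
```

Because both pieces of by_count come from the same groupby, they are indexed by story_id and line up label by label, so no manual matching of ids is needed.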