Pandas Get Count of Each Row of DataFrame and assign to new column for long-format
Question:
I get df:
task_id name tag
1 foo xyz
1 foo xyz
22 foo aaa
22 foo aaa
22 foo aaa
22 foo bbb
13 bar xyz
13 bar xyz
33 bar aaa
33 bar aaa
So I trying df['tag'].value_count()
and df_test.groupby('name')['tag'].count()
for two reasons:
One I need count each tag
per task
and second total sum of tags on each task
What I want get:
task_id name tag count_tag total_count
1 foo xyz 2 6
1 foo xyz 2 6
22 foo aaa 3 6
22 foo aaa 3 6
22 foo aaa 3 6
22 foo bbb 1 6
13 bar xyz 2 4
13 bar xyz 2 4
33 bar aaa 2 4
33 bar aaa 2 4
for better understanding, in sql to create such a table, I would do something like this:
SELECT
task_id,
name,
count(tag) AS count_tag,
sum(count(tag)) OVER (PARTITION BY name) AS total_count
Answers:
s1 = df.groupby('name')['tag'].transform('count')
s2 = df.groupby(['name', 'tag'])['tag'].transform('count')
df.assign(count_tag=s2, total_count=s1)
Use GroupBy.transform
by multiple columns:
df['count_tag'] = df.groupby(['task_id','name','tag'])['name'].transform('size')
df['total_count'] = df.groupby(['name'])['name'].transform('size')
#alternative solution
#df['total_count'] = df['name'].map(df['name'].value_counts())
print (df)
task_id name tag count_tag total_count
0 1 foo xyz 2 6
1 1 foo xyz 2 6
2 22 foo aaa 3 6
3 22 foo aaa 3 6
4 22 foo aaa 3 6
5 22 foo bbb 1 6
6 13 bar xyz 2 4
7 13 bar xyz 2 4
8 33 bar aaa 2 4
9 33 bar aaa 2 4
I get df:
task_id name tag
1 foo xyz
1 foo xyz
22 foo aaa
22 foo aaa
22 foo aaa
22 foo bbb
13 bar xyz
13 bar xyz
33 bar aaa
33 bar aaa
So I trying df['tag'].value_count()
and df_test.groupby('name')['tag'].count()
for two reasons:
One I need count each tag
per task
and second total sum of tags on each task
What I want get:
task_id name tag count_tag total_count
1 foo xyz 2 6
1 foo xyz 2 6
22 foo aaa 3 6
22 foo aaa 3 6
22 foo aaa 3 6
22 foo bbb 1 6
13 bar xyz 2 4
13 bar xyz 2 4
33 bar aaa 2 4
33 bar aaa 2 4
for better understanding, in sql to create such a table, I would do something like this:
SELECT
task_id,
name,
count(tag) AS count_tag,
sum(count(tag)) OVER (PARTITION BY name) AS total_count
s1 = df.groupby('name')['tag'].transform('count')
s2 = df.groupby(['name', 'tag'])['tag'].transform('count')
df.assign(count_tag=s2, total_count=s1)
Use GroupBy.transform
by multiple columns:
df['count_tag'] = df.groupby(['task_id','name','tag'])['name'].transform('size')
df['total_count'] = df.groupby(['name'])['name'].transform('size')
#alternative solution
#df['total_count'] = df['name'].map(df['name'].value_counts())
print (df)
task_id name tag count_tag total_count
0 1 foo xyz 2 6
1 1 foo xyz 2 6
2 22 foo aaa 3 6
3 22 foo aaa 3 6
4 22 foo aaa 3 6
5 22 foo bbb 1 6
6 13 bar xyz 2 4
7 13 bar xyz 2 4
8 33 bar aaa 2 4
9 33 bar aaa 2 4