Pandas Get Count of Each Row of DataFrame and assign to new column for long-format

Question

I get df:

task_id name tag
1       foo  xyz
1       foo  xyz
22      foo  aaa
22      foo  aaa
22      foo  aaa
22      foo  bbb
13      bar  xyz
13      bar  xyz
33      bar  aaa
33      bar  aaa

So I trying df['tag'].value_count() and df_test.groupby('name')['tag'].count() for two reasons:
One I need count each tag per task and second total sum of tags on each task
What I want get:

task_id name tag count_tag total_count
1       foo  xyz   2         6
1       foo  xyz   2         6
22      foo  aaa   3         6
22      foo  aaa   3         6
22      foo  aaa   3         6   
22      foo  bbb   1         6
13      bar  xyz   2         4
13      bar  xyz   2         4
33      bar  aaa   2         4
33      bar  aaa   2         4

for better understanding, in sql to create such a table, I would do something like this:

SELECT
    task_id,
    name,
    count(tag) AS count_tag,
    sum(count(tag)) OVER (PARTITION BY name) AS total_count

Asked By: TeoK

||

Source

Answer 1

s1 = df.groupby('name')['tag'].transform('count')
s2 = df.groupby(['name', 'tag'])['tag'].transform('count')
df.assign(count_tag=s2, total_count=s1)

Answered By: Panda Kim

Answer 2

Use GroupBy.transform by multiple columns:

df['count_tag'] = df.groupby(['task_id','name','tag'])['name'].transform('size')

df['total_count'] = df.groupby(['name'])['name'].transform('size')
#alternative solution
#df['total_count'] = df['name'].map(df['name'].value_counts())

print (df)
   task_id name  tag  count_tag  total_count
0        1  foo  xyz          2            6
1        1  foo  xyz          2            6
2       22  foo  aaa          3            6
3       22  foo  aaa          3            6
4       22  foo  aaa          3            6
5       22  foo  bbb          1            6
6       13  bar  xyz          2            4
7       13  bar  xyz          2            4
8       33  bar  aaa          2            4
9       33  bar  aaa          2            4

Answered By: jezrael

Pandas Get Count of Each Row of DataFrame and assign to new column for long-format

Question:

Answers: