Count and group individual values in a Dataframe of arrays

Question

Lets say I have a dataframe named df that looks like the one below. I have a column of arrays and I want to count the frequency of all values in the array. I would like to group these counts based on the "Group" column if possible as well. The arrays don’t have to be the same length.

Group	Value
a	[v1, v2]
a	[v1, v3]
b	[v2, v3]

When I call df.groupby('Group')['Value'].value_counts() I get the following output:

Group	Value	Freq
a	[v1, v2]	1
a	[v1, v3]	1
b	[v2, v3]	1

However, I really want to treat each value in the array uniquely as part of the count. In other words, I am looking for an output like this.

Group	Value	Freq
a	[v1]	2
a	[v2]	1
a	[v3]	1
b	[v2]	1
b	[v3]	1

How can I achieve this? Maybe some sort of flattening? I’m not sure if flattening would break my ability to group them though. Any help would be appreciated!

Asked By: Akshay Chacko

||

Source

Answer 1

Assuming a list in "Value":

(df.explode('Value')
   .value_counts()
   .reset_index(name='Freq')
)

Or:

(df.assign(Freq=1)
   .explode('Value')
   .groupby(['Group', 'Value'], as_index=False)
   .sum()
)

Or:

(df.explode('Value')
   .groupby(['Group', 'Value'], as_index=False)
   .agg(Freq=('Value', 'size'))
)

NB. If you have strings, add .str[1:-1].str.split(r',s*') before the explode.

Output:

  Group Value  Freq
0     a    v1     2
1     a    v2     1
2     a    v3     1
3     b    v2     1
4     b    v3     1

Answered By: mozway

Count and group individual values in a Dataframe of arrays

Question:

Answers: