Lets say I have a dataframe named df that looks like the one below. I have a column of arrays and I want to count the frequency of all values in the array. I would like to group these counts based on the "Group" column if possible as well. The arrays don’t have to be the same length.
When I call
df.groupby('Group')['Value'].value_counts() I get the following output:
However, I really want to treat each value in the array uniquely as part of the count. In other words, I am looking for an output like this.
How can I achieve this? Maybe some sort of flattening? I’m not sure if flattening would break my ability to group them though. Any help would be appreciated!
Assuming a list in "Value":
(df.explode('Value') .value_counts() .reset_index(name='Freq') )
(df.assign(Freq=1) .explode('Value') .groupby(['Group', 'Value'], as_index=False) .sum() )
(df.explode('Value') .groupby(['Group', 'Value'], as_index=False) .agg(Freq=('Value', 'size')) )
NB. If you have strings, add
.str[1:-1].str.split(r',s*') before the
Group Value Freq 0 a v1 2 1 a v2 1 2 a v3 1 3 b v2 1 4 b v3 1