Summarize the problem
I have a Dataframe with a Series of lists of strings.
I can use a Counter from ‘collections’ to iterate over each row of the series and count the times each string appeared easy enough:
for list_of_strings in df['info']: counter = Counter() for string in list_of_strings: counter[string]+=1
What I want is to mutate the df to now have as many columns as there were unique strings found in all of the lists of strings in the df[‘info’] series. The column names would be the string value.
For each row in the df, the new columns values would either be 0 if the string was not found or N where N is the count of how many times the string was found in that rows list_of_strings
Describe what you’ve tried
I was able to make a df whose # and name of columns matches the unique strings found, but I cant figure out how to append the counts per row, and Im not even sure if crafting and appending to a brand new dataframe is the right way to go about it?:
unique_df = pd.Series(copy_df['info'].explode().unique()).to_frame().T
I tried doing something using df.at() for each counter key but it exploded my juptyr notebook :
Any help is appreciated, let me know what other info I can provide.
Assuming this input Series:
s = pd.Series([['A','B','A'],['C','B'],['D']])
unstack with a
fill_value of 0:
(s.explode() .groupby(level=0) .value_counts() .unstack(fill_value=0) .reindex(s.index, fill_value=0) )
s2 = s.explode() (pd.crosstab(s2.index, s2) .rename_axis(index=None, columns=None) )
A B C D 0 2 1 0 0 1 0 1 1 0 2 0 0 0 1