Count the different values row-wise and add each result as an extra column in a pandas DataFrame
Question:
As the title says, I have a pandas DataFrame with several columns and rows. I want to count, row by row, how often each value occurs, and add the count for each value as a new column.
My dataframe looks like this:
Col1 Col2 Col3
2 2 1
1 1 1
3 1 2
And this is how the result should look:
Col1 Col2 Col3 Count1 Count2 Count3
2 2 1 1 2 0
1 1 1 3 0 0
3 1 2 1 1 1
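A minimal sketch that rebuilds the sample frame and the expected result from the tables above, so the answers can be checked against them. It assumes the default integer index, since the question does not show one; `expected` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Sample input from the question (default RangeIndex assumed)
df = pd.DataFrame({'Col1': [2, 1, 3],
                   'Col2': [2, 1, 1],
                   'Col3': [1, 1, 2]})

# Expected result: one CountN column per distinct value N,
# holding how many times N occurs in that row
expected = df.assign(Count1=[1, 3, 1],
                     Count2=[2, 0, 1],
                     Count3=[0, 0, 1])
print(expected)
```

Each row has three cells, so the Count columns of every row should add up to 3.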
Answers:
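You can loop through all of the unique values in the dataframe, creating a column for each that gives the count per row of that value. Restrict the comparison to the original columns: otherwise every Count column the loop adds is also compared on later iterations, which can inflate the counts (with this data, Count3 for the second row would come out as 1 instead of 0).
import numpy as np
cols = list(df.columns)  # original columns, captured before any Count column is added
for v in np.unique(df):
    df[f'Count{v}'] = (df[cols] == v).sum(axis=1)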
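An alternative sketch, not part of the original answer: build every count column from the unmodified frame first with a dict comprehension, then attach them all in one `DataFrame.assign` call, which also leaves `df` itself untouched. The sample data is copied from the question; `counts` and `out` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [2, 1, 3],
                   'Col2': [2, 1, 1],
                   'Col3': [1, 1, 2]})

# Compute all per-row counts from the original frame before anything is added,
# so no comparison can ever see a Count column
counts = {f'Count{v}': (df == v).sum(axis=1) for v in np.unique(df)}

# assign returns a new frame; df keeps only Col1..Col3
out = df.assign(**counts)
print(out)
```

Because the comprehension finishes before `assign` runs, the original columns need no special handling.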
You can apply value_counts to each row (axis=1):
df1 = (df.apply(lambda x: x.value_counts(), axis=1)
.fillna(0).astype(int).add_prefix('Count'))
out = pd.concat([df, df1], axis=1)
print(out)
# Output
Col1 Col2 Col3 Count1 Count2 Count3
0 2 2 1 1 2 0
1 1 1 1 3 0 0
2 3 1 2 1 1 1
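The same idea as a self-contained sketch, assuming the question's sample data. It passes `pd.Series.value_counts` directly instead of wrapping it in a lambda, and adds `sort_index(axis=1)` as a safeguard so the Count columns come out in ascending value order; the sort is an addition, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [2, 1, 3],
                   'Col2': [2, 1, 1],
                   'Col3': [1, 1, 2]})

# value_counts runs once per row; a value absent from a row becomes NaN
df1 = (df.apply(pd.Series.value_counts, axis=1)
         .sort_index(axis=1)   # order columns by value: 1, 2, 3
         .fillna(0)            # absent value -> count of 0
         .astype(int)          # NaN forced floats; restore integers
         .add_prefix('Count'))
out = pd.concat([df, df1], axis=1)
print(out)
```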
# stack values into one Series, then drop the column-name level of the index
s = df.stack().droplevel(1)
# compute a cross-tab, rename columns, join to original
out = df.join(pd.crosstab(s.index, s).add_prefix('Count'))
Output:
Col1 Col2 Col3 Count1 Count2 Count3
0 2 2 1 1 2 0
1 1 1 1 3 0 0
2 3 1 2 1 1 1
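A self-contained sketch of the same steps, assuming the question's sample data, that prints the intermediate `s` to show what `stack().droplevel(1)` produces: each row label repeated once per cell, paired with that cell's value. `crosstab` then counts how often each (row label, value) pair occurs. `stacked` and `table` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [2, 1, 3],
                   'Col2': [2, 1, 1],
                   'Col3': [1, 1, 2]})

# stack() -> one long Series indexed by (row label, column name)
stacked = df.stack()
# droplevel(1) removes the column-name level, keeping only the row label
s = stacked.droplevel(1)
print(s.index.tolist())  # row label, repeated once per cell
print(s.tolist())        # the cell values, read row by row

# crosstab counts (row label, value) pairs: one column per distinct value
table = pd.crosstab(s.index, s)
out = df.join(table.add_prefix('Count'))
print(out)
```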
Alternative with groupby.value_counts (likely less efficient):
out = df.join(df.stack()
.groupby(level=0).value_counts()
.unstack(level=1, fill_value=0)
.add_prefix('Count')
)
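A self-contained sketch of the same chain, assuming the question's sample data, with the intermediate Series kept under the illustrative name `pair_counts` so each step can be inspected. After `groupby(level=0)` the index levels are (row label, value), which is why `unstack(level=1)` is the call that turns the values into columns.

```python
import pandas as pd

df = pd.DataFrame({'Col1': [2, 1, 3],
                   'Col2': [2, 1, 1],
                   'Col3': [1, 1, 2]})

# One count per (row label, value) pair; pairs that never occur are absent
pair_counts = df.stack().groupby(level=0).value_counts()
print(pair_counts)

# Move the value level (level 1) into columns; absent pairs become 0
wide = pair_counts.unstack(level=1, fill_value=0)
out = df.join(wide.add_prefix('Count'))
print(out)
```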