How to GroupBy a DataFrame in Pandas and Keep Columns
Question:
Given a DataFrame that logs uses of some books like this:
Name Type ID
Book1 ebook 1
Book2 paper 2
Book3 paper 3
Book1 ebook 1
Book2 paper 2
I need to get the count of all the books, keeping the other columns and get this:
Name Type ID Count
Book1 ebook 1 2
Book2 paper 2 2
Book3 paper 3 1
How can this be done?
Thanks!
Answers:
You want the following:
In [20]:
df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
Out[20]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
In your case the 'Name', 'Type' and 'ID' columns match in values, so we can groupby on all three, call size to count the rows in each group, and then reset_index to restore the group keys as columns. Note that count() would not work here: every column is part of the grouping, so there is nothing left for it to count.
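Put together, a self-contained sketch of this grouped-count approach (rebuilding the example frame from the question) looks like this:

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

# size() counts rows per group; reset_index(name='Count') turns the
# group keys back into ordinary columns and names the new count column
counts = df.groupby(['Name', 'Type', 'ID']).size().reset_index(name='Count')
print(counts)
```

By default groupby sorts on the group keys, which is why the result comes out in Book1, Book2, Book3 order.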
An alternative approach is to add the 'Count' column using transform and then call drop_duplicates:
In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()
Out[25]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
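As a runnable sketch, the transform route from start to finish:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

# transform('count') returns one value per row (the size of that row's
# group), so it can be assigned straight back as a new column
df['Count'] = df.groupby('Name')['ID'].transform('count')

# the repeated log rows are now full duplicates, so drop them
result = df.drop_duplicates().reset_index(drop=True)
print(result)
```

This keeps the original row order rather than sorting by the group keys, which can matter if the log order is meaningful.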
I think as_index=False should do the trick, though note that count() needs at least one column outside the grouping; with only these three columns, use size() instead:
df.groupby(['Name','Type','ID'], as_index=False).size().rename(columns={'size': 'Count'})
If you have many columns in a df, it makes sense to use df.groupby(['foo']).agg(...), see here. The .agg() function allows you to choose what to do with the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
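A sketch of this .agg() route, assuming a hypothetical extra Price column to carry through, and using pandas' named-aggregation syntax so a Count column can be produced in the same pass:

```python
import pandas as pd

# 'Price' is a hypothetical extra column we want to keep, not from the question
df = pd.DataFrame({
    'Name':  ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type':  ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':    [1, 2, 3, 1, 2],
    'Price': [9.99, 14.99, 7.50, 9.99, 14.99],
})

# Named aggregation: keep the first value of each carried column per
# group, and count rows per group for the new Count column
result = df.groupby('Name', as_index=False).agg(
    Type=('Type', 'first'),
    ID=('ID', 'first'),
    Price=('Price', 'first'),
    Count=('ID', 'count'),
)
print(result)
```
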
You can use value_counts() as well:
df.value_counts().reset_index(name='Count')
Output:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
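A self-contained sketch of the value_counts route (note it sorts by count descending rather than by the group keys, which here happens to match the other outputs):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID':   [1, 2, 3, 1, 2],
})

# DataFrame.value_counts (pandas >= 1.1) counts unique rows and returns
# a Series with a MultiIndex; reset_index flattens it back into columns
result = df.value_counts().reset_index(name='Count')
print(result)
```
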
SIMPLEST WAY
df.groupby(['col1', 'col2'], as_index=False).count(). Use as_index=False to retain the grouping keys as regular columns; the default is True. (count() requires at least one column outside the grouping.)
Also can use df.groupby(['col_1', 'col_2']).count().reset_index()