Count number of unique rows in a pandas DataFrame
Question:
I need to count the number of unique rows in a pandas dataframe. I have tried this solution but it generates an error.
This is my code:
import pandas as pd
df = {'x1': ['A','B','A','A','B','A','A','A'], 'x2': [1,3,2,2,3,1,2,3]}
df = pd.DataFrame(df)
print df.groupby(['x1','x2'], as_index=False).count()
This is the error:
Traceback (most recent call last):
File "/home/user/workspace/project/test.py", line 9, in <module>
print df.groupby(['x1','x2'], as_index=False).count()
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 4372, in count
return self._wrap_agged_blocks(data.items, list(blk))
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 4274, in _wrap_agged_blocks
index = np.arange(blocks[0].values.shape[1])
IndexError: list index out of range
What am I doing wrong?
Answers:
Do it using size (tip: you can append .reset_index() at the end):
df.groupby(['x1','x2'], as_index=False).size()
Out[1262]:
x1 x2
A 1 2
2 3
3 1
B 3 2
dtype: int64
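Here is a runnable sketch of the size approach on the question's data, assuming a recent pandas version; reset_index(name='count') flattens the grouped Series into a DataFrame with a named count column, and the number of resulting rows is the number of unique (x1, x2) pairs:

```python
import pandas as pd

df = pd.DataFrame({'x1': ['A', 'B', 'A', 'A', 'B', 'A', 'A', 'A'],
                   'x2': [1, 3, 2, 2, 3, 1, 2, 3]})

# size() counts rows per group; reset_index(name='count') turns the
# resulting Series into a flat DataFrame with a named count column
counts = df.groupby(['x1', 'x2']).size().reset_index(name='count')
print(counts)

# The number of unique rows is simply the number of groups
print(len(counts))  # 4
```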
Or fix your code by selecting a column before calling count:
df.groupby(['x1','x2'])['x2'].count()
Out[1264]:
x1 x2
A 1 2
2 3
3 1
B 3 2
Name: x2, dtype: int64
If you only want to know the number of unique groups, you can use ngroups:
df.groupby(['x1','x2']).ngroups
Out[1267]: 4
You could drop duplicates:
import pandas as pd
df = {'x1': ['A','B','A','A','B','A','A','A'], 'x2': [1,3,2,2,3,1,2,3]}
df = pd.DataFrame(df)
print(len(df.drop_duplicates()))
Returns
4
To count the number of occurrences of each unique row in the dataframe, you should now use value_counts instead of count:
df.groupby(['x1','x2'], as_index=False).value_counts()
Out[417]:
x1 x2 count
0 A 1 2
1 A 2 3
2 A 3 1
3 B 3 2
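In pandas 1.1 and later, DataFrame.value_counts can also be called directly on the frame, without a groupby, to count each unique row; a minimal sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame({'x1': ['A', 'B', 'A', 'A', 'B', 'A', 'A', 'A'],
                   'x2': [1, 3, 2, 2, 3, 1, 2, 3]})

# DataFrame.value_counts (pandas >= 1.1) returns a Series indexed by the
# unique rows, with the occurrence count of each as its values
row_counts = df.value_counts()
print(row_counts)

# The length of that Series is the number of unique rows
print(row_counts.size)  # 4
```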