Pandas DataFrame.value_counts() does not allow dropna=False
Question:
Pandas Series.value_counts() has a dropna parameter, but DataFrame.value_counts() does not. That is my problem. But I am sure there is a reason and an alternative solution for it.
The use case is that I want to count patterns (value combinations of specific columns) in my DataFrame. For that use case I want to count None/NaN, too.
This is the data with 8 rows.
   name  foo  bar    sun
0   Tim    1    2   True
1   Tim    2    2  False
2   Tim    2    2  False
3  Anna    1    3   None
4  Anna    3    5   None
5   Bob    2    2  False
6   Bob    5    5   True
7   Bob    1    1   None
I can count all foo-bar combinations with df[['foo', 'bar']].value_counts() and the counts sum to 8 (all rows).
foo  bar
2    2      3
1    1      1
     2      1
     3      1
3    5      1
5    5      1
dtype: int64
But when I add a column containing NaN values to the pattern, the rows with NaN are not counted (the counts sum to only 5).
foo  bar  sun
2    2    False    3
1    2    True     1
5    5    True     1
This is the full code.
import pandas as pd

data = {'name': ['Tim', 'Tim', 'Tim', 'Anna', 'Anna', 'Bob', 'Bob', 'Bob'],
        'foo': [1, 2, 2, 1, 3, 2, 5, 1],
        'bar': [2, 2, 2, 3, 5, 2, 5, 1],
        'sun': [True, False, False, None, None, False, True, None]
        }

# That is the initial DataFrame
df = pd.DataFrame(data)
print(df)

# count foo-bar patterns
pa = df[['foo', 'bar']].value_counts()
print(pa)

# count foo-bar-sun patterns
# PROBLEM: None/NaN is not counted
pb = df[['foo', 'bar', 'sun']].value_counts()
print(pb)
Answers:
I think it is not supported yet. A possible alternative solution is groupby() with dropna=False:
pb = df.groupby(['foo', 'bar', 'sun'], dropna=False).size()
print(pb)
foo  bar  sun
1    1    NaN      1
     2    True     1
     3    NaN      1
2    2    False    3
3    5    NaN      1
5    5    True     1
dtype: int64
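As a follow-up to the groupby workaround: as far as I know, DataFrame.value_counts() accepts a dropna parameter starting with pandas 1.3.0, so on newer versions the direct call may work. The sketch below assumes this, tries the direct call first and falls back to the groupby approach if the installed pandas rejects the keyword. The variable names cols and counts are only for illustration.

```python
import pandas as pd

# Same data as in the question (the 'name' column is not needed here)
df = pd.DataFrame({
    'foo': [1, 2, 2, 1, 3, 2, 5, 1],
    'bar': [2, 2, 2, 3, 5, 2, 5, 1],
    'sun': [True, False, False, None, None, False, True, None],
})

cols = ['foo', 'bar', 'sun']

try:
    # Assumption: pandas >= 1.3.0, where value_counts() takes dropna
    counts = df[cols].value_counts(dropna=False)
except TypeError:
    # Older pandas: use the groupby approach from the answer above
    counts = df.groupby(cols, dropna=False).size()

print(counts)

# Every row should now be counted, including those with None/NaN in 'sun'
print(counts.sum(), len(df))
```

Comparing counts.sum() with len(df) is a quick way to check that no rows were silently dropped. Note that the two branches order the result differently: value_counts() sorts by count, while groupby().size() sorts by the group keys.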