Pandas DataFrame.value_counts() does not allow dropna=False

Question:

Pandas Series.value_counts() has a dropna parameter, but DataFrame.value_counts() does not. That is my problem. I am sure there is a reason for it and an alternative solution.

My use case is that I want to count patterns (value combinations of specific columns) in my DataFrame, and I want rows containing None/NaN to be counted, too.

This is the data with 8 rows.

   name  foo  bar    sun
0   Tim    1    2   True
1   Tim    2    2  False
2   Tim    2    2  False
3  Anna    1    3   None
4  Anna    3    5   None
5   Bob    2    2  False
6   Bob    5    5   True
7   Bob    1    1   None

I can count all foo-bar combinations with df[['foo', 'bar']].value_counts(), and the counts sum to 8 (all rows).

foo  bar
2    2      3
1    1      1
     2      1
     3      1
3    5      1
5    5      1
dtype: int64

But when I add a column that contains NaN values to the pattern, the rows with NaN are not counted.

foo  bar  sun  
2    2    False    3
1    2    True     1
5    5    True     1

This is the full code.

import pandas as pd

data = {'name': ['Tim', 'Tim', 'Tim', 'Anna', 'Anna', 'Bob', 'Bob', 'Bob'],
        'foo': [1, 2, 2, 1, 3, 2, 5, 1],
        'bar': [2, 2, 2, 3, 5, 2, 5, 1],
        'sun': [True, False, False, None, None, False, True, None]
}

# That is the initial DataFrame
df = pd.DataFrame(data)
print(df)

# count foo-bar patterns
pa = df[['foo', 'bar']].value_counts()
print(pa)

# count foo-bar-sun patterns
# PROBLEM: rows with None/NaN are not counted
pb = df[['foo', 'bar', 'sun']].value_counts()
print(pb)
Asked By: buhtz

||

Answers:

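I think DataFrame.value_counts() does not support dropna yet. A possible alternative is to group by the pattern columns with dropna=False and take the group sizes: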

pb = df.groupby(['foo', 'bar', 'sun'], dropna=False).size()
print(pb)
foo  bar  sun  
1    1    NaN      1
     2    True     1
     3    NaN      1
2    2    False    3
3    5    NaN      1
5    5    True     1
dtype: int64
Answered By: jezrael
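
As a follow-up sketch, the groupby approach above can be compared with passing dropna=False directly to DataFrame.value_counts(). This assumes pandas 1.3 or newer, where that parameter was added; on older versions only the groupby variant will work. The variable names (cols, by_group, by_value_counts) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'foo': [1, 2, 2, 1, 3, 2, 5, 1],
    'bar': [2, 2, 2, 3, 5, 2, 5, 1],
    'sun': [True, False, False, None, None, False, True, None],
})

cols = ['foo', 'bar', 'sun']

# groupby keeps None/NaN keys as their own groups when dropna=False
by_group = df.groupby(cols, dropna=False).size()

# Assumption: pandas >= 1.3, where DataFrame.value_counts() accepts dropna
by_value_counts = df[cols].value_counts(dropna=False)

# Both totals should now equal the number of rows in df
print(by_group.sum(), by_value_counts.sum(), len(df))
```

The two results hold the same counts but are ordered differently: groupby sorts by the group keys, while value_counts sorts by count in descending order.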