pandas how to count column boolean value that based on group

Question:

I have a Dataframe df ,you can have it by running the following code:

import pandas as pd
from io import StringIO
    
df = """  
    month       status_review             supply_review  
2023-01-01           False                    False                   
2023-01-01           True                     True                  
2022-12-01           False                    True                  
2022-12-01           True                     True                
2022-12-01           False                    False   
    """
df= pd.read_csv(StringIO(df.strip()), sep='ss+', engine='python')

How can I count how many status_reviews and supply_review are True in each month?

The output should look like the following:

    month       # of true status_review      # of true supply_review  
2023-01-01           1                         1              
2022-12-01           1                         2
Asked By: William

||

Answers:

You can use .groupby() and .sum(), which is more concise than using .agg():

df.groupby("month").sum()

This outputs:

            status_review  supply_review
month
2022-12-01              1              2
2023-01-01              1              1
Answered By: BrokenBenchmark

Assuming you have one date per month, like in this case, you can simply use groupby and sum all columns.

import pandas as pd
from io import StringIO
    
# setup sample data
df = """  
    month       status_review             supply_review  
2023-01-01           False                    False                   
2023-01-01           True                     True                  
2022-12-01           False                    True                  
2022-12-01           True                     True                
2022-12-01           False                    False   
    """
df= pd.read_csv(StringIO(df.strip()), sep='ss+', engine='python')

# sum all rows by month column
df.groupby('month').agg('sum')

If you have multiple dates per month, then you can convert the column to datetime type and then sum by month as follows:

df['month'] = pd.to_datetime(df['month'])
df.groupby([pd.Grouper(key='month', freq='M')]).agg('sum')
Answered By: harryhaller

use groupby and value_counts() with a reset_index to create a dataframe column for the counts. find the p-value by using the wilcoxon

data = """  
    month       status_review             supply_review  
2023-01-01           False                    False                   
2023-01-01           True                     True                  
2022-12-01           False                    True                  
2022-12-01           True                     True                
2022-12-01           False                    False   
    """
df= pd.read_csv(StringIO(data.strip()), sep='ss+', engine='python')
print(df)

results_status=df.groupby('month')['status_review'].value_counts().reset_index(name='n').sort_values('month')
results_supply=df.groupby('month')['supply_review'].value_counts().reset_index(name='n').sort_values('month')

results=pd.concat([results_status,results_supply],axis=1)
print(results)  

output

    month  status_review  n       month  supply_review  n
0  2022-12-01          False  2  2022-12-01           True  2
1  2022-12-01           True  1  2022-12-01          False  1
2  2023-01-01          False  1  2023-01-01          False  1
3  2023-01-01           True  1  2023-01-01           True  1
Answered By: Golden Lion
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.