Numpy Array count occurrences based on multiple filters

Question:

I am trying to count the number of occurrences of a NumPy array by having the first filter and then counting the second column of occurrences.

DataSet information:

data_dict = {
    'Outlook' : ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy', 'Rainy', 'Overcast', 'Sunny', 'Sunny','Rainy', 'Sunny', 'Overcast', 'Overcast', 'Rainy']
    ,'Temperature': ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild','Mild','Mild', 'Hot', 'Mild']
    ,'Humidity' : ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High','Normal','Normal', 'Normal', 'High', 'Normal', 'High']
    ,'Wind': ['False', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'False', 'False', 'True', 'True', 'False', 'True']
    ,'label': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
}

Resulting DataFrame:

    Outlook Temperature Humidity   Wind label
0     Sunny         Hot     High  False    No
1     Sunny         Hot     High   True    No
2  Overcast         Hot     High  False   Yes
3     Rainy        Mild     High  False   Yes
4     Rainy        Cool   Normal  False   Yes
...

I would like to get the following:

Outlook    No Yes All 
Sunny      2   3   5         
Overcast   4   0   4
Rain       3   2   5

Here is my code attempt (however it summarizes each column separately):

result = np.where(df.columns.values == 'label')
result1 = np.where(df.columns.values == 'Outlook')
lst = rows[:, [result, result1]]
uni, data = np.unique(lst, return_counts=True)
Asked By: N K

||

Answers:

You can use a pivot table:

pd.pivot_table(
    df,
    values="Day",
    index="Outlook",
    columns="label",
    aggfunc="count",
    margins=True,
    fill_value=0,
)

the result is:

         Day        
label     No Yes All
Outlook             
Overcast   0   4   4
Rainy      2   3   5
Sunny      3   2   5
All        5   9  14

The documentation is here

Answered By: itogaston
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.