Group by key words found in one column, then show summary statistics for another column python

Question:

I am trying to find a quick way to group records by a key word/string found in one column, and display summary statistics (.describe(), .mean(), etc.) for another column.

Below is a snippet of the code I was trying. I'm trying to avoid splitting/subsetting into a bunch of different dataframes.

My dataframe looks like the following:

    PROMOTED_PRODUCT__CREATIVE          ROAS
0   Simple Green 1 Gal. Concentrated    0.027573
1   Simple Green 1 Gal. Concentrated    0.082969
2   Simple Green 1 Gal. Concentrated    0.056278
3   Simple Green 1 Gal. Concentrated    0.037286
4   Simple Green 1 Gal. Concentrated    0.355843


df_pi['PROMOTED_PRODUCT__CREATIVE'].str.contains('1 gal', re.IGNORECASE).groupby(df_pi['PROMOTED_PRODUCT__CREATIVE'])['ROAS'].mean()

When I run the code above, it raises a KeyError:

KeyError: 'Column not found: ROAS'

I understand that it’s because it’s not a dataframe/there are no column headers. Should I do this in steps instead of a single line?

Any help would be greatly appreciated! Thank you so much in advance.

Asked By: mexicanRmy


Answers:

If I understand you correctly, you can create a mask and filter the dataframe by this mask before doing .groupby:

import re

# Boolean mask: True where the product name contains "1 gal" (case-insensitive)
mask = df["PROMOTED_PRODUCT__CREATIVE"].str.contains(
    "1 gal", flags=re.IGNORECASE
)

x = df[mask].groupby("PROMOTED_PRODUCT__CREATIVE")["ROAS"].mean()

print(x)

Prints:

PROMOTED_PRODUCT__CREATIVE
Simple Green 1 Gal. Concentrated    0.11199
Name: ROAS, dtype: float64
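
As a side note on the KeyError from the question: .str.contains(...) returns a boolean Series, so calling .groupby(...) on it gives a Series groupby with no ROAS column to select. A minimal sketch of that, using a small made-up frame (the name df_demo and its two rows are assumptions for illustration, not the real data):

```python
import re

import pandas as pd

# Made-up stand-in for the question's dataframe (assumption for illustration).
df_demo = pd.DataFrame(
    {
        "PROMOTED_PRODUCT__CREATIVE": ["Simple Green 1 Gal. Concentrated"] * 2,
        "ROAS": [0.027573, 0.082969],
    }
)

# str.contains returns a boolean Series, not a dataframe.
contains = df_demo["PROMOTED_PRODUCT__CREATIVE"].str.contains(
    "1 gal", flags=re.IGNORECASE
)

# Grouping that Series yields a Series groupby; there are no columns to pick from.
grouped = contains.groupby(df_demo["PROMOTED_PRODUCT__CREATIVE"])

try:
    grouped["ROAS"]
    error_message = None
except KeyError as exc:
    error_message = str(exc)

print(type(contains).__name__, error_message)
```

Filtering the dataframe with the mask first keeps ROAS available, which is why the mask-then-groupby order above works.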

Answered By: Andrej Kesely
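
If the goal is to summarize several key words at once without building a separate dataframe for each, one possible approach (not part of the answer above) is to extract the matching key word into a grouping key and call .describe() per group. This is only a sketch: the sample rows other than the "1 Gal." ones, the keywords list, and the "keyword" label are all hypothetical.

```python
import re

import pandas as pd

# Hypothetical sample data; only the "1 Gal." product name comes from the question.
df = pd.DataFrame(
    {
        "PROMOTED_PRODUCT__CREATIVE": [
            "Simple Green 1 Gal. Concentrated",
            "Simple Green 1 Gal. Concentrated",
            "Simple Green 32 oz. Spray",
            "Simple Green 32 oz. Spray",
            "Other Product",
        ],
        "ROAS": [0.027573, 0.082969, 0.10, 0.30, 0.50],
    }
)

# Hypothetical key words to group by.
keywords = ["1 gal", "32 oz"]

# One capture group that matches any of the key words literally.
pattern = "(" + "|".join(re.escape(k) for k in keywords) + ")"

# Extract the matched key word per row (NaN when nothing matches),
# lower-cased so "1 Gal" and "1 gal" land in the same group.
key = (
    df["PROMOTED_PRODUCT__CREATIVE"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()
    .rename("keyword")
)

# Rows with a NaN key are dropped by groupby by default.
summary = df.groupby(key)["ROAS"].describe()
print(summary)
```

Swapping .describe() for .mean() or .agg([...]) works the same way, since the grouping key is built once and reused.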