Sum the values of a groupby from dataframe columns using a pattern in a list

Question:

Context: I’m trying to get the sum of the groups created using a groupby using a list of patterns that are present on the dataframe columns.

For example, let’s say we have this dataframe:

df = pd.DataFrame({'123_Pattern1_a':[0,1,2],'X_Y_Pattern2_X':[3,4,5],'Z_D_Pattern2_Y':[4,5,7],'312_Pattern1_Z':[8,2,4]})

I now would like to create a group by using the "Pattern" and get the sum of values for those columns for each row

If we have a list like this:

pattern = ['Pattern1','Pattern2']

With the dataframe above, the output should be another dataframe as such:

df_final = pd.DataFrame({'Pattern1':[8,3,6],'Pattern2':[7,9,12]}) 

Basically, "concatenating" all the columns that have a specific pattern on the given column name and get the sum of these values by row

I was trying something like this:

pattern = ['Pattern1','Pattern2','Pattern3',...]

grouped = pd.DataFrame(data_media.groupby(data_media.columns.str.extract(pattern, expand=False), axis=1))

But it doesn’t work since extract is a regex and I’m using a list with the patterns. How could I create a regex that would work for this problem? Or is there another way to do this?

Asked By: ROO

||

Answers:

Using melt and pivot_table:

pattern = ['Pattern1','Pattern2']

df_final = (df
 .reset_index().melt('index')
 .assign(variable=lambda d: d['variable'].str.extract(fr'({"|".join(pattern)})'))
 .pivot_table(index='index', columns='variable', values='value', aggfunc='sum')
)

One option using wide_to_long and groupby.sum (works with previous example before OP update):

pattern = ['Pattern1','Pattern2']

df_final = (pd
    .wide_to_long(df.reset_index(), stubnames=pattern, i='index', j='x',
                 sep='_', suffix='.+')
   .groupby(level=0).sum()
)

output:

       Pattern1  Pattern2
index                    
0           8.0       7.0
1           3.0       9.0
2           6.0      12.0
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.