What is the most efficient way to normalize values in a single row in pandas?

Question

I have two types of columns in a pandas dataframe, let’s say A and B.

How can I efficiently normalize the values in each row individually, using the row-wise mean for each type of column?

I can first calculate the mean for each column type and then divide each column by its respective column-type mean, but this takes too much time (more than 30 minutes). I have over 300 columns and 500K rows.

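import pandas as pd

df = pd.DataFrame({'A1': [1, 2, 3],
                   'A2': [4, 5, 6],
                   'A3': [7, 8, 9],
                   'B1': [11, 12, 13],
                   'B2': [14, 15, 16],
                   'B3': [17, 18, 19]})

# Row-wise mean of the A columns; apply(axis=1) calls the lambda once per row, which is slow
df['A_mean'] = df.apply(lambda x: x.filter(regex='A').mean(), axis=1)
# Normalize one A column by that mean (this is repeated for every column)
df['A1'] = df['A1'] / df['A_mean']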

I am expecting the following result.

Asked By: Vinay

Answer 1

I pasted the same question into ChatGPT and, with a minor modification, it gave a good answer.

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'A1': [1, 2, 3],
                   'A2': [4, 5, 6],
                   'A3': [7, 8, 9],
                   'B1': [11, 12, 13],
                   'B2': [14, 15, 16],
                   'B3': [17, 18, 19]})

# select the columns of each type; '^' anchors the regex so only
# names that start with the letter match
A_cols = df.filter(regex='^A').columns
B_cols = df.filter(regex='^B').columns

# calculate the row-wise mean for each type of column, A and B
A_mean = df[A_cols].mean(axis=1)
B_mean = df[B_cols].mean(axis=1)

# normalize the values in each row of the A columns
df[A_cols] = df[A_cols].div(A_mean, axis=0)

# normalize the values in each row of the B columns
df[B_cols] = df[B_cols].div(B_mean, axis=0)
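If there are more than two column types, the same idea can be written as a loop over prefixes. This is a minimal sketch, assuming the type is encoded as a leading prefix of each column name; the `prefixes` list is a hypothetical placeholder to adapt to the real column names.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A1': [1, 2, 3], 'A2': [4, 5, 6], 'A3': [7, 8, 9],
                   'B1': [11, 12, 13], 'B2': [14, 15, 16], 'B3': [17, 18, 19]})

# Hypothetical list of column-type prefixes; extend as needed
prefixes = ['A', 'B']

for prefix in prefixes:
    # '^' anchors the regex so only names that start with the prefix match
    cols = df.filter(regex=f'^{prefix}').columns
    # Row-wise mean over this column type, then divide each row by it
    df[cols] = df[cols].div(df[cols].mean(axis=1), axis=0)

print(df)
```

Each iteration does one vectorized mean and one vectorized division per column type, so the cost grows with the number of types rather than the number of rows.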

Answered By: Vinay

Answer 2

Here’s a way to do what your question asks (note that I have used startswith instead of filter, but this can be tweaked for generality if needed):

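# column-type prefixes; each group is normalized by its own row-wise mean
prefixes = ['A','B']
colsByPrefix = [[col for col in df.columns if col.startswith(pref)] for pref in prefixes]
# dividing by an (n_rows, 1) NumPy array broadcasts each row's mean across that row;
# note that columns matching none of the prefixes are dropped by the concat
df = pd.concat([df[cols] / df[cols].mean(axis=1).to_frame().to_numpy() for cols in colsByPrefix], axis=1)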

Output:

     A1   A2    A3        B1   B2        B3
0  0.25  1.0  1.75  0.785714  1.0  1.214286
1  0.40  1.0  1.60  0.800000  1.0  1.200000
2  0.50  1.0  1.50  0.812500  1.0  1.187500
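If the pandas overhead is still too high at 500K rows by 300 columns, one option is to do the arithmetic on a single NumPy array and rebuild the DataFrame once at the end. This is a sketch assuming every column is numeric and that the type is a name prefix; it has not been benchmarked here against the pandas versions. Unlike the `concat` version, columns matching no prefix are kept unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A1': [1, 2, 3], 'A2': [4, 5, 6], 'A3': [7, 8, 9],
                   'B1': [11, 12, 13], 'B2': [14, 15, 16], 'B3': [17, 18, 19]})

prefixes = ['A', 'B']  # hypothetical prefixes, as in the answer above

values = df.to_numpy(dtype=float)  # one float64 array for the whole frame
out = values.copy()                # columns matching no prefix keep their values

for prefix in prefixes:
    # Boolean mask of the columns whose name starts with this prefix
    mask = df.columns.str.startswith(prefix)
    block = values[:, mask]
    # keepdims=True gives shape (n_rows, 1), so the division broadcasts per row
    out[:, mask] = block / block.mean(axis=1, keepdims=True)

result = pd.DataFrame(out, index=df.index, columns=df.columns)
print(result)
```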

Answered By: constantstranger

Answer 3

Run a groupby on the first character of each column name, and unpack the resulting frame of row-wise means into the assign function:

df.assign(**df.groupby(df.columns.str[0], axis = 1).mean().add_suffix("_mean"))
   A1  A2  A3  B1  B2  B3  A_mean  B_mean
0   1   4   7  11  14  17     4.0    14.0
1   2   5   8  12  15  18     5.0    15.0
2   3   6   9  13  16  19     6.0    16.0
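This adds the mean columns but stops short of normalizing. A sketch that finishes the job follows; it groups the transposed frame instead of calling `groupby(..., axis=1)`, because pandas 2.1 deprecated `axis=1` in `DataFrame.groupby`. It assumes, as the answer does, that the first character of each column name is its type.

```python
import pandas as pd

df = pd.DataFrame({'A1': [1, 2, 3], 'A2': [4, 5, 6], 'A3': [7, 8, 9],
                   'B1': [11, 12, 13], 'B2': [14, 15, 16], 'B3': [17, 18, 19]})

# Column type = first character of the name, e.g. 'A1' -> 'A'
col_type = df.columns.str[0]

# Row-wise mean per column type: transpose, group the (former) columns, transpose back
means = df.T.groupby(col_type).mean().T  # one column per type: 'A', 'B'

# Expand the means to one column per original column, then divide element-wise
normalized = df / means[col_type].to_numpy()
print(normalized)
```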

Answered By: sammywemmy