Dividing two columns of a pandas DataFrame and keeping the header name

Question:

With the following data frame

ID,WEIGHT,I1,I2,I4
1,0.2,839,1664,3266
2,0.1,851,863,858
3,0.4,1018,1999,3982
4,0.3,878,1724,3447

I want to iterate over I1..I4 and create new data frames by joining the WEIGHT column and I_i/I1. The following code works fine

for i in range(3,5):
    df_new = pd.concat([df['WEIGHT'], df.iloc[:,i]/df.iloc[:,2]], axis=1)
    print(df_new)

But as you can see in the output, the column header is 0 which I guess is the result of I2/I1 and I4/I1.

   WEIGHT         0
0     0.2  1.983313
1     0.1  1.014101
2     0.4  1.963654
3     0.3  1.963554
   WEIGHT         0
0     0.2  3.892729
1     0.1  1.008226
2     0.4  3.911591
3     0.3  3.925968

How can I keep the column names as I2 and I4? That is, I want the column header of df_new to be the same as that of df.iloc[:,i].

Asked By: mahmood

Answers:

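Your solution can be modified in three ways: assign the ratio under the original column name, divide a one-column DataFrame (which keeps its column name), or rename the resulting Series: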

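for i in range(3,5):
    # Option 1: assign the ratio under the original column name
    df_new = df[['WEIGHT']].assign(**{df.columns[i]: df.iloc[:,i]/df.iloc[:,2]})

    # Option 2: divide a one-column DataFrame, which keeps its column name
    df_new = pd.concat([df['WEIGHT'], df.iloc[:,[i]].div(df.iloc[:,2], axis=0)], axis=1)

    # Option 3: rename the Series produced by the division
    df_new = pd.concat([df['WEIGHT'],
                         (df.iloc[:,i]/df.iloc[:,2]).rename(df.columns[i])], axis=1)
    print(df_new)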
    
   WEIGHT        I2
0     0.2  1.983313
1     0.1  1.014101
2     0.4  1.963654
3     0.3  1.963554
   WEIGHT        I4
0     0.2  3.892729
1     0.1  1.008226
2     0.4  3.911591
3     0.3  3.925968

I think no loop is necessary. Select the columns by position and divide:

df.iloc[:, 3:5] = df.iloc[:, 3:5].div(df.iloc[:, 2], axis=0)
print (df)
   ID  WEIGHT    I1        I2        I4
0   1     0.2   839  1.983313  3.892729
1   2     0.1   851  1.014101  1.008226
2   3     0.4  1018  1.963654  3.911591
3   4     0.3   878  1.963554  3.925968

For a new DataFrame (starting from the original df, not the one modified in place above):

df_new = pd.concat([df.iloc[:, :3], df.iloc[:, 3:5].div(df.iloc[:, 2], axis=0)], axis=1)
print (df_new)
   ID  WEIGHT    I1        I2        I4
0   1     0.2   839  1.983313  3.892729
1   2     0.1   851  1.014101  1.008226
2   3     0.4  1018  1.963654  3.911591
3   4     0.3   878  1.963554  3.925968
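
The same idea can also be written with column labels instead of positions, which may be easier to read when the layout of the frame is known. This is only a sketch on the question's sample data; the name ratio_cols is illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

ratio_cols = ['I2', 'I4']   # illustrative name: the columns to divide by I1

# Divide each selected column row-wise by I1; the column labels are preserved
ratios = df[ratio_cols].div(df['I1'], axis=0)

# Build a new frame and leave df itself untouched
df_new = df[['ID', 'WEIGHT', 'I1']].join(ratios)
print(df_new)
```

Because df is not modified, the snippet can be rerun without dividing the columns a second time.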
Answered By: jezrael

You probably shouldn't use a loop. Instead, move the non-target columns into the index with set_index, process the data, and optionally stack if you want a long format:

out = (df.set_index(['ID', 'WEIGHT'])
         .pipe(lambda d: d.div(d['I1'], axis=0))
         .stack().reset_index(name='value')
      )
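
For reference, here is a sketch of the same pipeline run on the question's sample data. The rename_axis step is an addition, not part of the snippet above: it names the column axis 'column' (an arbitrary label) so that the stacked level gets a readable header instead of an automatically generated one.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

out = (df.set_index(['ID', 'WEIGHT'])              # keep the non-target columns out of the division
         .pipe(lambda d: d.div(d['I1'], axis=0))   # divide every remaining column row-wise by I1
         .rename_axis(columns='column')            # assumption: label chosen for the stacked level
         .stack()                                  # wide -> long: one row per (ID, WEIGHT, column)
         .reset_index(name='value'))
print(out)
```

The result has one row per original row and column, so I1 appears too, with a ratio of 1.0; it can be filtered out with out[out['column'] != 'I1'] if it is not wanted.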
Answered By: mozway

If you want to create your new dataframe in a loop, you could assign a new column to df[['WEIGHT']] and give that column the appropriate name:

cols = df.columns.to_list()
for i in range(3, 5):
    col = cols[i]
    df_new = df[['WEIGHT']].assign(**{col: df[col]/df.iloc[:, 2]})
    print(df_new)

Output (for your sample data):

   WEIGHT        I2
0     0.2  1.983313
1     0.1  1.014101
2     0.4  1.963654
3     0.3  1.963554
   WEIGHT        I4
0     0.2  3.892729
1     0.1  1.008226
2     0.4  3.911591
3     0.3  3.925968
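
Note that the loop overwrites df_new on each pass. If the frames need to be kept rather than just printed, one option is to collect them in a dict keyed by column name. This is a sketch under that assumption; the name frames is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

# One small frame per target column, stored under that column's name
frames = {
    col: df[['WEIGHT']].assign(**{col: df[col] / df['I1']})
    for col in df.columns[3:5]
}

print(frames['I4'])
```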
Answered By: Nick

You can keep the column headers in the new data frames by setting them to the corresponding column name from the original data frame. You can use the rename method to achieve this.

Here’s how you can modify your code to keep the column headers as I2 and I4:

import pandas as pd

data = {
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447]
}

df = pd.DataFrame(data)

for i in range(3, 5):
    column_name = df.columns[i]
    df_new = pd.concat([df['WEIGHT'], df.iloc[:, i] / df['I1']], axis=1)
    # Added to your code: the unnamed ratio column is labelled 0, so rename it
    df_new = df_new.rename(columns={0: column_name})
    print(df_new)

Output:

   WEIGHT        I2
0     0.2  1.983313
1     0.1  1.014101
2     0.4  1.963654
3     0.3  1.963554
   WEIGHT        I4
0     0.2  3.892729
1     0.1  1.008226
2     0.4  3.911591
3     0.3  3.925968
Answered By: Aditya Tiwari

Code

Indexing with [[column name]] does not drop the column name, because the result is a DataFrame, not a Series. Also, join is more convenient than concat for horizontal concatenation.

for i in df.columns[3:]:
    df_new = df[['WEIGHT']].join(df[[i]].div(df['I1'], axis=0))
    print(df_new)
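
To illustrate the difference between [...] and [[...]], here is a minimal sketch using the question's sample data. Dividing two Series with different names gives an unnamed Series (which concat then labels 0), while dividing a one-column DataFrame keeps its column label.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

# Series / Series: the names differ ('I2' vs 'I1'), so the result has no name
ratio_series = df['I2'] / df['I1']
print(ratio_series.name)            # None

# One-column DataFrame divided row-wise by a Series: the column label survives
ratio_frame = df[['I2']].div(df['I1'], axis=0)
print(list(ratio_frame.columns))    # ['I2']

# join aligns on the index and keeps both labels
df_new = df[['WEIGHT']].join(ratio_frame)
print(list(df_new.columns))         # ['WEIGHT', 'I2']
```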
Answered By: Panda Kim

Another possible solution, which uses numpy for vectorized division of columns:

df[['I2', 'I4']] = df.iloc[:, -2:].values / df['I1'].values.reshape(-1,1)
[df[['WEIGHT', x]] for x in ['I2', 'I4']]

Output:

 [   WEIGHT        I2
 0     0.2  1.983313
 1     0.1  1.014101
 2     0.4  1.963654
 3     0.3  1.963554,
    WEIGHT        I4
 0     0.2  3.892729
 1     0.1  1.008226
 2     0.4  3.911591
 3     0.3  3.925968]
Answered By: PaulS