Dividing two columns of a pandas dataframe and keeping the header name
Question:
With the following data frame
ID,WEIGHT,I1,I2,I4
1,0.2,839,1664,3266
2,0.1,851,863,858
3,0.4,1018,1999,3982
4,0.3,878,1724,3447
I want to iterate over I1..I4 and create new data frames by joining the WEIGHT column and I_i/I1. The following code works fine:
for i in range(3,5):
    df_new = pd.concat([df['WEIGHT'], df.iloc[:,i]/df.iloc[:,2]], axis=1)
    print(df_new)
But as you can see in the output, the column header is 0, which I guess is the result of I2/I1 and I4/I1.
WEIGHT 0
0 0.2 1.983313
1 0.1 1.014101
2 0.4 1.963654
3 0.3 1.963554
WEIGHT 0
0 0.2 3.892729
1 0.1 1.008226
2 0.4 3.911591
3 0.3 3.925968
How can I keep the columns as I2 and I4? I mean, how can I keep the column header of df_new the same as that of df.iloc[:,i]?
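A minimal check of where the 0 comes from, assuming the sample data above: dividing two Series whose names differ (I2 and I1) gives an unnamed result, and concat(axis=1) falls back to an integer label (0 here) for an unnamed Series. When both names match, pandas keeps the name. The variable names `ratio` and `same` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

ratio = df['I2'] / df['I1']      # names differ ('I2' vs 'I1') ...
print(ratio.name)                # ... so the result is unnamed: None

same = df['I2'] / df['I2']       # identical names are kept
print(same.name)                 # I2

# The unnamed Series gets an integer column label from concat
df_new = pd.concat([df['WEIGHT'], ratio], axis=1)
print(df_new.columns.tolist())   # ['WEIGHT', 0]
```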
Answers:
You can modify your solution with assign, by dividing a one-column DataFrame, or with rename:
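for i in range(3,5):
    # option 1: assign, using the original header as the keyword
    df_new = df[['WEIGHT']].assign(**{df.columns[i]: df.iloc[:,i]/df.iloc[:,2]})
    # option 2: divide a one-column DataFrame, which keeps its header
    df_new = pd.concat([df['WEIGHT'], df.iloc[:,[i]].div(df.iloc[:,2], axis=0)], axis=1)
    # option 3: rename the resulting Series
    df_new = pd.concat([df['WEIGHT'],
                        (df.iloc[:,i]/df.iloc[:,2]).rename(df.columns[i])], axis=1)
    print(df_new)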
WEIGHT I2
0 0.2 1.983313
1 0.1 1.014101
2 0.4 1.963654
3 0.3 1.963554
WEIGHT I4
0 0.2 3.892729
1 0.1 1.008226
2 0.4 3.911591
3 0.3 3.925968
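A self-contained sketch, assuming the sample data from the question, that runs the three variants side by side and uses `pd.testing.assert_frame_equal` to confirm they build the same frame for I2 and for I4. The names `a`, `b` and `c` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

for i in range(3, 5):
    name = df.columns[i]
    # option 1: assign with the header as keyword
    a = df[['WEIGHT']].assign(**{name: df.iloc[:, i] / df.iloc[:, 2]})
    # option 2: one-column DataFrame divided row-wise by I1
    b = pd.concat([df['WEIGHT'], df.iloc[:, [i]].div(df.iloc[:, 2], axis=0)], axis=1)
    # option 3: rename the unnamed ratio Series
    c = pd.concat([df['WEIGHT'], (df.iloc[:, i] / df.iloc[:, 2]).rename(name)], axis=1)
    # raises AssertionError if any variant differs
    pd.testing.assert_frame_equal(a, b)
    pd.testing.assert_frame_equal(a, c)

print(a)  # the I4 frame from the last iteration
```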
I think no loop is necessary; select the columns by position and divide:
df.iloc[:, 3:5] = df.iloc[:, 3:5].div(df.iloc[:, 2], axis=0)
print(df)
ID WEIGHT I1 I2 I4
0 1 0.2 839 1.983313 3.892729
1 2 0.1 851 1.014101 1.008226
2 3 0.4 1018 1.963654 3.911591
3 4 0.3 878 1.963554 3.925968
To build a new DataFrame instead (starting from the original, unmodified df):
df_new = pd.concat([df.iloc[:, :3], df.iloc[:, 3:5].div(df.iloc[:, 2], axis=0)], axis=1)
print(df_new)
ID WEIGHT I1 I2 I4
0 1 0.2 839 1.983313 3.892729
1 2 0.1 851 1.014101 1.008226
2 3 0.4 1018 1.963654 3.911591
3 4 0.3 878 1.963554 3.925968
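Because the first snippet above overwrites I2 and I4 inside df, running the second one afterwards would divide already-divided values. A minimal non-mutating sketch, assuming the sample data, computes the ratios once into a separate frame (`ratios` is a name introduced here) and leaves df untouched. Avoiding the in-place write may also sidestep a dtype FutureWarning that some recent pandas versions may emit when floats are assigned into integer columns through iloc.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

# Divide columns 3:5 (I2, I4) by column 2 (I1); axis=0 matches rows by index
ratios = df.iloc[:, 3:5].div(df.iloc[:, 2], axis=0)

# Put the untouched first three columns next to the ratios
df_new = pd.concat([df.iloc[:, :3], ratios], axis=1)
print(df_new)
```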
You probably shouldn't use a loop; instead, use set_index on the non-target columns, then process the data, and optionally stack if you want a long format:
out = (df.set_index(['ID', 'WEIGHT'])
.pipe(lambda d: d.div(d['I1'], axis=0))
.stack().reset_index(name='value')
)
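A runnable version of the long-format idea, assuming the sample data. The `rename_axis(columns='column')` step is an addition not in the answer above: it names the level that stack() creates, so the resulting column is called `column` rather than getting an automatic label. Note that the long frame also contains the I1/I1 rows (all 1.0); selecting one ratio back out is a simple boolean filter.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

out = (df.set_index(['ID', 'WEIGHT'])
         .pipe(lambda d: d.div(d['I1'], axis=0))  # every I column divided by I1
         .rename_axis(columns='column')           # name the level stack() creates
         .stack()
         .reset_index(name='value'))

# Pull a single ratio back out of the long frame
i2 = out.loc[out['column'] == 'I2', ['WEIGHT', 'value']]
print(i2)
```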
If you want to create your new dataframe in a loop, you could assign a new column to df[['WEIGHT']], naming it after the original column:
cols = df.columns.to_list()
for i in range(3, 5):
    col = cols[i]
    # the dict key becomes the header of the new column
    df_new = df[['WEIGHT']].assign(**{col: df[col]/df.iloc[:, 2]})
    print(df_new)
Output (for your sample data):
WEIGHT I2
0 0.2 1.983313
1 0.1 1.014101
2 0.4 1.963654
3 0.3 1.963554
WEIGHT I4
0 0.2 3.892729
1 0.1 1.008226
2 0.4 3.911591
3 0.3 3.925968
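The loop above overwrites df_new on each pass, so only the last frame survives once it finishes. A minimal sketch, assuming you want to keep every result, collects them in a dict keyed by the original header (`frames` is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

# One new frame per ratio column, keyed by the original header
frames = {
    col: df[['WEIGHT']].assign(**{col: df[col] / df['I1']})
    for col in df.columns[3:]
}

for col, frame in frames.items():
    print(frame)
```

Each frame can then be looked up later, e.g. `frames['I4']`.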
You can keep the column headers in the new data frames by renaming the unnamed column to the corresponding column name from the original data frame, using the rename method.
Here’s how you can modify your code to keep the column headers as I2 and I4:
import pandas as pd
data = {
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447]
}
df = pd.DataFrame(data)
for i in range(3, 5):
    column_name = df.columns[i]
    df_new = pd.concat([df['WEIGHT'], df.iloc[:, i] / df['I1']], axis=1)
    # Added to your code: rename the unnamed column 0 to the original header
    df_new = df_new.rename(columns={0: column_name})
    print(df_new)
Output:
WEIGHT I2
0 0.2 1.983313
1 0.1 1.014101
2 0.4 1.963654
3 0.3 1.963554
WEIGHT I4
0 0.2 3.892729
1 0.1 1.008226
2 0.4 3.911591
3 0.3 3.925968
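An alternative to renaming afterwards, sketched under the same sample data: pd.concat accepts a `keys` argument, and when Series are concatenated along axis=1 those keys become the column labels directly. This swaps in concat's `keys` parameter for rename; it is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

for i in range(3, 5):
    column_name = df.columns[i]
    ratio = df.iloc[:, i] / df['I1']  # unnamed Series
    # keys= supplies the column labels for the concatenated Series
    df_new = pd.concat([df['WEIGHT'], ratio], axis=1, keys=['WEIGHT', column_name])
    print(df_new)
```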
Code
Indexing with [[column name]] does not drop the column name, because the result is a DataFrame rather than a Series. Also, join is more convenient than concat for horizontal concatenation:
for i in df.columns[3:]:
    df_new = df[['WEIGHT']].join(df[[i]].div(df['I1'], axis=0))
    print(df_new)
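A small sketch, assuming the sample data, that makes the Series-versus-DataFrame point concrete and keeps each joined result in a list (`frames` is an illustrative name). DataFrame.div with axis=0 divides each row by the matching I1 value while keeping the header of the one-column frame, and join then aligns on the shared index.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

print(type(df['I2']).__name__)    # Series: 1-D, the header is only its .name
print(type(df[['I2']]).__name__)  # DataFrame: 2-D, with a column header

frames = []
for col in df.columns[3:]:
    # divide row-wise by I1, keeping the one-column frame's header
    frames.append(df[['WEIGHT']].join(df[[col]].div(df['I1'], axis=0)))
```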
Another possible solution, which uses numpy for vectorized division of columns:
df[['I2', 'I4']] = df.iloc[:, -2:].values / df['I1'].values.reshape(-1,1)
[df[['WEIGHT', x]] for x in ['I2', 'I4']]
Output:
[ WEIGHT I2
0 0.2 1.983313
1 0.1 1.014101
2 0.4 1.963654
3 0.3 1.963554,
WEIGHT I4
0 0.2 3.892729
1 0.1 1.008226
2 0.4 3.911591
3 0.3 3.925968]
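The reshape(-1,1) above turns I1 into a column vector so numpy broadcasting divides both I2 and I4 by it row by row. A non-mutating sketch of the same idea, assuming the sample data: selecting `df[['I1']]` already yields a (4, 1) array, and the ratios go into a new frame (`num`, `den` and `ratios` are illustrative names) instead of overwriting df.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4],
    'WEIGHT': [0.2, 0.1, 0.4, 0.3],
    'I1': [839, 851, 1018, 878],
    'I2': [1664, 863, 1999, 1724],
    'I4': [3266, 858, 3982, 3447],
})

num = df[['I2', 'I4']].to_numpy()  # shape (4, 2)
den = df[['I1']].to_numpy()        # shape (4, 1), broadcasts across both columns
ratios = pd.DataFrame(num / den, columns=['I2', 'I4'], index=df.index)

# One WEIGHT + ratio frame per column; each Series keeps its name as header
frames = [pd.concat([df['WEIGHT'], ratios[c]], axis=1) for c in ratios.columns]
print(frames[0])
```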