Adding calculated column in Pandas
Question:
I have a dataframe with 10 columns. I want to add a new calculated column ‘age_bmi’ that multiplies ‘age’ * ‘bmi’. age is an INT, bmi is a FLOAT.
That should give me a new dataframe with 11 columns.
Something I am doing isn’t quite right. I think it’s a syntax issue. Any ideas?
Thanks
df2['age_bmi'] = df(['age'] * ['bmi'])
print(df2)
Answers:
Try: df2['age_bmi'] = df.age * df.bmi
You’re trying to call the dataframe as a function, when what you need is the values of the columns. You can access a column by key, like a dictionary (df['age']), or as an attribute (df.age) if the name is a valid Python identifier (no spaces, for example) and doesn’t clash with an existing DataFrame attribute or method.
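A minimal sketch of the difference, using a small made-up dataframe (the values and the extra 'count' column are assumptions for illustration, not from the question):

```python
import pandas as pd

# Hypothetical sample data; 'count' is included because it clashes
# with the built-in DataFrame.count method.
df = pd.DataFrame({"age": [25, 40], "bmi": [22.5, 30.0], "count": [1, 2]})

# Calling the dataframe like a function raises a TypeError.
try:
    df("age")
except TypeError as exc:
    call_error = str(exc)

# Key access always works, whatever the column is called.
product = df["age"] * df["bmi"]

# Attribute access works for 'age' because nothing else on DataFrame uses that name.
same_column = df.age.equals(df["age"])

# 'count' clashes with a method, so df.count is the method, not the column.
count_is_method = callable(df.count)
count_column = df["count"]
```

If in doubt, the bracket form is the safer habit, since it never collides with a method name.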
Someone linked this in a comment the other day and it’s pretty awesome. I recommend giving it a watch, even if you don’t do the exercises: https://www.youtube.com/watch?v=5JnMutdy6Fw
As Cory pointed out, you’re calling the dataframe as a function, which won’t work as you expect. Here are 4 ways to multiply two columns; in most cases you’d use the first method.
In [299]: df['age_bmi'] = df.age * df.bmi
or,
In [300]: df['age_bmi'] = df.eval('age*bmi')
or,
In [301]: df['age_bmi'] = pd.eval('df.age*df.bmi')
or,
In [302]: df['age_bmi'] = df.age.mul(df.bmi)
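As a quick check that the four forms agree, here is a sketch on made-up data (the sample values are an assumption for illustration only):

```python
import pandas as pd

# Hypothetical sample data: age is an int column, bmi is a float column.
df = pd.DataFrame({"age": [25, 40, 31], "bmi": [22.5, 30.0, 27.1]})

by_operator = df.age * df.bmi            # plain arithmetic on two Series
by_df_eval = df.eval("age * bmi")        # column names resolved inside df
by_pd_eval = pd.eval("df.age * df.bmi")  # df resolved from the calling scope
by_mul = df.age.mul(df.bmi)              # method form of the * operator

# Any one of them can be assigned as the new column.
df["age_bmi"] = by_operator
```

Because age is an integer column and bmi is a float column, the result is a float column in every case.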
You have put age and bmi together inside one pair of brackets and are treating df as a function rather than a dataframe. Instead, index df once for each column and multiply the results:
df2['age_bmi'] = df['age'] * df['bmi']
You can also use assign:
df2 = df.assign(age_bmi=df['age'] * df['bmi'])
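A short sketch of how the two approaches differ, again on made-up data (the values are an assumption, and two columns stand in for the ten in the question): assign returns a new dataframe and leaves df untouched, while the item-assignment form needs df2 to exist already, for example as a copy of df.

```python
import pandas as pd

# Hypothetical sample data standing in for the original dataframe.
df = pd.DataFrame({"age": [25, 40], "bmi": [22.5, 30.0]})

# assign builds and returns a new dataframe with the extra column.
df2 = df.assign(age_bmi=df["age"] * df["bmi"])

# Equivalent two-step version: copy first, then add the column in place.
df3 = df.copy()
df3["age_bmi"] = df["age"] * df["bmi"]
```

Either way the new dataframe has one more column than the original, which matches the 10 to 11 columns described in the question.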