subtract a constant from a column to create a new one
Question:
I have a GeoDataFrame. I would like to create a new column that subtracts one from an existing column wherever its value is strictly greater than 0, and otherwise keeps the same value.
I have tried the following:
df['new_column'] = df.apply(lambda y: (df['old_column'].subtract(1)) if y['old_column'] > 0 else y['old_column'], axis=1)
It correctly detects when old_column is greater than 0, but when it should subtract one it does something strange: instead of subtracting, it just gives a series of numbers, 3-2 2-1 1 1-1, things like that. Why is it doing that?
Answers:
Instead of apply you can use np.where, which is faster for bigger dataframes and easier to read.
import numpy as np
import pandas as pd
df = pd.DataFrame({"old_column": [-3, -2, -1, 0, 1, 2, 3]})
df["new_column"] = np.where(df.old_column > 0, df.old_column-1, df.old_column)
df
   old_column  new_column
0          -3          -3
1          -2          -2
2          -1          -1
3           0           0
4           1           0
5           2           1
6           3           2
If this does not work for your df, please include an example of your data.
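The error is that inside the lambda you need to take one cell, y['old_column'], and not the entire column, df['old_column']. Because df['old_column'].subtract(1) operates on the whole column, the lambda returns a full Series for those rows instead of a single number. In addition, y['old_column'] is a NumPy scalar, which has no subtract method, so use the - operator: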
df['new_column'] = df.apply(lambda y: (y['old_column'] - 1) if y['old_column'] > 0 else y['old_column'], axis=1)
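To make the difference between the cell and the column concrete, here is a minimal sketch that reuses the small example frame from the np.where answer above. The names row, cell_result and column_result are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({"old_column": [-3, -2, -1, 0, 1, 2, 3]})

# With axis=1, apply passes one row at a time; df.iloc[6] stands in for that row here.
row = df.iloc[6]

# Subtracting from the cell gives a single number.
cell_result = row["old_column"] - 1

# Subtracting from the column gives a whole Series, one entry per row of df.
column_result = df["old_column"].subtract(1)

print(cell_result)
print(type(column_result).__name__, len(column_result))
```

The second value is what the original lambda was handing back for every row where old_column > 0, which is why the output looked like runs of numbers rather than one value per row.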
A simpler expression, if only one column is used:
df['new_column'] = df['old_column'].apply(lambda y: y - 1 if y > 0 else y)
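As a quick sanity check, the following sketch runs the three approaches from the answers on the same example frame and compares them with the expected values. The variable names (via_where, via_row_apply, via_col_apply, expected) are made up for this illustration, and the frame is a plain DataFrame rather than a GeoDataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"old_column": [-3, -2, -1, 0, 1, 2, 3]})

# Vectorised: pick old_column - 1 where the condition holds, else old_column.
via_where = np.where(df["old_column"] > 0, df["old_column"] - 1, df["old_column"])

# Row-wise apply: y is one row, so y["old_column"] is a single value.
via_row_apply = df.apply(
    lambda y: y["old_column"] - 1 if y["old_column"] > 0 else y["old_column"],
    axis=1,
)

# Column-wise apply: y is already a single value from old_column.
via_col_apply = df["old_column"].apply(lambda y: y - 1 if y > 0 else y)

expected = [-3, -2, -1, 0, 0, 1, 2]
print(list(via_where) == expected)
print(list(via_row_apply) == expected)
print(list(via_col_apply) == expected)
```

All three give the same values on this small frame, so the choice mostly comes down to readability and speed on larger data.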