How to replace column values that are not in a particular range with null values using a conditional in Python
Question:
I have a DataFrame that contains a column for age. Some of the values are outside my desired range, and I want to replace them with null values. Specifically, any age that is not between 20 and 50 should become null.
This is what I tried, but it doesn't seem to work:
import pandas as pd
import numpy as np
age_range = (df['age'] < 20) | (df['age'] > 50)
df[age_range = np.nan]
Answers:
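It's a simple syntax error: df[age_range = np.nan] is not valid Python, because you can't put an assignment inside the brackets. Use .loc to select the rows matching the mask in the 'age' column, then assign np.nan to them: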
import pandas as pd
import numpy as np
df = pd.DataFrame({'age': [18, 25, 35, 40, 55]})
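# Boolean mask: True for rows whose age is outside the 20-50 range
age_range = (df['age'] < 20) | (df['age'] > 50)
# .loc[row_mask, column] selects exactly those cells so they can be overwritten
df.loc[age_range, 'age'] = np.nan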
print(df)
which gives
    age
0   NaN
1  25.0
2  35.0
3  40.0
4   NaN
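The same result can also be reached by describing the range you want to keep instead of the range you want to drop. The sketch below is an alternative, not part of the answer above: it assumes "between 20 and 50" means both ends are inclusive, which is what Series.between does by default and what the (age < 20) | (age > 50) mask implies.

```python
import pandas as pd

df = pd.DataFrame({'age': [18, 25, 35, 40, 55]})

# Series.between(20, 50) is True for 20 <= age <= 50 (both ends inclusive by default),
# i.e. the exact complement of (age < 20) | (age > 50).
in_range = df['age'].between(20, 50)

# Series.where keeps values where the condition is True and fills NaN elsewhere.
df['age'] = df['age'].where(in_range)
print(df)
```

Because NaN is a float, the column is upcast to float64 here too, just as it is with the .loc approach.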
You can also use np.where, which picks np.nan where the condition is true and keeps the original value otherwise:
import pandas as pd
import numpy as np
df = pd.DataFrame({'age': [18, 22, 35, 55, 42]})
# np.where(condition, value_if_true, value_if_false) builds a new array of the same length
df['age'] = np.where((df['age'] < 20) | (df['age'] > 50), np.nan, df['age'])
print(df)
Output:
    age
0   NaN
1  22.0
2  35.0
3   NaN
4  42.0
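Both answers turn the integer ages into floats (25 becomes 25.0), because np.nan is a float. If you'd rather keep whole numbers, one option (swapped in here, not taken from either answer) is pandas' nullable integer dtype "Int64", which stores missing values as pd.NA. A minimal sketch, assuming a pandas version that supports nullable integers:

```python
import pandas as pd

df = pd.DataFrame({'age': [18, 22, 35, 55, 42]})

# Convert to the nullable integer dtype so the column can hold missing
# values without being upcast to float64.
df['age'] = df['age'].astype('Int64')

# Same out-of-range mask as in the answers above
out_of_range = (df['age'] < 20) | (df['age'] > 50)

# Assign pandas' missing-value marker instead of np.nan
df.loc[out_of_range, 'age'] = pd.NA
print(df)
```

Missing entries then print as <NA> rather than NaN, and the remaining ages stay integers.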