Changing values in each row of a column based on values in other columns of the corresponding row (Python/Pandas)
Question:
import pandas as pd

data = [{'a': 12, 'b': 2, 'c': 3, 'd': 'bat'},
        {'a': 'NaN', 'b': 20, 'c': 30, 'd': 'ball'},
        {'a': 4, 'b': 20, 'c': 30, 'd': 'pin'}]
df = pd.DataFrame(data)
I’m having a hard time figuring out how to replace the NaN values in column a with the values in column b, based on conditions set on columns c and d. For example, I want to replace the NaN values in column a with the value of column b from the same row (here 2 or 20), but only in rows where c > 20 and d == 'ball'.
Could someone help me with this?
I’ve tried a number of solutions with df.loc and df.mask that have not worked.
Answers:
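You can use df.apply for this:
def fill_na(row: pd.Series) -> pd.Series:
    # Work on a copy; mutating the row passed in by apply is not supported.
    row = row.copy()
    # Use column labels: positional access such as row[0] is deprecated on a labelled Series.
    if pd.isna(row['a']) and row['c'] > 20 and row['d'] == 'ball':
        row['a'] = row['b']
    return row

df = df.apply(fill_na, axis=1)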
Your sample data has no row that satisfies the criteria (the 'NaN' in column a is a string, which pd.isna does not treat as missing), so you can use this for testing:
data = [{'a': np.nan, 'b': 2, 'c': 3, 'd': 'bat'},
        {'a': 10, 'b': 20, 'c': 30, 'd': 'ball'},
        {'a': np.nan, 'b': 20, 'c': 30, 'd': 'pin'},
        {'a': np.nan, 'b': 15, 'c': 30, 'd': 'ball'}]
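If the real data contains the text 'NaN' as in the question, one option is to convert the column to numeric before filling. This is a minimal sketch that assumes column a is meant to be numeric; the names missing_before and missing_after are only illustrative.

```python
import pandas as pd

data = [{'a': 12, 'b': 2, 'c': 3, 'd': 'bat'},
        {'a': 'NaN', 'b': 20, 'c': 30, 'd': 'ball'},
        {'a': 4, 'b': 20, 'c': 30, 'd': 'pin'}]
df = pd.DataFrame(data)

# The string 'NaN' is an ordinary object value, so isna() flags nothing.
missing_before = int(df['a'].isna().sum())

# Convert to numeric; with errors='coerce', entries that are not numbers become real NaN.
df['a'] = pd.to_numeric(df['a'], errors='coerce')
missing_after = int(df['a'].isna().sum())

print(missing_before, missing_after)
```

After the conversion, column a has a float dtype and the second row is a genuine missing value that isna() and pd.isna can detect.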
Try a single df.loc assignment with a boolean mask (chained indexing such as df['a'].loc[...] = ... may fail to write back to df):
mask = df['a'].isna() & (df['c'] > 20) & (df['d'] == 'ball')
df.loc[mask, 'a'] = df.loc[mask, 'b']
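For reference, here is an end-to-end sketch of the boolean-mask approach on the four-row test data, together with an equivalent Series.mask form, since the question mentions df.mask. The names cond and via_mask are illustrative and do not come from the original answers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'a': np.nan, 'b': 2, 'c': 3, 'd': 'bat'},
    {'a': 10, 'b': 20, 'c': 30, 'd': 'ball'},
    {'a': np.nan, 'b': 20, 'c': 30, 'd': 'pin'},
    {'a': np.nan, 'b': 15, 'c': 30, 'd': 'ball'},
])

# Rows where a is missing AND c > 20 AND d is 'ball'.
cond = df['a'].isna() & (df['c'] > 20) & (df['d'] == 'ball')

# Alternative: Series.mask replaces values where cond is True and returns a new Series.
via_mask = df['a'].mask(cond, df['b'])

# In-place form: one .loc call that selects rows by cond and the column 'a'.
df.loc[cond, 'a'] = df.loc[cond, 'b']

print(df)
```

Only the last row meets all three conditions, so its a becomes 15. The other missing values in a are left untouched because c or d does not match.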