Set variable column values to nan based on row condition
Question:
I want to set a variable number of trailing columns in each row to NaN, where the count comes from that row's value in the first column (col_ind).
Say I have a dataframe as follows:
col_ind col_1 col_2 col_3
3 a b c
2 d e f
1 g h i
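I effectively want to do something like this (as pseudocode: it fails as written, because -df['col_ind'] is a whole Series rather than a single integer, so it can't be used as a slice bound):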
df.loc[:, df.columns[-df['col_ind']:]] = np.nan
Which would result in:
col_ind col_1 col_2 col_3
3 nan nan nan
2 d nan nan
1 g h nan
Answers:
You can get the values of df["col_ind"], iterate through them, and set the last v columns of each row to np.nan:
import numpy as np

vals = df["col_ind"].values
for i, v in enumerate(vals):
    if v > 0:  # -0: is the same as 0:, which would blank the whole row
        df.iloc[i, -v:] = np.nan
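Two edge cases are worth keeping in mind with this loop. First, -0: is the same slice as 0:, so a row whose col_ind is 0 would have every column blanked, col_ind included. Second, a value larger than the number of data columns would reach back into col_ind as well. Below is a minimal self-contained sketch that skips the first case and clamps the second; the two extra rows (col_ind 0 and 5) are made-up test data, not part of the question.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, plus two hypothetical rows (col_ind 0 and 5)
# added only to exercise the edge cases.
df = pd.DataFrame({'col_ind': [3, 2, 1, 0, 5],
                   'col_1': ['a', 'd', 'g', 'j', 'm'],
                   'col_2': ['b', 'e', 'h', 'k', 'n'],
                   'col_3': ['c', 'f', 'i', 'l', 'o']})

n_data = df.shape[1] - 1  # number of columns after col_ind

for i, v in enumerate(df['col_ind'].to_numpy()):
    k = min(int(v), n_data)  # clamp so the slice never reaches col_ind
    if k > 0:                # -0: would mean "from position 0", i.e. the whole row
        df.iloc[i, -k:] = np.nan

print(df)
```

The rows from the question come out as in the expected result; the col_ind 0 row is left untouched and the col_ind 5 row has only its three data columns blanked.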
You can use apply with result_type='broadcast'. (Edit: borrowing @marcelo-paco's code)
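import numpy as np
import pandas as pd

def make_nan(row):
    n = row.iloc[0]  # col_ind is the first column; .iloc makes the lookup positional
    if n > 0:  # -0: would select the whole row
        row.iloc[-n:] = np.nan
    return row

df = pd.DataFrame({'col_ind': [3, 2, 1], 'col_1': ['a', 'd', 'g'], 'col_2': ['b', 'e', 'h'], 'col_3': ['c', 'f', 'i']})
df[:] = df.apply(make_nan, axis=1, result_type='broadcast')
df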
This will give:
col_ind col_1 col_2 col_3
3 NaN NaN NaN
2 d NaN NaN
1 g h NaN
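What result_type='broadcast' contributes is easiest to see with a function that returns a plain list instead of a labelled Series. With the default result_type, list-like return values come back as a Series of lists; with 'broadcast', pandas lays them back out in the original shape and keeps df's index and column labels. A small sketch; blank_tail is a hypothetical helper written only for this illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_ind': [3, 2, 1], 'col_1': ['a', 'd', 'g'],
                   'col_2': ['b', 'e', 'h'], 'col_3': ['c', 'f', 'i']})

def blank_tail(row):
    # Hypothetical helper: return an unlabelled list in which the last n
    # values (n = this row's col_ind) are replaced with NaN.
    n = int(row.iloc[0])
    vals = row.tolist()
    return vals[:len(vals) - n] + [np.nan] * n

# Default result_type: each returned list is kept as one value -> a Series of lists.
as_lists = df.apply(blank_tail, axis=1)

# result_type='broadcast': same shape as df, original index and columns retained.
as_frame = df.apply(blank_tail, axis=1, result_type='broadcast')
print(as_frame)
```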
Let's use NumPy broadcasting to build a boolean mask of the cells to blank out, then apply it with DataFrame.mask:
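import numpy as np

c = df.columns[1:]  # the columns that may be blanked (everything after col_ind)
# each column's position from the right ([3, 2, 1]) vs. each row's col_ind, as a 2-D mask
m = np.arange(len(c), 0, -1) <= df['col_ind'].values[:, None]
df[c] = df[c].mask(m)  # mask() sets cells to NaN where m is True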
Result:
col_ind col_1 col_2 col_3
0 3 NaN NaN NaN
1 2 d NaN NaN
2 1 g h NaN
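To see what the comparison does, it helps to print the mask on its own. np.arange(len(c), 0, -1) gives each data column's position counted from the right (3, 2, 1 here), and [:, None] turns col_ind into a column vector, so <= broadcasts a (3,) array against a (3, 1) array into a (3, 3) boolean grid: True wherever a column falls within the last col_ind columns of that row. The extra counts 0 and 5 at the end are hypothetical values, not from the question; they only show that out-of-range values degrade to "blank nothing" and "blank everything" instead of raising.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col_ind': [3, 2, 1], 'col_1': ['a', 'd', 'g'],
                   'col_2': ['b', 'e', 'h'], 'col_3': ['c', 'f', 'i']})
c = df.columns[1:]

# Position of each data column counted from the right edge: [3, 2, 1].
pos_from_right = np.arange(len(c), 0, -1)
# col_ind as a (3, 1) column vector so it broadcasts across the columns.
counts = df['col_ind'].to_numpy()[:, None]

m = pos_from_right <= counts  # (3,) vs (3, 1) -> (3, 3) boolean mask
print(m.astype(int))
# [[1 1 1]
#  [0 1 1]
#  [0 0 1]]

# Hypothetical extra counts: 0 masks nothing, 5 (more than len(c)) masks everything.
edge = pos_from_right <= np.array([0, 5])[:, None]
print(edge.astype(int))
```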