Pandas .iloc indexing coupled with boolean indexing in a DataFrame

Question:

I looked into existing threads about indexing, but none of them address this use case.

I would like to set specific values in a DataFrame to NaN based on their position: the values in the second column from the first to the fourth row, and the values in the third column in the first and second rows. Say we have the following `DataFrame`:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.standard_normal((7, 3)))
print(df)
          0         1         2
0 -1.102888  1.293658 -2.290175
1 -1.826924 -0.661667 -1.067578
2  1.015479  0.058240 -0.228613
3 -0.760368  0.256324 -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350

I want to alter df as shown below, with as little code as possible:

          0         1         2
0 -1.102888       NaN       NaN
1 -1.826924       NaN       NaN
2  1.015479       NaN -0.228613
3 -0.760368       NaN -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350

I tried using boolean indexing with .loc, but it resulted in an error:

df.loc[(:2,1:) & (2:4,1)] = np.nan

# exception message:
df.loc[(:2,1:) & (2:4,1)] = np.nan
            ^
SyntaxError: invalid syntax
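The attempt above fails at parse time because slice syntax such as `:2` is only allowed directly inside square brackets, not inside a parenthesised tuple, and `&` between two tuples has no indexing meaning. A minimal sketch of the intended assignment, assuming the frame from the question, uses positional `.iloc` with one assignment per column. Note that `.iloc` slice stops are exclusive, while `.loc` label slices on the default integer index would include the stop.

```python
import numpy as np
import pandas as pd

# Same shape as the question's frame; the values are random.
df = pd.DataFrame(np.random.standard_normal((7, 3)))

# .iloc[row_slice, column_position]; the slice stop is exclusive.
df.iloc[:4, 1] = np.nan  # second column, rows 0-3
df.iloc[:2, 2] = np.nan  # third column, rows 0-1

print(df)
```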

I also thought about converting the DataFrame to a NumPy ndarray, but then I wouldn't know how to use boolean indexing on it.
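On the ndarray idea: a boolean array with the same shape as the frame can mark exactly the cells to blank out. The sketch below assumes the position ranges stated in the question; the names `mask`, `masked`, `arr` and `rebuilt` are only illustrative. `DataFrame.mask` replaces values where the condition is True (with NaN by default), and the same array can also index a plain NumPy copy directly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.standard_normal((7, 3)))

# Boolean array shaped like df; True marks a cell to set to NaN.
mask = np.zeros(df.shape, dtype=bool)
mask[:4, 1] = True  # second column, rows 0-3
mask[:2, 2] = True  # third column, rows 0-1

# Option A: let pandas apply the mask (returns a new DataFrame).
masked = df.mask(mask)

# Option B: boolean-index a NumPy copy, then rebuild the DataFrame.
arr = df.to_numpy(copy=True)
arr[mask] = np.nan
rebuilt = pd.DataFrame(arr, index=df.index, columns=df.columns)

print(masked)
```

Both options leave the original `df` untouched; reassign (`df = df.mask(mask)`) if the change should replace it.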

Asked By: Mehdi RH


Answers:

One clear way is to define the requirement as a dict mapping each column position to the number of leading rows to set to NaN, then assign with .iloc:

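# {column position: number of leading rows to set to NaN}
d = {1: 4, 2: 2}
for col, val in d.items():
    # .iloc slice stops are exclusive, so :val covers rows 0 .. val-1
    df.iloc[:val, col] = np.nan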

print(df)

          0         1         2
0 -1.102888       NaN       NaN
1 -1.826924       NaN       NaN
2  1.015479       NaN -0.228613
3 -0.760368       NaN -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350
Answered By: anky
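A vectorized variation (not part of the answer as posted) turns the same dict into a single boolean mask through NumPy broadcasting instead of a loop. This is a sketch under the assumption that every column is blanked from row 0 downward; `limits`, `row_pos` and `mask` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.standard_normal((7, 3)))

# Same requirement as the answer's dict; missing columns default to 0 rows.
d = {1: 4, 2: 2}
limits = np.array([d.get(col, 0) for col in range(df.shape[1])])

# Row positions as a (7, 1) column compared against the (3,) per-column
# limits broadcasts to a (7, 3) boolean mask.
row_pos = np.arange(len(df))[:, None]
mask = row_pos < limits

df = df.mask(mask)
print(df)
```

The loop is easier to read for a couple of columns; the broadcast form avoids one `.iloc` assignment per column when many columns are involved.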