Pandas .iloc indexing coupled with boolean indexing in a Dataframe

Question

I looked through existing threads on indexing, but none of them addresses this use case.

I would like to alter specific values in a DataFrame based on their position. Specifically, I'd like the values in the second column, first through fourth rows, to be NaN, and the values in the third column, first and second rows, to be NaN. Say we have the following `DataFrame`:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.standard_normal((7,3)))
print(df)
          0         1         2
0 -1.102888  1.293658 -2.290175
1 -1.826924 -0.661667 -1.067578
2  1.015479  0.058240 -0.228613
3 -0.760368  0.256324 -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350

I want to alter df as shown below with the least amount of code:

          0         1         2
0 -1.102888       NaN       NaN
1 -1.826924       NaN       NaN
2  1.015479       NaN -0.228613
3 -0.760368       NaN -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350

I tried using boolean indexing with .loc, but it resulted in an error:

df.loc[(:2,1:) & (2:4,1)] = np.nan

# exception message:
df.loc[(:2,1:) & (2:4,1)] = np.nan
            ^
SyntaxError: invalid syntax

I also thought about converting the DataFrame object to a NumPy ndarray, but then I wouldn't know how to apply boolean indexing in that case.

Asked By: Mehdi RH


Answer 1

One way is to define the requirement explicitly, as a mapping from column position to the number of leading rows to blank out, and then assign in a loop:

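# column position -> number of leading rows to set to NaN
d = {1:4,2:2}
for col,val in d.items():
    # rows 0..val-1 of column `col`, selected by position
    df.iloc[:val,col] = np.nan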

print(df)

          0         1         2
0 -1.102888       NaN       NaN
1 -1.826924       NaN       NaN
2  1.015479       NaN -0.228613
3 -0.760368       NaN -0.259946
4  0.496348  0.437496  0.646149
5  0.717212  0.481687 -2.640917
6 -0.141584 -1.997986  1.226350

Answered By: anky
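
Since the question also asks about boolean indexing, here is a minimal sketch of a mask-based variant. It is not part of the answer above. It assumes the same 7x3 shape, uses a seeded generator purely for reproducibility (the seed is an assumption), and introduces the names `mask` and `result` for illustration only. The mask is built as a plain NumPy boolean array using positional slices, then passed to `DataFrame.mask`, which replaces values with NaN wherever the condition is True.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (the seed is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((7, 3)))

# Boolean array with the same shape as df; True marks the cells to blank out.
mask = np.zeros(df.shape, dtype=bool)
mask[:4, 1] = True  # second column, first four rows
mask[:2, 2] = True  # third column, first two rows

# DataFrame.mask replaces values where the condition is True (NaN by default).
result = df.mask(mask)
print(result)
```

`df.mask` returns a new frame and leaves `df` untouched, so assign the result back to `df` if the original should be overwritten. The loop in the answer is shorter for a couple of columns; a mask may be easier to reuse when the same pattern has to be applied to several frames of the same shape.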
