Index Pandas Dataframe mixing row number and column name

Question

I am coming from R and find the indexing rules for pandas DataFrames hard to use. I have a DataFrame from which I want to get the i-th row and some columns by their names. I understand how to use either iloc or loc on its own, as shown below.

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df.loc[:,['A', 'B']]
df.iloc[0:,0:2]

Conceptually what I want is something like:

df.loc[0:,['A', 'B']]

That is, the first row with just those columns. Of course, that code fails. It seems I can use:

df.loc[0:0,['A', 'B']]

This works, but it seems strange. How does one properly index using a combination of row number and column names? In R we would do something like:

df = data.frame(matrix(rnorm(32),8,4))
colnames(df) <- c("A", "B", "C", "D") 
df[1, c('A', 'B')]

*** UPDATE ***
I was mistaken: the example code above does work on this toy DataFrame. On my real data, however, I see the following. Both objects are of the same type and the code is the same, so I do not understand the error.

type(poly_set)
<class 'pandas.core.frame.DataFrame'>
poly_set.loc[:,['P1', 'P2', 'P3']]
                      P1            P2           P3
29   -2.0897226679999998  -1.237649556         None
361  -2.0789117340000001   0.144751427  1.572417454
642  -2.0681314259999999  -0.196563749  1.500834574

poly_set.loc[0,['P1', 'P2', 'P3']]
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1005, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0

Asked By: user350540


Answer 1

A slice such as 0: selects every row from label 0 onward, not a single row. If you only want the first row, reset the index so that the rows are labelled from 0, then select label 0:

df = df.reset_index()    
df.loc[0,['A', 'B']]
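
As a minimal sketch of what reset_index does in this situation, the example below assumes a small frame whose row labels copy the ones printed in the question (29, 361, 642); the values and the variable names kept, dropped and first_row are made up for illustration. By default the old labels are kept as a new column named index, while drop=True discards them.

```python
import pandas as pd

# Hypothetical data; only the row labels are taken from the question.
poly_set = pd.DataFrame(
    {"P1": [-2.09, -2.08, -2.07], "P2": [-1.24, 0.14, -0.20]},
    index=[29, 361, 642],
)

# Default behaviour: the old labels move into a column called "index".
kept = poly_set.reset_index()

# drop=True: the old labels are discarded and rows are labelled 0, 1, 2.
dropped = poly_set.reset_index(drop=True)

# Label 0 now exists, so a label-based lookup succeeds.
first_row = dropped.loc[0, ["P1", "P2"]]
print(first_row)
```

Note that resetting the index throws away (or moves) the original labels, so this is probably only appropriate when those labels carry no meaning you still need.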

Answered By: Kriti Pawar

Answer 2

You can use .iloc (to get the i-th row) and .loc (to get columns by name) together:

row_number = 0
df.iloc[row_number].loc[['A', 'B']]

You can even remove the .loc:

df.iloc[row_number][['A', 'B']]
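
To make the two-step lookup concrete, here is a small sketch on a frame with known values (the arange data and the names row, subset and block are assumptions for illustration, not part of the original answer). It shows that a single row number yields a Series, while a list of row numbers yields a DataFrame.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame used in the question.
df = pd.DataFrame(np.arange(32.0).reshape(8, 4), columns=["A", "B", "C", "D"])

row_number = 0
row = df.iloc[row_number]        # step 1: pick the row by position (a Series)
subset = row[["A", "B"]]         # step 2: pick columns by name from that Series

# With a list of positions the intermediate result is a DataFrame instead.
block = df.iloc[[0, 2]][["A", "B"]]

print(subset)
print(block)
```

This chained form is convenient for reading values. On a frame with mixed column types the intermediate row Series is upcast to a common dtype, and assigning through a chained lookup is not a reliable way to write back to the original frame, so a single loc or iloc call is likely the safer choice for assignment.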

Answered By: Julio Batista Silva

Answer 3

I agree that pandas slicing rules are not as easy to use as they should be. I believe the suggested approach these days is to use loc[] with a nested index lookup:

df.loc[df.index[row_numbers], ['A','B']]

I have no idea why pandas still does not have an xloc[] or something similar that allows for row numbers and column names. See this answer to the same question.

In the update to your question, you use loc[], which only looks up row and column labels, and the printout above shows that there is no row labelled 0. The row at position 0 has the label 29. My approach, or the others mentioned here, will work on that data.
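
The difference between label and position can be sketched as follows, assuming a small frame whose row labels copy the question's printout (the values are made up, and cols, by_label and by_position are illustrative names). The second lookup goes the other way round: it translates column names into positions with columns.get_indexer and then uses iloc.

```python
import pandas as pd

# Hypothetical data; only the row labels come from the question.
poly_set = pd.DataFrame(
    {"P1": [-2.09, -2.08, -2.07], "P2": [-1.24, 0.14, -0.20]},
    index=[29, 361, 642],
)
cols = ["P1", "P2"]

# loc is label-based: there is no row labelled 0, so this raises KeyError.
try:
    poly_set.loc[0, cols]
except KeyError as err:
    print("KeyError:", err)

# Translate the row position into its label, then use loc.
by_label = poly_set.loc[poly_set.index[0], cols]

# Or translate the column names into positions, then use iloc.
by_position = poly_set.iloc[0, poly_set.columns.get_indexer(cols)]

print(by_label)
```

Both lookups return the row labelled 29. One caveat: if the index contains duplicate labels, the loc form returns every row that shares the label, so the iloc form is probably the better fit in that case.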

Answered By: farnsy
