Index Pandas Dataframe mixing row number and column name
Question:
I am coming from R and finding the indexing rules for pandas dataframes hard to use. I have a dataframe where I want to get the ith row and some columns by their names. I understand how to use either iloc or loc on its own, as shown below.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])
df.loc[:, ['A', 'B']]
df.iloc[0:, 0:2]
Conceptually what I want is something like:
df.loc[0:,['A', 'B']]
Meaning the first row with those columns. Of course that code fails. I can seemingly use:
df.loc[0:0,['A', 'B']]
But, this seems strange, though it works. How does one properly index using a combination of row number and column names? In R we would do something like:
df = data.frame(matrix(rnorm(32),8,4))
colnames(df) <- c("A", "B", "C", "D")
df[1, c('A', 'B')]
*** UPDATE ***
I was mistaken: the example code above does work on this toy dataframe. On my real data, however, I see the following. Both objects are of the same type and the code is the same, so I do not understand the error.
type(poly_set)
<class 'pandas.core.frame.DataFrame'>
poly_set.loc[:,['P1', 'P2', 'P3']]
                      P1            P2           P3
29   -2.0897226679999998  -1.237649556         None
361  -2.0789117340000001   0.144751427  1.572417454
642  -2.0681314259999999  -0.196563749  1.500834574
poly_set.loc[0,['P1', 'P2', 'P3']]
Traceback (most recent call last):
File "C:\Users\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\indexes\base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 998, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 1005, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 0
Answers:
You are using a slice, which selects every row between two index labels. If you only want the first row, try:
df = df.reset_index()
df.loc[0,['A', 'B']]
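To see what reset_index changes, here is a minimal sketch. The frame is made up: only the index labels (29, 361, 642) come from the printout in the question, and the values are rounded placeholders. By default the old labels are kept in a new column called index; passing drop=True discards them instead.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; values are illustrative only.
poly_set = pd.DataFrame(
    {"P1": [-2.09, -2.08, -2.07], "P2": [-1.24, 0.14, -0.20]},
    index=[29, 361, 642],
)

# Default: the old labels move into a new column named "index".
kept = poly_set.reset_index()

# drop=True: the old labels are thrown away.
dropped = poly_set.reset_index(drop=True)

# After the reset, label 0 exists, so loc[0, ...] finds the first row.
first_row = dropped.loc[0, ["P1", "P2"]]
print(first_row)
```

Note that either form replaces the original labels as the row index, so this only suits cases where you do not need 29, 361, 642 for later lookups.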
You can use .iloc (to get the i-th row) and .loc (to get columns by name) together:
row_number = 0
df.iloc[row_number].loc[['A', 'B']]
You can even remove the .loc:
df.iloc[row_number][['A', 'B']]
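A short sketch of why both spellings give the same result, assuming a toy frame like the one in the question (the seeded generator is my own choice so the run is repeatable): df.iloc[row_number] returns a Series whose index is the column names, so selecting by name on that Series works with or without .loc.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the one in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((8, 4)), columns=["A", "B", "C", "D"])

row_number = 0

# Step 1: positional lookup gives a Series indexed by column name.
row = df.iloc[row_number]

# Step 2: pick entries of that Series by name, with or without .loc.
via_loc = row.loc[["A", "B"]]
via_brackets = row[["A", "B"]]

print(via_loc.equals(via_brackets))
```

Since this chains two lookups, it is probably best kept for reading values. Assigning through a chained expression like this may not write back to df.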
I agree that pandas slicing rules are not as easy to use as they should be. I believe the suggested approach these days is to use loc[] with a nested index lookup:
df.loc[df.index[row_numbers], ['A','B']]
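Here is a minimal sketch of that lookup on a frame whose index is not 0, 1, 2. The data and the row_numbers list are made up for illustration. It also shows the mirror-image approach, which stays in iloc and converts the column names to positions with columns.get_indexer.

```python
import pandas as pd

# Hypothetical frame with non-default index labels.
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0], "C": [7.0, 8.0, 9.0]},
    index=[29, 361, 642],
)

row_numbers = [0, 2]  # positions, not labels

# Positions -> labels, then a purely label-based lookup.
by_label = df.loc[df.index[row_numbers], ["A", "B"]]

# Names -> positions, then a purely position-based lookup.
by_position = df.iloc[row_numbers, df.columns.get_indexer(["A", "B"])]

print(by_label.equals(by_position))
```

One assumption to be aware of: the loc form relies on the index labels being unique. With duplicate labels, loc returns every row that matches a label, so the iloc form is the safer of the two in that case.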
I have no idea why pandas still does not have an xloc[] or something similar that allows for row numbers and column names. See this answer to the same question.
In your question update, you use loc[], which only looks up rows and columns by their index labels, and you can see from the previous printout that no row has the label 0. The row at position 0 has the label 29. If you use my approach or the others mentioned here, it will work.
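The error can be reproduced with a small made-up frame. Only the labels 29, 361, 642 are taken from the question; the column values are placeholders. The sketch shows loc failing on the missing label 0 while position 0 and label 29 both reach the same row.

```python
import pandas as pd

# Hypothetical stand-in for poly_set; values are illustrative only.
poly_set = pd.DataFrame(
    {"P1": [-2.09, -2.08, -2.07], "P2": [-1.24, 0.14, -0.20]},
    index=[29, 361, 642],
)

# loc searches the index labels, and 0 is not one of them.
try:
    poly_set.loc[0, ["P1", "P2"]]
    raised = False
except KeyError:
    raised = True

# The row at position 0 carries the label 29.
label_of_first = poly_set.index[0]
same_row = poly_set.loc[label_of_first].equals(poly_set.iloc[0])

print(raised, label_of_first, same_row)
```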