Get Pandas DataFrame first column

Question:

Suppose I have a simple data frame:

import pandas as pd
a = pd.DataFrame([[0,1], [2,3]])

I can slice this data frame very easily: the first column is a[[0]], the second is a[[1]].

Now, let’s take a more complex data frame. This is part of my code:

frame = pd.DataFrame(range(100), columns=["Variable"], index=["_".join(["loc", str(i)]) for i in range(1, 101)])
frame[1] = [i**3 for i in range(100)]

frame is also a pandas DataFrame. I can get the second column with frame[[1]]. But when I try frame[[0]], I get an error:

Traceback (most recent call last):

  File "<ipython-input-55-0c56ffb47d0d>", line 1, in <module>
    frame[[0]]

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 1991, in __getitem__
    return self._getitem_array(key)

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\frame.py", line 2035, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\indexing.py", line 1184, in _convert_to_indexer
    indexer = labels._convert_list_indexer(objarr, kind=self.name)

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\indexes\base.py", line 1112, in _convert_list_indexer
    return maybe_convert_indices(indexer, len(self))

  File "C:\Users\Robert\Desktop\Záloha\WinPython-64bit-3.5.2.2\python-3.5.2.amd64\lib\site-packages\pandas\core\indexing.py", line 1856, in maybe_convert_indices
    raise IndexError("indices are out-of-bounds")

IndexError: indices are out-of-bounds

I can still use frame.iloc[:,0], but I don’t understand why I can’t use simple slicing with [[]]. I use WinPython with Spyder 3.

Asked By: Bobesh

Answers:

Using your code:

import pandas as pd

var_vec = [i for i in range(100)]
num_of_sites = 100
row_names = ["_".join(["loc", str(i)]) for i in 
             range(1,num_of_sites + 1)]
frame = pd.DataFrame(var_vec, columns = ["Variable"], index = row_names)
spec_ab = [i**3 for i in range(100)]
frame[1] = spec_ab

If you print frame, you get:

       Variable    1
loc_1         0    0
loc_2         1    1
loc_3         2    8
loc_4         3   27
loc_5         4   64
loc_6         5  125
...

So the cause of your problem becomes clear: you have no column labelled 0.
You first define a list called var_vec.
You then build a DataFrame from that list, but you specify the index values and the column name (which is usually good practice).
Integer column labels such as 0, 1, ..., as in your first example, are only assigned when you don’t specify column names. They are labels, not positional indexers.
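
A minimal sketch of the difference, reusing the two frames from the question. Without explicit column names, 0 happens to be both a label and a position; once names are given, 0 is no longer a label at all. The exception types caught below are an assumption that depends on the pandas version: the question shows IndexError, while newer releases are expected to raise KeyError.

```python
import pandas as pd

# No column names given: pandas assigns default integer labels 0 and 1.
a = pd.DataFrame([[0, 1], [2, 3]])
print(list(a.columns))       # [0, 1]

# Column name given explicitly, then a second column labelled with the integer 1.
frame = pd.DataFrame(
    list(range(100)),
    columns=["Variable"],
    index=["loc_{}".format(i) for i in range(1, 101)],
)
frame[1] = [i**3 for i in range(100)]
print(list(frame.columns))   # ['Variable', 1]

# 0 is not among the column labels, so label-based selection fails.
print(0 in frame.columns)    # False
try:
    frame[[0]]
    lookup_failed = False
except (KeyError, IndexError):
    # Older pandas (as in the question) raises IndexError; newer versions raise KeyError.
    lookup_failed = True
print(lookup_failed)         # True
```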

If you want to access columns by their position, you can:

df[df.columns[0]]

What happens then is that you get the column labels of df, pick the label stored at position 0, and pass that label to df as the key.
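
To make that concrete, here is a small sketch on a cut-down, three-row version of the frame from the question (the shortened data and the names first_label, first_col and first_col_df are assumptions for illustration).

```python
import pandas as pd

frame = pd.DataFrame({"Variable": [0, 1, 2]}, index=["loc_1", "loc_2", "loc_3"])
frame[1] = [0, 1, 8]

first_label = frame.columns[0]     # label stored at position 0
print(first_label)                 # Variable

first_col = frame[first_label]     # ordinary label-based lookup, returns a Series
print(first_col.tolist())          # [0, 1, 2]

# Passing a list of labels instead keeps the result a one-column DataFrame.
first_col_df = frame[[first_label]]
print(first_col_df.shape)          # (3, 1)
```

One caveat: if column labels are duplicated, this lookup returns every column carrying that label, whereas a purely positional lookup returns exactly one.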

Hope that helps you understand.

Edit:

Another (better) way would be:

df.iloc[:,0]

where “:” stands for all rows (rows are also indexed by position, from 0 up to the number of rows minus 1).

Answered By: epattaro

[] calls __getitem__(), which selects columns by label, and as @epattaro explained, there’s no column label 0 in the dataframe created in the OP. To select a column (or row) by position, the canonical way is via iloc.

df.iloc[:, 0]         # select first column as a Series
df.iloc[:, [0]]       # select first column as a single column DataFrame

df.iloc[0]            # select first row as a Series
df.iloc[[0]]          # select first row as a single row DataFrame
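
The difference between the scalar and the list form is the return type. A short sketch, assuming a three-row stand-in for the frame in the question:

```python
import pandas as pd

df = pd.DataFrame({"Variable": [0, 1, 2], 1: [0, 1, 8]},
                  index=["loc_1", "loc_2", "loc_3"])

col_series = df.iloc[:, 0]     # Series named after the column label
col_frame = df.iloc[:, [0]]    # DataFrame with one column
row_series = df.iloc[0]        # Series indexed by the column labels
row_frame = df.iloc[[0]]       # DataFrame with one row

print(type(col_series).__name__, col_series.name)   # Series Variable
print(type(col_frame).__name__, col_frame.shape)    # DataFrame (3, 1)
print(type(row_series).__name__, row_series.name)   # Series loc_1
print(type(row_frame).__name__, row_frame.shape)    # DataFrame (1, 2)
```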

Yet another method is take():

df.take([0], axis=1)  # select first column
df.take([0])          # select first row

You can verify that for any df, df.take([0], axis=1).equals(df.iloc[:, [0]]) returns True.
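
A quick check of that equivalence on a small example frame (the three-row data is an assumption standing in for the frame from the question):

```python
import pandas as pd

df = pd.DataFrame({"Variable": [0, 1, 2], 1: [0, 1, 8]},
                  index=["loc_1", "loc_2", "loc_3"])

same_column = df.take([0], axis=1).equals(df.iloc[:, [0]])
same_row = df.take([0]).equals(df.iloc[[0]])
print(same_column, same_row)       # True True

# take() also accepts negative positions, counted from the end.
last_column = df.take([-1], axis=1)
print(list(last_column.columns))   # [1]
```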

Answered By: cottontail