Why does a list variable work as an argument to loc in Pandas but a list literal does not?

Question:

I am new to machine learning and am learning Pandas.
I encountered some weird behaviour with .loc in pandas: a list variable works just fine as an argument, but the same list written out directly does not.

For example, I have a DataFrame named reviews with many columns, and I want to extract the first 100 rows of the ‘country’ and ‘variety’ columns.
The following code works just fine:

cols = ['country', 'variety']
df = reviews.loc[:99, cols]

but this code does not:

df = reviews.loc[:99, ['country, variety']]

I am getting the following error on executing the above statement:

KeyError: "None of [Index(['country, variety'], dtype='object')] are in the [columns]"

Where am I going wrong?

Asked By: Archit Mishra


Answers:

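The issue with your code is that ‘country, variety’ is enclosed in a single pair of quotes, which makes it one string containing a comma rather than two separate strings. As a result, pandas looks for one column literally named ‘country, variety’ instead of the two columns ‘country’ and ‘variety’. Whether the list is stored in a variable or written inline makes no difference: cols = ['country', 'variety'] worked because there each name has its own quotes.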
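A minimal sketch that reproduces the error, assuming a small made-up DataFrame standing in for reviews (the column values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the asker's `reviews` DataFrame
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "points": [87, 87, 87],
    "variety": ["White Blend", "Portuguese Red", "Pinot Gris"],
})

wrong = ['country, variety']    # ONE string that happens to contain a comma
right = ['country', 'variety']  # TWO separate strings

print(len(wrong), len(right))   # 1 2

try:
    reviews.loc[:1, wrong]
    raised = False
except KeyError as err:
    # pandas searched for a single column literally named "country, variety"
    raised = True
    print("KeyError:", err)
```

The list lengths alone show the problem: the broken version asks for one column, not two.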

To fix this, put each column name in its own pair of quotes and separate the names with a comma. Here is the corrected code:

df = reviews.loc[:99, ['country', 'variety']]

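This selects the ‘country’ and ‘variety’ columns for the rows labelled 0 through 99 of the reviews DataFrame. With the default integer index (0, 1, 2, …), that is the first 100 rows, because .loc slices by label and includes the end label.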
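As a sketch of the corrected call, assuming a hypothetical 150-row stand-in for reviews with the default RangeIndex, the variable and the inline literal give identical results:

```python
import pandas as pd

# Hypothetical stand-in for `reviews`: 150 rows, default RangeIndex 0..149
reviews = pd.DataFrame({
    "country": ["Italy"] * 150,
    "points": list(range(150)),
    "variety": ["Red Blend"] * 150,
})

cols = ['country', 'variety']
via_variable = reviews.loc[:99, cols]
via_literal = reviews.loc[:99, ['country', 'variety']]

# A variable and a literal holding the same list select the same data
print(via_variable.equals(via_literal))  # True

# .loc includes the end label, so :99 yields labels 0..99, i.e. 100 rows
print(via_literal.shape)  # (100, 2)
```

If reviews has been filtered or re-indexed so its labels are no longer 0..N-1, :99 still means "up to and including label 99", which may not be the first 100 rows.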

It’s worth noting that you can also use the .iloc indexer, which selects by integer position, to achieve a similar result:

df = reviews.iloc[:100, [0, 3]]

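In this case, integer positions select the first 100 rows and the 1st and 4th columns (Python uses 0-based indexing, so the first column has position 0). Unlike .loc, .iloc excludes the end of a slice, which is why this uses :100 rather than :99. Note that [0, 3] only picks out ‘country’ and ‘variety’ if those columns really sit at positions 0 and 3 in reviews; check reviews.columns before relying on hard-coded positions.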
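One way to avoid hard-coding positions is to look them up from the names with Index.get_indexer. This is a sketch on a hypothetical DataFrame where ‘variety’ deliberately sits at position 4, not 3:

```python
import pandas as pd

# Hypothetical stand-in for `reviews`; 'variety' is at position 4 here, not 3
reviews = pd.DataFrame({
    "country": ["Italy"] * 150,
    "description": ["text"] * 150,
    "points": list(range(150)),
    "price": [10.0] * 150,
    "variety": ["Red Blend"] * 150,
})

# Translate column names into integer positions (missing names would give -1)
positions = reviews.columns.get_indexer(['country', 'variety'])
print(positions)  # [0 4]

# .iloc excludes the end position, so :100 covers positions 0..99
by_position = reviews.iloc[:100, positions]
by_label = reviews.loc[:99, ['country', 'variety']]
print(by_position.equals(by_label))  # True
```

Since get_indexer returns -1 for a name that is not found, it is worth checking for -1 before passing the result to .iloc, where -1 would silently select the last column.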

Answered By: Victor Vasiliev