Why is the `df.columns` an empty list while I can see the column names if I print out the dataframe? Python Pandas

Question:

import pandas as pd
DATA = pd.read_csv(url)
DATA.head()

I have a large dataset that has dozens of columns. After loading it into Colab as above, I can see the name of each column. But running DATA.columns just returns Index([], dtype='object'). What’s happening here?

Now I find it impossible to pick out a few columns without column names. One way is to specify names = [...] when I load it, but I’m reluctant to do that since there are too many columns. So I’m looking for a way to index columns by integers, like in R, where df[:,[1,2,3]] would simply give me the first three columns of a dataframe. Pandas seems to focus on column names and makes integer indexing very inconvenient, though.

So what I’m asking is (1) What did I do wrong? Can I obtain those column names as well when I load the dataframe? (2) If not, how can I pick out the columns at positions [0, 1, 10] using a list of integers?


It seems that the problem is in the loading, as DATA.shape returns (10000, 0). I reran the loading code a few times and, all of a sudden, things went back to normal. Maybe Colab was taking a nap or something?

Asked By: Paw in Data


Answers:

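You can do that with df.iloc[:, [1, 2, 3]], which selects columns by integer position (df.loc[:, [1, 2, 3]] is label-based and only works if the column labels themselves are the integers 1, 2, 3). Still, I would suggest using the names, because if the columns ever change order or new columns are inserted, position-based code can break.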
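
As a rough illustration of selecting columns by integer position, here is a minimal sketch. It assumes a tiny made-up CSV (`csv_text`, with hypothetical column names `a` to `d`) in place of the real file behind `url`, which is not shown in the question. In pandas, `.iloc` is the position-based indexer, while `.loc` looks columns up by label.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV at `url` (not shown in the question).
csv_text = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"
df = pd.read_csv(io.StringIO(csv_text))

# Sanity check: a frame with zero columns means the load itself went wrong,
# so it is worth failing early instead of indexing into nothing.
if df.shape[1] == 0:
    raise ValueError("no columns were loaded; try reloading the file")

# Select columns by integer position (0-based), similar in spirit to R.
first_three = df.iloc[:, [0, 1, 2]]

# Turn positions into names once, then keep using the names afterwards.
picked_names = df.columns[[0, 1, 3]].tolist()
by_name = df[picked_names]

print(first_three)
print(picked_names)
```

Looking the names up from positions once and then indexing by name keeps the convenience of integers while staying robust if the column order changes later.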

Answered By: umbreon29