Pandas: how to use the initially generated column names without renaming them

Question:

I was curious whether there is any way to use the column names that Pandas generates initially when reading a CSV/text file, as follows:

df = pd.read_csv("some_text_file.txt", header = None)

which will produce something like

          0         1         2
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3

Because we used header = None, Pandas generated the column names 0, 1, 2 by default.

When I try to access them like

-->    df['0'] = sometask

It throws an error:

raise KeyError(key) from err
KeyError: '0'

Aren't they column names at all? I've seen some people call them levels, like:

level0 - column 0
level1 - column 1
level2 - column 2 

I’ve also tried

-->    df[level0] = sometask

it throws an error:

NameError: name 'level0' is not defined

I know we have to rename the column names and use them like

df.columns =['col1','col2'.....]

But I am wondering whether there is any way to use these Pandas-generated column names without renaming them as shown above.

Asked By: user17993062


Answers:

By default, the column names are integers, not strings. Hence, accessing df['0'] raises a KeyError, while df[0] returns the first column.
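A minimal sketch of that difference, assuming the file contents from the question; the in-memory csv_text below is a hypothetical stand-in for some_text_file.txt, and the .str.upper() call is only a placeholder for "sometask":

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for "some_text_file.txt"
csv_text = "data1,data2,data3\nr2 data1,r2 data2,r2 data3\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# The integer label selects the first column
first_col = df[0].tolist()

# The string label does not exist, so reading it raises KeyError
try:
    df["0"]
    string_key_failed = False
except KeyError:
    string_key_failed = True

# Assigning through the integer label overwrites the existing column
df[0] = df[0].str.upper()

print(first_col)
print(string_key_failed)
print(df)
```

Note that assigning to df['0'] would not fail; it would silently add a new column labelled with the string '0' next to the integer-labelled ones.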

Answered By: Florent Monin

Inside pd.read_csv, you can pass a list to the names parameter. E.g.:

df = pd.read_csv('some_text_file.txt', header=None, 
                 names=[f'col_{i}' for i in range(1,4)])

print(df)

      col_1     col_2     col_3
0     data1     data2     data3
1  r2 data1  r2 data2  r2 data3

Note that the list of names cannot contain any duplicates (e.g. ['col', 'col', 'col2'] will cause an error).
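The duplicate-name restriction can be checked with a short sketch like the one below. It assumes a reasonably recent pandas version, where duplicates in names raise a ValueError; csv_text is again a hypothetical stand-in for the file in the question.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for "some_text_file.txt"
csv_text = "data1,data2,data3\nr2 data1,r2 data2,r2 data3\n"

try:
    pd.read_csv(io.StringIO(csv_text), header=None,
                names=["col", "col", "col2"])
    duplicate_error = None
except ValueError as err:
    # Recent pandas versions reject duplicate entries in `names`
    duplicate_error = str(err)

print(duplicate_error)
```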


The default column "names" (0, 1, 2, etc.) are integers rather than strings. You can check this as follows:

print(df.columns)

Int64Index([0, 1, 2], dtype='int64')

So, to access column 0, you should use df[0] or df.loc[:, 0], not df['0'].
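As a rough way to confirm this without relying on the exact index class that print(df.columns) shows (which may differ between pandas versions), the labels can be inspected directly. This is only a sketch; csv_text is a hypothetical stand-in for the file in the question.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for "some_text_file.txt"
csv_text = "data1,data2,data3\nr2 data1,r2 data2,r2 data3\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Every generated label is an integer, not a string
labels_are_ints = all(pd.api.types.is_integer(c) for c in df.columns)

# Membership tests show which key type will work
has_int_label = 0 in df.columns
has_str_label = "0" in df.columns

# Label-based and position-based selection give the same column here
by_label = df.loc[:, 0]
by_position = df.iloc[:, 0]
same_column = by_label.equals(by_position)

print(labels_are_ints, has_int_label, has_str_label, same_column)
```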

Answered By: ouroboros1