How to delete a duplicate column read from Excel in pandas

Question

Data in Excel:

a   b   a   d
1   2   3   4
2   3   4   5
3   4   5   6
4   5   6   7

Code:

import pandas as pd
df = pd.read_excel(r"sample.xlsx", sheet_name="Sheet1")  # older pandas spelled this sheetname=
df
   a  b  a.1  d
0  1  2    3  4
1  2  3    4  5
2  3  4    5  6
3  4  5    6  7

How do I delete the column a.1?

When pandas reads the data from Excel, it automatically renames the second a column to a.1.

I tried df.drop("a.1",index=1), but this does not work.

I have a huge Excel file with duplicate column names, and I am interested in only a few of the columns.

||

Answer 1

If you know the name of the column you want to drop:

df = df[[col for col in df.columns if col != 'a.1']]

and if you have several columns you want to drop:

columns_to_drop = ['a.1', 'b.1', ... ]
df = df[[col for col in df.columns if col not in columns_to_drop]]
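A minimal, self-contained sketch of the same filtering, assuming a frame built by hand to match the one in the question (it is not read from sample.xlsx) and a one-item columns_to_drop list chosen for illustration:

```python
import pandas as pd

# Frame shaped like the one in the question, with the second "a"
# column already renamed to "a.1" (built by hand for illustration).
df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]],
    columns=["a", "b", "a.1", "d"],
)

# Illustrative list of unwanted columns.
columns_to_drop = ["a.1"]

# Keep every column whose name is not in the drop list.
kept = df[[col for col in df.columns if col not in columns_to_drop]]
print(kept.columns.tolist())
```

Names in columns_to_drop that are not present in the frame are simply ignored by this approach, since the comprehension only iterates over columns that exist.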

Answered By: DeepSpace

Answer 2

You need to pass axis=1 for drop to work:

In [100]:
df.drop('a.1', axis=1)

Out[100]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7
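Note that drop returns a new DataFrame rather than modifying df, so the result has to be assigned. A small sketch on a hand-built copy of the question's frame; the columns= keyword (available in pandas 0.21 and later) is an equivalent spelling of axis=1, and the 'b.1' label is a hypothetical name used only to show errors="ignore":

```python
import pandas as pd

# Hand-built stand-in for the frame read from Excel in the question.
df = pd.DataFrame(
    [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]],
    columns=["a", "b", "a.1", "d"],
)

# drop() does not change df in place; keep the returned frame.
by_axis = df.drop("a.1", axis=1)
by_keyword = df.drop(columns=["a.1"])  # same result, more explicit

# errors="ignore" skips labels that are not present ("b.1" is hypothetical).
tolerant = df.drop(columns=["a.1", "b.1"], errors="ignore")

print(by_axis.columns.tolist())
```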

Or just pass a list of the cols of interest for column selection:

In [102]:
cols = ['a','b','d']
df[cols]

Out[102]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7

Also works with label-based indexing via .loc (the older .ix indexer was removed in pandas 1.0):

In [103]:
df.loc[:,cols]

Out[103]:
   a  b  d
0  1  2  4
1  2  3  5
2  3  4  6
3  4  5  7

Answered By: EdChum

Answer 3

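More generally, drop every column that pandas renamed with a numeric suffix (a.1, b.1, and so on). The dot must be escaped and the digits matched with \d; be aware that this also drops any original column whose name happens to end in a dot followed by digits:

df = df.drop(df.filter(regex=r'\.\d+$').columns, axis=1)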
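A runnable sketch of the suffix-based approach, assuming pandas' default ".N" renaming of duplicates and a hand-built frame with an extra 'b.1' column added for illustration. The stricter variant at the end is not from the answer; it is one possible way to drop a suffixed column only when its base name also exists:

```python
import re

import pandas as pd

# Hand-built frame imitating two renamed duplicates ("a.1" and "b.1").
df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6]],
    columns=["a", "b", "a.1", "d", "b.1"],
)

# Columns whose name ends in a literal dot followed by digits.
suffixed = df.filter(regex=r"\.\d+$").columns
deduped = df.drop(columns=suffixed)

# Stricter variant (an assumption, not from the answer): only treat
# "name.N" as a duplicate when a column called "name" is also present.
def renamed_duplicates(columns):
    found = []
    for col in columns:
        match = re.fullmatch(r"(.+)\.\d+", str(col))
        if match and match.group(1) in columns:
            found.append(col)
    return found

strict = df.drop(columns=renamed_duplicates(df.columns))
print(deduped.columns.tolist())
```

The stricter check helps when a genuine column name ends in something like ".2", since such a column is kept unless its prefix also appears as a column of its own.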
