How to delete a column from a data frame with pandas?
Question:
I read my data
import pandas as pd
df = pd.read_csv('/path/file.tsv', header=0, delimiter='\t')
print(df)
and get:
id text
0 361.273 text1...
1 374.350 text2...
2 374.350 text3...
How can I delete the id column from the above data frame? I tried the following:
import pandas as pd
df = pd.read_csv('/path/file.tsv', header=0, delimiter='\t')
print(df.drop('id', axis=1))
But it raises this exception:
ValueError: labels ['id'] not contained in axis
Answers:
df.drop(colname, axis=1) (or del df[colname]) is the correct method to use to delete a column.
If a ValueError is raised (newer pandas versions raise a KeyError instead), it means the column name is not exactly what you think it is.
Check df.columns to see what pandas thinks are the names of the columns.
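One way a label can differ from what you expect is a separator mismatch when reading the file. The following is only a sketch with made-up inline data (the raw string is hypothetical, not the asker's file), showing how df.columns reveals the problem:

```python
import io
import pandas as pd

# Hypothetical tab-separated sample, made up for illustration.
raw = "id\ttext\n361.273\ttext1\n374.350\ttext2\n"

# Read with the default comma separator: the whole header becomes one label.
wrong = pd.read_csv(io.StringIO(raw))
wrong_labels = wrong.columns.tolist()

# Read with the tab separator: 'id' and 'text' become separate columns.
right = pd.read_csv(io.StringIO(raw), sep='\t')
right_labels = right.columns.tolist()

print(wrong_labels)
print(right_labels)

# With the correct labels in place, the drop succeeds.
remaining = right.drop('id', axis=1).columns.tolist()
print(remaining)
```

If the first printed list contains a single label with a tab inside it, the separator passed to read_csv is the likely culprit rather than drop itself.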
To actually delete the column, del df['id'] or df.drop('id', axis=1) should have worked, provided the name you pass matches the column label exactly.
However, if you don’t need to delete the column then you can just select the column of interest like so:
In [54]:
df['text']
Out[54]:
0 text1
1 text2
2 textn
Name: text, dtype: object
If you never wanted it in the first place, you can pass a list of columns to read_csv through the usecols parameter:
In [53]:
import io
temp="""id text
363.327 text1
366.356 text2
37782 textn"""
df = pd.read_csv(io.StringIO(temp), delimiter=r'\s+', usecols=['text'])
df
Out[53]:
text
0 text1
1 text2
2 textn
Regarding your error: it occurs because 'id' is not in your columns, is spelt differently, or has surrounding whitespace. To check this, look at the output of print(df.columns.tolist()), which lists the column names and shows any leading or trailing whitespace.
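If whitespace does turn out to be the cause, one possible fix is to strip the labels before dropping. A minimal sketch, assuming a made-up comma-separated sample whose header cell is " id " with padding on both sides:

```python
import io
import pandas as pd

# Hypothetical sample: the first header cell is " id ", not "id".
raw = " id ,text\n361.273,text1\n374.350,text2\n"
df = pd.read_csv(io.StringIO(raw))

# The list form shows the padding that a plain print(df) hides.
before = df.columns.tolist()
print(before)

# Remove leading/trailing whitespace from every column label.
df.columns = df.columns.str.strip()

cleaned = df.drop('id', axis=1)
print(cleaned.columns.tolist())
```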
The best way to delete a column in pandas is to use drop:
df = df.drop('column_name', axis=1)
where 1 is the axis number (0 for rows and 1 for columns).
To delete the column without having to reassign df, you can do:
df.drop('column_name', axis=1, inplace=True)
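To make the difference between the two forms concrete, here is a small sketch on a made-up two-column frame. It also uses the columns= keyword, which is not mentioned above; as far as I know it is accepted by pandas 0.21 and later as an alternative to axis=1.

```python
import pandas as pd

# Hypothetical frame, made up for illustration.
df = pd.DataFrame({'id': [361.273, 374.350], 'text': ['text1', 'text2']})

# Returns a new frame; df itself is untouched at this point.
without_id = df.drop(columns=['id'])
untouched = df.columns.tolist()

# inplace=True modifies df itself and returns None.
result = df.drop('id', axis=1, inplace=True)

print(without_id.columns.tolist(), untouched, df.columns.tolist(), result)
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace df with None, so use one form or the other.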
Finally, to drop by column number instead of by column label, try this.
To delete, e.g. the 1st, 2nd and 4th columns:
df.drop(df.columns[[0, 1, 3]], axis=1) # df.columns is zero-based pd.Index
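A short sketch of the positional form on a made-up four-column frame (the column names a to d are hypothetical), showing which labels the positions resolve to:

```python
import pandas as pd

# Hypothetical frame with four columns, made up for illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8]})

# Positions 0, 1 and 3 map to the labels 'a', 'b' and 'd'.
to_drop = df.columns[[0, 1, 3]]
by_position = df.drop(to_drop, axis=1)

# drop() returns a new frame; the original still has all four columns.
print(to_drop.tolist())
print(by_position.columns.tolist())
print(df.columns.tolist())
```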
Exceptions:
If a wrong column number or label is requested, an exception will be raised.
To check the number of columns, use df.shape[1] or len(df.columns.values), and to check the column labels, use df.columns.values.
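If you would rather avoid the exception than catch it, two options are to filter the labels first or to pass errors='ignore' to drop. This is only a sketch on a made-up frame; the label 'missing_column' is hypothetical and deliberately absent.

```python
import pandas as pd

# Hypothetical frame, made up for illustration.
df = pd.DataFrame({'id': [361.273, 374.350], 'text': ['text1', 'text2']})

wanted = ['id', 'missing_column']  # 'missing_column' does not exist in df

# Option 1: keep only the labels that are really present.
present = [name for name in wanted if name in df.columns]
safe = df.drop(present, axis=1)

# Option 2: let drop() skip unknown labels instead of raising.
lenient = df.drop(wanted, axis=1, errors='ignore')

print(safe.columns.tolist(), lenient.columns.tolist())
```

Note that errors='ignore' also hides genuine typos, so the explicit filter may be the safer choice when a missing column would indicate a real problem.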
This answer was based on @LondonRob’s answer and is left here to help future visitors of this page.