How to drop a specific column of csv file while reading it using pandas?

Question:

I need to drop the column labeled name while loading a CSV with pandas. I am reading the CSV as follows and would like to add a parameter to the call to do this. Thanks.

pd.read_csv("sample.csv")

I know how to do this after reading the CSV:

df.drop('name', axis=1)
Asked By: Anon George

||

Answers:

If you know the column names in advance, you can do this by setting the usecols parameter.

When you know which columns to use

Suppose you have a CSV file with columns ['id','name','last_name'] and you want just ['name','last_name']. You can do it as below:

import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])

When you want the first N columns

If you don’t know the column names but want the first N columns of the file, you can select them by position:

import pandas as pd
# n is the number of leading columns to keep
df = pd.read_csv("sample.csv", usecols=list(range(n)))

Edit

When you know name of the column to be dropped

# Read column names from file
cols = list(pd.read_csv("sample_data.csv", nrows =1))
print(cols)

# Use a list comprehension to leave the unwanted column out of usecols
df= pd.read_csv("sample_data.csv", usecols =[i for i in cols if i != 'name'])
Answered By: Sociopath

Get the column headers from your CSV using pd.read_csv with nrows=1, then do a subsequent read with usecols to pull everything but the column(s) you want to omit.

headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name'])

Alternatively, you can do the same thing (read only the headers) very efficiently using the csv module:

import csv

with open("sample.csv", 'r') as f:
    header = next(csv.reader(f))
    # For python 2, use
    # header = csv.reader(f).next()

df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))
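
The set difference above does not preserve the header order, but pandas ignores the element order of usecols, so the resulting columns still follow the file's order. A minimal self-contained sketch of this, assuming a small in-memory CSV (the data and the io.StringIO stand-in for sample.csv are illustrative, not from the question):

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for sample.csv
csv_text = "id,name,last_name\n1,Ann,Lee\n2,Bob,Ray\n"

# Read only the header row with the csv module
header = next(csv.reader(io.StringIO(csv_text)))

# The set difference may reorder the labels; usecols ignores element order
keep = list(set(header) - {"name"})
df = pd.read_csv(io.StringIO(csv_text), usecols=keep)

# The remaining columns come back in file order
print(list(df.columns))
```
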
Answered By: cs95

Using df = df.drop(['ID','prediction'], axis=1) did the job for me. I dropped the ‘ID’ and ‘prediction’ columns. Make sure you put them in square brackets, like ['column1','column2'].
There is no need for more complicated solutions.

Answered By: Ege

Columns can be dropped in the same statement that reads the file. Note that this still parses every column and removes the unwanted ones afterwards.

columns_to_be_removed = ['a', 'b']

data = pd.read_csv(sourceFileName).drop(columns_to_be_removed, axis = 'columns')
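
As a rough comparison, the sketch below shows that reading and then dropping yields the same DataFrame as excluding the columns through usecols; the difference is only in when the columns are discarded. The in-memory CSV and its values are assumptions made up for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV; the column names mirror the snippet above
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
columns_to_be_removed = ["a", "b"]

# Chained form: every column is parsed first, then the unwanted ones are dropped
dropped = pd.read_csv(io.StringIO(csv_text)).drop(columns=columns_to_be_removed)

# usecols form: the unwanted columns are never loaded into the DataFrame
skipped = pd.read_csv(
    io.StringIO(csv_text),
    usecols=lambda col: col not in columns_to_be_removed,
)

# Both approaches should produce identical frames
print(dropped.equals(skipped))
```

For large files the usecols form is likely to use less memory, since the dropped columns are not materialized.
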
Answered By: Arcane

The only parameter to read_csv() that you can use to select which columns are read is usecols. According to the documentation, usecols accepts a list-like or a callable. Because you only know the columns you want to drop, you can’t use a list of the columns you want to keep. So use a callable:

pd.read_csv("sample.csv", 
            usecols=lambda x: x != 'name'
            )

And you could of course say x not in ['unwanted', 'column', 'names'] if you had a list of column names you didn’t want to use.
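
A minimal sketch of that variant, assuming an in-memory CSV in place of sample.csv (the column names and data here are made up for illustration):

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for sample.csv
csv_text = "id,name,last_name,age\n1,Ann,Lee,30\n2,Bob,Ray,41\n"

# Illustrative labels to leave out while reading
unwanted = {"name", "age"}

# The callable is evaluated for each column label; columns returning True are kept
df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col not in unwanted)

print(list(df.columns))
```

A set is used for the unwanted names only because membership tests on it are cheap; a list works the same way here.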

Answered By: jalopezp

This answer with two lines of code will really help you. You can even dynamically remove column names while creating CSV.

https://stackoverflow.com/a/71440977/12819393

Answered By: Ganesh Ghuge