Pandas read in table without headers
Question:
Using pandas, how do I read in only a subset of the columns (say 4th and 7th columns) of a .csv file with no headers? I cannot seem to be able to do so using usecols
.
Answers:
Make sure you specify pass header=None
and add usecols=[3,6]
for the 4th and 7th columns.
In order to read a csv in that doesn’t have a header and for only certain columns you need to pass params header=None
and usecols=[3,6]
for the 4th and 7th columns:
df = pd.read_csv(file_path, header=None, usecols=[3,6])
See the docs
Previous answers were good and correct, but in my opinion, an extra names
parameter will make it perfect, and it should be the recommended way, especially when the csv has no headers
.
Solution
Use usecols
and names
parameters
df = pd.read_csv(file_path, usecols=[3,6], names=['colA', 'colB'])
Additional reading
or use header=None
to explicitly tells people that the csv
has no headers (anyway both lines are identical)
df = pd.read_csv(file_path, usecols=[3,6], names=['colA', 'colB'], header=None)
So that you can retrieve your data by
# with `names` parameter
df['colA']
df['colB']
instead of
# without `names` parameter
df[0]
df[1]
Explain
Based on read_csv, when names
are passed explicitly, then header
will be behaving like None
instead of 0
, so one can skip header=None
when names
exist.
As per documentation https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html :
headerint, list of int, default ‘infer’
Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
namesarray-like, optional
List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
columts = ['Day', 'PLMN', 'RNCname']
tempo = pd.read_csv("info.csv", sep=';', header=0, names=columts, index_col=False)
You can also call read_table()
with header=None
(to read the first row of the file as the first row of the data):
df = pd.read_table('test.tsv', sep=',', usecols=[3,6], header=None)
This function is more useful if the separator is t
(.tsv file etc.) because the default delimiter is t
(unlike read_csv
whose default delimiter is ,
).
Using pandas, how do I read in only a subset of the columns (say 4th and 7th columns) of a .csv file with no headers? I cannot seem to be able to do so using usecols
.
Make sure you specify pass header=None
and add usecols=[3,6]
for the 4th and 7th columns.
In order to read a csv in that doesn’t have a header and for only certain columns you need to pass params header=None
and usecols=[3,6]
for the 4th and 7th columns:
df = pd.read_csv(file_path, header=None, usecols=[3,6])
See the docs
Previous answers were good and correct, but in my opinion, an extra names
parameter will make it perfect, and it should be the recommended way, especially when the csv has no headers
.
Solution
Use usecols
and names
parameters
df = pd.read_csv(file_path, usecols=[3,6], names=['colA', 'colB'])
Additional reading
or use header=None
to explicitly tells people that the csv
has no headers (anyway both lines are identical)
df = pd.read_csv(file_path, usecols=[3,6], names=['colA', 'colB'], header=None)
So that you can retrieve your data by
# with `names` parameter
df['colA']
df['colB']
instead of
# without `names` parameter
df[0]
df[1]
Explain
Based on read_csv, when names
are passed explicitly, then header
will be behaving like None
instead of 0
, so one can skip header=None
when names
exist.
As per documentation https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html :
headerint, list of int, default ‘infer’
Row number(s) to use as the column names, and the start of the data. Default behavior is to infer the column names: if no names are passed the behavior is identical to header=0 and column names are inferred from the first line of the file, if column names are passed explicitly then the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
namesarray-like, optional
List of column names to use. If the file contains a header row, then you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.
columts = ['Day', 'PLMN', 'RNCname']
tempo = pd.read_csv("info.csv", sep=';', header=0, names=columts, index_col=False)
You can also call read_table()
with header=None
(to read the first row of the file as the first row of the data):
df = pd.read_table('test.tsv', sep=',', usecols=[3,6], header=None)
This function is more useful if the separator is t
(.tsv file etc.) because the default delimiter is t
(unlike read_csv
whose default delimiter is ,
).