How to read a CSV without the first column
Question:
I am trying to read a simple CSV file like below, and put its contents in a 2D array:
"","x","y","sim1","sim2","sim3","sim4","sim5","sim6","sim7","sim8","sim9","sim10","sim11","sim12"
"1",181180,333740,5.56588745117188,6.29487752914429,7.4835410118103,5.75873327255249,6.62183284759521,5.81478500366211,4.85671949386597,5.90418815612793,6.32611751556396,6.99649047851562,6.52076387405396,5.68944215774536
"2",181140,333700,6.36264753341675,6.5217604637146,6.16843748092651,5.55328798294067,7.00429201126099,6.43625402450562,6.17744159698486,6.72836923599243,6.38574266433716,6.81451606750488,6.68060827255249,6.14339065551758
"3",181180,333700,6.16541910171509,6.44704437255859,7.51744651794434,5.46270132064819,6.8890323638916,6.46842670440674,6.07698059082031,6.2140531539917,6.43774271011353,6.21923875808716,6.43355655670166,5.90692138671875
To do this, I use this:
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1)
But I always got this message:
"ValueError: could not convert string to float: "1"
I thought the problem was with the first column of each row. So, I tried to read it without the first column, but I couldn’t find out how.
So, how could I ignore the first column? Is there a way to read this file with the first column?
Answers:
You can specify a converter for any column.
converters = {0: lambda s: float(s.strip('"'))}
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, converters=converters)
Or, you can specify which columns to use, something like:
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1,15))
http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
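To make the two options above concrete, here is a minimal sketch that runs both on an in-memory, shortened copy of the sample file. The `SAMPLE` string and the `unquote` helper are my own names for illustration, not part of the original answer. The helper also decodes bytes, on the assumption that older NumPy versions pass bytes to converters while newer ones pass str.

```python
import io
import numpy as np

# Shortened copy of the sample file from the question, kept in memory for the demo.
SAMPLE = (
    '"","x","y","sim1"\n'
    '"1",181180,333740,5.5\n'
    '"2",181140,333700,6.25\n'
)

def unquote(field):
    """Illustrative converter: strip the double quotes, then convert to float."""
    # Older NumPy versions hand bytes to converters, newer ones hand str.
    if isinstance(field, bytes):
        field = field.decode()
    return float(field.strip('"'))

# Option 1: keep the index column, cleaning it with a converter.
with_index = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                        converters={0: unquote})

# Option 2: skip the index column entirely by listing the columns to read.
without_index = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", skiprows=1,
                           usecols=range(1, 4))
```

With the real file you would pass the path instead of the `io.StringIO` object and use `range(1, 15)` for its 15 columns.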
One way you can skip the first column, without knowing the number of columns, is to read the number of columns from the csv manually. It’s easy enough, although you may need to tweak this on occasion to account for formatting inconsistencies*.
with open("Data/sim.csv") as f:
    ncols = len(f.readline().split(','))
# Column indices run from 0 to ncols-1, so range(1, ncols) skips only the first column.
data = np.loadtxt("Data/sim.csv", delimiter=',', skiprows=1, usecols=range(1, ncols))
*If there are blank lines at the top, you’ll need to skip them. If there may be commas in the field headers, you should count columns using the first data line instead. So, if you have specific problems, I can add some details to make the code more robust.
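As a rough sketch of what a more robust version might look like, the function below drops blank lines and counts the columns with `csv.reader`, which respects quoting, so a comma inside a quoted header is not counted as a separator. The name `load_without_first_column` is hypothetical, and the sketch assumes the file is small enough to read into memory at once.

```python
import csv
import io
import numpy as np

def load_without_first_column(open_file):
    """Hypothetical helper: load every column except the first from an open CSV file."""
    # Drop blank lines so the header really is the first remaining line.
    lines = [line for line in open_file.read().splitlines() if line.strip()]
    # csv.reader honours quotes, so '"x, east"' counts as a single header field.
    ncols = len(next(csv.reader(lines[:1])))
    # np.loadtxt accepts a list of strings; ndmin=2 keeps a 2D result
    # even when there is only one data row.
    return np.loadtxt(lines[1:], delimiter=",", usecols=range(1, ncols), ndmin=2)

# Demo input with a leading blank line and a comma inside a quoted header.
demo = io.StringIO('\n"","x, east","y"\n"1",1.5,2.5\n"2",3.5,4.5\n')
data = load_without_first_column(demo)
```

For a file on disk you would call it as `load_without_first_column(open("Data/sim.csv"))`, ideally inside a `with` block.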
Try reading the CSV file using the csv library:
import csv
def someFunc(fname):
    out = []
    with open(fname) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            # drop the first (index) column and convert the remaining fields to float
            out.append([float(value) for value in row[1:]])
    return out
out will hold the 2D array as a list of lists; pass it to np.array if you need a NumPy array.
You could use pandas and read it as a DataFrame object. If you know the column that you do not want, just add a .drop to the loading line:
a = pandas.read_csv("Data/sim.csv",sep=",")
a = a.drop(a.columns[0], axis=1)
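The first row will be read as the header by default. If you would rather discard it, pass skiprows=1 together with header=None to read_csv.
Pandas DataFrames are built on top of NumPy arrays, so converting columns or a whole frame to a NumPy array is pretty straightforward (for example with .to_numpy() or .values).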
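Here is a small sketch of the whole round trip: read, drop the first column, convert to a NumPy array. It uses an in-memory, shortened copy of the sample file (the `SAMPLE` string is my own, for illustration) and assumes a pandas version that provides `DataFrame.to_numpy`.

```python
import io
import pandas as pd

# Shortened copy of the sample file from the question, kept in memory for the demo.
SAMPLE = (
    '"","x","y","sim1"\n'
    '"1",181180,333740,5.5\n'
    '"2",181140,333700,6.25\n'
)

frame = pd.read_csv(io.StringIO(SAMPLE))
# Drop the first column by position, whatever name pandas gave it.
frame = frame.drop(frame.columns[0], axis=1)
# Convert the remaining columns to a 2D float array.
array = frame.to_numpy(dtype=float)
```

Dropping by position rather than by name avoids relying on the label pandas assigns to the empty header field.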
jmilloy and Deninhos’s answers are both good. If OP specifically wants to read into a NumPy array (as opposed to a pandas DataFrame), another simple alternative is to delete the index column after reading it in. This works when you know the index column is always the first, even if the number of features (columns) varies. Note that np.loadtxt still raises the ValueError on the quoted index values, so this version reads with np.genfromtxt, which turns them into nan before the column is deleted.
data = np.genfromtxt("Data/sim.csv", delimiter=',', skip_header=1)
data = np.delete(data, 0, axis=1)
import csv
data = []
with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    next(csvreader)  # skip the header row
    for row in csvreader:
        data.append(float(row[1]))  # keeps only the second column ("x")
import pandas
pandas_data = pandas.read_csv('filename', sep=",", header=0,index_col=0)
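For what it's worth, here is a quick sketch of what `index_col=0` does, run on an in-memory, shortened copy of the sample file (the `SAMPLE` string is mine, for illustration only). The first column becomes the row index instead of a data column, so it no longer appears when the frame is converted to an array.

```python
import io
import pandas as pd

# Shortened copy of the sample file from the question, kept in memory for the demo.
SAMPLE = (
    '"","x","y","sim1"\n'
    '"1",181180,333740,5.5\n'
    '"2",181140,333700,6.25\n'
)

# header=0 uses the first line as column names; index_col=0 turns the
# first column into the row index rather than a data column.
frame = pd.read_csv(io.StringIO(SAMPLE), sep=",", header=0, index_col=0)
values = frame.to_numpy()
```

Unlike the drop approach, the row labels are still available through `frame.index` if you need them later.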
This worked for me
import pandas
data = pandas.read_csv("Data/sim.csv",sep=",").iloc[:,1:]
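Read the CSV file like this (with pandas imported as pd), reading only the header first to count the columns:
ncols = len(pd.read_csv('file.csv', nrows=0).columns)
df = pd.read_csv('file.csv', usecols=range(1, ncols))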