Extract specific columns from a CSV file into lists in Python
Question:
What I’m trying to do is plot the latitude and longitude values of specific storms on a map using matplotlib, Basemap, Python, etc. My problem is that I’m trying to extract the latitude, longitude, and name of each storm, but I keep getting errors between lines 41-44, where I try to extract the columns into lists.
Here is what the file looks like:
1957,AUDREY,HU, 21.6N, 93.3W
1957,AUDREY,HU,22.0N, 93.4W
1957,AUDREY,HU,22.6N, 93.5W
1957,AUDREY,HU,23.2N, 93.6W
I want the list to look like the following:
latitude = [21.6N,22.0N,23.4N]
longitude = [93.3W, 93.5W,93.8W]
name = ["Audrey","Audrey"]
Here’s what I have so far:
data = np.loadtxt('louisianastormb.csv',dtype=np.str,delimiter=',',skiprows=1)
'''print data'''
data = np.loadtxt('louisianastormb.csv',dtype=np.str,delimiter=',',skiprows=0)
f= open('louisianastormb.csv', 'rb')
reader = csv.reader(f, delimiter=',')
header = reader.next()
zipped = zip(*reader)
latitude = zipped[3]
longitude = zipped[4]
names = zipped[1]
x, y = m(longitude, latitude)
Here’s the last error message/traceback I received:
Traceback (most recent call last):
File "/home/darealmzd/lstorms.py", line 42, in <module>
header = reader.next()
_csv.Error: new-line character seen in unquoted field – do you need to open the file in universal-newline mode?
Answers:
This looks like a problem with the line endings in your CSV file rather than with your code. If you’re going to be using all these other scientific packages anyway, you may as well use pandas for the CSV reading part, which is both more robust and more useful than the csv module alone:
import pandas
colnames = ['year', 'name', 'city', 'latitude', 'longitude']
data = pandas.read_csv('test.csv', names=colnames)
If you want your lists as in the question, you can now do:
names = data.name.tolist()
latitude = data.latitude.tolist()
longitude = data.longitude.tolist()
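The lists above still hold strings such as ' 21.6N', which a projection call like m(longitude, latitude) cannot use as numbers. Below is a minimal sketch of one way to convert them, assuming every value ends in a single hemisphere letter (N, S, E or W) and that south and west should be negative. The helper name parse_coord is hypothetical and not part of any library.

```python
def parse_coord(text):
    """Convert a string such as ' 93.3W' into a signed float (-93.3)."""
    text = text.strip()  # some fields carry a leading blank after the comma
    value, hemisphere = float(text[:-1]), text[-1].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("unexpected hemisphere suffix in %r" % text)
    # Assumption: south and west are negative, the usual map convention.
    return -value if hemisphere in ("S", "W") else value


# Two rows from the question's file, used as stand-in data.
latitude = [parse_coord(s) for s in [" 21.6N", "22.0N"]]
longitude = [parse_coord(s) for s in [" 93.3W", " 93.4W"]]
```

The resulting lists of floats can then be passed to the map object in place of the raw strings.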
A standard-lib version (no pandas)
This assumes that the first row of the CSV contains the headers.
import csv
# open the file in universal-newline mode
# (the 'U' flag was removed in Python 3.11; on Python 3 use open('test.csv', newline='') instead)
with open('test.csv', 'rU') as infile:
    # read each row as a dictionary ({header: value})
    reader = csv.DictReader(infile)
    data = {}
    for row in reader:
        for header, value in row.items():
            try:
                data[header].append(value)
            except KeyError:
                data[header] = [value]
# extract the variables you want
names = data['name']
latitude = data['latitude']
longitude = data['longitude']
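For readers on Python 3, the zip(*reader) approach from the question can be sketched as follows. This is an illustration under stated assumptions, not the asker's final code: the function name read_columns and its has_header flag are hypothetical, and an in-memory io.StringIO stands in for the real file so the example runs on its own.

```python
import csv
import io


def read_columns(fileobj, has_header=False):
    """Return the CSV columns as a list of lists of strings."""
    # skipinitialspace drops the blank that follows some commas (", 21.6N").
    reader = csv.reader(fileobj, skipinitialspace=True)
    if has_header:
        next(reader, None)  # Python 3 spelling of reader.next()
    rows = [row for row in reader if row]  # ignore blank lines
    # zip(*rows) transposes rows into columns; zip is lazy on Python 3,
    # so each column is turned into a list before it is indexed.
    return [list(column) for column in zip(*rows)]


# In-memory stand-in for the storm file, using two rows from the question.
sample = io.StringIO(
    "1957,AUDREY,HU, 21.6N, 93.3W\n"
    "1957,AUDREY,HU,22.0N, 93.4W\n"
)
columns = read_columns(sample)
names, latitude, longitude = columns[1], columns[3], columns[4]
print(names, latitude, longitude)
```

With a real file, the csv module documentation recommends opening it with newline='', for example open('louisianastormb.csv', newline=''), and has_header=True would skip a header row if the file has one.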
Another option is to read every row into a list of lists:
import csv
db = []
# the with block closes the file once all rows have been read
with open("mydata.csv", "r") as d:
    for line in csv.reader(d):
        db.append(line)
# the rest of your code, with 'db' holding the rows and columns of your CSV file as a list of lists