Sum a CSV column in Python
Question:
I’m trying to sum a column in a CSV file. The file looks like:
Date Value
2012-11-20 12
2012-11-21 10
2012-11-22 3
This can be in the range of hundreds of rows. I need to get the total of Value (in this case it would be 25) printed to a terminal. I have some code so far, but it produces a much smaller figure than it should. While troubleshooting, I printed the intermediate values and realized that instead of summing 12 + 10 + 3, it breaks each number into its digits and sums 1 + 2 + 1 + 0 + 3, which obviously gives a much smaller total. Here’s my code; any recommendation would be great!
with open("file.csv") as fin:
    headerline = fin.next()
    total = 0
    for row in csv.reader(fin):
        print col # for troubleshooting
        for col in row[1]:
            total += int(col)
    print total
Answers:
The csv module loops over your rows one by one; there is no need to then loop over the column. Just sum int(row[1]):
import csv

with open("file.csv") as fin:
    headerline = next(fin)  # skip the header row
    total = 0
    for row in csv.reader(fin):
        total += int(row[1])  # convert the whole field, not each character
    print(total)
You can use a shortcut with a generator expression and the sum() built-in function:
with open("file.csv") as fin:
    next(fin)  # skip the header row
    total = sum(int(r[1]) for r in csv.reader(fin))
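To check either version without touching a real file, here is a minimal sketch that feeds the sample data from the question through csv.reader via io.StringIO. It assumes the file is comma-delimited (the question shows the columns separated by whitespace, which may just be how the table was rendered); the names SAMPLE and sum_second_column are made up for this illustration.

```python
import csv
import io

# Sample data from the question, assumed here to be comma-delimited.
SAMPLE = "Date,Value\n2012-11-20,12\n2012-11-21,10\n2012-11-22,3\n"


def sum_second_column(fileobj):
    """Skip the header line, then add up the second field of every row."""
    next(fileobj)  # discard the header row
    return sum(int(row[1]) for row in csv.reader(fileobj))


total = sum_second_column(io.StringIO(SAMPLE))
print(total)  # 25
```

If the real file turns out to use another separator, csv.reader accepts a delimiter argument, for example csv.reader(fileobj, delimiter="\t").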
Note that in Python, strings are sequences too, so when you do for col in row[1]: you are looping over the individual characters of row[1]; for your first row that’d be 1 and 2:
>>> for c in '123':
...     print(repr(c))
...
'1'
'2'
'3'
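The same effect accounts for the smaller total in the question. A quick sketch, using the three Value strings from the sample file (the variable names are only for illustration):

```python
values = ["12", "10", "3"]  # what row[1] holds for each data row

# Looping over each string visits its characters, so the digits get summed.
digit_total = sum(int(ch) for value in values for ch in value)

# Converting the whole string at once sums the actual numbers.
number_total = sum(int(value) for value in values)

print(digit_total)   # 7
print(number_total)  # 25
```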
You can use pandas instead:
import pandas as pd

df2 = pd.read_csv('file.csv')
print(df2['Value'].sum())
import csv

csv_file = 'file.csv'
with open(csv_file) as f:
    # DictReader reads the header row itself and keys each row by column name
    total = sum(int(r['Value']) for r in csv.DictReader(f))
print(total)
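As a rough sketch of the same DictReader idea wrapped in a reusable function: sum_column is a hypothetical helper name, and the in-memory sample assumes a comma-delimited file with the Date and Value headers from the question.

```python
import csv
import io


def sum_column(fileobj, column):
    """Sum one named column; DictReader consumes the header row itself."""
    return sum(int(row[column]) for row in csv.DictReader(fileobj))


sample = io.StringIO("Date,Value\n2012-11-20,12\n2012-11-21,10\n2012-11-22,3\n")
print(sum_column(sample, "Value"))  # 25
```

Because the column is looked up by name, there is no next() call to skip the header and no positional index to keep in sync if columns are reordered.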