How to sum values for the same key
Question:
I have a file
gu|8
gt|5
gr|5
gp|1
uk|2
gr|20
gp|98
uk|1
me|2
support|6
And I want to have one number per TLD like:
gr|25
gp|99
uk|3
me|2
support|6
gu|8
gt|5
and here is my code:
f = open(file,'r')
d={}
for line in f:
    line = line.strip('\n')
    TLD,count = line.split('|')
    d[TLD] = d.get(TLD)+count
print d
But I get this error:
d[TLD] = d.get(TLD)+count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
Can anybody help?
Answers:
Taking a look at the full traceback:
Traceback (most recent call last):
  File "mee.py", line 6, in <module>
    d[TLD] = d.get(TLD) + count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
The error is telling us that we tried to add something of type NoneType to something of type str, which isn't allowed in Python.
There's only one object of type NoneType, which, unsurprisingly, is None, so we know that we tried to add a string to None.
The two things we tried to add together in that line were d.get(TLD) and count. Looking at the documentation for dict.get(), we see that what it does is:
"Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError."
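A quick check makes that behaviour concrete. The sample dictionary below is made up for illustration, and the snippet uses Python 3's print() function rather than the Python 2 print statement used elsewhere in this thread:

```python
d = {'gr': 5}

present = d.get('gr')        # key exists: its value is returned
missing = d.get('uk')        # key missing, no default given: None is returned
defaulted = d.get('uk', 0)   # key missing, default supplied: 0 is returned

print(present, missing, defaulted)  # 5 None 0
```

Note that get() only reads; it never inserts the missing key, so d is still {'gr': 5} afterwards.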
Since we didn't supply a default, d.get(TLD) returned None when it didn't find TLD in the dictionary, and we got the error attempting to add count to it. So, let's supply a default of 0 and see what happens:
f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + count
print d
$ python mee.py
Traceback (most recent call last):
  File "mee.py", line 6, in <module>
    d[TLD] = d.get(TLD, 0) + count
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Well, we’ve still got an error, but now the problem is that we’re trying to add a string to an integer, which is also not allowed, because it would be ambiguous.
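To see why Python calls this ambiguous: with 0 + '5' it would have to guess between arithmetic and string concatenation, so it refuses and makes you choose. A small Python 3 sketch (the values are illustrative):

```python
message = ''
# Python will not guess whether 0 + '5' means arithmetic or concatenation.
try:
    0 + '5'
except TypeError as exc:
    message = str(exc)
    print(message)  # unsupported operand type(s) for +: 'int' and 'str'

# Each interpretation has to be requested explicitly:
as_number = 0 + int('5')   # arithmetic: convert the string to an int first
as_text = str(0) + '5'     # concatenation: convert the int to a string first
print(as_number, as_text)  # 5 05
```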
That's happening because line.split('|') returns a list of strings, so we need to explicitly convert count to an integer:
f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + int(count)
print d
… and now it works:
$ python mee.py
{'me': 2, 'gu': 8, 'gt': 5, 'gr': 25, 'gp': 99, 'support': 6, 'uk': 3}
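The loop above is Python 2 (print d is a statement). A Python 3 sketch of the same dict.get() approach follows. It assumes the sample data from the question and reads it through io.StringIO so that it runs without a file; with a real file you would write "with open('data') as f:" instead. Skipping blank lines is an addition of this sketch, not part of the original code.

```python
import io

# Stand-in for open('data'): the sample input from the question.
f = io.StringIO("gu|8\ngt|5\ngr|5\ngp|1\nuk|2\ngr|20\ngp|98\nuk|1\nme|2\nsupport|6\n")

d = {}
for line in f:
    line = line.strip('\n')
    if not line:
        continue                          # tolerate blank lines (sketch addition)
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + int(count)   # start missing keys at 0, add as int

print(d)
```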
Turning that dictionary back into the file output you want is a separate issue (and not attempted by your code), so I’ll leave you to work on that.
To answer the title of your question, “how to sum values for the same key”: the standard library has a class for exactly this, collections.Counter, which is a perfect match for you:
import collections

d = collections.Counter()
with open(file) as f:
    for line in f:
        tld, cnt = line.strip().split('|')
        d[tld] += int(cnt)
Then, to write the totals back (note that opening with 'w' overwrites the original file):
with open(file, 'w') as f:
    for tld, cnt in sorted(d.items()):
        print >> f, "%s|%d" % (tld, cnt)
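For readers on Python 3, here is a sketch of the same Counter approach run end to end on the sample data from the question. io.StringIO objects stand in for the real input and output files (an assumption made only so the example is self-contained), and print(..., file=target) replaces the Python 2 print >> f form:

```python
import collections
import io

# Stand-ins for the real input and output files (assumption: every
# non-blank line has the TLD|count format shown in the question).
source = io.StringIO("gu|8\ngt|5\ngr|5\ngp|1\nuk|2\ngr|20\ngp|98\nuk|1\nme|2\nsupport|6\n")
target = io.StringIO()

totals = collections.Counter()
for line in source:
    line = line.strip()
    if not line:
        continue                        # skip blank lines
    tld, cnt = line.split('|')
    totals[tld] += int(cnt)             # a Counter treats missing keys as 0

# Write one "TLD|total" line per key, sorted alphabetically by TLD.
for tld, cnt in sorted(totals.items()):
    print("%s|%d" % (tld, cnt), file=target)

output = target.getvalue()
print(output, end='')
```

Because the keys are sorted, the output order (gp, gr, gt, gu, me, support, uk) differs from the order in the question; drop sorted() if order doesn't matter to you.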