How to sum values for the same key

Question:

I have a file:

gu|8
gt|5
gr|5
gp|1
uk|2
gr|20
gp|98
uk|1
me|2
support|6

And I want to have one number per TLD like:

 gr|25
 gp|99
 uk|3
 me|2
 support|6
 gu|8
 gt|5

and here is my code:

f = open(file,'r')
d={}
for line in f:
    line = line.strip('\n')
    TLD,count = line.split('|')
    d[TLD] = d.get(TLD)+count

print d

But I get this error:

    d[TLD] = d.get(TLD)+count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Can anybody help?

Asked By: UserYmY

||

Answers:

Taking a look at the full traceback:

Traceback (most recent call last):
  File "mee.py", line 6, in <module>
    d[TLD] = d.get(TLD) + count
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

The error is telling us that we tried to add something of type NoneType to something of type str, which isn’t allowed in Python.

There’s only one object of type NoneType, which, unsurprisingly, is None – so we know that we tried to add a string to None.

The two things we tried to add together in that line were d.get(TLD) and count, and looking at the documentation for dict.get(), we see that what it does is

Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
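To make that concrete, here is a minimal sketch of the three cases, using a made-up one-entry dictionary (the names found, missing and fallback are only for illustration):

```python
# Hypothetical dictionary, used only to show how dict.get() behaves.
d = {'gr': 5}

found = d.get('gr')        # key present: returns the stored value
missing = d.get('uk')      # key absent, no default given: returns None
fallback = d.get('uk', 0)  # key absent, default given: returns the default
```

Note that d.get() never adds the key to the dictionary; it only reports what is (or isn't) there.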

Since we didn’t supply a default, d.get(TLD) returned None when it didn’t find TLD in the dictionary, and we got the error attempting to add count to it. So, let’s supply a default of 0 and see what happens:

f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + count

print d

$ python mee.py
Traceback (most recent call last):
  File "mee.py", line 6, in <module>
    d[TLD] = d.get(TLD, 0) + count
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Well, we’ve still got an error, but now the problem is that we’re trying to add a string to an integer, which is also not allowed, because it would be ambiguous.

That’s happening because line.split('|') returns a list of strings – so we need to explicitly convert count to an integer:

f = open('data','r')
d={}
for line in f:
    line = line.strip('\n')
    TLD, count = line.split('|')
    d[TLD] = d.get(TLD, 0) + int(count)

print d

… and now it works:

$ python mee.py 
{'me': 2, 'gu': 8, 'gt': 5, 'gr': 25, 'gp': 99, 'support': 6, 'uk': 3}
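The script above is Python 2 (print d is a statement there). For Python 3, the same approach could be sketched roughly as follows. The function name sum_by_key and the in-memory sample list are my own additions for illustration; in a real run you would pass an open file object instead of the list.

```python
def sum_by_key(lines):
    """Sum the integer after '|' for each key before it."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than fail on split
        tld, count = line.split('|')
        # Default of 0 for unseen keys; int() because split returns strings.
        totals[tld] = totals.get(tld, 0) + int(count)
    return totals


# Sample lines copied from the question; any iterable of lines works here.
sample = ["gu|8", "gt|5", "gr|5", "gp|1", "uk|2",
          "gr|20", "gp|98", "uk|1", "me|2", "support|6"]
totals = sum_by_key(sample)
print(totals)
```

With a file, the call would be something like sum_by_key(f) inside a with open(...) as f: block, which also takes care of closing the file.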

Turning that dictionary back into the file output you want is a separate issue (and not attempted by your code), so I’ll leave you to work on that.

Answered By: Zero Piraeus

To answer the title of your question, “how to sum values for the same key”, the standard library provides collections.Counter (a dict subclass, not a builtin), which fits this task well:

import collections
d = collections.Counter()
with open(file) as f:
    for line in f:
        # each line looks like "gr|5"; missing keys start at 0 in a Counter
        tld, cnt = line.strip().split('|')
        d[tld] += int(cnt)

Then, to write the totals back in the same format:

with open(file, 'w') as f:
    for tld, cnt in sorted(d.items()):
        print >> f, "%s|%d" % (tld, cnt)
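
The print >> f form is Python 2 only. As a rough Python 3 sketch of the same Counter idea, assuming a helper I am calling write_totals (not part of any library) and using io.StringIO to stand in for an open file:

```python
import collections
import io


def write_totals(totals, out):
    """Write one 'key|total' line per key, sorted by key."""
    for tld, cnt in sorted(totals.items()):
        out.write("%s|%d\n" % (tld, cnt))


counts = collections.Counter()
# A few sample lines from the question; normally these come from the file.
for line in ["gr|5", "uk|2", "gr|20", "uk|1"]:
    tld, cnt = line.strip().split('|')
    counts[tld] += int(cnt)  # unseen keys start at 0 in a Counter

buf = io.StringIO()  # stands in for a file opened with mode 'w'
write_totals(counts, buf)
result = buf.getvalue()
```

To write to disk instead, pass a file opened for writing, for example with open(path, 'w') as f: write_totals(counts, f), where path is whatever output file you choose.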