Write a program to read mbox-short.txt and figure out who has sent the greatest number of mail messages

Question:

File: http://www.py4e.com/code3/mbox-short.txt
I tried to filter the email addresses from this file, but I could not work out which address sent the greatest number of mail messages.

name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
handle = open(name)

di = dict ()
largest = -1
wds = None

for line in handle:
    line = line.rstrip()
    wds = line.split()

    # Guardian
    if len(wds) < 1:
        continue
    # wds[0] != 'From' could blow up
    if wds[0] != 'From':
        continue
    print(wds[1])

    for w in wds:
        di[w] = di.get(w,0) + 1
        #print(w,di[w])
        for k,v in di.items():
            #print(k,v) # print key & value (items/both)
            if v > largest:
                largest = v
                theword = w # capture/remember the word or key that was largest
            print('Largest:',largest,'The largest word:', theword)

I have:
Created a dictionary to store the keys and values from this file.
Used the dictionary's get method to start a new count when an email address is new and to increase the count by one when the address has been seen before.
Filtered all the email addresses from this file.
Tried to write a maximum loop to find the largest count of mail messages, but failed.

I am expecting to get this result: [email protected] 5

Asked By: Wynnie

Answers:

You can use regular expressions with Python’s built-in re module for this task. The following code matches every string of the form [letters (upper or lower case) or dots]@[letters (upper or lower case) or dots] and returns the matches in a list.

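import re
from collections import Counter

with open("c:/trash/mbox-short.txt") as f:
    txt_file = f.read()

# Raw string so the regex engine sees the pattern unchanged
mail_regex = r"[a-zA-Z.]+@[a-zA-Z.]+"
mails = re.findall(mail_regex, txt_file)

# most_common(1) returns a one-item list: [(address, count)]
print(Counter(mails).most_common(1)[0])

Calling max(mails) on its own would return the alphabetically last string, not the most frequent one, so the matches have to be counted first; collections.Counter does that, and most_common(1) returns the most frequent element with its count. This is a very primitive way of matching mail addresses: it ignores digits and hyphens, and it matches addresses on every line of the file, not only on the lines that start with 'From ', so the count can differ from the number of messages a person sent.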
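
If only sender lines should count, the regex idea can be anchored to lines that begin with 'From '. The following is a minimal sketch under stated assumptions: the sample text and its addresses are made up for illustration, and most_frequent_sender is a hypothetical helper name, not part of the answer above.

```python
import re
from collections import Counter

# Capture the address on lines of the form "From someone@example.com ...".
# re.MULTILINE makes ^ match at the start of every line, so header lines
# such as "From: ..." or "To: ..." are skipped.
FROM_LINE = re.compile(r"^From (\S+@\S+)", re.MULTILINE)

def most_frequent_sender(text):
    """Return (address, count) for the most frequent 'From ' sender, or None."""
    counts = Counter(FROM_LINE.findall(text))
    return counts.most_common(1)[0] if counts else None

# Hypothetical sample data, not taken from mbox-short.txt
sample = (
    "From alice@example.com Sat Jan  5 09:14:16 2008\n"
    "From: alice@example.com\n"
    "From bob@example.org Fri Jan  4 18:10:48 2008\n"
    "To: list@example.net\n"
    "From alice@example.com Fri Jan  4 16:10:39 2008\n"
)

print(most_frequent_sender(sample))
```

With the real file, the text read by f.read() could be passed in place of sample.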

Answered By: mafe

Just open the file and read it line by line, looking for any line that starts with 'From '. The email address is the second whitespace-delimited token in the line. Construct a dictionary as you find email addresses, where the key is the email address and the value is the number of occurrences.

count = dict()

with open('mbox-short.txt') as data:
    for line in data:
        try:
            a, b, *_ = line.split()
            if a == 'From':
                count[b] = count.get(b, 0) + 1
        except ValueError:
            pass

print(*max(count.items(), key=lambda x: x[1]))

Output:

[email protected] 5
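
The same counting could also be sketched with collections.Counter from the standard library. This is an alternative formulation rather than the code shown above; count_senders is a hypothetical helper name, and the sample lines are invented for illustration.

```python
from collections import Counter

def count_senders(lines):
    """Count addresses on lines whose first token is exactly 'From'."""
    counts = Counter()
    for line in lines:
        words = line.split()
        # Need at least two tokens: the 'From' marker and the address
        if len(words) >= 2 and words[0] == "From":
            counts[words[1]] += 1
    return counts

# Hypothetical sample lines, not taken from mbox-short.txt
sample_lines = [
    "From alice@example.com Sat Jan  5 09:14:16 2008",
    "From: alice@example.com",
    "",
    "From bob@example.org Fri Jan  4 18:10:48 2008",
    "From alice@example.com Fri Jan  4 16:10:39 2008",
]

counts = count_senders(sample_lines)
address, n = counts.most_common(1)[0]
print(address, n)
```

Because count_senders accepts any iterable of lines, an open file object such as the one from open('mbox-short.txt') could be passed to it directly.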
Answered By: DarkKnight