Write a program to read mbox-short.txt and figure out who has sent the greatest number of mail messages
Question:
File: http://www.py4e.com/code3/mbox-short.txt
I managed to extract the email addresses from this file, but I could not find the sender with the greatest number of mail messages.
name = input("Enter file:")
if len(name) < 1:
    name = "mbox-short.txt"
handle = open(name)
di = dict()
largest = -1
wds = None
for line in handle:
    line = line.rstrip()
    wds = line.split()
    # Guardian
    if len(wds) < 1:
        continue
    # without the guardian, wds[0] != 'From' could blow up on blank lines
    if wds[0] != 'From':
        continue
    print(wds[1])
    for w in wds:
        di[w] = di.get(w, 0) + 1
        # print(w, di[w])
for k, v in di.items():
    # print(k, v)  # print key & value (items/both)
    if v > largest:
        largest = v
        theword = w  # capture/remember the word or key that was largest
print('Largest:', largest, 'The largest word:', theword)
I have:
Created a dictionary that maps each email address (key) to its count (value).
Used the dictionary's get method to start a count for a new email address and to increase the count by one for an address seen before.
Filtered all the email addresses from this file.
Tried to write a maximum loop to find the largest count and the sender with the greatest number of mail messages, but it failed.
I am expecting to get this result: [email protected] 5
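A minimal sketch of the approach described above, with the two problems in the posted code fixed. First, the inner for w in wds loop counts every word on the envelope line (including the word From itself), so only wds[1], the address, should be counted. Second, theword = w remembers a leftover word from the file loop instead of the dictionary key k. The function name top_sender and the sample lines are hypothetical, for illustration only.

```python
def top_sender(lines):
    """Return (address, count) for the address on the most 'From ' lines."""
    di = dict()
    for line in lines:
        wds = line.rstrip().split()
        # Guardian: need at least two words before looking at wds[0] and wds[1]
        if len(wds) < 2:
            continue
        # 'From:' header lines have a different first word and are skipped
        if wds[0] != 'From':
            continue
        # Count only the address, not every word on the envelope line
        di[wds[1]] = di.get(wds[1], 0) + 1

    largest = -1
    theword = None
    for k, v in di.items():
        if v > largest:
            largest = v
            theword = k  # remember the key k, not a leftover word w
    return theword, largest


# Hypothetical sample lines in mbox format (not taken from mbox-short.txt)
sample = [
    "From alice@example.com Sat Jan  5 09:14:16 2008\n",
    "From: alice@example.com\n",
    "\n",
    "From bob@example.com Fri Jan  4 18:10:48 2008\n",
    "From alice@example.com Fri Jan  4 16:10:39 2008\n",
]
print(*top_sender(sample))  # alice@example.com 2
```

To run it on the real file, pass an open file handle instead of the sample list, e.g. with open("mbox-short.txt") as handle: print(*top_sender(handle)).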
Answers:
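You can use regular expressions with Python's built-in re module for this task. The following code keeps only the lines that start with "From " (one per message), matches every substring of the form [any letter (lowercase or uppercase) or dot]@[any letter (lowercase or uppercase) or dot], and returns the matches in a list.
import re
with open("c:/trash/mbox-short.txt") as f:
    # Only the "From " envelope lines identify the sender of each message
    txt_file = "".join(line for line in f if line.startswith("From "))
# Raw string; inside [...] a dot is already literal, so no backslash is needed
mail_regex = r"[a-zA-Z.]*@[a-zA-Z.]*"
mails = re.findall(mail_regex, txt_file)
# key=mails.count ranks addresses by frequency instead of alphabetically
most_frequent = max(set(mails), key=mails.count)
print(most_frequent, mails.count(most_frequent))
Passing key=mails.count to the max function gives you the most frequent element of the list; a bare max(mails) would only return the alphabetically last address. Restricting the search to "From " lines matters, because other headers (To:, for example) also contain addresses and would otherwise be counted as well. This is a very primitive way of matching mail addresses (it ignores digits, hyphens, and other valid characters), but I think in this case it works just fine.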
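Note that a bare max() on a list of strings compares them alphabetically; it does not count occurrences. The small comparison below uses a hypothetical list of addresses (not data from mbox-short.txt) to show the difference, and also shows collections.Counter, a standard-library alternative that counts everything in a single pass.

```python
from collections import Counter

# Hypothetical list of matched addresses (not taken from mbox-short.txt)
mails = ["zoe@example.com", "amy@example.com", "amy@example.com", "bob@example.com"]

# Bare max compares strings alphabetically, so it returns the "largest" string
alphabetical = max(mails)

# key=mails.count ranks each distinct address by how often it occurs
most_frequent = max(set(mails), key=mails.count)

# Counter counts in one pass; most_common(1) returns [(address, count)]
address, count = Counter(mails).most_common(1)[0]

print(alphabetical)    # zoe@example.com
print(most_frequent)   # amy@example.com
print(address, count)  # amy@example.com 2
```

max(set(mails), key=mails.count) rescans the list once per distinct address, which is fine for a small file; Counter avoids that by counting in one pass.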
Just open the file and read it line by line, looking for any line that starts with 'From ' (with a trailing space, so that 'From:' header lines are not counted). The email address is the second whitespace-delimited token on that line. As you find email addresses, build a dictionary where the key is the email address and the value is the number of occurrences.
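count = dict()
with open('mbox-short.txt') as data:
    for line in data:
        try:
            # Lines with fewer than two tokens raise ValueError here
            a, b, *_ = line.split()
            if a == 'From':
                count[b] = count.get(b, 0) + 1
        except ValueError:
            pass
# max over (address, count) pairs, compared by count
print(*max(count.items(), key=lambda x: x[1]))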
Output:
[email protected] 5
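The try/except in the answer above relies on extended unpacking: a, b, *_ = line.split() needs at least two tokens, collects any remaining tokens into _, and raises ValueError on shorter lines such as blank lines. The hypothetical helper first_two_tokens below isolates that step so its behavior on each kind of line is visible; the sample lines are made up for illustration.

```python
def first_two_tokens(line):
    """Return the first two whitespace-delimited tokens, or None if there are fewer than two."""
    try:
        # *_ absorbs any remaining tokens; fewer than two tokens raises ValueError
        a, b, *_ = line.split()
    except ValueError:
        return None
    return a, b


print(first_two_tokens("\n"))                         # None
print(first_two_tokens("From: alice@example.com\n"))  # ('From:', 'alice@example.com')
print(first_two_tokens("From alice@example.com Sat Jan  5 09:14:16 2008\n"))  # ('From', 'alice@example.com')
```

Because 'From:' header lines unpack to a first token of 'From:' rather than 'From', the a == 'From' check skips them, so each message is counted once.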