Parse list of strings and find max values
Question:
I’m quite new to Python and struggling to get my head round the logic in this for loop. My data has two values, a city and a temp. I would like to write a "for loop" that outputs the maximum temp for each city as follows:
PAR 31
LON 23
RIO 36
DUB 44
As it is to be used in Hadoop, I can’t use any Python libraries.
Here is my dataset:
['PAR,31',
'PAR,18',
'PAR,14',
'PAR,18',
'LON,12',
'LON,13',
'LON,9',
'LON,23',
'LON,5',
'RIO,36',
'RIO,33',
'RIO,21',
'RIO,25',
'DUB,44',
'DUB,42',
'DUB,38',
'DUB,34']
This is my code:
current_city = None
current_max = 0
for line in lines:
    (city, temp) = line.split(',')
    temp = int(temp)
    if city == current_city:
        if current_max < temp:
            current_max == temp
    current_city = city
print(current_city, current_max)
This was my output:
DUB 0
Answers:
Iterate over the list and split each entry into a city and a temperature. If the city is already in the dictionary, check whether the new temperature is higher than the stored one and, if so, replace the stored value. If the city is not in the dictionary yet, simply add it.
a = ['PAR,31',
'PAR,18',
'PAR,14',
'PAR,18',
'LON,12',
'LON,13',
'LON,9',
'LON,23',
'LON,5',
'RIO,36',
'RIO,33',
'RIO,21',
'RIO,25',
'DUB,44',
'DUB,42',
'DUB,38',
'DUB,34']
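max_temps = {}  # not named `dict`, which would shadow the built-in type
for entry in a:
    city, temp = entry.split(",")
    temp = int(temp)
    if city in max_temps:
        # known city: keep the larger of the stored and the new temperature
        if max_temps[city] < temp:
            max_temps[city] = temp
    else:
        # first time this city appears
        max_temps[city] = temp
print(max_temps)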
Output:
{'PAR': 31, 'LON': 23, 'RIO': 36, 'DUB': 44}
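The dictionary printed above holds the right values, but not in the layout the question asks for. A minimal sketch of the final printing step, assuming the dictionary has already been built as in this answer (the names `max_temps` and `lines_out` are only for illustration, and the values are copied from the output above):

```python
# Result dictionary as produced by the loop above (values copied from its output).
max_temps = {'PAR': 31, 'LON': 23, 'RIO': 36, 'DUB': 44}

# Build one "CITY TEMP" line per city, in the order the cities were inserted.
lines_out = [city + " " + str(temp) for city, temp in max_temps.items()]
print("\n".join(lines_out))
```

Dictionaries keep insertion order in Python 3.7 and later; on older versions the cities may come out in a different order.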
Build a dictionary keyed on city names. The value for each city is a list of its temperatures (kept as strings here and converted to integers when the maximum is taken).
Once the dictionary has been constructed, you can iterate over its items to determine the highest value in each list of temperatures:
data = ['PAR,31',
'PAR,18',
'PAR,14',
'PAR,18',
'LON,12',
'LON,13',
'LON,9',
'LON,23',
'LON,5',
'RIO,36',
'RIO,33',
'RIO,21',
'RIO,25',
'DUB,44',
'DUB,42',
'DUB,38',
'DUB,34']
d = {}
for e in data:
    city, temp = e.split(',')
    d.setdefault(city, []).append(temp)
for k, v in d.items():
    print(k, max(map(int, v)))
Output:
PAR 31
LON 23
RIO 36
DUB 44
Given the answers here are a bit verbose…
result = {}
for city, t in (l.split(',') for l in lines):
    t = int(t)
    result[city] = max(result.setdefault(city, t), t)

# you can print result however you like, e.g.:
for c, t in result.items():
    print(f"{c} {t}")
If you want to sacrifice a bit of readability for a ~30% performance boost, compare the values yourself instead of calling max:
result = {}
for city, t in (l.split(',') for l in lines):
    t = int(t)
    old_t = result.setdefault(city, t)
    result[city] = old_t if old_t > t else t
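None of the answers above explain why the original loop prints `DUB 0`. Three things go wrong: `current_max == temp` is a comparison, not an assignment, so `current_max` never changes; `current_max` is never reset when the city changes; and the result is printed only once, after the loop has finished. Below is a hedged sketch of the same single-pass idea with those problems fixed. It assumes the lines arrive grouped by city, as they are in the sample data (and as sorted reducer input typically is). The function name `max_per_city` and the shortened sample list are made up for this illustration.

```python
def max_per_city(lines):
    """Return (city, max_temp) pairs; assumes lines are grouped by city."""
    results = []
    current_city = None
    current_max = None
    for line in lines:
        city, temp = line.split(',')
        temp = int(temp)
        if city == current_city:
            # Same city as the previous line: keep the larger temperature.
            if temp > current_max:
                current_max = temp  # single '=' assigns, '==' only compares
        else:
            # A new city starts: emit the finished one, then reset.
            if current_city is not None:
                results.append((current_city, current_max))
            current_city = city
            current_max = temp
    # Emit the last city, since no later line triggers it.
    if current_city is not None:
        results.append((current_city, current_max))
    return results


# Shortened sample in the same format as the question's dataset.
lines = ['PAR,31', 'PAR,18', 'LON,12', 'LON,23',
         'RIO,36', 'RIO,33', 'DUB,44', 'DUB,42']
for city, temp in max_per_city(lines):
    print(city, temp)
```

Starting `current_max` from the first temperature of each city, rather than from 0, also keeps the result correct when all of a city's temperatures are negative. If the input is not grouped by city, the dictionary-based answers are the safer choice.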