Sum categories by unique values in a list in Python
Question:
I have this list:
[('2023-03-15', 'paris', 'flight', 4),
('2023-03-21', 'berlin', 'flight', 2),
('2023-03-01', 'madrid', 'drive', 10),
('2023-03-04', 'madrid', 'cycling', 3),
('2023-03-08', 'rome', 'train', 9),
('2023-03-11', 'amsterdam', 'flight', 5),
('2023-03-14', 'london', 'boat', 1)]
How can I produce a new list in the same format that combines entries with the same activity (such as "flight"), using the latest date for each activity and summing the associated integers?
Answers:
original_list = [('2023-03-15', 'paris', 'flight', 4),
                 ('2023-03-21', 'berlin', 'flight', 2),
                 ('2023-03-01', 'madrid', 'drive', 10),
                 ('2023-03-04', 'madrid', 'cycling', 3),
                 ('2023-03-08', 'rome', 'train', 9),
                 ('2023-03-11', 'amsterdam', 'flight', 5),
                 ('2023-03-14', 'london', 'boat', 1)]

# Map each activity to (latest_date, city_of_latest_date, running_total).
activity_totals = {}
for date, city, activity, num in original_list:
    if activity not in activity_totals:
        activity_totals[activity] = (date, city, num)
    else:
        latest_date, latest_city, total_num = activity_totals[activity]
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if date > latest_date:
            activity_totals[activity] = (date, city, total_num + num)
        else:
            activity_totals[activity] = (latest_date, latest_city, total_num + num)

# Build one tuple per activity, using the city stored with the latest date.
new_list = [(latest_date, city, activity, total_num)
            for activity, (latest_date, city, total_num) in activity_totals.items()]
print(new_list)
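The comparison `date > latest_date` works here because the dates are zero-padded ISO strings (YYYY-MM-DD), which sort lexicographically in chronological order. If the dates might arrive in another format, one option is to parse them before comparing. The sketch below is only an illustration: the `'%d/%m/%Y'` format and the sample values `a` and `b` are assumptions, not something taken from the question.

```python
from datetime import date, datetime

# ISO strings: lexicographic order equals chronological order.
assert '2023-03-21' > '2023-03-15'
assert date.fromisoformat('2023-03-21') > date.fromisoformat('2023-03-15')

# Assumed non-ISO format (day/month/year): string comparison is misleading.
a, b = '21/03/2023', '15/04/2023'
string_says_a_later = a > b  # compares the characters '2' and '1' first

# Parsing to real dates gives the chronological answer.
parsed_a = datetime.strptime(a, '%d/%m/%Y').date()
parsed_b = datetime.strptime(b, '%d/%m/%Y').date()
parsed_says_a_later = parsed_a > parsed_b

print(string_says_a_later, parsed_says_a_later)
```

With the data in the question, plain string comparison is sufficient, so parsing is only needed if the date format changes.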
Code:
lis = [('2023-03-15', 'paris', 'flight', 4),
       ('2023-03-21', 'berlin', 'flight', 2),
       ('2023-03-01', 'madrid', 'drive', 10),
       ('2023-03-04', 'madrid', 'cycling', 3),
       ('2023-03-08', 'rome', 'train', 9),
       ('2023-03-11', 'amsterdam', 'flight', 5),
       ('2023-03-14', 'london', 'boat', 1)]

# First pass: accumulate the total and the latest date per activity.
totals = {}
for dt, loc, activ, num in lis:
    if activ in totals:
        totals[activ]['total'] += num
        totals[activ]['latest_date'] = max(totals[activ]['latest_date'], dt)
    else:
        totals[activ] = {'total': num, 'latest_date': dt}

# Second pass: keep every row, replacing its date and count with the
# latest date and total of its activity.
res = [(totals[activ]['latest_date'], loc, activ, totals[activ]['total'])
       for dt, loc, activ, num in lis]
print(res)
Output:
[('2023-03-21', 'paris', 'flight', 11),
('2023-03-21', 'berlin', 'flight', 11),
('2023-03-01', 'madrid', 'drive', 10),
('2023-03-04', 'madrid', 'cycling', 3),
('2023-03-08', 'rome', 'train', 9),
('2023-03-21', 'amsterdam', 'flight', 11),
('2023-03-14', 'london', 'boat', 1)]
You didn’t really specify the output you were looking for, but I assumed you want to condense all the entries for a given key into the entry with the latest date, with its integer replaced by the sum of the group's integers.
import itertools

input_list = [('2023-03-15', 'paris', 'flight', 4),
              ('2023-03-21', 'berlin', 'flight', 2),
              ('2023-03-01', 'madrid', 'drive', 10),
              ('2023-03-04', 'madrid', 'cycling', 3),
              ('2023-03-08', 'rome', 'train', 9),
              ('2023-03-11', 'amsterdam', 'flight', 5),
              ('2023-03-14', 'london', 'boat', 1)]
key = 'flight'

# groupby only groups adjacent items, so sort by activity first.
sorted_list = sorted(input_list, key=lambda x: x[2])
groups = itertools.groupby(sorted_list, key=lambda x: x[2])
for k, g in groups:
    if k == key:
        group_list = list(g)
        sum_value = sum(x[3] for x in group_list)
        # Take the row with the latest date, then overwrite its count
        # with the group total.
        selected_value = list(max(group_list, key=lambda x: x[0]))
        selected_value[-1] = sum_value
        print(selected_value)
Output:
['2023-03-21', 'berlin', 'flight', 11]
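The same groupby idea can be extended from the single 'flight' key to every activity, which is closer to what the question asks for. The following is a minimal sketch, assuming the desired result is one tuple per activity that keeps the date and city of the latest entry. The function name `summarize_by_activity` is hypothetical and does not appear in the answers above.

```python
from itertools import groupby
from operator import itemgetter


def summarize_by_activity(rows):
    """Return one (date, city, activity, total) tuple per activity.

    Hypothetical helper: keeps the date and city of the latest row in
    each activity group and sums the counts of the whole group.
    """
    by_activity = itemgetter(2)
    result = []
    # groupby only merges adjacent items, so sort by activity first.
    for activity, group in groupby(sorted(rows, key=by_activity), key=by_activity):
        group_rows = list(group)
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        latest_date, city, _, _ = max(group_rows, key=itemgetter(0))
        total = sum(row[3] for row in group_rows)
        result.append((latest_date, city, activity, total))
    return result


rows = [('2023-03-15', 'paris', 'flight', 4),
        ('2023-03-21', 'berlin', 'flight', 2),
        ('2023-03-01', 'madrid', 'drive', 10),
        ('2023-03-04', 'madrid', 'cycling', 3),
        ('2023-03-08', 'rome', 'train', 9),
        ('2023-03-11', 'amsterdam', 'flight', 5),
        ('2023-03-14', 'london', 'boat', 1)]

summary = summarize_by_activity(rows)
print(summary)
```

Because the rows are sorted by activity before grouping, the result comes back in alphabetical order of activity rather than in the original input order.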