How to build a dict using regex that sums values instead of overwriting them
Question:
I'm new to Python. I have a log file with content like this:
[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700
[16:00:48]BMW X7/BH1155HH/service:2-phase complex+plastic /employee:Sasha/price:1400
[16:02:38]Jeep Renegade/BE6782IK/service:wash/employee:Serega/price:300
[16:03:19]MB C300/BT4500BT/service:complex/employee:Sasha/price:550
[16:04:19]MB C200/BT4400HT/service:complex/employee:Sasha/price:1000
I need to make a dict that has each employee as a key and the sum of their prices as the value, like {"Oleg": 1250}
I used this code to make a list of employees:
import re
with open("17082022.log", "r") as file:
    text = file.read()
emp_list = set(re.findall(r'employee:(.*)/', text))
and this to make a list of prices:
output_pluses = re.findall(r"(?<=price:)[+-]?\d+", text)
Answers:
You can use re.findall with capturing groups to get the employee name and the price in one step. Next, build the dictionary:
import re
log = """
[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700
[16:00:48]BMW X7/BH1155HH/service:2-phase complex+plastic /employee:Sasha/price:1400
[16:02:38]Jeep Renegade/BE6782IK/service:wash/employee:Serega/price:300
[16:03:19]MB C300/BT4500BT/service:complex/employee:Sasha/price:550
[16:04:19]MB C200/BT4400HT/service:complex/employee:Sasha/price:1000"""
out = {}
# add each price to the employee's running total instead of overwriting it
for employee, price in re.findall(r"employee:([^/]+)/price:(\d+)", log):
    out[employee] = out.get(employee, 0) + int(price)
print(out)
Prints:
{'Oleg': 1250, 'Serega': 750, 'Sasha': 2950}
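To apply the same idea to the log file from the question rather than to an inline string, a minimal sketch could look like this. The function name sum_prices_by_employee is hypothetical, and the file name in the commented call is simply the one from the question:

```python
import re

def sum_prices_by_employee(path):
    """Return a dict mapping each employee to the sum of their prices."""
    with open(path, "r") as file:
        text = file.read()
    totals = {}
    # each match is an (employee, price) tuple of strings
    for employee, price in re.findall(r"employee:([^/]+)/price:(\d+)", text):
        totals[employee] = totals.get(employee, 0) + int(price)
    return totals

# Example call, assuming the file from the question exists:
# print(sum_prices_by_employee("17082022.log"))
```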
Another option is to use the .split() method. The advantage is that you don't need to import the re module or know how to design regular expressions:
log = """
[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:56:15]Nissan Almera/BE0348CH/service:outside+interior/employee:Serega/price:450
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg/price:700
[16:00:48]BMW X7/BH1155HH/service:2-phase complex+plastic /employee:Sasha/price:1400
[16:02:38]Jeep Renegade/BE6782IK/service:wash/employee:Serega/price:300
[16:03:19]MB C300/BT4500BT/service:complex/employee:Sasha/price:550
[16:04:19]MB C200/BT4400HT/service:complex/employee:Sasha/price:1000"""
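dct = {}
# strip() drops the leading newline of the triple-quoted string, which would otherwise give an empty first line
for line in log.strip().split('\n'):
    # take the text after '/employee:' and split it into name and price
    employee, price = line.split('/employee:')[1].split('/price:')
    dct[employee] = dct.get(employee, 0) + int(price)
print(dct)  # gives {'Oleg': 1250, 'Serega': 750, 'Sasha': 2950}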
The ‘trick’ with the short dct.get(employee, 0) code is that if the employee isn’t in the dictionary yet, the value 0 is returned as the price so far. This is equivalent to (dct[employee] if employee in dct else 0), which in turn is a shortened version of an if statement spanning multiple lines.
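For comparison, here is a small sketch of the short form next to the multi-line if statement it replaces. The records list is made-up sample data standing in for pairs already extracted from the log:

```python
# Hypothetical sample data: (employee, price) pairs as already extracted from the log
records = [("Oleg", 550), ("Serega", 450), ("Oleg", 700)]

# Short form using dict.get with a default of 0
short = {}
for employee, price in records:
    short[employee] = short.get(employee, 0) + price

# Equivalent long form using an explicit if/else
long_form = {}
for employee, price in records:
    if employee in long_form:
        long_form[employee] = long_form[employee] + price
    else:
        long_form[employee] = price

print(short)      # {'Oleg': 1250, 'Serega': 450}
print(long_form)  # {'Oleg': 1250, 'Serega': 450}
```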
Another advantage of the .split() approach over a regular expression search is that it will most probably raise an error if a line in the log file has an unexpected format or content, whereas the regular expression search will silently skip lines that don't match and just deliver a (wrong) result.
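A small sketch of that difference, using a made-up log whose second line is missing its "/price:" part (the malformed line is an assumption for illustration, not taken from the original log):

```python
import re

# Hypothetical log with a malformed second line (the "/price:" part is missing)
log = """[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550
[15:59:44]VW Amarok /BH138E/service:complex/employee:Oleg"""

# Regex approach: the malformed line simply produces no match
regex_totals = {}
for employee, price in re.findall(r"employee:([^/]+)/price:(\d+)", log):
    regex_totals[employee] = regex_totals.get(employee, 0) + int(price)
print(regex_totals)  # {'Oleg': 550}

# split approach: unpacking the malformed line fails loudly
split_error = None
try:
    for line in log.split("\n"):
        employee, price = line.split("/employee:")[1].split("/price:")
except ValueError as exc:
    split_error = exc
print(type(split_error).__name__)  # ValueError
```

Here the regex version quietly reports 550 for Oleg, while the .split() version stops with a ValueError at the broken line.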
For extremely large log files the regular expression approach runs about 10% faster, but for small log files the time required to load the re module makes it much slower than the .split() approach.
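Timings like these depend on the machine, the Python version and the data, so it may be worth measuring on your own logs. Below is a rough sketch using the standard timeit module. The function names totals_regex and totals_split and the repeated sample line are assumptions for illustration, and the measurement does not include the time needed to import re:

```python
import re
import timeit

# Hypothetical sample line, repeated to simulate a larger log
line = "[14:43:28]Toyota Camry/BH1488XO/service:complex/employee:Oleg/price:550"
log = "\n".join([line] * 1000)

def totals_regex(text):
    out = {}
    for employee, price in re.findall(r"employee:([^/]+)/price:(\d+)", text):
        out[employee] = out.get(employee, 0) + int(price)
    return out

def totals_split(text):
    out = {}
    for row in text.split("\n"):
        employee, price = row.split("/employee:")[1].split("/price:")
        out[employee] = out.get(employee, 0) + int(price)
    return out

# Both approaches should agree before their speed is compared
same_result = totals_regex(log) == totals_split(log)

# Run each function 20 times and report the total elapsed time
regex_time = timeit.timeit(lambda: totals_regex(log), number=20)
split_time = timeit.timeit(lambda: totals_split(log), number=20)
print(f"regex: {regex_time:.4f}s, split: {split_time:.4f}s")
```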