Parsing data containing escaped quotes and separators in Python

Question:

I have data that is structured like this:

1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah"

It always starts with a Unix timestamp, but I can't know in advance how many other fields follow or what they are called.

The goal is to parse this into a dictionary as such:

{"timestamp": 1661171420,
 "foo": "bar",
 "test": 'This, is a "TEST"',
 "count": 5,
 "com": "foo, bar=blah"}

I’m having trouble parsing this, especially regarding the escaped quotes and commas in the values.
What would be the best way to parse this correctly, preferably without any third-party modules?

Asked By: Freddie Mercury


Answers:

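If changing the input format is not an option (JSON would be much easier to handle, but if this comes from an API, as you say, you may be stuck with it), the following works, assuming the data follows the given structure more or less. It temporarily swaps each escaped quote for a placeholder, extracts quoted strings and bare numbers with two separate regexes, and then restores the quotes. It is not the cleanest solution, but it does the job.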

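import re

# Swap each escaped quote (\") for a ''' placeholder so that every
# remaining " in the data is a real string delimiter.
d = r'''1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah", fraction=-0.11'''.replace('\\"', "'''")

# key="value" pairs; the value may contain commas but no bare quotes.
string_pattern = re.compile(r'''(\w+)="([^"]*)"''')

parsed_data = {}
# The line always starts with the Unix timestamp.
parsed_data['timestamp'] = int(d.partition(", ")[0])
for match in string_pattern.finditer(d):
    # Turn the placeholder back into a literal quote.
    parsed_data[match.group(1)] = match.group(2).replace("'''", '"')

# key=number pairs: optional sign, digits, optional decimal part.
# Note: this also scans inside quoted values, so a string containing
# something like x=5 would produce a spurious key.
number_pattern = re.compile(r'''(\w+)=([+-]?\d+(?:\.\d+)?)''')

for match in number_pattern.finditer(d):
    try:
        parsed_data[match.group(1)] = int(match.group(2))
    except ValueError:
        parsed_data[match.group(1)] = float(match.group(2))

print(parsed_data)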
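
A single-pass variant is also possible. The sketch below assumes that quotes inside values are always backslash-escaped (\") and that unquoted values are plain integers or decimals. The names FIELD and parse_line are hypothetical, chosen only for illustration. Because each quoted value is consumed as one token, text such as bar=5 inside a string should not be mistaken for a separate field.

```python
import re

# One token per field: a key, then either a quoted value (which may contain
# backslash-escaped characters) or a bare number.
FIELD = re.compile(r'''
    (\w+)=                      # key
    (?:
        "((?:[^"\\]|\\.)*)"     # quoted value; \" and other escapes allowed
      | ([+-]?\d+(?:\.\d+)?)    # bare integer or decimal
    )
''', re.VERBOSE)


def parse_line(line):
    """Parse 'timestamp, key=value, ...' into a dict (illustrative helper)."""
    # Everything before the first comma is the Unix timestamp.
    timestamp, _, rest = line.partition(",")
    result = {"timestamp": int(timestamp)}
    for m in FIELD.finditer(rest):
        key, quoted, number = m.groups()
        if quoted is not None:
            # Drop the backslash from each escaped character, e.g. \" -> "
            result[key] = re.sub(r'\\(.)', r'\1', quoted)
        elif "." in number:
            result[key] = float(number)
        else:
            result[key] = int(number)
    return result


line = r'1661171420, foo="bar", test="This, is a \"TEST\"", count=5, com="foo, bar=blah"'
print(parse_line(line))
```

Malformed fields are silently skipped here; if stricter behavior is needed, the positions between matches could be checked for anything other than ", ".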
Answered By: matszwecja