Match pattern and put following lines into data structure

Question:

I have a data feed that I download on a regular bases into a csv. It looks like this

TABLE # 196712 / 9000_
>= 10   : 0.002
>= 5    : 0.001
>= 2    : 0.0005
>= 1    : 0.0002
>= 0.5  : 0.0001
>= 0.2  : 0.0001
>= 0.1  : 0.0001
>= 0.0001   : 0.0001
TABLE # 196714 / Dark
>= 0.0001   : 5e-05
TABLE # 196715 / GBD
>= 25   : 0.01
>= 10   : 0.005
>= 5    : 0.0025
>= 0.1  : 0.001
>= 0.0005   : 0.005

I would like to parse the file and categorize the data into a dictionary, where the number after the hash is a unique id (the new dict key) and the following rows (starting with >=) are volumes plus associated penalty values.

s.th like this would work:

{196712: [(10,0.002),(5,0.001),(2,0.0005),(1,0.0002),(0.5,0.0001),(0.2,0.0001),(0.1,0.0001),(0.0001, 0.0001)], 
 196714: [(0.0001,5e-05)], 
 196715: [(25,0.01),(10,0.005),(5,0.0025),(0.1,0.001),(0.0005,0.005)]}

What I would do to filter it outside python would be a grep and get the following lines, however the varying number of lines between IDs makes it more complex. Any other suggested more convenient data structure could be used as well.

Asked By: ThatQuantDude

||

Answers:

Try:

s = """
TABLE # 196712 / 9000_
>= 10   : 0.002
>= 5    : 0.001
>= 2    : 0.0005
>= 1    : 0.0002
>= 0.5  : 0.0001
>= 0.2  : 0.0001
>= 0.1  : 0.0001
>= 0.0001   : 0.0001
TABLE # 196714 / Dark
>= 0.0001   : 5e-05
TABLE # 196715 / GBD
>= 25   : 0.01
>= 10   : 0.005
>= 5    : 0.0025
>= 0.1  : 0.001
>= 0.0005   : 0.005"""

import re

out = {}
for table, data in re.findall(
    r"^TABLE # (d+).*?n(.*?)(?=^TABLE|Z)", s, flags=re.M | re.S
):
    table = int(table)
    for a, b in re.findall(r"([de.+-]+)s*:s*([de.+-]+)", data):
        out.setdefault(table, []).append((float(a), float(b)))

print(out)

Prints:

{
    196712: [
        (10.0, 0.002),
        (5.0, 0.001),
        (2.0, 0.0005),
        (1.0, 0.0002),
        (0.5, 0.0001),
        (0.2, 0.0001),
        (0.1, 0.0001),
        (0.0001, 0.0001),
    ],
    196714: [(0.0001, 5e-05)],
    196715: [
        (25.0, 0.01),
        (10.0, 0.005),
        (5.0, 0.0025),
        (0.1, 0.001),
        (0.0005, 0.005),
    ],
}
Answered By: Andrej Kesely
Categories: questions Tags: , , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.