How to get each group

Question

I’m trying to get each group starting with "BO_" using a Python regex.
(The data was from: https://github.com/commaai/opendbc)

Original Text:

...
BS_:

BU_: XXX CAMERA FRONT_RADAR ADRV APRK


BO_ 53 ACCELERATOR: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 192|3@1+ (1,0) [0|7] "" XXX
 SG_ ACCELERATOR_PEDAL : 40|8@1+ (1,0) [0|255] "" XXX

BO_ 64 GEAR_ALT: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 32|3@1+ (1,0) [0|7] "" XXX

BO_ 69 GEAR: 24 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 44|3@1+ (1,0) [0|7] "" XXX

...
CM_ SG_ 96 BRAKE_PRESSURE "User applied brake pedal pressure. Ramps from computer applied pressure on falling edge of cruise. Cruise cancels if !=0";
CM_ SG_ 101 BRAKE_POSITION "User applied brake pedal position, max is ~700. Signed on some vehicles";
CM_ SG_ 373 PROBABLY_EQUIP "aeb equip?";

I want to capture only the BO_ blocks whose ID (the number right after BO_) is in [53, 69], like this:

BO_ 53 ACCELERATOR: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 192|3@1+ (1,0) [0|7] "" XXX
 SG_ ACCELERATOR_PEDAL : 40|8@1+ (1,0) [0|255] "" XXX

BO_ 69 GEAR: 24 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 44|3@1+ (1,0) [0|7] "" XXX

What I’ve tried so far:
1) Capturing each BO_ and its SG_ lines with the regex below, but it only captured each BO_ and the first SG_ group.

BO_ (\w+) (\w+) *: (\w+) (\w+)\n (SG_ (\w+) : (\d+)\|(\d+)@(\d+)([+|-]) \(([0-9.+-eE]+),([0-9.+-eE]+)\) \[([0-9.+-eE]+)\|([0-9.+-eE]+)\] "(.*)" (.*)\n)*

2) Using a greedy match, but it captured all BO_ blocks at once, except the last occurrence.

BO_ (\w+) (\w+) *: (\w+) (\w+)((.|\n)*)BO_

Also, to select only the BO_ blocks whose ID is in the list [53, 69], I used a raw f-string, something like rf"{digit}", inside the regex.

Asked By: Stella

Answer 1

You can easily capture paragraphs with re.findall using the re.DOTALL (inline s) and re.MULTILINE (inline m) flags.

Regex (with inline flags): (?sm)BO_ (?:53|69) .+?^$

Usage (pick one):

re.findall(r"(?sm)BO_ (?:53|69) .+?^$", text)

re.findall(r"BO_ (?:53|69) .+?^$", text, flags=re.DOTALL | re.MULTILINE)

This lazily captures all lines from BO_ 53 or BO_ 69 up to the next blank line (^$).
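If the IDs live in a Python list, as in the question, the alternation can be built from it. The following is only a minimal sketch: the helper name find_bo_blocks and the sample text are made up for illustration, and the leading ^ is an addition to the regex above (an assumption that BO_ always starts a line), so that lines such as "CM_ BO_ 53 ..." are not picked up.

```python
import re


def find_bo_blocks(text, ids):
    """Return the BO_ blocks whose numeric ID is in `ids` (hypothetical helper)."""
    if not ids:
        return []
    # Join the IDs into an alternation such as "53|69".
    alternation = "|".join(str(int(i)) for i in ids)
    # (?s): "." also matches newlines; (?m): ^ and $ match at line boundaries.
    # The space after the group keeps 53 from matching 530.
    pattern = rf"(?sm)^BO_ (?:{alternation}) .+?^$"
    # Each match ends just before the blank line; drop the trailing newline.
    return [block.rstrip("\n") for block in re.findall(pattern, text)]


sample = (
    'BO_ 53 ACCELERATOR: 32 XXX\n'
    ' SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX\n'
    '\n'
    'BO_ 64 GEAR_ALT: 32 XXX\n'
    ' SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX\n'
    '\n'
    'BO_ 69 GEAR: 24 XXX\n'
    ' SG_ GEAR : 44|3@1+ (1,0) [0|7] "" XXX\n'
    '\n'
)

blocks = find_bo_blocks(sample, [53, 69])
for block in blocks:
    print(block)
    print("---")
```

Note that this approach relies on each block being followed by a blank line (or by the end of the text right after a newline); a block that runs straight into other content would be captured together with it.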

Answered By: ljmc

Answer 2

Instead of jumping through hoops to parse a DBC file with regular expressions, I suggest you use a proper parser like cantools:

CAN BUS tools in Python 3.

DBC, KCD, SYM, ARXML 3&4 and CDD file parsing.

CAN message encoding and decoding.

Simple and extended signal multiplexing.

Diagnostic DID encoding and decoding.

candump output decoder.

Node tester.

C source code generator.

CAN bus monitor.

Graphical plots of signals.
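As a rough sketch of what that could look like for the question's use case: the helper names below (select_messages, describe, load_and_select) are hypothetical, and the code assumes the cantools attributes messages, frame_id, name, length, signals, start and the loader cantools.database.load_file behave as in the cantools documentation. It has not been checked against the specific opendbc file from the question.

```python
def select_messages(db, wanted_ids):
    """Return the messages of a loaded database whose frame ID is wanted."""
    wanted = set(wanted_ids)
    return [msg for msg in db.messages if msg.frame_id in wanted]


def describe(msg):
    """Format one message and its signals, loosely resembling the DBC text."""
    lines = [f"BO_ {msg.frame_id} {msg.name}: {msg.length}"]
    for sig in msg.signals:
        lines.append(f" SG_ {sig.name} : start={sig.start} length={sig.length}")
    return "\n".join(lines)


def load_and_select(path, wanted_ids):
    """Load a DBC file with cantools and keep only the wanted messages."""
    # cantools is a third-party package (pip install cantools). It is
    # imported here so the helpers above stay usable without it.
    import cantools

    db = cantools.database.load_file(path)
    return select_messages(db, wanted_ids)
```

Usage would then be along the lines of `for msg in load_and_select("some_file.dbc", [53, 69]): print(describe(msg))`, where the file name is a placeholder. Unlike the regex, this gives structured access to each signal instead of raw text.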

Answered By: Mahmoud
