How to get each group

Question:

I’m trying to get each group starting with "BO_" using a Python regex.
(The data was from: https://github.com/commaai/opendbc)

Original Text:

...
BS_:

BU_: XXX CAMERA FRONT_RADAR ADRV APRK


BO_ 53 ACCELERATOR: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 192|3@1+ (1,0) [0|7] "" XXX
 SG_ ACCELERATOR_PEDAL : 40|8@1+ (1,0) [0|255] "" XXX

BO_ 64 GEAR_ALT: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 32|3@1+ (1,0) [0|7] "" XXX

BO_ 69 GEAR: 24 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 44|3@1+ (1,0) [0|7] "" XXX

...
CM_ SG_ 96 BRAKE_PRESSURE "User applied brake pedal pressure. Ramps from computer applied pressure on falling edge of cruise. Cruise cancels if !=0";
CM_ SG_ 101 BRAKE_POSITION "User applied brake pedal position, max is ~700. Signed on some vehicles";
CM_ SG_ 373 PROBABLY_EQUIP "aeb equip?";

I want to capture only the BO_ blocks whose ID is in [53, 69], like this:

BO_ 53 ACCELERATOR: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 192|3@1+ (1,0) [0|7] "" XXX
 SG_ ACCELERATOR_PEDAL : 40|8@1+ (1,0) [0|255] "" XXX

BO_ 69 GEAR: 24 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ COUNTER : 16|8@1+ (1,0) [0|255] "" XXX
 SG_ GEAR : 44|3@1+ (1,0) [0|7] "" XXX

What I’ve tried so far:

1) Capturing each BO_ and its SG_ lines with the regex below, but it only captured each BO_ and the first SG_ group.

BO_ (\w+) (\w+) *: (\w+) (\w+)\n (SG_ (\w+) : (\d+)\|(\d+)@(\d+)([\+|\-]) \(([0-9.+\-eE]+),([0-9.+\-eE]+)\) \[([0-9.+\-eE]+)\|([0-9.+\-eE]+)\] "(.*)" (.*)\n)*

2) Using a greedy match, but it captured all BO_ blocks at once, except the last occurrence.

BO_ (\w+) (\w+) *: (\w+) (\w+)((.|\n)*)BO_ 

Also, to select only the BO_ blocks whose ID is in the list [53, 69], I interpolated the IDs into the pattern with a raw f-string, something like rf"{digit}".

Asked By: Stella


Answers:

You can easily capture paragraphs with re.findall using the re.DOTALL (inline s) and re.MULTILINE (inline m) flags.

Regex (with inline flags): (?sm)BO_ (?:53|69) .+?^$

Usage (pick one):

re.findall(r"(?sm)BO_ (?:53|69) .+?^$", text)
re.findall(r"BO_ (?:53|69) .+?^$", text, flags=re.DOTALL | re.MULTILINE)

This lazily captures every line from BO_ 53 or BO_ 69 up to the next blank line, which is what ^$ matches in multiline mode.
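Since the question also asks about building the pattern from a list of IDs, here is a minimal sketch of how that could look. The names `wanted_ids`, `id_alternation`, and `blocks`, and the shortened sample text, are only illustrative. The leading `^` is an addition of mine so that the match has to start at the beginning of a line.

```python
import re

# Shortened sample in the same layout as the question's DBC excerpt.
text = '''BU_: XXX CAMERA FRONT_RADAR ADRV APRK

BO_ 53 ACCELERATOR: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX
 SG_ GEAR : 192|3@1+ (1,0) [0|7] "" XXX

BO_ 64 GEAR_ALT: 32 XXX
 SG_ CHECKSUM : 0|16@1+ (1,0) [0|65535] "" XXX

BO_ 69 GEAR: 24 XXX
 SG_ GEAR : 44|3@1+ (1,0) [0|7] "" XXX

CM_ SG_ 96 BRAKE_PRESSURE "comment";
'''

wanted_ids = [53, 69]  # hypothetical list of BO_ ids to keep

# Build the alternation "53|69" from the list.
id_alternation = "|".join(str(i) for i in wanted_ids)

# The space after the id group keeps 53 from also matching e.g. 530.
pattern = re.compile(rf"(?sm)^BO_ (?:{id_alternation}) .+?^$")

# Each match ends with the newline before the blank line; strip it.
blocks = [block.rstrip() for block in pattern.findall(text)]

for block in blocks:
    print(block)
    print("---")
```

Note that this relies on every BO_ block being followed by a truly empty line; a separator line that contains only spaces would not be matched by ^$.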

Answered By: ljmc

Instead of jumping through hoops to parse a DBC file with regular expressions, I suggest you use a proper parser like cantools:

CAN BUS tools in Python 3.

  • DBC, KCD, SYM, ARXML 3&4 and CDD file parsing.
  • CAN message encoding and decoding.
  • Simple and extended signal multiplexing.
  • Diagnostic DID encoding and decoding.
  • candump output decoder.
  • Node tester.
  • C source code generator.
  • CAN bus monitor.
  • Graphical plots of signals.
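
A rough sketch of what selecting messages by ID could look like with cantools, assuming the package is installed and the file parses cleanly. The function name `select_messages` and its parameters `dbc_path` and `wanted_ids` are hypothetical; I have not run this against the opendbc files.

```python
def select_messages(dbc_path, wanted_ids):
    """Return {frame_id: [signal names]} for the requested BO_ ids."""
    # Third-party package (pip install cantools); imported here so the
    # function can be defined even where cantools is not installed.
    import cantools

    db = cantools.database.load_file(dbc_path)
    selected = {}
    for message in db.messages:
        if message.frame_id in wanted_ids:
            selected[message.frame_id] = [s.name for s in message.signals]
    return selected
```

Calling it with the path to your DBC file and [53, 69] should give you the signal names per message, without having to maintain a regex for the SG_ line format.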
Answered By: Mahmoud