extract text lines between two lines with text marks using regex

Question:

I have a text file like this:

## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

## USA
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

## ESP
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

I need to extract just the lines for a specific country using regex and Python, for example:

## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

Note: There is no key or value that identifies the country, only the marker lines (such as ## COL) shown in the previous example.

I tried this regex without success:

(?<=## COL).*[\w\s]*(?=##})

Thanks in advance!

Asked By: xarc


Answers:

With a regex:

import re

m = re.search(r'^## COL\n(?:(?!##).)+', text, flags=re.S)

if m:
    print(m.group())

More efficient alternative:

m = re.search(r'^## COL\n(?:(?:(?!##).*)\n)+', text).group()

Output:

## COL
{ "Id": 1, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
{ "Id": 1, "key1": "value1", "key2": "valueC", ... "keyN": "valueN"}
{ "Id": 2, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueA", ... "keyN": "valueN"}
{ "Id": 3, "key1": "value1", "key2": "valueB", ... "keyN": "valueN"}
                               .
                               .
                               .
{ "Id": n, "key1": "value1", "key2": "valueZ", ... "keyN": "valueN"}

regex demo option 1

regex demo alternative (with blank lines)
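
As a quick check, both patterns above can be run against a small sample. This is only a sketch: the sample text and the names col_block_1 and col_block_2 are made up for illustration, and the trailing blank line is stripped with rstrip() for a cleaner comparison.

```python
import re

# Hypothetical sample mirroring the question's layout (values are made up).
text = (
    "## COL\n"
    '{ "Id": 1, "key2": "valueA"}\n'
    '{ "Id": 2, "key2": "valueB"}\n'
    "\n"
    "## USA\n"
    '{ "Id": 1, "key2": "valueA"}\n'
)

# Option 1: re.S lets "." cross newlines; the lookahead stops before the next "##".
m1 = re.search(r'^## COL\n(?:(?!##).)+', text, flags=re.S)

# Option 2: consume whole lines, one at a time, that do not start with "##".
m2 = re.search(r'^## COL\n(?:(?:(?!##).*)\n)+', text)

# Both matches include the blank line before "## USA", so strip it.
col_block_1 = m1.group().rstrip() if m1 else None
col_block_2 = m2.group().rstrip() if m2 else None

print(col_block_1)
```

Note that without re.M the ^ anchor only matches at the very start of the string, so as written both patterns find ## COL only because it is the first section.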

Answered By: mozway

What about the pattern ## COL[^#]* instead? It should be sufficient to match the requested block. No lookahead or lookbehind is necessary.

See https://regex101.com/r/pc0iaV/1 for demonstration that it works.
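
One thing to keep in mind with [^#]* is that it stops at the first # anywhere, not only at a ## marker line. The sketch below uses a made-up sample in which one value contains a # character (an assumption, the question's data may never contain one) and compares the result with a line-based pattern.

```python
import re

# Hypothetical sample: the second record contains a "#" inside a value.
text = (
    "## COL\n"
    '{ "Id": 1, "key1": "value1"}\n'
    '{ "Id": 2, "key1": "item #7"}\n'
    '{ "Id": 3, "key1": "value3"}\n'
    "\n"
    "## USA\n"
    '{ "Id": 1, "key1": "value1"}\n'
)

# Character-class pattern: runs until the first "#" of any kind.
simple = re.search(r'## COL[^#]*', text).group()

# Line-based pattern: only a line starting with "## " ends the block.
line_based = re.search(r'^## COL(?:\n(?!## ).*)*', text).group()

print(repr(simple))      # cut off in the middle of the second record
print(repr(line_based))  # contains all three COL records
```

If the data can never contain a #, the simpler pattern is enough.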

Answered By: Claudio

Without the re.S flag you can write the pattern as:

^## COL(?:\n(?!## ).*)*

Explanation

  • ^ Start of string
  • ## COL Match literally
  • (?: Non capture group
    • \n(?!## ).* Match a newline, then the rest of the line if it does not start with "## "
  • )* Close the non capture group and optionally repeat it

See a regex demo.
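
To pull out a section other than the first one, the same pattern can be wrapped in a small helper. This is a sketch under a few assumptions: extract_section and sample are hypothetical names, re.M is added so that ^ matches at the start of any line, and a $ after the country code is added so that "CO" does not match the "## COL" header.

```python
import re
from typing import Optional

def extract_section(text: str, country: str) -> Optional[str]:
    """Return the '## <country>' header plus its lines, or None if absent."""
    # re.M makes ^ and $ match at every line boundary, not only at the
    # start and end of the whole string.
    pattern = rf'^## {re.escape(country)}$(?:\n(?!## ).*)*'
    m = re.search(pattern, text, flags=re.M)
    # Strip the blank line(s) that separate this section from the next one.
    return m.group().rstrip() if m else None

# Hypothetical sample data in the question's layout.
sample = (
    "## COL\n"
    '{ "Id": 1, "key2": "valueA"}\n'
    "\n"
    "## USA\n"
    '{ "Id": 1, "key2": "valueB"}\n'
    '{ "Id": 2, "key2": "valueC"}\n'
)

print(extract_section(sample, "USA"))
```

re.escape is used only as a precaution in case the marker text ever contains regex metacharacters.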

Answered By: The fourth bird