Regex return match plus string up until next match

Question:

Goal: Break text into a list of items, where each item starts at an integer or decimal match and runs up to, but not including, the next match. Language/version: Python 3.8.5 with re.findall(), though I’m open to alternative suggestions.

Text example (yes, it’s all on one line):

 1 Something Interesting here 2 More interesting text 2.1 An example of 2C19 a header 2.3 Another header example 2.4 another interesting header 10.1 header stuff  14 the last interesting 3A4 header

Goal Output:

['1 Something Interesting here',
'2 More interesting text',
'2.1 An example of 2C19 a header',
'2.3 Another header example',
'2.4 another interesting header',
'10.1 header stuff',
'14 the last interesting 3A4 header'
]

I can identify most of the appropriate integer/decimal starting points using:

(\d+\.\d+)|([^a-zA-Z]\d\d)|( \d )

I’m struggling to find a way to return the text between the matches plus the match itself.

To save you some time, here’s my Regex sandbox

Thank you kindly

Asked By: HamiltonPharmD


Answers:

You can use a positive lookahead so that each match stops just before the next number (or at the end of the string).

Here is the updated regex (sandbox):

\b(?:\d+(?:\.\d+)?)\b.*?(?=\b(?:\d+(?:\.\d+)?)\b|$)

In python:

import re

regex = r'\b(?:\d+(?:\.\d+)?)\b.*?(?=\b(?:\d+(?:\.\d+)?)\b|$)'
string = ' 1 Something Interesting here 2 More interesting text 2.1 An example of 2C19 a header 2.3 Another header example 2.4 another interesting header 10.1 header stuff  14 the last interesting 3A4 header'
result = re.findall(regex, string)

In this case, result will be:

>>> result
['1 Something Interesting here ',
 '2 More interesting text ',
 '2.1 An example of 2C19 a header ',
 '2.3 Another header example ',
 '2.4 another interesting header ',
 '10.1 header stuff  ',
 '14 the last interesting 3A4 header']
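
To make the pieces of the lookahead pattern easier to read, here is a minimal sketch that spells it out with re.VERBOSE. The names NUMBER, pattern and sample are made up for this illustration, and the sample string is invented; the regex itself is meant to be equivalent to the one in this answer.

```python
import re

# An integer or decimal that stands alone as a word.
# The \b on each side is what rejects tokens such as 2C19 or 3A4.
NUMBER = r"\b\d+(?:\.\d+)?\b"

# re.VERBOSE ignores whitespace inside the pattern and allows "#" comments.
pattern = re.compile(
    rf"""
    {NUMBER}            # the number that starts an item
    .*?                 # as little following text as possible
    (?= {NUMBER} | $ )  # stop right before the next number, or at the end
    """,
    re.VERBOSE,
)

# Hypothetical sample input, not taken from the question.
sample = "1 intro 2.5 details 3A4 code 7 end"
pieces = pattern.findall(sample)
print(pieces)
```

Because the lookahead does not consume any text, the next number is still available to start the following match.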

Note that each match also includes the whitespace before the next number. If you don’t want it, call strip() on each string:

>>> [ match.strip() for match in result ]
['1 Something Interesting here',
 '2 More interesting text',
 '2.1 An example of 2C19 a header',
 '2.3 Another header example',
 '2.4 another interesting header',
 '10.1 header stuff',
 '14 the last interesting 3A4 header']
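
Since the question is open to alternatives, one possible variation is to split the text instead of matching it. This is only a sketch, and the names BOUNDARY, text and sections are chosen for this example: re.split() cuts on any run of whitespace that is immediately followed by a standalone integer or decimal, so the separating spaces are dropped and no strip() per item is needed.

```python
import re

text = ' 1 Something Interesting here 2 More interesting text 2.1 An example of 2C19 a header 2.3 Another header example 2.4 another interesting header 10.1 header stuff  14 the last interesting 3A4 header'

# Whitespace followed by an integer or decimal that ends at a word boundary.
# The lookahead leaves the number in place, so it starts the next piece.
BOUNDARY = re.compile(r"\s+(?=\d+(?:\.\d+)?\b)")

# strip() first so the leading space does not produce an empty first item.
sections = BOUNDARY.split(text.strip())

for section in sections:
    print(section)
```

One difference from the findall() approach: if the text does not begin with a number, the leading text is kept as the first item rather than skipped.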
Answered By: Marco