Python regex – Extract all the matching text between two patterns

Question:

I want to extract all the text in the bullet points numbered 1.1, 1.2, 1.3, etc. Sometimes the bullet numbers contain spaces, like 1. 1, 1. 2, 1 .3, 1 . 4

Sample text

    text = "some text before pattern 1.1 text_1_here  1.2 text_2_here  1 . 3 text_3_here  1. 4 text_4_here  1 .5 text_5_here 1.10 last_text_here 1.23 text after pattern"

For the text above, the output should be:

    [' text_1_here ', ' text_2_here ', ' text_3_here ', ' text_4_here ', ' text_5_here ', ' last_text_here ']

I tried `re.findall` but I am not getting the required output. It identifies and extracts the text between 1.1 & 1.2 and then between 1.3 & 1.4, but it skips the text between 1.2 & 1.3.

    import re
    re.findall(r'[0-9].\s?[0-9]+(.*?)[0-9].\s?[0-9]+', text)

Asked By: Prince


Answers:

I'm unsure of the exact rule for excluding the last bit of text, but based on your comments it seems we could simply split the entire text on the bullets and drop the first and last elements of the resulting list:

    re.split(r'\s+\d(?:\s*\.\s*\d+)+\s+', text)[1:-1]

Which would output:

['text_1_here', 'text_2_here', 'text_3_here', 'text_4_here', 'text_5_here', 'last_text_here']
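
As a self-contained sketch of the split above, the snippet below wraps it in a small helper. The names `BULLET` and `extract_bullets` are made up for illustration, and the slice `[1:-1]` assumes the text always has something before the first bullet and after the last one, as in the sample.

```python
import re

# Sample text from the question.
text = ("some text before pattern 1.1 text_1_here  1.2 text_2_here  "
        "1 . 3 text_3_here  1. 4 text_4_here  1 .5 text_5_here "
        "1.10 last_text_here 1.23 text after pattern")

# A bullet: whitespace, a digit, then one or more ".<digits>" groups
# (optional spaces around the dot), then whitespace.
BULLET = re.compile(r'\s+\d(?:\s*\.\s*\d+)+\s+')

def extract_bullets(s):
    """Split on bullet numbers; drop the text before the first and after the last bullet."""
    return BULLET.split(s)[1:-1]

print(extract_bullets(text))
# ['text_1_here', 'text_2_here', 'text_3_here', 'text_4_here', 'text_5_here', 'last_text_here']
```

Because the surrounding whitespace is part of the delimiter, the pieces come back already stripped.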
Answered By: JvdV
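
A note on why the `findall` attempt in the question skips every other item: `re.findall` returns non-overlapping matches, and that pattern consumes the closing bullet number, so the same number cannot open the next match. One possible alternative, sketched here rather than taken from the answer above, is to check for the closing bullet with a lookahead so it is not consumed. The names `bullet` and `pattern` are illustrative only.

```python
import re

# Sample text from the question.
text = ("some text before pattern 1.1 text_1_here  1.2 text_2_here  "
        "1 . 3 text_3_here  1. 4 text_4_here  1 .5 text_5_here "
        "1.10 last_text_here 1.23 text after pattern")

# A bullet number such as "1.1", "1 . 3" or "1.10", standing on its own
# (not glued to other non-space characters on either side).
bullet = r'(?<!\S)\d+\s*\.\s*\d+(?!\S)'

# Capture the text after one bullet, up to (but not consuming) the next bullet.
pattern = re.compile(rf'{bullet}\s+(.*?)\s+(?={bullet})')

print(pattern.findall(text))
# ['text_1_here', 'text_2_here', 'text_3_here', 'text_4_here', 'text_5_here', 'last_text_here']
```

The text after the final bullet (1.23) is left out because no further bullet follows it, which matches the expected output in the question.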