Python regex does not find the pattern to parse markdown python code while regex101 does

Question:

In a Markdown file, I would like to extract the Python code inside fenced blocks of the form

```python
...
```(end)

using a regular expression in Python. However, the following code

import re
text = 'We want to examine the python code\n\n```python\ndef halloworld():\n\tfor item in range(10):\n\t\tprint("Hello")\n``` and have no bad intention when we want to parse it'
findpythoncodepattern = re.compile(r'```python.+```', re.MULTILINE)
for item in findpythoncodepattern.finditer(text):
    print(item)

does not find a match (with or without the re.MULTILINE flag), even though Regex101 finds one, so the regex itself does not seem to be the problem.

When I turn the text into a raw string (' ' -> r' '), it finds something, but not the full match. What is the problem here?

Asked By: Uwe.Schneider


Answers:

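Try using flags=re.S (also known as re.DOTALL). By default, . matches any character except a newline, so .+ cannot cross the line breaks in the text; re.S makes . match newlines too. re.MULTILINE does not help here because it only changes how ^ and $ behave: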

import re

text = 'We want to examine the python code\n\n```python\ndef halloworld():\n\tfor item in range(10):\n\t\tprint("Hello")\n``` and have no bad intention when we want to parse it'

findpythoncodepattern = re.compile(r"```python.+```", flags=re.S)

for item in findpythoncodepattern.finditer(text):
    print(item.group(0))

Prints:

    ```python
    def halloworld():
            for item in range(10):
                    print("Hello")
    ```
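To make the flag behaviour concrete, here is a small sketch that runs the question's pattern against the same text with no flag, with re.MULTILINE and with re.S. It also imitates the raw-string attempt: in a raw literal, \n is a backslash followed by n rather than a newline, so .+ can match without re.S. The "partial" result there is most likely just the printed Match repr, which CPython shortens for long matches; group(0) returns the full text. FENCE and the variable names are only for this illustration.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the snippet fence-safe

# The question's sample text, with real newline and tab characters
text = (
    "We want to examine the python code\n\n"
    + FENCE + "python\ndef halloworld():\n\tfor item in range(10):\n\t\tprint(\"Hello\")\n"
    + FENCE + " and have no bad intention when we want to parse it"
)
pattern = FENCE + "python.+" + FENCE  # same pattern as in the question

# No flag: "." does not match "\n", so ".+" cannot reach the closing fence
no_flags = re.search(pattern, text)
# re.MULTILINE only affects "^" and "$"; "." still stops at "\n"
multiline = re.search(pattern, text, flags=re.MULTILINE)
# re.S (re.DOTALL) lets "." match "\n" as well
dotall = re.search(pattern, text, flags=re.S)

print(no_flags)         # None
print(multiline)        # None
print(dotall.group(0))  # the whole fenced block, fences included

# A raw string literal keeps backslash-n as two characters instead of a newline;
# replace() imitates that here. With no real newline left, ".+" can match.
raw_text = text.replace("\n", "\\n").replace("\t", "\\t")
raw_match = re.search(pattern, raw_text)
print(raw_match)           # Match repr; CPython shortens long matched text here
print(raw_match.group(0))  # the complete matched text
```

In short: use re.S when . has to span lines, and print item.group(0) rather than item to see the whole match.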
Answered By: Andrej Kesely

> In a markdown file, I would like to extract python code

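To extract only the code, without the fence lines, use the pattern (?<=```python)([\s\S]+)(?=```). The lookbehind (?<=...) and the lookahead (?=...) check for the fences without including them in the match.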

import re

text = 'We want to examine the python code\n\n```python\ndef halloworld():\n\tfor item in range(10):\n\t\tprint("Hello")\n``` and have no bad intention when we want to parse it'

pattern = re.compile(r'(?<=```python)([\s\S]+)(?=```)')
for item in pattern.findall(text):
    print(item)

# def halloworld():
#    for item in range(10):
#        print("Hello")

NOTE: [\s\S] matches any character, including a newline, so it is equivalent to . with the re.S flag.
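Both patterns above use a greedy +, so if one file contains several python fenced blocks, a single match would run from the first opening fence to the last closing fence. A minimal sketch of an alternative, assuming \n line endings and blocks opened exactly with the python info string: a non-greedy (.*?) capture with re.S returns each block body separately and leaves out the newline before the closing fence. PYTHON_BLOCK and extract_python_blocks are hypothetical names.

```python
import re

FENCE = "`" * 3  # three backticks

# Opening fence (three backticks + "python", optional trailing spaces/tabs) and its newline,
# then the shortest possible body up to the next closing fence.
# re.S lets "." cross newlines; the lazy "*?" stops at the first closing fence.
PYTHON_BLOCK = re.compile(FENCE + r"python[ \t]*\n(.*?)\n?" + FENCE, flags=re.S)


def extract_python_blocks(markdown):
    """Return the body of every python fenced block (hypothetical helper)."""
    # With exactly one capturing group, findall() returns a list of that group's text
    return PYTHON_BLOCK.findall(markdown)


sample = (
    "Intro\n\n"
    + FENCE + "python\nprint('first')\n" + FENCE + "\n\nSome prose\n\n"
    + FENCE + "python\nprint('second')\n" + FENCE + "\n"
)

for block in extract_python_blocks(sample):
    print(block)
# print('first')
# print('second')
```

If blocks may use other info strings (for example python3) or \r\n line endings, the pattern would need adjusting; for arbitrary Markdown, a real Markdown parser may be the more robust choice.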

Answered By: Artyom Vancyan