Python regex does not find the pattern to parse markdown python code while regex101 does
Question:
In a markdown file, I would like to extract python code in
```python
...
```(end)
using regex and Python.
However, the following Python code
import re
text = 'We want to examine the python code\n\n```python\ndef halloworld():\n\tfor item in range(10):\n\t\tprint("Hello")\n``` and have no bad intention when we want to parse it'
findpythoncodepattern = re.compile(r'```python.+```', re.MULTILINE)
for item in findpythoncodepattern.finditer(text):
    print(item)
does not find a result, whether or not I pass the re.MULTILINE flag. The regex itself does not seem to be the problem, since Regex101 finds a match.
When I change the text into a raw string (' ' -> r' '), it finds something, but not the full match. What is the problem here?
Answers:
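Use flags=re.S (also known as re.DOTALL). By default, . does not match a newline, and re.MULTILINE only changes how ^ and $ behave, so .+ cannot cross the line breaks inside the code block: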
import re
text = 'We want to examine the python code\n\n```python\ndef halloworld():\n\tfor item in range(10):\n\t\tprint("Hello")\n``` and have no bad intention when we want to parse it'
findpythoncodepattern = re.compile(r"```python.+```", flags=re.S)
for item in findpythoncodepattern.finditer(text):
    print(item.group(0))
Prints:
```python
def halloworld():
    for item in range(10):
        print("Hello")
```
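The effect of each flag can be checked directly. The following is a minimal sketch rather than part of the original answer; the sample text and the variable names (no_flag, multiline, dotall, raw_match) are made up for illustration. It also shows why the raw-string version of the text "finds something": a raw string contains a literal backslash followed by n instead of a real newline, so . can match straight through it.

```python
import re

# Illustrative sample, not the text from the question.
text = "intro\n```python\nprint('hi')\n``` outro"
pattern = r"```python.+```"

# Without re.DOTALL, "." stops at the newline right after "```python".
no_flag = re.search(pattern, text)

# re.MULTILINE only changes ^ and $, so it does not help here.
multiline = re.search(pattern, text, flags=re.MULTILINE)

# re.DOTALL (re.S) lets "." match newlines as well.
dotall = re.search(pattern, text, flags=re.DOTALL)

# A raw string holds the two characters "\" and "n", not a line break,
# so the pattern matches even without any flag.
raw_text = r"intro\n```python\nprint('hi')\n``` outro"
raw_match = re.search(pattern, raw_text)

print(no_flag)
print(multiline)
print(dotall.group(0))
print(raw_match.group(0))
```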
In a markdown file, I would like to extract python code
To extract only the code, use the pattern (?<=```python)([\s\S]+)(?=```).
import re
text = 'We want to examine the python code\n\n```python\ndef halloworld():\n\tfor item in range(10):\n\t\tprint("Hello")\n``` and have no bad intention when we want to parse it'
pattern = re.compile(r'(?<=```python)([\s\S]+)(?=```)')
for item in pattern.findall(text):
    print(item)
# def halloworld():
#     for item in range(10):
#         print("Hello")
NOTE: [\s\S] matches any character, including a newline, so it behaves the same as . with the re.S flag.
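One caveat: the + quantifier is greedy, so when a markdown file contains more than one python block, the pattern above matches from the first opening fence to the last closing fence. A minimal sketch of a non-greedy variant follows; it assumes every opening fence is immediately followed by a newline, and the sample text and the names greedy, lazy, and lazy_class are illustrative, not from the original post.

```python
import re

# Illustrative sample with two fenced blocks.
text = (
    "First:\n```python\na = 1\n```\n"
    "Second:\n```python\nb = 2\n```\n"
)

# Greedy: a single match that runs through to the last closing fence.
greedy = re.findall(r"(?<=```python)([\s\S]+)(?=```)", text)

# Non-greedy with a capture group: one result per block.
lazy = re.findall(r"```python\n(.*?)```", text, flags=re.DOTALL)

# The same pattern written with [\s\S] instead of "." plus re.DOTALL.
lazy_class = re.findall(r"```python\n([\s\S]*?)```", text)

print(len(greedy))
print(lazy)
print(lazy == lazy_class)
```

Using a capture group instead of lookarounds also keeps the fence markers and the newline after the opening fence out of the result.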