Regex problem searching through multiline text copied with pyperclip
Question:
Something strange happens when I search with a regex through the result of pyperclip.paste() if the search expression involves \n, the newline character.
Excuse my English.
When I run the search against a triple-quoted string assigned to a text variable:
import re
text = '''
This as the line 1
This as the line 2
'''
pattern = re.compile(r'\d\n\w+')
result = pattern.findall(text)
print(result)
It actually matches across the newline character \n, which is what I want, or at least what I expect.
»»» ['1\nThis']
But the problem starts when the string to search comes from text copied from the clipboard.
This as the line 1
This as the line 2
Say I just select that text, copy it to the clipboard, and want the regex to extract the same output as before.
This time I need to use the pyperclip module.
So, setting aside the previous code, I write this instead:
import re, pyperclip
text = pyperclip.paste()
pattern = re.compile(r'\d\n\w+')
result = pattern.findall(text)
print(result)
This is the result:
»»» []
Nothing but two brackets. I discovered (in my inexperience) that the problem is caused by the \n character. And it has nothing to do with a conflict with Python's own \n escape, because the r prefix (raw string) avoids that.
I already found a solution for this, though not a very clear one (at least to me, because I'm only at the basics of Python right now).
import re, pyperclip
text = pyperclip.paste()
lines = text.split('\n')
spam = ''
for i in lines:
    spam = spam + i
pattern = re.compile(r'\d\r\w+')
result = pattern.findall(spam)
print(result)
Note that instead of \n to detect new lines in the last regex, I opted for \r (\n would cause the same bad behavior, printing only brackets).
\r is interchangeable with \s here; the output works, but:
»»» ['1\rThis']
With \r instead of \n.
At least it was a little victory for me.
It would help me a lot if you could explain a better solution for this, or at least help me understand why this happened. You could also recommend some concepts to investigate so I can fully understand this.
Answers:
The reason you are getting the \r when pasting is that you are pasting on a Windows machine. On Windows, a newline is represented by \r\n. Note that \s is different from \r. \s matches any whitespace character; \r is only the carriage return character.
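To make the difference between \r and \s concrete, here is a small sketch. The sample string is made up for illustration; it just contains a digit, a Windows line ending, and a word.

```python
import re

# Hypothetical sample: a digit, a Windows line ending (\r\n), then a word.
sample = "1\r\nThis"

# \r matches only the carriage return character.
only_cr = re.findall(r"\r", sample)

# \s matches any whitespace character, which includes both \r and \n.
any_ws = re.findall(r"\s", sample)

print(only_cr)  # ['\r']
print(any_ws)   # ['\r', '\n']
```

This is why \s appeared to be interchangeable with \r in the workaround: after splitting on \n, the only whitespace left between the digit and the next word was the \r.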
The text:
This as the line 1
This as the line 2
actually looks like:
This as the line 1\r\n
This as the line 2\r\n
on a Windows machine.
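One way to see these hidden characters yourself is to print the repr() of the string instead of the string itself. In this sketch the pasted variable is a hard-coded stand-in for what pyperclip.paste() is assumed to return on Windows, so it runs without touching the clipboard.

```python
# Stand-in for pyperclip.paste() on Windows (an assumption for illustration).
pasted = "This as the line 1\r\nThis as the line 2\r\n"

# Printing the string directly hides the line-ending characters.
print(pasted)

# repr() shows every escape sequence, so \r\n becomes visible.
visible = repr(pasted)
print(visible)
```

With real clipboard contents, print(repr(pyperclip.paste())) should show in the same way whether the text carries \r\n or plain \n.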
In the regex, the \d\r matches up to the end of the first line (1\r), but then the \w+ doesn't match the \n. You need to edit your first regex to be:
pattern = re.compile(r'\d\r\n\w+')
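If the same script might also run where the clipboard text uses plain \n, a possible variation (not part of the fix above, just a sketch) is to make the \r optional with \r?, or to normalize the line endings before searching. The helper names and sample strings below are hypothetical.

```python
import re

# \r?\n accepts both Windows (\r\n) and Unix (\n) line endings.
PATTERN = re.compile(r"\d\r?\n\w+")


def find_line_crossings(text):
    """Return matches of a digit, a line ending, and the following word."""
    return PATTERN.findall(text)


def normalize_newlines(text):
    """Convert Windows line endings to plain \\n."""
    return text.replace("\r\n", "\n")


# Hard-coded stand-ins for clipboard text (assumptions for illustration).
windows_text = "This as the line 1\r\nThis as the line 2\r\n"
unix_text = "This as the line 1\nThis as the line 2\n"

print(find_line_crossings(windows_text))  # ['1\r\nThis']
print(find_line_crossings(unix_text))     # ['1\nThis']

# After normalizing, the original \d\n\w+ pattern works unchanged.
print(re.findall(r"\d\n\w+", normalize_newlines(windows_text)))  # ['1\nThis']
```

Normalizing first keeps the regex identical to the one used with the triple-quoted string, at the cost of one extra pass over the text.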