Regex problem searching through multiline text copied with pyperclip

Question:

Something strange happens when I run a regex search on the text returned by pyperclip.paste() and the search expression involves a \n newline character.

Excuse my English.

When I run the search on a triple-quoted string assigned to a text variable:

import re

text = '''
This as the line 1
This as the line 2
'''

pattern = re.compile(r'\d\n\w+')
result = pattern.findall(text)
print(result)

It prints the match including the newline character \n, which is what I want, or at least what I expect.

>>> ['1\nThis']

But the problem starts when the string to search comes from text copied from the clipboard.

This as the line 1
This as the line 2

Say I select that text, copy it to the clipboard, and want the regex to extract the same output as before.
This time I need to use the pyperclip module.

So I drop the previous code and write this instead:

import re, pyperclip

text = pyperclip.paste()

pattern = re.compile(r'\d\n\w+')
result = pattern.findall(text)
print(result)

This is the result:

>>> []

Nothing but two brackets. I discovered (in my inexperience) that the \n character is what causes this. It has nothing to do with a conflict with Python's own \n escape, because the r prefix (raw string) avoids that.

I already found a solution, although it is not a very clear one (to me at least, because I am only starting with the basics of Python).

import re, pyperclip

text = pyperclip.paste()
lines = text.split('\n')
spam = ''

for i in lines:
    spam = spam + i

pattern = re.compile(r'\d\r\w+')
result = pattern.findall(spam)
print(result)

Note that instead of \n to detect new lines in the last regex, I opted for \r (\n would cause the same bad behavior, printing only brackets).
\r is interchangeable with \s here. The search now finds a match, but the output is:

>>> ['1\rThis']

With \r instead of \n.

At least it was a little victory for me.

It would help me a lot if you could explain a better solution for this, or at least why this happens. You could also recommend some concepts to study for a fuller understanding of this.

Asked By: serranomorante

||

Answers:

The reason you are getting the \r when pasting is that you are pasting on a Windows machine. On Windows, a newline is represented by \r\n. Note that \s is different from \r: \s matches any whitespace character, while \r matches only the carriage return character.
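The difference between \s and \r can be checked directly. This is a minimal sketch that uses a hypothetical sample string in place of the real clipboard contents:

```python
import re

# Hypothetical sample standing in for text pasted on Windows.
sample = "1\r\nThis"

# \r matches only the carriage return character.
cr_only = re.findall(r"\r", sample)

# \s matches any whitespace character, so it finds both \r and \n.
any_ws = re.findall(r"\s", sample)

print(cr_only)  # ['\r']
print(any_ws)   # ['\r', '\n']
```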

The text:

This as the line 1
This as the line 2

actually looks like:

This as the line 1\r\n
This as the line 2\r\n

on a Windows machine.
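One way to see these hidden characters is to print the repr() of the string rather than the string itself. The sketch below uses a hard-coded string as a stand-in for what pyperclip.paste() would presumably return on Windows:

```python
# Hypothetical stand-in for the value returned by pyperclip.paste() on Windows.
text = "This as the line 1\r\nThis as the line 2\r\n"

# print(text) renders the line breaks invisibly;
# repr() shows them as escape sequences instead.
visible = repr(text)
print(visible)  # 'This as the line 1\r\nThis as the line 2\r\n'
```

With the real clipboard, the same check would be print(repr(pyperclip.paste())).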

In the regex, the \d\r matches up to the end of the first line (1\r), but then the \w+ doesn't match the \n that follows. You need to edit your first regex to be:

pattern = re.compile(r'\d\r\n\w+')
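If the same script may also receive text with Unix line endings, one option is to make the carriage return optional with \r?. The sketch below illustrates that idea under stated assumptions: find_matches is a hypothetical helper name, and the two sample strings stand in for pyperclip.paste() on each platform.

```python
import re

# \r? makes the carriage return optional, so the pattern accepts
# both Windows (\r\n) and Unix (\n) line endings.
pattern = re.compile(r"\d\r?\n\w+")


def find_matches(text):
    """Hypothetical helper: find digit + line break + word sequences."""
    return pattern.findall(text)


# Stand-ins for clipboard contents on each platform.
windows_text = "This as the line 1\r\nThis as the line 2\r\n"
unix_text = "This as the line 1\nThis as the line 2\n"

print(find_matches(windows_text))  # ['1\r\nThis']
print(find_matches(unix_text))     # ['1\nThis']
```

Another possibility is to normalize the text first with text.replace('\r\n', '\n') and keep the original \d\n\w+ pattern unchanged.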

Source: Do line endings differ between Windows and Linux?

Answered By: Tyler Marshall