Python regex match space only
Question:
In Python 3, how do I match exactly a whitespace character, and not a newline \n or a tab \t?
I've seen the \s+[^\n] answer from the "Regex match space not \n" question, but it does not work for the following example:
a='rasd\nsa sd'
print(re.search(r'\s+[^ \n]',a))
The result is <_sre.SRE_Match object; span=(4, 6), match='\ns'>, which means the newline was matched.
Answers:
If you want to match 1 or more whitespace chars except the newline and the tab, use
r"[^\S\n\t]+"
The [^\S] matches any char that is not a non-whitespace char, i.e. any whitespace char. Because the character class is negated, any characters you add to it are excluded from matching as well.
import re
a='rasd\nsa sd'
print(re.findall(r'[^\S\n\t]+',a))
# => [' ']
Some more considerations: \s matches [ \t\n\r\f\v] if the ASCII flag is used. So, if you plan to match only ASCII, you might as well use [ \r\f\v] to exclude the chars you want. If you need to work with Unicode strings, the solution above is a viable one.
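To illustrate the ASCII versus Unicode point, here is a minimal sketch. The sample string and variable names are made up for illustration; it assumes the input contains a no-break space (U+00A0), which Python treats as whitespace in str patterns unless re.ASCII is set.

```python
import re

# Hypothetical sample: no-break space, plain space, tab, newline.
text = "a\u00a0b c\td\ne"

# Default (Unicode) matching: \S excludes U+00A0, so the negated class matches it.
unicode_matches = re.findall(r"[^\S\n\t]+", text)

# With re.ASCII, U+00A0 no longer counts as whitespace, so only the space matches.
ascii_matches = re.findall(r"[^\S\n\t]+", text, flags=re.ASCII)

# Explicit ASCII-only class: whitespace chars minus \n and \t.
explicit_matches = re.findall(r"[ \r\f\v]+", text)

print(unicode_matches, ascii_matches, explicit_matches)
```

Neither the tab nor the newline is matched in any of the three cases; the variants differ only in how they treat non-ASCII whitespace.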
No need for special groups. Just create a regex with a space character. The space character does not have any special meaning, it just means “match a space”.
RE = re.compile(' +')
So for your case
a='rasd\nsa sd'
print(re.search(' +', a))
would give
<_sre.SRE_Match object; span=(7, 8), match=' '>
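As a rough comparison of the two approaches, the sketch below runs both patterns on the question's string and on a second, made-up string containing a carriage return. The second string is an assumption added for illustration, not part of the question.

```python
import re

a = 'rasd\nsa sd'

# The literal-space pattern finds the single space at index 7.
space_span = re.search(' +', a).span()

# Hypothetical input with a carriage return between two spaces.
b = 'x \r y'

# ' +' matches only literal spaces, so the \r splits the run in two.
literal_runs = re.findall(' +', b)

# The negated class also accepts \r (and \f, \v), so it matches one run.
class_runs = re.findall(r'[^\S\n\t]+', b)

print(space_span, literal_runs, class_runs)
```

If only the space character itself should ever match, the literal pattern is the stricter choice; the negated class is broader and excludes only the characters listed in it.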