How can I use the \b boundary around special characters?
Question:
\b✅\b does not match a single emoji: ‘✅’.
\b\u2B07\b does not match: ‘⬇️’.
\b-\b does not match ‘-’.
\bfoo\b certainly matches ‘foo’.
Why does that happen, and what’s an alternative to ensure my emoji or any special character is not in the middle of a string?
playground: https://regex101.com/r/jRaQuJ/2
Edit: For the record, I think this question is still useful even if it is somewhat duplicated. The first marked duplicate is a specific and verbose question, while this one is simple, short, and easy to find. The second duplicate is just the definition of the \b boundary, and someone with my problem would probably need something more specific.
Answers:
You can use the pattern:
(?<!\w)✅(?!\w)
This uses negative lookarounds to match an emoji with no word characters on either side.
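As a minimal sketch of how that pattern could be applied to any special character, here is a Python version. The helper name standalone is hypothetical (it is not part of the re module), and re.escape is used so that characters such as - are treated literally.

```python
import re

def standalone(token: str) -> re.Pattern:
    """Build a pattern matching `token` only when no word character touches it.

    `standalone` is a hypothetical helper name, not part of the `re` module.
    """
    # re.escape makes regex metacharacters in the token literal.
    return re.compile(rf"(?<!\w){re.escape(token)}(?!\w)")

check = standalone("✅")
print(bool(check.search("done ✅")))   # emoji preceded by a space -> True
print(bool(check.search("✅")))        # emoji alone in the string -> True
print(bool(check.search("foo✅bar")))  # emoji wedged between letters -> False
```

The same helper works for the hyphen case: standalone("-") finds "-" on its own but not the hyphen in "a-b".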
The reason for the behavior you asked about is that \b is a zero-width boundary where one side of the boundary is \w (a word character, or [0-9A-Za-z_]) and the other is the beginning or end of the string or \W (a non-word character).
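One way to check that definition is to spell \b out with lookarounds and compare where each version matches. The sketch below assumes Python's re module; EXPLICIT_BOUNDARY and boundary_positions are illustrative names. Note that Python's \w is Unicode-aware by default and so covers more than [0-9A-Za-z_] unless re.ASCII is passed, although that makes no difference for these samples.

```python
import re

# \b spelled out with lookarounds: a word character on exactly one side.
EXPLICIT_BOUNDARY = r"(?:(?<=\w)(?!\w)|(?<!\w)(?=\w))"

def boundary_positions(pattern: str, text: str) -> list[int]:
    """Return every index in `text` where the zero-width `pattern` matches."""
    return [m.start() for m in re.finditer(pattern, text)]

for sample in ["foo.", "foobar", "-", "✅"]:
    builtin = boundary_positions(r"\b", sample)
    explicit = boundary_positions(EXPLICIT_BOUNDARY, sample)
    # Both lists should be identical for every sample.
    print(repr(sample), builtin, explicit)
```

For "-" and "✅" both lists come back empty: a string with no word characters contains no boundary at all, so any pattern that requires \b cannot match it.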
For example, consider the string "foo.":
start of string boundary (zero width)
|
|   non-word character
|   |
v   v
 foo.
 ^ ^
 | |
 word characters
The \b boundary could be used in the regex \bfoo\b and find a match thanks to the boundary between the o and . characters and the boundary between the beginning of the string and the character f.

"foobar" does not match \bfoo\b because the second o and the letter b don’t satisfy the boundary condition; that is, the letter b isn’t a non-word character or the end of the string.
The pattern \b-\b does not match the string "-" because "-" isn’t a word character. Likewise, emojis are built from non-word characters, so they don’t respond to the boundary the way a word character does, as is the case with \bfoo\b.
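The cases from the question can be reproduced with a short Python sketch (assuming the re module; the cases and results names are just for illustration). The extra "a-b" case shows that \b-\b only matches when the hyphen sits between word characters, which is the opposite of what the question wants.

```python
import re

# (pattern, text) pairs taken from the question, plus "a-b" for contrast.
cases = [
    (r"\bfoo\b", "foo."),    # boundaries at start of string and before "."
    (r"\bfoo\b", "foobar"),  # no boundary between "o" and "b"
    (r"\b-\b", "-"),         # no word character on either side of "-"
    (r"\b-\b", "a-b"),       # word characters on both sides of "-"
    (r"\b✅\b", "✅"),        # the emoji is not a word character
    (r"\b\u2B07\b", "⬇️"),   # same reason as the previous case
]

results = [bool(re.search(pattern, text)) for pattern, text in cases]
for (pattern, text), matched in zip(cases, results):
    print(f"{pattern!r} on {text!r}: {matched}")
```

Only the first and fourth cases report a match.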