How can I use the \b boundary around special characters?

Question:

\b✅\b does not match a single emoji: ‘✅’.

\b\u2B07\b does not match: ‘⬇️’.

\b-\b does not match ‘-’.

\bfoo\b certainly matches ‘foo’.
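A minimal Python sketch that reproduces these four observations with the standard re module. It assumes Python 3.7+. The linked playground may use a different regex flavor, so escape syntax such as \u2B07 can differ there.

```python
import re

# Each pair is (pattern, subject string) taken from the question above.
cases = [
    (r"\b✅\b", "✅"),
    (r"\b\u2B07\b", "\u2B07\uFE0F"),  # '⬇️' is U+2B07 followed by the variation selector U+FE0F
    (r"\b-\b", "-"),
    (r"\bfoo\b", "foo"),
]

for pattern, subject in cases:
    match = re.search(pattern, subject)
    print(f"{pattern!r} on {subject!r}: {'match' if match else 'no match'}")
```

Only the last case matches, which is the behavior the question describes.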

Why does that happen, and what’s an alternative to ensure my emoji or any other special character is not in the middle of a string?

playground: https://regex101.com/r/jRaQuJ/2

Edit: For the record, I think this question should stay open because it’s still useful, even if it is somewhat of a duplicate. The first duplicate marked is a specific and verbose question, while this one is simple, short, and easy to find. The second duplicate is just the definition of the \b boundary, and someone with my problem would probably need something more specific.

Asked By: Leonardo Rick


Answers:

You can use the pattern:

(?<!\w)✅(?!\w)

This uses negative lookarounds to match an emoji with no word characters on either side.
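A minimal Python sketch of the lookaround approach. The helper name standalone is hypothetical. It wraps re.escape so the same idea works for any special character, not just ✅.

```python
import re

def standalone(token: str) -> re.Pattern:
    """Hypothetical helper: match `token` only when no word character touches it."""
    # (?<!\w) -> the previous character (if any) is not a word character
    # (?!\w)  -> the next character (if any) is not a word character
    # re.escape makes metacharacters such as '-', '.', or '+' literal.
    return re.compile(r"(?<!\w)" + re.escape(token) + r"(?!\w)")

check = standalone("✅")
for text in ["✅", "done ✅", "✅!", "a✅", "✅b"]:
    print(repr(text), bool(check.search(text)))
```

Unlike \b, the negative lookarounds also succeed at the start and end of the string, because a missing neighbor is not a word character. Note that Python's \w is Unicode-aware by default, so a letter such as é next to the token also blocks the match.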

The reason for the results you asked about is that \b is a zero-width assertion that matches at a position where one side is \w (a word character, i.e. [0-9A-Za-z_]; some Unicode-aware engines also count letters and digits from other scripts) and the other side is the beginning or end of the string or \W (a non-word character).
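To make the definition concrete, here is a sketch that computes boundary positions by hand from that rule and compares them with the positions Python's re reports for \b. It passes re.ASCII so that \w is exactly [0-9A-Za-z_]. The function names is_word and boundary_positions are hypothetical.

```python
import re

def is_word(ch):
    # ASCII word character: [0-9A-Za-z_]
    return ch.isascii() and (ch.isalnum() or ch == "_")

def boundary_positions(s):
    """Positions i in 0..len(s) where exactly one side is a word character."""
    positions = []
    for i in range(len(s) + 1):
        left = i > 0 and is_word(s[i - 1])    # start of string counts as non-word
        right = i < len(s) and is_word(s[i])  # end of string counts as non-word
        if left != right:
            positions.append(i)
    return positions

for s in ["foo.", "foobar", "-", "✅", "a-b"]:
    from_re = [m.start() for m in re.finditer(r"\b", s, re.ASCII)]
    print(repr(s), boundary_positions(s), from_re)
```

For "-" and "✅" both lists are empty: neither position has a word character on either side.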

For example, consider the string "foo.":

start of string boundary (zero width)
     |
     |   non-word character
     |   |
     v   v
      foo.
      ^ ^
      | |
word characters

The regex \bfoo\b finds a match in this string thanks to two boundaries: the one between the beginning of the string and the character f, and the one between the final o and the . character.

"foobar" does not match \bfoo\b because there is no boundary between the second o and the following b: both are word characters, so the b is neither a non-word character nor the end of the string.
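The strings from this example can be checked with Python's re module. A minimal sketch (the extra "x.foo" case is an illustration added here, showing a boundary supplied by a preceding non-word character):

```python
import re

pattern = re.compile(r"\bfoo\b")

# "foo.": boundary at the start of the string and between "o" and "."
m = pattern.search("foo.")
print(m.span() if m else None)

# "foobar": "o" and "b" are both word characters, so no boundary after "foo"
print(pattern.search("foobar"))

# "x.foo": the "." before "f" and the end of the string provide the boundaries
m = pattern.search("x.foo")
print(m.span() if m else None)
```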

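The pattern \b-\b does not match the string "-" because "-" isn't a word character: at both the start and the end of that string, neither side of the position is a word character, so there is no boundary to match. Likewise, emojis are made of non-word characters, so \b does not attach to them the way it attaches to a word such as foo in \bfoo\b.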
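The flip side is worth seeing. Because \b needs a word character on exactly one side, wrapping a non-word character in \b only matches when that character sits between word characters, which is the opposite of what the question wants. A short Python sketch:

```python
import re

# \b-\b needs a word character immediately before and after the hyphen
print(re.search(r"\b-\b", "-"))      # alone: no boundary on either side
print(re.search(r"\b-\b", "a-b"))    # between letters: matches
print(re.search(r"\b-\b", "a - b"))  # spaces around it: no match

# The emoji behaves the same way
print(re.search(r"\b✅\b", "✅"))
print(re.search(r"\b✅\b", "x✅y"))
```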

Answered By: ggorlen