Regex to match a condition UNLESS it is a hashtag

Question:

I am trying to write a regex to remove digits, or words that contain digits, but only if they are not part of a hashtag. I am able to successfully match words that have digits in them, but cannot seem to write a condition that ignores words that begin with a hashtag.

Here is a test string that I have been using to try and find a solution:

happening bit mediacon #2022ppopcon wearing stell naman today #sb19official 123 because h3llo also12 or 23old

I need a regex command that will capture the 123, h3llo, also12 and 23old but ignore the #2022ppopcon and #sb19official strings.

I have tried the following regex statements.

(#\w+\d+\w*)|(\w+\d+\w*)
This successfully captures the hashtags in group 1 and the non-hashtags in group 2, but I cannot figure out how to make it select group 2 only.

(?<!#)\w*\d+\w*
This skips only the first character after the #, but still captures all the remaining characters of the hashtag. For example, in the string #2022ppopcon, it skips #2 and captures 022ppopcon.

Asked By: Waleed Alfaris


Answers:

You might use

(?<!\S)[^\W\d]*\d\w*
  • (?<!\S) Assert a whitespace boundary to the left (start of string or a whitespace character)
  • [^\W\d]* Match optional word chars except a digit
  • \d Match a single digit, so the match contains at least one digit
  • \w* Match optional word chars

See a regex demo.
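
As a minimal sketch of how this pattern behaves on the test string from the question, assuming Python's re module (the question does not name a language, so the variable names and the whitespace cleanup step are illustrative only):

```python
import re

# Test string from the question.
text = ("happening bit mediacon #2022ppopcon wearing stell naman today "
        "#sb19official 123 because h3llo also12 or 23old")

# Whitespace boundary on the left, optional non-digit word chars,
# one digit, then any remaining word chars.
pattern = re.compile(r"(?<!\S)[^\W\d]*\d\w*")

matches = pattern.findall(text)
print(matches)  # ['123', 'h3llo', 'also12', '23old']

# Remove the matches, then collapse the leftover runs of spaces
# (the cleanup step is an assumption about the desired output).
cleaned = " ".join(pattern.sub("", text).split())
print(cleaned)
```

The hashtags are skipped because # is a non-whitespace character that cannot be consumed by the pattern, and every position after it fails the (?<!\S) assertion.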

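If you want to allow a partial match (for example, a word that directly follows punctuation instead of whitespace), you can use a negative lookbehind to assert that there is no # directly to the left, followed by a word boundary:

(?<!#)\b[^\W\d]*\d\w*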

See another regex demo.
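
The difference between the two patterns can be sketched as follows, again assuming Python's re module. The sample string is hypothetical and was chosen so that some digit-containing words directly follow punctuation:

```python
import re

# Strict variant: the match must start at a whitespace boundary.
strict = re.compile(r"(?<!\S)[^\W\d]*\d\w*")
# Partial variant: the match must start at a word boundary not preceded by #.
partial = re.compile(r"(?<!#)\b[^\W\d]*\d\w*")

# Hypothetical sample with words attached to punctuation.
sample = "id-42 (h3llo) #2022ppopcon 23old"

strict_matches = strict.findall(sample)
partial_matches = partial.findall(sample)

print(strict_matches)   # ['23old']
print(partial_matches)  # ['42', 'h3llo', '23old']
```

In both cases the hashtag is left alone. The partial variant additionally picks up 42 and h3llo, because a word boundary is enough for the match to start there.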

Answered By: The fourth bird