Regex to match a condition UNLESS it is a hashtag
Question:
I am trying to write a regex to remove digits, or words that contain digits, only if they are not part of a hashtag. I am able to successfully match words that have digits in them, but cannot seem to write a condition that ignores words that begin with a hashtag.
Here is a test string that I have been using to try and find a solution:
happening bit mediacon #2022ppopcon wearing stell naman today #sb19official 123 because h3llo also12 or 23old
I need a regex command that will capture the 123, h3llo, also12 and 23old but ignore the #2022ppopcon and #sb19official strings.
I have tried the following regex statements.
(#\w+\d+\w*)|(\w+\d+\w*)
This successfully captures the hashtags in group 1 and the non-hashtags in group 2, but I cannot figure out how to make it select group 2 only.
(?<!#)\w*\d+\w*
This excludes the first character after the hashtag but still captures all the remaining characters in the hashtag string. For example, in the string #2022ppopcon, it skips #2 and captures 022ppopcon.
Answers:
You might use
(?<!\S)[^\W\d]*\d\w*
(?<!\S)
Assert a whitespace boundary to the left
[^\W\d]*
Match optional word characters except digits
\d
Match a single digit, so the word contains at least one
\w*
Match optional word characters
See a regex demo.
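As a rough illustration, here is how that pattern could be applied in Python to the test string from the question. This is only a sketch: the question does not say which language or regex engine is in use, so Python's re module and the variable names below are assumptions.

```python
import re

# Pattern from the answer above, with the backslashes written out.
PATTERN = re.compile(r"(?<!\S)[^\W\d]*\d\w*")

text = (
    "happening bit mediacon #2022ppopcon wearing stell naman today "
    "#sb19official 123 because h3llo also12 or 23old"
)

# List every digit-containing word that is not a hashtag.
matches = PATTERN.findall(text)
print(matches)  # ['123', 'h3llo', 'also12', '23old']

# Remove those words, then collapse the leftover runs of whitespace.
cleaned = " ".join(PATTERN.sub("", text).split())
print(cleaned)
```

The hashtags survive because the character to the left of their first word character is #, which is not whitespace, so the lookbehind rejects every starting position inside them.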
If you want to allow a partial match, you can use a negative lookbehind to assert that there is no # directly to the left,
followed by a word boundary:
(?<!#)\b[^\W\d]*\d\w*
See another regex demo.
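To make the difference between the two patterns concrete, the sketch below runs both against a made-up sample in which some digit-containing words touch punctuation. It again assumes Python's re module; the sample string and the constant names are hypothetical and do not come from the question.

```python
import re

# Whitespace-bounded version: the word must start the string or follow whitespace.
WHITESPACE_BOUNDED = re.compile(r"(?<!\S)[^\W\d]*\d\w*")

# Partial-match version: any word boundary qualifies, as long as no # precedes it.
PARTIAL = re.compile(r"(?<!#)\b[^\W\d]*\d\w*")

# Hypothetical sample: a hashtag, a number in parentheses, a hyphenated word.
sample = "#sb19official (123) foo-h3llo 23old"

strict_matches = WHITESPACE_BOUNDED.findall(sample)
partial_matches = PARTIAL.findall(sample)

print(strict_matches)   # ['23old']
print(partial_matches)  # ['123', 'h3llo', '23old']
```

Both versions leave the hashtag alone. The second one also picks up 123 and h3llo, because a word boundary after ( or - is enough for it, whereas the first requires whitespace or the start of the string to the left.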