Find the first/last n words of a string with a maximum of 20 characters using regex

Question

I’m trying to match as many whole words as fit within 20 characters at the beginning or end of a string.

This is what I have right now:

import re

s1 = "Hello,    World! This is a reallly long string"
match = re.search(r"^(\b.{0,20}\b)", s1)
print(f"'{match.group(0)}'")  # 'Hello,    World! '

My problem is the extra space that it adds at the end. I believe this is because \b matches at a word boundary, i.e. at either the beginning or the end of a word, so it can match right after a space, but I’m not sure what to do about it.
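In Python's re module, \b matches the position between a word character and a non-word character (or the start/end of the string next to a word character). A small sketch, using only the string from the question, that lists where \b matches and shows that the question's first match ends at position 17, right after the space and just before "This":

```python
import re

s1 = "Hello,    World! This is a reallly long string"

# Every position where \b matches (start and end of each word).
boundaries = [m.start() for m in re.finditer(r"\b", s1)]
print(boundaries[:6])  # [0, 5, 10, 15, 17, 21]

# The question's pattern: the final \b is satisfied at position 17,
# which sits between the space after "World!" and the "T" of "This".
match = re.search(r"^(\b.{0,20}\b)", s1)
print(match.end(), repr(s1[match.end() - 1]))  # 17 ' '
```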

I run into the same issue when I try to match the end of the string, except that there I get a leading space instead:

s1 = "Hello,    World! This is a reallly long string"
match = re.search(r"(\b.{0,20}\b)$", s1)
print(f"'{match.group(0)}'") # ' reallly long string'

I know I can just use rstrip() and lstrip() to get rid of the trailing/leading whitespace, but I was wondering if there’s a way to do it with regex.
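For comparison, the rstrip()/lstrip() workaround mentioned here could look like the sketch below, which reuses the question's two patterns unchanged and strips afterwards. Stripping can only shorten a match, so the 20-character limit still holds; note that it keeps the "!" after "World".

```python
import re

s1 = "Hello,    World! This is a reallly long string"

# Keep the question's patterns and strip the stray space afterwards.
first = re.search(r"^(\b.{0,20}\b)", s1).group(0).rstrip()
last = re.search(r"(\b.{0,20}\b)$", s1).group(0).lstrip()
print(repr(first), repr(last))  # 'Hello,    World!' 'reallly long string'
```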

Asked By: NOT

Answer 1

You can use r"^(.{0,19}\S\b|)", where \S ensures there is a non-space character at the boundary. You need to decrease the number of preceding characters to 19 (the \S takes up the 20th) and add an empty alternative with | so that zero characters can match if needed:

import re
s1 = "Hello,    World! This is a reallly long string"
match = re.search(r"^(.{0,19}\S\b|)", s1)
print(f"'{match.group(0)}'", len(match.group(0)))

Output:

'Hello,    World' 15
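To make the length arithmetic concrete: .{0,19} matches at most 19 characters and \S exactly one more, so a non-empty match is at most 20 characters long and always ends on a non-space character. A quick check of that claim on a few sample strings (all samples except the question's are made up for illustration):

```python
import re

# Answer 1's start-of-string pattern: .{0,19} plus the single \S caps the
# match at 20 characters, and \S\b forces it to end on a non-space character.
START = re.compile(r"^(.{0,19}\S\b|)")

samples = [
    "Hello,    World! This is a reallly long string",
    "short",
    "a b c d e f g h i j k l m n o",
    "X" * 21,  # a single word longer than the limit
]

heads = [START.search(s).group(0) for s in samples]
for head in heads:
    assert len(head) <= 20          # never longer than the limit
    assert head == head.rstrip()    # never ends with whitespace
    print(repr(head))
```

For these samples the printed matches are 'Hello,    World', 'short', 'a b c d e f g h i j' and ''.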

For the end of the string, use r"(|\b\S.{0,19})$":

import re
s1 = "Hello,    World! This is a reallly long string"
match = re.search(r"(|\b\S.{0,19})$", s1)
print(f"'{match.group(0)}'", len(match.group(0)))

Output:

'reallly long string' 19
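The same kind of check for the end-of-string pattern, where \b\S forces the match to begin on a non-space character sitting right at a word boundary (samples again partly made up):

```python
import re

# Answer 1's end-of-string pattern: \b\S starts the match on a non-space
# character at a word boundary; \S plus .{0,19} caps it at 20 characters.
END = re.compile(r"(|\b\S.{0,19})$")

samples = [
    "Hello,    World! This is a reallly long string",
    "short",
    "a b c d e f g h i j k l m n o",
    "X" * 21,  # a single word longer than the limit
]

tails = [END.search(s).group(0) for s in samples]
for tail in tails:
    assert len(tail) <= 20          # never longer than the limit
    assert tail == tail.lstrip()    # never starts with whitespace
    print(repr(tail))
```

Here the printed matches are 'reallly long string', 'short', 'f g h i j k l m n o' and ''.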

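Why `(...|)`?

To allow a zero-character match. Without the empty alternative, the example below would fail with ^(.{0,19}\S\b): re.search would return None, and calling .group(0) on it raises an AttributeError.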

import re
s1 = "X"*21
match = re.search(r"^(.{0,19}\S\b|)", s1)
print(f"'{match.group(0)}'", len(match.group(0)))

Output:

'' 0
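To see the failure case directly, here is a sketch comparing the pattern with and without the empty alternative on the same 21-character word:

```python
import re

s1 = "X" * 21

# Without the empty alternative nothing can match: a run of 21 word
# characters has no \b anywhere within its first 20 characters.
no_alt = re.search(r"^(.{0,19}\S\b)", s1)
print(no_alt)  # None

# With the empty alternative the match is simply '', so .group(0) is safe.
with_alt = re.search(r"^(.{0,19}\S\b|)", s1)
print(repr(with_alt.group(0)))  # ''
```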

Answered By: mozway

Answer 2

You may use this regex:

^\S.{0,18}\S\b|\b\S.{0,18}\S$

\S (a non-whitespace character) at the start and end guarantees that your matches start and end with a non-whitespace character.

Code:

import re

s = "Hello,    World! This is a reallly long string"

print(re.findall(r'^\S.{0,18}\S\b|\b\S.{0,18}\S$', s))
# ['Hello,    World', 'reallly long string']
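If the limit needs to vary, one option is to build this pattern from a number. The sketch below uses a hypothetical helper, edge_words, that is not part of the answer: the two \S anchors take one character each, so the middle quantifier gets limit - 2. Like the original pattern, every match it returns is at least two characters long.

```python
import re


def edge_words(s: str, limit: int = 20) -> list[str]:
    # Hypothetical helper (not part of the answer): Answer 2's pattern with
    # a configurable limit. Each \S anchor consumes one character, so the
    # middle .{0,n} may use at most limit - 2 characters.
    if limit < 2:
        raise ValueError("limit must be at least 2")
    inner = limit - 2
    pattern = rf"^\S.{{0,{inner}}}\S\b|\b\S.{{0,{inner}}}\S$"
    return re.findall(pattern, s)


s = "Hello,    World! This is a reallly long string"
print(edge_words(s))      # ['Hello,    World', 'reallly long string']
print(edge_words(s, 10))  # ['Hello', 'string']
```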

Answered By: anubhava
