Search for the start of a word using a regular expression

Question:

How do I write a regular expression that finds all words starting with a specified string? For example:

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"

Here I want to fetch all words that start with dr, ignoring case. Everything I tried matches dr anywhere inside a word, not only at the start of a word.

Thanks in advance.

Asked By: Madhur Rampal


Answers:

You can use \b to find word boundaries, and the re.IGNORECASE flag to search case-insensitively.

import re

a = "asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl"
# \b anchors the match at a word boundary, so "dr" inside "sasDRasas" is skipped
for match in re.finditer(r'\bdr', a, re.IGNORECASE):
    print('Found match: "{0}" at position {1}'.format(match.group(0), match.start()))

This will output:

Found match: "dr" at position 18
Found match: "DR" at position 28
Found match: "Dr" at position 40

Here, the pattern \bdr matches dr, but only if it is found at the start of a word. This will also yield matches for strings like driving. If you only want to find dr as a separate word, use \bdr\b.

I use re.finditer() to scan through the search string and yield every match for dr in a loop. The re.IGNORECASE flag causes dr to also match DR, Dr and dR.
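
To illustrate the difference between \bdr and \bdr\b, here is a minimal sketch. The sample sentence and the variable names are made up for this example, and the \w* suffix is an addition that captures the rest of each word rather than only the dr prefix:

```python
import re

# Made-up sample text: "address" and "sasDRasas" contain "dr" mid-word.
text = "Dr Smith was driving; the address read sasDRasas."

# \bdr\w* : words that START with "dr", returned in full
starts = re.findall(r'\bdr\w*', text, re.IGNORECASE)

# \bdr\b : "dr" only when it is a complete word by itself
whole = re.findall(r'\bdr\b', text, re.IGNORECASE)

print(starts)  # ['Dr', 'driving']
print(whole)   # ['Dr']
```

Neither pattern matches inside address or sasDRasas, because there is no word boundary directly before the d in those words.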

Answered By: Ferdinand Beyer

@Ferdinand Beyer’s answer shows how to do it by regex. But you can easily achieve that with string functions:

>>> import string
>>> a
'asasasa sasDRasas dr.klklkl DR.klklklkl Dr klklklkklkl'
>>> cleaned = "".join(" " if i in string.punctuation else i for i in a)
>>> cleaned
'asasasa sasDRasas dr klklkl DR klklklkl Dr klklklkklkl'
>>> [word for word in cleaned.split() if word.lower().startswith("dr")]
['dr', 'DR', 'Dr']

Answered By: utdemir

>>> import re
>>> string_to_search_in
'this a a dr.seuse dr.brown dr. oz dr noone'
>>> re.compile(r'\bdr\.?\s*\w+', re.IGNORECASE).findall(string_to_search_in)
['dr.seuse', 'dr.brown', 'dr. oz', 'dr noone']

Answered By: sampwing

Yet another solution.

This expression finds and returns every word in a string that either equals, or starts with, the text held in a string variable.

import re

txt = "this a a dr.seuse dr.brown dr. oz dr noone"
suggtxt= "dr."
# re.escape() makes the "." in suggtxt match a literal dot; \S* takes the rest of the word
w_regex = r"\b" + re.escape(suggtxt) + r"\S*"
x = re.findall(w_regex, txt,  re.IGNORECASE)
print(x)

Output:

['dr.seuse', 'dr.brown', 'dr.']
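
The same idea can be wrapped in a small helper. This is only a sketch: the function name words_starting_with is hypothetical, and it assumes the prefix begins with a letter, digit or underscore, since \b would not behave as intended in front of punctuation. It also shows why re.escape() matters when the prefix contains a dot:

```python
import re

def words_starting_with(prefix, text):
    """Hypothetical helper: return each run of non-space characters that starts with prefix."""
    pattern = r"\b" + re.escape(prefix) + r"\S*"
    return re.findall(pattern, text, re.IGNORECASE)

txt = "this a a dr.seuse dr.brown dr. oz dr noone"
with_dot = words_starting_with("dr.", txt)
without_dot = words_starting_with("dr", txt)
print(with_dot)     # ['dr.seuse', 'dr.brown', 'dr.']
print(without_dot)  # ['dr.seuse', 'dr.brown', 'dr.', 'dr']

# Without re.escape(), the "." matches any character, so "drone" slips through.
unescaped = re.findall(r"\bdr.\S*", "drone dr.who", re.IGNORECASE)
escaped = words_starting_with("dr.", "drone dr.who")
print(unescaped)  # ['drone', 'dr.who']
print(escaped)    # ['dr.who']
```
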
Answered By: sariDon