Regular expression in Python to find certain letters without letters to the left or right of them
Question:
I have data in string format where somewhere within the string, a size is given (s, m or l). I want to extract the size, but the formatting is a bit all over the place and you can have (s), /s, S , and all kinds of variations for the small size. I figured as long as the letter denoting size does not have a letter to the left or right, it should be the size I’m looking for, so for example, "es", "set" etc. would be substrings where the size "s" is not to be found, while " s, ", "/s " are substrings where the size (small in this case) can be found.
I have no idea how to do that with a regex; I googled but did not find any close matches to what I'm looking for. I'm also new-ish to regular expressions.
Examples:
"es", "setr", "qwfq qf", "lllll" Output: None
"e s", "(s)", "s, " Output: "s"
" m ", "qdqd m dqq ", "sssss m", "m lllll" Output: "m"
"l", "qddwfq l " Output: "l"
Answers:
You can use the following regex:
\b[smlSML]\b
\b = word boundary
[smlSML] = any one of the characters s, m, l, S, M, L
\b = word boundary
Code:
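import re

examples = ["es", "setr", "qwfq qf", "lllll", "e s", "(s)", "s, ", " m ", "qdqd m dqq ", "sssss m", "m lllll", "l", "qddwfq l "]
# \b is a word boundary, so the size letter must stand alone as its own word
p = re.compile(r'\b[smlSML]\b')
for ex in examples:
    result = p.search(ex)
    # search() returns None when nothing matches; otherwise take the matched letter
    if result is not None:
        result = result.group(0)
    print(f"Input = {ex:<11}- Output = {result}")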
Output:
Input = es         - Output = None
Input = setr       - Output = None
Input = qwfq qf    - Output = None
Input = lllll      - Output = None
Input = e s        - Output = s
Input = (s)        - Output = s
Input = s,         - Output = s
Input =  m         - Output = m
Input = qdqd m dqq - Output = m
Input = sssss m    - Output = m
Input = m lllll    - Output = m
Input = l          - Output = l
Input = qddwfq l   - Output = l
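One caveat: a word boundary treats digits and underscores as word characters, not just letters. If the data can contain something like "shirt_s" or "2m", a word-boundary pattern will not find the size there. The question asks only for "no letter to the left or right", which can be written more literally with lookarounds. The following is a minimal sketch under that assumption; the names SIZE_RE and find_size are made up for illustration, and lowercasing the result is an extra choice that the question does not ask for.

```python
import re

# Hypothetical alternative: a size letter with no ASCII letter directly
# before it (negative lookbehind) or after it (negative lookahead).
SIZE_RE = re.compile(r'(?<![A-Za-z])[smlSML](?![A-Za-z])')

def find_size(text):
    """Return the first standalone size letter in lowercase, or None."""
    match = SIZE_RE.search(text)
    return match.group(0).lower() if match else None

for ex in ["es", "(S)", "shirt_s", "2m"]:
    print(f"Input = {ex:<8}- Output = {find_size(ex)}")
```

Here "shirt_s" and "2m" give "s" and "m", whereas the word-boundary pattern returns None for both. Which behaviour is right depends on whether a digit or underscore next to the letter should disqualify it in your data.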