Extract the part of a Python string before the underscore
Question:
I have a list of strings in Python that looks something like: ['AAA_X', 'BBB_X', 'CCC_X'].
How can I efficiently extract the part of the string before the underscore?
Thank you!
Answers:
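A regular expression can remove the underscore and everything after it. re.sub(r"_.*", "", s) replaces the first underscore and the rest of the string with an empty string, leaving only the prefix. With regular expressions you can match exactly the part you want to operate on.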
import re

for s in ["AAA_X", "BBB_X", "CCC_X"]:
    # Delete the first underscore and everything after it
    print(s, ">>", re.sub(r"_.*", "", s))
Output:
AAA_X >> AAA
BBB_X >> BBB
CCC_X >> CCC
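If you need the prefixes as a new list instead of printed lines, one option is to compile the pattern once and apply it in a list comprehension. This is a minimal sketch: the names PREFIX_SUFFIX and strip_suffixes are hypothetical, and re.DOTALL is added (an assumption beyond the original answer) so the suffix is removed even if it contains a newline. Python already caches recently used patterns internally, so compiling mainly makes the reuse explicit rather than guaranteeing a large speedup.

```python
import re

# Hypothetical name: the first underscore and everything after it.
# re.DOTALL lets "." also match newlines inside the suffix.
PREFIX_SUFFIX = re.compile(r"_.*", re.DOTALL)


def strip_suffixes(strings):
    """Return each string with its first underscore and the rest removed.

    Strings without an underscore are returned unchanged.
    """
    return [PREFIX_SUFFIX.sub("", s) for s in strings]


print(strip_suffixes(["AAA_X", "BBB_X", "CCC_X", "DDD"]))
```

Note that "DDD" passes through unchanged, because there is no underscore for the pattern to match.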
If you want to output the prefix only when the string actually contains an underscore, you can use the re.match() function instead. An input such as "DDD" has no prefix followed by an underscore, so re.match() returns None and nothing is printed for it.
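import re

for s in ["AAA_X", "BBB_X", "CCC_X", "DDD"]:
    # One or more non-underscore characters at the start, then "_".
    # The walrus operator (:=) requires Python 3.8+.
    if m := re.match(r"([^_]+)_", s):
        print(s, ">>", m.group(1))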
The above code produces the same output as the first example; "DDD" is simply skipped.
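The same re.match() idea can build a list directly. This is a minimal sketch, assuming Python 3.8+ for the walrus operator inside the comprehension. PREFIX and prefixes_with_underscore are hypothetical names. Strings without an underscore are dropped instead of being passed through.

```python
import re

# Hypothetical name: one or more non-underscore characters at the
# start of the string, captured, followed by an underscore.
PREFIX = re.compile(r"([^_]+)_")


def prefixes_with_underscore(strings):
    """Return the prefix of each string that contains an underscore.

    Strings with no match (for example "DDD") are skipped.
    """
    # The assignment expression binds m for use in the output expression.
    return [m.group(1) for s in strings if (m := PREFIX.match(s))]


print(prefixes_with_underscore(["AAA_X", "BBB_X", "CCC_X", "DDD"]))
```

Because [^_] excludes the underscore, the match always stops at the first underscore, so "A_B_C" yields "A".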
You don’t really need the re module for something so trivial. How about:
_list = ['AAA_X', 'BBB_X', 'CCC_X']
# split("_")[0] is everything before the first underscore
print(*(f'{e} >> {e.split("_")[0]}' for e in _list), sep='\n')
Output:
AAA_X >> AAA
BBB_X >> BBB
CCC_X >> CCC
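Staying with plain string methods, str.partition("_") stops at the first underscore instead of splitting the whole string, and it also tells you whether an underscore was present. This is a minimal sketch; both function names are hypothetical.

```python
def prefixes(strings):
    """Return the part of each string before its first underscore."""
    # partition() returns (head, sep, tail); head is the whole string
    # when "_" does not occur.
    return [s.partition("_")[0] for s in strings]


def prefixes_only_if_underscore(strings):
    """Like prefixes(), but skip strings that contain no underscore."""
    parts = (s.partition("_") for s in strings)
    # sep is "" when no underscore was found.
    return [head for head, sep, _tail in parts if sep]


print(prefixes(["AAA_X", "BBB_X", "CCC_X"]))
print(prefixes_only_if_underscore(["AAA_X", "BBB_X", "CCC_X", "DDD"]))
```

s.split("_", 1)[0] gives the same prefix as s.partition("_")[0]. The maxsplit argument of 1 keeps it from splitting the rest of the string.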