How can I remove characters after underscore?

Question:

I need to convert a text like

photo_id 102297_skdjksd223238 text black dog in a water

to

photo_id 102297 text black dog in a water

by removing the underscore in the ID and everything after it.

with open("text.txt", "r") as inputFile, open("result", "w") as exportFile:
    sub_str = "_"
    for line in inputFile:
        # index() returns the position of the FIRST "_" in the line
        new_line = line[:line.index(sub_str) + len(sub_str)]
        exportFile.write(new_line)

but it never reaches the second underscore: line.index() finds the first one, in photo_id, so everything after photo_ is removed.

Asked By: user20133451


Answers:

You might use a pattern that captures the leading digits, to make it a bit more specific, and then matches the underscore followed by optional non-whitespace characters.

In the replacement, use the first capture group.

\b(\d+)_\S*

Explanation

  • \b A word boundary to prevent a partial word match
  • (\d+) Capture group 1, match 1+ digits
  • _\S* Match an underscore and optional non-whitespace characters
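To see why the word boundary \b matters, the sketch below runs the pattern with and without it on a token where the digits are glued to letters. The input string is made up for illustration; it is not from the question.

```python
import re

# Hypothetical input: in "abc123_x" the digits do not start a word
s = "id abc123_x 42_y"

# With \b the digits must start a word, so "abc123_x" is left untouched
with_boundary = re.sub(r"\b(\d+)_\S*", r"\1", s)

# Without \b the pattern also matches "123_x" inside "abc123_x"
without_boundary = re.sub(r"(\d+)_\S*", r"\1", s)

print(with_boundary)     # id abc123_x 42
print(without_boundary)  # id abc123 42
```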

See a regex101 demo.

import re

pattern = r"\b(\d+)_\S*"
s = "photo_id 102297_skdjksd223238 text black dog in a water"

# Replace each match with capture group 1 (the digits only)
result = re.sub(pattern, r"\1", s)

if result:
    print(result)

Output

photo_id 102297 text black dog in a water
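To apply this to the file from the question, a minimal sketch runs the same substitution on every line. The helper names strip_id_suffix and convert_file are hypothetical; the file names text.txt and result come from the question. Writing each processed line with write() keeps its trailing newline, because \S* never consumes whitespace.

```python
import re

# Same pattern as in the answer above, compiled once for reuse
ID_SUFFIX = re.compile(r"\b(\d+)_\S*")


def strip_id_suffix(line: str) -> str:
    """Drop the underscore and the rest of every token that starts with digits."""
    return ID_SUFFIX.sub(r"\1", line)


def convert_file(src: str, dst: str) -> None:
    # Process the input line by line; each line keeps its own "\n"
    with open(src, "r") as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write(strip_id_suffix(line))


if __name__ == "__main__":
    # convert_file("text.txt", "result")  # file names from the question
    print(strip_id_suffix("photo_id 102297_skdjksd223238 text black dog in a water"))
```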

Another option is to include photo_id in the pattern and match until the first underscore after it:

\b(photo_id\s+[^_\s]+)_\S*

See another regex101 demo.
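A short sketch of that second pattern with re.sub, again replacing the match with group 1. The second input line is hypothetical; it shows that an unrelated underscore later in the line (foo_baz) is left alone, because the match is anchored to photo_id.

```python
import re

# Pattern anchored to the photo_id field
pattern = r"\b(photo_id\s+[^_\s]+)_\S*"

s = "photo_id 102297_skdjksd223238 text black dog in a water"
other = "photo_id 102297_skdjksd223238 text foo_baz"  # hypothetical line

result = re.sub(pattern, r"\1", s)
result_other = re.sub(pattern, r"\1", other)

print(result)        # photo_id 102297 text black dog in a water
print(result_other)  # photo_id 102297 text foo_baz
```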

Answered By: The fourth bird

Note: The question was tagged when I wrote this:

_[^\s]*
  • _ – a literal _
  • [^\s]* – (or \S* if supported) any character except whitespace, zero or more times

Substitute with an empty string.

Demo

import re

inp = 'photo_id 102297_skdjksd223238 text black dog in a water foo_baz bar'

res = re.sub(r'_[^\s]*', '', inp)

print(res)

Output

photo 102297 text black dog in a water foo bar
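
As the output shows, _[^\s]* also removes _id from photo_id and _baz from foo_baz. If only the suffix after the numeric ID should go, one variation (a lookbehind, not part of this answer) is to require a digit directly before the underscore:

```python
import re

inp = 'photo_id 102297_skdjksd223238 text black dog in a water foo_baz bar'

# (?<=\d) is a lookbehind: the underscore must directly follow a digit,
# so the underscores in "photo_id" and "foo_baz" are not matched
res = re.sub(r'(?<=\d)_\S*', '', inp)

print(res)  # photo_id 102297 text black dog in a water foo_baz bar
```
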
Answered By: Ted Lyngmo

You could split on the first underscore from the right (that is, the last underscore in the line):

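s = "photo_id 102297_skdjksd223238 text black dog in a water"
# Split once, at the last underscore in the string
prefix, suffix = s.rsplit('_', 1)
# Drop the rest of the ID (up to the first space) and rejoin
print(f"{prefix} {suffix.split(' ', 1)[-1]}")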

Out:

photo_id 102297 text black dog in a water
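
Because rsplit('_', 1) splits on the last underscore in the line, this only works when the description after the ID contains no underscore. A token-based sketch that touches only the value following photo_id avoids that; the function name strip_photo_id_suffix is hypothetical.

```python
def strip_photo_id_suffix(line: str) -> str:
    # Split on single spaces so join() restores the original spacing
    tokens = line.split(" ")
    for i, token in enumerate(tokens[:-1]):
        if token == "photo_id":
            # Keep only the part of the ID before its first underscore
            tokens[i + 1] = tokens[i + 1].partition("_")[0]
            break
    return " ".join(tokens)


print(strip_photo_id_suffix("photo_id 102297_skdjksd223238 text black dog in a water"))
# photo_id 102297 text black dog in a water
print(strip_photo_id_suffix("photo_id 102297_skdjksd223238 text snake_case dog"))
# photo_id 102297 text snake_case dog
```
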
Answered By: Maurice Meyer