How to remove a specific combination of letters from the end of every word in a dataframe column?

Question

I want to remove the letters br from the end of every word in my Pandas dataframe column (as you will see, the rows of this column are actually sentences, all different from one another).

Unfortunately, I had already cleaned the data without giving much thought to the <br> tags, so I am now left with words like ‘startbr,’ ‘nicebr,’ and ‘hellobr,’ which are of no use to me.

A dataframe row may look something like this (errors denoted by ** ** tags):

Sentence = here are **somebr** examples of poorly written paragraphs **andbr** well-written **paragraphsbr** on the same **topicbr** how do they compare?

What I would like (without the br at the end):

Sentence: here are **some** examples of poorly written paragraphs **and** well-written **paragraphs** on the same **topic** how do they compare?

I am hoping for an answer that will allow me to keep the original sentence, with only the letters br removed from the end of any word. Words like "brutish," "breathtaking," and "ember" should be kept as is, since they could be of value. Fortunately, there are no words ending in br that I would like to retain.

Asked By: merit

Answer 1

Use a regex with a word boundary (\b) to match "br" only at the end of words:

df['text'] = df['text'].str.replace(r'br\b', '', regex=True)

Example (with assignment as a new column text2):

                        text                  text2
0  word wordbr bread breadbr  word word bread bread

Answered By: mozway
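
As a quick sanity check, the pattern can be tried on the words from the question using only the standard-library re module, whose regex syntax is the one pandas uses for str.replace(..., regex=True). This is a minimal sketch: the helper name strip_trailing_br and the sample strings are made up for illustration and are not part of the answer.

```python
import re

# "br" immediately followed by a word boundary, i.e. "br" at the end of a word.
PATTERN = re.compile(r"br\b")

def strip_trailing_br(sentence: str) -> str:
    """Remove 'br' only where it ends a word (hypothetical helper)."""
    return PATTERN.sub("", sentence)

# Words with a stray trailing "br" are cleaned up.
cleaned = strip_trailing_br("here are somebr examples andbr paragraphsbr on the same topicbr")
print(cleaned)

# Words that start with "br", or merely contain b and r, are left alone.
kept = strip_trailing_br("brutish breathtaking ember")
print(kept)

# On a dataframe column the same pattern would be applied as in the answer:
# df['text'] = df['text'].str.replace(r'br\b', '', regex=True)
```

Note that a standalone word "br" would also be deleted by this pattern, leaving two spaces behind, so it may be worth checking for that case if it can occur in the data.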
