Python Regular Expression: re.sub to replace matches

Question:

I am trying to analyze an earnings call using Python regular expressions.
I want to delete the unnecessary lines that contain only the name and position of the person who is speaking next.

This is an excerpt of the text I want to analyze:

"Questions and Answers\nOperator [1]\n\n Shannon Siemsen Cross, Cross Research LLC – Co-Founder, Principal & Analyst [2]\n I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you’re seeing in the various regions that you’re selling in and what you’re hearing from your customers. And then I have a follow-up.\n Timothy D. Cook, Apple Inc. – CEO & Director [3]\n …"

At the end of each line that I want to delete, you have [some number].

So I used the following line of code to get these lines:

name_lines = re.findall(r'.*\[\d\]', text)

This works and gives me the following list:
['Operator [1]',
 ' Shannon Siemsen Cross, Cross Research LLC – Co-Founder, Principal & Analyst [2]',
 ' Timothy D. Cook, Apple Inc. – CEO & Director [3]']

In the next step, I want to remove these strings from the text using the following code:

for i in range(0,len(name_lines)): 
    text = re.sub(name_lines[i], '', text)

But this does not work. Replacing a single string without the loop does not work either, and I have no clue why.

Also, if I use re.findall to search for the lines I obtained from the first line of code, I don't get a match.

Asked By: Kyle_Stockton

||

Answers:

The first argument to re.sub is treated as a regular expression, so the square brackets get a special meaning and don’t match literally.

You don’t need a regular expression for this replacement at all though (and you also don’t need the loop counter i):

for name_line in name_lines:
    text = text.replace(name_line, '')
Answered By: Thomas
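
To illustrate the point about square brackets, here is a minimal sketch that is not taken from the answer itself. It assumes a shortened, made-up sample text and shows that the raw string fails as a pattern, while wrapping it in re.escape (a standard-library function) makes the brackets match literally:

```python
import re

# Shortened sample text, assumed for illustration only.
text = "Operator [1]\n Timothy D. Cook, Apple Inc. - CEO & Director [3]\n Thanks, Shannon.\n"
name_line = "Operator [1]"

# As a pattern, "[1]" is a character class matching the single character "1",
# so this looks for "Operator 1" and leaves the text unchanged.
unescaped = re.sub(name_line, "", text)

# re.escape backslash-escapes special characters, so "[1]" matches literally.
escaped = re.sub(re.escape(name_line), "", text)

print(unescaped == text)
print(repr(escaped))
```

When the search string is fixed text rather than a pattern, str.replace as shown in the answer is the simpler choice; re.escape is mainly useful when the literal has to be combined with other regex syntax.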

You can apply the pattern directly with re.sub to remove the matching lines:

import re

text = """
Questions and Answers
Operator [1]

Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]
I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.
Timothy D. Cook, Apple Inc. - CEO & Director [3]"""

text = re.sub(r".*\d\]", "", text)
print(text)

Prints:

Questions and Answers



I hope everyone is well. Tim, you talked about seeing some improvement in the second half of April. So I was wondering if you could just talk maybe a bit more on the segment and geographic basis what you're seeing in the various regions that you're selling in and what you're hearing from your customers. And then I have a follow-up.
Answered By: Andrej Kesely
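
The output above still contains the empty lines where the speaker lines used to be. As a possible variation (a sketch, not part of the answer), the pattern can be anchored to whole lines with re.MULTILINE and can consume the trailing line break as well. The shortened sample text and the use of \d+ to allow multi-digit markers are assumptions:

```python
import re

# Shortened sample text, assumed for illustration only.
text = (
    "Questions and Answers\n"
    "Operator [1]\n"
    "\n"
    "Shannon Siemsen Cross, Cross Research LLC - Co-Founder, Principal & Analyst [2]\n"
    "I hope everyone is well.\n"
    "Timothy D. Cook, Apple Inc. - CEO & Director [3]\n"
)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
# The optional \n removes the line break too, so no empty line is left behind.
# \d+ (an assumption) also covers markers such as [12].
speaker_line = re.compile(r"^.*\[\d+\]$\n?", flags=re.MULTILINE)
cleaned = speaker_line.sub("", text)

print(cleaned)
```

The blank line that was already present in the input is kept; only the lines ending in a bracketed number are dropped.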