How to match a changing pattern in Python?

Question:

So I have a collection of lyrics from different artists, but in the middle of all the lyrics there is always an advertisement I want to remove. It looks like this:

‘lyric lyric See John Mayer LiveGet tickets as low as $53 lyric lyric’

More generally, the pattern is always: ‘See ARTIST LiveGet tickets as low as $NUMBER’

Is there a way I can match this changing pattern so I can get rid of these advertisements in the text?

Asked By: Nova


Answers:

Edit: fixed so it removes the space where the text was removed.

Assuming the ad is ALWAYS in that format, this is a very simplified version that you could expand upon:

import re

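lyrics = "lyric lyric See John Mayer Live Get tickets as low as $53 lyric lyric"

# \s* after "Live" also covers the "LiveGet" form shown in the question
pattern = r'See\s+(.*?)\s+Live\s*Get tickets as low as\s+\$[\d,]+'

# Remove the ad, then collapse the leftover run of whitespace into one space
clean_lyrics = re.sub(pattern, '', lyrics).strip()
clean_lyrics = re.sub(r'\s+', ' ', clean_lyrics)

print(clean_lyrics)
# Output: lyric lyric lyric lyric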

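In the pattern, \s+ matches one or more whitespace characters, (.*?) lazily matches any characters as a group (here, the artist name), and [\d,]+ matches one or more digits or commas (the price). The dollar sign has to be escaped as \$, because a bare $ means "end of string" in a regex.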
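If you need to run this over a whole collection, the same idea could be wrapped in a small helper. This is only a sketch under the assumption that every ad follows the format from the question; the names AD_PATTERN and remove_ads are made up for illustration. It allows both "LiveGet" and "Live Get", and it only swallows spaces and tabs in front of the ad, so line breaks in the lyrics are left alone.

```python
import re

# Illustrative names, not from the original answer.
# [ \t]*  -> spaces/tabs right before the ad (keeps newlines intact)
# .+?     -> the artist name, matched lazily
# \s*     -> zero or more whitespace, so "LiveGet" and "Live Get" both match
# \$      -> a literal dollar sign, followed by digits and commas
AD_PATTERN = re.compile(
    r"[ \t]*See\s+.+?\s+Live\s*Get tickets as low as\s+\$[\d,]+"
)


def remove_ads(text: str) -> str:
    """Return text with every ad matching AD_PATTERN removed."""
    return AD_PATTERN.sub("", text).strip()


songs = [
    "lyric lyric See John Mayer LiveGet tickets as low as $53 lyric lyric",
    "la la See Taylor Swift Live Get tickets as low as $1,250 la",
]
for song in songs:
    print(remove_ads(song))
```

One caveat: if a lyric line itself contains the word "See" before the ad, the lazy match starts at that earlier "See" and removes more than the ad, so it is worth spot-checking the results on real data.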

Answered By: Caleb Carson