Python Regex: How to remove brackets and \n
Question:
I am trying to remove the [] and \n from the result when I use a regex to extract a score. Here is an example of what a description looks like:
On Liars’ fifth album, the band throws their music into the deepest, darkest, most serpent-filled hole they could possibly find. They create an alternate dimension where eerie sounds and dissonant intervals reign supreme. There are some lush strings, a spot of horns; but Sisterworld revolves mostly around the simplicity of normal rock instrumentation. Yet, it sounds so otherworldly.
Overall, it lays on the tension a little too much, kind of making it difficult to enjoy. But it would be stupid to assume that wasn’t the underlying intent. Though I wasn’t able to enjoy that anxiety as much as some people may, I’ve still got to give kudos to these guys for turning another cohesive concept into an album.
6/10
http://theneedledrop.com
http://twitter.com/theneedledrop
I then use the following regex to grab only the score from this text. The score for this example is 6/10.
df["Description"].str.findall("[0-9]/10[^a-zA-Z0-9]")
However, the output that I get from this regex is
[6/10\n]
Is there a way to remove these brackets and the \n, and to make sure I only grab scores in the format number (0-10)/10 with a regex?
Answers:
Use a lookahead:
df["Description"].str.findall("[0-9]/10(?=[^a-zA-Z0-9])")
Or a negative lookahead, which only requires that the next character is not a digit:
df["Description"].str.findall(r"[0-9]/10(?!\d)")
Output:
0 [6/10]
Name: Description, dtype: object
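To make the difference between the three patterns concrete, here is a minimal sketch using the standard re module directly rather than pandas. The sample text is a shortened, hypothetical stand-in for one description, and the variable names are only illustrative.

```python
import re

# Hypothetical, shortened stand-in for one description from the question.
text = "Yet, it sounds so otherworldly.\n\n6/10\n\nhttp://theneedledrop.com"

# Original pattern: the trailing non-alphanumeric character is consumed,
# so the newline becomes part of the match.
consuming = re.findall(r"[0-9]/10[^a-zA-Z0-9]", text)

# Positive lookahead: the next character is checked but not included.
lookahead = re.findall(r"[0-9]/10(?=[^a-zA-Z0-9])", text)

# Negative lookahead: only requires that no digit follows.
negative = re.findall(r"[0-9]/10(?!\d)", text)

print(consuming)  # ['6/10\n']
print(lookahead)  # ['6/10']
print(negative)   # ['6/10']

# The two lookaheads differ when the score is the very last thing in the
# string: the positive form needs a following character, the negative does not.
at_end_positive = re.findall(r"[0-9]/10(?=[^a-zA-Z0-9])", "Score: 6/10")
at_end_negative = re.findall(r"[0-9]/10(?!\d)", "Score: 6/10")
print(at_end_positive)  # []
print(at_end_negative)  # ['6/10']
```

If a score can appear at the very end of a description, the negative lookahead is probably the safer of the two.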
NB. The brackets come from findall, which returns a list of matches for each row. You might also want to use extract/extractall (first match / all matches) in place of findall, in which case use:
df["Description"].str.extract(r"([0-9]/10)\D", expand=False)
# or
df["Description"].str.extract(r"([0-9]/10)(?!\d)", expand=False)
Or for all matches:
df["Description"].str.extractall(r"([0-9]/10)\D")
# or
df["Description"].str.extractall(r"([0-9]/10)(?!\d)")
Output (with extract):
0 6/10
Name: Description, dtype: object
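As a rough sketch of how extract and extractall differ in shape, the example below runs both on a small made-up DataFrame. The sample rows are hypothetical. The last pattern, named strict here, is an assumption of mine and not part of the answer: it tries to cover the "0-10" requirement from the question, since [0-9]/10 on its own picks up only the trailing 0/10 of a 10/10 score.

```python
import pandas as pd

# Hypothetical sample data: one score, two scores, and a 10/10.
df = pd.DataFrame({
    "Description": [
        "Great tension.\n\n6/10\n\nhttp://theneedledrop.com",
        "First half 7/10, second half 4/10.",
        "A classic.\n\n10/10\n",
    ]
})

pattern = r"([0-9]/10)(?!\d)"

# extract: first match per row; with one capture group and expand=False
# the result is a Series of plain strings (no list brackets).
first = df["Description"].str.extract(pattern, expand=False)
print(first.tolist())  # ['6/10', '7/10', '0/10']

# extractall: one output row per match, indexed by (original row, match number).
every = df["Description"].str.extractall(pattern)
print(every[0].tolist())  # ['6/10', '7/10', '4/10', '0/10']

# Assumed stricter variant (not from the answer): accept 10 or a single digit,
# and use a lookbehind so a match cannot start inside a longer number.
strict = r"(?<!\d)((?:10|[0-9])/10)(?!\d)"
first_strict = df["Description"].str.extract(strict, expand=False)
print(first_strict.tolist())  # ['6/10', '7/10', '10/10']
```

Note that the third row comes back as 0/10 with the answer's pattern and as 10/10 with the stricter one, so which pattern fits depends on whether full marks occur in the data.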