How to find ellipses in text string Python?

Question

Fairly new to Python (and Stack Overflow!) here. I have a data set of subject lines (text strings) that I'm using to build a bag-of-words model. I'm creating new variables that flag a 0 or 1 for various possible scenarios, but I'm stuck trying to identify where there is an ellipsis (“…”) in the text. Here's where I'm starting from:

Data_Frame['Elipses'] = Data_Frame.Subject_Line.str.match(r'(\w+)\.{2,}(.+)')

Typing the ellipsis character itself (‘…’) doesn't work for obvious reasons. The RegEx above was suggested, but it's still not working. I also tried this:

Data_Frame['Elipses'] = Data_Frame.Subject_Line.str.match('...')

No dice.

The above code shell works for other variables I've created, but I'm also having trouble creating a 0/1 output instead of True/False (in R this would be as.numeric). Any help here would also be appreciated.

Thanks!

Asked By: foosgold

Answer 1

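Using search() instead of match() would spot an ellipsis at any point in the text; match() only tries the pattern at the start of the string. In pandas, str.contains() supports regular expressions and, like search(), looks anywhere in each string.

For example, in pandas: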

import pandas as pd

df = pd.DataFrame({'Text' : ["hello..", "again... this", "is......a test",  "Real ellipses… here", "...not here"]})
# A word followed by 3+ literal dots, or the single-character ellipsis …
df['Ellipses'] = df.Text.str.contains(r'\w+\.{3,}|…')

print(df)

Giving you:

                  Text  Ellipses
0              hello..     False
1        again... this      True
2       is......a test      True
3  Real ellipses… here      True
4          ...not here     False
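The question also asks for 0/1 instead of True/False (R's as.numeric). A minimal sketch, assuming the same example DataFrame: Series.astype(int) converts the boolean column that str.contains() returns into integers.

```python
import pandas as pd

df = pd.DataFrame({'Text': ["hello..", "again... this", "is......a test",
                            "Real ellipses… here", "...not here"]})

# str.contains() returns a boolean Series; astype(int) maps True -> 1, False -> 0
df['Ellipses'] = df.Text.str.contains(r'\w+\.{3,}|…').astype(int)

print(df)
```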

Or without pandas:

import re

for test in ["hello..", "again... this", "is......a test",  "Real ellipses… here", "...not here"]:
    print(int(bool(re.search(r'\w+\.{3,}|…', test))))

This matches on the middle three tests, giving:

0
1
1
1
0
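If all you need is the 0/1 flag, a regex may not be necessary. A minimal sketch using plain substring checks; note that, unlike the regex above, it does not require a word before the dots, so "...not here" is flagged as well.

```python
tests = ["hello..", "again... this", "is......a test", "Real ellipses… here", "...not here"]

# Plain substring checks: no regex, so the dots need no escaping.
# int(True) == 1 and int(False) == 0, which gives the 0/1 flag directly.
flags = [int('...' in test or '…' in test) for test in tests]

print(flags)  # [0, 1, 1, 1, 1]
```

In pandas the same literal check for the three dots can be written as df.Text.str.contains('...', regex=False).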

Take a look at the "search() vs. match()" section of the Python re documentation for a good explanation.
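A small sketch of why both attempts in the question fall short: pandas' str.match() anchors the pattern at the start of each string (like re.match()), and an unescaped '.' is a wildcard that matches any character. The test strings are illustrative only.

```python
import re

# Three literal (escaped) dots, or the single Unicode ellipsis character
pattern = r'\.{3}|…'
text = "again... this"

# re.match() only tries the pattern at position 0, so it finds nothing here
print(re.match(pattern, text))           # None
# re.search() scans the whole string and finds the dots after "again"
print(re.search(pattern, text).group())  # ...

# Unescaped, '.' is a wildcard, so '...' matches ANY three characters
print(bool(re.match('...', "abc")))      # True
# re.escape() turns a literal string into a pattern that matches it exactly
print(re.escape('...'))                  # \.\.\.
```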

To display the word just before each ellipsis:

import re

for test in ["hello..", "again... this", "is......a test",  "...def"]:
    ellipses = re.search(r'(\w+)\.{3,}', test)

    if ellipses:
        print(ellipses.group(1))

Giving you:

again
is
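A pandas version of the same idea, as a sketch: Series.str.extract() returns the first capture group of the first match in each string, and NaN where the pattern does not match. The column name Word is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Text': ["hello..", "again... this", "is......a test", "...def"]})

# expand=False returns a Series (not a DataFrame) because there is a single group;
# rows without a word followed by 3+ dots get NaN
df['Word'] = df.Text.str.extract(r'(\w+)\.{3,}', expand=False)

print(df)
```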

Answered By: Martin Evans