Extract a word with a regular expression from a pandas DataFrame in Python

Question:

This is the data I’m working with:

Topic                                 About                                                     Group Discussion
microwave is not working              i tried turning on the microwave and it wont turn on      [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]
the light of the oven wont turn on    i have tried to press the light on the oven and nothing   [[person3 did you power on the oven], [person4 it was powered on], ...]
water will not come out of sink       i turn the valve and nothing comes out of the sink        [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]

What I would like is this:

Topic                                 About                                                     Group Discussion                                                                                                                           Topic_Extract         About_Extract        Group_Discussion_Extract
microwave is not working              i tried turning on the microwave and it wont turn on      [[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]                                      microwave             microwave            microwave
the light of the oven wont turn on    i have tried to press the light on the oven and nothing   [[person3 did you power on the oven], [person4 it was powered on], ...]                                                                    oven                  oven                 oven
water will not come out of sink       i turn the valve and nothing comes out of the sink        [[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]     sink                  sink                 sink

EDIT:
Okay, now every row comes back as 'unclassified' and I'm not sure how to fix it:

df['Title_Extract'] = ''
def loop(data):
    for i,j in data['Topic'].iteritems():
        if (re.search(r'microwave|microwave will not turn on|microwave is not working|microwave wont work|microwave will not work|microwave is broken', j) == True):
            return(data['Topic_Extract'].str.replace('', 'microwave'))
        elif (re.search(r'oven|oven will not turn on|oven is not working|oven wont work|oven will not work|oven is broken|oven wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'oven'))
        elif (re.search(r'sink|sink will not turn on|sink is not working|sink wont work|sink will not work|sink is broken|sink wont turn on', j) == True):
            return(data['Topic_Extract'].str.replace('', 'sink'))
        else:
            return 'unclassified'

loop(df)

Instead of extracting the word, this is the output I get. Nothing is classified correctly:

0        unclassified
...
2        unclassified
Asked By: Renee

||

Answers:

Create a list of the values to search for, then use findall to return the values that are found in the df column.

topic_terms = ['microwave', 'sink', 'oven']
# Join the terms into one alternation pattern: 'microwave|sink|oven'
df['term'] = df['Topic'].str.findall("|".join(topic_terms))
df
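
Note that findall returns a list of matches for every row. If a single string per row is wanted, str.extract with one capture group is an alternative. The following is a minimal sketch, not part of the original answer: the fourth row ("the fridge is warm") is a made-up example of a non-matching topic, and the 'unclassified' fallback is borrowed from the question's code.

```python
import pandas as pd

# The fourth row is hypothetical, added only to show the no-match case.
df = pd.DataFrame({"Topic": ["microwave is not working",
                             "the light of the oven wont turn on",
                             "water will not come out of sink",
                             "the fridge is warm"]})

topic_terms = ["microwave", "sink", "oven"]
# \b keeps the match to whole words; the single group is what gets captured.
pattern = r"\b(" + "|".join(topic_terms) + r")\b"

# findall: one list per row (empty when nothing matches).
df["term_list"] = df["Topic"].str.findall(pattern)

# extract: first match per row as a plain string, NaN when nothing matches.
df["term"] = df["Topic"].str.extract(pattern, expand=False).fillna("unclassified")
print(df)
```

With expand=False and a single capture group, str.extract returns a Series, so it can be assigned straight to a new column.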


Data used:

import pandas as pd

data = {'Topic': {0: 'microwave is not working ',
  1: 'the light of the oven wont turn on ',
  2: 'water will not come out of sink '},
 'About': {0: 'i tried turning on the microwave and it wont turn on ',
  1: 'i have tried to press the light on the oven and nothing ',
  2: 'i turn the valve and nothing comes out of the sink '},
 'Group Discussion': {0: '[[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]',
  1: '[[person3 did you power on the oven], [person4 it was powered on], ...]',
  2: '[[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]'}}

df = pd.DataFrame(data)
df
   Topic                                About                                               Group Discussion                                    term
0  microwave is not working             i tried turning on the microwave and it wont t...   [[person1 yeah the microwave wont turn on i te...   [microwave]
1  the light of the oven wont turn on   i have tried to press the light on the oven an...   [[person3 did you power on the oven], [person4...   [oven]
2  water will not come out of sink      i turn the valve and nothing comes out of the ...   [[person5 okay it looks like water is not comi...   [sink]
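
The question asks for one extract column per source column (Topic_Extract, About_Extract, Group_Discussion_Extract). One way to get there is to loop over the three columns with the same pattern. This is a sketch under assumptions rather than a definitive implementation: it uses str.extract instead of findall so each cell holds a single string, it casts with astype(str) in case 'Group Discussion' holds nested lists instead of text, and it reuses the 'unclassified' fallback from the question's code.

```python
import pandas as pd

data = {
    "Topic": ["microwave is not working",
              "the light of the oven wont turn on",
              "water will not come out of sink"],
    "About": ["i tried turning on the microwave and it wont turn on",
              "i have tried to press the light on the oven and nothing",
              "i turn the valve and nothing comes out of the sink"],
    "Group Discussion": [
        "[[person1 yeah the microwave wont turn on i tested it], [person2 okay send it over to warranty], ...]",
        "[[person3 did you power on the oven], [person4 it was powered on], ...]",
        "[[person5 okay it looks like water is not coming out of the sink], [person6 okay send it over to this department to take a look], ...]",
    ],
}
df = pd.DataFrame(data)

topic_terms = ["microwave", "sink", "oven"]
pattern = r"\b(" + "|".join(topic_terms) + r")\b"

for col in ["Topic", "About", "Group Discussion"]:
    # 'Group Discussion' -> 'Group_Discussion_Extract', and so on.
    new_col = col.replace(" ", "_") + "_Extract"
    # astype(str) lets the same code handle cells that are lists, not strings.
    df[new_col] = (df[col].astype(str)
                          .str.extract(pattern, expand=False)
                          .fillna("unclassified"))

print(df[["Topic_Extract", "About_Extract", "Group_Discussion_Extract"]])
```

Only the first matching term in each cell is kept, so a row that mentions both an oven and a sink would be labelled with whichever appears first in the text.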


Answered By: Naveed

I figured it out, thanks for the help everyone. This is what my solution looks like:

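import re

def loop(data):
    # Assumes data['Topic Extract'] already holds an empty list in every row.
    # Series.items() replaces iteritems(), which was removed in pandas 2.0.
    for i, j in data['Topic'].items():
        # \b is a word boundary, so only the whole word matches.
        if re.search(r'\bmicrowave\b', j):
            data['Topic Extract'][i].append('microwave')
        elif re.search(r'\boven\b', j):
            data['Topic Extract'][i].append('oven')
        elif re.search(r'\bsink\b', j):
            data['Topic Extract'][i].append('sink')
        else:
            data['Topic Extract'][i].append('unclassified')
    return data

df = loop(df)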
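
For anyone hitting the same 'unclassified' result as in the question: re.search returns a Match object or None, never True, so comparing the result with == True is always False. The return statements inside the question's loop also end the function on the first row. The sketch below illustrates both points with a row-wise function applied through Series.map. The names TERMS and classify are hypothetical and chosen only for this example.

```python
import re
import pandas as pd

match = re.search(r"\bmicrowave\b", "microwave is not working")
is_true = (match == True)   # False: a Match object never equals True
is_truthy = bool(match)     # True: test the match object itself instead

TERMS = ["microwave", "oven", "sink"]  # hypothetical name for the search terms

def classify(text):
    """Return the first entry of TERMS found as a whole word, else 'unclassified'."""
    for term in TERMS:
        if re.search(rf"\b{re.escape(term)}\b", text):
            return term
    return "unclassified"

topics = pd.Series(["microwave is not working",
                    "the light of the oven wont turn on",
                    "water will not come out of sink"])

# map calls classify once per row, so no row is skipped by an early return.
result = topics.map(classify)
print(result.tolist())
```

Returning a value per row and letting map collect the results avoids mutating the DataFrame inside the loop.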

Answered By: Renee