pandas: how to duplicate a value for every substring in a column

Question:

I have a pandas dataframe as follows:

import pandas as pd

df = pd.DataFrame({'text': ['set an alarm for [time : two hours from now]','wake me up at [time : nine am] on [date : friday]','check email from [person : john]']})
print(df)

original dataframe

                                                text
0       set an alarm for [time : two hours from now]
1  wake me up at [time : nine am] on [date : friday]
2                   check email from [person : john]

I would like to repeat the label (date, time, or person) for every word inside a bracketed list whenever the list contains more than one word, so that each word gets its own [label : word] chunk:

desired output:

                                                new_text                                
0       set an alarm for [time : two] [time : hours] [time : from] [time : now]        
1  wake me up at [time : nine] [time : am] on [date : friday]  
2                   check email from [person : john]

I have so far tried to separate the lists from the original column, but do not know how to continue.

df['separated_list'] = df.text.str.split(r"\s(?![^\[]*\])").apply(lambda x: [y for y in x if '[' in y])
Asked By: zara kolagar
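One way to continue from the attempt above, sketched under the assumption that every bracketed chunk has the form [label : words] with a single colon: keep the split that extracts the chunks (backslashes written out), expand each chunk into one [label : word] piece per word, and substitute it back into the row's text. The helpers expand_chunk and rebuild are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'text': [
    'set an alarm for [time : two hours from now]',
    'wake me up at [time : nine am] on [date : friday]',
    'check email from [person : john]',
]})

# Split on whitespace unless a ']' follows before any '[' (i.e. the space is
# inside a bracketed chunk), then keep only the bracketed chunks.
df['separated_list'] = df.text.str.split(r"\s(?![^\[]*\])").apply(
    lambda x: [y for y in x if '[' in y])

def expand_chunk(chunk):
    """Hypothetical helper: '[time : nine am]' -> '[time : nine] [time : am]'."""
    # strip('[]') drops the surrounding brackets; partition splits on the first ':'
    label, _, words = chunk.strip('[]').partition(':')
    return ' '.join(f'[{label.strip()} : {w}]' for w in words.split())

def rebuild(row):
    """Hypothetical helper: substitute each expanded chunk back into the text."""
    text = row['text']
    for chunk in row['separated_list']:
        text = text.replace(chunk, expand_chunk(chunk), 1)
    return text

df['new_text'] = df.apply(rebuild, axis=1)
print(df[['text', 'new_text']])
```

This keeps the asker's two-step idea (extract, then transform); a single regex replacement does the same work in one pass.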


Answers:

You can use a regex with a custom function as replacement:

df['new_text'] = df.text.str.replace(
  r"\[([^\[\]]*?)\s*:\s*([^\[\]]*)\]",  # group 1: label, group 2: words
  lambda m: ' '.join([f'[{m.group(1)} : {x}]'
                      for x in m.group(2).split()]), # new chunk for each word
  regex=True)  # required when the replacement is a callable

output:

                                                text                                                                 new_text
0       set an alarm for [time : two hours from now]  set an alarm for [time : two] [time : hours] [time : from] [time : now]
1  wake me up at [time : nine am] on [date : friday]               wake me up at [time : nine] [time : am] on [date : friday]
2                   check email from [person : john]                                         check email from [person : john]


Answered By: mozway
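The callback in the answer above works the same way as a plain re.sub with a function as the replacement. A minimal standalone sketch, assuming the same [label : words] format; CHUNK, expand and expand_text are hypothetical names introduced here:

```python
import re

# Same pattern as the answer, with the backslashes written out:
# group 1 = label (e.g. "time"), group 2 = words (e.g. "two hours from now")
CHUNK = re.compile(r"\[([^\[\]]*?)\s*:\s*([^\[\]]*)\]")

def expand(match):
    label, words = match.group(1), match.group(2)
    # one "[label : word]" chunk per whitespace-separated word
    return ' '.join(f'[{label} : {w}]' for w in words.split())

def expand_text(text):
    return CHUNK.sub(expand, text)

print(expand_text('wake me up at [time : nine am] on [date : friday]'))
# wake me up at [time : nine] [time : am] on [date : friday]
```

Since pandas' str.replace accepts a compiled pattern and a callable when regex=True, df.text.str.replace(CHUNK, expand, regex=True) should give the same new_text column.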

Find the [...] chunks with a lookbehind and a lookahead, match their contents with a non-capturing group of repeated character classes, then split the contents on ":":

import re
import pandas as pd

df = pd.DataFrame({'text': ['set an alarm for [time : two hours from now]','wake me up at [time : nine am] on [date : friday]','check email from [person : john]']})
data = df['text']
for item in data:
    print(item)
    # (?<=\[) and (?=\]) require the brackets without including them in the match
    matches = re.findall(r'(?<=\[)(?:[\w\s]+:[\w\s]+)(?=\])', item)
    for match in matches:
        parts = match.split(":")  # [label, words], surrounding spaces kept
        print(parts)

output:

set an alarm for [time : two hours from now]
['time ', ' two hours from now']
wake me up at [time : nine am] on [date : friday]
['time ', ' nine am']
['date ', ' friday']
check email from [person : john]
['person ', ' john']
Answered By: Golden Lion
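The answer above stops at printing each label and its words. A sketch of one way to finish the job with the same lookbehind/lookahead idea, assuming each chunk contains exactly one colon and only word characters and spaces; INNER and expand_text are hypothetical names introduced here:

```python
import re

# Lookbehind/lookahead pattern: matches "time : nine am" inside
# "[time : nine am]" without the brackets themselves.
INNER = re.compile(r'(?<=\[)(?:[\w\s]+:[\w\s]+)(?=\])')

def expand_text(text):
    for inner in INNER.findall(text):
        label, words = (part.strip() for part in inner.split(':', 1))
        expanded = ' '.join(f'[{label} : {w}]' for w in words.split())
        # replace the original bracketed chunk (brackets included) once
        text = text.replace(f'[{inner}]', expanded, 1)
    return text

print(expand_text('set an alarm for [time : two hours from now]'))
```

Applied to the dataframe, df['new_text'] = df['text'].apply(expand_text) should produce the desired column.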