Extract part of a text and split into two columns

Question:

I am trying to extract part of each of the following sentences (all my rows follow a similar pattern):

Text
19 hours ago — Catch up on key developments an...
8 hour ago — Catch up on key developments an...
10 minutes ago — Catch up on key developments an...
1 day ago — Catch up on key developments an...

I would like to split the Text column into two columns, before and after the —:

Text1          Text2
19 hours ago   Catch up on key developments an...
8 hour ago     Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago      Catch up on key developments an...

I did this:

df[['Text1', 'Text2']] = df['Text'].str.extract(r"(\d+ \w+, \d{5})?\s*—?\s*(.*)", expand=True)

However, it does not seem to work.
If you have experience with re, could you please point out the mistake and suggest a solution? Thanks

Asked By: V_sqrt

||

Answers:

You can use the pandas.Series.str.split function:

df['Text'].str.split(' — ', n=1, expand=True)

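Setting n=1 limits the operation to a single split, so each row is cut only at the first ' — ' and any later occurrence stays in the second part. Setting expand=True returns the pieces as separate DataFrame columns instead of a Series of lists.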
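As a minimal, self-contained sketch of the call above, the sample rows from the question can be rebuilt into a small DataFrame and the result assigned to the two column names the asker wanted. The DataFrame construction is an assumption made only so the snippet runs on its own; the real data would already be in df.

```python
import pandas as pd

# Sample rows copied from the question; the em dash (—) is the delimiter.
df = pd.DataFrame({
    "Text": [
        "19 hours ago — Catch up on key developments an...",
        "8 hour ago — Catch up on key developments an...",
        "10 minutes ago — Catch up on key developments an...",
        "1 day ago — Catch up on key developments an...",
    ]
})

# n=1: split only at the first " — ", so a later dash stays in Text2.
# expand=True: return one column per piece instead of a Series of lists.
df[["Text1", "Text2"]] = df["Text"].str.split(" — ", n=1, expand=True)

print(df[["Text1", "Text2"]])
```

Including the spaces in the separator keeps trailing and leading whitespace out of both new columns.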

Answered By: Riccardo Bucco

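You can use split rather than a regex. Split on the em dash (—) together with its surrounding spaces, so that neither column keeps stray whitespace:

df[['Text1', 'Text2']] = df['Text'].str.split(' — ', n=1, expand=True)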

Output:

Text1          Text2
19 hours ago   Catch up on key developments an...
8 hour ago     Catch up on key developments an...
10 minutes ago Catch up on key developments an...
1 day ago      Catch up on key developments an...
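
If a regex is still preferred, the following is one possible sketch using str.extract. The pattern in the question puts everything into the second column because its first group requires a comma and five digits (", \d{5}"), which never occur in the data, so that optional group is skipped and (.*) captures the whole string. The pattern below is an assumption about what was intended, not the asker's original, and the last sample row is hypothetical, added only to show what happens when there is no delimiter.

```python
import pandas as pd

df = pd.DataFrame({
    "Text": [
        "19 hours ago — Catch up on key developments an...",
        "1 day ago — Catch up on key developments an...",
        "no delimiter here",  # hypothetical row without an em dash
    ]
})

# Group 1: lazily match everything before the first em dash.
# Group 2: everything after it. Whitespace around the dash is discarded.
pattern = r"^(.*?)\s*—\s*(.*)$"
df[["Text1", "Text2"]] = df["Text"].str.extract(pattern, expand=True)

print(df[["Text1", "Text2"]])
```

Rows that do not contain the delimiter get NaN in both new columns, whereas str.split would keep the whole string in the first column.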
Answered By: Kedar U Shet