Splitting Pandas series with regex and word grouping constraints

Question:

Say I have df as follows:

MyCol
Red Motor
Blue Taxi
Green Taxi-1
Light blue small Taxi-1 
Light blue big Taxi-2

I would like to split each value into two columns: one for the color and one for the vehicle.

The last word (which can contain any characters, like Taxi or Taxi-1) refers to the vehicle. Sometimes there is a ‘big’ or ‘small’ in front of the vehicle name, and it belongs with the vehicle. The first few words (one or more) refer to the color.

This is what I have tried. It only works when the last word contains no special characters. How can I also handle a last word that contains special characters, such as the hyphen in Taxi-1?

df['MyCol'].str.extract(r'^(.*?)\s((?:small|big)?\s?\w+).*$')
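To see why this attempt misbehaves, the same pattern can be run through Python's re module directly (pandas' str.extract takes a pattern in re syntax and applies it to each element). This is a small diagnostic sketch; try_attempt is a hypothetical helper name, not part of pandas.

```python
import re

# The pattern from the question, written as a raw string so that \s and \w
# reach the regex engine intact.
ATTEMPT = re.compile(r'^(.*?)\s((?:small|big)?\s?\w+).*$')


def try_attempt(text):
    """Return the (color, vehicle) groups the question's pattern captures."""
    match = ATTEMPT.search(text)
    return match.groups() if match else None


# \w matches letters, digits and underscore only, so a hyphen ends the token.
print(re.fullmatch(r'\w+', 'Taxi-1'))  # None

for text in ["Red Motor", "Green Taxi-1", "Light blue small Taxi-1"]:
    print(repr(text), '->', try_attempt(text))
```

The loop prints ('Red', 'Motor'), ('Green', 'Taxi') and ('Light', 'blue'). That exposes two separate problems: \w does not match "-", so the vehicle stops at Taxi while the trailing .* silently absorbs "-1"; and because .*$ lets the match end anywhere, the lazy (.*?) stops at the first space, so a multi-word color such as Light blue is split in the wrong place.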
Asked By: test tes


Answers:

df['MyCol'].str.extract(r'^(.*?)\s((?:small|big|)\s?\S+)$')

resulting in:

            0             1
0         Red         Motor
1        Blue          Taxi
2       Green        Taxi-1
3  Light blue  small Taxi-1
4  Light blue    big Taxi-2
Answered By: Will Holtz
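The answer's pattern can also be written with named groups and re.VERBOSE, so that the result gets meaningful column names and each part of the regex carries a comment. This is a minimal sketch, assuming pandas is installed; the column names color and vehicle and the strip step are illustrative additions, not part of the answer.

```python
import re

import pandas as pd

# Sample from the question; the fourth value deliberately carries a trailing
# space to show why the strip step below is needed.
df = pd.DataFrame({'MyCol': [
    'Red Motor',
    'Blue Taxi',
    'Green Taxi-1',
    'Light blue small Taxi-1 ',
    'Light blue big Taxi-2',
]})

# The answer's pattern, spread out with re.VERBOSE so each part can be commented.
# Whitespace in the pattern is ignored in verbose mode, so \s is used for spaces.
# Named groups become the column names of the extracted DataFrame.
PATTERN = r'''
    ^(?P<color>.*?)        # color: the shortest prefix that still lets the rest match
    \s                     # the space separating color from vehicle
    (?P<vehicle>
        (?:small|big|)     # optional size word (the empty alternative allows none)
        \s?                # the space after the size word, if present
        \S+                # the last token: any run of non-space characters
    )$                     # the vehicle must run to the end of the string
'''

# Strip surrounding whitespace first: with a trailing space, \S+ cannot reach $.
parts = df['MyCol'].str.strip().str.extract(PATTERN, flags=re.VERBOSE)
result = df.join(parts)
print(result)
```

The color column comes out as Red, Blue, Green, Light blue, Light blue and the vehicle column as Motor, Taxi, Taxi-1, small Taxi-1, big Taxi-2. Without the strip step, the row with the trailing space does not match at all and comes back as NaN in both columns.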