Pandas extract word from link domain

Question:

I have a dataframe:

import pandas as pd    
d = {'domain': ['linkedin.com','aumniversal.tumblr.com','plasticdrea.ms','linkedin.com','s-lw.tumblr.com','newsonline.media','creshendo.co.vu','deadly-skz-gods-cb.tumblr.com','deo.progr.am']}
df = pd.DataFrame(d)
df

I want to extract the word before the last dot-separated part (for example, the part before .com, but not every domain ends in .com). The expected result is:

    domain                            words
0   linkedin.com                    linkedin
1   aumniversal.tumblr.com          tumblr
2   plasticdrea.ms                  plasticdrea
3   linkedin.com                    linkedin
4   s-lw.tumblr.com                 tumblr
5   newsonline.media                newsonline
6   creshendo.co.vu                 co
7   deadly-skz-gods-cb.tumblr.com   tumblr
8   deo.progr.am                    progr
Asked By: Rory

||

Answers:

Use Series.str.split to split each domain on dots, then select the second-to-last element by indexing with .str[-2]:

df['words'] = df['domain'].str.split('.').str[-2]
print(df)
                          domain        words
0                   linkedin.com     linkedin
1         aumniversal.tumblr.com       tumblr
2                 plasticdrea.ms  plasticdrea
3                   linkedin.com     linkedin
4                s-lw.tumblr.com       tumblr
5               newsonline.media   newsonline
6                creshendo.co.vu           co
7  deadly-skz-gods-cb.tumblr.com       tumblr
8                   deo.progr.am        progr
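A minimal sketch of what the chain does step by step, using a shortened frame plus a hypothetical dotless hostname `localhost` (not part of the question's data). `str.split('.')` turns each domain into a list of parts, and `.str[-2]` picks the second-to-last element of each list. When a list has fewer than two items, that index is out of range and pandas returns NaN instead of raising.

```python
import pandas as pd

# 'localhost' is a hypothetical extra entry with no dot, added to show the edge case
df = pd.DataFrame({'domain': ['linkedin.com', 'aumniversal.tumblr.com', 'deo.progr.am', 'localhost']})

# Each value becomes a list, e.g. 'aumniversal.tumblr.com' -> ['aumniversal', 'tumblr', 'com']
parts = df['domain'].str.split('.')

# Element-wise indexing on the lists; out-of-range positions become NaN
df['words'] = parts.str[-2]
print(df)
```

So a domain without any dot ends up with NaN in `words` rather than breaking the whole column.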
Answered By: jezrael

Use Series.str.extract with a regex that captures the part before the last dot:

df['words'] = df['domain'].str.extract(r'([^.]+)\.[^.]*$', expand=False)

Output:

                          domain        words
0                   linkedin.com     linkedin
1         aumniversal.tumblr.com       tumblr
2                 plasticdrea.ms  plasticdrea
3                   linkedin.com     linkedin
4                s-lw.tumblr.com       tumblr
5               newsonline.media   newsonline
6                creshendo.co.vu           co
7  deadly-skz-gods-cb.tumblr.com       tumblr
8                   deo.progr.am        progr

Regex breakdown:

([^.]+)   # capture one or more non-dot characters (the word)
\.[^.]*   # followed by a literal dot and the last part (.xxx)
$         # at the end of the string
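In a regex an unescaped `.` matches any character, while `\.` matches only a literal dot. Both forms give the same result on the question's domains, but they diverge on a string with no dot at all. A short comparison, again using a hypothetical `localhost` entry that is not in the question's data:

```python
import pandas as pd

# 'localhost' is a hypothetical dotless hostname used to expose the difference
s = pd.Series(['linkedin.com', 'deo.progr.am', 'localhost'])

# Unescaped '.' can match any character, not just a dot
loose = s.str.extract(r'([^.]+).[^.]*$', expand=False)

# Escaped '\.' requires a real dot before the last part
strict = s.str.extract(r'([^.]+)\.[^.]*$', expand=False)

print(pd.DataFrame({'domain': s, 'loose': loose, 'strict': strict}))
```

With the unescaped dot the pattern backtracks and captures `localhos`, letting `.` consume the final `t`. With `\.` there is no match, so `str.extract` returns NaN, the same result the `str.split(...).str[-2]` approach gives for that input. `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame when there is a single capture group.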
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.