Extract string between two delimiters in Python dataframe

Question:

I am trying to extract the values between : and - from the column shown below:

>>> all_cancers.iloc[:,3]
0        chr1:100414771-100414772
1          chr1:10506157-10506158
2        chr1:109655506-109655507
3        chr1:113903257-113903258
4        chr1:117598869-117598870

I tried re.findall(':(.*?)-', all_cancers.iloc[:,3].astype(str)) to do this but it generates the following error: TypeError: expected string or bytes-like object.

What is missing here?

Asked By: MAPK


Answers:

You can use this pattern:

In [33]: re.match(r'.*:(.*)-',"chr1:100414771-100414772").group(1)
Out[33]: '100414771'

On a DataFrame column, you can apply the same match to each element with apply and a lambda:

all_cancers.iloc[:,3].apply(lambda x: re.match(r'.*:(.*)-', x).group(1))
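
The functions in re work on one string at a time, which is why passing the whole column to re.findall raises the TypeError from the question. Here is a minimal, self-contained sketch of the per-element approach. The sample Series `coords` and the helper `start_position` are made up for illustration and stand in for all_cancers.iloc[:,3]. The None guard is an addition, so a value that does not match the pattern does not raise an AttributeError.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for all_cancers.iloc[:, 3]
coords = pd.Series([
    "chr1:100414771-100414772",
    "chr1:10506157-10506158",
    "chr1:109655506-109655507",
])

# Same pattern as in the answer, compiled once for reuse
pattern = re.compile(r".*:(.*)-")


def start_position(value):
    """Return the text between ':' and '-', or None if there is no match."""
    match = pattern.match(value)
    return match.group(1) if match else None


# apply calls the helper on each string in the Series
starts = coords.apply(start_position)
print(starts.tolist())
```

The result is a Series of strings. If the column may contain non-string values such as NaN, converting it with astype(str) first, as in the question, avoids a TypeError inside the helper.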

Using str.extract, which applies the pattern to the whole column without an explicit loop:

all_cancers.iloc[:,3].str.extract(r'.*:(.*)-')

(credit: OlvinRoght’s comment)
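
As a rough sketch of how the str.extract variant could be used end to end: the sample Series `coords` is hypothetical, and the stricter pattern `:(\d+)-` is a substitution of mine (not from the answer) that assumes the values between the delimiters are always digits. Passing expand=False returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data standing in for all_cancers.iloc[:, 3]
coords = pd.Series([
    "chr1:100414771-100414772",
    "chr1:10506157-10506158",
    "chr1:109655506-109655507",
])

# One capture group plus expand=False gives back a Series of strings;
# rows that do not match the pattern become NaN.
starts_text = coords.str.extract(r":(\d+)-", expand=False)

# Assumes every row matched; NaN values would make this conversion fail.
starts = starts_text.astype("int64")
print(starts.tolist())
```

Compared with apply and a lambda, this version does not raise on rows that fail to match, so any missing values can be checked with isna() before the integer conversion.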


Answered By: Rahul K P