Remove a substring from an alphanumeric column in Python

Question:

I have a dataframe

import pandas as pd    
data_as_dict={'CHROM': {232: 1, 233: 1, 234: 1, 10: 'chr15', 11: 'chr15'}, 'POS_GRCh38': {232: 10506158, 233: 109655507, 234: 113903258, 10: '67165147', 11: '67163292'}, 'REF': {232: 'G', 233: 'CAAA', 234: 'G', 10: 'G', 11: 'C'}, 'Effect_allele': {232: 'A', 233: 'C', 234: 'A', 10: 'C', 11: 'T'}, 'Effect_size': {232: 0.1109, 233: 0.0266, 234: 0.0579, 10: 0.2070141693843261, 11: 0.2151113796169455}, 'TYPE': {232: 'Mavaddat_2019_ER_NEG_Breast', 233: 'Mavaddat_2019_ER_NEG_Breast', 234: 'Mavaddat_2019_ER_NEG_Breast', 10: 'THYROID_PGS', 11: 'THYROID_PGS'}, 'Cancer': {232: 'Breast', 233: 'Breast', 234: 'Breast', 10: 'THYROID', 11: 'THYROID'}, 'Significant_YN': {232: 'Y', 233: 'Y', 234: 'Y', 10: 'Y', 11: 'Y'}} 

all_cancers = pd.DataFrame.from_dict(data_as_dict)

I want to remove chr from the CHROM column. I tried all_cancers['CHROM'] = all_cancers['CHROM'].str.replace(r'chr', ''), which generates NaNs. I know it can be done easily in R with gsub, but I want to do it in Python. How do I do it correctly?

Asked By: MAPK


Answers:

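The CHROM column mixes integers and strings, and the .str accessor returns NaN for values that are not strings. Cast the column to string first and the replacement works: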

all_cancers['CHROM'] = all_cancers['CHROM'].astype(str).str.replace(r'chr', '')

Output:

all_cancers
     CHROM  POS_GRCh38   REF  Effect_allele  Effect_size                         TYPE   Cancer  Significant_YN
232      1    10506158     G              A     0.110900  Mavaddat_2019_ER_NEG_Breast   Breast               Y
233      1   109655507  CAAA              C     0.026600  Mavaddat_2019_ER_NEG_Breast   Breast               Y
234      1   113903258     G              A     0.057900  Mavaddat_2019_ER_NEG_Breast   Breast               Y
10      15    67165147     G              C     0.207014                  THYROID_PGS  THYROID               Y
11      15    67163292     C              T     0.215111                  THYROID_PGS  THYROID               Y
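To see why the cast matters, here is a minimal sketch on a small stand-in Series (the name chrom and its three values are made up for illustration, mirroring the mixed integers and strings in CHROM). It passes regex=False, an assumption on my part that a literal match is all that is needed here:

```python
import pandas as pd

# Hypothetical stand-in for the CHROM column: integers mixed with strings.
chrom = pd.Series([1, 1, "chr15"], index=[232, 233, 10])

# Without the cast, .str methods return NaN for the integer entries.
without_cast = chrom.str.replace("chr", "", regex=False)

# Casting to str first turns every element into a string, so nothing is lost.
with_cast = chrom.astype(str).str.replace("chr", "", regex=False)

print(without_cast.tolist())
print(with_cast.tolist())
```

Note that the result is a column of strings ('1', '15'), not integers.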
Answered By: akrun

Using a regular expression:

import re
# \D matches any non-digit character, so only the digits are kept
all_cancers["CHROM"] = all_cancers["CHROM"].apply(lambda x: re.sub(r'\D', '', str(x)))
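One caveat with a non-digit pattern is that it strips every letter, not just the chr prefix. The sketch below compares it with an anchored pattern on a made-up Series; the value "chrX" does not appear in the question's data and is included only as an assumption about what other chromosome labels might look like:

```python
import re
import pandas as pd

# Hypothetical values; "chrX" is not in the original data.
chrom = pd.Series([1, "chr15", "chrX"])

# r"\D" deletes every non-digit, so "chrX" collapses to an empty string.
all_non_digits = chrom.apply(lambda x: re.sub(r"\D", "", str(x)))

# r"^chr" removes only a leading "chr" and keeps the rest of the label.
prefix_only = chrom.apply(lambda x: re.sub(r"^chr", "", str(x)))

print(all_non_digits.tolist())
print(prefix_only.tolist())
```

If the column only ever holds numbered chromosomes, both patterns give the same result.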
Answered By: Sachin Kohli