How to extract only the letters from a string mixed with numbers in Python

Question:

I have this table in my dataframe. The char column contains values that are letters only, numbers only, or a combination of letters and numbers.

char     count
123        24
test       25
te123      26
test123    26

I want to extract only the letters, and if a row has numbers only, I want to make it blank.

The expected results would be:

char     count
NaN       24
test      25
te        26
test      26

How can I do this in Python?

Thank you in advance

Asked By: nomnom3214

||

Answers:

You can use regex to do this.

import pandas as pd
import numpy as np
import re

data = {'char': ['123', 'test', 'te123', 'test123'], 'count': [24, 25, 26, 26]}
df = pd.DataFrame(data)

df['char'] = df['char'].apply(lambda x: re.sub('[^a-zA-Z]+', '', x) if bool(re.search('[a-zA-Z]', x)) else np.nan)

print(df)

Here re.sub('[^a-zA-Z]+', '', x) removes all non-letter characters from the string. The condition bool(re.search('[a-zA-Z]', x)) checks whether the original string contains at least one letter; if it does not, the value is set to NaN.
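
The same logic can be tried on single strings without pandas. A minimal sketch, assuming a hypothetical helper named letters_or_none (not part of the answer above) that returns None where the answer uses NaN:

```python
import re

# Matches one or more consecutive characters that are not ASCII letters.
NON_LETTERS = re.compile(r"[^a-zA-Z]+")


def letters_or_none(text):
    """Return only the ASCII letters in text, or None if there are none."""
    letters = NON_LETTERS.sub("", text)
    # An empty result means the input held no letters at all.
    return letters if letters else None


for value in ["123", "test", "te123", "test123", "te12s3t"]:
    print(repr(value), "->", repr(letters_or_none(value)))
```

On a DataFrame this could be applied with df['char'].apply(letters_or_none); pandas should treat the resulting None as a missing value.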

Answered By: AishwaryaK

You can use extract:

df["char"] = df["char"].str.extract("([a-zA-Z]+)", expand=False)

If letters and digits are interleaved, as in "te12s3t", extract returns only the first run of letters, so use findall instead:

df["char"] = df["char"].str.findall("([a-zA-Z]+)").str.join("")

Or simply use replace to handle both cases:

df["char"] = df["char"].replace(r"\d+", "", regex=True).mask(lambda s: s.eq(""))

Or, in @Corralien's way, use isdigit combined with replace:

df["char"] = df["char"].mask(df["char"].str.isdigit()).str.replace(r"\d+", "", regex=True)

Output:

print(df)

   char  count
0   NaN     24
1  test     25
2    te     26
3  test     26
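
To compare how extract and findall behave, here is a minimal sketch on the sample values plus the interleaved value "te12s3t". The extra row and the variable names are assumptions for illustration, not part of the answer:

```python
import pandas as pd

# Sample values from the question plus one interleaved value for illustration.
s = pd.Series(["123", "test", "te123", "test123", "te12s3t"])

# extract keeps only the first run of letters; rows with no match become NaN.
first_run = s.str.extract(r"([a-zA-Z]+)", expand=False)

# findall collects every run of letters; joining them yields all letters.
# A digits-only row gives an empty list, so the joined result is "".
all_letters = s.str.findall(r"[a-zA-Z]+").str.join("")

# mask turns those empty strings into NaN to match the expected output.
all_letters = all_letters.mask(all_letters.eq(""))

print(pd.DataFrame({"char": s, "extract": first_run, "findall": all_letters}))
```

For "te12s3t", extract gives "te" while findall with join gives "test", which is why the findall variant suits interleaved values.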
Answered By: Timeless

We can use str.replace to strip the digits (a row that contains only digits becomes an empty string):

df["char"] = df["char"].str.replace(r'\d+', '', regex=True)
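
Since stripping digits with str.replace leaves an empty string where a row held only digits, one way to get NaN there is to convert the empty strings afterwards. A minimal sketch, assuming the sample data from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"char": ["123", "test", "te123", "test123"],
                   "count": [24, 25, 26, 26]})

# Remove every run of digits; "123" becomes the empty string "".
df["char"] = df["char"].str.replace(r"\d+", "", regex=True)

# Convert the empty strings to NaN so digits-only rows are missing values.
df["char"] = df["char"].replace("", np.nan)

print(df)
```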
Answered By: Tim Biegeleisen