How can I remove emojis from a dataframe?

Question:

I know that

test = []
for item in my_texts:
    test.append(item.encode('ascii', 'ignore').decode('ascii'))

removes emojis from a list. But how can I remove emojis from a dataframe? When I try

a = []
for item in goldtest['Text']:
    a.append(item.encode('ascii', 'ignore').decode('ascii'))

I get only the last entry of goldtest. When I try the code on the whole dataframe, I get "AttributeError: 'DataFrame' object has no attribute 'encode'".

Asked By: maybeyourneighour


Answers:

This would be the equivalent code for pandas. It operates column by column.

df.astype(str).apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
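
As a minimal sketch of the same idea applied to a single column: the frame and its contents below are made-up sample data, with the column name 'Text' borrowed from the question. Note that the ASCII round trip drops every non-ASCII character, so accented letters and non-Latin scripts disappear along with the emojis.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Text' comes from the question.
goldtest = pd.DataFrame(
    {"Text": ["good morning \U0001F600", "caf\u00e9 \u2615", "plain text"]}
)

# Encode to ASCII while ignoring anything that cannot be encoded,
# then decode the resulting bytes back to str.
goldtest["Text"] = (
    goldtest["Text"].astype(str).str.encode("ascii", "ignore").str.decode("ascii")
)

print(goldtest["Text"].tolist())
```

Assigning the result back to the column, rather than collecting it in a separate list, keeps the cleaned text aligned with the rest of the rows.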
Answered By: ivallesp

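This removes every character except ASCII letters and digits from a given column. That covers emojis, but also spaces, punctuation and non-ASCII letters. Since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed:

import re
goldtest['Text'] = goldtest['Text'].str.replace('[^A-Za-z0-9]', '', regex=True, flags=re.UNICODE)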
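
The character class above contains no whitespace, so the spaces between words are removed as well. A hedged variant that keeps whitespace might look like the sketch below; the sample frame is hypothetical, and regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
goldtest = pd.DataFrame({"Text": ["Hello, world! \U0001F600", "abc 123 \u2764"]})

# \s is added to the negated class so that whitespace survives;
# everything else that is not an ASCII letter or digit is deleted.
goldtest["Text"] = goldtest["Text"].str.replace(r"[^A-Za-z0-9\s]", "", regex=True)

print(goldtest["Text"].tolist())
```

Punctuation and non-ASCII letters are still removed by this pattern, so it suits plain English text better than multilingual data.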
Answered By: Skynet

You can use the emoji package:

import emoji
import pandas as pd
df = pd.DataFrame(data={'str_data':['يااا واجعوط هذا راه باغي يبدع فالسانكيام ‍♀️']})
df['str_data'] = df['str_data'].apply(lambda s: emoji.replace_emoji(s, ''))
df

Output:

str_data
يااا واجعوط هذا راه باغي يبدع فالسانكيام
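
If installing the emoji package is not an option, a rough approximation is possible with the standard re module. The sketch below is an assumption-laden stand-in, not what emoji.replace_emoji does internally: the code point ranges are a hand-picked subset of common emoji blocks and will miss some emojis, and the sample data is made up. Unlike the ASCII approach, it leaves non-ASCII letters in place.

```python
import re

import pandas as pd

# Approximate emoji ranges (an assumption, not the full Unicode emoji definition).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\u200D"                 # zero-width joiner (also used by some non-emoji scripts)
    "\uFE0F"                 # variation selector-16
    "]+"
)

# Hypothetical sample data; the column name follows the answer above.
df = pd.DataFrame(
    {"str_data": ["caf\u00e9 \U0001F926\u200D\u2640\uFE0F", "hello \U0001F600 world"]}
)

# Strip every run of matched code points from each string.
df["str_data"] = df["str_data"].map(lambda s: EMOJI_PATTERN.sub("", s))

print(df["str_data"].tolist())
```

The spaces that surrounded a removed emoji remain, so a follow-up whitespace cleanup may be needed depending on the use case.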
Answered By: Guru Stron