How to remove non-ascii characters from a list

Question

I have a DataFrame of dtype object in which some elements are text and some are numbers.

When I convert a column to a list, some of the elements contain non-ASCII characters.
Is there a way to get rid of these characters, like .encode('ascii', 'ignore') but for a list?

Here is the list that I get:

['Central Park\u202c',
 'Top of the Rock',
 'Statue of Liberty\u202c',
 'Brooklyn Bridge'
]

Asked By: Amit Goft


Answer 1

You can use the str accessor:

df.my_column.str.encode('ascii','ignore').str.decode('ascii').tolist()

Answered By: Stef

Answer 2

If you want to post-process your list, you can apply encode('ascii', 'ignore') to each element with a list comprehension:

my_list = [
    'Central Park\u202c',
    'Top of the Rock',
    'Statue of Liberty\u202c',
    'Brooklyn Bridge'
]
# Drop every character that cannot be encoded as ASCII, then decode back to str
my_list = [e.encode('ascii', 'ignore').decode('ascii') for e in my_list]
print(my_list)

And the output should be:

['Central Park', 'Top of the Rock', 'Statue of Liberty', 'Brooklyn Bridge']

Answered By: Giorgos Myrianthous
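
Since the question mentions a column that mixes text and numbers, here is a minimal sketch of the same encode/decode idea that skips non-string items instead of raising an AttributeError on them. The helper name strip_non_ascii and the sample list are hypothetical, made up for illustration; they are not from either answer.

```python
def strip_non_ascii(items):
    """Return a new list with non-ASCII characters removed from each string.

    Non-string items (numbers, None, ...) are passed through unchanged.
    The name strip_non_ascii is a hypothetical helper, not a library function.
    """
    cleaned = []
    for item in items:
        if isinstance(item, str):
            # Encode to ASCII, silently dropping anything that does not fit,
            # then decode the bytes back to a str.
            cleaned.append(item.encode("ascii", "ignore").decode("ascii"))
        else:
            cleaned.append(item)
    return cleaned


# Illustrative sample data: strings ending in U+202C mixed with numbers and None
mixed = ["Central Park\u202c", 42, "Statue of Liberty\u202c", 3.5, None]
result = strip_non_ascii(mixed)
print(result)  # ['Central Park', 42, 'Statue of Liberty', 3.5, None]
```

Note that this removes every non-ASCII character, so accented letters are dropped as well (for example, 'café' becomes 'caf'). If only invisible formatting characters such as \u202c should go, a narrower filter may be a better fit.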
