How to remove non-ASCII characters from a list

Question:

I have a DataFrame of object dtype in which some elements are text and some are numbers.

When I convert a column to a list, some of the elements contain non-ASCII characters.
Is there a way to get rid of those characters, like .encode('ascii', 'ignore') but for a list?

Here is the list that I get:

['Central Park\u202c',
 'Top of the Rock',
 'Statue of Liberty\u202c',
 'Brooklyn Bridge'
]
Asked By: Amit Goft

||

Answers:

You can use the str accessor:

df.my_column.str.encode('ascii','ignore').str.decode('ascii').tolist()
Answered By: Stef
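
A note on the str-accessor approach above: the question mentions that the column mixes text and numbers, and as far as I know the .str methods return NaN for elements that are not strings, so the numbers would be lost. A minimal sketch that leaves non-string values untouched is shown below. The sample DataFrame and the helper name strip_non_ascii are made up for illustration; only my_column comes from the answer above.

```python
import pandas as pd

# Hypothetical sample data: a column mixing text and numbers
df = pd.DataFrame({"my_column": ["Central Park\u202c", "Top of the Rock", 42]})


def strip_non_ascii(value):
    """Drop non-ASCII characters from strings; return other values unchanged."""
    if isinstance(value, str):
        return value.encode("ascii", "ignore").decode("ascii")
    return value


# Apply the helper element by element, then convert the column to a list
cleaned = df["my_column"].map(strip_non_ascii).tolist()
print(cleaned)  # ['Central Park', 'Top of the Rock', 42]
```

Series.map is used here instead of the str accessor so that the type check happens per element.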

If you want to post-process your list, you can apply encode('ascii', 'ignore') over it:

my_list = [
    'Central Park\u202c',
    'Top of the Rock',
    'Statue of Liberty\u202c',
    'Brooklyn Bridge'
]
my_list = [e.encode('ascii', 'ignore').decode("utf-8") for e in my_list]
print(my_list)

And the output should be:

['Central Park', 'Top of the Rock', 'Statue of Liberty', 'Brooklyn Bridge']
Answered By: Giorgos Myrianthous
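
One caveat with encode('ascii', 'ignore'): it drops every non-ASCII character, including accented letters that may be wanted. If the goal is only to remove invisible formatting characters such as \u202c (which falls in Unicode category "Cf"), a possible alternative is to filter by category with the standard unicodedata module. This is a sketch, not part of the answers above; the helper name strip_format_chars and the 'Café Lalo' entry are made up for illustration.

```python
import unicodedata


def strip_format_chars(text):
    """Remove invisible Unicode formatting characters (category 'Cf')."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


# Hypothetical sample list; 'Café Lalo' is added to show that accents survive
my_list = ["Central Park\u202c", "Café Lalo", "Statue of Liberty\u202c"]
cleaned = [strip_format_chars(e) for e in my_list]
print(cleaned)  # ['Central Park', 'Café Lalo', 'Statue of Liberty']

# For comparison, the ASCII round trip also drops the accented letter
print("Café Lalo".encode("ascii", "ignore").decode("ascii"))  # Caf Lalo
```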