Python: replace French letters with English

Question:

I would like to replace all the French letters within words with their ASCII equivalents.

letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]

for x in letters:
    for a in x:
        a = a.replace('é', 'e')
        a = a.replace('à', 'a')
        a = a.replace('è', 'e')
        a = a.replace('ù', 'u')
        a = a.replace('â', 'a')
        a = a.replace('ê', 'e')
        a = a.replace('î', 'i')
        a = a.replace('ô', 'o')
        a = a.replace('û', 'u')
        a = a.replace('ç', 'c')

print(letters[0][0])

However, this code still prints é. How can I make this work?

Asked By: David Ferris

Answers:

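The replace function does not modify a string in place; it returns a new string with the character replaced.

In your code, that return value is only assigned to the loop variable a. Rebinding a inside the loop does not change the element stored in letters, so the list still contains the original strings.

You need to store that output back in the list (or in a new list) so you can print it at the end.

This post explains how variables within loops are accessed.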
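
As a minimal sketch of one way to store the results, the loop can write each replaced string back into the list by index. The replacements dict and the names i, j, row, and word are illustrative and do not come from the question.

```python
# Mapping of accented letters to ASCII letters (illustrative; extend as needed).
replacements = {'é': 'e', 'à': 'a', 'è': 'e', 'ù': 'u', 'â': 'a',
                'ê': 'e', 'î': 'i', 'ô': 'o', 'û': 'u', 'ç': 'c'}

letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]

# Assigning to the loop variable alone would only rebind a local name,
# so each result is written back into the list by index.
for i, row in enumerate(letters):
    for j, word in enumerate(row):
        for accented, plain in replacements.items():
            word = word.replace(accented, plain)
        letters[i][j] = word

print(letters[0][0])  # e
```

Building a new list with a comprehension would work just as well; the index-based version is shown because it stays closest to the loop in the question.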

Answered By: mimre

May I suggest you consider using translation tables.

translationTable = str.maketrans("éàèùâêîôûç", "eaeuaeiouc")

test = "Héllô Càèùverâêt Jîôûç"
test = test.translate(translationTable)
print(test)

This will print Hello Caeuveraet Jiouc. Pardon my French.
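
If some characters should map to more than one letter, str.maketrans also accepts a dict, which the two-string form cannot express. The sketch below assumes an illustrative, incomplete mapping (the names mapping and table are not from the answer) and derives the uppercase entries from the lowercase ones.

```python
# str.maketrans also accepts a dict; values may be longer than one character.
# The entries below are an illustrative, incomplete selection.
mapping = {
    'é': 'e', 'è': 'e', 'ê': 'e', 'ë': 'e',
    'à': 'a', 'â': 'a',
    'î': 'i', 'ï': 'i',
    'ô': 'o',
    'ù': 'u', 'û': 'u', 'ü': 'u',
    'ç': 'c',
    'œ': 'oe', 'æ': 'ae',
}
# Add uppercase variants derived from the lowercase entries.
mapping.update({k.upper(): v.capitalize() for k, v in mapping.items()})

table = str.maketrans(mapping)

print("Cœur, Élève, Ça".translate(table))  # Coeur, Eleve, Ca
```

Characters that are not in the table pass through unchanged, so the mapping only needs to list the letters you actually expect.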

Answered By: logan rakai

You can also use unidecode. Install it: pip install unidecode.
Then, do:

from unidecode import unidecode

s = "Héllô Càèùverâêt Jîôûç ïîäüë"
s = unidecode(s)
print(s)  # Hello Caeuveraet Jiouc iiaue

The result is the same string with the French characters converted to their ASCII equivalents: Hello Caeuveraet Jiouc iiaue

Answered By: vvvvv

Here is another solution, using the low-level standard-library module unicodedata.

In Unicode, a precomposed character like ‘ù’ can be decomposed into the base character ‘u’ and a combining character called ‘COMBINING GRAVE ACCENT’ (U+0300), which is basically the ‘̀’. Using the decomposition function in unicodedata, one can obtain the code points (in hex) of these two parts.

>>> import unicodedata as ud
>>> ud.decomposition('ù')
'0075 0300'
>>> chr(0x0075)
'u'
>>> chr(0x0300)
'̀'

Therefore, to retrieve ‘u’ from ‘ù’, we can first split the decomposition string, then convert the first hex field with the built-in int function (see this thread for converting a hex string to an integer), and finally get the character using the chr function.

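import unicodedata as ud

def get_ascii_char(c):
    s = ud.decomposition(c)
    if s == '':  # an indecomposable character has an empty decomposition
        return c
    # Skip a leading compatibility tag such as '<compat>', if present.
    fields = [f for f in s.split() if not f.startswith('<')]
    # The first remaining field is the hex code point of the base character.
    return chr(int(fields[0], 16))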
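
For whole strings, the same decomposition idea can be applied through unicodedata.normalize instead of parsing hex fields by hand. This is a minimal sketch of that swapped-in technique; the function name strip_accents is hypothetical and not part of the answer above.

```python
import unicodedata as ud

def strip_accents(text):
    # NFD splits each precomposed character into its base character
    # followed by any combining marks (e.g. 'ù' -> 'u' + U+0300).
    decomposed = ud.normalize('NFD', text)
    # ud.combining() returns a non-zero class for combining marks; drop them.
    return ''.join(ch for ch in decomposed if not ud.combining(ch))

print(strip_accents("Héllô Càèùverâêt Jîôûç"))  # Hello Caeuveraet Jiouc
```

Letters without a canonical decomposition, such as ‘œ’, are left unchanged by this approach and would need an explicit mapping.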

Answered By: Zheng Liu

Although I am new to Python, I would approach it this way:

letterXchange = {'à':'a', 'â':'a', 'ä':'a', 'é':'e', 'è':'e', 'ê':'e', 'ë':'e',
    'î':'i', 'ï':'i', 'ô':'o', 'ö':'o', 'ù':'u', 'û':'u', 'ü':'u', 'ç':'c'}
text = input()  # Replace it with the string in your code.
for item in text:
    if item in letterXchange:
        # Replace every occurrence of this accented letter.
        text = text.replace(item, letterXchange[item])
print(text)

Answered By: Adam Alison