Python: replace French letters with English
Question:
I would like to replace all the French letters within words with their ASCII equivalents.
letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]
for x in letters:
    for a in x:
        a = a.replace('é', 'e')
        a = a.replace('à', 'a')
        a = a.replace('è', 'e')
        a = a.replace('ù', 'u')
        a = a.replace('â', 'a')
        a = a.replace('ê', 'e')
        a = a.replace('î', 'i')
        a = a.replace('ô', 'o')
        a = a.replace('û', 'u')
        a = a.replace('ç', 'c')
print(letters[0][0])
This code prints é, however. How can I make this work?
Answers:
The replace function returns a new string with the character replaced; it does not modify the original string.
In your code you don’t store this return value anywhere that outlasts the loop.
The lines in your loop should be a = a.replace('é', 'e'), and you also need to store that result (for example, back into the list) so you can print it at the end.
This post explains how variables within loops are accessed.
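As a minimal sketch of the point above, assuming the goal is a converted copy of the nested letters list: the replacements dictionary and the to_ascii helper are names made up for illustration, not part of the original question.

```python
# Hypothetical mapping for illustration; extend it as needed.
replacements = {'é': 'e', 'à': 'a', 'è': 'e', 'ù': 'u', 'â': 'a',
                'ê': 'e', 'î': 'i', 'ô': 'o', 'û': 'u', 'ç': 'c'}

letters = [['é', 'à'], ['è', 'ù'], ['â', 'ê'], ['î', 'ô'], ['û', 'ç']]

def to_ascii(s):
    # str.replace returns a new string, so keep the result each time.
    for accented, plain in replacements.items():
        s = s.replace(accented, plain)
    return s

# Build a new nested list instead of only rebinding the loop variable.
converted = [[to_ascii(a) for a in x] for x in letters]
print(converted[0][0])
```

The original letters list is left untouched; only converted holds the ASCII versions.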
May I suggest you consider using translation tables.
translationTable = str.maketrans("éàèùâêîôûç", "eaeuaeiouc")
test = "Héllô Càèùverâêt Jîôûç"
test = test.translate(translationTable)
print(test)
This will print Hello Caeuveraet Jiouc. Pardon my French.
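str.maketrans also accepts a single dictionary, which allows a replacement longer than one character. The sketch below assumes you also want a few uppercase letters and the 'œ' ligature handled; those extra mappings and the sample sentence are illustrative additions, not part of the answer above.

```python
# Dictionary form of str.maketrans: keys are single characters,
# values may be strings of any length.
table = str.maketrans({
    'é': 'e', 'à': 'a', 'è': 'e', 'ù': 'u', 'â': 'a',
    'ê': 'e', 'î': 'i', 'ô': 'o', 'û': 'u', 'ç': 'c',
    # Assumed extras for illustration:
    'É': 'E', 'À': 'A', 'Ç': 'C',
    'œ': 'oe',
})

sample = "Ça, c'est un œuf à l'École"
result = sample.translate(table)
print(result)  # Ca, c'est un oeuf a l'Ecole
```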
You can also use unidecode. Install it with pip install unidecode, then do:
from unidecode import unidecode
s = "Héllô Càèùverâêt Jîôûç ïîäüë"
s = unidecode(s)
print(s) # Hello Caeuveraet Jiouc iiaue
The result will be the same string, but with the French characters converted to their ASCII equivalents: Hello Caeuveraet Jiouc iiaue
Here is another solution, using the low-level Unicode module in the standard library called unicodedata.
In Unicode, a character like ‘ù’ can be decomposed into the character ‘u’ and a combining character called ‘COMBINING GRAVE ACCENT’ (U+0300), which is the ‘̀’ mark. Using the decomposition function in unicodedata, one can obtain the code points (in hex) of these two parts.
>>> import unicodedata as ud
>>> ud.decomposition('ù')
'0075 0300'
>>> chr(0x0075)
'u'
>>> chr(0x0300)
'̀'
Therefore, to retrieve ‘u’ from ‘ù’, we can first do a string split, then use the built-in int function for the conversion (see this thread for converting a hex string to an integer), and then get the character using the chr function.
import unicodedata as ud
def get_ascii_char(c):
    s = ud.decomposition(c)
    if s == '':  # for an indecomposable character, it returns ''
        return c
    # The first field is the base character's code point in hex.
    code = int(s.split()[0], 16)
    return chr(code)
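A related way to apply the same decomposition idea to a whole string is unicodedata.normalize with the 'NFD' form, followed by dropping the combining marks. This is a sketch of a swapped-in technique, not the function above; strip_accents is a name chosen here for illustration.

```python
import unicodedata as ud

def strip_accents(text):
    # NFD splits each precomposed character into base + combining marks.
    decomposed = ud.normalize('NFD', text)
    # ud.combining returns a nonzero class for combining marks; drop those.
    return ''.join(ch for ch in decomposed if not ud.combining(ch))

result = strip_accents("Héllô Càèùverâêt Jîôûç")
print(result)  # Hello Caeuveraet Jiouc
```

Characters without a decomposition, such as 'œ', pass through unchanged, so they still need an explicit mapping if plain ASCII is required.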
Although I am new to Python, I would approach it this way:
letterXchange = {'à':'a', 'â':'a', 'ä':'a', 'é':'e', 'è':'e', 'ê':'e', 'ë':'e',
                 'î':'i', 'ï':'i', 'ô':'o', 'ö':'o', 'ù':'u', 'û':'u', 'ü':'u', 'ç':'c'}
text = input()  # Replace it with the string in your code.
for item in text:
    if item in letterXchange:
        # Replace every occurrence of this accented letter.
        text = text.replace(item, letterXchange[item])
print(text)