How to handle Japanese characters?
Question:
I receive input in Japanese from another source that is outside my control.
But I get this error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 15-41: character maps to <undefined>
Code:
import mutagen
def addTag(fpath, title, albumName):
    audio = mutagen.File(fpath, easy=True)
    audio.add_tags()
    audio['title'] = title
    audio['album'] = albumName
    audio.save(fpath)
# The Code below this comment is out of my control but this is how it is implemented
file = "1.mp3"
title = "We Must Go TV"
album = "アニメ「風が強く吹いている」オリジナルサウンドトラック"
addTag(file, title, album)
Answers:
Read the documentation: https://docs.python.org/3/howto/unicode.html
It explains how to process and include non-ASCII text in your Python code. Essentially, you can use a Unicode escape sequence inside a string literal to represent a single character. This assigns one character (ル) to a variable:
ru = u'\u30EB'
You could also try to force the string to be a unicode
object in Python 2:
album = u"アニメ「風が強く吹いている」オリジナルサウンドトラック"
In Python 3, all strings are already Unicode by default, so the u prefix is not needed there.
Also check out this informative video: https://www.youtube.com/watch?v=oEbNWXhS_mk
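To illustrate the point about Unicode strings, here is a minimal Python 3 sketch. It assumes the failing codec is cp1252 (the question does not say which code page is involved), and the helper name can_encode is made up for this example. It shows that the escape and the literal are the same character, and that the album title cannot be encoded with cp1252 but can be encoded with UTF-8.

```python
# Python 3: str is already Unicode, so the u prefix is optional.
ru = '\u30EB'  # escape sequence for KATAKANA LETTER RU
album = "アニメ「風が強く吹いている」オリジナルサウンドトラック"


def can_encode(text, codec):
    """Return True if text can be encoded with codec, else False.

    Hypothetical helper for this example only.
    """
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        # On CPython, single-byte code pages such as cp1252 typically report
        # this failure as coming from the 'charmap' codec.
        return False


print(ru == 'ル')                    # True
print(can_encode(album, 'cp1252'))  # False: cp1252 has no Japanese characters
print(can_encode(album, 'utf-8'))   # True
```

If the second check is False on your machine for whatever codec the traceback points at, the text itself is fine and the problem is the encoding chosen at the point where the string is written or printed.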
There are 2 solutions.
1- The end of the error message tells you which encoding has the issue; in my case it was cp1252. I could insert the value by first encoding the string to UTF-8 and then decoding it with the encoding that has the issue, ignoring errors. The data is inserted, but the characters that encoding cannot represent are replaced with unwanted characters, e.g. ð¥ð¢ð ð¡ð ð“ð. This is not the best way, but it is good enough for inserting the data without errors:
newCatalogue.product_description = newCatalogue.product_description.encode('utf-8').decode('cp1252', 'ignore')
2- The second solution, and the one I used: I added the charset to the connection URL, and the data was inserted normally:
mysql://user:pass@localhost/dbname?charset=utf8mb4
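As a rough illustration of why the first workaround is only a stopgap, the sketch below applies the same encode/decode step to two short strings. The function name squeeze_into_cp1252 is made up for this example. The stored text is no longer the original, and whenever a UTF-8 byte has no cp1252 mapping it is silently dropped, so the original cannot be recovered.

```python
def squeeze_into_cp1252(text):
    """Workaround 1: reinterpret the UTF-8 bytes of text as cp1252.

    Bytes that cp1252 does not define are dropped by the 'ignore' handler.
    """
    return text.encode('utf-8').decode('cp1252', 'ignore')


# "ル" is the UTF-8 bytes E3 83 AB, all of which cp1252 defines.
stored = squeeze_into_cp1252("ル")
print(stored == "ル")                                     # False: mojibake
print(stored.encode('cp1252').decode('utf-8') == "ル")    # True: still reversible

# "あ" is the UTF-8 bytes E3 81 82, and 0x81 is undefined in cp1252.
lossy = squeeze_into_cp1252("あ")
print(len("あ".encode('utf-8')), len(lossy))              # 3 2: one byte was lost
```

Setting the charset on the connection, as in the second solution, avoids this because the text is never reinterpreted in the wrong encoding.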