How to make a Unicode string with Python 3
Question:
I used this:
u = unicode(text, 'utf-8')
But I am getting an error with Python 3 (or maybe I just forgot to include something):
NameError: global name 'unicode' is not defined
Thank you.
Answers:
Literal strings are Unicode by default in Python 3.
Assuming that text is a bytes object, just use text.decode('utf-8').
unicode in Python 2 is equivalent to str in Python 3, so you can also write:
str(text, 'utf-8')
if you prefer.
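As a minimal sketch of the two spellings above (the sample bytes are an assumption made for illustration, not taken from the question), both calls should produce the same str:

```python
# Assumed sample input: the Swedish letters "åäö" encoded as UTF-8 bytes.
text = "\u00e5\u00e4\u00f6".encode("utf-8")

decoded_a = text.decode("utf-8")  # bytes method
decoded_b = str(text, "utf-8")    # str constructor with an explicit encoding

print(type(text).__name__, type(decoded_a).__name__)
print(decoded_a == decoded_b)
```

Note that str(text) without an encoding does not decode; it returns the repr of the bytes object, which is rarely what is wanted.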
What’s New in Python 3.0 says:
All text is Unicode; however encoded Unicode is represented as binary data.
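If you want to ensure that the data you are handling is valid UTF-8, here’s an example from this page on Unicode in 3.0. With "strict" error handling, the invalid byte 0x80 raises a UnicodeDecodeError:
b'\x80abc'.decode("utf-8", "strict")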
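To illustrate what the error-handling argument of decode does, here is a small sketch comparing "strict" with the "replace" and "ignore" handlers on the same invalid input. The variable names are chosen only for this example.

```python
data = b"\x80abc"  # 0x80 cannot start a UTF-8 sequence

try:
    data.decode("utf-8", "strict")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict failed:", exc.reason)

# "replace" substitutes U+FFFD for the bad byte; "ignore" drops it.
replaced = data.decode("utf-8", "replace")
ignored = data.decode("utf-8", "ignore")
print(repr(replaced), repr(ignored))
```

Using "strict" (the default) is usually the safer choice when the goal is to detect bad input rather than hide it.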
As a workaround, I’ve been using this:
# Fix for Python 2.x code: define unicode when running on Python 3.
try:
    UNICODE_EXISTS = bool(type(unicode))
except NameError:
    # Python 3: str accepts the same (bytes, encoding) arguments.
    unicode = str
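An alternative to rebinding the unicode name is a small conversion helper. The sketch below assumes Python 3, and to_text is a hypothetical name introduced only for this illustration:

```python
def to_text(value, encoding="utf-8"):
    """Return value as str: decode bytes, pass str through unchanged."""
    # Hypothetical helper, not part of any library.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b"abc"))
print(to_text("already text"))
```

This keeps call sites explicit about the conversion and avoids shadowing a built-in name under Python 2.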
In a Python 2 program that I used for many years there was this line:
ocd[i].namn=unicode(a[:b], 'utf-8')
This did not work in Python 3.
However, the program turned out to work with:
ocd[i].namn=a[:b]
I don’t remember why I put unicode there in the first place, but I think it was because the name can contain Swedish letters (åäöÅÄÖ). But even they work without unicode, presumably because a is already a str in Python 3, so slicing it yields Unicode text directly.
The easiest way in Python 3.x, if what you need is the UTF-8 encoded bytes of a str:
text = "hi , I'm text"
text.encode('utf-8')
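Note that encode goes from str to bytes, not the other way around. A short sketch (with an assumed sample string) shows the difference in type and length, and that decode reverses it:

```python
text = "\u00e5\u00e4\u00f6"  # assumed sample: the Swedish letters "åäö"
encoded = text.encode("utf-8")

# Three characters, but six bytes, since each letter takes two bytes in UTF-8.
print(type(encoded).__name__, len(text), len(encoded))

round_trip = encoded.decode("utf-8")
print(round_trip == text)
```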
This is how I solved my problem of converting escape sequences like \uFE0F, \u000A, etc., and also emojis that are encoded as UTF-16 surrogate pairs.
import codecs
# Raw string, so the backslash escapes stay as literal text to be decoded.
example = r'raw vegan chocolate cocoa pie w chocolate & vanilla cream\uD83D\uDE0D\uD83D\uDE0D\u2764\uFE0F Present Moment Caf\u00E8 in St.Augustine\u2764\uFE0F\u2764\uFE0F '
new_str = codecs.unicode_escape_decode(example)[0]
# The emoji are still lone surrogates here, so show the repr rather than printing directly.
print(repr(new_str))
>>> 'raw vegan chocolate cocoa pie w chocolate & vanilla cream\ud83d\ude0d\ud83d\ude0d❤️ Present Moment Cafè in St.Augustine❤️❤️ '
# Round-tripping through UTF-16 joins each surrogate pair into one code point.
new_new_str = new_str.encode('utf-16', errors='surrogatepass').decode('utf-16')
print(repr(new_new_str))
>>> 'raw vegan chocolate cocoa pie w chocolate & vanilla cream😍😍❤️ Present Moment Cafè in St.Augustine❤️❤️ '
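The two steps above can be wrapped in one function. This is only a sketch: unescape is a hypothetical name, and it assumes the input is ASCII-only text containing literal backslash escapes.

```python
def unescape(escaped):
    """Decode literal \\uXXXX escapes and join UTF-16 surrogate pairs."""
    # Assumption: the input is pure ASCII, so encoding to ASCII is lossless.
    text = escaped.encode("ascii").decode("unicode_escape")
    # Lone surrogates pass through the encoder and are paired up on decode.
    return text.encode("utf-16", errors="surrogatepass").decode("utf-16")


result = unescape(r"Caf\u00E8 \uD83D\uDE0D")
print(len(result))
```

If the input may already contain non-ASCII characters, the encode("ascii") step will raise an error, so a different strategy would be needed there.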