What is the difference between encoding utf-8 and utf8 in Python 3.5?
Question:
What is the difference between encoding utf-8 and utf8 (if there is any)?
Given the following example:
u = u'€'
print('utf-8', u.encode('utf-8'))
print('utf8 ', u.encode('utf8'))
It produces the following output:
utf-8 b'\xe2\x82\xac'
utf8  b'\xe2\x82\xac'
Answers:
There's no difference. See the table of standard encodings in the codecs module documentation. Specifically, for 'utf_8' the following are all valid aliases: 'U8', 'UTF', 'utf8'.
Also note the statement in the first paragraph:
Notice that spelling alternatives that only differ in case or use a hyphen instead of an underscore are also valid aliases; therefore, e.g. 'utf-8' is a valid alias for the 'utf_8' codec.
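The case and hyphen/underscore rule can be checked the same way. A minimal sketch: the particular list of spellings is an arbitrary illustrative choice, and the check relies on codecs.lookup normalizing the name before searching for a codec.

```python
import codecs

# Spellings that differ from 'utf_8' only in case or in hyphen vs. underscore.
variants = ['utf_8', 'utf-8', 'UTF-8', 'Utf_8', 'UTF_8']

# Map each spelling to the canonical name of the codec it resolves to.
resolved = {v: codecs.lookup(v).name for v in variants}

for variant, canonical in resolved.items():
    print(variant, '->', canonical)
```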
You can also check the aliases of a specific encoding with the encodings.aliases module. Its aliases dictionary maps each alias name (the key) to the name of the codec module that implements it (the value):
>>> from encodings.aliases import aliases
>>> for k, v in aliases.items():
...     if v == 'utf_8':
...         print('Encoding name:{:>10} -- Module Name: {}'.format(k, v))
...
Encoding name:       utf -- Module Name: utf_8
Encoding name:        u8 -- Module Name: utf_8
Encoding name: utf8_ucs4 -- Module Name: utf_8
Encoding name: utf8_ucs2 -- Module Name: utf_8
Encoding name:      utf8 -- Module Name: utf_8
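If you want the aliases of every codec at once instead of filtering for one, you can invert the dictionary used in the loop above. This is a minimal sketch assuming encodings.aliases.aliases is still a plain dict of alias name to module name; it is an internal lookup table, so its exact contents (and therefore the printed list) can differ between Python versions.

```python
from collections import defaultdict
from encodings.aliases import aliases

# aliases maps alias name -> codec module name; invert it so that each
# codec module name maps to the list of all its aliases.
by_codec = defaultdict(list)
for alias, module_name in aliases.items():
    by_codec[module_name].append(alias)

# Sorted so the output does not depend on dictionary iteration order.
print(sorted(by_codec['utf_8']))
```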
And as pointed out in mgilson's answer:
Notice that spelling alternatives that only differ in case or use a
hyphen instead of an underscore are also valid aliases; therefore,
e.g. ‘utf-8’ is a valid alias for the ‘utf_8’ codec.