Backporting Python 3 open(encoding="utf-8") to Python 2
Question:
I have a Python codebase, built for Python 3, which uses Python 3 style open() with encoding parameter:
https://github.com/miohtama/vvv/blob/master/vvv/textlineplugin.py#L47
with open(fname, "rt", encoding="utf-8") as f:
Now I’d like to backport this code to Python 2.x, so that I would have a codebase which works with Python 2 and Python 3.
What’s the recommended strategy to work around the open() differences and the lack of an encoding parameter?
Could I have a Python 3 open()-style file handler which streams bytestrings, so that it would act like Python 2 open()?
Answers:
I think `from io import open` should do.
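A minimal sketch of what `from io import open` gives you, assuming Python 2.6+ or Python 3.3+ (for the `u""` literals): the same call accepts `encoding` and returns decoded text (`unicode` on Python 2, `str` on Python 3). The file name `example.txt` is a throwaway file in a temporary directory, used only for the demo.

```python
import os
import tempfile
from io import open  # io.open on Python 2; on Python 3 this is the built-in open

# Throwaway file for the demo.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# io text files require unicode on Python 2, hence the u"" prefix.
with open(path, "w", encoding="utf-8") as f:
    f.write(u"h\u00e9llo w\u00f6rld\n")

# Same Python 3-style call as in the question.
with open(path, "rt", encoding="utf-8") as f:
    text = f.read()

print(repr(text))
```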
1. To get an encoding parameter in Python 2:
If you only need to support Python 2.6 and 2.7, you can use `io.open` instead of `open`. `io` is the new I/O subsystem of Python 3, and it exists in Python 2.6 and 2.7 as well. Be aware that in Python 2.6 (as well as 3.0) it is implemented in pure Python and is very slow, so if you need fast file reading, it is not a good option.
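If you need speed and you need to support Python 2.6 or earlier, you can use `codecs.open` instead. It also has an encoding parameter and is quite similar to `io.open`, except that it handles line endings differently: it always opens the underlying file in binary mode, so no newline translation is done on reading or writing.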
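To make the line-ending difference concrete, here is a small sketch that writes a file containing both a Windows-style and a Unix-style line ending, then reads it back with each function. The file name is a throwaway temp file chosen for the demo.

```python
import codecs
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "crlf.txt")

# Raw bytes: a "\r\n" line ending followed by a "\n" one.
with open(path, "wb") as f:
    f.write(b"first\r\nsecond\n")

# io.open defaults to universal-newline mode (newline=None),
# so "\r\n" is translated to "\n" on reading.
with io.open(path, "r", encoding="utf-8") as f:
    via_io = f.read()

# codecs.open always opens the file in binary mode,
# so the line endings come back exactly as stored.
with codecs.open(path, "r", encoding="utf-8") as f:
    via_codecs = f.read()

print(repr(via_io))
print(repr(via_codecs))
```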
2. To get a Python 3 `open()`-style file handler which streams bytestrings:
```python
open(filename, 'rb')
```
Note the ‘b’, meaning ‘binary’.
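A short sketch of what binary mode returns on either version: iterating over the file yields one undecoded line at a time, as `str` on Python 2 (where `str` is the byte string type) and as `bytes` on Python 3. The file name and its contents are made up for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "wb") as f:
    f.write(u"caf\u00e9\nna\u00efve\n".encode("utf-8"))

# 'rb' performs no decoding and no newline translation.
with open(path, "rb") as f:
    lines = [line for line in f]  # streams the file line by line as byte strings

print(lines)
```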
This may do the trick:
```python
import sys

if sys.version_info[0] > 2:
    # py3k: the built-in open already has this signature
    pass
else:
    # py2
    import codecs
    import warnings

    def open(file, mode='r', buffering=-1, encoding=None,
             errors=None, newline=None, closefd=True, opener=None):
        if newline is not None:
            warnings.warn('newline is not supported in py2')
        if not closefd:
            warnings.warn('closefd is not supported in py2')
        if opener is not None:
            warnings.warn('opener is not supported in py2')
        # 't' is the default anyway; dropping it avoids a mode such as 'rtb'
        # once codecs.open appends 'b'. Fall back to codecs' own 'strict'.
        return codecs.open(filename=file, mode=mode.replace('t', ''),
                           encoding=encoding, errors=errors or 'strict',
                           buffering=buffering)
```
Then you can keep your code written the Python 3 way.
Note that some parameters, such as `newline`, `closefd` and `opener`, do not work on Python 2; the wrapper only emits a warning when they are used.
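With the wrapper in place, calling code keeps the Python 3 signature, including keyword arguments such as `errors`, which the wrapper passes on to `codecs.open`. The sketch below assumes it runs in a module where that wrapper is defined (on Python 3 the built-in `open` is used, so it runs as-is there); the file is a throwaway temp file containing a deliberately invalid UTF-8 byte.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "broken.txt")
with open(path, "wb") as f:
    f.write(b"ok \xff end\n")  # 0xff never appears in valid UTF-8

# Python 3-style call: encoding and errors as keyword arguments.
with open(path, "r", encoding="utf-8", errors="replace") as f:
    text = f.read()

print(repr(text))
```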
Here’s one way:
```python
with open("filename.txt", "rb") as f:
    contents = f.read().decode("UTF-8")
```
Here’s how to do the same thing when writing:
```python
with open("filename.txt", "wb") as f:
    f.write(contents.encode("UTF-8"))
```
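The same decode-on-read / encode-on-write approach can be wrapped in two small helpers. `read_text` and `write_text` are hypothetical names chosen for this sketch, not part of any library.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical helper: read the whole file as bytes, then decode."""
    with open(path, "rb") as f:
        return f.read().decode(encoding)


def write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: encode the text first, then write the bytes."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


path = os.path.join(tempfile.mkdtemp(), "roundtrip.txt")
write_text(path, u"\u00a1Hola, se\u00f1or!\n")
result = read_text(path)
print(repr(result))
```

Because the file is opened in binary mode, line endings are left exactly as written, unlike text mode on Windows.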
If you are using `six`, you can try the following, which uses the Python 3 API and runs on both Python 2 and 3:
```python
import six

if six.PY2:
    # FileNotFoundError is only available since Python 3.3
    FileNotFoundError = IOError
    from io import open

fname = 'index.rst'
try:
    with open(fname, "rt", encoding="utf-8") as f:
        pass  # do_something_with_f ...
except FileNotFoundError:
    print('Oops.')
```
Dropping Python 2 support later is then just a matter of deleting everything related to `six`.
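If you would rather not depend on `six` just for this, the exception name can be feature-detected instead; this is an alternative to the `six.PY2` check, sketched under that assumption. Keep in mind that on Python 2, catching `IOError` also catches other I/O failures such as permission errors. The path below points into a fresh temporary directory, so the file is guaranteed not to exist.

```python
import os
import tempfile
from io import open  # io.open on Python 2; the built-in open on Python 3

try:
    FileNotFoundError
except NameError:
    # Python 2 (and Python 3 before 3.3) has no FileNotFoundError.
    FileNotFoundError = IOError

missing = os.path.join(tempfile.mkdtemp(), "index.rst")  # does not exist
caught = False
try:
    with open(missing, "rt", encoding="utf-8") as f:
        pass  # do_something_with_f ...
except FileNotFoundError:
    caught = True
    print("Oops.")
```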
Not a general answer, but this may be useful in the specific case where you are happy with Python 2's default behaviour (undecoded byte strings) but want to specify UTF-8 for Python 3:
```python
import sys

if sys.version_info.major > 2:
    do_open = lambda filename: open(filename, encoding='utf-8')
else:
    do_open = lambda filename: open(filename)

with do_open(filename) as f:
    pass
```
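Be aware that the two branches return different types: Python 3 yields decoded `str`, while Python 2 yields byte strings that still contain the raw UTF-8 bytes. A sketch that makes this visible, using a throwaway temp file written with `io.open`:

```python
import io
import os
import sys
import tempfile

if sys.version_info.major > 2:
    do_open = lambda filename: open(filename, encoding='utf-8')
else:
    do_open = lambda filename: open(filename)

path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve\n")

with do_open(path) as f:
    data = f.read()

# Python 3: decoded text; Python 2: the undecoded UTF-8 bytes.
print(type(data).__name__, repr(data))
```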