Correct way to define Python source code encoding
Question:
PEP 263 defines how to declare Python source code encoding. Normally, the first 2 lines of a Python file should start with:
#!/usr/bin/python
# -*- coding: <encoding name> -*-
But I have seen a lot of files starting with:
#!/usr/bin/python
# -*- encoding: <encoding name> -*-
I.e., it says encoding rather than coding.
How should the file encoding be declared?
Please use "SyntaxError: Non-ASCII character …" or "SyntaxError: Non-UTF-8 code starting with …" trying to use non-ASCII text in a Python script to close duplicate questions about syntax errors resulting from a missing or faulty encoding declaration. This question, on the other hand, is the canonical for questions about how the declaration is written and whether it is necessary.
Answers:
I suspect it is similar to Ruby – either method is okay.
This is largely because different text editors use different methods (i.e., these two) of marking encoding.
With Ruby, the declaration is recognized as long as the first line (or the second, if there is a shebang line) contains a string that matches:
coding: encoding-name
Any whitespace and other fluff on those lines is ignored. (It can often be a = instead of :, too.)
If I’m not mistaken, the original proposal for source file encodings was to use a regular expression for the first couple of lines, which would allow both.
I think the regex was something along the lines of coding: followed by something.
I found this: http://www.python.org/dev/peps/pep-0263/
Which is the original proposal, but I can’t seem to find the final spec stating exactly what they did.
I’ve certainly used encoding: to great effect, so obviously that works.
Try changing to something completely different, like duhcoding: ... to see if that works just as well.
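One way to run that experiment without saving test files is the standard library's tokenize.detect_encoding, which reads a declaration from the first two lines of a source file. This is only a sketch: the helper name declared_encoding and the sample lines are made up for illustration, and latin-1 is used so that a recognized declaration can be told apart from the UTF-8 default.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding that tokenize detects for the given source bytes."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Each candidate first line is followed by an ordinary statement.
for first_line in (b"# -*- coding: latin-1 -*-",
                   b"# -*- encoding: latin-1 -*-",
                   b"# duhcoding: latin-1",
                   b"# no declaration here"):
    detected = declared_encoding(first_line + b"\nx = 1\n")
    print(first_line.decode("ascii"), "->", detected)
```

Note that tokenize normalizes the name, so latin-1 is reported as iso-8859-1, and a file without any declaration falls back to utf-8.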
Check the docs here:
“If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration”
“The recommended forms of this expression are
# -*- coding: <encoding-name> -*-
which is recognized also by GNU Emacs, and
# vim:fileencoding=<encoding-name>
which is recognized by Bram Moolenaar’s VIM.”
So, you can put pretty much anything before the “coding” part, but stick to “coding” (with no prefix) if you want to be 100% python-docs-recommendation-compatible.
More specifically, you need to use whatever is recognized by Python and the specific editing software you use (if it needs/accepts anything at all). E.g. the coding form is recognized (out of the box) by GNU Emacs but not Vim (yes, without a universal agreement, it’s essentially a turf war).
PEP 263:
the first or second line must match the regular expression “coding[:=]\s*([-\w.]+)”
So, “encoding: UTF-8” matches.
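That claim is easy to check with the re module. A minimal sketch, assuming the pattern is used exactly as quoted above; the name CODING_RE and the sample comments are only illustrative:

```python
import re

# Pattern as quoted from PEP 263, with the backslashes written out.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")

for comment in ("# -*- coding: utf-8 -*-",
                "# -*- encoding: UTF-8 -*-",
                "# vim: set fileencoding=latin-1 :",
                "# coding utf-8"):  # no ':' or '=' after "coding"
    match = CODING_RE.search(comment)
    print(comment, "->", match.group(1) if match else None)
```

The first three comments match because the pattern is not anchored in front of "coding", so any prefix such as "en" or "file" is accepted; the last one has no separator and is not treated as a declaration.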
PEP provides some examples:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
# This Python file uses the following encoding: utf-8
import os, sys
Just copy and paste the lines below at the top of your program. They resolve errors caused by a missing encoding declaration, provided the file is actually saved as UTF-8:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
As of today (June 2018), PEP 263 itself mentions the regex it follows:
To define a source code encoding, a magic comment must be placed into
the source files either as first or second line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors):
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or:
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
More precisely, the first or second line must match the following regular expression:
^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
So, as other answers have already summed up, it’ll match coding with any prefix. As far as I can tell, using encoding instead of coding does not violate PEP 263 in any way, but if you’d like to be as PEP-compliant as it gets, stick with ‘plain’ coding, with no prefixes.
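The "first or second line" rule can be sketched in a few lines. This is a simplified illustration, not the interpreter's actual implementation: the function name find_declaration and the sample sources are hypothetical, and details such as the UTF-8 byte order mark are ignored.

```python
import re

# The stricter pattern from PEP 263, applied from the start of each line.
PEP263_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")


def find_declaration(source: str):
    """Return the encoding name declared in the first two lines, or None."""
    for line in source.splitlines()[:2]:
        match = PEP263_RE.match(line)
        if match:
            return match.group(1)
    return None


with_shebang = "#!/usr/bin/python\n# -*- coding: utf-8 -*-\nimport os\n"
too_late = "#!/usr/bin/python\n\n# -*- coding: utf-8 -*-\n"
not_a_comment = 'x = "coding: utf-8"\n'

print(find_declaration(with_shebang))
print(find_declaration(too_late))
print(find_declaration(not_a_comment))
```

Only the first sample yields a declaration: in the second the comment sits on line three, and in the third the text is not inside a comment, which the leading # in the pattern requires.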