Python json.loads fails with `ValueError: Invalid control character at: line 1 column 33 (char 33)`
Question:
I have a string like this:
s = u"""{"desc": "\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br \/>\r\nhttp:\/\/www.zhenpin.com\/ <br \/>\r\n<br \/>\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026"}"""
json.loads(s)
returns error message like this:
ValueError: Invalid control character at: line 1 column 33 (char 33)
Why does this error occur? How can I solve this problem?
Answers:
Try to escape your \n and \r:
>>> s = s.replace('\r', '\\r').replace('\n', '\\n')
>>> json.loads(s)
{u'desc': u'\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br />\r\nhttp://www.zhenpin.com/ <br />\r\n<br />\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026'}
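One caveat with a blanket replace, sketched here in Python 3 on a made-up payload (`pretty` is not from the question): it also rewrites line breaks that sit between tokens, where they are legal JSON whitespace. Once escaped, they become stray backslashes outside any string and the parse fails. The question's data is a single line, so the replace is safe there.

```python
import json

# Hypothetical pretty-printed document: the newlines here are whitespace
# between tokens, which JSON allows.
pretty = '{\n  "desc": "ok"\n}'
assert json.loads(pretty) == {"desc": "ok"}

# Blanket escaping turns that whitespace into the two characters \ and n
# outside any string, which is no longer valid JSON.
escaped = pretty.replace('\r', '\\r').replace('\n', '\\n')
try:
    json.loads(escaped)
    parsed = True
except ValueError:
    parsed = False

print(parsed)  # False
```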
The problem is that your unicode string contains carriage returns (\r) and newlines (\n) within a string literal in the JSON data. If they were meant to be part of the string itself, they should be escaped appropriately. If they weren’t meant to be part of the string, they shouldn’t be in your JSON either.
If you can’t fix the source of this JSON string so that it produces valid JSON, you could either remove the offending characters:
>>> json.loads(s.replace('\r\n', ''))
or escape them manually:
>>> json.loads(s.replace('\r\n', '\\r\\n'))
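The two options give different results, so it matters which one you pick. A minimal Python 3 sketch, using a shortened stand-in for the question's payload (the text of `s` is made up):

```python
import json

# Shortened stand-in for the question's payload: a raw CR LF pair sits
# inside the string value.
s = '{"desc": "line one <br />\r\nline two"}'

removed = json.loads(s.replace('\r\n', ''))
escaped = json.loads(s.replace('\r\n', '\\r\\n'))

print(repr(removed["desc"]))  # 'line one <br />line two'
print(repr(escaped["desc"]))  # 'line one <br />\r\nline two'
```

Removing drops the line breaks from the decoded text; escaping keeps them.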
The problem is that the character at index 33 is a carriage return control character.
>>> s[33]
u'\r'
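In Python 3 the exception is a `json.JSONDecodeError` (a subclass of `ValueError`), and its `pos` attribute holds the reported index, so you can inspect the offending character without counting by hand. A small sketch with a made-up short payload; the index shown is what CPython reports:

```python
import json

s = '{"desc": "abc\r\ndef"}'  # hypothetical short payload with a raw CR LF

try:
    json.loads(s)
except json.JSONDecodeError as exc:
    pos = exc.pos   # index reported in the error message
    bad = s[pos]    # the character the decoder rejected
    print(pos, repr(bad))  # 13 '\r'
```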
According to the JSON spec, valid characters are:
- Any Unicode character except ", \, and control characters (ord(char) < 32).
- The following character sequences are allowed: \", \\, \/, \b (backspace), \f (form feed), \n (line-feed/new-line), \r (carriage return), \t (tab), or \u followed by four hexadecimal digits.
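If the input can't be fixed at the source, the rule above suggests a slightly more careful repair: escape raw control characters only where they appear inside string literals, and leave whitespace between tokens alone. This is a rough sketch, not a full JSON tokenizer; `escape_control_chars` and the sample `doc` are hypothetical names made up for illustration.

```python
import json
import re

# Matches one JSON string literal: a quote, then any run of non-quote,
# non-backslash characters or backslash escapes, then the closing quote.
_STRING = re.compile(r'"(?:[^"\\]|\\.)*"')
# Raw control characters, i.e. ord(char) < 32.
_CONTROL = re.compile(r'[\x00-\x1f]')


def escape_control_chars(text):
    """Escape raw control characters inside string literals only (hypothetical helper)."""
    def fix_string(match):
        # Rewrite each control character as a \uXXXX escape sequence.
        return _CONTROL.sub(lambda c: '\\u%04x' % ord(c.group()), match.group())
    return _STRING.sub(fix_string, text)


doc = '{\n  "desc": "a\r\nb\tc"\n}'
print(json.loads(escape_control_chars(doc)))  # {'desc': 'a\r\nb\tc'}
```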
However, in Python you have to double-escape these sequences (unless the string literal is raw), because Python interprets backslash escapes before the JSON decoder sees them.
>>> s = ur"""{"desc": "\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br \/>\r\nhttp:\/\/www.zhenpin.com\/ <br \/>\r\n<br \/>\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026"}"""
>>> json.loads(s)
{u'desc': u'\u73cd\u54c1\u7f51-\u5168\u7403\u6f6e\u6d41\u5962\u54c1\u7f51\u7edc\u96f6\u552e\u5546 <br />\r\nhttp://www.zhenpin.com/ <br />\r\n<br />\r\n200\u591a\u4e2a\u56fd\u9645\u4e00\u7ebf\u54c1\u724c\uff0c\u9876\u7ea7\u4e70\u624b\u5168\u7403\u91c7\u8d2d\uff0c100%\u6b63\u54c1\u4fdd\u969c\uff0c7\u5929\u65e0\u6761\u2026'}
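Note that the `ur` prefix exists only in Python 2; in Python 3 it is a syntax error. A plain `r` prefix should do the same job there, since the JSON decoder itself understands `\uXXXX`, `\/`, `\r` and `\n`. A sketch with a shortened version of the payload:

```python
import json

# Raw literal: Python leaves \u2026, \/ and \r\n as backslash sequences,
# and the JSON decoder interprets them instead.
s = r'{"desc": "100% <br \/>\r\nhttp:\/\/www.zhenpin.com\/ \u2026"}'

data = json.loads(s)
print(ascii(data["desc"]))  # '100% <br />\r\nhttp://www.zhenpin.com/ \u2026'
```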
Another option, perhaps, is to use the strict=False argument.
According to http://docs.python.org/2/library/json.html:
“If strict is False (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0-31 range, including '\t' (tab), '\n', '\r' and '\0'.”
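For example, with a made-up payload that has a raw CR LF inside the string value (a minimal sketch; the same call should work on the question's `s`):

```python
import json

# Hypothetical payload: raw CR LF inside the string literal.
s = '{"desc": "first line\r\nsecond line"}'

# strict=False tells the decoder to accept raw control characters in strings.
data = json.loads(s, strict=False)
print(repr(data["desc"]))  # 'first line\r\nsecond line'
```

This leaves the input untouched and keeps the line breaks in the decoded value, at the cost of accepting input that is not strictly valid JSON.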