Invalid control character with Python json.loads
Question:
Below is the code that prints my string:
jsonString = data.decode("utf-8")
print jsonString
And this is the string that was printed on the console:
{"description":"Script to check testtbeat of TEST 1 server.", "script":"#!/bin/bash\nset -e\n\nCOUNT=60 #number of 10 second timeouts in 10 minutes\nSUM_SYNCS=0\nSUM_SYNCS_BEHIND=0\nHOSTNAME=$hostname \n\nwhile [[ $COUNT -ge \"0\" ]]; do\n\necho $HOSTNAME\n\n#send the request, put response in variable\nDATA=$(wget -O - -q -t 1 http://$HOSTNAME:8080/heartbeat)\n\n#grep $DATA for syncs and syncs_behind\nSYNCS=$(echo $DATA | grep -oE 'num_syncs: [0-9]+' | awk '{print $2}')\nSYNCS_BEHIND=$(echo $DATA | grep -oE 'num_syncs_behind: [0-9]+' | awk '{print $2}')\n\necho $SYNCS\necho $SYNCS_BEHIND\n\n#verify conditionals\nif [[ $SYNCS -gt \"8\" && $SYNCS_BEHIND -eq \"0\" ]]; then exit 0; fi\n\n#decrement the counter\nlet COUNT-=1\n\n#wait another 10 seconds\nsleep 10\n\ndone\n"}
But when I load it using Python's json.loads, as shown below:
jStr = json.loads(jsonString)
I am getting this error –
ERROR Invalid control character at: line 1 column 202 (char 202)
I looked at char 202, but I have no idea why it is causing an issue. Char 202 in my Notepad++ is e, I guess, or maybe I am counting it wrong.
Any idea what is wrong? How do I find out which character is causing the problem?
UPDATE:-
jsonString = {"description":"Script to check testtbeat of TIER 1 server.", "script":"#!/bin/bash\nset -e\n\nCOUNT=60 #number of 10 second timeouts in 10 minutes\nSUM_SYNCS=0\nSUM_SYNCS_BEHIND=0\nHOSTNAME=$hostname \n\nwhile [[ $COUNT -ge \"0\" ]]; do\n\necho $HOSTNAME\n\n#send the request, put response in variable\nDATA=$(wget -O - -q -t 1 http://$HOSTNAME:8080/heartbeat)\n\n#grep $DATA for syncs and syncs_behind\nSYNCS=$(echo $DATA | grep -oE 'num_syncs: [0-9]+' | awk '{print $2}')\nSYNCS_BEHIND=$(echo $DATA | grep -oE 'num_syncs_behind: [0-9]+' | awk '{print $2}')\n\necho $SYNCS\necho $SYNCS_BEHIND\n\n#verify conditionals\nif [[ $SYNCS -gt \"8\" && $SYNCS_BEHIND -eq \"0\" ]]; then exit 0; fi\n\n#decrement the counter\nlet COUNT-=1\n\n#wait another 10 seconds\nsleep 10\n\ndone\n"}
print jsonString[202]
This is the error I got:
KeyError: 202
Answers:
{"description":"Script to check testtbeat of TEST 1 server.", "script":"#!/bin/bash\nset -e\n\nCOUNT=60 #number of 10 second timeouts in 10 minutes\nSUM_SYNCS=0\nSUM_SYNCS_BEHIND=0\nHOSTNAME=$hostname #dc1dbx1145.dc1.host.com\n\nwhile [[ $COUNT -ge \"0\" ]]; do\n\necho $HOSTNAME\n\n#send the request, put response in variable\nDATA=$(wget -O - -q -t 1 http://$HOSTNAME:8080/heartbeat)\n\n#grep $DATA for syncs and syncs_behind\nSYNCS=$(echo $DATA | grep -oE 'num_syncs: [0-9]+' | awk '{print $2}')\nSYNCS_BEHIND=$(echo $DATA | grep -oE 'num_syncs_behind: [0-9]+' | awk '{print $2}')\n\necho $SYNCS\necho $SYNCS_BEHIND\n\n#verify conditionals\nif [[ $SYNCS -gt \"8\" && $SYNCS_BEHIND -eq \"0\" ]]; then exit 0; fi\n\n#decrement the counter\nlet COUNT-=1\n\n#wait another 10 seconds\nsleep 10\n\ndone\n"}
Works for me.
Also, if you get an error like this in the future, a debugging technique you can use is to shorten the string to something that works and slowly add data until it doesn’t.
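Instead of trimming the string by hand, you can also let the parser report where it stopped. A minimal sketch, assuming Python 3.5+ (where json.loads raises json.JSONDecodeError with a pos attribute); the helper name find_bad_char and the sample string are made up for illustration and are not from the question:

```python
import json


def find_bad_char(text, context=10):
    """Return (position, surrounding text) for the first JSON decode
    error in `text`, or None if the text parses cleanly."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        start = max(exc.pos - context, 0)
        return exc.pos, text[start:exc.pos + context]
    return None


# Hypothetical sample: a real newline character (not the two characters
# backslash and n) inside a JSON string value.
sample = '{"script": "set -e' + "\n" + 'echo hi"}'

result = find_bad_char(sample)
if result is not None:
    pos, around = result
    # repr() makes invisible characters such as newlines visible.
    print("error near position", pos, "context:", repr(around))
```

Printing the context with repr() matters here: a real newline shows up as an escape sequence instead of silently breaking the line on the console.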
There is no error in your JSON text.
You can get the error if you copy-paste the string into your Python source code as a string literal. In that case \n is interpreted as a single character (newline). You can fix it by using a raw-string literal instead (r'...'). Use triple quotes, r'''...''', to avoid escaping the " and ' quotes inside the string literal.
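A small sketch of the difference, assuming Python 3; the variable names and the shortened JSON text are illustrative only:

```python
import json

# Regular literal: Python turns \n into a real newline before json sees it,
# so the JSON string value contains a raw control character.
pasted = '{"script": "set -e\necho hi"}'

# Raw literal: the backslash and the n stay as two separate characters,
# which is the valid JSON escape sequence for a newline.
pasted_raw = r'{"script": "set -e\necho hi"}'

try:
    json.loads(pasted)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print("regular literal failed:", exc)

parsed = json.loads(pasted_raw)
print(repr(parsed["script"]))
```

The raw literal parses, and the decoded value then contains a real newline, because the JSON parser (not the Python source parser) interprets the escape.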
Control characters can be allowed inside a string as follows:
json_str = json.loads(jsonString, strict=False)
You can find this in the docs for Python 2 or the docs for Python 3:
If strict is false (True is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0–31 range, including '\t' (tab), '\n', '\r' and '\0'.
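To see the effect of the flag side by side, here is a minimal sketch, assuming Python 3; the payload is a made-up example with a literal tab inside a string value:

```python
import json

# Hypothetical payload with a literal tab character inside a string value.
payload = '{"key": "a' + "\t" + 'b"}'

try:
    json.loads(payload)  # strict=True is the default
    strict_ok = True
except ValueError:
    strict_ok = False

# With strict=False the control character is accepted and kept in the result.
lenient = json.loads(payload, strict=False)

print("strict parse succeeded:", strict_ok)
print("lenient result:", repr(lenient["key"]))
```

Note that strict=False does not strip the control character; it stays in the decoded string.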
Try using strict=False in json.loads; it will accept "\n" and other control characters inside strings, like the following:
import json
test_string = ' { "key1" : "1015391654687" , "key2": "value2 \n " } '
res = json.loads(test_string, strict=False)
print(res)
Output:
{'key1': '1015391654687', 'key2': 'value2 \n '}