Convert a bytes array into JSON format
Question:
I want to parse a bytes string in JSON format to convert it into Python objects. This is the source I have:
my_bytes_value = b"[{'Date': '2016-05-21T21:35:40Z', 'CreationDate': '2012-05-05', 'LogoType': 'png', 'Ref': 164611595, 'Classe': ['Email addresses', 'Passwords'],'Link':'http://some_link.com'}]"
And this is the desired outcome I want to have:
[{
"Date": "2016-05-21T21:35:40Z",
"CreationDate": "2012-05-05",
"LogoType": "png",
"Ref": 164611595,
"Classes": [
"Email addresses",
"Passwords"
],
"Link": "http://some_link.com"}]
First, I converted the bytes to string:
my_new_string_value = my_bytes_value.decode("utf-8")
but when I try to invoke loads to parse it as JSON:
my_json = json.loads(my_new_string_value)
I get this error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 174 (char 173)
Answers:
To convert this bytes object directly to JSON text, first decode it to a string with decode() (UTF-8 is the default), then change the single quotes to double quotes. Because dumps wraps the string in an extra pair of double quotes, the last step is to strip them with [1:-1], which leaves JSON text describing a list rather than a string. Here s is the bytes object:
from json import dumps
json_text = dumps(s.decode()).replace("'", '"')[1:-1]
Your bytes object is almost JSON, but it’s using single quotes instead of double quotes, and it needs to be a string. So one way to fix it is to decode the bytes to str and replace the quotes. Another option is to use ast.literal_eval; see below for details. If you want to print the result or save it to a file as valid JSON, you can load the JSON to a Python list and then dump it out. E.g.,
import json
my_bytes_value = b"[{'Date': '2016-05-21T21:35:40Z', 'CreationDate': '2012-05-05', 'LogoType': 'png', 'Ref': 164611595, 'Classe': ['Email addresses', 'Passwords'],'Link':'http://some_link.com'}]"
# Decode UTF-8 bytes to Unicode, and convert single quotes
# to double quotes to make it valid JSON
my_json = my_bytes_value.decode('utf8').replace("'", '"')
print(my_json)
print('- ' * 20)
# Load the JSON to a Python list & dump it back out as formatted JSON
data = json.loads(my_json)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
output
[{"Date": "2016-05-21T21:35:40Z", "CreationDate": "2012-05-05", "LogoType": "png", "Ref": 164611595, "Classe": ["Email addresses", "Passwords"],"Link":"http://some_link.com"}]
- - - - - - - - - - - - - - - - - - - -
[
{
"Classe": [
"Email addresses",
"Passwords"
],
"CreationDate": "2012-05-05",
"Date": "2016-05-21T21:35:40Z",
"Link": "http://some_link.com",
"LogoType": "png",
"Ref": 164611595
}
]
As Antti Haapala mentions in the comments, we can use ast.literal_eval to convert my_bytes_value to a Python list, once we’ve decoded it to a string.
from ast import literal_eval
import json
my_bytes_value = b"[{'Date': '2016-05-21T21:35:40Z', 'CreationDate': '2012-05-05', 'LogoType': 'png', 'Ref': 164611595, 'Classe': ['Email addresses', 'Passwords'],'Link':'http://some_link.com'}]"
data = literal_eval(my_bytes_value.decode('utf8'))
print(data)
print('- ' * 20)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
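One reason to prefer literal_eval over blind quote replacement is that the replacement breaks as soon as a value contains an apostrophe. The following is a minimal sketch of that failure mode; the record with the name O'Brien is a made-up example, not data from the question.

```python
import json
from ast import literal_eval

# Hypothetical record whose value contains an apostrophe.
raw = b"[{'Name': \"O'Brien\", 'Ref': 1}]"
text = raw.decode("utf-8")

# Blind replacement also rewrites the apostrophe inside the value,
# which produces invalid JSON.
try:
    json.loads(text.replace("'", '"'))
    replace_worked = True
except json.JSONDecodeError:
    replace_worked = False

# literal_eval parses Python literal syntax, so the apostrophe is harmless.
data = literal_eval(text)

print(replace_worked)
print(json.dumps(data))
```

literal_eval only evaluates literals (strings, numbers, lists, dicts and so on), so it does not execute arbitrary code the way eval would.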
Generally, this problem arises because someone has saved data by printing its Python repr instead of using the json module to create proper JSON data. If it’s possible, it’s better to fix that problem so that proper JSON data is created in the first place.
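To illustrate the difference at the producing side, here is a small sketch that serializes the same list once with repr and once with json.dumps. The variable names are illustrative, and the record is a shortened version of the question's data.

```python
import json

record = [{"Ref": 164611595, "Classe": ["Email addresses", "Passwords"]}]

# What the data looks like when its repr is saved: single quotes, not JSON.
as_repr = repr(record).encode("utf-8")

# What it looks like when the json module produces it.
as_json = json.dumps(record).encode("utf-8")

try:
    json.loads(as_repr.decode("utf-8"))
    repr_parses = True
except json.JSONDecodeError:
    repr_parses = False

round_tripped = json.loads(as_json.decode("utf-8"))

print(repr_parses)
print(round_tripped == record)
```

With the data written by json.dumps, the consumer needs no quote fixing at all.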
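If the bytes already contain valid JSON (double-quoted strings), you can simply pass them to json.loads, which accepts bytes in Python 3.6 and later:
import json
json.loads(my_bytes_value)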
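As a quick check of when the direct call is enough, the sketch below feeds json.loads one bytes value that is valid JSON and one that uses Python-style single quotes. Both byte strings are shortened, illustrative versions of the question's data, and Python 3.6 or later is assumed.

```python
import json

# Valid JSON bytes: json.loads decodes and parses them directly.
valid_json_bytes = b'[{"Ref": 164611595, "LogoType": "png"}]'
data = json.loads(valid_json_bytes)

# Python-repr style bytes (single quotes) are still rejected.
python_repr_bytes = b"[{'Ref': 164611595, 'LogoType': 'png'}]"
try:
    json.loads(python_repr_bytes)
    repr_accepted = True
except json.JSONDecodeError:
    repr_accepted = False

print(data)
print(repr_accepted)
```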
You can also use the io module (json.load accepts a binary file object from Python 3.6 on):
import json
import io
my_bytes_value = b"[{'Date': '2016-05-21T21:35:40Z', 'CreationDate': '2012-05-05', 'LogoType': 'png', 'Ref': 164611595, 'Classe': ['Email addresses', 'Passwords'],'Link':'http://some_link.com'}]"
fix_bytes_value = my_bytes_value.replace(b"'", b'"')
my_json = json.load(io.BytesIO(fix_bytes_value))
A better solution is:
import json
byte_array_example = rb'{"text": "\u0627\u06CC\u0646 \u06CC\u06A9 \u0645\u062A\u0646 \u062A\u0633\u062A\u06CC \u0641\u0627\u0631\u0633\u06CC \u0627\u0633\u062A."}'
res = json.loads(byte_array_example.decode('unicode_escape'))
print(res)
result:
{'text': 'این یک متن تستی فارسی است.'}
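Decoding with utf-8 does not expand \uXXXX escape sequences; they stay in the string as literal text. The unicode_escape codec expands them into the actual characters. Note that unicode_escape treats any non-ASCII bytes as Latin-1, so it is only suitable when the input is pure ASCII with escapes.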
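For comparison, the following sketch checks two things under the assumption of Python 3.6 or later: that json.loads expands \uXXXX escapes inside JSON strings by itself, and that unicode_escape mangles input that already contains real UTF-8 characters. The sample strings are illustrative.

```python
import json

# ASCII-only JSON bytes with \uXXXX escapes (raw literal keeps the backslashes).
escaped = rb'{"text": "\u0645\u062A\u0646"}'

direct = json.loads(escaped)                              # parser expands the escapes
via_codec = json.loads(escaped.decode("unicode_escape"))  # codec expands them first

# JSON bytes that contain a real UTF-8 encoded character.
utf8_bytes = '{"text": "é"}'.encode("utf-8")
good = json.loads(utf8_bytes.decode("utf-8"))
mangled = json.loads(utf8_bytes.decode("unicode_escape"))  # bytes read as Latin-1

print(direct == via_codec)
print(good["text"], mangled["text"])
```

So for escaped ASCII input both routes agree, while for genuine UTF-8 input only the utf-8 decode preserves the text.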
This works:
d = json.dumps(byte_str.decode('utf-8'))
If you have a bytes object and want to store it in a JSON file, you should first decode it, because JSON only has a few data types and raw byte data isn’t one of them. It has arrays, numbers, strings, objects, booleans, and null.
To decode a bytes object you first have to know its encoding. If you don’t know it, you can guess it with the third-party chardet package:
import json
import chardet
encoding = chardet.detect(your_byte_object)['encoding']
Then you can save this object to your JSON file like this:
data = {"data": your_byte_object.decode(encoding)}
with open('request.txt', 'w') as file:
    json.dump(data, file)
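If installing chardet is not an option, a rough standard-library alternative is to try a short list of candidate encodings in order. This is only a sketch: decode_with_fallback is a hypothetical helper, and the candidate list (utf-8, then cp1252) is an assumption that should be adapted to where the bytes come from.

```python
import json

def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")

# UTF-8 input decodes with the first candidate.
text, used = decode_with_fallback("café".encode("utf-8"))

# cp1252 input is not valid UTF-8, so the second candidate is used.
legacy_text, legacy_used = decode_with_fallback("café".encode("cp1252"))

# Once decoded, the text can be stored as a JSON string and read back.
payload = json.dumps({"data": text})
restored = json.loads(payload)["data"]

print(used, legacy_used, restored)
```

Unlike chardet, this does not guess statistically; it simply accepts the first encoding that decodes without error, so the order of candidates matters.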