How to bulk transfer JSON via MongoEngine
Question:
I'm trying to send a lot of data to MongoDB through MongoEngine. I start with a DataFrame that I convert to JSON like this:
result = df.to_json(orient="index")
parsed = json.loads(result)
json_data = json.dumps(parsed, indent=4)
I then make it a little prettier using this:
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)
This is the result:
{
  "0": {
    "Address": " Bursvej 30 ",
    "Zip/city": "4930 Maribo",
    "Price": " 148.000kr. ",
    "Date": 1673371545635
  },
  "1": {
    "Address": " Garrdesmuttevej 20 ",
    "Zip/city": "9550 Mariager",
    "Price": " 148.000kr. ",
    "Date": 1673371545635
  },
  "2": {
    "Address": " Norrevej 21 ",
    "Zip/city": "6990 Ulfborg",
    "Price": " 150.000kr. ",
    "Date": 1673371545635
  },
But when I try to send it to Mongo:
MD = [MarketData(**data) for data in json_formatted_str]
MarketData.objects.insert(MD, load_bulk=False)
I get this error:
TypeError: __main__.MarketData() argument after ** must be a mapping, not str
Is there any other way to do this? I have been trying with PyMongo for several hours but gave up. Should I go back to that? Would prefer MongoEngine to be honest.
Thanks in advance
EDIT:
DataFrame:
   Address             Zip/city         Price        Date
0  Bursøvej 30, Bursø  4930 Maribo      148.000 kr.  2023-01-10 17:25:45.635483
1  Gærdesmuttevej 20   9550 Mariager    148.000 kr.  2023-01-10 17:25:45.635483
2  Nørrevej 21         6990 Ulfborg     150.000 kr.  2023-01-10 17:25:45.635483
3  Egernvænget 54      4733 Tappernøje  195.000 kr.  2023-01-10 17:25:45.635483
4  Egernvænget 56      4733 Tappernøje  195.000 kr.  2023-01-10 17:25:45.635483
And my schema
class MarketData(Document):
    # answers = DictField()
    Address = DynamicField(required=False)
    city = DynamicField(required=False)
    Price = DynamicField(required=False)
    date = DynamicField(required=False)

    def json(self):
        market_dict = {
            "username": self.username,
            "city": self.city,
            "Price": self.price
        }
        return json.dumps(market_dict)
Answers:
You should use the parsed JSON object (a Python dict) instead of the JSON string to insert data.
When you iterate over a JSON string, you get it one character at a time, and each character is itself a string, which cannot be unpacked with **. Hence the error.
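As a quick illustration of that point, here is a minimal sketch using only the standard library. The two-row dict and the accepts_keywords function are made up for the example; accepts_keywords merely stands in for any callable that takes keyword arguments, such as a document constructor.

```python
import json

# Hypothetical two-row stand-in for the data in the question.
json_object = {"0": {"Price": "148.000 kr."}, "1": {"Price": "150.000 kr."}}
json_formatted_str = json.dumps(json_object, indent=2)

# Iterating over a str yields single characters, each of them a str.
first_items = list(json_formatted_str)[:3]


def accepts_keywords(**kwargs):
    """Hypothetical stand-in for a constructor taking keyword arguments."""
    return kwargs


# Unpacking a one-character string with ** raises the TypeError from the question.
try:
    accepts_keywords(**first_items[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(first_items)
print(error_message)
```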
So, after
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)
instead of,
MD = [MarketData(**data) for data in json_formatted_str]
MarketData.objects.insert(MD, load_bulk=False)
try,
MD = [MarketData(**data) for data in json_object]
MarketData.objects.insert(MD, load_bulk=False)
This should solve it.
A little more info…
json.dumps() returns a string.
json.loads() returns a Python object, typically a dict or a list.
Therefore, json.dumps() is good for when you want to print JSON data or write it to a file,
but when you need to perform some logic on the data within the program, use the result of json.loads(), which is an ordinary Python dictionary or list that you can index and iterate over.
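The difference is easy to check interactively. The following is only a small sketch with a made-up one-row dict, not the asker's real data:

```python
import json

# Hypothetical one-row stand-in for the parsed data.
rows = {"0": {"Address": "Nørrevej 21", "Price": "150.000 kr."}}

as_text = json.dumps(rows)       # serialise: Python object -> str
as_python = json.loads(as_text)  # parse: str -> Python object

print(type(as_text).__name__)    # the serialised form is text
print(type(as_python).__name__)  # the parsed form is a dict again

# A JSON array parses to a Python list rather than a dict.
as_list = json.loads("[1, 2, 3]")
```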
Edit: There is still an error in the code below which I missed, sorry about that…
MD = [MarketData(**data) for data in json_object]
MarketData.objects.insert(MD, load_bulk=False)
Iterating over json_object directly yields only its keys ("0", "1", …), which are strings, so ** fails in the same way as before. What we need are the values, each of which is the dictionary for one row. Therefore the correct way would be:
MD = [MarketData(**v) for k, v in json_object.items()]
MarketData.objects.insert(MD, load_bulk=False)
This should do the job.
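To see what iterating over the parsed object actually produces, here is a small standard-library sketch. The dict is a hand-written stand-in for json_object, and make_record is a hypothetical placeholder for the MarketData constructor so the snippet runs without a database:

```python
# Hand-written stand-in for json_object (orient="index" layout).
json_object = {
    "0": {"Address": "Bursøvej 30", "Price": "148.000 kr."},
    "1": {"Address": "Nørrevej 21", "Price": "150.000 kr."},
}

# Iterating over a dict yields its keys, which here are the row labels.
keys = [data for data in json_object]

# The per-row dictionaries are the values.
rows = list(json_object.values())


def make_record(**fields):
    """Hypothetical placeholder for MarketData(**fields)."""
    return fields


# Each row is a mapping, so ** unpacking works on it.
records = [make_record(**row) for row in rows]
print(keys)
print(records)
```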
Edit 2: If the error still persists, try building the list of row dictionaries first:
items = list(json_object.values())
MD = [MarketData(**i) for i in items]
MarketData.objects.insert(MD, load_bulk=False)
You probably want to bypass all the conversion of this data to and from a string. To that end, let’s just convert each row to a dictionary. Of course you could also just iterate over the rows of your dataframe as well, but let’s start with:
MD = [
MarketData(**data)
for data
in df.to_dict(orient="records")
]
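One thing to watch for with this approach: the keys of each record become keyword arguments, and the DataFrame columns "Zip/city" and "Date" do not match the schema fields city and date. As far as I know, a regular MongoEngine Document rejects keyword arguments for fields it does not declare, so the keys probably need renaming first. Below is a minimal sketch of that step with plain dicts; the records list is a hand-written stand-in for df.to_dict(orient="records"), and FIELD_NAMES and rename_keys are names invented for this example.

```python
# Hand-written stand-in for df.to_dict(orient="records").
records = [
    {
        "Address": "Nørrevej 21",
        "Zip/city": "6990 Ulfborg",
        "Price": "150.000 kr.",
        "Date": "2023-01-10 17:25:45.635483",
    },
]

# Assumed mapping from DataFrame column names to the schema's field names.
FIELD_NAMES = {"Zip/city": "city", "Date": "date"}


def rename_keys(record, mapping):
    """Return a copy of record with keys renamed according to mapping."""
    return {mapping.get(key, key): value for key, value in record.items()}


cleaned = [rename_keys(record, FIELD_NAMES) for record in records]
print(cleaned)

# With the MarketData class from the question, the insert would then be:
# MD = [MarketData(**row) for row in cleaned]
# MarketData.objects.insert(MD, load_bulk=False)
```

Renaming the columns on the DataFrame itself before calling to_dict would work just as well; the dict version is shown only because it runs without pandas.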