How to bulk transfer JSON via MongoEngine

Question:

I am trying to send a lot of data to MongoDB through MongoEngine. I start with a DataFrame, which I convert to JSON like this:

result = df.to_json(orient="index")
parsed = json.loads(result)
json_data = json.dumps(parsed, indent=4) 

I then make it a little prettier using this:

json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)

This is the result:

{
  "0": {
    "Address": " Bursvej 30 ",
    "Zip/city": "4930 Maribo",
    "Price": " 148.000kr. ",
    "Date": 1673371545635
  },
  "1": {
    "Address": " Garrdesmuttevej 20 ",
    "Zip/city": "9550 Mariager",
    "Price": " 148.000kr. ",
    "Date": 1673371545635
  },
  "2": {
    "Address": " Norrevej 21 ",
    "Zip/city": "6990 Ulfborg",
    "Price": " 150.000kr. ",
    "Date": 1673371545635
  },

But when I try to send it to Mongo:

MD = [MarketData(**data) for data in json_formatted_str]
MarketData.objects.insert(MD, load_bulk=False)

I get this error:
TypeError: __main__.MarketData() argument after ** must be a mapping, not str

Is there any other way to do this? I tried PyMongo for several hours but gave up. Should I go back to that? I would prefer MongoEngine, to be honest.

Thanks in advance

EDIT:

Dataframe

   Address             Zip/city         Price        Date
0  Bursøvej 30, Bursø  4930 Maribo      148.000 kr.  2023-01-10 17:25:45.635483
1  Gærdesmuttevej 20   9550 Mariager    148.000 kr.  2023-01-10 17:25:45.635483
2  Nørrevej 21         6990 Ulfborg     150.000 kr.  2023-01-10 17:25:45.635483
3  Egernvænget 54      4733 Tappernøje  195.000 kr.  2023-01-10 17:25:45.635483
4  Egernvænget 56      4733 Tappernøje  195.000 kr.  2023-01-10 17:25:45.635483

And my schema

class MarketData(Document):
    #answers = DictField()
    Address = DynamicField(required=False)
    city = DynamicField(required=False)
    Price = DynamicField(required=False)
    date = DynamicField(required=False)
    
    def json(self):
        market_dict = {
            "username": self.username,
            "city": self.city,
            "Price": self.price
        }
        return json.dumps(market_dict)
Asked By: jmChrist


Answers:

Use the parsed JSON object, not the JSON string, to build the documents.
Iterating over a string yields its characters one at a time, and each character is itself a string. Unpacking a string with ** is what raises the TypeError.
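The behaviour can be reproduced without MongoEngine. The sketch below is only an illustration: the data is a shortened stand-in for the question's JSON, and build() is a hypothetical placeholder for any callable that takes keyword arguments, such as a document class.

```python
import json

# Shortened stand-in for the question's data (illustrative values only).
json_object = {"0": {"Address": "Bursvej 30", "Price": "148.000kr."}}
json_formatted_str = json.dumps(json_object, indent=2)

# Iterating over a string yields one-character strings, not records.
first_items = list(json_formatted_str)[:3]
print(first_items)


def build(**fields):
    """Hypothetical stand-in for a constructor that takes keyword arguments."""
    return fields


# Unpacking a one-character string with ** raises the TypeError.
try:
    build(**first_items[0])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```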

So, after

json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=2)
print(json_formatted_str)

instead of,

MD = [MarketData(**data) for data in json_formatted_str]
MarketData.objects.insert(MD, load_bulk=False)

try,

MD = [MarketData(**data) for data in json_object]
MarketData.objects.insert(MD, load_bulk=False)

This should solve it.

A little more info…

json.dumps() returns a string.

json.loads() returns a Python object, typically a dictionary or a list.

Therefore, json.dumps() is useful when you want to print JSON data or write it to a file.
When you need to work with the data inside the program, use the result of json.loads(), which is an ordinary Python object such as a dictionary or a list.
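A quick way to check which of the two you are holding is to look at the types. This is a minimal sketch using a made-up two-row JSON string:

```python
import json

text = '{"0": {"Address": "Bursvej 30"}, "1": {"Address": "Norrevej 21"}}'

parsed = json.loads(text)    # JSON text -> Python object (a dict here)
dumped = json.dumps(parsed)  # Python object -> JSON text (a str)

print(type(parsed).__name__)
print(type(dumped).__name__)

# A JSON array parses to a list instead of a dict.
parsed_array = json.loads("[1, 2, 3]")
print(type(parsed_array).__name__)
```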

Edit: There is still an error in the code below, which I missed. Sorry for that.

MD = [MarketData(**data) for data in json_object]
MarketData.objects.insert(MD, load_bulk=False)

Iterating over json_object yields only its keys ("0", "1", ...), not the row dictionaries, so ** still receives a string. To solve this, iterate over the key, value pairs and unpack each value, which is the dictionary for one row:

MD = [MarketData(**v) for k, v in json_object.items()]
MarketData.objects.insert(MD, load_bulk=False)

This should do the job.
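To see what iterating over the parsed dictionary actually yields, here is a small sketch with made-up two-row data. With orient="index", the top-level keys are only row labels, and the values are the per-row dictionaries that a constructor taking keyword arguments would need.

```python
import json

json_object = json.loads(
    '{"0": {"Address": "Bursvej 30", "Price": "148.000kr."},'
    ' "1": {"Address": "Norrevej 21", "Price": "150.000kr."}}'
)

keys = list(json_object)           # plain iteration gives the row labels
rows = list(json_object.values())  # .values() gives one dict per row

print(keys)
print(rows[0])
```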

Edit 2: If the error persists, try this:

# Each value is the dictionary for one row; the keys "0", "1", ... are only row labels.
items = [v for k, v in json_object.items()]
MD = [MarketData(**i) for i in items]
MarketData.objects.insert(MD, load_bulk=False)
Answered By: Vee Dee

You probably want to skip converting this data to and from a string altogether. To that end, convert the DataFrame directly to a list of dictionaries, one per row. You could also iterate over the rows of your DataFrame yourself, but let’s start with:

MD = [
    MarketData(**data)
    for data
    in df.to_dict(orient="records")
]
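
One caveat, stated as an assumption rather than something tested against this schema: a regular (non-dynamic) MongoEngine Document is expected to reject keyword arguments that do not match a declared field, and the DataFrame columns "Zip/city" and "Date" do not match the schema's "city" and "date". A minimal sketch of renaming the keys first follows. FIELD_MAP and to_schema_fields are hypothetical names, and the list of records stands in for df.to_dict(orient="records").

```python
# Hypothetical mapping from DataFrame column names to schema field names.
FIELD_MAP = {"Zip/city": "city", "Date": "date"}


def to_schema_fields(record, field_map=FIELD_MAP):
    """Return a copy of record with its keys renamed according to field_map."""
    return {field_map.get(key, key): value for key, value in record.items()}


# Stand-in for df.to_dict(orient="records"); values shortened for illustration.
records = [
    {
        "Address": "Bursøvej 30, Bursø",
        "Zip/city": "4930 Maribo",
        "Price": "148.000 kr.",
        "Date": "2023-01-10 17:25:45",
    },
]

renamed = [to_schema_fields(record) for record in records]
print(renamed[0])

# With MongoEngine installed and a connection open, the documents could
# then be built and inserted as in the snippets above:
# MD = [MarketData(**data) for data in renamed]
# MarketData.objects.insert(MD, load_bulk=False)
```

Keeping the rename in one small function means the mapping only needs to change in one place if the schema or the column names change.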
Answered By: JonSG