Python big JSON database loading

Question:

The application I am making requires loading a JSON file that is 100 MB to 1 GB. I used the json library, but it takes a long time to load. I then tried switching to YAML, but it was even slower than JSON.

So is there a library (for dumping/loading) that is much faster than json, or another file format that would be better suited for use as a database?

Asked By: Mantrix


Answers:

I found a solution a long time ago but forgot about this question.

Here is what I did, in case someone wants to do the same:

Thanks to JonSG, I used parquet files for a while and then dug into how they work. Based on that, I wrote my own dump/load helpers using zstd, a fast general-purpose compression algorithm, and it saves a lot of storage.
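
The original answer does not show the parquet version; a minimal sketch of that route, assuming pandas with the pyarrow (or fastparquet) backend installed, could look like this. The file name and columns are only placeholders:

import pandas as pd  # assumes pandas plus pyarrow (or fastparquet) is installed

# Placeholder table; in practice this would be the tabular part of the JSON data.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df.to_parquet("records.parquet")               # written as a columnar, compressed file
restored = pd.read_parquet("records.parquet")  # typically much faster than re-parsing a large JSON file

The zstd-based helpers I ended up with instead: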

# Uses the 'zstd' package from PyPI (pip install zstd) for compression.
from zstd import dumps as zdumps, loads as zloads
from json import loads as jloads, dumps as jdumps

def dump(data, file, level=9):
    # Serialize to JSON, encode to bytes, then zstd-compress at the given level.
    raw = jdumps(data).encode()
    with open(file, 'wb') as f:
        f.write(zdumps(raw, level))

def load(file):
    # Read the compressed bytes, zstd-decompress, then parse the JSON back into Python objects.
    with open(file, 'rb') as f:
        return jloads(zloads(f.read()))
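
A quick usage sketch of the helpers above; the payload and the 'records.json.zst' file name are only placeholders:

payload = {"users": [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]}  # placeholder data
dump(payload, "records.json.zst", level=9)   # JSON-encode, zstd-compress, write to disk
restored = load("records.json.zst")          # read, decompress, parse back into Python objects
assert restored == payload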

Test cases:

  1. A 128 MB JSON file was compressed to 6 MB.
  2. An 89 MB JSON file was compressed to 4-5 MB.
Answered By: Mantrix