Python: loading a big JSON database faster
Question:
The application I am making requires loading a JSON file of 100 MB to 1 GB. I used the json library, but it takes a long time to load; I then tried switching to YAML, but it was even slower than JSON.
So, is there a library (for dumping/loading) that loads much faster than json, or another file format that would be better suited for a database like this?
Answers:
I got an answer long ago but forgot about this question.
Here is what I did, in case someone wants to do the same:
Thanks to JonSG, I used Parquet files for a while, then dug into how they work. Based on that, I wrote my own code using zstd,
a compression algorithm, and it saves a lot of storage:
from zstd import dumps, loads  # python-zstd: dumps = compress, loads = decompress
from json import loads as jloads, dumps as jdumps

def dump(data, file, level=9):
    # Serialize to JSON, then zstd-compress the bytes at the given level
    data = jdumps(data).encode()
    with open(file, 'wb') as f:
        f.write(dumps(data, level))

def load(file):
    # Decompress the file, then parse the JSON back into Python objects
    with open(file, 'rb') as f:
        return jloads(loads(f.read()))
Test cases:
- a 128 MB JSON file compressed to 6 MB
- an 89 MB JSON file compressed to 4-5 MB