How to split dictionary into multiple dictionaries fast

Question:

I have found a solution but it is really slow:

def chunks(self,data, SIZE=10000):
    for i in xrange(0, len(data), SIZE):
        yield dict(data.items()[i:i+SIZE])

Do you have any ideas that don't use external modules (numpy, etc.)?

Asked By: badc0re

Answers:

Since the dictionary is so big, it is better to work only with iterators and generators instead of building intermediate lists, like this:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k:data[k] for k in islice(it, SIZE)}

Sample run:

for item in chunks({i:i for i in xrange(10)}, 3):
    print(item)

Output

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{8: 8, 6: 6, 7: 7}
{9: 9}
Answered By: thefourtheye

Another method is zipping iterators (Python 2 shown):

>>> from itertools import izip_longest, ifilter
>>> d = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5, 'f':6, 'g':7, 'h':8}

Create a list that repeats the same dict iterator (the number of repeats is the number of elements in each resulting dict). Passing that list to izip_longest pulls that many consecutive items from the source dict on each step; ifilter removes the None padding that izip_longest adds to the last group. Using a generator expression keeps memory usage low:

>>> chunks = [d.iteritems()]*3
>>> g = (dict(ifilter(None, v)) for v in izip_longest(*chunks))
>>> list(g)
[{'a': 1, 'c': 3, 'b': 2},
 {'e': 5, 'd': 4, 'g': 7},
 {'h': 8, 'f': 6}]
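
For Python 3, where izip_longest, ifilter and iteritems no longer exist, the zipping approach above could be sketched roughly as follows. The names zip_chunks and _MISSING are made up for this illustration and are not part of the original answer; a dedicated sentinel is used as the fill value so that padding is never confused with real items.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel used only as padding

def zip_chunks(d, size):
    # The same items iterator repeated `size` times: each output tuple
    # pulls `size` consecutive items from it.
    iterators = [iter(d.items())] * size
    for group in zip_longest(*iterators, fillvalue=_MISSING):
        # Drop the padding that fills out the final, shorter group.
        yield dict(item for item in group if item is not _MISSING)

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8}
result = list(zip_chunks(d, 3))
print(result)
```

On Python 3.7+, where dicts keep insertion order, the chunks come out in the order the keys were inserted.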
Answered By: ndpu

import numpy as np
# d is the dictionary to split
num_chunks = 3  # np.array_split takes the number of chunks, not a chunk size
chunked_data = [[k, v] for k, v in d.items()]
chunked_data = np.array_split(chunked_data, num_chunks)

Afterwards you have a list of ndarrays, which you can iterate over like this:

for chunk in chunked_data:
    for key, value in chunk:
        print(key)
        print(value)

These could be converted back to a list of dicts with a simple for loop.
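
Since the question asks for a solution without external modules, here is a dependency-free sketch of the same idea. Note that np.array_split takes the number of sections rather than a chunk size, and gives the first len % n sections one extra element; the hypothetical helper split_into below mimics that sizing rule with islice and returns dicts directly.

```python
from itertools import islice

def split_into(d, num_chunks):
    # Hypothetical helper: split `d` into `num_chunks` dicts whose sizes
    # differ by at most one, the sizing rule np.array_split uses.
    base, extra = divmod(len(d), num_chunks)
    it = iter(d.items())
    # The first `extra` chunks receive one additional item.
    return [dict(islice(it, base + (1 if i < extra else 0)))
            for i in range(num_chunks)]

parts = split_into({i: i for i in range(8)}, 3)
print(parts)
```

Keys and values keep their original types here, whereas packing mixed-type pairs into an array may coerce them to a common dtype.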

Answered By: gies0r

This code takes a large dictionary and splits it into a list of small dictionaries. The max_limit parameter sets the maximum number of key-value pairs allowed in a sub-dictionary.
Splitting is cheap: it needs just one complete pass over the dictionary object.

def split_dict_to_multiple(input_dict, max_limit=200):
    """Splits dict into multiple dicts with given maximum size.
    Returns a list of dictionaries."""
    chunks = []
    curr_dict = {}
    for k, v in input_dict.items():
        if len(curr_dict) < max_limit:
            curr_dict[k] = v
        else:
            # curr_dict is full: store it and start a new one with this item
            # (no copy needed, because curr_dict is rebound right after)
            chunks.append(curr_dict)
            curr_dict = {k: v}
    # append the last (possibly partial) curr_dict
    chunks.append(curr_dict)
    return chunks
Answered By: Pratibha Gupta

For Python 3+:

xrange() was renamed to range() in Python 3, so you can use:

from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k: data[k] for k in islice(it, SIZE)}

Sample:

for item in chunks({i: i for i in range(10)}, 3):
    print(item)

With the following output:

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{6: 6, 7: 7, 8: 8}
{9: 9}
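
A possible variant, sketched here under the assumption of Python 3: iterate over data.items() directly, so each chunk is built from the (key, value) pairs without a second lookup per key. The name chunks_items is made up for this illustration.

```python
from itertools import islice

def chunks_items(data, size=10000):
    # One shared iterator over (key, value) pairs.
    it = iter(data.items())
    while True:
        # Take up to `size` pairs; an empty dict means we are done.
        chunk = dict(islice(it, size))
        if not chunk:
            return
        yield chunk

result = list(chunks_items({i: i for i in range(10)}, 3))
print(result)
```

Looping until an empty chunk appears also removes the need for len(data) and range(), so the same function would work on any iterable of pairs if data.items() were swapped out.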
Answered By: Patrick Acioli

This code works in Python 3.8 and does not use any external modules:

def split_dict(d, n):
    keys = list(d.keys())
    for i in range(0, len(keys), n):
        yield {k: d[k] for k in keys[i: i + n]}


for item in split_dict({i: i for i in range(10)}, 3):
    print(item)

prints this:

{0: 0, 1: 1, 2: 2}
{3: 3, 4: 4, 5: 5}
{6: 6, 7: 7, 8: 8}
{9: 9}

… and might even be slightly faster than the (currently) accepted answer of thefourtheye:

from hwcounter import count, count_end


start = count()
for item in chunks({i: i for i in range(100000)}, 3):
    pass
elapsed = count_end() - start
print(f'elapsed cycles: {elapsed}')

start = count()
for item in split_dict({i: i for i in range(100000)}, 3):
    pass
elapsed = count_end() - start
print(f'elapsed cycles: {elapsed}')

prints

elapsed cycles: 145773597
elapsed cycles: 138041191
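
If hwcounter is not available, a similar comparison could be sketched with the standard-library timeit module. This version builds the test dictionary once, outside the timed code, so only the splitting is measured; the helper consume and the repeat count of 5 are arbitrary choices for this illustration, and timings will vary by machine.

```python
import timeit
from itertools import islice

def chunks(data, SIZE=10000):
    it = iter(data)
    for i in range(0, len(data), SIZE):
        yield {k: data[k] for k in islice(it, SIZE)}

def split_dict(d, n):
    keys = list(d.keys())
    for i in range(0, len(keys), n):
        yield {k: d[k] for k in keys[i: i + n]}

data = {i: i for i in range(100000)}  # built once, outside the timed code

def consume(func):
    # Exhaust the generator so every chunk is actually built.
    for _ in func(data, 3):
        pass

# Best of 5 single runs for each implementation.
t_chunks = min(timeit.repeat(lambda: consume(chunks), number=1, repeat=5))
t_split = min(timeit.repeat(lambda: consume(split_dict), number=1, repeat=5))
print(f'chunks:     {t_chunks:.4f} s')
print(f'split_dict: {t_split:.4f} s')
```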
Answered By: Mat

Something like the following should work, with only builtins:

>>> adict = {1:'a', 2:'b', 3:'c', 4:'d'}
>>> chunklen = 2
>>> dictlist = list(adict.items())
>>> [ dict(dictlist[i:i + chunklen]) for i in range(0, len(dictlist), chunklen) ]
[{1: 'a', 2: 'b'}, {3: 'c', 4: 'd'}]

This first converts the original dictionary into a list of items, but you could possibly do that in a one-liner.
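
Wrapped in a function, the same idea might look like the sketch below (split_items is a made-up name). Building the item list once matters: the version in the question calls data.items() inside the loop, which in Python 2 creates a full list of items on every iteration.

```python
def split_items(adict, chunklen):
    # Build the item list once; slicing it afterwards is cheap.
    items = list(adict.items())
    return [dict(items[i:i + chunklen])
            for i in range(0, len(items), chunklen)]

result = split_items({1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}, 2)
print(result)
```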

Answered By: MattK