How can I reduce the time needed to run my code and what is the cause of the slow speed?

Question

The code works well on small datasets, as illustrated in the example code, but the data I have to process consists of two lists, picker and order, each of length 1592798, with 288 and 528510 unique values respectively.
For the sake of the example I have replaced these with two short lists, but the concept is the same. I am wondering whether the long run time is due to the sheer amount of data, or whether the code processes the data inefficiently and can be improved.

The purpose of the code is to group all pickers associated with a unique order into a list (hold) within a list (pairs). The elements of pairs must be sorted by their first entry: for instance, [1, 'a'] must come before [2, 'b', 'k'] because 1 is smaller than 2. Within an element such as [2, 'b', 'k'], the order of 'b' and 'k' is determined by which occurs first in the list picker: 'b' comes before 'k' because 'b' has a lower index.
The current code looks like this:


order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
pairs = []

order_picker = list(zip(order, picker))

for x in set(order):
    hold = []
    hold.append(x)
    for i in range(len(order_picker)):
        if x == list(order_picker[i])[0]:
            if list(order_picker[i])[1] not in hold:
                hold.append(list(order_picker[i])[1])
    pairs.append(hold)

print(pairs)

The output from print(pairs):

>>> print(pairs)
[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]

The output must be in this format so that I can later write it to Excel.

I suspect that the long run time comes from scanning the entire list of length 1592798 each time a new value must be identified, but I have been unable to create a faster solution. How can I reduce the time required to run the code?

Asked By: KriLum

Answer 1

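Perhaps you can speed up your code by looping over the elements in picker and order only once.

In the example below, I zip the two lists and use a defaultdict of sets to collect the pickers for each order. Finally, the dictionary is converted to your desired output format. Note that a set does not preserve insertion order, so the pickers within each group are not guaranteed to come out in first-occurrence order.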

from collections import defaultdict

order = [1,  2,  3,  4,  1,  5,  3,  6,  7,  1,  8,  9,  4,  4,  2,  8,  4,  4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

pairs = defaultdict(set)
for o, p in zip(order, picker):
    pairs[o].add(p)
pairs = [[k, *v] for k, v in pairs.items()]

print(pairs)
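
If the picker order within each group matters, one possible way to keep the set-based grouping is to also record the first index at which each picker appears and sort by it at the end. This is only a sketch: the function name group_by_order and the first_seen dict are hypothetical, and it assumes "first in the list picker" means the first index in the whole picker list, as the question words it.

```python
from collections import defaultdict

def group_by_order(order, picker):
    first_seen = {}             # picker -> first index in the whole picker list
    groups = defaultdict(set)   # order -> set of pickers (duplicates dropped)
    for i, (o, p) in enumerate(zip(order, picker)):
        first_seen.setdefault(p, i)   # only the first index is kept
        groups[o].add(p)
    # sort the orders numerically, and each group's pickers by first index
    return [[o, *sorted(groups[o], key=first_seen.__getitem__)]
            for o in sorted(groups)]

order = [1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

pairs = group_by_order(order, picker)
print(pairs)
```

The data is still traversed once; the extra cost is a small sort per group.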

Answered By: oskros

Answer 2

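You can use a dictionary to store the orders and their associated pickers, which solves the problem in a single pass over the data (O(n)) instead of rescanning the whole list for every unique order (O(n^2) in the worst case).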

order = [1,  2,  3,  4,  1,  5,  3,  6,  7,  1,  8,  9,  4,  4,  2,  8,  4,  4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']
pairs = []

order_picker = list(zip(order, picker))

orders_dict = {}
for o, p in order_picker:
    # loop variables are named o and p so the order and picker lists are not overwritten
    if o in orders_dict:
        if p not in orders_dict[o]:
            orders_dict[o].append(p)
    else:
        orders_dict[o] = [p]

for o, pickers in orders_dict.items():
    pairs.append([o] + pickers)

print(pairs)
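
To get a feel for the difference, here is a rough sketch that counts how many (order, picker) pairs each approach looks at. The function names are made up for illustration, and the large figure is only an estimate from the sizes stated in the question (528510 unique orders, 1592798 entries), not a measured run time.

```python
def nested_loop_checks(order, picker):
    """Count the pairs inspected when the list is rescanned per unique order."""
    checks = 0
    for _ in set(order):
        for _ in zip(order, picker):
            checks += 1
    return checks

def single_pass_checks(order, picker):
    """Count the pairs inspected when the list is traversed once."""
    return sum(1 for _ in zip(order, picker))

order = [1, 2, 3, 4, 1, 5, 3, 6, 7, 1, 8, 9, 4, 4, 2, 8, 4, 4, 2]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

nested = nested_loop_checks(order, picker)   # unique orders * list length
single = single_pass_checks(order, picker)   # list length
print(nested, single)

# Same formula with the sizes given in the question (an estimate, roughly 8.4e11)
estimated_nested = 528510 * 1592798
print(estimated_nested)
```

On the short example the gap is small, but with the full data the nested version inspects several hundred billion pairs, while a single pass inspects about 1.6 million.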

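If your dataset is very large and performance is critical, you can consider using Pandas. Note that this version returns each row as [order, set_of_pickers], so the pickers are in a set rather than flattened into the row: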

import pandas as pd

df = pd.DataFrame({'order': order, 'picker': picker})
pairs = df.groupby('order')['picker'].apply(set).reset_index().values.tolist()

Answered By: Ayush Naik

Answer 3

It takes long because you iterate over the same data many times: once in zip, and then the inner for loop rescans the whole list for every value in the outer for loop.

Try to optimize by iterating less.

Something like this produces the same output with only one for loop:

order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']


order_indexes = {} # stores indexes of orders
pairs = []

for i in range(0, len(order)):
    order_item = order[i]
    picker_item = picker[i]

    if (order_item not in order_indexes):
        order_indexes[order_item] = len(pairs)
        # the index it will be inserted in
        pairs.append([order_item]) 
        # insertion of new order
  
    if (picker_item not in pairs[order_indexes[order_item]]): 
        pairs[order_indexes[order_item]].append(picker_item)
        # add picker if not already present
        
print(pairs)
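
One thing to keep in mind: a single-pass build like this lists the orders in the sequence they are first seen, which happens to be ascending in the example data. If the real data is not already arranged that way, a final sort on the first entry of each row should restore the numeric order asked for in the question. A small sketch with made-up input rows:

```python
# hypothetical result rows, listed in first-seen order rather than numeric order
pairs = [[2, 'b', 'k'], [1, 'a'], [4, 'd', 'j', 'k'], [3, 'c']]

# sort in place by the order number only; the pickers in each row are left untouched
pairs.sort(key=lambda row: row[0])

print(pairs)
```

Sorting with a key on row[0] avoids comparing the picker entries at all.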

Answered By: gui3

Answer 4

Fast solution with the desired orders:

def pairs(order, picker):
    d = {o: {} for o in sorted(set(order))}
    for o, p in zip(order, picker):
        d[o][p] = None
    return [[o, *p] for o, p in d.items()]

order  = [ 1,   2,   3,   4,   1,   5,   3,   6,   7,   1,   8,   9,   4,   4,   2,   8,   4,   4,   2 ]
picker = ['a', 'b', 'c', 'd', 'a', 'e', 'c', 'f', 'g', 'a', 'h', 'i', 'j', 'k', 'b', 'h', 'j', 'j', 'k']

print(pairs(order, picker))

Output:

[[1, 'a'], [2, 'b', 'k'], [3, 'c'], [4, 'd', 'j', 'k'], [5, 'e'], [6, 'f'], [7, 'g'], [8, 'h'], [9, 'i']]
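
This works because dicts preserve insertion order (guaranteed since Python 3.7), so each inner dict acts as an ordered set of pickers. If you want some reassurance before running it on the full data, a quick check against a nested-loop reference on random input could look like the sketch below. The names pairs_slow and pairs_fast, the random sizes, and the seed are arbitrary choices for illustration; pairs_slow follows the approach in the question but sorts the unique orders explicitly.

```python
import random

def pairs_slow(order, picker):
    """Nested-loop reference, following the approach in the question."""
    result = []
    for x in sorted(set(order)):
        hold = [x]
        for o, p in zip(order, picker):
            if o == x and p not in hold:
                hold.append(p)
        result.append(hold)
    return result

def pairs_fast(order, picker):
    """Single-pass grouping using dicts as insertion-ordered sets."""
    d = {o: {} for o in sorted(set(order))}
    for o, p in zip(order, picker):
        d[o][p] = None   # the key keeps its first insertion position
    return [[o, *p] for o, p in d.items()]

rng = random.Random(0)   # fixed seed so the check is repeatable
test_order = [rng.randrange(50) for _ in range(2000)]
test_picker = [rng.choice("abcdefgh") for _ in range(2000)]

same = pairs_slow(test_order, test_picker) == pairs_fast(test_order, test_picker)
print(same)
```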

Answered By: Kelly Bundy
