Comparing three data frames to evaluate multiple criteria

Question:

I have three dataframes:

  • ob (Orderbook) – an orderbook containing Part Numbers, the week they are due and the hours it takes to build them.

    Part Number   Due Week   Build Hours
    A             2022-46    4
    A             2022-46    5
    B             2022-46    8
    C             2022-47    1.6
  • osm (Operator Skill Matrix) – a skills matrix containing operators' names and the part numbers they can build

    Operator     Part number
    Mr.One       A
    Mr.One       B
    Mr.Two       A
    Mr.Two       B
    Mrs. Three   C
  • ah (Available Hours) – a list containing how many hours an operator can work in a given week

    Operator     YYYYWW    Hours
    Mr.One       2022-45   40
    Mr.One       2022-46   35
    Mr.Two       2022-46   37
    Mr.Two       2022-47   39
    Mrs. Three   2022-47   40
    Mrs. Three   2022-48   45
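To make the example reproducible, the three sample tables can be built directly as DataFrames. This is a sketch: it assumes the weeks are stored as "YYYY-WW" strings (so Due Week and YYYYWW compare as plain text) and it keeps the question's column names, including the different capitalisation of "Part Number" and "Part number".

```python
import pandas as pd

# ob: one row per order
ob = pd.DataFrame({
    "Part Number": ["A", "A", "B", "C"],
    "Due Week": ["2022-46", "2022-46", "2022-46", "2022-47"],
    "Build Hours": [4, 5, 8, 1.6],
})

# osm: which operator can build which part
osm = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three"],
    "Part number": ["A", "B", "A", "B", "C"],
})

# ah: hours each operator can work in a given week
ah = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three", "Mrs. Three"],
    "YYYYWW": ["2022-45", "2022-46", "2022-46", "2022-47", "2022-47", "2022-48"],
    "Hours": [40, 35, 37, 39, 40, 45],
})

print(ob, osm, ah, sep="\n\n")
```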

I am trying to work out, for each week, whether there are enough operators with the right skills working enough hours to complete all of the orders on the orderbook, and if not, to identify the orders that can't be completed.

Step by step, it would look like this:

  1. Take the part number of the first row of the orderbook.
  2. Search the skills matrix to find a list of operators who can build that part.
  3. Search the hours list and check whether those operators have any hours available in the week the order is due.
  4. If an operator has hours available, add their name to that row of the orderbook.
  5. Subtract the build hours in the orderbook from the available hours in the Available Hours df.
  6. Repeat this for each row in the orderbook until all orders have a name against them or there are no available hours left.
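The six steps above can be sketched as a greedy first-fit allocation that builds the skills lookup once and keeps remaining hours in a dictionary keyed by (operator, week), so no DataFrame is scanned per row. This is a sketch under stated assumptions: the function name allocate is hypothetical, step 3 is interpreted as "has enough hours left for this order" rather than "has any hours", and operators are tried in the order they appear in osm. With that rule the B order goes to Mr.One (who still has 26 hours in 2022-46), not Mr.Two as in the example table further down; reproducing that table would need a tie-breaking rule the question doesn't specify.

```python
import pandas as pd

# Sample data from the question (weeks kept as "YYYY-WW" strings)
ob = pd.DataFrame({
    "Part Number": ["A", "A", "B", "C"],
    "Due Week": ["2022-46", "2022-46", "2022-46", "2022-47"],
    "Build Hours": [4, 5, 8, 1.6],
})
osm = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three"],
    "Part number": ["A", "B", "A", "B", "C"],
})
ah = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three", "Mrs. Three"],
    "YYYYWW": ["2022-45", "2022-46", "2022-46", "2022-47", "2022-47", "2022-48"],
    "Hours": [40, 35, 37, 39, 40, 45],
})


def allocate(ob, osm, ah):
    """Hypothetical helper: give each order to the first skilled operator with enough hours left."""
    # Step 2 lookup, built once: part number -> operators who can build it
    skills = osm.groupby("Part number")["Operator"].apply(list).to_dict()
    # Remaining hours per (operator, week), updated as orders are assigned (step 5)
    remaining = {(op, wk): float(h) for op, wk, h in
                 ah[["Operator", "YYYYWW", "Hours"]].itertuples(index=False)}
    assigned = []
    # Steps 1 and 6: walk the orderbook in its current row order
    for part, week, hours in ob[["Part Number", "Due Week", "Build Hours"]].itertuples(index=False):
        name = None
        for op in skills.get(part, []):
            # Step 3: does this operator have enough hours in the due week?
            if remaining.get((op, week), 0.0) >= hours:
                remaining[(op, week)] -= hours  # step 5
                name = op                        # step 4
                break
        assigned.append(name)
    out = ob.copy()
    out["Operator"] = assigned  # None marks an order nobody could take
    return out, remaining


out, remaining = allocate(ob, osm, ah)
print(out)
print(out[out["Operator"].isna()])  # orders that cannot be completed
```

Each order now costs one dictionary lookup per skilled operator instead of a full pass over the skill matrix and the hours list.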

The only thing I could think to try was a set of nested for loops, but as there are thousands of rows, one iteration takes about 45 minutes and the whole thing would take days, if not weeks.

# for each row in the orderbook
for i, rowi in ob_sum_hours.iterrows():
    allocated = False
    # for each row in the operator skill matrix
    for j, rowj in osm.iterrows():
        # for each row in the available operator hours
        for y, rowy in aoh.iterrows():
            if (rowi['Material'] == rowj['MATERIAL'] and rowi['ProdYYYYWW'] == rowy['YYYYWW']
                    and rowj['Operator'] == rowy['Operator'] and aoh.at[y, 'Hours'] > 0):
                # iterrows() yields copies, so write back through .at to change the DataFrames
                aoh.at[y, 'Hours'] -= rowi['PlanHrs']
                ob_sum_hours.at[i, 'HoursAllocated'] = rowj['Operator']
                allocated = True
                break
        if allocated:  # stop once this order has an operator
            break
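The nested loops compare every orderbook row with every skill-matrix row and, for each of those, every hours row, so the work grows as rows(ob) × rows(osm) × rows(ah); with hypothetical sizes of 1,000 × 500 × 2,000 that is 10^9 comparisons, each going through a per-row Series built by iterrows(). The matching part (same part number, same operator, same week, hours left) can instead be done with two joins. A sketch using the sample data's column names (with the real data, Material/MATERIAL and ProdYYYYWW would take their place); it only builds the candidate list, and the hour subtraction still has to run order by order.

```python
import pandas as pd

# Sample data from the question
ob = pd.DataFrame({
    "Part Number": ["A", "A", "B", "C"],
    "Due Week": ["2022-46", "2022-46", "2022-46", "2022-47"],
    "Build Hours": [4, 5, 8, 1.6],
})
osm = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three"],
    "Part number": ["A", "B", "A", "B", "C"],
})
ah = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three", "Mrs. Three"],
    "YYYYWW": ["2022-45", "2022-46", "2022-46", "2022-47", "2022-47", "2022-48"],
    "Hours": [40, 35, 37, 39, 40, 45],
})

# Criteria 1 and 2 (skill, then operator + week) as two inner joins
cand = (
    ob.reset_index()  # keep the original order row label in an "index" column
      .merge(osm.rename(columns={"Part number": "Part Number"}), on="Part Number")
      .merge(ah.rename(columns={"YYYYWW": "Due Week"}), on=["Operator", "Due Week"])
)
# Criterion 3: the operator still has hours that week
cand = cand[cand["Hours"] > 0].sort_values("index", kind="stable")

print(cand)
```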
    

The final result would look like this:

Part Number   Due Week   Build Hours   Operator
A             2022-46    4             Mr.One
A             2022-46    5             Mr.One
B             2022-46    8             Mr.Two
C             2022-47    1.6           Mrs. Three

Is there a better way to achieve this?

Asked By: jhew123


Answers:

This uses one loop plus an apply over each row.

Orderbook.groupby(Orderbook.index) groups by the index, so my_func is called once per row; this is still better than the triple nested loop.

In ‘aaa’ we get the unique operators who can build the row's part. ‘bbb’ is Avaliable filtered by ‘YYYYWW’ (the row's due week), ‘Operator’ (using isin with the ‘aaa’ operators) and ‘Hours’ greater than 0. The loop then walks the ‘bbb’ indices and computes the remaining time ‘ava’; for the first operator where ‘ava’ is zero or more, it sets the new hours and the operator name with explicit loc indexing.

import pandas as pd

Orderbook = pd.read_csv('Orderbook.csv', header=0)
Operator = pd.read_csv('Operator.csv', header=0)
Avaliable= pd.read_csv('Avaliable.csv', header=0)

Orderbook['Operator'] = 'no'
Avaliable['Hours'] = Avaliable['Hours'].astype(float)  # build hours can be fractional (e.g. 1.6)


def my_func(x):
    # x is a one-row DataFrame (one group per index label)
    # operators who can build this row's part
    aaa = Operator.loc[Operator['Part number'] == x['Part Number'].values[0], 'Operator'].unique()
    # their entries for the due week that still have hours left
    bbb = Avaliable[(Avaliable['YYYYWW'] == x['Due Week'].values[0]) &
                    (Avaliable['Operator'].isin(aaa)) & (Avaliable['Hours'] > 0)]

    for i in bbb.index:
        ava = Avaliable.loc[i, 'Hours'] - x['Build Hours'].values[0]  # scalar hours left
        if ava >= 0:
            Avaliable.loc[i, 'Hours'] = ava
            Orderbook.loc[x.index, 'Operator'] = Avaliable.loc[i, 'Operator']
            break  # stop at the first operator who can take the order


Orderbook.groupby(Orderbook.index).apply(my_func)

print(Orderbook)
print(Avaliable)
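Because my_func works only through side effects, groupby(Orderbook.index).apply(my_func) is effectively a per-row loop, and the grouping likely adds overhead of its own. A sketch of the same first-fit logic as a plain loop over itertuples(), assuming the CSV files hold the question's sample data (built inline here instead of read from disk). It checks Hours >= Build Hours directly instead of Hours > 0 followed by ava >= 0; both pick the first entry that can absorb the order.

```python
import pandas as pd

# Inline stand-ins for Orderbook.csv, Operator.csv and Avaliable.csv (same column names)
Orderbook = pd.DataFrame({
    "Part Number": ["A", "A", "B", "C"],
    "Due Week": ["2022-46", "2022-46", "2022-46", "2022-47"],
    "Build Hours": [4, 5, 8, 1.6],
})
Operator = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three"],
    "Part number": ["A", "B", "A", "B", "C"],
})
Avaliable = pd.DataFrame({
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "Mr.Two", "Mrs. Three", "Mrs. Three"],
    "YYYYWW": ["2022-45", "2022-46", "2022-46", "2022-47", "2022-47", "2022-48"],
    "Hours": [40, 35, 37, 39, 40, 45],
})

Orderbook["Operator"] = "no"                           # placeholder for "not allocated"
Avaliable["Hours"] = Avaliable["Hours"].astype(float)  # hours become fractional after subtraction

for idx, part, week, hours in Orderbook[["Part Number", "Due Week", "Build Hours"]].itertuples():
    skilled = Operator.loc[Operator["Part number"] == part, "Operator"].unique()
    # Entries for a skilled operator in the due week with enough hours for this order
    mask = ((Avaliable["YYYYWW"] == week)
            & Avaliable["Operator"].isin(skilled)
            & (Avaliable["Hours"] >= hours))
    fits = Avaliable[mask].index
    if len(fits) > 0:
        first = fits[0]  # first fit, in Avaliable's row order
        Avaliable.loc[first, "Hours"] -= hours
        Orderbook.loc[idx, "Operator"] = Avaliable.loc[first, "Operator"]

print(Orderbook)
print(Avaliable)
```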

Update 18.11.2022
I rewrote it without the loop inside my_func, but please check the results; if you find something incorrect, let me know. You can also measure the exact processing time by putting this at the beginning:

import datetime

now = datetime.datetime.now()

and printing the elapsed time at the end:

time_ = datetime.datetime.now() - now
print('elapsed time', time_)
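datetime.now() is fine for a rough figure; the standard library's time.perf_counter() is a monotonic, high-resolution clock intended for measuring intervals. A small sketch (the helper name timed is hypothetical); it could wrap the groupby/apply call or any other step.

```python
import time


def timed(func, *args, **kwargs):
    """Hypothetical helper: run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


result, seconds = timed(sum, range(1_000_000))
print(f"elapsed time {seconds:.3f} s")
```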

The code:

import pandas as pd

Orderbook = pd.read_csv('Orderbook.csv', header=0)
Operator = pd.read_csv('Operator.csv', header=0)
Avaliable = pd.read_csv('Avaliable.csv', header=0)

Orderbook['Operator'] = 'no'
Avaliable['Hours'] = Avaliable['Hours'].astype(float)  # build hours can be fractional (e.g. 1.6)

aaa = [Operator.loc[Operator['Part number'] == Orderbook.loc[i, 'Part Number'], 'Operator'].unique() for i in
       range(len(Orderbook))]


def my_func(x):
    bbb = Avaliable[(Avaliable['YYYYWW'] == x['Due Week'].values[0]) &
                    (Avaliable['Operator'].isin(aaa[x.index[0]])) & (Avaliable['Hours'] > 0)]

    fff = Avaliable.loc[bbb.index, 'Hours'] - x['Build Hours'].values[0]
    ind = fff[fff.ge(0)].index
    if len(ind) > 0:  # nobody has enough hours: keep 'no' instead of raising IndexError on ind[0]
        Avaliable.loc[ind[0], 'Hours'] = fff[ind[0]]
        Orderbook.loc[x.index, 'Operator'] = Avaliable.loc[ind[0], 'Operator']


Orderbook.groupby(Orderbook.index).apply(my_func)

print(Orderbook)
print(Avaliable)
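The question also asks which orders can't be completed and whether each week has enough capacity. Since unallocated rows keep the placeholder 'no', a filter plus a groupby gives both. A sketch on a hypothetical allocation result (the 50-hour C order is invented to show a shortfall; it is not from the question's data):

```python
import pandas as pd

# Hypothetical allocation result in the shape the code above produces:
# rows that could not be allocated keep the placeholder "no"
Orderbook = pd.DataFrame({
    "Part Number": ["A", "A", "B", "C"],
    "Due Week": ["2022-46", "2022-46", "2022-46", "2022-47"],
    "Build Hours": [4.0, 5.0, 8.0, 50.0],
    "Operator": ["Mr.One", "Mr.One", "Mr.Two", "no"],
})

# Orders that cannot be completed
unallocated = Orderbook[Orderbook["Operator"] == "no"]
# Build hours left uncovered in each due week
shortfall = unallocated.groupby("Due Week")["Build Hours"].sum()

print(unallocated)
print(shortfall)
```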
Answered By: inquirer