Pandas replace NaN values with data from a different dataframe

Question:

I’m trying to replace NaN values in df1 with values from df2. The ‘replacement’ value in df2 is located by two inputs, p/n and operation.

enter image description here

enter image description here

Because the desired NaN value in df1 is dependent on the P/N and Operation, looking up the value in df2 must contain both the P/N and Operation. Sorting df1 for the Nan values can be found using:

df1 = df1.loc[df1['Time'].isnull()]

Where finding the replacement value can also be located

`

df2 = df2.loc[(df2['P/N'] == pn) & (df2['Operation'] == op), 'Time']

`

The dataframes are too large to merge (plus there is a benefit to keeping them separate)…how can df2 be used as a lookup table for the ‘Time’ column in df1?

This post is similar but I had a difficult time understanding the solution, plus I’m unsure how to make lambda dependent on two lookup values: Pandas replace all NaN values with data from a different dataframe

Any help is appreciated.

*** UPDATE *** UPDATE ***
Posting the code used to troubleshoot the issue, special thanks to Naveed for the help!

”’

df1 = pd.DataFrame(np.array([['10/25/2022', 'Apple', 
'Wash','Joe','Washer','EA', 100, 100, 0, 0.25, 1.1, 90.91, 55],
                        ['10/25/2022', 'Apple', 
'Peel','Tom','Peeler','EA', 100, 95, 5, 0.25, 1.30, 73.08, pd.NA],
                        ['10/25/2022', 'Apple', 
'Package','Sally','Packing','EA', 95, 95, 0, 0.8, 5.0, 19.00, pd.NA],
                        ['10/25/2022','Orange', 'Wash', 
'Joe','Washer','EA', 100, 100, 0, 0.25, 1.1, 90.91, pd.NA],
                        ['10/25/2022','Orange', 
'Peel','Tom','Peeler','EA', 95, 95, 5, 0.25, 2.30, 41.30, pd.NA],
                        ['10/25/2022', 'Orange', 
'Package','Sally','Packing','EA', 95, 95, 0, 1.1, 5.30, 17.92, 
pd.NA],
                        ['10/25/2022','Banana', 'Peel & 
Eat','Tom','Peeler','EA', 100, 100, 0, 0.25, 1.1, 90.91, pd.NA]]),
                        columns=['date', 'fruit', 'operation', 
'operator', 'station', 'uom', 'pcs_processed', 'pcs_accepted',
                         'pcs_reject', 'setup', 'runtime', 'actual', 
'target'])

df2 = pd.DataFrame(np.array([['Apple', 'Wash', 'EA', 0.25, 0.25, 96],
                        ['Apple', 'Peel', 'EA', 0.33, 0.1, 11.63],
                        ['Apple', 'Slice', 'EA', 1.25, 0.25,32],
                        ['Apple', 'Package', 'EA', 2.0, 1.0, 16.0],
                        ['Orange', 'Wash', 'EA', 0.25, 0.25, 96.0],
                        ['Orange','Peel', 'EA', 1.0, 0.1, 43.64],
                        ['Orange', 'Slice','EA', 0.25, 0.25, 96.0],
                        ['Orange', 'Package', 'EA', 2.0, 1.0, 16.0],
                        ['Banana', 'Peel', 'EA', 0.5, 0.25, 64.0],
                        ['Banana', 'Slice', 'EA', 1.10, 0.25, 35.56],
                        ['Banana', 'Package', 'EA', 1.50, 1.00, 
19.20],
                        ['Banana', 'Peel & Eat', 'EA', 2.0, 3.0, 22],
                        ['Pear', 'Peel & Eat', 'EA', 1.3, 0.25, 
6.5]]),
                        columns=['fruit', 'operation', 'uom', 
'cycle_time', 'chng_time', 'target'])


# Set to multi-level index
df1.set_index(['fruit','operation'], inplace=True)
df2.set_index(['fruit','operation'], inplace=True)

# Update df1 with df2 'Time' values
df1.update(df2, overwrite=False)

# Reset df index
df1 = df1.reset_index()
df2 = df2.reset_index()

”’

Asked By: RamenRich

||

Answers:

if you provide data as a code (preferably) or as a text. I would be able to share the result as well

# set the index on the column that you like to update NaN values on

df.set_index(['PN','Op'], inplace=True)
df2.set_index(['PN','Op'], inplace=True)

# use update to update the null values in DF, with corresponding values from DF2
df.update(df2, overwrite=False)

df.reset_index()


Answered By: Naveed
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.