Check if two columns are having matching values, but values are not in the same index places(Python, Pandas)

Question:

I have two DataFrames about Super Store Sales:

  • the first is named df_orders
  • the second one is named df_returns

In both DataFrames we have a matching column called "Order ID", but df_returns has less rows than df_orders. What I want to do is make a new column in df_orders with the value ‘Returned’ if the Order ID is present in df_returns and ‘Not returned’ if not.

Here are samples of both DataFrames:

df_order= {'City':['Prior Lake','Chicago','NY','Prior Lake', 'Round Rock'],
           'Order ID':[86838 ,90154,15000,10000, 12447]}
df_return= {'Order ID':[90154, 86838 ],
           'Returned':['Returned', 'Returned']}

# Create DataFrame from dict
df_orders = pd.DataFrame.from_dict(df_order)
df_returns = pd.DataFrame.from_dict(df_return)

This is how i thought it should be checked but it is definitely not correct cause everywhere says "not returned", but I’ve checked manually and seen that some orders are matching. Please, help me out.

excel_path = r'C:UsersKorisnikDesktopPythonFilesOmegaSuperStoreUS.xlsx'
df = pd.read_excel(excel_path, sheet_name=None)

# 1.
df_order = df.get('Orders')
df_returns = df.get('Returns')
df_users = df.get('Users')

df_n.reset_index(drop=True)
df_returns.reset_index(drop=True)
df_n['Status'] = np.where( df_n['Order ID'].equals(df_returns['Order ID'])  and df_returns["Status"] == "Returned", "Returned", "Not returned")

Answers:

You can use pandas.DataFrame.merge with pandas.Series.fillna :

df_order = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Orders")
df_return = pd.read_excel("SuperStoreUS.xlsx", sheet_name="Returns")

Use either :

# --- To create a new dataframe
out = df_order.merge(df_return, on="Order ID", how="left")
out["Status"] = out["Status"].fillna("Not Returned")

Or:

# --- To update df_order
df_order = df_order.merge(df_return, on="Order ID", how="left")
df_order["Status"] = df_order["Status"].fillna("Not Returned")
Answered By: abokey

Another way is to create a new column in df_order, with values conditioned on the row’s Order ID:

df_orders['Status'] = df_orders['Order ID'].map(lambda x: 'Returned' if x in df_returns['Order ID'].tolist() else 'Not returned')
Answered By: Swifty