Defining a new column based on non null values in other columns

Question:

I working with two tables that I performed an outer join on. Below is the table.

Merged Table

I want to create a column called Job Number which looks at the Job Number Salesforce and Job Number Coins columns and returns which ever one is not null.

I attempted this using the following code:

if outer["Job Number_salesforce"] is not None:
    outer["Job Number"] = outer["Job Number_salesforce"]
else:
    outer["Job Number"] = outer["Job Number_coins"]

However, as you can see in the Job Number column it returns the job number if the first column is not null, but it doesn’t return the value of the second column when the first column is null.

Asked By: user21126867

||

Answers:

Here’s one way to solve your issue:

import numpy as np
import pandas as pd

# Create some sample DataFrame with the two columns that our code will use.
outer = pd.DataFrame(
    [
        ['11D01', '11D01'],
        ['11D01', '11D01'],
        ['11D01', None],
        ['31D01', '31D01'],
        ['31D01', '31D01'],
        [None, '31D01'],
        [None, '22D05'],
    ], columns=["Job Number_salesforce", "Job Number_coins"]
)

# np.where() works by identifying the rows where the condition (first
# parameter that was passed to it) is True or not.
# If the condition is True, then the value of the
# second parameter (values from the column "Job Number_salesforce") is used,
# otherwise, the value from the third parameter (values from the column
# "Job Number_coins") is used.
outer["Job Number"] = np.where(outer["Job Number_salesforce"].notnull(),
                               outer["Job Number_salesforce"],
                               outer["Job Number_coins"])

print(outer)
# Prints:
#
#   Job Number_salesforce Job Number_coins Job Number
# 0                 11D01            11D01      11D01
# 1                 11D01            11D01      11D01
# 2                 11D01             None      11D01
# 3                 31D01            31D01      31D01
# 4                 31D01            31D01      31D01
# 5                  None            31D01      31D01
# 6                  None            22D05      22D05

NOTE

The above implementation has two potential kryptonite’s:

  1. If both columns have non-missing values, then the value from the
    "Job Number_salesforce" column will be given preference.
  2. If both columns have missing values, then the newly created "Job Number" column will still contain missing values.
Answered By: Ingwersen_erik