How to define a function based on vaues of multiple columns

Question:

I have a datafile as follows:

Activity Hazard Condition consequence
welding fire chamber high
painting falling none medium

I need to create a fifth column based on the values in the activity, hazard, condition, and consequence columns. The conditions are as follows

if the activity column includes "working" or "performing" return ‘none’
if the condition column includes ‘none’ return ‘none’
else return datafile.Hazard.map(str) + " is " + datafile.Condition.map(str) + " impact " datafile.consequence.map(str)

I wrote the following code using regular expressions and dictionaries. But it didn’t provide an answer. It is really appreciated if someone can give an answer

Dt1 = ['working','performing duties']
Dt2 = ["none"]

Dt1_regex = "|".join(Dt1)
Dt2_regex = "|".join(Dt2)

def clean(x):
    if datafile.Activity.str.contains (Dt1_regex, regex=True) | datafile.Condition.str.contains(Dt2_regex, regex=True):
        return 'none'
    else:
        return datafile.Hazard.map(str) + " is " + datafile.Condition.map(str)" impact " datafile.consequence.map(str)

datafile['combined'] = datafile.apply(clean)
Asked By: Sonali

||

Answers:

You can create a bunch of conditions, and a list of the associated values when the conditions are true. You can then create the new column as shown in the example below:

Conditions:

  • When Activity == 'working' OR Activity == 'performing' – set to none
  • When Condition == 'none' – set to none

Otherwise set to:

df.Hazard + ' is ' + df.Condition + ' impact ' + df.consequence

Code:

import numpy as np
import pandas as pd

df = pd.DataFrame({ 'Activity': ['welding', 'painting', 'working'],
                    'Hazard': ['fire', 'falling', 'drowning'],
                    'Condition': ['chamber', 'none', 'wet'],
                    'consequence': ['high', 'medium', 'high']})

# Create your conditions
conditions = [
    (df['Activity'] == 'working') | (df['Activity'] == 'performing'),
    (df['Condition'] == 'none')
    ]

# create a list of the values we want to assign for each condition
values = ['none', 'none']

# Create the new column based on the conditions and values
df['combined'] = np.select(conditions, values, default = "x")
df.loc[df['combined'] == 'x', 'combined'] = df.Hazard + ' is ' + df.Condition + " impact " + df.consequence

print(df)

Output:

Dataframe

Answered By: ScottC

I tried the following code which also gave me the correct answer.

Dt1 = ["working", "performing duties"]
Dt1_regex = "|".join(Dt1)

conditions = [
    (datafile.Activity.str.contains(Dt1_regex,regex=True)),
    (datafile.Condition.str.contains('none')==True),
    (datafile.Activity.str.contains(Dt1_regex,regex=False))|(datafile.Condition.str.contains('none')==False)
]
values = ['none', 'none', datafile.Hazard + ' is ' + datafile.Condition + " impact " + datafile.consequence]
datafile['combined'] = np.select(conditions, values)
Answered By: Sonali