How to define a function based on vaues of multiple columns

Question

I have a datafile as follows:

Activity	Hazard	Condition	consequence
welding	fire	chamber	high
painting	falling	none	medium

I need to create a fifth column based on the values in the activity, hazard, condition, and consequence columns. The conditions are as follows

if the activity column includes "working" or "performing" return ‘none’
if the condition column includes ‘none’ return ‘none’
else return datafile.Hazard.map(str) + " is " + datafile.Condition.map(str) + " impact " datafile.consequence.map(str)

I wrote the following code using regular expressions and dictionaries. But it didn’t provide an answer. It is really appreciated if someone can give an answer

Dt1 = ['working','performing duties']
Dt2 = ["none"]

Dt1_regex = "|".join(Dt1)
Dt2_regex = "|".join(Dt2)

def clean(x):
    if datafile.Activity.str.contains (Dt1_regex, regex=True) | datafile.Condition.str.contains(Dt2_regex, regex=True):
        return 'none'
    else:
        return datafile.Hazard.map(str) + " is " + datafile.Condition.map(str)" impact " datafile.consequence.map(str)

datafile['combined'] = datafile.apply(clean)

Asked By: Sonali

||

Source

Answer 1

You can create a bunch of conditions, and a list of the associated values when the conditions are true. You can then create the new column as shown in the example below:

Conditions:

When Activity == 'working' OR Activity == 'performing' – set to none
When Condition == 'none' – set to none

Otherwise set to:

df.Hazard + ' is ' + df.Condition + ' impact ' + df.consequence

Code:

import numpy as np
import pandas as pd

df = pd.DataFrame({ 'Activity': ['welding', 'painting', 'working'],
                    'Hazard': ['fire', 'falling', 'drowning'],
                    'Condition': ['chamber', 'none', 'wet'],
                    'consequence': ['high', 'medium', 'high']})

# Create your conditions
conditions = [
    (df['Activity'] == 'working') | (df['Activity'] == 'performing'),
    (df['Condition'] == 'none')
    ]

# create a list of the values we want to assign for each condition
values = ['none', 'none']

# Create the new column based on the conditions and values
df['combined'] = np.select(conditions, values, default = "x")
df.loc[df['combined'] == 'x', 'combined'] = df.Hazard + ' is ' + df.Condition + " impact " + df.consequence

print(df)

Output:

Answered By: ScottC

Answer 2

I tried the following code which also gave me the correct answer.

Dt1 = ["working", "performing duties"]
Dt1_regex = "|".join(Dt1)

conditions = [
    (datafile.Activity.str.contains(Dt1_regex,regex=True)),
    (datafile.Condition.str.contains('none')==True),
    (datafile.Activity.str.contains(Dt1_regex,regex=False))|(datafile.Condition.str.contains('none')==False)
]
values = ['none', 'none', datafile.Hazard + ' is ' + datafile.Condition + " impact " + datafile.consequence]
datafile['combined'] = np.select(conditions, values)

Answered By: Sonali

How to define a function based on vaues of multiple columns

Question:

Answers:

Conditions:

Code:

Output: