How to create a column as a function of two other columns?
Question:
I have a dataframe with two columns. I want to create a third column such that, if Col1 is null, then Col3 = Col2, else Col3 = Col1 * 2
I have tried:
def myf(col1,col2):
    if pd.isnull(col1):
        return col2
    else:
        return col1 * 2
df['col3'] = df.apply(lambda x: myf(df['col1'], df['col2']), axis= 1)
but I get an error that "The truth value of a Series is ambiguous".
How can I fix this? My tiny, used-to-SQL brain still struggles to understand how pandas works; maybe I’m very dumb, maybe pandas’ documentation is very poor, maybe both 🙂
I understand that apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series, and I understand the error arises because pd.isnull returns a T/F array.
However, I’m not sure how I’d use applymap or map in a case like this, where two other columns are my input.
Answers:
You need to change df to x in the lambda function, so that the function receives scalars instead of Series as input:
df['col3'] = df.apply(lambda x: myf(x['col1'], x['col2']), axis= 1)
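To see why this works, here is a small self-contained sketch. The sample data is made up for illustration (it is not from the question). With axis=1, x is a single row, so x['col1'] is a scalar and pd.isnull returns one bool; df['col1'] is the whole column, so the if statement cannot decide.

```python
import numpy as np
import pandas as pd

def myf(col1, col2):
    # col1 and col2 are expected to be scalars here
    if pd.isnull(col1):
        return col2
    return col1 * 2

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'col1': [1, 2, np.nan, 3, np.nan],
    'col2': [2, np.nan, 3, 2, np.nan],
})

# Passing whole columns: pd.isnull returns a Series, and `if` on a Series raises
error_raised = False
try:
    df.apply(lambda x: myf(df['col1'], df['col2']), axis=1)
except ValueError as err:
    error_raised = True
    print(err)

# Passing values from the current row x: pd.isnull returns a single bool
df['col3'] = df.apply(lambda x: myf(x['col1'], x['col2']), axis=1)
print(df)
```

This still calls a Python function once per row, so on large frames it will likely be slower than a vectorized approach.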
Another, faster solution is to use combine_first or Series.where:
df['col3'] = df['col1'].mul(2).combine_first(df['col2'])
df['col3'] = df['col2'].where(df['col1'].isnull(), df['col1']*2)
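As a quick check that both one-liners give the same result, here is a minimal sketch on made-up data. The column names col3_combine and col3_where are only there to keep the two results side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'col1': [1, 2, np.nan, 3, np.nan],
    'col2': [2, np.nan, 3, 2, np.nan],
})

# Double col1, then fill the gaps (rows where col1 was NaN) from col2
df['col3_combine'] = df['col1'].mul(2).combine_first(df['col2'])

# Keep col2 where col1 is null, otherwise take col1 * 2
df['col3_where'] = df['col2'].where(df['col1'].isnull(), df['col1'] * 2)

print(df)
```

Note that Series.where keeps the caller's value where the condition is True and uses the second argument elsewhere, which is why col2 is the caller here.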
You can use fillna:
df.col1.mul(2).fillna(df.col2)
import numpy as np  # pd.np was removed in pandas 2.0
import pandas as pd
df = pd.DataFrame({
    'col1': [1, 2, np.nan, 3, np.nan],
    'col2': [2, np.nan, 3, 2, np.nan]
})
df['col3'] = df.col1.mul(2).fillna(df.col2)
df
#    col1  col2  col3
# 0   1.0   2.0   2.0
# 1   2.0   NaN   4.0
# 2   NaN   3.0   3.0
# 3   3.0   2.0   6.0
# 4   NaN   NaN   NaN
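If you are coming from SQL, numpy.where may read more like a CASE WHEN expression. This is not one of the methods proposed above, just an alternative sketch on the same made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    'col1': [1, 2, np.nan, 3, np.nan],
    'col2': [2, np.nan, 3, 2, np.nan],
})

# Roughly: CASE WHEN col1 IS NULL THEN col2 ELSE col1 * 2 END
df['col3'] = np.where(df['col1'].isnull(), df['col2'], df['col1'] * 2)

print(df)
```

np.where evaluates both branches for every row and then picks element-wise, so it works on whole columns at once without any apply.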