Creating a new column in pandas by using a lambda function on two existing columns

Question:

I am able to add a new column in pandas by defining a user function and then using apply. However, I want to do this with a lambda; is there a way to do that?

For example, df has two columns a and b. I want to create a new column c that holds, for each row, the greater of the string lengths in a and b.

df = pd.DataFrame({'a':['dfg','f','fff','fgrf','fghj'], 'b' : ['sd','dfg','edr','df','fghjky']})

Something like:

df['c'] = df.apply(lambda x, len(df['a']) if len(df['a']) > len(df['b']) or len(df['b']) )

One approach:

df['c'] = df.apply(lambda x: max([len(x) for x in [df['a'], df['b']]]))

which gives a column of NaNs.

      a       b   c
0   dfg      sd NaN
1     f     dfg NaN
2   fff     edr NaN
3  fgrf      df NaN
4  fghj  fghjky NaN
Asked By: piyush sharma

||

Answers:

You can use map to compute the string lengths and np.where to select between them:

import numpy as np

print(df)
#     a     b
#0  aaa  rrrr
#1   bb     k
#2  ccc     e
# If len(a) > len(b), take the length of column a, else the length of column b
df['c'] = np.where(df['a'].map(len) > df['b'].map(len), df['a'].map(len), df['b'].map(len))
print(df)
#     a     b  c
#0  aaa  rrrr  4
#1   bb     k  2
#2  ccc     e  3
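Applied to the DataFrame from the question, the same np.where pattern can be sketched as below. This is only an illustration: the names len_a and len_b are introduced here so that each length is computed once, and they are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Compute each column's string lengths once (hypothetical helper names).
len_a = df['a'].map(len)
len_b = df['b'].map(len)

# Where len_a > len_b take len_a, otherwise take len_b.
df['c'] = np.where(len_a > len_b, len_a, len_b)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```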

Another solution is apply with the parameter axis=1:

axis = 1 or ‘columns’: apply function to each row

df['c'] = df.apply(lambda x: max(len(x['a']), len(x['b'])), axis=1)
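The role of axis is probably easiest to see side by side. The sketch below, using the question's data, shows what apply returns with the default axis=0 and why assigning that result produces NaNs, then repeats the call with axis=1. The names per_column, col and row are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Default axis=0: the lambda is called once per column, so the result is
# indexed by the column labels 'a' and 'b', not by the row labels 0..4.
per_column = df.apply(lambda col: len(col))
print(per_column.index.tolist(), per_column.tolist())  # ['a', 'b'] [5, 5]

# Assignment aligns on the index; no label matches, so every row is NaN.
df['c'] = per_column
print(df['c'].isna().all())  # True

# axis=1: the lambda is called once per row and receives that row.
df['c'] = df.apply(lambda row: max(len(row['a']), len(row['b'])), axis=1)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```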
Answered By: jezrael

Working with strings is a bit of a special case: string operations in pandas are not optimized, so a Python loop may actually perform better than vectorized pandas methods. A list comprehension is therefore a viable option; it is readable and typically fast:

df['c'] = [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]
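To check the performance claim on your own machine, a rough timing sketch along these lines may help. The row count (10,000) and the number of runs are arbitrary choices made here, the function names are hypothetical, and the timings will vary with hardware and pandas version.

```python
import timeit

import pandas as pd

small = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                      'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})
# Repeat the five sample rows to get 10,000 rows (an arbitrary size).
df = pd.concat([small] * 2000, ignore_index=True)


def with_list_comprehension():
    return [max(len(a), len(b)) for a, b in zip(df['a'], df['b'])]


def with_apply():
    return df.apply(lambda row: max(len(row['a']), len(row['b'])), axis=1)


# Both approaches should agree before their timings are compared.
assert with_list_comprehension() == with_apply().tolist()

for func in (with_list_comprehension, with_apply):
    seconds = timeit.timeit(func, number=5)
    print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```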

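For slightly shorter code, you can try applymap() (deprecated since pandas 2.1 in favor of DataFrame.map()), restricted to the two string columns so that an existing numeric column such as c is not passed to len:

df['c'] = df[['a', 'b']].applymap(len).max(axis=1)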
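Since pandas 2.1, the element-wise method is available as DataFrame.map() and applymap() is deprecated. A version-tolerant sketch could look like this; the hasattr check is just one possible way to support both, and the names strings and lengths are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Restrict to the string columns so len is never applied to numbers.
strings = df[['a', 'b']]

if hasattr(pd.DataFrame, 'map'):
    lengths = strings.map(len)       # pandas 2.1 and later
else:
    lengths = strings.applymap(len)  # older pandas versions

# Row-wise maximum of the two length columns.
df['c'] = lengths.max(axis=1)
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```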

If you’re applying a lambda that uses a conditional expression, make sure to also supply the else branch.

df['c'] = df.apply(lambda row: len(row['a']) if len(row['a']) > len(row['b']) else len(row['b']), axis=1)

In general, you should avoid using a lambda wherever possible, because pandas has a whole host of optimized operations that work directly on columns. For example, if you need the maximum value of each row, you can simply call max(axis=1), as in df[['a', 'b']].max(axis=1); note that this compares the values themselves, not their lengths.
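For the string-length problem in the question, one lambda-free variant is to combine Series.str.len() with NumPy's element-wise maximum. This is a sketch of one option rather than the only column-wise approach, and whether it beats the list comprehension depends on the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['dfg', 'f', 'fff', 'fgrf', 'fghj'],
                   'b': ['sd', 'dfg', 'edr', 'df', 'fghjky']})

# Series.str.len() returns the length of each string in the column;
# np.maximum then takes the larger of the two lengths row by row.
df['c'] = np.maximum(df['a'].str.len(), df['b'].str.len())
print(df['c'].tolist())  # [3, 3, 3, 4, 6]
```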

Answered By: cottontail