How to map a function using multiple columns in pandas?

Question:

I’ve checked out map, apply, applymap, and combine, but can’t seem to find a simple way of doing the following:

I have a dataframe with 10 columns. I need to pass three of them into a function that takes scalars and returns a scalar …

some_func(int a, int b, int c) returns int d

I want to apply this and create a new column in the dataframe with the result.

df['d'] = some_func(a = df['a'], b = df['b'], c = df['c'])

All the solutions that I’ve found suggest rewriting some_func to work with Series instead of scalars, but this is not possible as it is part of another package. How do I elegantly do the above?

Asked By: ashishsingal


Answers:

I’m using the following:

df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

Seems to be working well, but if anyone else has a better solution, please let me know.

Answered By: ashishsingal

Use pd.DataFrame.apply(), as below:

df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

NOTE: Because @ashishsingal's function needs values from several columns of each row, the axis argument should be set to 1. The default is 0, which passes each column to the function instead (see the documentation excerpt copied below).

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

  • 0 or ‘index’: apply function to each column
  • 1 or ‘columns’: apply function to each row
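
To make the axis behaviour concrete, here is a minimal sketch. The some_func below is only a hypothetical stand-in for the third-party function, and the small DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the scalar function from the other package.
def some_func(a, b, c):
    return a + b * c

# Toy data; the extra "other" column is simply ignored by the lambda.
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9], "other": ["x", "y", "z"]}
)

# With the default axis=0 the lambda receives each *column* as a Series
# indexed by row label, so looking up "a" on it raises a KeyError.
try:
    df.apply(lambda col: some_func(a=col["a"], b=col["b"], c=col["c"]))
    default_axis_failed = False
except KeyError:
    default_axis_failed = True

# With axis=1 the lambda receives each *row* as a Series indexed by column name.
df["d"] = df.apply(
    lambda row: some_func(a=row["a"], b=row["b"], c=row["c"]), axis=1
)
print(df)
```
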
Answered By: tsherwen

If it is a really simple function, such as one based on simple arithmetic, chances are it can be vectorized. For instance, a linear combination can be made directly from the columns:

df["d"] = w1*df["a"] + w2*df["b"] + w3*df["c"]

where w1, w2, and w3 are scalar weights.
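
As a small runnable sketch of the vectorized approach (the data and the weight values here are arbitrary, chosen only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]})

# Illustrative scalar weights (assumed values, not from the question).
w1, w2, w3 = 0.5, 2.0, -1.0

# Arithmetic on whole columns is applied element-wise, so no explicit
# loop or apply call is needed.
df["d"] = w1 * df["a"] + w2 * df["b"] + w3 * df["c"]
print(df)
```

This only helps when the computation can be written in terms of column operations; it does not apply to an opaque scalar function from another package.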

Answered By: Elias Hasle

For what it’s worth on such an old question, I find that zipping the function arguments into tuples and then calling the function in a list comprehension is much faster than using df.apply. For example:

import numpy as np
import pandas as pd

# Setup:
df = pd.DataFrame(np.random.rand(10000, 3), columns=list("abc"))
def some_func(a, b, c):
    return a*b*c

# Using apply:
%timeit df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

222 ms ± 63.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Using tuples + list comprehension:
%timeit df["d"] = [some_func(*a) for a in tuple(zip(df["a"], df["b"], df["c"]))]

8.07 ms ± 640 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
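
The %timeit magic only works in IPython/Jupyter. A rough way to reproduce the comparison in a plain Python script is sketched below using the standard timeit module. The helper names with_apply and with_zip are made up for this sketch, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10_000, 3)), columns=list("abc"))

def some_func(a, b, c):
    return a * b * c

# Hypothetical helper: row-wise apply.
def with_apply():
    return df.apply(lambda x: some_func(a=x["a"], b=x["b"], c=x["c"]), axis=1)

# Hypothetical helper: zip the columns and call the function per tuple.
def with_zip():
    return [some_func(*args) for args in zip(df["a"], df["b"], df["c"])]

runs = 3
t_apply = timeit.timeit(with_apply, number=runs) / runs
t_zip = timeit.timeit(with_zip, number=runs) / runs
print(f"apply: {t_apply * 1e3:.2f} ms per run")
print(f"zip + list comprehension: {t_zip * 1e3:.2f} ms per run")
```
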

Answered By: Toby Petty

I use map, which is as fast as a list comprehension (and much faster than apply):

df['d'] = list(map(some_func, df['a'], df['b'], df['c']))

Example on my machine:

import numpy as np
import pandas as pd

# Setup:
df = pd.DataFrame(np.random.rand(10000, 3), columns=list("abc"))
def some_func(a, b, c):
    return a*b*c

# Using apply:
%timeit df['d'] = df.apply(lambda x: some_func(a = x['a'], b = x['b'], c = x['c']), axis=1)

130 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['d'] = list(map(some_func, df['a'], df['b'], df['c']))

3.91 ms ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
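
For anyone unsure what map does with several iterables: it takes one element from each on every step and passes them as positional arguments, so it should give the same values as the zip-based comprehension. A small sketch on toy data (the some_func here is just the example function from above):

```python
import pandas as pd

def some_func(a, b, c):
    return a * b * c

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# map calls some_func(a_i, b_i, c_i) for each position i.
via_map = list(map(some_func, df["a"], df["b"], df["c"]))

# Equivalent list comprehension over zipped columns.
via_zip = [some_func(a, b, c) for a, b, c in zip(df["a"], df["b"], df["c"])]

# Assigning a plain list is positional, which is fine here because the
# list was built from the same DataFrame in row order.
df["d"] = via_map
print(df)
```
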

Answered By: Andrea Dalseno

Using a list comprehension, as Toby Petty recommended, is a very nice tip:

df["d"] = [some_func(*a) for a in tuple(zip(df["a"], df["b"], df["c"]))]

This can be further optimized by removing the tuple instantiation, since zip can be iterated directly:

df["d"] = [some_func(*a) for a in zip(df["a"], df["b"], df["c"])]

An even faster way to map multiple columns is to use frompyfunc from numpy to create a vectorized version of the Python function:

import numpy as np

# Wrap the scalar function as a ufunc taking 3 inputs and returning 1 output
some_func_vec = np.frompyfunc(some_func, 3, 1)
df["d"] = some_func_vec(df["a"], df["b"], df["c"])
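
One thing to watch for: a ufunc created by np.frompyfunc returns object-dtype results, so you may want to cast the result back to a numeric dtype. A minimal sketch, assuming the function returns floats, and passing plain NumPy arrays via to_numpy() (my choice for this example, not something the approach requires):

```python
import numpy as np
import pandas as pd

def some_func(a, b, c):
    return a * b * c

df = pd.DataFrame({"a": [1.0, 2.0], "b": [2.0, 3.0], "c": [3.0, 4.0]})

some_func_vec = np.frompyfunc(some_func, 3, 1)

# The raw result is an object-dtype array of Python floats.
raw = some_func_vec(df["a"].to_numpy(), df["b"].to_numpy(), df["c"].to_numpy())

# Cast to float so the new column is numeric rather than object.
df["d"] = raw.astype(float)
print(df.dtypes)
```
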
Answered By: Max O