Apply function to each cell in DataFrame

Question:

I have a dataframe that may look like this:

A        B        C
foo      bar      foo bar
bar foo  foo      bar

I want to look through every element of each row (or every element of each column) and apply the following function to get the resulting dataframe:

def foo_bar(x):
    return x.replace('foo', 'wow')

After applying the function, my dataframe will look like this:

A        B        C
wow      bar      wow bar
bar wow  wow      bar

Is there a simple one-liner that can apply a function to each cell?

This is a simplistic example so there may be an easier way to execute this specific example other than applying a function, but what I am really asking about is how to apply a function in every cell within a dataframe.

Asked By: eljusticiero67


Answers:

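You can use applymap(), which is concise for your case. (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map, which does the same thing.)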

df.applymap(foo_bar)

#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar
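For anyone who wants to run this end to end, here is a minimal sketch that builds the frame from the question and applies foo_bar to every cell. The fallback between DataFrame.map and applymap is my own addition, on the assumption that you may be on either side of pandas 2.1:

```python
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow')

# The example frame from the question.
df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# DataFrame.map was added in pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap

# Returns a new frame; df itself is left unchanged.
result = elementwise(foo_bar)
print(result)
```

Both methods return a new DataFrame, so assign the result back to df if you want to overwrite the original.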

Another option is to vectorize your function with np.vectorize and then pass it to the apply method, which calls it once per column:

import numpy as np
df.apply(np.vectorize(foo_bar))
#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar
Answered By: Psidom

I guess you could use np.vectorize:

>>> df[:] = np.vectorize(foo_bar)(df)
>>> df
         A    B        C
0      wow  bar  wow bar
1  bar wow  wow      bar

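This might be quicker, since it operates on the underlying NumPy array, although np.vectorize is documented as a convenience wrapper around a Python loop rather than a performance feature, so it is worth timing on your own data.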
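If you would rather not overwrite df in place, a possible variant is to wrap the vectorized result in a new DataFrame. This is only a sketch: passing otypes=[object] is my assumption about what you want (plain Python strings rather than a fixed-width NumPy string dtype), and the name vec_foo_bar is made up for the example.

```python
import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow')

df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# otypes=[object] makes the output an object array of Python strings.
vec_foo_bar = np.vectorize(foo_bar, otypes=[object])

# Apply to the raw 2-D array, then restore the original labels.
result = pd.DataFrame(vec_foo_bar(df.to_numpy()), index=df.index, columns=df.columns)
print(result)
```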

Answered By: U13-Forward

Expanding on Psidom’s answer: if the function you define accepts additional arguments, you can pass them to applymap as keyword arguments. For example, to make the replacement string in the OP's foo_bar() configurable:

def foo_bar(x, bar=''):
    return x.replace('foo', bar)

df.applymap(foo_bar, bar='haha')
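If you are unsure whether your pandas version forwards keyword arguments, one alternative (my suggestion, not part of the answer above) is to bind the argument first with functools.partial, so the element-wise method only ever sees a one-argument callable. The map/applymap fallback is again an assumption about which pandas version is installed:

```python
from functools import partial

import pandas as pd

def foo_bar(x, bar=''):
    return x.replace('foo', bar)

df = pd.DataFrame({
    'A': ['foo', 'bar foo'],
    'B': ['bar', 'foo'],
    'C': ['foo bar', 'bar'],
})

# Bind bar='haha' up front; the result takes a single positional argument.
foo_to_haha = partial(foo_bar, bar='haha')

# DataFrame.map was added in pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
result = elementwise(foo_to_haha)
print(result)
```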

One common case where applymap is especially useful is string operations (as in the OP). Because pandas string methods are not heavily optimized, a plain Python loop often outperforms the vectorized string accessors, especially when several operations are chained. For example, for the following simple task of conditionally replacing values in a frame, applymap was over 3 times faster than the equivalent vectorized pandas code in the timings below.

import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow') if len(x) > 3 else x + ' this'

df = pd.DataFrame([['foo', 'bar', 'foo bar'], ['bar foo', 'foo', 'bar']]*500000, columns=[*'ABC'])

%timeit df.applymap(foo_bar)
# 1.47 s ± 37.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df.apply(lambda x: np.where(x.str.len()>3, x.str.replace('foo', 'wow'), x + ' this'))
# 4.64 s ± 597 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
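
The %timeit lines above only work in IPython or Jupyter. As a rough sketch for a plain Python script, the same comparison can be run with the standard timeit module, along with a check that both approaches return the same values. The helper names looped and vectorized and the smaller frame size are my own choices, and the timings will depend on your machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow') if len(x) > 3 else x + ' this'

# Smaller than the frame above so the script finishes quickly.
df = pd.DataFrame([['foo', 'bar', 'foo bar'], ['bar foo', 'foo', 'bar']] * 1000,
                  columns=list('ABC'))

def looped(frame):
    # DataFrame.map was added in pandas 2.1; older versions only have applymap.
    elementwise = frame.map if hasattr(frame, 'map') else frame.applymap
    return elementwise(foo_bar)

def vectorized(frame):
    return frame.apply(lambda s: np.where(s.str.len() > 3,
                                          s.str.replace('foo', 'wow'),
                                          s + ' this'))

# Confirm the two approaches agree before comparing their speed.
same = looped(df).values.tolist() == vectorized(df).values.tolist()
print('identical results:', same)

t_loop = timeit.timeit(lambda: looped(df), number=5)
t_vec = timeit.timeit(lambda: vectorized(df), number=5)
print(f'element-wise loop: {t_loop:.3f} s, vectorized: {t_vec:.3f} s (5 runs each)')
```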
Answered By: cottontail