Apply function to each cell in DataFrame
Question:
I have a dataframe that may look like this:
A        B    C
foo      bar  foo bar
bar foo  foo  bar
I want to go through every element of each row (or every element of each column) and apply the following function to get the resulting dataframe:
def foo_bar(x):
    return x.replace('foo', 'wow')
After applying the function, my dataframe will look like this:
A        B    C
wow      bar  wow bar
bar wow  wow  bar
Is there a simple one-liner that can apply a function to each cell?
This is a simplistic example, so there may be an easier way to handle this specific case than applying a function, but what I am really asking is how to apply a function to every cell of a dataframe.
Answers:
You can use applymap(), which is concise for your case.
df.applymap(foo_bar)
#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar
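For readers on newer pandas: applymap was deprecated in pandas 2.1 in favor of DataFrame.map, which does the same element-wise job. The following is a minimal sketch that works on either version; the sample frame is an assumed layout of the question's data, and the `elementwise` name is only illustrative.

```python
import pandas as pd


def foo_bar(x):
    return x.replace("foo", "wow")


# Assumed layout of the question's example frame
df = pd.DataFrame(
    {"A": ["foo", "bar foo"], "B": ["bar", "foo"], "C": ["foo bar", "bar"]}
)

# DataFrame.map was added in pandas 2.1 as the successor to applymap;
# fall back to applymap on older installs.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Returns a new frame; df itself is left untouched
result = elementwise(foo_bar)
print(result)
```

Either method returns a new DataFrame, so assign the result back if you want to keep it.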
Another option is to vectorize your function and then use the apply method:
import numpy as np
df.apply(np.vectorize(foo_bar))
#          A    B        C
# 0      wow  bar  wow bar
# 1  bar wow  wow      bar
I guess you could use np.vectorize:
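>>> df[:] = np.vectorize(foo_bar)(df)
>>> df
       A    B    C
foo  bar  wow  bar
bar  wow  wow  bar
(In this output the leftmost values, foo and bar, appear to be index labels, which is why they are unchanged: only the cell values go through the function.)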
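This might be quicker, since it works on the underlying NumPy array, but np.vectorize is essentially a Python-level loop, so it is worth timing on your own data.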
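As a variation on the np.vectorize idea, here is a minimal sketch that returns a new frame instead of assigning into df[:]. The otypes=[object] argument and the sample frame are my own assumptions rather than part of the answer above; otypes tells NumPy to keep the results as Python objects instead of inferring an output dtype from the first call.

```python
import numpy as np
import pandas as pd


def foo_bar(x):
    return x.replace("foo", "wow")


# Assumed layout of the question's example frame
df = pd.DataFrame(
    {"A": ["foo", "bar foo"], "B": ["bar", "foo"], "C": ["foo bar", "bar"]}
)

# otypes=[object] skips output-type inference and keeps plain Python strings
vec = np.vectorize(foo_bar, otypes=[object])

# Apply to the 2-D array of values, then rebuild a frame with the same labels
result = pd.DataFrame(vec(df.to_numpy()), index=df.index, columns=df.columns)
print(result)
```

Because the function only sees df.to_numpy(), index and column labels are never passed through it.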
Expanding on Psidom’s answer, if the function you define accepts additional arguments, you can pass them along as keyword arguments. For example, to make the replacement string of foo_bar() in the OP configurable:
def foo_bar(x, bar=''):
    return x.replace('foo', bar)

df.applymap(foo_bar, bar='haha')
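If your pandas version predates keyword forwarding in applymap (as far as I know it arrived in pandas 1.3), binding the argument yourself is a version-independent alternative. This is a sketch using functools.partial; the sample frame and the `elementwise` helper name are assumptions for illustration.

```python
from functools import partial

import pandas as pd


def foo_bar(x, bar=""):
    return x.replace("foo", bar)


# Assumed layout of the question's example frame
df = pd.DataFrame(
    {"A": ["foo", "bar foo"], "B": ["bar", "foo"], "C": ["foo bar", "bar"]}
)

# Use DataFrame.map where it exists (pandas 2.1+), otherwise applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# partial() fixes bar='haha', leaving a one-argument function of the cell value
result = elementwise(partial(foo_bar, bar="haha"))
print(result)
```

A lambda such as `lambda x: foo_bar(x, bar='haha')` would work the same way.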
One of the common cases where applymap is especially useful is string operations (as in the OP). Since string operations in pandas are not optimized, a plain loop often performs better than vectorized operations, especially when several operations are chained. For example, for the following simple task of conditionally replacing values in a frame, applymap is over 3 times faster than the equivalent vectorized pandas code in the timings below.
import numpy as np
import pandas as pd

def foo_bar(x):
    return x.replace('foo', 'wow') if len(x) > 3 else x + ' this'

df = pd.DataFrame([['foo', 'bar', 'foo bar'], ['bar foo', 'foo', 'bar']]*500000, columns=[*'ABC'])

# %timeit is an IPython/Jupyter magic
%timeit df.applymap(foo_bar)
# 1.47 s ± 37.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df.apply(lambda x: np.where(x.str.len()>3, x.str.replace('foo', 'wow'), x + ' this'))
# 4.64 s ± 597 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
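To rerun this comparison outside IPython, something like the sketch below should work. It uses the standard timeit module on a much smaller frame and first checks that both approaches produce the same values. The frame size, the variable names, the pd.Series wrapper around np.where, and regex=False are my own choices, and the timings will differ from the numbers quoted above.

```python
import timeit

import numpy as np
import pandas as pd


def foo_bar(x):
    return x.replace("foo", "wow") if len(x) > 3 else x + " this"


# Smaller than the 1,000,000-row frame above so the script finishes quickly
df = pd.DataFrame(
    [["foo", "bar", "foo bar"], ["bar foo", "foo", "bar"]] * 1000,
    columns=list("ABC"),
)

# Use DataFrame.map where it exists (pandas 2.1+), otherwise applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap


def loop_version():
    return elementwise(foo_bar)


def vectorized_version():
    # Column-wise: pick the replaced string for long values, the suffixed one otherwise
    return df.apply(
        lambda s: pd.Series(
            np.where(
                s.str.len() > 3,
                s.str.replace("foo", "wow", regex=False),
                s + " this",
            ),
            index=s.index,
        )
    )


looped = loop_version()
vectorized = vectorized_version()

# Both approaches should agree cell for cell before comparing speed
same = looped.values.tolist() == vectorized.values.tolist()

t_loop = timeit.timeit(loop_version, number=3)
t_vec = timeit.timeit(vectorized_version, number=3)
print(f"same result: {same}, loop: {t_loop:.3f}s, vectorized: {t_vec:.3f}s")
```

The ratio between the two can change with frame size, pandas version, and string dtype, so treat any single measurement as indicative only.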