What is `pandas.DataFrame.apply` actually operating on?

Question:

I have two questions, but first I will give the context. I am trying to use a pandas DataFrame with some existing code using a functional programming approach. I basically want to map a function to every row of a DataFrame, expanding the row using the double-asterisk keyword argument notation, where each column name of the DataFrame corresponds to one of the arguments of the existing function.

For example, say I have the following function.

def line(m, x, b):
    y = (m * x) + b

    return y

And I have a pandas DataFrame

import pandas as pd

data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)

# Returns
#    b  m  x
# 0  1  1  2
# 1  2  2  3

Ultimately, I want to construct a column in the DataFrame from the results of line applied to each row; something like the following.

# Note that I'm using the list of dicts defined above, not the DataFrame.
results = [line(**datum) for datum in data]

I feel like I should be able to use some combination of DataFrame.apply, a lambda, probably Series.to_dict, and the double-asterisk keyword argument expansion, but I can’t figure out what is passed to the lambda in the following expression.

df.apply(lambda x: x, axis=1)
#               ^
#               What is pandas passing to my identity lambda?

I’ve tried to inspect with type and x.__class__, but both of the following lines throw TypeErrors.

df.apply(lambda x: type(x), axis=1)
df.apply(lambda x: x.__class__, axis=1)

I don’t want to write/refactor a new line function that can wrangle some pandas object because I shouldn’t have to. Ultimately, I want to end up with a DataFrame with columns for the input data and a column with the corresponding output of the line function.

My two questions are:

  1. How can I pass a row of a pandas DataFrame to a function using keyword-argument expansion, either using the DataFrame.apply method or some other (functional) approach?
  2. What exactly is DataFrame.apply passing to the function that I specify?

Maybe there is some other functional approach I could take that I’m just not aware of, but I figure pandas is a pretty popular library for this kind of thing and that’s why I’m trying to use it. Also there are some data (de)serialization issues I’m facing that pandas should make pretty easy vs. writing a more bespoke solution.

Thanks.

Asked By: joshua.r.smith

Answers:

Maybe this is what you are looking for.

1)

df.apply(lambda x: line(**x.to_dict()), axis=1)

Result

0    3
1    8
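To end up with the input columns plus an output column, as the question asks, the result of apply can be assigned back to the DataFrame. The following is a minimal sketch that reuses line and data from the question; the column names "y" and "y_records" are made up for illustration, and the second variant (iterating over DataFrame.to_dict("records")) is an alternative I am suggesting, not part of the original answer.

```python
import pandas as pd


def line(m, x, b):
    return (m * x) + b


data = [{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}]
df = pd.DataFrame(data)

# Option A: row-wise apply, unpacking each row as keyword arguments.
# "y" is a hypothetical column name chosen for this example.
df["y"] = df.apply(lambda row: line(**row.to_dict()), axis=1)

# Option B: skip apply and iterate over plain dicts, one per row.
# Only the input columns are selected so that "y" is not passed to line.
records = df[["b", "m", "x"]].to_dict("records")
df["y_records"] = [line(**rec) for rec in records]

print(df)
```

Both options require the column names to match the parameter names of line exactly; an extra column would raise a TypeError for an unexpected keyword argument, which is why Option B selects the input columns explicitly.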

2)

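With axis=1, DataFrame.apply calls your function once per row and passes it a pandas Series: the Series index holds the column names, the values are the entries of that row, and the Series name is the index label of that row.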
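One way to check this without relying on the return value of apply is to record what the function receives as a side effect. This is only a sketch; the helper name inspect_row and the list seen are made up for the example.

```python
import pandas as pd

df = pd.DataFrame([{"b": 1, "m": 1, "x": 2}, {"b": 2, "m": 2, "x": 3}])

seen = []  # hypothetical scratch list for what apply hands to the function


def inspect_row(row):
    # Record the type, the index entries, and the name of the object received.
    seen.append((type(row).__name__, list(row.index), row.name))
    return 0  # return a plain scalar so apply has something simple to collect


df.apply(inspect_row, axis=1)

for entry in seen:
    print(entry)
```

Each recorded entry should show a Series whose index is the list of column names. Note that the row Series has a single dtype, so a DataFrame with mixed column types is upcast to a common type before the function sees it.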

Answered By: jch