Apply a function row by row using other DataFrames' rows as list inputs in Python

Question:

I'm trying to apply a function row by row. It takes 5 inputs, 3 of which are lists, and I want these lists to come from each row of 3 corresponding DataFrames.

I’ve tried using ‘apply’ and ‘lambda’ as follows:

sol['tf_dd']=sol.apply(lambda tsol, rfsol, rbsol: 
                           taurho_difdif(xy=xy,
                                         l=l,
                                         t=tsol,
                                         rf=rfsol,
                                         rb=rbsol),
                           axis=1)

However, I get the error: <lambda>() missing 2 required positional arguments: 'rfsol' and 'rbsol'

The DataFrame sol and the DataFrames tsol, rfsol and rbsol all have the same length. For each row, I want the entire row from tsol, rfsol and rbsol to be input as three lists.

Here is a much simplified example (first with single lists, which I then want to replicate row by row with DataFrames):

The output with single lists is a single value (120). With dataframes as inputs I want an output dataframe of length 10 where all values are 120.

import pandas as pd

t=[1,2,3,4,5]
rf=[6,7,8,9,10]
rb=[11,12,13,14,15]

def simple_func(t, rf, rb):
    x=sum(t)
    y=sum(rf)
    z=sum(rb)

    return x+y+z

out=simple_func(t,rf,rb)

# dataframe rows as lists
tsol=pd.DataFrame((t,t,t,t,t,t,t,t,t,t))
rfsol=pd.DataFrame((rf,rf,rf,rf,rf,rf,rf,rf,rf,rf))
rbsol=pd.DataFrame((rb,rb,rb,rb,rb,rb,rb,rb,rb,rb))


out2 = pd.DataFrame(index=range(len(tsol)), columns=['output'])
out2['output'] = out2.apply(lambda tsol, rfsol, rbsol:
                            simple_func(t=tsol.tolist(),
                                        rf=rfsol.tolist(),
                                        rb=rbsol.tolist()),
                            axis=1)
Asked By: Joe Roberts

||

Answers:

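When you run df.apply() with axis=1, pandas does not pass the columns as individual arguments. It calls the function once per row with a single Series, so the lambda must take exactly one argument. Since out2 itself has no tsol, rfsol or rbsol columns, first combine the three DataFrames side by side under those keys, then pick each block out of the row:

combined = pd.concat([tsol, rfsol, rbsol], axis=1, keys=["tsol", "rfsol", "rbsol"])
out2['output'] = combined.apply(lambda row:
                                simple_func(t=row["tsol"].tolist(),
                                            rf=row["rfsol"].tolist(),
                                            rb=row["rbsol"].tolist()),
                                axis=1)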
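
To see what apply hands to the function, the sketch below records each call's argument on a small made-up frame (demo, seen and record are illustrative names, not from the question). With axis=1 the callable receives one argument per row: a Series whose index holds the column labels and whose name is the row label. That is why a three-parameter lambda fails.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not from the question)
demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r0", "r1"])

seen = []  # collects (type name, row label, column labels) for each call


def record(row):
    # With axis=1, `row` is one pandas Series per DataFrame row
    seen.append((type(row).__name__, row.name, list(row.index)))
    return row["a"] + row["b"]


totals = demo.apply(record, axis=1)
print(seen)
print(totals.tolist())
```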
Answered By: OuterSoda

Use the "name" attribute of the row Series to get the row's index label, then use it to fetch the matching row from each of the other DataFrames:

import pandas as pd
import numpy as np


def positional_sum(row, df1, df2, df3):
    """
    Sum the rows of df1, df2 and df3 that share the index label of `row`.
    """

    # row.name is the index label of the row currently being processed
    label = row.name

    # .loc looks rows up by label, which matches what row.name holds
    x = df1.loc[label].sum()
    y = df2.loc[label].sum()
    z = df3.loc[label].sum()
    return x + y + z


# dataframe rows as lists
tsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
rfsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))
rbsol = pd.DataFrame(np.random.randn(10, 5), columns=range(5))

out2 = pd.DataFrame(index=range(len(tsol)), columns=["output"])

out2["output"] = out2.apply(lambda x: positional_sum(x, tsol, rfsol, rbsol), axis=1)

out2
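
Note that row.name is the row's index label rather than its integer position, so label-based lookup with .loc stays correct when the frames do not use a default 0..n-1 index. The sketch below illustrates this with made-up frames; a, b, idx and the label_sum helper are hypothetical names, and it assumes all frames share the same index.

```python
import pandas as pd

# Hypothetical frames sharing a non-default index
idx = [10, 20, 30]
a = pd.DataFrame([[1, 2], [3, 4], [5, 6]], index=idx)
b = pd.DataFrame([[10, 20], [30, 40], [50, 60]], index=idx)

out = pd.DataFrame(index=idx, columns=["output"])


def label_sum(row, *frames):
    # row.name is the index label (10, 20 or 30 here); look it up with .loc
    return sum(frame.loc[row.name].sum() for frame in frames)


out["output"] = out.apply(lambda row: label_sum(row, a, b), axis=1)
print(out)
```

With this index, a positional lookup such as a.iloc[10] would be out of bounds, since the frame only has three rows.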

Hope this helps!

Answered By: AntonioRB

Because simple_func only adds up sums, you can drop it and use vectorized row sums instead:

out2["output"] = tsol.sum(axis=1) + rfsol.sum(axis=1) + rbsol.sum(axis=1)

Here is the complete code:

import pandas as pd

t=[1,2,3,4,5]
rf=[6,7,8,9,10]
rb=[11,12,13,14,15]

# dataframe rows as lists
tsol=pd.DataFrame((t,t,t,t,t,t,t,t,t,t))
rfsol=pd.DataFrame((rf,rf,rf,rf,rf,rf,rf,rf,rf,rf))
rbsol=pd.DataFrame((rb,rb,rb,rb,rb,rb,rb,rb,rb,rb))

out2 = pd.DataFrame(index=range(len(tsol)), columns=["output"])
out2["output"] = tsol.sum(axis=1) + rfsol.sum(axis=1) + rbsol.sum(axis=1)

print(out2)

OUTPUT:

   output
0     120
1     120
2     120
3     120
4     120
5     120
6     120
7     120
8     120
9     120
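
When the real function cannot be written as column-wise arithmetic (presumably the case for taurho_difdif in the question), one alternative is to walk the three frames' rows in lockstep. This is a minimal sketch that uses simple_func from the question as a stand-in. It assumes the three frames have the same number of rows in the same order, since it pairs rows by position rather than by index label.

```python
import pandas as pd


def simple_func(t, rf, rb):
    # Stand-in for a function that needs three plain lists
    return sum(t) + sum(rf) + sum(rb)


t = [1, 2, 3, 4, 5]
rf = [6, 7, 8, 9, 10]
rb = [11, 12, 13, 14, 15]

tsol = pd.DataFrame([t] * 10)
rfsol = pd.DataFrame([rf] * 10)
rbsol = pd.DataFrame([rb] * 10)

# to_numpy().tolist() turns each frame into a list of row lists;
# zip then pairs up row i of all three frames
rows = zip(tsol.to_numpy().tolist(),
           rfsol.to_numpy().tolist(),
           rbsol.to_numpy().tolist())

out2 = pd.DataFrame({"output": [simple_func(a, b, c) for a, b, c in rows]},
                    index=tsol.index)
print(out2)
```

Extra scalar arguments, such as xy and l in the question, could be passed as keyword arguments inside the comprehension.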
Answered By: ScottC