Dataframe list comprehension "zip(…)": loop through chosen df columns efficiently with just a list of column name strings

Question:

This is just a nitpicking syntactic question…

I have a dataframe, and I want to use list comprehension to evaluate a function using lots of columns.

I know I can do this

df['result_col'] = [some_func(*var) for var in zip(df['col_1'], df['col_2'],... ,df['col_n'])]

I would like to do something like this

df['result_col'] = [some_func(*var) for var in zip(df[['col_1', 'col_2',... ,'col_n']])]

i.e. not having to write df n times. I cannot for the life of me figure out the syntax.

Asked By: mortysporty

||

Answers:

This should work, but honestly, the OP figured it out as well, so +1 to the OP 🙂

df['result_col'] = [some_func(*var) for var in zip(*[df[col] for col in ['col_1', 'col_2',... ,'col_n']])]
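
For anyone who wants to run it, here is a minimal sketch of the same line. The three-column frame, the `cols` list and the summing `some_func` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data and function, for illustration only.
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [10, 20, 30], "col_3": [100, 200, 300]})
cols = ["col_1", "col_2", "col_3"]

def some_func(a, b, c):
    return a + b + c

# Build one Series per column name and unpack them into zip,
# so each `var` is a tuple holding one row's values.
df["result_col"] = [some_func(*var) for var in zip(*[df[col] for col in cols])]
print(df["result_col"].tolist())  # [111, 222, 333]
```

The point is that `df` is written only once; the column names live in an ordinary list of strings.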
Answered By: deadvoid

As mentioned in the comments above, you should use apply instead:

df['result_col'] = df.apply(lambda x: some_func(*tuple(x.values)), axis=1)
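
Note that the line above passes every column of the row to some_func. If only some columns are wanted, one option is to select them first. A small sketch, with a made-up frame and a hypothetical two-argument `some_func`:

```python
import pandas as pd

# Hypothetical data and function, for illustration only.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [10, 20], "other": ["x", "y"]})
cols = ["col_1", "col_2"]

def some_func(a, b):
    return a * b

# Restrict to the wanted columns before apply, so "other" is never passed in.
df["result_col"] = df[cols].apply(lambda row: some_func(*row), axis=1)
```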
Answered By: gyx-hh

df.apply() is almost as slow as df.iterrows(), and neither is recommended; see "How to iterate over rows in a DataFrame in Pandas", search for "An Obvious Example" in @cs95's answer and look at the comparison graph. Since the fastest ways (vectorization, Cython routines) are not easy to implement, the third-best option is usually the most practical one, and that is a list comprehension:

# print 3rd col
def some_func(row):
    print(row[2])


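# zip() over a 2-D array yields 1-tuples, each holding one row array,
# so *row unpacks that single array into some_func.
# some_func only prints here, so result_col ends up filled with None.
df['result_col'] = [some_func(*row) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]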

or

# print 3rd col
def some_func(row):
    print(row[2])

# each row is a 1-tuple, so row[0] is the row array itself
df['result_col'] = [some_func(row[0]) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]

or

# print 3rd col
def some_func(x):
    print(x)

# row[0] is the row array; row[0][2] is the value of the third selected column
df['result_col'] = [some_func(row[0][2]) for row in zip(df[['col_1', 'col_2',... ,'col_n']].to_numpy())]
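
As a side note, iterating over a 2-D NumPy array already yields one row per step, so the zip() wrapper can probably be dropped. A minimal sketch under that assumption, with made-up data and a hypothetical `some_func` that returns a value instead of printing:

```python
import pandas as pd

# Hypothetical data and function, for illustration only.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4], "col_3": [5, 6]})
cols = ["col_1", "col_2", "col_3"]

def some_func(a, b, c):
    return a + b * c

# Each `row` is a 1-D array with one value per selected column.
df["result_col"] = [some_func(*row) for row in df[cols].to_numpy()]

# Alternative: itertuples yields plain tuples and does not upcast
# mixed-dtype columns to one common dtype the way to_numpy() does.
via_tuples = [some_func(*row) for row in df[cols].itertuples(index=False, name=None)]
```

Both variants still need the column names only once, as a list of strings.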

EDIT:

Please use df.iloc and df.loc instead of df[[...]], see Selecting multiple columns in a Pandas dataframe
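
A rough sketch of what that selection could look like; the frame and the names `by_label`, `by_position` and `rows` are made up for illustration:

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"col_1": [1, 2], "col_2": [3, 4], "col_3": [5, 6]})

by_label = df.loc[:, ["col_1", "col_3"]]  # select columns by name
by_position = df.iloc[:, [0, 2]]          # select the same columns by integer position

# Either selection can feed the list comprehension in the same way.
rows = [tuple(row) for row in by_label.to_numpy()]
```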

Answered By: questionto42