Converting iterrows into itertuples and accessing namedtuples

Question:

I’m trying to reduce the overhead of iterrows by switching to itertuples (there are many columns).

I’m trying to turn this iterrows version:

import pandas as pd

def named_tuple_issue_iterrows(df: pd.DataFrame, column_name: str):
    for index, series in df.iterrows():
        result = series[column_name]
        # Do something with result

into this itertuples version:

def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        result = namedtuple[column_name]  # raises TypeError: tuple indices must be integers or slices, not str
        # Do something with result
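For reference, a minimal sketch of why that line fails: each row produced by itertuples is a namedtuple, which supports attribute access and integer indexing but not indexing by a column-name string. The small frame below is a made-up example.

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"column_a": [1, 2], "column_b": [3.0, 4.0]})

row = next(df.itertuples())  # first row as a namedtuple; row[0] holds the index
print(row.column_a)          # attribute access works
print(row[1])                # positional access works

error = None
try:
    row["column_a"]          # tuples do not accept string keys
except TypeError as exc:
    error = exc
print("TypeError:", error)
```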

This function doesn’t know the column_name beforehand, nor the position of that column in the frame.
So namedtuple.column_a and namedtuple[1] are not usable solutions.

The real logic uses each row to construct another dataframe (based on other data), computes some further values, and then edits a third dataframe. The original dataframe itself is not changed in any way. I also need to access multiple columns of the original frame whose names are not known in advance.

Is there a way around this, or do I need to use iterrows when the required column_name is not known in advance?

Asked By: bob marley

||

Answers:

You need to use getattr, which looks the field up by its name:

import pandas as pd

def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        result = getattr(namedtuple, column_name)  # attribute lookup by name string
        # Do something with result
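A minimal sketch extending this to several column names known only at run time; the frame, the column names and both helper functions are hypothetical. One caveat: itertuples renames columns that are invalid Python identifiers, repeated, or start with an underscore to positional names such as `_1`, so getattr with the original name fails for a column like `"my col"`. Looking up the column's position with `df.columns.get_loc` and adding 1 (position 0 holds the index, since `index=True` is the default) avoids that, assuming the column labels are unique.

```python
from typing import Dict, List

import pandas as pd


def values_by_name(df: pd.DataFrame, column_names: List[str]) -> List[Dict]:
    """Hypothetical helper: getattr works when the names are valid identifiers."""
    return [{name: getattr(row, name) for name in column_names} for row in df.itertuples()]


def values_by_position(df: pd.DataFrame, column_names: List[str]) -> List[Dict]:
    """Hypothetical helper: works for any unique column label, e.g. "my col"."""
    # +1 because itertuples() puts the index at position 0 by default
    positions = {name: df.columns.get_loc(name) + 1 for name in column_names}
    return [{name: row[pos] for name, pos in positions.items()} for row in df.itertuples()]


df = pd.DataFrame({"column_a": [1, 2], "my col": ["x", "y"]})
print(values_by_name(df, ["column_a"]))
print(values_by_position(df, ["column_a", "my col"]))
```

The positional variant trades readability for robustness; getattr stays the simpler choice when the column names are ordinary identifiers.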

Note that, depending on what you really want to do, there might be a way to avoid the loop completely.
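A rough sketch of two lighter options, assuming a made-up frame and a made-up per-row computation (price times quantity). If a loop is still needed, zipping only the selected columns avoids building a namedtuple with every column and works for any column label; if the per-row work can be expressed as column operations, the Python-level loop disappears entirely.

```python
import pandas as pd

# Hypothetical frame and column selection, chosen at run time
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1], "note": ["a", "b"]})
column_names = ["price", "qty"]

# Option 1: iterate only over the selected columns
totals_loop = []
for values in zip(*(df[name] for name in column_names)):
    row = dict(zip(column_names, values))  # column name -> value for this row
    totals_loop.append(row["price"] * row["qty"])

# Option 2: the same computation as a vectorized column operation
totals_vec = df["price"] * df["qty"]

print(totals_loop, totals_vec.tolist())
```

Whether option 2 applies depends on the real logic; building a new dataframe per row, as described in the question, may still require a loop.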

Answered By: mozway