Converting iterrows into itertuples and accessing namedtuples
Question:
I’m trying to reduce the overhead of iterrows by switching to itertuples (the DataFrame has many columns).
I’m trying to convert this iterrows version
def named_tuple_issue_iterrows(df: pd.DataFrame, column_name: str):
    for index, series in df.iterrows():
        result = series[column_name]
        # Do something with result
Into itertuples.
def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        result = namedtuple[column_name]  # line throws error
        # Do something with result
The function doesn’t know column_name beforehand, nor the position of that column, so namedtuple.column_a and namedtuple[1] are not usable solutions.
The real logic requires constructing another DataFrame for each row (based on other data), working out some more things, and then editing a third DataFrame. The original DataFrame itself is not changed in any way, and I need to access multiple unknown columns in it.
Is there a way around this, or do I need to use iterrows if the required column_name is not known?
Answers:
You need to use getattr:
def named_tuple_issue_itertuples(df: pd.DataFrame, column_name: str):
    for namedtuple in df.itertuples():
        result = getattr(namedtuple, column_name)
        # Do something with result
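One caveat with getattr: as the pandas documentation describes itertuples, column names that are not valid Python identifiers, are repeated, or start with an underscore are renamed to positional names, so getattr(namedtuple, column_name) would likely fail for a column such as "col a". A minimal sketch of a positional alternative, assuming the column labels are unique (iter_column_values is a hypothetical helper name, not part of pandas):

```python
import pandas as pd


def iter_column_values(df: pd.DataFrame, column_name: str):
    # Assumption: column labels are unique, so get_loc returns a single integer.
    position = df.columns.get_loc(column_name)
    # index=False drops the index from each row; name=None yields plain tuples,
    # so positions line up with df.columns and no field renaming takes place.
    for row in df.itertuples(index=False, name=None):
        yield row[position]
```

Looking up the position once outside the loop keeps the per-row work to a plain tuple index, and it works for any column label, including ones with spaces.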
Note that depending on what you really want to do, there might be a way to avoid the loop completely.
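As a rough illustration of avoiding the loop: if the per-row work happens to be element-wise, it can usually be applied to the whole column at once. The doubling below is only a hypothetical stand-in for "do something with result", and double_column is an illustrative name:

```python
import pandas as pd


def double_column(df: pd.DataFrame, column_name: str) -> pd.Series:
    # Hypothetical placeholder operation: the arithmetic is applied to the
    # entire column in one vectorized step, with no Python-level row loop.
    return df[column_name] * 2
```

Whether this applies depends on the real logic; building a separate DataFrame per row, as described in the question, may still need an explicit loop.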