Iterating through a list of data frames and updating the data frame

Question:

I need help iterating through a list of data frames and updating each data frame.

I have 3 data frames and I want to keep only the columns whose names contain 'FLAG'. I used the code below:

import pandas as pd

df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])

for df in [df1, df2, df3]:
    # col_lst = [col for col in df.columns if 'FLAG' in col]
    df = df.filter(regex='FLAG')

print(df3.columns)

Output: df3 still contains all five of its original columns; nothing was filtered out.

But if I assign each one separately, like

df1 = df1.filter(regex='FLAG')

I get the expected result. How can I iterate through the list of data frames and get the same result?

Asked By: rams


Answers:

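df.filter returns a new DataFrame. Assigning it to df only rebinds the loop variable, so each filtered copy is discarded at the next iteration and df1, df2 and df3 are never modified.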
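A minimal sketch of that behaviour, reduced to a single frame with a shortened (illustrative) column list: after the loop, the loop variable points at the filtered copy while the original name still refers to the untouched frame.

```python
import pandas as pd

df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'FLAG_ACTIVE'])
original = df1  # keep a second reference to compare identities

for df in [df1]:
    # filter() builds a NEW DataFrame; this only rebinds the name `df`
    df = df.filter(regex='FLAG')

print(df is original)     # False: the loop variable now holds the filtered copy
print(df1 is original)    # True: df1 still refers to the unfiltered frame
print(list(df1.columns))  # ['FREQUNECY_ID', 'START_DATE', 'FLAG_ACTIVE']
print(list(df.columns))   # ['FLAG_ACTIVE']
```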

You can instead use drop with inplace=True:

for df in [df1, df2, df3]:
    df.drop(columns=df.columns.difference(df.filter(regex='FLAG').columns), inplace=True)

print(df3.columns)

output:

Index(['FLAG_ACTIVE', 'FLAG_CURRENT'], dtype='object')
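If you would rather not mutate the frames in place, one alternative (a sketch, not part of the answer above) is to build the filtered copies and rebind all three names in a single tuple-unpacking assignment:

```python
import pandas as pd

df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])

# The right-hand list is fully built first, then its three items
# are assigned back to the names df1, df2, df3
df1, df2, df3 = [df.filter(regex='FLAG') for df in (df1, df2, df3)]

print(list(df3.columns))  # ['FLAG_ACTIVE', 'FLAG_CURRENT']
```

Unlike inplace=True, this replaces the objects rather than modifying them, so any other variable still pointing at the old frames keeps seeing all the columns.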
Answered By: mozway

We can use enumerate to get the index of the current item in the loop and replace that list element with the filtered data frame.

dfs = [df1, df2, df3]
for i, df in enumerate(dfs):
    dfs[i] = df.filter(like="FLAG")


print(dfs[0])
Empty DataFrame
Columns: [FLAG_ACTIVE, FLAG_CURRENT]
Index: []
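The enumerate loop can also be written as a list comprehension that builds a new list of filtered frames. A minimal sketch; note that only the list holds the filtered versions, while df1, df2 and df3 themselves keep all their columns:

```python
import pandas as pd

df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])

# New list containing a filtered copy of each frame
dfs = [df.filter(like="FLAG") for df in [df1, df2, df3]]

print([list(df.columns) for df in dfs])
print(len(df1.columns))  # 5: the original frame is not modified
```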

A dictionary would be a clearer data structure to use here:

dfs = {"df1": df1, "df2": df2, "df3": df3}

for name, df in dfs.items():
    dfs[name] = df.filter(like="FLAG")
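The same update can be sketched as a dict comprehension, which builds a fresh dictionary instead of overwriting values while iterating; the keys ("df1", "df2", "df3") are just the labels chosen above:

```python
import pandas as pd

df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])

dfs = {"df1": df1, "df2": df2, "df3": df3}

# Map each name to a filtered copy of its frame
dfs = {name: df.filter(like="FLAG") for name, df in dfs.items()}

print(list(dfs["df2"].columns))  # ['FLAG_ACTIVE', 'FLAG_CURRENT']
```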
Answered By: Erfan

Alternatively, drop the non-FLAG columns in place using a boolean mask on the column names:

import pandas as pd

df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])

for df in [df1, df2, df3]:
    df.drop(df.columns[~df.columns.str.contains('FLAG')], axis=1, inplace=True)

print(df3.columns)
Answered By: Kamil Oster