Iterating through a list of data frames and updating the data frame
Question:
I need help iterating through a list of data frames and updating each one.
I have 3 data frames and want to keep only the columns whose names contain 'FLAG'. I used the code below:
import pandas as pd
df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
for df in [df1, df2, df3]:
    # col_lst = [for col in df.columns if col in 'FLAG_']
    df = df.filter(regex='FLAG')
print(df3.columns)
Output: df3 still has all of its original columns. But if I assign each data frame separately, like
df1 = df1.filter(regex='FLAG')
I get the expected result. How can I iterate through the list of data frames to get the desired result?
Answers:
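You are currently only creating copies that you discard at each iteration: df = df.filter(...) rebinds the loop variable df to a new DataFrame and leaves df1, df2 and df3 untouched.
You can instead use drop with inplace=True: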
for df in [df1, df2, df3]:
    df.drop(columns=df.columns.difference(df.filter(regex='FLAG').columns), inplace=True)
print(df3.columns)
output:
Index(['FLAG_ACTIVE', 'FLAG_CURRENT'], dtype='object')
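To make the difference between rebinding and in-place modification concrete, here is a minimal sketch. The two-column frame and the helper names (`original`, `rebound_is_new`, and so on) are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical two-column frame, just for this demonstration
df1 = pd.DataFrame(columns=["PRODUCT_ID", "FLAG_ACTIVE"])
original = df1  # a second name for the same object

for df in [df1]:
    # filter() returns a new DataFrame; the assignment only rebinds `df`
    df = df.filter(regex="FLAG")

rebound_is_new = df is not original       # `df` now points at the filtered copy
columns_after_rebind = list(df1.columns)  # df1 itself is unchanged

for df in [df1]:
    # drop(..., inplace=True) mutates the object that df1 also refers to
    df.drop(columns=[c for c in df.columns if "FLAG" not in c], inplace=True)

columns_after_inplace = list(df1.columns)

print(columns_after_rebind)   # ['PRODUCT_ID', 'FLAG_ACTIVE']
print(columns_after_inplace)  # ['FLAG_ACTIVE']
```

The first loop never touches df1 because assignment only changes what the name df refers to. The second loop works because it calls a mutating method on the shared object.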
We can use enumerate to get the index of the list item we are currently on in the loop and update that item.
dfs = [df1, df2, df3]
for i, df in enumerate(dfs):
    dfs[i] = df.filter(like="FLAG")
print(dfs[0])
Empty DataFrame
Columns: [FLAG_ACTIVE, FLAG_CURRENT]
Index: []
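Note that the enumerate loop replaces the entries of dfs, while the names df1, df2 and df3 still point at the unfiltered frames. If you need those names updated as well, one possible sketch is to unpack a list comprehension back into them. The column lists below are shortened versions of the ones in the question.

```python
import pandas as pd

df1 = pd.DataFrame(columns=["FREQUNECY_ID", "START_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"])
df2 = pd.DataFrame(columns=["PRODUCT_ID", "PURCHASE_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"])
df3 = pd.DataFrame(columns=["FREQUNECY_ID", "END_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"])

# Build the filtered frames first, then rebind all three names in one statement
df1, df2, df3 = [df.filter(like="FLAG") for df in (df1, df2, df3)]

print(list(df3.columns))  # ['FLAG_ACTIVE', 'FLAG_CURRENT']
```

This only stays readable for a handful of frames; for more than that, keeping them in a list or dictionary is probably easier to maintain.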
A dictionary would be a clearer data structure to use here:
dfs = {"df1": df1, "df2": df2, "df3": df3}
for name, df in dfs.items():
    dfs[name] = df.filter(like="FLAG")
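The same idea can also be written as a dict comprehension, which builds a new dictionary and leaves the original frames alone. This is only a sketch; the names `frames` and `flag_frames` are illustrative.

```python
import pandas as pd

frames = {
    "df1": pd.DataFrame(columns=["FREQUNECY_ID", "START_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"]),
    "df2": pd.DataFrame(columns=["PRODUCT_ID", "PURCHASE_DATE", "FLAG_ACTIVE", "FLAG_CURRENT"]),
}

# New dictionary with the same keys, each value reduced to its FLAG columns
flag_frames = {name: df.filter(like="FLAG") for name, df in frames.items()}

print(list(flag_frames["df2"].columns))  # ['FLAG_ACTIVE', 'FLAG_CURRENT']
print("PRODUCT_ID" in frames["df2"].columns)  # True, the original is untouched
```

Keeping both dictionaries around can be handy if you still need the full frames later.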
or:
import pandas as pd
df1 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df2 = pd.DataFrame(columns=['PRODUCT_ID', 'PURCHASE_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
df3 = pd.DataFrame(columns=['FREQUNECY_ID', 'START_DATE', 'END_DATE', 'FLAG_ACTIVE', 'FLAG_CURRENT'])
for df in [df1, df2, df3]:
    df.drop(df.columns[~df.columns.str.contains('FLAG')], axis=1, inplace=True)
print(df3.columns)
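One thing to keep in mind with all of the approaches above: like="FLAG", regex='FLAG' and str.contains('FLAG') match the text anywhere in the column name. If you only want names that start with FLAG_, a sketch along these lines may be closer to what you need. The column IS_FLAGGED is a made-up example and does not appear in the question.

```python
import pandas as pd

# IS_FLAGGED is a hypothetical column used to show the difference
df = pd.DataFrame(columns=["FLAG_ACTIVE", "IS_FLAGGED", "START_DATE"])

# Substring match: keeps every column with "FLAG" anywhere in its name
substring_match = list(df.filter(like="FLAG").columns)

# Anchored regex: keeps only columns whose name begins with "FLAG_"
prefix_match = list(df.filter(regex=r"^FLAG_").columns)

# Equivalent boolean-mask version using the string accessor on the column index
mask = df.columns.str.startswith("FLAG_")
mask_match = list(df.loc[:, mask].columns)

print(substring_match)  # ['FLAG_ACTIVE', 'IS_FLAGGED']
print(prefix_match)     # ['FLAG_ACTIVE']
print(mask_match)       # ['FLAG_ACTIVE']
```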