Looping over a list and defining a dataframe using the list element in Python

Question:

I have a list of names. For each name, I start with my dataframe df and use the list element to define new columns for df. After my data manipulation is complete, I create a new dataframe whose name is partially derived from the list element.

list = ['foo','bar']
for x in list :
      df = prior_df
      (long code for manipulating df)
      new_df_x = df
      new_df_x.to_parquet('new_df_x.parquet')
      del new_df_x

new_df_foo = pd.read_parquet('new_df_foo.parquet')
new_df_bar = pd.read_parquet('new_df_bar.parquet')
new_df = pd.merge(new_df_foo, new_df_bar, ...)

The reason I am using this approach is that if I don't use a loop and just add the foo and bar columns one after another to the original df, my data gets very big and highly fragmented before I go from wide to long, and I run into an insufficient-memory error. My workaround is to loop, store the dataframe for each element, and join the long-format dataframes together at the very end. Therefore, I cannot use the approaches suggested in other answers, such as creating dictionaries.
I am stuck at the line:

new_df_x = df

where, within the loop, I want to use the list element in the name of the dataframe.
I’d appreciate any help.

Asked By: jayjunior


Answers:

IIUC, you only need the filenames, i.e. the stored parquet files, to carry the foo and bar markers, so you can reuse the same variable name inside the loop.

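import pandas as pd

names = ['foo', 'bar']  # renamed from `list` to avoid shadowing the built-in
for x in names:
    df = prior_df  # use prior_df.copy() if the manipulation modifies df in place
    # (long code for manipulating df)
    df.to_parquet(f'new_df_{x}.parquet')  # only the filename depends on x
    del df

new_df_foo = pd.read_parquet('new_df_foo.parquet')
new_df_bar = pd.read_parquet('new_df_bar.parquet')
new_df = pd.merge(new_df_foo, new_df_bar, ...)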
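
For anyone who wants to try the save-then-merge pattern end to end, here is a minimal sketch. The helper names save_per_name and load_and_merge, the out_dir and on parameters, and the one-line "manipulation" are all hypothetical stand-ins, not part of the question. It also assumes a parquet engine such as pyarrow is installed.

```python
from pathlib import Path

import pandas as pd


def save_per_name(prior_df, names, out_dir):
    """Write one parquet file per name; only the filename depends on the name."""
    out_dir = Path(out_dir)
    for i, x in enumerate(names):
        df = prior_df.copy()  # start each iteration from the unmodified data
        # Hypothetical stand-in for the long manipulation:
        # add a column named after the list element.
        df[x] = df["A"] * (i + 1)
        df.to_parquet(out_dir / f"new_df_{x}.parquet")
        del df  # drop the reference before the next iteration


def load_and_merge(names, out_dir, on):
    """Read the saved files back one at a time and merge them on the key columns."""
    out_dir = Path(out_dir)
    merged = None
    for x in names:
        part = pd.read_parquet(out_dir / f"new_df_{x}.parquet")
        merged = part if merged is None else pd.merge(merged, part, on=on)
    return merged
```

Calling save_per_name(prior_df, ['foo', 'bar'], '.') followed by load_and_merge(['foo', 'bar'], '.', on=['A', 'B']) would give one dataframe with a foo and a bar column. The key columns passed as on are an assumption here and depend on the real data.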
Answered By: Mortz

Here is an example if you want to define dataframe variable names from the list elements.

import pandas as pd

data = {"A": [42, 38, 39], "B": [13, 25, 45]}
prior_df = pd.DataFrame(data)

names = ['foo', 'bar']  # renamed from `list` to avoid shadowing the built-in

# At module level, locals() is the same dict as globals(), so writing to it
# creates real variables. This does not work inside a function.
variables = locals()

for x in names:
    df = prior_df.copy()  # assign a dataframe copy to the variable df.
    # (sample code for manipulating df)
    # -----------------------------------
    if x == 'foo':
        df['B'] = df['A'] + df['B']
    if x == 'bar':
        df['B'] = df['A'] - df['B']
    # -----------------------------------

    new_df_x = "new_df_{0}".format(x)  # e.g. "new_df_foo"
    variables[new_df_x] = df           # bind the dataframe to that variable name
    # del variables[new_df_x]

print(new_df_foo)  # print the 1st df variable.
print(new_df_bar)  # print the 2nd df variable.
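
One caveat: writing to locals() only creates variables at module level, where it is the same dictionary as globals(). Inside a function it has no such effect. Below is a minimal sketch that passes the target namespace in explicitly. The helper make_named_frames and its namespace parameter are hypothetical names used for illustration.

```python
import pandas as pd


def make_named_frames(prior_df, names, namespace):
    """Bind one modified copy of prior_df per name into `namespace`."""
    for x in names:
        df = prior_df.copy()
        # Same sample manipulation as in the example above.
        if x == "foo":
            df["B"] = df["A"] + df["B"]
        if x == "bar":
            df["B"] = df["A"] - df["B"]
        namespace[f"new_df_{x}"] = df  # e.g. namespace["new_df_foo"]


prior_df = pd.DataFrame({"A": [42, 38, 39], "B": [13, 25, 45]})

# Passing globals() makes new_df_foo and new_df_bar module-level variables.
make_named_frames(prior_df, ["foo", "bar"], globals())

print(new_df_foo["B"].tolist())  # [55, 63, 84]
print(new_df_bar["B"].tolist())  # [29, 13, -6]
```

Passing an ordinary dict instead of globals() works the same way and keeps the generated names out of the module namespace.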
Answered By: AziMez