How to loop over a list of dataframes and change specific values

Question:

I have a list of dataframes. Each dataframe in the list has the following format:

[Image: screenshot of one of the dataframes, taken in Spyder]

On this list of dataframes, I perform several tasks. For example, I want to change the name of "specA" to "Homo sapiens" in all of the dataframes, so I do the following:

for i,df in enumerate(dataframes_list):
    dataframes_list[i]=df.replace("specA","Homo sapiens")

This gives me the desired output.

I run several other for loops of this kind, all with the same structure:

for i,df in enumerate(dataframes_list):
    dataframes_list[i]= *expression*

In the end, for each dataframe I sum the "reads" column of all species:

for i,df in enumerate(dataframes_list):
    dataframes_list[i]=(df.groupby('species')['reads'].sum()).to_frame()

and thereafter merge all dataframes in the list to a single dataframe:

df_merged = reduce(lambda left,right: pd.merge(left,right,on=['species'],
                                            how='outer'), dataframes_list)

This gives me exactly the output I want to have.
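
For anyone who wants to reproduce the summing and merging steps above, here is a minimal, self-contained sketch with the imports they need. The two sample dataframes are made up for illustration; only the column names `species` and `reads` come from the question.

```python
from functools import reduce

import pandas as pd

# Hypothetical sample data, not the real dataframes from the question.
dataframes_list = [
    pd.DataFrame({"species": ["specA", "specA", "specB"], "reads": [10, 5, 3]}),
    pd.DataFrame({"species": ["specB", "specC"], "reads": [7, 2]}),
]

# Sum the reads per species; each result is indexed by "species".
for i, df in enumerate(dataframes_list):
    dataframes_list[i] = df.groupby("species")["reads"].sum().to_frame()

# Outer-merge all per-species sums on "species". With two inputs, pandas
# names the clashing "reads" columns "reads_x" and "reads_y".
df_merged = reduce(
    lambda left, right: pd.merge(left, right, on=["species"], how="outer"),
    dataframes_list,
)

print(df_merged)
```

Species missing from one of the inputs end up with NaN in that input's column, which is what `how="outer"` is for.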

Now there is a new task I want to perform, but I don’t know how to implement it.

For each dataframe in the list, I want to change the "perc_ID" value of species "specC" to "100". This could be done with .loc:

df.loc[df.species == "specC","perc_ID"]=100

But I need to loop over the dataframes in the list, so that would be:

for i,df in enumerate(dataframes_list):
  dataframes_list[i]=df.loc[df.species == "specC","perc_ID"]=100

This obviously does not work, because the second line contains two "=" signs.

I could change this by removing enumerate:

for df in dataframes_list:
  df.loc[df.species == "specC","perc_ID"]=100

This works. However, as I mentioned, I run several of these for loops, and they all use enumerate. For some reason, if I remove enumerate and dataframes_list[i] from all of the for loops, the merging of the dataframes gets messed up. Therefore I would like to keep enumerate and dataframes_list[i] in all my for loops.

So my question: How can I change the value of a specific column, based on the value of another column, when looping over a list of dataframes?

In other words, how can I write the following code while keeping enumerate and dataframes_list[i]?

for i,df in enumerate(dataframes_list):
  dataframes_list[i]=df.loc[df.species == "specC","perc_ID"]=100

Thanks!

PS: there are likely much better ways to work with this kind of data (no for loops, no lists of dataframes, ...). However, I'm not a Python expert, and I understand what I write this way, so I would like to keep this structure.

Asked By: Robvh


Answers:

If I understand correctly, df points to a different dataframe from the list on each iteration of the loop, and assigning through .loc modifies that dataframe in place, so this should give you what you want:

for df in dataframes_list:
  df.loc[df.species == "specC","perc_ID"]=100
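
If you would rather keep the enumerate and dataframes_list[i] pattern, one option is to do the .loc assignment on its own line and then store df back into the list. Note that `dataframes_list[i] = df.loc[...] = 100` is valid Python (a chained assignment), but it binds the integer 100 to the list slot, so every dataframe in the list gets replaced by 100. The following is only a sketch; the sample data is made up, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dataframes_list = [
    pd.DataFrame({"species": ["specA", "specC"], "reads": [10, 4], "perc_ID": [97, 88]}),
    pd.DataFrame({"species": ["specC", "specB"], "reads": [6, 1], "perc_ID": [91, 99]}),
]

for i, df in enumerate(dataframes_list):
    # The .loc assignment changes df in place, so it gets its own line ...
    df.loc[df.species == "specC", "perc_ID"] = 100
    # ... and storing df back keeps the same loop structure as the other steps.
    dataframes_list[i] = df

print(dataframes_list[0])
print(dataframes_list[1])
```

The write-back on the last line is redundant here, because df is already the object stored in the list. It is needed for steps such as df.replace(...) or df.groupby(...), which return a new dataframe instead of changing the existing one. That is probably why dropping dataframes_list[i] from those loops changed your merge result.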
  

Answered By: Eureka