How to loop over a list of dataframes and change specific values
Question:
I have a list of dataframes. Each dataframe in the list has the following format:
(screenshot from Spyder)
On this list of dataframes, I perform several tasks. For example, I want to change the name of "specA" to "Homo sapiens" in all of the dataframes, so I do the following:
for i, df in enumerate(dataframes_list):
    dataframes_list[i] = df.replace("specA", "Homo sapiens")
This gives me the desired output.
I perform several other for loops of this kind with the same structure, i.e.:
for i, df in enumerate(dataframes_list):
    dataframes_list[i] = *expression*
In the end, for each dataframe I sum the "reads" column of all species:
for i, df in enumerate(dataframes_list):
    dataframes_list[i] = (df.groupby('species')['reads'].sum()).to_frame()
and thereafter merge all dataframes in the list to a single dataframe:
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['species'], how='outer'),
                   dataframes_list)
This gives me exactly the output I want to have.
Now there is a new task I want to perform, but I don’t know how to implement it.
For each dataframe in the list, I want to change the "perc_ID" value of species "specC" to "100". This could be done with .loc:
df.loc[df.species == "specC","perc_ID"]=100
But I need to loop over the dataframes in the list, so that would be:
for i, df in enumerate(dataframes_list):
    dataframes_list[i] = df.loc[df.species == "specC", "perc_ID"] = 100
This does not work, because the second line contains two "=" signs.
I could change this by removing enumerate:
for df in dataframes_list:
    df.loc[df.species == "specC", "perc_ID"] = 100
This works. However, as I mentioned, I perform several of these for loops, including enumerate. For some reason, if I remove the enumerate in combination with dataframes_list[i] for all for loops, my merging of the dataframes gets messed up. Therefore I would like to keep the enumerate and dataframes_list[i] for all my for loops.
So my question: How can I change the value of a specific column, based on the value of another column, when looping over a list of dataframes?
So, how can I write the following code while keeping enumerate and dataframes_list[i]?
for i, df in enumerate(dataframes_list):
    dataframes_list[i] = df.loc[df.species == "specC", "perc_ID"] = 100
Thanks!
PS: there are likely much better ways to work with this kind of data (no for loops, no lists of dataframes, ...). However, I'm not a Python expert, and I understand what I write this way. Therefore I would like to keep this structure.
Answers:
Try this:
If I understand correctly, df points to a different dataframe on each iteration of the loop, so this should give you what you want.
for df in dataframes_list:
    df.loc[df.species == "specC", "perc_ID"] = 100
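If you would rather keep the enumerate / dataframes_list[i] pattern from your other loops, one option is to put the .loc assignment on its own line and then store df back into the list. The sketch below uses made-up sample data (the column values are assumptions, not your real frames). Note that a line of the form `a = b = 100` is valid Python: it is a chained assignment that binds 100 to both targets, so your original attempt would replace each list entry with the integer 100 instead of a dataframe.

```python
import pandas as pd

# Hypothetical sample data standing in for the real list of dataframes.
dataframes_list = [
    pd.DataFrame({"species": ["specA", "specC"],
                  "reads": [10, 5],
                  "perc_ID": [97.0, 88.5]}),
    pd.DataFrame({"species": ["specB", "specC"],
                  "reads": [3, 7],
                  "perc_ID": [92.1, 75.0]}),
]

for i, df in enumerate(dataframes_list):
    # The .loc assignment modifies df in place; it is a statement,
    # so it cannot sit on the right-hand side of another "=".
    df.loc[df["species"] == "specC", "perc_ID"] = 100
    # Storing df back is redundant here (df is already the list element),
    # but it keeps the same shape as the other loops.
    dataframes_list[i] = df

print(dataframes_list[0])
```

This probably also explains why dropping dataframes_list[i] from the other loops broke the merge: methods such as df.replace(...) and df.groupby(...).sum() return a new object, so without storing the result back into the list the original frames stay unchanged. The .loc assignment is different because it changes the existing dataframe directly.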