Pandas: calculate mean for each row by cycle number
Question:
I have a CSV file (Mspec Data) which looks like this:
#Header
#
"Cycle";"Time";"ms";"mass amu";"SEM c/s"
0000000001;00:00:01;0000001452; 1,00; 620
0000000001;00:00:01;0000001452; 1,20; 4730
0000000001;00:00:01;0000001452; 1,40; 4610
... ;..:..:..;..........;.........;...........
I read it via:
df = pd.read_csv(Filename, header=30,delimiter=';',decimal= ',' )
the result looks like this:
Cycle Time ms mass amu SEM c/s
0 1 00:00:01 1452 1.0 620
1 1 00:00:01 1452 1.2 4730
2 1 00:00:01 1452 1.4 4610
... ... ... ... ... ...
3872 4 00:06:30 390971 1.0 32290
3873 4 00:06:30 390971 1.2 31510
This data contains several Mass spec scans with identical parameters. Cycle number 1 means scan 1 and so forth. I would like to calculate the mean in the last column SEM c/s for each corresponding identical mass. in the end i would like to have a new data frame containing only:
ms "mass amu" "SEM c/s(mean over all cycles)"
obviously the mean of the mass does not need to be calculated. I would like to avoid to read each cycle into a new dataframe as this would mean I have to look up the length of each Mass spectrum . The "mass range" and " resolution" is obviously different for different measurements (Solution).
I guess doing the calculation in numpy directly would be best but I am stuck?
Thank you in advance
Answers:
You can use groupby()
, something like this:
df.groupby(['ms', 'mass amu'])['SEM c/s'].mean()
You have different ms over all the cycles, and you want to calculate the mean of SEM over each group of same ms.
I will show you a step-by-step example.
You should invoke each group and then put the mean in a dictionary to convert in DataFrame.
ms_uni = df['ms'].unique() #calculate the unique ms values
new_df_dict = { "ma":[], "SEM":[] } #later you will rename them
for un in range( len(ms_uni) ):
cms = ms_uni[un]
new_df_dict['ma'].append( cms )
new_df_dict['SEM'].append( df[ df['ms']==cms ]['SEM c/s'].mean() ) #advise: change the column name in a more safe SEM-c_s
new_df = pd.DataFrame(new_df_dict) #end of the dirty work
new_df.rename(index=str, columns={'ma':"mass amu", "SEM": "SEM c/s(mean over all cycles)"} )
Hope it will be helpful
I have a CSV file (Mspec Data) which looks like this:
#Header
#
"Cycle";"Time";"ms";"mass amu";"SEM c/s"
0000000001;00:00:01;0000001452; 1,00; 620
0000000001;00:00:01;0000001452; 1,20; 4730
0000000001;00:00:01;0000001452; 1,40; 4610
... ;..:..:..;..........;.........;...........
I read it via:
df = pd.read_csv(Filename, header=30,delimiter=';',decimal= ',' )
the result looks like this:
Cycle Time ms mass amu SEM c/s
0 1 00:00:01 1452 1.0 620
1 1 00:00:01 1452 1.2 4730
2 1 00:00:01 1452 1.4 4610
... ... ... ... ... ...
3872 4 00:06:30 390971 1.0 32290
3873 4 00:06:30 390971 1.2 31510
This data contains several Mass spec scans with identical parameters. Cycle number 1 means scan 1 and so forth. I would like to calculate the mean in the last column SEM c/s for each corresponding identical mass. in the end i would like to have a new data frame containing only:
ms "mass amu" "SEM c/s(mean over all cycles)"
obviously the mean of the mass does not need to be calculated. I would like to avoid to read each cycle into a new dataframe as this would mean I have to look up the length of each Mass spectrum . The "mass range" and " resolution" is obviously different for different measurements (Solution).
I guess doing the calculation in numpy directly would be best but I am stuck?
Thank you in advance
You can use groupby()
, something like this:
df.groupby(['ms', 'mass amu'])['SEM c/s'].mean()
You have different ms over all the cycles, and you want to calculate the mean of SEM over each group of same ms.
I will show you a step-by-step example.
You should invoke each group and then put the mean in a dictionary to convert in DataFrame.
ms_uni = df['ms'].unique() #calculate the unique ms values
new_df_dict = { "ma":[], "SEM":[] } #later you will rename them
for un in range( len(ms_uni) ):
cms = ms_uni[un]
new_df_dict['ma'].append( cms )
new_df_dict['SEM'].append( df[ df['ms']==cms ]['SEM c/s'].mean() ) #advise: change the column name in a more safe SEM-c_s
new_df = pd.DataFrame(new_df_dict) #end of the dirty work
new_df.rename(index=str, columns={'ma':"mass amu", "SEM": "SEM c/s(mean over all cycles)"} )
Hope it will be helpful