How to compute mean absolute deviation row-wise in pandas

Question:

A snippet of the dataframe is as follows, but the actual dataset is 200000 x 130.

ID  1-jan  2-jan  3-jan  4-jan
1.  4      5      7      8
2.  2      0      1      9
3.  5      8      0      1
4.  3      4      0      0
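To try the answers below, the snippet can be rebuilt as a DataFrame. This is a sketch that assumes ID is stored as a float column (the answers' output labels the rows 1.0, 2.0, ...) and that the date columns hold integers.

```python
import pandas as pd

# Sample rows from the question; ID stored as float (assumption, matches "1." labels)
df = pd.DataFrame({
    'ID': [1.0, 2.0, 3.0, 4.0],
    '1-jan': [4, 2, 5, 3],
    '2-jan': [5, 0, 8, 4],
    '3-jan': [7, 1, 0, 0],
    '4-jan': [8, 9, 1, 0],
})
print(df)
```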

I am trying to compute the mean absolute deviation for each value in a row, like this:

ID     1-jan  2-jan  3-jan  4-jan  mean
1.     4      5      7      8      6
1_MAD  2      1      1      2
2.     2      0      1      9      3
2_MAD  1      3      2      6
...
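For reference, the per-value numbers in that expected output are absolute deviations from the row mean; the mean absolute deviation of a row is their average. Written out for row i with n value columns, and worked for row 1 of the snippet:

```latex
\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n} x_{ij}, \qquad
d_{ij} = \lvert x_{ij} - \bar{x}_i \rvert, \qquad
\mathrm{MAD}_i = \frac{1}{n}\sum_{j=1}^{n} d_{ij}

% Row 1: x = (4, 5, 7, 8), n = 4
\bar{x}_1 = \frac{4+5+7+8}{4} = 6, \quad
(d_{11}, d_{12}, d_{13}, d_{14}) = (2, 1, 1, 2), \quad
\mathrm{MAD}_1 = \frac{2+1+1+2}{4} = 1.5
```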
    

I tried this:

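import pandas as pd
new_df = pd.DataFrame()
for i, row_id in enumerate(df['ID']):
    # mad() is my helper (described below); iloc[i, 1:] skips the ID column
    new_df[f'{row_id}_mad'] = mad(df.iloc[i, 1:])
new_df = new_df.T  # transpose so each *_mad column becomes a row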

where mad is a function that returns the absolute difference between each value and the row mean.
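A minimal sketch of what such a helper might look like, assuming it receives one row of values as a pandas Series (the name mad comes from the question; its body here is an assumption):

```python
import pandas as pd

def mad(values: pd.Series) -> pd.Series:
    """Absolute deviation of each value from the mean of `values`."""
    return (values - values.mean()).abs()

row = pd.Series([4, 5, 7, 8], index=['1-jan', '2-jan', '3-jan', '4-jan'])
print(mad(row).tolist())  # [2.0, 1.0, 1.0, 2.0]
```

Calling a helper like this once per row runs Python-level code 200000 times, which is why the loop is slow; the answers instead apply one vectorized operation to the whole frame.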

But this is very slow because I have a large dataset, and I need to do it as quickly as possible.

Asked By: skiddy


Answers:

It’s possible to specify axis=1 to apply the mean calculation across columns:

df['mean_across_cols'] = df.mean(axis=1)
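The row means can then be broadcast back over the rows with sub(..., axis=0), which gives the per-value absolute deviations without a Python loop; averaging those across columns gives one MAD per row. A sketch on the question's sample, assuming ID has been moved to the index first:

```python
import pandas as pd

df = pd.DataFrame(
    {'1-jan': [4, 2, 5, 3], '2-jan': [5, 0, 8, 4],
     '3-jan': [7, 1, 0, 0], '4-jan': [8, 9, 1, 0]},
    index=pd.Index([1.0, 2.0, 3.0, 4.0], name='ID'),
)

row_mean = df.mean(axis=1)                # one mean per row
abs_dev = df.sub(row_mean, axis=0).abs()  # |value - row mean| for every cell
row_mad = abs_dev.mean(axis=1)            # mean absolute deviation per row

print(row_mean.tolist())  # [6.0, 3.0, 3.5, 1.75]
print(row_mad.tolist())   # [1.5, 3.0, 3.0, 1.75]
```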
Answered By: SultanOrazbayev

If I understand correctly, use:

import pandas as pd
from toolz import interleave

# convert ID to the index
df = df.set_index('ID')
# row means as a Series
mean = df.mean(axis=1)

# subtract the row mean from every column, take absolute values, add suffix
df1 = df.sub(mean, axis=0).abs().rename(index=lambda x: f'{x}_MAD')
# join with the original (plus mean column) and interleave the indices
df = pd.concat([df.assign(mean=mean), df1]).loc[list(interleave([df.index, df1.index]))]
print(df)
         1-jan  2-jan  3-jan  4-jan  mean
ID                                       
1.0       4.00   5.00   7.00   8.00  6.00
1.0_MAD   2.00   1.00   1.00   2.00   NaN
2.0       2.00   0.00   1.00   9.00  3.00
2.0_MAD   1.00   3.00   2.00   6.00   NaN
3.0       5.00   8.00   0.00   1.00  3.50
3.0_MAD   1.50   4.50   3.50   2.50   NaN
4.0       3.00   4.00   0.00   0.00  1.75
4.0_MAD   1.25   2.25   1.75   1.75   NaN
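If adding toolz as a dependency is undesirable, the same value/MAD interleaving can be built with a plain list comprehension over the two indexes. This sketch repeats the steps above on the question's sample; the names orig, dev and order are illustrative:

```python
import pandas as pd

orig = pd.DataFrame(
    {'1-jan': [4, 2, 5, 3], '2-jan': [5, 0, 8, 4],
     '3-jan': [7, 1, 0, 0], '4-jan': [8, 9, 1, 0]},
    index=pd.Index([1.0, 2.0, 3.0, 4.0], name='ID'),
)
mean = orig.mean(axis=1)

# absolute deviations, with "_MAD" appended to each row label
dev = orig.sub(mean, axis=0).abs().rename(index=lambda x: f'{x}_MAD')

# alternate value row, MAD row, value row, ... without toolz.interleave
order = [label for pair in zip(orig.index, dev.index) for label in pair]
out = pd.concat([orig.assign(mean=mean), dev]).loc[order]
print(out)
```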
Answered By: jezrael
import pandas as pd
# df1 is assumed to be the question's frame with ID as the index
with_mean = df1.assign(mean1=df1.mean(axis=1))
mad_rows = with_mean.apply(lambda ss: (ss.mean1 - ss).abs(), axis=1).T.add_suffix('_MAD').T
pd.concat([with_mean.set_index(df1.index.astype('str')),
           mad_rows.assign(mean1='')]).sort_index().pipe(print)


         1-jan  2-jan  3-jan  4-jan mean1
ID
1.0       4.00   5.00   7.00   8.00   6.0
1.0_MAD   2.00   1.00   1.00   2.00
2.0       2.00   0.00   1.00   9.00   3.0
2.0_MAD   1.00   3.00   2.00   6.00
3.0       5.00   8.00   0.00   1.00   3.5
3.0_MAD   1.50   4.50   3.50   2.50
4.0       3.00   4.00   0.00   0.00  1.75
4.0_MAD   1.25   2.25   1.75   1.75
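One caveat with sorting the string labels: sort_index compares them character by character, so once IDs reach two digits, '10.0' sorts before '2.0'. A sketch of a numeric ordering that keeps each MAD row right after its value row; it needs Python 3.9+ for str.removesuffix, the helper name row_key is illustrative, and the values are placeholders:

```python
import pandas as pd

labels = ['10.0_MAD', '2.0', '10.0', '2.0_MAD']
out = pd.DataFrame({'1-jan': [0.5, 4.0, 3.0, 1.0]}, index=labels)  # placeholder values

print(list(out.sort_index().index))  # ['10.0', '10.0_MAD', '2.0', '2.0_MAD']

def row_key(label: str) -> tuple:
    # sort by the numeric ID first, then put the "_MAD" row after the value row
    return (float(label.removesuffix('_MAD')), label.endswith('_MAD'))

out = out.loc[sorted(out.index, key=row_key)]
print(list(out.index))  # ['2.0', '2.0_MAD', '10.0', '10.0_MAD']
```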
Answered By: G.G