Preprocessing: sklearn Imputer when a column has only missing values

Question

I’m trying to use Imputer to fill in missing values.
I would also like to keep track of the columns in which every value is missing, because otherwise I don’t know which columns have been processed.
Is it possible to return the columns with all missing values as well?

From the Notes section of the Imputer documentation:

When axis=0, columns which only contained missing values at fit are
discarded upon transform. When axis=1, an exception is raised if there
are rows for which it is not possible to fill in the missing values
(e.g., because they only contain missing values).

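import pandas as pd
import numpy as np
from sklearn.preprocessing import Imputer

# b3 contains only 0, which is used as the missing-value marker below
data = {'b1': [1, 2, 3, 4, 5], 'b2': [1, 2, 4, 4, 0], 'b3': [0, 0, 0, 0, 0]}
X = pd.DataFrame(data)

# Treat 0 as missing; the default strategy fills with the column mean (axis=0)
Imp = Imputer(missing_values=0)
print(Imp.fit_transform(X))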

print(X)
   b1  b2  b3
0   1   1   0
1   2   2   0
2   3   4   0
3   4   4   0
4   5   0   0

Output of Imp.fit_transform(X) (column b3 has been dropped):
[[ 1.    1.  ]
 [ 2.    2.  ]
 [ 3.    4.  ]
 [ 4.    4.  ]
 [ 5.    2.75]]

Asked By: Guido

Answer 1

The statistics_ attribute of the fitted Imputer holds the fill value for each column, including the dropped ones. A column that contained only missing values has nan as its fill value.

statistics_ : array of shape (n_features,)
The imputation fill value for each feature if axis == 0.

Imp.statistics_
array([3.  , 2.75,  nan])

Here is an example of getting the names of the columns in which all values are “missing”:

nanmask = np.isnan(Imp.statistics_)

nanmask
array([False, False,  True])

X.columns[nanmask]
Index([u'b3'], dtype='object')
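
If the goal is to get back an array with the original number of columns, the same mask can be used to put the imputed values back in place. The following is only a minimal sketch, not part of the Imputer API: the transformed array and the statistics_ values are copied by hand from the outputs above, and the names columns, transformed, statistics and full are chosen here for illustration.

```python
import numpy as np

# Values copied from the outputs above: the transformed array (b3 dropped)
# and Imp.statistics_ (nan marks the column that was entirely "missing").
columns = ["b1", "b2", "b3"]
transformed = np.array([[1.0, 1.0],
                        [2.0, 2.0],
                        [3.0, 4.0],
                        [4.0, 4.0],
                        [5.0, 2.75]])
statistics = np.array([3.0, 2.75, np.nan])

# True for every column that the imputer discarded.
nanmask = np.isnan(statistics)
dropped = [name for name, is_nan in zip(columns, nanmask) if is_nan]
kept = [name for name, is_nan in zip(columns, nanmask) if not is_nan]

# Rebuild an array with the original width; dropped columns stay nan.
full = np.full((transformed.shape[0], len(columns)), np.nan)
full[:, ~nanmask] = transformed

print(dropped)
print(kept)
print(full)
```

Here the dropped columns are left as nan so they remain easy to spot. Any other placeholder could be written into full[:, nanmask] instead.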

Answered By: Kevin
