What is the most efficient way to set the lowest 40% of values in each DataFrame row to NaN?

Question:

I have a large DataFrame. For each row, I need to find the elements that fall in the lowest 40% of that row and set them to NaN. The elements are not sorted, and this has to be repeated for every row.

I can brute-force the calculation, but as you can imagine that is not very efficient. Is there an efficient way to do it?

By 40% I mean: order the row's elements ascending and set the lowest 40% of them to NaN. Elements that are already NaN are not counted.
For example, with ten elements 1, 21, 20, 4, 5, 6, 7, 9, 10, 11, sorting gives 1, 4, 5, 6, 7, 9, 10, 11, 20, 21. The lowest four (1, 4, 5, 6) are removed, so the row becomes NaN, 21, 20, NaN, NaN, NaN, 7, 9, 10, 11.

1  21  20  4  5  6  7  9  10  11

to

NaN  21  20 NaN NaN NaN  7  9  10  11
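
The rule described above can be sketched for a single row as follows. This is only an illustration of the intended behavior, not an efficient solution: the function name `mask_lowest_fraction` and the `frac` parameter are hypothetical, and rounding the number of dropped elements down is an assumption, since the question only shows a case where 40% is a whole number.

```python
import numpy as np

def mask_lowest_fraction(row, frac=0.4):
    """Return a float copy of `row` with its lowest `frac` of non-NaN values set to NaN."""
    out = np.asarray(row, dtype=float).copy()
    # Positions of the values that are not already NaN.
    valid = np.flatnonzero(~np.isnan(out))
    # How many values to drop; rounding down is an assumption.
    k = int(len(valid) * frac)
    # Positions (in the original row) of the k smallest non-NaN values.
    lowest = valid[np.argsort(out[valid], kind="stable")[:k]]
    out[lowest] = np.nan
    return out

result = mask_lowest_fraction([1, 21, 20, 4, 5, 6, 7, 9, 10, 11])
print(result)
```

For the ten-element row from the example, the values 1, 4, 5 and 6 are replaced and the rest keep their original positions. Calling this once per row is the brute-force approach the question wants to avoid.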
Asked By: Zheng Xiaodong


Answers:

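Use DataFrame.count to get the number of non-missing values per row and multiply it by 0.4 to get a cutoff for each row. Then compare that cutoff with the rank of each value within its row, obtained with a double numpy.argsort, and finally use the resulting boolean mask to set the matching values to NaN: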

print (df)
   0   1   2   3   4   5   6    7   8   9     10
0   1   2   3  10   5   6   7  NaN   9   4  11.0
1   1  21  20   4   5   6   7  9.0  10  11   NaN

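import numpy as np

# 40% of the non-NaN count per row, as a column vector for broadcasting
counts = df.count(axis=1).mul(0.4).to_numpy()[:, None]
# double argsort gives each value's rank within its row (NaN sorts last)
arr = np.argsort(np.argsort(df.to_numpy()))

# ranks below the cutoff are the lowest 40%, so set them to NaN
df[arr < counts] = np.nan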
print (df)
   0     1     2     3    4    5   6    7   8     9     10
0 NaN   NaN   NaN  10.0  5.0  6.0   7  NaN   9   NaN  11.0
1 NaN  21.0  20.0   NaN  NaN  NaN   7  9.0  10  11.0   NaN
Answered By: jezrael
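
As a minimal sketch, the same double-argsort idea can be wrapped in a reusable function that returns a new DataFrame instead of modifying the original in place. The function name `mask_lowest_per_row` and the `frac` parameter are hypothetical and not part of the answer above. It uses `DataFrame.mask` rather than boolean assignment, and it assumes all columns are numeric.

```python
import numpy as np
import pandas as pd

def mask_lowest_per_row(df, frac=0.4):
    """Return a copy of `df` with the lowest `frac` of each row's non-NaN values set to NaN."""
    values = df.to_numpy(dtype=float)
    # Per-row cutoff: frac * (number of non-NaN values), shaped (n_rows, 1) for broadcasting.
    cutoff = (df.count(axis=1).to_numpy() * frac)[:, None]
    # Double argsort turns values into 0-based ranks within each row; NaN sorts last,
    # so existing NaN entries always rank at or above the cutoff.
    ranks = np.argsort(np.argsort(values, axis=1), axis=1)
    # DataFrame.mask replaces entries where the condition is True with NaN.
    return df.mask(ranks < cutoff)

nan = np.nan
df = pd.DataFrame([
    [1, 2, 3, 10, 5, 6, 7, nan, 9, 4, 11],
    [1, 21, 20, 4, 5, 6, 7, 9, 10, 11, nan],
])
out = mask_lowest_per_row(df)
print(out)
```

Two details to keep in mind: when 40% of the count is not a whole number, the comparison `ranks < cutoff` effectively rounds the number of masked values up, and equal values in a row receive distinct ranks in an unspecified order, so ties at the cutoff are broken arbitrarily.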