How to remove rows with null values from the kth column onward in Python

Question:

I need to remove every row in which all elements from column 3 onwards are NaN.

import numpy as np
from pandas import DataFrame

df = DataFrame(np.random.randn(6, 5), index=['a', 'c', 'e', 'f', 'g', 'h'], columns=['one', 'two', 'three', 'four', 'five'])

df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df2.iloc[1, 0] = 111
df2.iloc[1, 1] = 222

In the example above, the final data frame should not contain rows ‘b’ and ‘d’.

How can I use df.dropna() in this case?

Asked By: user1140126

Answers:

You can call dropna with arguments subset and how:

df2.dropna(subset=['three', 'four', 'five'], how='all')

As the names suggest:

  • how='all' requires every column (of subset) in the row to be NaN in order to be dropped, as opposed to the default 'any'.
  • subset is those columns to inspect for NaNs.
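To make this concrete, here is a minimal sketch that rebuilds the frame from the question and applies the call above. It uses .iloc for the two assignments, on the assumption of a recent pandas release where .ix is no longer available; the variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question. The values are random, so only
# the NaN pattern matters here.
df = pd.DataFrame(
    np.random.randn(6, 5),
    index=["a", "c", "e", "f", "g", "h"],
    columns=["one", "two", "three", "four", "five"],
)

# Reindexing adds rows 'b' and 'd', filled entirely with NaN.
df2 = df.reindex(["a", "b", "c", "d", "e", "f", "g", "h"])

# Give row 'b' values in its first two columns only.
df2.iloc[1, 0] = 111
df2.iloc[1, 1] = 222

# Drop rows where 'three', 'four' and 'five' are all NaN.
result = df2.dropna(subset=["three", "four", "five"], how="all")
print(list(result.index))
```

Row 'b' is dropped even though 'one' and 'two' hold values, because only the columns in subset are inspected.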

As @PaulH points out, we can generalise this to inspect every column from position k onward (k is zero-based, so "column 3 onwards" is k = 2) with:

subset=df2.columns[k:]
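As a rough sketch, the positional slice can be wrapped in a small helper. The function name drop_rows_empty_from and the toy frame are hypothetical, introduced only for illustration; the slice is passed through list() as a cautious choice, so that subset receives a plain list of labels.

```python
import numpy as np
import pandas as pd


def drop_rows_empty_from(frame, k):
    """Drop rows whose values in columns k, k+1, ... are all NaN.

    k is a zero-based column position (hypothetical helper).
    """
    return frame.dropna(subset=list(frame.columns[k:]), how="all")


# Small hand-written frame mirroring the NaN pattern in the question.
frame = pd.DataFrame(
    {
        "one": [1.0, 111.0, np.nan],
        "two": [2.0, 222.0, np.nan],
        "three": [3.0, np.nan, np.nan],
        "four": [np.nan, np.nan, np.nan],
    },
    index=["a", "b", "d"],
)

# Inspect columns 'three' and 'four' (positions 2 and 3).
kept = drop_rows_empty_from(frame, 2)
print(list(kept.index))
```

Row 'a' survives because 'three' holds a value, even though 'four' is NaN; with how='any' it would be dropped as well.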

Indeed, we could even do something more complicated if desired:

subset=[col for col in df2.columns if len(col) > 3]
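
The name-based selection could look like the sketch below. It builds the label list with a list comprehension, since under Python 3 filter returns a lazy iterator rather than a list; the toy frame and the name long_names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: 'three' and 'four' have names longer than three characters.
frame = pd.DataFrame(
    {
        "one": [1.0, 2.0],
        "two": [3.0, 4.0],
        "three": [np.nan, 5.0],
        "four": [np.nan, np.nan],
    },
    index=["x", "y"],
)

# Pick the columns to inspect using any predicate on the column label.
long_names = [name for name in frame.columns if len(name) > 3]

# Drop rows in which every selected column is NaN.
kept = frame.dropna(subset=long_names, how="all")
print(long_names, list(kept.index))
```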
Answered By: Andy Hayden