How to remove rows with null values from the kth column onward in Python
Question:
I need to remove all rows in which elements from column 3 onwards are all NaN
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 5), index=['a', 'c', 'e', 'f', 'g', 'h'], columns=['one', 'two', 'three', 'four', 'five'])
df2 = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df2.iloc[1, 0] = 111  # .ix has been removed from pandas; .iloc selects by position
df2.iloc[1, 1] = 222
In the example above, my final data frame should not contain rows 'b' and 'd', the two rows that reindex adds and that are NaN in every column from 'three' onwards.
How do I use df.dropna() in this case?
Answers:
You can call dropna with the arguments subset and how:
df2.dropna(subset=['three', 'four', 'five'], how='all')
As the names suggest:
how='all' requires every column (of subset) in the row to be NaN for the row to be dropped, as opposed to the default 'any'.
subset lists the columns to inspect for NaNs.
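To make the difference between how='all' and how='any' concrete, here is a minimal sketch. It assumes a deterministic stand-in for the question's random data (np.arange instead of np.random.randn) and adds one extra NaN in row 'c', which is not part of the original example, so the two modes give different results.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random frame (an assumption for illustration).
df = pd.DataFrame(
    np.arange(30, dtype=float).reshape(6, 5),
    index=["a", "c", "e", "f", "g", "h"],
    columns=["one", "two", "three", "four", "five"],
)
df2 = df.reindex(["a", "b", "c", "d", "e", "f", "g", "h"])
df2.loc["b", ["one", "two"]] = [111, 222]  # 'b' stays NaN in three..five
df2.loc["c", "three"] = np.nan  # extra NaN, added here only to contrast the modes

cols = ["three", "four", "five"]
dropped_all = df2.dropna(subset=cols, how="all")  # drops rows NaN in all of cols
dropped_any = df2.dropna(subset=cols, how="any")  # drops rows NaN in any of cols

print(list(dropped_all.index))  # ['a', 'c', 'e', 'f', 'g', 'h']
print(list(dropped_any.index))  # ['a', 'e', 'f', 'g', 'h']
```

Row 'c' survives with how='all' because only one of its inspected columns is NaN, while rows 'b' and 'd' are removed in both cases.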
As @PaulH points out, we can generalise to inspecting every column from position k onwards (k is zero-based, so k=2 starts at the third column) with:
subset=df2.columns[k:]
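The positional slice can be wrapped in a small function. This is only a sketch: the helper name drop_rows_all_nan_from and the tiny sample frame are hypothetical, chosen for illustration.

```python
import numpy as np
import pandas as pd


def drop_rows_all_nan_from(frame, k):
    """Drop rows whose columns at zero-based positions k, k+1, ... are all NaN.

    Hypothetical helper, not part of pandas.
    """
    return frame.dropna(subset=frame.columns[k:], how="all")


frame = pd.DataFrame(
    {
        "one": [1.0, 111.0, np.nan],
        "two": [2.0, 222.0, np.nan],
        "three": [3.0, np.nan, np.nan],
        "four": [np.nan, np.nan, np.nan],
    },
    index=["a", "b", "d"],
)

result = drop_rows_all_nan_from(frame, 2)  # inspect 'three' and 'four'
print(list(result.index))  # ['a']
```

Row 'a' is kept even though 'four' is NaN, because 'three' still holds a value.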
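Indeed, we could even do something more complicated if desired, such as selecting columns by a condition on their names (wrapped in list() because filter returns a lazy iterator in Python 3):
subset=list(filter(lambda x: len(x) > 3, df2.columns))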
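A name-based selection like this can also be written as a list comprehension, which some readers find easier to follow. The sketch below assumes a small hand-built frame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {
        "one": [1.0, 111.0],
        "two": [2.0, 222.0],
        "three": [3.0, np.nan],
        "four": [4.0, np.nan],
        "five": [5.0, np.nan],
    },
    index=["a", "b"],
)

# Keep only column names longer than three characters.
long_names = [name for name in frame.columns if len(name) > 3]
result = frame.dropna(subset=long_names, how="all")

print(long_names)          # ['three', 'four', 'five']
print(list(result.index))  # ['a']
```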