Python dataframe drop negative values in multiple columns
Question:
I want to drop rows that have negative values in some columns of a DataFrame.
My code:
ad = pd.DataFrame({'A':[-1,2,3,4],'B':[5,-6,7,8],'C':[1,-2,0,-1]})
A B C
0 -1 5 1
1 2 -6 -2
2 3 7 0
3 4 8 -1
I want to drop every row that has a negative value in column A or column B.
Expected result:
ad =
A B C
1 3 7 0
2 4 8 -1
Present solution:
ad = ad[ad[['A','B']]>0]
A B C
0 NaN 5.0 NaN
1 2.0 NaN NaN
2 3.0 7.0 NaN
3 4.0 8.0 NaN
ad.dropna(how='any',inplace=True)
This leaves ad empty.
Update:
I tried the accepted answer below. I also figured out a NumPy-based solution.
import pandas as pd
import numpy as np
ad = pd.DataFrame({'A':[-1,2,3,4],'B':[5,-6,7,8],'C':[1,-2,0,1]})
print(ad[np.logical_and.reduce(ad[['A','B']]>0,axis=1)])
%timeit ad[np.logical_and.reduce(ad[['A','B']]>0,axis=1)]
A B C
2 3 7 0
3 4 8 1
1000 loops, best of 5: 795 µs per loop
print(ad[(ad[['A','B']] > 0).all(1)])
%timeit ad[(ad[['A','B']] > 0).all(1)]
A B C
2 3 7 0
3 4 8 1
1000 loops, best of 5: 979 µs per loop
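The %timeit lines above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be made in a plain script with the standard timeit module. The function names numpy_filter and pandas_filter and the enlarged frame big are made up for illustration; timings depend on the machine and the pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Same frame as in the update above.
ad = pd.DataFrame({'A': [-1, 2, 3, 4], 'B': [5, -6, 7, 8], 'C': [1, -2, 0, 1]})


def numpy_filter(df):
    # Reduce the boolean mask row-wise with NumPy's logical AND.
    mask = (df[['A', 'B']] > 0).to_numpy()
    return df[np.logical_and.reduce(mask, axis=1)]


def pandas_filter(df):
    # Same row-wise reduction using DataFrame.all.
    return df[(df[['A', 'B']] > 0).all(axis=1)]


# Both approaches should select exactly the same rows.
assert numpy_filter(ad).equals(pandas_filter(ad))

# Hypothetical larger frame: with only four rows the timing is mostly overhead.
big = pd.concat([ad] * 25_000, ignore_index=True)
for fn in (numpy_filter, pandas_filter):
    seconds = timeit.timeit(lambda: fn(big), number=20)
    print(f"{fn.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Timing on a larger frame is a design choice of this sketch, since a four-row example says little about how the two approaches scale.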
Answers:
With DataFrame.all you can check whether all elements in a row or column are True. Applied along axis=1 to a comparison on a subset of columns, it gives one boolean per row, which you can use as a filter:
import pandas as pd
ad = pd.DataFrame({'A':[-1,2,3,4],'B':[5,-6,7,8],'C':[1,-2,0,-1]})
ad[(ad[['A','B']] > 0).all(axis=1)]
Output:
A B C
2 3 7 0
3 4 8 -1
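For context on why the attempt in the question ended up empty, here is a small sketch contrasting the two kinds of mask. The variable names cell_mask, masked and row_mask are made up for illustration.

```python
import pandas as pd

ad = pd.DataFrame({'A': [-1, 2, 3, 4], 'B': [5, -6, 7, 8], 'C': [1, -2, 0, -1]})

# A boolean DataFrame: one True/False per cell of A and B.
cell_mask = ad[['A', 'B']] > 0

# Indexing with a boolean DataFrame keeps the original shape and replaces
# every cell that is not True with NaN. Column C is not covered by the
# mask, so all of its cells become NaN as well.
masked = ad[cell_mask]
print(masked)

# Every row now holds at least one NaN, so dropna(how='any') removes them all.
print(masked.dropna(how='any'))

# A boolean Series: one True/False per row, so whole rows are kept or dropped.
row_mask = cell_mask.all(axis=1)
print(ad[row_mask])
```

The row-wise mask also leaves the integer dtypes untouched, whereas the cell-wise mask converts the columns to float to hold the NaN values.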
For an exclusively pandas solution, use .loc to filter with a boolean expression.
The line of code:
ad.loc[(ad['A'] > 0) & (ad['B'] > 0)]
keeps only the rows where both A and B are positive, so every row with a negative (or zero) value in A or B is dropped. If you want to reset the index as you did in your expected output, just add .reset_index(drop=True) to the end of the expression.
Using this on your example, here’s what I get:
ad = pd.DataFrame({'A':[-1,2,3,4],'B':[5,-6,7,8],'C':[1,-2,0,1]})
ad.loc[(ad['A'] > 0) & (ad['B'] > 0)]
results in:
A B C
2 3 7 0
3 4 8 1
Then doing
ad.loc[(ad['A'] > 0) & (ad['B'] > 0)].reset_index(drop=True)
results in:
A B C
0 3 7 0
1 4 8 1
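One detail worth checking: the > 0 test also drops rows where A or B is exactly zero, which is stricter than "negative". Below is a minimal sketch of a helper that makes the choice explicit. The function drop_negative_rows and its keep_zero parameter are hypothetical, not part of pandas, and the frame is changed to contain a zero in A purely for illustration.

```python
import pandas as pd


def drop_negative_rows(df, columns, keep_zero=True):
    """Return the rows of df whose values in `columns` are all non-negative.

    Hypothetical helper, not part of pandas. With keep_zero=False it uses
    the same `> 0` test as the answers above.
    """
    subset = df[columns]
    mask = (subset >= 0) if keep_zero else (subset > 0)
    # Reduce to one boolean per row, filter, then renumber the index.
    return df[mask.all(axis=1)].reset_index(drop=True)


# Illustrative data: row 1 has A == 0, which is not negative.
ad = pd.DataFrame({'A': [-1, 0, 3, 4], 'B': [5, 6, 7, 8], 'C': [1, -2, 0, 1]})

print(drop_negative_rows(ad, ['A', 'B']))                   # keeps the A == 0 row
print(drop_negative_rows(ad, ['A', 'B'], keep_zero=False))  # drops it
```

Passing the columns as a list also avoids writing one comparison per column when more than two columns are involved.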