pandas DataFrame: get rows where the list values in a specific column meet a certain condition
Question:
I have a dataframe:
df = A B
1 [0.2,0.8]
2 [0.6,0.9]
I want to get only rows where all the values of B are >= 0.5
So here:
new_df = A B
2 [0.6, 0.9]
What is the best way to do it?
Answers:
You can use apply to filter the values:
import pandas as pd
df = pd.DataFrame({'A': [1,2], 'B':[[0.2, 0.8], [0.6, 0.9]]})
print(df[df['B'].apply(lambda x: all([i>=0.5 for i in x]))])
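The same apply approach can be sketched with a named helper and an explicit mask, which may be easier to read and reuse. The helper name all_at_least and its threshold parameter are made up for this illustration; they are not part of pandas.

```python
import pandas as pd

def all_at_least(values, threshold=0.5):
    """Return True when every element of `values` is >= threshold."""
    # Caveat: all() over an empty iterable is True, so an empty list passes.
    return all(v >= threshold for v in values)

df = pd.DataFrame({'A': [1, 2], 'B': [[0.2, 0.8], [0.6, 0.9]]})

# Build the boolean mask first, then use it for boolean indexing.
mask = df['B'].apply(all_at_least)
new_df = df[mask]
print(new_df)
```

Keeping the mask in its own variable makes it easy to inspect which rows were rejected before filtering.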
You can
- explode the lists in column B into rows
- check, per index group, whether all exploded values are greater than or equal to 0.5
- use boolean indexing on df to keep the rows that satisfy the condition
out = df[df.explode('B')['B'].ge(0.5).groupby(level=0).all()]
print(out)
A B
1 2 [0.6, 0.9]
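The one-liner above can be unpacked into its intermediate steps, as a rough sketch. The variable names exploded, element_ok and row_ok are illustrative only, and the astype(float) cast is an added assumption that every list element is numeric. The approach also assumes df has a unique index, since the grouping is done on index labels.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[0.2, 0.8], [0.6, 0.9]]})

# Step 1: one row per list element; the original index label is repeated.
exploded = df.explode('B')['B'].astype(float)

# Step 2: elementwise comparison against the threshold.
element_ok = exploded.ge(0.5)

# Step 3: collapse back to one boolean per original row, grouping by index label.
row_ok = element_ok.groupby(level=0).all()

# Step 4: boolean indexing; row_ok lines up with df through the shared index.
out = df[row_ok]
print(out)
```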
Method1:
First derive a new column, e.g. flg, that holds the result of the filter condition, then use this flag to filter the records. I am using a custom function to derive the flag value; a custom function lets you do much more than this single check.
Below code:
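import pandas as pd

def fun1(r):
    # True only when every value in this row's B list is >= 0.5
    flg = all(b >= 0.5 for b in r['B'])
    r['flg'] = flg
    return r

df1 = pd.DataFrame([{'A':1,'B':[0.2,0.8]},{'A':2,'B':[0.6,0.9]}])
# keep only the rows whose derived flag is True
df1 = df1[df1.apply(fun1, axis=1)['flg']]
df1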
result:
A B
2 [0.6, 0.9]
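The flag-column idea can also be sketched without a row-wise function, assuming the flag depends only on column B. The names flagged and result are illustrative, and dropping the helper column at the end is an optional choice rather than part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([{'A': 1, 'B': [0.2, 0.8]}, {'A': 2, 'B': [0.6, 0.9]}])

# Derive the flag column from B alone; assign() returns a new DataFrame.
flagged = df1.assign(flg=df1['B'].apply(lambda values: all(b >= 0.5 for b in values)))

# Filter on the flag, then drop the helper column again.
result = flagged[flagged['flg']].drop(columns='flg')
print(result)
```

Keeping the flag as a real column can be handy when the same condition is reused in later steps.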
Method2:
Using a lambda one-liner:
df1 = df1[df1['B'].apply(lambda x: all(b >= 0.5 for b in x))]
import pandas as pd
df = pd.DataFrame({'A':[1, 2],
'B':[[0.2,0.8], [0.6,0.9]],
})
mask = df.agg({'B': lambda v: all(map(lambda x: x >= 0.5, v))})
r = df[mask['B']]
print(r)
A B
1 2 [0.6, 0.9]