Filter list-valued columns


I have this kind of dataset:

id   value   cond1     cond2
 a   1      ['a','b']  [1,2]
 b   1      ['a']      [1]
 a   2      ['b']      [2]
 a   3      ['a','b']  [1,2]
 b   3      ['a','b']  [1,2]

I would like to extract all the rows using the conditions, something like

df.loc[(df['cond1']==['a','b']) & (df['cond2']==[1,2])]

However, this syntax produces

ValueError: ('Lengths must match to compare', (100,), (1,))    

or this if I use isin:

SystemError: <built-in method view of numpy.ndarray object at 0x7f1e4da064e0> returned a result with an error set

How do I do this correctly?


Asked By: Ilja



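pandas interprets the list on the right-hand side as an array-like, so it attempts an element-wise comparison against the whole column and fails because the lengths differ, as seen in the error. One way around this is to convert each cell to a tuple and compare against a tuple, which is then matched against each row as a single value: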

df.loc[(df["cond1"].map(tuple) == ("a", "b")) & (df["cond2"].map(tuple) == (1, 2))]

  id  value   cond1   cond2
0  a      1  [a, b]  [1, 2]
3  a      3  [a, b]  [1, 2]
4  b      3  [a, b]  [1, 2]
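
For completeness, here is a self-contained sketch of the same idea. The DataFrame is reconstructed from the sample in the question, so its exact contents are an assumption, and the names `mask`, `alt_mask` and `result` are only illustrative. It also shows a possible alternative that compares each cell to the target list with plain Python equality instead of converting to tuples.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to match the real frame).
df = pd.DataFrame(
    {
        "id": ["a", "b", "a", "a", "b"],
        "value": [1, 1, 2, 3, 3],
        "cond1": [["a", "b"], ["a"], ["b"], ["a", "b"], ["a", "b"]],
        "cond2": [[1, 2], [1], [2], [1, 2], [1, 2]],
    }
)

# Tuple approach from the answer: each list cell becomes a tuple, and the
# tuple on the right-hand side is compared against every row as one value.
mask = (df["cond1"].map(tuple) == ("a", "b")) & (df["cond2"].map(tuple) == (1, 2))
result = df.loc[mask]

# Alternative sketch: compare each cell to the target list directly.
# This runs a Python-level comparison per row, so it is not vectorized.
alt_mask = df["cond1"].map(lambda x: x == ["a", "b"]) & df["cond2"].map(lambda x: x == [1, 2])

print(result)
```

Both masks should select the rows at index 0, 3 and 4, which matches the output shown above. Note that order matters in either variant: a cell containing ['b', 'a'] would not match ['a', 'b'].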
Answered By: Mustafa Aydın