How do I subset a pandas data frame based on a list of string values?

Question:

I’ve got a DataFrame that’s over 100k rows long and a few columns wide, nothing crazy. I’m trying to subset the rows based on a list of some 4000 strings, but am struggling to figure out how to do so. Is there a straightforward way to do this?

The DataFrame looks something like this:

dog_name    count
===================
Jenny        2
Fido         4
Joey         7
Yeller       2

and the list of strings is contained in the variable dog_name_list=['Fido', 'Yeller']

I’ve tried something along the lines of
df[df['dog_name'].isin(dog_name_list)], but am getting a fun error: unhashable type: 'list'

I’ve checked a similar question, the docs and this rundown for subsetting data frames by seeing whether a value is present in a list, but that’s got me right about nowhere, and I’m a little confused by what I’m missing. Would really appreciate someone’s advice!

Asked By: scrollex


Answers:

I believe at least one value in your dog_name column is a list rather than a string.

This works fine:

>>> df[df['dog_name'].isin({'Fido', 'Yeller'})]
  dog_name  count
1     Fido      4
3   Yeller      2

But if one of those dogs happens to have a list for a name instead of a string, you will get TypeError: unhashable type: 'list'

>>> df.loc[4] = (['a'], 2)  # .ix was removed in pandas 1.0; .loc is the label-based replacement
>>> df
  dog_name  count
0    Jenny      2
1     Fido      4
2     Joey      7
3   Yeller      2
4      [a]      2

>>> df[df['dog_name'].isin({'Fido', 'Yeller'})]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-1b68dd948f39> in <module>()
----> 1 df[df['dog_name'].isin({'Fido', 'Yeller'})]
...
pandas/lib.pyx in pandas.lib.ismember (pandas/lib.c:5014)()

TypeError: unhashable type: 'list'

To find those bad dogs:

>>> df[[isinstance(dog, list) for dog in df.dog_name]]
  dog_name  count
4      [a]      2
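
Once the offending rows are identified, one option is to exclude them before calling isin. The following is only a minimal sketch, assuming the non-string rows can simply be dropped. The sample frame is rebuilt from the example above, and is_str, clean and subset are names made up for illustration:

```python
import pandas as pd

# Rebuild the example frame, including the one row whose name is a list.
df = pd.DataFrame({
    "dog_name": pd.Series(["Jenny", "Fido", "Joey", "Yeller", ["a"]], dtype=object),
    "count": [2, 4, 7, 2, 2],
})
dog_name_list = ["Fido", "Yeller"]

# Boolean mask: True where the name is a plain string.
is_str = df["dog_name"].map(lambda value: isinstance(value, str))

# Keep only the string-named rows, then subset as originally intended.
clean = df[is_str]
subset = clean[clean["dog_name"].isin(dog_name_list)]
print(subset)
```

Whether dropping those rows is acceptable depends on your data; it may be better to repair them (for example, by unpacking single-element lists) than to discard them.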

To find all the data types in the column:

>>> set((type(dog) for dog in df.dog_name))
{list, str}
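
With 100k rows it may also help to see how many values of each type there are, not just which types occur. A rough sketch, again on a rebuilt copy of the example frame (type_counts is a hypothetical name):

```python
import pandas as pd

# Same example frame as above, with one list-valued name.
df = pd.DataFrame({
    "dog_name": pd.Series(["Jenny", "Fido", "Joey", "Yeller", ["a"]], dtype=object),
    "count": [2, 4, 7, 2, 2],
})

# Map each value to the name of its type, then tally the occurrences.
type_counts = df["dog_name"].map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

A small count of non-string types usually points to a handful of malformed rows rather than a systematic problem with how the column was loaded.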
Answered By: Alexander