Pandas filter string data based on its string length using DataFrame.query

Question:

This question is very similar to Python: Pandas filter string data based on its string length, but I want to use pandas.DataFrame.query. Let’s say we have a pandas.DataFrame. I would like to filter out the rows where the string length of column A is equal to 3, i.e. keep only the rows whose length is not 3, using pandas.DataFrame.query:

import pandas as pd
import numpy as np
df = pd.DataFrame({'A' : ['hi', 'hello', 'day', np.nan], 'B' : [1, 2, 3, 4]})  
df.query('A.str.len() != 3')

However, I got the following error

TypeError: unhashable type: 'numpy.ndarray'
Asked By: Chen Chen


Answers:

Replacing the integer 3 with the string "3" works for me on pandas 0.23.1.

df.query('A.str.len() != "3"')

Output:

       A  B
0     hi  1
1  hello  2
3    NaN  4

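Alternatively, if you also want to remove the np.nan row, convert the column to str first. np.nan then becomes the 3-character string "nan" and is filtered out with the other length-3 values: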

df.query('A.astype("str").str.len() != "3"')

Output:

       A  B
0     hi  1
1  hello  2

Hope this helps.

Answered By: gyoza

As of Pandas 1.4.2, OP’s original code works.

Filter out rows where A values have length equal to 3:

df.query('A.str.len() != 3')

Filter out NaN values in addition to strings of length 3 (leverage the fact that NaN != NaN):

df.query('A.str.len() != 3 and A == A')
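
For reference, here is a self-contained sketch of the two queries above run against the DataFrame from the question. Passing engine="python" is an assumption added here, not part of the answer: it asks pandas to evaluate the expression itself rather than hand it to numexpr. Whether that also avoids the TypeError on older releases was not checked. The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({"A": ["hi", "hello", "day", np.nan], "B": [1, 2, 3, 4]})

# Keep rows whose string length is not 3.
# NaN has length NaN, and NaN != 3 is True, so the NaN row is kept.
# engine="python" is an assumption (see the note above).
not_len3 = df.query("A.str.len() != 3", engine="python")

# Also drop NaN: A == A is False only where A is NaN.
not_len3_no_nan = df.query("A.str.len() != 3 and A == A", engine="python")

print(not_len3)
print(not_len3_no_nan)
```

The first result should keep index labels 0, 1 and 3; the second should keep only 0 and 1.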

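For Python 3.9.7 and Pandas 1.3.4, gyoza’s answer does not filter anything (it returns the entire df). However, converting the result of str.len() to dtype str works, provided A contains no NaN. With NaN present, str.len() returns floats, so the lengths become strings such as "3.0" and never equal "3".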

df.query('A.str.len().astype("str") != "3"')

or, if A contains NaN that needs to be filtered out as well:

df.query('A.astype("str").str.len().astype("str") != "3"')
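
A short sketch of why comparing the lengths against the string "3" can return the whole frame on recent versions: the lengths are numbers, and a number is never equal to a string. It uses a plain Series outside of query to keep the check minimal. The explicit dtype=object and the variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Same values as column A in the question; dtype=object is set explicitly
# here (an assumption) so the behaviour does not depend on the default
# string dtype of the installed pandas version.
a = pd.Series(["hi", "hello", "day", np.nan], dtype=object)

lengths = a.str.len()
print(lengths.dtype)  # float64, because the NaN entry forces a float result

# A number compared with the string "3" is always "not equal",
# so this mask is True for every row and nothing is filtered.
mask_vs_string = lengths != "3"
print(mask_vs_string.all())

# Casting the float lengths to str gives "3.0", not "3", for the 'day' row.
print(lengths.astype(str).iloc[2])
```

This suggests comparing against the integer 3 where the installed version supports it, and using the string comparison only as a version-specific workaround.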
Answered By: cottontail