How to trim a string inside a query string passed to the Pandas query function?
Question:
I need to trim the ‘My Pet’ column inside the query string. How can that be done?
# Note that one cat literal has a trailing space.
testDF = pd.DataFrame([{"My Pet":"cat ", "Cost":"$10 ", "Weight":"10 pounds", "Name":"Violet"},
{"My Pet":"cat", "Cost":"$10 ", "Weight":"15 pounds", "Name":"Sirius"},
{"My Pet":"dog", "Cost":"$0 ", "Weight":"50 pounds", "Name":"Sam"},
{"My Pet":"turtle", "Cost":"$5 ", "Weight":"20 ounces", "Name":"Tommy"},
])
# We try to filter on cat.
catDF = testDF.query("`My Pet` == 'cat'") # This yields only one row because one cat cell has a trailing space
catDF.head()
Output is only one row but I would like to get both rows with cat in them
My Pet Cost Weight Name
1 cat $10 15 pounds Sirius
Answers:
Use Series.str.strip
, but need engine="python"
:
catDF = testDF.query("`My Pet`.str.strip() == 'cat'", engine="python")
print (catDF)
My Pet Cost Weight Name
0 cat $10 10 pounds Violet
1 cat $10 15 pounds Sirius
You can access string methods on string series. This means that you can use the Series.str.strip
function to clean remove whitespace.
testDF["My Pet"] = testDF["My Pet"].str.strip()
If you want to only clean trailing spaces then use rstrip
(and lstrip
for leading spaces)
You can optionally specify the set of characters you want to remove explicitly. E.g. to remove only spaces and tabs:
testDF["My Pet"] = testDF["My Pet"].str.strip(" t")
I need to trim the ‘My Pet’ column inside the query string. How can that be done?
# Note that one cat literal has a trailing space.
testDF = pd.DataFrame([{"My Pet":"cat ", "Cost":"$10 ", "Weight":"10 pounds", "Name":"Violet"},
{"My Pet":"cat", "Cost":"$10 ", "Weight":"15 pounds", "Name":"Sirius"},
{"My Pet":"dog", "Cost":"$0 ", "Weight":"50 pounds", "Name":"Sam"},
{"My Pet":"turtle", "Cost":"$5 ", "Weight":"20 ounces", "Name":"Tommy"},
])
# We try to filter on cat.
catDF = testDF.query("`My Pet` == 'cat'") # This yields only one row because one cat cell has a trailing space
catDF.head()
Output is only one row but I would like to get both rows with cat in them
My Pet Cost Weight Name
1 cat $10 15 pounds Sirius
Use Series.str.strip
, but need engine="python"
:
catDF = testDF.query("`My Pet`.str.strip() == 'cat'", engine="python")
print (catDF)
My Pet Cost Weight Name
0 cat $10 10 pounds Violet
1 cat $10 15 pounds Sirius
You can access string methods on string series. This means that you can use the Series.str.strip
function to clean remove whitespace.
testDF["My Pet"] = testDF["My Pet"].str.strip()
If you want to only clean trailing spaces then use rstrip
(and lstrip
for leading spaces)
You can optionally specify the set of characters you want to remove explicitly. E.g. to remove only spaces and tabs:
testDF["My Pet"] = testDF["My Pet"].str.strip(" t")