Fixing IndexingError to clean the data

Question:

I’m trying to identify outliers within each housing type category, but I’m running into an issue. Whenever I run the code, I get the following error: "IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)".

grouped = df.groupby('Type')
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
iqr = q3 - q1

upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

outliers = df[(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)) |
              (df["price"].reset_index(drop=True) < lower_bound[df["Type"].reset_index(drop=True)])]
print(outliers)

When I run this part of the code on its own:

(df["price"].reset_index(drop=True) > upper_bound[df["Type"]].reset_index(drop=True)).reset_index(drop = True)

I get a boolean Series, but when I pass it to df[...] it breaks.

Asked By: Omarov Alen
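A minimal reproduction may help show where the mismatch comes from. The sketch below uses hypothetical toy data (the column names match the question, the values and the index are made up), and it assumes df does not have the default 0..n-1 index, which is what the error message suggests. After reset_index(drop=True) the mask is labelled 0..n-1, while df keeps its own labels, so df[mask] has nothing to align against.

```python
import pandas as pd

# Hypothetical data; the index starts at 100 to mimic a frame whose
# rows were filtered earlier, so it is NOT the default 0..n-1.
df = pd.DataFrame(
    {
        "Type": ["h", "u", "h", "u", "h", "u", "h", "u", "h", "u"],
        "price": [100, 50, 110, 52, 120, 54, 130, 56, 1000, 5],
    },
    index=range(100, 110),
)

grouped = df.groupby("Type")
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
upper_bound = q3 + 1.5 * (q3 - q1)

# upper_bound has one entry per Type ("h", "u"), not one per row
print(upper_bound.index.tolist())

# The question's mask: both sides are reset to 0..n-1 before comparing
mask = (df["price"].reset_index(drop=True)
        > upper_bound[df["Type"]].reset_index(drop=True))
print(mask.index.tolist())  # labels 0..9
print(df.index.tolist())    # labels 100..109

# df[mask] tries to align the mask's labels with df's labels and fails
error_name = None
try:
    df[mask]
except Exception as exc:
    error_name = type(exc).__name__
print(error_name)
```

The comparison itself succeeds because both sides were reset to the same labels; the failure only happens at the df[...] step, which matches what the question describes.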


Answers:

Use Series.map to look up each row's bounds from its Type; the result keeps df's index, so reset_index is not necessary:

outliers = df[(df["price"] > df["Type"].map(upper_bound)) | 
              (df["price"] < df["Type"].map(lower_bound))]
print(outliers)
Answered By: jezrael
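A self-contained sketch of the map approach, run on hypothetical toy data with a non-default index (100..109) so that any alignment problem would surface. The variable names row_upper and row_lower are introduced here for illustration only. Series.map looks up each row's Type in the per-Type bounds and returns a Series on df's own index, so the mask lines up with df directly.

```python
import pandas as pd

# Hypothetical data with a non-default index (100..109)
df = pd.DataFrame(
    {
        "Type": ["h", "u", "h", "u", "h", "u", "h", "u", "h", "u"],
        "price": [100, 50, 110, 52, 120, 54, 130, 56, 1000, 5],
    },
    index=range(100, 110),
)

grouped = df.groupby("Type")
q1 = grouped["price"].quantile(0.25)
q3 = grouped["price"].quantile(0.75)
iqr = q3 - q1
upper_bound = q3 + 1.5 * iqr  # indexed by Type, one value per group
lower_bound = q1 - 1.5 * iqr

# map expands the per-Type bounds into per-row bounds on df's index
row_upper = df["Type"].map(upper_bound)
row_lower = df["Type"].map(lower_bound)

outliers = df[(df["price"] > row_upper) | (df["price"] < row_lower)]
print(outliers)
```

With this data the bounds are 80..160 for "h" and 44..60 for "u", so only the rows labelled 108 (price 1000) and 109 (price 5) are flagged.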

Use transform to compute q1 and q3; this maintains the original index:

q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))

iqr = q3 - q1

upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

outliers = df[df["price"].gt(upper_bound) | df["price"].lt(lower_bound)]
Answered By: mozway
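A self-contained version of the transform approach, using the same hypothetical toy data (index 100..109) so it can be checked in isolation. Here transform broadcasts each group's quantile back onto that group's rows, so q1, q3 and the bounds already share df's index and no lookup step is needed.

```python
import pandas as pd

# Hypothetical data with a non-default index (100..109)
df = pd.DataFrame(
    {
        "Type": ["h", "u", "h", "u", "h", "u", "h", "u", "h", "u"],
        "price": [100, 50, 110, 52, 120, 54, 130, 56, 1000, 5],
    },
    index=range(100, 110),
)

grouped = df.groupby("Type")

# One quantile per group, repeated on every row of that group
q1 = grouped["price"].transform(lambda x: x.quantile(0.25))
q3 = grouped["price"].transform(lambda x: x.quantile(0.75))
iqr = q3 - q1

upper_bound = q3 + (1.5 * iqr)
lower_bound = q1 - (1.5 * iqr)

# gt/lt compare row by row on the shared index
outliers = df[df["price"].gt(upper_bound) | df["price"].lt(lower_bound)]
print(outliers)
```

On this data both answers flag the same rows. One design difference: the lambda passed to transform runs once per group in Python, which may be slower than the grouped quantile call when there are many groups.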