How to Index a dataframe based on an applied function? -Pandas
Question:
I have a DataFrame that I created from a master table in SQL. That DataFrame is then grouped by type, because I want to find the outliers for each group in the master table.
The function finds the outliers and shows where in the group DataFrame they occur. How do I see these outliers as part of the original DataFrame, with not just the volume but also the location, SKU, group, etc.?
dataframe: HOSIERY_df
Code:
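## SKU group DataFrames
# sku_volume is the DataFrame loaded from the SQL master table;
# find_outliers_IQR is the asker's own helper (its definition is not shown)
grouped_skus = sku_volume.groupby('SKUGROUP')
HOSIERY_df = grouped_skus.get_group('HOSIERY')
hosiery_outliers = find_outliers_IQR(HOSIERY_df['VOLUME'])
hosiery_outliers
#.iloc[[hosiery_outliers]]
#hosiery_outliers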
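The question does not include find_outliers_IQR itself. As a hypothetical reconstruction, assuming it applies the common 1.5 * IQR rule to a single numeric Series, it could look like the sketch below. The property that matters here is that boolean filtering returns the outlying values together with their original index labels, not with new 0-based positions. The sample Series and its labels are made up for illustration.

```python
import pandas as pd

def find_outliers_IQR(values: pd.Series) -> pd.Series:
    """Hypothetical IQR filter: keep values outside 1.5 * IQR of the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    # Boolean indexing keeps the original index labels of the surviving rows
    mask = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    return values[mask]

# Made-up volumes whose labels do not start at 0, like a group taken from a larger table
volume = pd.Series([10, 12, 11, 13, 500, 12], index=[100, 101, 102, 103, 104, 105])
outliers = find_outliers_IQR(volume)
print(outliers)
```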
I know that I need to find the rows based on the location of the index, like VLOOKUP in Excel, but I need to do it in Python. I'm not sure how to pull only the 5, 6, 7…3888 and 4482nd rows of HOSIERY_df.
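One detail behind "which place in HOSIERY_df": get_group keeps the master table's index labels, so the numbers shown next to the outliers are labels, not row positions inside the group. A small sketch with a made-up master table (the columns and values are hypothetical) shows how label-based .loc and positional .iloc differ:

```python
import pandas as pd

# Hypothetical stand-in for the SQL master table; the real sku_volume has more rows and columns
sku_volume = pd.DataFrame({
    "SKUGROUP": ["HOSIERY", "SHOES", "HOSIERY", "SHOES", "HOSIERY"],
    "SKU": ["H1", "S1", "H2", "S2", "H3"],
    "LOCATION": ["A", "B", "C", "D", "E"],
    "VOLUME": [10, 20, 30, 40, 50],
})

HOSIERY_df = sku_volume.groupby("SKUGROUP").get_group("HOSIERY")

# get_group keeps the master table's labels: [0, 2, 4], not [0, 1, 2]
print(HOSIERY_df.index.tolist())

# .loc looks rows up by label, so label 4 is the full row for SKU "H3"
print(HOSIERY_df.loc[[4]])

# .iloc looks rows up by position; this group only has positions 0-2,
# so passing the label 4 to .iloc raises IndexError
try:
    HOSIERY_df.iloc[[4]]
except IndexError as err:
    print("iloc failed:", err)
```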
Answers:
You can provide a list of index numbers as integers to iloc, which it looks like you have tried based on your commented-out code. So you may want to make sure that find_outliers_IQR returns a list of int so that it works properly with iloc, or convert its output.
It looks like it’s currently returning a DataFrame. You can get the index of that frame as a list like this:
hosiery_outliers.index.tolist()
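A sketch of that lookup, with one adjustment worth considering: because the outlier Series keeps the master table's index labels, passing the list to label-based .loc returns the matching rows reliably, while .iloc only gives the same rows when labels happen to equal positions. The toy sku_volume table and the find_outliers_IQR body below are assumptions standing in for the asker's data and helper. The second part flags outliers for every SKUGROUP in one pass by broadcasting per-group quartiles with groupby(...).transform, so the result keeps all columns (location, SKU, group) of the original table.

```python
import pandas as pd

# Hypothetical stand-in for the SQL master table (real columns may differ)
sku_volume = pd.DataFrame({
    "SKUGROUP": ["HOSIERY"] * 6 + ["SHOES"] * 6,
    "SKU": [f"H{i}" for i in range(6)] + [f"S{i}" for i in range(6)],
    "LOCATION": list("ABCDEF") * 2,
    "VOLUME": [10, 12, 11, 13, 500, 12, 40, 42, 41, 43, 44, 1],
})

def find_outliers_IQR(values: pd.Series) -> pd.Series:
    """Assumed 1.5 * IQR rule; returns outlying values with their original labels."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]

# One group, as in the question
HOSIERY_df = sku_volume.groupby("SKUGROUP").get_group("HOSIERY")
hosiery_outliers = find_outliers_IQR(HOSIERY_df["VOLUME"])

# Label-based lookup pulls the full rows (SKUGROUP, SKU, LOCATION, VOLUME)
hosiery_rows = sku_volume.loc[hosiery_outliers.index.tolist()]
print(hosiery_rows)

# Every group at once: each row gets its own group's quartiles
grouped = sku_volume.groupby("SKUGROUP")["VOLUME"]
q1 = grouped.transform(lambda s: s.quantile(0.25))
q3 = grouped.transform(lambda s: s.quantile(0.75))
iqr = q3 - q1
volume = sku_volume["VOLUME"]
is_outlier = (volume < q1 - 1.5 * iqr) | (volume > q3 + 1.5 * iqr)
all_outlier_rows = sku_volume[is_outlier]
print(all_outlier_rows)
```

The transform route avoids splitting the table into one DataFrame per group, and the boolean mask is already aligned with sku_volume, so no separate index lookup is needed.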