How to extract a category from describe.()
Question:
import pandas as pd
df = {'fish': [38,10,23,45],'eggs': [24.0,10,12,8],'fruit': [80.5,60,12,32],'sugar': [0.234,0.8,0.34,0.76],
'category':['Price A', 'Price B','Price B','Price A']}
df = pd.DataFrame(data=df)
xy=['fish',
'eggs',
'fruit',
'sugar',
'category']
df=(df[xy].groupby(['category']).describe()).T
df['Ratio'] = df['Price A']/df['Price B']
df
I am trying to extract the category with only the mean with a ratio smaller than 0.5 and bigger than 2.
I tried to use
df.reset_index()
and making two separate df with the respective conditions, then turning the unwanted values to null, removing the null values(but I don’t know how to specifically only choose mean’s ratio), and then concatenating both of the df.
Maybe there are some better ways to do this?
Answers:
Use DataFrame.loc
for select groups by categories:
m = df['Ratio'].xs('mean', level=1).between(0.5,2)
out = df.loc[m.index[~m]]
print(out)
category Price A Price B Ratio
fish count 2.000000 2.000000 1.000000
mean 41.500000 16.500000 2.515152
std 4.949747 9.192388 0.538462
min 38.000000 10.000000 3.800000
25% 39.750000 13.250000 3.000000
50% 41.500000 16.500000 2.515152
75% 43.250000 19.750000 2.189873
max 45.000000 23.000000 1.956522
Explanation:
Select levels mean
by Series.xs
:
print(df['Ratio'].xs('mean', level=1))
fish 2.515152
eggs 1.454545
fruit 1.562500
sugar 0.871930
Name: Ratio, dtype: float64
Create mask between 0.5
and 2
by Series.between
:
print(df['Ratio'].xs('mean', level=1).between(0.5,2))
fish False
eggs True
fruit True
sugar True
Name: Ratio, dtype: bool
Get indices if False
s – invert mask and filter m.index
:
print (m.index[~m])
Index(['fish'], dtype='object')
import pandas as pd
df = {'fish': [38,10,23,45],'eggs': [24.0,10,12,8],'fruit': [80.5,60,12,32],'sugar': [0.234,0.8,0.34,0.76],
'category':['Price A', 'Price B','Price B','Price A']}
df = pd.DataFrame(data=df)
xy=['fish',
'eggs',
'fruit',
'sugar',
'category']
df=(df[xy].groupby(['category']).describe()).T
df['Ratio'] = df['Price A']/df['Price B']
df
I am trying to extract the category with only the mean with a ratio smaller than 0.5 and bigger than 2.
I tried to use
df.reset_index()
and making two separate df with the respective conditions, then turning the unwanted values to null, removing the null values(but I don’t know how to specifically only choose mean’s ratio), and then concatenating both of the df.
Maybe there are some better ways to do this?
Use DataFrame.loc
for select groups by categories:
m = df['Ratio'].xs('mean', level=1).between(0.5,2)
out = df.loc[m.index[~m]]
print(out)
category Price A Price B Ratio
fish count 2.000000 2.000000 1.000000
mean 41.500000 16.500000 2.515152
std 4.949747 9.192388 0.538462
min 38.000000 10.000000 3.800000
25% 39.750000 13.250000 3.000000
50% 41.500000 16.500000 2.515152
75% 43.250000 19.750000 2.189873
max 45.000000 23.000000 1.956522
Explanation:
Select levels mean
by Series.xs
:
print(df['Ratio'].xs('mean', level=1))
fish 2.515152
eggs 1.454545
fruit 1.562500
sugar 0.871930
Name: Ratio, dtype: float64
Create mask between 0.5
and 2
by Series.between
:
print(df['Ratio'].xs('mean', level=1).between(0.5,2))
fish False
eggs True
fruit True
sugar True
Name: Ratio, dtype: bool
Get indices if False
s – invert mask and filter m.index
:
print (m.index[~m])
Index(['fish'], dtype='object')