How do I select a subset of a DataFrame based on a condition on a column

Question:

Similar to "How do I select a subset of a DataFrame based on one level of a MultiIndex", let

import pandas as pd

df = pd.DataFrame({"v": [x*x for x in range(12)]},
                  index=pd.MultiIndex.from_product([["a","b","c"],[1,2,3,4]]))

and suppose I want to select only the rows whose v is within 25 of the smallest v for the same first-level value:

       v
a 1    0
  2    1
  3    4
  4    9
b 1   16
  2   25
  3   36
c 1   64
  2   81

This time I have no idea how to do that easily….

Asked By: sds


Answers:

You can group by level 0 of the index and use transform('min') to get the smallest v in each group, broadcast back to every row. Then compare each v with its group's minimum and keep the rows where the difference is less than 25.

out = df[df['v'].sub(df.groupby(level=0)['v'].transform('min')) < 25]
print(out)

      v
a 1   0
  2   1
  3   4
  4   9
b 1  16
  2  25
  3  36
c 1  64
  2  81
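
For readability, the one-liner above can be split into named steps. This is only a sketch of the same approach; the names group_min, offset, and mask are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"v": [x * x for x in range(12)]},
    index=pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3, 4]]),
)

# Smallest v within each first-level group, repeated for every row of that group.
group_min = df.groupby(level=0)["v"].transform("min")

# How far each v lies above the minimum of its own group.
offset = df["v"] - group_min

# Boolean mask: True where v is less than 25 above the group minimum.
mask = offset < 25

out = df[mask]
print(out)
```

Because transform returns a Series with the same index as df, the mask aligns row by row and can be used directly for boolean indexing.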

If you instead want to compare the level 1 index values against their minimum within each level 0 group, you can do

out = df[((df.index.get_level_values(level=-1) -
           df.reset_index(level=-1).groupby(level=0)['level_1'].transform('min'))
          # or without reset index
          # df.groupby(level=0).transform(lambda g: g.index.get_level_values(level=-1).min())
          < 25).values]
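
A possibly easier-to-read variant of the level 1 comparison is sketched below. It assumes the same df as in the question and wraps the second-level index values in a Series so that groupby and transform can be used directly. The names level1 and level1_min and the threshold of 2 are illustrative choices: in the sample data the level 1 values only run from 1 to 4, so a threshold of 25 would keep every row.

```python
import pandas as pd

df = pd.DataFrame(
    {"v": [x * x for x in range(12)]},
    index=pd.MultiIndex.from_product([["a", "b", "c"], [1, 2, 3, 4]]),
)

# Second-level index values as a Series that shares df's MultiIndex.
level1 = pd.Series(df.index.get_level_values(1), index=df.index)

# Smallest second-level value in each first-level group, repeated per row.
level1_min = level1.groupby(level=0).transform("min")

# Keep rows whose second-level value is less than 2 above the group minimum
# (illustrative threshold, not taken from the original answer).
out = df[level1 - level1_min < 2]
print(out)
```
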
Answered By: Ynjxsjmh