In a pandas DataFrame, how can I use boolean indexing on an index level (in a MultiIndex DataFrame)?

Question

I have a DataFrame with four named index levels (time, lev, lat, and lon), like this (this is just the head; the full DataFrame is huge):

                                           O         N
time       lev          lat   lon                     
2021-01-01 4.055141e-10 -90.0 0.0   0.954735  0.046307
                              2.5   0.954735  0.046307
                              5.0   0.954735  0.046307
                              7.5   0.954735  0.046307
                              10.0  0.954735  0.046307
                              12.5  0.954735  0.046307
                              15.0  0.954735  0.046307
                              17.5  0.954735  0.046307
                              20.0  0.954735  0.046307
                              22.5  0.954735  0.046307

I would like to omit all data where lev < 1. If lev were a column, I could simply do:

df = df[df['lev'] > 1]

but lev is an index level rather than a column. In theory, I could use

df.reset_index(level=['lev'])

to turn the index level into a column, but my DataFrame is too large for that and it always crashes. So how can I filter on the index level directly?

Asked By: Billiam

Answer 1

You can use Index.get_level_values:

df = df[df.index.get_level_values('lev') > 1]

Or with query (provided there is no column with the same name):

df = df.query('lev > 1')
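To try both approaches without the original data, here is a minimal sketch on a small stand-in DataFrame. The index values and column contents are made up for illustration; only the level names (time, lev, lat, lon) and column names (O, N) come from the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's data (values are assumptions, not the real data).
index = pd.MultiIndex.from_product(
    [
        pd.to_datetime(["2021-01-01"]),   # time
        [4.055141e-10, 0.5, 2.0, 10.0],   # lev
        [-90.0, 0.0],                     # lat
        [0.0, 2.5],                       # lon
    ],
    names=["time", "lev", "lat", "lon"],
)
df = pd.DataFrame(
    {
        "O": np.linspace(0.9, 1.0, len(index)),
        "N": np.linspace(0.0, 0.1, len(index)),
    },
    index=index,
)

# Boolean mask built directly from the index level; no reset_index needed.
mask = df.index.get_level_values("lev") > 1
by_mask = df[mask]

# Same filter via query; named index levels can be referenced like columns.
by_query = df.query("lev > 1")

print(by_mask.index.get_level_values("lev").unique().tolist())
```

Both results should contain the same rows; the mask version only materializes one array of level values rather than moving the level into a column.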

Example with a different condition to get a non-empty output:

df[df.index.get_level_values('lon') > 17]

Output:

                                           O         N
time       lev          lat   lon                     
2021-01-01 4.055141e-10 -90.0 17.5  0.954735  0.046307
                              20.0  0.954735  0.046307
                              22.5  0.954735  0.046307
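If you need conditions on more than one level at once, the masks can be combined with & before indexing. This is a small sketch on made-up data (the two-level index and the values are assumptions for illustration), not something taken from the original DataFrame.

```python
import pandas as pd

# Hypothetical two-level index, just large enough to show the combination.
index = pd.MultiIndex.from_product(
    [[0.5, 2.0], [0.0, 17.5, 20.0]], names=["lev", "lon"]
)
df = pd.DataFrame({"O": range(len(index))}, index=index)

# One array of values per level, aligned with the rows of df.
lev = df.index.get_level_values("lev")
lon = df.index.get_level_values("lon")

# Parentheses are required because & binds tighter than the comparisons.
subset = df[(lev > 1) & (lon > 17)]
print(subset)
```

The same filter could also be written as df.query('lev > 1 and lon > 17'), again assuming no column shares those names.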

Answered By: mozway
