Pandas: Get unique MultiIndex level values by label

Question:

Say you have this MultiIndex-ed DataFrame:

df = pd.DataFrame({'co':['DE','DE','FR','FR'],
                   'tp':['Lake','Forest','Lake','Forest'],
                   'area':[10,20,30,40],
                   'count':[7,5,2,3]})
df = df.set_index(['co','tp'])

Which looks like this:

           area  count
co tp
DE Lake      10      7
   Forest    20      5
FR Lake      30      2
   Forest    40      3

I would like to retrieve the unique values per index level. This can be accomplished using

df.index.levels[0]  # returns ['DE', 'FR]
df.index.levels[1]  # returns ['Lake', 'Forest']

What I would really like to do, is to retrieve these lists by addressing the levels by their name, i.e. 'co' and 'tp'. The shortest two ways I could find looks like this:

list(set(df.index.get_level_values('co')))  # returns ['DE', 'FR']
df.index.levels[df.index.names.index('co')]  # returns ['DE', 'FR']

But non of them are very elegant. Is there a shorter way?

Asked By: ojdo

||

Answers:

I guess u want unique values in a certain level (and by level names) of a multiindex. I usually do the following, which is a bit long.

In [11]: df.index.get_level_values('co').unique()
Out[11]: array(['DE', 'FR'], dtype=object)
Answered By: Happy001

If you’re going to do the level lookup repeatedly, you could create a map of your index level names to level unique values with:

df_level_value_map = {
    name: level 
    for name, level in zip(df.index.names, df.index.levels)
}
df_level_value_map['']

But this is not in any way more efficient (or shorter) than your original attempts if you’re only going to do this lookup once.

I really wish there was a method on indexes that returned such a dictionary (or series?) with a name like:

df.index.get_level_map(levels={...})

Where the levels parameter can limit the map to a subset of the existing levels. I could do without the parameter if it could be a property like:

df.index.level_map
Answered By: LeoRochael

An alternative approach is to find the number of levels by calling df.index.levels[level_index] where level_index can be inferred from df.index.names.index(level_name). In the above example level_name = ‘co’.

The proposed answer by @Happy001 computes the unique which may be computationally intensive.

Answered By: Hanan Shteingart

Pandas 0.23.0 finally introduced a much cleaner solution to this problem: the level argument to Index.unique():

In [3]: df.index.unique(level='co')
Out[3]: Index(['DE', 'FR'], dtype='object', name='co')

This is now the recommended solution. It is far more efficient because it avoids creating a complete representation of the level values in memory, and re-scanning it.

Answered By: Pietro Battiston

If you already know the index names, is it not straightforward to simply do:
df['co'].unique() ?

Answered By: CyclicUniverse
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.