Get all keys from GroupBy object in Pandas

Question:

I’m looking for a way to get a list of all the keys in a GroupBy object, but I can’t seem to find one in the docs or through Google.

There is definitely a way to access the groups through their keys, like so:

df_gb = df.groupby(['EmployeeNumber'])
df_gb.get_group(key)

…so I figure there’s a way to access a list (or the like) of the keys in a GroupBy object. I’m looking for something like this:

df_gb.keys
Out: [1234, 2356, 6894, 9492]

I figure I could just loop through the GroupBy object and get the keys that way, but I think there’s got to be a better way.

Asked By: Nate


Answers:

You can access this via the .groups attribute of the groupby object. It returns a dict, and the keys of that dict are the group keys:

In [40]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'group':[0,1,1,1,2,2,3,3,3], 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[40]:
dict_keys([0, 1, 2, 3])

Here is the output from groups:

In [41]:
gp.groups

Out[41]:
{0: Int64Index([0], dtype='int64'),
 1: Int64Index([1, 2, 3], dtype='int64'),
 2: Int64Index([4, 5], dtype='int64'),
 3: Int64Index([6, 7, 8], dtype='int64')}
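
If a plain list is wanted rather than a dict_keys view, one option is to pass the mapping to list(). This is only a minimal sketch reusing the example frame from this answer; the variable name keys is chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': [0, 1, 1, 1, 2, 2, 3, 3, 3], 'val': np.arange(9)})
gp = df.groupby('group')

# gp.groups maps each group key to the row labels belonging to that group;
# iterating over a mapping yields its keys, so list() collects just the keys.
keys = list(gp.groups)
print(keys)
```

Each key can then be passed back to gp.get_group(key) to retrieve the matching rows.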

Update

It looks like, because groups is a dict, the group order isn’t guaranteed to be maintained when you call keys() (plain dicts only preserve insertion order from Python 3.7 onwards):

In [65]:
df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
gp.groups.keys()

Out[65]:
dict_keys(['b', 'e', 'g', 'a', 'x'])

If you display groups itself, you can see that the order is maintained:

In [79]:
gp.groups

Out[79]:
{'a': Int64Index([2, 3, 4], dtype='int64'),
 'b': Int64Index([0, 5, 8], dtype='int64'),
 'e': Int64Index([7], dtype='int64'),
 'g': Int64Index([1], dtype='int64'),
 'x': Int64Index([6], dtype='int64')}

A hack to get the keys in that order is to access the .name attribute of each group:

In [78]:
gp.apply(lambda x: x.name)

Out[78]:
group
a    a
b    b
e    e
g    g
x    x
dtype: object

This isn’t great because it isn’t vectorised. However, if you already have an aggregated object, you can just get the index values:

In [81]:
agg = gp.sum()
agg

Out[81]:
       val
group     
a        9
b       13
e        7
g        1
x        6

In [83]:    
agg.index.get_level_values(0)

Out[83]:
Index(['a', 'b', 'e', 'g', 'x'], dtype='object', name='group')

Answered By: EdChum

Use the option sort=False to have the order in which the group keys first appear preserved:
gp = df.groupby('group', sort=False)
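
To make the effect of sort=False concrete, here is a small sketch on the same example frame used in the other answers. The variable names sorted_keys and appearance_keys are made up for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})

# Default (sort=True): groups come back in sorted key order.
sorted_keys = [key for key, _ in df.groupby('group')]
print(sorted_keys)       # ['a', 'b', 'e', 'g', 'x']

# sort=False: groups come back in order of first appearance in the column.
appearance_keys = [key for key, _ in df.groupby('group', sort=False)]
print(appearance_keys)   # ['b', 'g', 'a', 'x', 'e']
```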

Answered By: user11827562

A problem with EdChum’s answer is that getting the keys by calling gp.groups.keys() first constructs the full group dictionary. On large dataframes this is a very slow operation, and it effectively doubles the memory consumption. Iterating is way faster:

df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
gp = df.groupby('group')
keys = [key for key, _ in gp]

Executing this list comprehension took me 16 s on my groupby object, while I had to interrupt gp.groups.keys() after 3 minutes.
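
The difference will depend on the pandas version and on the shape of the data, so it is worth measuring on your own frame. Below is a rough timing sketch on synthetic data; the frame size, the number of groups, and all variable names are assumptions made for illustration, not the data from this answer.

```python
import time

import numpy as np
import pandas as pd

# Synthetic frame (sizes are arbitrary assumptions for this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame({'group': rng.integers(0, 1_000, size=200_000), 'val': 1})
gp = df.groupby('group')

# Approach 1: iterate over the GroupBy object and keep only the keys.
start = time.perf_counter()
keys_iter = [key for key, _ in gp]
t_iter = time.perf_counter() - start

# Approach 2: build the full key -> row-labels mapping, then take its keys.
start = time.perf_counter()
keys_dict = list(gp.groups)
t_dict = time.perf_counter() - start

print(f"iteration: {t_iter:.3f}s   .groups: {t_dict:.3f}s")
```

Both approaches should return the same set of keys; only the cost of getting there differs.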

Answered By: Dr_Zaszuś