How to get the last group in a pandas groupby?
Question:
I wish to get the last group of my groupby:
df.groupby(pd.TimeGrouper(freq='M')).groups[-1]
but that gives the error:
KeyError: -1
Using get_group is useless, as I don't know the last group's value (unless there's a specific way to get that value?). I might also want to get the last 2 groups, etc.
How do I do this?
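For reference, the KeyError arises because `.groups` is a plain dict that maps each group label to the row labels in that group, so `[-1]` is looked up as a key, not as a position. `pd.TimeGrouper` has since been removed from pandas in favour of `pd.Grouper`; the minimal sketch below instead groups hypothetical daily data by `df.index.to_period('M')`, an assumption that keys each group by a monthly Period:

```python
import pandas as pd

# Hypothetical daily data spanning three months (not from the question).
idx = pd.date_range("2024-01-01", "2024-03-31", freq="D")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# One group per calendar month, labelled by a monthly Period.
gp = df.groupby(df.index.to_period("M"))

print(gp.ngroups)        # number of groups (one per month)
print(list(gp.groups))   # the dict keys are the Period labels, not positions

try:
    gp.groups[-1]        # -1 is treated as a dict key, and no group has that label
except KeyError as err:
    print("KeyError:", err)
```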
Answers:
You can call last, which computes the last values for each group, then use iloc to get a row and read its group value from the name attribute. There is probably a better way, but I haven't figured it out yet:
In [170]:
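import numpy as np
import pandas as pd
# dummy data ('b' is random, so its values change whenever this cell is re-run)
df = pd.DataFrame({'a':['1','2','2','4','5','2'], 'b':np.random.randn(6)})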
df
Out[170]:
a b
0 1 0.097176
1 2 -1.400536
2 2 0.352093
3 4 -0.696436
4 5 -0.308680
5 2 -0.217767
In [179]:
gp = df.groupby('a', sort=False)
gp.get_group(df.groupby('a').last().iloc[-1].name)
Out[179]:
a b
4 5 0.608724
In [180]:
df.groupby('a').last().iloc[-2:]
Out[180]:
b
a
4 0.390451
5 0.608724
In [181]:
mult_groups = gp.last().iloc[-2:].index
In [182]:
for gp_val in mult_groups:
    print(gp.get_group(gp_val))
a b
3 4 0.390451
a b
4 5 0.608724
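The same idea as a self-contained script, with made-up fixed values for `b` so the output is reproducible. It takes the last `n` keys from the index of `gp.last()`, which has one row per group in group order, and concatenates the matching groups; `last_groups` is a hypothetical helper name, not a pandas API:

```python
import pandas as pd

# Example data from above, with fixed (made-up) values instead of random ones.
df = pd.DataFrame({"a": ["1", "2", "2", "4", "5", "2"],
                   "b": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
gp = df.groupby("a")

def last_groups(gp, n=1):
    """Return the rows of the last n groups (in group order) as one DataFrame.

    n must be >= 1: index[-0:] would return every key.
    """
    # gp.last() is indexed by the group keys, in the same order as the groups.
    keys = gp.last().index[-n:]
    return pd.concat([gp.get_group(k) for k in keys])

print(last_groups(gp))       # rows of group '5'
print(last_groups(gp, n=2))  # rows of groups '4' and '5'
```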
Using Ed's example:
You can slice out the last group. Iterating over a GroupBy yields (key, DataFrame) pairs in group order (sorted by key by default, or in order of first appearance with sort=False), so itertools.islice can pick out the last pair.
In [12]: df = pd.DataFrame({'a':['1','2','2','4','5','2'], 'b':np.random.randn(6)})
In [13]: g = df.groupby('a')
In [14]: g.groups
Out[14]: {'1': [0], '2': [1, 2, 5], '4': [3], '5': [4]}
In [15]: import itertools
In [16]: list(itertools.islice(g,len(g)-1,len(g)))
Out[16]:
[('5', a b
4 5 -0.644857)]
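To get the last `n` groups, start the slice at `len(g) - n` (`len` of a GroupBy is its number of groups). A minimal sketch with hypothetical data whose first-appearance order differs from sorted order, showing how `sort=False` changes which groups count as "last"; `tail_groups` is an illustrative helper name:

```python
import itertools
import pandas as pd

# Hypothetical data: 'a' first appears in the order 5, 2, 4, 1 (sorted order is 1, 2, 4, 5).
df = pd.DataFrame({"a": ["5", "2", "2", "4", "1", "2"],
                   "b": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})

def tail_groups(grouped, n=1):
    """Return the last n (key, DataFrame) pairs in the GroupBy's iteration order."""
    start = max(len(grouped) - n, 0)  # clamp so n larger than the group count is safe
    return list(itertools.islice(grouped, start, None))

# Default sort=True: groups iterate by sorted key.
print([key for key, _ in tail_groups(df.groupby("a"), n=2)])              # → ['4', '5']
# sort=False: groups iterate in order of first appearance.
print([key for key, _ in tail_groups(df.groupby("a", sort=False), n=2)])  # → ['4', '1']
```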
The easiest way is to convert the groups to a DataFrame and index it as you would any DataFrame. The resulting DataFrame has one row per group, where the first column is the group key and the second column is the DataFrame for that group. The one-liner for the last group's DataFrame is:
last_dataframe = pd.DataFrame(df.groupby('whatever')).iloc[-1, 1]
If you want the index and group:
last_group = pd.DataFrame(df.groupby('whatever')).iloc[-1, :]
last_group[0]
is the index of the last group, and
last_group[1]
is the DataFrame of the last group.
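Like the DataFrame conversion, the sketch below relies on iteration yielding (key, DataFrame) pairs, but indexes a plain list of them instead of building an intermediate DataFrame. It also shows a different technique not used in the answers above, `GroupBy.ngroup()`, which numbers each row by its group's position, so a boolean mask selects the rows of the last `n` groups without materializing every group. The `whatever` column is replaced by the example's `a` column, with made-up fixed `b` values:

```python
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "2", "4", "5", "2"],
                   "b": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
grouped = df.groupby("a")

# Materialize every (key, DataFrame) pair, then index the list like any sequence.
last_key, last_df = list(grouped)[-1]
print(repr(last_key))  # → '5'
print(last_df)

# ngroup() labels each row with its group's number (0 .. ngroups - 1, in group order),
# so the rows of the last n groups are those whose number is >= ngroups - n.
n = 2
last_n_rows = df[grouped.ngroup() >= grouped.ngroups - n]
print(last_n_rows)  # rows of groups '4' and '5', kept in original row order
```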