Interweave groups in Pandas
Question:
I have a DataFrame that I want "interleaved" row-wise by groups.
For example, this DataFrame:
Group | Score |
---|---|
A | 10 |
A | 9 |
A | 8 |
B | 7 |
B | 6 |
B | 5 |
The desired result would be grabbing the first of A, and the first of B, then the second of A, then the second of B, etc.
Group | Score |
---|---|
A | 10 |
B | 7 |
A | 9 |
B | 6 |
A | 8 |
B | 5 |
Any ideas?
Answers:
You can use the cumcount of each Group as a sorting key:
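out = df.sort_values("Group", key=lambda _: df.groupby("Group").cumcount(), kind="stable")
Passing kind="stable" keeps rows with the same cumcount in their original order; the default quicksort does not guarantee this on larger frames.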
Or better, as suggested by @mozway, you can use one of these variants:
out = df.sort_values(by="Group", key=lambda s: s.groupby(s).cumcount(), kind="stable")
out = df.iloc[np.argsort(df.groupby("Group").cumcount(), kind="stable")]  # requires: import numpy as np
Output:
print(out)
  Group  Score
0     A     10
3     B      7
1     A      9
4     B      6
2     A      8
5     B      5
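To make the sorting key concrete, here is a minimal, self-contained sketch of the cumcount approach on the question's data. The variable names `rank` and `order` are only illustrative, and the call to `to_numpy()` is an extra step added here so that `np.argsort` works on a plain array.

```python
import numpy as np
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({"Group": list("AAABBB"), "Score": [10, 9, 8, 7, 6, 5]})

# Position of each row within its own group: 0, 1, 2 for A and 0, 1, 2 for B.
rank = df.groupby("Group").cumcount()

# A stable argsort keeps the original A-before-B order among rows of equal rank.
order = np.argsort(rank.to_numpy(), kind="stable")

out = df.iloc[order]
print(out)
```

Sorting on the within-group position rather than on the group label is what produces the round-robin order.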
Solution 1
Another possible solution computes the number of rows per group, n, then reshapes the DataFrame's index into one row per group and flattens it in column-major order (order='F'). The result is the row order to pass to iloc. This assumes equal-sized groups stored contiguously and a default integer index:
n = len(df) // df['Group'].nunique()
df.iloc[df.index.values.reshape(-1, n).flatten(order='F')]
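The intermediate arrays may make the reshape step easier to follow. The sketch below is a small variation, not the answer's exact code: it uses `np.arange(len(df))` instead of `df.index.values` so that it does not depend on the index labels, and it adds an explicit check for equal group sizes. The names `sizes`, `positions`, and `grid` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Group": list("AAABBB"), "Score": [10, 9, 8, 7, 6, 5]})

# The reshape trick only works when every group has the same number of rows
# and the rows of each group are contiguous.
sizes = df["Group"].value_counts()
if sizes.nunique() != 1:
    raise ValueError("groups must all have the same size")
n = int(sizes.iloc[0])

positions = np.arange(len(df))     # row positions, independent of index labels
grid = positions.reshape(-1, n)    # one row per group: [[0, 1, 2], [3, 4, 5]]
order = grid.flatten(order="F")    # read column by column: [0, 3, 1, 4, 2, 5]

out = df.iloc[order]
print(out)
```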
Solution 2
Yet another possible solution, based on a list comprehension. It may be less efficient than the previous one because it uses a groupby, and it also assumes a default integer index, since index labels are passed to iloc:
g = df.groupby('Group')
df.iloc[[index for y in zip(*[x.index for _, x in g]) for index in y]]
Output:
  Group  Score
0     A     10
3     B      7
1     A      9
4     B      6
2     A      8
5     B      5
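One thing to watch with the zip-based version: zip stops at the shortest group, so rows are silently dropped when groups differ in size. The sketch below, on hypothetical uneven data that is not from the question, swaps in `itertools.zip_longest` and skips the padding values. It is one possible adaptation, not part of the original answer.

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

# Hypothetical uneven data: A has three rows, B has only two.
df = pd.DataFrame({"Group": list("AAABB"), "Score": [10, 9, 8, 7, 6]})

# Row positions of each group, in order of first appearance.
labels = df["Group"].to_numpy()
positions = [np.flatnonzero(labels == key) for key in df["Group"].unique()]

# zip_longest pads the shorter groups with None, which is filtered out here.
order = [p for round_ in zip_longest(*positions) for p in round_ if p is not None]

out = df.iloc[order]
print(out)
```

The cumcount-based sort shown earlier in the thread also handles unequal group sizes without any padding.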