Pandas "rolling" groupby
Question:
Assuming this is my df:
group connected_to
0 1 1
1 2 0
2 2 1
3 2 2
4 3 5
5 4 4
6 3 7
7 5 5
What I want to get is the minimal group per set of connected rows.
So row 0 is connected to row 1, and thus they are in the same group. Row 2 is also connected to row 1, so it joins the group. Row 3 is connected to row 2, which has already joined the group, so it joins as well, and so on.
Row 4 is not connected to any row in the first group, so it starts a new group. The output should look like this:
group connected_to minimal_group
0 1 1 1
1 2 0 1
2 2 1 1
3 2 2 1
4 3 5 3
5 4 4 3
6 3 7 3
7 5 5 3
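To make the example reproducible, the sample frame can be rebuilt as below. The column values are copied from the table above. The sketch assumes the default RangeIndex (0..7) holds the row numbers that connected_to refers to. Under that reading, each row contributes one link, from its own index to its connected_to value:

```python
import pandas as pd

# Sample data from the question; the default RangeIndex (0..7) is assumed
# to hold the row numbers that connected_to refers to
df = pd.DataFrame({
    'group':        [1, 2, 2, 2, 3, 4, 3, 5],
    'connected_to': [1, 0, 1, 2, 5, 4, 7, 5],
})

# Each row links its own index to the row named in connected_to
edges = list(zip(df.index, df['connected_to']))
print(edges)
```

Seen this way, the rows form an undirected graph. The "connected rows" in the question are its connected components: {0, 1, 2, 3} and {4, 5, 6, 7}.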
I implemented it using a for loop inside a while loop, which is a really ugly solution.
Is there a more elegant way to do this in pandas?
Answers:
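Treat each row as an edge between its index and its connected_to value, find the connected components of that graph with networkx, and then take the minimal group per component: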
import networkx as nx

# Turn the index into a column named 'index' so it can serve as an edge endpoint
df1 = df.reset_index()

# Build an undirected graph with one edge per row: index -> connected_to
g = nx.from_pandas_edgelist(df1, 'index', 'connected_to')

# Assign every node the id of the connected component it belongs to
node2id = {}
for cid, component in enumerate(nx.connected_components(g)):
    for node in component:
        node2id[node] = cid

# Map each row's index to its component id and broadcast the minimal group per component
df['minimal_group'] = df1.groupby(df1['index'].map(node2id))['group'].transform('min')
print(df)
group connected_to minimal_group
0 1 1 1
1 2 0 1
2 2 1 1
3 2 2 1
4 3 5 3
5 4 4 3
6 3 7 3
7 5 5 3
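If pulling in networkx is not an option, you can do the connected-components step with a small union-find (disjoint-set) structure in plain Python instead. This is a swapped-in technique, not part of the answer above. The helper names find and union are illustrative. The sketch assumes the index values are the row numbers that connected_to refers to.

```python
import pandas as pd

df = pd.DataFrame({
    'group':        [1, 2, 2, 2, 3, 4, 3, 5],
    'connected_to': [1, 0, 1, 2, 5, 4, 7, 5],
})

# parent[x] points toward the representative (root) of x's component
parent = {}

def find(x):
    """Return the root of x's component, compressing the path on the way."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path directly at the root
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root

def union(a, b):
    """Merge the components containing a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[ra] = rb

# One union per row: link the row's index to its connected_to value
for i, j in zip(df.index, df['connected_to']):
    union(i, j)

# Label each row with its component root, then take the minimal group per component
roots = df.index.map(find)
df['minimal_group'] = df.groupby(roots)['group'].transform('min')
print(df)
```

On the sample data, this gives the same minimal_group column as the networkx version (1 for rows 0-3, 3 for rows 4-7). Union-find processes each row once with near-constant amortized cost. That avoids the repeated passes of a for-inside-while approach.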