Extract unique items in a column and map with all items in another column in pandas
Question:
I have a pandas dataframe df which looks like this:
  Col1 Col2  Label
0   D1  C38      1
1   D1  C65      1
2   D1  C53      1
3   D2  C02      1
4   D2  C01      1
5   D4  C73      1
I want to first extract all the unique items from Col1. Each unique item in Col1 needs to be mapped to all items in Col2, except for those it is already connected to with a label of 1 in the third column.
For example, D1 in Col1 appears three times with a label of 1 in the Label column. D1 should then be mapped to the remaining items in Col2, i.e., C02 C01 C73 C61 C03, and these new connections should be added to the same Col1 and Col2 columns with a label of 0.
The output dataframe needs to look like this:
   Col1 Col2  Label
0    D1  C38      1
1    D1  C65      1
2    D1  C53      1
3    D1  C02      0
4    D1  C01      0
5    D1  C73      0
6    D2  C02      1
7    D2  C01      1
8    D2  C38      0
9    D2  C65      0
10   D2  C53      0
11   D2  C73      0
12   D4  C73      1
13   D4  C02      0
14   D4  C01      0
15   D4  C38      0
16   D4  C65      0
17   D4  C53      0
Is there a way to do this? I would appreciate your suggestions.
Answers:
Here is one option to keep the order:
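import pandas as pd

cols = ['Col1', 'Col2']
# every (Col1, Col2) combination, in order of first appearance
idx = pd.MultiIndex.from_product([df[c].unique() for c in cols], names=cols)
out = (df
   # pairs missing from df are added with Label 0
   .set_index(cols).reindex(idx, fill_value=0).reset_index()
   # within each Col1 value, put the existing Label 1 rows first
   .sort_values(by=['Col1', 'Label'], ascending=[True, False],
                kind='stable', ignore_index=True)
)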
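To see what the `reindex` step contributes, it may help to look at the intermediate frame before any sorting. The following is a minimal sketch, assuming the sample data from the question (the `df` construction is reconstructed from the table shown there, and `full` is just a name chosen for this example):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Col1': ['D1', 'D1', 'D1', 'D2', 'D2', 'D4'],
    'Col2': ['C38', 'C65', 'C53', 'C02', 'C01', 'C73'],
    'Label': [1, 1, 1, 1, 1, 1],
})

cols = ['Col1', 'Col2']
# Cartesian product of the unique values: 3 Col1 values x 6 Col2 values
idx = pd.MultiIndex.from_product([df[c].unique() for c in cols], names=cols)

# Existing pairs keep their Label; pairs absent from df get fill_value=0
full = df.set_index(cols).reindex(idx, fill_value=0).reset_index()

print(len(idx))
print(full.head(6))
```

Note that this approach assumes each (Col1, Col2) pair occurs at most once in `df`; with duplicated pairs, `reindex` would be expected to raise an error, so they would need to be dropped or aggregated first.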
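Or, if the Col1 groups must keep their original order instead of being sorted:
cols = ['Col1', 'Col2']
idx = pd.MultiIndex.from_product([df[c].unique() for c in cols], names=cols)
out = (df
   .set_index(cols).reindex(idx, fill_value=0).reset_index()
   # sort=False keeps the Col1 groups in order of first appearance
   .groupby('Col1', group_keys=False, sort=False)
   # within each group, move the Label 1 rows to the top
   .apply(lambda g: g.sort_values(by='Label', ascending=False, kind='stable'))
   .reset_index(drop=True)
)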
Output:
   Col1 Col2  Label
0    D1  C38      1
1    D1  C65      1
2    D1  C53      1
3    D1  C02      0
4    D1  C01      0
5    D1  C73      0
6    D2  C02      1
7    D2  C01      1
8    D2  C38      0
9    D2  C65      0
10   D2  C53      0
11   D2  C73      0
12   D4  C73      1
13   D4  C38      0
14   D4  C65      0
15   D4  C53      0
16   D4  C02      0
17   D4  C01      0
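If you would rather not use a MultiIndex or `groupby(...).apply(...)`, a cross merge is another possible route. This is only a sketch, assuming pandas 1.2 or later (needed for `how='cross'`) and the sample data from the question; the helper column `_rank` and the variable names are made up for this example:

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Col1': ['D1', 'D1', 'D1', 'D2', 'D2', 'D4'],
    'Col2': ['C38', 'C65', 'C53', 'C02', 'C01', 'C73'],
    'Label': [1, 1, 1, 1, 1, 1],
})

# All combinations of the unique Col1 and Col2 values
pairs = (df[['Col1']].drop_duplicates()
         .merge(df[['Col2']].drop_duplicates(), how='cross'))

# Attach the known labels; pairs not present in df get NaN, then 0
out = pairs.merge(df, on=['Col1', 'Col2'], how='left')
out['Label'] = out['Label'].fillna(0).astype(int)

# Rank Col1 by first appearance so the groups keep their original order
first_seen = {v: i for i, v in enumerate(df['Col1'].unique())}
out = (out
   .assign(_rank=out['Col1'].map(first_seen))
   .sort_values(by=['_rank', 'Label'], ascending=[True, False],
                ignore_index=True)
   .drop(columns='_rank')
)

print(out)
```

On the sample data this should reproduce the output shown above. The explicit rank column only matters when the Col1 values are not already in sorted order.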