Assigning increasing integer numbers to distinct values that share identical values in the previous columns
Question:
I have a dataframe that goes like this
Index | One | Two | Three | Four | Five | Six |
---|---|---|---|---|---|---|
1 | A | – | – | – | – | – |
2 | A | B | C | – | – | – |
3 | A | B | C | F | L | – |
4 | A | B | C | F | M | S |
5 | A | B | D | G | N | – |
6 | A | B | D | H | O | – |
7 | A | B | D | I | P | T |
8 | A | B | E | J | Q | – |
9 | A | B | E | K | R | U |
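For reference, the frame above can be rebuilt like so (a sketch; I'm assuming the dashes are plain `-` placeholder strings and the index is the "Index" column):

```python
import pandas as pd

# Rebuild the questioner's frame; "-" marks an empty cell.
df = pd.DataFrame(
    {
        "One":   ["A"] * 9,
        "Two":   ["-", "B", "B", "B", "B", "B", "B", "B", "B"],
        "Three": ["-", "C", "C", "C", "D", "D", "D", "E", "E"],
        "Four":  ["-", "-", "F", "F", "G", "H", "I", "J", "K"],
        "Five":  ["-", "-", "L", "M", "N", "O", "P", "Q", "R"],
        "Six":   ["-", "-", "-", "S", "-", "-", "T", "-", "U"],
    },
    index=pd.Index(range(1, 10), name="Index"),
)
```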
And I would like an output like the following, where the distinct identifiers (which build on the combinations of the previous columns) are each assigned an increasing integer:
- Column "One" has only 1 distinct identifier ("A"), so all of them are substituted by the integer 1.
- Column "Two" has 2 distinct identifiers, "–" and "B" (this works like a regular pd.Categorical because they all share the same value in column "One").
- Column "Three" is where things get tricky for me. Row index 1 gets 1 because "–" is the only distinct identifier for the combination ("A", "–") of the two previous columns "One" and "Two". Row indices 2, 3 and 4 get 1 as well because "C" is the first distinct identifier for the combination ("A", "B") coming respectively from columns "One" and "Two". Rows 5, 6, 7 get 2, because "D" is the second distinct identifier for the combination ("A", "B"), etc…
- Last example: rows 3 and 4, column "Five". They get values 1 and 2 because they share the same path (A, B, C, F) up to column "Four", but in column "Five" they have distinct values (L and M).
Index | One | Two | Three | Four | Five | Six |
---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 2 | 1 | 1 | 1 | 1 |
3 | 1 | 2 | 1 | 2 | 1 | 1 |
4 | 1 | 2 | 1 | 2 | 2 | 1 |
5 | 1 | 2 | 2 | 1 | 1 | 1 |
6 | 1 | 2 | 2 | 2 | 1 | 1 |
7 | 1 | 2 | 2 | 3 | 1 | 1 |
8 | 1 | 2 | 3 | 1 | 1 | 1 |
9 | 1 | 2 | 3 | 2 | 1 | 1 |
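The numbering rule described in the bullets can be sketched in plain Python: keep one counter dictionary per (column position, prefix of previous values) pair, and hand out the next free integer the first time a value shows up under that prefix. Note this numbers values by first appearance, which happens to coincide with the expected table here:

```python
rows = [
    ("A", "-", "-", "-", "-", "-"),
    ("A", "B", "C", "-", "-", "-"),
    ("A", "B", "C", "F", "L", "-"),
    ("A", "B", "C", "F", "M", "S"),
    ("A", "B", "D", "G", "N", "-"),
    ("A", "B", "D", "H", "O", "-"),
    ("A", "B", "D", "I", "P", "T"),
    ("A", "B", "E", "J", "Q", "-"),
    ("A", "B", "E", "K", "R", "U"),
]

seen = {}  # (column position, prefix tuple) -> {value: assigned integer}
out = []
for row in rows:
    numbered = []
    for j, val in enumerate(row):
        counters = seen.setdefault((j, row[:j]), {})
        if val not in counters:
            counters[val] = len(counters) + 1  # next free integer for this prefix
        numbered.append(counters[val])
    out.append(tuple(numbered))
```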
Apologies for the small essay.
And thanks for your help.
I tried to loop over multiple groupbys but I got lost in it.
Regards,
Dario
Answers:
IIUC you need to perform successive groupby.ngroup, using the previous column as grouper:

    import pandas as pd

    out = pd.DataFrame(index=df.index)
    out[df.columns[0]] = df.groupby(df.columns[0]).ngroup().add(1)
    for i in range(1, df.shape[1]):
        out[df.columns[i]] = (df
            .groupby(df.columns[i-1], group_keys=False)
            .apply(lambda g: g.groupby(df.columns[i]).ngroup().add(1))
            .squeeze()
        )
    print(out)
If you need to group by all previous columns, change the loop to:

    for i in range(1, df.shape[1]):
        out[df.columns[i]] = (df
            .groupby(list(df.columns[:i]), group_keys=False)
            .apply(lambda g: g.groupby(df.columns[i]).ngroup().add(1))
            .squeeze()
        )
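The two loops agree on the sample data, but they can differ: grouping by only the previous column merges rows whose earlier paths diverge. A small hypothetical frame (names made up) where that matters:

```python
import pandas as pd

df2 = pd.DataFrame({
    "One":   ["A", "A", "B", "B"],
    "Two":   ["X", "Y", "X", "Y"],
    "Three": ["P", "P", "Q", "Q"],
})

# Variant 1: group column "Three" by the previous column only.
# Rows 0 and 2 land in the same "Two" == "X" group even though
# their "One" values differ, so "Q" gets number 2.
prev_only = (df2
    .groupby("Two", group_keys=False)
    .apply(lambda g: g.groupby("Three").ngroup().add(1))
    .squeeze()
    .sort_index()
)

# Variant 2: group by all previous columns.
# Every (One, Two) path is distinct, so each "Three" value is the
# first (and only) one seen under its prefix and gets number 1.
all_prev = (df2
    .groupby(["One", "Two"], group_keys=False)
    .apply(lambda g: g.groupby("Three").ngroup().add(1))
    .squeeze()
    .sort_index()
)
```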
Output:

           One  Two  Three  Four  Five  Six
    Index
    1        1    1      1     1     1    1
    2        1    2      1     1     1    1
    3        1    2      1     2     1    1
    4        1    2      1     2     2    1
    5        1    2      2     1     1    1
    6        1    2      2     2     1    1
    7        1    2      2     3     1    1
    8        1    2      3     1     1    1
    9        1    2      3     2     1    1
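An apply-free alternative is also possible (a sketch of my own, not part of the answer above): map each column to sorted category codes, which order values the same way ngroup does, then take a dense rank of those codes within each prefix group. On this data it reproduces the same output:

```python
import pandas as pd

def number_by_prefix(df):
    """Integer labels per column, restarting within each prefix path."""
    out = pd.DataFrame(index=df.index)
    for i, col in enumerate(df.columns):
        # Category codes number distinct values in sorted order, like ngroup.
        codes = df[col].astype("category").cat.codes
        if i == 0:
            out[col] = codes.rank(method="dense").astype(int)
        else:
            # Dense-rank the codes within each group of all previous columns.
            grouper = [df[c] for c in df.columns[:i]]
            out[col] = codes.groupby(grouper).rank(method="dense").astype(int)
    return out
```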