Assign a number to each group in a pandas DataFrame based on multiple columns


I have the following dataframe (nodes):

                        nodeType            subType
Supplier 1              Supplier            Supplier
Supplier 2              Supplier            Supplier
Supplier 3              Supplier            Supplier of another type
System Integrator       System Integrator   System Integrator
Availability Zone 1     Availability Zone   Server
Availability Zone 2     Availability Zone   Warehouse
Availability Zone 3     Availability Zone   Warehouse
Availability Zone 4     Availability Zone   Warehouse

I would like a new column that assigns a number to each distinct subType within its nodeType, restarting at 0 for every nodeType, so that rows with the same nodeType and subType get the same number.

Expected result:

                        nodeType            subType                     enumeration
Supplier 1              Supplier            Supplier                    0
Supplier 2              Supplier            Supplier                    0
Supplier 3              Supplier            Supplier of another type    1
System Integrator       System Integrator   System Integrator           0
Availability Zone 1     Availability Zone   Server                      0
Availability Zone 2     Availability Zone   Warehouse                   1
Availability Zone 3     Availability Zone   Warehouse                   1
Availability Zone 4     Availability Zone   Warehouse                   1

Up to this point, my best approach was to use

nodes["enumeration"] = nodes.groupby("nodeType").subType.cumcount()

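but this doesn't give what I expect: cumcount numbers every row within a nodeType (0, 1, 2, ...) instead of giving rows with the same subType the same number.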

Thanks in advance

Asked By: Lorenzo Gutiérrez
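To see why cumcount does not fit here, the sketch below rebuilds the question's data (the "Supplier 1", "Availability Zone 1", ... row labels are left out, an assumption that does not change the result) and prints what cumcount returns. It counts rows inside each nodeType, so a repeated subType still gets an increasing number.

```python
import pandas as pd

# The question's data, rebuilt inline; the row labels are omitted for brevity
nodes = pd.DataFrame({
    "nodeType": ["Supplier", "Supplier", "Supplier", "System Integrator",
                 "Availability Zone", "Availability Zone",
                 "Availability Zone", "Availability Zone"],
    "subType": ["Supplier", "Supplier", "Supplier of another type",
                "System Integrator", "Server", "Warehouse",
                "Warehouse", "Warehouse"],
})

# cumcount numbers every row inside its nodeType group (0, 1, 2, ...),
# whether or not the subType repeats
by_row = nodes.groupby("nodeType")["subType"].cumcount()
print(by_row.tolist())  # [0, 1, 2, 0, 0, 1, 2, 3]
```

The expected enumeration column is [0, 0, 1, 0, 0, 1, 1, 1], so the numbering has to be per distinct subType, not per row.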



The solution can be achieved with the following command:

nodes["nodeType_enum"] = nodes.groupby("nodeType",group_keys=False).apply(lambda x: x.groupby("subType").ngroup())

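I had first tried this same command without setting group_keys to False, and it did not give the expected result; once I set group_keys=False, I got what I was expecting. (In recent pandas versions, the default group_keys=True prepends the nodeType keys to the index of the apply result, so it no longer lines up with the rows of nodes.)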

Answered By: Lorenzo Gutiérrez
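An alternative that avoids apply (and with it the group_keys setting) is to factorize subType inside each nodeType with transform, which returns a result aligned with the original rows. This is a minimal sketch using a different technique from the answer above, with the question's data rebuilt inline and the column named enumeration as in the question; sort=True is chosen so the codes follow the sorted order of the subTypes, which is also the order ngroup uses by default.

```python
import pandas as pd

# The question's data, rebuilt inline; the row labels are omitted for brevity
nodes = pd.DataFrame({
    "nodeType": ["Supplier", "Supplier", "Supplier", "System Integrator",
                 "Availability Zone", "Availability Zone",
                 "Availability Zone", "Availability Zone"],
    "subType": ["Supplier", "Supplier", "Supplier of another type",
                "System Integrator", "Server", "Warehouse",
                "Warehouse", "Warehouse"],
})

# Within each nodeType, map every distinct subType to an integer code.
# pd.factorize(..., sort=True)[0] gives codes for the sorted distinct values,
# and transform puts the per-group results back in the original row order.
nodes["enumeration"] = (
    nodes.groupby("nodeType")["subType"]
    .transform(lambda s: pd.factorize(s, sort=True)[0])
)

print(nodes["enumeration"].tolist())  # [0, 0, 1, 0, 0, 1, 1, 1]
```

Dropping sort=True would instead number the subTypes in order of first appearance within each nodeType.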

Try using ngroup, which numbers each group from 0 to the number of groups minus 1:

import pandas as pd
from io import StringIO

# nodeType is the parent; each unique subType within it is numbered by ngroup
data = """nodeType,subType
Supplier,Supplier
Supplier,Supplier
Supplier,Supplier of another type
System Integrator,System Integrator
Availability Zone,Server
Availability Zone,Warehouse
Availability Zone,Warehouse
Availability Zone,Warehouse
"""

df = pd.read_csv(StringIO(data), sep=",", usecols=["nodeType", "subType"])

df["nodeType_enum"] = df.groupby("nodeType", group_keys=False).apply(lambda x: x.groupby("subType").ngroup())
print(df)
Answered By: Golden Lion
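Building on ngroup, a fully vectorized sketch (again an alternative, not the code from the answer above) numbers every (nodeType, subType) pair once and then shifts each nodeType's numbers so they start at 0. It relies on ngroup's default sort=True, which orders the pairs by nodeType first and therefore keeps the subTypes of one nodeType on consecutive numbers. The column name enumeration is taken from the question.

```python
import pandas as pd

# The question's data, rebuilt inline; the row labels are omitted for brevity
nodes = pd.DataFrame({
    "nodeType": ["Supplier", "Supplier", "Supplier", "System Integrator",
                 "Availability Zone", "Availability Zone",
                 "Availability Zone", "Availability Zone"],
    "subType": ["Supplier", "Supplier", "Supplier of another type",
                "System Integrator", "Server", "Warehouse",
                "Warehouse", "Warehouse"],
})

# One number per (nodeType, subType) pair; with sort=True the pairs are sorted
# by nodeType, then subType, so each nodeType's subTypes are numbered consecutively
pair_id = nodes.groupby(["nodeType", "subType"]).ngroup()

# Subtract the smallest pair number within each nodeType so numbering restarts at 0
nodes["enumeration"] = pair_id - pair_id.groupby(nodes["nodeType"]).transform("min")

print(nodes["enumeration"].tolist())  # [0, 0, 1, 0, 0, 1, 1, 1]
```

This avoids calling a Python function per group, which may matter on large frames, at the cost of depending on the sorted group order.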