Assign a number to each group in a pandas DataFrame based on multiple columns

Question:

I have the following dataframe (nodes):

                        nodeType    subType
Supplier 1              Supplier    Supplier
Supplier 2              Supplier    Supplier
Supplier 3              Supplier    Supplier of another type
System Integrator       System Integrator   System Integrator
Availability Zone 1     Availability Zone   Server
Availability Zone 2     Availability Zone   Warehouse
Availability Zone 3     Availability Zone   Warehouse
Availability Zone 4     Availability Zone   Warehouse

I would like to add a new column that assigns a number to each distinct "subType" within the same nodeType, so that rows sharing both nodeType and subType get the same number.

Expected result:

                            nodeType            subType                    enumeration
  Supplier 1                Supplier            Supplier                    0
  Supplier 2                Supplier            Supplier                    0
  Supplier 3                Supplier            Supplier of another type    1
  System Integrator         System Integrator   System Integrator           0
  Availability Zone 1       Availability Zone   Server                      0
  Availability Zone 2       Availability Zone   Warehouse                   1
  Availability Zone 3       Availability Zone   Warehouse                   1
  Availability Zone 4       Availability Zone   Warehouse                   1

Up to this point, my best approach was to use

nodes["enumeration"] = nodes.groupby("nodeType").subType.cumcount()

but this doesn't yield what I am expecting.

Thanks in advance

Asked By: Lorenzo Gutiérrez

||

Answers:

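The result can be obtained with the following command:

nodes["nodeType_enum"] = nodes.groupby("nodeType", group_keys=False).apply(lambda x: x.groupby("subType").ngroup())

I had first tried this same command without setting "group_keys" to False. Once I set it, I got what I was expecting. With group_keys left at its default, the result can carry nodeType as an extra index level (depending on the pandas version), so it no longer lines up with the index of nodes.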
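
To illustrate why the cumcount attempt from the question falls short, here is a minimal sketch that rebuilds the sample frame by hand and puts both results side by side. The column names "row_counter" and "enumeration" are only illustrative. cumcount numbers the rows inside each nodeType, whereas the nested ngroup numbers the distinct subType values inside each nodeType.

```python
import pandas as pd

# Sample data rebuilt from the question.
nodes = pd.DataFrame(
    {
        "nodeType": ["Supplier"] * 3 + ["System Integrator"] + ["Availability Zone"] * 4,
        "subType": [
            "Supplier", "Supplier", "Supplier of another type",
            "System Integrator",
            "Server", "Warehouse", "Warehouse", "Warehouse",
        ],
    },
    index=[
        "Supplier 1", "Supplier 2", "Supplier 3", "System Integrator",
        "Availability Zone 1", "Availability Zone 2",
        "Availability Zone 3", "Availability Zone 4",
    ],
)

# cumcount is a running row counter within each nodeType,
# so identical subTypes still receive different numbers.
nodes["row_counter"] = nodes.groupby("nodeType")["subType"].cumcount()

# ngroup inside each nodeType labels every distinct subType once.
nodes["enumeration"] = nodes.groupby("nodeType", group_keys=False).apply(
    lambda g: g.groupby("subType").ngroup()
)

print(nodes[["subType", "row_counter", "enumeration"]])
# row_counter: 0, 1, 2, 0, 0, 1, 2, 3
# enumeration: 0, 0, 1, 0, 0, 1, 1, 1
```

Note that ngroup numbers the groups in sorted key order by default, which happens to match the expected result for this sample.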

Answered By: Lorenzo Gutiérrez

Try using ngroup, which numbers each group (it labels the groups rather than counting them):

import pandas as pd
from io import StringIO

print("nodeType is the parent; each distinct subType within it gets its own group number")
data="""nodeType,subType
Supplier,Supplier
Supplier,Supplier
Supplier,Supplier of another type
System Integrator,System Integrator
Availability Zone,Server
Availability Zone,Warehouse
Availability Zone,Warehouse
Availability Zone,Warehouse
"""

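# Parse the inline CSV into a DataFrame.
df = pd.read_csv(StringIO(data), sep=",", usecols=["nodeType", "subType"])

# Within each nodeType, give every distinct subType its own number (0, 1, ...).
df["nodeType_enum"] = df.groupby("nodeType", group_keys=False).apply(lambda x: x.groupby("subType").ngroup())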
print(df)
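
As a possible alternative that avoids the nested groupby inside apply, here is a minimal sketch using transform together with pd.factorize. It is not taken from the answers above, and the column name "nodeType_enum_alt" is only illustrative. One difference to keep in mind: factorize numbers values in order of first appearance, while ngroup uses sorted order by default, so the two agree on this sample only because each nodeType's subTypes first appear in sorted order.

```python
import pandas as pd
from io import StringIO

data = """nodeType,subType
Supplier,Supplier
Supplier,Supplier
Supplier,Supplier of another type
System Integrator,System Integrator
Availability Zone,Server
Availability Zone,Warehouse
Availability Zone,Warehouse
Availability Zone,Warehouse
"""

df = pd.read_csv(StringIO(data))

# For each nodeType, replace every subType by an integer code.
# pd.factorize returns (codes, uniques); only the codes are needed here.
df["nodeType_enum_alt"] = df.groupby("nodeType")["subType"].transform(
    lambda s: pd.factorize(s)[0]
)

print(df)
```

Because transform returns a result aligned with the original rows, no group_keys handling is needed in this variant.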
Answered By: Golden Lion