assign number to group in pandas dataframe based on multiple columns
Question:
I have the following dataframe (nodes):
                    nodeType           subType
Supplier 1          Supplier           Supplier
Supplier 2          Supplier           Supplier
Supplier 3          Supplier           Supplier of another type
System Integrator   System Integrator  System Integrator
Availability Zone 1 Availability Zone  Server
Availability Zone 2 Availability Zone  Warehouse
Availability Zone 3 Availability Zone  Warehouse
Availability Zone 4 Availability Zone  Warehouse
I would like a new column that numbers the distinct subType values within each nodeType: rows with the same nodeType and the same subType get the same number, and the numbering restarts at 0 for every nodeType.
Expected result:
                    nodeType           subType                   enumeration
Supplier 1          Supplier           Supplier                  0
Supplier 2          Supplier           Supplier                  0
Supplier 3          Supplier           Supplier of another type  1
System Integrator   System Integrator  System Integrator         0
Availability Zone 1 Availability Zone  Server                    0
Availability Zone 2 Availability Zone  Warehouse                 1
Availability Zone 3 Availability Zone  Warehouse                 1
Availability Zone 4 Availability Zone  Warehouse                 1
So far, my best approach has been:
nodes["enumeration"] = nodes.groupby("nodeType").subType.cumcount()
but this doesn't give the result I expect.
Thanks in advance
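A minimal illustration of why the attempt falls short, assuming pandas is installed and rebuilding the nodes frame from the table above: cumcount numbers the rows inside each nodeType group (0, 1, 2, ...), so repeated subType values receive different numbers instead of the same one.

```python
import pandas as pd

# Rebuild the question's frame; the row labels become the index
nodes = pd.DataFrame(
    {
        "nodeType": ["Supplier"] * 3 + ["System Integrator"] + ["Availability Zone"] * 4,
        "subType": [
            "Supplier", "Supplier", "Supplier of another type",
            "System Integrator",
            "Server", "Warehouse", "Warehouse", "Warehouse",
        ],
    },
    index=[
        "Supplier 1", "Supplier 2", "Supplier 3",
        "System Integrator",
        "Availability Zone 1", "Availability Zone 2",
        "Availability Zone 3", "Availability Zone 4",
    ],
)

# cumcount gives each row's position inside its nodeType group,
# so it counts rows rather than distinct subType values
row_position = nodes.groupby("nodeType").subType.cumcount()
print(row_position.tolist())  # [0, 1, 2, 0, 0, 1, 2, 3]
```

The expected enumeration is [0, 0, 1, 0, 0, 1, 1, 1], so what is needed is a per-group numbering of distinct values, not of rows.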
Answers:
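The following command gives the expected result:
nodes["nodeType_enum"] = nodes.groupby("nodeType", group_keys=False).apply(lambda x: x.groupby("subType").ngroup())
I had tried this same command without setting group_keys=False; once I set it, I got the result I was expecting. With group_keys=False the result keeps the original row labels instead of gaining the nodeType key as an extra index level, so it can be assigned back as a column.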
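An alternative sketch that avoids apply (and therefore the group_keys question) is to factorize subType inside each nodeType with transform. This swaps in pd.factorize instead of ngroup; sort=True is passed so the codes follow the sorted order of the values, matching the default sorted numbering of ngroup. The small nodes frame below is rebuilt from the question as an assumption.

```python
import pandas as pd

# Frame rebuilt from the question (index labels omitted for brevity)
nodes = pd.DataFrame({
    "nodeType": ["Supplier", "Supplier", "Supplier", "System Integrator",
                 "Availability Zone", "Availability Zone",
                 "Availability Zone", "Availability Zone"],
    "subType": ["Supplier", "Supplier", "Supplier of another type",
                "System Integrator", "Server", "Warehouse",
                "Warehouse", "Warehouse"],
})

# For each nodeType, map every distinct subType to 0, 1, 2, ...
# factorize(sort=True) assigns codes in sorted order of the values;
# transform returns a result aligned with the original rows.
nodes["enumeration"] = nodes.groupby("nodeType")["subType"].transform(
    lambda s: pd.factorize(s, sort=True)[0]
)
print(nodes)
```

With sort=False, factorize would instead number the subTypes in order of first appearance within each nodeType, which may be preferable if the original row order carries meaning.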
Try using ngroup, which numbers each group from 0 to (number of groups - 1):
import pandas as pd
from io import StringIO

print("nodeType is the parent; each unique subType within it is numbered with ngroup")
data = """nodeType,subType
Supplier,Supplier
Supplier,Supplier
Supplier,Supplier of another type
System Integrator,System Integrator
Availability Zone,Server
Availability Zone,Warehouse
Availability Zone,Warehouse
Availability Zone,Warehouse
"""
df = pd.read_csv(StringIO(data))
# Within each nodeType, number the distinct subTypes 0, 1, 2, ... with ngroup();
# group_keys=False keeps df's row labels so the result can be assigned as a column
df["nodeType_enum"] = df.groupby("nodeType", group_keys=False).apply(lambda x: x.groupby("subType").ngroup())
print(df)
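A fully vectorized variant, offered as a sketch and a different technique from the answers above, avoids calling a Python function once per group: number every (nodeType, subType) pair with ngroup, then subtract the smallest number seen within each nodeType. Because sort=True numbers the pairs in sorted key order, the pairs belonging to one nodeType receive consecutive numbers, so the subtraction restarts the count at 0.

```python
import pandas as pd
from io import StringIO

data = """nodeType,subType
Supplier,Supplier
Supplier,Supplier
Supplier,Supplier of another type
System Integrator,System Integrator
Availability Zone,Server
Availability Zone,Warehouse
Availability Zone,Warehouse
Availability Zone,Warehouse
"""
df = pd.read_csv(StringIO(data))

# 1) One id per distinct (nodeType, subType) pair, in sorted key order
pair_id = df.groupby(["nodeType", "subType"], sort=True).ngroup()

# 2) The smallest pair id inside each nodeType
first_id = pair_id.groupby(df["nodeType"]).transform("min")

# 3) Shift so the numbering restarts at 0 for every nodeType
df["nodeType_enum"] = pair_id - first_id
print(df)
```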