using np cut and raise error bins must increase on python pandas

Question:

I have a dataframe like this :

Code    Notes   Qty
AAD     2A 2K   17
AAD     2A 1K    1 
AAG     2A 3K   12 
AAG     4A 4K    2
AAG     5A 5K    1
ALA     1A 2K   10
ALA     1A 1K    3

And my aim is to find class for EACH code, meaning Qty with biggest 80% of sum is Class 1, and rest 15% is class 2, and rest class 3. So, AAD sum is 18, that means AAD with 2A 2K (its cumsum 16 which is in scope of 80% of 18+4) is class 1 and AAD with 2A 1K is class 2. And so on.
Someone help me from my previous question to use pd.cut (I dont know if I am allowed to mention a username here). So I am trying to modified the code and apply to this new problem but the error says ‘bins must increase monotically`.
This is my code

df['cumsum']=df.groupby('Code')['Qty'].cumsum()
total = df.groupby('Code')['Qty'].sum()
bins = [total*0.15,total*0.05,total*0.8]
labels = ['class3','class2','class1']
df['class']=pd.cut(df['cumsum'], bins=bins, labels=labels, right=False)

this is result I want to achieve :

Code    Notes   Qty   Class
AAD     2A 2K   17    class1
AAD     2A 1K    1    class2
AAG     2A 3K   12    class1 
AAG     4A 4K    2    class2
AAG     5A 5K    1    class3
ALA     1A 2K   10    class1
ALA     1A 1K    3    class2  
Asked By: beginner

||

Answers:

This is what I think you want to do:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Code': ['AAD', 'AAD', 'AAG', 'AAG', 'AAG', 'ALA', 'ALA'],
    'Notes': ['2A 2K', '2A 1K', '2A 3K', '4A 4K', '5A 5K', '1A 2K', '1A 1K'],
    'Qty': [14, 4, 12, 10, 1, 10, 8]
})

df['cumsum'] = df.groupby('Code')['Qty'].cumsum()
df['total'] = df.groupby('Code')['Qty'].transform('sum')
df['class'] = np.where(df['cumsum'] <= df['total']*0.8, 'class1', 
                       np.where(df['cumsum'] <= df['total']*0.95, 'class3', 'class2'))

df = df.drop(['cumsum', 'total'], axis=1)

print(df)

which returns:

 Code  Notes  Qty   class
0  AAD  2A 2K   14  class1
1  AAD  2A 1K    4  class2
2  AAG  2A 3K   12  class1
3  AAG  4A 4K   10  class2
4  AAG  5A 5K    1  class2
5  ALA  1A 2K   10  class1
6  ALA  1A 1K    8  class2

You can combine cut and groupby.transform:

df['class'] = pd.cut(df.groupby('Code')['Qty'].transform('sum').rdiv(df['Qty']), 
                     bins=[0, 0.15, 0.8, 1], labels=['class3', 'class2', 'class1'])

NB. if you want 80% of the total per group, this doesn’t match the provided output. Maybe you want 80% of the maximum? In which can use transform('max').

Output:

  Code  Notes  Qty   class
0  AAD  2A 2K   17  class1
1  AAD  2A 1K    1  class3
2  AAG  2A 3K   12  class2
3  AAG  4A 4K    2  class3
4  AAG  5A 5K    1  class3
5  ALA  1A 2K   10  class2
6  ALA  1A 1K    3  class2

If you want to use a groupby.cumsum (which is not really what you describe):

df['class'] = pd.cut(df.groupby('Code')['Qty'].cumsum().rdiv(df['Qty']), 
                     bins=[0, 0.8, 1], labels=['class2', 'class1'])

Output:

  Code  Notes  Qty   class
0  AAD  2A 2K   14  class1
1  AAD  2A 1K    4  class2
2  AAG  2A 3K   12  class1
3  AAG  4A 4K   10  class2
4  AAG  5A 5K    4  class2
5  ALA  1A 2K   10  class1
6  ALA  1A 1K    8  class2
Answered By: mozway
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.