Python – how do I use pd.cut correctly?

Question:

I have a column in a dataframe called ‘STREET_NO’ and I’m trying to add a new column in the dataframe called ‘BIN’. I am trying to use pd.cut() to take the ‘STREET_NO’ (such as 1290) and change it to 1200. In other words, for street numbers from 0-300, I would like the new ‘BIN’ column to show 0. For street numbers from 301-600, I would like the ‘BIN’ column to show 300, and so on. My ‘STREET_NO’ values range from 1 – 99999. Currently I have:

df['BIN'] = pd.cut(x=df['STREET_NO'], bins=[0,300,600,900,1200,1500,1800,2100], labels=['0','300','600','900','1200','1500','1800']

I can simply keep adding more and more numbers to my ‘bins’ and ‘labels’ until I reach the final 99999, but is there an easier way to do this?

Answers:

You can use range() function for both bins and labels, just make sure the ranges overlap the total number of rows:

import pandas as pd

df = pd.DataFrame({'STREET_NO': range(1, 100000)}) # range end is n+1 to reproduce 1-99999
df['BIN'] = pd.cut(x=df['STREET_NO'], 
                   bins=list(range(0, 100500, 300)), # so 99901 would be in the last bin
                   labels=list(range(0, 100200, 300))) # labels are bins-1

print(df.tail(300))

# Output:

        STREET_NO   BIN
99599   99600       99300
99600   99601       99600
99601   99602       99600
99602   99603       99600
99603   99604       99600
...     ...         ...
99994   99995       99900
99995   99996       99900
99996   99997       99900
99997   99998       99900
99998   99999       99900
400 rows × 2 columns
Answered By: n.shabankin