How can I add a new category to a categorical variable after a qcut?

Question:

I created a categorical variable and I want to add a new category to it for a specific value of another variable.

I have a dataframe with a variable Score that takes values from 0 to 100. I split it into deciles, but I want a separate category for one specific value:

df['Score_pr']=pd.qcut(df['Score'] ,10,duplicates='drop')

df.loc[X_n['Score']==1,'Score_pr']='0'

I expected a new category 0 for all cases that had Score=1,
but I got this message:

Cannot setitem on a Categorical with a new category, set the
categories first

Asked By: Marina

Answers:

The error literally says that you need to set the category before assigning a value to it. So, create it first with Series.cat.add_categories (see the pandas documentation for that method).

Since you didn’t provide an output, I don’t know if that’s what you were looking for, but I think this is it.

import pandas as pd

df = pd.DataFrame({'Score': [1, 2, 3, 4, 5, 6] * 100})
print(df.head())
#      Score
# 0      1
# 1      2
# 2      3
# 3      4
# 4      5
df['Score_pr'] = pd.qcut(df['Score'] , 10, duplicates='drop')
print(df.head())
#      Score      Score_pr
# 0      1  (0.999, 2.0]
# 1      2  (0.999, 2.0]
# 2      3    (2.0, 3.0]
# 3      4    (3.5, 4.0]
# 4      5    (4.0, 5.0]
# Register '0' as a valid category first; only then is the assignment allowed.
df['Score_pr'] = df['Score_pr'].cat.add_categories('0')
df.loc[df['Score'] == 1, 'Score_pr'] = '0'
print(df.head())
#      Score      Score_pr
# 0      1             0
# 1      2  (0.999, 2.0]
# 2      3    (2.0, 3.0]
# 3      4    (3.5, 4.0]
# 4      5    (4.0, 5.0]
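
To see the rule in isolation, here is a minimal sketch on a toy Series. The fruit labels and the name s are made up for illustration and are not taken from the question. Assigning a value that is not among the categories is expected to be rejected, and the same assignment goes through once the category has been added:

```python
import pandas as pd

# Hypothetical toy data; the labels are placeholders.
s = pd.Series(['apple', 'pear', 'apple'], dtype='category')

try:
    s.loc[s == 'pear'] = 'other'   # 'other' is not a category yet
except (TypeError, ValueError) as err:
    # The exception type has differed between pandas versions.
    print('assignment rejected:', err)

s = s.cat.add_categories('other')  # register the new category first
s.loc[s == 'pear'] = 'other'       # now the assignment is allowed
print(s.tolist())                  # ['apple', 'other', 'apple']
```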

And if you want to reorder so that the ‘0’ comes as the first category…

# add_categories appended '0' as the last category.
cat = df['Score_pr'].cat.categories.tolist()
# Drop it from the end and put it back at the front.
cat = cat[:-1]
cat.insert(0, '0')
df['Score_pr'] = df['Score_pr'].cat.reorder_categories(cat)
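
A slightly shorter variant of the same idea, as a sketch only: remember the interval categories before adding '0', then pass the full desired order to reorder_categories. The name old_categories is introduced here for illustration; the sample data is the same toy frame used earlier in this answer.

```python
import pandas as pd

df = pd.DataFrame({'Score': [1, 2, 3, 4, 5, 6] * 100})
df['Score_pr'] = pd.qcut(df['Score'], 10, duplicates='drop')

# Remember the interval categories before '0' is added.
old_categories = df['Score_pr'].cat.categories.tolist()

# Add '0' (it is appended last), then move it to the front.
df['Score_pr'] = (
    df['Score_pr']
    .cat.add_categories('0')
    .cat.reorder_categories(['0'] + old_categories)
)
df.loc[df['Score'] == 1, 'Score_pr'] = '0'

print(df['Score_pr'].cat.categories.tolist())
```

Because '0' is a string and the other categories are intervals, the order among them is purely the one you specify here; pandas does not infer it.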
Answered By: IMCoins

At least with modern pandas versions, ordering the new value first can be done in one (long) line:

import numpy as np

# Add '0' as a category value:
df['Score_pr'] = df['Score_pr'].cat.add_categories('0')
# Order it before the other values:
df['Score_pr'] = df['Score_pr'].cat.reorder_categories(np.roll(df['Score_pr'].cat.categories, 1))
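
This works because add_categories appends the new category at the end, and np.roll with a shift of 1 moves the last element to the front while keeping the rest in order. A small illustration on a made-up array (the labels are placeholders, not real qcut intervals):

```python
import numpy as np

# Hypothetical category labels, with the newly added '0' in last position.
categories = np.array(['low', 'mid', 'high', '0'], dtype=object)

# Shift every element one position to the right; the last wraps to the front.
rolled = np.roll(categories, 1)
print(rolled.tolist())  # ['0', 'low', 'mid', 'high']
```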
Answered By: Michel de Ruiter