How can I add a new category to a categorical variable after a qcut?
Question:
I created a categorical variable and I want to add a new category to it for a specific value of another variable.
I have a dataframe with a variable Score that takes values from 0 to 100. I split it into deciles, but I want to create a new category for a specific value:
df['Score_pr']=pd.qcut(df['Score'] ,10,duplicates='drop')
df.loc[X_n['Score']==1,'Score_pr']='0'
I expected a new category 0 for all cases that had Score=1, but I got this message:
Cannot setitem on a Categorical with a new category, set the categories first
Answers:
The error literally says that you need to set the category before assigning something to it. So, create it with Series.cat.add_categories (see the pandas documentation).
Since you didn’t provide the expected output, I don’t know if this is exactly what you were looking for, but I think this is it.
import pandas as pd

df = pd.DataFrame({'Score': [1, 2, 3, 4, 5, 6] * 100})
print(df.head())
# Score
# 0 1
# 1 2
# 2 3
# 3 4
# 4 5
df['Score_pr'] = pd.qcut(df['Score'] , 10, duplicates='drop')
print(df.head())
# Score Score_pr
# 0 1 (0.999, 2.0]
# 1 2 (0.999, 2.0]
# 2 3 (2.0, 3.0]
# 3 4 (3.5, 4.0]
# 4 5 (4.0, 5.0]
# Register '0' as a valid category first, then assign it
df['Score_pr'] = df['Score_pr'].cat.add_categories('0')
df.loc[df['Score'] == 1, 'Score_pr'] = '0'
print(df.head())
# Score Score_pr
# 0 1 0
# 1 2 (0.999, 2.0]
# 2 3 (2.0, 3.0]
# 3 4 (3.5, 4.0]
# 4 5 (4.0, 5.0]
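To make the cause of the error concrete, here is a minimal sketch on a small made-up Series (the names `scores`, `binned` and `failed`, and the sample values, are assumptions for illustration, not the asker's data). It tries the assignment before and after registering the category:

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
scores = pd.Series([1, 2, 3, 4, 5, 6])
binned = pd.qcut(scores, 3)  # categorical Series of intervals

# Assigning a label that is not among the categories is rejected.
# The exception type has varied across pandas versions, so both
# TypeError and ValueError are caught here.
try:
    binned.loc[scores == 1] = '0'
    failed = False
except (TypeError, ValueError):
    failed = True

# Registering the label first makes the same assignment valid.
binned = binned.cat.add_categories('0')
binned.loc[scores == 1] = '0'

print(failed)
print(binned.iloc[0])
print(list(binned.cat.categories))
```

The new label ends up last in the category list, which is why a reordering step is needed if it should come first.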
And if you want to reorder so that the ‘0’ comes as the first category…
# add_categories appended '0' at the end; move it to the front
cat = df['Score_pr'].cat.categories.tolist()
cat = cat[:-1]
cat.insert(0, '0')
df['Score_pr'] = df['Score_pr'].cat.reorder_categories(cat)
At least with modern pandas versions, ordering the new value first can be done in one (long) line:
import numpy as np

# Add '0' as a category value:
df['Score_pr'] = df['Score_pr'].cat.add_categories('0')
# Order it before the other values:
df['Score_pr'] = df['Score_pr'].cat.reorder_categories(np.roll(df['Score_pr'].cat.categories, 1))
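For completeness, a self-contained sketch of this variant, reusing the sample dataframe from the earlier answer (the variable names `new_order` and `first_category` are assumptions introduced here). It relies on `add_categories` appending the new label at the end, so rolling the category list by one position brings it to the front:

```python
import numpy as np
import pandas as pd

# Sample data mirroring the earlier answer
df = pd.DataFrame({'Score': [1, 2, 3, 4, 5, 6] * 100})
df['Score_pr'] = pd.qcut(df['Score'], 10, duplicates='drop')

# add_categories appends '0' at the end of the category list
df['Score_pr'] = df['Score_pr'].cat.add_categories('0')

# np.roll shifts every entry one position to the right and wraps
# the last one ('0') around to the front
new_order = list(np.roll(df['Score_pr'].cat.categories, 1))
df['Score_pr'] = df['Score_pr'].cat.reorder_categories(new_order)

# The assignment from the question now succeeds
df.loc[df['Score'] == 1, 'Score_pr'] = '0'

first_category = df['Score_pr'].cat.categories[0]
print(first_category)
print(df.head())
```

Note that running `add_categories('0')` a second time on the same column raises an error because the category already exists, so this sketch starts from a fresh `qcut` result.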