Python creating categorical variable error

Question:

I need to create a categorical variable that groups RAM sizes into these categories:

Basic: RAM [0-4]

Intermediate: RAM [5-8]

Advanced: RAM [9-12]
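Here is a minimal sketch of how those ranges map onto pd.cut. It assumes integer RAM values in GB, and the sample values are made up for illustration. With bins=[0, 4, 8, 12], pd.cut builds right-closed intervals, and include_lowest=True makes the first one include 0. A boundary value such as 8 therefore falls into Intermediate, not Advanced:

```python
import pandas as pd

# Hypothetical boundary values chosen to show which bin each edge falls into
ram = pd.Series([0, 4, 5, 8, 9, 12])

# Intervals: [0, 4] -> Basic, (4, 8] -> Intermediate, (8, 12] -> Advanced
memory = pd.cut(ram, bins=[0, 4, 8, 12], include_lowest=True,
                labels=["Basic", "Intermediate", "Advanced"])

print(pd.DataFrame({"RAM": ram, "Memory": memory}))
```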

Command:

df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic','Intermediate', 'Advanced'])

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-58-5c93d7c00ba2> in <cell line: 1>()
----> 1 df['Memory']=pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, labels=['Basic', 'Intermediate', 'Advaced'])

1 frames
/usr/local/lib/python3.9/dist-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
    425 
    426     side: Literal["left", "right"] = "left" if right else "right"
--> 427     ids = ensure_platform_int(bins.searchsorted(x, side=side))
    428 
    429     if include_lowest:

TypeError: '<' not supported between instances of 'int' and 'str'
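The message says pd.cut is comparing the integer bin edges with strings. You can confirm this by checking the column's dtype. If it is object, the RAM values were read as text, for example from a CSV file. The sketch below uses made-up string values to reproduce the same kind of TypeError. It then shows that converting the column first gives a numeric dtype that pd.cut can handle. The exact error message may vary between pandas versions.

```python
import pandas as pd

# Hypothetical data: RAM sizes stored as strings (object dtype), e.g. after reading a CSV
df = pd.DataFrame({"RAM ": pd.Series(["2", "6", "8", "11"], dtype=object)})
print(df["RAM "].dtype)  # object, i.e. the values are Python strings

try:
    pd.cut(df["RAM "], [0, 4, 8, 12], include_lowest=True)
except TypeError as exc:
    # Integer bin edges cannot be ordered against str values
    print("TypeError:", exc)
    failed = True
else:
    failed = False

# After conversion the column is numeric, so pd.cut can compare it with the bins
numeric = pd.to_numeric(df["RAM "])
print(numeric.dtype)
```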

Could you please help me fix this? I'm new to Python.

Asked By: codproe


Answers:

It seems your RAM column holds numbers stored as strings, so pd.cut cannot compare them with the integer bin edges. Convert the column with pd.to_numeric first:

df['Memory']= pd.cut(pd.to_numeric(df['RAM '], errors="coerce"), bins=[0,4,8,12],
                     include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])
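This short sketch shows what errors="coerce" does, using made-up values. Any entry that cannot be parsed as a number (here the hypothetical string "8GB") becomes NaN, and pd.cut then leaves that row's category as NaN instead of raising. If your column might contain such values, check for NaN after the conversion rather than assuming every row was categorized.

```python
import pandas as pd

# Hypothetical column mixing clean numbers with one unparsable entry
ram = pd.Series(["2", "6", "8GB", "11"], dtype=object)

# errors="coerce" turns values that cannot be parsed into NaN
ram_num = pd.to_numeric(ram, errors="coerce")

memory = pd.cut(ram_num, bins=[0, 4, 8, 12], include_lowest=True,
                labels=["Basic", "Intermediate", "Advanced"])

print(pd.DataFrame({"RAM": ram, "Memory": memory}))
print("Rows that were not categorized:", memory.isna().sum())
```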

With an example:

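import numpy as np
import pandas as pd

# Sample data: 100 random RAM sizes from 1 to 11, stored as strings
df = pd.DataFrame({"RAM": np.random.randint(low=1, high=12, size=100).astype(str)})

# Convert the strings to numbers; unparsable values become NaN
df["RAM"] = pd.to_numeric(df["RAM"], errors="coerce")

# Bin the numeric values: (0, 4] -> Basic, (4, 8] -> Intermediate, (8, 12] -> Advanced
df["Memory"] = pd.cut(df["RAM"], bins=[0, 4, 8, 12],
                      labels=["Basic", "Intermediate", "Advanced"])
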
Output (the values are random, so yours will differ):

   RAM        Memory
0    2         Basic
1    2         Basic
2    6  Intermediate
..  ..           ...
97   6  Intermediate
98   1         Basic
99   7  Intermediate

[100 rows x 2 columns]
Answered By: Timeless