Python categorical variables NaN while creating box-plot

Question:

I created a categorical column without errors, but its values come out as NaN.

I used this command:

df['Memory']= pd.cut(pd.to_numeric(df['RAM '], errors="coerce"), [0,4,8,12],
                     include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])

After running df.head() here’s the table:

[image: output of df.head()]

When I try to box-plot them:

sns.boxplot(x='Memory', y='Price', data=df['RAM '] = pd.to_numeric(df['RAM '], errors="coerce") ;  df['Memory']= pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, ..)

I get an "invalid syntax" error.

When I run the command print(df.sample(10).to_dict("list")):

I get this output:

{'Brand': ['Oppo', 'Apple', 'Samsung', 'Apple', 'Xiaomi', 'OnePlus', 'Xiaomi', 'Nokia', 'Samsung', 'Vivo'], 'Model': ['Reno6 Pro 5G', 'iPhone XR', 'Galaxy S20 FE 5G', 'iPhone 12', 'Poco X3 Pro', 'Nord 2', 'Redmi 9 Power', 'G20', 'Galaxy Note20 5G', 'Y33s'], 'Storage': ['128 GB', '64 GB', '128', '128', '128GB', '128GB', '128GB', '64GB', '128', '128GB'], 'RAM': ['12 GB', '3 GB', '6', '4', '6GB', '8GB', '4GB', '4GB', '8', '8GB'], 'ScreenSize': ['6.55', '6.1', '6.5', '6.1', '6.67', '6.43', '6.53', '6.5', '6.7', '6.58'], 'Camera': ['64 + 8 + 2 + 2', '12 + 12', '12+12+8', '12+12', '48MP + 8MP + 2MP + 2MP', '50MP + 8MP + 2MP', '48MP + 8MP + 2MP + 2MP', '48MP + 5MP', '12+64+12', '50MP + 2MP + 2MP'], 'BatteryCapacity': [4500, 2942, 4500, 2815, 5160, 4500, 6000, 5050, 4300, 5000], 'Price': ['659', '499', '699', '799', '$249 ', '$399 ', '$199 ', '$199 ', '1049', '$269 '], 'Memory': ['Advanced', 'Basic', 'Intermediate', 'Basic', 'Intermediate', 'Intermediate', 'Basic', 'Basic', 'Intermediate', 'Intermediate']}

Is there any way to fix this?

Asked By: Amar


Answers:

IIUC, use this:

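import pandas as pd
import seaborn as sns

# Strip stray whitespace from column names ("RAM " becomes "RAM")
df.columns = df.columns.str.strip()

# Pull the digits out of values such as "12 GB" or "6GB" before binning
df["Memory"] = pd.cut(df["RAM"].str.extract(r"(\d+)", expand=False).astype(float), [0,4,8,12],
                     include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])

# Same for prices such as "$249 "
df["Price"] = df["Price"].str.extract(r"(\d+)", expand=False).astype(float)

sns.boxplot(x="Memory", y="Price", data=df);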

Output:

[image: resulting box plot]
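
To see where the NaN values likely come from, here is a minimal sketch on a small hypothetical frame (the five rows below are made up, modelled on the sample in the question, and are not the asker's real data). It assumes the RAM and Price columns hold strings with units or currency symbols, which pd.to_numeric(..., errors="coerce") cannot parse and therefore turns into NaN. Extracting the digits first avoids that.

```python
import pandas as pd

# Hypothetical sample rows; note the trailing space in the "RAM " column name.
df = pd.DataFrame({
    "RAM ": ["12 GB", "3 GB", "6", "4GB", "8GB"],
    "Price": ["659", "499", "699", "$199 ", "$399 "],
})

# Values that carry a unit cannot be parsed, so errors="coerce" yields NaN for them.
coerced = pd.to_numeric(df["RAM "], errors="coerce")
print(coerced.isna().tolist())

# Clean the column names, then keep only the first run of digits in each value.
df.columns = df.columns.str.strip()
ram = df["RAM"].str.extract(r"(\d+)", expand=False).astype(float)
df["Price"] = df["Price"].str.extract(r"(\d+)", expand=False).astype(float)

# Bin the numeric RAM values into three labelled categories.
df["Memory"] = pd.cut(ram, [0, 4, 8, 12], include_lowest=True,
                      labels=["Basic", "Intermediate", "Advanced"])
print(df[["RAM", "Memory", "Price"]])
```

With both columns numeric, sns.boxplot(x="Memory", y="Price", data=df) should have real values to plot. Keep in mind that the pattern (\d+) only captures the first run of digits, so a price written with a thousands separator or decimals would need a different pattern.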

Answered By: Timeless