Python: categorical variables are NaN when creating a box plot
Question:
I created a categorical column with pd.cut, but its values come out as NaN.
I used this command:
df['Memory'] = pd.cut(pd.to_numeric(df['RAM '], errors="coerce"), [0, 4, 8, 12],
                      include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])
After running df.head(), the Memory column shows NaN.
When I try to box-plot them:
sns.boxplot(x='Memory', y='Price', data=df['RAM '] = pd.to_numeric(df['RAM '], errors="coerce") ; df['Memory']= pd.cut(df['RAM '], [0,4,8,12], include_lowest=True, ..)
I get an error: invalid syntax.
When I run print(df.sample(10).to_dict("list")), I get this output:
{'Brand': ['Oppo', 'Apple', 'Samsung', 'Apple', 'Xiaomi', 'OnePlus', 'Xiaomi', 'Nokia', 'Samsung', 'Vivo'], 'Model': ['Reno6 Pro 5G', 'iPhone XR', 'Galaxy S20 FE 5G', 'iPhone 12', 'Poco X3 Pro', 'Nord 2', 'Redmi 9 Power', 'G20', 'Galaxy Note20 5G', 'Y33s'], 'Storage': ['128 GB', '64 GB', '128', '128', '128GB', '128GB', '128GB', '64GB', '128', '128GB'], 'RAM': ['12 GB', '3 GB', '6', '4', '6GB', '8GB', '4GB', '4GB', '8', '8GB'], 'ScreenSize': ['6.55', '6.1', '6.5', '6.1', '6.67', '6.43', '6.53', '6.5', '6.7', '6.58'], 'Camera': ['64 + 8 + 2 + 2', '12 + 12', '12+12+8', '12+12', '48MP + 8MP + 2MP + 2MP', '50MP + 8MP + 2MP', '48MP + 8MP + 2MP + 2MP', '48MP + 5MP', '12+64+12', '50MP + 2MP + 2MP'], 'BatteryCapacity': [4500, 2942, 4500, 2815, 5160, 4500, 6000, 5050, 4300, 5000], 'Price': ['659', '499', '699', '799', '$249 ', '$399 ', '$199 ', '$199 ', '1049', '$269 '], 'Memory': ['Advanced', 'Basic', 'Intermediate', 'Basic', 'Intermediate', 'Intermediate', 'Basic', 'Basic', 'Intermediate', 'Intermediate']}
Is there any way to fix this?
Answers:
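IIUC, values such as '12 GB' or '6GB' are not numeric, so pd.to_numeric(..., errors="coerce") turns them into NaN. Strip the column names and extract the digits instead: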
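# Remove stray whitespace from column names ('RAM ' becomes 'RAM').
df.columns = df.columns.str.strip()
# Extract the digits from strings such as '12 GB' or '6GB', then bin them.
df["Memory"] = pd.cut(df["RAM"].str.extract(r"(\d+)", expand=False).astype(float), [0, 4, 8, 12],
                      include_lowest=True, labels=['Basic', 'Intermediate', 'Advanced'])
# Price holds values such as '$249 ', so extract the digits there too.
df["Price"] = df["Price"].str.extract(r"(\d+)", expand=False).astype(float)
sns.boxplot(x="Memory", y="Price", data=df);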
Output: [box plot of Price for each Memory category]
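To check the cleaning steps without plotting, here is a minimal sketch on six rows copied from the question's df.sample(10) output. It first shows how pd.to_numeric with errors="coerce" produces NaN for strings that carry a unit, then applies the digit extraction from the answer. The names coerced and ram_gb are only illustrative, and the sketch assumes the column names have already been stripped.

```python
import pandas as pd

# Six rows copied from the question's sample output (column names already stripped).
df = pd.DataFrame({
    "RAM": ["12 GB", "3 GB", "6", "4", "6GB", "8GB"],
    "Price": ["659", "499", "699", "799", "$249 ", "$399 "],
})

# Cause of the NaN: strings with a unit are not numeric, so errors="coerce"
# replaces them with NaN before pd.cut ever sees a number.
coerced = pd.to_numeric(df["RAM"], errors="coerce")
print(coerced.isna().sum(), "of", len(coerced), "RAM values became NaN")

# Fix: take the first run of digits from each string, convert to float, then bin.
ram_gb = df["RAM"].str.extract(r"(\d+)", expand=False).astype(float)
df["Memory"] = pd.cut(
    ram_gb,
    bins=[0, 4, 8, 12],
    include_lowest=True,
    labels=["Basic", "Intermediate", "Advanced"],
)

# Same extraction for Price, which mixes '659' with '$249 '.
df["Price"] = df["Price"].str.extract(r"(\d+)", expand=False).astype(float)
print(df)
```

Note that r"(\d+)" keeps only the first run of digits, so a value with a decimal point or a thousands separator would be truncated; the sample in the question contains no such values.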