Converted columns from str to floats, but when I attempt to subtract two columns I get a TypeError
Question:
I imported data from a csv into a pandas data frame. I removed values from the string that are not numbers, the "$" in front of all the values. I then converted the columns to a float data type. I run a print(df.dtypes) after the conversion and it shows all the columns as being a float64. After the print statement I attempt to subtract one column from another but get an error saying:
line 23, in <module>
Price_Diff = df["HTB_Price" - "McMaster_Price"]
TypeError: unsupported operand type(s) for -: 'str' and 'str'
Here is my code
import pandas as pd
import matplotlib.pyplot as mp
import numpy as np
# Reads the csv and create a dataframe titled "df"
df = pd.read_csv('Example Price Dataset.csv', sep=r'\s*,\s*', engine='python')
# Removes the "$" from all columns using a left strip
df['HTB_Price'] = df['HTB_Price'].map(lambda x: x.lstrip('$'))
df['McMaster_Price'] = df['McMaster_Price'].map(lambda x: x.lstrip('$'))
df['Motion_Price'] = df['Motion_Price'].map(lambda x: x.lstrip('$'))
df['MRO_Price'] = df['MRO_Price'].map(lambda x: x.lstrip('$'))
# Converts each column to a float datatype instead of a string
df["HTB_Price"] = df["HTB_Price"].astype(float)
df["McMaster_Price"] = df["McMaster_Price"].astype(float)
df["Motion_Price"] = df["Motion_Price"].astype(float)
df["MRO_Price"] = df["MRO_Price"].astype(float)
print(df.dtypes)
#
Price_Diff = df["HTB_Price" - "McMaster_Price"]
# Prints the dataframe
# print(df.dtypes)
The error is on the Price_Diff line, and I’m not sure why it is throwing an error about not being able to subtract strings from each other, when right before that line I’m checking the data types and it says they are both floats.
I’m expecting the values in each column to be subtracted and placed in the variable Price_Diff
Answers:
What you want to write instead is:
Price_Diff = df["HTB_Price"] - df["McMaster_Price"]
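The part inside the brackets is what indexes your dataframe, and Python evaluates it first. With df["HTB_Price" - "McMaster_Price"], Python tries to subtract the string "McMaster_Price" from the string "HTB_Price" before pandas ever sees a column name, which is why the error mentions 'str' even though both columns are float64.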
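To make the order of evaluation concrete, here is a minimal sketch on a small stand-in dataframe. The column names come from the question, but the prices are invented for illustration.

```python
import pandas as pd

# Stand-in data: the values are made up, only the column names match the question.
df = pd.DataFrame({
    "HTB_Price": [10.50, 20.00, 7.25],
    "McMaster_Price": [9.00, 22.50, 7.25],
})

# The expression inside the brackets is evaluated first, on two plain strings,
# so the TypeError is raised before pandas does any column lookup.
try:
    df["HTB_Price" - "McMaster_Price"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Index each column separately, then subtract the two resulting Series.
price_diff = df["HTB_Price"] - df["McMaster_Price"]
print(price_diff.tolist())
```

The dtypes of the columns never come into play in the failing line, which is why print(df.dtypes) looked fine.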
The issue is in the line that calculates Price_Diff. You are subtracting two strings, "HTB_Price" - "McMaster_Price", instead of the actual columns of the dataframe, df["HTB_Price"] - df["McMaster_Price"]. Here’s the corrected code:
# Calculates the price difference between two columns
Price_Diff = df["HTB_Price"] - df["McMaster_Price"]
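For context, the whole flow from the question can be sketched end to end. This is only an illustration: the CSV text below is invented and read from an in-memory string rather than from 'Example Price Dataset.csv', the Part column is hypothetical, and the vectorized .str.lstrip is a swapped-in alternative to the map/lambda calls in the question.

```python
import io

import pandas as pd

# Invented sample standing in for the real CSV file.
csv_text = (
    "Part,HTB_Price,McMaster_Price\n"
    "A1 , $10.50 , $9.00\n"
    "B2 , $20.00 , $22.50\n"
)

# A regex separator that swallows whitespace around commas needs the python engine.
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s*,\s*", engine="python")

# Strip the leading "$" and convert each price column to float.
for col in ["HTB_Price", "McMaster_Price"]:
    df[col] = df[col].str.lstrip("$").astype(float)

# Subtract the columns themselves, not their names.
Price_Diff = df["HTB_Price"] - df["McMaster_Price"]
print(Price_Diff.tolist())
```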
You could indeed use Price_Diff = df["HTB_Price"] - df["McMaster_Price"], but a string-based interface exists too:
df.eval("HTB_Price - McMaster_Price")
A similar interface exists for filtering:
df.query("HTB_Price < McMaster_Price")
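As a rough sketch of how these two calls behave, assuming a small dataframe with invented prices: inside eval and query the column names are written bare within one string, so the subtraction is parsed by pandas rather than applied to Python strings.

```python
import pandas as pd

# Invented prices; only the column names match the question.
df = pd.DataFrame({
    "HTB_Price": [10.50, 20.00, 7.25],
    "McMaster_Price": [9.00, 22.50, 7.25],
})

# eval returns a Series holding the row-wise difference.
diff_eval = df.eval("HTB_Price - McMaster_Price")

# The bracket form gives the same values.
diff_plain = df["HTB_Price"] - df["McMaster_Price"]

# query returns only the rows where the condition holds.
cheaper_at_htb = df.query("HTB_Price < McMaster_Price")
print(cheaper_at_htb)
```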
You can even modify the original dataframe in place, adding columns directly (shown here with generic columns A and B):
>>> df.eval('C = A + B', inplace=True)
>>> df
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
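Applied to the price columns from the question, the same idea might look like the sketch below. The data is invented, and Price_Diff is used here as a column name rather than as the standalone variable in the question.

```python
import pandas as pd

# Invented prices; only the column names match the question.
df = pd.DataFrame({
    "HTB_Price": [10.50, 20.00, 7.25],
    "McMaster_Price": [9.00, 22.50, 7.25],
})

# Without inplace, eval returns a new dataframe and leaves df unchanged.
with_diff = df.eval("Price_Diff = HTB_Price - McMaster_Price")
df_unchanged = "Price_Diff" not in df.columns

# With inplace=True, the new column is added to df itself.
df.eval("Price_Diff = HTB_Price - McMaster_Price", inplace=True)
print(df)
```

Whether to use inplace is mostly a matter of taste; returning a new dataframe keeps the original available for comparison.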