Converted columns from str to floats, but when I attempt to subtract two columns I get a TypeError

Question:

I imported data from a CSV into a pandas DataFrame. I stripped the non-numeric "$" prefix from every value and then converted the columns to the float data type. Running print(df.dtypes) after the conversion shows every column as float64. After that print statement I try to subtract one column from another, but I get this error:

line 23, in <module>
    Price_Diff = df["HTB_Price" - "McMaster_Price"]
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Here is my code:

import pandas as pd
import matplotlib.pyplot as mp
import numpy as np

# Reads the csv and creates a dataframe titled "df"
df = pd.read_csv('Example Price Dataset.csv', sep=r'\s*,\s*', engine='python')

# Removes the "$" from all columns using a left strip
df['HTB_Price'] = df['HTB_Price'].map(lambda x: x.lstrip('$'))
df['McMaster_Price'] = df['McMaster_Price'].map(lambda x: x.lstrip('$'))
df['Motion_Price'] = df['Motion_Price'].map(lambda x: x.lstrip('$'))
df['MRO_Price'] = df['MRO_Price'].map(lambda x: x.lstrip('$'))

# Converts each column to a float datatype instead of a string
df["HTB_Price"] = df["HTB_Price"].astype(float)
df["McMaster_Price"] = df["McMaster_Price"].astype(float)
df["Motion_Price"] = df["Motion_Price"].astype(float)
df["MRO_Price"] = df["MRO_Price"].astype(float)
print(df.dtypes)


#
Price_Diff = df["HTB_Price" - "McMaster_Price"]


# Prints the dataframe
# print(df.dtypes)
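As a side note, the four strip-and-convert steps in the code above can be collapsed into one loop using the vectorized Series.str.lstrip accessor followed by astype(float). This is a minimal sketch, assuming a small hypothetical in-memory DataFrame in place of the CSV file (the real data isn't available here); it does not touch the subtraction problem itself.

```python
import pandas as pd

# Hypothetical sample data standing in for 'Example Price Dataset.csv'
df = pd.DataFrame({
    "HTB_Price": ["$10.50", "$3.00"],
    "McMaster_Price": ["$9.25", "$4.10"],
    "Motion_Price": ["$11.00", "$2.95"],
    "MRO_Price": ["$10.75", "$3.20"],
})

price_cols = ["HTB_Price", "McMaster_Price", "Motion_Price", "MRO_Price"]

# Strip the leading "$" and convert each price column to float64 in one pass
for col in price_cols:
    df[col] = df[col].str.lstrip("$").astype(float)

print(df.dtypes)
```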

The error is raised on the Price_Diff line. I'm not sure why it complains about subtracting one string from another when, right before that line, I check the data types and both columns are reported as float64.

I'm expecting the values in one column to be subtracted from the values in the other, row by row, and the result stored in the variable Price_Diff.

Asked By: Christian Brune


Answers:

What you want to write instead is:

Price_Diff = df["HTB_Price"] - df["McMaster_Price"]

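The part inside the brackets is the key used to index your DataFrame, and Python evaluates it before pandas ever sees it. Here that key is the expression "HTB_Price" - "McMaster_Price", so Python is simply telling you that it cannot subtract the string "McMaster_Price" from the string "HTB_Price". The dtypes of the columns never come into play.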
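To see the evaluation order concretely, here is a minimal sketch using a hypothetical two-row DataFrame with the question's column names: the bracketed expression fails on its own as plain str minus str, while selecting each column first and then subtracting works element-wise.

```python
import pandas as pd

# Hypothetical two-row frame using the column names from the question
df = pd.DataFrame({"HTB_Price": [10.5, 3.0], "McMaster_Price": [9.25, 4.1]})

# The bracket contents are evaluated first, and str - str is not defined in Python
try:
    df["HTB_Price" - "McMaster_Price"]
except TypeError as exc:
    print("TypeError:", exc)

# Select each column first, then subtract the two float64 Series element-wise
Price_Diff = df["HTB_Price"] - df["McMaster_Price"]
print(Price_Diff)
```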

Answered By: R_D

The issue is in the line that calculates Price_Diff. You are subtracting two strings, "HTB_Price" - "McMaster_Price", instead of the actual DataFrame columns, df["HTB_Price"] - df["McMaster_Price"]. Here's the corrected code:

# Calculates the price difference between two columns
Price_Diff = df["HTB_Price"] - df["McMaster_Price"]
Answered By: Umrbek

You could indeed use Price_Diff = df["HTB_Price"] - df["McMaster_Price"], but pandas also offers a string-based interface:

df.eval("HTB_Price - McMaster_Price")
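A quick sketch comparing the df.eval call above with the explicit column subtraction, assuming a small hypothetical DataFrame with the question's column names. Both should give the same values; the name check is disabled only to avoid depending on how the result Series is named.

```python
import pandas as pd

# Hypothetical sample data with the question's column names
df = pd.DataFrame({"HTB_Price": [10.5, 3.0], "McMaster_Price": [9.25, 4.1]})

# df.eval resolves the bare names in the string to columns of df
via_eval = df.eval("HTB_Price - McMaster_Price")

# Equivalent explicit subtraction of the two selected columns
via_ops = df["HTB_Price"] - df["McMaster_Price"]

# Same values and dtype; ignore the Series name
pd.testing.assert_series_equal(via_eval, via_ops, check_names=False)
```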

A similar interface exists for filtering:

df.query("HTB_Price < McMaster_Price")
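For comparison, the df.query call above selects the same rows as an ordinary boolean mask. A minimal sketch, again with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data with the question's column names
df = pd.DataFrame({"HTB_Price": [10.5, 3.0], "McMaster_Price": [9.25, 4.1]})

# Keep only rows where the HTB price is lower than the McMaster price, two ways
via_query = df.query("HTB_Price < McMaster_Price")
via_mask = df[df["HTB_Price"] < df["McMaster_Price"]]

pd.testing.assert_frame_equal(via_query, via_mask)
print(via_query)  # only the row with index 1 remains
```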

You can even modify the original DataFrame in place, adding columns directly. For example, with a DataFrame that has columns A and B:

>>> df.eval('C = A + B', inplace=True)
>>> df
   A   B   C
0  1  10  11
1  2   8  10
2  3   6   9
3  4   4   8
4  5   2   7
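One detail worth knowing: an assignment inside df.eval only mutates the DataFrame when inplace=True is passed. Without it, eval returns a new DataFrame and leaves the original alone. The sketch below rebuilds the A and B columns from the example output above to show both behaviours.

```python
import pandas as pd

# Columns A and B taken from the example output above
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [10, 8, 6, 4, 2]})

# Without inplace=True, eval returns a new DataFrame and leaves df unchanged
df2 = df.eval("C = A + B")
print(list(df.columns))   # ['A', 'B']
print(list(df2.columns))  # ['A', 'B', 'C']

# With inplace=True, the column is added to df itself and None is returned
result = df.eval("C = A + B", inplace=True)
print(result)             # None
print(df["C"].tolist())   # [11, 10, 9, 8, 7]
```

Applied to the question, the same idea would be df.eval("Price_Diff = HTB_Price - McMaster_Price", inplace=True), which stores the difference as a new column.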
Answered By: rudolfovic