Price column object to int in pandas
Question:
I have a column called amount which holds values that look like this: $3,092.44. When I check dataframe.dtypes,
it reports this column as object. How can I convert this column to type int?
Answers:
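Once the column holds only digits, you can set it to int with:
df['amount'] = df['amount'].astype(int)
(np.int was removed in NumPy 1.24; the built-in int does the same job.)
If you want pandas to read the column as int in the first place, use:
# assuming you're reading from a file whose amount column already contains plain integers
pd.read_csv(file_name, dtype={'amount': np.int32})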
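A dtype passed to read_csv cannot parse values such as $3,092.44 on its own, so one possible alternative is to clean the column while reading it. The following is a minimal sketch, assuming an in-memory CSV in place of the real file; the sample data and the helper name parse_price are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real file.
csv_text = 'item,amount\nA,"$3,092.44"\nB,"$15.00"\n'


def parse_price(text):
    """Strip '$' and ',' and return whole dollars as int (cents are truncated)."""
    cleaned = text.replace("$", "").replace(",", "")
    return int(float(cleaned))


# converters applies parse_price to every raw string in the amount column.
df = pd.read_csv(io.StringIO(csv_text), converters={"amount": parse_price})
print(df["amount"].tolist())
```

The converter runs on the raw strings before any type inference, so the column arrives as integers without a separate cleaning step.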
Assuming your column name is amount, here is what you should do:
dataframe['amount'] = dataframe['amount'].str.replace(r'\$|\.|,', '', regex=True).astype(int)
You can use Series.replace or Series.str.replace with Series.astype:
dataframe = pd.DataFrame(data={'amount':['$3,092.44', '$3,092.44']})
print (dataframe)
amount
0 $3,092.44
1 $3,092.44
dataframe['amount'] = dataframe['amount'].replace('[$,.]', '', regex=True).astype(int)
print (dataframe)
amount
0 309244
1 309244
dataframe['amount'] = dataframe['amount'].astype(int)
print (dataframe)
amount
0 309244
1 309244
In regex, \D means "not a digit", so we can use pd.Series.replace with regex=True:
dataframe.amount.replace(r'\D', '', regex=True).astype(int)
0 309244
1 309244
Name: amount, dtype: int64
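To see what the non-digit pattern \D does to a single value, here is a small sketch using Python's re module on one made-up string. Note that the result is the amount in cents, and that a minus sign would be stripped along with everything else that is not a digit.

```python
import re

raw = "$3,092.44"

# \D matches any character that is not a digit, so only digits survive.
digits_only = re.sub(r"\D", "", raw)
print(digits_only)

cents = int(digits_only)  # the amount expressed in cents
dollars = cents // 100    # whole dollars, cents dropped
print(dollars)

# Caveat: a leading minus sign is a non-digit too and is silently removed.
negative = re.sub(r"\D", "", "-$5.00")
print(negative)
```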
This is how you do it while also discarding the cents:
car_sales["Price"] = car_sales["Price"].str.replace(r'[$,]|\.\d*', '', regex=True).astype(int)
This will also work when the values have only a dollar sign to remove (no commas or decimal point): dframe.amount.str.replace("$", "", regex=False).astype(int)
dataframe["amount"] = dataframe["amount"].str.replace('[$,.]', '', regex=True).astype(int)
     Make Colour  Odometer (KM)  Doors       Price
0  Toyota  White         150043      4   $4,000.00
1   Honda    Red          87899      4   $5,000.00
2  Toyota   Blue          32549      3   $7,000.00
3     BMW  Black          11179      5  $22,000.00
4  Nissan  White         213095      4   $3,500.00
5  Toyota  Green          99213      4   $4,500.00
6   Honda   Blue          45698      4   $7,500.00
7   Honda   Blue          54738      4   $7,000.00
8  Toyota  White          60000      4   $6,250.00
9  Nissan  White          31600      4   $9,700.00
car_sales["Price"].dtype
output: dtype('O')
car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '', regex=True).astype(int)
car_sales["Price"]
output:
0 400000
1 500000
2 700000
3 2200000
4 350000
5 450000
6 750000
7 700000
8 625000
9 970000
Name: Price, dtype: int32
This should be simple: replace the dollar sign ($), the commas (,), and the decimal point together with the digits after it (the extra zeros) with nothing (''), and it will work.
your_column_name = your_column_name.str.replace(r'[$,]|\.\d*', '', regex=True).astype(int)
I think using a lambda that skips the $ is also a good solution (the commas have to go too, or float() will fail):
dollarizer = lambda x: float(x.lstrip('$').replace(',', ''))
dataframe.amount = dataframe.amount.apply(dollarizer)
To avoid the extra zeros when converting object to int, first convert the values (e.g. $3,092.440) to float using the following code:
Syntax:
your_dataframe["your_column_name"] = your_dataframe["your_column_name"].str.replace('[$,]', '', regex=True).astype(float)
Example:
car_sales["Price"] = car_sales["Price"].replace('[$,]', '', regex=True).astype(float)
Result:
4000.0
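If some rows might not be valid prices at all, astype(float) raises on the first bad value. One option is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN instead. This is a minimal sketch on made-up data; the "n/a" entry is an assumption added to show the behaviour.

```python
import pandas as pd

# Hypothetical column with one entry that is not a price.
prices = pd.Series(["$4,000.00", "$3,092.44", "n/a"])

# Remove the dollar sign and thousands separators, keep the decimal point.
cleaned = prices.str.replace(r"[$,]", "", regex=True)

# errors="coerce" converts anything unparseable to NaN rather than raising.
numeric = pd.to_numeric(cleaned, errors="coerce")
print(numeric)
```

The result is a float64 column, so any NaN rows need to be filled or dropped before a later astype(int).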
dataframe["amount"] = dataframe["amount"].str.replace('[$,.]|..$','',regex=True).astype(int)
In str.replace(...):
[$,.] matches any one of $ , .
| means "or"
..$ matches the last two characters of the string
so '[$,.]|..$' matches $ , . or the last two characters
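The pattern can be checked on individual strings with Python's re module. The sketch below uses made-up sample values and also shows a caveat: when a value has no cents part, the "last two characters" branch removes real digits.

```python
import re

pattern = r"[$,.]|..$"

samples = ["$3,092.44", "$4,000.00", "$22,000.00"]
results = [re.sub(pattern, "", s) for s in samples]
for s, r in zip(samples, results):
    print(s, "->", r)

# Caveat: without a ".00" suffix the final two characters are digits of the
# dollar amount, so they are removed by mistake.
no_cents = re.sub(pattern, "", "$4,000")
print("$4,000 ->", no_cents)
```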
If you want to convert a price into a string of digits, you can use the method below:
car_sales["Price"] = car_sales["Price"].replace('[$,.]', '', regex=True).astype(str)
car_sales["Price"]
0 400000
1 500000
2 700000
3 2200000
4 350000
5 450000
6 750000
7 700000
8 625000
9 970000
Name: Price, dtype: object
Here is a simple way to do it:
cars["amount"] = cars["amount"].str.replace("$", "", regex=False).str.replace(",", "", regex=False).astype("float").astype("int")
- First you remove the dollar sign
- Next you remove the comma
- Then you convert the column to float. If you try to convert a string such as 3092.44 straight to integer, you will get an error like: ValueError: invalid literal for int() with base 10: '3092.44'
- Finally you convert the column to integer
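The steps above can be sketched on a small made-up Series. One thing worth knowing is that the final astype(int) truncates toward zero rather than rounding, so $19.99 becomes 19; calling round() first is one way to get 20 instead.

```python
import pandas as pd

# Hypothetical sample values.
amounts = pd.Series(["$3,092.44", "$19.99"])

as_float = (
    amounts.str.replace("$", "", regex=False)  # remove the dollar sign
    .str.replace(",", "", regex=False)         # remove the thousands separator
    .astype(float)                             # strings -> floats
)

truncated = as_float.astype(int)        # drops the fractional part
rounded = as_float.round().astype(int)  # rounds to the nearest dollar first

print(truncated.tolist())
print(rounded.tolist())
```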
export_car_sales["Price"] = export_car_sales["Price"].replace('[$,.]', '', regex=True).astype(int)
Try this one:
car_sales["Price"] = car_sales["Price"].str.replace('[$,.]', '', regex=True).astype(int)
Because this removes the decimal point but keeps the cents digits, you have to divide by 100 to remove the additional zeros, so you will have to run this additional instruction:
car_sales["Price"] = car_sales["Price"] // 100
In the code above we have to use float instead of int so that the cents are kept:
df['Price'] = df['Price'].str.replace('[$,]', '', regex=True).astype(float)
This should work:
import pandas as pd
car_sales = pd.read_csv('car-sales.csv')
car_sales['Price'] = car_sales['Price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float).astype(int)
# First the code removes all the dollar signs and commas.
# Then it converts the strings to floats (strings with a decimal point, such as '4000.00', can't be converted directly to int).
# Finally it converts the floats to ints.
Hope this was useful!
This worked for me
car_sales = pd.read_csv("https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/data/car-sales.csv")
car_sales["Price"] = car_sales["Price"].replace("[$,.]", "", regex=True).map(lambda x: str(x)[:-2]).astype(int)
car_sales
car_sales["Price"] = car_sales["Price"].replace('[$,]', '', regex=True).astype(float)