python pandas read_csv thousands separator does not work

Question:

I use pandas read_csv to read a simple CSV file. However, it raises ValueError: could not convert string to float, and I do not understand why.

The code is simply

rawdata = pd.read_csv( r'Journal_input.csv' ,
                      dtype = { 'Base Amount' : 'float64' } , 
                      thousands = ',' ,
                      decimal = '.',
                      encoding = 'ISO-8859-1')

But I get this error

pandas\parser.pyx in pandas.parser.TextReader.read (pandas\parser.c:10415)()

pandas\parser.pyx in pandas.parser.TextReader._read_low_memory (pandas\parser.c:10691)()

pandas\parser.pyx in pandas.parser.TextReader._read_rows (pandas\parser.c:11728)()

pandas\parser.pyx in pandas.parser.TextReader._convert_column_data (pandas\parser.c:13162)()

pandas\parser.pyx in pandas.parser.TextReader._convert_tokens (pandas\parser.c:14487)()

ValueError: could not convert string to float: '79,026,695.50'

How is it possible to get an error when converting the string '79,026,695.50' to float? I have already specified these two options:

thousands = ',' ,
decimal = '.',

Is it a problem with my code or a bug in pandas?

Asked By: palazzo train


Answers:

It seems there is a problem with quoting: if the field separator is , and the thousands separator is also , then values containing commas have to be quoted in the CSV:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO no longer exists in current pandas
import csv

temp=u"""'a','Base Amount'
'11','79,026,695.50'"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), 
                 dtype = { 'Base Amount' : 'float64' },
                 thousands = ',' ,
                 quotechar = "'",
                 quoting = csv.QUOTE_ALL,
                 decimal = '.',
                 encoding = 'ISO-8859-1')

print (df)
    a  Base Amount
0  11   79026695.5

temp=u'''"a","Base Amount"
"11","79,026,695.50"'''
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), 
                 dtype = { 'Base Amount' : 'float64' },
                 thousands = ',' ,
                 quotechar = '"',
                 quoting = csv.QUOTE_ALL,
                 decimal = '.',
                 encoding = 'ISO-8859-1')

print (df)
    a  Base Amount
0  11   79026695.5
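The standard-library csv module illustrates the same field-splitting rule and shows why quoting matters here: an unquoted 79,026,695.50 is cut into several fields at every comma, while a quoted one stays whole. A minimal sketch using only csv and io (the sample rows are made up for illustration):

```python
import csv
import io

# Unquoted amount: every comma is treated as a field separator.
unquoted = next(csv.reader(io.StringIO("11,79,026,695.50")))

# The same amount wrapped in double quotes stays a single field.
quoted = next(csv.reader(io.StringIO('11,"79,026,695.50"')))

print(unquoted)  # ['11', '79', '026', '695.50']
print(quoted)    # ['11', '79,026,695.50']
```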
Answered By: jezrael

First of all, get rid of the commas.
Example:

num = '79,026,695.50'
print(num)
# 79,026,695.50
num = num.replace(',', '')
print(num)
# 79026695.50
num = float(num)
print(num)
# 79026695.5
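The same steps can be wrapped in a small function that also normalises the decimal mark, mirroring the thousands and decimal options of read_csv. A minimal sketch; the name to_float and its parameters are hypothetical, not part of pandas:

```python
def to_float(text, thousands=",", decimal="."):
    """Parse a number string such as '79,026,695.50' (hypothetical helper).

    Removes the thousands separator, then turns the decimal mark into '.'
    so that Python's float() can parse the result.
    """
    cleaned = text.strip().replace(thousands, "")
    if decimal != ".":
        cleaned = cleaned.replace(decimal, ".")
    return float(cleaned)


print(to_float("79,026,695.50"))                              # 79026695.5
print(to_float("79.026.695,50", thousands=".", decimal=","))  # 79026695.5
```

Removing the thousands separator before swapping the decimal mark matters when the two characters are exchanged (as in the second call); doing it in the other order would delete the decimal point.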

In your case, read the column without the float64 dtype (so it is kept as a string), then convert it:

import numpy as np

rawdata['Base Amount'] = rawdata['Base Amount'].str.replace(',', '', regex=False).astype(np.float64)
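Put together, reading the amount as text and then converting it looks like the sketch below. It assumes an in-memory, made-up sample in place of Journal_input.csv and reuses the Base Amount column name from the question:

```python
from io import StringIO

import numpy as np
import pandas as pd

# Made-up sample standing in for Journal_input.csv; amounts are quoted
# because they contain the same comma that separates the fields.
sample = 'a,Base Amount\n11,"79,026,695.50"\n12,"1,234.00"\n'

# Read the amount as text first; forcing float64 at read time is what
# raised the error in the question.
rawdata = pd.read_csv(StringIO(sample), dtype={"Base Amount": str})

# Strip the thousands separators, then convert the column to float64.
rawdata["Base Amount"] = (
    rawdata["Base Amount"].str.replace(",", "", regex=False).astype(np.float64)
)

print(rawdata["Base Amount"].dtype)     # float64
print(rawdata["Base Amount"].tolist())  # [79026695.5, 1234.0]
```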
Answered By: Andre Gustavo