python pandas read_csv thousands separator does not work
Question:
I am using pandas read_csv to read a simple CSV file, but it raises ValueError: could not convert string to float, and I do not understand why.
The code is simply:
rawdata = pd.read_csv( r'Journal_input.csv' ,
dtype = { 'Base Amount' : 'float64' } ,
thousands = ',' ,
decimal = '.',
encoding = 'ISO-8859-1')
But I get this error
pandasparser.pyx in pandas.parser.TextReader.read
(pandasparser.c:10415)()
pandasparser.pyx in pandas.parser.TextReader._read_low_memory
(pandasparser.c:10691)()
pandasparser.pyx in pandas.parser.TextReader._read_rows
(pandasparser.c:11728)()
pandasparser.pyx in pandas.parser.TextReader._convert_column_data
(pandasparser.c:13162)()
pandasparser.pyx in pandas.parser.TextReader._convert_tokens
(pandasparser.c:14487)()
ValueError: could not convert string to float: ‘79,026,695.50’
How is it possible to get an error when converting the string ‘79,026,695.50’ to float? I have already specified the two options:
thousands = ',' ,
decimal = '.',
Is this a problem with my code or a bug in pandas?
Answers:
It seems there is a problem with quoting: if the field separator is , and the thousands separator is , too, the values in the csv have to be quoted:
import csv
from io import StringIO  # pandas.compat.StringIO is gone from current pandas

import pandas as pd

# Single-quoted fields keep the thousands commas inside one field
temp = u"""'a','Base Amount'
'11','79,026,695.50'"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp),
                 dtype={'Base Amount': 'float64'},
                 thousands=',',
                 quotechar="'",
                 quoting=csv.QUOTE_ALL,
                 decimal='.',
                 encoding='ISO-8859-1')
print(df)
    a  Base Amount
0  11   79026695.5
# Same data, this time with double-quoted fields
temp = u'''"a","Base Amount"
"11","79,026,695.50"'''
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp),
                 dtype={'Base Amount': 'float64'},
                 thousands=',',
                 quotechar='"',
                 quoting=csv.QUOTE_ALL,
                 decimal='.',
                 encoding='ISO-8859-1')
print(df)
    a  Base Amount
0  11   79026695.5
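To see why the quoting matters, here is a minimal sketch that uses only the standard-library csv module rather than pandas itself. The two sample lines are made up for illustration. It only shows how a comma-delimited parser splits the same row with and without quotes; it does not prove that this is what is wrong with your file.

```python
import csv

# Hypothetical sample rows: the same data, unquoted and quoted
unquoted_line = '11,79,026,695.50'
quoted_line = '"11","79,026,695.50"'

# Without quotes, every comma is a field separator,
# so the amount is split into three separate fields
unquoted_fields = next(csv.reader([unquoted_line]))

# With quotes, the commas inside the amount stay part of one field
quoted_fields = next(csv.reader([quoted_line]))

print(unquoted_fields)
print(quoted_fields)
```

If the first pattern matches your file, no thousands setting can help, because the amount has already been split into several columns before any conversion to float is attempted.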
First of all, get rid of the commas.
Example:
num = '79,026,695.50'
print(num)
# 79,026,695.50
num = num.replace(',', '')
print(num)
# 79026695.50
num = float(num)
In your case (assuming the column was read as strings, i.e. without the float64 dtype):
rawdata['Base Amount'] = rawdata['Base Amount'].str.replace(',', '').astype('float64')
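Put together, that workaround could look roughly like the sketch below. The in-memory sample string is an assumption standing in for Journal_input.csv, and the second row is invented so there is more than one value to convert. The column is read as plain strings first, then cleaned and converted.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data standing in for Journal_input.csv
sample = '"a","Base Amount"\n"11","79,026,695.50"\n"12","1,234.00"\n'

# Read the amount column as plain strings so no float parsing is attempted
rawdata = pd.read_csv(StringIO(sample), dtype={'Base Amount': str})

# Strip the thousands commas, then convert the cleaned strings to float64
rawdata['Base Amount'] = (
    rawdata['Base Amount']
    .str.replace(',', '', regex=False)
    .astype('float64')
)
print(rawdata['Base Amount'].tolist())
```

Passing regex=False makes it explicit that the comma is replaced as a literal character and not as a pattern.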