How to read a CSV with redundant characters as a DataFrame?

Question:

I have hundreds of CSV files whose fields are separated by commas, and whose decimal separator is also a comma. The files look like this:

ID,columnA,columnB
A,0,"15,6"
B,"1,2",0
C,0,

I am trying to read all these files in Python using pandas, but I cannot split the values properly into three columns, maybe because of the decimal separator or because some values are wrapped in quotation marks.

I first tried the code below, but even with different encodings I could not get the result I want:

import pandas as pd
df = pd.read_csv("test.csv", sep=",")
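A quick check with the sample data (read from memory via io.StringIO instead of test.csv, which is an assumption made only so the snippet runs on its own) suggests the quotation marks are not what breaks the parse: pandas' default quoting keeps "1,2" and "15,6" in single fields, so the call above still yields three columns. The values simply are not recognised as numbers, so both numeric columns come back as strings.

```python
from io import StringIO

import pandas as pd

# Same sample as in the question, read from memory instead of test.csv.
data = 'ID,columnA,columnB\nA,0,"15,6"\nB,"1,2",0\nC,0,\n'

df = pd.read_csv(StringIO(data), sep=",")

# The quoted fields stay intact, so there are still three columns...
print(df.shape)
# ...but "1,2" and "15,6" are not parsed as numbers, so columnA and
# columnB keep a string (non-numeric) dtype instead of float.
print(df.dtypes)
print(repr(df.loc[1, "columnA"]))
```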

Could anyone help me? The result should be a DataFrame like this:

  ID  columnA  columnB
0  A      0.0     15.6
1  B      1.2      0.0
2  C      0.0      NaN
Asked By: user026


Answers:

You just need to specify decimal=",", so that pandas parses values such as "1,2" and "15,6" as numbers:

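import pandas as pd
from io import StringIO

file = '''ID,columnA,columnB
A,0,"15,6"
B,"1,2",0
C,0,'''

# decimal="," makes pandas read "15,6" as 15.6; the default quoting
# already keeps such values in a single field.
df = pd.read_csv(StringIO(file), decimal=",")
print(df)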

Output:

  ID  columnA  columnB
0  A      0.0     15.6
1  B      1.2      0.0
2  C      0.0      NaN
Answered By: BeRT2me
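Since the question mentions hundreds of files, the same decimal="," option can be applied to each file in a loop. A minimal sketch, assuming all files sit in one directory, share the same columns, and match a glob pattern; the helper name read_comma_decimal_csvs and the extra source_file column are hypothetical, and the demo writes two copies of the sample into a temporary directory only so the snippet runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def read_comma_decimal_csvs(pattern, encoding="utf-8"):
    """Hypothetical helper: read every CSV matching `pattern` with ',' as the
    decimal separator and stack the results into one DataFrame."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        # Quoted values such as "15,6" become the float 15.6.
        df = pd.read_csv(path, decimal=",", encoding=encoding)
        # Remember which file each row came from.
        df["source_file"] = os.path.basename(path)
        frames.append(df)
    if not frames:
        # pd.concat fails on an empty list, so report the problem clearly.
        raise FileNotFoundError(f"no files match {pattern!r}")
    return pd.concat(frames, ignore_index=True)


# Demo: write two copies of the question's sample into a temporary directory.
sample = 'ID,columnA,columnB\nA,0,"15,6"\nB,"1,2",0\nC,0,\n'
with tempfile.TemporaryDirectory() as tmp:
    for name in ("part1.csv", "part2.csv"):
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fh:
            fh.write(sample)
    combined = read_comma_decimal_csvs(os.path.join(tmp, "*.csv"))

print(combined)
```

Collecting the frames in a list and calling pd.concat once avoids repeatedly copying a growing DataFrame. If some files are not UTF-8, the encoding argument can be changed (for example to "latin-1").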