pandas.read_csv is ignoring quoting of strings
Question:
I am having some trouble reading a CSV file into a pandas DataFrame. The import splits fields at commas that are enclosed in quotes instead of keeping those commas inside the field.
I have tried different options for quotechar, but none made any difference.
import csv
import pandas
df = pandas.read_csv('test_quote.csv', header=None, sep=',', quotechar='"', quoting=csv.QUOTE_MINIMAL, encoding='ascii', engine='python')
print(df)
Code output:
$ python3 test_quote.py
0 1 2 3 4 5 6
0 201571 2080 "December 2 2022" "November 1 - November 30 2022" 487.29
1 345741 5377 "December 3 2022" "November 1 - November 30 2022" 729.35
2 995349 3672 "December 2 2022" "November 1 - November 30 2022" 937.33
3 475601 3672 "December 2 2022" "November 1 - November 30 2022" 790.17
4 228548 3672 "December 7 2022" "November 1 - November 30 2022" 682.38
Expected output:
$ python3 test_quote.py
0 1 2 3 4
0 201571 2080 "December 2, 2022" "November 1 - November 30, 2022" 487.29
1 345741 5377 "December 3, 2022" "November 1 - November 30, 2022" 729.35
2 995349 3672 "December 2 , 2022" "November 1 - November 30 , 2022" 937.33
3 475601 3672 "December 2 , 2022" "November 1 - November 30 , 2022" 790.17
4 228548 3672 "December 7, 2022" "November 1 - November 30, 2022" 682.38
Input file (test_quote.csv):
201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29
345741, 5377, "December 3, 2022", "November 1 - November 30, 2022", 729.35
995349, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 937.33
475601, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 790.17
228548, 3672, "December 7, 2022", "November 1 - November 30, 2022", 682.38
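For a self-contained reproduction that does not need test_quote.csv on disk, the same rows can be fed through io.StringIO (swapped in here only so the snippet runs on its own; the DATA name is just for illustration). With the default settings the quoted dates are split at their inner commas, giving 7 columns instead of 5:

```python
import io

import pandas as pd

# The five rows from test_quote.csv; note the space after every comma.
DATA = """\
201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29
345741, 5377, "December 3, 2022", "November 1 - November 30, 2022", 729.35
995349, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 937.33
475601, 3672, "December 2 , 2022", "November 1 - November 30 , 2022", 790.17
228548, 3672, "December 7, 2022", "November 1 - November 30, 2022", 682.38
"""

# sep=',' and quotechar='"' are already the read_csv defaults.
df = pd.read_csv(io.StringIO(DATA), header=None)
print(df.shape)  # (5, 7): each quoted date was split in two
```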
Answers:
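The extra spaces after the commas are causing the issue. The parser only treats quotechar as the start of a quoted field when it is the first character of the field. Here every field after the first begins with a space, so the double quote is read as an ordinary character and the comma inside the date ends the field. skipinitialspace=True skips the whitespace after each delimiter, so the quotes are recognized. Use the following, but note that most of your other parameters (sep=',', quotechar='"', quoting=csv.QUOTE_MINIMAL) are already the defaults.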
import pandas
df = pandas.read_csv('test_quote.csv', header=None, skipinitialspace=True)
print(df)
Output:
0 1 2 3 4
0 201571 2080 December 2, 2022 November 1 - November 30, 2022 487.29
1 345741 5377 December 3, 2022 November 1 - November 30, 2022 729.35
2 995349 3672 December 2 , 2022 November 1 - November 30 , 2022 937.33
3 475601 3672 December 2 , 2022 November 1 - November 30 , 2022 790.17
4 228548 3672 December 7, 2022 November 1 - November 30, 2022 682.38
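The same rule can be seen field by field with Python's standard csv module, which follows the same convention as read_csv here: a quote character only opens a quoted field when it is the very first character of that field. This is an illustrative sketch using csv.reader on one row from the question, not pandas itself (the ROW, plain and skipped names are hypothetical):

```python
import csv

# One row from test_quote.csv, with a space after each comma.
ROW = '201571, 2080, "December 2, 2022", "November 1 - November 30, 2022", 487.29'

# Without skipinitialspace, each field after the first starts with a space,
# so '"' is read as an ordinary character and the inner comma splits the date.
plain = next(csv.reader([ROW]))

# With skipinitialspace=True, the space after each delimiter is dropped,
# '"' becomes the first character, and the quoted comma is kept in the field.
skipped = next(csv.reader([ROW], skipinitialspace=True))

print(len(plain), plain)
print(len(skipped), skipped)
```

The first list has 7 entries such as ' "December 2' and ' 2022"', while the second has the 5 expected fields, with 'December 2, 2022' intact and the quotes removed.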