How to split a comma-separated line if the chunk contains a comma in Python?

Question:

I'm trying to split the current line into 3 chunks.
The title column contains a comma, which is also the delimiter:

1,"Rink, The (1916)",Comedy

My current code is not working:

id, title, genres = line.split(',')

Expected result:

id = 1
title = 'Rink, The (1916)'
genres = 'Comedy'

Any thoughts on how to split it properly?

Asked By: Volodymyr K


Answers:

Ideally, you should use a proper CSV parser and tell it that the double quote is the quote character. If you must work with the raw string, here is a regex trick that should work:

import re

inp = '1,"Rink, The (1916)",Comedy'
parts = re.findall(r'".*?"|[^,]+', inp)
print(parts)  # ['1', '"Rink, The (1916)"', 'Comedy']

The pattern first tries to match a term enclosed in double quotes ("..."). If that fails, it falls back to matching an unquoted CSV field, defined as a run of non-comma characters up to the next comma or the end of the line. Note that the surrounding quotes are kept in the matched title.
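A minimal sketch building on that regex, assuming every line has exactly three non-empty fields and that quoted fields contain no escaped ("") quotes. The helper name split_line is hypothetical; it strips the double quotes that the pattern leaves on the title before unpacking.

```python
import re

# Same pattern as above: a double-quoted term, else a run of non-comma characters.
FIELD_RE = re.compile(r'".*?"|[^,]+')

def split_line(line):
    """Hypothetical helper: split one CSV-like line into three fields."""
    parts = FIELD_RE.findall(line)
    # Remove the surrounding double quotes kept by the first alternative.
    cleaned = [
        p[1:-1] if len(p) >= 2 and p.startswith('"') and p.endswith('"') else p
        for p in parts
    ]
    # Raises ValueError if the line did not yield exactly three fields.
    id_, title, genres = cleaned
    return id_, title, genres

id_, title, genres = split_line('1,"Rink, The (1916)",Comedy')
print(id_, title, genres, sep=' | ')  # 1 | Rink, The (1916) | Comedy
```

Because [^,]+ needs at least one character, an empty field such as 1,,Comedy is silently dropped by findall, which is one reason the csv module is the safer choice for real data.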

Answered By: Tim Biegeleisen

Use the csv module from the standard library:

>>> import csv, io
>>> s = """1,"Rink, The (1916)",Comedy"""
>>> # Wrap the string in a file-like buffer; csv.reader iterates over it line by line.
>>> reader = csv.reader(io.StringIO(s))
>>> next(reader)
['1', 'Rink, The (1916)', 'Comedy']
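csv.reader accepts any iterable of strings, so a single line can also be wrapped in a list instead of an io.StringIO buffer. A minimal sketch, with a hypothetical parse_line helper (which also converts the id to int, matching the expected result in the question) and a small in-memory file standing in for the real data:

```python
import csv
import io

def parse_line(line):
    """Hypothetical helper: parse a single CSV line into (id, title, genres)."""
    # csv.reader accepts any iterable of strings; a one-element list is enough.
    id_, title, genres = next(csv.reader([line]))
    return int(id_), title, genres

print(parse_line('1,"Rink, The (1916)",Comedy'))  # (1, 'Rink, The (1916)', 'Comedy')

# For a whole file, iterate the reader directly; StringIO stands in for open(...).
data = io.StringIO('1,"Rink, The (1916)",Comedy\n2,Heat (1995),Action\n')
for id_, title, genres in csv.reader(data):
    print(id_, title, genres, sep=' | ')
```

When reading a real file, the csv documentation recommends opening it with newline='' so that quoted fields containing line breaks are handled correctly.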

Answered By: snakecharmerb

Let's talk about why your code does not work:

id, title, genres = line.split(',')

Here line.split(',') returns 4 values (the line contains 3 commas, one of them inside the title), but you are unpacking them into only 3 names, hence you get:

ValueError: too many values to unpack (expected 3)
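To make the count concrete, the sketch below prints the four pieces that str.split produces and catches the resulting error. The last part is a stopgap not taken from this answer: str.split and str.rsplit accept a maxsplit argument, so, assuming only the middle field can ever contain commas, splitting once from each end keeps the title intact (its quotes are still attached).

```python
line = '1,"Rink, The (1916)",Comedy'

pieces = line.split(',')
print(pieces)       # ['1', '"Rink', ' The (1916)"', 'Comedy']
print(len(pieces))  # 4

try:
    id_, title, genres = pieces
except ValueError as exc:
    print(exc)      # the "too many values to unpack" error from the question

# Workaround, assuming only the title may contain commas:
# split off the first field from the left and the last field from the right.
id_, rest = line.split(',', 1)
title, genres = rest.rsplit(',', 1)
print(id_, title, genres, sep=' | ')  # 1 | "Rink, The (1916)" | Comedy
```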

My advice would be to use a different delimiter character instead of commas, for example #:

line = '1#Rink, The (1916)#Comedy'

and then

id, title, genres = line.split('#')
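A minimal sketch of that approach, assuming the data can be re-exported with # as the separator and that no title contains a #. The csv module also accepts a delimiter argument, which keeps quoting support in case a field ever does contain the separator.

```python
import csv

line = '1#Rink, The (1916)#Comedy'

# Plain str.split works as long as '#' never appears inside a field.
id_, title, genres = line.split('#')
print(id_, title, genres, sep=' | ')  # 1 | Rink, The (1916) | Comedy

# csv.reader with a custom delimiter still honours double-quoted fields.
row = next(csv.reader(['2#"Title with # inside"#Drama'], delimiter='#'))
print(row)  # ['2', 'Title with # inside', 'Drama']
```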

Alternatively, if the values are already available as a tuple, there is nothing to split; just unpack it:

line = (1, "Rink, The (1916)", "Comedy")
id, title, genres = line
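Going the other way also explains where the quotes in the original line come from: csv.writer, with its default QUOTE_MINIMAL setting, quotes any field that contains the delimiter. A small round-trip sketch, using io.StringIO as an in-memory file:

```python
import csv
import io

record = (1, "Rink, The (1916)", "Comedy")

# Write the tuple as one CSV row; the title is quoted because it contains a comma.
buf = io.StringIO()
csv.writer(buf).writerow(record)
line = buf.getvalue()
print(repr(line))  # '1,"Rink, The (1916)",Comedy\r\n'

# Reading it back recovers the fields (all as strings).
id_, title, genres = next(csv.reader(io.StringIO(line)))
print(id_, title, genres, sep=' | ')  # 1 | Rink, The (1916) | Comedy
```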
Answered By: Arsal Khan