Converting Broken String with Python

Question

My Python code:

import csv

cursor = conn.cursor()
cursor.execute("select * from %s" % table_name)

row = cursor.fetchall()
# Undo the latin1 mis-decoding, then decode the raw bytes as EUC-KR.
data = ( [tuple(el.encode('latin1').decode('euc-kr') for el in t) for t in row] )

# Open CSV file for writing.
with open(filePath + fileName, 'w', newline='', encoding='utf-8') as f:
    csvFile = csv.writer(f, delimiter=',', lineterminator='\r\n',
                         quoting=csv.QUOTE_ALL, escapechar='\\')
    csvFile.writerows(data)

I am converting EUC-KR data to UTF-8 to create a CSV file.

Normal data is converted correctly.

Broken characters cannot be converted. How should I deal with them?

Example of broken characters: 뚦 딺똚

Error message when running the code:

Traceback (most recent call last):
  File "test.py", line 42, in <module>
    batch_extrat('test_table')
  File "test.py", line 30, in batch_extrat
    data = ( [tuple(el.encode('latin1').decode('euc-kr') for el in t) for t in row] )
  File "test.py", line 30, in <listcomp>
    data = ( [tuple(el.encode('latin1').decode('euc-kr') for el in t) for t in row] )
  File "test.py", line 30, in <genexpr>
    data = ( [tuple(el.encode('latin1').decode('euc-kr') for el in t) for t in row] )
UnicodeDecodeError: 'euc_kr' codec can't decode byte 0x8c in position 0: illegal multibyte sequence

If the broken characters cannot be converted, I want to replace them with "?".

Asked By: Bellpump


Answer 1

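You can use the replace error handler:

# Replace undecodable bytes with U+FFFD (the Unicode replacement character)
# using the 'replace' error handler.
decoded_t = tuple(el.encode('latin1').decode('euc-kr', errors='replace') for el in t)

When decoding, the replace error handler substitutes the replacement character U+FFFD (�) for any invalid byte sequence instead of raising UnicodeDecodeError. A literal '?' is what the handler produces when encoding; to get '?' in decoded text, follow the decode with .replace('\ufffd', '?').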
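Applied to every fetched row, the idea might look like the sketch below. It is an illustration under stated assumptions, not the asker's actual code: `repair_euc_kr` and the sample `rows` are hypothetical names, and it assumes that non-string values (such as None or numbers) should pass through unchanged. Because decoding with errors='replace' yields U+FFFD rather than '?', the helper maps that character to the requested placeholder afterward.

```python
def repair_euc_kr(value, placeholder="?"):
    """Hypothetical helper: re-read a latin1-mis-decoded string as EUC-KR.

    Byte sequences that are not valid EUC-KR become `placeholder`.
    """
    if not isinstance(value, str):
        # Assumption: leave None, numbers, dates, etc. untouched.
        return value
    raw = value.encode("latin1")                     # recover the original bytes
    text = raw.decode("euc-kr", errors="replace")    # invalid bytes -> U+FFFD
    return text.replace("\ufffd", placeholder)       # U+FFFD -> '?'


# Hypothetical sample data standing in for cursor.fetchall().
good = "한글".encode("euc-kr").decode("latin1")  # valid EUC-KR read as latin1
broken = "\x8cabc"                               # 0x8c is not a valid lead byte
rows = [(good, broken, None, 42)]

data = [tuple(repair_euc_kr(el) for el in t) for t in rows]
print(data)
```

The resulting `data` can be passed to `csvFile.writerows(data)` as in the question. Keeping the placeholder as a parameter makes it easy to switch to an empty string or another marker later.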

Answered By: Ayush Naik
