Python prints wrong encoding from MySQL database

Question:

I’m having issues with my Python when calling data from my MySQL database.

The database is set as UTF-8 and is containing, special letters such as ‘Æ’ and ‘Ø’.

This is the code i use to call the Usernames from my Table

# -*- coding: UTF-8 -*-
import pymysql
db = pymysql.connect(
    host="localhost",
    user="root",
    password="XXXXXXXX",
    db="pythonconnectiontest", 
    charset="utf8"
)
cursor = db.cursor()
cursor.execute("SELECT Username FROM Account")

numrows = cursor.rowcount 
for i in range (numrows):
    row = cursor.fetchone()
    print row

The expected output is:

ThisTextIsBrøken
Tæst 
Word

Actual output:

(u'ThisTextIsBrxf8ken',)
(u'Txe6st',)
(u'Word',)

I do get that the ‘U’ in front of the value, indicates that this is UNICODE and i probably just need to make python interpret the text correctly with encode() or decode().

I’ve spent many hours trying to figure this out, and the solution is probably very simple.

I hope someone can help me fix this.

Thanks for reading.

Asked By: TwizzX

||

Answers:

The unicode strings it’s outputting are perfectly fine. You can verify this by trying print u'ThisTextIsBrxf8ken' in a python shell:

➜ python2
Python 2.7.15 (default, Jan 10 2019, 23:20:52)
[GCC 8.2.1 20181127] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print u'ThisTextIsBrxf8ken'
ThisTextIsBrøken

Your confusion is merely about how python represents unicode strings that are wrapped in another object. In this case your rows each represent a tuple of columns or more specifically a tuple containing a single unicode string.

You can verify this by adjusting your code to print the actual unicode string rather than the tuple containing it.

# -*- coding: UTF-8 -*-
import pymysql
db = pymysql.connect(
    host="localhost",
    user="root",
    password="XXXXXXXX",
    db="pythonconnectiontest", 
    charset="utf8"
)
cursor = db.cursor()
cursor.execute("SELECT Username FROM Account")

numrows = cursor.rowcount 
for i in range (numrows):
    row = cursor.fetchone()
    print row[0]

This should now output the expected strings.

Answered By: Fynn Becker