Generating and reading QR codes with special characters

Question:

I’m writing a Python program that does the following:

Create a QR code > Save to a png file > Open the file > Read the QR code information

However, when the data in the code has special characters, I get some confusing output. Here’s my code:

import pyqrcode
from PIL import Image
from pyzbar.pyzbar import decode


data = 'Thomsôn Gonçalves Ámaral,325.432.123-21'

file_iso = 'QR_ISO.png'
file_utf = 'QR_Utf.png'

#creating QR codes
qr_iso = pyqrcode.create(data) #creates qr code using iso-8859-1 encoding
qr_utf = pyqrcode.create(data, encoding = 'utf-8') #creates qr code using utf-8 encoding
#saving png files
qr_iso.png(file_iso, scale = 8)
qr_utf.png(file_utf, scale = 8)

#Reading  and Identifying QR codes

img_iso = Image.open(file_iso)
img_utf = Image.open(file_utf)

dec_iso = decode(img_iso)
dec_utf = decode(img_utf)

# Reading Results:

print(dec_iso[0].data)
print(dec_iso[0].data.decode('utf-8'))
print(dec_iso[0].data.decode('iso-8859-1'), '\n')

print(dec_utf[0].data)
print(dec_utf[0].data.decode('utf-8'))
print(dec_utf[0].data.decode('iso-8859-1'))

And here’s the output:

b'Thoms\xee\x8c\x9e Gon\xe8\xbb\x8blves \xef\xbe\x81maral,325.432.123-21'
Thoms Gon軋lves チmaral,325.432.123-21
Thoms Gon軋lves ï¾maral,325.432.123-21 

b'Thoms\xef\xbe\x83\xef\xbd\xb4n Gon\xef\xbe\x83\xef\xbd\xa7alves \xef\xbe\x83\xef\xbc\xbbaral,325.432.123-21'
Thomsテエn Gonテァalves テ[aral,325.432.123-21
Thomsテエn Gonテァalves テ[aral,325.432.123-21

For simple data it works just fine, but when the data has characters like ‘Á’, ‘ç’ and so on, this happens.
Any ideas about what I should do to fix it?
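To see where the strange characters come from, the garbling can be reproduced in plain Python without generating an image. The sketch below assumes (inferred from the byte strings above, not from pyzbar's documentation) that the reader interprets the raw QR bytes as Shift-JIS and returns the result re-encoded as UTF-8; `simulate_zbar` is a hypothetical helper, not part of pyzbar.

```python
# Hypothetical model of the reader, inferred from the output above: the QR
# payload holds UTF-8 bytes, they get interpreted as Shift-JIS, and the
# resulting text is returned re-encoded as UTF-8.
def simulate_zbar(payload: bytes) -> bytes:
    return payload.decode('shift_jis').encode('utf-8')


data = 'Thomsôn Gonçalves Ámaral,325.432.123-21'
qr_payload = data.encode('utf-8')      # bytes stored by create(..., encoding='utf-8')
garbled = simulate_zbar(qr_payload)    # what .data would then contain

print(garbled)
print(garbled.decode('utf-8'))         # halfwidth katakana instead of ô, ç, Á
```

With Python's `shift_jis` codec this produces the same bytes as the `dec_utf[0].data` output in the question, which supports the Shift-JIS hypothesis.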

Additional information:

Asked By: BrunoHuf


Answers:

Try encoding the UTF-8-decoded result as Shift-JIS and then decoding that again as UTF-8.

dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')

This works at least with your example where the QR code uses UTF-8 as well.
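A small wrapper makes the fix reusable. This is a sketch, not part of pyzbar: `fix_zbar_text` is a hypothetical name, and the fallback (returning the plain UTF-8 decode when the round trip raises) is an assumption meant for data that was already read correctly in the first place.

```python
def fix_zbar_text(raw: bytes) -> str:
    """Undo a Shift-JIS misreading of UTF-8 QR data (hypothetical helper).

    raw is the .data attribute of a pyzbar result.
    """
    text = raw.decode('utf-8')
    try:
        # Turn the misread characters back into the original QR bytes,
        # then read those bytes as the UTF-8 they were meant to be.
        return text.encode('shift_jis').decode('utf-8')
    except UnicodeError:
        # Not reversible (e.g. already decoded correctly, or it contains
        # characters Shift-JIS cannot encode): keep the plain decode.
        return text
```

Usage: `print(fix_zbar_text(dec_utf[0].data))`. Catching `UnicodeError` covers both the encode and the decode step, since `UnicodeEncodeError` and `UnicodeDecodeError` are subclasses of it.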

See also https://github.com/NaturalHistoryMuseum/pyzbar/issues/14

Answered By: user14091216

Alright! Got some updates:

Short version:

The answer from @user14091216 seems to solve the problem. The line:

dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')

does a double decoding, which fixes the problem. I ran lots of tests without a single error. The new code is below.

What I’ve tried and found out – Long version:

After I talked to some colleagues, they suggested that my data was somehow double-encoded. I still don’t know why this happens, but from what I’ve read, it seems to be a problem in the pyzbar library (or the zbar library it wraps) when it reads data with special characters.

The first thing I tried was adding a UTF-8 BOM (byte order mark):

Based on my original code, I wrote these lines:

data = '\xEF\xBB\xBF' + 'Thomsôn Gonçalves Ámaral,325.432.123-21'
qr_iso = pyqrcode.create(data) # Creates QR code using ISO 8859-1 encoding as standard    
qr_iso.png(file_iso, scale = 8)
img_iso = Image.open(file_iso)
dec_iso = decode(img_iso)
print(dec_iso[0].data.decode('utf-8'))

And this was the output:

Thomsôn Gonçalves Ámaral,325.432.123-21

Note that even though I created the QR code using ISO 8859-1 encoding, it only worked when decoded as UTF-8.
I also need to post-process this data to remove the BOM, which is easy, but it is an additional step. It is worth mentioning that for simpler data (without the special characters), the output didn’t include the BOM.
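If you go the BOM route, Python's `utf-8-sig` codec can take care of the removal step: it drops one leading UTF-8 BOM and otherwise decodes exactly like `utf-8`. The bytes below are illustrative, and the sketch assumes the BOM reaches `.data` as the bytes EF BB BF at the very start; if it arrives in another form, you would still need to strip it by hand. `decode_without_bom` is a hypothetical helper name.

```python
def decode_without_bom(raw: bytes) -> str:
    # 'utf-8-sig' removes a single leading b'\xef\xbb\xbf' if present,
    # and behaves like plain 'utf-8' otherwise.
    return raw.decode('utf-8-sig')


raw = b'\xef\xbb\xbfThoms\xc3\xb4n'   # illustrative: BOM + UTF-8 text
print(repr(raw.decode('utf-8')))      # '\ufeffThomsôn' (BOM kept as U+FEFF)
print(decode_without_bom(raw))        # Thomsôn
```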

The solution above works, but at least for me it didn’t seem completely right. I was using it because I didn’t have a better one.

I even tried to double-decode the data:

Based on searches for ‘python double-decoding’, I tried code like this (and some variations):

dec_iso[0].data.decode('iso-8859-1').encode('raw_unicode_escape').decode('iso-8859-1')
dec_utf[0].data.decode('utf-8').encode('raw_unicode_escape').decode('utf-8')

but none of these worked.
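The `raw_unicode_escape` attempts fail for a specific reason: that codec writes code points up to U+00FF as single bytes but everything else as a literal `\uXXXX` escape, so the halfwidth katakana are never turned back into the original UTF-8 bytes. The short comparison below uses the first word of the `dec_utf` output to illustrate this.

```python
# The misread first word from the UTF-8 QR code (see the dec_utf output above).
garbled = b'Thoms\xef\xbe\x83\xef\xbd\xb4n'.decode('utf-8')

# raw_unicode_escape writes U+0000..U+00FF as single bytes but any other
# character as a literal \uXXXX escape, so the original bytes C3 B4 never
# come back.
escaped = garbled.encode('raw_unicode_escape')
print(escaped)                     # b'Thoms\\uff83\\uff74n'
print(escaped.decode('utf-8'))     # Thoms\uff83\uff74n  (escapes, not 'ô')

# Re-encoding with the codec that produced the mojibake does recover them.
print(garbled.encode('shift_jis').decode('utf-8'))   # Thomsôn
```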

The fix:

As suggested, I tried the following line:

dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')

And it worked perfectly. I’ve tested it with over 1800 data strings without getting a single error.
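To repeat that kind of bulk check without printing and scanning images, one option is to run many strings through a simulated reader. This is only a sketch under an assumption: the hypothetical `simulate_zbar` stands in for pyzbar by reading UTF-8 bytes with Python's `shift_jis` codec, and zbar's own conversion table may differ in edge cases, so a string that passes here is not guaranteed to pass with a real QR image. The sample names are illustrative.

```python
# Hypothetical stand-in for the reader: UTF-8 QR bytes read as Shift-JIS,
# returned re-encoded as UTF-8 (inferred from the output in the question).
def simulate_zbar(payload: bytes) -> bytes:
    return payload.decode('shift_jis').encode('utf-8')


def fix(raw: bytes) -> str:
    # The decode-encode-decode line from the accepted answer.
    return raw.decode('utf-8').encode('shift_jis').decode('utf-8')


def failing_samples(samples):
    """Return the samples that do not survive the simulated read + fix."""
    failures = []
    for s in samples:
        try:
            if fix(simulate_zbar(s.encode('utf-8'))) != s:
                failures.append(s)
        except UnicodeError:
            # Either the simulated read or the fix could not map some bytes.
            failures.append(s)
    return failures


names = ['Thomsôn Gonçalves Ámaral,325.432.123-21', 'João da Conceição', 'Inês Araújo']
print(failing_samples(names))
```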
The QR code generation seems to be fine. This line of code only post-processes the output data from the pyzbar library when it reads the QR image (and the QR code doesn’t need to have been created by the pyqrcode library specifically).

I haven’t been able to decode QR codes generated with ISO 8859-1 encoding using the same technique. It might be something related to pyzbar, or I simply haven’t found the right pattern for the decode-encode-decode process.
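A small check suggests why the same trick can't work for the ISO 8859-1 code, at least with Python's `shift_jis` codec. In the `dec_iso` bytes shown in the question, the first special character comes back as U+E31E, a Private Use Area code point, and Shift-JIS has no mapping for it, so the re-encoding step fails before any decoding can happen. Whether some other codec could reverse it is not tested here.

```python
# The dec_iso[0].data bytes from the question (ISO 8859-1 QR code).
iso_raw = b'Thoms\xee\x8c\x9e Gon\xe8\xbb\x8blves \xef\xbe\x81maral,325.432.123-21'
text = iso_raw.decode('utf-8')
print(hex(ord(text[5])))   # 0xe31e, a Private Use Area code point

try:
    text.encode('shift_jis')
except UnicodeEncodeError as err:
    # Shift-JIS has no byte sequence for U+E31E, so the round trip stops here.
    print('cannot re-encode the character at position', err.start)
```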

So here’s a simple code for creating and reading a QR code, based on UTF-8 encoding:

import pyqrcode
from PIL import Image
from pyzbar.pyzbar import decode


data = 'Thomsôn Gonçalves Ámaral,325.432.123-21'

file_utf = 'QR_Utf.png'

# Creating QR codes
qr_utf = pyqrcode.create(data, encoding = 'utf-8') # Creates QR code using UTF-8 encoding

# Saving png file
qr_utf.png(file_utf, scale = 8)

# Reading and identifying QR code

img_utf = Image.open(file_utf)
dec_utf = decode(img_utf)

# Decoding results:

print(dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8'))

For more info, see also:
iOS: ZBar SDK unicode characters
https://sourceforge.net/p/zbar/support-requests/21/

Answered By: BrunoHuf