Unicode not printing correctly to cp850 (cp437), playing card suits

Question:

To summarize: how do I print Unicode in a system-independent way to produce playing card symbols?

What am I doing wrong? I consider myself quite fluent in Python, yet I seem unable to print correctly!

# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
import sys

symbols = ('♥','♦','♠','♣')
# red suits go to stderr, which IDLE displays in red
print(' '.join(symbols[:2]), file=sys.stderr)
print(' '.join(symbols[2:]))

sys.stdout.write(' '.join(symbols) + '\n') # also correct in IDLE
print(' '.join(symbols))

Printing to the console, which is the main concern for a console application, fails miserably, though:

J:\test>chcp
Active code page: 850


J:\test>symbol2
Traceback (most recent call last):
  File "J:\test\symbol2.py", line 9, in <module>
    print(''.join(symbols))
  File "J:\Python26\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>
J:\test>chcp 437
Active code page: 437

J:\test>d:\Python27\python.exe symbol2.py
Traceback (most recent call last):
  File "symbol2.py", line 6, in <module>
    print(' '.join(symbols))
  File "d:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2660' in position 0: character maps to <undefined>

J:\test>
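The traceback is the console codec at work: Python encodes the text with the console's code page, and Python's cp850 and cp437 codecs have no entry for the suit code points such as U+2660. A minimal Python 3 sketch (the question itself runs Python 2.6/2.7) reproduces the error and shows the `errors` argument of `str.encode`, which substitutes `?` instead of raising:

```python
# Python 3 sketch: the suit characters have no mapping in the DOS code pages
symbols = ('\u2660', '\u2665', '\u2666', '\u2663')  # ♠ ♥ ♦ ♣

for codec in ('cp850', 'cp437'):
    try:
        ' '.join(symbols).encode(codec)
    except UnicodeEncodeError as exc:
        # Same failure the console reported: "character maps to <undefined>"
        print(codec, 'cannot encode:', exc.reason)

# errors='replace' swaps every unmappable character for '?'
print(' '.join(symbols).encode('cp850', errors='replace'))
```

Replacing avoids the crash but still does not show the suits, so it only makes the failure visible rather than fixing it.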

So, in summary, I have a console application that works as long as you run it in IDLE rather than in the console.

I can, of course, generate the symbols myself with chr:

# bytes 3-6 are drawn as card suits by the cp437/cp850 console font
print(''.join(chr(n) for n in range(3,3+4)))

But this looks like a very clumsy way to do it. I also do not want to write programs that only run on Windows, or to have many special cases (like conditional compilation). I want readable code.

I do not mind which characters it outputs, as long as the result looks correct whether it runs on a Nokia phone, Windows or Linux. Unicode should handle this, but it does not print correctly to the console.
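Since any readable output is acceptable, one portable option is to check whether the output stream's encoding can represent the suits and fall back to plain letters otherwise. A minimal Python 3 sketch; the helper name `suit_symbols` and the letter fallback are hypothetical choices, not something from the question:

```python
import sys

UNICODE_SUITS = ('\u2660', '\u2665', '\u2666', '\u2663')  # ♠ ♥ ♦ ♣
ASCII_SUITS = ('S', 'H', 'D', 'C')  # hypothetical fallback: suit initials

def suit_symbols(encoding=None):
    """Return the Unicode suits if `encoding` can represent them, else letters."""
    # Default to the current stdout encoding; some replacement streams have none
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    try:
        ''.join(UNICODE_SUITS).encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Unmappable characters, or an encoding name Python does not know
        return ASCII_SUITS
    return UNICODE_SUITS

print(' '.join(suit_symbols()))
```

The check runs once per call, so the rest of the program can use the returned tuple without any platform-specific branches.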

Asked By: Tony Veijalainen


Answers:

In response to the updated question

Since all you want to do is print UTF-8 characters in CMD, you’re out of luck: CMD does not support UTF-8. See the question “Is there a Windows command shell that will display Unicode characters?”

Old Answer

It’s not entirely clear what you’re trying to do here; my best guess is that you want to write the UTF-8-encoded text to a file.

Your problems are:

  1. symbols = ('♠','♥', '♦','♣'): your file encoding may be UTF-8, but unless you’re using Python 3 these literals are byte strings, not Unicode strings, by default. You need to prefix them with a small u:
    symbols = (u'♠', u'♥', u'♦', u'♣')

  2. Your str(arg) converts the Unicode string back into a byte string (in Python 2 this raises UnicodeEncodeError for non-ASCII characters); just leave it out, or use unicode(arg) to get a Unicode string

  3. The naming of .decode() may be confusing: it turns bytes into a Unicode string, but what you need here is the opposite, turning the Unicode string into UTF-8 bytes, so use .encode('utf-8')

  4. You’re not writing to the file in binary mode: instead of open('test.txt', 'w') you need open('test.txt', 'wb') (notice the wb). This opens the file in binary mode, which is important on Windows
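Points 3 and 4 come down to which direction each conversion runs. A small Python 3 sketch (in Python 3 every `str` is Unicode, so the u prefix from point 1 is optional); `io.BytesIO` stands in for a file opened with `'wb'`:

```python
import io

spade = '\u2660'  # ♠ as a Unicode (text) string

# encode: Unicode text -> UTF-8 bytes
data = spade.encode('utf-8')
print(data)                           # b'\xe2\x99\xa0'

# decode: UTF-8 bytes -> Unicode text
print(data.decode('utf-8') == spade)  # True

# A binary-mode stream accepts only bytes, so encode before writing
buf = io.BytesIO()
buf.write(data + b'\n')
print(buf.getvalue())                 # b'\xe2\x99\xa0\n'
```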

If we put all of this together we get:

# -*- coding: utf-8 -*-
from __future__ import print_function
import sys

symbols = (u'♠',u'♥', u'♦',u'♣')

print(' '.join(symbols))
print('Failure!')

def print(*args, **kwargs):
    # Shadow the builtin print: encode each argument to UTF-8 bytes before writing
    end = kwargs.get('end', '\n')
    sep = kwargs.get('sep', ' ')
    stdout = kwargs.get('file', sys.stdout)
    stdout.write(sep.join(unicode(arg).encode('utf-8') for arg in args))
    stdout.write(end)

print(*symbols)
print('Success!')
with open('test.txt', 'wb') as testfile:
    print(*symbols, file=testfile)

That happily writes the UTF-8-encoded bytes to the file (at least on my Ubuntu box here).
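For comparison, the same replacement-print idea in Python 3. The name `print_utf8` is hypothetical, and the sketch keeps it separate from the built-in `print` instead of shadowing it; `io.BytesIO` stands in for a file opened with `'wb'`:

```python
import io
import sys

def print_utf8(*args, sep=' ', end='\n', file=None):
    """Hypothetical helper: like print(), but writes UTF-8 bytes to a binary stream."""
    if file is None:
        sys.stdout.flush()        # keep ordering with text already printed
        file = sys.stdout.buffer  # byte layer underneath the text-mode stdout
    text = sep.join(str(arg) for arg in args) + end
    file.write(text.encode('utf-8'))

buf = io.BytesIO()
print_utf8('\u2660', '\u2665', '\u2666', '\u2663', file=buf)
print(buf.getvalue())
```

Under IDLE, `sys.stdout` may not expose `.buffer`, so passing an explicit binary `file` is the safer assumption there.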

Answered By: Ivo Wetzel

Whenever I need to output utf-8 characters, I use the following approach:

import codecs
import sys

# Wrap stdout in a writer that encodes unicode to UTF-8 on every write
out = codecs.getwriter('utf-8')(sys.stdout)

symbol = u'♠'

out.write(u"%s\n" % symbol)

This saves me an encode('utf-8') every time something needs to be sent to stdout/stderr.
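In Python 3 the same approach still works, but the writer has to wrap a byte stream (for the real console that would presumably be `sys.stdout.buffer`). In this sketch `io.BytesIO` stands in for it so the encoded bytes are visible:

```python
import codecs
import io

raw = io.BytesIO()  # stands in for a byte stream such as sys.stdout.buffer
out = codecs.getwriter('utf-8')(raw)

out.write('\u2660\n')  # the StreamWriter encodes the text to UTF-8 for us
print(raw.getvalue())  # b'\xe2\x99\xa0\n'
```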

Answered By: Fredrik Pihl

Use Unicode strings and the codecs module:

Either:

# coding: utf-8
from __future__ import print_function
import sys
import codecs

symbols = (u'♠',u'♥',u'♦',u'♣')

print(u' '.join(symbols))
print(*symbols)
with codecs.open('test.txt','w','utf-8') as testfile:
    print(*symbols, file=testfile)

or:

# coding: utf-8
from __future__ import print_function
from __future__ import unicode_literals
import sys
import codecs

symbols = ('♠','♥','♦','♣')

print(' '.join(symbols))
print(*symbols)
with codecs.open('test.txt','w','utf-8') as testfile:
    print(*symbols, file=testfile)

No need to re-implement print.
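In Python 3, the built-in `open()` accepts an `encoding` argument, so `codecs.open` is no longer required for this. A minimal sketch that reuses the test.txt name from the answers above but places it in a temporary directory to stay self-contained:

```python
import os
import tempfile

symbols = ('\u2660', '\u2665', '\u2666', '\u2663')  # ♠ ♥ ♦ ♣

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test.txt')
    # Text mode with an explicit encoding: print() output is encoded to UTF-8
    with open(path, 'w', encoding='utf-8') as testfile:
        print(*symbols, file=testfile)
    # Read it back with the same encoding to check the round trip
    with open(path, encoding='utf-8') as testfile:
        content = testfile.read()

print(content == ' '.join(symbols) + '\n')
```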

Answered By: Mark Tolonen

UTF-8 in the Windows console is a long and painful story.

You can read issue 1602 and issue 6058 on the Python bug tracker and end up with something that works, more or less, but it’s fragile.

Let me summarise:

  • add ‘cp65001’ as an alias for ‘utf8’ in Lib/encodings/aliases.py
  • select Lucida Console or Consolas as your console font
  • run chcp 65001
  • run python

Answered By: tzot