Suppress the u'prefix indicating unicode' in python strings

Question

Is there a way to globally suppress the unicode string indicator in python? I'm working exclusively with unicode in an application, and do a lot of interactive stuff. Having the u'prefix' show up in all of my debug output is unnecessary and obnoxious. Can it be turned off?

Asked By: Ryan

Answer 1

You could use Python 3.0. The default string type is unicode, so the u'' prefix is no longer required.

In short, no. You cannot turn this off.

The u comes from the unicode.__repr__ method, which is used to display values in the REPL:

>>> print repr(unicode('a'))
u'a'
>>> unicode('a')
u'a'

If I’m not mistaken, you cannot override this without recompiling Python.

The simplest way around this is to print the string:

>>> print unicode('a')
a

If you use the unicode() builtin to construct all your strings, you could do something like:

>>> class unicode(unicode):
...     def __repr__(self):
...         return __builtins__.unicode.__repr__(self).lstrip("u")
...
>>> unicode('a')
'a'

...but don't do that, it's horrible.

Answered By: dbr

Answer 2

I know this isn’t a global option, but you can also suppress the Unicode u by placing the string in a str() function.

So a Unicode derived list that would look like:

>>> myList=[unicode('a'),unicode('b'),unicode('c')]
>>> myList
[u'a', u'b', u'c']

would become this:

>>> myList=[str(unicode('a')),str(unicode('b')),str(unicode('c'))]
>>> myList
['a', 'b', 'c']

It's a bit cumbersome, but might be useful to someone.
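If the goal is only readable debug output for a list, a possible alternative is to build the display text yourself instead of converting each item with str(). This is just a sketch; format_items is a made-up helper name, not something from the standard library. It joins the items as text, so it should also cope with non-ASCII content.

```python
def format_items(items):
    """Hypothetical helper: render a list of text values without repr()."""
    # Joining text values never goes through repr(), so no u'' prefix appears.
    return u"[" + u", ".join(items) + u"]"


my_list = [u"a", u"b", u"c"]
print(format_items(my_list))
```

The result is a display string only; unlike repr() output it cannot be pasted back into Python as a list literal.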

Answered By: Electrice

Answer 3

Try the following:

print str(result.url)

It could be that your default encoding has been changed.

You can check your default encoding with the following:-

>>> import sys
>>> print sys.getdefaultencoding()
ascii

The default should be ascii, which means u'string' should be printed as 'string', but yours may have been modified.

Answered By: Martin

Answer 4

Not sure about unicode specifically, but generally you can call encode() to convert a unicode string to a byte string, or decode() to go the other way. For instance, subprocess output captured in Python 3.0+ arrives as a byte string (prefix 'b'), and decode() converts it to a regular string.
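A minimal sketch of the bytes case, assuming the captured output is UTF-8. The raw value below is a stand-in for what a subprocess call would return; it is not real captured output.

```python
raw = b"hello\n"            # stand-in for bytes captured from a subprocess
text = raw.decode("utf-8")  # bytes -> text; the encoding is an assumption

print(repr(raw))
print(repr(text))
```

If the real output uses another encoding, pass that encoding name to decode() instead.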

Answered By: AK.

Answer 5

You have to use print str(your_Variable)

Answered By: Nasser Hadjloo

Answer 6

using str( text ) is in fact a somewhat bad idea whenever you cannot be 100% sure about both your python's default encoding and the exact content of the string; the latter is typical for text fetched from the internet. also, depending on what you want to do, using print text.encode( 'utf-8' ) or print repr( text.encode( 'utf-8' ) ) may yield disappointing results, as you might get a rendering full of unreadable escape sequences like \x3a.
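a small sketch of both problems, using a made-up sample string. in python 2, str( text ) implicitly encodes with the default encoding (normally ascii); the explicit encode( 'ascii' ) call below is my stand-in for that implicit step so the snippet behaves the same on python 3.

```python
text = u"caf\u00e9"  # sample text with one non-ASCII character

# Stand-in for what str(text) does implicitly on Python 2 with an ascii default.
try:
    text.encode("ascii")
    outcome = "encoded"
except UnicodeEncodeError:
    outcome = "UnicodeEncodeError"

# Encoding to UTF-8 works, but the repr of the result shows byte escapes.
escaped = repr(text.encode("utf-8"))

print(outcome)
print(escaped)
```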

i think the optimum is really to avail yourself of a unicode-capable command line (difficult under windows, easy under linux) and switch from python 2.x to python 3.x. the ease and clarity of text vs bytes handling afforded by the new python 3 series is really one of the big gains you can expect. it does mean you’ll have to spend a little time learning the distinction between ‘bytes’ and ‘text’ and grasp the concept of character encodings, but then that time is much better spent in a python 3 environment as python’s new approach to these vexing problems is much clearer and much less error-prone than what python 2 had to offer. i’d go so far as to call python 2’s approach to unicode problematic in retrospect, although i used to think of it as superior when i compared it to the way this issue is handled in php.

edit: i just stopped by a related discussion here on SO and found this comment on the way that php these days appears to tackle unicode / encoding issues:

“It’s like a mouse trying to eat an elephant. By framing Unicode as an extension of ASCII (we have normal strings and we have mb_strings) it gets things the wrong way around, and gets hung up on what special cases are required to deal with characters with funny squiggles that need more than one byte. If you treat Unicode as providing an abstract space for any character you need, ASCII is accommodated in that without any need to treat it as a special case.”

i quote this here because in my experience 90% of all SO python+unicode topics seem to come from people who used to be fine with ascii or maybe latin-1, got bitten by the occasional character that was not supported in their usual settings, and then basically just want to get rid of it. what you do when switching to python 3 is exactly what the commenter above suggests to do: instead of viewing unicode as a vexing extension of ascii, you start to view ascii (and almost any other encoding you’ll ever meet) as subset(s) of unicode.

to be honest, unicode v6 is certainly not the last word in encodings, but it is as close to being universal as you can get in 2011. get used to it.

Answered By: flow

Answer 7

In the case that you do not want to update to Python 3, you could make use of substrings.
For example, say the original output was (u'mystring',). Let us assume for the sake of the example that the variable row holds the value that displays this way. Then you could strip the surrounding characters from its text form like this:

temp = str(row)   # text form of row, e.g. (u'mystring',)
temp = temp[:-3]  # drop the trailing ',)
print temp[3:]    # drop the leading (u' and print mystring

Answered By: Agent0

Answer 8

I had a case where I needed to drop the u prefix because I was setting up some JavaScript with Python as part of an HTML template. A simple output left the u prefix in for the dict keys, e.g.

var turns = [{u'armies':2...];

which breaks javascript.

In order to get the output javascript needed, I used the json python module to encode the string for me:

turns = json.dumps(turns)

This does the trick in my particular case and as the keys are all ascii there is no worry about the encoding. You could probably use this trick for your debug output.
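A short sketch of that approach with made-up data (the owner key and its value are invented for illustration). sort_keys is only there to make the output order predictable.

```python
import json

# Hypothetical data standing in for the real "turns" structure.
turns = [{u"armies": 2, u"owner": u"red"}]

# json.dumps emits double-quoted keys and values with no u prefix.
js_literal = json.dumps(turns, sort_keys=True)
print("var turns = " + js_literal + ";")
```

For debug output the same call works on nested lists and dicts, as long as everything inside is JSON-serializable.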

Answered By: Paul Whipp

Answer 9

from __future__ import unicode_literals

is available since Python 2.6 (released on October 1, 2008). It is the default in Python 3.

It lets you omit the u'' prefix in source code, though it does not change repr(unicode_string); changing that would be misleading.

You could override sys.displayhook() in a Python REPL to display objects however you like. You could also override __repr__ for your own custom objects.
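A rough sketch of such a hook, under the assumption that you only care about bare text values typed at the prompt. plain_displayhook is a name I made up. Strings nested inside lists or dicts still go through the normal repr() and keep their prefix on Python 2.

```python
import sys

try:
    import builtins                 # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

TEXT_TYPE = type(u"")  # unicode on Python 2, str on Python 3


def plain_displayhook(value):
    """Hypothetical hook: echo text values without the leading u."""
    if value is None:
        return
    builtins._ = value  # keep the REPL's "_" shortcut working
    shown = repr(value)
    # Only strip the prefix from top-level text values.
    if isinstance(value, TEXT_TYPE) and shown.startswith("u"):
        shown = shown[1:]
    sys.stdout.write(shown + "\n")
```

To try it in an interactive session, assign sys.displayhook = plain_displayhook; restoring sys.__displayhook__ undoes the change.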

Answered By: jfs

Answer 10

Just in case you are getting something like [u'hello'], then you must be printing a list. Use print str(arr[0]) and you are good to go.

Answered By: Max

Answer 11

What seems to be working for me:

import ast
import json
j = json.loads('{"one" : "two"}')
j
dd = {u'one': u'two'}
dd
# to get double quotes
json.dumps(j,  encoding='ascii')
json.dumps(dd, encoding='ascii')
# to get single quotes
str(ast.literal_eval(json.dumps(j,  encoding='ascii')))
str(ast.literal_eval(json.dumps(dd, encoding='ascii')))

Output:

{u'one': u'two'}
{u'one': u'two'}
'{"one": "two"}'
'{"one": "two"}'
"{'one': 'two'}"
"{'one': 'two'}"

The above works for dictionaries and JSON objects.

For just a string, wrapping in str() seems to work for me.

s=u'test string'
s
str(s)

Output:

u'test string'
'test string'

Python version: 2.7.12

Answered By: tautology
