Fastest way to convert a dict's keys & values from `unicode` to `str`?
Question:
I’m receiving a dict from one “layer” of code upon which some calculations/modifications are performed before passing it on to another “layer”. The original dict’s keys & “string” values are `unicode`, but the layer they’re being passed on to only accepts `str`.
This is going to be called often, so I’d like to know what would be the fastest way to convert something like:
{ u'spam': u'eggs', u'foo': True, u'bar': { u'baz': 97 } }
…to:
{ 'spam': 'eggs', 'foo': True, 'bar': { 'baz': 97 } }
…bearing in mind the non-“string” values need to stay as their original type.
Any thoughts?
Answers:
def to_str(key, value):
    if isinstance(key, unicode):
        key = str(key)
    if isinstance(value, unicode):
        value = str(value)
    return key, value
Pass each key and value to it, and add recursion to your code to handle inner dictionaries.
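A minimal sketch of that recursion, assuming only nested dicts need walking (lists and other containers are left untouched). `convert_dict` is a hypothetical name. `to_str` is re-declared here to encode with UTF-8 instead of calling `str()`, so non-ASCII text doesn't raise `UnicodeEncodeError`, and a small shim lets it run on Python 2 (result is `str`) and Python 3 (result is `bytes`).

```python
try:
    text_type = unicode  # Python 2: the Unicode text type
except NameError:
    text_type = str  # Python 3: str is already Unicode text


def to_str(key, value):
    """Encode a key/value pair to UTF-8 byte strings if they are text."""
    if isinstance(key, text_type):
        key = key.encode('utf-8')
    if isinstance(value, text_type):
        value = value.encode('utf-8')
    return key, value


def convert_dict(d):
    """Hypothetical helper: apply to_str to every pair, recursing into nested dicts."""
    result = {}
    for key, value in d.items():
        if isinstance(value, dict):
            value = convert_dict(value)  # descend before converting the pair
        key, value = to_str(key, value)
        result[key] = value
    return result
```

For example, `convert_dict({u'spam': u'eggs', u'foo': True, u'bar': {u'baz': 97}})` compares equal to `{b'spam': b'eggs', b'foo': True, b'bar': {b'baz': 97}}` (on Python 2 the `b''` literals are plain `str`).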
import collections

DATA = { u'spam': u'eggs', u'foo': frozenset([u'Gah!']), u'bar': { u'baz': 97 },
         u'list': [u'list', (True, u'Maybe'), set([u'and', u'a', u'set', 1])]}

def convert(data):
    if isinstance(data, basestring):  # str or unicode
        return str(data)
    elif isinstance(data, collections.Mapping):
        # each (key, value) tuple goes through the Iterable branch below
        return dict(map(convert, data.iteritems()))
    elif isinstance(data, collections.Iterable):
        # rebuild lists, tuples, sets, frozensets with converted items
        return type(data)(map(convert, data))
    else:
        return data  # numbers, booleans, None, etc. are left alone

print DATA
print convert(DATA)
# Prints:
# {u'list': [u'list', (True, u'Maybe'), set([u'and', u'a', u'set', 1])], u'foo': frozenset([u'Gah!']), u'bar': {u'baz': 97}, u'spam': u'eggs'}
# {'bar': {'baz': 97}, 'foo': frozenset(['Gah!']), 'list': ['list', (True, 'Maybe'), set(['and', 'a', 'set', 1])], 'spam': 'eggs'}
Assumptions:
- You’ve imported the collections module and can make use of the abstract base classes it provides
- You’re happy to convert using the default encoding (use `data.encode('utf-8')` rather than `str(data)` if you need an explicit encoding; under Python 2’s default ASCII codec, `str()` raises `UnicodeEncodeError` on non-ASCII text).
If you need to support other container types, hopefully it’s obvious how to follow the pattern and add cases for them.
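On Python 3 this exact code won't run: `basestring` and `iteritems` are gone, and `Mapping`/`Iterable` must be imported from `collections.abc` (the aliases in `collections` were removed in Python 3.10). Since `str` is already Unicode there, the closest equivalent is encoding text to `bytes`. A minimal sketch of the same pattern under those assumptions; `convert_py3` is a hypothetical name, UTF-8 is an assumed default, and it assumes containers can be rebuilt by passing an iterable to their type (generators or `range` objects would need their own case).

```python
from collections.abc import Iterable, Mapping


def convert_py3(data, encoding='utf-8'):
    """Recursively encode every str in data to bytes, keeping container types."""
    if isinstance(data, str):
        return data.encode(encoding)
    if isinstance(data, bytes):
        return data  # already a byte string; don't iterate over its ints
    if isinstance(data, Mapping):
        # Any Mapping comes back as a plain dict, as in the original answer.
        return {convert_py3(k, encoding): convert_py3(v, encoding)
                for k, v in data.items()}
    if isinstance(data, Iterable):
        # Rebuild list, tuple, set, frozenset, ... with converted items.
        return type(data)(convert_py3(item, encoding) for item in data)
    return data  # numbers, booleans, None, etc. pass through unchanged
```

The `str` check has to come before the `Iterable` check, because strings are themselves iterable and would otherwise be split into characters.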
If you wanted to do this inline and didn’t need recursive descent, this might work:
DATA = { u'spam': u'eggs', u'foo': True, u'bar': { u'baz': 97 } }
print DATA
# "{ u'spam': u'eggs', u'foo': True, u'bar': { u'baz': 97 } }"
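# convert top-level keys and unicode values; nested dicts are left untouched
STRING_DATA = dict((str(k), str(v) if isinstance(v, unicode) else v) for k, v in DATA.items())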
print STRING_DATA
# "{ 'spam': 'eggs', 'foo': True, 'bar': { u'baz': 97 } }"
I know I’m late on this one:
def convert_keys_to_string(dictionary):
    """Recursively converts dictionary keys to strings."""
    if not isinstance(dictionary, dict):
        return dictionary
    return dict((str(k), convert_keys_to_string(v))
                for k, v in dictionary.items())
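For a non-nested dict (the title doesn’t mention nesting, so this might interest other people):
{str(k): str(v) for k, v in my_dict.items()}
Note that this calls `str()` on every value, so non-string values change type too (`True` becomes `'True'`, `97` becomes `'97'`).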
To make it all inline (non-recursive):
{str(k): (str(v) if isinstance(v, unicode) else v) for k, v in my_dict.items()}
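Which of these is fastest depends on the shape of your data, and since the conversion runs often, it's worth timing candidates on a representative sample rather than guessing. A minimal sketch using the standard-library `timeit` module. `convert_flat`, `convert_recursive` and `best_time` are hypothetical names, and both candidates encode with UTF-8 (giving `str` on Python 2 and `bytes` on Python 3) instead of calling `str()`.

```python
import timeit

try:
    text_type = unicode  # Python 2: the Unicode text type
except NameError:
    text_type = str  # Python 3: str is already Unicode text

SAMPLE = {u'spam': u'eggs', u'foo': True, u'bar': {u'baz': 97}}


def convert_flat(d):
    """Top level only, like the one-liners above; nested dicts are left as-is."""
    return dict((k.encode('utf-8') if isinstance(k, text_type) else k,
                 v.encode('utf-8') if isinstance(v, text_type) else v)
                for k, v in d.items())


def convert_recursive(obj):
    """Descend into nested dicts; non-text values keep their type."""
    if isinstance(obj, text_type):
        return obj.encode('utf-8')
    if isinstance(obj, dict):
        return dict((convert_recursive(k), convert_recursive(v))
                    for k, v in obj.items())
    return obj


def best_time(func, data, number=10000):
    """Best of three runs: total seconds for `number` calls of func(data)."""
    return min(timeit.repeat(lambda: func(data), number=number, repeat=3))


if __name__ == '__main__':
    for func in (convert_flat, convert_recursive):
        print('%s: %.4f s' % (func.__name__, best_time(func, SAMPLE)))
```

Replace `SAMPLE` with a dict shaped like the one your layer actually receives; relative timings on a three-key toy dict may not carry over to larger or deeper data.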
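Just use `print(*my_dict.keys())` (on Python 2 this needs `from __future__ import print_function`).
The `*` unpacks a container (e.g. a list) into separate arguments, so each key is printed on its own, without the `u''` prefix a dict repr shows. Note that this only prints the keys; it doesn’t build a converted dict. For more info on `*` check this SO answer.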
>>> d = {u"a": u"b", u"c": u"d"}
>>> d
{u'a': u'b', u'c': u'd'}
>>> import json
>>> import yaml
>>> yaml.safe_load(json.dumps(d))
{'a': 'b', 'c': 'd'}
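This trick relies on PyYAML returning `str` for ASCII-only strings under Python 2, and it serializes and re-parses the whole structure on every call. It also only preserves what JSON can represent, so it breaks the question’s requirement that non-string values keep their type: tuples come back as lists, and sets can’t be serialized at all. A small standard-library check of the JSON half, written for Python 3 (where `json.loads` already returns `str`); the `original` dict is an illustrative assumption.

```python
import json

original = {'pair': (1, 2), 'tags': ['a', 'b'], 'flag': True, 'n': 97}

# The round trip silently turns the tuple into a list.
round_tripped = json.loads(json.dumps(original))
print(round_tripped['pair'])

# Sets are not JSON-serializable, so the round trip fails outright.
try:
    json.dumps({'tags': {'a', 'b'}})
except TypeError as exc:
    print('cannot serialize:', exc)
```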