Pythonic way to encode UTF-8 strings to ASCII/Hex and back again?

Question:

I am using a black-box database to store objects from my Python code, and it can only store ASCII characters. Let’s assume this database cannot be swapped out for another, more friendly one. Unfortunately, the data I need to store is in UTF-8 and contains non-English characters, so simply converting my strings to ASCII and back leads to data loss.

My best idea for how to solve this is to convert my string to hex (which uses all ASCII-compliant characters), store it, and then upon retrieval convert the hex back to UTF-8.

I have tried varying combinations of encode and decode but none have given me the intended result.

Example of how I’d like this to work:

original_string='Parabéns'
original_string.some_decode_function('hex') # now it looks like A4 B8 C7 etc
database.store(original_string)

Upon retrieval:

retrieved_string=database.retrieve(storage_location) # now it looks like A4 B8 C7 etc
final_string=retrieved_string.decode('UTF-8') # now it looks like 'Parabéns'

Asked By: Mr. T


Answers:

You can use str.encode to encode the string into bytes, then call the bytes.hex method to convert those bytes to their hexadecimal representation. To convert it back, use the bytes.fromhex method to turn the hexadecimal string into bytes, and then decode it back to the original string with bytes.decode:

original_string = 'Parabéns'
encoded = original_string.encode().hex()
print(encoded)
print(bytes.fromhex(encoded).decode())

This outputs:

5061726162c3a96e73
Parabéns

Demo: https://replit.com/@blhsing/OutgoingUntimelyCases
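
For storing and retrieving many values, the same two steps could be wrapped in a pair of helpers. This is only a minimal sketch: the names to_hex and from_hex are made up for illustration, and the encoding is passed explicitly as UTF-8 rather than relying on the default that str.encode() and bytes.decode() already use.

```python
def to_hex(text: str) -> str:
    """Encode text as UTF-8 and return the bytes as an ASCII-only hex string."""
    return text.encode("utf-8").hex()


def from_hex(hex_text: str) -> str:
    """Reverse to_hex: parse the hex digits into bytes, then decode them as UTF-8."""
    return bytes.fromhex(hex_text).decode("utf-8")


# to_hex / from_hex are hypothetical helper names, not part of any library.
stored = to_hex("Parabéns")
assert stored.isascii()  # only 0-9 and a-f, so safe for an ASCII-only store (Python 3.7+)
restored = from_hex(stored)
print(stored)
print(restored)
```

Each byte becomes two hex digits, so the stored value is twice as long as the UTF-8 byte string it represents.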

Answered By: blhsing