unicode | py4u

Using UTF-8 in Python 3 string literals

Question: I have a script I’m writing where I need to print the character sequence "Qä" to the terminal. My terminal is using UTF-8 encoding. My file has # -*- coding: utf-8 -*- at the top of it, which I think is not actually necessary for Python 3, …

Total answers: 1

Python: How to convert string of an encoded Unicode variable to a binary variable

Question: I am building an app to transliterate Myanmar (Burmese) text to the International Phonetic Alphabet. I have found that it’s easier to manipulate combining Unicode characters in my dictionaries as binary variables like this: b'\xe1\x80\xad\xe1\x80\xaf\xe1\x80\x84': 'áɪŋ̃', #'ိုင' because otherwise they glue …

Total answers: 1
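
As a rough illustration of the bytes form mentioned in the excerpt above, the sketch below round-trips the same byte sequence through UTF-8. The variable names are hypothetical, and this is not taken from the original question's answer.

```python
# The UTF-8 byte sequence from the excerpt, written with its backslash escapes.
key_bytes = b"\xe1\x80\xad\xe1\x80\xaf\xe1\x80\x84"

# Decoding as UTF-8 yields three code points: U+102D, U+102F, U+1004.
key_text = key_bytes.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in key_text]
print(code_points)

# Encoding the str again gives back the original bytes.
round_trip = key_text.encode("utf-8")
print(round_trip == key_bytes)
```

Keeping dictionary keys as bytes avoids combining marks visually attaching to neighbouring characters in an editor, at the cost of a decode step whenever the text is displayed.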

Selective replacement of unicode characters in Python using regex

Question: There are many answers as to how one can use regex to remove Unicode characters in Python. See "Remove Unicode code (\uxxx) in string Python" and "Python regex module "re" match unicode characters with \u". However, in my case, I don’t want to replace every …

Total answers: 2

Preserve letter order when replacing LTR chars with RTL chars in a word at byte level

Question: I have a Hebrew word "יתꢀראꢁ" which needs to be "בראשית". To correct it, I am encoding and then replacing chars. The replacement works; however, since I am replacing LTR chars with RTL chars, the order gets jumbled. data="יתꢀראꢁ".encode("unicode_escape") …

Total answers: 2

In Python, how to use re.sub() to replace all literal Unicode spaces?

Question: In Python, when I use readlines() to read from a text file, something that was originally a space becomes a literal Unicode character, as shown below, where \u2009 is a space in the original text file. So, I’m using re.sub() to …

Total answers: 2
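
A minimal sketch of one way to approach the question above, assuming the goal is to turn each U+2009 THIN SPACE into an ordinary space. The sample string and the helper name `normalize_thin_spaces` are made up for illustration and do not come from the original question or its answers.

```python
import re

# Hypothetical sample line; "\u2009" here is the single character U+2009 THIN SPACE.
line = "10\u2009kg of rice"

def normalize_thin_spaces(text: str) -> str:
    """Replace every U+2009 THIN SPACE with an ordinary ASCII space."""
    return re.sub("\u2009", " ", text)

cleaned = normalize_thin_spaces(line)
print(cleaned)
```

For a single fixed character, `text.replace("\u2009", " ")` would do the same job; `re.sub` becomes useful once the pattern is widened, for example to a character class covering several space characters.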

Why do some Unicode characters change shape depending on which character is first used in tkinter?

Question: I’m using Windows 11 and Python 3.11.1. Some Unicode characters change appearance depending on which character is used first in tkinter per font. The code below shows the behavior: import tkinter as tk from tkinter.font import Font app …

Total answers: 1

Python utf-8 encoding not following unicode rules

Question: Background: I’ve got a byte file that is encoded using Unicode. However, I can’t figure out the right method to get Python to decode it to a string. Sometimes it uses 1-byte ASCII text. The majority of the time it uses 2-byte "plain latin" text, but it …

Total answers: 1

Converting elements in a single list to key/value pair using Unicode characters as key

Question: I have a list (see below). I want to take any element in the list containing a Unicode character (e.g., '①', '②', '㉖') as the key/value pair inside a 'category' JSON element and the following elements in the list between each Unicode …

Total answers: 2

How to send accented characters with diacritics in HTTP request-payload?

Question: I need to send special characters, such as accented characters with diacritics (e.g. o-acute, ó), via an API. This is my test code: import string import http.client import datetime import json def apiSendFarmacia(idatencion,articulo,deviceid): ##API PAYLOAD now = datetime.datetime.now() conn = http.client.HTTPSConnection("apimocha.com") payload = json.dumps({ "idatencion": …

Total answers: 1

How to print unicode character from a string variable?

Question: I am new to the programming world, and I am a bit confused. I expect both prints to produce the same graphical Unicode exclamation mark symbol. My experiment: number = 10071 byteStr = number.to_bytes(4, byteorder='big') hexStr = hex(number) uniChar = byteStr.decode('utf-32be') uniStr = '\u' + hexStr[2:6] …

Total answers: 1
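
For the experiment in the excerpt above, a hedged sketch of how an integer code point can be turned into a character: `chr()` does the conversion directly, whereas a `\u` escape is only interpreted inside a string literal in source code, not when assembled at runtime. The variable names `from_chr` and `from_bytes` are hypothetical.

```python
number = 10071  # the value from the question; equals 0x2757

# chr() maps an integer code point directly to a one-character string.
from_chr = chr(number)

# Decoding the 4-byte big-endian representation as UTF-32 gives the same character.
from_bytes = number.to_bytes(4, byteorder="big").decode("utf-32-be")

# Show the code point and confirm both routes agree.
print(hex(ord(from_chr)))
print(from_chr == from_bytes)
```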