Preventing Python requests.post from encoding strings to UTF-8

Question:

I am making an API call to an appliance, passing a message in a JSON payload via HTTP POST.

Despite not doing any character encoding, the string received is encoded in UTF-8.

Unfortunately, the appliance manufacturer requires no encoding for the message, and characters with accents are turned into 5-character codes 🙁

Here is the code:

import requests

payload = {
    "type": "send-message",
    "username": "myuser",
    "password": "mypass",
    "to": "456",
    "msg": "here are accents: é ç"
}

resp = requests.post("http://192.168.1.10/send_message.html", json=payload)

The result seen by the recipient doesn’t show the accent characters correctly:

[Image: received message]

Doing a tcpdump, I can see the HTTP POST made by requests.post contains the following payload:

{"type": "send-message", "username": "myuser", "password": "mypass", "to": "456", "msg": "here are accents: \u00e9 \u00e7"}

As you can see, the text has been encoded to UTF-8, which is not asked for anywhere in the code.

If I try to force a decode with "here are accents: é ç".decode('utf-8'), I get the error AttributeError: 'str' object has no attribute 'decode', which makes sense because it’s not encoded.

If I attempt to force ASCII: "here are accents: é ç".encode('ascii','ignore') then the accents will be lost.

Testing with curl, it works perfectly:

curl -X POST 'http://192.168.1.10/send_message.html' -H 'Content-Type: application/json' -d '{"type": "send-message","username": "myuser","password": "mypass","to": "456","msg": "here are accents: é ç" }'

Looking at the tcpdump of the curl attempt from the Linux CLI shows the JSON exactly as sent, and the appliance recognizes the accents and sends them exactly as expected.

Imported into Wireshark, the string sent by curl, which is not UTF-8 formatted and is correctly interpreted, looks like this:

[Image: Wireshark screenshot]

Is there a way to tell Python’s requests.post NOT to translate to UTF-8, or do I have to re-code the HTTP POST?

Thank you so much in advance.

Asked By: Love2Code4Fun


Answers:

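If for some reason the target API is not fully JSON-compliant, you can build the JSON request body manually and encode it in whatever encoding you like. By default the JSON encoder writes non-ASCII characters as \uXXXX escape codes; that is JSON escaping, not UTF-8 encoding, and it is what shows up in the tcpdump. ensure_ascii=False will disable the translation of non-ASCII characters to escape codes, and you can specify a different encoding if the appliance needs a non-standard one. The Wireshark screenshot shows the data sent by curl is actually UTF-8-encoded, so that is what I’ve done below: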

import json

import requests

payload = {
    "type": "send-message",
    "username": "myuser",
    "password": "mypass",
    "to": "456",
    "msg": "here are accents: é ç"
}

# data= sends the bytes as-is, so the Content-Type must be set by hand.
headers = {'Content-Type': 'application/json'}
# ensure_ascii=False keeps é and ç as literal characters instead of \uXXXX escapes.
data = json.dumps(payload, ensure_ascii=False).encode('utf-8')
resp = requests.post("http://192.168.1.10/send_message.html", data=data, headers=headers)
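
To make the difference concrete, here is a minimal sketch using only the standard json module, with no network call. The shortened payload and the variable names are illustrative assumptions, and the Latin-1 line is a hypothetical case for a device that expects that encoding, not something this appliance is known to require.

```python
import json

# Shortened, illustrative payload (only the field with accents).
payload = {"msg": "here are accents: é ç"}

# Default behaviour: non-ASCII characters become JSON \uXXXX escapes,
# so the resulting text is pure ASCII.
escaped = json.dumps(payload)

# ensure_ascii=False keeps the characters; .encode() then decides
# which bytes actually go on the wire.
raw_utf8 = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# Hypothetical alternative for a device that expects Latin-1 bytes.
raw_latin1 = json.dumps(payload, ensure_ascii=False).encode("latin-1")

print(escaped)
print(raw_utf8)
print(raw_latin1)

# A compliant JSON parser reads the escaped and unescaped forms identically.
assert json.loads(escaped) == json.loads(raw_utf8.decode("utf-8")) == payload
```

A compliant JSON parser treats the escaped and unescaped forms as the same string, which is why the default is normally harmless. The appliance in the question apparently does not decode the escapes, so sending the literal characters is the workaround.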
Answered By: Mark Tolonen