How to get subprocess output and maintain encoding

Question:

I am most likely missing something really easy, but I can’t wrap my head around why what seems to work for everyone else doesn’t work for me.

Goal: I want to run shell commands whose native output contains non-English characters, capture the output in a variable, then print it to the screen.

Problem: All the output that should contain non-English characters has them replaced with ? marks.

Thoughts: Is there an encoding issue? I am running Python 3.8, so there shouldn’t be! I am on Windows 10, but it also happens on Windows 7 and Server 2008.

>>> p=subprocess.run("dir",shell=True,encoding="utf8")                     
 Volume in drive C has no label.
 Volume Serial Number is A22B-FA10

 Directory of C:\Users\jeronimo\Documents\Github

04/24/2021  08:17 AM    <DIR>          .
04/24/2021  08:17 AM    <DIR>          ..
07/21/2020  09:37 PM    <DIR>          scripts
04/24/2021  08:09 AM    <DIR>          **Администратор**
               1 File(s)            295 bytes
              11 Dir(s)  151,978,950,656 bytes free

>>> p=subprocess.run("dir",capture_output=True,shell=True,encoding="utf8")
>>> p.stdout
' Volume in drive C has no label.\n Volume Serial Number is A22B-FA10\n\n Directory of C:\\Users\\jeronimo\\Documents\\Github\n\n04/24/2021  08:17 AM    <DIR>          .\n04/24/2021  08:17 AM    <DIR>          ..\n05/18/2020  01:24 PM scripts\n04/24/2021  08:09 AM    <DIR>          **?????????????**\n               1 File(s)            295 bytes\n              11 Dir(s)  151,976,796,160 bytes free\n'

>>> print(p.stdout)
 Volume in drive C has no label.
 Volume Serial Number is A22B-FA10

 Directory of C:\Users\jeronimo\Documents\Github

04/24/2021  08:17 AM    <DIR>          .
04/24/2021  08:17 AM    <DIR>          ..
07/21/2020  09:37 PM    <DIR>          scripts
04/24/2021  08:09 AM    <DIR>          **?????????????**
               1 File(s)            295 bytes
              11 Dir(s)  151,976,796,160 bytes free

EDIT: I’ve tried piping out to a file:

>>> f=open('file','a+',encoding='utf-8')                                              
>>> p=subprocess.call("dir",shell=True,encoding="utf8",stdout=f)  
>>> f.close()

Volume in drive C has no label.
Volume Serial Number is A22B-FA10
Directory of C:\Users\jeronimo\Documents\Github
04/24/2021  11:49 AM    <DIR>          .
04/24/2021  11:49 AM    <DIR>          ..
07/21/2020  09:37 PM    <DIR>          scripts
04/24/2021  08:09 AM    <DIR>          ?????????????
               1 File(s)              0 bytes
              11 Dir(s)  151,974,350,848 bytes free

I’ve tried many variations of subprocess (Popen, run, check_output, call) and all give the same result. What the heck am I doing wrong?

Asked By: midnightseer


Answers:

Solved: it works if I change the terminal code page before running subprocess AND specify utf-8 encoding in the subprocess call:

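import os
import subprocess
os.system('chcp 65001')  # switch the console code page to UTF-8
output = subprocess.run(data, timeout=10, encoding="utf8", shell=True, stdin=subprocess.DEVNULL, stderr=subprocess.PIPE, stdout=subprocess.PIPE)  # data is the command string to run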
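
A variation on the same idea, as a hedged sketch rather than a tested recipe: instead of calling os.system first, the code page switch can be chained in front of the command so that both run in the same cmd.exe instance. The helper names utf8_command and run_utf8 are made up for illustration, and the sketch assumes a Windows console where code page 65001 is available.

```python
import subprocess


def utf8_command(cmd):
    """Prefix a cmd.exe command with a switch to code page 65001 (UTF-8).

    `>nul` hides chcp's "Active code page" message; `&&` runs the real
    command only if the switch succeeded.
    """
    return f"chcp 65001 >nul && {cmd}"


def run_utf8(cmd, **kwargs):
    """Run `cmd` through the shell and decode its output as UTF-8 (Windows only)."""
    return subprocess.run(
        utf8_command(cmd),
        shell=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # assumption: prefer U+FFFD over an exception on bad bytes
        stdin=subprocess.DEVNULL,
        **kwargs,
    )
```

On Windows, run_utf8("dir").stdout would then be expected to hold the listing with the non-English names intact. Note that the code page change may stay in effect for that console afterwards.
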
Answered By: midnightseer

A quick ‘module’ for this task. Should work on any Windows…

I noticed that the Windows code page encodings are named cp followed by a number (cp866, cp1251 and so on). We can get the current code page by running chcp in cmd.exe. Go ahead, try it out.

The output should look like this in Russian:

Текущая кодовая страница: 866

or this (utf-8):

Active code page: 65001

It will be in the stdout of the subprocess.run call. We do not care about the letters (they will be unreadable anyway, since we do not know the encoding yet), but we can extract the number with _REGEX and store it in a module-wide cache variable, _CP_CODE.
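
The parsing step can be tried on its own, without Windows. This is a minimal sketch: the names CP_REGEX and parse_cp_code are illustrative, and the two sample byte strings are built by hand from the outputs quoted above, not captured from a real chcp run.

```python
import re

# Same idea as the answer's _REGEX: digits that follow ": " at the end of the output.
CP_REGEX = re.compile(br".+: (\d+)\s*$")


def parse_cp_code(raw):
    """Return the code page number found in raw chcp output (bytes)."""
    match = CP_REGEX.search(raw)
    if match is None:
        raise ValueError(raw)
    return int(match.group(1))


# Hand-built samples; the Russian one is encoded as cmd.exe would emit it under cp866.
english = b"Active code page: 65001\r\n"
russian = "Текущая кодовая страница: 866\r\n".encode("cp866")

print(parse_cp_code(english))  # 65001
print(parse_cp_code(russian))  # 866
```

Because the pattern works on bytes and only needs the ASCII colon, space and digits, it does not matter that the words before the colon cannot be decoded yet.
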

After this we know the encoding and can use the run function without any problems. It will return properly decoded strings in stdout and stderr, provided those streams are captured (for example with capture_output=True).

import subprocess
import re


_CP_CODE = None
_REGEX = re.compile(br".+: (\d+)\s*$")


def get_cp_code():
    stdout = subprocess.run(
        "chcp", shell=True,
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
        stdin=subprocess.DEVNULL
    ).stdout
    result = re.search(_REGEX, stdout)
    if result is None:
        raise ValueError(stdout)
    else:
        return int(result.group(1))


def run(cmd, **kwargs):
    global _CP_CODE
    if _CP_CODE is None:
        _CP_CODE = get_cp_code()
    return subprocess.run(
        cmd,
        shell=True,
        encoding=f'cp{_CP_CODE}',
        **kwargs
    )


if __name__ == "__main__":
    command = "TASKKILL /F /PID 12345 /T"
    # Capture the streams; otherwise res.stderr is None.
    res = run(command, capture_output=True)
    print(res.stderr)
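
An alternative sketch, not part of the answer above: the console's output code page can also be read directly from the Win32 function GetConsoleOutputCP through ctypes, which avoids spawning a shell just to parse chcp. The function name console_encoding and the non-Windows fallback are assumptions made for illustration.

```python
import locale
import sys


def console_encoding():
    """Best-effort guess at the encoding console programs write in."""
    if sys.platform == "win32":
        import ctypes

        # GetConsoleOutputCP returns 0 when the process has no console.
        code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
        if code_page:
            return f"cp{code_page}"
    # Fallback (an assumption): the locale's preferred encoding.
    return locale.getpreferredencoding(False)
```

The result could then be passed as encoding=console_encoding() to subprocess.run in place of the cached _CP_CODE lookup.
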

Answered By: winwin