Passing utf-16 string to a Windows function

Question:

I have a Windows dll called some.dll with the following function:

void some_func(TCHAR* input_string)
{
...
}

some_func expects a pointer to a UTF-16 encoded string.

Running this python code:

from ctypes import *

some_string = "disco duck"
param_to_some_func = c_wchar_p(some_string.encode('utf-16'))  #  here exception!

some_dll = WinDLL("some.dll")
some_dll.some_func(param_to_some_func)

fails with the exception “unicode string or integer address expected instead of bytes instance”.

The documentation for ctypes and ctypes.wintypes is very thin, and I have not found a way to convert a Python string to a Windows wide-character string and pass it to a function.

Answers:

According to [Python 3.Docs]: Built-in Types – Text Sequence Type – str (emphasis is mine):

Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points.

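On Windows, wchar_t is 16 bits wide, so CTypes hands a str to native code as a UTF-16 wide string (wchar_t*). No manual encoding is needed.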
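To see what native code would actually receive, here is a minimal sketch that copies a str into a wchar_t buffer and compares the raw bytes with an explicit encoding. It assumes nothing beyond the standard library; the names raw, codec and expected are only for illustration. Note that the buffer has no byte order mark, unlike the output of str.encode("utf-16").

```python
import ctypes as cts
import sys

some_string = "disco duck"

# wchar_t array holding the text plus a terminating NUL
buf = cts.create_unicode_buffer(some_string)
raw = bytes(buf)  # the raw memory that a C function would read

# wchar_t is 2 bytes on Windows (UTF-16) and usually 4 bytes elsewhere (UTF-32)
wchar_size = cts.sizeof(cts.c_wchar)
byte_order = "le" if sys.byteorder == "little" else "be"
codec = ("utf-16-" if wchar_size == 2 else "utf-32-") + byte_order

# Same text, encoded by hand in native byte order, without a BOM
expected = (some_string + "\0").encode(codec)

print(codec, raw == expected)
```

On Windows this should report a UTF-16 codec; on most other platforms wchar_t is wider, which is why the sketch checks the size instead of assuming it.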

So, the correspondence between CTypes and Python types is:

╔═══════════════╦══════════════╦══════════════╗
║    CTypes     ║   Python 3   ║   Python 2   ║
╠═══════════════╬══════════════╬══════════════╣
║   c_char_p    ║    bytes     ║     str      ║
║   c_wchar_p   ║     str      ║   unicode    ║
╚═══════════════╩══════════════╩══════════════╝

Example:

  • Python 3:

    >>> import ctypes as cts
    >>> import sys
    >>>
    >>> sys.version
    '3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]'
    >>>
    >>> text_ascii = b"Dummy"
    >>> text_unicode = "Dummy"
    >>>
    >>> cts.c_char_p(text_ascii)
    c_char_p(2563882450144)
    >>>
    >>> cts.c_wchar_p(text_ascii)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unicode string or integer address expected instead of bytes instance
    >>>
    >>> cts.c_char_p(text_unicode)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: bytes or integer address expected instead of str instance
    >>>
    >>> cts.c_wchar_p(text_unicode)
    c_wchar_p(2563878400656)
    
  • Python 2 (note that str <=> unicode conversions are performed automatically):

    >>> import ctypes as cts
    >>> import sys
    >>>
    >>> sys.version
    '2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]'
    >>>
    >>> text_ascii = "Dummy"
    >>> text_unicode = u"Dummy"
    >>>
    >>> cts.c_char_p(text_ascii)
    c_char_p('Dummy')
    >>>
    >>> cts.c_wchar_p(text_ascii)
    c_wchar_p(u'Dummy')
    >>>
    >>> cts.c_char_p(text_unicode)
    c_char_p('Dummy')
    >>>
    >>> cts.c_wchar_p(text_unicode)
    c_wchar_p(u'Dummy')
    

Back to your situation:

>>> import ctypes as cts
>>>
>>> some_string = "disco duck"
>>>
>>> enc_utf16 = some_string.encode("utf16")
>>> enc_utf16
b'\xff\xfed\x00i\x00s\x00c\x00o\x00 \x00d\x00u\x00c\x00k\x00'
>>>
>>> type(some_string), type(enc_utf16)
(<class 'str'>, <class 'bytes'>)
>>>
>>> cts.c_wchar_p(some_string)  # This is the right way
c_wchar_p(2508534214928)
>>>
>>> cts.c_wchar_p(enc_utf16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unicode string or integer address expected instead of bytes instance
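
When calling a function, it usually helps to declare its prototype so that a plain str can be passed directly. A minimal sketch follows; since some.dll is not available here, it uses the C runtime's wcslen as a stand-in (an assumption made purely for illustration). For the real DLL, the equivalent lines would set some_dll.some_func.argtypes and some_dll.some_func.restype.

```python
import ctypes as cts
import sys

# Load the C runtime: msvcrt on Windows, the current process elsewhere.
if sys.platform == "win32":
    libc = cts.CDLL("msvcrt")
else:
    libc = cts.CDLL(None)

# size_t wcslen(const wchar_t *str);
wcslen = libc.wcslen
wcslen.argtypes = (cts.c_wchar_p,)  # a Python str is converted automatically
wcslen.restype = cts.c_size_t

length = wcslen("disco duck")  # no .encode() call, just the str
print(length)
```

With argtypes in place, passing bytes where a wide string is expected raises ctypes.ArgumentError instead of silently sending the wrong data.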

As a side note, TCHAR is a typedef whose meaning depends on whether _UNICODE is defined: it is wchar_t if it is, and char otherwise. Check [MS.Learn]: Generic-Text Mappings in tchar.h for more details. So, depending on the flags the C code was compiled with, the Python code might need to pass c_char_p (bytes) instead of c_wchar_p (str).
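
If it is not known in advance how the DLL was built, the choice can be isolated in one place. The helper below is a hypothetical sketch: to_tchar_p, unicode_build and ansi_encoding are names invented for this example, and the "ascii" default is an assumption (on Windows, "mbcs" would be the usual choice for an ANSI build).

```python
import ctypes as cts

def to_tchar_p(text, unicode_build=True, ansi_encoding="ascii"):
    """Wrap a Python str as the pointer type matching the DLL's TCHAR.

    unicode_build=True  -> TCHAR is wchar_t, so use c_wchar_p with the str.
    unicode_build=False -> TCHAR is char, so encode to bytes and use c_char_p.
    """
    if unicode_build:
        return cts.c_wchar_p(text)
    return cts.c_char_p(text.encode(ansi_encoding))

wide_arg = to_tchar_p("disco duck")
narrow_arg = to_tchar_p("disco duck", unicode_build=False)
print(type(wide_arg).__name__, type(narrow_arg).__name__)
```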

Answered By: CristiFati