PyUnicode_FromStringAndSize: Very terse documentation

Question:

Apologies if this is a stupid question, which I suspect it may well be. I’m a Python user with little experience in C.

According to the official Python docs (v3.10.6):

PyObject *PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
Return value: New reference. Part of the Stable ABI.

Create a Unicode object from the char buffer u. The bytes will be interpreted as being UTF-8 encoded. The buffer is copied into the new object. If the buffer is not NULL, the return value might be a shared object, i.e. modification of the data is not allowed. […]

which has me slightly confused.

It says the data, i.e. the buffer u, is copied.
But then it says the data may be shared, which seems to contradict the first statement.

My question is:

What exactly do they mean? That the newly allocated copy of the data is shared? If so, who with?

Also, coming from Python: Why do they make a point of warning against tampering with the data anyway? Is changing a Python-immutable object something routinely done in C?

Ultimately, all I need to know is what to do with u: Can/should I free it or has Python taken ownership?

Asked By: loopy walt


Answers:

Ultimately, all I need to know is what to do with u: Can/should I free it or has Python taken ownership?

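You still own u. Python has no idea where u came from or how it should be freed; it could even be a local array. Python copies the bytes and will not retain a pointer to u, so cleaning up u is still your responsibility: if you got it from malloc, free it once PyUnicode_FromStringAndSize has returned; if it is a local array, there is nothing to do.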

What exactly do they mean? That the newly allocated copy of the data is shared? If so who with?

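The returned string object may be shared with arbitrary other code. In other words, Python may skip creating a new object and instead return an existing one with the same contents, which other code may already hold a reference to. Python makes no promises about when that happens, but in the current implementation one way is that a single-character ASCII string is drawn from a cached array of such strings: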

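/* Excerpt from CPython's UTF-8 decoder in Objects/unicodeobject.c */
/* ASCII is equivalent to the first 128 ordinals in Unicode. */
if (size == 1 && (unsigned char)s[0] < 128) {
    if (consumed) {
        *consumed = 1;
    }
    /* Returns a new reference to a cached one-character string. */
    return get_latin1_char((unsigned char)s[0]);
}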

Also, coming from Python: Why do they make a point of warning against tampering with the data anyway? Is changing a Python-immutable object something routinely done in C?

It is in fact fairly routine. Python-immutable objects have to be initialized somehow, and in C that means writing to their memory. Immutability is an abstraction presented at the Python level; the physical memory of an object is mutable. However, such mutation is only safe in very limited circumstances, and one of the requirements is that no other code holds any references to the object.

Answered By: user2357112