In the CPython implementation, are strings an array of characters or an array of references/pointers, like Python lists?

Question:

I went through the post https://rushter.com/blog/python-strings-and-memory/

Based on that article,

  • Depending on the type of characters in a string, each character in that string is represented using 1, 2, or 4 bytes
  • Since the width of each character in a given string is fixed (1, 2, or 4 bytes), we can find the address of index i using starting_pos_address + no_of_bytes*index

But the code below kinda contradicts this model of a string being stored as a contiguous block of characters. It looks more like an array of references/pointers to individual characters/strings, since the o in both strings points to the same object:

>>> s1 = "hello"
>>> s2 = "world"
>>> id(s1[4])
140195535215024
>>> id(s2[1])
140195535215024

So, should I see string as an array of characters or array of references to character objects?

Asked By: learner

Answers:

The key piece of information is in this answer to a similar question: "Indexing into a string creates a new string". That means both s1[4] and s2[1] create a new string, "o". Because CPython interns (caches and reuses) such single-character strings, both expressions end up referring to the same object in memory, which is not necessarily the character that was part of either original string.
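
A minimal sketch of that behaviour, assuming a recent CPython (the helper name index_twice and the sample strings are made up for illustration). On CPython, the ASCII case typically returns the same cached object both times, while a character outside the Latin-1 range gives a fresh one-character string on each access. Object identity is an implementation detail, so other interpreters may behave differently.

```python
def index_twice(s, i):
    """Index the same position twice and report whether both results are one object."""
    a = s[i]
    b = s[i]
    return a is b

# Build the strings at runtime so the compiler cannot fold the indexing into a constant.
ascii_word = "".join(["hel", "lo"])
euro_word = chr(0x20AC) + "uro"  # first character is U+20AC, outside Latin-1

# "o" is served from CPython's cache of single-character strings.
same_ascii = index_twice(ascii_word, 4)

# Each access builds a new one-character string, so the two results are distinct objects.
same_euro = index_twice(euro_word, 0)

print(same_ascii, same_euro)
```

In both cases s[i] is a new str object (or a cached one), not a view into the storage of the original string.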

So yes, strings are stored as arrays of characters, not as arrays of references.
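
One indirect way to check the contiguous layout is to watch how the object size grows as the string gets longer. This is only a sketch, assuming CPython 3.3 or later, where sys.getsizeof reflects the compact string storage; bytes_per_char is a hypothetical helper. If a string were an array of references, each extra character would cost one pointer, as it does for a list.

```python
import sys

def bytes_per_char(ch, n=100):
    """Estimate storage per character from how the object size grows with length."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

width_ascii = bytes_per_char("a")            # ASCII / Latin-1 text
width_bmp = bytes_per_char("\u0101")         # code point above U+00FF
width_astral = bytes_per_char("\U0001F600")  # code point above U+FFFF

# For contrast: a list stores references, so it grows by one pointer per element.
list_growth = sys.getsizeof([None] * 101) - sys.getsizeof([None] * 100)

print(width_ascii, width_bmp, width_astral, list_growth)
```

On CPython the three string widths come out as 1, 2, and 4 bytes, while the list grows by the platform's pointer size per element.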

Answered By: matszwecja