How can a non-assigned string in Python have an address in memory?

Question:

Can someone explain this to me? So I’ve been playing with the id() command in python and came across this:

>>> id('cat')
5181152
>>> a = 'cat'
>>> b = 'cat'
>>> id(a)
5181152
>>> id(b)
5181152

This makes some sense to me except for one part: The string ‘cat’ has an address in memory before I assign it to a variable. I probably just don’t understand how memory addressing works but can someone explain this to me or at least tell me that I should read up on memory addressing?

So that is all well and good but this confused me further:

>>> a = a[0:2]+'t'
>>> a
'cat'
>>> id(a)
39964224
>>> id('cat')
5181152

This struck me as weird because ‘cat’ is a string with an address of 5181152 but the new a has a different address. So if there are two ‘cat’ strings in memory why aren’t two addresses printed for id(‘cat’)? My last thought was that the concatenation had something to do with the change in address so I tried this:

>>> id(b[0:2]+'t')
39921024
>>> b = b[0:2]+'t'
>>> b
'cat'
>>> id(b)
40000896

I would have predicted the IDs to be the same but that was not the case. Thoughts?

Asked By: Usagi

||

Answers:

'cat' has an address because you create it in order to pass it to id(). You haven’t yet bound it to a name, but the object still exists.

Python caches and reuses short strings. But if you assemble strings by concatenation, then the code that searches the cache and attempts re-use is bypassed.

Note that the inner workings of the string cache is pure implementation detail and should not be relied upon.

Answered By: David Heffernan

All values must reside somewhere in memory. This is why id('cat') produces a value. You call it a “non-existent” string, but it clearly does exist, it just hasn’t been assigned to a name yet.

Strings are immutable, so the interpreter can do clever things like make all instances of the literal 'cat' be the same object, so that id(a) and id(b) are the same.

Operating on strings will produce new strings. These may or may not be the same strings as previous strings with the same content.

Note that all of these details are implementations details of CPython, and they can change at any time. You don’t need to be concerned with these issues in actual programs.

Answered By: Ned Batchelder

Python reuses string literals fairly aggressively. The rules by which it does so are implementation-dependent, but CPython uses two that I’m aware of:

  • Strings that contain only characters valid in Python identifiers are interned, which means they are stored in a big table and reused wherever they occur. So, no matter where you use "cat", it always refers to the same string object.
  • String literals in the same code block are reused regardless of their content and length. If you put a string literal of the entire Gettysburg Address in a function, twice, it’s the same string object both times. In separate functions, they are different objects:
    def foo(): return "pack my box with five dozen liquor jugs"
    def bar(): return "pack my box with five dozen liquor jugs"
    assert foo() is bar() # AssertionError

Both optimizations are done at compile time (that is, when the bytecode is generated).

On the other hand, something like chr(99) + chr(97) + chr(116) is a string expression that evaluates to the string "cat". In a dynamic language like Python, its value can’t be known at compile time (chr() is a built-in function, but you might have reassigned it) so it normally isn’t interned. Thus its id() is different from that of "cat". However, you can force a string to be interned using the intern() function. Thus:

id(intern(chr(99) + chr(97) + chr(116))) == id("cat")   # True

As others have mentioned, interning is possible because strings are immutable. It isn’t possible to change "cat" to "dog", in other words. You have to generate a new string object, which means that there’s no danger that other names pointing to the same string will be affected.

Just as an aside, Python also converts expressions containing only constants (like "c" + "a" + "t") to constants at compile time, as the below disassembly shows. These will be optimized to point to identical string objects per the rules above.

>>> def foo(): "c" + "a" + "t"
...
>>> from dis import dis; dis(foo)
  1           0 LOAD_CONST               5 ('cat')
              3 POP_TOP
              4 LOAD_CONST               0 (None)
              7 RETURN_VALUE
Answered By: kindall

Python variables are rather unlike variables in other languages (say, C).

In many other languages, a variable is a name for a location in memory. In these languages, Different kinds of variables can refer to different kinds of locations, and the same location could be given multiple names. For the most part, a given memory location can have the data change from time to time. There are also ways to refer to memory locations indirectly (int *p would contain the address, and in the memory location at that address, there’s an integer.) But a the actual location a variable references cannot change; The variable is the location. A variable assignment in these languages is effectively “Look up the location for this variable, and copy this data into that location”

Python doesn’t work that way. In python, actual objects go in some memory location, and variables are like tags for locations. Python manages the stored values in a separate way from how it manages the variables. Essentially, an assignment in python means “Look up the information for this variable, forget the location it already refers to, and replace that with this new location”. No data is copied.

A common feature of langauges that work like python (as opposed to the first kind we were talking about earlier) is that some kinds of objects are managed in a special way; identical values are cached so that they don’t take up extra memory, and so that they can be compared very easily (if they have the same address, they are equal). This process is called interning; All python string literals are interned (in addition to a few other types), although dynamically created strings may not be.

In your exact code, The semantic dialog would be:

# before anything, since 'cat' is a literal constant, add it to the intern cache
>>> id('cat') # grab the constant 'cat' from the intern cache and look up 
              # it's address
5181152
>>> a = 'cat' # grab the constant 'cat' from the intern cache and 
              # make the variable "a" point to it's location 
>>> b = 'cat' # do the same thing with the variable "b"
>>> id(a) # look up the object "a" currently points to, 
          # then look up that object's address
5181152
>>> id(b) # look up the object "b" currently points to, 
          # then look up that object's address
5181152

The code you posted creates new strings as intermediate objects. These created strings eventually have the same contents as your originals. In the intermediate time period, they do not exactly match the original, and must be kept at a distinct address.

>>> id('cat')
5181152

As others have answered, by issuing these instructions, you cause the Python VM to create a string object containing the string “cat”. This string object is cached and is at address 5181152.

>>> a = 'cat'
>>> id(a)
5181152

Again, a has been assigned to refer to this cached string object at 5181152, containing “cat”.

>>> a = a[0:2]
>>> id(a)
27731511

At this point in my modified version of your program, you have created two small string objects: 'cat' and 'ca'. 'cat' still exists in the cache. The string to which a refers is a different and probably novel string object, containing the characters 'ca'.

>>> a = a + 't'
>>> id(a)
39964224

Now you have created another new string object. This object is the concatenation of the string 'ca' at address 27731511, and the string 't'. This concatenation does match the previously-cached string 'cat'. Python does not automatically detect this case. As kindall indicated, you can force the search with the intern() method.

Hopefully this explanation illuminates the steps by which the address of a changed.

Your code did not include the intermediate state with a assigned the string 'ca'. The answer still applies, because the Python interpreter does generate a new string object to hold the intermediate result a[0:2], whether you assign that intermediate result to a variable or not.

Answered By: Heath Hunnicutt
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.