How to log CPU instructions executed by a Python program?

Question:

I understand that Python source code is compiled into bytecode, which is then interpreted by the Python VM (let’s say CPython). If I understand correctly, this means that the VM parses the bytecode instructions and decides (at runtime) which CPU instructions should be executed accordingly.
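To make that premise concrete, here is a small sketch using the standard dis module, which lists the bytecode CPython compiles a function into. The add function is a hypothetical example, and the exact opcode names differ between CPython versions, so treat any printed names as illustrative only.

```python
import dis


def add(a, b):
    # Hypothetical example function: a single addition.
    return a + b


# Collect the names of the bytecode instructions in the compiled function.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# The addition appears as one bytecode instruction whose name starts with
# "BINARY" (the precise name depends on the CPython version).
binary_ops = [name for name in opnames if name.startswith("BINARY")]
print(binary_ops)
```

Each of these bytecode instructions is handled by compiled C code inside the interpreter, which is where the actual CPU instructions come from.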

My questions:

  • Is it possible to log the actual CPU instructions executed on your machine as a result of the interpretation of a particular Python file (.py)? I understand it might not be simple (or even feasible) to get a 1-1 correspondence between a .py file and CPU instructions, but what is the closest you can get?
  • Going a step further: Is it even possible to log the instructions executed that correspond to a particular process?
Asked By: rect0x51

Answers:

Use strace on Linux. It shows every system call made by a program (including python), though not the individual CPU instructions executed between those calls. On Windows you have to use something like WinDbg's wt command or maybe Logger.exe, which traces all library calls (not just system calls).
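If you would rather start the trace from Python, a minimal sketch along these lines should work, assuming strace is installed and on your PATH. The helper names strace_command and run_under_strace and the trace.log path are made up for illustration; -f (follow child processes) and -o (write the trace to a file) are standard strace options.

```python
import shutil
import subprocess
import sys


def strace_command(script, log_path="trace.log"):
    """Build an strace command line that runs a Python script.

    -f follows child processes; -o writes the trace to log_path.
    """
    return ["strace", "-f", "-o", log_path, sys.executable, script]


def run_under_strace(script, log_path="trace.log"):
    """Run the script under strace and return its exit code.

    Returns None when strace is not available (for example on macOS or Windows).
    """
    if shutil.which("strace") is None:
        return None
    return subprocess.run(strace_command(script, log_path)).returncode
```

Calling run_under_strace("hello.py") would then leave the system call log in trace.log, where hello.py stands for whatever script you want to inspect.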

You can use a debugger like gdb to look at the machine code as it executes. Since the CPython source code is available, a better alternative is to compile it with debugging symbols and run it in a C debugger. That gives you a high-level call stack, which is a lot easier to understand.
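One level above the machine code a debugger shows, CPython can also report every bytecode instruction it executes, which may be the closest per-instruction log you can get without leaving Python. The sketch below uses sys.settrace together with the frame attribute f_trace_opcodes; the names log_opcodes and add are hypothetical. Setting the flag on the current frame before installing the tracer is a precaution based on the assumption that some CPython versions decide at settrace() time whether opcode events are delivered at all.

```python
import dis
import sys


def log_opcodes(func, *args):
    """Call func(*args) and return (result, names of bytecode instructions run)."""
    executed = []

    def tracer(frame, event, arg):
        # Request an "opcode" event before each bytecode instruction of this frame.
        frame.f_trace_opcodes = True
        if event == "opcode":
            # f_lasti is the byte offset of the instruction about to run.
            opcode = frame.f_code.co_code[frame.f_lasti]
            executed.append(dis.opname[opcode])
        return tracer

    # Precaution (assumption, see the text above): set the flag once before
    # the tracer is installed. This frame has no local trace function, so it
    # does not produce events itself.
    sys._getframe().f_trace_opcodes = True
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, executed


def add(a, b):
    # Hypothetical example function.
    return a + b


result, executed = log_opcodes(add, 30858, 7)
print(result, executed)
```

This logs bytecode instructions, not CPU instructions. For the machine code behind each of them you still need a debugger attached to the interpreter.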

Answered By: nosklo

Here’s how I looked at the CPU instructions of CPython using lldb (it’s similar to gdb) through VS Code on my ARM CPU (macOS 13 running on Apple silicon):

  1. Install the CodeLLDB VS Code extension

  2. git clone https://github.com/python/cpython.git

  3. cd cpython

  4. git checkout 20cf32e761 (because the latest code might be broken)

  5. Compile CPython (see the README file or how they build it for testing for details), which will generate an executable file called "python.exe":

brew install pkg-config openssl@1.1 xz gdbm tcl-tk

CFLAGS="-I$(brew --prefix gdbm)/include -I$(brew --prefix xz)/include" \
LDFLAGS="-L$(brew --prefix gdbm)/lib -L$(brew --prefix xz)/lib" \
PKG_CONFIG_PATH="$(brew --prefix tcl-tk)/lib/pkgconfig" \
./configure \
  --with-pydebug \
  --with-openssl="$(brew --prefix openssl@1.1)"

make
  6. Open the CPython source code directory in VS Code (code .) and add a .vscode/launch.json file with the contents below. Set args to the arguments you want to pass to the python executable. In this example I’m running python -c '30858 + 7', which just gets Python to add 30858 and 7 and exit without printing anything:
{
    "version": "0.2.0",
    "configurations": [
        {
            "type": "lldb",
            "request": "launch",
            "name": "Debug",
            "program": "${workspaceFolder}/python.exe",
            // Add two integers
            "args": ["-c", "30858 + 7"],
            "cwd": "${workspaceFolder}"
        }
    ]
}
  7. Add a breakpoint somewhere. Interesting places to put one are the main() function, the evaluation loop (a giant switch statement that looks at the opcode), which is in the generated file generated_cases.c.h, or, for my 30858 + 7 example, _PyLong_Add(), which is what implements addition.

  8. Press F5 to start debugging

At this point you should have hit the breakpoint and be able to see where execution is in the C code. To look at the current assembly instruction, press Cmd+Shift+P, run the "LLDB: Show Disassembly…" command and choose "always". To go back to the C view, run the command again and choose "auto". The register values are under the "Registers" dropdown in the top left, below the current variable values.

Here’s the machine instruction that adds 30858 (788A in hex), held in register x20, to 7, held in register x0, and stores the result in register x0. (I modified the code a bit to put the addition on a separate line, which makes it easier to set the breakpoint exactly at the add instruction.)

screenshot of VS Code
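Debuggers usually display register contents in hexadecimal, so it helps to know what the operands and the result look like in that form. A quick check in Python (just arithmetic on the numbers from the example above):

```python
a, b = 30858, 7

# The operands as they would appear in the registers, in hexadecimal.
print(hex(a))      # 0x788a
print(hex(b))      # 0x7

# The value to expect in the result register after the add instruction.
print(hex(a + b))  # 0x7891
```

So after stepping over the add instruction, x0 should change from 7 to 0x7891 (30865).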

Answered By: Boris Verkhovskiy