How to track the "calling chain" from numpy to C implementation?

Question:

I have read the tutorial and API guide of Numpy, and I learned how to extend Numpy with my own C code or how to use C to call Numpy function from this helpful documentation.

However, what I really want to know is: how could I track the calling chain from python code to C implementation? Or i.e. how could I know which part of its C implementation corresponds to this simple numpy array addition?

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
print(x + y)

Can I use some tools like gdb to track its stack frame step by step?

Or can I directly recognize the corresponding codes from variable naming policy? (like if I want to know the code about addition, I can search for something like function PyNumpyArrayAdd(…) )


[EDIT] I found a very useful video about how to point out the C implementation of these basic C-implemented function or operator overrides like "+" "-".

https://www.youtube.com/watch?v=mTWpBf1zewc

Got this from Andras Deak via Numpy mailing-list.

Asked By: 27rabbit

||

Answers:

However, what I really want to know is: how could I track the calling chain from python code to C implementation? Or i.e. how could I know which part of its C implementation corresponds to this simple numpy array addition?

AFAIK, there is two main way to do that: using a debugger or by tracking the function in the code (typically by looking the wrapping part or by searching keywords in numpy/core/src/XXX/). Numpy has different kind of functions. Some are focusing more on the CPython interaction part (eg. type checking, array creation, generic iterators, etc.) and some are focusing on the computing part (doing the computation efficiently). Regarding what you want, different files needs to be inspected. core/src/umath/loops.c.src is the way to go for core computing functions doing basic independent math operations.

Can I use some tools like gdb to track its stack frame step by step?

Using a debugger is the common way to do unless you are familiar with the code of Numpy. You can try to find the Numpy entry point function by looking the wrapper code but I think it is a bit difficult as this part of the code is not very readable (many related parts are generated certainly to ease the development of avoid mistakes). The hard part with GDB is to find the first entry point of the function in Numpy (the CPython interpreter function calls are hard to track as they are many of them (sometime called recursively) and the call stack is quite big far from being clear (ie. there is no clear information about the actual statement/expression being executed). That being said, AFAIR, the entry point is often something like PyArray_XXX or array_XXX. You can also track the first function executing code of the Numpy library.

Or can I directly recognize the corresponding codes from variable naming policy?

Some functions have a standardized name like typically PyArray_XXX. That being said, core computing function generally does not. They have a name generated by a template system that parse comments and annotations and generate code based on that. For adding two array, the main computing function should be for example @TYPE@_add@isa@ where @TYPE@ is either INT or LONG regarding your target platform. There is a special version (ie. specialization) for floating-point numbers that makes use of an optimized pair-wise summation so sake of accuracy. This kind of naming convention is quite frequent though so you can search _add in the code or a begin repeat section with add as a kind parameter.


Related post: Numpy argmax source

Answered By: Jérôme Richard
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.