Get python function source excluding the docstring?

Question:

You might want the docstring not to affect the hash, for example in joblib's Memory caching.

Is there a good way to strip the docstring? inspect.getsource and inspect.getdoc don't work well together: getdoc returns a “cleaned” (dedented) docstring, so its text doesn't match what appears in the output of getsource.
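To illustrate the mismatch, here is a small sketch (not part of the original question). It uses a hypothetical raw docstring inside a source string instead of a real function, so the result doesn't depend on inspect.getsource finding a file. inspect.getdoc cleans docstrings with inspect.cleandoc, and the cleaned text no longer occurs verbatim in the source:

```python
from inspect import cleandoc

# Hypothetical raw docstring, exactly as it sits between the quotes in a source file
raw_doc = "First line.\n\n    Second line, indented in the source.\n    "
source = 'def foo():\n    """' + raw_doc + '"""\n    return 1\n'

# inspect.getdoc() applies inspect.cleandoc(), which strips the common indentation
clean_doc = cleandoc(raw_doc)
print(repr(clean_doc))  # 'First line.\n\nSecond line, indented in the source.'

# The cleaned text is not a substring of the source, so a plain replace() does nothing
print(clean_doc in source)                      # False
print(source.replace(clean_doc, "") == source)  # True
```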

Asked By: mathtick


Answers:

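One approach is to delete the docstring from the source (for example the string returned by inspect.getsource) using a regular expression. re.DOTALL lets .*? match docstrings that span several lines, and the colon is put back so the def line stays intact:

import re
nodoc = re.sub(r":\s*'''.*?'''", ":", source, flags=re.DOTALL)
nodoc = re.sub(r':\s*""".*?"""', ":", nodoc, flags=re.DOTALL)

This currently works only for functions and classes; maybe someone can find a pattern for modules too.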
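A minimal sketch of how this regex could feed a cache key (not from the original answer; strip_docstrings_regex and the two sample sources are hypothetical). It works on source strings, so it does not depend on inspect.getsource:

```python
import hashlib
import re


def strip_docstrings_regex(source: str) -> str:
    # Remove a triple-quoted string that directly follows a colon (the regex idea above).
    # re.DOTALL lets .*? span multi-line docstrings; the colon is put back.
    source = re.sub(r":\s*'''.*?'''", ":", source, flags=re.DOTALL)
    return re.sub(r':\s*""".*?"""', ":", source, flags=re.DOTALL)


# Two versions of the same function that differ only in their docstring
v1 = 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
v2 = 'def add(a, b):\n    """Return the sum of a and b.\n\n    Longer text.\n    """\n    return a + b\n'

key1 = hashlib.sha256(strip_docstrings_regex(v1).encode()).hexdigest()
key2 = hashlib.sha256(strip_docstrings_regex(v2).encode()).hexdigest()
print(strip_docstrings_regex(v1))
print(key1 == key2)  # True
```

Being a plain text match, it also removes any other triple-quoted string that directly follows a colon (a dict value, for instance), and in a class it strips the method docstrings as well.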

Answered By: bb1950328

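If you just want to hash the body of a function regardless of its docstring, you can use the function's __code__ attribute.

It gives access to the function's code object. Its raw bytecode (co_code) does not contain the docstring text: CPython stores the docstring among the code object's constants (co_consts), so changing only the docstring leaves co_code unchanged.

Unfortunately, this way you cannot get a readable version of the source.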

def foo():
    """Prints 'foo'"""
    print('foo')


print(foo.__doc__)  # Prints 'foo'
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00' (from an older CPython; the exact bytes differ between versions)
foo.__doc__ += 'pouet'
print(foo.__doc__)  # Prints 'foo'pouet
print(foo.__code__.co_code)  # b't\x00d\x01\x83\x01\x01\x00d\x02S\x00' (unchanged)
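
A minimal sketch of the hashing idea (not from the original answer; body_hash, add_v1, add_v2 and sub are hypothetical names). Hashing only co_code gives two functions whose docstrings differ, but whose bodies match, the same key:

```python
import hashlib


def body_hash(func) -> str:
    # Hash only the raw bytecode; the docstring text lives in co_consts, not co_code
    return hashlib.sha256(func.__code__.co_code).hexdigest()


def add_v1(a, b):
    """Add two numbers."""
    return a + b


def add_v2(a, b):
    """Return the sum of a and b.

    Same body, different (longer) docstring.
    """
    return a + b


def sub(a, b):
    """Subtract b from a."""
    return a - b


print(body_hash(add_v1) == body_hash(add_v2))  # True
print(body_hash(add_v1) == body_hash(sub))     # False
```

Keep in mind that co_code leaves out the values of constants and the names of globals (those live in co_consts and co_names), so two bodies that differ only in, say, a string literal may produce the same bytecode. A real cache key would probably need to include those as well.
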
Answered By: Tryph

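A simple option, if you only need the docstring to be empty at runtime (for example in help()), is to overwrite __doc__. Note that this does not change the text returned by inspect.getsource.

def fun(a, b):
    '''hahah'''
    return a + b
# Overwrite the docstring with an empty string
fun.__doc__ = ''
help(fun)  # help() prints the page itself and returns None

This prints: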

Help on function fun in module __main__:

fun(a, b)
Answered By: fccoelho

In case anyone is still looking for a solution for this, this is how I managed to build it:

from ast import Constant, Expr, FunctionDef, Module, parse
from inspect import getsource
from textwrap import dedent
from types import FunctionType
from typing import cast


def get_source_without_docstring(obj: FunctionType) -> str:
    # Get cleanly indented source code of the function
    obj_source = dedent(getsource(obj))

    # Parse the source code into an Abstract Syntax Tree.
    # The root of this tree is a Module node.
    module: Module = parse(obj_source)

    # The first child of the Module node is the FunctionDef node that represents
    # the function definition. The cast only informs the type checker; it has
    # no effect at runtime.
    function_def = cast(FunctionDef, module.body[0])

    # The first statement of a function could be a docstring, which in the AST
    # is an Expr node wrapping a string Constant. To remove the docstring, we
    # need to find this node.
    first_stmt = function_def.body[0]

    # Only a bare string literal is a docstring; any other expression statement
    # (e.g. a function call) must be kept
    if (
        isinstance(first_stmt, Expr)
        and isinstance(first_stmt.value, Constant)
        and isinstance(first_stmt.value.value, str)
    ):
        # Split the original source code by lines
        code_lines: list[str] = obj_source.splitlines()

        # Delete the lines corresponding to the docstring from the list.
        # Note: list indices are 0-based, but line numbers in the parsed AST
        # nodes are 1-based, so we subtract 1 from the node's 'lineno'.
        # 'end_lineno' is inclusive, so it already works as the exclusive
        # end of the slice.
        del code_lines[first_stmt.lineno - 1 : first_stmt.end_lineno]

        # Join the remaining lines back into a single string
        obj_source = "\n".join(code_lines)

    # Return the source code of function without docstrings
    return obj_source

Note: code by myself, comments by OpenAI’s GPT
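
If the goal is a cache key, a variant of the same AST idea (not the answerer's code) is to delete the docstring nodes from the tree and regenerate the code with ast.unparse (Python 3.9+). This sketch, with the hypothetical helper strip_docstrings_ast, also handles module and class docstrings and docstrings written on the same line as the def, but its output is normalized code rather than the original text: comments and formatting are lost.

```python
import ast


def strip_docstrings_ast(source: str) -> str:
    # Parse the source; comments are not part of the AST, so they disappear too
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Only these node types can carry a docstring
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            body = node.body
            if (
                body
                and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)
            ):
                # Drop the docstring; keep a 'pass' so an otherwise empty body stays valid
                node.body = body[1:] or [ast.Pass()]
    # Regenerate normalized source text from the modified tree (Python 3.9+)
    return ast.unparse(tree)


v1 = 'def add(a, b):\n    """Add two numbers."""\n    return a + b  # a comment\n'
v2 = 'def add(a, b):\n    """Return the sum.\n\n    Longer text.\n    """\n    return a + b\n'
print(strip_docstrings_ast(v1))
print(strip_docstrings_ast(v1) == strip_docstrings_ast(v2))  # True
```

Because the comparison happens on regenerated code, a first statement that is not a string literal (such as a print call) is left alone.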

Answered By: Leandro Lima