How to modify imported source code on-the-fly?

Question:

Suppose I have a module file like this:

# my_module.py
print("hello")

Then I have a simple script:

# my_script.py
import my_module

This will print "hello".

Let’s say I want to “override” the print() call so that it prints "world" instead. How could I do this programmatically (without manually modifying my_module.py)?


What I thought is that I somehow need to modify the source code of my_module before or while importing it. Obviously, I cannot do this after importing it, so solutions using unittest.mock are not possible.

I also thought I could read the file my_module.py, modify its contents, then load it. But this is ugly, as it will not work if the module is located somewhere else.

The good solution, I think, is to make use of importlib.

I read the docs and found a very interesting method: get_source(fullname). I thought I could just override it:

def get_source(fullname):
    source = super().get_source(fullname)
    source = source.replace("hello", "world")
    return source

Unfortunately, I am a bit lost with all these abstract classes and I do not know how to perform this properly.

I tried vainly:

spec = importlib.util.find_spec("my_module")
spec.loader.get_source = mocked_get_source
module = importlib.util.module_from_spec(spec)

Any help would be welcome, please.

Asked By: Delgan

||

Answers:

Not elegant, but works for me (may have to add a path):

with open('my_module.py') as aFile:
    exec(aFile.read().replace(<something>, <something else>))
Answered By: Jacques de Hooge

If importing the module before patching it is okay, then a possible solution would be:

import inspect

import my_module

source = inspect.getsource(my_module)
new_source = source.replace('"hello"', '"world"')
exec(new_source, my_module.__dict__)

If you’re after a more general solution, then you can also take a look at the approach I used in another answer a while ago.

Answered By: Martin Valgur

Here’s a solution based on the content of this great talk. It allows any arbitrary modifications to be made to the source before importing the specified module. It should be reasonably correct as long as the slides did not omit anything important. This will only work on Python 3.5+.

import importlib.util  # importlib.util is a submodule and must be imported explicitly
import sys

def modify_and_import(module_name, package, modification_func):
    spec = importlib.util.find_spec(module_name, package)
    source = spec.loader.get_source(module_name)
    new_source = modification_func(source)
    module = importlib.util.module_from_spec(spec)
    codeobj = compile(new_source, module.__spec__.origin, 'exec')
    exec(codeobj, module.__dict__)
    sys.modules[module_name] = module
    return module

So, using this you can do

my_module = modify_and_import("my_module", None, lambda src: src.replace("hello", "world"))
Answered By: Martin Valgur

This doesn’t answer the general question of dynamically modifying the source code of an imported module, but “overriding” or “monkey-patching” its use of the print() function can be done (since print is a built-in function in Python 3.x). Here’s how:

#!/usr/bin/env python3
# my_script.py

import builtins

_print = builtins.print

def my_print(*args, **kwargs):
    _print('In my_print: ', end='')
    return _print(*args, **kwargs)

builtins.print = my_print

import my_module  # -> In my_print: hello
Answered By: martineau
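If the replacement should only be active while the module is being imported, the patch could be wrapped in a context manager that restores the original function afterwards. This is only a minimal sketch: patched_print and recording_print are hypothetical helper names that are not part of the answer above, and the print("hello") call stands in for import my_module.

```python
import builtins
import contextlib


@contextlib.contextmanager
def patched_print(replacement):
    """Temporarily replace builtins.print (hypothetical helper)."""
    original = builtins.print
    builtins.print = replacement
    try:
        yield
    finally:
        # Always restore the real print, even if the import raises.
        builtins.print = original


calls = []


def recording_print(*args, **kwargs):
    # Record the arguments instead of writing to stdout.
    calls.append(args)


with patched_print(recording_print):
    print("hello")  # stands in for `import my_module`

# The real print is back at this point.
print(calls)
```

Restoring the built-in in a finally clause keeps the patch from leaking into the rest of the program.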

I first needed to better understand the import operation. Fortunately, this is well explained in the importlib documentation, and digging through the source code helped too.

The import process is actually split into two parts. First, a finder is in charge of parsing the module name (including dot-separated packages) and instantiating an appropriate loader. For example, built-in modules are not imported the same way as local modules. Then, the loader is called based on what the finder returned. This loader gets the source from a file or from a cache, and executes the code if the module was not previously loaded.
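The two parts described above can be reproduced by hand with importlib.util. The following is a rough sketch, not what the import statement does internally in every detail: import_in_two_steps is a hypothetical helper name, the standard-library module colorsys is used only as a convenient pure-Python example, and the module is deliberately not registered in sys.modules.

```python
import importlib.util


def import_in_two_steps(module_name):
    # Step 1: the finders locate the module and pick a loader (stored on the spec).
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        raise ModuleNotFoundError(module_name)

    # Step 2: the loader creates the module object and executes its code.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module, spec


module, spec = import_in_two_steps("colorsys")
# Show which loader class the finder chose and where the source lives.
print(type(spec.loader).__name__, spec.origin)
```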

This is quite simple, and it explains why I did not actually need the abstract classes from importlib.abc: I do not want to provide my own import process. Instead, I could create a subclass of one of the classes from importlib.machinery and override get_source() from SourceFileLoader, for example. However, this is not the way to go, because the loader is instantiated by the finder, so I have no control over which class is used. I cannot specify that my subclass should be used.

So, the best solution is to let the finder do its job, and then replace the get_source() method of whatever Loader has been instantiated.

Unfortunately, by looking through the source code I saw that the basic loaders do not use get_source() (which is only used by the inspect module). So my whole idea could not work.
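This can be checked with a small experiment. The sketch below creates a throwaway module (get_source_demo is a hypothetical name chosen for illustration), patches get_source() on the loader instance as attempted in the question, and then executes the module. On CPython, the module should still see the unmodified "hello", because the loader compiles the file contents directly instead of going through get_source().

```python
import importlib.util
import os
import sys
import tempfile

# Create a throwaway module (hypothetical name) so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "get_source_demo.py"), "w") as f:
    f.write('GREETING = "hello"\n')
sys.path.insert(0, tmp_dir)

spec = importlib.util.find_spec("get_source_demo")
original_get_source = spec.loader.get_source

# Patch get_source() on the loader instance, as attempted in the question.
spec.loader.get_source = lambda fullname: original_get_source(fullname).replace("hello", "world")

module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# get_source() itself returns the patched text...
print(spec.loader.get_source("get_source_demo"))
# ...but this is the value the module actually got when it was executed.
print(module.GREETING)
```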

In the end, I guess get_source() should be called manually, then the returned source should be modified, and finally the code should be executed. This is what Martin Valgur detailed in his answer.
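As a variation on that manual approach, it also seems possible to hand-pick the loader for a single module by building a new spec with importlib.util.spec_from_file_location. The sketch below rests on several assumptions: ReplacingLoader and import_patched are hypothetical names, the "hello" to "world" replacement is hard-coded for illustration, and get_code() is overridden so that the bytecode cache is bypassed on purpose.

```python
import importlib.machinery
import importlib.util
import sys


class ReplacingLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader that edits the source text before compiling it."""

    def get_code(self, fullname):
        # Always recompile from the modified source; the .pyc cache is ignored.
        source = self.get_source(fullname).replace("hello", "world")
        return compile(source, self.get_filename(fullname), "exec")


def import_patched(module_name):
    # Let the regular finders locate the file first.
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(module_name)

    # Build a new spec for the same file, but with our own loader attached.
    loader = ReplacingLoader(module_name, spec.origin)
    new_spec = importlib.util.spec_from_file_location(
        module_name, spec.origin, loader=loader
    )
    module = importlib.util.module_from_spec(new_spec)
    sys.modules[module_name] = module
    new_spec.loader.exec_module(module)
    return module
```

With the files from the question, import_patched("my_module") would then be expected to print "world". This only covers modules that are loaded from a plain source file.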

If compatibility with Python 2 is needed, I see no other way than reading the source file:

import imp
import sys
import types

module_name = "my_module"

file, pathname, description = imp.find_module(module_name)

with open(pathname) as f:
    source = f.read()

source = source.replace('hello', 'world')

module = types.ModuleType(module_name)
exec(source, module.__dict__)

sys.modules[module_name] = module
Answered By: Delgan

My solution rewrites the source file on disk, which also works for the “inner import” situation. By inner import I mean that the module is imported from within its own package: here, transformers.models.albert imports modeling_albert from the source file. In such a case, even the solution from Martin Valgur won’t work, so I update the source file instead. I hope it helps people who run into the same problem.

import inspect
from transformers.models.albert import modeling_albert

# Get source
source = inspect.getsource(modeling_albert)
source_before = "AlbertModel(config, add_pooling_layer=False)"
source_after = "AlbertModel(config, add_pooling_layer=True)"
new_source = source.replace(source_before, source_after)

# Update file
file_path = modeling_albert.__spec__.origin
with open(file_path, 'w') as f:
    f.write(new_source)
Answered By: BrambleXu
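Writing the file does not change a module object that is already loaded in the running interpreter; one way to pick up the new source without restarting is importlib.reload. The following is a minimal sketch under stated assumptions: reload_demo is a hypothetical throwaway module standing in for modeling_albert, and ADD_POOLING_LAYER is an invented variable used only to make the effect visible.

```python
import importlib
import os
import sys
import tempfile

# Throwaway module (hypothetical name) standing in for modeling_albert.
tmp_dir = tempfile.mkdtemp()
file_path = os.path.join(tmp_dir, "reload_demo.py")
with open(file_path, "w") as f:
    f.write("ADD_POOLING_LAYER = False\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

reload_demo = importlib.import_module("reload_demo")

# Rewrite the file on disk, as in the answer above.
with open(file_path) as f:
    source = f.read()
with open(file_path, "w") as f:
    f.write(source.replace("False", "True"))

# The loaded module still holds the value from the first import.
before = reload_demo.ADD_POOLING_LAYER

# Re-execute the module from the updated file, in the same module object.
reload_demo = importlib.reload(reload_demo)
after = reload_demo.ADD_POOLING_LAYER

print(before, after)
```

Names that other modules bound earlier with from ... import ... keep pointing at the old objects, so a reload is not a complete substitute for restarting the interpreter.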