Automatically Update Python source code (imports)

Question:

We are refactoring our code base.

Old:

from a.b import foo_method

New:

from b.d import bar_method

Both methods (foo_method() and bar_method()) are the same. It just changed the name an the package.

Since above example is just one example of many ways a method can be imported, I don’t think a simple regular expression can help here.

How to refactor the importing of a module with a command line tool?

A lot of source code lines need to be changed, so that an IDE does not help here.

Asked By: guettli

||

Answers:

You’ll need to write/find some script that will do text replacement of all occurrences in some folder. I remember that Notepad++ could do that.

But as you mentioned, that regexp will not help here, then no scripts (even Open Source) will help here too. And you’ll definitely need to have some intelligence here, that will build index of your dependencies/modules/files/packages/etc. and will be able to manipulate them on that level. And that is the purpose for which IDE were built for.

You can choose any that you like: PyCharm, Sublime, Visual Studio or any other that are not just text editor but something that has refactoring functionality.


In any case, I would suggest you to do following refactoring steps:

  • rename old methods and their usages to new names
  • then just replace package path & names in imports with newer versions
Answered By: wowkin2

In cases where there is no obvious way to solve a batch editing problem, doing the next best thing with the addition of some manual work can work just as well.

As you’ve mentioned in your post:

Since above example is just one example of many ways a method can be imported, I don’t think a simple regular expression can help here.

I would recommend using a regular expression, while still printing out potential matches in case they are relevant:

def potential(line):
    # This is just a minimal example; replace with more reliable expression
    return "foo_method" in line or "a.b" in line 

matches = ["from a.b import foo_method"] # Add more to the list if necessary
new = "from b.d import bar_method" 
# new = "from b.d import bar_method as foo_method"

file = "file.py"
result = ""

with open(file) as f:
    for line in f:
        for match in matches:
            if match in line:
                result += line.replace(match, new)
                break
        else:
            if potential(line):
                print(line)

                # Here is the part where you manually check lines that potentially needs editing
                new_line = input("Replace with... (leave blank to ignore) ")
                if new_line:
                    result += new_line + "n"
                    continue
            result += line
                    
with open(file, "w") as f:
    f.write(result) 

Also, this goes without saying, but always make sure to create at least one backup of your original code base/project before doing such alterations.

But I don’t really think there will be too many complications on the different ways to import the method given that the code base was developed in proper PEP-8, as from What are all the ways to import modules in Python?:

The only ways that matter for ordinary usage are the first three ways listed on that page:

  • import module
  • from module import this, that, tother
  • from module import *

Finally, to avoid complications in renaming each instance the files call the foo_method from foo_method to bar_method, I would recommend importing the newly named bar_method as foo_method, using the as keyword, of course.

Answered By: Ann Zen

A programmatic solution is to convert each file into a syntax tree, identify parts that meet your criteria and transform them. You can do this with Python’s ast module, but it doesn’t preserve whitespace or comments. There are also libraries that preserve these features because they operate on concrete (or lossless) rather than abstract syntax trees.

Red Baron is one such tool, but it does not support Python 3.8+, and looks to be unmaintained (last commit in 2019). libcst is another, and I’ll use it in this answer (disclaimer: I am not associated with the libcst project). Note that libcst does not yet support Python 3.10+.

The code below uses a Transformer that can identify

  • from a.b import foo_method statements
  • function calls where the function is named foo_method

and transforms the identified nodes into

  • from b.d import bar_method
  • bar_method

In the transformer class we specify methods named leave_Node where Node is the type of node that we want to inspect and transform (we can also specify visit_Node methods, but we don’t need them for this example). Within the methods we use matchers to check that the nodes match our criteria for transformation.

import libcst as cst
import libcst.matchers as m


src = """
import foo
from a.b import foo_method


class C:
    def do_something(self, x):
        return foo_method(x)
"""


class ImportFixer(cst.CSTTransformer):
    def leave_SimpleStatementLine(self, orignal_node, updated_node):
        """Replace imports that match our criteria."""
        if m.matches(updated_node.body[0], m.ImportFrom()):
            import_from = updated_node.body[0]
            if m.matches(
                import_from.module,
                m.Attribute(value=m.Name('a'), attr=m.Name('b')),
            ):
                if m.matches(
                    import_from.names[0],
                    m.ImportAlias(name=m.Name('foo_method')),
                ):
                    # Note that when matching we use m.Node,
                    # but when replacing we use cst.Node.
                    return updated_node.with_changes(
                        body=[
                            cst.ImportFrom(
                                module=cst.Attribute(
                                    value=cst.Name('b'), attr=cst.Name('d')
                                ),
                                names=[
                                    cst.ImportAlias(
                                        name=cst.Name('bar_method')
                                    )
                                ],
                            )
                        ]
                    )
        return updated_node

    def leave_Call(self, original_node, updated_node):
        if m.matches(updated_node, m.Call(func=m.Name('foo_method'))):
            return updated_node.with_changes(func=cst.Name('bar_method'))
        return updated_node


source_tree = cst.parse_module(src)
transformer = ImportFixer()
modified_tree = source_tree.visit(transformer)
print(modified_tree.code)

Output:

import foo
from b.d import bar_method


class C:
    def do_something(self, x):
        return bar_method(x)

You can use libcst‘s parsing helpers in the Python REPL to view and work with the node trees for modules, statements and expressions. This is usually the best way to work out which nodes to transform and what needs to be matched.

libcst provides a framework named codemods to support refactoring large codebases.

Answered By: snakecharmerb

Behind the scenes, IDEs are no much more than text editors with bunch of windows and attached binaries to make different kind of jobs, like compiling, debugging, tagging code, linting, etc. Eventually one of those libraries can be used to refactor code. One such library is Jedi, but there is one that was specifically made to handle refactoring, which is rope.

pip3 install rope

A CLI solution

You can try using their API, but since you asked for a command line tool and there wasn’t one, save the following file anywhere reachable (a known relative folder your user bin, etc) and make it executable chmod +x pyrename.py.

#!/usr/bin/env python3
from rope.base.project import Project
from rope.refactor.rename import Rename
from argparse import ArgumentParser

def renamodule(old, new):
    prj.do(Rename(prj, prj.find_module(old)).get_changes(new))

def renamethod(mod, old, new, instance=None):
    mod = prj.find_module(mod)
    modtxt = mod.read()
    pos, inst = -1, 0
    while True:
        pos = modtxt.find('def '+old+'(', pos+1)
        if pos < 0:
            if instance is None and prepos > 0:
                pos = prepos+4 # instance=None and only one instance found
                break
            print('found', inst, 'instances of method', old+',', ('tell which to rename by using an extra integer argument in the range 0..' if (instance is None) else 'could not use instance=')+str(inst-1))
            pos = -1
            break
        if (type(instance) is int) and inst == instance:
            pos += 4
            break # found
        if instance is None:
            if inst == 0:
                prepos = pos
            else:
                prepos = -1
        inst += 1
    if pos > 0:
        prj.do(Rename(prj, mod, pos).get_changes(new))

argparser = ArgumentParser()
#argparser.add_argument('moduleormethod', choices=['module', 'method'], help='choose between module or method')
subparsers = argparser.add_subparsers()
subparsermod = subparsers.add_parser('module', help='moduledottedpath newname')
subparsermod.add_argument('moduledottedpath', help='old module full dotted path')
subparsermod.add_argument('newname', help='new module name only')
subparsermet = subparsers.add_parser('method', help='moduledottedpath oldname newname')
subparsermet.add_argument('moduledottedpath', help='module full dotted path')
subparsermet.add_argument('oldname', help='old method name')
subparsermet.add_argument('newname', help='new method name')
subparsermet.add_argument('instance', nargs='?', help='instance count')
args = argparser.parse_args()
if 'moduledottedpath' in args:
    prj = Project('.')
    if 'oldname' not in args:
        renamodule(args.moduledottedpath, args.newname)
    else:
        renamethod(args.moduledottedpath, args.oldname, args.newname)
else:
    argparser.error('nothing to do, please choose module or method')

Let’s create a test environment with the exact the scenario shown in the question (here assuming a linux user):

cd /some/folder/

ls pyrename.py # we are in the same folder of the script

# creating your test project equal to the question in prj child folder:
mkdir prj; cd prj; cat << EOF >> main.py
#!/usr/bin/env python3
from a.b import foo_method

foo_method()
EOF
mkdir a; touch a/__init__.py; cat << EOF >> a/b.py
def foo_method():
    print('yesterday i was foo, tomorrow i will be bar')
EOF
chmod +x main.py

# testing:
./main.py
# yesterday i was foo, tomorrow i will be bar
cat main.py
cat a/b.py

Now using the script for renaming modules and methods:

# be sure that you are in the project root folder


# rename package (here called module)
../pyrename.py module a b 
# package folder 'a' renamed to 'b' and also all references


# rename module
../pyrename.py module b.b d
# 'b.b' (previous 'a.b') renamed to 'd' and also all references also
# important - oldname is the full dotted path, new name is name only


# rename method
../pyrename.py method b.d foo_method bar_method
# 'foo_method' in package 'b.d' renamed to 'bar_method' and also all references
# important - if there are more than one occurence of 'def foo_method(' in the file,
#             it is necessary to add an extra argument telling which (zero-indexed) instance to use
#             you will be warned if multiple instances are found and you don't include this extra argument


# testing again:
./main.py
# yesterday i was foo, tomorrow i will be bar
cat main.py
cat b/d.py

This example did exact what the question did.

Only renaming of modules and methods were implemented because it is the question scope. If you need more, you can increment the script or create a new one from scratch, learning from their documentation and from this script itself. For simplicity we are using current folder as the project folder, but you can add an extra parameter in the script to make it more flexible.

Answered By: brunoff