Automatically Update Python source code (imports)
Question:
We are refactoring our code base.
Old:
from a.b import foo_method
New:
from b.d import bar_method
Both methods (foo_method() and bar_method()) are the same. Only the name and the package changed.
Since the above example is just one of many ways a method can be imported, I don’t think a simple regular expression can help here.
How can I refactor the imports of a module with a command-line tool?
A lot of source code lines need to be changed, so an IDE does not help here.
Answers:
You’ll need to write or find a script that replaces text across all files in a folder. I remember that Notepad++ could do that.
But if, as you mention, a regular expression will not help here, then a plain text-replacement script (even an open-source one) will not help either. You need a tool with some intelligence: one that builds an index of your dependencies, modules, files and packages and can manipulate them at that level. That is exactly what IDEs were built for.
You can choose any that you like: PyCharm, Sublime, Visual Studio, or any other tool that is not just a text editor but has refactoring functionality.
In any case, I would suggest the following refactoring steps:
- rename old methods and their usages to new names
- then just replace package path & names in imports with newer versions
In cases where there is no obvious way to solve a batch editing problem, doing the next best thing with the addition of some manual work can work just as well.
As you’ve mentioned in your post:
Since the above example is just one of many ways a method can be imported, I don’t think a simple regular expression can help here.
I would recommend using a regular expression, while still printing out potential matches in case they are relevant:
def potential(line):
    # This is just a minimal example; replace with a more reliable expression
    return "foo_method" in line or "a.b" in line

matches = ["from a.b import foo_method"]  # Add more to the list if necessary
new = "from b.d import bar_method"
# new = "from b.d import bar_method as foo_method"
file = "file.py"
result = ""

with open(file) as f:
    for line in f:
        for match in matches:
            if match in line:
                result += line.replace(match, new)
                break
        else:
            # No exact match: ask about lines that merely look related
            if potential(line):
                print(line)
                # Here is the part where you manually check lines that potentially need editing
                new_line = input("Replace with... (leave blank to ignore) ")
                if new_line:
                    result += new_line + "\n"
                    continue
            result += line

with open(file, "w") as f:
    f.write(result)
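If the exact-string list above turns out to be too strict, a slightly more tolerant pattern is one option. The following is only a sketch: IMPORT_RE and rewrite_import are names invented for this illustration, and the pattern deliberately leaves multi-name imports (from a.b import foo_method, other) untouched so that they still go through manual review.

```python
import re

# Matches "from a.b import foo_method", optionally followed by "as <alias>".
# \b stops it from matching longer names such as "foo_method_v2".
IMPORT_RE = re.compile(
    r"^(?P<indent>\s*)from\s+a\.b\s+import\s+foo_method\b(?P<alias>\s+as\s+\w+)?\s*$"
)

def rewrite_import(line):
    """Return the rewritten line, or the line unchanged if it does not match."""
    m = IMPORT_RE.match(line.rstrip("\n"))
    if not m:
        return line
    alias = m.group("alias") or ""  # keep an existing "as ..." clause
    newline = "\n" if line.endswith("\n") else ""
    return f"{m.group('indent')}from b.d import bar_method{alias}{newline}"

print(rewrite_import("from a.b   import foo_method as fm\n"), end="")
```

Lines that the function returns unchanged can still be passed to potential() and handled by hand.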
Also, this goes without saying, but always make sure to create at least one backup of your original code base/project before doing such alterations.
That said, I don’t think the different ways of importing the method will cause many complications, provided the code base follows PEP 8. As noted in "What are all the ways to import modules in Python?":
The only ways that matter for ordinary usage are the first three ways listed on that page:
import module
from module import this, that, tother
from module import *
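To see which of these three forms actually occur in a file before rewriting anything, a small check with the standard-library ast module can help. This is only a sketch: find_imports and its default arguments are made up for this illustration, and indirect forms such as from a import b are not covered.

```python
import ast

def find_imports(source, module="a.b", name="foo_method"):
    """List (line number, kind) for every import that could expose `name`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import a.b  /  import a.b as ab
            if any(alias.name == module for alias in node.names):
                hits.append((node.lineno, "import module"))
        elif (isinstance(node, ast.ImportFrom)
              and node.module == module and node.level == 0):
            imported = {alias.name for alias in node.names}
            if name in imported:
                hits.append((node.lineno, "from module import name"))
            elif "*" in imported:
                hits.append((node.lineno, "from module import *"))
    return sorted(hits)

sample = "import a.b\nfrom a.b import foo_method, other\nfrom a.b import *\nimport os\n"
for lineno, kind in find_imports(sample):
    print(lineno, kind)
```

Because ast.parse only parses the text, the modules named in the sample do not need to exist.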
Finally, to avoid having to rename every call from foo_method to bar_method, I would recommend importing the newly named bar_method as foo_method, using the as keyword.
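A tiny demonstration of the aliasing idea, as a sketch only: the package b and module b.d are faked here with types.ModuleType so that the snippet runs on its own, and the body of bar_method is a made-up placeholder.

```python
import sys
import types

# Stand-ins for the real package "b" and module "b.d" (hypothetical contents),
# registered so that the import statement below can run in isolation.
pkg = types.ModuleType("b")
pkg.__path__ = []  # marks "b" as a package
mod = types.ModuleType("b.d")
mod.bar_method = lambda x: x * 2
pkg.d = mod
sys.modules["b"] = pkg
sys.modules["b.d"] = mod

# The only line that changes in a calling file:
from b.d import bar_method as foo_method

# Existing call sites keep working under the old name.
print(foo_method(21))
```

The call sites can then be renamed later, file by file, without breaking anything in between.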
A programmatic solution is to convert each file into a syntax tree, identify parts that meet your criteria and transform them. You can do this with Python’s ast module, but it doesn’t preserve whitespace or comments. There are also libraries that preserve these features because they operate on concrete (or lossless) rather than abstract syntax trees.
Red Baron is one such tool, but it does not support Python 3.8+ and looks to be unmaintained (last commit in 2019). libcst is another, and I’ll use it in this answer (disclaimer: I am not associated with the libcst project). Note that, at the time of writing, libcst did not yet support Python 3.10+.
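To illustrate the limitation of the ast module mentioned above, here is a minimal sketch using only the standard library. RenameImport is a name invented for this illustration, ast.unparse needs Python 3.9 or later, and the renaming is naive: every bare name foo_method is rewritten, whether or not it refers to the imported function.

```python
import ast

class RenameImport(ast.NodeTransformer):
    """Rewrite `from a.b import foo_method` and uses of the name foo_method."""

    def visit_ImportFrom(self, node):
        # Only handle the single-name form, so other imported names are not moved.
        names = [alias.name for alias in node.names]
        if node.module == "a.b" and node.level == 0 and names == ["foo_method"]:
            node.module = "b.d"
            node.names[0].name = "bar_method"
        return node

    def visit_Name(self, node):
        if node.id == "foo_method":
            node.id = "bar_method"
        return node

src = "from a.b import foo_method  # legacy import\n\n\nresult = foo_method( 1 )\n"
tree = RenameImport().visit(ast.parse(src))
new_src = ast.unparse(tree)  # regenerates code from the tree (Python 3.9+)
print(new_src)
```

The printed code has the new import and call, but the comment and the original spacing are gone, which is why a lossless tree is preferable for refactoring real files.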
The libcst code below uses a Transformer that can identify
- from a.b import foo_method statements
- function calls where the function is named foo_method
and transforms the identified nodes into
- from b.d import bar_method
- bar_method
In the transformer class we specify methods named leave_Node, where Node is the type of node that we want to inspect and transform (we can also specify visit_Node methods, but we don’t need them for this example). Within the methods we use matchers to check that the nodes match our criteria for transformation.
import libcst as cst
import libcst.matchers as m

src = """
import foo
from a.b import foo_method

class C:
    def do_something(self, x):
        return foo_method(x)
"""

class ImportFixer(cst.CSTTransformer):

    def leave_SimpleStatementLine(self, original_node, updated_node):
        """Replace imports that match our criteria."""
        if m.matches(updated_node.body[0], m.ImportFrom()):
            import_from = updated_node.body[0]
            if m.matches(
                import_from.module,
                m.Attribute(value=m.Name('a'), attr=m.Name('b')),
            ):
                if m.matches(
                    import_from.names[0],
                    m.ImportAlias(name=m.Name('foo_method')),
                ):
                    # Note that when matching we use m.Node,
                    # but when replacing we use cst.Node.
                    return updated_node.with_changes(
                        body=[
                            cst.ImportFrom(
                                module=cst.Attribute(
                                    value=cst.Name('b'), attr=cst.Name('d')
                                ),
                                names=[
                                    cst.ImportAlias(
                                        name=cst.Name('bar_method')
                                    )
                                ],
                            )
                        ]
                    )
        return updated_node

    def leave_Call(self, original_node, updated_node):
        """Rename calls to foo_method."""
        if m.matches(updated_node, m.Call(func=m.Name('foo_method'))):
            return updated_node.with_changes(func=cst.Name('bar_method'))
        return updated_node

source_tree = cst.parse_module(src)
transformer = ImportFixer()
modified_tree = source_tree.visit(transformer)
print(modified_tree.code)
Output:
import foo
from b.d import bar_method

class C:
    def do_something(self, x):
        return bar_method(x)
You can use libcst’s parsing helpers in the Python REPL to view and work with the node trees for modules, statements and expressions. This is usually the best way to work out which nodes to transform and what needs to be matched.
libcst provides a framework named codemods to support refactoring large codebases.
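If you would rather not set up the codemod framework, a plain loop over the files may be enough for a one-off change. The sketch below is not libcst’s codemod API; apply_to_tree is a hypothetical helper written for this illustration, and it accepts any function that maps source text to new source text.

```python
from pathlib import Path

def apply_to_tree(root, transform, dry_run=True):
    """Run transform(source) -> new_source on every .py file under root.

    Returns the files whose content changes; writes only if dry_run is False.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        new_source = transform(source)
        if new_source != source:
            changed.append(path)
            if not dry_run:
                path.write_text(new_source, encoding="utf-8")
    return changed
```

For the transformer in this answer, transform could be lambda source: cst.parse_module(source).visit(ImportFixer()).code. Running with dry_run=True first lists the affected files without touching them.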
Behind the scenes, IDEs are not much more than text editors with a bunch of windows and attached binaries that do different kinds of jobs, such as compiling, debugging, tagging code and linting. Some of the libraries behind those jobs can be used to refactor code. One such library is Jedi, but there is one made specifically for refactoring: rope.
pip3 install rope
A CLI solution
You can try using its API directly, but since you asked for a command-line tool and there wasn’t one, save the following file anywhere reachable (a known relative folder, your user bin, etc.) and make it executable with chmod +x pyrename.py.
#!/usr/bin/env python3

from rope.base.project import Project
from rope.refactor.rename import Rename
from argparse import ArgumentParser

def renamodule(old, new):
    # Rename a module or package and update every reference to it.
    prj.do(Rename(prj, prj.find_module(old)).get_changes(new))

def renamethod(mod, old, new, instance=None):
    mod = prj.find_module(mod)
    modtxt = mod.read()
    # prepos remembers the offset of the first match while it is the only one.
    pos, inst, prepos = -1, 0, -1
    while True:
        pos = modtxt.find('def '+old+'(', pos+1)
        if pos < 0:
            if instance is None and prepos >= 0:
                pos = prepos+4 # instance=None and only one instance found
                break
            hint = ('tell which to rename by using an extra integer argument in the range 0..'
                    if instance is None else 'could not use instance=')
            print('found', inst, 'instances of method', old+',', hint+str(inst-1))
            pos = -1
            break
        if (type(instance) is int) and inst == instance:
            pos += 4 # skip 'def ' so the offset points at the method name
            break # found
        if instance is None:
            if inst == 0:
                prepos = pos
            else:
                prepos = -1
        inst += 1
    if pos > 0:
        prj.do(Rename(prj, mod, pos).get_changes(new))

argparser = ArgumentParser()
#argparser.add_argument('moduleormethod', choices=['module', 'method'], help='choose between module or method')
subparsers = argparser.add_subparsers()
subparsermod = subparsers.add_parser('module', help='moduledottedpath newname')
subparsermod.add_argument('moduledottedpath', help='old module full dotted path')
subparsermod.add_argument('newname', help='new module name only')
subparsermet = subparsers.add_parser('method', help='moduledottedpath oldname newname')
subparsermet.add_argument('moduledottedpath', help='module full dotted path')
subparsermet.add_argument('oldname', help='old method name')
subparsermet.add_argument('newname', help='new method name')
subparsermet.add_argument('instance', nargs='?', type=int, help='instance count')
args = argparser.parse_args()
if 'moduledottedpath' in args:
    prj = Project('.')
    if 'oldname' not in args:
        renamodule(args.moduledottedpath, args.newname)
    else:
        renamethod(args.moduledottedpath, args.oldname, args.newname, args.instance)
else:
    argparser.error('nothing to do, please choose module or method')
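The script locates the method by searching for the text 'def name(' in the module. A possible alternative, sketched here with the standard-library ast module, is to let the parser find the definitions and convert their positions to character offsets. function_name_offsets is a hypothetical helper, and it assumes a plain def followed by exactly one space and only ASCII text before the name on that line.

```python
import ast

def function_name_offsets(source, name):
    """Return the character offset of each `def <name>` identifier in source."""
    # Offsets at which each line starts, so (lineno, col) can be converted.
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))
    offsets = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            # col_offset points at the "def" keyword; skip "def " to reach the name.
            start = line_starts[node.lineno - 1] + node.col_offset
            offsets.append(start + len("def "))
    return sorted(offsets)

source = (
    "def foo_method():\n"
    "    pass\n"
    "\n"
    "class C:\n"
    "    def foo_method(self):\n"
    "        pass\n"
)
for offset in function_name_offsets(source, "foo_method"):
    print(offset, source[offset:offset + len("foo_method")])
```

Unlike a text search, this ignores occurrences of 'def foo_method(' inside strings or comments.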
Let’s create a test environment with the exact scenario shown in the question (assuming a Linux user here):
cd /some/folder/
ls pyrename.py # we are in the same folder as the script
# creating your test project equal to the question in prj child folder:
mkdir prj; cd prj; cat << EOF >> main.py
#!/usr/bin/env python3
from a.b import foo_method
foo_method()
EOF
mkdir a; touch a/__init__.py; cat << EOF >> a/b.py
def foo_method():
    print('yesterday i was foo, tomorrow i will be bar')
EOF
chmod +x main.py
# testing:
./main.py
# yesterday i was foo, tomorrow i will be bar
cat main.py
cat a/b.py
Now using the script for renaming modules and methods:
# be sure that you are in the project root folder
# rename package (here called module)
../pyrename.py module a b
# package folder 'a' renamed to 'b' and also all references
# rename module
../pyrename.py module b.b d
# 'b.b' (previously 'a.b') renamed to 'd', along with all references
# important - oldname is the full dotted path, new name is name only
# rename method
../pyrename.py method b.d foo_method bar_method
# 'foo_method' in module 'b.d' renamed to 'bar_method', along with all references
# important - if there is more than one occurrence of 'def foo_method(' in the file,
# it is necessary to add an extra argument telling which (zero-indexed) instance to use
# you will be warned if multiple instances are found and you don't include this extra argument
# testing again:
./main.py
# yesterday i was foo, tomorrow i will be bar
cat main.py
cat b/d.py
This example does exactly what the question asked for.
Only the renaming of modules and methods is implemented, because that is the scope of the question. If you need more, you can extend the script or write a new one from scratch, learning from the rope documentation and from this script itself. For simplicity the current folder is used as the project folder, but you can add an extra parameter to the script to make it more flexible.
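The extra parameter mentioned above could look roughly like this. It is only a sketch of the argument parsing: the --project option, the dest name command and the build_parser function are assumptions made for this illustration, not part of the script as posted.

```python
from argparse import ArgumentParser

def build_parser():
    parser = ArgumentParser(description="rename modules or methods with rope")
    # Hypothetical extra option: the project root, defaulting to the current folder.
    parser.add_argument("-p", "--project", default=".", help="project root folder")
    subparsers = parser.add_subparsers(dest="command", required=True)

    mod = subparsers.add_parser("module", help="moduledottedpath newname")
    mod.add_argument("moduledottedpath", help="old module full dotted path")
    mod.add_argument("newname", help="new module name only")

    met = subparsers.add_parser("method", help="moduledottedpath oldname newname")
    met.add_argument("moduledottedpath", help="module full dotted path")
    met.add_argument("oldname", help="old method name")
    met.add_argument("newname", help="new method name")
    met.add_argument("instance", nargs="?", type=int, help="zero-indexed instance")
    return parser

args = build_parser().parse_args(["--project", "prj", "module", "a", "b"])
print(args.project, args.command, args.moduledottedpath, args.newname)
# In the script, the project would then be opened with Project(args.project).
```

With required=True on the subparsers, argparse itself reports an error when neither module nor method is given, so the final else branch of the script is no longer needed.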