Generating one MD5/SHA1 checksum of multiple files in Python
Question:
I have looked through several topics about calculating checksums of files in Python, but none of them covers computing a single checksum over multiple files. I have several files in subdirectories and would like to determine whether any of them has changed.
Is there a way to generate one sum from multiple files?
EDIT:
This is the way I do it to get a list of sums:
checksums = [(fname, hashlib.md5(open(fname, 'rb').read()).digest()) for fname in flist]
Answers:
I got it working 🙂 This generates a single hash for a list of files:
hash_obj = hashlib.md5(open(flist[0], 'rb').read())
for fname in flist[1:]:
    hash_obj.update(open(fname, 'rb').read())
checksum = hash_obj.digest()
Thank you PM 2Ring for your input!
Note that MD5 is cryptographically broken (practical collision attacks exist), so use it only for non-security-critical purposes such as detecting accidental changes.
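If MD5's weaknesses matter for your use case, the same idea works with any `hashlib` algorithm. Below is a minimal sketch, assuming the file list is already in a stable order. The function name `combined_checksum`, the `algorithm` parameter and the chunk size are my own choices for illustration. It reads each file in chunks inside a `with` block, so large files are not loaded into memory at once and file handles are closed promptly.

```python
import hashlib


def combined_checksum(filenames, algorithm="sha256", chunk_size=65536):
    """Return one hex digest computed over the contents of all files, in order."""
    hasher = hashlib.new(algorithm)
    for fname in filenames:
        with open(fname, "rb") as f:
            # Read fixed-size chunks until read() returns b"" at end of file.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                hasher.update(chunk)
    return hasher.hexdigest()
```

Calling `combined_checksum(flist)` gives the same value as hashing the concatenated file contents in one go, so the order of `flist` matters.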
Slightly cleaner than Artur’s answer. There’s no need to treat the first element specially.
Edit (2022): I know Python a bit better now so I updated the code as follows:
- Use pathlib: it's more ergonomic and doesn't leave files open.
- Add type hints. If you don’t use these you’re doing it wrong.
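- Avoid a very mild TOCTOU (time-of-check to time-of-use) issue by catching the error from the read itself rather than testing the path beforehand.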
import hashlib
from pathlib import Path


def calculate_checksum(filenames: list[str]) -> bytes:
    # Feed every file's bytes into one MD5 object, in list order.
    hasher = hashlib.md5()
    for fn in filenames:
        try:
            hasher.update(Path(fn).read_bytes())
        except IsADirectoryError:
            # Skip directories instead of failing.
            pass
    return hasher.digest()
(You can handle IsADirectoryError differently if you like.)
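One caveat with hashing concatenated contents: renaming a file, or moving bytes from the end of one file to the start of the next, leaves the sum unchanged. Since the question mentions files in subdirectories, here is a rough sketch that walks a directory, sorts the paths so the order is deterministic, and mixes each relative path and content length into the hash. The function name `tree_checksum` and the length-prefix framing are assumptions of mine, not an established format.

```python
import hashlib
from pathlib import Path


def tree_checksum(root: str, algorithm: str = "sha256") -> str:
    """Hash relative paths, lengths and contents of all files under root."""
    base = Path(root)
    hasher = hashlib.new(algorithm)
    # Sort by POSIX-style relative path so the result does not depend on
    # directory listing order or the platform's path separator.
    files = sorted(
        (p.relative_to(base).as_posix(), p)
        for p in base.rglob("*")
        if p.is_file()
    )
    for rel, path in files:
        data = path.read_bytes()
        name = rel.encode("utf-8")
        # Length-prefix the name and the data so file boundaries are unambiguous.
        hasher.update(len(name).to_bytes(8, "big"))
        hasher.update(name)
        hasher.update(len(data).to_bytes(8, "big"))
        hasher.update(data)
    return hasher.hexdigest()
```

This version uses `is_file()` because it behaves the same on every platform, at the cost of reintroducing the small check-then-read race mentioned in the list above. It also reads each file fully into memory, so it is best suited to modestly sized files.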
import subprocess

cmd = input("Enter the command : ")
trial = subprocess.run(["powershell", "-Command", cmd])
# PowerShell command: Get-FileHash -Algorithm MD5 -Path (Get-ChildItem "filepath*.*" -Recurse -force)
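Note that Get-FileHash prints one hash per file rather than a single combined value. For readers without PowerShell, a similar per-file listing can be sketched in pure Python, with a helper that reports which files differ between two runs. The names `snapshot` and `changed_files` are hypothetical, and `root` is whatever directory you want to watch.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each file's relative path under root to the MD5 hex digest of its contents."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): hashlib.md5(p.read_bytes()).hexdigest()
        for p in base.rglob("*")
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Return the sorted paths that were added, removed, or modified."""
    return sorted(
        path
        for path in old.keys() | new.keys()
        if old.get(path) != new.get(path)
    )
```

Take a `snapshot` before and after, then pass both to `changed_files`. An empty list means nothing changed, which answers the original question and also tells you which files to look at when something did.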