How to verify integrity of files using digest in python (SHA256SUMS)

Question:

I have a set of files and a SHA256SUMS digest file that contains a sha256() hash for each of the files. What’s the best way to verify the integrity of my files with python?

For example, here’s how I would download the Debian 10 net installer SHA256SUMS digest file, then download and verify the MANIFEST file, in Bash:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 02:11:20--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K  71.7KB/s    in 1.0s    

2020-08-25 02:11:22 (71.7 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 02:11:27--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 02:11:28 (128 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ sha256sum --check --ignore-missing SHA256SUMS 
./MANIFEST: OK
user@host:~$ 

What is the best way to do this same operation (download and verify the integrity of the Debian 10 MANIFEST file using the SHA256SUMS file) in python?

Asked By: Michael Altfield

||

Answers:

You can calculate the SHA-256 hash of each file as described in this blog post:

https://www.quickprogrammingtips.com/python/how-to-calculate-sha256-hash-of-a-file-in-python.html

A sample implementation to generate a new manifest file may look like:

import hashlib
from pathlib import Path

# Your output file
output_file = "manifest-check"

# Your target directory
p = Path('.')

with open(output_file, "w") as out:
  # Iterate over the files in the directory (recursively)
  for path in p.glob("**/*"):
    # Process files only (no subdirs), and skip the output file being written
    if path.is_file() and path != Path(output_file):
      # Start a fresh hash for every file so digests don't accumulate
      sha256_hash = hashlib.sha256()
      with open(path, "rb") as f:
        # Read the file by chunks
        for byte_block in iter(lambda: f.read(4096), b""):
          sha256_hash.update(byte_block)
      # One "<path><TAB><hash>" line per file
      out.write(str(path) + "\t" + sha256_hash.hexdigest() + "\n")
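The tab-separated "path, hash" layout above cannot be fed back to `sha256sum --check`. GNU coreutils prints the hash, a space, a mode character (a space for text mode, `*` for binary), and then the filename. A minimal sketch that writes that layout instead, with a `./` prefix like Debian's file, might look like this; `sha256_of()` and `write_sha256sums()` are hypothetical helper names, not part of any library:

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=4096):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_sha256sums(directory, output_file="SHA256SUMS"):
    """Write '<hash>  ./<relative path>' lines, the layout sha256sum prints."""
    root = Path(directory)
    out_path = root / output_file
    lines = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the digest file itself
        if path.is_file() and path != out_path:
            rel = path.relative_to(root).as_posix()
            # Two spaces = hash, separator, text-mode marker, as sha256sum does
            lines.append(f"{sha256_of(path)}  ./{rel}\n")
    out_path.write_text("".join(lines))
    return lines
```

Running `write_sha256sums(".")` and then `sha256sum --check SHA256SUMS` in the same directory should report each file as OK.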

Alternatively, the manifest-checker pip package seems to do this.

You may have a look at its source at
https://github.com/TonyFlury/manifest-checker and adjust it for Python 3.

Answered By: mabe02

The following python script implements a function named integrity_is_ok() that takes the path to a SHA256SUMS file and a list of files to be verified, and it returns False if any of the files couldn’t be verified and True otherwise.

#!/usr/bin/env python3
from hashlib import sha256
import os

# Takes the path (as a string) to a SHA256SUMS file and a list of paths to
# local files. Returns True only if all files' checksums are present in the
# SHA256SUMS file and their checksums match
def integrity_is_ok( sha256sums_filepath, local_filepaths ):

    # first we parse the SHA256SUMS file and convert it into a dictionary
    sha256sums = dict()
    with open( sha256sums_filepath ) as fd:
        for line in fd:
            # skip blank lines or lines too short to hold a hash and a name
            if len( line.strip() ) < 67:
                continue

            # sha256 hashes are exactly 64 characters long
            checksum = line[0:64].lower()

            # there is one space followed by one metadata character between the
            # checksum and the filename in the `sha256sum` command output.
            # NOTE: only the basename is kept, so files with the same name in
            # different subdirectories overwrite each other (the last one wins)
            filename = os.path.split( line[66:] )[1].strip()
            sha256sums[filename] = checksum

    # now loop through each file that we were asked to check and confirm its
    # checksum matches what was listed in the SHA256SUMS file
    for local_file in local_filepaths:

        local_filename = os.path.split( local_file )[1]

        sha256sum = sha256()
        with open( local_file, 'rb' ) as fd:
            data_chunk = fd.read(1024)
            while data_chunk:
                sha256sum.update(data_chunk)
                data_chunk = fd.read(1024)

        checksum = sha256sum.hexdigest()
        # .get() returns None for files missing from SHA256SUMS, so they fail
        # the check instead of raising a KeyError
        if checksum != sha256sums.get( local_filename ):
            return False

    return True

if __name__ == '__main__':

    script_dir = os.path.dirname( os.path.realpath(__file__) )
    sha256sums_filepath = os.path.join( script_dir, 'SHA256SUMS' )
    local_filepaths = [ os.path.join( script_dir, 'MANIFEST' ) ]

    if integrity_is_ok( sha256sums_filepath, local_filepaths ):
        print( "INFO: Checksum OK" )
    else:
        print( "ERROR: Checksum Invalid" )

Here is an example execution:

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
--2020-08-25 22:40:16--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/SHA256SUMS
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75295 (74K)
Saving to: ‘SHA256SUMS’

SHA256SUMS          100%[===================>]  73.53K   201KB/s    in 0.4s    

2020-08-25 22:40:17 (201 KB/s) - ‘SHA256SUMS’ saved [75295/75295]

user@host:~$ wget http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
--2020-08-25 22:40:32--  http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/MANIFEST
Resolving ftp.nl.debian.org (ftp.nl.debian.org)... 130.89.149.21, 2001:67c:2564:a120::21
Connecting to ftp.nl.debian.org (ftp.nl.debian.org)|130.89.149.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1709 (1.7K)
Saving to: ‘MANIFEST’

MANIFEST            100%[===================>]   1.67K  --.-KB/s    in 0s      

2020-08-25 22:40:32 (13.0 MB/s) - ‘MANIFEST’ saved [1709/1709]

user@host:~$ ./sha256sums_python.py 
INFO: Checksum OK
user@host:~$ 
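The transcript above still uses wget for the download step. To keep everything in Python, the standard library's `urllib.request.urlretrieve()` can fetch the files instead. This is a minimal sketch: `download()` and `BASE_URL` are hypothetical names, and the buster URL is the one from the question, which may no longer be served at that path.

```python
import os
import urllib.request

# Directory from the question (assumption: still reachable on this mirror)
BASE_URL = "http://ftp.nl.debian.org/debian/dists/buster/main/installer-amd64/current/images/"


def download(base_url, names, dest_dir="."):
    """Fetch each file in `names` from `base_url` into `dest_dir`, like the wget calls."""
    paths = []
    for name in names:
        path = os.path.join(dest_dir, name)
        # urlretrieve writes the response body to `path` and returns (path, headers)
        urllib.request.urlretrieve(base_url + name, path)
        paths.append(path)
    return paths
```

Something like `paths = download(BASE_URL, ["SHA256SUMS", "MANIFEST"])` followed by `integrity_is_ok(paths[0], paths[1:])` would then cover both halves of the question. Note that when SHA256SUMS comes over plain HTTP from the same mirror as MANIFEST, this detects corruption rather than tampering; Debian publishes GPG-signed Release files for authenticating the checksums themselves.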

Parts of the above code were adapted from the following answer on Ask Ubuntu:

Answered By: Michael Altfield

Python 3.11 added hashlib.file_digest()

https://docs.python.org/3.11/library/hashlib.html#file-hashing

Generating the digest for a file:

import hashlib

# file_digest() needs a file opened in binary mode
with open("my_file", "rb") as f:
    digest = hashlib.file_digest(f, "sha256")
    s = digest.hexdigest()

Compare s against the hash listed for that file in SHA256SUMS.
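To do that comparison for every listed file, roughly the way `sha256sum --check --ignore-missing` does, `file_digest()` can be combined with a small parser. This is a sketch, not a drop-in replacement: `sha256_hexdigest()` and `check_ignore_missing()` are hypothetical names, it falls back to a chunked loop on Python versions before 3.11, and it resolves filenames relative to the directory holding SHA256SUMS (the real tool uses the current directory). Escaped filenames (lines that GNU sha256sum prefixes with a backslash) are not handled.

```python
import hashlib
import os


def sha256_hexdigest(path):
    """Hex SHA-256 of a file, using hashlib.file_digest() when it exists (3.11+)."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):
            return hashlib.file_digest(f, "sha256").hexdigest()
        # Fallback for older interpreters: hash the file in fixed-size chunks
        digest = hashlib.sha256()
        for block in iter(lambda: f.read(65536), b""):
            digest.update(block)
        return digest.hexdigest()


def check_ignore_missing(sha256sums_path):
    """Roughly mimic `sha256sum --check --ignore-missing`.

    Filenames are resolved relative to the directory holding SHA256SUMS.
    Returns {filename: "OK" or "FAILED"} for the listed files that exist.
    """
    base = os.path.dirname(os.path.abspath(sha256sums_path))
    results = {}
    with open(sha256sums_path) as fd:
        for line in fd:
            line = line.rstrip("\r\n")
            if len(line) < 67:
                continue  # blank or too short to hold a hash and a filename
            # 64 hex chars, a space, a mode char (' ' or '*'), then the filename
            checksum, name = line[:64].lower(), os.path.normpath(line[66:])
            local = os.path.join(base, name)
            if not os.path.isfile(local):
                continue  # --ignore-missing: skip files we don't have
            ok = sha256_hexdigest(local) == checksum
            results[name] = "OK" if ok else "FAILED"
    return results
```

Unlike a basename-keyed dictionary, this keeps the subdirectory part of each path, so identically named files in different directories of Debian's SHA256SUMS don't collide. Printing `f"./{name}: {status}"` for each entry gives output in the same shape as the Bash example.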

Answered By: psq