Open XML document protection implementation (documentProtection class)

Question:

I’m trying to implement the Open XML documentProtection password hashing of an MS Word (2019) document in Python to test the hashing algorithm. I created a Word document and protected it against editing with the password johnjohn. Opening the document as ZIP/XML, I see the following in the documentProtection section:

<w:documentProtection w:edit="readOnly" w:enforcement="1" w:cryptProviderType="rsaAES" w:cryptAlgorithmClass="hash" w:cryptAlgorithmType="typeAny" w:cryptAlgorithmSid="14" w:cryptSpinCount="100000" w:hash="pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==" w:salt="pH1TDVHSfGBxkd3Q88UNhQ==" />

According to the Open XML docs (ECMA-376-1:2016 #17.15.1.29):

  • cryptAlgorithmSid="14" points to the SHA-512 algorithm
  • cryptSpinCount="100000" means that hashing must be done in 100k rounds, using the following algorithm (quoted from the standard above):

Specifies the number of times the hashing function shall be iteratively run (runs using each iteration’s result plus a 4 byte value (0-based, little endian) containing the number of the iteration as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute.

The BASE64-encoded salt used for hashing ("pH1TDVHSfGBxkd3Q88UNhQ==") is prepended to the original password. The target BASE64-encoded hash must be "pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw=="

So my Python script attempts to generate the same hash value with the described algorithm as follows:

import hashlib
import base64
import struct

TARGET_HASH = 'pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw=='

TARGET_SALT = 'pH1TDVHSfGBxkd3Q88UNhQ=='
bsalt = base64.b64decode(TARGET_SALT)

def hashit(what, alg='sha512', **kwargs):
    if alg == 'sha1':
        return hashlib.sha1(what)
    elif alg == 'sha512':
        return hashlib.sha512(what)
    # etc...
    else:
        raise Exception(f'Unsupported hash algorithm: {alg}')

def gethash(data, salt=None, alg='sha512', iters=100000, base64result=True, returnstring=True):
    # encode password in UTF-16LE
    # ECMA-376-1:2016 17.15.1.29 (p. 1026)
    if isinstance(data, str): data = data.encode('utf-16-le')
    
    # prepend salt if provided
    if not salt is None:
        if isinstance(salt, str): salt = salt.encode('utf-16-le')
        ghash = salt + data
    else:
        ghash = data
    
    # hash iteratively for 'iters' rounds
    for i in range(iters):
        try:
            # next hash = hash(previous data) + 4-byte integer (previous round number) with LE byte ordering
            # ECMA-376-1:2016 17.15.1.29 (p. 1020)
            ghash = hashit(ghash, alg).digest() + struct.pack('<I', i)
        except Exception as err:
            print(err)
            break
    
    # remove trailing round number bytes
    ghash = ghash[:-4]

    # BASE64 encode if requested
    if base64result:
        ghash = base64.b64encode(ghash)
    # return as an ASCII string if requested
    if returnstring:
        ghash = ghash.decode()
        
    return ghash

But then when I run

print(gethash('johnjohn', bsalt))

I get the following hash which is not equal to the target one:

G47RT4/+JdE6pnrP6MqUKa3JyL8abeYSCX+E4+9J+6shiZqImBJ8M6bb+IMKEdvKd6+9dVnQ3oeOsgQz/aCdcQ==

Could I be wrong in my implementation somewhere or do you think there’s a difference in the low-level hash function implementation (Python’s hashlib vs. Open XML)?

Updated

I realized that Word uses a legacy algorithm to pre-process passwords (for compatibility with older versions). This algorithm is described at length in ECMA-376 Part 4 (Transitional Migration Features, #14.8.1 "Legacy Password Hash Algorithm"). So I’ve managed to write a script that reproduces the official ECMA example:

def strtobytes(s, trunc=15):    
    b = s.encode('utf-16-le')
    # remove BOM symbol if present
    if b[0] == 0xfeff: b = b[1:]    
    pwdlen = min(trunc, len(s))
    if pwdlen < 1: return None
    return bytes([b[i] or b[i+1] for i in range(0, pwdlen * 2, 2)])

def process_pwd(pwd):
    # 1. PREPARE PWD STRING (TRUNCATE, CONVERT TO BYTES)
    pw = strtobytes(pwd) if isinstance(pwd, str) else pwd[:15]
    pwdlen = len(pw)
    
    # 2. HIGH WORD CALC
    HW = InitialCodeArray[pwdlen - 1]
    for i in range(pwdlen):
        r = 15 - pwdlen + i
        for ibit in range(7):
            if (pw[i] & (0x0001 << ibit)):                
                HW ^= EncryptionMatrix[r][ibit]
    
    # 3. LO WORD CALC
    LW = 0
    for i in reversed(range(pwdlen)):
        LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pw[i]
    LW = (((LW >> 14) & 0x0001) | ((LW << 1) & 0x7FFF)) ^ pwdlen ^ 0xCE4B    
    
    # 4. COMBINE AND REVERSE
    return bytes([LW & 0xff, LW >> 8, HW & 0xff, HW >> 8])

So when I do process_pwd('Example') I get what’s said in the ECMA (0x7EEDCE64). The hashing function was also modified (the initial SALT + HASH should not be included in the main iteration loop, as I found on a forum):

def gethash(data, salt=None, alg='sha512', iters=100000, base64result=True, returnstring=True):
    
    def hashit(what, alg='sha512'):
        return getattr(hashlib, alg)(what)
    
    # encode password with legacy algorithm if a string is given
    if isinstance(data, str): 
        data = process_pwd(data)
        
    if data is None: 
        print('WRONG PASSWORD STRING!')
        return None
    
    # prepend salt if provided
    if not salt is None:
        if isinstance(salt, str): 
            salt = process_pwd(salt)
            if salt is None:
                print('WRONG SALT STRING!')
                return None
        ghash = salt + data
    else:
        ghash = data
    
    # initial hash (salted)
    ghash = hashit(ghash, alg).digest()
    
    # hash iteratively for 'iters' rounds
    for i in range(iters):
        try:
            # next hash = hash(previous data + 4-byte integer (previous round number) with LE byte ordering)
            # ECMA-376-1:2016 17.15.1.29 (p. 1020)
            ghash = hashit(ghash + struct.pack('<I', i), alg).digest()
        except Exception as err:
            print(err)
            return None

    # BASE64 encode if requested
    if base64result:
        ghash = base64.b64encode(ghash)
        
    # return as an ASCII string if requested
    if returnstring:
        ghash = ghash.decode()
        
    return ghash

I have re-checked this code many times and can’t find any more errors, but I still can’t reproduce the target hash from the test Word document:

myhash = gethash('johnjohn', base64.b64decode('pH1TDVHSfGBxkd3Q88UNhQ=='))
print(myhash)
print(TARGET_HASH == myhash)

I get:

wut2VOpT+X8pKXky6u/+YtwRX2inDv1WVC8FtZcdxKsyX0gHNBJGYwBgV8xzq7Rke/hWMfWe9JVvqDQAZ11A5w==

False

UPDATE (August 2022)

Returning to this question, I’ve updated my Python code adapting the detailed answer below (thanks @Andrew O!). My full code is now as follows:

# coding: utf-8
import hashlib
import base64

TARGET_HASH = 'pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw=='
TARGET_SALT = 'pH1TDVHSfGBxkd3Q88UNhQ=='

HighOrderWords = [
    [0xE1, 0xF0],
    [0x1D, 0x0F],
    [0xCC, 0x9C],
    [0x84, 0xC0],
    [0x11, 0x0C],
    [0x0E, 0x10],
    [0xF1, 0xCE],
    [0x31, 0x3E],
    [0x18, 0x72],
    [0xE1, 0x39],
    [0xD4, 0x0F],
    [0x84, 0xF9],
    [0x28, 0x0C],
    [0xA9, 0x6A],
    [0x4E, 0xC3]
]

EncryptionMatrix = [
    [[0xAE, 0xFC], [0x4D, 0xD9], [0x9B, 0xB2], [0x27, 0x45], [0x4E, 0x8A], [0x9D, 0x14], [0x2A, 0x09]],
    [[0x7B, 0x61], [0xF6, 0xC2], [0xFD, 0xA5], [0xEB, 0x6B], [0xC6, 0xF7], [0x9D, 0xCF], [0x2B, 0xBF]],
    [[0x45, 0x63], [0x8A, 0xC6], [0x05, 0xAD], [0x0B, 0x5A], [0x16, 0xB4], [0x2D, 0x68], [0x5A, 0xD0]],
    [[0x03, 0x75], [0x06, 0xEA], [0x0D, 0xD4], [0x1B, 0xA8], [0x37, 0x50], [0x6E, 0xA0], [0xDD, 0x40]],
    [[0xD8, 0x49], [0xA0, 0xB3], [0x51, 0x47], [0xA2, 0x8E], [0x55, 0x3D], [0xAA, 0x7A], [0x44, 0xD5]],
    [[0x6F, 0x45], [0xDE, 0x8A], [0xAD, 0x35], [0x4A, 0x4B], [0x94, 0x96], [0x39, 0x0D], [0x72, 0x1A]],
    [[0xEB, 0x23], [0xC6, 0x67], [0x9C, 0xEF], [0x29, 0xFF], [0x53, 0xFE], [0xA7, 0xFC], [0x5F, 0xD9]],
    [[0x47, 0xD3], [0x8F, 0xA6], [0x0F, 0x6D], [0x1E, 0xDA], [0x3D, 0xB4], [0x7B, 0x68], [0xF6, 0xD0]],
    [[0xB8, 0x61], [0x60, 0xE3], [0xC1, 0xC6], [0x93, 0xAD], [0x37, 0x7B], [0x6E, 0xF6], [0xDD, 0xEC]],
    [[0x45, 0xA0], [0x8B, 0x40], [0x06, 0xA1], [0x0D, 0x42], [0x1A, 0x84], [0x35, 0x08], [0x6A, 0x10]],
    [[0xAA, 0x51], [0x44, 0x83], [0x89, 0x06], [0x02, 0x2D], [0x04, 0x5A], [0x08, 0xB4], [0x11, 0x68]],
    [[0x76, 0xB4], [0xED, 0x68], [0xCA, 0xF1], [0x85, 0xC3], [0x1B, 0xA7], [0x37, 0x4E], [0x6E, 0x9C]],
    [[0x37, 0x30], [0x6E, 0x60], [0xDC, 0xC0], [0xA9, 0xA1], [0x43, 0x63], [0x86, 0xC6], [0x1D, 0xAD]],
    [[0x33, 0x31], [0x66, 0x62], [0xCC, 0xC4], [0x89, 0xA9], [0x03, 0x73], [0x06, 0xE6], [0x0D, 0xCC]],
    [[0x10, 0x21], [0x20, 0x42], [0x40, 0x84], [0x81, 0x08], [0x12, 0x31], [0x24, 0x62], [0x48, 0xC4]]
]

def hashit(what, alg='sha1', **kwargs):
    f = getattr(hashlib, alg, None)
    if f is None:
        raise Exception(f'Unsupported hash algorithm: {alg}')
    return f(what)

def strtobytes(s, trunc=15):    
    b = s.encode('utf-16-le')
    # remove BOM symbol if present
    if b[0] == 0xfeff: b = b[1:]    
    pwdlen = min(trunc, len(s))
    if pwdlen < 1: return None
    return bytearray([b[i] or b[i+1] for i in range(0, pwdlen * 2, 2)])

def generate_hash(password: str, salt: bytes = None, alg: str = 'sha512', iters: int = 100000, base64result=True, returnstring=True):
    """
    Algorithm given in ECMA-376, 1st Edition, December 2006
    https://www.ecma-international.org/wp-content/uploads/ecma-376_first_edition_december_2006.zip
    Alternatively: https://c-rex.net/projects/samples/ooxml/e1/Part4/OOXML_P4_DOCX_documentProtection_topic_ID0EJVTX.html
    """
    # Truncate the password to 15 characters
    passwordBytes = strtobytes(password)
    # Obtain the high-order word from the magic list based on the length of the password. 
    # If the password is 0 length, it's just two zero bytes
    passwordLength = len(passwordBytes)
    highOrderWord = bytearray([0, 0])
    # For each byte in the password, grab the bits based on its position in the encryption matrix 
    # (taking care that the last character always corresponds to the last row, 
    # the first part of the matrix may be unused if the password is shorter than 15 bytes). 
    # For the first to seventh bit, if it's set, do a XOR operation with the current value of the high order word. 
    # Repeat for each character. 
    if passwordLength > 0:
        highOrderWord = bytearray(HighOrderWords[passwordLength - 1])
        for i in range(passwordLength):
            passwordByte = passwordBytes[i]
            m = i + 15 - passwordLength
            for j in range(7):
                if (passwordByte & j) == 0: 
                    continue
                for k in range(2):
                    highOrderWord[k] ^= EncryptionMatrix[m][j][k]
    # Grab a low order word (2 bytes) and initialize to zero
    lowOrderWord = 0  
    # Perform the operation on each character, starting from the last character in the password and working forwards: 
    # low-order word = ( ((low-order word >> 14) AND 0x0001) | (low-order word << 1) & 0x7FFF)) ^ character (byte)
    for i in reversed(range(passwordLength)):
        passwordByte = passwordBytes[i]
        lowOrderWord = ( ((lowOrderWord >> 14) & 1) | ((lowOrderWord << 1) & 0x7FFF) ) ^ passwordByte
    # Then do low-order word = (((low-order word >> 14) & 0x0001) | (low-order word << 1) & 0x7FFF)) ^ password length ^ 0xCE4B
    lowOrderWord = ( ((lowOrderWord >> 14) & 1) | ((lowOrderWord << 1) & 0x7FFF) ) ^ passwordLength ^ 0xCE4B
    lowOrderWord = lowOrderWord.to_bytes(2, 'big')

    # Form the key by appending the low order word to the high order word, then reverse the byte ordering
    key = (highOrderWord + lowOrderWord)[::-1]
    # For some reason, Microsoft Word then uses the Unicode hex representation of the above key, 
    # then back converts that representation into bytes
    # In Word, an additional third stage is added to the process of hashing and storing a user supplied password. 
    # In this third stage, the reversed byte order legacy hash from the second stage shall be converted to Unicode hex string representation 
    # [Example: If the single byte string 7EEDCE64 is converted to Unicode hex string it will be represented in memory as the following byte stream: 
    # 37 00 45 00 45 00 44 00 43 00 45 00 36 00 34 00], and that value shall be hashed as defined by the attribute values
    # https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oe376/fb220a2f-88d4-488c-a9b7-e094756b6699
    key = ''.join('{:02x}'.format(x) for x in key).replace('-', '').encode('utf-8')
    computedHash = bytearray(key)
    # Now compute the hash once by prepending the salt bytes to the result from above. 
    # If there are no salt bytes, just skip this step
    if salt:
        computedHash = bytearray(salt) + key
    # Word requires that the initial hash of the password with the salt not be considered in the count
    computedHash = bytearray(hashit(computedHash, alg).digest())
    # If there are iterations to compute, for each iteration, convert the iteration count (0-base) to a 32-bit (4 byte) integer (little endian), 
    # and (documentation wasn't clear on this, it just said to "add" the bytes - but to align with the output I had to append it) append this to the current computed hash. 
    # Apply the requested hash algorithm (Word seems to default to SHA512, but from testing I saw that it handles the other options fine as well)
    for i in range(iters):
        # ISO/IEC 29500-1 Fourth Edition, 2016-11-01
        # 17.15.1.29 - spinCount
        # Specifies the number of times the hashing function shall be iteratively run 
        # (runs using each iteration's result plus a 4 byte value (0-based, little endian) containing the number of the iteration 
        # as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute
        computedHash += i.to_bytes(4, 'little')
        computedHash = bytearray(hashit(computedHash, alg).digest())

    # Return the above as a base-64 encoded string. This is what goes in the documentProtection attribute.

    # BASE64 encode if requested
    if base64result:
        computedHash = base64.b64encode(computedHash)
        
    # return as an ASCII string if requested
    if returnstring:
        computedHash = computedHash.decode('utf-8')
        
    return computedHash

# -------------------------------------------------------------------- #

if __name__ == '__main__':
    myhash = generate_hash('johnjohn', base64.b64decode(TARGET_SALT))
    print(myhash)
    print(TARGET_HASH == myhash)

But alas, the assertion still fails, which means I’m still getting something wrong here. Can anyone help adapt the C# to Python 1:1?

Asked By: s0mbre

||

Answers:

Had to look at this today too and managed to reverse engineer it.

In plain English, the steps are:

  1. Truncate the password to 15 characters (it’s not clear whether this means ASCII or UTF-8; a few documents refer to the "Unicode password", but all examples seem to be ASCII based). My implementation simply takes the truncated bytes after UTF-8 conversion (which preserves the ASCII set).
  2. Obtain the high-order word from the magic list based on the length of the password. If the password is 0 length, it’s just two zero bytes.
  3. For each byte in the password, grab the bits based on its position in the encryption matrix (taking care that the last character always corresponds to the last row, the first part of the matrix may be unused if the password is shorter than 15 bytes). For the first to seventh bit, if it’s set, do an XOR operation with the current value of the high order word. Repeat for each character.
  4. Grab a low order word (2 bytes) and initialize to zero. Perform the operation on each character, starting from the last character in the password and working back to the first:
    low-order word = (((low-order word >> 14) & 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ character (byte) (<< and >> are the left and right bit-shift operators; |, & and ^ are bitwise OR, AND and exclusive OR respectively.)
  5. Then do low-order word = (((low-order word >> 14) & 0x0001) | ((low-order word << 1) & 0x7FFF)) ^ password length ^ 0xCE4B.
  6. Form the key by appending the low order word to the high order word. Then reverse the byte ordering.
  7. For some reason, Microsoft Word then takes the Unicode hex string representation of the above key and converts that string back into bytes (see link in comments).
  8. Now compute the hash once by prepending the salt bytes to the result from above. If there are no salt bytes, just skip this step.
  9. If there are iterations to compute, for each iteration, convert the iteration count (0-base) to a 32-bit (4 byte) integer (little endian), and (documentation wasn’t clear on this, it just said to "add" the bytes – but to align with the output I had to append it) append this to the current computed hash. Apply the requested hash algorithm (Word seems to default to SHA512, but from testing I saw that it handles the other options fine as well).
  10. Return the above as a base-64 encoded string. This is what goes in the documentProtection attribute.
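
The ten steps above could be ported to Python roughly as follows. This is a sketch under a few assumptions, not a verified line-by-line translation of the C# below: the names legacy_key, generate_hash, INITIAL_CODES and ENCRYPTION_MATRIX are made up here for illustration, the tables hold the same values as in the question but packed into 16-bit integers, and the password is truncated as UTF-8 bytes as in step 1.

```python
import base64
import hashlib

# Initial high-order words, indexed by (password length - 1).
INITIAL_CODES = [
    0xE1F0, 0x1D0F, 0xCC9C, 0x84C0, 0x110C, 0x0E10, 0xF1CE, 0x313E,
    0x1872, 0xE139, 0xD40F, 0x84F9, 0x280C, 0xA96A, 0x4EC3,
]
# 15 rows (character position) x 7 columns (bit 0..6 of the character).
ENCRYPTION_MATRIX = [
    [0xAEFC, 0x4DD9, 0x9BB2, 0x2745, 0x4E8A, 0x9D14, 0x2A09],
    [0x7B61, 0xF6C2, 0xFDA5, 0xEB6B, 0xC6F7, 0x9DCF, 0x2BBF],
    [0x4563, 0x8AC6, 0x05AD, 0x0B5A, 0x16B4, 0x2D68, 0x5AD0],
    [0x0375, 0x06EA, 0x0DD4, 0x1BA8, 0x3750, 0x6EA0, 0xDD40],
    [0xD849, 0xA0B3, 0x5147, 0xA28E, 0x553D, 0xAA7A, 0x44D5],
    [0x6F45, 0xDE8A, 0xAD35, 0x4A4B, 0x9496, 0x390D, 0x721A],
    [0xEB23, 0xC667, 0x9CEF, 0x29FF, 0x53FE, 0xA7FC, 0x5FD9],
    [0x47D3, 0x8FA6, 0x0F6D, 0x1EDA, 0x3DB4, 0x7B68, 0xF6D0],
    [0xB861, 0x60E3, 0xC1C6, 0x93AD, 0x377B, 0x6EF6, 0xDDEC],
    [0x45A0, 0x8B40, 0x06A1, 0x0D42, 0x1A84, 0x3508, 0x6A10],
    [0xAA51, 0x4483, 0x8906, 0x022D, 0x045A, 0x08B4, 0x1168],
    [0x76B4, 0xED68, 0xCAF1, 0x85C3, 0x1BA7, 0x374E, 0x6E9C],
    [0x3730, 0x6E60, 0xDCC0, 0xA9A1, 0x4363, 0x86C6, 0x1DAD],
    [0x3331, 0x6662, 0xCCC4, 0x89A9, 0x0373, 0x06E6, 0x0DCC],
    [0x1021, 0x2042, 0x4084, 0x8108, 0x1231, 0x2462, 0x48C4],
]


def legacy_key(password: str) -> bytes:
    """Steps 1-6: the 4-byte legacy hash in reversed byte order."""
    pw = password.encode("utf-8")[:15]  # assumption: truncate the UTF-8 bytes
    n = len(pw)
    high = INITIAL_CODES[n - 1] if n else 0
    for i, byte in enumerate(pw):
        row = ENCRYPTION_MATRIX[i + 15 - n]  # last character maps to the last row
        for bit in range(7):
            if (byte >> bit) & 1:  # test bit number `bit`, lowest bit first
                high ^= row[bit]
    low = 0
    for byte in reversed(pw):
        low = (((low >> 14) & 1) | ((low << 1) & 0x7FFF)) ^ byte
    low = (((low >> 14) & 1) | ((low << 1) & 0x7FFF)) ^ n ^ 0xCE4B
    # high word followed by low word, then reversed == little-endian 32-bit value
    return ((high << 16) | low).to_bytes(4, "little")


def generate_hash(password, salt=None, alg="sha512", iterations=100000) -> str:
    """Steps 7-10: hex string, salted initial hash, spin loop, base64."""
    # Step 7: uppercase hex digits of the key, stored as UTF-16LE text
    key = legacy_key(password).hex().upper().encode("utf-16-le")
    # Step 8: initial salted hash, not counted as an iteration
    digest = hashlib.new(alg, (salt or b"") + key).digest()
    # Step 9: append the 0-based counter (4 bytes, little endian) and re-hash
    for i in range(iterations):
        digest = hashlib.new(alg, digest + i.to_bytes(4, "little")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    print(legacy_key("Example").hex().upper())  # 7EEDCE64, the ECMA example
    print(generate_hash("johnjohn", base64.b64decode("pH1TDVHSfGBxkd3Q88UNhQ==")))
```

The second printed line is meant to be compared with the w:hash value from the question's document. For non-ASCII passwords the truncation rule is an open point (see step 1), so this sketch should only be relied on for ASCII input.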

Here’s my implementation in C# (NuGet):

/// <summary>
/// Class that generates hashes suitable for use with OpenXML Wordprocessing ML documents with the documentProtection element.
/// </summary>
public class WordprocessingMLDocumentProtectionHashGenerator
{
    private static readonly byte[][] HighOrderWords = new byte[][]
    {
        new byte[] { 0xE1, 0xF0 },
        new byte[] { 0x1D, 0x0F },
        new byte[] { 0xCC, 0x9C },
        new byte[] { 0x84, 0xC0 },
        new byte[] { 0x11, 0x0C },
        new byte[] { 0x0E, 0x10 },
        new byte[] { 0xF1, 0xCE },
        new byte[] { 0x31, 0x3E },
        new byte[] { 0x18, 0x72 },
        new byte[] { 0xE1, 0x39 },
        new byte[] { 0xD4, 0x0F },
        new byte[] { 0x84, 0xF9 },
        new byte[] { 0x28, 0x0C },
        new byte[] { 0xA9, 0x6A },
        new byte[] { 0x4E, 0xC3 }
    };

    private static readonly byte[,,] EncryptionMatrix = new byte[,,]
    {
        { { 0xAE, 0xFC }, { 0x4D, 0xD9 }, { 0x9B, 0xB2 }, { 0x27, 0x45 }, { 0x4E, 0x8A }, { 0x9D, 0x14 }, { 0x2A, 0x09 } },
        { { 0x7B, 0x61 }, { 0xF6, 0xC2 }, { 0xFD, 0xA5 }, { 0xEB, 0x6B }, { 0xC6, 0xF7 }, { 0x9D, 0xCF }, { 0x2B, 0xBF } },
        { { 0x45, 0x63 }, { 0x8A, 0xC6 }, { 0x05, 0xAD }, { 0x0B, 0x5A }, { 0x16, 0xB4 }, { 0x2D, 0x68 }, { 0x5A, 0xD0 } },
        { { 0x03, 0x75 }, { 0x06, 0xEA }, { 0x0D, 0xD4 }, { 0x1B, 0xA8 }, { 0x37, 0x50 }, { 0x6E, 0xA0 }, { 0xDD, 0x40 } },
        { { 0xD8, 0x49 }, { 0xA0, 0xB3 }, { 0x51, 0x47 }, { 0xA2, 0x8E }, { 0x55, 0x3D }, { 0xAA, 0x7A }, { 0x44, 0xD5 } },
        { { 0x6F, 0x45 }, { 0xDE, 0x8A }, { 0xAD, 0x35 }, { 0x4A, 0x4B }, { 0x94, 0x96 }, { 0x39, 0x0D }, { 0x72, 0x1A } },
        { { 0xEB, 0x23 }, { 0xC6, 0x67 }, { 0x9C, 0xEF }, { 0x29, 0xFF }, { 0x53, 0xFE }, { 0xA7, 0xFC }, { 0x5F, 0xD9 } },
        { { 0x47, 0xD3 }, { 0x8F, 0xA6 }, { 0x0F, 0x6D }, { 0x1E, 0xDA }, { 0x3D, 0xB4 }, { 0x7B, 0x68 }, { 0xF6, 0xD0 } },
        { { 0xB8, 0x61 }, { 0x60, 0xE3 }, { 0xC1, 0xC6 }, { 0x93, 0xAD }, { 0x37, 0x7B }, { 0x6E, 0xF6 }, { 0xDD, 0xEC } },
        { { 0x45, 0xA0 }, { 0x8B, 0x40 }, { 0x06, 0xA1 }, { 0x0D, 0x42 }, { 0x1A, 0x84 }, { 0x35, 0x08 }, { 0x6A, 0x10 } },
        { { 0xAA, 0x51 }, { 0x44, 0x83 }, { 0x89, 0x06 }, { 0x02, 0x2D }, { 0x04, 0x5A }, { 0x08, 0xB4 }, { 0x11, 0x68 } },
        { { 0x76, 0xB4 }, { 0xED, 0x68 }, { 0xCA, 0xF1 }, { 0x85, 0xC3 }, { 0x1B, 0xA7 }, { 0x37, 0x4E }, { 0x6E, 0x9C } },
        { { 0x37, 0x30 }, { 0x6E, 0x60 }, { 0xDC, 0xC0 }, { 0xA9, 0xA1 }, { 0x43, 0x63 }, { 0x86, 0xC6 }, { 0x1D, 0xAD } },
        { { 0x33, 0x31 }, { 0x66, 0x62 }, { 0xCC, 0xC4 }, { 0x89, 0xA9 }, { 0x03, 0x73 }, { 0x06, 0xE6 }, { 0x0D, 0xCC } },
        { { 0x10, 0x21 }, { 0x20, 0x42 }, { 0x40, 0x84 }, { 0x81, 0x08 }, { 0x12, 0x31 }, { 0x24, 0x62 }, { 0x48, 0xC4 } }
    };

    /// <summary>
    /// Generates a base-64 string according to the Wordprocessing ML Document DocumentProtection security algorithm.
    /// </summary>
    /// <param name="password"></param>
    /// <param name="salt"></param>
    /// <param name="iterations"></param>
    /// <param name="hashAlgorithmName"></param>
    /// <returns></returns>
    public string GenerateHash(string password, byte[] salt, int iterations, HashAlgorithmName hashAlgorithmName)
    {
        if (password == null)
        {
            throw new ArgumentNullException(nameof(password));
        }

        // Algorithm given in ECMA-376, 1st Edition, December 2006
        // https://www.ecma-international.org/wp-content/uploads/ecma-376_first_edition_december_2006.zip
        // Alternatively: https://c-rex.net/projects/samples/ooxml/e1/Part4/OOXML_P4_DOCX_documentProtection_topic_ID0EJVTX.html
        byte[] passwordBytes = Encoding.UTF8.GetBytes(password);
        passwordBytes = passwordBytes.Take(15).ToArray();
        int passwordLength = passwordBytes.Length;

        // If the password length is 0, the key is 0.
        byte[] highOrderWord = new byte[] { 0x00, 0x00 };
        if (passwordLength > 0)
        {
            highOrderWord = HighOrderWords[passwordLength - 1].ToArray();
        }
        for (int i = 0; i < passwordLength; i++)
        {
            byte passwordByte = passwordBytes[i];
            int encryptionMatrixIndex = i + (EncryptionMatrix.GetLength(0) - passwordLength);

            BitArray bitArray = passwordByte.ToBitArray();

            for (int j = 0; j < EncryptionMatrix.GetLength(1); j++)
            {
                bool isSet = bitArray[j];

                if (isSet)
                {
                    for (int k = 0; k < EncryptionMatrix.GetLength(2); k++)
                    {
                        highOrderWord[k] = (byte)(highOrderWord[k] ^ EncryptionMatrix[encryptionMatrixIndex, j, k]);
                    }
                }
            }
        }

        byte[] lowOrderWord = new byte[] { 0x00, 0x00 };
        BitSequence lowOrderBitSequence = lowOrderWord.ToBitSequence();
        BitSequence bitSequence1 = new byte[] { 0x00, 0x01 }.ToBitSequence();
        BitSequence bitSequence7FFF = new byte[] { 0x7F, 0xFF }.ToBitSequence();

        for (int i = passwordLength - 1; i >= 0; i--)
        {
            byte passwordByte = passwordBytes[i];
            lowOrderBitSequence = (((lowOrderBitSequence >> 14) & bitSequence1) | ((lowOrderBitSequence << 1) & bitSequence7FFF)) ^ new byte[] { 0x00, passwordByte }.ToBitSequence();
        }

        lowOrderBitSequence = (((lowOrderBitSequence >> 14) & bitSequence1) | ((lowOrderBitSequence << 1) & bitSequence7FFF)) ^ new byte[] { 0x00, (byte)passwordLength }.ToBitSequence() ^ new byte[] { 0xCE, 0x4B }.ToBitSequence();
        lowOrderWord = lowOrderBitSequence.ToByteArray();

        byte[] key = highOrderWord.Concat(lowOrderWord).ToArray();
        key = key.Reverse().ToArray();

        // https://docs.microsoft.com/en-us/openspecs/office_standards/ms-oe376/fb220a2f-88d4-488c-a9b7-e094756b6699
        // In Word, an additional third stage is added to the process of hashing and storing a user supplied password.  In this third stage, the reversed byte order legacy hash from the second stage shall be converted to Unicode hex string representation [Example: If the single byte string 7EEDCE64 is converted to Unicode hex string it will be represented in memory as the following byte stream: 37 00 45 00 45 00 44 00 43 00 45 00 36 00 34 00. end example], and that value shall be hashed as defined by the attribute values.
        key = Encoding.Unicode.GetBytes(BitConverter.ToString(key).Replace("-", string.Empty));

        HashAlgorithm hashAlgorithm = hashAlgorithmName.Create();

        byte[] computedHash = key;

        if (salt != null)
        {
            computedHash = salt.Concat(key).ToArray();
        }

        // Word requires that the initial hash of the password with the salt not be considered in the count.
        computedHash = hashAlgorithm.ComputeHash(computedHash);

        for (int i = 0; i < iterations; i++)
        {
            // ISO/IEC 29500-1 Fourth Edition, 2016-11-01
            // 17.15.1.29 - spinCount
            // Specifies the number of times the hashing function shall be iteratively run (runs using each iteration's result plus a 4 byte value (0-based, little endian) containing the number of the iteration as the input for the next iteration) when attempting to compare a user-supplied password with the value stored in the hashValue attribute.
            byte[] iterationBytes = BitConverter.GetBytes(i);
            computedHash = computedHash.Concat(iterationBytes).ToArray();
            computedHash = hashAlgorithm.ComputeHash(computedHash);
        }

        return Convert.ToBase64String(computedHash);
    }
}

I tested it with your example hash and checked that it passed:

[TestClass]
[TestCategory("WordprocessingMLDocumentProtectionHashGenerator")]
public class WordprocessingMLDocumentProtectionHashGeneratorTests
{
    [TestMethod]
    public void GeneratesKnownHashes()
    {
        WordprocessingMLDocumentProtectionHashGenerator wordprocessingMLDocumentProtectionHashGenerator = new WordprocessingMLDocumentProtectionHashGenerator();

        Assert.AreEqual("sstT7oPzpUQTchSUE6WbidCrZv1c8k+/5D1Pm+weZt7QoaeSnBEg/cZFg2W+1eohg1mgXGXLci1CWbnbHDYsXQ==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Example", Convert.FromBase64String("KPr2WqWFihenPDtAmpqUtw=="), 100000, HashAlgorithmName.SHA512));
        Assert.AreEqual("uBuZhlyVTOQtRwQuOGjY7GU3FnJbe1VFKvN+j9u27HSbthOY+n1/daU/WCkqV40fG6HxX+pxgR+Ow4ZvAE7aZg==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 100000, HashAlgorithmName.SHA512));
        Assert.AreEqual("mkGbBri0a1icL1nJKTQL7PyLUY2Uei2wyMHC0Y6s1+DOMYvPWdB6cy0Npao15O0+yqtyZW4hAP0+dcdyrEk7qg==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 0, HashAlgorithmName.SHA512));
        Assert.AreEqual("qdPI8cSBM/21Mr29mfFrR6l7hIn8oLKKT1nTDXHsAQA=", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Testerman", Convert.FromBase64String("On9D022mrdqvHTb6eEkFGA=="), 100000, HashAlgorithmName.SHA256));
        Assert.AreEqual("d5FZvHnQhm6Mzqy6cYE7ZbniYXA/8qJxkAze0sFcNirWYhaLpScmSsfBHptuEmuBreLuNjyV5IjdUoOFWM9mbQ==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("Password", null, 100000, HashAlgorithmName.SHA512));
        Assert.AreEqual("pVjR9ktO9vlxijXcMPlH+4PLwD4Xwy1aqbNQOFmWaSpvBjipNh//T8S3nBhq6HRoRVfWL6s/+NdUCPTxUr0vZw==", wordprocessingMLDocumentProtectionHashGenerator.GenerateHash("johnjohn", Convert.FromBase64String("pH1TDVHSfGBxkd3Q88UNhQ=="), 100000, HashAlgorithmName.SHA512));
    }
}
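
A note for anyone comparing this C# with the Python generate_hash in the question: two spots in that Python code appear to behave differently from the code above. It tests bits with `passwordByte & j`, where `j` is a bit index rather than a mask, and it builds a lowercase hex string encoded as UTF-8, whereas BitConverter.ToString produces uppercase hex and Encoding.Unicode is UTF-16LE. A small sketch of both differences (the variable names are illustrative only, and `bytes.hex(" ")` assumes Python 3.8 or later):

```python
# 1. Bit test: j is an index (0..6), so it has to become a shift or a mask.
password_byte = ord("E")  # 0x45 = 0b1000101, bits 0, 2 and 6 are set
by_shift = [j for j in range(7) if (password_byte >> j) & 1]
by_and_j = [j for j in range(7) if (password_byte & j) != 0]
print(by_shift)  # [0, 2, 6]
print(by_and_j)  # [1, 3, 4, 5, 6], which selects the wrong matrix columns

# 2. Hex string stage: uppercase digits, stored as UTF-16LE.
key = bytes.fromhex("7EEDCE64")
as_word = key.hex().upper().encode("utf-16-le")
as_question = key.hex().encode("utf-8")
print(as_word.hex(" "))      # 37 00 45 00 45 00 44 00 43 00 45 00 36 00 34 00
print(as_question.hex(" "))  # 37 65 65 64 63 65 36 34
```

The first byte stream matches the example quoted from the Microsoft implementation notes in the C# comment; the second is what the lowercase, UTF-8 variant feeds into the hash instead.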
Answered By: Andrew O