Generate random ASCII strings until the first N characters of the hash match a specific string

Question:

So I’m trying to create a program in Python that generates a bunch of random ASCII strings, appends the specified id number "1234" to each one, and hashes the result. The goal is to find a valid string whose hash starts with the 6 characters "a0eebb" and whose 7th and 8th characters are equal to "34", the last 2 characters of the id string.

The problem is my program has been running for over 13 hours, and there hasn’t been a match. I’m wondering if something might be wrong with the logic of my program?

I was also wondering if there might be a way to speed up the process – I came across using multiple threads, but wasn’t sure how I would incorporate that into my program. So far, I’ve just been running the program on multiple command line terminals that are open.

My program just generates strings of 10 characters by default, but the length could be anything. I also found a suggestion to use itertools.product to generate the ascii strings? Wasn’t sure how that worked/if it would be any different than what I am doing now.

Any help would be appreciated!

import string
import hashlib
import random

id = "1234"
random_string = ""
count = 0

while True:
    random_string = ''.join(random.choice(string.ascii_letters) for i in range(10))
    random_string_with_salt = random_string + id
    random_string_hash = hashlib.sha256(random_string_with_salt.encode()).hexdigest()
    print("Random string generated: " + random_string)
    count = count + 1
    print("Count: " + str(count))
    if random_string_hash[0:6] == "a0eebb" and random_string_hash[6:8] == id[-2:]:
        valid_string = random_string
        print("Random string generated: " + valid_string)
        print("Random string hash:  " + random_string_hash)
        print("Count: " + str(count))
        break
        
        
Asked By: DemonSlayer

||

Answers:

In line with my comments, here is an example of what I was trying to explain.

import hashlib
import itertools
import time

# match_target = bytes.fromhex('a0eebb34')
match_target = bytes.fromhex('a0eebb')

alphabet = ''.join(map(chr, range(33, 127))).encode('ascii')


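def find_hash_input(target: bytes) -> tuple[str, int]:
    """Return the first 6-character input whose SHA-256 digest starts with target."""
    target_len = len(target)
    iteration_count = 0
    for input_tuple in itertools.product(alphabet, repeat=6):
        input_bytes = bytes(input_tuple)
        iteration_count += 1
        if hashlib.sha256(input_bytes).digest()[:target_len] == target:
            return input_bytes.decode('ascii'), iteration_count
    # Every candidate was tried without a match; fail loudly instead of returning None.
    raise ValueError('search space exhausted without a match')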


def tests():
    start = time.process_time()
    input_str, count = find_hash_input(match_target)
    elapsed = time.process_time() - start
    print(f'found {input_str} in {elapsed} seconds')
    # check
    if hashlib.sha256(input_str.encode('ascii')).digest()[: len(match_target)] != match_target:
        print('failure')
    else:
        print(f'success after {count} iterations')


if __name__ == '__main__':
    tests()

This takes about 8 seconds on my machine to match the first 3 bytes. Each additional byte multiplies the expected work by 2^8 = 256, so to match all four bytes I would expect it to take about 256 times as long. In other words, less than an hour.

Actually, more careful measurement shows that it should take around 23 seconds to find the 3-byte match, so the expected time to find a 4-byte match would be around 1.7 hours on my machine.
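
Since the question also asked about using multiple threads, here is a minimal sketch of one way to spread the search over several processes. It is an illustration only, not the code that was timed above, and it rests on a few assumptions: the names search_chunk and parallel_search are made up for this sketch, the search space is split by first character, and the asker's id "1234" is appended to each candidate before hashing (the example above hashes the bare input). It uses processes rather than threads because, as far as I know, hashing inputs this small does not release the GIL, so threads would probably not give a speedup.

```python
import hashlib
import itertools
from multiprocessing import Pool

ALPHABET = bytes(range(33, 127))  # the same 94 printable ASCII characters as above
SALT = b"1234"                    # assumption: the asker's id, appended before hashing


def search_chunk(args):
    """Try every candidate that starts with one fixed first byte."""
    first, target, length = args
    for rest in itertools.product(ALPHABET, repeat=length - 1):
        candidate = bytes((first, *rest))
        if hashlib.sha256(candidate + SALT).digest()[:len(target)] == target:
            return candidate.decode("ascii")
    return None  # no match among the candidates in this chunk


def parallel_search(target, length, processes=None):
    """Hand one chunk per first character to a pool and return the first hit."""
    jobs = [(first, target, length) for first in ALPHABET]
    with Pool(processes) as pool:
        for hit in pool.imap_unordered(search_chunk, jobs):
            if hit is not None:
                return hit  # leaving the with-block terminates the other workers
    return None
```

To use it, call something like parallel_search(bytes.fromhex('a0eebb34'), length=6) from under an if __name__ == '__main__': guard. With n worker processes on n idle cores the wall-clock time should drop by roughly a factor of n, although I have not measured this.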

The expression itertools.product(alphabet, repeat=6) may need some explanation. The expected number of iterations to success is 2^(number of bits to be matched), which is 2^32 for the full 4-byte target. However, there is variance in this statistic: it might take fewer iterations or it might take more. The number of distinct values generated by itertools.product(alphabet, repeat=k) is len(alphabet)^k. Our alphabet has 94 characters. With repeat=5 we would be able to generate about 2^32.8 inputs. That’s more than we expect to need, but with the variance it’s possible we’ll run out of inputs before we find a match. With repeat=6 we can generate about 2^39.3 inputs, more than enough. I really should have a more general calculation for the repeat count. Perhaps I’ll add it in a later edit.
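
One possible form of that more general calculation is sketched below. It assumes the hash outputs behave like independent, uniformly random values, so each candidate matches with probability 2^-bits; the function names miss_probability and min_repeat and the default threshold of one in a million are choices made for this sketch, not anything from the code above.

```python
import math


def miss_probability(alphabet_size: int, repeat: int, target_bits: int) -> float:
    """Approximate chance that no candidate of this length matches."""
    candidates = alphabet_size ** repeat
    # Each candidate matches with probability p = 2**-target_bits, so the chance
    # that all of them miss is (1 - p)**candidates, roughly exp(-candidates * p).
    return math.exp(-candidates / 2 ** target_bits)


def min_repeat(alphabet_size: int, target_bits: int, max_miss: float = 1e-6) -> int:
    """Smallest repeat whose miss probability is at most max_miss (assumed threshold)."""
    repeat = 1
    while miss_probability(alphabet_size, repeat, target_bits) > max_miss:
        repeat += 1
    return repeat


if __name__ == '__main__':
    for k in (5, 6):
        print(f'repeat={k}: miss probability {miss_probability(94, k, 32):.3g}')
    print('smallest safe repeat:', min_repeat(94, 32))
```

Under these assumptions, repeat=5 with the 94-character alphabet has roughly an 18% chance of exhausting its inputs before a 32-bit match turns up, while repeat=6 makes that chance negligible, which agrees with the choice of repeat=6 above.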