Why are my the delimiters not random enough and the numbers subdomain count range aren't correct?

Question

Hello guys i want to ask, why are my punctuations on my domain name generator not random enough? here is my code

import random

Example = open("dictionaries/examples.txt", 'w', encoding="utf-8")
with open("dictionaries/Combined.txt") as i:
    Test = [line.rstrip() for line in i]

This is a few words what inside the combined text file

 Example ['login', 'confirm', 'signup', 'confirmation', 'enroll', 'mobile', 'access',
'claim', 'service', 'group', 'recovery', 'support', 'find', 'confirmation']

delimeters = ['', '-', '.']
web = 'web'
subdomain_count2 = 2
subdomain_count3 = 3
subdomain_count4 = 4
suffix = ['co.id', 'info', 'id', 'com']
output = []
for i in range(100):
    outputs = []
    for j in range(subdomain_count2):
        outputs.append(random.choice(Test))
    data = outputs + [web]
    random.shuffle(data)
    for k in range(subdomain_count3):
        outputs.append(random.choice(Test))
    data2 = outputs + [web]
    random.shuffle(data2)
    for l in range(subdomain_count4):
        outputs.append(random.choice(Test))
    data3 = outputs + [web]
    random.shuffle(data3)
    prefix = random.choice(delimeters).join(data)
    prefix2 = random.choice(delimeters).join(data2)
    prefix3 = random.choice(delimeters).join(data3)
    out = random.choice(delimeters).join([prefix])
    out2 = random.choice(delimeters).join([prefix2])
    out3 = random.choice(delimeters).join([prefix3])
    tld = random.choice(suffix)
    addr1 = '.'.join([out, tld])
    addr2 = '.'.join([out2, tld])
    addr3 = '.'.join([out3, tld])
    addr = (addr1 + 'n' + addr2 + 'n' + addr3 + 'n')
    output.append(addr)
    Example.write(addr)

with open("dictionaries/examples.txt") as f:
    websamples = [line.rstrip() for line in f]

This is the output that i get.

As you can see from the image that the domain punctuation is pretty straightforward and has no variation for the domain name. it should’ve looked like this web-loginconfirm.co.id or web.login-confirm.co.id since I have made the delimiters = ['', '-', .] so there would be a combination where there are 3 characters connected with the punctuation like this web-confirm.group.com or confirm.web-group.com and there are also 3 characters that connects without the punction like this webclaimservice.com.

And also as you can see on the 2nd domain and 3rd domain, it somehow does not follow the subdomain number that I want based on this code line for example if subdomain_count3 = 3 there should’ve only been 3 subdomains. On the 2nd domain (excluding the web variable since it’s a fixed one and I want it to always show up on every domain) signup-login-web-confirmation-confirm-enroll.co.id, there are 5 subdomains instead of 3 subdomains (signup, login, confirmation, confirm, enroll). Is there a reason why the delimiters are not random enough and also the subdomain count on the 2nd and the 3rd one has too many variables than i wanted?

Thank you!

EDIT:

Okay for the subdomain range one I have actually found the solution.

I basically added another output name for each subdomain

for i in range(5):
    outputs = []
    outputs2 = []
    outputs3 = []

No need to answer the 2nd question, i still need help of the first one

thanks!

Asked By: Tsukuru

||

Source

Answer 1

Okay for the subdomain range one I have actually found the solution.

I basically added another output name for each subdomain

for i in range(5):
    outputs = []
    outputs2 = []
    outputs3 = []

No need to answer the 2nd question, i still need help of the first one

thanks!

Answered By: Tsukuru

Answer 2

The domain punctuation varies between the three domains generated in the main loop, for example the three domain names with find in them one has a dot and the other has nothing. You are only choosing the delimiter once, and using that one delimiter to join all the elements in data:

prefix = random.choice(delimeters).join(data)

You are adding to the same output list for the second and third subdomains, either don’t do that (in which case you can get rid of 2/3 of the code and just loop over three different values for the number of domains to add) or only add the difference between subdomain_count3 and subdomain_count2 subdomains.

You can also use random.choices to get a list of k choices rather than looping and calling random.choice repeatedly. Do that twice – once for the domain parts and once for the delimiters, then zip the lists.

import random
Test = ['login', 'confirm', 'signup', 'confirmation', 'enroll', 'mobile', 'access',
        'claim', 'service', 'group', 'recovery', 'support', 'find', 'confirmation']

delimiters = ['', '-', '.']
web = 'web'
suffix = ['co.id', 'info', 'id', 'com']
output = []

for i in range(100):
    for subdomain_count in [2, 3, 4]:
        data = [web] + random.choices(Test, k=subdomain_count)
        random.shuffle(data)
        delims = (random.choices(delimiters, k=subdomain_count) +
            ['.' + random.choice(suffix)])
        address = ''.join([a+b for a, b in zip(data, delims)])
        output.append(address)

for o in output:
    print(o)

Answered By: Pete Kirkham

Why are my the delimiters not random enough and the numbers subdomain count range aren't correct?

Question:

Answers: