How to get unique random values from a list within a for loop?

Question:

I made a script that combines data from two different CSV files and generates a text file with a different line (prompt) for each entry.
I want to avoid repeating the same "fintag" value, so that all the prompts are different.

This script does exactly what I need, but it obviously repeats some of the values because ran is a random number.

I can’t simply avoid repeating the same random number, because the random number is used in multiple columns. Creating a different variable for each column would solve it, but the number of columns is high, and it might even change over time.

The alternative is to remove the elements from the "asstag" lists once they’ve been used, but the list is generated within a for loop and I have no idea how to remove elements from a list while a for loop is iterating on it.

Input:

people = {'Name': ['mark', 'bill', 'tim', 'frank'],
          'Tag': ['color', 'animal', 'clothes', 'animal']}
dic = {'color': ['blu', 'green', 'red', 'yellow'],
       'animal': ['dog', 'cat', 'horse', 'shark'],
       'clothes': ['gloves', 'shoes', 'shirt', 'socks']}

Expected Output:

mark blu (or green, or red, or yellow)
bill horse (or dog, or cat, or shark)
tim socks (or gloves, or shoes, or shirt)
frank dog (or cat, or shark, but not horse if horse is already assigned to bill)

Code:

import random

import pandas as pd

people = pd.read_csv("people.csv")
dic = pd.read_csv("dic.csv")

nam = list(people.loc[:, "Name"])
tag = list(people.loc[:, "Tag"])

with open("test.txt", "w+") as file:
    for n, t in zip(nam, tag):
        asstag = list(dic.loc[:, t])  # all values available for this person's tag
        ran = random.randint(0, len(asstag) - 1)  # random index; can repeat across rows
        fintag = asstag[ran]
        prompt = str(n) + " " + str(fintag)
        print(prompt)
        file.write(prompt + "\n")  # one prompt per line

Answers:

One approach is to select unique elements for each tag using random.sample:

import pandas as pd
import random
from collections import Counter

random.seed(42)

people = pd.DataFrame({'Name': ['mark', 'bill', 'tim', 'frank'],
                       'Tag': ['color', 'animal', 'clothes', 'animal']})
dic = pd.DataFrame({'color': ['blu', 'green', 'red', 'yellow'],
                    'animal': ['dog', 'cat', 'horse', 'shark'],
                    'clothes': ['gloves', 'shoes', 'shirt', 'socks']})

names = list(people.loc[:, "Name"])
tags = list(people.loc[:, "Tag"])

samples_by_tag = {tag: random.sample(dic.loc[:, tag].unique().tolist(), count) for tag, count in Counter(tags).items()}

for name, tag in zip(names, tags):
    print(name, samples_by_tag[tag].pop())

Output

mark blu
bill horse
tim shirt
frank dog

The idea is to sample n_i unique elements for each tag using random.sample, where n_i is the number of times that tag appears in tags. This is done in the line:

samples_by_tag = {tag: random.sample(dic.loc[:, tag].unique().tolist(), count) for tag, count in Counter(tags).items()}

For a given run, samples_by_tag can take the following value:

{'color': ['blu'], 'animal': ['dog', 'horse'], 'clothes': ['shirt']}
 # samples_by_tag 
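
The same idea can be sketched without pandas, which may make the mechanics easier to follow. This is a minimal illustration, assuming the CSV contents have already been loaded into plain lists and a dict; the helper name assign_unique and the variable values_by_tag are made up for this example and do not appear in the original code.

```python
import random
from collections import Counter

# Plain-Python stand-ins for the two CSV files (same data as in the question).
names = ['mark', 'bill', 'tim', 'frank']
tags = ['color', 'animal', 'clothes', 'animal']
values_by_tag = {
    'color': ['blu', 'green', 'red', 'yellow'],
    'animal': ['dog', 'cat', 'horse', 'shark'],
    'clothes': ['gloves', 'shoes', 'shirt', 'socks'],
}


def assign_unique(names, tags, values_by_tag):
    """Pair each name with a value of its tag, never reusing a value within a tag.

    Hypothetical helper, written only to illustrate the approach.
    """
    # Counter(tags) says how many values each tag needs, e.g. 'animal' -> 2.
    samples = {tag: random.sample(values_by_tag[tag], count)
               for tag, count in Counter(tags).items()}
    # pop() consumes one pre-drawn value per person, so nothing is handed out twice.
    return [(name, samples[tag].pop()) for name, tag in zip(names, tags)]


pairs = assign_unique(names, tags, values_by_tag)
for name, value in pairs:
    print(name, value)
```

Because random.sample draws without replacement, bill and frank always end up with two different animals, and no element has to be removed from a list while it is being iterated over.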

Note that you need to remove:

random.seed(42)

to make the script give random results every time. See the documentation on random.seed and the notes on reproducibility.
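
To see the effect of seeding in isolation, here is a small sketch. It uses separate random.Random instances rather than the module-level random.seed call, purely so the comparison fits in one script; the list animals is just sample data taken from the question.

```python
import random

animals = ['dog', 'cat', 'horse', 'shark']

# Two generators created with the same seed produce the same draws.
first = random.Random(42).sample(animals, 2)
second = random.Random(42).sample(animals, 2)
print(first == second)  # True

# A generator created without a seed is initialised from an OS-provided
# source of randomness (or the current time), so its draws vary between runs.
unseeded = random.Random().sample(animals, 2)
print(unseeded)
```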

UPDATE

If one tag has fewer values than needed, and you have a list of replacement values, do the following:

other_colors = ['black', 'violet', 'green', 'brown']
populations = { tag : dic.loc[:, tag].unique().tolist() for tag in set(tags) }
populations["color"] = list(set(other_colors))

samples_by_tag = {tag: random.sample(populations[tag], count) for tag, count in Counter(tags).items()}

for name, tag in zip(names, tags):
    print(name, samples_by_tag[tag].pop())
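
random.sample raises ValueError when asked for more elements than the population contains, so it can help to check the counts explicitly. The following is a rough sketch of one way to do that. Note that it is a variation on the snippet above: it extends a tag's values with fallback values instead of replacing them. The function name sample_with_fallback and the fallback dict are assumptions made for this illustration.

```python
import random
from collections import Counter


def sample_with_fallback(populations, tags, fallback):
    """Draw unique values per tag, topping up from `fallback` when a tag runs short.

    Hypothetical helper: `fallback` maps a tag to extra candidate values.
    """
    samples = {}
    for tag, count in Counter(tags).items():
        # De-duplicate while keeping the original order.
        pool = list(dict.fromkeys(populations[tag]))
        if count > len(pool):
            # Add only fallback values that are not already in the pool.
            pool += [v for v in dict.fromkeys(fallback.get(tag, [])) if v not in pool]
        if count > len(pool):
            raise ValueError(
                f"tag {tag!r} needs {count} unique values, only {len(pool)} available"
            )
        samples[tag] = random.sample(pool, count)
    return samples


# Three people share the 'color' tag, but only two colors are defined.
populations = {'color': ['blu', 'green']}
tags = ['color', 'color', 'color']
fallback = {'color': ['black', 'violet', 'green', 'brown']}

samples = sample_with_fallback(populations, tags, fallback)
print(samples)
```

Whether to extend or replace the original values depends on the data; extending keeps the original values in play, while replacing (as in the snippet above) uses only the new list.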
Answered By: Dani Mesejo