Finding Existence of a Set From Combined Values of a Hashmap

Question:

Given a list of strings P describing the preferences of len(P) employees.
A training session can be scheduled on 2 of 10 potential days, represented by the digits 0-9.
Each employee has given a string consisting of the digits 0-9 with no spaces.
These preferences are combined into an array of strings, P, holding all the employees' preferences.
In this format, P[k] lists the days on which the k-th employee can attend training.
The training can take place on 2 of the 10 days; which 2 are they?

`P = ["0123", "012", "23", "256", "689", "1567"]`

In this example P there are 6 employees, numbered 0-5 by their indices in P. Based on the preferences in each employee's string:

  • Day 0 can host employees: 0, 1
  • Day 1 can host employees: 0, 1, 5
  • Day 2 can host employees: 0, 1, 2, 3
  • Day 3 can host employees: 0, 2
  • Day 4 can host employees: (none)
  • Day 5 can host employees: 3, 5
  • Day 6 can host employees: 3, 4, 5
  • Day 7 can host employees: 5
  • Day 8 can host employees: 4
  • Day 9 can host employees: 4

The days which can host the most employees are day 2 and day 6, which together cover all employees.
What is the formula to find the two days that cover the full set of attendees for any input P?

So far I have written a function which creates the above bullet points in the form of a dict:


def find_days(prefs):
    # Initialise the dict mapping each day to the employees who can attend.
    preferences = {}
    num_employees = set(range(len(prefs)))

    # It is faster to iterate over each string once and update
    # the hashmap multiple times.
    for index, employee in enumerate(prefs):
        available = [*employee]
        # print(available)

        # Update all preferences into the dict.
        for day in available:
            # Create or update the list of employees for that particular day.
            if day in preferences:
                preferences[day].append(index)
            else:
                preferences[day] = [index]
    # print(preferences)
    # Pick the 2 days which form the greatest subset of employees.
    day1 = None
    day2 = None

Going by the number of employees listed for each day is not enough, as it does not account for overlap. How can I check the combined coverage of 2 keys against the set num_employees?
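
To make the overlap problem concrete, here is a small sketch using three days from the example above. The dict name `hosts` and the other variable names are only illustrative; the sets are copied from the bullet list.

```python
# Employee sets for three of the days, copied from the bullet list above.
hosts = {
    "1": {0, 1, 5},
    "2": {0, 1, 2, 3},
    "6": {3, 4, 5},
}
all_employees = set(range(6))

# Days 1 and 6 both host three employees, so list length alone
# cannot tell which of them is the better partner for day 2.
same_size = len(hosts["1"]) == len(hosts["6"])

# The set union accounts for overlap: day 1 shares employees 0 and 1 with day 2.
covered_2_1 = hosts["2"] | hosts["1"]  # employee 4 is left out
covered_2_6 = hosts["2"] | hosts["6"]  # everyone is covered

print(same_size)                     # True
print(covered_2_1 == all_employees)  # False
print(covered_2_6 == all_employees)  # True
```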

Asked By: Siggyweb

||

Answers:

You can solve this easily in O(n^2) time, where n is the number of days, which works fine when there are only 10 days (at most 45 pairs). You just need to compare every possible pair of days and return the pair that covers the most employees.

def find_days(prefs):
    # Map each day to the set of employee indices that can attend it.
    preferences = {}
    for index, employee in enumerate(prefs):
        for day in employee:
            preferences.setdefault(day, set()).add(index)

    # Find the pair of days that has the maximum count of unique employees.
    max_count = 0
    best_days = None  # stays None if fewer than two distinct days are listed
    for day1 in preferences:
        for day2 in preferences:
            # Skip identical days and avoid checking each pair twice.
            if day1 >= day2:
                continue
            # The set union counts each employee once, so overlap is handled.
            count = len(preferences[day1] | preferences[day2])
            if count > max_count:
                max_count = count
                best_days = (day1, day2)

    return best_days

P = ["0123", "012", "23", "256", "689", "1567"]
print(find_days(P))
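
As a possible variant of the same pairwise search, the sketch below uses `itertools.combinations` to generate each pair once and also reports whether the best pair covers every employee, which is the comparison against the full employee set asked about in the question. The function name `best_pair` and the shape of its return value are illustrative choices, not part of the code above.

```python
from itertools import combinations


def best_pair(prefs):
    """Return (pair, covered, covers_all) for the best two days, or None."""
    all_employees = set(range(len(prefs)))

    # Map each day character to the set of employee indices that listed it.
    hosts = {}
    for index, days in enumerate(prefs):
        for day in days:
            hosts.setdefault(day, set()).add(index)

    best = None
    # combinations() yields each unordered pair exactly once;
    # sorting the keys makes tie-breaking deterministic.
    for a, b in combinations(sorted(hosts), 2):
        covered = hosts[a] | hosts[b]
        if best is None or len(covered) > len(best[1]):
            best = ((a, b), covered)

    if best is None:  # fewer than two distinct days were mentioned
        return None
    pair, covered = best
    return pair, covered, covered == all_employees


P = ["0123", "012", "23", "256", "689", "1567"]
pair, covered, covers_all = best_pair(P)
print(pair, covers_all)  # ('2', '6') True
```

Only days that at least one employee listed are considered here, so an input that mentions fewer than two distinct days returns None instead of a pair.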
Answered By: Nejc