Python: Is there a way to join threads while using semaphores?

Question:

Background:

I have an inventory application that scrapes data from our various IT resources (VMware, storage, backups, etc.). We have a vCenter with over 2000 VMs registered to it, and I have code that pulls the details for each VM in its own thread to parallelize the collection.

The threads are joined to a parent thread so that each section completes before the collection moves on to the next area. I also set the joins to time out after 10 minutes so that the collection isn't held up by a single object thread that gets stuck. What I've found, though, is that when I try to pull data for more than about 1000 objects at once, it overloads the vCenter, which kills my connection, and almost all of the child threads die.
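One detail that matters for the 10-minute timeout: `Thread.join(timeout)` only stops the parent from *waiting*; it does not stop the child thread, which keeps running until its target returns. A small sketch with a hypothetical `stuck_collection` worker standing in for a hung per-VM collection, using `daemon=True` so a stuck thread cannot keep the interpreter alive at exit:

```python
import threading

def stuck_collection(stop_event):
    # Hypothetical stand-in for a per-VM collection that hangs:
    # it blocks until stop_event is set.
    stop_event.wait()

stop_event = threading.Event()
worker = threading.Thread(target=stuck_collection, args=(stop_event,), daemon=True)
worker.start()

worker.join(timeout=0.5)  # returns after 0.5 s even though the thread has not finished
alive_after_timeout = worker.is_alive()
print(alive_after_timeout)  # → True

# The thread only ends when its target returns; here we release it explicitly.
stop_event.set()
worker.join()
print(worker.is_alive())  # → False
```

So after a timed-out join, the parent should check `is_alive()` to find out which collections never finished.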

I'm quite sure this is partly related to vCenter versions below 7.0 (we're using 6.7 in a lot of places), but we're stuck on the current versions because of older hardware.

What I would like to do is use semaphores to limit the number of threads running at once, while still having each thread joined to the parent thread once it is started. All of the approaches I've thought of either end up serializing the collection or end up with the join timing out after 10 minutes.

Is there a way to pull this off? The part I get stuck on is joining the threads, because the join blocks the rest of the operations. Once I stop joining the threads, I can't join any others.
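A semaphore does not have to serialize anything if it is acquired *inside* each worker thread rather than in the loop that starts them: the parent starts every thread, each thread waits for a free slot before doing any work, and the parent joins them all as before. A minimal sketch, assuming a hypothetical `collect_vm` function in place of `RESTVMDetail` and an arbitrary cap of 5 (`MAX_CONCURRENT`):

```python
import threading
import time

MAX_CONCURRENT = 5  # hypothetical cap; tune to what the vCenter tolerates

limiter = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0
peak = 0
results = {}

def collect_vm(vm_id):
    """Hypothetical per-VM collection; the semaphore is acquired inside the thread."""
    global active, peak
    with limiter:  # blocks here (not in the parent) until a slot is free
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for the API calls
        with lock:
            results[vm_id] = f"details for {vm_id}"
            active -= 1

threads = [threading.Thread(target=collect_vm, args=(vm_id,), daemon=True)
           for vm_id in range(20)]
for t in threads:
    t.start()  # starting is cheap; at most MAX_CONCURRENT threads do work at once
for t in threads:
    t.join()   # the parent still waits for every thread

print(len(results), "collected, peak concurrency:", peak)
```

The trade-off is that this still creates one OS thread per VM; most of them simply sit blocked on the semaphore until a slot frees up.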

Code sample:

        try:
            objects = vsphere_client.vcenter.VM.list()  # try the newer REST API operation first
            old_objects = container_view.view  # old pyVmomi objects
            rest_api = True
        except (UnableToAllocateResource, OperationNotFound):
            # UnableToAllocateResource: too many objects for the REST API to return
            # (happens at 1000 VMs on vCenter 6.7 and 4000 on 7.0).
            # OperationNotFound: the REST operation isn't available on this vCenter.
            objects = container_view.view
            old_objects = None
            rest_api = False  # so rest_api is always defined below

        threads = []
        for obj in objects:
            thread = RESTVMDetail(vsphere_client, db_vcenter, obj, old_objects, rest_api, db_vms, db_hosts,
                                  db_datastores, db_networks, db_vm_disks, db_vm_os_disks, db_vm_nics, db_vm_cdroms,
                                  db_vm_floppies, db_vm_scsis, db_regions, db_sites, db_environments, db_platforms,
                                  db_applications, db_functions, db_costs, db_vm_snapshots, api_limiter)
            threads.append(thread)

        for thread in threads:
            thread.start()  # every thread starts at once, with no cap on concurrency

        for thread in threads:
            thread.join(600)  # waits up to 600 s for *each* thread in turn
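As written, `thread.join(600)` in a loop allows up to 600 seconds *per thread*, so a few stuck threads can push the total wait well past 10 minutes. A sketch of one way to enforce a single overall deadline, using a hypothetical `join_all` helper (not part of the original code):

```python
import threading
import time

def join_all(threads, total_timeout):
    """Join every thread, but stop waiting once total_timeout seconds have passed overall.

    Returns the threads that were still running when the deadline expired.
    """
    deadline = time.monotonic() + total_timeout
    for t in threads:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        t.join(remaining)  # never waits longer than what is left of the shared budget
    return [t for t in threads if t.is_alive()]

# Hypothetical workers: three quick ones and one that hangs until released.
release = threading.Event()
workers = [threading.Thread(target=time.sleep, args=(0.1,)) for _ in range(3)]
workers.append(threading.Thread(target=release.wait, daemon=True))
for t in workers:
    t.start()

stuck = join_all(workers, total_timeout=1.0)
print("still running:", len(stuck))  # → still running: 1
release.set()  # let the hung worker finish
```

In the original loop this would replace the per-thread `join(600)` with something like `join_all(threads, 600)`, and the returned list identifies the collections that never finished.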

Asked By: Jimmy Fort


Answers:

I switched this to a producer/consumer implementation using a queue. That let me limit the number of collections running at the same time.
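A minimal sketch of that producer/consumer pattern with the standard library's `queue.Queue`: a fixed pool of worker threads pulls VM IDs off the queue, so at most `NUM_WORKERS` collections run at once, and the parent only has to join those few workers. `collect_vm`, `collect_all`, and `NUM_WORKERS` are hypothetical names; this is an illustration of the pattern, not the answerer's actual code.

```python
import queue
import threading

NUM_WORKERS = 4       # hypothetical cap on simultaneous collections
_SENTINEL = object()  # marker telling a worker there is no more work


def collect_vm(vm_id):
    """Hypothetical stand-in for the real per-VM collection."""
    return f"details for {vm_id}"


def worker(tasks, results, lock):
    """Consumer: pull VM IDs off the queue until the sentinel arrives."""
    while True:
        vm_id = tasks.get()
        if vm_id is _SENTINEL:
            return
        try:
            detail = collect_vm(vm_id)
        except Exception as exc:  # record the failure and keep this worker alive
            detail = exc
        with lock:
            results[vm_id] = detail


def collect_all(vm_ids, num_workers=NUM_WORKERS):
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(tasks, results, lock), daemon=True)
        for _ in range(num_workers)
    ]
    for t in workers:
        t.start()
    for vm_id in vm_ids:  # producer: enqueue every VM up front
        tasks.put(vm_id)
    for _ in workers:     # one sentinel per worker so each one exits
        tasks.put(_SENTINEL)
    for t in workers:
        t.join()          # only num_workers threads to join, however many VMs there are
    return results


results = collect_all(range(10))
print(len(results))  # → 10
```

Because the worker count is fixed, the number of simultaneous requests to vCenter stays bounded no matter how many VMs are enqueued.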

Answered By: Jimmy Fort