Combining concurrent.futures.as_completed() with a dictionary using zip()

Question:

I am a first-time user of concurrent.futures and am following the official guides.

Problem: Inside the as_completed() block, how do I access the k and v that were used to build future_to_url?

The k variable is vital.

Using something like:

for (future, k,v) in zip(concurrent.futures.as_completed(future_to_url), urls.items()):

I stumbled on this post; however, I cannot decipher the syntax well enough to reproduce it.

Original

def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}
        for future in concurrent.futures.as_completed(future_to_url):
            data = future.result()
            json = data.json()
            print(f"k: {future[k]}")

Second Attempt – Using zip which breaks

def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}
        for (future, k, v) in zip(concurrent.futures.as_completed(future_to_url), urls.items()):
            data = future.result()
            json = data.json()
            print(f"k: {k}")

Third Broken Attempt – Using Map

for future, (k, v) in map(concurrent.futures.as_completed(future_to_url), scraping_robot_urls.items()):

TypeError: 'generator' object is not callable

Fourth Broken Attempt – Storing the k,v pairs before the as_completed() loop and pairing them with an enumerate index

    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(get_response, v): v for k, v in scraping_robot_urls.items()}
        info = {k: v for k, v in scraping_robot_urls.items()}
        for i, future in enumerate(concurrent.futures.as_completed(future_to_url)):
            url = future_to_url[future]
            data = future.result()
            print(f"data: {data}")
            print(f"key: {list(info)[i]} / url: {url}")

This does not work: the URL does not match the key. The two end up mismatched, so I cannot rely on this behaviour.

For completeness, here are the dependencies

def visit_url(url):
    return requests.get(url)

urls = {
  'id123': 'www.google.com', 
  'id456': 'www.bing.com', 
  'id789': 'www.yahoo.com'
}


Asked By: dimButTries

||

Answers:

This has less to do with futures and more to do with how the dict comprehension is written.

    future_to_url = {executor.submit(visit_url, v): v for k, v in urls.items()}

This loops over every entry in the urls dict, unpacks the key and value (k, v), and submits visit_url to the executor with v as its argument. Only v is stored in future_to_url, and k and v are not available outside the comprehension because their scope is limited to it.
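One option that needs no change to the worker function is to store the whole (k, v) pair as the value in the mapping and look it up by the future. The following is only a minimal sketch: fake_visit is a hypothetical stand-in for visit_url that sleeps instead of making a request, so it runs offline, and future_to_item is an illustrative name.

```python
import concurrent.futures
import time


# Hypothetical stand-in for visit_url(); it sleeps instead of calling
# requests.get() so the sketch runs without network access.
def fake_visit(url):
    time.sleep(0.01)
    return f"response from {url}"


urls = {
    "id123": "http://www.google.com",
    "id456": "http://www.bing.com",
    "id789": "http://www.yahoo.com",
}


def start():
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
        # Keep the whole (key, url) pair as the value, not just the url.
        future_to_item = {
            executor.submit(fake_visit, v): (k, v) for k, v in urls.items()
        }
        for future in concurrent.futures.as_completed(future_to_item):
            # Look the pair up by the future itself, so completion order
            # does not matter.
            k, v = future_to_item[future]
            results[k] = (v, future.result())
    return results


results = start()
for k, (v, data) in results.items():
    print(f"k: {k} / url: {v} / data: {data}")
```

Because the lookup is keyed on the future object, each key stays attached to its own result regardless of which request finishes first.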

If you want both the result of the call and the key it belongs to, you can pass the key into the function and return it alongside the response as a tuple:

import concurrent.futures

import requests


def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(visit_url, k, v): v for k, v in urls.items()}
        for future in concurrent.futures.as_completed(future_to_url):
            id, data = future.result()
            json = data.json()
            print(f"id: {id}")
            print(f"data: {json}")

def visit_url(id, url):
    return id, requests.get(url)

urls = {
  'id123': 'http://www.google.com',
  'id456': 'http://www.bing.com',
  'id789': 'http://www.yahoo.com'
}

After comments from the OP (mainly that passing the key through visit_url just to get it back after execution feels dirty), I can propose a more object-oriented way of doing this:

import requests
import concurrent.futures

class URL:
    def __init__(self, id, url):
        self.id = id
        self.url = url
        self.response = None

    def visit(self):
        # Store the response on the object so the ID, URL and response stay together.
        self.response = requests.get(self.url)
        return self

def start():
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(c.visit): c for c in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            data = future.result()
            print(f"response: {data.response}")
            print(f"id: {data.id}")

urls = [
  URL('id123', 'http://www.google.com'),
  URL('id456', 'http://www.bing.com'),
  URL('id789', 'http://www.yahoo.com')
]

start()

This keeps the response, ID and URL together in one object, which might be cleaner for some. The comprehension that submits to the executor is simplified as well.

Answered By: testfile

For posterity, I was inspired by testfile’s response.

I resolved this issue by sneaking the k inside the visit_url() function.

def visit_url(url, k):
    return k, requests.get(url)

I now have access to the key inside the as_completed() loop. It is predictable, as the key and URL will always match. That was not the case when I paired an outer sequence of keys with the as_completed() loop by position, which behaved irregularly because the external requests resolve in an unpredictable order.

    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        future_to_url = {executor.submit(visit_url, v, k): v for k, v in scraping_robot_urls.items()}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            key, data = future.result()
            print(f"key: {key} / url: {url}")
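The irregular behaviour of the positional pairing (zip() or enumerate()) can be reproduced offline. This is a rough sketch under assumptions: slow_task and the delays dictionary are hypothetical stand-ins for the real requests, with the first key submitted deliberately made the slowest.

```python
import concurrent.futures
import time

# Hypothetical delays standing in for network latency: the first key
# submitted is the slowest, so it is expected to finish last.
delays = {"id123": 0.6, "id456": 0.3, "id789": 0.1}


def slow_task(key, delay):
    time.sleep(delay)
    return key


submitted_order = list(delays)
completed_order = []
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_task, k, d) for k, d in delays.items()]
    # as_completed() yields futures as they finish, not as they were submitted.
    for future in concurrent.futures.as_completed(futures):
        completed_order.append(future.result())

# Pairing by position, as zip() or enumerate() would, mixes the keys up.
mismatches = [(s, c) for s, c in zip(submitted_order, completed_order) if s != c]
print("submitted:", submitted_order)
print("completed:", completed_order)
print("positional mismatches:", mismatches)
```

Returning the key from the worker, as above, avoids the problem because the key travels with the result instead of being inferred from its position.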

This resolution feels to me like a hack, as I am using the scope of another function to pass "state/variable" to something else.

Answered By: dimButTries

I came up with this simple solution for using as_completed() with a dictionary.

Run as_completed() on the dictionary's values(), then match each completed future against the futures stored in the dictionary to retrieve its key.

Retrieve the result and assign it to a results dictionary using that key.

data = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
    future_to_url = {k: executor.submit(visit_url, v) for k, v in urls.items()}
    # Iterate over the futures (the dict values), not the keys.
    for i in concurrent.futures.as_completed(future_to_url.values()):
        for k, v in future_to_url.items():
            if v == i:
                data[k] = v.result()
print(data)

It would be fairly simple to build something like this into as_completed() itself: if the object passed in were a dictionary, it could yield the key, or the key and value, for example via something like as_completed(dict).items().
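As a rough illustration of that idea, a small wrapper can do the key lookup without the inner loop. The helper name as_completed_items is hypothetical and is not part of concurrent.futures, and fake_visit is again an offline stand-in for visit_url.

```python
import concurrent.futures
import time


def as_completed_items(key_to_future):
    """Hypothetical helper (not part of concurrent.futures): yield
    (key, future) pairs in completion order for a {key: future} dict."""
    # Invert the mapping once so each lookup is a dict access, not a scan.
    future_to_key = {future: key for key, future in key_to_future.items()}
    for future in concurrent.futures.as_completed(future_to_key):
        yield future_to_key[future], future


# Hypothetical stand-in for visit_url() so the sketch runs offline.
def fake_visit(url):
    time.sleep(0.01)
    return f"response from {url}"


urls = {
    "id123": "http://www.google.com",
    "id456": "http://www.bing.com",
    "id789": "http://www.yahoo.com",
}

data = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    key_to_future = {k: executor.submit(fake_visit, v) for k, v in urls.items()}
    for key, future in as_completed_items(key_to_future):
        data[key] = future.result()
print(data)
```

Inverting the dictionary once replaces the nested scan over items() with a single lookup per completed future.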

Answered By: ride_the_chaos