dask-distributed

Dask tasks distributions with synthetic test

Question: I am trying to use Dask to distribute calculations over multiple systems. However, there is some concept I fail to understand, because I cannot reproduce a logical behavior with a simple test that I was using for Python multiprocessing. I am using this very naive code: import dask …

Total answers: 1

How would you use Dask to recursively find neighbouring polygons in a Dask.Geodataframe?

Question: I am new to Dask. I’ve been trying to get it to do the following task: I have two geodataframes and a set: # Main chunk and combined chunk are a list of polygons of tessellated cells main_chunk = gpd.read_parquet(f"./out/singapore/tess_chunk_{int(n1)}.pq") combined_chunks …

Total answers: 1

How to access nested data in Dask Bag while using dask mongo

Question: Below is the sample data: ({'age': 61, 'name': ['Emiko', 'Oliver'], 'occupation': 'Medical Student', 'telephone': '166.814.5565', 'address': {'address': '645 Drumm Line', 'city': 'Kennewick'}, 'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}}, {'age': 54, 'name': ['Wendolyn', 'Ortega'], 'occupation': 'Tractor Driver', 'telephone': '1-975-090-1672', 'address': {'address': …

Total answers: 1

Disable pure function assumption in dask distributed

Question: The Dask distributed library documentation says: "By default, distributed assumes that all functions are pure. […] The scheduler avoids redundant computations. If the result is already in memory from a previous call then that old result will be used rather than recomputing it." When benchmarking function runtimes, …

Total answers: 1

How to store data from dask.distributed on disk?

Question: I’m trying to scale my computations from local Dask Arrays to Dask Distributed. Unfortunately, I am new to distributed computing, so I could not adapt the answer here for my purpose. Mainly, my problem is saving data from distributed computations back to an in-memory Zarr array …

Total answers: 1

How to control python dask's number of threads per worker in linux?

Question: I tried to use a Dask LocalCluster on Linux in a multiprocess setup with a single thread per process, but have failed so far: from dask.distributed import LocalCluster, Client, progress def do_work(): while True: pass return if __name__ == '__main__': cluster = LocalCluster(n_workers=2, processes=True, threads_per_worker=1) client …

Total answers: 1

Defining `__iter__` method for a dask actor?

Question: Is it possible for a dask actor to have an __iter__ method as defined by a class? Consider this example adapted from the docs: class Counter: """A simple class to manage an incrementing counter""" def __init__(self): self.n = 0 def increment(self): self.n += 1 return self.n def …

Total answers: 1

How does dask know variable states before it runs map_partitions?

Question: In the Dask code below I set x to 1 and then to 2 right before executing two map_partitions calls. The result seems fine; however, I don’t fully understand it. If Dask waits to run the two map_partitions calls until it reaches compute(), and at the …

Total answers: 1

limit number of CPUs used by dask compute

Question: The code below takes approximately 1 second to execute on an 8-CPU system. How can I manually configure the number of CPUs used by dask.compute, e.g. to 4 CPUs, so that the code takes approximately 2 seconds to execute even on an 8-CPU system? import dask from time …

Total answers: 1