dask

Dask tasks distributions with synthetic test

Question: I am trying to use Dask to distribute calculations over multiple systems. However, there is a concept I fail to understand, because I cannot reproduce the expected behavior with a simple test that I was using for Python multiprocessing. I am using this very naive code: import dask …

Total answers: 1

How would you use Dask to recursively find neighbouring polygons in a Dask.Geodataframe?

Question: I am new to Dask. I’ve been trying to get it to do the following task: I have two geodataframes and a set: # Main chunk and combined chunk are a list of polygons of tessellated cells main_chunk = gpd.read_parquet(f"./out/singapore/tess_chunk_{int(n1)}.pq") combined_chunks …

Total answers: 1

How to access nested data in Dask Bag while using dask mongo

Question: Below is the sample data: ({'age': 61, 'name': ['Emiko', 'Oliver'], 'occupation': 'Medical Student', 'telephone': '166.814.5565', 'address': {'address': '645 Drumm Line', 'city': 'Kennewick'}, 'credit-card': {'number': '3792 459318 98518', 'expiration-date': '12/23'}}, {'age': 54, 'name': ['Wendolyn', 'Ortega'], 'occupation': 'Tractor Driver', 'telephone': '1-975-090-1672', 'address': {'address': …

Total answers: 1

How to build a datetime in dask from separate fields

Question: I’m trying to build a computed column in Dask: a datetime from the separate fields year, month, day, hour. I can’t find a way to make it work. The method below creates a datetime column, but the values inside it are not datetimes. I’ve tried …

Total answers: 1

Populate SQL database with dask dataframe and dump into a file

Question: You can reproduce the error and the use case on this Colab. I have multiple large tables that I read and analyze through Dask (dataframe). After doing the analysis, I would like to push them into a local database (in this case sqlite engine through sqlalchemy …

Total answers: 1

How to add a constant to negative values in array

Question: Given the xarray below, I would like to add 10 to all negative values (i.e., -5 becomes 5, -4 becomes 6 … -1 becomes 9, and all other values remain unchanged). a = xr.DataArray(np.arange(25).reshape(5, 5)-5, dims=("x", "y")) I tried: a[a<0]=10+a[a<0], but it returns 2-dimensional boolean indexing …

Total answers: 2
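
For the xarray question above, one possible approach (a minimal sketch, not necessarily what either posted answer suggests) is to avoid boolean-mask assignment and use `DataArray.where`, which keeps entries where a condition holds and substitutes another value elsewhere. The variable name `shifted` is illustrative only; the array `a` is built exactly as in the question.

```python
import numpy as np
import xarray as xr

# Same array as in the question: values run from -5 to 19.
a = xr.DataArray(np.arange(25).reshape(5, 5) - 5, dims=("x", "y"))

# Keep entries where the condition (a >= 0) holds;
# everywhere else, substitute a + 10.
shifted = a.where(a >= 0, a + 10)

# First row held the negative values; second row should be untouched.
print(shifted.values[0])
print(shifted.values[1])
```

Because `where` returns a new array rather than assigning through a 2-dimensional boolean index, it should sidestep the indexing error mentioned in the question; the first row becomes 5 through 9 and all non-negative entries stay as they were.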

Python Dask – how to get row content on string match

Question: I have a very large dataset (>1m entries) and a list of postcodes. I want to loop through the postcodes and create a list of matching output area codes from the dataset. The dataset source: https://geoportal.statistics.gov.uk/datasets/06938ffe68de49de98709b0c2ea7c21a/about The code: import dask.dataframe …

Total answers: 1

Increase performance of df.rolling(…).apply(…) for large dataframes

Question: The execution time of this code is too long: df.rolling(window=255).apply(myFunc) My dataframe's shape is (500, 10000). 0 1 … 9999 2021-11-01 0.011111 0.054242 2021-11-04 0.025244 0.003653 2021-11-05 0.524521 0.099521 2021-11-06 0.054241 0.138321 … I make the calculation for each date using the values of the last 255 dates. myFunc looks …

Total answers: 2

Trying to filter in dask.read_parquet tries to compare NoneType and str

Question: I have a project where I pass the following load_args to read_parquet: filters = {'filters': [('itemId', '=', '9403cfde-7fe5-4c9c-916c-41ff0b595c5c')]} According to the documentation, a List[Tuple] like this should be accepted and I should get all partitions which match the predicate (or equivalently, filter out …

Total answers: 2