How to Convert Dask DataFrame Into List of Dictionaries?

Question:

I need to convert a Dask DataFrame into a list of dictionaries to return as the response of an API endpoint. I know I can convert the Dask DataFrame to pandas and then convert that to a dictionary, but it would be better to map each partition to a dict and then concatenate the results.

What I tried:

df = dd.read_csv(path, usecols=cols)

dd.compute(df.to_dict(orient='records'))

Error I’m getting:

AttributeError: 'DataFrame' object has no attribute 'to_dict'
Asked By: Riley Hun


Answers:

You can do it as follows:

import dask.bag as db
import pandas as pd

db.from_delayed(
    df.map_partitions(pd.DataFrame.to_dict, orient='records').to_delayed()
)

This gives you a bag, which you can compute (if the result fits in memory) or manipulate further.

Note that the to_delayed/from_delayed round trip should not be necessary: there is also a to_bag method, but it doesn’t seem to do the right thing here.

Also, you are not really getting much from the DataFrame model here; you may want to start with db.read_text and the built-in csv module instead.

Answered By: mdurant
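Here is a rough sketch of the read_text plus csv module suggestion. It assumes each file becomes a single bag partition, which is db.read_text's behaviour when blocksize is left at its default of None, so the first line of each partition is that file's header. The helper parse_csv_partition and the sample file are hypothetical. Unlike read_csv, the csv module does no type conversion, so every value comes back as a string.

```python
import csv
import os
import tempfile

import dask.bag as db


def parse_csv_partition(lines, usecols=None):
    """Hypothetical helper: turn one partition of text lines into row dicts.

    Assumes the partition is a whole file, so its first line is the header.
    """
    rows = csv.DictReader(lines)
    if usecols is None:
        return list(rows)
    # Keep only the requested columns, loosely mimicking read_csv(usecols=...).
    return [{col: row[col] for col in usecols} for row in rows]


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sample file standing in for the real `path`.
    path = os.path.join(tmpdir, "sample.csv")
    with open(path, "w", newline="") as f:
        f.write("id,name,score\n1,a,0.5\n2,b,0.7\n")

    bag = db.read_text(path).map_partitions(
        parse_csv_partition, usecols=["id", "name"]
    )
    # Compute inside the with-block so the file still exists.
    records = bag.compute(scheduler="threads")
```

Threads keep the example simple. Because csv parsing holds the GIL, the bag's default process-based scheduler may parallelise better on real workloads with many files.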

Try this:

data = list(df.map_partitions(lambda x: x.to_dict(orient="records")))

This returns a list of dictionaries, one per row.

Answered By: Kunal Bafna

Kunal Bafna's answer is the easiest to implement and has fewer dependencies:

data = list(df.map_partitions(lambda x: x.to_dict(orient="records")))