Create a DataFrame in Dask

Question:

I’m just starting to use Dask as a possible replacement (?) for pandas. The first thing that hit me is that I can’t seem to find a way to create a dataframe from a couple of lists/arrays.

In regular pandas I just do pd.DataFrame({'a': a, 'b': b, ...}), but I can’t find an equivalent way to do it in Dask other than creating the df in pandas and then creating a Dask df with from_pandas().

Is there any way? Or is the only way literally to create the df in pandas and then "import" it into a Dask df?

Asked By: Ghost

||

Answers:

There is a fairly recent feature by @MrPowers that allows creating a dask.DataFrame using the from_dict method:

from dask.dataframe import DataFrame

# npartitions sets how many pandas DataFrames the data is split into
ddf = DataFrame.from_dict({"num1": [1, 2, 3], "num2": [7, 8, 9]}, npartitions=2)

Note, however, that this method is mainly meant for writing concise dask.DataFrame code in tutorials and code examples: the whole dictionary has to fit in memory on the client first. When working with real datasets, it’s better to use methods that load the data in parallel, e.g. read_csv or read_parquet.

Answered By: SultanOrazbayev