# How to transform Dask.DataFrame to pd.DataFrame?

## Question:

How can I transform my resulting dask.DataFrame into a pandas.DataFrame (let’s say I am done with the heavy lifting and just want to apply sklearn to my aggregate result)?

## Answers:

You can call the .compute() method to transform a dask.dataframe to a pandas dataframe:

```
df = df.compute()
```
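
Here is a minimal, self-contained sketch of the round trip, assuming a toy frame built with `dask.dataframe.from_pandas` (the names and data are illustrative):

```
import pandas as pd
import dask.dataframe as dd

# Build a small Dask DataFrame from an in-memory pandas DataFrame.
pdf = pd.DataFrame({"x": range(10), "y": range(10, 20)})
ddf = dd.from_pandas(pdf, npartitions=2)

# ... lazy Dask operations go here ...

# .compute() executes the task graph and returns a plain pandas DataFrame.
result = ddf.compute()
print(type(result))  # <class 'pandas.core.frame.DataFrame'>
```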

```
pd_df = pd.DataFrame(dsk_df)
```

Here you go. It’s faster than `dsk_df.compute()`.

MRocklin’s answer is correct, and this answer gives more details on when it’s appropriate to convert from a Dask DataFrame to a Pandas DataFrame (and how to predict when it’ll cause problems).

Each partition in a Dask DataFrame is a Pandas DataFrame. Running `df.compute()` will coalesce all the underlying partitions in the Dask DataFrame into a single Pandas DataFrame. That’ll cause problems if the size of the Pandas DataFrame is bigger than the RAM on your machine.
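
A small sketch showing that each partition really is a pandas DataFrame, and that `compute()` coalesces them (the toy frame and names are illustrative):

```
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": range(8)}), npartitions=4)

# Each partition is a pandas DataFrame under the hood.
print(type(ddf.partitions[0].compute()))  # <class 'pandas.core.frame.DataFrame'>
print(ddf.npartitions)                    # 4

# compute() pulls every partition into one pandas DataFrame in local memory.
full = ddf.compute()
print(len(full))                          # 8
```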

If `df` has 30 GB of data and your computer has 16 GB of RAM, then `df.compute()` will blow up with a memory error. If `df` only has 1 GB of data, then you’ll be fine.

You can run `df.memory_usage(deep=True).sum()` to compute the amount of memory that your DataFrame is using. This’ll let you know if your DataFrame is sufficiently small to be coalesced into a single Pandas DataFrame.
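
A hedged sketch of that check before collecting; note that `memory_usage` on a Dask DataFrame is itself lazy, so the sum needs its own `compute()` (the 2 GB cutoff and toy frame are purely illustrative):

```
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": range(1_000)}), npartitions=4)

# Total bytes used across all partitions (lazy until computed).
nbytes = ddf.memory_usage(deep=True).sum().compute()

# Only coalesce when the result comfortably fits in local RAM.
if nbytes < 2e9:  # ~2 GB, illustrative threshold
    pdf = ddf.compute()
else:
    print(f"{nbytes / 1e9:.1f} GB is too large to collect into a single pandas DataFrame")
```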

Repartitioning changes the number of underlying partitions in a Dask DataFrame. `df.repartition(1).partitions[0]` is conceptually similar to `df.compute()`.
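
For comparison, a sketch of the repartition route, written with the explicit `npartitions=1` keyword (the toy frame is illustrative); note that `.partitions[0]` is still lazy and needs a final `compute()`:

```
import pandas as pd
import dask.dataframe as dd

ddf = dd.from_pandas(pd.DataFrame({"a": range(100)}), npartitions=10)

# Collapse everything into one partition, then materialize that single partition.
single = ddf.repartition(npartitions=1).partitions[0].compute()
print(type(single))  # <class 'pandas.core.frame.DataFrame'>
```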

Converting to a Pandas DataFrame is especially practical after performing a big filtering operation. If you filter a 100 billion row dataset down to 10 thousand rows, then you can probably just switch to the Pandas API.
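
A hedged sketch of that pattern, with an illustrative column name and threshold standing in for the real filter:

```
import numpy as np
import pandas as pd
import dask.dataframe as dd

# Stand-in for a much larger dataset.
ddf = dd.from_pandas(
    pd.DataFrame({"score": np.random.rand(10_000), "y": np.random.rand(10_000)}),
    npartitions=8,
)

# The heavy filtering runs in Dask; the small result is collected into pandas,
# and from there plain pandas / scikit-learn works as usual.
small_pdf = ddf[ddf["score"] > 0.99].compute()
print(small_pdf.shape)
```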