Creating a zero-filled pandas data frame

Question:

What is the best way to create a zero-filled pandas data frame of a given size?

I have used:

zero_data = np.zeros(shape=(len(data),len(feature_list)))
d = pd.DataFrame(zero_data, columns=feature_list)

Is there a better way to do it?

Asked By: niedakh

||

Answers:

You can try this:

d = pd.DataFrame(0, index=np.arange(len(data)), columns=feature_list)
Answered By: Shravan

If you already have a dataframe, this is the fastest way:

In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 µs per loop

Compare to:

In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 µs per loop

In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 µs per loop
Answered By: mtd

Assuming having a template DataFrame, which one would like to copy with zero values filled here…

If you have no NaNs in your data set, multiplying by zero can be significantly faster:

In [19]: columns = ["col{}".format(i) for i in xrange(3000)]                                                                                       

In [20]: indices = xrange(2000)

In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)

In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop

In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop

Improvement depends on DataFrame size, but never found it slower.

And just for the heck of it:

In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop

In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop

But:

In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop

EDIT!!!

Assuming you have a frame using float64, this will be the fastest by a huge margin! It is also able to generate any value by replacing 0.0 to the desired fill number.

In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop

Depending on taste, one can externally define nan, and do a general solution, irrespective of the particular float type:

In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop
Answered By: Mark Horvath

It’s best to do this with numpy in my opinion

import numpy as np
import pandas as pd
d = pd.DataFrame(np.zeros((N_rows, N_cols)))
Answered By: Alex

Similar to @Shravan, but without the use of numpy:

  height = 10
  width = 20
  df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

Then you can do whatever you want with it:

post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)
Answered By: WaveRider

If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:

df_zeros = df * 0

If the existing data frame contains NaNs or non-numeric values you can instead apply a function to each cell that will just return 0:

df_zeros = df.applymap(lambda x: 0)
Answered By: chakuRak
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.