Pandas: iloc or transpose, slice, transpose?

Question:

I know both yield the same result, but which is more efficient, and why?

#dataframe with shape 20,20
#slicing the first 10 columns

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(400).reshape(20,20))

df.T[:10].T
#or 
df.iloc[:,:10]

It’s likely that the difference is negligible and that iloc is best practice because it is more readable, but I’d like to know the pros and cons of each.

Asked By: Plod


Answers:

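.iloc[] is designed for positional selection and is meant to be as efficient as possible. The transpose approach builds intermediate DataFrames along the way (one per transpose, plus the slice), which can involve a lot of data movement, so it is bound to be slower.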
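As a quick check of the "same result" premise, here is a minimal sketch. The first frame is the one from the question; the second, mixed-dtype frame (`mixed`) is a hypothetical example that is not part of the question, added to illustrate one possible drawback of the transpose route: transposing a frame with mixed column types goes through object dtype, and the original dtype is not restored on the way back.

```python
import numpy as np
import pandas as pd

# Homogeneous integer frame, as in the question.
df = pd.DataFrame(np.arange(400).reshape(20, 20))
via_transpose = df.T[:10].T
via_iloc = df.iloc[:, :10]
same_result = via_transpose.equals(via_iloc)

# Hypothetical mixed-dtype frame (an assumption, not from the question).
mixed = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
roundtrip = mixed.T[:1].T   # first column via transpose, slice, transpose
direct = mixed.iloc[:, :1]  # first column via iloc

print(same_result)
print(direct["a"].dtype, roundtrip["a"].dtype)
```

For the all-integer frame the two selections compare equal. For the mixed frame, the column selected with iloc keeps its integer dtype, while the round trip through two transposes leaves it as object, so readability is not the only argument for iloc.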

The performance difference between the two is measurable and grows with the size of the DataFrame. The timings below use timeit.timeit(); each figure is the total time in seconds for 100 runs (number=100):

For a small DataFrame:

>>> from timeit import timeit
>>> df = pd.DataFrame(np.arange(400).reshape(20,20))
>>> timeit("x = df.T[:10].T", globals=globals(), number=100)
0.04253590002190322
>>> timeit("x = df.iloc[:,:10]", globals=globals(), number=100)
0.006828900019172579

For a large DataFrame, the difference is more noticeable:

>>> df = pd.DataFrame(np.arange(400000000).reshape(20000,20000))
>>> timeit("x = df.T[:10].T", globals=globals(), number=100)
0.5803892000112683
>>> timeit("x = df.iloc[:,:10]", globals=globals(), number=100)
0.00561390002258122

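For the large DataFrame, the transpose approach is about 100x slower (roughly 0.58 s versus 0.0056 s for 100 runs). For the small one it is about 6x slower.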
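To reproduce the comparison as a script rather than in the interactive prompt, something like the following sketch should work. The helper name `compare` and the (2000, 2000) size are assumptions chosen so that it runs quickly and in modest memory; the 20000 x 20000 frame used above needs several gigabytes. Absolute timings will differ by machine and pandas version.

```python
from timeit import timeit

import numpy as np
import pandas as pd


def compare(n_rows, n_cols, number=100):
    """Return total seconds for `number` runs of each approach."""
    df = pd.DataFrame(np.arange(n_rows * n_cols).reshape(n_rows, n_cols))
    # timeit accepts a callable, so no globals= argument is needed here.
    t_transpose = timeit(lambda: df.T[:10].T, number=number)
    t_iloc = timeit(lambda: df.iloc[:, :10], number=number)
    return t_transpose, t_iloc


for shape in [(20, 20), (2000, 2000)]:
    t_transpose, t_iloc = compare(*shape)
    print(shape, f"transpose: {t_transpose:.4f}s", f"iloc: {t_iloc:.4f}s")
```

Each printed figure is a total over 100 runs, matching the number=100 setting used in the timings above.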

Answered By: sj95126