How can I use `pd.concat` to join all columns at once instead of calling `frame.insert` many times?

Question:

I need to create a new DataFrame in which each column is computed by a function that takes two arguments. The second argument is different for each column: it is the number of that column.
The DataFrame has about 6k rows and 200 columns.

Each column of the new DataFrame is computed by this function:

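import pandas as pd

def phiNT(M, nT):
    # Keep only the first nT columns.
    M = M[M.columns[:nT]]
    # Repeat the nT-th column nT times so it lines up with M.
    d = pd.concat([M.iloc[:, nT-1]] * nT, axis=1)
    d.columns = M.columns
    # Row-wise mean of the differences.
    D = M - d
    D = D.mean(axis=1)
    return D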
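
For reference, the function above subtracts the nT-th column from each of the first nT columns and then averages the differences row by row. A minimal sketch of the same computation without the repeated-column frame is shown here; the name `phi_nt_sub` and the sample data are made up for illustration, and it assumes the column labels are unique.

```python
import pandas as pd


def phi_nt_sub(M, nT):
    # Hypothetical helper; intended to return the same Series as phiNT(M, nT).
    first = M.iloc[:, :nT]    # first nT columns
    last = M.iloc[:, nT - 1]  # the nT-th column, as a Series
    # sub(..., axis=0) aligns `last` on the row index and subtracts it
    # from every column, so no repeated-column frame is needed.
    return first.sub(last, axis=0).mean(axis=1)


# Tiny made-up frame, for illustration only.
M = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 5.0], "c": [6.0, 9.0]})
result = phi_nt_sub(M, 2)
```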

I tried to create an empty dataframe and then add each column using a loop:

A=pd.DataFrame()
for i in range(1,len(M.columns)):
    A[i]=phiNT(M,i)

But this produces the following warning:

 PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`

So I need a way to use pd.concat to create all the columns at once.

Asked By: Gigi


Answers:

Build all the columns first, in a list or generator, and then call pd.concat once on that collection to create the new DataFrame, instead of inserting one column at a time. Each call to phiNT returns a Series, and pd.concat with axis=1 places these side by side as columns.

The following uses a generator, which avoids building an intermediate list yourself.

results = (phiNT(M,i) for i in range(1,len(M.columns)))
A = pd.concat(results,axis=1)

This is how it would be done with a list.

A = pd.concat([phiNT(M,i) for i in range(1,len(M.columns))],axis=1)
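
One difference from the original loop is worth noting: the Series returned by phiNT have no name, so pd.concat labels the resulting columns 0, 1, 2, ... rather than 1, 2, 3, ... as `A[i] = ...` did. A minimal sketch of keeping the loop's labels with the `keys` argument of pd.concat follows; the random sample frame is made up and only stands in for the real data.

```python
import numpy as np
import pandas as pd


def phiNT(M, nT):
    # Same function as in the question.
    M = M[M.columns[:nT]]
    d = pd.concat([M.iloc[:, nT - 1]] * nT, axis=1)
    d.columns = M.columns
    return (M - d).mean(axis=1)


# Small made-up frame standing in for the real 6k x 200 data.
rng = np.random.default_rng(0)
M = pd.DataFrame(rng.normal(size=(5, 4)), columns=list("abcd"))

ns = list(range(1, len(M.columns)))  # same range as the original loop

# keys= supplies the column labels, so the result has columns 1, 2, 3
# like the frame built by the loop.
A = pd.concat([phiNT(M, n) for n in ns], axis=1, keys=ns)
```

Because everything is joined in a single call, the frame is built once and the PerformanceWarning from the question should no longer be triggered.
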
Answered By: Ahmed AEK