Performant Way of appending shifted columns in existing dataframe

Question:

I am trying to append the shifts in dataframe as per below code:

import pandas as pd 
# list of strings 
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns =['mw']) 
print(df)
lagSize=365
for col in df.columns:
    for i in range(1, lagSize + 1):
        df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)

I am getting following warning:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)

What is good way to do the following operation?

Asked By: utkarsh saraf

||

Answers:

Named columns

In case you want to also keep a key for each shift so you know what it is you can keep them in a dictionary and create a new dataframe from the dictionary.
Here’s an example:

import pandas as pd

# list of strings 
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns =['mw']) 
print(df)
lagSize=365

shifts = {}
for col in df.columns:
  for i in range(1, lagSize + 1):
    shifts['key' + i] = df[col].shift(i)

dd = pd.DataFrame.from_dict(shifts)
print(dd)

Unnamed columns

If you keep a list instead it is even easier to create the next dataframe.

import pandas as pd

# list of strings 
shiftsList = []
for col in df.columns:
  for i in range(1, lagSize + 1):
    shiftsList.append(df[col].shift(i))

dl = pd.DataFrame(shiftsList)
print(dl)

After you’ve created the shifted Dataframe it’s way faster to add the original 1 column rather than add each column as you go.

Answered By: florianc63
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.