How to apply a function to all elements in a column in a dataframe for faster results?

Question:

I have a data frame with close to 2.5M rows. The structure of the data frame is as follows:

X Y
3256772 54745
3256778 54779

I have to apply a PyProj function such that the following result is obtained:

X Y X2 Y2
3256772 54745 23.45 -49.23
3256778 54779 23.50 -51.24

Is there any way to optimize this piece of code? The data frame I’m working on has close to 2.5 million rows, so the optimization matters.

I have written the following code for applying the function but it is taking forever to process the results.

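import pandas as pd
from pyproj import Proj, transform

def convert(x1, y1):
    # Reproject one point from Web Mercator (EPSG:3857) to WGS 84 (EPSG:4326)
    inProj = Proj('epsg:3857')
    outProj = Proj('epsg:4326')
    x2, y2 = transform(inProj, outProj, x1, y1, always_xy=True)
    return x2, y2


# Row-wise apply: convert() is called once per row, which is the slow part
final[['X2', 'Y2']] = final.apply(lambda row: pd.Series(convert(row['X'], row['Y'])), axis=1)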

Asked By: Rishikesh Sreehari

Answers:

Based on the suggestions in the comments, I passed the X and Y columns to transform as NumPy arrays instead of calling it row by row. The code ran in 4.1 s, which works for me.

For future reference for anyone looking, here’s the code I used:

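# input_epsg and output_epsg are assumed to be Proj objects, like inProj and outProj in the question
final['X2'], final['Y2'] = transform(input_epsg, output_epsg, final['X'].to_numpy(), final['Y'].to_numpy(), always_xy=True)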
Answered By: Rishikesh Sreehari
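
For readers on newer pyproj releases, where the module-level transform function is deprecated in favor of the Transformer class, the same vectorized idea might look roughly like the sketch below. It assumes the EPSG:3857 to EPSG:4326 conversion from the question. The helper name add_lon_lat and the two-row sample frame are made up for illustration and are not part of the original answer.

```python
import pandas as pd
from pyproj import Transformer

# Build the transformer once and reuse it for the whole column.
# always_xy=True keeps the (x, y) = (easting/longitude, northing/latitude) order.
transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)


def add_lon_lat(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with X2 and Y2 columns added."""
    out = df.copy()
    # Pass whole columns as 1-D float arrays so pyproj converts them in one call.
    x = out["X"].to_numpy(dtype="float64")
    y = out["Y"].to_numpy(dtype="float64")
    out["X2"], out["Y2"] = transformer.transform(x, y)
    return out


# Sample rows taken from the question; a real frame would have ~2.5M rows.
final = pd.DataFrame({"X": [3256772, 3256778], "Y": [54745, 54779]})
result = add_lon_lat(final)
print(result)
```

Creating the transformer a single time, rather than building Proj objects inside a per-row function, is likely where much of the saving comes from, alongside passing arrays instead of scalars.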