pandas: faster method than df.at[x,y]?

Question:

I have df1

df1 = pd.DataFrame({'x':[1,2,3,5],
                    'y':[2,3,4,6],
                    'value':[1.5,2.0,0.5,3.0]})

df1
    x   y   value
0   1   2   1.5
1   2   3   2.0
2   3   4   0.5
3   5   6   3.0

and I want to assign the value at x and y coordinates to another dataframe df2

df2 = pd.DataFrame(0.0,
                   index=range(df1['x'].max() + 1),
                   columns=range(df1['y'].max() + 1))

df2
    0   1   2   3   4   5   6
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   0.0 0.0 0.0 0.0 0.0 0.0 0.0
2   0.0 0.0 0.0 0.0 0.0 0.0 0.0
3   0.0 0.0 0.0 0.0 0.0 0.0 0.0
4   0.0 0.0 0.0 0.0 0.0 0.0 0.0
5   0.0 0.0 0.0 0.0 0.0 0.0 0.0

by

for x, y, value in zip(df1['x'], df1['y'], df1['value']):
    df2.at[x, y] = value

to give

    0   1   2   3   4   5   6
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   0.0 0.0 1.5 0.0 0.0 0.0 0.0
2   0.0 0.0 0.0 2.0 0.0 0.0 0.0
3   0.0 0.0 0.0 0.0 0.5 0.0 0.0
4   0.0 0.0 0.0 0.0 0.0 0.0 0.0
5   0.0 0.0 0.0 0.0 0.0 0.0 3.0

However, this loop is slow because df1 is long.

Is there a faster method than df.at[x,y]?

Asked By: Johnny Tam

||

Answers:

You can avoid both creating a zero-filled df2 and calling df.at in a loop by using DataFrame.pivot, DataFrame.fillna and DataFrame.reindex:

df2 = (df1.pivot(index='x', columns='y', values='value')
          .fillna(0)
          .reindex(index=range(df1['x'].max()+1),
                   columns=range(df1['y'].max()+1), fill_value=0))
print (df2)
y    0    1    2    3    4    5    6
x                                   
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  1.5  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  2.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.5  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  3.0
Answered By: jezrael

Since your data is all numbers, you can use numpy; with a larger dataset, it might be faster than using pd.pivot:

import numpy as np

# create a flattened copy of df2's values (flatten always copies, so df2 is untouched)
temp = df2.to_numpy().flatten()
# get indices into the flattened array, based on df1.x and df1.y
arr = np.ravel_multi_index((df1.x, df1.y), df2.shape)
# replace the values at those positions with df1.value
temp[arr] = df1.value
# reshape and create dataframe
temp = temp.reshape(df2.shape)
pd.DataFrame(temp, columns = df2.columns)

     0    1    2    3    4    5    6
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  1.5  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  2.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.5  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  3.0
Answered By: sammywemmy

Another neat way to do this (for numeric data) is to use a SciPy sparse matrix, since your data is already in a sparse (row, column, value) format:

from scipy.sparse import csr_matrix

df2_shape = df1['x'].max()+1, df1['y'].max()+1
sp_df1 = csr_matrix((df1['value'], (df1['x'], df1['y'])), shape=df2_shape)
pd.DataFrame.sparse.from_spmatrix(sp_df1)
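
Note that from_spmatrix returns a DataFrame whose columns have a sparse dtype. If an ordinary dense frame like the one in the question is wanted, it can be converted afterwards. The following is a minimal self-contained sketch of that round trip; the names `sparse_df` and `dense_df` are illustrative.

```python
import pandas as pd
from scipy.sparse import csr_matrix

df1 = pd.DataFrame({'x': [1, 2, 3, 5],
                    'y': [2, 3, 4, 6],
                    'value': [1.5, 2.0, 0.5, 3.0]})

# Build the sparse matrix from (value, (row, column)) triples
shape = (int(df1['x'].max()) + 1, int(df1['y'].max()) + 1)
sp = csr_matrix((df1['value'].to_numpy(),
                 (df1['x'].to_numpy(), df1['y'].to_numpy())),
                shape=shape)

# Columns of sparse_df have a sparse dtype; to_dense gives plain float columns
sparse_df = pd.DataFrame.sparse.from_spmatrix(sp)
dense_df = sparse_df.sparse.to_dense()
print(dense_df)
```

One behavioural difference to keep in mind: when building from triples like this, SciPy sums values that share the same (row, column) pair, whereas the df.at loop keeps the last one.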

In terms of speed, it’s comparable with sammywemmy’s numpy method for large datasets, and the intent is very clear.

Both are much faster than jezrael’s pivot approach, but that approach will work with all pandas datatypes, not just numeric.
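
To check the relative speeds on your own data, a rough timing harness along the following lines may help. It is only a sketch: the grid size, the number of points and the random data are arbitrary assumptions, it times the original df.at loop against the pivot and NumPy approaches, and the SciPy variant could be timed in the same way. Actual numbers will depend on data shape and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: n unique (x, y) positions on a size-by-size grid (arbitrary sizes)
rng = np.random.default_rng(0)
size, n = 300, 5000
flat = rng.choice(size * size, size=n, replace=False)
xs, ys = np.unravel_index(flat, (size, size))
df1 = pd.DataFrame({'x': xs, 'y': ys, 'value': rng.random(n)})


def with_at_loop():
    # The approach from the question: one .at assignment per row of df1
    out = pd.DataFrame(0.0, index=range(size), columns=range(size))
    for x, y, value in zip(df1['x'], df1['y'], df1['value']):
        out.at[x, y] = value
    return out


def with_pivot():
    # Reshape with pivot, then expand to the full grid and fill the gaps
    return (df1.pivot(index='x', columns='y', values='value')
               .reindex(index=range(size), columns=range(size))
               .fillna(0))


def with_numpy():
    # Write straight into a zero-filled NumPy array
    grid = np.zeros((size, size))
    grid[df1['x'].to_numpy(), df1['y'].to_numpy()] = df1['value'].to_numpy()
    return pd.DataFrame(grid)


for fn in (with_at_loop, with_pivot, with_numpy):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds:.4f} s per call")
```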

There's also a neat pandas one-liner if you already have df2 set up (from this answer):

# this is an inplace operation - no need to assign
df2.update(df1.pivot(index='x', columns='y', values='value'))

This is the slowest, but performance may be acceptable if you like the style.
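
For completeness, here is the update approach as a self-contained sketch using the data from the question. DataFrame.update aligns on index and column labels and overwrites only where the other frame has non-missing values, so the untouched cells of df2 keep their zeros. The name `patch` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3, 5],
                    'y': [2, 3, 4, 6],
                    'value': [1.5, 2.0, 0.5, 3.0]})

# Zero-filled target grid, as in the question
df2 = pd.DataFrame(0.0,
                   index=range(df1['x'].max() + 1),
                   columns=range(df1['y'].max() + 1))

# Wide frame holding only the known values; every other cell is NaN
patch = df1.pivot(index='x', columns='y', values='value')

# update modifies df2 in place and returns None; NaN cells in patch are skipped
df2.update(patch)
print(df2)
```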

Answered By: s_pike