pandas: faster method than df.at[x,y]?

Question

I have df1

df1 = pd.DataFrame({'x':[1,2,3,5],
                    'y':[2,3,4,6],
                    'value':[1.5,2.0,0.5,3.0]})

df1
    x   y   value
0   1   2   1.5
1   2   3   2.0
2   3   4   0.5
3   5   6   3.0

and I want to assign the value at x and y coordinates to another dataframe df2

df2 = pd.DataFrame(0.0, index=range(df1['x'].max()+1), columns=range(df1['y'].max()+1))

df2
    0   1   2   3   4   5   6
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   0.0 0.0 0.0 0.0 0.0 0.0 0.0
2   0.0 0.0 0.0 0.0 0.0 0.0 0.0
3   0.0 0.0 0.0 0.0 0.0 0.0 0.0
4   0.0 0.0 0.0 0.0 0.0 0.0 0.0
5   0.0 0.0 0.0 0.0 0.0 0.0 0.0

by

for x, y, value in zip(df1['x'],df1['y'],df1['value']):
    df2.at[x,y] = value

to give

    0   1   2   3   4   5   6
0   0.0 0.0 0.0 0.0 0.0 0.0 0.0
1   0.0 0.0 1.5 0.0 0.0 0.0 0.0
2   0.0 0.0 0.0 2.0 0.0 0.0 0.0
3   0.0 0.0 0.0 0.0 0.5 0.0 0.0
4   0.0 0.0 0.0 0.0 0.0 0.0 0.0
5   0.0 0.0 0.0 0.0 0.0 0.0 3.0

However, it is a bit slow because I have a long df1.

Is there a faster method than df.at[x,y]?

Asked By: Johnny Tam

Answer 1

You can avoid both creating the zero-filled df2 and looping with df.at by using DataFrame.pivot, DataFrame.fillna and DataFrame.reindex:

# pandas 2.0+ requires keyword arguments for pivot
df2 = (df1.pivot(index='x', columns='y', values='value')
          .fillna(0)
          .reindex(index=range(df1['x'].max()+1),
                   columns=range(df1['y'].max()+1), fill_value=0))
print (df2)
y    0    1    2    3    4    5    6
x                                   
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  1.5  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  2.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.5  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  3.0

Answered By: jezrael

Answer 2

Since your data is all numbers, you can use numpy; with a larger dataset, it might be faster than using pd.pivot:

import numpy as np

# create a flattened array from df2
temp = df2.to_numpy().ravel()
# get indices for a flattened array, based on df1.x and df1.y
arr = np.ravel_multi_index((df1.x, df1.y), df2.shape)
# replace at the positions with df1.value
temp[arr] = df1.value
# reshape and create dataframe, keeping df2's row and column labels
temp = temp.reshape(df2.shape)
pd.DataFrame(temp, index = df2.index, columns = df2.columns)

     0    1    2    3    4    5    6
0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  1.5  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  2.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0  0.5  0.0  0.0
4  0.0  0.0  0.0  0.0  0.0  0.0  0.0
5  0.0  0.0  0.0  0.0  0.0  0.0  3.0
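
The same scatter assignment can also be sketched with plain NumPy integer-array indexing, which skips the flatten and reshape steps. This assumes x and y are non-negative integers that can be used directly as row and column positions; the names shape and grid are only for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3, 5],
                    'y': [2, 3, 4, 6],
                    'value': [1.5, 2.0, 0.5, 3.0]})

# Dense grid of zeros with shape (max x + 1, max y + 1).
shape = (int(df1['x'].max()) + 1, int(df1['y'].max()) + 1)
grid = np.zeros(shape)

# Paired row/column index arrays assign every value in one vectorized step.
grid[df1['x'].to_numpy(), df1['y'].to_numpy()] = df1['value'].to_numpy()

# Wrap the array in a DataFrame with default 0..n-1 labels.
df2 = pd.DataFrame(grid)
```

If the same (x, y) pair can appear more than once in df1, it is worth checking how the repeated assignment should be resolved before relying on this.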

Answered By: sammywemmy

Answer 3

Another neat way to do this (for numeric data) is using SciPy’s sparse matrix – your data is in sparse format:

from scipy.sparse import csr_matrix

df2_shape = df1['x'].max()+1, df1['y'].max()+1
sp_df1 = csr_matrix((df1['value'], (df1['x'], df1['y'])), shape=df2_shape)
pd.DataFrame.sparse.from_spmatrix(sp_df1)
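
Note that from_spmatrix returns a DataFrame whose columns have a sparse dtype. If a regular dense frame like the one in the question is wanted, a minimal sketch of the conversion (variable names here are illustrative) could look like this:

```python
import pandas as pd
from scipy.sparse import csr_matrix

df1 = pd.DataFrame({'x': [1, 2, 3, 5],
                    'y': [2, 3, 4, 6],
                    'value': [1.5, 2.0, 0.5, 3.0]})

shape = (int(df1['x'].max()) + 1, int(df1['y'].max()) + 1)

# Build the sparse matrix from (data, (row, col)) triplets.
sp = csr_matrix((df1['value'].to_numpy(),
                 (df1['x'].to_numpy(), df1['y'].to_numpy())),
                shape=shape)

# Columns of this frame are sparse; unset cells are implicit zeros.
sparse_df = pd.DataFrame.sparse.from_spmatrix(sp)

# Convert to ordinary float columns when a dense result is needed.
dense_df = sparse_df.sparse.to_dense()
```

Keeping the sparse frame saves memory when most cells are zero; converting to dense gives up that saving in exchange for ordinary column dtypes.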

In terms of speed, it’s comparable with sammywemmy’s numpy method for large datasets, and the intent is very clear.

Both are much faster than jezrael’s pivot approach, but that approach will work with all pandas datatypes, not just numeric.
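
To check the relative speeds on your own machine, a rough timing sketch along these lines may help. The synthetic data, the sizes n and size, and the function names are all assumptions made for illustration; actual timings depend on the data and on the pandas and NumPy versions.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, size = 2_000, 100  # number of points and grid side length (arbitrary)

# Sample distinct cells so that pivot does not fail on duplicate (x, y) pairs.
flat = rng.choice(size * size, size=n, replace=False)
x, y = np.unravel_index(flat, (size, size))
df1 = pd.DataFrame({'x': x, 'y': y, 'value': rng.random(n)})

def with_at():
    # The loop from the question.
    out = pd.DataFrame(0.0, index=range(size), columns=range(size))
    for xi, yi, v in zip(df1['x'], df1['y'], df1['value']):
        out.at[xi, yi] = v
    return out

def with_pivot():
    # The pivot / fillna / reindex approach.
    return (df1.pivot(index='x', columns='y', values='value')
               .fillna(0)
               .reindex(index=range(size), columns=range(size), fill_value=0))

def with_numpy():
    # Vectorized assignment into a zero array.
    grid = np.zeros((size, size))
    grid[df1['x'].to_numpy(), df1['y'].to_numpy()] = df1['value'].to_numpy()
    return pd.DataFrame(grid)

for fn in (with_at, with_pivot, with_numpy):
    print(f"{fn.__name__}: {timeit.timeit(fn, number=3):.4f}s for 3 runs")
```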

There’s also a neat pandas one-liner if you already have df2 set up (borrowed from another answer):

# this is an inplace operation - no need to assign
df2.update(df1.pivot(index='x', columns='y', values='value'))

This is the slowest of the approaches here, but performance may be acceptable if you like the style.

Answered By: s_pike
