# How to merge cells of a dataframe by summation

## Question:

I want to transform my dataframe by merging its cells and summing them into larger cells, given the indices of those cells. For example, given the index ranges `[0,2]` and `[2,4]` on the `X` and `Y` axes, I want to go from the following dataframe:

``````
+----+----+----+----+
| 1  | 2  | 3  | 4  |
+----+----+----+----+
| 5  | 6  | 7  | 8  |
+----+----+----+----+
| 9  | 10 | 11 | 12 |
+----+----+----+----+
| 13 | 14 | 15 | 16 |
+----+----+----+----+
``````

to the following one:

``````
+----+----+
| 14 | 22 |
+----+----+
| 46 | 54 |
+----+----+
``````

I was thinking that pandas' `groupby.transform` or `rolling` might help.
Any clues?

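``````
import pandas as pd

def fn(x):
    # Relabel rows and columns with their 2x2 block number (0, 0, 1, 1)
    x.index = x.columns = x.index//2
    # Stack to long form, sum cells sharing a (row block, column block) pair, then unstack
    y = x.stack().groupby(level = [0,1]).sum().unstack()
    y.index.name = y.columns.name = None
    return y

df = pd.DataFrame({0 : [1, 5, 9, 13], 1 : [2, 6, 10, 14],
                   2 : [3, 7, 11, 15], 3 : [4, 8, 12, 16]})
fn(df.copy())
#     0   1
# 0  14  22
# 1  46  54
``````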
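The same idea can be written without relabelling the frame, by grouping on block numbers computed from positions. This is only a sketch: the helper name `block_sum_groupby` and its parameters `R` and `C` (block height and width) are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

def block_sum_groupby(df, R, C):
    """Sum R x C blocks of df using two positional groupby passes."""
    # Block number of each row and column, e.g. [0, 0, 1, 1] for R = 2
    row_blocks = np.arange(df.shape[0]) // R
    col_blocks = np.arange(df.shape[1]) // C
    # Collapse rows first, then transpose so the columns can be grouped the same way
    summed_rows = df.groupby(row_blocks).sum()
    return summed_rows.T.groupby(col_blocks).sum().T

df = pd.DataFrame({0: [1, 5, 9, 13], 1: [2, 6, 10, 14],
                   2: [3, 7, 11, 15], 3: [4, 8, 12, 16]})
result = block_sum_groupby(df, 2, 2)
print(result)
```

Because the grouping keys are built from positions, the input frame is left unmodified and the block shape does not have to be square.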

Assuming you have homogeneous blocks (e.g., 2×2), the most efficient approach would be
to `reshape` the underlying array and `sum`:

``````
N = 2

out = pd.DataFrame(df.to_numpy()
                   # convert the 2D array to 4D
                   .reshape(len(df)//N, N, -1, N)
                   # sum along dimensions 1 and 3 to go back to 2D
                   .sum((1, 3))
                   )
``````

If you want non-square blocks (RxC):

``````
R, C = 2, 2

out = pd.DataFrame(df.to_numpy()
                   .reshape(len(df)//R, R, df.shape[1]//C, C)
                   .sum((1, 3))
                   )
``````

Output:

``````
    0   1
0  14  22
1  46  54
``````
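The reshape trick requires every block to have the same size. If the blocks are instead described by their start indices, as the question hints with its index ranges, one possible alternative is `np.add.reduceat`, which sums the slices between consecutive start positions. This is a different technique from the reshape above; the helper name `block_sum_at` and its arguments are hypothetical, and the start indices are assumed to be strictly increasing.

```python
import numpy as np

def block_sum_at(a, row_starts, col_starts):
    """Sum blocks of a 2D array delimited by strictly increasing start indices."""
    # Collapse rows: each output row sums the rows from one start index up to the next
    rows_summed = np.add.reduceat(a, row_starts, axis=0)
    # Collapse columns the same way
    return np.add.reduceat(rows_summed, col_starts, axis=1)

a = np.arange(1, 17).reshape(4, 4)

# Equal 2x2 blocks, matching the example in the question
even = block_sum_at(a, [0, 2], [0, 2])
print(even)

# Unequal blocks: rows split as 1 + 3, columns split as 3 + 1
uneven = block_sum_at(a, [0, 1], [0, 3])
print(uneven)
```

The last block on each axis always runs to the end of the array, so only the start positions are needed.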

Intermediate 4D array:

``````
# df.to_numpy().reshape((len(df)//N, N, -1, N))

array([[[[ 1,  2],    # ──┐
         [ 3,  4]],   # ─┐├─> 1+2+5+6 = 14
                      #  ││
        [[ 5,  6],    # ──┘
         [ 7,  8]]],  # ─┴──> 3+4+7+8 = 22

       [[[ 9, 10],    # ──┐
         [11, 12]],   # ─┐├─>  9+10+13+14 = 46
                      #  ││
        [[13, 14],    # ──┘
         [15, 16]]]]) # ─┴──> 11+12+15+16 = 54
``````
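To convince yourself that summing axes 1 and 3 of the 4D array really yields the block sums, the result can be compared against explicit slicing. This is a small sanity check, assuming the same 4×4 data and `N = 2`; the variable names `fast` and `slow` are illustrative.

```python
import numpy as np

a = np.arange(1, 17).reshape(4, 4)
N = 2

# Axes: (row block, row within block, column block, column within block)
blocks = a.reshape(a.shape[0] // N, N, a.shape[1] // N, N)
fast = blocks.sum(axis=(1, 3))

# Reference result: slice out each N x N block and sum it directly
slow = np.array([[a[i:i + N, j:j + N].sum()
                  for j in range(0, a.shape[1], N)]
                 for i in range(0, a.shape[0], N)])

print(np.array_equal(fast, slow))
```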

Using a list comprehension combined with `np.array_split`, rather than a slower explicit `for` loop:

``````
import numpy as np
import pandas as pd

data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

df = pd.DataFrame(data)

# Number of blocks along each axis (2 here, for 2x2 blocks on a 4x4 frame)
df_len = len(df)//2

# Sum each block, then reshape the flat list of sums into a 2x2 array
arr = np.array([arr2.sum().sum() for arr1 in np.array_split(df, df_len)
                for arr2 in np.array_split(arr1, df_len, axis=1)]
               ).reshape((df_len, df_len))

# Convert the array to a DataFrame
result = pd.DataFrame(arr, columns=['col1', 'col2'])

print(result)
``````
``````
   col1  col2
0    14    22
1    46    54
``````

Timings

Tested with `time_it`:

• Laurent_B:
``````
Script execution time: 0.004486 seconds
``````
• Mozway:
``````
Script execution time: 0.000242 seconds
``````
• Onyambu:
``````
Script execution time: 0.004269 seconds
``````

In conclusion, Mozway's script is the fastest (roughly 18 times faster than the other two in this test), using pure NumPy operations, even though the other approaches work too.
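A comparison like this can be reproduced with the standard-library `timeit` module. The sketch below is not the harness used for the numbers above; the function names `reshape_sum` and `split_sum` are illustrative stand-ins for two of the approaches, both operating on the NumPy array, and the measured times will vary by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4))

def reshape_sum(frame, n=2):
    # Reshape to 4D and sum within each n x n block
    a = frame.to_numpy()
    return a.reshape(a.shape[0] // n, n, a.shape[1] // n, n).sum(axis=(1, 3))

def split_sum(frame, n=2):
    # Split into row bands, then column blocks, and sum each block
    a = frame.to_numpy()
    return np.array([[block.sum()
                      for block in np.array_split(rows, a.shape[1] // n, axis=1)]
                     for rows in np.array_split(a, a.shape[0] // n)])

candidates = {"reshape_sum": reshape_sum, "split_sum": split_sum}
for name, func in candidates.items():
    # Best of 3 runs of 200 calls each, reported per call
    best = min(timeit.repeat(lambda: func(df), number=200, repeat=3)) / 200
    print(f"{name}: {best:.6f} seconds per call")
```

Taking the minimum over several repeats reduces noise from other processes, and timing on a larger frame gives a more representative picture than a 4×4 example.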
