# How to merge cells of a DataFrame by summation

## Question:

I want to transform my DataFrame by merging its cells and summing them into larger cells, given the index ranges of those cells. For example, given the indices `[0,2] & [2,4]` on the `X` and `Y` axes, I want to go from the following DataFrame:

```
+----+----+----+----+
|  1 |  2 |  3 |  4 |
+----+----+----+----+
|  5 |  6 |  7 |  8 |
+----+----+----+----+
|  9 | 10 | 11 | 12 |
+----+----+----+----+
| 13 | 14 | 15 | 16 |
+----+----+----+----+
```

to the following one:

```
+----+----+
| 14 | 22 |
+----+----+
| 46 | 54 |
+----+----+
```

I was thinking that Pandas' `groupby.transform` or `rolling` would help.

Any clues?

## Answers:

```
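import pandas as pd

def fn(x):
    # Relabel rows and columns with their block number: 0, 1 -> 0 and 2, 3 -> 1
    x.index = x.columns = x.index // 2
    # Stack to a long Series, sum the cells sharing a (row block, column block)
    # pair, then unstack back to a 2D frame
    y = x.stack().groupby(level=[0, 1]).sum().unstack()
    y.index.name = y.columns.name = None
    return y

df = pd.DataFrame({0: [1, 5, 9, 13], 1: [2, 6, 10, 14],
                   2: [3, 7, 11, 15], 3: [4, 8, 12, 16]})
print(fn(df.copy()))
# Output:
#     0   1
# 0  14  22
# 1  46  54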
```
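The same "label each row and column with its block number" idea can also be expressed without `stack`/`unstack`, by grouping rows first and then grouping the transposed result. This is a minimal sketch of that alternative, assuming square blocks of size `N` that divide both dimensions; `N`, `rows_summed` and `out` are illustrative names, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]])

N = 2  # block size (assumed to divide both dimensions)

# Sum each run of N consecutive rows: row labels 0,1 -> group 0 and 2,3 -> group 1
rows_summed = df.groupby(np.arange(df.shape[0]) // N).sum()

# Transpose so columns become rows, group them the same way, then transpose back
out = rows_summed.T.groupby(np.arange(df.shape[1]) // N).sum().T
print(out)
```

Grouping by a NumPy array of the same length as the axis avoids mutating the frame's own labels, so no `df.copy()` is needed here.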

Assuming you have homogeneous blocks (e.g., 2×2), the most efficient approach is to `reshape` the underlying NumPy array and `sum`:

```
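import pandas as pd

N = 2  # block size; df is the 4x4 DataFrame from the question
out = pd.DataFrame(df.to_numpy()
                     # convert the 2D array to 4D:
                     # (row block, row offset, column block, column offset)
                     .reshape(len(df)//N, N, -1, N)
                     # sum along dimensions 1 and 3 to go back to 2D
                     .sum((1, 3))
                  )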
```
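To see what the 4D reshape actually computes, the sketch below compares it with an explicit double loop over N×N tiles. The loop is only a reference for checking (it is much slower on large inputs); `fast` and `slow` are illustrative names, and the values are assumed to be the question's 4×4 grid.

```python
import numpy as np

a = np.arange(1, 17).reshape(4, 4)  # same values as the question's DataFrame
N = 2

# Vectorised: split each axis into (block index, offset within block),
# then sum away the two offset axes
fast = a.reshape(a.shape[0] // N, N, a.shape[1] // N, N).sum(axis=(1, 3))

# Reference: explicit loop over N x N tiles, for comparison only
slow = np.empty_like(fast)
for i in range(fast.shape[0]):
    for j in range(fast.shape[1]):
        slow[i, j] = a[i * N:(i + 1) * N, j * N:(j + 1) * N].sum()

assert (fast == slow).all()
print(fast)
```

Because the reshape only reinterprets the memory layout of a C-contiguous array, element `a[i*N + r, j*N + c]` lands at position `[i, r, j, c]`, which is why summing axes 1 and 3 yields one total per tile.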

If you want non-square blocks (R×C):

```
R, C = 2, 2
out = pd.DataFrame(df.to_numpy()
                     .reshape(len(df)//R, R, df.shape[1]//C, C)
                     .sum((1, 3))
                  )
```

Output:

```
    0   1
0  14  22
1  46  54
```
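If the block shape might not divide the frame evenly, it can help to check that up front and report a clear error instead of relying on NumPy's reshape message. A minimal sketch, assuming a purely numeric DataFrame; `block_sum` is a hypothetical helper name, not an existing pandas or NumPy function.

```python
import numpy as np
import pandas as pd

def block_sum(df: pd.DataFrame, R: int, C: int) -> pd.DataFrame:
    """Sum non-overlapping R x C blocks of a numeric DataFrame (hypothetical helper)."""
    n_rows, n_cols = df.shape
    # The reshape trick only works when both dimensions are exact multiples
    if n_rows % R or n_cols % C:
        raise ValueError(f"shape {df.shape} cannot be split into {R}x{C} blocks")
    arr = df.to_numpy()
    return pd.DataFrame(arr.reshape(n_rows // R, R, n_cols // C, C).sum(axis=(1, 3)))

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4))
print(block_sum(df, 2, 2))  # 2x2 blocks, as in the question
print(block_sum(df, 1, 2))  # 1x2 blocks: pairs of adjacent columns in each row
```

Note that the result gets a fresh `RangeIndex` on both axes; the original row and column labels are not carried over.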

Intermediate 4D array:

```
# df.to_numpy().reshape((len(df)//N, N, -1, N))
array([[[[ 1,  2],    # ──┐
         [ 3,  4]],   # ─┐├─> 1+2+5+6 = 14
                      #  ││
        [[ 5,  6],    # ──┘
         [ 7,  8]]],  # ─┴──> 3+4+7+8 = 22
       [[[ 9, 10],    # ──┐
         [11, 12]],   # ─┐├─> 9+10+13+14 = 46
                      #  ││
        [[13, 14],    # ──┘
         [15, 16]]]]) # ─┴──> 11+12+15+16 = 54
```
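The reshape approach needs equal-sized blocks, but the question frames blocks by index ranges such as `[0,2] & [2,4]`. If those ranges are read as contiguous, half-open intervals (an assumption), NumPy's `np.add.reduceat` can sum them directly, including blocks of unequal size. A minimal sketch; `row_starts`, `col_starts` and `uneven` are illustrative names.

```python
import numpy as np

a = np.arange(1, 17).reshape(4, 4)  # the question's values

# Block start offsets along each axis; [0, 2] encodes the ranges [0, 2) and [2, 4)
row_starts = [0, 2]
col_starts = [0, 2]

# Sum the runs of rows that begin at each start offset, then do the same for columns
out = np.add.reduceat(np.add.reduceat(a, row_starts, axis=0), col_starts, axis=1)
print(out)

# Uneven blocks work too: rows [0, 1) and [1, 4); columns [0, 3) and [3, 4)
uneven = np.add.reduceat(np.add.reduceat(a, [0, 1], axis=0), [0, 3], axis=1)
print(uneven)
```

`reduceat` expects strictly increasing start offsets; if an offset is not smaller than the next one, that output slot holds the single row or column at that offset rather than a sum.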

Using a list comprehension combined with `np.array_split` instead of a slower explicit `for` loop:

```
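import numpy as np
import pandas as pd

data = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
df = pd.DataFrame(data)
df_len = len(df)//2  # number of blocks along each axis

# Split rows into bands, split each band into tiles, and sum each tile.
# Splitting df.to_numpy() avoids np.array_split calling the deprecated DataFrame.swapaxes
arr = np.array([arr2.sum() for arr1 in np.array_split(df.to_numpy(), df_len)
                for arr2 in np.array_split(arr1, df_len, axis=1)]
               ).reshape((df_len, df_len))  # reshape the list into a 2x2 array
# Convert the array to a DataFrame
result = pd.DataFrame(arr, columns=['col1', 'col2'])
print(result)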
```

```
   col1  col2
0    14    22
1    46    54
```
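One property of `np.array_split` that the reshape approach lacks: it accepts a number of sections that does not divide the axis length, producing bands of unequal size. The sketch below illustrates that on an assumed 5×5 input split into 2 blocks per axis (bands of 3 and 2); `arr`, `k` and `sums` are illustrative names.

```python
import numpy as np
import pandas as pd

arr = np.arange(1, 26).reshape(5, 5)  # 5 is not divisible by 2
k = 2  # number of blocks per axis

# array_split tolerates uneven sizes: 5 rows (and columns) become bands of 3 and 2
sums = [[blk.sum() for blk in np.array_split(band, k, axis=1)]
        for band in np.array_split(arr, k, axis=0)]
result = pd.DataFrame(sums)
print(result)
```

`np.split`, by contrast, raises an error when the sections are not equal, so `array_split` is the one to use when uneven trailing blocks are acceptable.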

**Timings**

Tested with `time_it`:

- Laurent_B:

```
Script execution time: 0.004486 seconds
```

- Mozway:

```
Script execution time: 0.000242 seconds
```

- Onyambu:

```
Script execution time: 0.004269 seconds
```

In the end, Mozway's pure-NumPy approach is the fastest, although the other approaches work too.
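It is not stated what `time_it` is, so the figures above cannot be reproduced exactly. As an assumption, the standard-library `timeit` module gives a comparable per-call measurement; the sketch below times the reshape approach against the `np.array_split` comprehension. `reshape_sum` and `split_sum` are hypothetical wrapper names, and absolute numbers will vary with machine and input size.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4))
N = 2  # block size

def reshape_sum(df):
    # Pure NumPy: 4D reshape, then sum over the within-block axes 1 and 3
    return pd.DataFrame(df.to_numpy().reshape(len(df) // N, N, -1, N).sum((1, 3)))

def split_sum(df):
    # List comprehension over np.array_split tiles of the underlying array
    k = len(df) // N
    sums = [blk.sum() for band in np.array_split(df.to_numpy(), k)
            for blk in np.array_split(band, k, axis=1)]
    return pd.DataFrame(np.array(sums).reshape(k, k))

timings = {}
for func in (reshape_sum, split_sum):
    # Best of 5 repeats of 1000 calls each, reported as seconds per call
    best = min(timeit.repeat(lambda: func(df), number=1000, repeat=5))
    timings[func.__name__] = best / 1000
    print(f"{func.__name__}: {timings[func.__name__]:.6f} s per call")
```

Taking the minimum over repeats reduces noise from other processes; on a 4×4 input, fixed per-call overhead (such as constructing the DataFrame) dominates, so larger inputs give a fairer comparison.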