# Attaching a calculated column to an existing dataframe raises TypeError: incompatible index of inserted column with frame index

## Question:

I am starting to learn pandas. I was following the question here, but I could not get the proposed solution to work: I get an indexing error. This is what I have:

```
from pandas import *
import pandas as pd
d = {'L1' : Series(['X','X','Z','X','Z','Y','Z','Y','Y',]),
'L2' : Series([1,2,1,3,2,1,3,2,3]),
'L3' : Series([50,100,15,200,10,1,20,10,100])}
df = DataFrame(d)
df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
```

which outputs the following (I am using IPython):

```
L1
X 3 0.571429
1 0.857143
0 1.000000
Y 8 0.900901
7 0.990991
5 1.000000
Z 6 0.444444
2 0.777778
4 1.000000
dtype: float64
```

Then I try to append the cumulative number calculation under the label "new", as suggested in the post:

```
df["new"] = df.groupby("L1", as_index=False).apply(lambda x : pd.expanding_sum(x.sort("L3", ascending=False)["L3"])/x["L3"].sum())
```

I get this:

```
2196 value = value.reindex(self.index).values
2197 except:
-> 2198 raise TypeError('incompatible index of inserted column '
2199 'with frame index')
2200
TypeError: incompatible index of inserted column with frame index
```

Does anybody know what the problem is? How can I reinsert the calculated value into the dataframe so that it shows the values in order (descending by "new" for each label X, Y, Z)?

## Answers:

The problem is, as the error message says, that the index of the calculated column you want to insert is incompatible with the index of `df`.

The index of `df` is a simple index:

```
In [8]: df.index
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8], dtype='int64')
```

while the index of the calculated column is a MultiIndex (as you can already see in the output). Supposing we call it `new_column`:

```
In [15]: new_column.index
Out[15]:
MultiIndex
[(u'X', 3), (u'X', 1), (u'X', 0), (u'Y', 8), (u'Y', 7), (u'Y', 5), (u'Z', 6), (u'Z', 2), (u'Z', 4)]
```

For this reason, you cannot insert it into the frame. However, **this is a bug in 0.12**, as this does work in 0.13 (for which the answer in the linked question was tested), and the keyword `as_index=False` should ensure the column `L1` is not added to the index.

**SOLUTION for 0.12**:

Remove the first level of the MultiIndex, so you get back the original index:

```
In [13]: new_column = df.groupby('L1', as_index=False).apply(lambda x : pd.expanding_sum(x.sort('L3', ascending=False)['L3'])/x['L3'].sum())
In [14]: df["new"] = new_column.reset_index(level=0, drop=True)
```

In pandas 0.13 (in development) this is fixed (https://github.com/pydata/pandas/pull/4670). This is why `as_index=False` is used in the groupby call: the column `L1` (by which you group) is then not added to the index (which would create a MultiIndex), so the original index is retained and the result can be appended to the original frame. But it seems the `as_index` keyword is ignored in 0.12 when using `apply`.

This problem still exists (as of pandas 1.5.0) if the indices don’t match. A modern version of the `groupby.apply` call in the OP may be written as

```
df['new'] = df.groupby('L1')['L3'].apply(lambda x: x.sort_values(ascending=False).cumsum()/x.sum())
```

and it would raise `TypeError: incompatible index of inserted column with frame index`.
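The mismatch behind that error can be inspected directly by comparing the index of the frame with the index of the `groupby.apply` result. The following is a minimal sketch, assuming a reasonably recent pandas; `group_keys=True` is spelled out here only so that the behaviour does not depend on the version's default, and the variable name `new_column` is reused from the earlier part of this answer.

```python
import pandas as pd

# Same data as in the question, built with plain lists
df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

# group_keys=True prepends the group label (L1) to the index of the result
new_column = df.groupby("L1", group_keys=True)["L3"].apply(
    lambda x: x.sort_values(ascending=False).cumsum() / x.sum()
)

print(df.index.nlevels)          # levels in the frame's index
print(new_column.index.nlevels)  # levels in the result's index
print(new_column.index.names)    # the extra outer level is named after L1
```

The frame has a flat index while the result carries an additional outer level for the group label, which is why the two cannot be aligned on assignment.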

A solution is to drop the index level created by the `groupby`:

```
result = df.groupby('L1')['L3'].apply(lambda x: x.sort_values(ascending=False).cumsum()/x.sum())
df['new'] = result.droplevel(0) # <--- drop the unwanted index level
```
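An alternative to dropping the level afterwards is to ask `groupby` not to add it in the first place. The sketch below assumes the `group_keys=False` option of `DataFrame.groupby`; it is offered as one possible variant rather than as the approach used in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

# group_keys=False: do not prepend the group label to the result's index
result = df.groupby("L1", group_keys=False)["L3"].apply(
    lambda x: x.sort_values(ascending=False).cumsum() / x.sum()
)

# The result keeps the original row labels (in a different order),
# so assignment aligns it with the frame row by row
df["new"] = result
print(df)
```

Because the assignment aligns on index labels, the order in which the groups were processed does not matter.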

In any case, to get a column that is indexed the same as the original dataframe (as is being tried in the OP), the canonical way is to apply the function with `groupby.transform` (as suggested by @DSM in a comment). The sorting has to be done beforehand.

```
df['new'] = df.sort_values(by='L3', ascending=False).groupby('L1')['L3'].transform(lambda y: y.cumsum()/y.sum())
```

Yet another way is to perform the division outside the `groupby`, ditching `lambda` altogether.

```
g = df.sort_values(by='L3', ascending=False).groupby('L1')['L3']
df['new'] = g.cumsum() / g.transform('sum')
```
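As a quick sanity check, the three approaches above can be placed side by side. This is only a sketch: the helper `cumulative_share` and the frame `comparison` are hypothetical names introduced for illustration, and `group_keys=True` is passed explicitly so that the `apply` variant always produces the extra level that `droplevel(0)` removes.

```python
import pandas as pd

df = pd.DataFrame({
    "L1": ["X", "X", "Z", "X", "Z", "Y", "Z", "Y", "Y"],
    "L2": [1, 2, 1, 3, 2, 1, 3, 2, 3],
    "L3": [50, 100, 15, 200, 10, 1, 20, 10, 100],
})

def cumulative_share(s):
    # Hypothetical helper: running share of the group total, largest value first
    return s.sort_values(ascending=False).cumsum() / s.sum()

# 1. apply, then drop the group level that apply added
via_apply = (
    df.groupby("L1", group_keys=True)["L3"].apply(cumulative_share).droplevel(0)
)

# 2. sort first, then transform (the index of the input is preserved)
sorted_df = df.sort_values(by="L3", ascending=False)
via_transform = sorted_df.groupby("L1")["L3"].transform(lambda y: y.cumsum() / y.sum())

# 3. sort first, then divide two index-aligned Series
g = sorted_df.groupby("L1")["L3"]
via_division = g.cumsum() / g.transform("sum")

# Building a frame from the three Series aligns them on the row labels
comparison = pd.DataFrame(
    {"apply": via_apply, "transform": via_transform, "division": via_division}
)
print(comparison)

# Show the rows grouped by label and ordered by the new column within each label
df["new"] = via_division
print(df.sort_values(["L1", "new"]))
```

Sorting by `["L1", "new"]` at the end is only for display; the assignment itself does not depend on row order.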