# Create a column based on conditions and calculation

## Question:

Below is my dataframe:

```
df = pd.DataFrame({"ID" : [1, 1, 2, 2, 2, 3, 3],
"length" : [0.7, 0.7, 0.8, 0.6, 0.6, 0.9, 0.9],
"comment" : ["typed", "handwritten", "typed", "typed", "handwritten", "handwritten", "handwritten"]})
df
ID length comment
0 1 0.7 typed
1 1 0.7 handwritten
2 2 0.8 typed
3 2 0.6 typed
4 2 0.6 handwritten
5 3 0.9 handwritten
6 3 0.9 handwritten
```

I want to be able to do the following:

For any group of ID, if the length are the same but the comments are different, use the "typed" formula (5 x length) for the calculated length of that group of ID, otherwise use the formula that apply to each comment to get the calculated length. typed = 5 x length, handwritten = 7*length.

Required Output will be as below:

```
ID length comment Calculated Length
0 1 0.7 typed 5*length
1 1 0.7 handwritten 5*length
2 2 0.8 typed 5*length
3 2 0.6 typed 5*length
4 2 0.6 handwritten 7*length
5 3 0.9 handwritten 7*length
6 3 0.9 handwritten 7*length
```

Thank you.

## Answers:

use `np.where`

if comment column exist only typed or handwritten.

```
import numpy as np
cond1 = df['comment'] == 'typed'
df.assign(Calculated_Length=np.where(cond1, df['length'] * 5, df['length'] * 7))
```

output:

```
ID length comment Calculated_Length
0 1 0.7 typed 3.5
1 1 0.7 handwritten 4.9
2 2 0.8 typed 4.0
3 2 0.6 typed 3.0
4 2 0.6 handwritten 4.2
5 3 0.9 handwritten 6.3
6 3 0.9 handwritten 6.3
```

edit after comment

```
cond1 = df['comment'] == 'typed'
cond2 = df.groupby('ID')['length'].transform(lambda x: (x.max() == x.min()) & (df.loc[x.index, 'comment'].eq('typed').sum() > 0))
df.assign(Caculated_Length=np.where((cond1 | cond2), df['length']*5, df['length']*7))
```

output:

```
ID length comment Caculated_Length
0 1 0.7 typed 3.5
1 1 0.7 handwritten 3.5
2 2 0.8 typed 4.0
3 2 0.6 typed 3.0
4 2 0.6 handwritten 4.2
5 3 0.9 handwritten 6.3
6 3 0.9 handwritten 6.3
```

Find the `IDs`

that satisfy the special condition using `groupby`

. Using the `IDs`

and the `comment`

, compute the `Calculated length`

using `np.where`

as follows

```
>>> grp_ids = df.groupby("ID")[["length", "comment"]].nunique()
>>> grp_ids
length comment
ID
1 1 2
2 2 2
3 1 1
>>> idx = grp_ids.index[(grp_ids["length"] == 1) & (grp_ids["comment"] != 1)]
>>> idx
Int64Index([1], dtype='int64', name='ID')
>>> df["Calculated length"] = np.where(
df["ID"].isin(idx) | (df["comment"] == "typed"),
df["length"] * 5,
df["length"] * 7
)
>>> df
ID length comment Calculated length
0 1 0.7 typed 3.5
1 1 0.7 handwritten 3.5
2 2 0.8 typed 4.0
3 2 0.6 typed 3.0
4 2 0.6 handwritten 4.2
5 3 0.9 handwritten 6.3
6 3 0.9 handwritten 6.3
```