How to create a column based on the value of the other columns
Question:
I have the following dataframe:
  type_x  Range  myValname
0     g1   0.48        600
1     g2   0.30        600
2     g3   0.62        890
3     g4   0.75        890
I would like to get the following dataframe:
  type_x  Range  myValname newCol
0     g1   0.48        600     c1
1     g2   0.30        600     c1
2     g3   0.62        890     c2
3     g4   0.75        890     c2
The significance of c1 and c2 is that if myValname is the same for several type_x values, those rows can be treated as having the same value. I want generalized code.
My thinking is to convert it into a dictionary and map some values, but I am unable to get the outcome.
df3['newCol'] = df3.groupby('myValname').rank()
Answers:
You can add a new column to a DataFrame based on the values of another column using df.assign(), df.apply(), or np.where(). Of these, df.assign() returns a new DataFrame with the column added.
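As a rough sketch of the df.assign() route combined with the dictionary idea from the question: the names mapping and new_df are made up for this illustration, and the labels are numbered in order of first appearance, which is an assumption about what is wanted.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "type_x": ["g1", "g2", "g3", "g4"],
    "Range": [0.48, 0.30, 0.62, 0.75],
    "myValname": [600, 600, 890, 890],
})

# Hypothetical helper dict: each distinct myValname gets "c1", "c2", ...
# numbered in order of first appearance.
mapping = {
    value: f"c{i}"
    for i, value in enumerate(df["myValname"].unique(), start=1)
}

# assign() returns a new DataFrame and leaves df itself unchanged.
new_df = df.assign(newCol=df["myValname"].map(mapping))
print(new_df)
```

Because assign() does not modify df in place, the result has to be bound to a name (here new_df) or assigned back to df.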
df["newCol"] = df.groupby("myValname").ngroup().add(1).astype(str).radd("c")
- for each unique "myValname", take the group order of it (0, 1, …)
- since it’s 0-based, add(1) to get 1, 2, … instead
- then convert it to string and prepend "c" (radd("c") computes "c" + value)
to get
>>> df
  type_x  Range  myValname newCol
0     g1   0.48        600     c1
1     g2   0.30        600     c1
2     g3   0.62        890     c2
3     g4   0.75        890     c2
where the intermediate result after .ngroup() was:
>>> df.groupby("myValname").ngroup()
0 0
1 0
2 1
3 1
dtype: int64
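For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The frame is rebuilt from the question's data, and group_numbers is just an illustrative name for the intermediate result.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "type_x": ["g1", "g2", "g3", "g4"],
    "Range": [0.48, 0.30, 0.62, 0.75],
    "myValname": [600, 600, 890, 890],
})

# 0-based group number for each row, one number per distinct myValname.
group_numbers = df.groupby("myValname").ngroup()

# Shift to 1-based, convert to string, then prepend "c".
df["newCol"] = group_numbers.add(1).astype(str).radd("c")
print(df)
```

With the default groupby(sort=True), the numbering follows the sorted order of the myValname values; passing sort=False to groupby should number the groups in order of first appearance instead.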
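An alternative with pd.factorize (passing index=df.index keeps the labels aligned when df does not have the default 0, 1, … index):
df["newCol"] = pd.Series(pd.factorize(df["myValname"])[0] + 1, index=df.index, dtype="str").radd("c")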
where now pd.factorize assigns 0, 1, … to each unique value in "myValname", and after that the same modifications follow as before.
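One difference that may matter when the values are not already sorted: by default pd.factorize numbers values in order of first appearance, while groupby(...).ngroup() follows sorted key order. A small sketch, using a made-up unsorted series rather than the question's data:

```python
import pandas as pd

# Hypothetical unsorted input: 890 appears before 600.
values = pd.Series([890, 600, 890, 600], name="myValname")

# Default: codes follow order of first appearance (890 -> 0, 600 -> 1).
codes_first_seen, uniques = pd.factorize(values)

# sort=True: codes follow sorted order (600 -> 0, 890 -> 1), like ngroup().
codes_sorted, _ = pd.factorize(values, sort=True)

labels_first_seen = ["c" + str(code + 1) for code in codes_first_seen]
labels_sorted = ["c" + str(code + 1) for code in codes_sorted]
print(labels_first_seen)
print(labels_sorted)
```

Which numbering is preferable depends on whether c1 should mean "first value seen" or "smallest value".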
Use the .cat.codes attribute after converting the DataFrame column to a categorical dtype:
df['newCol'] = df['myValname'].astype('category').cat.codes.apply(lambda x: 'c' + str(x + 1))
The lambda passed to .apply takes each category code, adds 1 because the codes are 0-based, and puts a c before it (change the prefix to your liking).
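The same idea can also be written without .apply, using vectorized string concatenation. This is only a sketch on the question's data, and codes is an illustrative variable name.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "type_x": ["g1", "g2", "g3", "g4"],
    "Range": [0.48, 0.30, 0.62, 0.75],
    "myValname": [600, 600, 890, 890],
})

# Category codes are 0-based integers, one per distinct value.
codes = df["myValname"].astype("category").cat.codes

# Shift to 1-based, convert to string, and prepend "c".
df["newCol"] = "c" + codes.add(1).astype(str)
print(df)
```

Note that .cat.codes uses -1 for missing values, so rows with NaN in myValname would need separate handling.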