Create a new column based on count of other columns
Question:
I have a dataframe in pandas that looks like
col_1 col_2
6 A
2 A
5 B
3 C
5 C
3 B
6 A
6 A
2 B
2 C
5 A
5 B
and i want to add a new column col_new
that counts the number of rows with the same elements in col_1
and col_2
but excluding that row itself. So the desired output would look like
col_1 col_2 col_new
6 A 2
2 A 0
5 B 1
3 C 0
5 C 0
3 B 0
6 A 2
6 A 2
2 B 0
2 C 0
5 A 0
5 B 1
Here what’s I tried but I am not sure if it’s the right approach:
df['col_new'] = df.groupby(['col_1', 'col_2']).count()
But then I got the error: TypeError: incompatible index of inserted column with frame index
Thanks in advance.
Answers:
You can use:
df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)
Output:
col_1 col_2 col_new
0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1
I would use the value_counts
method.
-
Create a 3rd column called col3
and store a tuple of the row values. Tuples, unlike lists are hashable and can be used to create keys for counting.
df["col3"] = df.apply(lambda x: (x[0], x[1]), axis = 1)
col_1 col_2 col3
0 6 A (6, A)
1 2 A (2, A)
2 5 B (5, B)
3 3 C (3, C)
4 5 C (5, C)
5 3 B (3, B)
6 6 A (6, A)
7 6 A (6, A)
8 2 B (2, B)
9 2 C (2, C)
10 5 A (5, A)
11 5 B (5, B)
-
Create a Series
for value counts. This will be used like a lookup table.
value_counts = df["col3"].value_counts()
(6, A) 3
(5, B) 2
(2, A) 1
(3, C) 1
(5, C) 1
(3, B) 1
(2, B) 1
(2, C) 1
(5, A) 1
Name: col3, dtype: int64
-
Map each row to a fourth column called counts
df["counts"] = df["col3"].map(value_counts)
col_1 col_2 col3 counts
0 6 A (6, A) 3
1 2 A (2, A) 1
2 5 B (5, B) 2
3 3 C (3, C) 1
4 5 C (5, C) 1
5 3 B (3, B) 1
6 6 A (6, A) 3
7 6 A (6, A) 3
8 2 B (2, B) 1
9 2 C (2, C) 1
10 5 A (5, A) 1
11 5 B (5, B) 2
I have a dataframe in pandas that looks like
col_1 col_2
6 A
2 A
5 B
3 C
5 C
3 B
6 A
6 A
2 B
2 C
5 A
5 B
and i want to add a new column col_new
that counts the number of rows with the same elements in col_1
and col_2
but excluding that row itself. So the desired output would look like
col_1 col_2 col_new
6 A 2
2 A 0
5 B 1
3 C 0
5 C 0
3 B 0
6 A 2
6 A 2
2 B 0
2 C 0
5 A 0
5 B 1
Here what’s I tried but I am not sure if it’s the right approach:
df['col_new'] = df.groupby(['col_1', 'col_2']).count()
But then I got the error: TypeError: incompatible index of inserted column with frame index
Thanks in advance.
You can use:
df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)
Output:
col_1 col_2 col_new
0 6 A 2
1 2 A 0
2 5 B 1
3 3 C 0
4 5 C 0
5 3 B 0
6 6 A 2
7 6 A 2
8 2 B 0
9 2 C 0
10 5 A 0
11 5 B 1
I would use the value_counts
method.
-
Create a 3rd column called
col3
and store a tuple of the row values. Tuples, unlike lists are hashable and can be used to create keys for counting.df["col3"] = df.apply(lambda x: (x[0], x[1]), axis = 1)
col_1 col_2 col3 0 6 A (6, A) 1 2 A (2, A) 2 5 B (5, B) 3 3 C (3, C) 4 5 C (5, C) 5 3 B (3, B) 6 6 A (6, A) 7 6 A (6, A) 8 2 B (2, B) 9 2 C (2, C) 10 5 A (5, A) 11 5 B (5, B)
-
Create a
Series
for value counts. This will be used like a lookup table.value_counts = df["col3"].value_counts()
(6, A) 3 (5, B) 2 (2, A) 1 (3, C) 1 (5, C) 1 (3, B) 1 (2, B) 1 (2, C) 1 (5, A) 1 Name: col3, dtype: int64
-
Map each row to a fourth column called
counts
df["counts"] = df["col3"].map(value_counts)
col_1 col_2 col3 counts 0 6 A (6, A) 3 1 2 A (2, A) 1 2 5 B (5, B) 2 3 3 C (3, C) 1 4 5 C (5, C) 1 5 3 B (3, B) 1 6 6 A (6, A) 3 7 6 A (6, A) 3 8 2 B (2, B) 1 9 2 C (2, C) 1 10 5 A (5, A) 1 11 5 B (5, B) 2