Create a new column based on count of other columns

Question

I have a pandas DataFrame that looks like this:

col_1   col_2
6       A       
2       A       
5       B       
3       C       
5       C       
3       B       
6       A       
6       A       
2       B       
2       C       
5       A       
5       B

I want to add a new column, col_new, that counts the number of other rows with the same values in both col_1 and col_2, i.e. excluding the row itself. The desired output would look like this:

col_1   col_2   col_new
6       A       2
2       A       0
5       B       1
3       C       0  
5       C       0
3       B       0
6       A       2
6       A       2
2       B       0
2       C       0
5       A       0
5       B       1

Here is what I tried, but I am not sure if it's the right approach:

df['col_new'] = df.groupby(['col_1', 'col_2']).count()

But then I got the error: TypeError: incompatible index of inserted column with frame index

Thanks in advance.

Asked By: Nayr borcherds

Answer 1

You can use:

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

Output:

    col_1 col_2  col_new
0       6     A        2
1       2     A        0
2       5     B        1
3       3     C        0
4       5     C        0
5       3     B        0
6       6     A        2
7       6     A        2
8       2     B        0
9       2     C        0
10      5     A        0
11      5     B        1
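
For context on why the original attempt fails while transform works: an aggregation such as size() or count() on a groupby returns one value per group, indexed by the group keys, so it cannot be aligned with the 12-row frame. transform('count') instead broadcasts each group's count back onto the original row index. The following is a minimal, self-contained sketch of that difference, using the data from the question; the variable names group_sizes and per_row are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": ["A", "A", "B", "C", "C", "B", "A", "A", "B", "C", "A", "B"],
})

# Aggregation: one entry per distinct (col_1, col_2) pair,
# indexed by the group keys rather than by the original rows
group_sizes = df.groupby(["col_1", "col_2"]).size()
print(len(group_sizes), "groups vs", len(df), "rows")

# transform: one entry per original row, aligned with df.index
per_row = df.groupby(["col_1", "col_2"])["col_2"].transform("count")

# Subtract 1 so that each row does not count itself
df["col_new"] = per_row.sub(1)
print(df)
```

Because per_row shares its index with df, the assignment aligns row by row, which is what the aggregated result could not do.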

Answered By: mozway

Answer 2

I would use the value_counts method.

Create a third column called col3 that stores a tuple of the row values. Tuples, unlike lists, are hashable and can be used as keys for counting.

df["col3"] = df.apply(lambda x: (x["col_1"], x["col_2"]), axis=1)

       col_1 col_2    col3
  0       6     A  (6, A)
  1       2     A  (2, A)
  2       5     B  (5, B)
  3       3     C  (3, C)
  4       5     C  (5, C)
  5       3     B  (3, B)
  6       6     A  (6, A)
  7       6     A  (6, A)
  8       2     B  (2, B)
  9       2     C  (2, C)
  10      5     A  (5, A)
  11      5     B  (5, B)

Create a Series for value counts. This will be used like a lookup table.

value_counts = df["col3"].value_counts()

(6, A)    3
(5, B)    2
(2, A)    1
(3, C)    1
(5, C)    1
(3, B)    1
(2, B)    1
(2, C)    1
(5, A)    1
Name: col3, dtype: int64

Map each row's tuple to its count and store the result in a fourth column called counts. Note that this count includes the row itself.

df["counts"] = df["col3"].map(value_counts)

       col_1 col_2    col3  counts
  0       6     A  (6, A)       3
  1       2     A  (2, A)       1
  2       5     B  (5, B)       2
  3       3     C  (3, C)       1
  4       5     C  (5, C)       1
  5       3     B  (3, B)       1
  6       6     A  (6, A)       3
  7       6     A  (6, A)       3
  8       2     B  (2, B)       1
  9       2     C  (2, C)       1
  10      5     A  (5, A)       1
  11      5     B  (5, B)       2
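
Since the question asks for the count excluding the row itself, the lookup result needs 1 subtracted to match the desired output. Below is a minimal sketch of the same lookup-table idea. As an assumption made for illustration, it swaps value_counts for collections.Counter from the standard library, so the tuple keys live in a plain dict-like object; the names keys and counts are hypothetical.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col_1": [6, 2, 5, 3, 5, 3, 6, 6, 2, 2, 5, 5],
    "col_2": ["A", "A", "B", "C", "C", "B", "A", "A", "B", "C", "A", "B"],
})

# One hashable (col_1, col_2) tuple per row, in row order
keys = list(zip(df["col_1"], df["col_2"]))

# Lookup table: how often each tuple occurs in the whole frame
counts = Counter(keys)

# Look up each row's total and subtract 1 to exclude the row itself
df["col_new"] = [counts[k] - 1 for k in keys]
print(df)
```

With the columns built in the steps above, the equivalent final step would be df["col_new"] = df["counts"] - 1.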

Answered By: Conic
