Create a new column based on count of other columns

Question:

I have a dataframe in pandas that looks like

col_1   col_2
6       A       
2       A       
5       B       
3       C       
5       C       
3       B       
6       A       
6       A       
2       B       
2       C       
5       A       
5       B

and i want to add a new column col_new that counts the number of rows with the same elements in col_1 and col_2 but excluding that row itself. So the desired output would look like

col_1   col_2   col_new
6       A       2
2       A       0
5       B       1
3       C       0  
5       C       0
3       B       0
6       A       2
6       A       2
2       B       0
2       C       0
5       A       0
5       B       1   

Here what’s I tried but I am not sure if it’s the right approach:

df['col_new'] = df.groupby(['col_1', 'col_2']).count()

But then I got the error: TypeError: incompatible index of inserted column with frame index

Thanks in advance.

Asked By: Nayr borcherds

||

Answers:

You can use:

df['col_new'] = df.groupby(['col_1', 'col_2'])['col_2'].transform('count').sub(1)

Output:

    col_1 col_2  col_new
0       6     A        2
1       2     A        0
2       5     B        1
3       3     C        0
4       5     C        0
5       3     B        0
6       6     A        2
7       6     A        2
8       2     B        0
9       2     C        0
10      5     A        0
11      5     B        1
Answered By: mozway

I would use the value_counts method.

  • Create a 3rd column called col3 and store a tuple of the row values. Tuples, unlike lists are hashable and can be used to create keys for counting.

    df["col3"] = df.apply(lambda x: (x[0], x[1]), axis = 1)   
    
           col_1 col_2    col3                                                                                              
      0       6     A    (6, A)                                                                                              
      1       2     A    (2, A)                                                                                              
      2       5     B    (5, B)                                                                                             
      3       3     C    (3, C)                                                                                              
      4       5     C    (5, C)                                                                                              
      5       3     B    (3, B)                                                                                              
      6       6     A    (6, A)                                                                                              
      7       6     A    (6, A)                                                                                              
      8       2     B    (2, B)                                                                                              
      9       2     C    (2, C)                                                                                              
      10      5     A    (5, A)                                                                                              
      11      5     B    (5, B) 
    
  • Create a Series for value counts. This will be used like a lookup table.

    value_counts = df["col3"].value_counts() 
    
    (6, A)    3
    (5, B)    2
    (2, A)    1
    (3, C)    1
    (5, C)    1
    (3, B)    1
    (2, B)    1
    (2, C)    1
    (5, A)    1
    Name: col3, dtype: int64
    
  • Map each row to a fourth column called counts

    df["counts"] = df["col3"].map(value_counts)  
    
           col_1 col_2    col3  counts
      0       6     A  (6, A)       3
      1       2     A  (2, A)       1
      2       5     B  (5, B)       2
      3       3     C  (3, C)       1
      4       5     C  (5, C)       1
      5       3     B  (3, B)       1
      6       6     A  (6, A)       3
      7       6     A  (6, A)       3
      8       2     B  (2, B)       1
      9       2     C  (2, C)       1
      10      5     A  (5, A)       1
      11      5     B  (5, B)       2
    
Answered By: Conic