Subtract the highest value of rows not containing a pattern from the highest value of rows containing that pattern, per group, in Pandas

Question:

I have a dataframe such as:

Groups Name               Value
G1     BLOC_Homo_sapiens  100
G1     BLOC_Chimpenzee    99
G1     BLOC_Bonobo        80
G1     Canis_lupus        20
G1     Danio_rerio        10
G2     BLOC_Homo_sapiens  30
G2     BLOC_Bonobo        29
G2     Mus_musculus       28
G2     Cules_pupiens      26
G3     BLOC_Gorrilla      300
G3     Cimex_lectularius  10
G3     Bombus_terrestris  9

I would like to add a new column called "diff_length" that, for each group in Groups, holds the highest Value among the Names containing the pattern "BLOC" minus the highest Value among the Names that do not contain "BLOC".

For group G1, for instance, the highest Value with BLOC is 100 and the highest Value without BLOC is 20, so the result is 100 - 20 = 80.
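To make the example reproducible, here is a minimal sketch that rebuilds the frame above and checks the G1 arithmetic by hand. The variable names (g1, is_bloc, max_bloc, max_other, g1_diff) are only illustrative and are not part of the original question.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Groups": ["G1"] * 5 + ["G2"] * 4 + ["G3"] * 3,
    "Name": [
        "BLOC_Homo_sapiens", "BLOC_Chimpenzee", "BLOC_Bonobo",
        "Canis_lupus", "Danio_rerio",
        "BLOC_Homo_sapiens", "BLOC_Bonobo", "Mus_musculus", "Cules_pupiens",
        "BLOC_Gorrilla", "Cimex_lectularius", "Bombus_terrestris",
    ],
    "Value": [100, 99, 80, 20, 10, 30, 29, 28, 26, 300, 10, 9],
})

# Check the G1 arithmetic by hand (illustrative names, not from the question).
g1 = df[df["Groups"] == "G1"]
is_bloc = g1["Name"].str.contains("BLOC")
max_bloc = g1.loc[is_bloc, "Value"].max()     # highest Value among BLOC names
max_other = g1.loc[~is_bloc, "Value"].max()   # highest Value among the other names
g1_diff = max_bloc - max_other
print(max_bloc, max_other, g1_diff)           # prints: 100 20 80
```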

I should then get:

Groups Name               Value diff_length
G1     BLOC_Homo_sapiens  100   80
G1     BLOC_Chimpenzee    99    80
G1     BLOC_Bonobo        80    80
G1     Canis_lupus        20    80
G1     Danio_rerio        10    80
G2     BLOC_Homo_sapiens  30    2
G2     BLOC_Bonobo        29    2
G2     Mus_musculus       28    2 
G2     Cules_pupiens      26    2
G3     BLOC_Gorrilla      300   290
G3     Cimex_lectularius  10    290
G3     Bombus_terrestris  9     290
Asked By: chippycentra


Answers:

You can use:

m = df['Name'].str.contains('BLOC')

df['diff_length'] = (df.groupby('Groups')['Value']
                       .transform(lambda d: d.where(m).max() - d.mask(m).max())
                    )

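NB. This assumes a unique index, because the full-length mask m is aligned with each group's values by index label inside the lambda.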

Output:

   Groups               Name  Value  diff_length
0      G1  BLOC_Homo_sapiens    100         80.0
1      G1    BLOC_Chimpenzee     99         80.0
2      G1        BLOC_Bonobo     80         80.0
3      G1        Canis_lupus     20         80.0
4      G1        Danio_rerio     10         80.0
5      G2  BLOC_Homo_sapiens     30          2.0
6      G2        BLOC_Bonobo     29          2.0
7      G2       Mus_musculus     28          2.0
8      G2      Cules_pupiens     26          2.0
9      G3      BLOC_Gorrilla    300        290.0
10     G3  Cimex_lectularius     10        290.0
11     G3  Bombus_terrestris      9        290.0
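To see what where and mask contribute inside the lambda, here is a small sketch restricted to the G1 rows. It is an illustration under the assumption that the data matches the question; the names g1, bloc_only, other_only and diff are introduced only for this example.

```python
import pandas as pd

# Only the G1 rows from the question, to keep the example small.
g1 = pd.DataFrame({
    "Name": ["BLOC_Homo_sapiens", "BLOC_Chimpenzee", "BLOC_Bonobo",
             "Canis_lupus", "Danio_rerio"],
    "Value": [100, 99, 80, 20, 10],
})
m = g1["Name"].str.contains("BLOC")

bloc_only = g1["Value"].where(m)    # keeps BLOC rows, NaN elsewhere
other_only = g1["Value"].mask(m)    # NaN on BLOC rows, keeps the rest

# max() skips NaN by default, so each side reduces to its own maximum.
diff = bloc_only.max() - other_only.max()
print(diff)                         # prints: 80.0
```

Because where and mask fill the excluded rows with NaN, the values are upcast to float, which is why the output column shows 80.0 rather than 80.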

Alternative syntax:

m = df['Name'].str.contains('BLOC')

df['diff_length'] = (
  df['Value'].where(m).groupby(df['Groups']).transform('max')
 -df['Value'].mask(m).groupby(df['Groups']).transform('max')
)
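
Both versions return floats because where and mask introduce NaN. If integer output is preferred, one possible option is to cast to the nullable Int64 dtype. The sketch below uses a small hypothetical frame (names BLOC_a, x, BLOC_b, y, BLOC_c are made up) in which G3 has no non-BLOC row, to show that such a group would come out as missing rather than raising.

```python
import pandas as pd

# Hypothetical data: G3 deliberately has no non-BLOC row.
df = pd.DataFrame({
    "Groups": ["G1", "G1", "G2", "G2", "G3"],
    "Name": ["BLOC_a", "x", "BLOC_b", "y", "BLOC_c"],
    "Value": [100, 20, 30, 28, 300],
})
m = df["Name"].str.contains("BLOC")

# Same expression as the alternative syntax above.
diff = (
    df["Value"].where(m).groupby(df["Groups"]).transform("max")
    - df["Value"].mask(m).groupby(df["Groups"]).transform("max")
)

# Nullable integer dtype: whole numbers stay integers, NaN becomes <NA>.
df["diff_length"] = diff.astype("Int64")
print(df)
```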
Answered By: mozway

Here is one way to do it:

# flag the rows whose Name contains "BLOC"
m1 = df['Name'].str.contains('BLOC')

# group by Groups and by the BLOC flag, take the max of each (False/True)
# category per group, then diff consecutive rows (True row minus False row)
df2 = df.groupby(['Groups', m1])['Value'].max().diff().reset_index()

# build a {group: difference} dictionary from the True rows
d = dict(df2[df2['Name'].eq(True)][['Groups', 'Value']].values)

# map the difference back onto df
df['diff_length'] = df['Groups'].map(d)
df


   Groups               Name  Value  diff_length
0      G1  BLOC_Homo_sapiens    100         80.0
1      G1    BLOC_Chimpenzee     99         80.0
2      G1        BLOC_Bonobo     80         80.0
3      G1        Canis_lupus     20         80.0
4      G1        Danio_rerio     10         80.0
5      G2  BLOC_Homo_sapiens     30          2.0
6      G2        BLOC_Bonobo     29          2.0
7      G2       Mus_musculus     28          2.0
8      G2      Cules_pupiens     26          2.0
9      G3      BLOC_Gorrilla    300        290.0
10     G3  Cimex_lectularius     10        290.0
11     G3  Bombus_terrestris      9        290.0
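
Note that diff() above runs over the whole stacked result, so the True row of a group is compared with whatever row precedes it. That is the group's own False row only when every group has both categories. As a hedged alternative (a different technique, not the method of this answer), the two maxima can be placed side by side with unstack and subtracted column-wise. The labels "bloc" and "other" and the names kind, wide and per_group are assumptions made for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1"] * 5 + ["G2"] * 4 + ["G3"] * 3,
    "Name": [
        "BLOC_Homo_sapiens", "BLOC_Chimpenzee", "BLOC_Bonobo",
        "Canis_lupus", "Danio_rerio",
        "BLOC_Homo_sapiens", "BLOC_Bonobo", "Mus_musculus", "Cules_pupiens",
        "BLOC_Gorrilla", "Cimex_lectularius", "Bombus_terrestris",
    ],
    "Value": [100, 99, 80, 20, 10, 30, 29, 28, 26, 300, 10, 9],
})

# Label each row "bloc" or "other" (string labels chosen for this sketch).
kind = (df["Name"].str.contains("BLOC")
        .map({True: "bloc", False: "other"})
        .rename("kind"))

# One row per group, one column per category, holding that category's max.
wide = df.groupby(["Groups", kind])["Value"].max().unstack("kind")

# Column-wise subtraction never mixes values from different groups.
per_group = wide["bloc"] - wide["other"]
df["diff_length"] = df["Groups"].map(per_group)
print(per_group)
```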
Answered By: Naveed
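# df1 is the question's dataframe.
# col1: per-row max of Value within each (Groups, BLOC / non-BLOC) category;
# diff_length: spread between the two category maxima of each group.
df1.assign(diff_length=df1.join(df1.groupby(['Groups', df1['Name'].str.contains('BLOC')])['Value']
                                   .transform('max').rename('col1'))
                          .groupby('Groups')['col1']
                          .transform(lambda ss: ss.max() - ss.min()))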

   Groups               Name  Value  diff_length
0      G1  BLOC_Homo_sapiens    100           80
1      G1    BLOC_Chimpenzee     99           80
2      G1        BLOC_Bonobo     80           80
3      G1        Canis_lupus     20           80
4      G1        Danio_rerio     10           80
5      G2  BLOC_Homo_sapiens     30            2
6      G2        BLOC_Bonobo     29            2
7      G2       Mus_musculus     28            2
8      G2      Cules_pupiens     26            2
9      G3      BLOC_Gorrilla    300          290
10     G3  Cimex_lectularius     10          290
11     G3  Bombus_terrestris      9          290
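
The one-liner above can be unpacked into separate steps, which may be easier to follow. This is a sketch of the same idea with intermediate names (is_bloc, col1, diff_length, result) introduced only for readability.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Groups": ["G1"] * 5 + ["G2"] * 4 + ["G3"] * 3,
    "Name": [
        "BLOC_Homo_sapiens", "BLOC_Chimpenzee", "BLOC_Bonobo",
        "Canis_lupus", "Danio_rerio",
        "BLOC_Homo_sapiens", "BLOC_Bonobo", "Mus_musculus", "Cules_pupiens",
        "BLOC_Gorrilla", "Cimex_lectularius", "Bombus_terrestris",
    ],
    "Value": [100, 99, 80, 20, 10, 30, 29, 28, 26, 300, 10, 9],
})
is_bloc = df1["Name"].str.contains("BLOC")

# Step 1: for every row, the max Value within its (group, BLOC / non-BLOC) category.
col1 = df1.groupby(["Groups", is_bloc])["Value"].transform("max")

# Step 2: within a group col1 takes at most two distinct values (one per category),
# so max - min is the gap between the two category maxima.
diff_length = col1.groupby(df1["Groups"]).transform(lambda ss: ss.max() - ss.min())

result = df1.assign(diff_length=diff_length)
print(result)
```

Since max - min is never negative, this matches "BLOC minus non-BLOC" only when the BLOC maximum is the larger of the two, as it is in the example data.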
Answered By: G.G