How to find the id which contains the specific name includes in it using python

Question:

I have a excel file containing three columns as shown below,

project_id Name Date
755 Waber 2019-04-08
755 Wabern 2020-06-16
755 Wabern (FTTH geplan) 2020-07-24
755 Wabern FTTH DTAG 2020-08-15
755 Wabern (FTTH DTAG gg) 2021-03-05
755 Wabern 2021-09-13
134 Lerbeck 2019-04-18
134 BAd oyehausen (FttH svd) 2020-06-26
134 Werre Park 2020-07-14
134 Werre Park (FTTH ssd) 2020-08-25
134 Werre Park (FTTH) 2021-03-15
134 Bad oyehausen 2021-09-23
584 kitern 2019-04-08
584 Lausen (ftth los) 2020-06-16
584 kitener (FTTH geplan) 2020-07-24
584 Lausern 2020-08-15
584 Lausern (FTTH DTAG gg) 2021-03-05

Is it possible to filter out the project_id for which the name does not include any form of (FTTH) in the first row of the same id and at the last row of the same id. For example the first project_id 755 in the first row doesn’t have any form of (FTTH) included in it and also the last row of the same project_id doesn’t have the text (fttH), but for the project_id 584 it is a different scenario. Is it possible to filter out the project_id which doesn’t include FTTH in any form in the first and last row of the same project_id ?

Expecting the result to be like,

project_id
755
134
Asked By: Vicky

||

Answers:

By looping over the unique id, and each time get the all name by specific id, after keep only first and last name, convert it to str and check there is any word ‘FTTH’

Code:

import numpy as np
[i for i in set(df.id.values) if 'FTTH' not in str(np.array(df[df['id']==i]['NM'])[[0,-1]])]



#[755, 134]
Answered By: R. Baraiya

Using pandas:

def custom_function(series, pattern='FTTH'):
    """Identify if the first and last items do not have a pattern"""
    first = pattern not in series.iat[0].upper()
    last = pattern not in series.iat[-1].upper()
    return first and last

df.groupby('project_id').Name.apply(custom_function)

Output:

project_id
134     True
584    False
755     True
Name: Name, dtype: bool

a different approuch with pandas:

res = df.groupby('project_id').apply(lambda x: ~x.Name.take([0,-1]).str
                                     .contains('ftth',case=False).any())

res[res].reset_index().drop(0,axis=1)

>>>
'''
    project_id
0   134
1   755
Answered By: SergFSM