How to find the ids whose names include a specific word using Python
Question:
I have an Excel file containing three columns, as shown below:
project_id | Name | Date |
---|---|---|
755 | Waber | 2019-04-08 |
755 | Wabern | 2020-06-16 |
755 | Wabern (FTTH geplan) | 2020-07-24 |
755 | Wabern FTTH DTAG | 2020-08-15 |
755 | Wabern (FTTH DTAG gg) | 2021-03-05 |
755 | Wabern | 2021-09-13 |
134 | Lerbeck | 2019-04-18 |
134 | BAd oyehausen (FttH svd) | 2020-06-26 |
134 | Werre Park | 2020-07-14 |
134 | Werre Park (FTTH ssd) | 2020-08-25 |
134 | Werre Park (FTTH) | 2021-03-15 |
134 | Bad oyehausen | 2021-09-23 |
584 | kitern | 2019-04-08 |
584 | Lausen (ftth los) | 2020-06-16 |
584 | kitener (FTTH geplan) | 2020-07-24 |
584 | Lausern | 2020-08-15 |
584 | Lausern (FTTH DTAG gg) | 2021-03-05 |
Is it possible to filter out the project_ids whose Name contains no form of FTTH (in any letter case) in both the first row and the last row of that project_id? For example, for project_id 755 neither the first row ("Waber") nor the last row ("Wabern") contains FTTH in any form, whereas for project_id 584 the last row ("Lausern (FTTH DTAG gg)") does. So I want to select every project_id whose first and last rows do not include FTTH in any form.
The expected result is:
project_id |
---|
755 |
134 |
Answers:
Loop over the unique ids; for each id, get all of its names, keep only the first and last ones, convert them to a string, and check whether the word 'FTTH' appears in it.
Code:
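import numpy as np
# Keep only the first and last Name of each project_id and look for FTTH in any letter case
[i for i in df['project_id'].unique() if 'FTTH' not in str(np.array(df[df['project_id']==i]['Name'])[[0,-1]]).upper()]
#[755, 134]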
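For comparison, here is a vectorized sketch that avoids the Python-level loop. It assumes the Excel data has been loaded into a DataFrame with the columns project_id, Name and Date (built inline from the sample here; `projects.xlsx` in the comment is a hypothetical file name) and that "first" and "last" mean earliest and latest by Date. It takes the first and last Name of each group with `groupby().agg(['first', 'last'])` and checks both case-insensitively with `str.contains(..., case=False)`:

```python
import pandas as pd

# Sample data from the question; with the real file this would be something like
# df = pd.read_excel('projects.xlsx')  # hypothetical file name
df = pd.DataFrame({
    'project_id': [755] * 6 + [134] * 6 + [584] * 5,
    'Name': [
        'Waber', 'Wabern', 'Wabern (FTTH geplan)', 'Wabern FTTH DTAG',
        'Wabern (FTTH DTAG gg)', 'Wabern',
        'Lerbeck', 'BAd oyehausen (FttH svd)', 'Werre Park',
        'Werre Park (FTTH ssd)', 'Werre Park (FTTH)', 'Bad oyehausen',
        'kitern', 'Lausen (ftth los)', 'kitener (FTTH geplan)', 'Lausern',
        'Lausern (FTTH DTAG gg)',
    ],
    'Date': pd.to_datetime([
        '2019-04-08', '2020-06-16', '2020-07-24', '2020-08-15', '2021-03-05', '2021-09-13',
        '2019-04-18', '2020-06-26', '2020-07-14', '2020-08-25', '2021-03-15', '2021-09-23',
        '2019-04-08', '2020-06-16', '2020-07-24', '2020-08-15', '2021-03-05',
    ]),
})

# Assumption: "first" and "last" mean earliest and latest Date within each project_id.
# A stable sort keeps the original row order for equal dates.
ordered = df.sort_values('Date', kind='stable')

# First and last Name per project_id; sort=False keeps ids in order of first appearance.
ends = ordered.groupby('project_id', sort=False)['Name'].agg(['first', 'last'])

# True where either end mentions FTTH in any letter case.
has_ftth = (ends['first'].str.contains('ftth', case=False)
            | ends['last'].str.contains('ftth', case=False))

# Keep the ids where neither end mentions FTTH.
result = has_ftth[~has_ftth].index.tolist()
print(result)  # [755, 134]
```

If the rows are already in chronological order, the `sort_values` step can be dropped.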
Using pandas:
def custom_function(series, pattern='FTTH'):
    """Identify if the first and last items do not have a pattern"""
    # upper() makes the check case-insensitive (FTTH, FttH, ftth, ...)
    first = pattern not in series.iat[0].upper()
    last = pattern not in series.iat[-1].upper()
    return first and last

df.groupby('project_id').Name.apply(custom_function)
Output:
project_id
134 True
584 False
755 True
Name: Name, dtype: bool
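The Series above only holds True/False per project_id. A minimal sketch of turning it into the list of ids, assuming the same column names as in the question (the data is rebuilt inline without the Date column so the snippet runs on its own, and `custom_function` is copied from above):

```python
import pandas as pd

# Sample data from the question (Date is not needed for this check).
df = pd.DataFrame({
    'project_id': [755] * 6 + [134] * 6 + [584] * 5,
    'Name': [
        'Waber', 'Wabern', 'Wabern (FTTH geplan)', 'Wabern FTTH DTAG',
        'Wabern (FTTH DTAG gg)', 'Wabern',
        'Lerbeck', 'BAd oyehausen (FttH svd)', 'Werre Park',
        'Werre Park (FTTH ssd)', 'Werre Park (FTTH)', 'Bad oyehausen',
        'kitern', 'Lausen (ftth los)', 'kitener (FTTH geplan)', 'Lausern',
        'Lausern (FTTH DTAG gg)',
    ],
})

def custom_function(series, pattern='FTTH'):
    """Identify if the first and last items do not have a pattern"""
    first = pattern not in series.iat[0].upper()
    last = pattern not in series.iat[-1].upper()
    return first and last

mask = df.groupby('project_id').Name.apply(custom_function)

# Boolean indexing keeps only the True entries; their index labels are the ids.
result = mask[mask].index.tolist()
print(result)  # [134, 755]
```

`groupby` sorts the keys by default, which is why 134 comes before 755 here.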
A different approach with pandas:
res = df.groupby('project_id')['Name'].apply(
    lambda s: not s.take([0, -1]).str.contains('ftth', case=False).any())
res[res].reset_index().drop(columns='Name')
Output:
project_id
0 134
1 755
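Another sketch of the same idea, assuming the rows are already in the intended order within each project_id: `drop_duplicates` with `keep='first'` and `keep='last'` picks the end rows directly, and the FTTH flags are then combined per id with `groupby().any()`. The inline data is the sample from the question (without Date):

```python
import pandas as pd

# Sample data from the question (Date is not needed for this check).
df = pd.DataFrame({
    'project_id': [755] * 6 + [134] * 6 + [584] * 5,
    'Name': [
        'Waber', 'Wabern', 'Wabern (FTTH geplan)', 'Wabern FTTH DTAG',
        'Wabern (FTTH DTAG gg)', 'Wabern',
        'Lerbeck', 'BAd oyehausen (FttH svd)', 'Werre Park',
        'Werre Park (FTTH ssd)', 'Werre Park (FTTH)', 'Bad oyehausen',
        'kitern', 'Lausen (ftth los)', 'kitener (FTTH geplan)', 'Lausern',
        'Lausern (FTTH DTAG gg)',
    ],
})

# First and last row of every project_id, in the existing row order.
first_rows = df.drop_duplicates('project_id', keep='first')
last_rows = df.drop_duplicates('project_id', keep='last')
ends = pd.concat([first_rows, last_rows])

# Flag end rows that mention FTTH in any letter case, then combine the flags per id.
has_ftth = (ends['Name'].str.contains('ftth', case=False)
            .groupby(ends['project_id'], sort=False).any())

# Keep the ids where neither end row mentions FTTH.
result = has_ftth[~has_ftth].index.tolist()
print(result)  # [755, 134]
```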