Extract sub list from list depending on values within list

Question:

I'm trying to extract the sublists that follow one of these patterns:

| Events |
| {on, e_1 ... e_n, on} |
| {off, e_1 ... e_n, on} |
| {on, e_1 ... e_n, off} |
| {off, e_1 ... e_n, off} |

Here is what I have so far:

import pandas as pd
import numpy as np


def get_subs(df):
    updated = []
    for index, row in df.iterrows():
        values = row['t']
        new_lst = []
        sub_list = []
        is_on_or_off_set = False

        for value in values:
            if (value == 'ON' or value == 'OFF') and is_on_or_off_set == False:
                sub_list.append(value)
                is_on_or_off_set = True
            elif value.startswith('sc') and is_on_or_off_set == True:
                sub_list.append(value)
            else:
                is_on_or_off_set = False
                sub_list.append(value)
                new_lst.append(sub_list)
                sub_list = []

        updated.append(new_lst)

    return updated


array = np.array([
    ['name1', ['ON', 'sc','sc', 'ON', 'ON', 'sc', 'sc', 'ON']]
    ,
    ['name2', ['OFF', 'sc', 'sc', 'ON', 'OFF', 'sc', 'sc','OFF']]
    ,
    ['name3', ['ON', 'sc', 'sc' , 'OFF', 'ON', 'sc', 'sc', 'OFF']]
    ,
    ['name4', ['ON' , 'sc1' , 'sc2' , 'OFF' , 'ON']]
    ,
    ['name5', ['OFF' , 'ON' , 'sc' , 'OFF' , 'ON']]
    ,
    ['name6', ['OFF' , 'OFF' , 'sc1' , 'OFF' , 'ON']]
    ,
    ['name6', ['ON', 'OFF', 'OFF', 'sc1', 'sc2' , 'OFF', 'ON','ON']]
], dtype=object)  # dtype=object: the inner lists have different lengths

index_values = ['1', '2', '3', '4', '5','6','7']
column_values = ['name', 't']
df = pd.DataFrame(data=array,
                  index=index_values,
                  columns=column_values)

subs = get_subs(df)
for s in subs :
    print(s)

which prints:

[['ON', 'sc', 'sc', 'ON'], ['ON', 'sc', 'sc', 'ON']]
[['OFF', 'sc', 'sc', 'ON'], ['OFF', 'sc', 'sc', 'OFF']]
[['ON', 'sc', 'sc', 'OFF'], ['ON', 'sc', 'sc', 'OFF']]
[['ON', 'sc1', 'sc2', 'OFF']]
[['OFF', 'ON'], ['sc'], ['OFF', 'ON']]
[['OFF', 'OFF'], ['sc1'], ['OFF', 'ON']]
[['ON', 'OFF'], ['OFF', 'sc1', 'sc2', 'OFF'], ['ON', 'ON']]

There is an issue when the algorithm encounters ['name6', ['OFF', 'OFF', 'sc1', 'OFF', 'ON']]: it is transformed to [['OFF', 'OFF'], ['sc1'], ['OFF', 'ON']], whereas I expect [['OFF', 'sc1', 'OFF']].

How can I modify the code so that ['name6', ['OFF', 'OFF', 'sc1', 'OFF', 'ON']] is transformed to [['OFF', 'sc1', 'OFF']] without breaking any of the existing rules:

| Events |
| {on, e_1 ... e_n, on} |
| {off, e_1 ... e_n, on} |
| {on, e_1 ... e_n, off} |
| {off, e_1 ... e_n, off} |

Each group should have at least 1 sc event.

Asked By: blue-sky


Answers:

I looked into your approach for a bit, but I couldn’t come up with a way that fixes your bugs.

I don't know how much you simplified your problem here, but how about joining each list into a string and finding the matching patterns with a regex?

For demonstration, I picked the different possible combinations from your example, joined each list into a string, and put them in a dict:

dic = {
    'name1' : 'ON sc sc ON ON sc sc ON',
    'name2' : 'ON sc1 sc2 OFF ON',
    'name3' : 'OFF ON sc OFF ON',
    'name4' : 'ON OFF OFF sc1 sc2 OFF ON ON',
    'name5' : 'ON OFF OFF sc1 sc2 OFF ON ON OFF sc1 sc2 ON'
}
import re

pat = r"(ON|OFF)\s?(sc\d?\s?)+\s?(ON|OFF)"
for key, string in dic.items():
    m = re.finditer(pat, string)
    if m:
        res = [elem.group().split(' ') for elem in m]
        print(f"{key=}:\t{res=}")

Output:

key='name1':    res=[['ON', 'sc', 'sc', 'ON'], ['ON', 'sc', 'sc', 'ON']]
key='name2':    res=[['ON', 'sc1', 'sc2', 'OFF']]
key='name3':    res=[['ON', 'sc', 'OFF']]
key='name4':    res=[['OFF', 'sc1', 'sc2', 'OFF']]
key='name5':    res=[['OFF', 'sc1', 'sc2', 'OFF'], ['OFF', 'sc1', 'sc2', 'ON']]
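If you want to feed the lists from the question in directly instead of pre-joined strings, the join and split steps can be wrapped in a small helper. This is only a sketch: the name `extract_groups` is made up for illustration, and it assumes that tokens never contain spaces and that events look like `sc` optionally followed by a single digit.

```python
import re

# Boundary token, one or more "sc" events (optional digit), boundary token.
PAT = re.compile(r"(ON|OFF)\s?(sc\d?\s?)+\s?(ON|OFF)")


def extract_groups(tokens):
    """Join the tokens, find every matching run, and split each run back into a list.

    Hypothetical helper; assumes no token contains a space.
    """
    joined = " ".join(tokens)
    return [m.group().split(" ") for m in PAT.finditer(joined)]


# The case from the question that the original loop got wrong:
print(extract_groups(['OFF', 'OFF', 'sc1', 'OFF', 'ON']))
# [['OFF', 'sc1', 'OFF']]
print(extract_groups(['OFF', 'ON', 'sc', 'OFF', 'ON']))
# [['ON', 'sc', 'OFF']]
```

With the DataFrame from the question, something like `df['t'].apply(extract_groups)` should then give one list of groups per row.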

You can find the Regex here.
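If you would rather not go through strings at all, a single pass over the tokens can apply the same rule. The sketch below is not the regex approach, just an alternative under a few assumptions: `scan_groups` and `BOUNDARIES` are illustrative names, an event is any token starting with `sc` (as in the question's code), and a boundary that closes one group is not reused to open the next, which matches the expected output in the question.

```python
BOUNDARIES = {"ON", "OFF"}


def scan_groups(tokens):
    """Collect boundary, sc..., boundary runs containing at least one sc event."""
    groups = []
    start = None   # most recent ON/OFF that could still open a group
    events = []    # sc events seen since `start`
    for token in tokens:
        if token in BOUNDARIES:
            if start is not None and events:
                # A boundary after at least one event closes the group.
                groups.append([start, *events, token])
                start, events = None, []
            else:
                # No events yet: this boundary replaces the previous candidate.
                start = token
        elif token.startswith("sc") and start is not None:
            events.append(token)
        else:
            # Event without an opening boundary, or an unknown token: reset.
            start, events = None, []
    return groups


print(scan_groups(['OFF', 'OFF', 'sc1', 'OFF', 'ON']))
# [['OFF', 'sc1', 'OFF']]
print(scan_groups(['ON', 'OFF', 'OFF', 'sc1', 'sc2', 'OFF', 'ON', 'ON']))
# [['OFF', 'sc1', 'sc2', 'OFF']]
```

The key difference from the loop in the question is that a repeated `ON`/`OFF` with no events in between just moves the start of the candidate group forward instead of closing a group.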

Answered By: Rabinzel