How can I calculate sequences of contact events?

Question:

I have a dataset that represents contact events between tumors. The dataset is grouped by "base-tumor" and then sorted on "neighbor-tumor" and "timeframe". It looks like this:

index base-tumor neighbor-tumor timeframe
0 Track_1 Track_4 1
1 Track_1 Track_4 2
2 Track_1 Track_4 3
3 Track_1 Track_4 4
4 Track_1 Track_4 8
5 Track_1 Track_4 9
6 Track_1 Track_4 10
7 Track_1 Track_6 1
8 Track_1 Track_6 2

Because the dataframe is grouped on base-tumor, I get one sub-dataframe per base-tumor, in ascending order.

The end result I’m after is a nested dictionary: each track maps to a dictionary of the tracks it has contact events with, and each of those maps to a list of [start, end] frame ranges, one per unbroken sequence of contact events. It should look like this:

{Track_1: {Track_4: [[1, 4], [8, 10]],
           Track_6: [[1, 2]]},
 Track_2: {Track_5: [[10, 14], [20, 25], [28, 31]]}}

So far I have added an extra column that is 1 if a row continues a sequence of contact events and 0 if it does not.

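import numpy as np

def get_sequence(grouped):
    # `grouped` is a groupby object, so iterating yields (key, sub-frame) pairs.
    for key, grp in grouped:
        grp = grp.copy()  # work on a copy so the assignment below is safe
        prev_id = grp['id_2'].shift(1).fillna(0)
        prev_frame = grp['FRAME'].shift(1)

        # True where the row has the same neighbor as the previous row
        # and its frame is exactly one later.
        conditions = [
            (grp['id_2'] == prev_id) &
            (grp['FRAME'] - prev_frame == 1)
        ]
        choices = [1]

        grp['sequence'] = np.select(conditions, choices, default=0)
        print(grp)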

Now I’m stuck and don’t know if I’m going in the right direction and if so, how to take the next step.

Asked By: dennis vlaar


Answers:

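Here is one way: label each run of consecutive timeframes, collapse each run to its first and last frame, then build the nested dictionary.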

# Identify continuous timeframes.
df['consec'] = df.groupby(['base-tumor', 'neighbor-tumor'])['timeframe'].transform(lambda s: s.diff().ne(1).cumsum())

# Get timeframe intervals.
t_df = (df.groupby(['base-tumor', 'neighbor-tumor', 'consec']).
        agg(t_start=('timeframe', 'first'), t_end=('timeframe', 'last')).
        droplevel(-1)
       )
# Drop contacts that last only a single frame (start == end).
t_df = t_df[t_df['t_start'].ne(t_df['t_end'])]
t_df['interval'] = list(zip(t_df['t_start'], t_df['t_end']))

# Convert to dictionary.
result = {k: g.droplevel(0)['interval'].groupby(level=0).agg(list).to_dict()
          for k, g in t_df.drop(columns=['t_start', 't_end']).groupby(level=0)}

print(result)
# Output:
# {'Track_1': {'Track_4': [(1, 4), (8, 10)], 'Track_6': [(1, 2)]}}
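
To see why `s.diff().ne(1).cumsum()` gives each run of consecutive frames its own label, here is a minimal sketch on the sample rows from the question. The helper name `label_runs` is made up for illustration, and the DataFrame is rebuilt by hand, assuming the column names used in the code above.

```python
import pandas as pd

# Sample rows copied from the question (column names are an assumption,
# chosen to match the groupby calls in the answer).
df = pd.DataFrame({
    'base-tumor': ['Track_1'] * 9,
    'neighbor-tumor': ['Track_4'] * 7 + ['Track_6'] * 2,
    'timeframe': [1, 2, 3, 4, 8, 9, 10, 1, 2],
})


def label_runs(frames: pd.Series) -> pd.Series:
    """Give each run of consecutive frame numbers its own integer label.

    Hypothetical helper; it is the same expression as the lambda above.
    """
    # diff() is NaN on the first row and larger than 1 after a gap.
    # Both are "not equal to 1", so ne(1) flags the first row of every run.
    starts_new_run = frames.diff().ne(1)
    # The cumulative sum of those flags increases by one at each new run.
    return starts_new_run.cumsum()


df['consec'] = (df.groupby(['base-tumor', 'neighbor-tumor'])['timeframe']
                  .transform(label_runs))

print(df['consec'].tolist())
# [1, 1, 1, 1, 2, 2, 2, 1, 1]
```

Frames 1 to 4 get label 1, frames 8 to 10 get label 2 because of the jump from 4 to 8, and the Track_6 rows start again at 1 since the labels are computed separately for each (base, neighbor) pair.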
Answered By: user2246849