python pandas complex merging of two dataframes

Question:

I have an interval (say, from 0 to 45) that is split into sub-intervals wherever a value changes. The problem is that I have two values (Value1 and Value2), each with its own set of split points. I want to combine them: split the interval at every point where either value changes, and give each resulting sub-interval both values (see the examples).

I have two pandas dataframes as follows:

From1 To1 Value1
0 3 1.
3 15 2.
15 30 1.
30 45 3.

From2 To2 Value2
0 5 b)
5 11 a)
11 30 c)
30 45 a)
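
For reproducibility, the two frames above could be built roughly as follows. This is a minimal sketch: the names df1 and df2 are assumptions, Value1 is assumed to be numeric (reading "1." as the float 1.0), and Value2 is assumed to be a plain string.

```python
import pandas as pd

# Assumed reconstruction of the first frame: intervals labelled by Value1
df1 = pd.DataFrame({
    "From1": [0, 3, 15, 30],
    "To1": [3, 15, 30, 45],
    "Value1": [1.0, 2.0, 1.0, 3.0],
})

# Assumed reconstruction of the second frame: intervals labelled by Value2
df2 = pd.DataFrame({
    "From2": [0, 5, 11, 30],
    "To2": [5, 11, 30, 45],
    "Value2": ["b)", "a)", "c)", "a)"],
})
```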

I would like to join them to get something like this:

From To Value1 Value2
0 3 1. b)
3 5 2. b)
5 11 2. a)
11 15 2. c)
15 30 1. c)
30 45 3. a)

I tried collecting all the values from the From1 and From2 columns to create a single From column, but I don’t know how to continue.

Asked By: Tajda Koščak


Answers:

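You can create one row per unit step (here assuming integer bounds and a step of 1), then use a double groupby.agg: the first groupby combines the two frames on each unit interval, the second collapses consecutive rows that share the same values: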

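import pandas as pd

def reindex_int(df):
    # repeat each row once per unit step in its [From, To) interval
    tmp = df.loc[df.index.repeat(df['To'].sub(df['From']))].copy()
    # position of each repeated row within its original interval
    s = tmp.groupby(level=0).cumcount()

    # turn every repeated row into a unit interval [From, From+1)
    tmp['From'] += s
    tmp['To'] = tmp['From']+1

    return tmp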

out = (pd.concat([reindex_int(df1.rename(columns={'From1': 'From', 'To1': 'To'})),
                  reindex_int(df2.rename(columns={'From2': 'From', 'To2': 'To'}))])
         # one row per unit interval, holding both Value1 and Value2
         .groupby(['From', 'To'], as_index=False).first()
         # start a new group whenever Value1 or Value2 changes
         .pipe(lambda d: d.groupby(d[['Value1', 'Value2']]
                                   .ne(d[['Value1', 'Value2']].shift())
                                   .any(axis=1).cumsum())
                           .agg({'From': 'min', 'To': 'max',
                                 'Value1': 'first', 'Value2': 'first'})
              )
      )

Output:

   From  To  Value1 Value2
1     0   3     1.0     b)
2     3   5     2.0     b)
3     5  11     2.0     a)
4    11  15     2.0     c)
5    15  30     1.0     c)
6    30  45     3.0     a)

Intermediate:

reindex_int(df1.rename(columns={'From1': 'From', 'To1': 'To'}))

   From  To  Value1
0     0   1       1
0     1   2       1
0     2   3       1
1     3   4       2
1     4   5       2
1     5   6       2
1     6   7       2
1     7   8       2
1     8   9       2
1     9  10       2
1    10  11       2
1    11  12       2
1    12  13       2
1    13  14       2
1    14  15       2
2    15  16       1
2    16  17       1
2    17  18       1
2    18  19       1
2    19  20       1
2    20  21       1
2    21  22       1
2    22  23       1
2    23  24       1
2    24  25       1
2    25  26       1
2    26  27       1
2    27  28       1
2    28  29       1
2    29  30       1
3    30  31       3
3    31  32       3
3    32  33       3
3    33  34       3
3    34  35       3
3    35  36       3
3    36  37       3
3    37  38       3
3    38  39       3
3    39  40       3
3    40  41       3
3    41  42       3
3    42  43       3
3    43  44       3
3    44  45       3
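
If the bounds are not integers, or the ranges are too long to expand row by row, a different technique may be worth considering: work on the breakpoints directly. The sketch below is not the method above but a swapped-in alternative based on pd.merge_asof. It assumes both frames cover the same range without gaps, are sorted by their start column, and use the same dtype for their bounds. The frame construction and the names edges and out are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Assumed sample data matching the question
df1 = pd.DataFrame({"From1": [0, 3, 15, 30], "To1": [3, 15, 30, 45],
                    "Value1": [1.0, 2.0, 1.0, 3.0]})
df2 = pd.DataFrame({"From2": [0, 5, 11, 30], "To2": [5, 11, 30, 45],
                    "Value2": ["b)", "a)", "c)", "a)"]})

# Every distinct breakpoint from either frame, sorted
edges = np.unique(np.concatenate([df1["From1"], df1["To1"],
                                  df2["From2"], df2["To2"]]))

# Consecutive breakpoints form the new intervals
out = pd.DataFrame({"From": edges[:-1], "To": edges[1:]})

# For each new interval, take the value of the source interval whose
# start is the latest one at or before the new interval's start
out = pd.merge_asof(out, df1.rename(columns={"From1": "From"})[["From", "Value1"]],
                    on="From")
out = pd.merge_asof(out, df2.rename(columns={"From2": "From"})[["From", "Value2"]],
                    on="From")
```

Because only the breakpoints are touched, the cost depends on the number of intervals rather than on their length. If a frame had gaps, this sketch would silently carry the previous value across them, so it would need an extra check in that case.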
Answered By: mozway

Here is an alternative way:

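# expand each interval into one row per integer t (both endpoints included)
ndf = (pd.merge(df1.assign(t = [range(s,e+1) for s,e in zip(df1['From1'],df1['To1'])]).explode('t'),
                df2.assign(t = [range(s,e+1) for s,e in zip(df2['From2'],df2['To2'])]).explode('t')))

# first and last t of each (Value1, Value2) pair; spurious pairs created where both
# frames change at the same point share an identical (From, To) and are dropped by keep=False
ndf = (ndf.groupby(['Value1','Value2'],sort=False)
          .agg(From = ('t','first'),To = ('t','last'))
          .drop_duplicates(keep=False)
          .reset_index())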

Output:

   Value1 Value2 From  To
0     1.0     b)    0   3
1     2.0     b)    3   5
2     2.0     a)    5  11
3     2.0     c)   11  15
4     1.0     c)   15  30
5     3.0     a)   30  45
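
A possible variation on the same explode-and-merge idea is to use half-open ranges, so that each integer t belongs to exactly one interval per frame and no spurious pairs have to be dropped afterwards. This is only a sketch under assumptions: integer bounds, the sample data from the question, and each (Value1, Value2) pair occurring in one contiguous stretch (a pair that recurs later would be merged into a single row). The helper explode_units and the names merged and res are hypothetical.

```python
import pandas as pd

# Assumed sample data matching the question
df1 = pd.DataFrame({"From1": [0, 3, 15, 30], "To1": [3, 15, 30, 45],
                    "Value1": [1.0, 2.0, 1.0, 3.0]})
df2 = pd.DataFrame({"From2": [0, 5, 11, 30], "To2": [5, 11, 30, 45],
                    "Value2": ["b)", "a)", "c)", "a)"]})

def explode_units(df, start, stop):
    """Hypothetical helper: one row per integer t in the half-open range [start, stop)."""
    out = (df.assign(t=[list(range(s, e)) for s, e in zip(df[start], df[stop])])
             .explode("t")
             .drop(columns=[start, stop]))
    out["t"] = out["t"].astype(int)  # explode leaves an object column
    return out

# Each t now carries exactly one Value1 and one Value2
merged = (explode_units(df1, "From1", "To1")
          .merge(explode_units(df2, "From2", "To2"), on="t")
          .sort_values("t"))

res = (merged.groupby(["Value1", "Value2"], sort=False)
             .agg(From=("t", "first"), To=("t", "last"))
             .reset_index())
res["To"] += 1  # convert the last covered unit back to an exclusive end
```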
Answered By: rhug123