Merge with one column and also with another if value between other values in pandas

Question:

I have the following dataframes

import pandas as pd
foo1 = pd.DataFrame({'id':[1,1,2,2],
              'phase':['Pre','Post','Pre','Post'],
              'date_start': ['2022-07-24', '2022-12-25', '2022-09-30', '2022-12-25'],
               'date_end': ['2022-07-30', '2023-03-07', '2022-10-05', '2023-03-04']})

foo2 = pd.DataFrame({'id': [1,1,1,1,
                     2,2,2,2],
              'date': ['2022-07-24', '2022-07-25', '2022-12-26', '2023-01-01',
                      '2022-10-04', '2022-11-25', '2022-12-26', '2023-03-01']})

print(foo1, 'n' ,foo2)

   id phase  date_start    date_end
0   1   Pre  2022-07-24  2022-07-30
1   1  Post  2022-12-25  2023-03-07
2   2   Pre  2022-09-30  2022-10-05
3   2  Post  2022-12-25  2023-03-04 
    id        date
0   1  2022-07-24
1   1  2022-07-25
2   1  2022-12-26
3   1  2023-01-01
4   2  2022-10-04
5   2  2022-11-25
6   2  2022-12-26
7   2  2023-03-01

I would like to get the phase column in foo2 by merging on id and if date is between date_start and date_end. If date is not within the range [date_start,date_end] then the phase column should have NaN

The resulting dataframe should look like this:

 id        date phase
0   1  2022-07-24   Pre
1   1  2022-07-25   Pre
2   1  2022-12-26  Post
3   1  2023-01-01  Post
4   2  2022-10-04   Pre
5   2  2022-11-25   NaN
6   2  2022-12-26  Post
7   2  2023-03-01  Post

How could I do that ?

I found this but it does not include "merging as well with id"

Asked By: quant

||

Answers:

Use inner join by id and compare columns by Series.between with DataFrame.loc, last add missing pairs id, date by left join:

df1 = foo2.merge(foo1, on='id')
df2 = df1.loc[df1['date'].between(df1['date_start'],df1['date_end']), ['id','date','phase']]
df = foo2.merge(df2, how='left')

print (df)
   id        date phase
0   1  2022-07-24   Pre
1   1  2022-07-25   Pre
2   1  2022-12-26  Post
3   1  2023-01-01  Post
4   2  2022-10-04   Pre
5   2  2022-11-25   NaN
6   2  2022-12-26  Post
7   2  2023-03-01  Post

If possible multiple rows match in df2 is possible filter only first rows:

df = foo2.merge(df2.drop_duplicates(['id','date']), how='left')
Answered By: jezrael
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.