Insert values from a column in a DataFrame to another based on column values

Question:

I am wondering how I can add one column from a dataframe, to another based on the dates in both columns. I tried to simplify this as much as I can.

I have two dataframes one is filled with event types and the cost done for the event, the other being the observation from a nearby station giving how much rainfall occurred as well. But

Here is what the dataframes looks like:

d1 = {'ObservationDate': ['2021-09-11', '2021-06-24', '2020-09-23', '2015-10-09'],
        'EventType': ['Flood', 'Flood', 'Wind', 'Hail'],
      'Closures': ['Schools', 'Schools', 'None', 'Offices'],
      'Cost': [1000, 4000, 100, 8000]}
events = pd.DataFrame(data = d1)

display(events)

    ObservationDate EventType   Closures    Cost
0   2021-09-11       Flood      Schools     1000
1   2021-06-24       Flood      Schools     4000
2   2020-09-23       Wind       None         100
3   2015-10-09       Hail       Offices     8000

And,

d2 = {'ObservationDate': ['2021-09-11', '2021-06-24', '2015-10-09', '2018-07-16'], 
      'Rainfall': [45, 90, 32, 22]}
observation = pd.DataFrame(data = d2)

display(observation)

    ObservationDate Rainfall
0   2021-09-11         45
1   2021-06-24         90
2   2015-10-09         32
3   2018-07-16         22

What I then did was filtered the observation DataFrame to get only the relevant event dates (in my actual code I do have the dates converted to the Pandas DateTime type).

filtered = observation[observation.ObservationDate.dt.strftime('%y%m%d').isin(events.ObservationDate.dt.strftime('%y%m%d'))]

What I am trying to do is take the Dates in the filtered dataframe and then add the "Event Type" column from the Event Dataframe, but only for those specific dates.

The ideal result would look like:

    ObservationDate Rainfall    EventType
0   2021-09-11        45          Flood
1   2021-06-24        90          Flood
2   2015-10-09        32          Hail

Things I have tried but I can’t seem to get it to work right is merging them, also trying to take the index of where the "isin" is true, as well as a few other mindless attempts.

I intuitively know how I would go about this, I would honestly just loop through each of the rows then find where the dates are equal to one another, then assign that value to the column. But I am trying to find a more "Python" way of doing this.

Any help is appreciated.

For further context, when I tried Merge on the full dataframe I have like was suggested by Clegange. The dataframe went from having 467,132 entries (taking up 110.5+ MB of memory) to having 2,317,895 entries (583.6+ MB). It seemed like when I was using merge, it was duplicating entries multiple times and not giving expected result on the full dataframe I have.

Asked By: Robert

||

Answers:

use merge

observation.merge(events.iloc[:, :2])
Answered By: Panda Kim

Following Clegane’s proposal, using merge() in this case would require you to previously modify slightly the events df:

output = observation.merge(events[['ObservationDate','EventType']],how='left',on='ObservationDate').dropna()

Returning:

  ObservationDate  Rainfall EventType
0      2021-09-11        45     Flood
1      2021-06-24        90     Flood
2      2015-10-09        32      Hail

Besides using merge() as Clegane proposed, you can also consider using map():

observation['EventType'] = observation['ObservationDate'].map(events[['ObservationDate','EventType']].set_index('ObservationDate').to_dict()['EventType'])

Returning:

  ObservationDate  Rainfall EventType
0      2021-09-11        45     Flood
1      2021-06-24        90     Flood
2      2015-10-09        32      Hail
3      2018-07-16        22       NaN

You can then simply drop the Null row with the NaN value

Answered By: Celius Stingher
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.