Efficiently add a value to a new column in a large DataFrame

Question:

I have two DataFrames: adv_text, with about 9,000 rows, and events, with over 900,000 rows. events is essentially an expanded version of adv_text, with roughly 100 rows per row of adv_text. I want to add three columns from adv_text to events.

The following code is a test that adds just one of the columns over a subset of the rows.

events_x = events.head(30000).copy()

def add_date(game_id):
    # Look up the date for this game in adv_text (full scan per call)
    date = adv_text[adv_text['id_odsp'] == game_id]['date']
    return date.iloc[0]

events_x['date'] = events_x['id_odsp'].apply(add_date)

This test code takes almost 25 seconds for 30,000 rows. At this speed, adding all three columns over the full dataframe will take nearly 40 minutes. Is this typical? Is there a faster way to accomplish this task?

Asked By: Cuenca Guy


Answers:

IIUC, one way is to use merge:

events_x['date'] = events_x.merge(adv_text[['id_odsp', 'date']], on='id_odsp', how='left')['date']

More information: Pandas Merging 101

Answered By: Corralien
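For reference, a single left merge can bring over all three columns at once instead of assigning them one by one, and it preserves the row count and order of events even when an id is missing from adv_text. The sketch below uses tiny stand-in frames; only 'id_odsp' and 'date' appear in the question, so 'season' and 'league' are hypothetical names for the other two columns.

```python
import pandas as pd

# Toy stand-ins for adv_text (~9,000 rows) and events (~900,000 rows).
adv_text = pd.DataFrame({
    'id_odsp': ['g1', 'g2'],
    'date': ['2017-01-01', '2017-01-02'],
    'season': [2017, 2017],   # hypothetical column name
    'league': ['E0', 'E0'],   # hypothetical column name
})
events = pd.DataFrame({
    'id_odsp': ['g1', 'g1', 'g2'],
    'event_no': [1, 2, 1],
})

# One left merge adds all three columns in a single vectorized pass.
cols = ['id_odsp', 'date', 'season', 'league']
events_x = events.merge(adv_text[cols], on='id_odsp', how='left')
```

Because the join key is looked up via a hash join rather than a per-row boolean scan, this typically runs in seconds rather than minutes on 900,000 rows.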