How Do I Create New Column In Pandas Dataframe Using Two Columns Simultaneously From A Different Dataframe?

Question:

EDIT: Also, it is okay if I lose the month and day in the "Start_Date" column. In other words, it may be easier to convert the "Start_Date" values to int as well, keeping just the year, and then merge on that.

I have two pandas dataframes, df_monthly and df_pop.

df_monthly looks more or less like this, with Start_Date as datatype datetime64:

Jurisdiction Start_Date CrimeCount
AR0010000 2007-02-24 10.0
WVWSP9000 2008-06-04 15.0

df_pop is a dataframe containing jurisdictions (the "ori" column) and their populations for each year ("data_year", datatype int64), like:

data_year ori population
1970 AK0010100 44237
1970 AK0010200 13311

I want to create a new column in df_monthly called year_pop, which contains the corresponding population for that jurisdiction and year of the Start_Date value.

I tried achieving this, with "data_year" converted to datatype period[A-DEC], using the following:

# merge the two dataframes
merged_df = pd.merge(df_monthly, df_pop, left_on='Jurisdiction', right_on='ori')

# create a new column "year_pop"
merged_df['year_pop'] = merged_df.apply(lambda x: df_pop[(df_pop['ori']==x['ori']) & (df_pop['data_year']==x['Start_Date'].to_period('A-DEC'))]['population'].values[0], axis=1)

# drop unnecessary columns
merged_df.drop(['data_year', 'ori', 'population'], axis=1, inplace=True)

# assign the merged dataframe to 'df_monthly'
df_monthly = merged_df

However, this gives me an "index 0 is out of bounds" error. Is there a more straightforward way of doing this?

Asked By: TheMaffGuy
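One plausible cause of the "index 0 is out of bounds" error: the first merge matches on jurisdiction only, so whenever a jurisdiction has no df_pop row for the year of Start_Date, the boolean filter inside the lambda selects nothing, `.values` is an empty array, and `[0]` raises IndexError. A minimal reproduction, assuming int years instead of periods for simplicity and using only the two sample df_pop rows from the question:

```python
import pandas as pd

# Toy df_pop built from the two sample rows shown in the question
df_pop = pd.DataFrame({
    "data_year": [1970, 1970],
    "ori": ["AK0010100", "AK0010200"],
    "population": [44237, 13311],
})

# Looking up a jurisdiction/year pair that df_pop does not contain selects zero rows
match = df_pop[(df_pop["ori"] == "AR0010000") & (df_pop["data_year"] == 2007)]["population"].values
print(len(match))  # 0

# Indexing the empty array is what raises the error from the question
try:
    _ = match[0]
except IndexError as exc:
    message = str(exc)
    print(message)  # index 0 is out of bounds for axis 0 with size 0
```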


Answers:

IIUC, why don’t you extract the year from the Start_Date column and merge on ['Jurisdiction', year] from df_monthly and ['ori', 'data_year'] from df_pop? Something like:

df_merged = (df_monthly.assign(year=df_monthly['Start_Date'].dt.year)
                       .merge(df_pop, how='inner',
                              left_on=['Jurisdiction', 'year'],
                              right_on=['ori', 'data_year']))
Answered By: Corralien
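A self-contained sketch of the merge above, using the two df_monthly rows from the question and a hypothetical df_pop whose populations are made-up placeholders (not real data). The final step, renaming population to year_pop and dropping the helper columns, is an assumption about the desired output based on the question:

```python
import pandas as pd

# The two df_monthly rows shown in the question
df_monthly = pd.DataFrame({
    "Jurisdiction": ["AR0010000", "WVWSP9000"],
    "Start_Date": pd.to_datetime(["2007-02-24", "2008-06-04"]),
    "CrimeCount": [10.0, 15.0],
})

# Hypothetical df_pop: the population values are made-up placeholders
df_pop = pd.DataFrame({
    "data_year": [2007, 2008, 2008],
    "ori": ["AR0010000", "AR0010000", "WVWSP9000"],
    "population": [1000, 1010, 2000],
})

# Match each row on jurisdiction AND the year of Start_Date
df_merged = (df_monthly.assign(year=df_monthly["Start_Date"].dt.year)
                       .merge(df_pop, how="inner",
                              left_on=["Jurisdiction", "year"],
                              right_on=["ori", "data_year"]))

# Keep the original columns plus the population, renamed to year_pop
df_merged = (df_merged.drop(columns=["year", "ori", "data_year"])
                      .rename(columns={"population": "year_pop"}))
print(df_merged)
```

With how="inner", a df_monthly row whose jurisdiction/year pair is missing from df_pop is dropped; how="left" would keep it with NaN in year_pop.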

You could also use:

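df_monthly['data_year'] = df_monthly['Start_Date'].dt.year
# df_pop names its key column 'ori', so rename it (and 'population') to match
pop = df_pop.rename(columns={'ori': 'Jurisdiction', 'population': 'year_pop'})
# how='left' keeps every df_monthly row; 'outer' would also add unmatched df_pop rows
df_merged = pd.merge(df_monthly, pop, how='left', on=['data_year', 'Jurisdiction'])
df_merged.drop('data_year', axis=1, inplace=True)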
Answered By: user19077881
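Whichever merge is used, it can help to list the df_monthly rows that found no population, since those are the rows on which a `.values[0]` lookup like the one in the question comes up empty. A sketch of a left merge using pd.merge's `indicator=True` option, with hypothetical toy data (made-up population, and deliberately no 2008 row for WVWSP9000):

```python
import pandas as pd

df_monthly = pd.DataFrame({
    "Jurisdiction": ["AR0010000", "WVWSP9000"],
    "Start_Date": pd.to_datetime(["2007-02-24", "2008-06-04"]),
    "CrimeCount": [10.0, 15.0],
})

# Hypothetical population table (made-up value) with no 2008 row for WVWSP9000
df_pop = pd.DataFrame({
    "data_year": [2007],
    "ori": ["AR0010000"],
    "population": [1000],
})

pop = df_pop.rename(columns={"ori": "Jurisdiction", "population": "year_pop"})
checked = pd.merge(
    df_monthly.assign(data_year=df_monthly["Start_Date"].dt.year),
    pop,
    how="left",
    on=["data_year", "Jurisdiction"],
    indicator=True,  # adds a "_merge" column: "both", "left_only" or "right_only"
)

# Rows with no matching jurisdiction/year in df_pop (year_pop is NaN for these)
missing = checked[checked["_merge"] == "left_only"]
print(missing[["Jurisdiction", "data_year"]])
```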