Calculating the percentage change by year for a particular column
Question:
Suppose I have this dataframe
import pandas as pd

data = {'Year': ['2010', '2011', '2012'],
        'Total_Population': [1000, 1200, 1600]}
df_sample = pd.DataFrame(data=data)
df_sample.head()
I want to calculate the year-over-year percentage change in Total_Population. I tried following this here, which boiled down to this code:
cols1 = ['Year', 'Total_Population']
vals = df[cols1].merge(df.assign(year=df['Year']+1), on=cols1, how='left')['values']
df['change_in_population'] = df['values'].sub(vals).div(vals).mul(100).fillna(0)
The actual .csv file is about 215.2 MB, and when I run the code above it exhausts my memory, even though I have 64 GB of RAM. Is there a better way of doing this?
Answers:
Did you try pct_change?
df_sample['% change'] = df_sample['Total_Population'].pct_change() * 100
print(df_sample)
Output:
   Year  Total_Population   % change
0  2010              1000        NaN
1  2011              1200  20.000000
2  2012              1600  33.333333
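To make clear what pct_change is doing here, the following is a minimal sketch that compares it with the explicit formula (current - previous) / previous built from shift. It assumes the rows should be ordered by Year first; the sort step and the name manual are illustrative additions, not part of the original answer.

```python
import pandas as pd

data = {'Year': ['2010', '2011', '2012'],
        'Total_Population': [1000, 1200, 1600]}
df_sample = pd.DataFrame(data=data)

# Assumption: rows may not arrive in year order, so sort before comparing
# each row with the one before it.
df_sample = df_sample.sort_values('Year').reset_index(drop=True)

pop = df_sample['Total_Population']

# Explicit version: (current - previous) / previous, as a percentage.
manual = (pop - pop.shift(1)) / pop.shift(1) * 100

# Built-in version: pct_change does the same row-over-row comparison.
df_sample['% change'] = pop.pct_change() * 100

print(df_sample)
```

Both versions work on a single column and only look at the previous row, so no merge is involved. The first row stays NaN because there is no earlier year to compare against; chaining .fillna(0) would turn it into 0, as in the code from the question.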