How to make a year range column into separate columns?
Question:
I am using Python to analyze a data set that has a column with a year range (see below for example):
Name | Years Range |
---|---|
Andy | 1985 – 1987 |
Bruce | 2011 – 2018 |
I am trying to split the "Years Range" column, which holds the start and end years as a single string, into two separate columns in the data frame: "Year Start" and "Year End".
Name | Years Range | Year Start | Year End |
---|---|---|---|
Andy | 1985 – 1987 | 1985 | 1987 |
Bruce | 2011 – 2018 | 2011 | 2018 |
Answers:
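You can pass expand=True to str.split. The sample data separates the years with an en dash surrounded by spaces, so split on a pattern that accepts either kind of dash rather than on '-' alone:
df[['Year Start', 'Year End']] = df['Years Range'].str.split(r'\s*[-–]\s*', expand=True)
Output:
    Name  Years Range Year Start Year End
0   Andy  1985 – 1987       1985     1987
1  Bruce  2011 – 2018       2011     2018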
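Note that str.split returns strings, so the new columns hold text rather than numbers. If the years are needed as integers, a minimal sketch along these lines may help. It assumes every row contains exactly two years separated by a hyphen or an en dash, and it rebuilds the question's sample frame for illustration:

```python
import pandas as pd

# Sample frame rebuilt from the question's table (illustrative data only).
df = pd.DataFrame({
    "Name": ["Andy", "Bruce"],
    "Years Range": ["1985 – 1987", "2011 – 2018"],
})

# Split on a hyphen or en dash with optional surrounding whitespace.
# A multi-character pattern is treated as a regular expression by str.split.
parts = df["Years Range"].str.split(r"\s*[-–]\s*", expand=True)

# Convert both pieces from strings to integers before storing them.
df[["Year Start", "Year End"]] = parts.astype(int)
```

With integer columns, arithmetic such as df["Year End"] - df["Year Start"] works directly. Rows that do not follow the assumed pattern would make astype(int) raise, so messy data needs cleaning first.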
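I think str.extract can do the job. Here is an example:
import pandas as pd
df = pd.DataFrame(["1985 - 1987"], columns=["Years Range"])
# the first run of four digits is the start year
df['Year Start'] = df['Years Range'].str.extract(r'(\d{4})', expand=False)
# the four digits after the dash (hyphen or en dash) are the end year
df['Year End'] = df['Years Range'].str.extract(r'[-–]\s*(\d{4})', expand=False)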
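Both years can also be captured in a single str.extract call by using two named groups. This is a sketch rather than a definitive solution: the group names start and end and the mixed-dash sample values are assumptions made for illustration.

```python
import pandas as pd

# Illustrative data: one row uses an en dash, the other a plain hyphen.
df = pd.DataFrame({"Years Range": ["1985 – 1987", "2011 - 2018"]})

# Two named groups; str.extract returns one column per group,
# named after the group.
pattern = r"(?P<start>\d{4})\s*[-–]\s*(?P<end>\d{4})"
years = df["Years Range"].str.extract(pattern)

df["Year Start"] = years["start"]
df["Year End"] = years["end"]
```

Rows that do not match the pattern yield NaN in both columns instead of raising an error, which makes malformed values easy to spot.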
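Alternatively, loop over the rows and split each value:
df['start'] = ''  # create a blank column named 'start'
df['end'] = ''  # create a blank column named 'end'
# loop over the data frame (assumes the default 0..n-1 index)
for i in range(len(df)):
    # treat an en dash like a hyphen, then split into the two years
    parts = df.loc[i, 'Years Range'].replace('–', '-').split('-')
    df.loc[i, 'start'] = parts[0].strip()  # first element is the start year
    df.loc[i, 'end'] = parts[1].strip()  # second element is the end year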
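If some rows might not contain a clean range, the row-by-row idea can be wrapped in a small helper that validates each value. The sketch below is one possible approach: the helper name parse_year_range and the "unknown" sample row are hypothetical, introduced only for illustration.

```python
import re

import pandas as pd

# Exactly two four-digit years separated by a hyphen or an en dash.
RANGE_RE = re.compile(r"^\s*(\d{4})\s*[-–]\s*(\d{4})\s*$")


def parse_year_range(text):
    """Return (start, end) as ints, or (None, None) if text is not a year range."""
    match = RANGE_RE.match(str(text))
    if match is None:
        return (None, None)
    return (int(match.group(1)), int(match.group(2)))


# Illustrative data, including one malformed row.
df = pd.DataFrame({"Years Range": ["1985 – 1987", "2011-2018", "unknown"]})

parsed = [parse_year_range(value) for value in df["Years Range"]]
df["start"] = [start for start, _ in parsed]
df["end"] = [end for _, end in parsed]
```

Malformed rows end up as missing values in both columns, so they can be filtered with df["start"].isna() instead of stopping the loop with an IndexError.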