Convert an experience string into a number of years using Python
Question:
I have a column with data like
9 years 3 months 22 days. How can I convert it into
=9+(3/12)+(22/365)
=9.31
I want output as 9.31.
How can I do the same in Python? Can anyone help me out?
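Worked through with the question's own assumptions (12 months and 365 days to a year, leap years ignored), the arithmetic is:

```latex
9 + \frac{3}{12} + \frac{22}{365} = 9 + 0.25 + 0.060274\ldots \approx 9.310274 \approx 9.31
```

so 9.31 is the exact result rounded to two decimal places.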
Answers:
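If leap years are not important, extract the numbers with Series.str.extract,
divide months by 12 and days by 365, and sum the parts into a new column; strings that are already plain numbers (like 3.6) are filled in with pd.to_numeric: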
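import pandas as pd

df = pd.DataFrame({'col': ['9 years 3 months 22 days',
                           '3 YEARS 6 MONTHS',
                           '4 YEARS',
                           '3 YEARS',
                           '3.6']})

# values that are already plain numbers, e.g. '3.6'; everything else becomes NaN
numeric = pd.to_numeric(df['col'], errors='coerce')

# (?i) makes each pattern case-insensitive, so 'years' and 'YEARS' both match
y = df['col'].str.extract(r'(?i)(\d+)\s*year', expand=False).astype(float)
m = df['col'].str.extract(r'(?i)(\d+)\s*month', expand=False).astype(float).div(12)
d = df['col'].str.extract(r'(?i)(\d+)\s*day', expand=False).astype(float).div(365)

# sum the parts (a missing part counts as 0), then fall back to the plain numbers
df['float'] = y.add(m, fill_value=0).add(d, fill_value=0).fillna(numeric)
print(df)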
                        col     float
0  9 years 3 months 22 days  9.310274
1          3 YEARS 6 MONTHS  3.500000
2                   4 YEARS  4.000000
3                   3 YEARS  3.000000
4                       3.6  3.600000
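Outside a DataFrame, the same calculation for a single string can be sketched with the standard re module. years_from_text and UNIT_DIVISORS are hypothetical names for illustration; like the answer above, the sketch assumes 12 months and 365 days per year, returns plain numeric strings such as 3.6 unchanged, and returns NaN when no number/unit pair is found.

```python
import re

# Assumption, as in the answer above: 12 months and 365 days per year (leap years ignored).
UNIT_DIVISORS = {'year': 1, 'month': 12, 'day': 365}


def years_from_text(text):
    """Hypothetical helper: convert '9 years 3 months 22 days' into a float number of years."""
    text = text.strip()
    try:
        return float(text)  # plain numbers such as '3.6' are already in years
    except ValueError:
        pass
    # each match is a (number, unit) tuple, e.g. ('9', 'year'); 's?' accepts plural units
    matches = re.findall(r'(\d+(?:\.\d+)?)\s*(year|month|day)s?', text, flags=re.IGNORECASE)
    if not matches:
        return float('nan')  # nothing recognizable in the string
    return sum(float(number) / UNIT_DIVISORS[unit.lower()] for number, unit in matches)


print(round(years_from_text('9 years 3 months 22 days'), 2))  # → 9.31
print(years_from_text('3 YEARS 6 MONTHS'))  # → 3.5
```

Because the helper works on one string at a time, it can also be applied to a column without missing values with df['col'].apply(years_from_text).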
If you want to update the same column in place, you can loop over the rows:
import pandas as pd

df = pd.DataFrame({'time': ['9 years 3 months 22 days',
                            '9 years 3 months 21 days',
                            '9 years 3 months 20 days']})

for idx in df.index:
    # skip missing values
    if pd.isna(df.loc[idx, 'time']):
        continue
    # Extract values from string
    val = df.loc[idx, 'time'].lower()
    if ' ' in val:
        val = val.split()
        # pair each number with the unit after it, e.g. {'years': 9, 'months': 3, 'days': 22}
        t = {val[i + 1]: int(val[i]) for i in range(0, len(val), 2)}
        # calculate float value rounded to two decimal places
        float_val = round(t.get("years", 0) + t.get("months", 0) / 12 + t.get("days", 0) / 365, 2)
        # update table in place via .loc
        df.loc[idx, 'time'] = float_val
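The same in-place update can also be sketched without the index loop by applying a small function to the column with Series.apply. to_years and DIVISORS are hypothetical names; the sketch assumes every entry is either a plain number or alternating number/unit words (year(s), month(s), day(s)), and any other unit word raises a KeyError.

```python
import pandas as pd

# Assumption: 12 months and 365 days per year (leap years ignored).
DIVISORS = {'year': 1, 'years': 1, 'month': 12, 'months': 12, 'day': 365, 'days': 365}


def to_years(value):
    """Hypothetical helper: convert one cell to years, rounded to two decimals."""
    if pd.isna(value):
        return value  # leave missing values untouched
    parts = str(value).lower().split()
    if len(parts) == 1:
        return float(parts[0])  # already a plain number such as '3.6'
    # pair each number with the unit word that follows it: ('9', 'years'), ('3', 'months'), ...
    total = sum(float(num) / DIVISORS[unit] for num, unit in zip(parts[::2], parts[1::2]))
    return round(total, 2)


df = pd.DataFrame({'time': ['9 years 3 months 22 days',
                            '9 years 3 months 21 days',
                            '3.6',
                            None]})
# overwrite the same column in one pass instead of looping over the index
df['time'] = df['time'].apply(to_years)
print(df)
```

Series.apply still calls the function once per row, so this mainly improves readability; the str.extract approach above remains the vectorized option.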