How to convert total years and months to corresponding float/decimal values in pandas

Question:

How do I convert the values of Years_in_service to their corresponding decimal/float values?

For example, '5 year(s), 7 month(s), 3 day(s)' has a decimal value of 5.59

import pandas as pd 
import numpy as np

data = {'ID':['A1001','A5001','B1001','D5115','K4910'],
'Years_in_service': ['5 year(s), 7 month(s), 3 day(s)', '16 year(s), 0 month(s), 25 day(s)', 
'7 year(s), 0 month(s), 2 day(s)', '0 year(s), 11 month(s), 23 day(s)','1 year(s), 0 month(s), 6 day(s)'],
 'Age': [45, 59,21,18,35]}

df = pd.DataFrame(data)  

df 

Currently I’m able to extract only the year (see my attempt below):

df['Years_in_service'].str[:2].astype(float)

Please show your full code. Thanks in advance.

Asked By: RayX500


Answers:

How about this? After setting up the data as in the question:

import pandas as pd 
import numpy as np

data = {'ID':['A1001','A5001','B1001','D5115','K4910'],
'Years_in_service': ['5 year(s), 7 month(s), 3 day(s)', '16 year(s), 0 month(s), 25 day(s)', 
'7 year(s), 0 month(s), 2 day(s)', '0 year(s), 11 month(s), 23 day(s)','1 year(s), 0 month(s), 6 day(s)'],
 'Age': [45, 59,21,18,35]}

df = pd.DataFrame(data)  

Do this:

returnlist = []

for each in df['Years_in_service']:
    years, months, days = [float(i.strip().split(' ')[0]) for i in each.split(',')]
    returnlist.append(years + months/12 + days/365.25)
    
for each in returnlist:
    print (f'Years in service: {each:.2f}')
    
# Result:
#     Years in service: 5.59
#     Years in service: 16.07
#     Years in service: 7.01
#     Years in service: 0.98
#     Years in service: 1.02
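
The loop above only prints the values. If they should be stored in the DataFrame, the same parsing can be wrapped in a function and applied to the column. This is a minimal sketch, not part of the original answer: the function name `to_decimal_years` and the column name `Service_years` are made up for illustration, and it assumes every string has exactly three comma-separated parts in year, month, day order.

```python
import pandas as pd

def to_decimal_years(text):
    # '5 year(s), 7 month(s), 3 day(s)' -> 5.0, 7.0, 3.0
    years, months, days = [float(part.strip().split(' ')[0]) for part in text.split(',')]
    # Approximation: 12 months per year and 365.25 days per year on average
    return years + months / 12 + days / 365.25

df = pd.DataFrame({'Years_in_service': ['5 year(s), 7 month(s), 3 day(s)',
                                        '0 year(s), 11 month(s), 23 day(s)']})

# Hypothetical output column; rounding is optional
df['Service_years'] = df['Years_in_service'].apply(to_decimal_years).round(2)
print(df['Service_years'].tolist())
```

For the first row this works out to 5 + 7/12 + 3/365.25, which is about 5.59.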

You could make it more compact (but less readable) like this. I don’t think there’s a computational upside, but here’s the idea anyway:

returnlist = []  # start from an empty list so values are not appended twice

for each in df['Years_in_service']:
    returnlist.append(np.sum(np.array([1, 1/12, 1/365.25])*np.array([float(i.strip().split(' ')[0]) for i in each.split(',')])))

Answered By: Vin

Here’s a way to do it:

def convert_dates(y,m,d):
    return round(int(y) + int(m)/12 + int(d)/365.25, 2)
    

df['date_float'] = df['Years_in_service'].apply(lambda x: convert_dates(*[int(i) for i in x.split(' ') if i.isnumeric()]))

print(df)

      ID                   Years_in_service  Age  date_float
0  A1001    5 year(s), 7 month(s), 3 day(s)   45        5.59
1  A5001  16 year(s), 0 month(s), 25 day(s)   59       16.07
2  B1001    7 year(s), 0 month(s), 2 day(s)   21        7.01
3  D5115  0 year(s), 11 month(s), 23 day(s)   18        0.98
4  K4910    1 year(s), 0 month(s), 6 day(s)   35        1.02

Note:

*[int(i) for i in x.split(' ') if i.isnumeric()] <- This expression builds a list of the numbers in the string, and the leading * unpacks that list into separate arguments for the convert_dates function.
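
To make the unpacking step concrete, here is a small sketch that runs it on one of the sample strings. The variable names `text`, `tokens`, `numbers`, and `result` are only for illustration; `convert_dates` is copied from the answer above.

```python
def convert_dates(y, m, d):
    return round(int(y) + int(m) / 12 + int(d) / 365.25, 2)

text = '16 year(s), 0 month(s), 25 day(s)'

# Splitting on spaces leaves the numbers as stand-alone tokens:
# ['16', 'year(s),', '0', 'month(s),', '25', 'day(s)']
tokens = text.split(' ')

# Keep only the purely numeric tokens: [16, 0, 25]
numbers = [int(t) for t in tokens if t.isnumeric()]

# The * spreads the list over the three parameters,
# so this is the same call as convert_dates(16, 0, 25)
result = convert_dates(*numbers)
print(numbers, result)
```

Because the function takes exactly three parameters, a string with a missing or extra number would raise a TypeError at the call.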

Answered By: YOLO

If you don’t need exact year and month lengths, and the year, month, and day values are always present and in this order, you can use str.extractall to pull out the three numbers, divide each by its average conversion factor (1, 12, and 365.25), and sum the results:

df['Total'] = (pd.to_numeric(df['Years_in_service'].str.extractall(r'(\d+)')[0])
                 .unstack().div([1, 12, 365.25]).sum(axis=1)
                 .round(2) # optional
              )

output:

      ID                   Years_in_service  Age  Total
0  A1001    5 year(s), 7 month(s), 3 day(s)   45   5.59
1  A5001  16 year(s), 0 month(s), 25 day(s)   59  16.07
2  B1001    7 year(s), 0 month(s), 2 day(s)   21   7.01
3  D5115  0 year(s), 11 month(s), 23 day(s)   18   0.98
4  K4910    1 year(s), 0 month(s), 6 day(s)   35   1.02
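
To see what each step of that chain produces, here is a sketch on two of the sample strings. The intermediate names `s`, `matches`, `wide`, and `total` are introduced here only for illustration.

```python
import pandas as pd

s = pd.Series(['5 year(s), 7 month(s), 3 day(s)',
               '1 year(s), 0 month(s), 6 day(s)'])

# One row per regex match, indexed by (original row, match number)
matches = s.str.extractall(r'(\d+)')[0]

# Pivot the match number into columns: 0 = years, 1 = months, 2 = days
wide = pd.to_numeric(matches).unstack()

# Divide each column by its units-per-year factor, then add up each row
total = wide.div([1, 12, 365.25]).sum(axis=1).round(2)

print(wide)
print(total.tolist())
```

The column-wise division relies on the three numbers always appearing in the same order, which is why the answer states that as a precondition.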
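Answered By: mozway

# Pull the month and day numbers out of the 2nd and 3rd comma-separated parts
i=1
for name in ['month','day']:
    df[name] = [date.split(',')[i].split(' ')[1] for date in df['Years_in_service']]
    df[name]=df[name].astype('float64')
    i+=1
# The year is the first token of the first part
df['years']=[date.split(',')[0].split(' ')[0] for date in df['Years_in_service']]
df['years'] = df['years'].astype('float64')

# Convert months and days to fractions of a year
i=12
for name in ['month','day']:
    df[name] = [x/i for x in df[name]]
    i=365.25
# Sum by column name rather than by position, so extra columns in df do not matter
l=[]
for i in range(len(df.index)):
     l.append(round(df['years'].iloc[i]+df['month'].iloc[i]+df['day'].iloc[i],2))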
df['decimal_date'] =l       
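
The same idea of splitting the string into separate year, month, and day columns can also be sketched with named capture groups, which gives the columns names directly. This is an alternative technique, not the code from this answer: the regex `pattern` and the helper frame `parts` are assumptions made for illustration, and the pattern assumes the exact 'N year(s), N month(s), N day(s)' wording.

```python
import pandas as pd

df = pd.DataFrame({'Years_in_service': ['7 year(s), 0 month(s), 2 day(s)',
                                        '0 year(s), 11 month(s), 23 day(s)']})

# Each named group becomes a column in the extracted frame
pattern = r'(?P<years>\d+) year\(s\), (?P<months>\d+) month\(s\), (?P<days>\d+) day\(s\)'
parts = df['Years_in_service'].str.extract(pattern).astype(float)

# Same approximation as above: 12 months and 365.25 days per year
df['decimal_date'] = (parts['years'] + parts['months'] / 12 + parts['days'] / 365.25).round(2)
print(df['decimal_date'].tolist())
```

Rows that do not match the pattern come back as NaN instead of raising an error, which may or may not be what you want.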
    
Answered By: atena karimi