split columns, extract numbers, and compute the difference
Question:
Hello community! I have the following DataFrame:
import pandas as pd

data = {'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years']}
df = pd.DataFrame(data)
My goal is to put the two numbers of each range into columns first and second, plus a difference column (second minus first).
My approach is to 1. replace values, 2. split, 3. append to lists, 4. build columns from the appended lists. However, I'm stuck on the last step, and maybe there is an easier way to approach this.
Thanks so much!
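For reference, here is a minimal sketch of the four-step approach from the question carried through to step 4. It assumes "replace values" means stripping the unit suffix, and that the suffix is always "yrs" or "Years"; the names cleaned, firsts and seconds are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years']})

# Step 1 ("replace values"): strip the unit suffix; assumes it is 'yrs' or 'Years'
cleaned = df['exp_lvl'].str.replace(r'\s*(?:yrs|Years)$', '', regex=True)

# Steps 2 and 3: split each '<low>-<high>' string and append the numbers to lists
firsts, seconds = [], []
for value in cleaned:
    low, high = value.split('-')
    firsts.append(int(low))
    seconds.append(int(high))

# Step 4: the lists follow df's row order, so they can be assigned as new columns
df['first'] = firsts
df['second'] = seconds
df['difference'] = df['second'] - df['first']
print(df)
```

The last step works because plain Python lists of the same length as the DataFrame are assigned to a column row by row, in order.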
Answers:
This was not hard. Just mechanical. Did you make any effort?
import pandas as pd

data = {'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years']}
data['first'] = []
data['second'] = []
data['difference'] = []
for row in data['exp_lvl']:
    # '5-10 yrs' -> '5-10' -> [5, 10]
    parts = [int(i) for i in row.split(' ')[0].split('-')]
    data['first'].append(parts[0])
    data['second'].append(parts[1])
    data['difference'].append(parts[1] - parts[0])
print(data)
df = pd.DataFrame(data)
print(df)
Output:
C:\tmp>python x.py
{'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years'], 'first': [5, 3, 1], 'second': [10, 5, 3], 'difference': [5, 2, 2]}
     exp_lvl  first  second  difference
0   5-10 yrs      5      10           5
1    3-5 yrs      3       5           2
2  1-3 Years      1       3           2
C:\tmp>
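If you already have df (as in the question) rather than the raw dict, the same parsing can be applied to the existing column. A sketch, assuming every value has the form "<low>-<high> <unit>" with a single space before the unit:

```python
import pandas as pd

df = pd.DataFrame({'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years']})

# Same parsing as the loop: '5-10 yrs' -> '5-10' -> (5, 10)
pairs = [tuple(int(n) for n in s.split(' ')[0].split('-')) for s in df['exp_lvl']]

# zip(*pairs) transposes [(5, 10), (3, 5), (1, 3)] into (5, 3, 1) and (10, 5, 3)
firsts, seconds = zip(*pairs)
df['first'] = list(firsts)
df['second'] = list(seconds)
df['difference'] = df['second'] - df['first']
print(df)
```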
Here is another way:
df.join(df['exp_lvl'].str.extractall(r'(\d+)')[0]
        .unstack()
        .rename({0: 'first', 1: 'second'}, axis=1)
        .astype(float)
        .assign(diff=lambda x: x['second'] - x['first']))
or
(df.join(
    df['exp_lvl'].str.extract(r'(?P<first>\d+)-(?P<second>\d+)')
    .astype(int)
    .assign(difference=lambda x: x['second'] - x['first'])))
Output:
     exp_lvl  first  second  difference
0   5-10 yrs      5      10           5
1    3-5 yrs      3       5           2
2  1-3 Years      1       3           2
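To see why the extractall version needs .unstack(), it can help to print the intermediate result. A small sketch (the names matches and wide are only for illustration): extractall returns one row per regex match, indexed by the original row and a "match" level, and unstack turns that match level into columns 0 and 1.

```python
import pandas as pd

df = pd.DataFrame({'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years']})

# One row per match, MultiIndex (original row, match number); values are strings
matches = df['exp_lvl'].str.extractall(r'(\d+)')[0]
print(matches)

# unstack moves the 'match' level into columns: 0 -> first number, 1 -> second
wide = matches.unstack()
print(wide)
```

Because the extracted values are still strings at this point, the answer converts them with .astype(...) before subtracting.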
Use pandas str.split to construct columns first and second. Next, subtract them to get column difference:
df[['first', 'second']] = df.exp_lvl.str.split('-| ').str[:2].tolist()
df['difference'] = df['second'].astype(int) - df['first'].astype(int)
Out[103]:
     exp_lvl  first  second  difference
0   5-10 yrs      5      10           5
1    3-5 yrs      3       5           2
2  1-3 Years      1       3           2
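A variant of the str.split answer, sketched under the same assumption that every value looks like "<low>-<high> <unit>": expand=True returns the pieces as separate columns directly, and converting them with astype(int) up front leaves first and second numeric instead of as strings. A multi-character pattern such as r'[- ]' is treated as a regular expression by str.split.

```python
import pandas as pd

df = pd.DataFrame({'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years']})

# Split on '-' or ' ' into columns 0, 1, 2 (e.g. '5', '10', 'yrs')
parts = df['exp_lvl'].str.split(r'[- ]', expand=True)

# Columns 0 and 1 hold the numbers; convert them once so later arithmetic is numeric
df['first'] = parts[0].astype(int)
df['second'] = parts[1].astype(int)
df['difference'] = df['second'] - df['first']
print(df)
```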
Another way:
df[['first', 'second']] = df.exp_lvl.str.extract(r'(\d+)-(\d+)')
df['difference'] = df['second'].astype(int) - df['first'].astype(int)
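One caveat with str.extract: a row that does not match the pattern comes back as NaN, and astype(int) then raises. A sketch, assuming a hypothetical extra value '10+ yrs' (not in the question's data), that uses pandas' nullable 'Int64' dtype so unmatched rows become <NA> instead:

```python
import pandas as pd

# '10+ yrs' is a hypothetical row with no '<low>-<high>' range
df = pd.DataFrame({'exp_lvl': ['5-10 yrs', '3-5 yrs', '1-3 Years', '10+ yrs']})

# Unmatched rows get NaN in both capture groups
nums = df['exp_lvl'].str.extract(r'(\d+)-(\d+)')

# to_numeric turns the strings into floats (NaN stays NaN); 'Int64' holds <NA>
df['first'] = pd.to_numeric(nums[0]).astype('Int64')
df['second'] = pd.to_numeric(nums[1]).astype('Int64')
df['difference'] = df['second'] - df['first']
print(df)
```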