Pandas: extract numbers from a string

Question:

In my data, I have this column "price_range".

Dummy dataset:

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})

I am using pandas. What is the most efficient way to get the upper and lower bound of the price range in separate columns?

Asked By: FlorNeufkens


Answers:

You can do the following. First create two extra columns, lower and upper, that hold the lower and upper bound of each row. Then take the minimum of the lower column and the maximum of the upper column.

import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})

# Only rows that contain a range get a lower/upper value; the others stay NaN
has_price = df.price_range != 'No pricing available'
df.loc[has_price, 'lower'] = df['price_range'].str.split('-').str[0]
df.loc[has_price, 'upper'] = df['price_range'].str.split('-').str[1]

# Strip the currency symbol and convert to float
df['lower'] = df.lower.str.replace('€', '', regex=False).astype(float)
df['upper'] = df.upper.str.replace('€', '', regex=False).astype(float)

# Overall range across all rows; min() and max() skip NaN
price_range = [df.lower.min(), df.upper.max()]

Output:

>>> price_range
[3.0, 146.0]
Answered By: T C Molenaar
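
A possible variation on the answer above, as a sketch rather than a definitive version: split once with expand=True and let pd.to_numeric turn anything non-numeric into NaN, so no explicit mask for 'No pricing available' is needed. The name parts is only illustrative, and the code assumes the separator is always ' - '.

```python
import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})

# Drop the currency symbol, then split each string into two columns (0 and 1)
parts = df['price_range'].str.replace('€', '', regex=False).str.split(' - ', expand=True)

# errors='coerce' converts non-numeric or missing pieces to NaN
df['lower'] = pd.to_numeric(parts[0], errors='coerce')
df['upper'] = pd.to_numeric(parts[1], errors='coerce')

# Overall range; min() and max() skip NaN
price_range = [df['lower'].min(), df['upper'].max()]
```

The row without pricing ends up with NaN in both new columns instead of raising an error.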

Alternatively, you can parse the string directly (if you want the limits for each row, rather than the total range):

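import pandas as pd

# Note: the 'No pricing available' row is left out here; the helpers below
# expect a '€x - €y' string and would raise a ValueError on anything else
df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146']})

def get_lower_limit(some_string):
    # Take the part before ' - ' and keep what follows the euro sign
    a = some_string.split(' - ')
    return int(a[0].split('€')[-1])

def get_upper_limit(some_string):
    # Take the part after ' - ' and keep what follows the euro sign
    a = some_string.split(' - ')
    return int(a[1].split('€')[-1])

df['lower_limit'] = df.price_range.apply(get_lower_limit)
df['upper_limit'] = df.price_range.apply(get_upper_limit)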

Output:

Out[153]: 
   price_range  lower_limit  upper_limit
0     €4 - €25            4           25
1     €3 - €14            3           14
2   €25 - €114           25          114
3  €112 - €146          112          146

Answered By: userE
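
For per-row limits on the full dataset, including the 'No pricing available' row, one option is a vectorized regular expression with Series.str.extract. This is a minimal sketch, not part of either answer: the names pattern and bounds are illustrative, and the pattern assumes whole-number prices written as '€x - €y'.

```python
import pandas as pd

df = pd.DataFrame({'price_range': ['€4 - €25', '€3 - €14', '€25 - €114', '€112 - €146', 'No pricing available']})

# Named groups become the column names of the extracted frame.
# Rows that do not match the pattern yield NaN in both columns.
pattern = r'€(?P<lower>\d+)\s*-\s*€(?P<upper>\d+)'
bounds = df['price_range'].str.extract(pattern).astype(float)

# Attach the two new columns to the original frame
df = df.join(bounds)
```

The columns are floats rather than ints because NaN marks the row without pricing.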