Converting badly formatted number and currency input from a user to a float
Question:
I need to write a Python script that converts badly formatted user input into a float.
For example:
"10,123.20 Kč" to "10123.2"
"10.023,123.45 Kč" to "10023123.45"
"20 743 210.2 Kč" to "20743210.2"
or any other bad input; these are the examples I've come up with.
Kč is the Czech koruna.
My plan was to remove all spaces and letters, then change every comma to a dot so the number looks like "123.123.456.78", then delete every dot except the last one, and finally convert the string to a float so it looks like "123123456.78". But I don't know how to do that. If you know a faster or easier way, I would like to hear it.
This is what I have so far, and now I'm stuck.
import re
my_list = ['100,30 Kč', '10 000,00 Kč', '10,000.00 Kč', '10000 Kč', '32.100,30 Kč', '12.345,678.91 Kč']
for i in my_list:
    ws = i.replace("Kč", '')
    x = re.sub(',', '.', ws).replace(" ", "")
    print(x)
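The plan described in the question (strip everything but digits and separators, turn commas into dots, keep only the last dot) could be sketched roughly as below. The function name `to_float` is made up for illustration, and the sketch assumes that the last "." or "," in the input is always the decimal separator.

```python
import re

def to_float(text):
    """Assumption: the last '.' or ',' in the input is the decimal point."""
    # Keep only digits and the two separator characters.
    cleaned = re.sub(r"[^0-9.,]", "", text)
    # Unify the separators so that only dots remain.
    cleaned = cleaned.replace(",", ".")
    # Split at the last dot; everything before it is the integer part.
    whole, dot, fraction = cleaned.rpartition(".")
    if not dot:
        # No separator at all: the input is a plain integer.
        return float(fraction)
    # Drop the remaining dots from the integer part and rejoin.
    return float(whole.replace(".", "") + "." + fraction)

for raw in ["10,123.20 Kč", "10.023,123.45 Kč", "20 743 210.2 Kč"]:
    print(to_float(raw))
```

Because of that assumption, an input such as "10,000 Kč" is read as 10.0 rather than ten thousand, so this only fits data where a decimal part is always present or no separator is used at all.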
Answers:
This should do the job.
def parse_entry(entry):
    # remove currency and spaces
    entry = entry.replace("Kč", "")
    entry = entry.replace(" ", "")
    # check if a comma is used for decimals or thousands
    comma_i = entry.find(",")
    if len(entry[comma_i:]) > 3:  # it's a thousands separator, it can be removed
        entry = entry.replace(",", "")
    else:  # it's a decimal separator (or there is no comma at all)
        entry = entry.replace(",", ".")  # convert it to a dot
    # remove extra dots, keeping only the last one
    while entry.count(".") > 1:
        entry = entry.replace(".", "", 1)  # remove the first remaining dot
    return float(entry)
my_list = ['100,30 Kč', '10 000,00 Kč', '10,000.00 Kč', '10000 Kč', '32.100,30 Kč', '12.345,678.91 Kč']
parsed = list(map(parse_entry, my_list))
print(parsed)  # [100.3, 10000.0, 10000.0, 10000.0, 32100.3, 12345678.91]
You could find all the runs of digits instead of trying to remove the non-numeric characters.
In any case you have to make some assumptions about the input. Here is the code, assuming that a final block of two digits in a text with separators is the fractional part.
import re
my_list = ['100,30 Kč', '10 000,00 Kč', '10,000.00 Kč', '10000 Kč', '32.100,30 Kč', '12.345,678.91 Kč']
for s in my_list:
    # collect every run of digits, ignoring separators and the currency
    parts = re.findall(r'\d+', s)
    # no two-digit final block: there is no fractional part, so use '0'
    if len(parts) == 1 or len(parts[-1]) != 2:
        parts.append('0')
    print(float(''.join(parts[:-1]) + '.' + parts[-1]))
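The assumption above (a final block of exactly two digits is the fraction) does not cover an input like "20 743 210.2 Kč" from the question, where the fraction has one digit. A possible variant, sketched here with a hypothetical function name `parse_groups` and a hypothetical parameter `max_fraction_digits`, treats any final block of one or two digits as the fraction instead.

```python
import re

def parse_groups(text, max_fraction_digits=2):
    # Collect every run of digits, ignoring separators and the currency.
    parts = re.findall(r"\d+", text)
    if not parts:
        raise ValueError(f"no digits found in {text!r}")
    # Assumption: with more than one group, a short final group is the fraction.
    if len(parts) > 1 and len(parts[-1]) <= max_fraction_digits:
        return float("".join(parts[:-1]) + "." + parts[-1])
    # Otherwise every group belongs to the integer part.
    return float("".join(parts))

for raw in ["20 743 210.2 Kč", "10 000 Kč", "100,30 Kč"]:
    print(parse_groups(raw))
```

This is still a heuristic: a three-digit final group is always read as thousands, so a value written with three decimal places would be misparsed.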
I tried to keep your code and add just a few lines. The idea is to replace every "," with ".", store the part after the last "." in a variable, join the remaining parts without dots, and then append the stored part after a single ".".
import re
my_list = ['100,30 Kč', '10 000,00 Kč', '10,000.00 Kč', '10000 Kč', '32.100,30 Kč', '12.345,678.91 Kč']
for i in my_list:
    ws = i.replace("Kč", '')
    x = re.sub(',', '.', ws).replace(" ", "")
    if len(x.split(".")) > 1:
        end = x.split(".")[-1]
        x = "".join(x.split(".")[:-1]) + "." + end
    print(x)
While the other answers work for your specific scenario (e.g. you know the currency code you're replacing), they're not very extensible.
So here's a more generic approach:
import re
values = [
    "100,30 Kč",
    "10 000,00 Kč",
    "10,000.00 Kč",
    "10000 Kč",
    "32.100,30 Kč",
    "12.345,678.91 Kč",  # This value is a bit odd... is it _right_?
]
for value in values:
    # Remove any character that's not a digit or a comma
    value = re.sub("[^0-9,]", "", value)
    # Replace remaining commas with periods
    value = value.replace(",", ".")
    # Convert from string to number
    value = float(value)
    print(value)
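This outputs the following (note that every comma is treated as the decimal separator, so "10,000.00 Kč" becomes 10.0):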
100.3
10000.0
10.0
10000.0
32100.3
12345.67891
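One way to keep this approach generic without hardcoding the comma is to let the caller say which character marks the decimals. The sketch below is only an illustration: the name `parse_amount` and the parameter `decimal_sep` are hypothetical, and it assumes the decimal convention of the input is known in advance.

```python
import re

def parse_amount(text, decimal_sep=","):
    # Hypothetical helper: the caller states which character marks decimals.
    # Drop everything except digits and the chosen decimal separator.
    kept = re.sub(f"[^0-9{re.escape(decimal_sep)}]", "", text)
    # float() only understands a dot, so normalise the separator.
    return float(kept.replace(decimal_sep, "."))

print(parse_amount("32.100,30 Kč"))
print(parse_amount("10,000.00 Kč", decimal_sep="."))
```

If the chosen separator appears more than once, as in "1,000,000 Kč" with the default setting, `float()` raises a `ValueError` instead of silently returning a wrong number.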
Without the aid of re you could just do this:
my_list = ['100,30 Kč', '10 000,00 Kč', '10,000.00 Kč', '10000 Kč', '32.100,30 Kč', '12.345,678.91 Kč']
def fix(s):
    r = []
    for c in s:
        if c in '0123456789':
            r.append(c)  # keep digits
        elif c == ',':
            r.append('.')  # treat a comma as the decimal separator
        elif c not in '. ':
            break  # stop at the first other character (the currency)
    return float(''.join(r))
for n in my_list:
    print(fix(n))
Output:
100.3
10000.0
10.0
10000.0
32100.3
12345.67891