How to get the real number after a string in a file
Question:
I have files that contain both strings and floats. I am interested in finding the floats that come after a specific string. Any help in writing a function that reads the file, looks for that specific string, and returns the float after it would be much appreciated.
Thanks
An example of a file is
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""
import re
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""
str_to_search = 'xxxxxxxxxx'
num = re.findall(r'^' + str_to_search + r' (\d+\.\d+)', lines, flags=re.M)
print(num)
This works if there are no negative signs. In other words, if the number after the string ‘xxxxxxxxxx’ is 1.099 rather than -1.099, it works fine. How can I generalize it so that it handles both positive numbers (no sign) and negative numbers (with a leading minus sign)?
Answers:
You can use the regex
(-?\d+\.?\d*)
import re
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5
xxxxxxxxxx 1.099"""
str_to_search = "xxxxxxxxxx"
num = re.findall(fr"(?m)^{str_to_search}\s+(-?\d+\.?\d*)", lines)
print(num)
Prints:
['-1.099', '1.099']
You can change the regex to the following:
num = re.findall(r'^' + str_to_search + r' (-?\d+\.?\d*)', lines, flags=re.M)
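If the search string might contain regex metacharacters, or the numbers might carry a plus sign, lack a fractional part, or use an exponent, a slightly more defensive variant of the same idea could look like the sketch below. The function name find_floats_after and the NUMBER pattern are my own illustrative choices, not something from the answers above; re.escape stops the label from being interpreted as a pattern.

```python
import re

# Illustrative pattern (an assumption, adjust to your data): optional sign,
# digits with an optional fraction (or a leading-dot fraction), optional exponent.
NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"


def find_floats_after(text, label):
    """Return every number that follows `label` at the start of a line."""
    # re.escape makes characters such as '.' or '+' in the label match literally.
    pattern = rf"^{re.escape(label)}\s+({NUMBER})"
    return [float(m) for m in re.findall(pattern, text, flags=re.M)]


lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5
xxxxxxxxxx 1.099"""

print(find_floats_after(lines, "xxxxxxxxxx"))  # [-1.099, 1.099]
```

Converting to float inside the function means the caller gets numbers rather than strings, which is probably what you want if you plan to do arithmetic on the result.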
I would just split the entire file content at every space. This gives a list of all the strings and floats. Then call list.index() with the string you are searching for to find its position, and wrap that in try/except so your code won't stop if the string is not in the contents. Then read the next element and try to convert it to a float.
Code:
lines = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""
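lines = lines.replace("\n", " ").split(" ")  # replace the newlines with spaces so they are split as well
try:
    float_index = lines.index("xxxxxxxxxx") + 1  # get the element after the string you are trying to find
    num = float(lines[float_index])
    print(num)  # only print when a number was actually found
except (ValueError, IndexError) as e:
    # ValueError: string not found or next element is not a float; IndexError: nothing after the string
    print(e)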
If you are looking for a regex solution, use Andrej Kesely’s answer.
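Since the question asks for a function that reads a file, the split-based idea can also be applied line by line, which avoids picking up a token from a different line. This is only a minimal sketch, assuming the search string starts the line and the number is the last token on it; the function name float_after and the temporary sample file are hypothetical.

```python
import os
import tempfile


def float_after(path, label):
    """Return the float ending the first line that starts with `label`, else None."""
    label_tokens = label.split()  # lets multi-word labels like "zzzz bbb nnn" work
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            tokens = line.split()
            starts_with_label = tokens[:len(label_tokens)] == label_tokens
            if starts_with_label and len(tokens) > len(label_tokens):
                try:
                    return float(tokens[-1])
                except ValueError:
                    continue  # last token is not a number; keep looking
    return None


sample = """aaaaaaaaaaaaaaa bbbbbbbbbbbbbbb cccccccccc
qq vvv rrr ssssa 22.6
zzzzx bbbb 12.0
xxxxxxxxxx -1.099
zzzz bbb nnn 33.5"""

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)

print(float_after(tmp.name, "xxxxxxxxxx"))    # -1.099
print(float_after(tmp.name, "zzzz bbb nnn"))  # 33.5
print(float_after(tmp.name, "missing"))       # None
os.remove(tmp.name)
```

Returning None when nothing is found leaves it to the caller to decide how to handle a missing value, instead of printing an error inside the function.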