How to obtain a substring of a large text string in Python?
Question:
I have text files in the following format, which are the output of an API:
TASK [Do this]
OK: {
"changed":false,
"msg": "check ok"
}
TASK [Do that]
OK
TASK [Do x]
Fatal: "Error message x"
TASK [Do y]
OK
TASK [Do z]
Fatal: "Stopped because of previous error"
The number of lines or tasks before and after the "Fatal" error varies, and I am only interested in the "Error message x" part.
Code as of now:
import requests
url = "..."  # API URL
# headers is defined elsewhere
r = requests.get(url, verify=False, allow_redirects=True, headers=headers, timeout=10)
output = r.text
I tried using output.split("Fatal", 1)[1], but it raises "list index out of range" and also seems to mess up the text, adding a lot of \n.
Answers:
You should be able to do that fairly easily with regular expressions from the re package. If "Error Message X" can occur more than once, something along the lines of
someVar = re.findall("Error Message X", output)
should return a list of every non-overlapping match in the output text. findall also works if only one occurrence is possible; it then returns a list with a single element.
Here is a helpful introduction to re:
https://www.w3schools.com/python/python_regex.asp
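One detail worth checking with a literal pattern like this is capitalisation: re matches case-sensitively by default, so "Error Message X" would not find "Error message x". A minimal sketch, assuming a shortened sample string stands in for r.text:

```python
import re

# Shortened sample standing in for r.text (an assumption for illustration).
output = 'TASK [Do x]\nFatal: "Error message x"\nTASK [Do y]\nOK\n'

# Literal patterns are case-sensitive unless a flag says otherwise.
exact = re.findall("Error message x", output)
wrong_case = re.findall("Error Message X", output)
ignore_case = re.findall("Error Message X", output, flags=re.IGNORECASE)

print(exact)        # ['Error message x']
print(wrong_case)   # []
print(ignore_case)  # ['Error message x']
```

Note that findall returns an empty list when nothing matches, so there is no exception to handle, only an empty result to check for.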
You can use the re package to search for the text you need with a regular expression. There are probably more optimal patterns, but I wrote this one quickly using regex101.com: Fatal: "(.+)"
import re
s = '''TASK [Do this]
OK: {
"changed":false,
"msg": "check ok"
}
TASK [Do that]
OK
TASK [Do x]
Fatal: "Error message x"
TASK [Do y]
OK
TASK [Do z]
Fatal: "Stopped because of previous error"'''
errors = re.findall(r'Fatal: "(.+)"', s)
for x in errors:
    print(x)
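Since the question only asks for the first error message, the same pattern can be used with re.search, which stops at the first match. The sketch below is one possible way to wrap it; the helper name first_fatal_message and the shortened sample are hypothetical, not part of the original code.

```python
import re

def first_fatal_message(text):
    """Return the message of the first 'Fatal: "..."' line, or None if there is none."""
    # "." does not match newlines by default, so the capture stays on one line.
    match = re.search(r'Fatal: "(.+)"', text)
    return match.group(1) if match else None

# Shortened sample standing in for the API output (an assumption for illustration).
sample = (
    'TASK [Do x]\n'
    'Fatal: "Error message x"\n'
    'TASK [Do z]\n'
    'Fatal: "Stopped because of previous error"'
)

print(first_fatal_message(sample))              # Error message x
print(first_fatal_message("TASK [Do y]\nOK"))   # None
```

The None branch covers output that contains no "Fatal" at all. That is the case in which output.split("Fatal", 1)[1] raises IndexError, because split then returns a one-element list.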