Using regex to parse a string in Python 3
Question:
I am trying to access gSecureToken from the following string:
$("#ejectButton").on("click", function(e) {
$("#ejectButton").prop("disabled", true);
$.ajax({
url : "/apps_home/eject/",
type : "POST",
data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
dataType : "json",
success : function(data, textStatus, xhr) {
$("#smbStatus").html('');
$("#smbEnable").removeClass('greenColor').html('OFF');
showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
},
error : function(xhr, textStatus, errorThrown) {
//undoChange($toggleSwitchElement);
// If auth session has ended, force a new login with a fresh GET.
if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
}
});
How can I use regex to parse the value out of the string? I know once I have it parsed I will be able to load it as JSON.
My current code doesn’t use a regex; it just uses BeautifulSoup to parse some HTML. Here is my code so far:
from bs4 import BeautifulSoup

class SecureTokenParser:
    @staticmethod
    def parse_secure_token_from_html_response(html_response):
        soup = BeautifulSoup(html_response, 'html.parser')
        # Print every inline JavaScript block so its contents can be inspected.
        for script_tag in soup.find_all("script", type="text/javascript"):
            print(script_tag)
I know it’s not much, but I figured printing the contents to the terminal was a good starting point. How can I use regex to parse out the gSecureToken and then load it as JSON?
Answers:
You don’t show us what print() displays, but imagine it resembles s below.
Use this to parse it:
import re

def parse_token(s: str):
    # \w{40}: exactly 40 word characters, captured as group 1
    token_re = re.compile(r'"gSecureToken": "(\w{40})"')
    m = token_re.search(s)
    return m.group(1)

s = '{"url": "/apps_home/eject/", "type": "POST", "data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}, "dataType": "json"}'
print(parse_token(s))
print(dict(data=dict(gSecureToken=parse_token(s))))
Feel free to use \w+ if a fixed 40 characters is too restrictive.
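The pattern above expects the key in double quotes followed by `": "`, as in s. If print() actually shows the raw JavaScript from the question, where the key is unquoted and surrounded by spaces, that pattern finds nothing. Here is a sketch, assuming the token is always a run of word characters, that makes the key quotes and the whitespace around the colon optional and uses \w+. It also returns None instead of crashing on m.group() when there is no match:

```python
import re
from typing import Optional

# Optional quotes around the key and optional whitespace around the colon,
# so the same pattern matches both JSON-style and raw JavaScript input.
TOKEN_RE = re.compile(r'"?gSecureToken"?\s*:\s*"(\w+)"')


def parse_token(s: str) -> Optional[str]:
    """Return the gSecureToken value, or None if it is not present."""
    m = TOKEN_RE.search(s)
    return m.group(1) if m else None


json_style = '{"data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}}'
js_style = 'data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },'

print(parse_token(json_style))  # 7b9854390a079b03cce068b577cd9af6686826b8
print(parse_token(js_style))    # 7b9854390a079b03cce068b577cd9af6686826b8
```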
The documentation for the re module is at: https://docs.python.org/3/library/re.html
Your “… and then load it as JSON?” remark doesn’t appear to be relevant, since once the token has been parsed with a regex, there is no parsing left over for JSON to attend to.
(I would probably have started with json.loads() from the get-go, rather than using a regex, since the data appears to be JSON formatted.)
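If the printed output really is JSON, like the sample string s above, json.loads() can do the whole job without a regex. A minimal sketch, assuming s is valid JSON with the token nested under "data":

```python
import json

s = '{"url": "/apps_home/eject/", "type": "POST", "data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}, "dataType": "json"}'

# Parse the whole object into a dict, then index into the nested "data" dict.
payload = json.loads(s)
token = payload["data"]["gSecureToken"]
print(token)  # 7b9854390a079b03cce068b577cd9af6686826b8
```

This does not apply to the raw JavaScript in the question: its keys are unquoted, so it is not valid JSON, and json.loads() raises json.JSONDecodeError on it.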
A non-regex, non-BS4 option:
html_string = [your string above]
splt = html_string.split(' : { ')
splt[1].split('},\n')[0]
Output:
'gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" '
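The split approach leaves the key, the quotes and a trailing space in the result. One way to reduce it to just the token, assuming the fragment always has the form key : "value", is to split once more on the first colon and strip the whitespace and quotes. The html_string below is a hypothetical, abbreviated excerpt of the question’s string, used only to keep the sketch self-contained:

```python
# Abbreviated excerpt of the JavaScript from the question (hypothetical sample).
html_string = (
    '$.ajax({\n'
    'url : "/apps_home/eject/",\n'
    'type : "POST",\n'
    'data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },\n'
    'dataType : "json",\n'
)

# Same two splits as above: isolate the text between "data : { " and "},\n".
fragment = html_string.split(' : { ')[1].split('},\n')[0]

# Split on the first colon only, then trim whitespace and the double quotes.
key, value = fragment.split(':', 1)
token = value.strip().strip('"')

print(key.strip())  # gSecureToken
print(token)        # 7b9854390a079b03cce068b577cd9af6686826b8
```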
No need to rely on a large package like BeautifulSoup for this; you can easily parse out the value of gSecureToken using just Python’s built-in re module.
I’m assuming you want to parse out just the value of gSecureToken. Then you can create a regular expression pattern:
import re
# \s* allows optional whitespace; group 1 captures the lowercase hex token
pattern = r'{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*}'
Then, we can use, for example, your test string:
test_str = """
$("#ejectButton").on("click", function(e) {
$("#ejectButton").prop("disabled", true);
$.ajax({
url : "/apps_home/eject/",
type : "POST",
data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
dataType : "json",
success : function(data, textStatus, xhr) {
$("#smbStatus").html('');
$("#smbEnable").removeClass('greenColor').html('OFF');
showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
},
error : function(xhr, textStatus, errorThrown) {
//undoChange($toggleSwitchElement);
// If auth session has ended, force a new login with a fresh GET.
if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
}
});
"""
And finally we can search the test string for our regular expression:
match = re.search(pattern, test_str)
matching_string = match.groups()[0]
print(matching_string)
Which gives us the value desired:
7b9854390a079b03cce068b577cd9af6686826b8
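To plug this into the SecureTokenParser class from the question, the method can search the raw response directly, so BeautifulSoup is not needed at all. This is a sketch, assuming the token only ever contains lowercase letters and digits (as the pattern in this answer does) and that returning None is an acceptable way to signal a missing token. The braces are escaped here for readability; Python’s re already treats them as literals in this position:

```python
import re
from typing import Optional


class SecureTokenParser:
    # The pattern from this answer, with the literal braces escaped.
    TOKEN_PATTERN = re.compile(r'\{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*\}')

    @staticmethod
    def parse_secure_token_from_html_response(html_response: str) -> Optional[str]:
        """Return the gSecureToken value, or None if the response lacks one."""
        match = SecureTokenParser.TOKEN_PATTERN.search(html_response)
        # re.search returns None when nothing matches; guard before .group().
        return match.group(1) if match else None


response = 'data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },'
print(SecureTokenParser.parse_secure_token_from_html_response(response))
```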
You can see why this regular expression works by visiting this link: www.regexr.com/4ihpd
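As an offline alternative to that link, the same pattern can be written with re.VERBOSE so that each piece carries its own comment. In verbose mode, unescaped whitespace in the pattern is ignored and # starts a comment, which is why every space in the input is matched explicitly with \s*:

```python
import re

TOKEN_RE = re.compile(
    r"""
    \{              # literal opening brace of the data object
    \s*             # optional whitespace
    gSecureToken    # the key, unquoted as in the JavaScript source
    \s*:\s*         # colon, with optional whitespace on either side
    "([a-z0-9]+)"   # group 1: the token between double quotes
    \s*             # optional whitespace
    \}              # literal closing brace
    """,
    re.VERBOSE,
)

line = 'data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },'
match = TOKEN_RE.search(line)
print(match.group(1))  # 7b9854390a079b03cce068b577cd9af6686826b8
```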