Using regex to parse a string in Python 3

Question:

I am trying to access gSecureToken from the following string:

$("#ejectButton").on("click", function(e) {
            $("#ejectButton").prop("disabled", true);
            $.ajax({
                url : "/apps_home/eject/",
                type : "POST",
                data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
                dataType : "json",
                success : function(data, textStatus, xhr) {
                    $("#smbStatus").html('');
                    $("#smbEnable").removeClass('greenColor').html('OFF');
                    showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
                },
                error : function(xhr, textStatus, errorThrown) {
                    //undoChange($toggleSwitchElement);
                    // If auth session has ended, force a new login with a fresh GET.
                    if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
                }
            });

How can I use regex to parse the value out of the string? I know once I have it parsed I will be able to load it as JSON.

My current code doesn’t use any regex; it just uses BeautifulSoup to parse some HTML. Here is my code so far:

from bs4 import BeautifulSoup

class SecureTokenParser:

    @staticmethod
    def parse_secure_token_from_html_response(html_response):
        soup = BeautifulSoup(html_response, 'html.parser')
        for script_tag in soup.find_all("script", type="text/javascript"):
            print(script_tag)

I know it’s not much, but I figured it was a good starting point to print the contents to the terminal. How can I use regex to parse out the gSecureToken and then load it as JSON?

Asked By: KED


Answers:

You haven’t shown us what print() displays, so I’ll assume it resembles s below.

Use this to parse it:

import re


def parse_token(s: str):
    token_re = re.compile(r'"gSecureToken": "(\w{40})"')
    m = token_re.search(s)
    return m.group(1)


s = '{"url": "/apps_home/eject/", "type": "POST", "data": {"gSecureToken": "7b9854390a079b03cce068b577cd9af6686826b8"}, "dataType": "json"}'
print(parse_token(s))
print(dict(data=dict(gSecureToken=parse_token(s))))

Feel free to use \w+ if a fixed 40 characters is too restrictive.
The documentation for the re module is at: https://docs.python.org/3/library/re.html

Your “… and then load it as JSON?” remark doesn’t appear to be relevant:
once the regex has extracted the token, there are no
parsing tasks left over for JSON to attend to.
(I would probably have started with json.loads() from the get-go,
rather than using a regex, since the data appears to be JSON formatted.)
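
One caveat if you do go the json.loads() route: the data value in the question is a JavaScript object literal with an unquoted key, which json.loads() rejects as written. Here is a rough sketch of one workaround, which quotes bare keys before parsing. js_object_to_dict is a hypothetical helper, and it assumes a small, flat fragment whose string values do not themselves contain text that looks like ", key :".

```python
import json
import re

# The object literal from the question's "data" entry.
js_fragment = '{ gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" }'


def js_object_to_dict(fragment: str) -> dict:
    """Quote bare identifier keys, then parse the result as JSON."""
    # A bare key is an identifier that follows "{" or "," and precedes ":".
    quoted = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', fragment)
    return json.loads(quoted)


data = js_object_to_dict(js_fragment)
print(data["gSecureToken"])
```

This is not a general JavaScript parser; for anything beyond a simple fragment like this one, extracting the value directly with a regex is likely the safer choice.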

Answered By: J_H

A non-regex, non-BS4 option:

# html_response holds your string above

splt = html_response.split(' : { ')
splt[1].split('},\n')[0]

Output:

'gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" '
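
The split above returns the key together with the value. If only the value is wanted, a similar string-only approach can be sketched with str.partition. token_via_partition is a hypothetical name, and the sketch assumes the value is the first double-quoted string after the colon that follows the key.

```python
from typing import Optional


def token_via_partition(text: str, key: str = "gSecureToken") -> Optional[str]:
    """Return the first double-quoted value after key, or None if not found."""
    _, found, rest = text.partition(key)
    if not found:
        return None
    # Skip to the colon that separates the key from its value.
    _, colon, after_colon = rest.partition(":")
    if not colon:
        return None
    # after_colon looks like ' "value" }, ...'; the value sits between
    # the first two double quotes.
    parts = after_colon.split('"', 2)
    return parts[1] if len(parts) == 3 else None


sample = 'data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },\n'
print(token_via_partition(sample))
```

Unlike indexing into the result of split(), this returns None rather than raising IndexError when the key is missing.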

Answered By: Jack Fleeting

No need to rely on a large package like BeautifulSoup for this; you can easily parse out the value of gSecureToken using just Python’s built-in re module.

I’m assuming you want to parse out just the value of the gSecureToken. Then, you can create a regular expression pattern:

import re

pattern = r'{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*}'

Then, we can use, for example, your test string:

test_str = """
$("#ejectButton").on("click", function(e) {
            $("#ejectButton").prop("disabled", true);
            $.ajax({
                url : "/apps_home/eject/",
                type : "POST",
                data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" },
                dataType : "json",
                success : function(data, textStatus, xhr) {
                    $("#smbStatus").html('');
                    $("#smbEnable").removeClass('greenColor').html('OFF');
                    showPopup("MiFi Share", "<p>Eject completed. It is now safe to remove your USB storage device.</p>");
                },
                error : function(xhr, textStatus, errorThrown) {
                    //undoChange($toggleSwitchElement);
                    // If auth session has ended, force a new login with a fresh GET.
                    if( (xhr.status == 401) || (xhr.status == 403) || (xhr.status == 406) ) window.location.replace(window.location.href);
                }
            });
"""

And finally we can search the test string for our regular expression:

match = re.search(pattern, test_str)
matching_string = match.groups()[0]
print(matching_string)

Which gives us the value desired:

7b9854390a079b03cce068b577cd9af6686826b8

You can see why this regular expression works by visiting this link: www.regexr.com/4ihpd
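
If the page has several script blocks, as the loop in the question suggests, the same search can be wrapped in a function that returns the first hit. This is only a sketch: first_token is a hypothetical name, the braces are escaped here for readability, and the list of strings stands in for whatever script text you collect (with BeautifulSoup, possibly something like tag.string or "" for each script tag).

```python
import re
from typing import Iterable, Optional

# Same idea as the pattern above, compiled once, with the braces escaped.
PATTERN = re.compile(r'\{\s*gSecureToken\s*:\s*"([a-z0-9]+)"\s*\}')


def first_token(script_texts: Iterable[str]) -> Optional[str]:
    """Return the token from the first script that contains one, else None."""
    for text in script_texts:
        match = PATTERN.search(text)
        if match:
            return match.group(1)
    return None


# Stand-in data: two script bodies, only the second holds the token.
scripts = [
    'console.log("nothing to see");',
    '$.ajax({ data : { gSecureToken : "7b9854390a079b03cce068b577cd9af6686826b8" } });',
]
print(first_token(scripts))
```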

Answered By: Arthur D.