Parse JavaScript array with empty elements using bs4

Question:

I am trying to parse this JavaScript element using BS4.
I want to get that input array into a usable format.

<script type="text/javascript">
    
        require.config.params['matchheader'] = {
            input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
    ,
            matchId: 1640674
        };


</script>

To get the text inside the global variable, I used the following regex:

re.search(r"input: \[.*?\]", script_element.string).group(0)

which returns:

"input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']"
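A minimal sketch of the extraction step, assuming the script body is already available as a plain string (script_text below is a stand-in for script_element.string). Inside a regex the square brackets have to be escaped, otherwise [.*?] is read as a character class; a capture group then returns just the array literal without the "input: " prefix.

```python
import re

# Stand-in for script_element.string (assumption: the <script> body has
# already been pulled out of the page with BeautifulSoup).
script_text = """
        require.config.params['matchheader'] = {
            input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
    ,
            matchId: 1640674
        };
"""

# Escaped brackets match literally; group 1 holds only the "[...]" part.
# The non-greedy .*? stops at the first "]" on the same line.
match = re.search(r"input:\s*(\[.*?\])", script_text)
array_text = match.group(1) if match else None
print(array_text)
```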

I am having some trouble parsing this array because of the empty elements (ast.literal_eval does not work on them).
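A quick check of why literal_eval fails, using a shortened copy of the array: in JavaScript ,, marks an empty slot (an elision), but in Python source it is simply invalid syntax, so literal_eval raises SyntaxError.

```python
from ast import literal_eval

# Shortened copy of the extracted array; ",," marks the empty elements.
array_text = "[162,13,'0 : 2',,,'England']"

try:
    literal_eval(array_text)
    error = None
except SyntaxError as exc:
    error = exc  # ",," is an elision in JavaScript, but invalid in Python

print(type(error).__name__)  # SyntaxError
```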

Any ideas on how to accomplish this? Is there an easier way to do it?

Regards

Asked By: Verance


Answers:

One solution is to insert None between the empty commas and then parse the result with literal_eval:

import re
from ast import literal_eval

data = re.search(r"input:\s*(.*)", s).group(1)  # <-- `s` is your string from the question
data = re.sub(r"(?<=,)\s*(?=,)", "None", data)  # fill each empty slot between two commas
data = literal_eval(data)

print(data)

Prints:

[162, 13, 'Crystal Palace', 'Arsenal', '05/08/2022 20:00:00', '05/08/2022 00:00:00', 6, 'FT', '0 : 1', '0 : 2', None, None, '0 : 2', 'England', 'England']
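The lookbehind in this substitution only fills gaps that follow a comma, so a leading hole such as [,1] is not handled, and a string element that itself contains ,, would be altered. Below is a sketch of a slightly more defensive variant, assuming the array contains only numbers and single- or double-quoted strings (no true, false, null or undefined). Quoted strings are matched first and copied through unchanged; a hole is any run of whitespace between a [ or , and the next comma. parse_js_array and the pattern names are hypothetical, not part of any library.

```python
import re
from ast import literal_eval

# Quoted string literals (group 1) are copied through unchanged.
SINGLE_QUOTED = r"'(?:\\.|[^'\\])*'"
DOUBLE_QUOTED = r'"(?:\\.|[^"\\])*"'
# A hole: optional whitespace after "[" or "," that is followed by ",".
HOLE = r"(?<=[\[,])\s*(?=,)"
TOKEN = re.compile(f"({SINGLE_QUOTED}|{DOUBLE_QUOTED})|{HOLE}")


def parse_js_array(text):
    """Parse a simple JS array literal with empty slots into a list (sketch)."""
    filled = TOKEN.sub(
        lambda m: m.group(1) if m.group(1) is not None else "None", text
    )
    return literal_eval(filled)


print(parse_js_array("[162,13,'Crystal Palace','0 : 2',,,'England']"))
# [162, 13, 'Crystal Palace', '0 : 2', None, None, 'England']
print(parse_js_array("[,1]"))  # [None, 1]
print(parse_js_array("['a,,b',,2]"))  # ['a,,b', None, 2]
```

Gaps right before ] are deliberately left alone: a single trailing comma in JavaScript does not create an element, and Python's list syntax ignores it the same way, so [1,,] becomes [1, None].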
Answered By: Andrej Kesely

You could do some string manipulation and then call .split(',') on the string to create a list.

import re

var_to_parse = """
<script type="text/javascript">
    
        require.config.params['matchheader'] = {
            input: [162,13,'Crystal Palace','Arsenal','05/08/2022 20:00:00','05/08/2022 00:00:00',6,'FT','0 : 1','0 : 2',,,'0 : 2','England','England']
    ,
            matchId: 1640674
        };


</script>
"""
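parse1 = re.search(r"input: \[.*?\]", var_to_parse).group(0)  # "input: [...]"
parse2 = parse1.split("input: ")[1]  # "[...]"
parse3 = parse2[1:-1]  # strip the surrounding brackets
parse4 = parse3.split(',')  # one string per element; empty slots become ''
print("parse4:", parse4)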
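The split result is still a list of raw strings: quoted elements keep their quotes, numbers stay as text, and the empty elements come out as ''. A sketch of a post-processing step, assuming no element contains a comma (a plain split would cut such a string in two); convert_token is a hypothetical helper name.

```python
def convert_token(token):
    """Turn one comma-split token into a Python value (hypothetical helper)."""
    token = token.strip()
    if token == "":
        return None  # an empty slot from ",,"
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "'\"":
        return token[1:-1]  # drop the surrounding quotes (escapes not decoded)
    try:
        return int(token)
    except ValueError:
        return token  # leave anything else (floats, true/false, ...) as text


# parse4 as produced above, shortened here for brevity
parse4 = "162,13,'Crystal Palace','0 : 2',,,'England'".split(",")
values = [convert_token(t) for t in parse4]
print(values)  # [162, 13, 'Crystal Palace', '0 : 2', None, None, 'England']
```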
Answered By: Jortega