Python If/else behaving incorrectly (or I'm just dumb)

Question:

I’m still a beginner, so I’m sure the issue is in some silly thing I did.

Basically, I’m trying to figure out which websites have only one, or both, of the two versions of Google Analytics (UA –> Universal Analytics, and GA4 –> Google Analytics 4).

The best way to do it in my opinion is to scrape the network requests and differentiate them using the URLs (see the difference in the variables "ga4check" and "uacheck").

Scraping and parsing the network requests works fine, but checking for their presence with an if/else statement doesn’t. The first if apparently evaluates to false, since the output is "Something isn’t right…"

Here’s my code :

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import time
import json

ga4check = 'google-analytics.com/g/collect?v=2&tid=G-'
uacheck = 'google-analytics.com/collect?v=1&_v='
collectlist = []

if __name__ == "__main__":

    desired_capabilities = DesiredCapabilities.CHROME
    desired_capabilities["goog:loggingPrefs"] = {"performance": "ALL"}
    options = webdriver.ChromeOptions()
    options.add_argument('headless')
    options.add_argument("--ignore-certificate-errors")

    driver = webdriver.Chrome(executable_path=r'C:\Users\dgayg\Desktop\Scripts\GA4 finder\chromedriver.exe',
                            chrome_options=options,
                            desired_capabilities=desired_capabilities)

    driver.get("https://www.measureschool.com/")
    time.sleep(10)
    logs = driver.get_log("performance")

    with open("network_log.json", "w", encoding="utf-8") as f:
        f.write("[")

        for log in logs:
            network_log = json.loads(log["message"])["message"]

            if("Network.response" in network_log["method"]
                    or "Network.request" in network_log["method"]
                    or "Network.webSocket" in network_log["method"]):

                f.write(json.dumps(network_log)+",")
        f.write("{}]")

    print("Quitting Selenium WebDriver")
    driver.quit()

    json_file_path = "network_log.json"
    with open(json_file_path, "r", encoding="utf-8") as f:
        logs = json.loads(f.read())

    for log in logs:
        try:
            url = log["params"]["request"]["url"]

            if "collect?v=" in url:
                collectlist.append(url)
        except Exception as e:
            pass

if any(uacheck in i for i in collectlist):
    if any(ga4check in i for i in collectlist):
        print("There's UA and GA4 on this website")
    elif any(ga4check not in i for i in collectlist):
        print('Only UA is present on this website')
else:
    print("Something isn't right...")

Output :

C:\Users\dgayg\Desktop\Scripts\GA4 finder> & C:/Users/dgayg/AppData/Local/Programs/Python/Python39/python.exe "c:/Users/dgayg/Desktop/Scripts/GA4 finder/main.py"
c:\Users\dgayg\Desktop\Scripts\GA4 finder\main.py:18: DeprecationWarning: executable_path has been deprecated, please pass in a Service object
  driver = webdriver.Chrome(executable_path=r'C:\Users\dgayg\Desktop\Scripts\GA4 finder\chromedriver.exe',
c:\Users\dgayg\Desktop\Scripts\GA4 finder\main.py:18: DeprecationWarning: use options instead of chrome_options
  driver = webdriver.Chrome(executable_path=r'C:\Users\dgayg\Desktop\Scripts\GA4 finder\chromedriver.exe',

DevTools listening on ws://127.0.0.1:14224/devtools/browser/ea27a598-5b1d-48e2-bffa-1bf849b826b8
[0731/193526.701:INFO:CONSOLE(0)] "Failed to set referrer policy: The value '' is not one of 'no-referrer', 'no-referrer-when-downgrade', 'origin', 'origin-when-cross-origin', 'same-origin', 'strict-origin', 'strict-origin-when-cross-origin', or 'unsafe-url'. The referrer policy has been left unchanged.", source:  (0)
[0731/193526.880:INFO:CONSOLE(2)] "JQMIGRATE: Migrate is installed, version 3.3.2", source: https://measureschool.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=3.3.2 (2)
Quitting Selenium WebDriver
Something isn't right...

Here’s the content of collectlist:

['https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0&_p=877231823&_z=ccd.v9B&cid=879701179.1659290205&ul=en-us&sr=800x600&_s=1&sid=1659290205&sct=1&seg=0&dl=https%3A%2F%2Fmeasureschool.com%2F&dt=MeasureSchool%20-%20The%20Data-Driven%20Way%20of%20Digital%20Marketing&en=page_view&_fv=1&_nsi=1&_ss=1', 'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https%3A%2F%2Fmeasureschool.com%2F', 'https://www.google-analytics.com/j/collect?v=1&_v=j96&a=877231823&t=pageview&_s=1&dl=https%3A%2F%2Fmeasureschool.com%2F&dp=%2F&ul=en-us&de=UTF-8&dt=MeasureSchool%20-%20The%20Data-Driven%20Way%20of%20Digital%20Marketing&sd=24-bit&sr=800x600&vp=774x600&je=0&_u=4CDACEABBAAAAC~&jid=253819846&gjid=797933957&cid=879701179.1659290205&tid=UA-58541733-2&_gid=1754033578.1659290206&_r=1&gtm=2wg7r0593KN2&z=952191590', 'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https%3A%2F%2Fmeasureschool.com%2F&liSync=true', 'https://px4.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https%3A%2F%2Fmeasureschool.com%2F&liSync=true&e_ipv6=AQJQvwc9MAD7QAAAAYJVZz8nCTxegeWWl3Feqs04Ry8lLAYe4tRStgs5YUf0ek2yseMWT3wlT2oSrFcxugGX91BzO2PCy9w']

I hope that what I’m trying to achieve is clear enough.

Thanks a lot in advance !!

Asked By: Aymen Eddaoudi

||

Answers:

Look at your collectlist:

collectlist = [
    'https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0&_p=877231823&_z=ccd.v9B&cid=879701179.1659290205&ul=en-us&sr=800x600&_s=1&sid=1659290205&sct=1&seg=0&dl=https%3A%2F%2Fmeasureschool.com%2F&dt=MeasureSchool%20-%20The%20Data-Driven%20Way%20of%20Digital%20Marketing&en=page_view&_fv=1&_nsi=1&_ss=1',
    'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https%3A%2F%2Fmeasureschool.com%2F',
    'https://www.google-analytics.com/j/collect?v=1&_v=j96&a=877231823&t=pageview&_s=1&dl=https%3A%2F%2Fmeasureschool.com%2F&dp=%2F&ul=en-us&de=UTF-8&dt=MeasureSchool%20-%20The%20Data-Driven%20Way%20of%20Digital%20Marketing&sd=24-bit&sr=800x600&vp=774x600&je=0&_u=4CDACEABBAAAAC~&jid=253819846&gjid=797933957&cid=879701179.1659290205&tid=UA-58541733-2&_gid=1754033578.1659290206&_r=1&gtm=2wg7r0593KN2&z=952191590',
    'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https%3A%2F%2Fmeasureschool.com%2F&liSync=true',
    'https://px4.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658&time=1659290205477&url=https%3A%2F%2Fmeasureschool.com%2F&liSync=true&e_ipv6=AQJQvwc9MAD7QAAAAYJVZz8nCTxegeWWl3Feqs04Ry8lLAYe4tRStgs5YUf0ek2yseMWT3wlT2oSrFcxugGX91BzO2PCy9w'
]

and look at your value for uacheck:

uacheck = 'google-analytics.com/collect?v=1&_v='

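No i in collectlist contains uacheck. The UA request in your list goes to google-analytics.com/j/collect?v=1&_v=..., with /j/ between the domain and collect, while uacheck expects google-analytics.com/collect with nothing in between. You do have a URL that matches ga4check, but your code never looks for ga4check unless it has found at least one uacheck match first, so it falls through to the else.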
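
The mismatch can be checked directly with a plain substring test. This is a minimal sketch: ua_url and ga4_url are shortened copies of two entries from your collectlist, and the names ua_matches and ga4_matches are just for illustration.

```python
uacheck = 'google-analytics.com/collect?v=1&_v='
ga4check = 'google-analytics.com/g/collect?v=2&tid=G-'

# Shortened copies of two URLs from collectlist (query strings truncated here).
ua_url = 'https://www.google-analytics.com/j/collect?v=1&_v=j96&a=877231823&t=pageview'
ga4_url = 'https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0'

# "/j/" sits between the domain and "collect", so the substring is not found.
ua_matches = uacheck in ua_url
# The GA4 URL contains the ga4check substring as-is.
ga4_matches = ga4check in ga4_url

print(ua_matches, ga4_matches)
```

So with the current uacheck, the outer if can never be true for this page.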

I believe you may want to structure your checks more like:

any_ua = any(uacheck in i for i in collectlist)
any_ga4 = any(ga4check in i for i in collectlist)

if any_ua and any_ga4:
    print("There's UA and GA4 on this website")
elif any_ua:
    print('Only UA is present on this website')
elif any_ga4:
    print('Only GA4 is present on this website')
else:
    print('Neither is present on this website.')
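
If the fixed substrings turn out to be too brittle, one possible alternative is to parse each URL and look at its tid query parameter instead. This is only a sketch under assumptions: classify_hits is a hypothetical helper name, and it assumes that hits go to google-analytics.com or a subdomain of it, and that tid starts with "UA-" for Universal Analytics and "G-" for GA4, as in the URLs in your collectlist.

```python
from urllib.parse import urlparse, parse_qs


def classify_hits(urls):
    """Return (any_ua, any_ga4) for a list of request URLs.

    Assumption: the measurement ID is in the "tid" query parameter,
    prefixed "UA-" (Universal Analytics) or "G-" (GA4).
    """
    any_ua = False
    any_ga4 = False
    for url in urls:
        parsed = urlparse(url)
        host = parsed.hostname or ""
        # Skip anything that is not google-analytics.com or a subdomain of it.
        if host != "google-analytics.com" and not host.endswith(".google-analytics.com"):
            continue
        for tid in parse_qs(parsed.query).get("tid", []):
            if tid.startswith("UA-"):
                any_ua = True
            elif tid.startswith("G-"):
                any_ga4 = True
    return any_ua, any_ga4


# Shortened copies of three URLs from collectlist.
sample = [
    'https://region1.google-analytics.com/g/collect?v=2&tid=G-QG5JR71SF7&gtm=2oe7r0',
    'https://px.ads.linkedin.com/collect?v=2&fmt=js&pid=1024658',
    'https://www.google-analytics.com/j/collect?v=1&_v=j96&tid=UA-58541733-2',
]
any_ua, any_ga4 = classify_hits(sample)
print(any_ua, any_ga4)
```

The two booleans it returns can be fed into the same if/elif chain as above.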

Since your ifs are effectively checking all the possible combinations of two booleans, you could also represent that as a 2×2 truth table, like this:

any_ua = any(uacheck in i for i in collectlist)
any_ga4 = any(ga4check in i for i in collectlist)
print([
    # no GA4               # some GA4
    ["Neither is present", "Only GA4 is present"],  # no UA
    ["Only UA is present", "There's UA and GA4"],   # some UA
][any_ua][any_ga4], "on this website")
Answered By: Samwise