read data from chrome console to python

Question:

I have Python code to read an XPath from a website (https://www.op.gg/summoners/kr/Hide%20on%20bush):

import requests
import lxml.html as html
import pandas as pd

url_padre = "https://www.op.gg/summoners/br/tercermundista"

link_farm = '//div[@class="stats"]//div[@class="cs"]'

r = requests.get(url_padre) 

home=r.content.decode("utf-8") 

parser=html.fromstring(home) 
farm=parser.xpath(link_farm) 

print(farm)

This code prints [].

But when I run this XPath in the Chrome console: $x('//div[@class="stats"]//div[@class="cs"]').map(x => x.innerText), it prints the numbers I want, while my Python code does not.
What is the mistake?

I want code that fixes this mistake.

-------------------------- edit --------------------------


Error                                     Traceback (most recent call last)
c:\Users\GCO\Desktop\Analisis de datos\borradores\fsdfs.ipynb Cell 2 in <cell line: 3>()
      1 from playwright.sync_api import sync_playwright
----> 3 with sync_playwright() as p, p.chromium.launch() as browser:
      4     page = browser.new_page()
      5     page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)

File c:\Users\GCO\AppData\Local\Programs\Python\Python310\lib\site-packages\playwright\sync_api\_context_manager.py:47, in PlaywrightContextManager.__enter__(self)
     45             self._own_loop = True
     46         if self._loop.is_running():
---> 47             raise Error(
     48                 """It looks like you are using Playwright Sync API inside the asyncio loop.
     49 Please use the Async API instead."""
     50             )
     52         # In Python 3.7, asyncio.Process.wait() hangs because it does not use ThreadedChildWatcher
     53         # which is used in Python 3.8+. This is unix specific and also takes care about
     54         # cleaning up zombie processes. See https://bugs.python.org/issue35621
     55         if (
     56             sys.version_info[0] == 3
     57             and sys.version_info[1] == 7
     58             and sys.platform != "win32"
     59             and isinstance(asyncio.get_child_watcher(), asyncio.SafeChildWatcher)
     60         ):

Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
Asked By: Benjamin Correa


Answers:

As I understand it, you cannot get dynamically generated content using requests.

Here is a solution using Playwright, which can load the whole page before parsing it.

  1. Install Playwright with pip install playwright
  2. Install the browser and its dependencies with playwright install chromium --with-deps
  3. Run the following code:
from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch() as browser:
    page = browser.new_page()
    page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
    selector = "//div[@class='stats']//div[@class='cs']/div"
    cs_stats = page.query_selector_all(selector)
    print(len(cs_stats), [cs.inner_text() for cs in cs_stats])
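
Note that the traceback in the question's edit comes from running the Sync API inside a Jupyter notebook, where an asyncio event loop is already running. For that environment, here is a minimal sketch of the same scrape using Playwright's Async API (as the error message itself suggests); it assumes the same URL and selector as above:

import asyncio
from playwright.async_api import async_playwright

async def scrape_cs():
    async with async_playwright() as p:
        # Launch Chromium, load the fully rendered page, then query the XPath.
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
        selector = "//div[@class='stats']//div[@class='cs']/div"
        cs_stats = await page.query_selector_all(selector)
        print(len(cs_stats), [await cs.inner_text() for cs in cs_stats])
        await browser.close()

# In a notebook cell: await scrape_cs()
# In a plain script:  asyncio.run(scrape_cs())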

If you want to stick with lxml as the parsing tool, you can use the following code:

from lxml import html
from playwright.sync_api import sync_playwright

with sync_playwright() as p, p.chromium.launch() as browser:
    page = browser.new_page()
    page.goto("https://www.op.gg/summoners/kr/Hide%20on%20bush", timeout=10000)
    selector = "//div[@class='stats']//div[@class='cs']/div"
    c = page.content()
    parser = html.fromstring(c)
    farm = parser.xpath(selector)
    print(len(farm), [cs.text for cs in farm])

P.S.

I have also noticed that op.gg uses pretty simple HTTP requests that do not need authorization. You can find the desired info using this code:

import json
from urllib.request import urlopen
url = "https://op.gg/api/v1.0/internal/bypass/games/kr/summoners/4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg?&limit=20"
r = urlopen(url)
games = json.load(r).get("data", [])
print(games)

games is a list of dicts that stores all the info you need. The CS stats of each game are stored under the following keys: games[0]["myData"]["stats"]["minion_kill"]
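
For example, assuming the response shape described above, you can print the raw minion kills for every game like this:

# Assumes `games` from the previous snippet; prints lane minion kills per game.
for game in games:
    stats = game["myData"]["stats"]
    print(stats["minion_kill"])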

The only difficult thing here is figuring out how to get the summoner_id for the desired user (which is 4b4tvMrpRRDLvXAiQ_Vmh5yMOsD0R3GPGTUVfIanp1Httg in your example).

Answered By: pL3b

You can use this example to see how to load the data from the external API URL and compute the CS value:

import re
import requests


url = "https://www.op.gg/summoners/kr/Hide%20on%20bush"
api_url = "https://op.gg/api/v1.0/internal/bypass/games/kr/summoners/{summoner_id}?=&limit=20&hl=en_US&game_type=total"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/110.0"
}

html_doc = requests.get(url, headers=headers).text
summoner_id = re.search(r'"summoner_id":"(.*?)"', html_doc).group(1)

data = requests.get(api_url.format(summoner_id=summoner_id), headers=headers).json()

for d in data["data"]:
    stats = d["myData"]["stats"]
    kills = (
        stats["minion_kill"]
        + stats["neutral_minion_kill_team_jungle"]
        + stats["neutral_minion_kill_enemy_jungle"]
        + stats["neutral_minion_kill"]
    )
    cs = kills / (d['game_length_second'] / 60)
    print(f'{cs=:.1f}')

Prints:

cs=6.7
cs=8.5
cs=8.2
cs=1.4
cs=7.3
cs=8.5
cs=6.8
cs=7.7
cs=8.7
cs=8.8
cs=5.6
cs=9.9
cs=7.0
cs=9.6
cs=9.7
cs=5.0
cs=7.5
cs=9.2
cs=9.0
cs=7.9
Answered By: Andrej Kesely