Web Scraping Table from 'Dune.com' with Python3 and bs4

Question:

I am trying to scrape table data from Dune.com (https://dune.com/queries/1144723). When I inspect the web page, I can clearly see the <table></table> element, but when I run the following code, soup.find('table') returns None.

import bs4
import requests

data = []

r=requests.get('https://dune.com/queries/1144723/1954237')
soup=bs4.BeautifulSoup(r.text, "html5lib")

table = soup.find('table')

How can I successfully find this table data?

Asked By: spal


Answers:

import bs4
import requests

data = []

r=requests.get('https://dune.com/queries/1144723/1954237')
soup=bs4.BeautifulSoup(r.text, "html5lib")

table = soup.find('table')
Answered By: Satyam Kumar

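The page uses JavaScript to load the data, so the HTML returned by requests.get does not contain the <table> element and soup.find('table') returns None. This example uses their API endpoint to load the data into a DataFrame: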

import requests
import pandas as pd
from bs4 import BeautifulSoup


api_url = "https://app-api.dune.com/v1/graphql"

payload = {
    "operationName": "GetExecution",
    "query": "query GetExecution($execution_id: String!, $query_id: Int!, $parameters: [Parameter!]!) {\n  get_execution(\n    execution_id: $execution_id\n    query_id: $query_id\n    parameters: $parameters\n  ) {\n    execution_queued {\n      execution_id\n      execution_user_id\n      position\n      execution_type\n      created_at\n      __typename\n    }\n    execution_running {\n      execution_id\n      execution_user_id\n      execution_type\n      started_at\n      created_at\n      __typename\n    }\n    execution_succeeded {\n      execution_id\n      runtime_seconds\n      generated_at\n      columns\n      data\n      __typename\n    }\n    execution_failed {\n      execution_id\n      type\n      message\n      metadata {\n        line\n        column\n        hint\n        __typename\n      }\n      runtime_seconds\n      generated_at\n      __typename\n    }\n    __typename\n  }\n}\n",
    "variables": {
        "execution_id": "01GN7GTHF62FY5DYYSQ5MSEG2H",
        "parameters": [],
        "query_id": 1144723,
    },
}


data = requests.post(api_url, json=payload).json()

df = pd.DataFrame(data["data"]["get_execution"]["execution_succeeded"]["data"])
df["total_pnl"] = df["total_pnl"].astype(str)
# The "account" column holds an HTML <a> tag: split it into the link text and its href
df[["account", "link"]] = df.apply(
    func=lambda x: (
        (s := BeautifulSoup(x["account"], "html.parser")).text,
        s.a["href"],
    ),
    result_type="expand",
    axis=1,
)
print(df.head(10))  # <-- print sample data

Prints:

                                      account           last_traded rankings           total_pnl          traded_since                                                                               link
0  0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46  2022-02-01T13:57:01Z       #1   1591196.831211874  2021-11-20T18:04:19Z  https://www.gmx.house/arbitrum/account/0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46
1  0xcb696fd8e239dd68337c70f542c2e38686849e90  2022-11-23T18:26:04Z       #2  1367359.0616298981  2022-10-26T06:45:14Z  https://www.gmx.house/arbitrum/account/0xcb696fd8e239dd68337c70f542c2e38686849e90
2                                  190416.eth  2022-12-20T20:30:09Z       #3   864694.6695150969  2022-09-06T03:07:03Z  https://www.gmx.house/arbitrum/account/0xa688bc5e676325cc5fc891ac48fe442f6298a432
3  0x1729f93e3c3c74b503b8130516984ced70bf47d9  2021-09-24T07:30:51Z       #4   801075.4878765604  2021-09-22T00:16:43Z  https://www.gmx.house/arbitrum/account/0x1729f93e3c3c74b503b8130516984ced70bf47d9
4  0x83b13abab6ec323fff3af6d18a8fd1646ea39477  2022-12-12T21:36:25Z       #5     682459.02019836  2022-04-18T14:19:56Z  https://www.gmx.house/arbitrum/account/0x83b13abab6ec323fff3af6d18a8fd1646ea39477
5  0x9fc3b6191927b044ef709addd163b15c933ee205  2022-12-03T00:05:33Z       #6   652673.6605261166  2022-11-02T18:26:18Z  https://www.gmx.house/arbitrum/account/0x9fc3b6191927b044ef709addd163b15c933ee205
6  0xe8c19db00287e3536075114b2576c70773e039bd  2022-12-23T08:59:38Z       #7    644020.503240131  2022-10-06T07:20:44Z  https://www.gmx.house/arbitrum/account/0xe8c19db00287e3536075114b2576c70773e039bd
7  0x75a34444581f563680003f2ba05ea0c890a10934  2022-11-10T18:08:50Z       #8   639684.0495719836  2022-03-06T23:20:41Z  https://www.gmx.house/arbitrum/account/0x75a34444581f563680003f2ba05ea0c890a10934
8                               omarazhar.eth  2022-09-16T00:27:22Z       #9   536522.3114796011  2022-04-11T20:44:42Z  https://www.gmx.house/arbitrum/account/0x204495da23507be4e1281c32fb1b82d9d4289826
9  0x023cb9f0662c6612e830b37a82f41125a4c117e1  2022-09-06T01:10:28Z      #10   496922.9880152336  2022-04-12T22:31:47Z  https://www.gmx.house/arbitrum/account/0x023cb9f0662c6612e830b37a82f41125a4c117e1
Answered By: Andrej Kesely
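
The example above indexes straight into "execution_succeeded", but the GraphQL query also asks for execution_queued, execution_running and execution_failed. A minimal sketch of a more defensive lookup follows. The helper name rows_from_execution is hypothetical, and the sketch assumes that states which do not apply come back as null (None) in the JSON, which the original answer does not confirm:

```python
def rows_from_execution(response_json):
    """Return the result rows, or raise if the execution did not succeed.

    Assumes the response shape used in the answer above:
    response_json["data"]["get_execution"][<state>], with unused states set to None.
    """
    execution = (response_json.get("data") or {}).get("get_execution") or {}

    # Happy path: the finished result carries the rows under "data"
    succeeded = execution.get("execution_succeeded")
    if succeeded:
        return succeeded["data"]

    # The query failed: surface the message field requested in the GraphQL query
    failed = execution.get("execution_failed")
    if failed:
        raise RuntimeError(f"Query execution failed: {failed.get('message')}")

    # The query is still queued or running, so there are no rows yet
    for state in ("execution_queued", "execution_running"):
        if execution.get(state):
            raise RuntimeError(f"Query execution not finished yet ({state})")

    raise RuntimeError("Unexpected response shape: no execution state found")
```

With this helper, the DataFrame line could be written as df = pd.DataFrame(rows_from_execution(data)), so that a queued or failed execution raises a readable error instead of a TypeError on None.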