Web Scraping Table from 'Dune.com' with Python3 and bs4
Question:
I am trying to scrape table data from Dune.com (https://dune.com/queries/1144723). When I ‘inspect’ the web page, I can clearly see the <table></table>
element, but when I run the following code, soup.find('table') returns None.
import bs4
import requests

data = []
r = requests.get('https://dune.com/queries/1144723/1954237')
soup = bs4.BeautifulSoup(r.text, "html5lib")
table = soup.find('table')
How can I successfully find this table data?
Answers:
The page loads the table data with JavaScript, so the table is not in the HTML that requests receives. This example queries the site's API endpoint instead and loads the result into a pandas DataFrame:
import requests
import pandas as pd
from bs4 import BeautifulSoup

api_url = "https://app-api.dune.com/v1/graphql"

# GraphQL request for the results of one execution of the query
payload = {
    "operationName": "GetExecution",
    "query": "query GetExecution($execution_id: String!, $query_id: Int!, $parameters: [Parameter!]!) {\n get_execution(\n execution_id: $execution_id\n query_id: $query_id\n parameters: $parameters\n ) {\n execution_queued {\n execution_id\n execution_user_id\n position\n execution_type\n created_at\n __typename\n }\n execution_running {\n execution_id\n execution_user_id\n execution_type\n started_at\n created_at\n __typename\n }\n execution_succeeded {\n execution_id\n runtime_seconds\n generated_at\n columns\n data\n __typename\n }\n execution_failed {\n execution_id\n type\n message\n metadata {\n line\n column\n hint\n __typename\n }\n runtime_seconds\n generated_at\n __typename\n }\n __typename\n }\n}\n",
    "variables": {
        "execution_id": "01GN7GTHF62FY5DYYSQ5MSEG2H",
        "parameters": [],
        "query_id": 1144723,
    },
}

data = requests.post(api_url, json=payload).json()
df = pd.DataFrame(data["data"]["get_execution"]["execution_succeeded"]["data"])
df["total_pnl"] = df["total_pnl"].astype(str)

# The "account" column holds an HTML <a> tag: split it into link text and href
df[["account", "link"]] = df.apply(
    func=lambda x: (
        (s := BeautifulSoup(x["account"], "html.parser")).text,
        s.a["href"],
    ),
    result_type="expand",
    axis=1,
)
print(df.head(10))  # <-- print sample data
Prints:
account last_traded rankings total_pnl traded_since link
0 0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46 2022-02-01T13:57:01Z #1 1591196.831211874 2021-11-20T18:04:19Z https://www.gmx.house/arbitrum/account/0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46
1 0xcb696fd8e239dd68337c70f542c2e38686849e90 2022-11-23T18:26:04Z #2 1367359.0616298981 2022-10-26T06:45:14Z https://www.gmx.house/arbitrum/account/0xcb696fd8e239dd68337c70f542c2e38686849e90
2 190416.eth 2022-12-20T20:30:09Z #3 864694.6695150969 2022-09-06T03:07:03Z https://www.gmx.house/arbitrum/account/0xa688bc5e676325cc5fc891ac48fe442f6298a432
3 0x1729f93e3c3c74b503b8130516984ced70bf47d9 2021-09-24T07:30:51Z #4 801075.4878765604 2021-09-22T00:16:43Z https://www.gmx.house/arbitrum/account/0x1729f93e3c3c74b503b8130516984ced70bf47d9
4 0x83b13abab6ec323fff3af6d18a8fd1646ea39477 2022-12-12T21:36:25Z #5 682459.02019836 2022-04-18T14:19:56Z https://www.gmx.house/arbitrum/account/0x83b13abab6ec323fff3af6d18a8fd1646ea39477
5 0x9fc3b6191927b044ef709addd163b15c933ee205 2022-12-03T00:05:33Z #6 652673.6605261166 2022-11-02T18:26:18Z https://www.gmx.house/arbitrum/account/0x9fc3b6191927b044ef709addd163b15c933ee205
6 0xe8c19db00287e3536075114b2576c70773e039bd 2022-12-23T08:59:38Z #7 644020.503240131 2022-10-06T07:20:44Z https://www.gmx.house/arbitrum/account/0xe8c19db00287e3536075114b2576c70773e039bd
7 0x75a34444581f563680003f2ba05ea0c890a10934 2022-11-10T18:08:50Z #8 639684.0495719836 2022-03-06T23:20:41Z https://www.gmx.house/arbitrum/account/0x75a34444581f563680003f2ba05ea0c890a10934
8 omarazhar.eth 2022-09-16T00:27:22Z #9 536522.3114796011 2022-04-11T20:44:42Z https://www.gmx.house/arbitrum/account/0x204495da23507be4e1281c32fb1b82d9d4289826
9 0x023cb9f0662c6612e830b37a82f41125a4c117e1 2022-09-06T01:10:28Z #10 496922.9880152336 2022-04-12T22:31:47Z https://www.gmx.house/arbitrum/account/0x023cb9f0662c6612e830b37a82f41125a4c117e1
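The query above also asks for execution_queued, execution_running and execution_failed, so the response may not always contain execution_succeeded. Below is a minimal sketch of a more defensive way to pull out the rows before building the DataFrame. The helper name extract_rows and the sample payloads are hypothetical, and the assumption that branches which do not apply come back as null or are missing is mine, inferred only from the access path used above:

```python
from typing import Any


def extract_rows(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the result rows from a GetExecution response, or raise if there are none."""
    # Same path as data["data"]["get_execution"] above, but tolerant of missing keys.
    execution = (response.get("data") or {}).get("get_execution") or {}

    failed = execution.get("execution_failed")
    if failed:
        raise RuntimeError(f"query failed: {failed.get('message')}")

    succeeded = execution.get("execution_succeeded")
    if not succeeded:
        # Assumption: a queued or running execution has no result rows yet.
        state = "queued" if execution.get("execution_queued") else "running or unknown"
        raise LookupError(f"no results yet (state: {state})")

    return succeeded["data"]


# Hypothetical sample payloads, shaped like the fields named in the query.
ok = {"data": {"get_execution": {"execution_succeeded": {"data": [{"rankings": "#1"}]}}}}
pending = {
    "data": {
        "get_execution": {
            "execution_queued": {"position": 3},
            "execution_succeeded": None,
        }
    }
}

print(extract_rows(ok))
try:
    extract_rows(pending)
except LookupError as exc:
    print(exc)
```

With this in place, pd.DataFrame(extract_rows(data)) would replace the direct indexing, and a missing result raises a readable error instead of a TypeError on None.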