Scraping data from CME

Question:

I am trying to scrape data from the CME exchange:

https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/425/FUT?tradeDate=11/05/2021

I have the following code snippet:

import json
import requests as r

user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
header = {'User-Agent': user_agent}
link = 'https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/Settlements/425/FUT?tradeDate=11/05/2021'
page = r.get(link,headers=header)
raw_json = json.loads(page.text)

While it works perfectly well on a local computer, it hangs completely on remote hosting servers (DigitalOcean, Hetzner). I have also tried to curl the URL, but it times out without any additional details.

Do I need to use Selenium for this? I wonder what differs between scraping data from a local machine and from a hosting server.

I don’t know how to resolve this. Hope you can give me some clues.

Asked By: Kogelet


Answers:

You can get the JSON directly from the response object; there is no need to parse page.text with json.loads.

Just use this directly; it might work:

data=page.json()
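
A slightly fuller sketch of the same idea, assuming the `requests` library from the question. The function name `fetch_settlements` and the `session` parameter are made up for illustration; the timeout values are arbitrary. Passing a timeout will not unblock the request, but it should make a silent hang surface as an exception that can be inspected.

```python
import requests

URL = ("https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/"
       "Settlements/425/FUT?tradeDate=11/05/2021")
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/86.0.4240.198 Safari/537.36"
}


def fetch_settlements(url=URL, session=None, timeout=(5, 15)):
    """Return the decoded JSON body, raising instead of waiting forever."""
    # A requests.Session and the requests module both expose .get().
    http = session or requests
    # timeout=(connect seconds, read seconds); illustrative values only.
    response = http.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx replies into exceptions
    return response.json()       # decodes the body; json.loads is not needed


if __name__ == "__main__":
    try:
        print(type(fetch_settlements()))
    except requests.exceptions.Timeout:
        print("Timed out; the server may be ignoring traffic from this host")
```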
Answered By: Bhavya Parikh

Apparently, CME blocks some hosting providers. One option is to find a provider that is not blocked and use it as a proxy server; that is the solution that worked for me. However, I now suspect this could also be related to the IPv6 settings on the server. Try disabling IPv6 so that connections automatically fall back to IPv4.

On Ubuntu:

sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
sudo sysctl -w net.ipv6.conf.lo.disable_ipv6=1
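
To check the IPv6 theory from Python without changing system settings, one possible sketch is to make `urllib3` (which `requests` uses underneath) resolve hosts to IPv4 only. This monkeypatches `urllib3.util.connection.allowed_gai_family`, which is an internal helper rather than a documented setting, so treat it as a diagnostic assumption and not a permanent fix. The name `force_ipv4` is made up for illustration.

```python
import socket

import urllib3.util.connection as urllib3_connection


def force_ipv4():
    """Make urllib3 (and therefore requests) look up IPv4 addresses only."""
    # urllib3 calls allowed_gai_family() to choose the address family it
    # passes to getaddrinfo(); returning AF_INET limits lookups to IPv4.
    urllib3_connection.allowed_gai_family = lambda: socket.AF_INET


if __name__ == "__main__":
    import requests

    force_ipv4()
    link = ("https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/"
            "Settlements/425/FUT?tradeDate=11/05/2021")
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    # A timeout keeps the test from hanging if IPv4 is blocked as well.
    page = requests.get(link, headers=headers, timeout=(5, 15))
    print(page.status_code)
```

If the request succeeds only after calling `force_ipv4()`, that points to the IPv6 route; if it still times out, the provider's address range is more likely being blocked.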
Answered By: Kogelet

I just found the solution to this problem.

This behaviour is caused by the HTTP/2 protocol.
One way to test this is to upgrade curl: since version 7.47.0, the curl tool enables HTTP/2 by default for HTTPS connections.
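
A rough way to compare the two protocol versions from the affected server, assuming a curl build with HTTP/2 support is on the PATH. The helper name `curl_command` and the 15-second limit are made up for illustration; `--http1.1`, `--http2`, `--max-time` and `--write-out` are standard curl options.

```python
import subprocess

URL = ("https://www.cmegroup.com/CmeWS/mvc/Settlements/Futures/"
       "Settlements/425/FUT?tradeDate=11/05/2021")


def curl_command(url, http_flag, max_time=15):
    """Build a curl call that prints the status code and HTTP version used."""
    return [
        "curl", http_flag,
        "--silent", "--show-error",
        "--output", "/dev/null",      # discard the response body
        "--max-time", str(max_time),  # give up instead of hanging
        "--write-out", "%{http_code} %{http_version}\n",
        url,
    ]


if __name__ == "__main__":
    for flag in ("--http1.1", "--http2"):
        result = subprocess.run(curl_command(URL, flag),
                                capture_output=True, text=True)
        # On failure stdout may hold only placeholder values, so show stderr too.
        print(flag, "->", result.stdout.strip(), result.stderr.strip())
```

If only the `--http2` run returns a response, that supports the explanation above. Note that `requests` speaks HTTP/1.1 only, so the Python side would need a client with HTTP/2 support.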

Hope it helps!

Answered By: Kiko Seijo