Download a file that's linked to a button on a website

Question:

I’m looking for a way to get files such as the one in this link, which can be downloaded by clicking a "download" button. I couldn’t find a way despite reading many posts that seemed to be relevant.

The code I got so far:

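import requests
from bs4 import BeautifulSoup as bs

# Locally saved copy of the index page
with open('ukb49810.html', 'r') as f:
    html = f.read()

index_page = bs(html, 'html.parser')
# Skip the first two links, then follow every link whose href mentions "coding"
for i in index_page.find_all('a', href=True)[2:]:
    if 'coding' in i['href']:
        file = requests.get(i['href']).text
        # Overwritten on every match, so this holds the links of the last coding page only
        download_page = bs(file, 'html.parser').find_all('a', href=True)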

From the download_page variable I got "URLs" with the code

for ii in download_page:
    print(ii['href'])

which printed

http://
index.cgi
browse.cgi?id=9&cd=data_coding
search.cgi
catalogs.cgi
download.cgi
https://bbams.ndph.ox.ac.uk/ams/resApplications
help.cgi?cd=data_coding
field.cgi?id=22001
field.cgi?id=22001
label.cgi?id=100313
field.cgi?id=31
field.cgi?id=31
label.cgi?id=100094

I tried to use these supposed URLs to compose the download URL, but the link I got didn’t work.
Thanks.

Asked By: random


Answers:

None of these links are to the download page. If you view source on the page, you will see how the download is done:

<form method="post" action="codown.cgi">
    <input type="hidden" name="id" value="9"></td><td>
    <input class="btn_glow" type="submit" value="Download">
</form>
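The same inspection can be done in code rather than through "view source". Below is a minimal sketch, using only the standard library, that lists each form's method, resolved action URL, and hidden fields. The names FormCollector and find_forms are hypothetical helpers, and PAGE_URL is an assumed address for the page the form came from; it is only used to resolve the relative action.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormCollector(HTMLParser):
    """Collect each <form>'s method, action, and hidden input fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "method": (attrs.get("method") or "get").lower(),
                "action": attrs.get("action") or "",
                "fields": {},
            }
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            # Hidden inputs carry the values the button submits silently
            if attrs.get("type") == "hidden" and attrs.get("name"):
                self._current["fields"][attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_forms(html, page_url):
    """Return the forms found in html, with actions resolved against page_url."""
    collector = FormCollector()
    collector.feed(html)
    for form in collector.forms:
        form["action"] = urljoin(page_url, form["action"])
    return collector.forms


# The form quoted above, verbatim
SAMPLE = """<form method="post" action="codown.cgi">
    <input type="hidden" name="id" value="9"></td><td>
    <input class="btn_glow" type="submit" value="Download">
</form>"""

# Assumption: the address of the page that contains the form
PAGE_URL = "https://biobank.ndph.ox.ac.uk/showcase/coding.cgi?id=9"

forms = find_forms(SAMPLE, PAGE_URL)
print(forms)
```

Each entry tells you where the button posts to and which hidden values it sends, which is all you need to reproduce the request yourself.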

So you would need to submit a POST request to codown.cgi with your value, something like:

curl --request POST \
   --url https://biobank.ndph.ox.ac.uk/showcase/codown.cgi \
   --header 'Content-Type: application/x-www-form-urlencoded' \
   --data id=9
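Since the question is in Python, here is a rough equivalent of that curl command, sketched with the standard library. The helper names build_download_request and download_coding are hypothetical, as is the output filename in the usage comment; I haven't checked what format the server returns, so the response is written out as raw bytes.

```python
import urllib.parse
import urllib.request

# URL from the curl command above (the form's action is codown.cgi)
DOWNLOAD_URL = "https://biobank.ndph.ox.ac.uk/showcase/codown.cgi"


def build_download_request(coding_id):
    """Build the POST that the Download button sends (hidden field id=<coding_id>)."""
    body = urllib.parse.urlencode({"id": coding_id}).encode("ascii")
    return urllib.request.Request(
        DOWNLOAD_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def download_coding(coding_id, out_path):
    """Send the POST and save the response body to out_path as raw bytes."""
    request = build_download_request(coding_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = response.read()
    with open(out_path, "wb") as f:
        f.write(payload)
    return out_path


# Usage (performs a real network request; the filename is just an example):
# download_coding(9, "coding_9.txt")
```

With requests, which the question already imports, the same POST can be sent as requests.post(DOWNLOAD_URL, data={"id": "9"}); passing a dict as data makes requests form-encode the body.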

However, I would suggest first searching the site for a more convenient option than scraping. On a site like this, one is likely to be available (and indeed, it is in this case!)

It looks like all of the data you can get from that page (and its variants) can also be obtained from the Downloads->Schema page, where each item has a simple download link you can use, e.g.:
https://biobank.ndph.ox.ac.uk/showcase/schema.cgi?id=5

Answered By: Joel Rein