How to login and web scrape "support.oracle.com" using python3 requests?

Question

I am trying to scrape the URL below using Python requests, but I cannot get it to work.

Url: https://support.oracle.com/rs?type=doc&id=1439822.1

Not Working Code:

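import requests
from bs4 import BeautifulSoup

# The original post did not show `headers`; this value is an assumed placeholder.
headers = {"User-Agent": "Mozilla/5.0"}

s = requests.Session()
s.headers.update(headers)
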
r = s.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('[email protected]', 'mypass'), allow_redirects=True)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())

Expected output: the document page, as shown in a web browser after a successful login. I need the same output on the command line.

Current output: the login page is returned again.

Note: I can do this with the wget command below, but I need to do it with Python requests.

wget --user "[email protected]" --password "mypass" "https://support.oracle.com/rs?type=doc&id=1439822.1" -O /root/webout.html

I appreciate your help!

Asked By: M.S. Arun

Answer 1

I finally found the answer!

import requests
from bs4 import BeautifulSoup

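# First request: follows any redirects and ends at the final URL (r.url).
r = requests.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('[email protected]', 'mypass'), allow_redirects=True)

# Second request: fetch the final URL directly with the same credentials.
full_fetch = requests.get(r.url, auth=('[email protected]', 'mypass'), allow_redirects=True)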
soup = BeautifulSoup(full_fetch.text, 'html.parser')
print(soup.prettify())
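
A possible explanation for why the second request helps, offered as an assumption rather than a confirmed description of this site's login flow: requests drops the Authorization header when a redirect leads to a different host, so the credentials passed to the first call may never reach the final URL. Requesting r.url directly sends them there. The sketch below wraps the two steps in a helper (the name fetch_after_redirect is hypothetical) and takes a session object so that cookies set during the first request are reused in the second.

```python
def fetch_after_redirect(session, url, auth):
    """Fetch `url`; if redirects occurred, re-request the final URL with `auth`.

    Hypothetical helper, not part of the requests library. `session` is
    expected to behave like requests.Session (it only needs a .get method).
    """
    first = session.get(url, auth=auth, allow_redirects=True)
    if not first.history:
        # No redirect happened, so the credentials were sent to `url` itself.
        return first
    # Redirects happened: send the credentials straight to the final URL.
    return session.get(first.url, auth=auth, allow_redirects=True)
```

With requests installed, a call would look like fetch_after_redirect(requests.Session(), url, (user, password)). Whether the site accepts basic authentication at the final URL is not confirmed by this thread.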

Answered By: M.S. Arun

Answer 2

I tried the solution above, and it gave me exactly the same output you had at the beginning.
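
To tell whether a response is the requested document or still a login page, one option is to compare the host of the final URL with the host that was requested. This is a minimal sketch using only the standard library; the helper name redirected_to_other_host and the example URLs are hypothetical, and a different host is only a hint, not proof, that a login page was returned.

```python
from urllib.parse import urlsplit


def redirected_to_other_host(requested_url, final_url):
    """Return True if the final URL is on a different host than the requested one."""
    requested_host = urlsplit(requested_url).hostname or ""
    final_host = urlsplit(final_url).hostname or ""
    return requested_host != final_host


# Hypothetical URLs, for illustration only.
print(redirected_to_other_host("https://docs.example.com/page?id=1",
                               "https://login.example.com/sso"))
print(redirected_to_other_host("https://docs.example.com/page?id=1",
                               "https://docs.example.com/page?id=1"))
```

With requests, the final URL is available as response.url and the redirect chain as response.history, so the check can be applied to either of the requests shown in this thread.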

Answered By: Dawe
