How to log in and web scrape "support.oracle.com" using Python 3 requests?
Question:
I am trying to scrape the URL below using Python requests, but I cannot get it to work.
Url: https://support.oracle.com/rs?type=doc&id=1439822.1
Not Working Code:
import requests
from bs4 import BeautifulSoup
s = requests.session()
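# `headers` was not defined in the original post; this placeholder value is an assumption
headers = {"User-Agent": "Mozilla/5.0"}
s.headers.update(headers)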
r = s.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('[email protected]', 'mypass'), allow_redirects=True)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.prettify())
Expected output: the document page, as shown in a web browser after a successful login. I need the same output on the command line.
Current output: the login page is shown again.
Note: I can do this with the wget command below, but I need to do it with Python requests.
wget --user "[email protected]" --password "mypass" "https://support.oracle.com/rs?type=doc&id=1439822.1" -O /root/webout.html
I appreciate your help!
Answers:
I finally found the answer:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://support.oracle.com/rs?type=doc&id=1439822.1", auth=('[email protected]', 'mypass'), allow_redirects=True)
# Request the final (post-redirect) URL again, sending the credentials a second time
full_fetch = requests.get(r.url, auth=('[email protected]', 'mypass'), allow_redirects=True)
soup = BeautifulSoup(full_fetch.text, 'html.parser')
print(soup.prettify())
I tried that last solution and it gave me exactly the same output as you got at the beginning.
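Since the two-request approach works for one poster and not for another, it may help to check which page actually came back. The sketch below wraps the "fetch, then re-fetch the redirected URL with the same credentials" idea in a helper and adds a rough login-page check. The names `fetch_after_redirect` and `looks_like_login_page` are hypothetical, and the heuristic (looking for a password input) is an assumption, not something Oracle documents. One possible reason the second request is needed: requests removes the Authorization header when a redirect leaves the original host, so the credentials may never reach the final URL on the first call.

```python
def fetch_after_redirect(session, url, auth):
    """Fetch `url`, then fetch the final redirected URL again with the same credentials.

    `session` is expected to behave like requests.Session (a .get method
    returning an object with .url and .text).
    """
    first = session.get(url, auth=auth, allow_redirects=True)
    # first.url is where the redirect chain ended; resend the credentials there
    return session.get(first.url, auth=auth, allow_redirects=True)


def looks_like_login_page(html):
    """Rough heuristic (an assumption): a password input suggests a login form."""
    return 'type="password"' in html.lower()
```

With requests installed, this could be called as `fetch_after_redirect(requests.Session(), "https://support.oracle.com/rs?type=doc&id=1439822.1", ("user", "mypass"))`, then `looks_like_login_page(response.text)` tells you whether you still got the login form. If it returns True, the site is probably using a form or single sign-on login rather than HTTP basic authentication, and sending `auth=` alone will not be enough.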