Scraping cdn-cgi email protection
Question:
When I'm trying to scrape the email from https://www.kw.com/agent/UPA-6904130219335225344-3, I'm facing a problem: the page shows [email protected].
How can I solve this problem?
import requests as rq
from bs4 import BeautifulSoup as bs
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
"referer": "https://www.kw.com/","Cookie": "AWSALBCORS=+zJ8cI5SlUQSmSyZe9oowBrLJG26sqwO0IeWt1REISJhvdWfy3/YYfWgi32NgACJQv0K/QVwJoFAYnmOTgkWT/OihI7yH1M1kT+5IqLYyUjIYn1AyBPsN2JCO9dO"}
url = 'https://www.kw.com/agent/UPA-6904130219335225344-3'
html = rq.get(url, headers=headers)
soup = bs(html.text, 'html.parser')
email = soup.find('a', class_="AgentInformation__factBody")
print(email)
and the output:
<a aria-label="Agent E-mail" class="AgentInformation__factBody" href="/cdn-cgi/l/email-protection#c7a4a6aba2a587a6a4b5a2a2a5b5a8b3afa2b5b4b5a2a6abb3bee9a4a8aa" type="button"><span class="__cf_email__" data-cfemail="5734363b3235173634253232352538233f3225242532363b232e7934383a">[email protected]</span></a>
Answers:
The email is obfuscated by Cloudflare's email-protection feature, so it never appears as plain text in the raw HTML that requests fetches. To pull the email you can use an automation tool such as Selenium together with bs4, letting the page's JavaScript decode the address before you parse it:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.kw.com/agent/UPA-6904130219335225344-3')
driver.maximize_window()
time.sleep(3)
soup = BeautifulSoup(driver.page_source, 'lxml')
email = soup.find('a', class_="AgentInformation__factBody")
print(email.text)
Output:
[email protected]
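Alternatively, you can skip the browser entirely. Cloudflare's email protection uses a simple, well-documented obfuscation: the data-cfemail attribute is a hex string whose first byte is an XOR key, and every following byte is the address XOR-ed with that key. A small helper (sketched below, using the data-cfemail value from the question's own output) can decode the attribute that requests and bs4 already retrieved:

```python
def decode_cfemail(cfemail: str) -> str:
    # First hex byte is the XOR key; each remaining byte is a
    # character of the address XOR-ed with that key.
    key = int(cfemail[:2], 16)
    return bytes(
        int(cfemail[i:i + 2], 16) ^ key
        for i in range(2, len(cfemail), 2)
    ).decode("utf-8")

# data-cfemail value taken from the <span class="__cf_email__"> in the output above
print(decode_cfemail("5734363b3235173634253232352538233f3225242532363b232e7934383a"))
# → caleb@acreebrothersrealty.com
```

In the original requests/bs4 script, you would read the attribute with something like soup.find('span', class_='__cf_email__')['data-cfemail'] and pass it to this helper, avoiding the Selenium dependency altogether.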