Can't scrape breweries' names from a website using the requests module
Question:
I’ve created a script to collect the breweries’ names from this website using the requests module, but when I run it, it prints nothing. I looked for the names in the page source and also checked dev tools for any undocumented API the page might call, but had no luck.
import requests
from bs4 import BeautifulSoup
link = "https://www.brewersassociation.org/directories/breweries/"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}
res = requests.get(link, headers=headers)
soup = BeautifulSoup(res.text, "html.parser")
for item in soup.select(".company-content > h3[itemprop='name']"):
    print(item.text)
Answers:
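The brewery data appears to be loaded from a separate JSON file rather than embedded in the page’s HTML, so you can request that file directly. You can try: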
import requests
import pandas as pd
url = 'https://www.brewersassociation.org/wp-content/themes/ba2019/json-store/breweries/breweries.json'
data = requests.get(url).json()
df = pd.DataFrame(data)
# expand the nested BillingAddress dicts into separate columns (city, country, ...)
df = pd.concat([df, df.pop('BillingAddress').apply(pd.Series, dtype=object)], axis=1)
# drop the 'attributes' column, which is not needed in the output
df.pop('attributes')
# print sample data, total length should be 26802 breweries
# (to_markdown requires the tabulate package):
print(df.head().to_markdown(index=False))
Prints:
Id | Name | Parent | Phone | Website | Brewery_Type__c | Is_Craft_Brewery__c | Voting_Member__c | Membership_Record_Item__c | Membership_Record_Paid_Through_Date__c | Membership_Record_Status__c | Account_Badges__c | city | country | countryCode | geocodeAccuracy | latitude | longitude | postalCode | state | stateCode | street |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0014x000012jyoHAAQ | Brewery in Planning – Monterrey | | (811) 244-8078 | | Brewery In Planning | False | False | | | | | Monterrey | Mexico | MX | Block | 25.6444 | -100.275 | 64850 | | | Tucan 362 |
0014x000012jyoJAAQ | Sekinoichi-shuzo Co.,Ltd/Iwai Brewery | | +81-191-21-1144 | www.sekinoichi.co.jp | Brewpub | False | False | | | | | Ichinoseki-city | Japan | JP | Address | 38.9314 | 141.132 | 021-0885 | | | 5-42 Tamuracho |
0014x000012jyoKAAQ | Selby (Middleborough) Brewery Ltd | | 01757 702826 | | | False | False | | | | | Selby | United Kingdom | GB | Block | 53.7871 | -1.07141 | YO8 3LL | | | 131 Milgate |
0014x000012jyoLAAQ | SENDERO BREWING COMPANY | | | www.senderobrewing.com | Brewery In Planning | False | False | Brewery Membership | 2019-10-31 | Expired | | San Pedro Sula | Honduras | HN | City | 15.5039 | -88.0157 | 21102 | | | Los Alpes, Boulevard McKay |
0014x000012jyoMAAQ | Ser Bhum Microbrewery | | | | Micro | False | False | Brewery Membership | 2017-08-31 | Expired | | Thimphu | Bhutan | BT | | nan | nan | | | | Hongtsho Hongtsho |
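Since the question only asks for the names, pandas is not strictly needed. Here is a minimal sketch that pulls just the `Name` field, assuming the JSON file is a list of records with a `Name` key (which the columns printed above suggest). The helper names `fetch_records` and `brewery_names` are made up for illustration, and the demo runs on two hand-copied sample records so it works offline:

```python
from typing import Any, Iterable

# URL taken from the answer above.
URL = ("https://www.brewersassociation.org/wp-content/themes/ba2019/"
       "json-store/breweries/breweries.json")


def fetch_records(url: str = URL) -> list[dict[str, Any]]:
    """Download the JSON file (assumed to be a list of brewery records)."""
    import requests  # imported lazily so the helper below also runs offline

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def brewery_names(records: Iterable[dict[str, Any]]) -> list[str]:
    """Return the non-empty "Name" values, stripped, in their original order."""
    names = []
    for record in records:
        name = record.get("Name")
        # skip records whose name is missing, not a string, or blank
        if isinstance(name, str) and name.strip():
            names.append(name.strip())
    return names


# Offline demo on two records shaped like the sample rows printed above.
sample = [
    {"Id": "0014x000012jyoKAAQ", "Name": "Selby (Middleborough) Brewery Ltd",
     "BillingAddress": {"city": "Selby", "country": "United Kingdom"}},
    {"Id": "0014x000012jyoMAAQ", "Name": "Ser Bhum Microbrewery",
     "BillingAddress": {"city": "Thimphu", "country": "Bhutan"}},
]
print(brewery_names(sample))
```

With network access, `brewery_names(fetch_records())` should return the full list. If you already have the DataFrame from the answer, `df['Name'].tolist()` gives the same column without the extra helper.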