Scraping a website that doesn't have specific tags with classes
Question:
I am scraping a used-car website. I've got the make, model, year, and miles, but I don't know how to get the other details because they are also inside `li` tags. Here is all my code:
```python
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = 'https://jammer.ie/used-cars'
response = requests.get(url)
response.status_code  # 200 means the page was fetched (displayed when run in a notebook)

soup = BeautifulSoup(response.content, 'html.parser')
soup

# each car's details sit in a <div class="span-9 right-col">
results = soup.find_all('div', {'class': 'span-9 right-col'})
len(results)

results[0].find('h6', {'class': 'car-make'}).get_text()
results[0].find('p', {'class': 'model'}).get_text()
results[0].find('p', {'class': 'year'}).get_text()
# find() only returns the first <li>, which holds the mileage
results[0].find('li').get_text().replace('\n', "")
```
I get the information I want from the above code, but the other `li` tags contain `img` and `span` tags. How can I get the information from each of the `li` tags?
I am new to Python, so I would like the answer to be fairly simple and explained, please.
I tried using the `img` tag, but I don't think I used it right.
Answers:
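Since the question asks how the `img` and `span` inside each `li` fit together, it may help to see one list read on its own first. The sketch below parses a small hand-written snippet; that HTML is an assumption modelled on the selectors used in the answer's code (`.car--features li`, one icon `img` plus one value `span` per item), not copied from jammer.ie, so the live markup may differ. The icon's file name says *what* the feature is and the `span` text is its *value*.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the answer's selectors; the real page may differ.
html = """
<div class="car">
  <ul class="car--features">
    <li><img src="/images/icons/speed.png"><span>113144 miles</span></li>
    <li><img src="/images/icons/engine.png"><span>1.4 litres</span></li>
    <li><img src="/images/icons/paint.png"><span> Silver </span></li>
    <li>No icon here</li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

features = {}
for li in soup.select(".car--features li"):
    img = li.find("img")    # the icon inside this <li>, or None
    span = li.find("span")  # the value inside this <li>, or None
    if img is None or span is None:
        continue  # skip items that do not follow the icon + value pattern
    # "/images/icons/speed.png" -> "speed.png" -> "speed"
    name = img["src"].split("/")[-1].split(".")[0]
    features[name] = span.get_text(strip=True)

print(features)
```

Running it prints `{'speed': '113144 miles', 'engine': '1.4 litres', 'paint': 'Silver'}`; the `<li>` without an icon is skipped. Without the `if` check, `li.img["src"]` on such an item raises `TypeError`, because `li.img` is `None`.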
To get all features into a dataframe you can do:
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://jammer.ie/used-cars"
soup = BeautifulSoup(requests.get(url).text, "html.parser")

all_data = []
for car in soup.select(".car"):
    # make, model, year and price are the text pieces inside .top-info
    info = car.select_one(".top-info").get_text(strip=True, separator="|")
    make, model, year, price = info.split("|")

    features = {}
    for feature in car.select(".car--features li"):
        # name the feature after its icon file, e.g. ".../speed.png" -> "speed"
        k = feature.img["src"].split("/")[-1].split(".")[0]
        # the value is the text of the <span> inside this <li>
        v = feature.span.text
        features[f"feature_{k}"] = v

    all_data.append(
        {"make": make, "model": model, "year": year, "price": price, **features}
    )

df = pd.DataFrame(all_data)
print(df.to_markdown(index=False))  # to_markdown needs the "tabulate" package
```
Prints:
| make | model | year | price | feature_speed | feature_engine | feature_transmission | feature_owner | feature_door-icon1 | feature_petrol5 | feature_paint | feature_hatchback |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ford | Fiesta | 2010 | €5,950 | 113144 miles | 1.4 litres | Manual | 4 previous owners | 5 doors | Diesel | Silver | Hatchback |
| Volkswagen | Polo | 2013 | Price on application | 41000 miles | 1.2 litres | Automatic | nan | 5 doors | Petrol | Blue | Hatchback |
| Volkswagen | Polo | 2015 | Price on application | 27000 miles | 1.2 litres | Automatic | nan | 5 doors | Petrol | Red | Hatchback |
| Audi | A1 | 2014 | Price on application | 45000 miles | 1.4 litres | Automatic | nan | 3 doors | Petrol | White | Hatchback |
| Audi | A3 | 2014 | Price on application | 79000 miles | 1.4 litres | Automatic | nan | 5 doors | Petrol | White | Hatchback |
| Audi | A3 | 2008 | €4,450 | 147890 miles | 1.6 litres | Automatic | 3 previous owners | 3 doors | Petrol | Black | Hatchback |
| SEAT | Alhambra | 2018 | €29,950 | 134000 miles | 2.0 litres | Manual | 2 previous owners | 5 doors | Diesel | White | MPV |
| Volkswagen | Jetta | 2014 | €8,950 | 138569 miles | 1.6 litres | Manual | 3 previous owners | 4 doors | Diesel | Grey | Saloon |
| Volkswagen | Beetle | 2014 | Price on application | 66379 miles | 1.2 litres | Automatic | 1 previous owners | 2 doors | Petrol | Black | Hatchback |
| Volvo | XC60 | 2019 | €44,950 | 38214 miles | 2.0 litres | Automatic | 1 previous owners | 5 doors | Diesel | Black | Estate |
| Toyota | Aqua | 2014 | Price on application | 67405 miles | 1.5 litres | Automatic | 1 previous owners | 5 doors | nan | White | Hatchback |
| Audi | A3 | 2014 | Price on application | 51182 miles | 1.4 litres | Automatic | 1 previous owners | 4 doors | Petrol | Black | Saloon |
| Volkswagen | Golf | 2014 | Price on application | 68066 miles | 1.2 litres | Automatic | 1 previous owners | 5 doors | Petrol | Blue | Hatchback |
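Two things in that output may look odd to a beginner. The `nan` cells appear because `pd.DataFrame` builds its columns from all the dictionary keys it sees; a car with no matching `li` simply lacks that key (for example `feature_owner`), and pandas fills the gap with `NaN`. Column names such as `feature_door-icon1` and `feature_petrol5` come straight from the icon file names. A minimal sketch of both points, using made-up rows and a hypothetical rename mapping (the friendlier names are my own choice, not anything from the site):

```python
import pandas as pd

# Made-up rows in the same shape as all_data; the second car has no owner entry.
rows = [
    {"make": "Ford", "feature_owner": "4 previous owners", "feature_petrol5": "Diesel"},
    {"make": "Volkswagen", "feature_petrol5": "Petrol"},
]
df = pd.DataFrame(rows)  # the missing "feature_owner" key becomes NaN

# Hypothetical mapping from icon-derived names to friendlier column names.
df = df.rename(columns={"feature_owner": "owners", "feature_petrol5": "fuel"})

print(df)
print(df["owners"].isna().tolist())
```

The last line prints `[False, True]`. If you prefer a placeholder over `NaN`, `df.fillna("unknown")` returns a copy with the gaps filled.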