Scraping images in nested divs
Question:
I am trying to scrape the images from a personal imgur gallery: https://imgur.com/a/FIR1BL1 so I can then format them and prepare them for linking to my website. I want a list of all the image links, but for some reason I can’t get any. I also tried with a CSS selector but no luck. I suspect it might be because they are too deeply nested. Also I don’t have much experience with scraping.
This is what I came up with using Python and BeautifulSoup:
import requests
from bs4 import BeautifulSoup

# Make a GET request to the website
r = requests.get("https://imgur.com/a/FIR1BL1")
# Parse the HTML content
soup = BeautifulSoup(r.content, 'html.parser')
# Find all "div" elements with class "PostContent-imageWrapper-rounded"
divs = soup.find_all("div", class_="PostContent-imageWrapper-rounded")
if divs:
    # find_all() returns a list, so search each div for its "img" elements
    for div in divs:
        # Print the src attribute of each img element
        for img in div.find_all('img'):
            print(img['src'])
else:
    print("Div not found")
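Nesting depth on its own is not what stops BeautifulSoup: `find_all()` and `select()` search the whole tree, however deep. A minimal sketch, using a made-up HTML snippet (not Imgur's real markup) that only borrows the wrapper class name from the question, shows both approaches reaching deeply nested `img` tags when they are actually present in the HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical static markup: the wrapper class comes from the question,
# the surrounding structure and image URLs are invented for illustration.
SAMPLE_HTML = """
<div class="Gallery">
  <div class="Post">
    <div class="PostContent-imageWrapper-rounded">
      <div><div><img src="https://example.com/a.jpeg"></div></div>
    </div>
    <div class="PostContent-imageWrapper-rounded">
      <img src="https://example.com/b.jpeg">
    </div>
  </div>
</div>
"""


def srcs_with_find_all(html):
    """Collect img src values by looping over each wrapper div."""
    soup = BeautifulSoup(html, "html.parser")
    srcs = []
    for div in soup.find_all("div", class_="PostContent-imageWrapper-rounded"):
        # find_all() on a tag searches all descendants, not just direct children
        srcs.extend(img["src"] for img in div.find_all("img"))
    return srcs


def srcs_with_css(html):
    """Same lookup expressed as a single CSS descendant selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.select("div.PostContent-imageWrapper-rounded img")]


print(srcs_with_find_all(SAMPLE_HTML))
print(srcs_with_css(SAMPLE_HTML))
```

If both functions come back empty on the page that requests downloads, the likelier explanation is that the markup is not in the response at all, not that it is nested too deeply.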
Answers:
You are not finding them because they are not in the HTML you download: the images are loaded from the Imgur API by JavaScript after the page loads, and requests does not run JavaScript. To see the request that loads them:
- Open a new tab
- Open the developer tools and go to the Network tab
- Open your imgur link in that tab (https://imgur.com/a/FIR1BL1 in your case)
- Use the Network tab's search to find this request (or something similar):
  https://api.imgur.com/post/v1/albums/FIR1BL1
- This request returns the data you are looking for. Reconstruct a similar request in your code and parse the response with response.json()
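A minimal sketch of reconstructing that request: it pulls the album ID out of an album link and rebuilds the API URL with the same query parameters the code below uses (`client_id` and `include=media`). The helper name `album_api_url` is hypothetical, and the endpoint is simply the one observed in the Network tab, so treat it as something that may change without notice.

```python
from urllib.parse import urlencode, urlparse

# Endpoint observed in the browser's Network tab (assumed, may change).
API_BASE = "https://api.imgur.com/post/v1/albums/"


def album_api_url(album_url, client_id):
    """Build the album API URL for a link like https://imgur.com/a/FIR1BL1."""
    # Split the path into non-empty segments: "/a/FIR1BL1" -> ["a", "FIR1BL1"]
    path_parts = [p for p in urlparse(album_url).path.split("/") if p]
    if len(path_parts) < 2 or path_parts[0] != "a":
        raise ValueError(f"not an album URL: {album_url}")
    album_id = path_parts[1]
    # urlencode keeps the dict's insertion order in the query string
    query = urlencode({"client_id": client_id, "include": "media"})
    return f"{API_BASE}{album_id}?{query}"


print(album_api_url("https://imgur.com/a/FIR1BL1", "546c25a59c58ad7"))
```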
You can try to use their API:
import requests
# FIR1BL1 is the album ID
url = "https://api.imgur.com/post/v1/albums/FIR1BL1?client_id=546c25a59c58ad7&include=media"
data = requests.get(url).json()
# Each entry in "media" describes one image; "url" is its direct link
for m in data['media']:
    print(m['url'])
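A slightly more defensive sketch of the same call: it adds a timeout and `raise_for_status()` so HTTP errors surface instead of failing later inside `.json()`, and it separates parsing from the network call so the parsing can be checked offline. The `media` and `url` keys are the ones the loop above relies on; the helper names and the sample payload are made up for illustration.

```python
import requests

API_URL = "https://api.imgur.com/post/v1/albums/FIR1BL1"
PARAMS = {"client_id": "546c25a59c58ad7", "include": "media"}


def fetch_album(url=API_URL, params=PARAMS):
    """Fetch the album JSON; raises requests.HTTPError on a 4xx/5xx response."""
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    return response.json()


def media_urls(album):
    """Return the direct image URLs from an album payload, skipping entries without one."""
    return [m["url"] for m in album.get("media", []) if m.get("url")]


# Offline check against a made-up payload shaped like the output below
sample = {"media": [{"url": "https://i.imgur.com/q4UuhEq.jpeg"}, {"id": "no-url"}]}
print(media_urls(sample))
```

Against the live endpoint, usage would be `for link in media_urls(fetch_album()): print(link)`.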
Prints:
https://i.imgur.com/q4UuhEq.jpeg
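Since the goal in the question is to link the images from a website, one possible next step is turning the URL list into HTML. A minimal sketch, assuming plain `<img>` tags are what the site needs; the function name `img_tags` and the default alt text are placeholders:

```python
from html import escape


def img_tags(urls, alt="Album image"):
    """Render each image URL as an <img> tag, escaping attribute values."""
    return "\n".join(
        f'<img src="{escape(url, quote=True)}" alt="{escape(alt, quote=True)}">'
        for url in urls
    )


print(img_tags(["https://i.imgur.com/q4UuhEq.jpeg"]))
```

Escaping matters if a URL ever contains characters such as `&` or `"`, which would otherwise break the attribute.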