Python Beautiful Soup: I Want to Go Inside of a Tag Element
Question:
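import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = 'https://mydramalist.com/search?adv=people&na=3&so=popular&page=1'  # starting search page
headers = {'User-Agent': 'Mozilla/5.0'}  # assumed; the actual headers are not shown

while True:
    print(url)
    response = requests.get(url, headers=headers)
    # print(response.status_code)
    soup = BeautifulSoup(response.content, 'html.parser')
    footer = soup.select_one('li.page-item.nb.active')  # current page number in the pager
    if footer:
        print(footer.text.strip())
    for tags in soup.find_all('h6'):
        print(tags)
        # tags = soup.select_one('h6>a')  <-- I want to go inside this h6 element, follow its link and get data from there
    next_page = soup.select_one('li.page-item.next>a')
    if next_page:
        next_url = next_page.get('href')
        url = urljoin(url, next_url)  # relative "next" link -> absolute URL
    else:
        break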
Hi guys, I want to extract data from the current page, follow the clickable link inside each h6 tag to get data from that page, and then loop on to the next page. I cannot figure out how to do this with for loops. Please help, thank you. (I have already updated the code.)
Answers:
From the URL you provided, take the first result as an example.
Notice that its link, /people/232-lee-min-ho, is a sublink (a relative path without the domain).
All you have to do is scrape the sublink and add it to the main link, as shown below:
new_link = "https://mydramalist.com" + sublink
This gives you the full link https://mydramalist.com/people/232-lee-min-ho.
Now perform another requests.get(new_link) on the new link to retrieve its contents.
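An alternative to plain string concatenation, offered here as a sketch rather than what this answer relies on, is urllib.parse.urljoin from the standard library. It resolves the href relative to the page it was scraped from, so it also copes with hrefs that are already absolute or that only change the query string (such as a "next page" link). The helper name full_link is hypothetical, just for illustration.

```python
from urllib.parse import urljoin


def full_link(page_url: str, href: str) -> str:
    """Resolve an href scraped from page_url into an absolute URL (hypothetical helper)."""
    return urljoin(page_url, href)


search_page = "https://mydramalist.com/search?adv=people&na=3&so=popular&page=1"

# A root-relative sublink is joined to the site's scheme and host.
print(full_link(search_page, "/people/232-lee-min-ho"))
# An href that is already absolute is returned unchanged.
print(full_link(search_page, "https://mydramalist.com/people/232-lee-min-ho"))
```

This is the same urljoin call the question already uses for the next-page link, so one approach covers both kinds of links.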
Example code:
import requests
from bs4 import BeautifulSoup

url = 'https://mydramalist.com/search?adv=people&na=3&so=popular&page=1'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

# Each search result title is an <h6 class="text-primary title"> wrapping a link
for links in soup.find_all("h6", {"class": "text-primary title"}):
    a_tag = links.find("a")
    if a_tag is None:  # skip headings without a link
        continue
    sublink = a_tag.get("href")
    print(sublink)
    new_link = "https://mydramalist.com" + sublink
    response2 = requests.get(new_link)
    soup2 = BeautifulSoup(response2.content, 'html.parser')
    # Do all your searches on soup2 (the detail page) here, e.g.
    # soup2.find(...)
You should be able to get the rest.
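Putting this answer together with the pagination loop from the question, here is a minimal sketch (not tested against the live site) that visits every h6 link on each results page and then follows the next-page link. It assumes the selectors from the question, h6 > a and li.page-item.next > a, still match the site's markup. The function name crawl, the fetch parameter and the max_pages cap are illustrative choices, not part of the original code.

```python
from typing import Callable, Iterator
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def crawl(start_url: str, fetch: Callable[[str], str],
          max_pages: int = 50) -> Iterator[tuple[str, BeautifulSoup]]:
    """Walk the results pages and yield (detail_url, detail_soup) for every h6 link.

    `fetch` takes a URL and returns that page's HTML as text; it is passed in so
    the HTTP details (requests, headers, delays) stay up to the caller.
    """
    url = start_url
    for _ in range(max_pages):  # safety cap in case the "next" link never disappears
        soup = BeautifulSoup(fetch(url), "html.parser")

        # 1. Go inside every h6 on the current page and follow its link.
        for a_tag in soup.select("h6 > a[href]"):
            detail_url = urljoin(url, a_tag["href"])
            yield detail_url, BeautifulSoup(fetch(detail_url), "html.parser")

        # 2. Move on to the next results page, or stop when there is none.
        next_page = soup.select_one("li.page-item.next > a[href]")
        if next_page is None:
            break
        url = urljoin(url, next_page["href"])
```

For real use you could pass something like lambda u: requests.get(u, headers=headers).text as fetch, and do your searches on each detail_soup inside a for detail_url, detail_soup in crawl(url, ...) loop. Keeping the fetching separate makes the parsing logic easy to test on saved HTML.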