Python-Requests: How do I capture a specific section of HTML code from a response?
Question:
How do I request a specific tag? For example, I want to get the footer as output instead of the whole HTML page. How do I do that?
import requests as req
resp = req.get("site")
print(resp.text)
I want to get only this as the output instead of the whole HTML file; is it possible?
<footer class="footer">
<ol>
<li class="nav-item">
<a class="nav-link active" aria-current="page" href="index.html">Home</a>
</li>
<li class="nav-item">
<a class="nav-link" href="about_us.html"> About us</a>
</li>
<li class="nav-item">
<a class="nav-link" href="ticket.html"> Submit a ticket</a>
</li>
<li class="nav-item">
<a class="nav-link" href="tos.html"> Terms of use</a>
</li>
<li class="nav-item">
<a class="nav-link" href="donate.html"> Donate</a>
</li>
<li class="nav-item">
<a class="nav-link" href="news.html"> News</a>
</li>
<li class="nav-item">
<a class="nav-link" href="quotes.html"> Quotes</a>
</li>
</ol>
</footer>
Answers:
You can use requests-HTML instead of requests. It lets you pick elements out of an HTML page with CSS selectors, such as '.footer' for the class shown in the question.
This should work:
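from requests_html import HTMLSession

session = HTMLSession()
r = session.get('http://ilyabr.com')
# find() takes a CSS selector; first=True returns a single element rather than a list
footer = r.html.find('.footer', first=True)
print(footer.html)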
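If installing an extra package is not an option, roughly the same result can be sketched with the standard library's html.parser, applied to the resp.text string from the question. This is only an illustration: the names TagExtractor and extract_first are made up here, and the sketch assumes reasonably well-formed HTML (pages with unclosed tags are better handled by a real parser such as requests-HTML).

```python
from html.parser import HTMLParser


class TagExtractor(HTMLParser):
    """Collect the markup of the first element with a given tag name.

    Hypothetical helper for illustration; not part of requests or requests-HTML.
    """

    def __init__(self, tag):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.depth = 0      # how many target tags are currently open
        self.parts = []     # markup pieces collected inside the element
        self.done = False   # True once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == self.tag:
            self.depth += 1
        if self.depth:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied once, without an end tag
        if self.depth and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or not self.depth:
            return
        self.parts.append(f"</{tag}>")
        if tag == self.tag:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth and not self.done:
            self.parts.append(f"&#{name};")


def extract_first(html, tag):
    """Return the first <tag>...</tag> element as a string, or None if absent."""
    parser = TagExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.parts else None
```

With the code from the question, this would be called as print(extract_first(resp.text, "footer")). Note that it matches on the tag name only, not on the class attribute.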