Why am I getting a random string instead of the expected output from Beautiful Soup in Python?

Question:

Using the URL and soup below, I want to scrape the Subdivision Information section. I have copied the HTML for one house below:

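import requests
from bs4 import BeautifulSoup
your_header = {'User-Agent': 'Mozilla/5.0'}  # placeholder; the original headers were not shown
house_url = 'https://www.har.com/homedetail/2701-main-st-1910-houston-tx-77002/15331551'
house_response = requests.get(url=house_url, headers=your_header)
house_soup = BeautifulSoup(house_response.text, 'html.parser').find('div', {'class':'pt-2 pb-2 mr-4 pr-md-5 ml-4 pl-md-5'})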

Subdivision Section HTML

<div id="subDivisonInfo" class="lazy" data-contentname="subdivision-facts"><div class="mb-5 pb-5 border-bottom border-color--cement_light">
    <h2 tabindex="0">Subdivision Facts</h2>
                <a class="font_weight--bold font_size--large mr-4" href="/geomarketarea/100_midtown---houston">View Neighborhood Profile </a>
        <div class="mb-5 mt-4 pb-3">
        <a href="/geomarketarea/100_midtown---houston">
            <div class="mb-3 border_radius--round  image" style="height: 360px; width: 100%; background-size: cover; background-repeat: no-repeat; background-position: center center; background-image: url(&quot;https://api.mapbox.com/styles/v1/mapbox/streets-v11/static/path-1+0000ff-0.45+0000ff-0.45(u%7CstDfobeQnCbExAhAdCx%40xBXvDDlA%40lMF%7ECBvBEhJOVAnDo%40dDiClByCpCkEh%60%40sv%40zx%40vi%40rGhEx%40jAXF%5ElAB%60%40Bx%40Dx%40%3FfBPlBLxA%60%40fL%5EtMNhGFlBJdEDxAHlANdCHbBPpDDpA%3FhDFzE%40xB%40zADbBa%40sB%5Bk%40q%40gA%7DA%7DAw%40k%40yAcAqBwAyAgAiBuAyCyBoBuAmA%7B%40u%40i%40%7BBaB%7BC%7BAsBc%40kF%5D%7BE%3F%7BDHqCIuESk%40%3FCvCAv%40%3FtCJ%7EK%3FlA%7DLB%3F%7B%40yJB_K%40_E%3FwJ%40mX%40yA%40sIFgM%40%3FkDOcCm%40_IeBgG%3F%3FaAyDs%40_EWgBGsASuDQiNGoNKwJ%3F%3FQuH%7EClJjCfG)/auto/651x360?access_token=pk.eyJ1IjoiaGFyZGV2ZXJpY2siLCJhIjoiY2sxZ3FuNWJpMDFtbDNjbDJ0bnJnbnpkdyJ9.byj8yrbalnyCw4u9TNwYuA&quot;);">
    <img class="img-fluid img-loader" src="https://content.harstatic.com/img/common/loading1.gif" style="display: none;">
</div>

<script type="text/javascript">

    /*! domready (c) Dustin Diaz 2014 - License MIT */
    ;!function(e,t){"undefined"!=typeof module?module.exports=t():"function"==typeof define&&"object"==typeof define.amd?define(t):this.domready=t()}(0,function(){var e,t=[],o="object"==typeof document&&document,n=o&&o.documentElement.doScroll,d=o&&(n?/^loaded|^c/:/^loaded|^i|^c/).test(o.readyState);return!d&&o&&o.addEventListener("DOMContentLoaded",e=function(){for(o.removeEventListener("DOMContentLoaded",e),d=1;e=t.shift();)e()}),function(e){d?setTimeout(e,0):t.push(e)}});

</script>
<script type="text/javascript">
    domready(function() {
        HARMap.load().then(function(module) {
            var componentId = 'image24906579';
            var polygon = 'POLYGON((-95.372842651 29.762188072,-95.373816894 29.761474216,-95.374191992 29.76101599,-95.374483311 29.760352652,-95.374609142 29.759738202,-95.374640089 29.758819359,-95.374647572 29.758426112,-95.374691499 29.756117694,-95.374706659 29.75532102,-95.3746799 29.754718062,-95.374599516 29.752906601,-95.374594347 29.752790096,-95.374351409 29.751909331,-95.373661419 29.751078253,-95.372887724 29.750528187,-95.371869687 29.74980439,-95.362970840876 29.744465093909,-95.369806756 29.735213416,-95.370819903 29.733833779,-95.371197558 29.733537028,-95.371239769 29.733411918,-95.371629671 29.733245349,-95.371804383 29.73323255,-95.372090663 29.733211576,-95.372379167 29.733175792,-95.372896911 29.733184661,-95.373448298 29.733085864,-95.373897555 29.73302357,-95.376020952 29.732848991,-95.378367141 29.732692501,-95.379698574 29.732605591,-95.380251649 29.73256989,-95.381236514 29.732506316,-95.381686127 29.732477294,-95.382077218 29.732432106,-95.382753475 29.732353969,-95.383254109 29.732299798,-95.384141891 29.732214015,-95.384547373 29.732184995,-95.385401694 29.732177061,-95.386504599 29.7321448,-95.387113815 29.732128476,-95.38757473 29.732116124,-95.388065951 29.732085741,-95.387487144 29.732263493,-95.387266095 29.732397447,-95.386907035 29.732649085,-95.386438833 29.733120089,-95.386221661 29.733399038,-95.385875385 29.733846224,-95.385438799 29.734415138,-95.38508092 29.734869849,-95.384651063 29.735397347,-95.384037746 29.736172341,-95.383612227 29.736729284,-95.383311768 29.737122539,-95.383099783 29.737389038,-95.382608073 29.738007194,-95.382145211 29.738793482,-95.381972784 29.739372769,-95.381824272 29.740550975,-95.381818116 29.741650627,-95.381867014 29.742586972,-95.381820616 29.743316543,-95.38172399 29.744387766,-95.381721351 29.744610716,-95.382483057 29.744631286,-95.382763059 29.744636527,-95.383509646 29.74463557,-95.385588363 29.744584135,-95.385979746 29.744575193,-95.386003821 29.746807942,-95.385699411 
29.746810253,-95.385718338 29.748696241,-95.385725089 29.750624573,-95.385732125 29.75158054,-95.385735183 29.753459289,-95.385748282 29.757531403,-95.385758824 29.757976236,-95.385799269 29.75968271,-95.385808634 29.76196431,-95.384952318 29.761964306,-95.384292125 29.762037035,-95.382689529 29.76226967,-95.381370964 29.762781324,-95.381373603 29.762782609,-95.380443808 29.763114063,-95.379476498 29.763374528,-95.378959294 29.763485938,-95.378540344 29.763528353,-95.377633441 29.763629227,-95.375180147 29.763724096,-95.37270141 29.763764494,-95.370824522 29.763815179,-95.370818039 29.76381558,-95.369274785 29.76391082,-95.371101581 29.763114639,-95.372416403 29.762414919,-95.372842651 29.762188072))';
            var node = $('.' + componentId).removeClass(componentId);

            // var result = module.StaticMap.custom.withPolygon(node.width(), node.height(), polygon)
            // result.backgroundImage(node);

            var result = module.StaticMap.custom.withPolygon(node.width(), node.height(), polygon)
            result.backgroundImage(node);

            /*var geometry = module.geometry;
            var points = geometry.pointsFromWKT(polygon);
            //console.log(points);
            if(points.length > 100) { points = geometry.simplifyPolygon(points, 0.0001); }
            if(points.length > 100) { points = geometry.simplifyPolygon(points, 0.001); }
            //console.log(points);
            var encString = geometry.encodePath(points);
            var width = node.width();
            var height = node.height();
            if(!width) { console.error('width cannot be empty!'); }
            if(!height) { console.error('height cannot be empty!'); }
            var path = encodeURIComponent("weight:1|fillcolor:blue|enc:" + encString);
            var url = "/api/staticmap?size="+ width +"x"+ height +"&path="+ path + "&client=gme-houstonrealtorsinformation";
           // alert(url);
            //$(node).html('<a class="pointer" href="'+url+'" id="hoodMapStaticLink"></a><img />');
            var image = new Image();
            image.onload = image.onerror = function() { node.find('img').remove(); }
            image.src = url;
            $(node).css('background-image', 'url(' + url + ')');*/
        });
    });
</script>       </a>
    </div>
                    <h3 class="mt-5 pb-3" tabindex="0">Facts (Based on Active listings)</h3>
        <div class="row">
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Market Area Name</div>
                <div class="font_size--large font_weight--regular">Midtown - Houston</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Home For Sales</div>
                <div class="font_size--large font_weight--regular">104</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Average List Price</div>
                <div class="font_size--large font_weight--regular">$428,844</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Average Bedrooms</div>
                <div class="font_size--large font_weight--regular">2.27</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Average Baths</div>
                <div class="font_size--large font_weight--regular">2.07</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Average Sqft</div>
                <div class="font_size--large font_weight--regular">1,873</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Average Price/Sqft</div>
                <div class="font_size--large font_weight--regular">$236.48</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Home For Lease</div>
                <div class="font_size--large font_weight--regular">96</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Average Lease</div>
                <div class="font_size--large font_weight--regular">$2,396</div>
            </div>
                        <div class="col-md-4 col-6 mb-4">
                <div class="font_weight--bold font_size--small_extra">Average Lease/Sqft</div>
                <div class="font_size--large font_weight--regular">$1.76</div>
            </div>
                    </div>
            </div>
</div>

However, whenever I use BeautifulSoup to get text such as "Average List Price: $428,844", this is the output I get:

house_soup.find('div', {'id': 'subDivisonInfo'}).find('div', {'class': 'row'}).find_all('div', {'class': 'col-md-4 col-6 mb-4'})[0].get_text()
'n-----------n-----------n'

Why does it return this string instead of the actual text?

Asked By: Josh


Answers:

The required data is loaded from an external source via AJAX, so you have to request the API URL instead of the page URL.
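The listing ID at the end of the page URL (15331551) is the same one used in the API URL below, so for scraping several houses the API URL could be derived from the page URL. This is a minimal sketch under that assumption; `subdivision_facts_api_url` is a hypothetical helper, and it assumes the `getSubdivisionFacts` pattern holds for every listing, which this thread only shows for one.

```python
from urllib.parse import urlparse


def subdivision_facts_api_url(house_url):
    """Build the facts endpoint URL from a HAR listing URL (hypothetical helper).

    Assumes the numeric listing ID is the last path segment of the page URL
    and that the endpoint pattern from this answer applies to every listing.
    """
    path = urlparse(house_url).path.rstrip('/')
    listing_id = path.rsplit('/', 1)[-1]
    if not listing_id.isdigit():
        raise ValueError(f'no numeric listing ID at the end of {house_url!r}')
    return f'https://www.har.com/api/getSubdivisionFacts/{listing_id}'


print(subdivision_facts_api_url(
    'https://www.har.com/homedetail/2701-main-st-1910-houston-tx-77002/15331551'))
# → https://www.har.com/api/getSubdivisionFacts/15331551
```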

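import requests
from bs4 import BeautifulSoup

# API endpoint the page calls via AJAX to load the subdivision facts for listing 15331551
api_url = 'https://www.har.com/api/getSubdivisionFacts/15331551'
response = requests.get(api_url)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'lxml')
# Select the label div containing "Average List Price"; the value is in its next <div> sibling
price = soup.select_one('[class="col-md-4 col-6 mb-4"] > div:-soup-contains("Average List Price")').find_next_sibling('div')
print(price.text)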

Output:

$428,844
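To collect every fact rather than only the list price, the same label/next-sibling pattern can be applied to each cell. This is a minimal sketch, assuming the API response uses the same markup as the HTML in the question; `parse_subdivision_facts` is a hypothetical helper name, and it uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup


def parse_subdivision_facts(html):
    """Map each fact label (e.g. 'Average List Price') to its displayed value.

    Assumes every fact is a 'col-md-4 col-6 mb-4' cell whose first <div> holds
    the label and whose next sibling <div> holds the value.
    """
    soup = BeautifulSoup(html, 'html.parser')
    facts = {}
    for cell in soup.select('div.col-md-4.col-6.mb-4'):
        label_div = cell.find('div')
        if label_div is None:
            continue
        value_div = label_div.find_next_sibling('div')
        label = label_div.get_text(strip=True)
        value = value_div.get_text(strip=True) if value_div else ''
        if label:
            facts[label] = value
    return facts


# Two cells copied from the HTML in the question
sample = """
<div class="row">
  <div class="col-md-4 col-6 mb-4">
    <div class="font_weight--bold font_size--small_extra">Average List Price</div>
    <div class="font_size--large font_weight--regular">$428,844</div>
  </div>
  <div class="col-md-4 col-6 mb-4">
    <div class="font_weight--bold font_size--small_extra">Average Sqft</div>
    <div class="font_size--large font_weight--regular">1,873</div>
  </div>
</div>
"""
print(parse_subdivision_facts(sample))
# → {'Average List Price': '$428,844', 'Average Sqft': '1,873'}
```

Against the live endpoint this would be called as `parse_subdivision_facts(requests.get(api_url).text)`, provided the response still has this structure.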
Answered By: F.Hoque

Because a script runs when you open the URL in a browser, and that script fetches the data. Try performing a GET request in Python and check the HTML contents: the initial HTML does not contain the details you are looking for, such as "Average List Price".

Answered By: zvz