Trying to get text ("Mobilya") from HTML via XPath with Scrapy

Question:

Below is the HTML I am working with. I am trying to get the "Gardıroplar" text, but my XPath returns an empty result.

The markup starts with <ol class="nav align-items-center flex-nowrap text-nowrap overflow-auto hide-scrollbar">

<li>
  <a href="/">
    <svg class="icon-home m-0">
      <use
        xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-home"
      ></use>
    </svg>
  </a>
</li>
<li>
  <svg class="icon-arrow2 m-0">
    <use
      xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
    ></use>
  </svg>
  <a href="/mobilya/c/109">Mobilya</a>
</li>
<li>
  <svg class="icon-arrow1 m-0">
    <use
      xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
    ></use>
  </svg>
  <span class="top-breadcrumb">
    <a
      class="d-inline-flex align-items-center border pl-10 rounded-sm"
      href="/mobilya/gardiroplar/c/109011"
      data-toggle="dropdown"
      aria-expanded="false"
      >Gardıroplar<svg class="icon-arrow7 m-0 rotate-top">
        <use
          xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow7"
        ></use>
      </svg>
    </a>
    <ul class="dropdown-menu px-15 py-0 border-0 text-c2">
      <li class="border-bottom py-10">
        <a
          class="d-flex align-items-center justify-content-between pl-5 py-5 reverse font-weight-bold"
          href="/mobilya/gardiroplar/c/109011"
          >Gardıroplar</a
        >
      </li>

      <li class="px-10 border-bottom">
        <a
          class="d-flex align-items-center justify-content-between pl-5 py-5 reverse"
          href="/gardiroplar/kapakli-gardiroplar/c/109011002"
          >Kapaklı Gardıroplar<svg class="icon-arrow1 ml-5">
            <use
              xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
            ></use>
          </svg>
        </a>
      </li>
      <li class="px-10 border-bottom">
        <a
          class="d-flex align-items-center justify-content-between pl-5 py-5 reverse"
          href="/gardiroplar/surgulu-gardiroplar/c/109011003"
          >Sürgülü Gardıroplar<svg class="icon-arrow1 ml-5">
            <use
              xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
            ></use>
          </svg>
        </a>
      </li>
      <li class="px-10 border-bottom">
        <a
          class="d-flex align-items-center justify-content-between pl-5 py-5 reverse"
          href="/gardiroplar/bez-dolaplar/c/109011001"
          >Bez Dolaplar<svg class="icon-arrow1 ml-5">
            <use
              xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
            ></use>
          </svg>
        </a>
      </li>
    </ul>
  </span>
</li>
<li>
  <svg class="icon-arrow1 m-0">
    <use
      xlink_href="/_ui/responsive/theme-alpha/images/icons.svg#icon-arrow1"
    ></use>
  </svg>
  <a href="/gardiroplar/kapakli-gardiroplar/c/109011002">Kapaklı Gardıroplar</a>
</li>
</ol>

My code:

response.xpath('//ol[@class="nav.align-items-center.flex-nowrap.text-nowrap.overflow-auto.hide-scrollbar.tab-title"]//li[svg[contains(@class,"icon-arrow2")]]/text()').getall()
Asked By: destan


Answers:

If the icon-arrow2 class is a fixed value there, you can use the following XPath:

"//li[./*[contains(@class,'icon-arrow2')]]//a"

The complete command is:

response.xpath("//li[./*[contains(@class,'icon-arrow2')]]//a/text()").getall()

UPD
Now that you have shared the actual link to that page, I can give you a better locator.
This will work:

response.xpath("//li[./*[contains(@class,'icon-arrow')]]/a[contains(@href,'mob')]/text()").getall()

UPD2
This will give you the Kapaklı Gardıroplar text:

response.xpath("//li[./*[contains(@class,'icon-arrow')]]/a[contains(@href,'gar')]/text()").getall()

UPD3
This will give you the Gardıroplar text you asked for:

response.xpath("//li[@class='border-bottom py-10']/a/text()").getall()
Answered By: Prophet

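The issue you are facing is that you're using CSS selector syntax inside an XPath expression. The dots between class names belong to CSS; in XPath, @class="..." is compared against the whole attribute string. Assuming the rest of your path is accurate, you need to use the full text of the class attribute, with spaces between the class names.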

response.xpath('//ol[@class="nav align-items-center flex-nowrap text-nowrap overflow-auto hide-scrollbar tab-title"]//li[svg[contains(@class,"icon-arrow2")]]/text()').getall()

BTW, Prophet's answer is the way to go; I just wanted to explain what your code is missing.

Answered By: Alexander