In Python Scrapy, with multiple projects, how do you import a class method from one project into another?

Question:

PROBLEM

I need to import a function/method defined in Scrapy project #1 and use it in one of the spiders of Scrapy project #2.

DIRECTORY STRUCTURE

For starters, here’s my directory structure (assume these are all under one root directory):

/importables    # scrapy project #1 
    /importables
        /spiders
            title_collection.py    # take the class methods defined here

/alibaba        # scrapy project #2
    /alibaba
        /spiders
            alibabaPage.py         # use them here

WHAT I WANT

As shown above, I am trying to get scrapy to:

  1. Run alibabaPage.py
  2. From title_collection.py, import a class method named saveTitleInTitlesCollection out of a class in that file named TitleCollectionSpider
  3. Use saveTitleInTitlesCollection inside functions that are called in the alibabaPage.py spider.
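One point underlies all of this: Python cannot import a method on its own. `from module import name` only finds names defined at the top level of a module, so saveTitleInTitlesCollection has to be reached through its class. A minimal sketch, using a hypothetical plain-Python stand-in for TitleCollectionSpider (the method bodies, the `saved_titles` list, and the `normalizeTitle` helper are invented for illustration):

```python
class TitleCollectionSpider:
    """Hypothetical stand-in for the class in title_collection.py (no Scrapy)."""

    def __init__(self):
        self.saved_titles = []

    def saveTitleInTitlesCollection(self, title):
        # Instance method: it uses self, so it needs an instance to run on.
        self.saved_titles.append(title)
        return title

    @staticmethod
    def normalizeTitle(title):
        # Hypothetical static helper: callable on the class, no instance needed.
        return title.strip().lower()


# `from title_collection import saveTitleInTitlesCollection` cannot work,
# because the method is not a module-level name. Import the class, then use it.
spider = TitleCollectionSpider()
spider.saveTitleInTitlesCollection("Phone Case")

print(TitleCollectionSpider.normalizeTitle("  Phone Case "))  # phone case
```

If the real method does not depend on any spider state, declaring it a `@staticmethod` (or moving it to a module-level function) would let other projects call it without instantiating the spider.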

HOW IT’S GOING…

Here’s what I’ve done so far at the top of alibabaPage.py:

  1. from importables.importables.spiders import saveTitleInTitlesCollection

    • Nope. It fails with builtins.ModuleNotFoundError: No module named 'importables'

    • How can that be? I took that import from this answer.

  2. sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))
    Then I did this…
    from importables.importables.spiders import saveTitleInTitlesCollection

    • Nope. It fails with the same error as the first attempt. (Taken from this answer.)
  3. Re-reading the post linked in attempt #1, I realized its author put the two files in the same directory, so I tried that, copying title_collection.py in like so:

/alibaba        # scrapy project #2
    /alibaba
        /spiders
            alibabaPage.py         # use them here
            title_collection.py    # added this
  • Well, that appeared to work but didn’t in the end. This threw no errors…
from alibaba.spiders.title_collection import TitleCollectionSpiderAlibaba 

That led me to assume everything worked. Then I added a test function named testForImport, tried importing it, and got this error: builtins.ModuleNotFoundError: No module named 'alibaba.spiders.title_collection.testForImport'; 'alibaba.spiders.title_collection' is not a package

  • Unfortunately, this wasn’t actually achieving the goal of importing the class method I want to use, named saveTitleInTitlesCollection.

  • I have numerous Scrapy projects and really just want one project of spiders that I can easily import into every other project.

  • This is not that solution, so the quest for a true way to import a bunch of class methods from one Scrapy project into many others continues… I wonder whether this can even be done.

  • WAIT, this actually didn't work after all, because I then got builtins.ModuleNotFoundError:
    No module named 'TitleCollectionSpiderAlibaba'

  4. from alibaba.spiders.title_collection import testForImport
  • Nope. This failed too.

    But this time it gave me a slightly different error…

builtins.ImportError: 
cannot import name 'testForImport' from 'alibaba.spiders.title_collection' 
(C:\Users\User\scrapy-webscrapers\alibaba\alibaba\spiders\title_collection.py)
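Attempt 2 most likely fails because '../..' does not climb far enough. alibabaPage.py sits three levels below the folder that holds both projects (spiders → inner alibaba → outer alibaba → root), so os.path.join(os.path.dirname(__file__), '../..') points at the outer /alibaba folder, where there is no importables package, hence the same ModuleNotFoundError. That attempt also imports from the spiders package rather than from the title_collection module. Separately, the "is not a package" message in attempt 3 is what Python reports for a dotted `import a.b.module.name` whose last part is not a module, since a plain `import` only accepts modules and packages.

A minimal sketch that computes the shared root from `__file__` with pathlib; the helper name `add_shared_root_to_path` and the example path are hypothetical:

```python
import os
import sys
from pathlib import Path


def add_shared_root_to_path(spider_file):
    """Append the folder holding both projects to sys.path and return it.

    Assumes the question's layout: <root>/alibaba/alibaba/spiders/alibabaPage.py
    parents[0] is spiders/, [1] the inner alibaba package,
    [2] the outer alibaba project folder, [3] the shared root.
    """
    # abspath makes a relative __file__ absolute before walking up.
    root = Path(os.path.abspath(spider_file)).parents[3]
    if str(root) not in sys.path:
        sys.path.append(str(root))
    return root


# In alibabaPage.py this would be called with __file__, followed by:
#   from importables.importables.spiders.title_collection import TitleCollectionSpider
root = add_shared_root_to_path("/home/user/scrapers/alibaba/alibaba/spiders/alibabaPage.py")
print(root)
```

Basing the path on `__file__` means the import no longer depends on which directory `scrapy crawl` is started from.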

Consider this solved.

Thanks to Umair's answer, I was able to do this:

# typical scrapy spider imports...
import os
import sys

import scrapy
from ..items import AlibabaItem

# import this near the top of the page
sys.path.append(os.path.join(os.path.abspath('../')))
from importables.importables.spiders.title_collection import TitleCollectionSpider


...

# then, in parse method I did this...
def parse(self, response):
    alibaba_item = AlibabaItem()
    title_collection_spider_obj = TitleCollectionSpider()
    title_collection_spider_obj.testForImportTitlesCollection()

# terminal showed this, proving it worked...
# "testForImport worked if you see this!"

Asked By: rom


Answers:

Inside alibabaPage.py, you can do this to import a class from outside your Scrapy project folder:

import os, sys
sys.path.append(os.path.join(os.path.abspath('../')))

from importables.importables.spiders.title_collection import TitleCollectionSpider    

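This imports the TitleCollectionSpider class from title_collection.py into alibabaPage.py. Note that os.path.abspath('../') is resolved against the current working directory, not against alibabaPage.py, so it points at the folder containing both projects only when scrapy crawl is run from the alibaba project folder (the one with scrapy.cfg).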
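To see this technique end to end without two real Scrapy projects, here is a self-contained sketch that recreates the question's directory layout in a temporary folder and applies the same sys.path line. The directory names come from the question; the file contents are hypothetical stand-ins (a plain class instead of a scrapy.Spider subclass, so Scrapy is not needed). In a default `scrapy startproject` layout the outer /importables folder holds scrapy.cfg but no `__init__.py`, so Python 3.3+ imports it as a namespace package, which is why the doubled `importables.importables` path resolves.

```python
import importlib
import os
import sys
import tempfile
import textwrap
from pathlib import Path

# Recreate the question's layout in a throwaway folder. File contents are
# hypothetical stand-ins; only the directory names come from the question.
root = Path(tempfile.mkdtemp())
inner = root / "importables" / "importables"
(inner / "spiders").mkdir(parents=True)
(inner / "__init__.py").write_text("")
(inner / "spiders" / "__init__.py").write_text("")
(inner / "spiders" / "title_collection.py").write_text(textwrap.dedent('''
    class TitleCollectionSpider:
        def testForImportTitlesCollection(self):
            return "testForImport worked if you see this!"
'''))
(root / "alibaba" / "alibaba" / "spiders").mkdir(parents=True)

# Simulate running `scrapy crawl` from the alibaba project folder, then apply
# the answer's fix: put the parent folder (the shared root) on sys.path.
os.chdir(root / "alibaba")
sys.path.append(os.path.join(os.path.abspath('../')))
importlib.invalidate_caches()  # the module files were created at runtime

# The outer importables/ has no __init__.py, so it imports as a namespace package.
from importables.importables.spiders.title_collection import TitleCollectionSpider

print(TitleCollectionSpider().testForImportTitlesCollection())
# testForImport worked if you see this!
```

Note that the sketch changes the process's working directory and sys.path, so it is meant to be run on its own.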

Answered By: Umair Ayub