PyTorch throws OSError on Detectron2LayoutModel()

Question:

I’ve been trying to read PDF pages as images for extraction purposes.

I found that layoutparser serves this purpose by identifying blocks of text. However, when I try to create a Detectron2-based layout detection model, I get the following error:

codeblock:

import layoutparser as lp

model = lp.Detectron2LayoutModel(
    config_path='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
    label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
)

error:

OSError                                   Traceback (most recent call last)
<ipython-input-16-893fdc4d537c> in <module>
      2             config_path ='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config', 
      3             label_map   = {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}, 
----> 4             extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8] 
      5         )
      6 
.
.
.
d:\softwares\python3\lib\site-packages\portalocker\utils.py in _get_fh(self)
    269     def _get_fh(self) -> typing.IO:
    270         '''Get a new filehandle'''
--> 271         return open(self.filename, self.mode, **self.file_open_kwargs)
    272 
    273     def _get_lock(self, fh: typing.IO) -> typing.IO:

OSError: [Errno 22] Invalid argument: 'C:\Users\user/.torch/iopath_cache\s/nau5ut6zgthunil\config.yaml?dl=1.lock'

I checked the destination folder and, surprisingly, there is no config.yaml file there, which may be why the error occurs. I tried uninstalling and reinstalling PyTorch, hoping the .yaml files would then be installed correctly. Unfortunately, the problem remains.

I would appreciate a solution, or an alternative approach if one exists.

Asked By: iGetRandomBugs

Answers:

The config.yaml only contains the model configuration and a URL for downloading the model weights. I’m not sure why it isn’t being downloaded automatically for you, but you can also get it from the model zoo page: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html

The one you’re looking for is mask_rcnn_X_101_32x8d_FPN_3x trained on PubLayNet. Once you have downloaded the yaml file you can use the same code snippet, only changing the path.

model = lp.Detectron2LayoutModel(config_path='path/to/config.yaml', ...)
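
As a rough sketch of the call above with its arguments spelled out: the helper below checks that the downloaded config file exists before building the model, so a wrong path fails with a clear message. The helper name load_local_model is hypothetical, and the label map and threshold are simply copied from the question. layoutparser is imported inside the function, so the path check runs even where it is not installed.

```python
from pathlib import Path

# Label map copied from the question (PubLayNet classes).
LABEL_MAP = {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"}


def load_local_model(config_path):
    """Build the layout model from a manually downloaded config.yaml.

    Hypothetical helper: raises FileNotFoundError early if the file is missing.
    """
    config_file = Path(config_path)
    if not config_file.is_file():
        raise FileNotFoundError(f"config file not found: {config_file}")

    # Imported lazily so the check above does not require layoutparser.
    import layoutparser as lp

    return lp.Detectron2LayoutModel(
        config_path=str(config_file),
        label_map=LABEL_MAP,
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
    )
```

It would be called as model = load_local_model('path/to/config.yaml').
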
Answered By: Carlos Amaral

I solved it by setting pytesseract’s tesseract_cmd to the path of tesseract.exe, so that the Tesseract CLI can be invoked from Jupyter:

pytesseract.pytesseract.tesseract_cmd = r'path\to\folder\Tesseract_OCR\tesseract.exe'

After that, calling Detectron2LayoutModel no longer threw the error.

I referred to this thread: Pytesseract: “TesseractNotFound Error: tesseract is not installed or it’s not in your path”, how do I fix this?
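
A minimal sketch of the same setting with a sanity check, assuming you either know the install location or have tesseract on your PATH. The helper names resolve_tesseract and configure_pytesseract are hypothetical; only pytesseract.pytesseract.tesseract_cmd comes from the snippet above.

```python
import shutil
from pathlib import Path


def resolve_tesseract(explicit_path=None):
    """Return a tesseract executable path, or None if none can be found."""
    # Prefer the explicitly given location when that file really exists.
    if explicit_path and Path(explicit_path).is_file():
        return str(Path(explicit_path))
    # Otherwise fall back to whatever "tesseract" resolves to on PATH.
    return shutil.which("tesseract")


def configure_pytesseract(explicit_path=None):
    """Point pytesseract at the resolved executable, or fail loudly."""
    exe = resolve_tesseract(explicit_path)
    if exe is None:
        raise FileNotFoundError("tesseract executable not found")
    # Imported lazily so resolve_tesseract works without pytesseract.
    import pytesseract
    pytesseract.pytesseract.tesseract_cmd = exe
    return exe
```

Calling configure_pytesseract with the path to tesseract.exe before creating the model mirrors the one-line assignment above, but raises immediately if the path is wrong.
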

Answered By: iGetRandomBugs

I got a similar error too, and worked around it manually on Windows.

I am using your case as an example: OSError: [Errno 22] Invalid argument: 'C:\Users\user/.torch/iopath_cache\s/nau5ut6zgthunil\config.yaml?dl=1.lock'

Please follow these steps.

  1. Navigate to C:\Users\user/.torch/iopath_cache\s/nau5ut6zgthunil\config.yaml and open that config.yaml file.

  2. Scroll down to the line WEIGHTS: https://www.dropbox.com/s/h7th27jfv19rxiy/model_final.pth?dl=1 (it should be around line 265).

  3. Copy that link and paste it into your browser; a file named model_final.pth will be downloaded. Copy this file to a folder of your choice.

  4. Replace the WEIGHTS value with the local path: WEIGHTS: your_desired_folder/model_final.pth

  5. Save the file and run the code again. It works!
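
The edit to the WEIGHTS line can also be scripted. The sketch below is one possible way to do it, not part of layoutparser: point_weights_to_local is a hypothetical helper that rewrites the config as plain text, and it assumes the file has a single key named exactly WEIGHTS (keys such as BBOX_REG_WEIGHTS are left alone). The weights file still has to be downloaded by hand first.

```python
import re
from pathlib import Path

# Matches a line whose key is exactly WEIGHTS, keeping its indentation.
_WEIGHTS_LINE = re.compile(r"^([ \t]*WEIGHTS:[ \t]*).*$", re.MULTILINE)


def point_weights_to_local(config_path, local_weights):
    """Replace the WEIGHTS value in config.yaml with a local file path.

    Returns the number of lines changed; raises ValueError if none matched.
    """
    config_file = Path(config_path)
    text = config_file.read_text(encoding="utf-8")

    # Forward slashes keep Windows paths free of backslashes in the YAML.
    new_value = Path(local_weights).as_posix()

    # A function replacement avoids backslash-escape handling in the path.
    new_text, count = _WEIGHTS_LINE.subn(lambda m: m.group(1) + new_value, text)
    if count == 0:
        raise ValueError(f"no WEIGHTS line found in {config_file}")

    config_file.write_text(new_text, encoding="utf-8")
    return count
```

For example, point_weights_to_local(cached_config, 'your_desired_folder/model_final.pth'), where cached_config is the config.yaml path from step 1.
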

There is one small workaround that I think you need to apply before doing this, if you have not done so already: the iopath workaround.

https://github.com/Layout-Parser/layout-parser/issues/15 (Github link to the issue)

Hope this helps!

One fix is to clone and modify iopath, since this is a Windows file-naming error.
Clone iopath (this version worked for me):

git clone https://github.com/facebookresearch/iopath --single-branch --branch v0.1.8

Change iopath/iopath/common/file_io.py; class: HTTPURLHandler; method: _get_local_path; line 753 (line number might change with newer versions)

from this:

filename = path.split("/")[-1]

to this:

filename = parsed_url.path.split("/")[-1]

Then call:

pip install -e iopath

And it will work.
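
To see why the one-line change in file_io.py matters, here is a small standalone sketch. It assumes parsed_url is the result of urllib.parse.urlparse(path), as the variable name suggests; the two function names are made up for the comparison, and the URL is the weights link quoted earlier in this thread.

```python
from urllib.parse import urlparse

url = "https://www.dropbox.com/s/h7th27jfv19rxiy/model_final.pth?dl=1"


def filename_before(path):
    # Original line: splits the full URL, so the query string stays attached.
    return path.split("/")[-1]


def filename_after(path):
    # Modified line: splits only the URL path component, dropping "?dl=1".
    parsed_url = urlparse(path)
    return parsed_url.path.split("/")[-1]


print(filename_before(url))  # model_final.pth?dl=1
print(filename_after(url))   # model_final.pth
```

Windows does not allow "?" in file names, which is why opening the ".../config.yaml?dl=1.lock" cache file fails with Errno 22, while the name without the query string is valid.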

Also, detectron2 v0.4 didn’t work for me. I installed the latest version via:

python -m pip install "git+https://github.com/facebookresearch/detectron2.git"

The detectron2 installation will overwrite your modified iopath version because of a dependency conflict, so it’s best to install iopath last. But it should work OK.

Update: It should also work with iopath v0.1.7, but I mentioned the newer one because it is a tagged release on their GitHub.

Answered By: keshav khanal