PyTorch throws OSError on Detectron2LayoutModel()
Question:
I’ve been trying to read PDF pages as images for extraction purposes.
I found that layoutparser serves this purpose by identifying blocks of text. However, when I try to create a Detectron2-based layout detection model, I encounter the following error:
codeblock:
model = lp.Detectron2LayoutModel(
config_path ='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
label_map = {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"},
extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8]
)
error:
OSError Traceback (most recent call last)
<ipython-input-16-893fdc4d537c> in <module>
2 config_path ='lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config',
3 label_map = {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"},
----> 4 extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8]
5 )
6
.
.
.
d:\softwares\python3\lib\site-packages\portalocker\utils.py in _get_fh(self)
269 def _get_fh(self) -> typing.IO:
270 '''Get a new filehandle'''
--> 271 return open(self.filename, self.mode, **self.file_open_kwargs)
272
273 def _get_lock(self, fh: typing.IO) -> typing.IO:
OSError: [Errno 22] Invalid argument: 'C:\Users\user/.torch/iopath_cache\s/nau5ut6zgthunil\config.yaml?dl=1.lock'
I checked the destination folder and, surprisingly, there is no config.yaml file, which may be why the error shows up. I tried uninstalling and reinstalling PyTorch, hoping the .yaml files would then be installed correctly. Unfortunately, the problem remains.
I would appreciate a solution, or an alternative suggestion if one exists.
Answers:
The config.yaml file basically only holds the model configuration plus a URL for downloading the model weights. I’m not sure why it isn’t downloading automatically for you, but you can also download the config from the model zoo page: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
The one you’re looking for is mask_rcnn_X_101_32x8d_FPN_3x trained on PubLayNet. Once you have downloaded the yaml file, you can use the same code snippet, changing only the path.
model = lp.Detectron2LayoutModel(config_path='path/to/config.yaml', ...)
I found a solution: setting the path to tesseract.exe in pytesseract.pytesseract.tesseract_cmd, so that the Tesseract CLI can run behind Jupyter:
pytesseract.pytesseract.tesseract_cmd = r'path\to\folder\Tesseract_OCR\tesseract.exe'
After that, calling Detectron2LayoutModel no longer threw the error.
I referred to this thread: Pytesseract: “TesseractNotFound Error: tesseract is not installed or it’s not in your path”, how do I fix this?
I got a similar error too and worked around it manually on Windows.
I am using your case as the example: OSError: [Errno 22] Invalid argument: 'C:\Users\user/.torch/iopath_cache\s/nau5ut6zgthunil\config.yaml?dl=1.lock'
Please follow these steps:
- Navigate to C:\Users\user/.torch/iopath_cache\s/nau5ut6zgthunil\config.yaml and open that config.yaml file.
- Scroll down to the line WEIGHTS: https://www.dropbox.com/s/h7th27jfv19rxiy/model_final.pth?dl=1 (it should be around line 265).
- Copy that link and paste it into your browser; a file named model_final.pth will be downloaded. Copy this file to a folder of your choice.
- Now change the entry to WEIGHTS: your_desired_folder/model_final.pth
- Save the file and run the code again; it should work.
There is one small workaround you may need to apply before doing this, if you have not already: the iopath workaround described in https://github.com/Layout-Parser/layout-parser/issues/15 (GitHub link to the issue).
Hope this helps!
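If you would rather not edit the file by hand, the WEIGHTS step above can be sketched in a few lines of Python. This is only a minimal sketch: the function name point_weights_to_local, the folder models/ and the two-line config fragment are my own placeholders, and it assumes the config contains a single WEIGHTS: entry on its own line, as in the steps above. It uses plain text replacement rather than a YAML parser, so the rest of the file is left untouched.

```python
import re


def point_weights_to_local(config_text: str, local_weights_path: str) -> str:
    """Return config_text with the first `WEIGHTS:` value replaced by a local path."""
    # Match the key with its indentation, then everything up to the end of that line.
    pattern = re.compile(r"^([ \t]*WEIGHTS:[ \t]*).*$", flags=re.MULTILINE)
    # A function replacement keeps backslashes in Windows paths literal.
    new_text, count = pattern.subn(
        lambda m: m.group(1) + local_weights_path, config_text, count=1
    )
    if count == 0:
        raise ValueError("no WEIGHTS entry found in the config text")
    return new_text


# Simplified stand-in for the relevant part of config.yaml (not the full file).
sample = (
    "MODEL:\n"
    "  WEIGHTS: https://www.dropbox.com/s/h7th27jfv19rxiy/model_final.pth?dl=1\n"
)

# models/model_final.pth is a hypothetical location for the downloaded weights.
print(point_weights_to_local(sample, "models/model_final.pth"))
```

To apply it to the real file, read config.yaml into a string, pass it through the function, and write the result back.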
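A fix for this is to clone and modify iopath, since it’s a Windows file-naming error: the cached filename keeps the URL’s ?dl=1 query string, and ? is not allowed in Windows filenames.
So clone iopath (this version worked for me):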
git clone https://github.com/facebookresearch/iopath --single-branch --branch v0.1.8
Change iopath/iopath/common/file_io.py; class: HTTPURLHandler; method: _get_local_path; line 753 (the line number might change with newer versions)
from this:
filename = path.split("/")[-1]
to this:
filename = parsed_url.path.split("/")[-1]
Then call:
pip install -e iopath
And it will work.
Also, detectron2 v0.4 didn’t work. I installed the latest one via:
python -m pip install "git+https://github.com/facebookresearch/detectron2.git"
The detectron2 installation will overwrite your modified iopath version because of a dependency conflict, so it’s best to install iopath last. But it should work OK.
Update: It should also work with iopath v0.1.7, but I mentioned the newer one because it is in their GitHub tag releases.
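To see why the one-line change in _get_local_path helps, here is a small standalone sketch using only the standard library. It does not call iopath itself, and the function names are mine. The URL is an assumption, reconstructed from the cache path in the error message plus the Dropbox host seen in the WEIGHTS link.

```python
from urllib.parse import urlparse


def naive_filename(url: str) -> str:
    """Mimics the original line: take whatever follows the last slash."""
    return url.split("/")[-1]


def query_free_filename(url: str) -> str:
    """Mimics the patched line: parse the URL first, then split only its path."""
    return urlparse(url).path.split("/")[-1]


# Assumed URL, rebuilt from the path shown in the OSError message.
url = "https://www.dropbox.com/s/nau5ut6zgthunil/config.yaml?dl=1"

print(naive_filename(url))       # config.yaml?dl=1
print(query_free_filename(url))  # config.yaml
```

The first result still contains ?, which Windows rejects in filenames, so opening config.yaml?dl=1.lock fails with Errno 22. The second result is a plain config.yaml.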