Python: ImportError: lxml not found, please install it

Question:

I have the following code (in PyCharm (MacOS)):

import pandas as pd

fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')

print(fiddy_states)

And I get the following error:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6 /Users/user_name/PycharmProjects/PandasTest/Doc3.py
Traceback (most recent call last):
  File "/Users/user_name/PycharmProjects/PandasTest/Doc3.py", line 9, in <module>
    fiddy_states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 906, in read_html
    keep_default_na=keep_default_na)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 733, in _parse
    parser = _parser_dispatch(flav)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pandas/io/html.py", line 693, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

Anaconda shows the latest version of lxml (3.8.0) as installed. Despite that, I have tried to reinstall it by 1) running pip install lxml and 2) downloading the lxml wheel for my Python version (lxml-3.8.0-cp36-cp36m-win_amd64.whl), but either way nothing changes (in the second case I am told that it is not a supported wheel on this platform, even though the Python version is correct (3.6, 64-bit)).
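
The wheel named above carries the platform tag win_amd64, meaning it was built for 64-bit Windows, which would explain why pip rejects it on macOS. A minimal sketch for comparing a wheel's platform tag with the current machine follows; the helper name wheel_platform_tag is hypothetical, and the parsing only assumes the standard wheel naming scheme, where the platform tag is the last dash-separated field.

```python
import sysconfig


def wheel_platform_tag(wheel_filename):
    """Return the platform tag: the last dash-separated field of a wheel name."""
    stem = wheel_filename[: -len(".whl")]
    return stem.split("-")[-1]


wheel = "lxml-3.8.0-cp36-cp36m-win_amd64.whl"
print("Wheel built for:", wheel_platform_tag(wheel))
print("This machine is:", sysconfig.get_platform())
```

If the two values name different operating systems, the wheel cannot be installed there, whatever the Python version.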

I’ve read similar questions here (even with the same code above, since it’s from a tutorial), but the problem still persists.

Asked By: asd


Answers:

Based on the fact that the error is:

/Library/Frameworks/Python.framework/Versions/3.6/bin/python3.6

This means that you are working with Python 3. The package manager for Python 3 is usually pip3, so you should probably install it with:

pip3 install lxml
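
When several Python installations coexist (here, the python.org framework build and Anaconda), pip3 may still belong to a different interpreter than the one PyCharm runs. A hedged sketch that sidesteps this by invoking pip through the running interpreter; the helper name pip_install_command is made up for illustration.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package]


# The interpreter that actually runs the script (compare with the traceback).
print("Interpreter:", sys.executable)
print("Install with:", " ".join(pip_install_command("lxml")))

# Uncomment to perform the installation from within Python:
# subprocess.check_call(pip_install_command("lxml"))
```

Running this from the same PyCharm run configuration shows which interpreter needs the package.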
Answered By: Willem Van Onsem

I got the same error. It seems that my python3 was pointing to the pandas in python2 (since I had not installed pandas in python3). After running pip3 install pandas and restarting the notebook, it worked fine.

Answered By: Ruxi Zhang

You can go to Settings > Project Interpreter > Click on ‘+’ icon
Find ‘lxml’ from the list of packages and click ‘Install Package’ button found below.

I am using PyCharm 2019.2.1 (Community Edition)
Build #PC-192.6262.63, built on August 22, 2019
Runtime version: 11.0.3+12-b304.39 amd64
VM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
Linux 4.15.0-58-generic
GC: ParNew, ConcurrentMarkSweep
Memory: 937M
Cores: 4

Answered By: Krish PG

  1. You may have to (re)install some of your libraries: pip install lxml bs4 html5lib

  2. pd.read_html() uses the 'lxml' parser by default, so try another parser that you installed above, e.g. pd.read_html(some_url, flavor='html5lib')
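
The two steps above can be combined into a small check that picks a flavor based on what actually imports in the current interpreter. This is only a sketch: can_import and pick_flavor are hypothetical helper names, and it assumes that the 'html5lib' flavor needs both bs4 and html5lib to be importable.

```python
import importlib


def can_import(module_name):
    """Return True if the module imports cleanly in this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def pick_flavor():
    """Return a flavor name for pd.read_html, or None if no parser is usable."""
    if can_import("lxml.etree"):
        return "lxml"
    if can_import("bs4") and can_import("html5lib"):
        return "html5lib"
    return None


flavor = pick_flavor()
print("Usable flavor:", flavor)
# With pandas installed and a flavor found, it could be passed along, e.g.:
# tables = pd.read_html(some_url, flavor=flavor)
```

A result of None means neither parser is importable from this interpreter, so the install step has to be repeated for it.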

Answered By: Artur

For people who arrived here using a Jupyter notebook: I restarted the kernel after pip install lxml and the error was gone.

Answered By: EasonL

I tried to reinstall lxml without any progress.

I ended up uninstalling pandas, then reinstalling and upgrading it, and that solved my issues!

pip uninstall pandas  
pip install pandas
pip3 install --upgrade pandas
Answered By: Heidar Jon

I got the same error when trying to run some code that was using pandas. I tried some suggestions here but those did not work. Finally, what worked for me was the following two steps :

conda update anaconda
conda install spyder=5.0.5

Now when I restarted Spyder and ran my code it worked fine.

I have only just installed and started using Anaconda, so I don’t know the root cause of this issue. My guess is that there was some "cross-connection" with the packages I had installed before Anaconda, and that after running the two steps above everything now runs from within the Anaconda environment.

Answered By: yankeemike

This error occurs when lxml is not installed, so just go to the terminal
and run: pip3 install lxml


Answered By: ashusharma

I installed lxml 4.9.1, but it didn’t work. So I tried to install lxml 4.8.0 instead, and it worked!

pip install lxml==4.8
Answered By: Bear77777

I got the same problem. Trying to reinstall lxml did not work. After rereading the error message and tracing the error to ~\Miniconda3\envs\mini_ds\lib\site-packages\pandas\io\html.py:872, I think the problem lies in the function _importers() in ~/pandas/io/html.py.

Here is the function:

def _importers() -> None:
    # import things we need
    # but make this done on a first use basis

    global _IMPORTS
    if _IMPORTS:
        return

    global _HAS_BS4, _HAS_LXML, _HAS_HTML5LIB
    bs4 = import_optional_dependency("bs4", errors="ignore")
    _HAS_BS4 = bs4 is not None

    lxml = import_optional_dependency("lxml.etree", errors="ignore")
    _HAS_LXML = lxml is not None

    html5lib = import_optional_dependency("html5lib", errors="ignore")
    _HAS_HTML5LIB = html5lib is not None

    _IMPORTS = True

You can see that for the lxml option, it actually tries importing "lxml.etree" instead of "lxml". So this is probably why reinstalling "lxml" would not help.

In conclusion, I think this is perhaps a problem with the pandas version (mine is 1.4.1). For me, a quick solution is to specify flavor='html5lib' in pd.read_html().
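
Because _importers() passes errors="ignore", any failure while importing lxml.etree is swallowed and reported as "lxml not found". A minimal sketch for surfacing the underlying error by importing the same modules directly; diagnose_import is a hypothetical helper, not part of pandas.

```python
import importlib


def diagnose_import(module_name):
    """Try to import a module; return (ok, detail) with the real error text."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, getattr(module, "__file__", "built-in")


for name in ("lxml", "lxml.etree"):
    ok, detail = diagnose_import(name)
    print(f"{name}: {'OK' if ok else 'FAILED'} ({detail})")
```

If "lxml" imports but "lxml.etree" fails, the detail string usually points at the real cause, such as a missing shared library.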

Answered By: H T

As OP is using Anaconda, in order to solve that issue, install lxml by opening the CMD.Exe Prompt for the environment one is working on, and run

conda install -c anaconda lxml


One can also do it by specifying the version as follows

conda install -c anaconda lxml=4.8.0

Notes:

  • pip doesn’t manage dependencies the same way conda does and can potentially damage one’s installation. Therefore, I would recommend using it only if conda doesn’t work.

    pip install lxml
    
    # or
    
    pip install lxml==4.9.1
    
  • If one is using pip, already has the package installed, and is still getting errors, one can pass -I (--ignore-installed) and -v as follows

    pip install -Iv lxml==4.9.1
    
  • lxml official documentation can be found here.

  • This is their official GitHub repo.

Answered By: Gonçalo Peres

I was seeing this issue as well on my RPi.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 1113, in read_html
    displayed_only=displayed_only,
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 902, in _parse
    parser = _parser_dispatch(flav)
  File "/home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py", line 859, in _parser_dispatch
    raise ImportError("lxml not found, please install it")
ImportError: lxml not found, please install it

Looking into /home/pi/python3-ml/lib/python3.7/site-packages/pandas/io/html.py, I saw that it was attempting to use lxml.etree, so I tried importing that module directly:

>>> from lxml import etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: libxslt.so.1: cannot open shared object file: No such file or directory

I searched for that error and found that the following package needed to be installed on the RPi:

sudo apt-get install libxslt

After installing I was successfully able to use pandas

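Answered By: Geo99M6Z

import pandas as pd
from urllib.request import Request, urlopen

# Placeholder: replace with the address of the page to read
url = 'WEB-SITE'
# Send a browser-like User-Agent, since some sites reject the default one
request_site = Request(url, headers={"User-Agent": "Mozilla/5.0"})
webpage = urlopen(request_site)
# Parse with html5lib instead of the default lxml parser
dfk1 = pd.read_html(webpage, flavor='html5lib')
print(dfk1)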
Answered By: Yuriy