__init__() got an unexpected keyword argument 'cachedir' when importing top2vec

Question:

I keep getting this error when importing top2vec.

TypeError                                 Traceback (most recent call last)
Cell In [1], line 1
----> 1 from top2vec import Top2Vec

File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\__init__.py:1
----> 1 from top2vec.Top2Vec import Top2Vec
      3 __version__ = '1.0.27'

File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\Top2Vec.py:12
     10 from gensim.models.phrases import Phrases
     11 import umap
---> 12 import hdbscan
     13 from wordcloud import WordCloud
     14 import matplotlib.pyplot as plt

File ~\AppData\Roaming\Python\Python39\site-packages\hdbscan\__init__.py:1
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
      2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
      3 from .validity import validity_index

File ~\AppData\Roaming\Python\Python39\site-packages\hdbscan\hdbscan_.py:509
    494         row_indices = np.where(np.isfinite(matrix).sum(axis=1) == matrix.shape[1])[0]
    495     return row_indices
    498 def hdbscan(
    499     X,
    500     min_cluster_size=5,
    501     min_samples=None,
    502     alpha=1.0,
    503     cluster_selection_epsilon=0.0,
    504     max_cluster_size=0,
    505     metric="minkowski",
    506     p=2,
    507     leaf_size=40,
    508     algorithm="best",
--> 509     memory=Memory(cachedir=None, verbose=0),
    510     approx_min_span_tree=True,
    511     gen_min_span_tree=False,
    512     core_dist_n_jobs=4,
    513     cluster_selection_method="eom",
    514     allow_single_cluster=False,
    515     match_reference_implementation=False,
    516     **kwargs
    517 ):
    518     """Perform HDBSCAN clustering from a vector array or distance matrix.
    519 
    520    Parameters
   (...)
    672        Density-based Cluster Selection. arxiv preprint 1911.02282.
    673    """
    674     if min_samples is None:

TypeError: __init__() got an unexpected keyword argument 'cachedir'

Python version: 3.9.7 (64-bit)

I have installed MSBuild.

There were no errors when installing the package with pip.

Does anyone know a solution to this problem, or has anyone experienced something similar?

Answers:

  • UPDATE 12 November 2022:

There is a new release of hdbscan (ver. 0.8.29, from 31 Oct. 2022) that fixes the issue. See my original answer for more details.

Original Answer:

It looks like you are using the latest (as of 23 Sept 2022) versions of the hdbscan and joblib packages available on PyPI.

The cachedir argument had been deprecated and was removed from joblib.Memory in a commit on 2 Feb 2022. The latest version of joblib on PyPI is ver. 1.2.0, released on Sep 16, 2022, which incorporates this change.
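One way to confirm this on a given machine is to inspect the constructor signature. The following is only a sketch: `accepts_keyword` is a hypothetical helper written for this illustration, not part of joblib, and the joblib check is skipped when joblib is not installed.

```python
import inspect


def accepts_keyword(cls, name):
    """Return True if cls's constructor has a parameter called `name`
    or accepts arbitrary keyword arguments (**kwargs)."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


if __name__ == "__main__":
    try:
        from joblib import Memory
    except ImportError:
        print("joblib is not installed")
    else:
        # Expected to be False on joblib releases that dropped `cachedir`.
        print("Memory accepts cachedir:", accepts_keyword(Memory, "cachedir"))
```

If the check reports False while the installed hdbscan still calls Memory(cachedir=None, verbose=0), the import fails with the TypeError from the question.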

The relevant part of the hdbscan source code on GitHub was updated on 16 Sept 2022. Unfortunately, the latest (as of 23 Sept 2022) hdbscan release on PyPI is ver. 0.8.28, released on Feb 8, 2022, which predates that update. It still uses memory=Memory(cachedir=None, verbose=0).

One possible solution is to pin joblib to the last version released before cachedir was removed: ver. 1.1.0, released on Oct 7, 2021. However, note my edits below.

  • UPDATE 29 Sept 2022:

There are open issues on the hdbscan repo (#563 and #565).

Note that there is a vulnerability, CVE-2022-21797, when using joblib < 1.2.0.

  • UPDATE 12 November 2022:

There is a new release of hdbscan (ver. 0.8.29) from 31 Oct. 2022.
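To check which situation an environment is in, the version numbers quoted in this answer can be turned into a small diagnostic. This is a rough sketch using only the standard library; `version_tuple`, `diagnose`, and `installed` are hypothetical helper names, and the thresholds (hdbscan 0.8.29, joblib 1.2.0) are taken from the release notes mentioned above rather than verified against every release.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn '0.8.28' into (0, 8, 28), keeping the leading digits of each
    dot-separated part and padding to three components."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def diagnose(hdbscan_version, joblib_version):
    """Classify a version pair using the release numbers quoted in this answer."""
    hdbscan_fixed = version_tuple(hdbscan_version) >= (0, 8, 29)
    joblib_new = version_tuple(joblib_version) >= (1, 2, 0)
    if not hdbscan_fixed and joblib_new:
        return "broken: upgrade hdbscan to 0.8.29 or later"
    if not joblib_new:
        return "imports, but joblib < 1.2.0 is affected by CVE-2022-21797"
    return "ok"


def installed(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    h, j = installed("hdbscan"), installed("joblib")
    if h is None or j is None:
        print("hdbscan and/or joblib is not installed")
    else:
        print(f"hdbscan {h}, joblib {j}:", diagnose(h, j))
```

Under these assumptions, the combination from the question (hdbscan 0.8.28 with joblib 1.2.0) is the only one classified as broken, and upgrading hdbscan resolves it without keeping the older joblib.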

Answered By: buran

Thank you! It worked for me. I downgraded the joblib package by using pip install --upgrade joblib==1.1.0. However, please be advised that this version of joblib has a known arbitrary code execution vulnerability via the pre_dispatch flag in the Parallel() class, caused by an eval() statement. So please use it with caution and not in production. Happy coding. 🙂

Answered By: Javid Jouzdani