Installing RDKit in Google Colab

Question:

I cannot figure out how to fix the following issue. Up until today I was using the following code snippet for installing RDKit in Google Colab:

!wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
!chmod +x Miniconda3-latest-Linux-x86_64.sh
!time bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /usr/local
!time conda install -q -y -c conda-forge rdkit

import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

However, today I started to get the following error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-2-d24c24e2d1f9> in <module>()
----> 1 from rdkit import Chem
      2 import networkx as nx

ModuleNotFoundError: No module named 'rdkit'

I’ve tried using the full Anaconda distribution instead of Miniconda, as well as changing the Python version to 3.6 and 3.8, but nothing seems to work.

Asked By: Alderson


Answers:

I think you need to pin Python 3.7 when you install Miniconda. The current rdkit build supports Python 3.7, but the latest Miniconda installer ships with Python 3.8:

!wget -c https://repo.continuum.io/miniconda/Miniconda3-py37_4.8.3-Linux-x86_64.sh
!chmod +x Miniconda3-py37_4.8.3-Linux-x86_64.sh
!time bash ./Miniconda3-py37_4.8.3-Linux-x86_64.sh -b -f -p /usr/local
!time conda install -q -y -c conda-forge rdkit

import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

https://colab.research.google.com/drive/1MAZyv3O4-TrI8c1MD4JVmwExDquaprRT?usp=sharing
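
The version mismatch described above can be checked from a notebook cell before importing rdkit. This is only a minimal sketch, assuming conda was installed under /usr/local as in the snippet; the helper names kernel_python_tag and installed_python_tags are made up for illustration and are not part of any library.

```python
import glob
import os
import sys


def kernel_python_tag(version_info=sys.version_info):
    """Return the 'pythonX.Y' directory name for the running kernel."""
    return f"python{version_info[0]}.{version_info[1]}"


def installed_python_tags(prefix="/usr/local"):
    """List the pythonX.Y directories under prefix/lib that have a site-packages folder."""
    pattern = os.path.join(prefix, "lib", "python3.*", "site-packages")
    return sorted(os.path.basename(os.path.dirname(p)) for p in glob.glob(pattern))


kernel = kernel_python_tag()
found = installed_python_tags()
if kernel in found:
    # Versions agree, so packages installed by conda are importable from the kernel.
    sys.path.append(os.path.join("/usr/local", "lib", kernel, "site-packages"))
else:
    print(f"Kernel runs {kernel}, but conda installed packages for: {found}")
```

If the printed versions differ, appending the other version's site-packages path is unlikely to help, because compiled extensions such as rdkit are built for one specific Python version.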

Answered By: Oliver Scott

If you want to avoid installing Conda, you can just extract the rdkit package from anaconda.org:

# The 2018 release is fairly easy to install this way
# Download and extract the lib directory
url = 'https://anaconda.org/rdkit/rdkit/2018.09.1.0/download/linux-64/rdkit-2018.09.1.0-py36h71b666b_1.tar.bz2'
!curl -L $url | tar xj lib
# move to python packages directory
!mv lib/python3.6/site-packages/rdkit /usr/local/lib/python3.6/dist-packages/
x86 = '/usr/lib/x86_64-linux-gnu'
!mv lib/*.so.* $x86/
# rdkit needs libboost_python3.so.1.65.1, so link it to the py36 build
!ln -s $x86/libboost_python3-py36.so.1.65.1 $x86/libboost_python3.so.1.65.1

For the latest version, it’s a bit more complicated because of libboost 1.67, so I put it in my kora library.

!pip install kora -q
import kora.install.rdkit

This installs version 2020.09.1.
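
To confirm which version actually got installed, something like the following could be run in the next cell. It is a sketch using only the standard library; module_version is a hypothetical helper name, and it relies on the package exposing a `__version__` attribute, which rdkit does.

```python
import importlib
import importlib.util


def module_version(name):
    """Return the module's __version__ string, or None if the module cannot be found."""
    if importlib.util.find_spec(name) is None:
        return None
    module = importlib.import_module(name)
    # Fall back to "unknown" for packages that do not define __version__.
    return getattr(module, "__version__", "unknown")


print(module_version("rdkit"))
```

A result of None means the package is not on the kernel's import path. An ImportError raised here instead would suggest the files are present but were built for a different Python or library version.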

Answered By: korakot

I created a Python package, condacolab, to simplify the setup.

It will install Miniconda (or any other flavour) and patch a couple things that make Colab tricky.

Use it like this (first cell in your notebook):

!pip install -q condacolab
import condacolab
condacolab.install()

The kernel will restart, and then you will be able to run conda or mamba with the ! shell syntax:

!mamba install -c conda-forge rdkit

Check the repository for more details!

First, install condacolab in Colab as follows:

!pip install -q condacolab
import condacolab
condacolab.install()

Then install rdkit with conda:

!conda install -c rdkit rdkit

If you follow these steps, rdkit should install and import correctly.
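
A quick way to check that the install really works is to parse a small molecule. The sketch below is an illustration only; rdkit_smoke_test is a hypothetical helper, and it returns None rather than raising when rdkit is missing so the cell can be run at any point.

```python
def rdkit_smoke_test(smiles="CCO"):
    """Return the heavy-atom count for a SMILES string.

    Returns None if rdkit cannot be imported or the SMILES string is invalid.
    """
    try:
        from rdkit import Chem
    except ImportError:
        return None
    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else mol.GetNumAtoms()


print(rdkit_smoke_test())
```

For ethanol (CCO) the count should be 3 when rdkit is available, since implicit hydrogens are not included.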

Answered By: drorhun