Iterations not storing the correct value

Question:

I am making a row vector that contains valency and bond information, and multiplying it by an adjacency matrix to make a valency connectivity matrix, but I am having issues with my code. It works for most atoms but gives wrong values for atoms such as Cl and Br. My code is as follows.

from rdkit import Chem
import numpy as np
import pandas as pd

def smiles_to_val_matrix(smiles):
    mol = Chem.MolFromSmiles(smiles)
    num_atoms = mol.GetNumAtoms()
    conn_matrix = np.zeros((num_atoms, num_atoms))
    for i in range(num_atoms):
        symbol = mol.GetAtomWithIdx(i).GetSymbol()
        if symbol == 'C':
            atomic_num = 6
        elif symbol == 'N':
            atomic_num = 7
        elif symbol == 'O':
            atomic_num = 8
        elif symbol == 'F':
            atomic_num = 9
        elif symbol == 'P':
            atomic_num = 15
        elif symbol == 'S':
            atomic_num = 16
        elif symbol == 'Cl':
            atomic_num = 17
        elif symbol == 'Fe':
            atomic_num = 26
        elif symbol == 'B':
            atomic_num = 5
        elif symbol == 'I':
            atomic_num = 53
        elif symbol == 'Br':
            atomic_num = 35
        elif symbol == 'Li':
            atomic_num = 3
        elif symbol == 'K':
            atomic_num = 19
        if symbol == 'C':
            valence = 4
            possible_hydrogen = 4
        elif symbol == 'N':
            valence = 5
            possible_hydrogen = 3
        elif symbol == 'O':
            valence = 6
            possible_hydrogen = 2
        elif symbol == 'F':
            valence = 7
            possible_hydrogen = 1
        elif symbol == 'P':       
            valence = 5
            possible_hydrogen = 3     #Not necessarily the case all the time. try different method for S and metals
        elif symbol == 'S':
            valence = 6
            possible_hydrogen = 2
        elif symbol == 'Cl':
            valence = 7
            possible_hydrogen = 1
        elif symbol == 'Fe':
            valence = 8
            possible_hydrogen = 0
        elif symbol == 'B':
            valence = 3
            possible_hydrogen = 1
        elif symbol == 'I':
            valence = 7
            possible_hydrogen = 1
        elif symbol == 'Br':
            valence = 7
            possible_hydrogen = 1
        elif symbol == 'Li':    
            valence = 1
            possible_hydrogen = 0
        elif symbol == 'K':
            valence = 1
            possible_hydrogen = 0
        for j in range(i+1, num_atoms):
            bond = mol.GetBondBetweenAtoms(i, j)
            if bond is not None:
                conn_matrix[i, j] = bond.GetBondTypeAsDouble() / (atomic_num - valence - 1)
                conn_matrix[j, i] = bond.GetBondTypeAsDouble() / (atomic_num - valence - 1)
        conn_matrix[i, i] = (valence - possible_hydrogen) / (atomic_num - valence - 1)
    row_sum_matrix = np.sum(conn_matrix, axis=1)
    return row_sum_matrix


def smiles_to_conn_matrix(smiles):
    mol = Chem.MolFromSmiles(smiles)
    num_atoms = mol.GetNumAtoms()
    conn_matrix = np.zeros((num_atoms, num_atoms), dtype=np.float32)
    for bond in mol.GetBonds():
        i = bond.GetBeginAtomIdx()
        j = bond.GetEndAtomIdx()
        conn_matrix[i, j] = 1
        conn_matrix[j, i] = 1
    return conn_matrix

def randic_index(A):
    n = A.shape[0]
    R = 0
    for i in range(n):
        for j in range(n):
            if A[i,j] != 0:
                deg_i = np.sum(A[i,:])
                deg_j = np.sum(A[:,j])
                R += 1 / np.sqrt(deg_i * deg_j)
    return  R


smiles = "CCCCCl"
V = smiles_to_val_matrix(smiles)
VT = V.reshape(-1,1)
A = smiles_to_conn_matrix(smiles)

Z = A * VT
print (V)
print (Z)

FINAL = randic_index(Z)
print (FINAL)

The output I get is…

[1.         2.         2.         2.         1.66666667]
[[0.         1.         0.         0.         0.        ]
 [2.         0.         2.         0.         0.        ]
 [0.         2.         0.         2.         0.        ]
 [0.         0.         2.         0.         2.        ]
 [0.         0.         0.         1.66666667 0.        ]]
2.7387685863824784

The value in this matrix that corresponds to Cl, seen in the SMILES input of the code, is 1.6666.
This is incorrect: following the maths in the code, the bond.GetBondTypeAsDouble() value is 1, which will eliminate a potential hydrogen bond. This means the value should be ((7-1)+1)/(17-7-1) = 7/9, not 1.6666.

I’ve been trying to figure this out for a while, but have been unable to work out why the code makes a mistake only with the larger atoms.

The only issue is in the line (conn_matrix[i, j] = bond.GetBondTypeAsDouble() / (atomic_num - valence - 1)). For Cl it should be 1/9, so that summing the row gives conn_matrix[i, i] + conn_matrix[i, j] = 6/9 + 1/9 = 7/9. I suspect the division in the conn_matrix[i, j] line is not being done properly.

After some editing, I found that the conn_matrix[i, j] line does respond to changes. For example, if I divide by 2, it works. But for some reason the equation isn’t computed as expected when using the variables. It either sticks with the first values, which make the divisor equal to 1 (i.e. C), or the variables aren’t being used at all.

Asked By: YZman


Answers:

You are getting 0.6667 for:

conn_matrix[i, i] = (valence - possible_hydrogen) / (atomic_num - valence - 1)

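in the last line of your outer loop. For Cl, valence = 7, possible_hydrogen = 1 and atomic_num = 17, so (valence - possible_hydrogen) = 6 and (atomic_num - valence - 1) = 9, giving conn_matrix[i, i] = 6/9 = 0.6667. The off-diagonal entry for the C-Cl bond is written while i is still the carbon atom, so it is divided by carbon's (6 - 4 - 1) = 1 and stays at 1. Thus when you sum the Cl row you get 1 + 0.6667 = 1.6667.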
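
The arithmetic for the Cl row can be checked by hand. The following is a minimal sketch in plain Python (no RDKit), using the numbers from the question's lookup chains; the dictionary and variable names are illustrative only and do not appear in the original code.

```python
from fractions import Fraction

# Values taken from the question's if/elif chains.
C = {"atomic_num": 6, "valence": 4, "possible_hydrogen": 4}
CL = {"atomic_num": 17, "valence": 7, "possible_hydrogen": 1}

def denominator(atom):
    # The divisor used throughout the question's code.
    return atom["atomic_num"] - atom["valence"] - 1

bond_order = 1  # single C-Cl bond

# Diagonal entry for Cl: computed while i is the Cl atom.
diagonal = Fraction(CL["valence"] - CL["possible_hydrogen"], denominator(CL))

# Off-diagonal entry as the code actually computes it: i is still carbon.
off_diag_actual = Fraction(bond_order, denominator(C))

# Off-diagonal entry as the asker expected it: divided by Cl's own divisor.
off_diag_expected = Fraction(bond_order, denominator(CL))

actual_row_sum = diagonal + off_diag_actual
expected_row_sum = diagonal + off_diag_expected
print(actual_row_sum, expected_row_sum)  # 5/3 7/9
```

The 5/3 is the 1.6667 seen in the output, and the 7/9 is the value the asker expected.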

Also, your code could be simplified significantly. Rather than using two long if/elif chains, you can combine them into one:

if symbol == 'C':
    atomic_num = 6
    valence = 4
    possible_hydrogen = 4
elif symbol == 'N':
    atomic_num = 7
    valence = 5
    possible_hydrogen = 3
elif
   ....

Better still, create a DataFrame with the atomic number as the index and columns for the element symbol, possible hydrogens and valence. Then you just need to look up the atomic number in the DataFrame.

import pandas as pd

df = pd.DataFrame(
    [['C', 6, 4, 4], ['N', 7, 5, 3], ['O', 8, 6, 2]],
    columns=['Element', 'Atomic Number', 'Valence', 'Poss Hydrogen'],
).set_index('Atomic Number')

Then, inside your loop, use each atom's atomic number to look up the data:

atomic_num = mol.GetAtomWithIdx(i).GetAtomicNum()
valence = df.loc[atomic_num, 'Valence']
poss_hyd = df.loc[atomic_num, 'Poss Hydrogen']

and so on. This will dispense with the very long elif statements. I would even suspect there is some Python module out there that would allow you to import the periodic table already as a DataFrame, although I haven’t checked. Might be worth looking into.
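
If a DataFrame feels heavy for a handful of elements, the same lookup idea also works with a plain dictionary. This sketch is an alternative, not the approach described in this answer; ElementInfo, ELEMENT_DATA and lookup are hypothetical names, and the values are copied from the question's code (only a subset is shown).

```python
from typing import NamedTuple

class ElementInfo(NamedTuple):
    atomic_num: int
    valence: int
    possible_hydrogen: int

# Values copied from the question's if/elif chains (subset shown).
ELEMENT_DATA = {
    "C": ElementInfo(6, 4, 4),
    "N": ElementInfo(7, 5, 3),
    "O": ElementInfo(8, 6, 2),
    "F": ElementInfo(9, 7, 1),
    "Cl": ElementInfo(17, 7, 1),
    "Br": ElementInfo(35, 7, 1),
}

def lookup(symbol: str) -> ElementInfo:
    """Return the stored data for an element, failing loudly if it is missing."""
    try:
        return ELEMENT_DATA[symbol]
    except KeyError:
        raise ValueError(f"No data for element {symbol!r}") from None

info = lookup("Cl")
print(info.atomic_num - info.valence - 1)  # 9
```

Unlike the if/elif chain, an element missing from the table raises an error here instead of silently reusing the previous atom's values.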

Answered By: Galo do Leste

I realised the mistake: the off-diagonal entries used the atomic_num and valence of atom i for both conn_matrix[i, j] and conn_matrix[j, i]. The symbol lookup needed to be redone for atom j, so that conn_matrix[j, i] is divided using atom j's own values.
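
A minimal sketch of that fix, written in plain Python so that it runs without RDKit. The molecule is passed in as a list of symbols plus (i, j, bond_order) tuples, which is an assumed stand-in for what the RDKit mol object provides; PARAMS, divisor and valence_row_sums are hypothetical names.

```python
# (atomic_num, valence, possible_hydrogen), copied from the question's code.
PARAMS = {
    "C": (6, 4, 4),
    "Cl": (17, 7, 1),
    "Br": (35, 7, 1),
}

def divisor(symbol):
    atomic_num, valence, _ = PARAMS[symbol]
    return atomic_num - valence - 1

def valence_row_sums(symbols, bonds):
    n = len(symbols)
    conn = [[0.0] * n for _ in range(n)]
    for i, j, order in bonds:
        # Each row is scaled by its own atom's divisor, not its neighbour's.
        conn[i][j] = order / divisor(symbols[i])
        conn[j][i] = order / divisor(symbols[j])
    for i, symbol in enumerate(symbols):
        _, valence, possible_hydrogen = PARAMS[symbol]
        conn[i][i] = (valence - possible_hydrogen) / divisor(symbol)
    return [sum(row) for row in conn]

# The "CCCCCl" molecule from the question, with four single bonds.
symbols = ["C", "C", "C", "C", "Cl"]
bonds = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
row_sums = valence_row_sums(symbols, bonds)
print(row_sums)
```

With this change the carbon rows are unchanged (1, 2, 2, 2) and the Cl row sums to 1/9 + 6/9 = 7/9, the value expected in the question.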

Thanks

Answered By: YZman