I often write code with python because it has lots of useful packages, documents and community. And first programming language which I learned is python.

So I've never wrote chemoinformatics code without python. But I have interested in coding with C++ / Rust because it works very fast.

Today, I tried to wrote code with C++ because I found great post about writing C++ chemoinformatics code from Greg! The URL is below.

https://greglandrum.github.io/rdkit-blog/tutorial/technical/2021/07/24/setting-up-a-cxx-dev-env.html

Computational cost of Finding Maximum common substructure(MCS) is high. RDKit has useful function FindMCS which is developed by Andrew Dalke. It's really cool function and I wonder if the code works faster when I run it from C++.

I wrote some code for getting MCS. OK let's see it....

At first, I wrote find_mcs.cpp. I'm new bee of c++ so most of code is came from Greg's post(I appreciate his great work).

 #include <unistd.h> #include <fcntl.h> #include <sys/time.h> #include <sys/resource.h> #include <cstdio> #include <cstring> #include <ctime> #include <string> #include <iostream> #include <GraphMol/RDKitBase.h> #include <GraphMol/FileParsers/FileParsers.h>  //MOL single molecule ! #include <GraphMol/FileParsers/MolSupplier.h>  //SDF #include <GraphMol/FMCS/FMCS.h> #include <GraphMol/FileParsers/MolSupplier.h>  using namespace RDKit;  void get_mcs(std::string pathName, unsigned int maxToDo,               std::vector<ROMOL_SPTR> &mols) {   RDKit::SDMolSupplier suppl(pathName, true, true, true);   unsigned int nDone = 0;   while (!suppl.atEnd() && (maxToDo <= 0 || nDone < maxToDo)) {     RDKit::ROMol *m = suppl.next();     if (!m) {       continue;     }     m->updatePropertyCache();     // the tautomer hash code uses conjugation info     MolOps::setConjugation(*m);     nDone += 1;     mols.push_back(ROMOL_SPTR((ROMol *)m));   }   std::cerr << "read: " << nDone << " mols." << std::endl;   MCSParameters p;   MCSResult res = findMCS(mols, &p);   std::cout << "\n"                 << "MCS: " << res.SmartsString << " " << res.NumAtoms                 << " atoms, " << res.NumBonds << " bonds\n";  }   int main(int argc, char *argv[]) {   std::vector<ROMOL_SPTR> mols;   get_mcs(argv[1], 10000, mols);  }  

Next, I worte CMakeLists.txt.

 cmake_minimum_required(VERSION 3.18)  project(simple_cxx_example) set(CMAKE_CXX_STANDARD 14) set(CMAKE_CXX_STANDARD_REQUIRED True)   find_package(RDKit REQUIRED) find_package(Boost COMPONENTS timer system REQUIRED) add_executable(find_mcs find_mcs.cpp) target_link_libraries(find_mcs RDKit::FileParsers RDKit::FMCS    Boost::timer) 

Now ready. Before compiling the code, I wrote python script to do same task as C++ code.

 import sys from rdkit import Chem from rdkit.Chem import rdFMCS  def get_mcs(mols):     mcs = rdFMCS.FindMCS(mols)     return mcs  if __name__=='__main__':     mols = Chem.SDMolSupplier(sys.argv[1])     mcs = get_mcs(mols)     print(mcs.smartsString) 

Almost done 😉 Next, I compiled cpp file and run it.

 $ mkdir build $ cd build $ cmake .. $ make find_mcs 

After running the code shown above, I could get find_mcs file.

OK let's compare C++ and Python code. I used cdk2.sdf as test file. As you can see find_mcs implemented c++ works faster than python implementation.

 (chemoinfo) iwatobipen@penguin:~/dev/sandbox/cpp/build$ time ./find_mcs cdk2.sdf  read: 47 mols.  MCS: [#6](:,-[#7]:,-[#6]):,-[#6]:,-[#6] 5 atoms, 4 bonds  real	0m0.025s user	0m0.017s sys	0m0.008s  (chemoinfo) iwatobipen@penguin:~/dev/sandbox/cpp/build$ time python find_mcs.py cdk2.sdf  [#6](:,-[#7]:,-[#6]):,-[#6]:,-[#6]  real	0m0.177s user	0m0.480s sys	0m1.368s  

It's interesting and exciting results for me. I'll learn c++ more and more.

Today's code is pushed my github repo. Any suggestions, advice are apprecaited.

https://github.com/iwatobipen/rdkit_cpp


This free site is ad-supported. Learn more