2011-12-01

Free chemistry software - quantum chemistry

In the previous part of the journey into free chemistry software I wrote about molecular mechanics. The molecular mechanics view of a molecular system is a classical mechanical one, i.e., the atoms move according to Newton's laws of motion.

Quantum mechanics revolutionized the view of the atomic world. The Schrödinger equation is a general model, and the number of parameters is limited. Solving the Schrödinger equation for an atomic or molecular system is often referred to as an ab initio calculation. The term ab initio is Latin and can be translated as from the beginning or from first principles.

The Schrödinger equation - in particular the time-independent equation - can be used to calculate many different properties of chemical substances. Properties like enthalpy, entropy, and heat capacity can be calculated because the partition function can be calculated from the solution. Moreover, it is possible to predict data for various spectra (IR, Raman, UV-VIS, etc.). The problem is that it is not possible to find an analytical solution for any non-trivial case. The solutions to the equation are called wave functions. In the Copenhagen interpretation of quantum mechanics, the squared magnitude of the wave function is the probability distribution (in space) of the electron.
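For reference, the relations mentioned above can be written compactly. The notation is the conventional textbook one and does not come from any particular package: \hat{H} is the Hamiltonian operator, \Psi the wave function, E the energy, \rho the probability density of a single electron, and q the partition function built from the energy levels E_i at temperature T.

```latex
\hat{H}\,\Psi = E\,\Psi
\qquad
\rho(\mathbf{r}) = \lvert \Psi(\mathbf{r}) \rvert^{2}
\qquad
q = \sum_{i} \exp\!\left(-\frac{E_i}{k_{\mathrm{B}} T}\right)
```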

In order to solve the (time-independent) Schrödinger equation you must make two important approximations. The first approximation is that the nuclei of the atoms are fixed in space, so the Schrödinger equation is reduced to modeling only the electrons. This approximation is called the Born-Oppenheimer approximation. The rationale behind the approximation is that a nucleus is much heavier than the electrons and therefore moves much more slowly (this is probably a classical mechanical picture). The second approximation is that the wave function is a sum - or linear combination - of other functions. These functions form a basis set. The basis set often consists of atomic orbitals (solutions to the Schrödinger equation for a single atom), and they are often selected in such a manner that they require as few CPU cycles to process as possible. A popular family of basis sets is the Slater-type orbitals, which in practice are often approximated by sums of Gaussian functions, as in the STO-6G basis set used later in this post.
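The idea of building a function as a linear combination of simpler basis functions can be illustrated with a minimal Python sketch. It compares a 1s Slater-type orbital with a sum of three Gaussians (the STO-3G fit, a smaller cousin of STO-6G). The exponents and coefficients are the commonly quoted textbook values for a Slater exponent of 1.0 and should be treated as an assumption of this sketch; the function names are made up for the illustration.

```python
import math

# STO-3G fit of a 1s Slater function with exponent zeta = 1.0
# (commonly quoted textbook values; an assumption of this sketch).
EXPONENTS = (0.109818, 0.405771, 2.22766)
COEFFICIENTS = (0.444635, 0.535328, 0.154329)


def slater_1s(r, zeta=1.0):
    """Normalized 1s Slater-type orbital at distance r from the nucleus."""
    return math.sqrt(zeta ** 3 / math.pi) * math.exp(-zeta * r)


def gaussian_1s(r, alpha):
    """Normalized 1s Gaussian primitive with exponent alpha."""
    return (2.0 * alpha / math.pi) ** 0.75 * math.exp(-alpha * r * r)


def sto3g_1s(r):
    """Linear combination of three Gaussians approximating slater_1s."""
    return sum(c * gaussian_1s(r, a) for c, a in zip(COEFFICIENTS, EXPONENTS))


def radial_overlap(f, g, r_max=40.0, steps=40000):
    """Trapezoidal estimate of the overlap integral of two spherical functions."""
    dr = r_max / steps
    total = 0.0
    for i in range(steps + 1):
        r = i * dr
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * 4.0 * math.pi * r * r * f(r) * g(r)
    return total * dr


if __name__ == "__main__":
    # An overlap close to 1 means the Gaussian sum mimics the Slater function well.
    print("overlap <STO|STO-3G> =", radial_overlap(slater_1s, sto3g_1s))
```

Gaussians are used because integrals over them are cheap to evaluate, which is the "few CPU cycles" argument in practice.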

There are many quantum chemistry packages in the world. From a license perspective, they fall into three groups. The first group is completely proprietary software. You buy the package (maybe the source code is included so you can compile it yourself) and can use it on the set of computers you have bought a license for. It is not rare to find special pricing schemes for users in the academic world. Gaussian is a leading software vendor in this field, and a site licence (with UNIX source code) for commercial use is listed as roughly 40,000 USD. This might sound like a high price tag, but you must remember that the development of a new drug can easily run into billions. Compared to that, the software is quite cheap - if you are in big pharma. Gaussian does provide much more than just a quantum chemistry calculation engine - it includes various supporting utilities.

The group of free software packages is another group with a completely different licensing model. A typical example is MPQC (Massively Parallel Quantum Chemistry). Released under the GNU General Public License, it is a truly free software package. The motivation of the developers is to test new algorithms - in particular for parallelization on either compute clusters or multicore computers.

Somewhere in between you find the third group. Software packages in this group are free to download (if you are in academia or using them for personal purposes), but you are not allowed to redistribute the source code or binaries. And often you must include a citation to a particular scientific paper if you publish any results based on the software. In this group you find packages like GAMESS (both US and UK versions).


Gabedit
The program Gabedit provides a uniform interface to many quantum chemistry packages including GAMESS-US, GAMESS-UK, Gaussian and MPQC. The program exists as a binary package for most Linux distributions, including Ubuntu Linux and Debian GNU/Linux, which makes the installation quite simple.

1,3,7-trimethylxanthine is a well-known (legal) psychoactive chemical. It is found naturally in coffee and tea, and it is added to many soft drinks. The trivial (non-systematic) name is caffeine. It is a small molecule with an aromatic ring structure.

Setting up a calculation in Gabedit
You can either draw your molecule or load a file. The structure of many molecules can be found at PubChem or similar services. A 3-dimensional structure of caffeine can be found at PubChem. A 3-dimensional structure is nothing more than the coordinates of the nuclei of the atoms in the molecule. Using OpenBabel (discussed in previous blog entries), you can convert the PubChem file format to something that Gabedit can understand.

Setting up a calculation is simple, but you are required to know and understand all the parameters and methods. For a casual user of quantum chemistry software, Gabedit might spare you from writing configuration files manually, but the program is a thin wrapper only.

Monitoring a calculation
Once you have edited the calculation parameters, you are ready to go. It is possible to monitor the progress of your simulation, but beware that a quantum chemistry calculation might take hours. Gabedit will use the good old UNIX trick called nohup so you can come back later. Moreover, Gabedit lets you perform the calculation on remote machines. This is useful for long calculations.

When the calculation has finished, Gabedit can assist you in analyzing the result. Only a limited number of analyses are offered, and if you need more advanced analysis, you will probably have to write small scripts and programs to help you.

Analyzing the result of a geometry optimization using Gabedit
The really good thing about Gabedit is that it provides a uniform interface to the most common quantum chemistry packages.

Other software packages exist, e.g., Ghemical. Ghemical is fairly tightly bound to GNOME, and tries to solve all needs for computational chemistry in one package. The latest version of Ghemical is recent (October 2011).


MPQC
MPQC is a pure free software project which provides an ab initio package. It is designed to run in multicore or cluster environments, i.e., in massively parallel environments. Ubuntu and Debian packages exist. Currently, two packages exist: the core program and supporting utilities. Even an Emacs mode can be found for editing the configuration files. The main mode of operation is that the user writes a configuration or input file and runs the program from the command-line or through a batch system. The OpenBabel conversion utility supports MPQC and can write raw input files from structure file formats. This simplifies the usage a lot in the learning period.

In order to test MPQC, I have prepared an input file using caffeine. As MPQC supports the calculation of thermodynamic properties (non-electronic enthalpy and entropy), my test input file will perform this analysis after the geometry optimization using the STO-6G basis set and the Hartree-Fock method. The Hartree-Fock method only calculates the ground state, so the thermodynamics will be somewhat inaccurate.
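To give an impression of what such a thermodynamic analysis involves, here is a minimal Python sketch of the vibrational part only, using the standard harmonic-oscillator formulas. It is not MPQC's implementation; the function name and the example wavenumbers are invented for the illustration, and translational and rotational contributions are left out.

```python
import math

H = 6.62607015e-34      # Planck constant, J s
C_CM = 2.99792458e10    # speed of light, cm/s
K_B = 1.380649e-23      # Boltzmann constant, J/K
R = 8.314462618         # gas constant, J/(mol K)


def vibrational_terms(wavenumber_cm, temperature=298.15):
    """Return (zero-point energy, thermal energy, entropy) of one harmonic mode.

    Energies are in J/mol, the entropy in J/(mol K).
    """
    theta = H * C_CM * wavenumber_cm / K_B      # vibrational temperature, K
    x = theta / temperature
    zero_point = 0.5 * R * theta
    thermal = R * theta / math.expm1(x)         # energy above the zero point
    entropy = R * (x / math.expm1(x) - math.log(-math.expm1(-x)))
    return zero_point, thermal, entropy


if __name__ == "__main__":
    # Hypothetical wavenumbers in cm^-1, not taken from a caffeine calculation.
    for wavenumber in (100.0, 1000.0, 3000.0):
        zpe, thermal, entropy = vibrational_terms(wavenumber)
        print(f"{wavenumber:7.1f} cm^-1: ZPE {zpe:9.1f} J/mol, "
              f"thermal {thermal:8.1f} J/mol, S {entropy:6.2f} J/(mol K)")
```

In a real analysis the contributions are summed over all vibrational modes obtained from the optimized geometry.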

MPQC/Linux Speed-up
As MPQC is a parallel calculation program, it is worth analyzing the speed-up, i.e., how well it scales as the number of cores/CPUs increases. MPQC is parallelized using MPI, SysV shared memory or POSIX threads.

Running on multiple cores is simple with MPQC. You can specify the number of threads on the command-line. For a 4-thread calculation, the command-line is:

mpqc -threadgrp "<PthreadThreadGrp>:(num_threads = 4)" -o caffeine.out caffeine.in

where caffeine.in is the input file and the output is found in caffeine.out.

As I have access to a hyperthreaded quad-core computer, I have tested the POSIX thread parallelization (please remember that Linux has a highly optimized POSIX thread implementation).

The result of my simple benchmark shows that hyperthreading is not a good idea in a heavy computing environment - notice that the speed-up levels off at four cores. The reason is probably that the threads are competing for a limited resource: the floating-point units. A calculation of the caffeine molecule took about 8 hours using 4 POSIX threads, and this only calculated the ground state using the STO-6G basis set.
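Speed-up and parallel efficiency are simple ratios of wall-clock times, and a minimal Python sketch of the bookkeeping could look as follows. The timings in the example are hypothetical numbers chosen to show a curve that levels off; they are not the measurements from the benchmark above, and the function name is made up for the illustration.

```python
def speedup_table(wall_times):
    """Map thread count -> (speed-up, parallel efficiency) relative to 1 thread."""
    baseline = wall_times[1]
    table = {}
    for threads in sorted(wall_times):
        speedup = baseline / wall_times[threads]
        table[threads] = (speedup, speedup / threads)
    return table


if __name__ == "__main__":
    # Hypothetical wall-clock times in minutes, NOT the measured values.
    example = {1: 100.0, 2: 52.0, 4: 27.0, 8: 26.0}
    for threads, (speedup, efficiency) in speedup_table(example).items():
        print(f"{threads:2d} threads: speed-up {speedup:5.2f}, "
              f"efficiency {efficiency:4.2f}")
```

An efficiency that drops sharply when going from four to eight threads is the signature of hyperthreads competing for the same execution units.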

Summary
In general it is possible to carry out quantum chemistry calculations using free software if you can live without a fancy user interface. If you do many calculations, you might be happy to know that you can modify and extend the source code. For a casual computational chemist (maybe an organic chemist predicting spectra), free chemistry software might not be the solution yet.

2011-11-24

Saved by Dropbox



I have installed Dropbox on my laptops and my phone. On my laptops I use it for two purposes: synchronization and backup. Initially, I used it for synchronization of my private and my work laptop (both computers are running Linux).

The laptop at work is a bit unstable. It crashes probably 3-4 times a week, and I believe it has something to do with the temperature of the processor. It happened to me the other day, and an OpenOffice document was left in a state where the file did not contain anything but zeroes.

Luckily, Dropbox keeps track of the revisions of my files. That means that I was able to go one revision back and recover much of my file (I probably lost half a page). Needless to say, I'm pretty happy with the Dropbox service now!

2011-11-01

Emacsforum 2011

Peter Toft and I are in the process of preparing Emacsforum 2011 with some help from Troels Henriksen (at DIKU) and Keld Simonsen (from KLID). The program is almost ready for publication, so I will not say too much - but there will be something for scientists and developers. Even our Evil Twin will be represented.

The mini-conference takes place on 12th November 2011 at DIKU. There is no conference fee - and there will be no benefits.

If you are using Emacs (and even XEmacs) and live in the Copenhagen area, Emacsforum is a good place to meet fellow users.

2011-10-29

Free chemistry software - molecular modeling


Molecular modeling is a very large and important field of chemistry. As computers have increased in raw computing power, using computers to calculate molecular properties is no longer a specialized field for the few. Today, every chemist can perform calculations for even large molecules.


Roughly speaking, you can divide the calculations into two separate groups: molecular mechanics and quantum chemistry. The first is based on classical mechanics, while the second group uses quantum mechanics as the underlying model and equations. The software list at http://en.wikipedia.org/wiki/Molecular_modeling shows a wide range of offerings. Many of them are commercial and closed-source solutions.

Molecular mechanics calculations are used when no chemistry is going on. By definition, chemistry is the rearrangement of atoms - and that involves electrons. But molecular mechanics can be used to investigate how a molecule is solvated, that is, how its structure changes when it is surrounded by a solvent like water.

Gromacs is one of the oldest and most successful molecular mechanics software suites. It is covered by the GNU General Public License (version 2), and most distributions like Debian GNU/Linux have a package of it. It does not come with a fancy user interface, and the user primarily interacts with Gromacs at the command line. This is a big advantage, as some of the operations can take a long time. You seldom sit by your computer while working with Gromacs. The typical usage is to write small shell scripts and run them as batch jobs.

Today, most supercomputers in the chemical industry and academia are Linux clusters which are built from commodity hardware. That means that a supercomputer is a distributed system where the individual processors are loosely coupled using Ethernet (or maybe InfiniBand). The MPI framework is used by Gromacs to utilize such a supercomputer (if you have an SMP system, MPI can still be used for parallelization). Queuing systems (Sun Grid Engine, OpenPBS/Torque, etc.) schedule which batch job to execute, and the command-line nature of Gromacs comes into its own on such systems.

Explaining all the details of Gromacs is beyond the scope of this post. But let us take a quick tour of how to use some of the many utilities and programs in Gromacs. For the tour, the insulin-like growth factor 1 (IGF-1) is chosen. IGF-1 is a small protein (or peptide) which is involved in the growth and regeneration of your body. The assignment is to take the experimentally determined structure of this small, biologically active molecule and create a solvated version of it. You can download a file with the experimental structure from the Protein Data Bank. The structure found there is for a crystal, whereas IGF-1 (like most other molecules in your body) is in a solution where water is the solvent (remember, 60 % of your body is water).

First, you must pre-process the downloaded file into files used by Gromacs. In that process you choose the force field. The force field is the parametrization of the interaction between the atoms, and all calculations in Gromacs (and any other molecular mechanics program) are based on Newton's second law. With the command-line below, two files are generated (2GF1.gro and 2GF1.top).

pdb2gmx -f 2GF1.pdb -o 2GF1.gro -p 2GF1.top -ignh -ff G53a6

Now you have to edit the output file (2GF1.gro) in order to change the box size. You can do an energy minimization and generate a solvation box using the following commands (some steps might take some time):

mdrun -v -deffnm 2GF1-EM-vacuum -c 2GF1-EM-vacuum.gro
editconf -f 2GF1-EM-vacuum.gro -o 2GF1-PBC.gro -bt dodecahedron -d 1.2
genbox -cp 2GF1-PBC.gro -cs spc216.gro -p 2GF1.top -o 2GF1-water.gro

The final file is 2GF1-water.gro, which contains the biological molecule solvated in water. It might not sound like a great deal, but the file can be used in further simulations involving the solvated molecule.
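Since Gromacs is typically driven from small batch scripts, the commands of the tour can be collected into one. The following Python sketch only assembles the four command lines shown above into a shell script; the function names and the script layout are choices made for this illustration. Note that mdrun with -deffnm reads a run input file (2GF1-EM-vacuum.tpr), which has to exist already and whose preparation is not shown in this post.

```python
import shlex


def solvation_commands(name="2GF1"):
    """Return the command lines of the tour as argument lists."""
    vacuum = f"{name}-EM-vacuum"
    boxed = f"{name}-PBC.gro"
    return [
        ["pdb2gmx", "-f", f"{name}.pdb", "-o", f"{name}.gro",
         "-p", f"{name}.top", "-ignh", "-ff", "G53a6"],
        ["mdrun", "-v", "-deffnm", vacuum, "-c", f"{vacuum}.gro"],
        ["editconf", "-f", f"{vacuum}.gro", "-o", boxed,
         "-bt", "dodecahedron", "-d", "1.2"],
        ["genbox", "-cp", boxed, "-cs", "spc216.gro",
         "-p", f"{name}.top", "-o", f"{name}-water.gro"],
    ]


def batch_script(name="2GF1"):
    """Render the commands as a shell script that stops at the first error."""
    lines = ["#!/bin/sh", "set -e"]
    for command in solvation_commands(name):
        lines.append(" ".join(shlex.quote(arg) for arg in command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(batch_script(), end="")
```

The printed script can be saved to a file and handed to a queuing system as a batch job.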

Other molecular modeling packages exist. NAMD is a highly scalable molecular dynamics program. It is aimed at large molecules (proteins) and can utilize very large parallel computers. But NAMD is not free software as defined by the Free Software Foundation. You can download it and use it for any non-commercial purpose.


2011-10-10

Free chemistry software - utilities



One of the major annoyances a chemist in front of a computer is faced with is the vast number of file formats. The good news is that most file formats are text files, so it is possible to reverse engineer them by looking at a number of examples. One open source project called OpenBabel tries to help chemists in converting between the formats (currently OpenBabel supports 113 file formats related to chemistry). Most Linux distributions have packages for OpenBabel, including Debian GNU/Linux (it's a version from 2009 you find in Debian stable). Converting a molecular structure of caffeine from one file format (SDF) to another (PDB) is simply done by the following command:

babel -isdf caffeine.sdf -opdb caffeine.pdb
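Because the formats are plain text, a few lines of code are often enough to pull out what you need. As a minimal Python sketch, the following reads element symbols and coordinates from the fixed-width ATOM/HETATM records of a PDB file. The column positions follow the published PDB format; the function names and the example atom are invented for the illustration and are not taken from a real caffeine file.

```python
# Fixed-width layout of a PDB ATOM/HETATM record (columns 1-78).
PDB_ATOM_FORMAT = (
    "{record:<6s}{serial:>5d} {name:<4s} {res_name:>3s} {chain:1s}{res_seq:>4d}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{temp:6.2f}          {element:>2s}"
)


def parse_pdb_atoms(text):
    """Return a list of (element, x, y, z) tuples from PDB-formatted text."""
    atoms = []
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            element = line[76:78].strip()     # columns 77-78
            x = float(line[30:38])            # columns 31-38
            y = float(line[38:46])            # columns 39-46
            z = float(line[46:54])            # columns 47-54
            atoms.append((element, x, y, z))
    return atoms


if __name__ == "__main__":
    # A made-up record, written with the same layout that the parser reads.
    sample = PDB_ATOM_FORMAT.format(
        record="HETATM", serial=1, name="N1", res_name="LIG", chain="A",
        res_seq=1, x=1.234, y=-0.567, z=0.0, occ=1.0, temp=0.0, element="N",
    )
    print(sample)
    print(parse_pdb_atoms(sample))
```

For anything beyond such quick inspection, converting with OpenBabel is the more robust route.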

You can find many small molecules - with 3D structures, physical properties and toxicology data - at PubChem. For larger molecules (proteins mainly), you can go to the Protein Data Bank. The file for caffeine as used above can be found at PubChem.

The OpenBabel project also includes a number of other utilities, including a chemist's version of grep called obgrep (searching for molecules with a particular substructure within a database) and a simple program to (energy) minimize a molecule called obminimize.

GNOME Chemistry Utils is a set of utilities developed for GNOME users. The set includes a calculator (for calculating the molecular mass of a molecule), the periodic table of the elements, and a spectrum viewer. The periodic table of the elements can give you the physical and chemical properties of all elements. Most chemists have a periodic table of elements close at hand when working, and having one on your desktop seems like a good idea.
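What such a molecular mass calculator does can be sketched in a few lines of Python. This is not the GNOME Chemistry Utils code; the function name is made up, the mass table is limited to the four elements needed for caffeine (rounded standard atomic masses), and formulas with parentheses are not handled.

```python
import re

# Rounded standard atomic masses in g/mol; only the elements used below.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Sum the atomic masses of a simple formula such as 'C8H10N4O2'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


if __name__ == "__main__":
    print(f"caffeine: {molar_mass('C8H10N4O2'):.3f} g/mol")
```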

Chemists do a lot of drawing: they draw structures of molecules. A drawing can be regarded as a generic representation of a molecule's 3-dimensional structure on 2D paper. Understanding and drawing such chemical structures is an integral part of any chemist's education, and chemists have used these drawings for more than 150 years (the discovery of the electron and the development of quantum mechanics changed the view of molecular structures). The 3-dimensional geometry is an important factor in determining the properties (reactivity, toxicology, color, etc.) of a molecule.

A drawing program for chemists is not hard to imagine. When it comes to free software, we are so lucky that we have more than one. GNOME users can use the molecular drawing program from the GNOME Chemistry Utils project. It is called GChemPaint. As GChemPaint can only load a rather small number of file formats, you really learn to use OpenBabel rather quickly. It is an easy program to work with, and it is possible to save your drawing in the most commonly used image formats (both bitmap and vector formats). You can then easily insert your drawing in your favorite word processing software prior to publication (take publication rather broadly: everything from a high-school report to a paper in Nature).

As already said, drawing programs for chemists are not hard to imagine. Other projects in this area include titles like bkchem, chemtool, easychem, xdrawchem, jchempaint, and molsKetch (probably stalled).

2011-10-07

Free chemistry software - Introduction

The year of 2011 has been declared the International Year of Chemistry by UNESCO (United Nations Educational, Scientific and Cultural Organization) and IUPAC (International Union of Pure and Applied Chemistry). The purpose of devoting a full year to chemistry is to spread the notion that chemistry is important for our daily life.

In this series of blog posts I will examine the state of free software in chemistry. This first post is an outline of the usage of computers in chemistry.

It is hard to imagine modern life without the discoveries and developments made by chemists and chemical engineers over the last two or three centuries. Plastic, gasoline, and pharmaceuticals are products from the chemical industry. And forensic scientists use many chemical analyses in order to provide evidence for police investigations all over the world. But chemistry is more than an applied science. It also gives us an insight into how our world works. In the recent decade, the modern cuisine has changed. For example, the chef Heston Blumenthal has been using chemistry to create new dishes (this branch of chemistry is called molecular gastronomy).

As you can see, chemistry is a broad science and engineering discipline. Modern chemistry is divided into a number of branches. Traditionally, an academic education in chemistry consists of courses in general, organic, inorganic, physical and even analytical chemistry. Chemistry is a wet science, and as a student you spend a lot of time in laboratories. Amongst chemists, it is still discussed whether chemistry is a descriptive science (classification of observations) or an exact science (explaining observations).

Chemistry interfaces with most other sciences, including physics, mathematics, statistics and biology. Quantum chemistry applies quantum mechanics to calculate properties of chemical substances. But as you might imagine, the three-body problem is a serious show-stopper for a chemist, as very few molecules have only three nuclei and electrons.

Computers are heavily used in chemistry. One example is performing quantum chemistry calculations, as finding an analytical solution for a many-body problem is impossible. A rough break-down of the usage of computers in chemistry consists of three major areas. Firstly, you have the end-user applications used by every chemist. The applications are domain-specific applications - the domain is as broad as chemistry. The second area is chemoinformatics. It is a fairly young area (a decade or two only). Chemoinformatics applies techniques from informatics to transform chemical data into knowledge, thereby improving the decision making process. The usage of specialized databases and search algorithms is an integral part of chemoinformatics. Any non-trivial chemical compound can be represented in a number of ways. Even a small molecule like styrene can be named in different ways depending on how you look at it. Chemoinformaticians have introduced a string representation for all chemical compounds called the simplified molecular input line entry specification (SMILES). The SMILES code for styrene is C=CC1=CC=CC=C1. Imagine that you have to find all compounds in your database with a certain substructure. You cannot use a regular expression or an SQL query. As molecules can be regarded as graphs (atoms are connected by chemical bonds), searching in chemical databases is a variant of the subgraph search problem. This is the core of chemoinformatics.
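To make the graph view concrete, here is a minimal Python sketch of a substructure test by backtracking. It is a naive illustration and not how a production chemoinformatics toolkit works: hydrogens and bond orders are ignored, the styrene graph is typed in by hand from the SMILES string above, and all names are made up for the example.

```python
def adjacency(n_atoms, bonds):
    """Build a list of neighbour sets from a list of (i, j) bonds."""
    neighbours = [set() for _ in range(n_atoms)]
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return neighbours


def has_substructure(mol_atoms, mol_bonds, query_atoms, query_bonds):
    """True if the query graph can be embedded in the molecule graph."""
    mol_adj = adjacency(len(mol_atoms), mol_bonds)
    query_adj = adjacency(len(query_atoms), query_bonds)
    mapping = {}  # query atom index -> molecule atom index

    def extend(q):
        if q == len(query_atoms):
            return True
        for m in range(len(mol_atoms)):
            if m in mapping.values() or mol_atoms[m] != query_atoms[q]:
                continue
            # Every already placed neighbour of q must be bonded to m.
            if all(mapping[n] in mol_adj[m] for n in query_adj[q] if n in mapping):
                mapping[q] = m
                if extend(q + 1):
                    return True
                del mapping[q]
        return False

    return extend(0)


# Heavy-atom graph of styrene, C=CC1=CC=CC=C1 (atoms numbered left to right).
STYRENE_ATOMS = ["C"] * 8
STYRENE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 2)]

# A six-membered carbon ring as the query substructure.
RING_ATOMS = ["C"] * 6
RING_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

if __name__ == "__main__":
    print(has_substructure(STYRENE_ATOMS, STYRENE_BONDS, RING_ATOMS, RING_BONDS))
```

The backtracking search grows exponentially in the worst case, which is why real databases combine it with pre-computed fingerprints to discard most candidates early.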
The third area where computers are used in chemistry is to perform calculations, and it is often referred to as computational chemistry. It is an old area - calculation and simulation of properties of chemical compounds and reactions have been carried out for as long as computers have been available to scientists. The calculations use either a classical mechanical approach or a quantum mechanical approach. In the first approach, electrons are neglected and a force field between the atoms is applied. It is possible to simulate large molecules using this approach. But if you need to predict the energy levels, thermodynamic properties, and charge distribution of a molecule, you have to use a quantum mechanical approach. This involves solving the time-independent Schrödinger equation (or at least an approximation to it, based on the Born-Oppenheimer approximation).
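The classical approach boils down to integrating Newton's second law with forces taken from the force field. A minimal Python sketch for a single particle on a harmonic "bond", using the velocity Verlet scheme, could look like this. The integrator choice, the reduced units and all names are assumptions made for the illustration; real molecular mechanics programs use far richer force fields.

```python
def harmonic_force(x, k=1.0, x0=0.0):
    """Force of a harmonic spring with force constant k and rest position x0."""
    return -k * (x - x0)


def total_energy(x, v, mass, k=1.0, x0=0.0):
    """Kinetic plus potential energy of the particle."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2


def velocity_verlet(x, v, mass, dt, steps, force=harmonic_force):
    """Integrate m * a = F(x) and return the list of (position, velocity)."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(steps):
        v_half = v + 0.5 * dt * f / mass   # half step in velocity
        x = x + dt * v_half                # full step in position
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half step in velocity
        trajectory.append((x, v))
    return trajectory


if __name__ == "__main__":
    path = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, steps=1000)
    start, end = path[0], path[-1]
    print("energy at start:", total_energy(*start, mass=1.0))
    print("energy at end:  ", total_energy(*end, mass=1.0))
```

The total energy should stay nearly constant, which is a common sanity check for the time step.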

It is important to understand that most chemists are not educated as programmers. On the other hand, using computational techniques can save chemical industries huge fortunes. Today, most pharmaceutical companies have specialized departments for performing chemical calculations and supporting an informatics infrastructure. These departments are small in terms of manpower compared to the company as a whole. As the market is small and the potential benefits huge (time-to-market and saving expensive laboratory time), vendors often ask for very high license fees. Vendors like Schrödinger, Wavefunction, Gaussian, and OpenEye offer software packages for chemists. Sadly, free software is a minor player in chemistry, but you can find free chemistry software for most needs.