MOIL DOCUMENTATION 19 April 2011
Table of Contents
2 Obtaining MOIL and Getting Started
3 Goals
4 Force field
5 Units
6 Typical order of events
7 Directory structure
8 Basic syntax of the MOIL interpreter
  8.1 Overview
  8.2 Variable selections (integer, real, double, character, logical)
  8.3 Syntax of file assignment
  8.4 Syntax of the “pick” command
  8.5 Comment lines
  8.6 Line continuation
  8.7 The “action” and “*EOD” commands
9 Brief description of MOIL modules
  9.1 Special characters in description of programs
  9.2 Major programs
  9.3 Some noted options
  9.4 Utilities
  9.5 Analysis programs
  9.6 The moil.tcl, cmoil, and zmoil programs
  9.7 Parameters for energy and force calculations - EPROP
10 MOIL in depth
  10.1 The major programs
    10.1.1 conn
    10.1.2 puth
    10.1.3 energy
    10.1.4 mini_pwl and mini_tn
    10.1.5 dyna
    10.1.6 dyna_prl (parallel program, requires MPI)
    10.1.7 freee
    10.1.8 therm
    10.1.9 mfep
    9.1.10 umbr
    9.1.11 sdel (parallel version, requires MPI) / sdelS (serial version)
    9.1.12 sdp (parallel version, requires MPI) / sdpS (serial version)
    9.1.13 chmin
    9.1.14 fp
    10.1.10 DiM
    9.1.15 scndrv (and numerical)
  10.2 Major options
    10.2.1 LES
    10.2.2 MUTA
    10.2.3 FREADY
    10.2.4 Double-well Elastic network model
    10.2.5 LD (Langevin Dynamics)
    10.2.6 dynapress
    10.2.7 PME
    10.2.8 dynapt (parallel program for parallel tempering, requires MPI)
  10.3 Utilities
    10.3.1 addion
    10.3.2 boat
    10.3.3 ccrd
    10.3.4 crd2pdb
    10.3.5 con_specl
    10.3.6 memeqns
    10.3.7 reconstruct
    10.3.8 path_eqw
    10.3.9 ovrlp_trj
    10.3.10 Numerical
    10.3.11 solvatecrd
    10.3.12 pdb2puth
  10.4 Analyses
    10.4.1 av_dif
    10.4.2 Contacts
    10.4.3 dxdl
    10.4.4 eff_difdens
    10.4.5 Fluc
    10.4.6 rgyr
    10.4.7 rms_2crd
    10.4.8 rms_2path
    10.4.9 rms_p2p
    10.4.10 rms_resd
    10.4.11 SuperTMscore
    10.4.12 superback
    10.4.13 superrms
    10.4.14 str_measures
    10.4.15 tmalign (Zhang and Skolnick 2005)
    10.4.16 Torstat
    10.4.17 xangle
    10.4.18 xcrd
    10.4.19 xtors
11 MOIL files
  11.1 monomer
  11.2 property
  11.3 poly
  11.4 addbond
  11.5 edit
  11.6 connectivity
  11.7 Coordinates
    11.7.1 PDB file interpretation in MOIL
    11.7.2 CRD file
    11.7.3 DCD and DVD files
    11.7.4 PTH files
    11.7.5 wene and wmin files
  11.8 Standard input and output in MOIL
  11.9 Other special files
12 Credit
13 References
2 Obtaining MOIL and Getting Started
MOIL is available either as a “packaged” version, consisting of source code with prebuilt
binaries for popular operating systems, or as source only via a public version
control server from which the latest source code may be retrieved at any time.
Instructions for both can be found at https://wiki.ices.utexas.edu/clsb/wiki. The present
document provides a reference for MOIL; the file get_started.pdf in the moil.doc folder is
a good place to start if you prefer to jump in immediately, performing
computations and visualizations and learning as you work.
3 Goals
MOIL is a suite of integrated modular (FORTRAN) programs that perform a variety of
biomolecular calculations and simulations using molecular mechanics force fields. It takes
PDB files as input and performs energy, energy minimization, dynamics, free energy,
reaction path, kinetic, and thermodynamic calculations. These calculations help bridge
the gap between structure, dynamics, and function. Emphasis is placed on unique features
developed in the Elber laboratory, among which are reaction path calculations,
simulations of long time approximate trajectories, calculations of kinetics and
thermodynamics along reaction coordinates, and Locally Enhanced Sampling. All
programs are available through the command line interface.
A good selection of programs is also available through the moil.tcl menu-based graphic
interface that drives many of the MOIL modules. This menu-based interface, written in
tcl/tk, is referred to in this document by different names: moil.tcl, the “menu-based
interface”, the “graphical interface”, or the “GUI”. It exists simply to provide a
menu-based paradigm for setting up required input and running the MOIL command-line
programs, sometimes accomplishing a series of tasks with one mouse-click. It should not
be confused with Zmoil, which is an OpenGL-based visualization program used to
view molecules and trajectories in 3D graphics, and is documented elsewhere.
Nevertheless, Zmoil is most easily accessed from moil.tcl.
The prime purpose of MOIL is to serve as an engine for High Performance
Computing (HPC) applications on clusters of computers. Emphasis is therefore placed on
command line usage and the syntax of concrete input files. The graphic interface is used in many cases
to generate these input files in a more transparent way. However, the graphic interface
helps with the initial setup and with the analysis of the results, not with monitoring HPC
applications. Massively distributed or parallel applications are not supported through the
moil.tcl menu-based interface. A tutorial for the use of the graphic interface can be found
in the file get_started.pdf.
4 Force field
MOIL uses the OPLS (Jorgensen and Tirado-Rives 1988; Pranata, Wierschke et al. 1991;
Zichi 1995), OPLS-AA (Kaminski, Friesner et al. 2001) and AMBER force fields
(Hornak, Abel et al. 2006). A recent implementation of force fields for nucleic acids is also
available. Conversion to other force fields such as CHARMM can be done externally by
converting database (text) files with no changes to source code. See section 11 on MOIL
files for more details. At present there is no source code support for electrostatic
polarization. However, MOIL supports the addition of charges off the positions of nuclei
using the vprt facility. MOIL also implements the coarse grained force field FREADY
(Majek and Elber 2009), and a double Gaussian network model that allows for simple
calculations of conformational transitions (Yang, Majek et al. 2009).
5 Units
MOIL units are angstrom (length), kcal/mol (energy), and atomic mass units (the mass of hydrogen
is equal to 1). Externally MOIL accepts time steps in picoseconds and converts these
times to internal time in MOIL. Temperature is expressed in Kelvin. Angles are input in
degrees.
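The conversion between picoseconds and an internal time unit can be illustrated with a short calculation. This document does not state MOIL's internal time unit, so the sketch below makes an assumption: it uses the unit that follows dimensionally from the length, energy, and mass units listed above, t = sqrt(mass × length² / energy). The function names are hypothetical and are not part of MOIL.

```python
import math

# Physical constants (the thermochemical calorie is exactly 4.184 J).
AVOGADRO = 6.02214076e23         # 1/mol
JOULE_PER_KCAL = 4184.0          # J per kcal
KG_PER_AMU = 1.66053906660e-27   # kg per atomic mass unit
M_PER_ANGSTROM = 1.0e-10         # m per angstrom


def natural_time_unit_ps():
    """Time unit implied by (angstrom, kcal/mol, amu), in picoseconds.

    Assumption: this is the unit that follows from dimensional analysis;
    the document does not give MOIL's actual internal factor.
    """
    energy_joule = JOULE_PER_KCAL / AVOGADRO  # one kcal/mol per particle, in J
    t_seconds = math.sqrt(KG_PER_AMU * M_PER_ANGSTROM ** 2 / energy_joule)
    return t_seconds * 1.0e12


def ps_to_natural(dt_ps):
    """Convert a time step given in ps to the natural unit (hypothetical helper)."""
    return dt_ps / natural_time_unit_ps()


if __name__ == "__main__":
    print(natural_time_unit_ps())  # roughly 0.0489 ps
    print(ps_to_natural(0.001))    # a 1 fs step expressed in the natural unit
```

Under this assumption one natural time unit is about 49 femtoseconds, which is why a user-supplied step in picoseconds has to be rescaled before it enters the equations of motion.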
6 Typical order of events
A MOIL application for a particular system starts with the conn program, which generates a data
structure necessary for energy and force calculations (the data structure is written into a
connectivity file customarily called *.wcon, where the “*” denotes a wild card for a
molecule-specific name). If the graphic interface moil.tcl is used, then in parallel to the
generation of the *.wcon file we may convert a PDB file to a MOIL workable coordinate file
(customarily called *.crd). If the command line interface is used, the *.crd file is
generated after *.wcon with the programs puth and (if a solvated molecule is desired)
solvatecrd. The files *.wcon and *.crd form the input core for any follow-up calculation
that uses the energy function of MOIL. See get_started.pdf in the moil.doc folder for a
description of generating .wcon and .crd files from a PDB file.
With *.wcon and *.crd files at hand numerous applications are possible. For example
energy minimization can proceed with mini_pwl, followed up with Molecular Dynamics
simulations with dyna. The output of the dyna program is a *.dcd file: a set of Cartesian
coordinates of the system written sequentially as a function of time.
The last step is analysis of the results. Here is where creativity might be tested; however, a
few simple routine tools are available in MOIL. For example, xcrd extracts the
coordinates of a selected group of atoms, making it easier to analyze and understand
a subset of motions.
7 Directory structure
After successfully installing the source code you should have the following directories in
the designated location. The moil directory is at the top, and under it you will find the files:
1. README, which discusses installation issues.
2. ReleaseNotes_moil11.doc, with a few highlights of the current release (11).
3. version. This file contains information about the version of the code at the time it was
packaged from the Subversion (SVN) version control system. This file is only included in
packaged distributions; if instead you pulled the source code directly from our version
control system, the same information is available via the “svn info” command.
More importantly there are a few directories that store source code and databases. Going
down alphabetically (and not necessarily in order of importance) we have
make_distribution – this directory includes high level scripts to compile the code. For
example, to compile the code on Mac OS X go to the make_distribution directory and
simply type ./make_distro_osx. More information on how to compile the code can be
found at https://wiki.ices.utexas.edu/clsb/wiki/BuildingMoil
moil.amber – a set of Perl scripts and inputs to convert from the AMBER force field to the
MOIL representation. The subdirectory data includes output appropriate for use in MOIL.
moil.crd – includes files of sizable water boxes to solvate biological molecules.
moil.doc – where the documentation of MOIL can be found.
moil.exe – where executable files are stored (ONLY in the Windows version).
moil.gui – where the graphic interface (moil.tcl) resides.
moil.input – where sample input files for different programs can be found. In practice it
is always easier to take existing sample input and to modify it for your own needs instead
of creating your own from scratch.
moil.mop – where the basic database files, monomers and properties, reside. For example,
ALL.MONO stores primarily the properties of monomers (e.g. which atoms make up the
residue ALA (alanine) and how they are covalently connected).
moil.source – where all the source code files reside. For all operating systems (with the
exception of the Windows version) the executable files are in the moil.source/exe directory.
Of particular significance is moil.source/COMMON, which includes all the global
variables shared (via COMMON) between subroutines and programs. This is also the place
where the lengths of the program arrays are defined (in COMMON/LENGTH.BLOCK).
A list of the directories under moil.source follows:
analysis – analysis programs
boat – Bond Angle and Torsions. Compute internal coordinates for analysis
ccrd – convert Cartesian coordinate files between formats
chain – The chmin program for computing reaction coordinate based on the SPW
functional (Czerminski and Elber 1990)
cmoil – the original C-language, OpenGL-based visualization program for MOIL
comm – a set of communication subroutines for MPI parallel code
comm_dummy – dummy communication routines to allow for serial compilation
comm_t – communication for a special machine (Terra)
COMMON – where all the shared COMMON blocks (global variables) reside
connect – the source code for the program conn
coupledDyna – code to run multiple coupled copies of dyna
DEE – Dead End Elimination code (not operational)
dynamics – the source for straightforward molecular dynamics and Langevin
equation simulations
exe – all the MOIL executable files reside in this directory, with the exception of the
Windows version (programs are in moil/moil.exe for Windows)
fp – the home directory of Milestoning source code (Faradjian and Elber 2004)
fready – the coarse grained model of proteins implemented into moil (Majek and
Elber 2009)
free_e – compute free energy differences by free energy perturbation method
along a reaction coordinate (Elber 1990)
GB – Generalized Born Surface Area code, Implemented according to (Onufriev,
Bashford et al. 2004)
generic – generic tools used by many programs, e.g. matrix diagonalization.
intrprtr – line interpreter, extract expression and values from command line
LES – Locally Enhanced Sampling (Elber and Karplus 1990; Roitberg and Elber
1991; Simmerling and Elber 1994)
memeqns – memory equation solver (for Milestoning (Faradjian and Elber 2004))
mfep – minimum free energy path using a string method
mini_tn – minimization with truncated Newton-Raphson algorithm.
mini_pwl – minimization with conjugate-gradient algorithm with the Powell
restart option
muta – free energy differences between mutated molecules
path – general code for path calculation algorithms; it is used by both the sdel and
sdp programs
pot – potential energy and forces
prepcrd – solvate the solute in a box of water
puth – read PDB file, place missing hydrogens and write CRD file
s2d – second derivative of the potential
sdel – stochastic difference equation in length
sdp – compute the steepest descent path (parallel code) (Olender and Elber 1997)
sdpS – steepest descent path (serial code)
steep – steepest descent minimization
stochpath – sdet algorithm (not operational)
symm – periodic boundary symmetry operation
therm – thermodynamic integration for alchemical changes
tmalign – align two structures and compute TM score (Zhang and Skolnick code
(Zhang and Skolnick 2005), modified to add “trivial alignment” flag)
umbrella – umbrella sampling
vopt – vector operations
zmoil – a C++, OpenGL-based visualization program; successor to cmoil (see above)
moil.test – a significant number of tests that can be executed semi-automatically, together with
a number of examples of how MOIL is used. A useful script is run_tests.pl,
a Perl script that runs all the tests, or a subset of tests with the
syntax “perl run_tests.pl test1 test2”. Specific tests reside in different directories (briefly
discussed below). Each test directory includes a runme.bat file (a script to run the
test) and an Output directory that stores correct output files to be compared to files
generated during the test. Note that, despite the runme.bat filename, these are Unix-style
scripts. The run_tests.pl script will run these tests on a Windows system by doing some syntax
translation as required.
Note that when running the tests, some of the “errors” detected are a result of floating-point
rounding and some are due to different formatting. Manual examination of the outputs is
therefore advised before crying Wolf! Nevertheless, a significant level of comparison is
done completely automatically, which is a plus. Another advantage of the tests is that
“real” runs can be developed based on the examples, which in many cases is a lot easier than
reading manuals. A list of the tests/directories and a brief explanation follows:
ad_map – compute adiabatic map (phi,psi) for a dipeptide
ala3 – a set of minimizations and dynamic calculations (including GBSA) for trialanine
bench_vp – simulation with virtual (no mass) charged particles (extension of the
force field)
connectivity – generate a connectivity file
diffdens – compute diffusion constant and density fluctuations of water from
dynamic simulation
dyna_ssbp – use of the spherical solvent boundary potential of Roux (Beglov and
Roux 1994)
eball – keeping a spherical constraint on water molecules
fp – Milestoning run
fready – coarse grained model of proteins
free_e – compute free energy difference
hb – simulate hemoglobin
memeqns – solve memory equation (last step of Milestoning)
metal – simulate metal wall (with image charges)
mini_tn – use truncated Newton-Raphson algorithm for energy minimization
more_water – simulate internal water in gramicidin
mult_pept – multiply a part of a peptide with Locally Enhanced Sampling (LES)
myo – simulate the classic (myoglobin)
nuc_acid – simulate nucleotides
path – run chmin boundary-value path calculation
pdb2CG – generate a coarse grained model from PDB structure
pep21_mini – simulate 21 amino acid peptide
prep_solv – solvate a solute (place it at the center of a water box while removing
overlapping water molecules)
pressure – monitoring pressure during simulations
prlltemp – replica exchange simulations
read_pdb – read PDB file, convert to CRD and generate a connectivity file.
s2d – generate second derivative matrix of the potential
Sdel – stochastic difference equation in length
Sdelave – stochastic difference equation with some variables thermally averaged.
sdp – compute the steepest descent path between two fixed points.
special – Landau-Zener curve crossing
sto_sval – stochastic difference equation in time
str_measures – different shape measures
symm – periodic boundary condition of solvated system
symm2 – another example for periodic boundary condition simulation
symm2_prl – a parallel implementation of the periodic boundary condition
for straightforward MD
therm_cycle – thermodynamics integration
umbrella – umbrella sampling along a reaction coordinate
valdip – molecular dynamics simulations on valine dipeptide.
wfly – capture (and stop) evaporated water molecule for finite system simulations.
8 Basic syntax of the MOIL interpreter
8.1 Overview
MOIL has a flexible line interpreter that picks variables from a command line in a
consistent way for a range of different MOIL programs. While there are a few rules that
must be followed, the syntax makes it reasonably convenient to build complex
inputs and relationships.
All commands and variables are case sensitive.
The basic structure consists of file assignments at the beginning of the input file, followed by
a list of variable initializations. Typically the series of input lines is
terminated by a single line with the keyword “action”, suggesting that it is time to stop
reading and start doing. Some MOIL programs also need a secondary terminator of the command
line, *EOD. It is OK to have an extra *EOD at the end of the file since the
interpreter ignores extra material.
The line interpreter is used to read input lines prepared by the user as well as the data
files that determine the potential energy; the hope is that data files are correspondingly
(relatively) readable too. A typical structure of a moil input file is therefore something
like
~ This is a sample input
file …
file …
x=… y=... z=…
action
*EOD
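The layout above can be mimicked with a few lines of Python. This is only an illustrative sketch, not the MOIL interpreter: the function name is hypothetical, treating a leading “~” as a comment marker is an assumption drawn from the sample itself, and line continuation and any program-specific handling of *EOD are not modeled.

```python
def split_moil_input(text):
    """Split input text into file-assignment lines and variable lines.

    Reading stops at the first line whose first word begins with "acti"
    (the interpreter keeps at most 4 characters of a name), so a trailing
    *EOD after "action" is simply never reached.
    Returns (file_lines, variable_lines, saw_action).
    """
    file_lines, variable_lines = [], []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with "~" carry no data.
        if not line or line.startswith("~"):
            continue
        first_word = line.split()[0]
        if first_word[:4] == "acti":
            return file_lines, variable_lines, True
        if first_word == "file":
            file_lines.append(line)
        else:
            variable_lines.append(line)
    return file_lines, variable_lines, False


if __name__ == "__main__":
    # Placeholder values stand in for the elided parts of the sample.
    sample = "~ This is a sample input\nfile ...\nfile ...\nx=1 y=2 z=3\naction\n*EOD\n"
    print(split_moil_input(sample))
```

Returning a flag for “action” makes it easy to detect an input file that was never terminated, which is a common mistake when editing sample inputs by hand.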
8.2 Variable selections (integer, real, double, character, logical)
A value is assigned to a variable with an equal sign, forming an expression.
Expressions must be separated by space(s). For example, x=5 y=3 means that the value
5 will be assigned to x and y will be 3. Unacceptable expressions are x=5,y=3 or
x=5<tab>y=3 (<tab> is the tab character).
Some of the variables are designed to be integers, so i=5 is legal but i=5.5 is not. The
variable names are case sensitive, so do not flip between upper and lower case. They
typically consist of letters and special characters. For example, #ste=10 assigns 10 to #ste, which is
the number of steps to execute a particular algorithm. The interpreter reads at most 4
characters per variable name, so abcdef=5 is interpreted as abcd=5. Writing longer
names in the input is sometimes useful to increase clarity. For example, #step=10 has the same
meaning as the shorter expression above, and “action” can be written as “acti”.
In case of ambiguity (multiple assignments of the same variable), the last assignment
counts. For example “y=5 y=3” will assign 3 to y.
Number assignments are either integer, real, or double precision. When a variable is
assigned an integer we will denote it by [i], real number by [r], and double precision
number by [d] (double precision numbers are written (for example) as 1.0d1 which
means 10).
Character variables are enclosed in parentheses, i.e. character=(….). For example
name=(input).
Logical variables are set on and off by their presence (no equal sign). For example prll
turns on the parallel option in the code.
An example from an input file with a collection of variables assigned is below:
#ste=6000 #equ=1000 info=1000 rmax=15.d0 ovlp #crd=1000 #vel=1000 #lis=2000
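The rules of this section (space-separated expressions, names truncated to 4 characters, the last assignment winning, parenthesized character values, and logical variables set by their presence) can be sketched in Python as follows. This is a hypothetical re-implementation for illustration only; it is not the MOIL interpreter and it does not check that a given variable is of the expected type ([i], [r], or [d]).

```python
import re

NAME_LEN = 4  # the interpreter reads at most 4 characters per variable name


def parse_value(token):
    """Convert a value token: (text) -> str, 5 -> int, 1.0d1 -> float."""
    if token.startswith("(") and token.endswith(")"):
        return token[1:-1]  # character variable
    if re.fullmatch(r"[+-]?\d+", token):
        return int(token)
    # Fortran double precision writes the exponent with "d"; Python expects "e".
    # Anything that is still not a number raises ValueError here.
    return float(token.lower().replace("d", "e"))


def parse_line(line):
    """Parse one input line into a dict mapping truncated names to values."""
    values = {}
    # Split on spaces only: tabs and commas are not separators, so an
    # expression such as x=5,y=3 ends up as one malformed token.
    for token in line.split(" "):
        if not token:
            continue
        name, sep, value = token.partition("=")
        key = name[:NAME_LEN]
        # No equal sign: a logical variable switched on by its presence.
        # A repeated key overwrites the earlier one (last assignment counts).
        values[key] = parse_value(value) if sep else True
    return values


if __name__ == "__main__":
    line = "#ste=6000 #equ=1000 info=1000 rmax=15.d0 ovlp #crd=1000 #vel=1000 #lis=2000"
    print(parse_line(line))
```

Storing results in a dictionary gives the “last assignment counts” behavior for free, and truncating the key before storing makes #step=10 and #ste=10 land in the same slot.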
8.3 Syntax of file assignment
A file assignment gives the program access to existing information and
a place to write output. Once a file is assigned, it is read either immediately after the
assignment line or only after the action keyword is detected. The order of events usually
does not matter to the user. The command to assign a file to the project is
file file_type name=(file_name) rw_status format [more_options]
the “file” keyword tells the interpreter that this is a file assignment command and this line
must be interpreted accordingly.
“file_type” is the type of the file to be opened. There are numerous types of files in MOIL;
some are used across many programs and some are specific to a particular module. Below
we mention the most common ones. More examples can be found in the description of
individual programs. Examples of file_type: prop – property file, mono – monomer file,
poly – polymerization file, addb – add bond, edit – edit connectivity, rcrd – read
coordinates, wcrd – write coordinates, wvel – write velocities, wmin – write
minimization results, wene – write energy output.
“name” is the name of the file. For example: name=(/usr/guest/guest_file). Note that the
file name must not include spaces since the interpreter uses spaces to break the command
line into expressions. Windows users who access files directly from the Desktop with
filenames including the folder “My Documents” and similar may suffer. For Windows
users it is probably better to start from C:\ and have no spaces in the name of the working
directory (the default install location for Windows is typically C:\Moil11 – installing to
the desktop is not recommended for the above reasons).
“rw_status” is the read/write status. A file can be opened with a “read” status, in which case it
must exist and cannot be overwritten. It is opened at the beginning of the file. The
second most frequent option is “wovr”, in which a file is opened with a status “unknown”. If
it does not exist it will be created and written. If it exists it will be
written over. It is also possible to open a file as “writ”. In that case the program exits with
“RED ALERT” if the file already exists. This option is for the true collectors who cannot
delete or write over existing files and insist on showing only growth in their disk usage.
“format” is the way the file is written. There are two basic formats in MOIL – text and
unformatted (FORTRAN style). If no keyword is provided, the default text format
is used. Otherwise the logical keyword “bina” should be added, which
means write an unformatted (“binary”) file. MOIL expects a certain format for each
type of file, so this is not really a free choice but a forced choice most of the time. For
example, Dynamics Coordinate files must be written as unformatted files.
“more_options” are connected to specific programs. Some programs require more details
on the type of the file (besides being formatted or unformatted). For example the program
energy can accept DYNA and PATH keywords to indicate that dcd and pth format are
used (respectively). More details on the additional options will be given in specific
description of different programs.
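When input files are generated by a script, the file assignment line can be assembled and checked before it is written. The helper below is a minimal sketch of the syntax described in this section; file_line is a hypothetical name and the file names in the usage lines are made up for illustration.

```python
def file_line(file_type, file_name, rw_status, binary=False, more_options=()):
    """Build 'file file_type name=(file_name) rw_status format [more_options]'."""
    if " " in file_name:
        # The interpreter splits the line on spaces, so the name must not contain any.
        raise ValueError("file name must not contain spaces: %r" % file_name)
    if rw_status not in ("read", "wovr", "writ"):
        raise ValueError("unknown read/write status: %r" % rw_status)
    parts = ["file", file_type, "name=(%s)" % file_name, rw_status]
    if binary:
        parts.append("bina")               # unformatted file; text is the default
    parts.extend(more_options)             # program-specific keywords, if any
    return " ".join(parts)


print(file_line("wcon", "mb10co.wcon", "wovr"))
print(file_line("wcrd", "run.dcd", "wovr", binary=True))
```

Rejecting names with spaces at this stage avoids the “My Documents” problem mentioned above before MOIL ever sees the line.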
8.4 Syntax of the “pick” command
MOIL has a pick subroutine that is used by essentially all modules in MOIL (the syntax of
pick in the moil.tcl menu-based interface to MOIL is slightly different and friendlier).
The basic command line for "pick" looks something like:
pick pick #prt 1 60 | chem mono ALA & chem prtc CA != #mon 2 3 done
The command line includes a series of selections interconnected by logical instructions. It
is read from left to right until "done" is detected. No brackets are allowed, and hopefully
this is not a too severe restriction. The “pick” command can take only part of a line and
other instructions may be included before the first “pick” or after the “done” (but not in
the middle of the pick segment).
On output the “pick” subroutine returns a vector of the length of the number of atoms that
contains zeroes (atoms not picked) and positive integers (selected). Most of the time the
positive integer is 1, however with the “group” keyword (see below) it is possible to
further partition the selected particles to different groups; each group with a different
integer value. The above vectors are invisible to the user and this paragraph is provided
as a general background for the operation of the pick command.
The double “pick” at the beginning is required unless stated otherwise. It is a little
confusing and perhaps will be fixed in the future, but for now we are stuck with it. It is
therefore useful to understand the source of the double “pick”. The first thing MOIL does
to an input line is to parse it into expressions with “rline”. If the keyword “pick” is found, the
line is forwarded to the pick subroutine. The pick subroutine needs to determine
where the “pick” starts and where it ends, and here comes the second “pick”. So there is
one “pick” for rline and another “pick” for the pick subroutine.
The above line is interpreted sequentially from left to right (the
corresponding expression is given in parentheses):
pick particles 1 to 60 (#prt 1 60), or (|) use chemical notation (chem) to pick all the
particles of the monomers (mono) alanine (ALA) anywhere along the sequence. The
particles selected so far are now subject to a logical .and. (&): from the set selected so far
pick, according to chemical notation (chem), the particles (prtc) that are called (CA). Then
remove (!=) from the selection monomer numbers (#mon) from 2 (2) to 3 (3), and
this is all (done). The net result is the selection of CA of all ALAnine residues, as well as
CA found in the range of particles from 1 to 60 (whether from ALA or not), but
excluding any in the range of residues 2 to 3.
The pick line is of the format
pick A lexp B lexp C lexp ... done
where A-C are selection commands and lexp are logical expressions
available keywords:
pick - beginning of a pick line. If not found (remember the double “pick”), the subroutine
returns with default selection (all particles selected)
done - denotes end of selection, time to return the selection made.
logical expressions :
A | B - standing for A U B (A union B)
A & B - standing for A /\ B (common parts of A and B are kept)
A != B - only parts of A that are not in B are kept.
selection commands
#prt - defines range of numerical indices of particles to be picked, e.g. #prt 1 6
#mon - defines range of numerical indices of monomers to be picked, e.g. #mon 1 3
chem - indicates that "chemical" notation will be used; it is followed by prtc or mono.
prtc - select by chemical name of particle
mono - select by chemical name of monomer
examples: "chem prtc CH3E", "chem mono HEME"
grou - A selection of groups will be made, i.e. not only zero/one will be assigned but also
group numbers. For example, "pick grou 2 #prt 1 4 done" picks particles 1 to 4 as
belonging to group 2
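The left-to-right evaluation without brackets is easier to see in code. The sketch below mimics the behaviour described in this section on a toy system; it is not the MOIL pick subroutine. The atom table layout (monomer index, monomer name, particle name) and the function names select and pick are assumptions made for the example.

```python
def select(tokens, i, atoms):
    """Parse one selection command at tokens[i]; return (particle set, next index)."""
    key = tokens[i][:4]
    if key == "#prt":                                   # range of particle indices
        lo, hi = int(tokens[i + 1]), int(tokens[i + 2])
        return {n for n in range(1, len(atoms) + 1) if lo <= n <= hi}, i + 3
    if key == "#mon":                                   # range of monomer indices
        lo, hi = int(tokens[i + 1]), int(tokens[i + 2])
        return {n for n, a in enumerate(atoms, 1) if lo <= a[0] <= hi}, i + 3
    if key == "chem":                                   # chem mono NAME / chem prtc NAME
        column = {"mono": 1, "prtc": 2}[tokens[i + 1][:4]]
        return {n for n, a in enumerate(atoms, 1) if a[column] == tokens[i + 2]}, i + 3
    raise ValueError("unknown selection command: " + tokens[i])


def pick(line, atoms):
    """Return one integer per particle: 0 = not picked, otherwise the group number."""
    tokens = line.split()
    if not any(t[:4] == "pick" for t in tokens):
        return [1] * len(atoms)                         # default: all particles selected
    end = tokens.index("done")
    start = max(k for k, t in enumerate(tokens[:end]) if t[:4] == "pick") + 1
    picked, group, op, i = {}, 1, "|", start
    while i < end:
        tok = tokens[i]
        if tok in ("|", "&", "!="):
            op, i = tok, i + 1
        elif tok[:4] == "grou":
            group, i = int(tokens[i + 1]), i + 2
        else:
            sel, i = select(tokens, i, atoms)
            if op == "|":                               # union
                for n in sel:
                    picked.setdefault(n, group)
            elif op == "&":                             # keep the common part
                picked = {n: g for n, g in picked.items() if n in sel}
            else:                                       # "!=": remove sel from the selection
                picked = {n: g for n, g in picked.items() if n not in sel}
    return [picked.get(n, 0) for n in range(1, len(atoms) + 1)]


# Toy system: (monomer index, monomer name, particle name); particle index = position + 1
atoms = [(1, "ALA", "N"), (1, "ALA", "CA"), (2, "GLY", "CA"),
         (3, "ALA", "CA"), (4, "ALA", "CA"), (4, "ALA", "CB")]
print(pick("pick pick #prt 1 2 | chem mono ALA & chem prtc CA != #mon 2 3 done", atoms))
```

Because each logical expression acts on everything selected so far, the order of the commands matters; moving "!= #mon 2 3" to the front of the line would remove nothing.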
8.5 Comment lines
A line that starts with the character “~” is a comment and is not interpreted.
Example
~ This is a comment line and here A=5 will not assign 5 to A
8.6 Line continuation
Continuation lines can be used (up to 300 characters per line) by ending the line to be
continued with a space and a dash, “ –”.
Example
pick pick #prt 1 60 | chem mono ALA & chem prtc –
CA != #mon 2 3 done
Is equivalent to
pick pick #prt 1 60 | chem mono ALA & chem prtc CA != #mon 2 3 done
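Comment and continuation handling can be sketched as a small preprocessing step. The function below is an illustration only: the name join_lines is hypothetical, the 300 character limit is not enforced, and both the plain hyphen and the typeset dash are accepted because it is not clear from the printed text which character the interpreter expects.

```python
def join_lines(raw_lines, dashes=("-", "\u2013")):
    """Drop '~' comment lines and merge lines that end with a space and a dash."""
    logical, pending = [], ""
    for raw in raw_lines:
        line = raw.rstrip("\n")
        if not pending and line.startswith("~"):
            continue                                    # comment line: not interpreted
        marker = next((d for d in dashes if line.endswith(" " + d)), None)
        if marker is not None:
            pending += line[:-len(marker)]              # keep the space, drop the dash
        else:
            logical.append(pending + line)
            pending = ""
    if pending:                                         # dangling continuation at end of input
        logical.append(pending.rstrip())
    return logical


lines = ["~ This is a comment line and here A=5 will not assign 5 to A",
         "pick pick #prt 1 60 | chem mono ALA & chem prtc -",
         "CA != #mon 2 3 done"]
print(join_lines(lines))
```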
8.7 The “action” and “*EOD” commands
Most of the line commands of MOIL are interpreted within a “read-line” loop in which
variables are read, assigned and stored and no (numerical) action takes place. The
action command (“acti”) is a single keyword on a line that instructs the program to leave
the main reading loop and start executing instructions. In most cases it is the last
input command to a program. However, in some cases there is a need for some operations
(after exiting the main read loop) before a decision about new variables can be made (e.g.
files that are read only based on information of the prime input, assignment of LES
particles, etc.). Hence, in some cases one finds additional input lines after the “action”
was initiated.
The *EOD keyword also comes on a separate line, at the end of the input file. It
denotes the End Of Data that the user provided. It is a useful signal for the program since
it indicates that no more input is expected. If something is missing, the program has a
chance of exiting gracefully instead of attempting another read and crashing (ungracefully)
while attempting to read beyond the boundary of a file.
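The division of an input file into lines read before “action”, lines read after it, and the closing *EOD can be sketched as follows. This is a schematic view of the read loop described above, not the MOIL implementation; split_input is a hypothetical name.

```python
def split_input(lines):
    """Return (pre_action_lines, post_action_lines), stopping at *EOD."""
    pre, post = [], []
    current = pre
    for raw in lines:
        line = raw.strip()
        if line.startswith("*EOD"):
            return pre, post                            # End Of Data: stop reading
        # "action" is a single keyword on its own line; only 4 characters count.
        if current is pre and line[:4] == "acti" and len(line.split()) == 1:
            current = post                              # leave the main read loop
            continue
        current.append(line)
    # No *EOD found: report it instead of reading past the end of the file.
    raise ValueError("input ended without *EOD")


pre, post = split_input(["mdiv", "action", "MULT pick chem mono CO done #cpy=10", "*EOD"])
print(pre, post)
```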
9 Brief description of MOIL modules
9.1 Special characters in description of programs:
• * Novel scientific features developed by the MOIL team
• + run in parallel
• & available via moil.tcl menu-based graphic interface
Special characters used in detailed explanation of input to programs. The special
character must be replaced by the appropriate variable when running the program.
• * -- a wild card, typically replaces a name of molecule in a file name
• [i] -- an integer number
• [r] -- a real value number
• [l] -- a logical variable. If found set to true
• [d] -- a double precision number (e.g. 1.d1 which means 10 in double precision)
• ([c]) -- a character variable (must be enclosed in (…) )
9.2 Major programs:
conn& – generate a connectivity file. The connectivity file stores all information
necessary for an energy-based calculation.
chmin*& – compute approximate reaction coordinates following the SPW algorithm of
Czerminski and Elber (Czerminski and Elber 1990). Input files: connectivity and
coordinates of the end points, or a prior path. Output: a sequence of structures along the
reaction coordinate.
dyna& – compute a molecular dynamics trajectory. Inputs: a coordinate file and a
connectivity file. dyna produces a sequence of coordinate (and velocity) vectors in
unformatted files.
energy& – A single energy evaluation of a system with known coordinate and
connectivity files. Output is in *.wene file.
fp* – first passage time calculation(West, Elber et al. 2007). Sample trajectories at
Milestones and simulate trajectories between Milestones. Compute kinetics and
thermodynamics along a reaction path. Input: connectivity and reaction path. Output:
distribution of first passage times.
freee – computes a free energy profile by free energy perturbation and/or thermodynamic
integration along a reaction coordinate.
mfep – compute a minimum free energy path by the string method [Eric}
mini_tn& and mini_pwl& – truncated Newton-Raphson and Powell conjugate gradient
minimizations. Input: coordinate and connectivity files. Output: energy listing and file of
minimized coordinate file.
sdpS* and sdp*+ – Compute the steepest descent path following the scalar work
algorithm of Olender and Elber(Olender and Elber 1997). Input files: connectivity and
coordinates of the end points, or a prior path. Output a sequence of structures along the
reaction coordinate.
puth& – add hydrogens to a Protein Data Bank file and convert to a format accessible to
MOIL calculations (same as CRD file in CHARMM). Inputs: PDB file and connectivity
file. Output a CRD file with the additional hydrogens.
scndrv – compute second derivatives of the potential. Useful for normal mode
calculations and a highly refined optimization. Inputs: coordinate and connectivity file,
output the second derivative matrix. A variant is used as a subroutine.
sdelS* and sdel*+ – Large time step trajectories computed with an optimization of an
action(Elber, Ghosh et al. 2002; Ghosh, Elber et al. 2002; Cardenas and Elber 2003;
Elber and Cardenas 2004). Inputs: connectivity and end point coordinate files (or a prior
path file).
therm – compute free energy difference by thermodynamic integration varying
Hamiltonian parameters. Input: two connectivity files and a coordinate file.
umbr – compute the potential of mean force along a reaction coordinate using umbrella
sampling. Inputs: connectivity and reaction coordinate file.
9.3 Some noted options
dynapress – run dyna with pressure monitoring.
FREADY* – run dynamics and energy minimization with a coarse grained potential for
proteins (Fold REcognition And Dynamics, (Majek and Elber 2009)).
LES*& – Locally Enhanced Sampling(Elber and Karplus 1990; Roitberg and Elber
1991; Simmerling and Elber 1994). Allows for enhancing the sampling of a small part in
a large system (a ligand in a protein(Elber and Karplus 1990; Czerminski and Elber 1991;
Gibson, Regan et al. 1992), side chain in a protein (Roitberg and Elber 1991), solvated
peptide (Simmerling and Elber 1994; Mohanty, Elber et al. 1997)). Modify coordinate
and connectivity file. Used in conn, energy, mini_tn, mini_pwl and dyna.
dynapt+ – Replica exchange simulations for better equilibrium sampling (Kirmizialtin,
and Elber, submitted).
PME& – sum long-range electrostatic forces with Particle Mesh Ewald (we use the
Darden code (Darden, York et al. 1993)).
LD – run Langevin dynamics.
con_specl& – prepare dynamics with electronic surface crossing using the Landau-Zener model. con_specl generates a second energy surface to allow surface crossing (Li,
Elber et al. 1993).
MUTA – compute free energy differences of mutants.
9.4 Utilities
addion& – add ions to a solvation box to make the system neutral (required for Ewald
calculations)
boat& – compute BOnd Angle and Torsion for a structure (check and list all the internal
coordinates)
ccrd& – convert coordinates between different formats (CRD/DCD/PTH)
crd2pdb& – convert crd file (CHARMM coordinate file format) to PDB format.
memeqns – use local first passage time distributions (output of fp) to compute the overall
first passage time.
numerical and test_drv – compute derivatives of the potential numerically (by finite
difference): test_drv tests first derivatives, numerical tests second derivatives. Useful for
testing the eforce and scndrv analytical code.
ovrlp_trj& – overlap structure of a dynamics file (*dcd) with respect to a reference
structure.
path_eqw – take a reaction path (a set of structures in a single *pth file) in vacuum, and
convert it to a solvated path. Generate also a poly file that is used to create a new
connectivity file with the new water molecules (TIP3P) included.
pdb2puth – edit pdb file to include terminal monomers and change atom/monomer
names. This program is deprecated. A more complete “processing” of a PDB file to
produce MOIL input files (.crd, .wcon, etc) is possible via the moil.tcl “Process PDB”
menu item – see get_started.pdf for a description of using the moil.tcl interface to
interactively process a PDB file in this manner. Alternatively, moil.tcl may be used as a
non-interactive command-line tool to process a PDB file with the syntax: moil.tcl
procpdb –drop <name of pdb-file>. This syntax may be used with two options:
-drop: unknown monomers will be dropped without prompting
-nodrop: unknown monomers will halt processing of the PDB, without prompting
If neither option is given, it is assumed the script is being run on an interactive
workstation and a dialog box will ask the user how to handle the unknown monomer.
reconstruct – code to reconstruct an atomically detailed model from a coarse grained
model.
solvatecrd& – water box solvation of vacuum structures.
9.5 Analysis programs
av_dif – compute average diffusion coefficient from a path or dynamics files.
contacts& – follow contacts for selected subset of atoms along a trajectory
eff_diffdens& – compute diffusion constants and densities for water molecules on spatial
grid.
fluc& – compare (root mean square distance) rms with respect to a reference structure
and to the time averaged structure from a Molecular Dynamics trajectory.
rgyr& – compute radius of gyration for a sequence of structure from a Molecular
Dynamics trajectory.
rms_2crd rms_2path rms_p2p – All kinds of root mean square distance calculations.
rms_2crd compares structures in two crd files. rms_2path and rms_p2p analyze two separate path
files.
rms_resd& – overlap structures in Molecular Dynamics and compute the average rms
for each residue (to compare with B factors, for example).
str_measures – provides shape descriptors that relate to the three eigenvalues of the
tensor of inertia. These descriptors were developed by (Honeycutt and Thirumalai 1989).
superTMscore – compute the TM score for structural alignment of two 3D objects. A
measure invented and programmed by (Zhang and Skolnick 2005) that we use with a small
adjustment to the MOIL format.
superback, superrms – variants of the rms program that provides additional options.
tmalign – the structural alignment program from Jeff Skolnick group (Zhang and
Skolnick 2005) that we use with minor modification to allow forcing a trivial alignment.
torstat – extracts statistics for backbone torsions from a dynamics or path file.
xangle& xcrd& xtors& – extract specific angle, coordinate or torsion from a dynamics
or path file.
9.6 The moil.tcl, cmoil, and zmoil programs
This document focuses on the use of the keyboard to prepare input to the molecular
modeling suite of programs MOIL. moil.tcl is a menu-based graphical interface written in
tcl/tk which is available in Windows, OS/X, and Linux that facilitates the generation of
many types of input files. It is a convenient choice to process a protein data bank file
(PDB) to the different files (coordinate and connectivity) required by other MOIL
programs. Zmoil is the OpenGL-based molecular visualization component of MOIL, and
has a separate documentation. It is the successor to the program cmoil, which while still
included in the MOIL distribution, is no longer actively developed.
9.7 Parameters for energy and force calculations - EPROP
This set of parameters appears in numerous moil modules. We collect these parameters in
a single section below and refer to this paragraph using the name EPROP in the
description of the other programs. Default values are in curly brackets {…}. Note that not
all options listed below are available in all programs.
Parameters for curve crossing and Landau-Zener model for electronic curve crossing (Li,
Elber et al. 1993).
mors – a flag stating that the present line defines a Morse bond [l] {false} – Initiates
Morse bond parameters. The connectivity file must include an entry about the
number of Morse bonds. Morse bonds are added to the connectivity file via the
addb file. The mors line specifies the energy parameters (for 4 Morse bonds) all
double precision:
mors alph=2.0 Dmor=30. alph=2. Dmor=30. alph=2. Dmor=30. alph=2. Dmor=30.
The equilibrium distance for the above bonds is read when a Morse bond is
declared in the addb file.
Dmor - the Dissociation energy of a Morse bond (kcal/mol) [d]{30.0d0}
alph – range parameter for a morse bond. Used in dyna in simulations that use Morse
energy [d]{1.0}
spec – a flag for switching between different electronic energy surfaces
rcut – It is the range distance employed in the switching function between different
forms of the heme. [d]{5.0}
lmda – range parameter for adiabatic potential flip between two Born Oppenheimer
surfaces (part of the Landau Zener model) [d]{3.0}
repl – [l] {false} – introduces exponential repulsion terms between two atoms
(Morse and exponential repulsion represent the two electronic energy surfaces for
simulating crossing). The repl functional form is $A e^{-\beta r} + B$. The number of
repulsions is typically the same as the number of Morse bonds and it is read from the
repl line (example with 3 repulsions), all double precision:
repl Arep=80. beta=1. Brep=4. Arep=80. beta=1.0 Brep=4. Arep=80. beta=1. Brep=4.
Arep – the excited curve pre-exponential factor which is used in curve crossing
calculation [d]{100.0}
Brep – the asymptotic value of the repulsion curve at large distances, i.e. the potential
looks like V(r)=Arep*exp(-beta*r)+Brep. It is necessary for curve crossing (influences the
crossing point). [d]{100.0}
beta – range parameter for the exponential repulsion bond [d]{1.0}
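Unlike ordinary variables, the keywords on a repl line repeat, one (Arep, beta, Brep) set per repulsion. The sketch below shows one way to read such a line and to evaluate the repulsion curve. It assumes that every repulsion lists all three parameters, which is how the example line is written but is not stated as a rule; parse_repl and repulsion are hypothetical names.

```python
import math


def parse_repl(line):
    """Split 'repl Arep=.. beta=.. Brep=.. ...' into one dict per repulsion."""
    tokens = line.split()
    if not tokens or tokens[0][:4] != "repl":
        raise ValueError("not a repl line")
    terms, current = [], {}
    for token in tokens[1:]:
        name, _, value = token.partition("=")
        # Accept Fortran style numbers such as 80. or 1.d0
        current[name[:4]] = float(value.lower().replace("d", "e"))
        if len(current) == 3:              # assumption: Arep, beta and Brep all given
            terms.append(current)
            current = {}
    return terms


def repulsion(r, term):
    """V(r) = Arep * exp(-beta * r) + Brep"""
    return term["Arep"] * math.exp(-term["beta"] * r) + term["Brep"]


terms = parse_repl("repl Arep=80. beta=1. Brep=4. Arep=80. beta=1.0 Brep=4. "
                   "Arep=80. beta=1. Brep=4.")
print(len(terms), repulsion(0.0, terms[0]), repulsion(50.0, terms[0]))
```

At large distances the value approaches Brep, which is why Brep shifts the point where the repulsive curve meets the Morse curve.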
cent -- [l] {false}– restraining the geometric center of a selected set of atoms to be at a
fixed point. The following parameters are optional (must be in the same line):
kcnt, which is the force constant; xeqm, yeqm ,zeqm which are the coordinates of
the fixed point and a “pick” command to select the subset of restrained atoms,
e.g.,
cent kcnt=[d]{10.d0} xeqm=[d]{0.} yeqm=[d]{0.} zeqm=[d]{0} pick… done
rvmx – [d] {6.d0} cutoff for Van der Waals interactions. Note that the default is very low
rvbg – [d] {-1.d0} a second (larger) cutoff for Van der Waals interaction for buffer
computations. The default is negative since unless explicitly assigned, it is
computed as rvmx+2.
relx – [d] {8.d0} cutoff for electrostatic calculation. Note that relx MUST be larger than
rvmx.
rebg – [d] {-1.d0} a second (larger) cutoff for electrostatic interactions. The default is
negative since unless given explicitly it is computed as relx+2.
cutm – [d] {-1.d0} cutoff for monomer monomer distance (used in intermediate
calculation of the non-bonded list). If not explicitly given, computed from
relx*1.2.
rmax – [d] {-1.d0}. A single cutoff for all non-bonded interactions. Used to indicate no
cutoff , i.e. rmax=9999. Not used anymore to indicate actual cutoff and kept for
past consistency.
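The way the cutoff defaults depend on each other can be summarized in a short sketch. It only restates the rules given for rvmx, rvbg, relx, rebg and cutm above (a negative value meaning "not set"); resolve_cutoffs is a hypothetical helper and not part of MOIL.

```python
def resolve_cutoffs(rvmx=6.0, rvbg=-1.0, relx=8.0, rebg=-1.0, cutm=-1.0):
    """Fill in the derived cutoffs the way the EPROP defaults describe."""
    if relx <= rvmx:
        raise ValueError("relx must be larger than rvmx")
    if rvbg < 0.0:
        rvbg = rvmx + 2.0                  # second van der Waals cutoff
    if rebg < 0.0:
        rebg = relx + 2.0                  # second electrostatic cutoff
    if cutm < 0.0:
        cutm = relx * 1.2                  # monomer-monomer distance cutoff
    return {"rvmx": rvmx, "rvbg": rvbg, "relx": relx, "rebg": rebg, "cutm": cutm}


print(resolve_cutoffs())                   # all defaults
print(resolve_cutoffs(rvmx=9.0, relx=12.0))
```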
gbsa – [l] {false} turn on Generalized Born Surface Area calculations (Tsui and Case,
2000).
gbo1 – one of the options for gbsa calculation
gbo2 – a second options for gbsa calculation
npol – the surface area non-polar component of gbsa
surften or sten – [d] {0.005d0} surface tension coefficient.
gbsu – [i] {0}frequency of updating the gbsa neighbor list.
epsi – [d] {1.d0} dielectric constant. Most applications do not use it and its impact is precomputed to the connectivity file.
hscl – [d] {1.d0} scale for hydrophobic potential between Cbeta atoms (old potential, no
longer used).
nobo noan noto noim novd noel nohy – different logical variables ([l]{false} each)
turning off different energy terms: nobo is no bonds, noan is no angles, noto is no
torsions, noim is no improper torsions, novd is no Van der Waals, noel is no
electrostatics, nohy is no phenomenological hydrophobicity term.
hvdw – [l]{false} set finite van der Waals radius for hydrogen atoms (usually zero in
OPLS). Helps to avoid numerical instabilities at high temperature simulations or
when the initial structure is highly distorted.
cnst – [l]{false} turn on constraint energy
symm – [l]{false} symmetry operations. Periodic boundary conditions for rectangular
boxes. A definition of the box size xtra=[d]{0.d0} ytra=[d]{0.d0} ztra=[d]{0.d0}
must come in the same line as symm.
ewald – [l]{false} apply the Particle Mesh Ewald sum. The following parameters are
optional and should come in the same line as ewald declaration: Error tolerance
dtol=[d]{0.d0} ; grid in x,y,z (typically 32) grdx=[i]{0} grdy=[i]{0} grdz=[i]{0} ;
more scaling parameters sgdx=[d]{1.d0} sgdy=[d]{1.d0} sgdz=[d]{1.d0}
vprt – [l]{false} virtual particles are present to better model charge distribution.
Examples are TIP4P (not supported in moil currently) and carbon monoxide. The
keyword gcnt[l]{false} in the same line means that the geometric center will be
used to determine the coordinate of the charge.
amid – add a harmonic constraint on the torsions of the amide planes to ensure trans
configuration
kamd – value of the force constant [d]{100.d0}.
ball – [l]{false} to indicate a restraining spherical potential on water molecules. In the
same line the optional variables are: (i) force constant for the harmonic restraint that
keeps the water in the spherical boundary fbal=[d]{0.d0}, (ii) radius of the sphere
rball=[d]{0.d0}, (iii) origin of the sphere xbal=[d]{0.d0} ybal=[d]{0.d0}
zbal=[d]{0.d0}
metl – metal boundary condition for a box. It includes a repulsion A/(y-y_start)^6 along
the Y axis, voltage term, and image charges. The exact position of the interface is
bwal. Other parameters in the same line are amtl=[d]{50.d0}; bwal=[d]{ytra} ;
the voltage v_el=[d]{0.d0}.
10 MOIL in depth
10.1 The major programs
For more details on file formats see section 11 Moil files
10.1.1 conn
Purpose: Generate a connectivity file. This file is a necessary input for all programs that
use energy or force calls. Also used to obtain the internal coordinate in general. In the
moil.tcl menu-based interface to Zmoil the connectivity file is used to extract the atom
and bond identifiers.
Use : conn < input > output
Input file types: Required
prop – define atom names, atom charge, Lennard Jones parameters, bond length and
equilibrium distance, angle, torsion, improper torsion parameters. Prepared versions of
this file exist in the directory moil.mop with a large list of predefined monomer (all
amino acids, nucleotides, water, ions, etc.). The most widely used file is ALL.PROP.
This version should be used unless there is a need to have energy terms not defined in the
existing file.
mono – define monomer names, the atoms that belong to the monomers, the monomers
that are connected to it and the bond between atoms, optional are the definition of atom
charges and surface area in the monomer file. If defined in the mono file they overwrite
the properties listed in the prop file. Pre-prepared versions of this file exist in moil.mop.
The one used most frequently is ALL.MONO. This version should be used unless there is
a need to build a new monomer.
poly – the file with the sequence (list of monomers) that we wish to study. It is provided
by the user and is (obviously) specific to the molecule we wish to study. This file can be
created using the GUI.
Input file types: optional
ubon – list of bonds to add to the default generation determined by the mono file. For
example, adding S-S bonds or an iron ligand bond is done here. An option in MOIL is to add
a Morse bond $D\left[e^{-2\alpha (r - r_0)} - 2e^{-\alpha (r - r_0)}\right]$, where $r_0$ is the minimum energy position. A typical
line adding a Morse bond between atom 1299 and atom 1346 is
mors atm1=1299 atm2=1346 requ=1.743
uedit – changes to the bond structure determined by the mono file. For example removing
some of the HEME angles the uedit file would look as follows:
remo angl chem HEM1 157 NA HEM1 157 FE HEM1 157 NC
remo angl chem HEM1 157 NB HEM1 157 FE HEM1 157 ND
*EOD
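The Morse bond mentioned for the ubon file can be evaluated with a few lines of Python. The sketch assumes the standard Morse form D[exp(-2α(r-r0)) - 2exp(-α(r-r0))], with D and α taken from the EPROP defaults for Dmor and alph and r0 from the requ value in the example line; morse is a hypothetical function name.

```python
import math


def morse(r, d_mor=30.0, alph=1.0, requ=1.743):
    """Morse energy D * (exp(-2a(r - r0)) - 2 exp(-a(r - r0))) in kcal/mol."""
    x = math.exp(-alph * (r - requ))
    return d_mor * (x * x - 2.0 * x)


# The minimum sits at r = requ with depth -Dmor, and the energy goes to zero
# as the bond dissociates.
for r in (1.5, 1.743, 2.5, 10.0):
    print(r, morse(r))
```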
Output file types: required
wcon – the (written) connectivity file that includes information on the molecular topology
and the parameters that are required for energy calculation.
Output file types: optional
wco2 – a second connectivity file for the same molecule. Useful for curve crossing
calculation.
Variables: (in square brackets - type, curly bracket default)
mshk – turn on SHAKE for water molecules. Do not initiate flexible bonds for water
molecules. [l] {false}
prll – initiate parallel run (starting from Moil11 this parameter is obsolete ) [l] {false}
debu – print a lot of debugging information [l] {false}
hydr – turn on hydrophobic potential of Sippl [reference] [l] {false}
arit – change the combination rule for LJ sigma (from $\sqrt{\sigma_i \sigma_j}$ to $0.5(\sigma_i + \sigma_j)$), allowing
for an easy flip between force fields such as AMBER and OPLS. [l] {false}.
mdiv – allows refinement of the nonbonded list. The generation is based (to begin with)
on monomers. If the monomers are large inaccuracies may occur since the distance
between the center of mass of the monomers will not reflect atomic distances near the
edges. Mdiv makes it possible to use fragments of monomers (defined in ALL.MONO
file) in the calculations of the list. The fragments are defined in the monomer file. An
example in which mdiv is used is in the definition of phospholipids. [l] {true}
nomd – explicitly turn off mdiv [l] {false}
hvdw – add small Van der Waals radius to hydrogen to avoid Coulombic explosion [l]
{false}
muta – prepares the program to read which are the particles involved in a mutation as
stated with the command MUTA.
acti – final read, here signal to stop reading and start executing.
Post action instructions
Optionals:
LES command multiplies a subset of particles selected in a pick command:
MULT(iply) pick pick_expression done #cpy=[i]{1}
MUTA establishes a free energy calculation interpolating between two chemical species
with potentials $U_1$ and $U_2$. For interpolation we use $U(\lambda) = (1 - \lambda) U_1 + \lambda U_2$, where the
interpolation parameter $\lambda \in [0,1]$. A selection defines the first species as group 1 and the
second species as group 2, following the syntax below:
MUTA pick grou 1 #prt 7 9 | grou 2 #prt 15 15 done
Here particles 7 to 9 are species 1 (group 1) and particle 15 is species 2 (group 2).
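The linear interpolation used by MUTA is simple enough to write out. The sketch below only illustrates the formula U(λ) = (1 − λ)U1 + λU2; the energies in the usage lines are made-up numbers, and interpolate is a hypothetical name.

```python
def interpolate(u1, u2, lam):
    """U(lambda) = (1 - lambda) * U1 + lambda * U2 for lambda in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return (1.0 - lam) * u1 + lam * u2


# Hypothetical energies of the two species (kcal/mol), scanned over lambda
u1, u2 = -10.0, 4.0
for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(lam, interpolate(u1, u2, lam))
```

At λ=0 only species 1 (group 1) contributes and at λ=1 only species 2 (group 2).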
Use with FREADY
In order to prepare a connectivity file for FREADY (a coarse grained model of proteins),
extend all names of amino acid residues in the poly file with the fourth letter Z (e.g. ALA
=> ALAZ). Remove the NTER monomers and change all CTER monomers to CGTR
monomers. This will work with monomers/properties defined in ALL.MONO and
ALL.PROP from the moil.mop distribution directory.
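The renaming for FREADY can be automated when the sequence is long. The function below is a sketch of the three rules just listed; the set of twenty standard three-letter residue names is an assumption about what counts as an amino acid residue, and to_fready is a hypothetical name.

```python
AMINO_ACIDS = {"ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
               "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"}


def to_fready(monomers):
    """Apply the FREADY renaming rules to a list of monomer names."""
    converted = []
    for name in monomers:
        if name == "NTER":
            continue                        # NTER monomers are removed
        if name == "CTER":
            converted.append("CGTR")        # CTER becomes CGTR
        elif name in AMINO_ACIDS:
            converted.append(name + "Z")    # ALA => ALAZ
        else:
            converted.append(name)          # anything else is left unchanged
    return converted


print(to_fready(["NTER", "MET", "VAL", "LEU", "CTER"]))
```

Removing NTER changes the number of monomers, so the #mon count in the poly file header presumably has to be adjusted as well; the text above does not say so explicitly.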
Sample conn input (running multiple LES copies of CO in myoglobin)
File name : mb10co_conn.inp
file mono name=(../../moil.mop/ALL.MONO) read
file prop name=(../../moil.mop/ALL.PROP) read
file poly name=(mb10co.poly) read
file ubon name=(mb10co.addb) read
file uedi name=(mb10co.edit) read
file wcon name=(mb10co.wcon) wovr
mdiv
action
MULT pick chem mono CO done #cpy=10
*EOD
Sample user supplied input files
File name : mb10co.poly
MOLC=(MYOG) #mon=158
NTER MET VAL LEU SER GLU GLY GLU TRP GLN LEU VAL LEU HIS
VAL
TRP ALA LYS VAL GLU ALA ASP VAL ALA GLY HIS GLY GLN
ASP ILE LEU ILE ARG LEU PHE LYS SER HIS PRO GLU THR
LEU GLU LYS PHE ASP ARG PHE LYS HIS LEU LYS THR GLU
ALA GLU MET LYS ALA SER GLU ASP LEU LYS LYS HIS GLY
VAL THR VAL LEU THR ALA LEU GLY ALA ILE LEU LYS LYS
LYS GLY HIS HIS GLU ALA GLU LEU LYS PRO LEU ALA GLN
SER HIS ALA THR LYS HIS LYS ILE PRO ILE LYS TYR LEU
GLU PHE ILE SER GLU ALA ILE ILE HIS VAL LEU HIS SER
ARG HIS PRO GLY ASN PHE GLY ALA ASP ALA GLN GLY ALA
MET ASN LYS ALA LEU GLU LEU PHE ARG LYS ASP ILE ALA
ALA LYS TYR LYS GLU LEU GLY TYR GLN GLY CTRG
HEM1 CO
*EOD
File name : mb10co.addb
bond chem HIS 95 NE2 HEM1 157 FE
*EOD
File name : mb10co.edit
*EOD
10.1.2 puth
Purpose: Add hydrogens to a structure that includes only atoms significantly heavier
than hydrogen (e.g. C, N, and O). Lack of hydrogens is typical in PDB structures that
are determined by X-ray crystallography.
Use : puth < input > output
Input file types: Required
conn – a file *.wcon obtained by calling conn first. It contains molecular topology data
and the parameters required for energy calculations.
rcrd – read coordinate file. Must include ctype=(pdb) in the line since the only coordinate
file type supported for puth is PDB format.
Input file types: Optional
None
Output file types: Required
wcrd – written coordinate file with the hydrogen built in. Type must be CHARMM
ctyp=(CHARM)
Output file types: Optional
None
Variables
None
Post action parameters
None
Sample puth input
file conn name=(ery.wcon) read
file rcrd name=(ery.pdb) read ctyp=(pdb)
file wcrd name=(ery.crd) wovr ctyp=(CHARM)
action
10.1.3 energy
Purpose: Calculate and output the energy of one or a set of coordinates
Use : energy < input > output
rcrd - the file where the Cartesian coordinates of all the particles are stored. The possible
formats for this coordinates file are:
CHARM – coordinates written in charmm format (default);
DYNA – coordinates taken from a dynamics DYNA file;
PATH – coordinates taken from a path format file (binary).
Output file types: Required
wene – file in which the energy listings are summarized
Variables
debu – a flag for printing a lot of debugging information. Do not use unless you are a
moil expert.
EPROP ARE AVAILABLE FOR energy CALCULATIONS.
cdie - constant dielectric (the default, and presently the only option available)
rdie - a flag indicating that good old Coulomb law is modified from 1/r to 1/r^2 (not
active)
shif - a flag indicating a different style of cutoff which brings the interaction energy to
zero continuously (not active)
gcnt - used when virtual particle are present. If found Geometric CeNTer instead of
center of mass (default) is used.
#str or #ste– Number of structures if a dynamic coordinate or path files are used. [i]{1}
Sample input
~
~ energy calculation:
~
file rcon name=(../conn-ala3/ala3.wcon) unit=10 read
file rcrd name=(ala3_min.crd) unit=12 read
file wene name=(ala3-gbsa.ene) unit=13 wovr
rmax=9999. epsi=1. cdie gbsa
action
The output file ala3-gbsa.ene has a standard moil structure:
Parameters for energy calculation
Constant dielectric will be used. elec. Cutoff= 9999.00000
vdW cutoff 9999.00000
GB polarized solvation energy required (Hawkins)
ENERGIES: E total = 939.756
E bond  =  968.827   E angl  =  163.960   E tors  =    1.916
E impr  =   61.795   E vdw   =   -0.498   E elec  =   -190.8
E 14el  =  117.013   E 14vd  =   -0.194   E elsym =    0.000
E cnst  =    0.000   E evsym =    0.000
E centr =    0.000   E hydro =    0.000
E gbsa = -182.283
E pol = -182.283
E nonpol = 0.000
Norm Force = 85.274
Number of neighbours for short range int.   40
Number of uncharged vdW interactions        25
Number of elec. only interactions           51
Number of wat-wat shrt. range neighbors      0
Number of wat-wat long range neighbors       0
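As a quick consistency check on a listing of this kind, the individual terms can be summed and compared with E total. The sketch below is only an illustration: the pairing of labels and values is read off the sample listing (an assumption where the layout is ambiguous), and E pol and E nonpol are assumed to be already contained in E gbsa, so they are not added again.

```python
# Energy terms as read from the sample listing above. The label/value pairing
# is an assumption made for this illustration, not MOIL output parsing code.
terms = {
    "bond": 968.827, "angl": 163.960, "impr": 61.795, "tors": 1.916,
    "vdw": -0.498, "elec": -190.8, "14el": 117.013, "14vd": -0.194,
    "cnst": 0.000, "evsym": 0.000, "elsym": 0.000, "centr": 0.000,
    "hydro": 0.000,
    "gbsa": -182.283,  # assumed to already include E pol and E nonpol
}
printed_total = 939.756

computed_total = sum(terms.values())
difference = printed_total - computed_total

print(f"sum of terms  = {computed_total:.3f}")
print(f"printed total = {printed_total:.3f}")
print(f"difference    = {difference:.3f}")
```

The small residual is of the size expected from E elec being printed with a single decimal in the listing.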
10.1.4 mini_pwl and mini_tn
Purpose: Minimize the energy of a given structure with respect to Cartesian coordinates.
Coarse grained FREADY input is supported. See the FREADY documentation for details.
10.1.4.1 mini_pwl
Purpose: Use the conjugate gradient algorithm with the Powell restart option to locally
optimize the energy as a function of Cartesian coordinates.
Use : mini_pwl < input > output
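The idea of conjugate gradient minimization with a Powell restart can be illustrated on a small quadratic function. The sketch below is a generic textbook version and not the mini_pwl source: the function name, the Polak-Ribiere update and the 0.2 restart threshold are assumptions chosen for illustration, and the exact line search is valid only for a quadratic energy. The argument names tolf and mistep merely mirror the keywords described below.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def minimize_cg_powell(A, b, x0, tolf=1e-10, mistep=100):
    """Minimize f(x) = 0.5 x.A.x - b.x for a symmetric positive definite A.

    Hypothetical helper for illustration only; it is not part of MOIL.
    """
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient of f
    d = [-gi for gi in g]                              # start along steepest descent
    nsteps = 0
    while nsteps < mistep and dot(g, g) ** 0.5 > tolf:
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                # exact line search (quadratic only)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        # Powell restart test: successive gradients are far from orthogonal,
        # so the search direction is reset to steepest descent (beta = 0).
        if abs(dot(g_new, g)) >= 0.2 * dot(g_new, g_new):
            beta = 0.0
        else:
            diff = [gn - go for gn, go in zip(g_new, g)]
            beta = max(0.0, dot(g_new, diff) / dot(g, g))   # Polak-Ribiere
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
        nsteps += 1
    return x, nsteps

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_min, used = minimize_cg_powell(A, b, [2.0, 1.0])
print(x_min, used)
```

For a molecular energy function the exact line search would be replaced by a numerical one, and the force tolerance and step limit play the role of tolf and mistep.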
Input file types: required
rcrd – read coordinate file. The default type of coordinates is CHARM. The keyword
cstyl can change it (see cstyl below) to PATH (pth) and DYNA (dcd) formats.
con1 & con2 – the possibility of multiple connectivity files that describe (each) a
different Born Oppenheimer surface.
wcrd – write output coordinates in this file
wmin – report progress of the minimization in this file.
wpth – write minimized coordinates in pth format in this file (replaces wcrd).
Variables: (in square brackets –type, curly bracket default)
EPROP ARE AVAILABLE FOR mini_pwl EXCEPT vprt.
gcnt – the keyword gcnt in the same line with vprt means that the geometric center will
be used to determine the coordinate of the charge [l]{false}
Extra variables for minimization
cpth – the input coordinate file is a pth formatted file; read structure number
istr=[i]{0} [l]{false}
cdyn – the input coordinate file is a dcd (dyna) formatted file; read structure number
istr=[i]{0} [l]{false}
DYNA – read and minimize a set of structures from a dcd (dyna) formatted file
The range of sequential structures are from lpst=[i]{1} to lpen=[i]{-1}
PATH – read and minimize a set of structures from a pth formatted file
wpth – write out coordinate file in path format
tolf – tolerance of force. When tolf is reached during the minimization, terminate.
[d]{0.d0}
mistep – number of minimization steps. When mistep is reached, terminate. [i]{100}
list or #lis – [i]{20} number of steps between updates of the non-bonded list [i]{100}.
TORS – minimization is possible with torsional constraints. Note that the constraint’s
flag must be set previously by the keyword cnst (shared by ENERGY). Torsion is
specified with 4 atoms listed in the same line: atm1=[i]{0} atm2=[i]{0}
atm3=[i]{0} atm4=[i]{0} then we specify a force constant kcns=[d]{0.d0} and
angle (in degrees) cneq=[d]{-999.d0}. Another important option is loop[l]{false}
which indicates that a loop on torsion values will be computed (useful in
generation of adiabatic maps). The loop is specified with a start position
strt=[d]{0.d0} , a stop position stop=[d]{0.d0} and a step size step=[d]{1.d0}
Sample input file
file rcon name=(valdip.wcon) read
file rcrd name=(a.crd) read
file wpth name=(admap.pth) binary wovr
file wmin name=(valdip_mini.out) wovr
rmax=9999.
tolf=1.d-4 mistep=5000 list=500
cnst
TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100. loop strt=-180 stop=180 step=30
TORS atm1=4 atm2=6 atm3=10 atm4=12 kcns=100. loop strt=-180 stop=180 step=30
action
*EOD
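To see how many constrained minimizations a loop such as the one in the sample input implies, the torsion targets can be enumerated. This is a minimal sketch under two assumptions that the text does not confirm: that the stop value is itself visited, and that the two TORS lines are scanned as nested loops (as one would do for a two-dimensional adiabatic map). The function name is hypothetical.

```python
import math

def torsion_loop_values(strt, stop, step):
    """Torsion targets (degrees) from strt to stop in increments of step.

    Assumes stop is included when it falls on the grid.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    n = int(math.floor((stop - strt) / step + 1e-9))
    return [strt + i * step for i in range(n + 1)]

# Values taken from the two TORS lines of the sample input.
phi_values = torsion_loop_values(-180, 180, 30)
psi_values = torsion_loop_values(-180, 180, 30)

# Assumed nesting: every phi value is combined with every psi value.
grid = [(phi, psi) for phi in phi_values for psi in psi_values]
print(len(phi_values), len(grid))
```

Under these assumptions each grid point corresponds to one minimized structure written to the wpth file.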
10.1.4.2 mini_tn
Purpose: Use the truncated Newton-Raphson algorithm to locally optimize the energy as a
function of Cartesian coordinates. It is a lot slower than mini_pwl and has fewer options
(e.g. no gbsa and no metl). It provides a very accurate minimum, which is important (for
example) in calculations of normal modes.
Use : mini_tn < input > output
rcrd – read coordinates. The default type of coordinates is CHARM. The keyword cstyl
can change it (see cstyl below) to PATH (pth) and DYNA (dcd) formats.
None
wcrd – write output coordinates in this file
wmin – report progress of the minimization in this file.
wpth – write minimized coordinates in pth format in this file (replaces wcrd).
Variables
EPROP ARE AVAILABLE FOR mini_tn EXCEPT vprt.
gcnt – the keyword gcnt in the same line with vprt means that the geometric center will
be used to determine the coordinate of the charge [l]{false}
Extra variables for minimization
cpth – the input coordinate file is from a pth formatted file
cdyn – the input coordinate file is from a dcd (dyn) formatted file
DYNA – read and minimize a set of structures from a dcd (dyn) formatted file
PATH – read and minimize a set of structures from a pth formatted file
wpth – write out coordinate file in path format
tolf – tolerance of force. When tolf is reached during the minimization, terminate
[d]{0.d0}
mistep – number of minimization steps. When mistep is reached, terminate. [i]{100}
list or #lis – [i]{20} number of steps between updates of the non-bonded list [i]{100}
TORS – minimization is possible with torsional constraints. Torsion is specified with 4
atoms listed in the same line: atm1=[i]{0} atm2=[i]{0} atm3=[i]{0} atm4=[i]{0}
then we specify a force constant kcns=[d]{0.d0} and angle (in degrees)
cneq=[d]{-999.d0}. Another important option is loop[l]{false} which indicates
that a loop on torsion values will be computed (useful in generation of adiabatic
maps). The loop is specified with a start position strt=[d]{0.d0}, a stop position
stop=[d]{0.d0} and a step size step=[d]{1.d0}
Sample input file
file conn name=(val.wcon) read
file rcrd name=(val.crd) read
file wcrd name=(valmin.crd) wovr
file wmin name=(valmin.out) wovr
mistep=100000 tolg=0.000001
rmax=9999.
action
10.1.5 dyna
Purpose: Integrate Newton’s equations of motion using the “Velocity Verlet” algorithm
(see for instance “Understanding Molecular Simulation: From Algorithms to
Applications”, D. Frenkel and B. Smit). This program takes the coordinate file and
the connectivity file as input and gives the trajectory of the protein as output
(which can be visualized using zmoil).
Use : dyna < input > output
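For readers unfamiliar with the integrator, the velocity Verlet update can be sketched on a one-dimensional harmonic oscillator. This is a generic illustration of the algorithm and not the dyna source code; the function name, the spring constant and the reduced units are assumptions.

```python
import math

def velocity_verlet(x, v, force, mass, dt, nsteps):
    """Propagate one particle in 1D and return the list of (x, v) pairs."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(nsteps):
        v_half = v + 0.5 * dt * f / mass      # half kick with the old force
        x = x + dt * v_half                   # drift
        f = force(x)                          # force at the new position
        v = v_half + 0.5 * dt * f / mass      # half kick with the new force
        trajectory.append((x, v))
    return trajectory

k_spring, mass = 1.0, 1.0                     # hypothetical reduced units
trajectory = velocity_verlet(1.0, 0.0, lambda x: -k_spring * x, mass, 0.01, 1000)

energies = [0.5 * mass * v * v + 0.5 * k_spring * x * x for x, v in trajectory]
energy_spread = max(energies) - min(energies)
final_x = trajectory[-1][0]
print(final_x, math.cos(10.0), energy_spread)
```

The total energy fluctuates slightly but does not drift, which is the practical reason this integrator is popular for molecular dynamics.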
rcrd – the file where the Cartesian coordinates of all the particles are stored. These are
the initial conditions for solving Newton’s equations. The possible formats for
the coordinate file are:
ctyp=(charm) – coordinates written in charmm format (default);
ctyp=(path) – coordinates taken from a path format file (binary).
rtet - coordinates file for tethering particles to their initial coordinates during MD.
Harmonic springs are attached to the position of the particles as defined in rtet.
The coordinates are in charmm format (default).
ucon1 & ucon2 – two special connectivity files to model two electronic energy curves
and curve crossing with the Landau-Zener model.
rvel – file with initial velocities if you do not want to sample them randomly from the
Boltzmann distribution. Velocities are written in charmm format.
None
wcrd – a binary file where the Cartesian coordinates of the particles are stored.
wvel – a binary file where the velocities of the particles are stored.
rest – a file that stores the last step saved for the restart.
rstr – a file where recent coordinates for restart are stored.
Variables
EPROP ARE AVAILABLE FOR dyna.
eqms –turn on the logical variable eqms=true [l]{f}. All masses will be set to 10. The
idea is to allow for more efficient equilibrium simulations.
bigb – Set a spring constant for the bond between any two particles to the value newb.
Bonds are selected with pick
newb – new bond spring constant [d]{500.d0}
wfly – For solvation shell simulations without spherical constraints.
Check if water molecules fly away from a solvation shell (use to study molecules
in vacuum with a solvation layer (Steinberg, Breuker et al. 2007)) and stop it from
reaching infinity to avoid numerical problems. [l]{false}
tstd – turns on the check on acceptable distances between atoms. If a distance is shorter
than 1.5 Å, a warning is printed. No dynamics will be run. It is a single structure
evaluation to find bad contacts. [l]{false}
nfrz – select the particles that WILL NOT be frozen. In the same line as nfrz a "pick"
command must follow
cgsk – matrix shake in conjugate gradient
mshk – a flag to indicate the use of special constraint protocol matrix shake for water
molecules
When added to the conn program, bonds and angles are excluded from the
connectivity list of the water molecules
mtol – is the tolerance in mshake [d]{1.d-7}
shkl – turns on shaking of bonds with light particles ( m < 1.1)
shkb – turns on shaking of all bonds
shac – maximal error allowed for bond constrained (coordinates) [d]{1.d-7}
shav – maximal error allowed for bond constrained (velocities) [d]{1.d-7}
itsh – maximum number of allowed iterations for SHAKE convergence [i]{100}
cgpt – maximum number of iterations in conjugate gradient in a matrix shake (Weinbach
and Elber 2005) (not active) [i]{NA}
cgvl – maximum number of iterations in conjugate gradient in a matrix shake (not active)
[i]{NA}
nori – turns on the reorientation during the dynamics
nosc – avoids scaling of temperature (done by default)
orie – by default, the reorientation that removes rigid body motion overlaps the
structure with the initial structure using all atoms. If orie is given, it is
possible to pick only some atoms
TORS – turns on the constraining of some torsional angles
example - TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100
atm1 atm2 atm3 atm4 are the constrained atoms [i]{0}
kcns – amplitude for torsional constraint (the larger kcns, the stronger the
constraint) [d]{0.d0}
cneq – equilibrium angle expressed in degrees [d]{-999.d0}
( TORS keyword must come AFTER amid if amid is used)
spec – option of switching between different energy surfaces
example: spec lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0
lmda – is the range parameter for continuous potential shifts from in plane to out
of plane configuration of the heme iron [d]{3.d0}
rcut – is the range distance employed in the switching function between different
forms of the heme [d]{5.d0}
swit – this flag makes possible the passage between different energy curves in
Landau-Zener calculations
example: switch Rcro=3.53181 dRcr=0.05 Forc=5.59951 delt=0.287
Rcro – is the position of the crossing point [d]{3.d0}
dRcr – the interval (in angstrom) of significant interaction between two curves
that cross (i.e. the range in which a transition probability between the two
electronic curves is evaluated) [d]{1.d0}
Forc – the difference in the forces at the crossing point of the two electronic
curves [d]{0.1d0}
delt – indicates the time interval during which curve crossing is felt
[d]{0.1}
nocut – Non bonded interactions are computed in full according to existing lists. No
additional distance check is used in the energy routines. Important (and the
default) in minimization that requires a continuous and differentiable energy function
(like conjugate gradient)
shif – a flag indicating different style of cutoff which is no longer used. Ignore at
present, may return sometime in the future
nbfi – a flag to indicate that a soft, finite van der Waals repulsion (Gaussian repulsion) is
employed. It is useful when the initial structure is bad, with many overlaps, since it
avoids hard core energy singularities.
sym2 – A flag indicating that the box size is changing during the simulation and the final
size is determined in the present line. The rate of change is linear in the simulation
time from the value defined by the symm command and the values found at sym2
example - sym2 xtr2=26.8 ytr2=26.8 ztr2=26.8
xtr2, ytr2, ztr2 – are the final sizes of the box that is changing during the
simulation [d]{0.d0}
tthr – Selected atoms are harmonically linked to fixed positions in space. A pick command
is expected in the same line
frcc – Force constant for tether constraints (linking particles to specific positions in
space) [d]{1.d0}
mult – Multiple temperatures are present
Picked temperatures are used for velocity scaling of subsets of the system. The
default is that all particles belong to temperature 1 (have the same temperature).
Useful in annealing in which only part of the system is bad, or in LES in which
equipartition is violated and different scaling is used for enhanced and regular
parts (Ulitsky and Elber 1993)
#tmp – number of temperatures in the system (useful for LES simulation) . If larger than
one more input is required to define different domains with different temperatures
[i] {1}
tmpi – initial temperature [d] {300}
tmpf – final temperature [d] {300}
For a number of tmpi and tmpf for #tmp>1
rand – a random number seed for (Boltzmann) velocity generator. [i]{1}
step – time step in ps [d] {0.001d0}
newv – how many steps you need before assigning new velocities.[i]{0}
#rig – number of steps between two rigid rotations to remove rigid body motion. [i]{10}
fmax – if >0, sets sdyes to true, which means that steepest descent is performed. Its value
is the threshold above which steepest descent iterations should be performed [i]{-1}
strt – for restart
debu – print a lot of debugging information [l] {false}
pdeb – print virial term of the pressure on standard output [l] {f}
#ste – number of MD steps [i]{1}
#equ – number of equilibration steps [i]{1}
info – number of steps between writing information on standard output [i]{1}
#crd – number of steps between writing coordinate sets [i]{0} (=0 means do not write
coordinates)
#vel – number of steps between writing velocities sets [i]{0} (=0 means do not write
velocities)
#lis – number of steps between regeneration of the non bonded list [i]{1}
#scl – if the temperature(s) deviate by more than this value, they are rescaled [d]{0} (=0
means isokinetic ensemble)
cont – Used to denote that this is a continuation of a dynamics run [l] {false}
strt - is the starting step for dynamics[i]{1}
rdie – DIstance dependent (linear) dielectric – no longer active [l]{false}
FREADY Coarse grained model for protein interactions works with dyna (see FREADY
documentation 9.2.3).
Sample dyna input
file conn name=(mb10co.wcon) read
file rcrd name=(mb10comin.crd) read
file wcrd name=(mb10co.dcd) bina wovr
file wvel name=(mb10co.dvd) bina wovr
#ste=500 info=100 #equ=10 #crd=100 #vel=10 #lis=20
#scl=10 rand=-3451187 step=0.002 tmpi=10 tmpf=300
relx=11. rvmx=8. epsi=1. cdie
shkb shac=1.d-6 shav=1.d-6 itsh=500
action
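The sym2 option described in the variable list above changes the box size linearly in simulation time, from the size given by symm to the size given on the sym2 line. The sketch below shows that interpolation only; it assumes the ramp spans the whole run, and the starting box of 30.0 is a made-up value (the final box is the one from the sym2 example). The function name is hypothetical.

```python
def box_size(step, nsteps, start, final):
    """Linearly interpolated box edges at a given MD step.

    start: edges from the symm command, final: xtr2, ytr2, ztr2 from sym2.
    Assumes the ramp covers steps 0..nsteps.
    """
    fraction = step / nsteps
    return tuple(s + fraction * (f - s) for s, f in zip(start, final))

start_box = (30.0, 30.0, 30.0)     # hypothetical symm values
final_box = (26.8, 26.8, 26.8)     # from the sym2 example in the text

for step in (0, 250, 500):
    print(step, box_size(step, 500, start_box, final_box))
```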
10.1.6 dyna_prl (parallel program, requires MPI)
Purpose: The executable dyna_prl is a parallel version of the program dyna in which the
calculations of the forces are done in parallel by splitting the interaction lists.
It takes exactly the same input as dyna. The only difference is that the input
for the simulation is always taken from a file named Input and the textual
output is returned in a series of files called dyn_out_x.out, where x is the
process number. Please consult the file dyn_out_0000.log for the most
detailed output. Open MPI needs to be installed on the system in order to run
dyna_prl.
Use: mpirun -np number_of_processes dyna_prl
10.1.7 freee
Purpose: Executable to perform free energy calculations by free energy perturbation.
Use : freee < input > output
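As background, the basic free energy perturbation estimate averages the Boltzmann factor of the energy difference over sampled configurations. The sketch below shows this generic (Zwanzig) estimator only; how freee accumulates the average along the path is not described here, so the function name, the units (kcal/mol) and the sample values are assumptions.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def fep_free_energy(delta_u, temperature):
    """Estimate dF = -kT ln < exp(-dU / kT) > from sampled energy differences.

    delta_u: list of energy differences (kcal/mol), one per sampled structure.
    """
    kt = K_B * temperature
    shift = min(delta_u)   # factor out the smallest value for numerical stability
    average = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(average)

samples = [0.0, 1.0]       # made-up energy differences for illustration
print(fep_free_energy(samples, 300.0))
```

The estimate never exceeds the plain average of the energy differences, which is a useful sanity check on output.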
rcrd – the file where the Cartesian coordinates of all the particles are stored. The only
possible format for this coordinates file is:
wcrd – a binary file where the Cartesian coordinates of the particles are stored
wslo –The slope of the reaction coordinate
wfrc – The forces in direction of the reaction path direction to make it possible to
compute force-force correlation function and approximate friction kernel
wene – Energy output
Variables (in square brackets – type, curly bracket default)
EPROP ARE AVAILABLE FOR freee EXCEPT srften,gbsu, hscl, mors, repl, gbsa, gbo1,
gbo2, npol, ball.
debu – prints out on the debug file a lot of debugging information [l]{false}
rvrs – compute the free energy from reactants to products AND vice versa [l]{false}
nocut – Non bonded interactions are computed in full according to existing lists. No
additional distance check is used in the energy routines. Important (and the default) in
minimization that requires a continuous and differentiable energy function (like conjugate
gradient) [l]{true for minimization, false otherwise}
cdie – Use constant dielectric (=1). [l]{true}
rdie – Distance dependent (linear) dielectric (not active). [l]{false}
nori – Global translation and rotation of the system are not eliminated [l]{false for
vacuum simulations}.
mshk - A special constraint protocol will be used for water (Matrix SHAKE)[l]{false}
nfrz – select the particles that WILL NOT be frozen. In the same line as
nfrz a "pick" command must follow
toiy – A flag indicating that the Tensor Of Inertia and rotational contribution to the
potential of mean force should be calculated [l]{false}
selc – A pick command to select particles for which the potential of mean force/free
energy difference is computed
example:
selc pick #prt 1 12 done
ssbp – Spherical solvent boundary potential is used(Beglov and Roux 1994) [l]{false}
nmul, diec, drdi, drca are parameters related with ssbp
hvdw – use finite van der Waals radius for hydrogen atoms which are not water.
Normally they are zero but this may cause stability problem since charges of
opposite sign may overlap in space. [l]{true}
temp – Desired temperature (the temperature is determined from the kinetic energy)
[d]{300.d0}
grid – Number of grid points of the path [i]{200}
#ste – number of sampling points for a given reaction coordinate value [i]{1}
#sve – period of velocity scaling [i]{100}
#tes – Number of steps between checks that the linear constraints are satisfied [i]{500}
#wcr – Frequency of writing coordinates to a file (binary form) [i]{500}
dt – step in pico-second for time integration [d]{0.001d0}
bgin – Index of first path coordinates used to do free energy calculations [i]{1}
fina – Index of last path coordinates used to do free energy calculations [i]{1}
rand – a seed for random number generator to sample velocities. [i]{1}
newv – Frequency of assigning new velocities.[i]{1000}
#pri – number of steps between writes of simulation report [i]{1}
list – period between updates of non bonded list [i]{1}
Sample freee input
file conn name=(sypfdv.wcon) read
file rcrd name=(sypfdv.pth) bina read
file wcrd name=(sypfdv_fin.pth) bina wovr
file wslo name=(sypfdv.slo) wovr
file wfrc name=(sypfdv.frc) wovr
file wene name=(sypfdv.ene) wovr
relx=999.0 rvmx=999.0 epsi=1. cdie v14f=8. e14f=2.
#ste=10 #equ=2 #pri=1 #wcr=1000 #list=101
#tes=50 #sve=1
step=0.001
mshk mtol=1.d-12 rand=-30379267
bgin=1 fina=5 grid=5
nfrz pick chem mono TIP3 done
temp=300.0
ssbp nmul=15 diec=78.4 drdi=2.8 drca=2.6 pick chem prtc OH2 done
hvdw
action
10.1.8 therm
Purpose: Executable to compute free energy differences with the thermodynamic perturbation
model
Use : therm < input > output
conr – connectivity file for the reactants along the lambda free energy calculations.
conp - connectivity file for the products along the lambda free energy calculations.
rcrd - the initial (crd) coordinate file.
wcrd – an unformatted file of output Cartesian coordinates.
wvel - an unformatted output file of Cartesian velocities.
EPROP ARE AVAILABLE FOR therm EXCEPT srften, gbsu, hscl, mors, repl, cent, gbsa,
gbo1, gbo2, npol, ball.
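nocut – Non bonded interactions are computed in full according to existing lists. No
additional distance check is used in the energy routines. Important (and the default) in
minimization that requires a continuous and differentiable energy function
(like conjugate gradient) [l]{true in minimizations}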
shif - A flag indicating different style of cutoff which is no longer used. Ignored at
present, may return sometime in the future [l]{false}
cdie – Constant dielectric (=1). [l]{true}
rdie – Distance dependent (linear) dielectric. Not active. [l]{false}
shkb – Shake all bonds [l]{false}
cpth – Initial coordinates are read in PATH format [l]{false}
cchr - Initial coordinates are read in CHARM format [l]{false}
temp – Desired temperature (the temperature is determined by kinetics) [d]{300.d0}
#ste – number of sampling points for a given reaction coordinate [i]{1}
#eqv – number of steps before scaling the temperature according to assign temperature
[i]{10}
#sve - A different keyword indicating period of velocity scaling that is used [i]{100}
#pri – Frequency of reports on computation progress [i]{1}
#tes - Number of steps between checks that the linear constraints are satisfied [i]{500}
#wcr – Frequency of writing down coordinates to a file [i]{500}
rand - a seed for a random number generator that samples velocities. Seed must
be positive [i]{1}
step - time step in ps [d] {0.001d0}
firs – initial value of lambda point in thermodynamic integration [d]{0.d0}
fina – final value of lambda point in thermodynamic integration [d]{1.d0}
sted – change in lambda, the thermodynamic integration variable [d]{5.d-2}
Sample therm input
file conr name=(val1.wcon) read
file conp name=(val2.wcon) read
file rcrd name=(val.crd) read
file wcrd name=(mut50-60.dcd) bina unit=12 wovr
file wvel name=(mut50-60.dvd) bina unit=13 wovr
#ste=1000 #equ=900 #pri=50 #wcr=1000 list=20 #tes=50 newv=1100
#eqv=1 #sve=1 rand=3451187 step=0.001
temp=300. relx=9.0 rvmx=6. epsi=1. cdie v14f=8. e14f=2. shkb
firs=50.d-2 fina=60.d-2 sted=10.d-2
hvdw
action
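The lambda values visited by a run follow from firs, fina and sted. A minimal sketch, assuming that both end points are included and that sted divides the interval evenly (the function name is hypothetical):

```python
def lambda_windows(firs, fina, sted):
    """Lambda points from firs to fina in increments of sted (end points included)."""
    if sted <= 0:
        raise ValueError("sted must be positive")
    n = int(round((fina - firs) / sted))
    # Rounding hides floating point noise such as 0.6000000000000001.
    return [round(firs + i * sted, 10) for i in range(n + 1)]

sample_run = lambda_windows(0.50, 0.60, 0.10)     # values of the sample input
default_run = lambda_windows(0.0, 1.0, 0.05)      # documented defaults
print(sample_run)
print(len(default_run))
```

With the sample input a single lambda interval is covered, which is consistent with the output file names mut50-60.dcd and mut50-60.dvd; a full mutation would chain several such runs.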
10.1.9 mfep
Purpose: Program mfep ("minimum free energy path") is a variant of the finite
temperature string method of Eric Vanden-Eijnden (JCP 123, 134109, 2005). The
selected subset of coordinates at the k-th milestone, {X_k}, is used to define the hyperplane
ϕ_k and the unit vector normal to the plane, as in the chmin program. The planes are then
updated by calculating the average configuration on the plane over τ time steps (eq. 1).
After averaging, the planes are gradually updated to the average configuration in Δ steps
(eq. 2). The subscripts are plane numbers while the superscripts are iteration
steps. To prevent instabilities due to noise while averaging (especially at the
beginning of the calculations), the new plane ϕ_k^(n+1) is smoothed by eq. 3, where s is the
degree of smoothing. A linear reparametrization is also used to distribute the
distances between the planes evenly; see Ref. (JCP 125, 024106, 2006) for more details.
$$\phi_k^{(m+1)} = \frac{1}{\tau} \int_0^{\tau} X_k \, dt \qquad (1)$$
$$\phi_k^{(n+1)} = \phi_k^{(n)} + \Delta \left( \phi_k^{(m+1)} - \phi_k^{(n)} \right) \qquad (2)$$
$$\phi_k^{(n+1)} = s \, \phi_k^{(n+1)} + \frac{s}{2} \left( \phi_{k+1}^{(n+1)} + \phi_{k-1}^{(n+1)} \right) \qquad (3)$$
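The averaging and relaxation steps of eqs. 1 and 2 can be sketched in a few lines. This is an illustration of the two formulas only, not the mfep implementation: configurations are plain lists of numbers, the time integral is replaced by an average over stored samples, and delta is treated as a fraction between 0 and 1 (an assumption). The function names are hypothetical.

```python
def time_average(samples):
    """Eq. 1: average of the configurations X_k sampled on one plane."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(sample[i] for sample in samples) / n for i in range(dim)]

def relax_plane(phi_old, phi_avg, delta):
    """Eq. 2: move the plane a fraction delta towards the averaged configuration."""
    return [p + delta * (a - p) for p, a in zip(phi_old, phi_avg)]

# Two made-up sampled configurations of a two-coordinate system.
sampled = [[0.0, 0.0], [2.0, 4.0]]
phi_avg = time_average(sampled)
phi_new = relax_plane([0.0, 0.0], phi_avg, 0.5)
print(phi_avg, phi_new)
```

Applying the relaxation repeatedly with a small delta moves the plane gradually, which limits the effect of noise in any single average.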
mfep requires multiple processors that communicate by MPI.
The input file for each point should be prepared and named in such a way that the first
processor sees inp_0001, the second inp_0002, etc.
Use: mpirun -np number_of_processes mfep
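When many milestones are used, the per-processor input file names are easiest to generate with a short script. A minimal sketch, assuming four-digit zero padding as in inp_0001 and numbering that starts at 1 (the helper name is hypothetical):

```python
def mfep_input_names(nproc):
    """File names inp_0001 ... inp_NNNN for nproc processes."""
    return [f"inp_{rank:04d}" for rank in range(1, nproc + 1)]

names = mfep_input_names(8)
print(names[0], names[-1])
```

Each name would then receive its own copy of the input, with the file assignments (for example the mlst and rcrd path files) adjusted per milestone.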
Standard output for every process is written to pth_00XX
conn – the file wcon has all the information on the molecular topology (what is bound to
what) and the parameters required for energy calculations. The file must be
produced with the program conn before launching mfep
mlst – the file in which the individual milestoning image along the reaction coordinate is
stored. Must be of path format.
rcrd - the file where the Cartesian coordinates of the initial run are stored. Must be in
path (pth) format. It is the initial structure for sampling in the plane. It can be a
copy of mlst file
upfr – a file in which the projected force along the reaction coordinate is written during
sampling.
wcrd – a path formatted file where output coordinates are stored.
wpep – a file in which the average coordinates are stored
wene – a file to write standard energy output during the simulation.
wmom– a file to write distance of the milestone from the initially started one.
Variables
nwav – number of time steps for averaging configurations for new plane update
frep – number of steps used to update to the new milestone [i]{2000}
smth – degree of smoothing of the new milestones (reaction path) [d]{0.1d0}
orth – The current run is OEQ type[l]{true}
ucrb – use crbm constraints to force the system to remain in the plane[l]{true}.
temp – assigned temperature[d]{300.d0}
ptmp – temperature of equilibration phase (if #equ not zero) [d]{100.d0}
grid – the number of path structures minus 1 [i]{0}
#wcr – period for writing output coordinates[i]{100}
#nen – period for writing standard energies [i]{100}
#upfr – period for writing the projected force [i]{100}
#mom – period for writing the distance from the initial configuration
nwpep – period for writing the trajectory on wpep [i]{100}
selc – select the subset of particles to define the reaction coordinate. A pick command
must follow the select : pick … done
newv – re-sample velocities from the Boltzmann distribution. Should not be used
frequently [i]{10000}
#scl – frequency between attempted velocity scaling [i]{20}
#equ – number of equilibration steps (standard MD with no constraints) [i]{0}
#tes – frequency for testing constraints [i]{0}
step – the size of the time step [d]{0.005d0}
rand – random number to initiate velocities from the Maxwell distribution
#mxstps – maximum number of steps. [i]{10000}
cent – restraining the geometric center of a selected set of atoms to be at the origin. The
following parameters are optional in the cent command in the same line: kcnt,
which is the force constant; xeqm, yeqm ,zeqm which are the coordinates to
restraint the geometric center of the selected subset of coordinates. [l] {false}
cent kcnt=[d]{10.d0} xeqm=[d]{0.} yeqm=[d]{0.} zeqm=[d]{0} pick… done
The last part is a pick command to select the subset of coordinates. Check the
documentation for pick command if unfamiliar with it.
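tstd – turns on the check on acceptable distances between atoms. If a distance is shorter
than 1.5 Å, prints a warning [l] {false}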
cgsk – matrix shake in conjugate gradient[l] {false}
mshk – a flag to indicate the use of the special constraint protocol matrix shake for water
molecules. When added to the conn program, bonds and angles are excluded from the
connectivity list of the water molecules and therefore the same flag must be used in
follow up codes. [l] {false}
mtol – is the tolerance in mshake [d]{1.d-7}
shkb – turns on shaking of all bonds[l] {false}
nori – turns on the reorientation during the dynamics[l] {false}
nosc – avoids scaling of temperature (done by default) [l] {false}
orie – by default to avoid rigid body motion for simulation in vacuum, the overlapping of
the structure with the initial structure is done using all atoms. If orie is called
explicitly it is possible to pick only some atoms[l] {false}
sym2 – A flag indicating that the box size is changing during the simulation and the final
size is defined by the values provided in the present line. The rate of
change is based on linear interpolation between the value defined by the symm
command and the values found at sym2
xtr2, ytr2, ztr2 are the final sizes of the box that is changing during the simulation
[d]{0.d0}
cdie – turns true the use of constant dielectric constant (=1). Default and currently the
only option [l] {true}
Sample output
The norm of the difference between the current milestone and the initial one,
$d_k = \lVert \phi_k^{n} - \phi_k^{0} \rVert$, is written to each pth_out_XXXX.log file except for
pth_out_0001.log, where $\sum_k d_k$ is reported.
The third and fourth columns of the pth_out_XXXX.log files are the norm of the vector
between milestones k and k-1, $l_{k,k-1} = \lVert \phi_k - \phi_{k-1} \rVert$, and the norm of the
one between milestones k-1 and k+1.
more pth_out_0002.out
…
-------- PME - setting up --------
ewald_cof: 0.277516754261821
ewald coeff. = 0.277517 actuall cutoff = 8.50
contrib. to direct sum on cutoff sphere= 0.1000E-03
desired direct space cutoff for "exact" = 21.62
total HEAP storage needed for PME = 1005
total STACK storage needed for PME= 568386
Grid dimensions in PME: x = 64 y = 64 z = 64 ; intrpord = 3
-------- PME - end of setting up --------
RANLUX DEFAULT INITIALIZATION: 314159265
RANLUX DEFAULT LUXURY LEVEL = 3 p = 223
Time        init mlst   (i-1)-i     (i-1)-(i+1)
----------  ----------  ----------  -----------
100         0.00000     0.02549     0.03790
200         0.00000     0.02549     0.03790
300         0.02226     0.01062     0.02120
400         0.02498     0.00996     0.01989
…..
Sample input file
file conn name=(ala.wcon) read
file mlst name=(ala_1.pth) binary read
file rcrd name=(ala_1.pth) binary read
file wene name=(ala_1.wene) wovr
file wpep name=(pep_1.pth) binary wovr
file wcrd name=(crd_1.pth) binary wovr
file wmom name=(ala_1.mom) wovr
orth
ucrb
smth=0.1 frep=1000
nwav=20000
ptemp=100
temp=300
grid=7
step=0.0005
#nen=500
#mom=100
#wcr=100
#equ=200
#mxs=50000
#tes=2000
#scl=40
newv=8000
mshk mtol=0.000001
symm xtra=20 ytra=20 ztra=20
ewald dtol=1e-06 grdx=16 grdy=16 grdz=16
selc pick #prt 2 2 | #prt 4 4 | #prt 6 6 | #prt 8 8 | #prt 10 10 done
relx=8.5 rvmx=8
list=20
epsi=1
cdie
hvdw mdiv
action
10.1.10 umbr
Purpose: Executable to perform an umbrella sampling calculation.
Use : umbr < input > output
rcrd - Initial Cartesian coordinates in PATH (pth) format.
quni – the sampled value of the umbrella coordinate
wcrd – Unformatted Cartesian coordinates of sampled configuration (PATH format).
wvel – Unformatted file of velocities.
Variables: (in square brackets – type, curly bracket default)
EPROP ARE AVAILABLE FOR umbr EXCEPT surften, hscl, hvdw, cnst, symm, Mors,
repl, cent, gbo1, gbo2, npol
nocut – Non bonded interactions are computed in full according to existing lists.
No additional distance check is used in the energy routines. Important (and the
default) in minimization that requires a continuous and differentiable energy function
(like conjugate gradient) [l]{true for minimization, false otherwise}
cdie – constant dielectric (=1). [l]{false}
rdie – distance dependent (linear) dielectric (not active). [l]{false}
temp – desired temperature (the temperature is determined by the average kinetic energy)
[d]{300.d0}
grid – number of grid points in the path [i]{200}
istp – the size of the step between sequential points along umbrella path [i]{1}
strt – the starting value of the thermodynamic variable along which we integrate in
the computations of the potential of mean force [i]{1}
finl – the index of the final coordinate set in a collection of structures along the reaction
coordinate[i] {-1}
#ste – number of sampling points for a given reaction coordinate [i]{1}
#sve – frequency of velocity scaling [i]{100}
npri – frequency of writing a summary of computation progress [i]{1}
#tes – Number of steps between checks that the linear constraints are satisfied [i]{500}
#wcr – frequency of writing down coordinate sets to a file [i]{500}
rvrs – if true, compute the free energy backwards [l]{false}
effm – Compute the effective mass of the reaction coordinate [l]{false}
forc – is the difference in the forces at the crossing point of the two electronic curves
[d]{0.1d0}
rand – a seed for a random number generator for sampling velocities (must be positive).
[i]{1}.
step – time step in ps [d] {0.001d0}
Sample umbrella input (A conformational transition in valine dipeptide)
file quni name=(q.data) wovr
file rcrd name=(valmin.dcd) bina read
file wcrd name=(valumb.dcd) bina wovr
file wvel name=(valumb.dvd) bina wovr
#ste=2000 #equ=2 #pri=10 #wcr=5 list=20
#sve=4000 rand=-3451187 step=0.001 grid=1 #str=2 forc=5.0d0
rvrs=-1 temp=300.
rmax=9. epsi=1. cdie v14f=8. e14f=2.
hvdw
action
10.1.11 sdel (parallel version, requires MPI) / sdelS (serial version)
Purpose: This module searches for a trajectory between two specified structures by
action minimization. Given the two end structures x_i and x_f, the target function minimized
by sdel is
$$T = \sum_{j=2}^{N-1} \left( \frac{\partial S}{\partial x_j} \right)^2 + C ,$$
where S is the functional
$$S\left[ \{x_j\}_{j=1}^{N} \right] = \sum_{j=1}^{N-1} \frac{1}{2} \left( \sqrt{2 \left( E - U(x_j) \right)} + \sqrt{2 \left( E - U(x_{j+1}) \right)} \right) \Delta l_{j,j+1} , \qquad \Delta l_{j,j+1} = \left\lVert \sqrt{M}\, x_j - \sqrt{M}\, x_{j+1} \right\rVert \qquad (1)$$
and C is a restraint that ensures that the configurations x_j are distributed approximately
uniformly along the pathway:
$$C = \eta_1 \sum_j \left( \Delta l_{j,j+1} - \overline{\Delta l} \right)^2 , \qquad \overline{\Delta l} = \frac{1}{N-1} \sum_{j=1}^{N-1} \Delta l_{j,j+1} .$$
The target function T is minimized by simulated annealing.
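The discretized action of Eq. (1) and the spacing restraint C are simple to evaluate for a given set of images. The sketch below is an illustration for a toy system and not the sdel code: it assumes mass-weighted distances between consecutive images, uses plain lists for coordinates, and all names and parameter values are made up.

```python
import math

def segment_lengths(path, masses):
    """Mass-weighted distance between each pair of consecutive images."""
    lengths = []
    for a, b in zip(path[:-1], path[1:]):
        lengths.append(math.sqrt(sum(m * (q - p) ** 2
                                     for m, p, q in zip(masses, a, b))))
    return lengths

def action(path, masses, potential, total_energy):
    """Discretized S of Eq. (1); math.sqrt raises ValueError if E < U on an image."""
    s = 0.0
    for j, dl in enumerate(segment_lengths(path, masses)):
        p_left = math.sqrt(2.0 * (total_energy - potential(path[j])))
        p_right = math.sqrt(2.0 * (total_energy - potential(path[j + 1])))
        s += 0.5 * (p_left + p_right) * dl
    return s

def spacing_restraint(path, masses, eta):
    """Restraint C: penalizes uneven spacing of the images along the path."""
    lengths = segment_lengths(path, masses)
    mean = sum(lengths) / len(lengths)
    return eta * sum((dl - mean) ** 2 for dl in lengths)

# Toy example: a free particle (U = 0) of unit mass moving from 0 to 1 in 1D.
path = [[0.0], [0.25], [0.5], [0.75], [1.0]]
free_action = action(path, [1.0], lambda x: 0.0, 0.5)
even_spacing = spacing_restraint(path, [1.0], 10.0)
print(free_action, even_spacing)
```

The ValueError raised when the total energy falls below the potential on some image is the numerical counterpart of a negative value under the square root in Eq. (1).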
Use : mpirun -np #procs sdel (parallel version)
sdelS (serial version)
Executable takes input from files inp_0000, inp_0001,…, inp_(#proc-1), where these
should be identical. The number of processors needs to be specified also in the input files
(see below). Each processor then writes its output to the file pth_out_(#procID).log
conn – connectivity file
rcrd – the starting input trajectory in PATH format (flag cpth needs to be specified) or
two structures (the end points) in CHARM format (flag cini needs to be specified). When
the two end structures are specified, the initial trajectory in the pathway search is
set to a linear interpolation between the two end points. This interpolation scheme is not
recommended with typical all-atom potentials, since the energies of intermediate
structures would be too large due to hard core overlap and one would get negative values
under the square roots in Eq. (1). It is better to start with a partially refined path such as a
minimum energy path.
wcrd – the output file for the minimized trajectory; this is a binary file in PATH format
EPROP ARE AVAILABLE FOR sdel
This module supports FREADY related parameters.
sdel specific variables: (in square brackets - type, curly bracket default)
cpth – input is read in PATH format [l] {true}
cini – input is read as two CHARM format structures that are interpolated [l] {false}
dtop – integration step in simulated annealing minimization [d] {10^-5}
grid – number of images that represent the trajectory (including the end points) [i] {10}
#ste – total number of minimization steps in simulated annealing algorithm [i] {1}
#pri – each #pri steps information about current state of the trajectory is printed out to
standard output [i] {1}
#wcr – if #wcr ≠ 0, the trajectory is saved every #wcr steps to the file int_pth_NNNN.pth in
PATH format (where NNNN increases as 0001, 0002, …) [i] {0}
tmpr – starting temperature of simulated annealing (not to be confused with physical
temperature) [d] {10.0}
list – the frequency of updating the non-bonded list; in this module it also specifies the
cooling schedule of simulated annealing. The total number of steps is divided into short
cycles of length list. The starting temperature of each cycle decreases linearly
from tmpr (see above) to 0.0. Within each cycle the temperature also changes linearly
from the starting temperature of that cycle to 0.0 [i] {1}
rand – random number generator seed, if more processors are used, seed of each
processor is set to rand + procID [i] {1}
pdqe – total energy of the system E [d] {0.0}
ctmp – constant temperature run, instead of linear cooling [l] {false}
proc – number of processors used in the calculation (1 ≤ proc ≤ grid-2) [i] {1}
gama – value of parameter η1 from the definition of the equidistance restraint C [d] {10^2}
clog – introduces an additional restraint of $-\eta_2 \sum_j \log(\Delta l_{j,j+1})$ that helps to avoid collapse of
the trajectory. Parameter η2 is linearly scaled from clog to 0 [d] {0.0}
fene – parameter helping to avoid structures with U > E by adding a restraint
0.1 fene N ∑_j (E − U_j) to the target function [d] {1.0}
noRA – the simulated annealing is not initiated with random momentum corresponding to
given temperature, but instead the momentum is set to 0 [l] {false}
itpl – interpolate mode, allows for interpolation/skipping of structures from the input. It is
useful when refining the trajectory description. If itpl=0 the whole input trajectory is
used as is. If itpl=1 a new structure is interpolated between each pair of existing
neighboring structures, changing the total number of structures from igrid to
2·igrid − 1. If itpl=2, a structure is kept, then skno (see below) structures are
skipped, and so on. Nonzero values work only in the serial version (proc=1). [i] {0}
ovlp – overlap the structures with respect to each other after the initial trajectory is read
from the input [l] {false}
select – successive pick command selects a subset of particles for the calculation [l]
{false} CURRENTLY DOES NOT WORK!
skno – when the input trajectory is read in itpl=2 mode, this parameter
specifies how many structures should be skipped after reading in a structure from the
input file [i] {1}
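The cooling schedule controlled by list, tmpr and #ste can be pictured with the short Python sketch below. It encodes one reading of the description of list (linear decrease of the cycle starting temperature, and a linear ramp to 0.0 inside each cycle); the function and its argument names are hypothetical, and the actual sdel implementation may differ in details such as how a final incomplete cycle is handled.

```python
def annealing_temperature(step, n_steps, cycle_length, tmpr):
    """Temperature at a 0-based step under one reading of the `list` schedule."""
    n_cycles = max(1, n_steps // cycle_length)
    cycle = min(step // cycle_length, n_cycles - 1)
    # Starting temperature of this cycle: linear from tmpr towards 0.0
    t_start = tmpr * (1.0 - cycle / n_cycles)
    # Within the cycle: linear from t_start towards 0.0
    offset = step - cycle * cycle_length
    return t_start * max(0.0, 1.0 - offset / cycle_length)
```

With #ste=5000, list=500 and tmpr=3.0 this gives ten cycles, the first starting at 3.0 and the second at 2.7. Setting ctmp replaces this schedule by a constant temperature.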
Sample sdel input
~debu
file rcrd name=(valmin200.pth) bina read
file wcrd name=(output.PTH) bina wovr
#ste=5000 #pri=100 list=500
gama=2000.0 grid=200 pdqe=-42.2
~
proc=49 cpth tmpr=3.0 dtop=1.0d-4 fene=1.d0
rmax=9999.
gbsa hvdw
amid
action
9.1.12 sdp (parallel version, requires MPI) / sdpS (serial version)
Purpose: This module searches for an overdamped trajectory between two specified
structures by action minimization. Given the two end structures $x_i$, $x_f$, the target
function minimized by sdp is

$$T(x_2, \dots, x_{N-1} \mid x_1 = x_i, x_N = x_f) = \sum_{j=1}^{N-1} \sqrt{H_S + \left( \frac{\partial U}{\partial x_j} \right)^2}\; \left| x_{j+1} - x_j \right| + C , \qquad (2)$$

where C is a restraint that ensures that the configurations $x_j$ are distributed approximately
uniformly along the pathway:

$$C = \eta_1 \sum_j \left( \Delta l_{j,j+1} - \langle \Delta l \rangle \right)^2 , \qquad \langle \Delta l \rangle = \frac{1}{N-1} \sum_{j=1}^{N-1} \Delta l_{j,j+1} .$$

The target function T is minimized by simulated annealing in the parallel version (sdp) or by
conjugate gradient local minimization (sdpS) (Olender and Elber 1996; Elber and
Shalloway 2000; Majek, Elber et al. 2009).
Use : mpirun -np #procs sdp (parallel version)
      sdpS (serial version)
The executable takes its input from the files inp_0000, inp_0001, …, inp_(#proc-1), which
should be identical. The number of processors must also be specified in the input files
(see below). Each processor then writes its output to the file pth_out_(#procID).log
rcrd – the starting input trajectory in PATH format (flag cpth needs to be specified) or
two structures (the end points) in CHARMM format (flag cini needs to be
specified). If the two end structures are specified, the initial trajectory in the
pathway search is set to a linear interpolation between the two end points.
EPROP ARE AVAILABLE FOR sdp and sdpS
This module supports FREADY related parameters
dtop – integration step in simulated annealing minimization [d] {10^-5}
#ste – total number of minimization steps in simulated annealing algorithm [i] {1}
#pri – each #pri steps information about the current state of the trajectory is printed out to
standard output [i]
#wcr – if #wcr ≠ 0, the trajectory is saved every #wcr steps to the file int_pth_NNNN.pth in
PATH format (where NNNN increases as 0001, 0002, …) [i] {0}
tmpr – starting temperature of simulated annealing (not to be confused with physical
temperature of the system) [d] {10.0}
list – the frequency of updates of the non-bonded lists. Here it also specifies the cooling
schedule of simulated annealing. The total number of steps is divided into short cycles
of length list. The starting temperature of each cycle decreases linearly from
tmpr (see above) to 0.0. Within each cycle the temperature also changes linearly from
the starting temperature of that cycle to 0.0 [i] {1}
rand – random number generator seed, if more processors are used, seed of each
processor is set to rand + procID [i] {1}
ctmp – constant temperature run, instead of linear cooling [l] {false}
proc – number of processors used in the calculation (1 ≤ proc ≤ igrid-2) [i] {1}
clog – introduces an additional restraint of $-\eta_2 \sum_j \log(\Delta l_{j,j+1})$ that helps to avoid collapse of
the trajectory. Parameter η2 is linearly scaled from clog to 0 [d] {0.0}
noRA – the simulated annealing is not initiated with random momentum corresponding to
given temperature. Instead the momentum is set to 0 [l] {false}
hami – value of H S from Eq. (2) [d] {0.0}
itpl – interpolate mode, allows for interpolation/skipping of structures from the input. It is
useful when refining the trajectory description. If itpl=0 the whole input trajectory is
used as is. If itpl=1 a new structure is interpolated between each pair of existing
neighboring structures, changing the total number of structures from igrid to
2·igrid − 1. If itpl=2, a structure is kept, then skno (see below) structures are skipped, and so on.
Nonzero values work only in the serial version (proc=1). [i] {0}
skno – when the input trajectory is read in itpl=2 mode, this parameter specifies how many
structures should be skipped after reading in a structure from the input file [i] {1}
select – successive pick command selects a subset of particles for the calculation [l]
{false} CURRENTLY DOES NOT WORK!
anne – use annealing in sdpS [l]{false}
tolg – tolerance of the gradient. When tolg is reached during the minimization,
terminate [d]{1.d-3}
dfpr – estimated reduction in energy during first step [d]{1d-2}
Sample sdp input
file rcrd name=(valmin200.pth) bina read
file wcrd name=(output.PTH) bina wovr
#ste=100 #pri=1 list=100 anne
gama=2000.0 grid=30 hami=1.d-5
~
proc=1 cpth tmpr=30.0 dtop=1.0d-4
~
rmax=9999.
gbsa hvdw
amid
action
9.1.13 chmin (this module runs in serial mode only)
Purpose: This module searches for a trajectory between two specified structures by
action minimization. Given the two end structures $x_i$, $x_f$, the target function minimized
by chmin is

$$T(x_2, \dots, x_{N-1} \mid x_1 = x_i, x_N = x_f) = \sum_{j=2}^{N-1} U(x_j) + C , \qquad (3)$$

where C is a restraint that ensures that the configurations $x_j$ are distributed approximately
uniformly along the pathway:

$$C = \eta_1 \sum_j \left( \Delta l_{j,j+1} - \langle \Delta l \rangle \right)^2 + \frac{\rho}{\lambda} \sum_j \exp\left( -\lambda\, \frac{\Delta l_{j,j+2}^2}{\langle \Delta l \rangle^2} \right) , \qquad \langle \Delta l \rangle = \frac{1}{N-1} \sum_{j=1}^{N-1} \Delta l_{j,j+1} .$$

The target function T is minimized by either simulated annealing or conjugate gradient
local minimization.
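The restraint C used by chmin can be evaluated for a list of images as in the following Python sketch. It is an illustration only: the function name is hypothetical, plain Euclidean distances are assumed for the segment lengths (this section does not say whether they are mass weighted), and rho and lam correspond to the repl and lmbd input variables.

```python
import math

def chmin_restraint(path, eta1, rho, lam):
    """Equidistance restraint C of the chmin target function."""
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    # Nearest-neighbour distances Delta l_{j,j+1} and their mean
    d1 = [dist(a, b) for a, b in zip(path, path[1:])]
    mean = sum(d1) / len(d1)
    spread = eta1 * sum((d - mean) ** 2 for d in d1)

    # Next-nearest-neighbour distances Delta l_{j,j+2}; the exponential term
    # grows when images j and j+2 approach each other
    d2 = [dist(a, b) for a, b in zip(path, path[2:])]
    repulsion = (rho / lam) * sum(math.exp(-lam * d * d / (mean * mean)) for d in d2)
    return spread + repulsion
```

The sketch assumes a path with at least two distinct images, so that the mean segment length is nonzero.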
Use : chmin < input > output
rcrd – the starting input trajectory for algorithm, there are 4 different input styles
supported:
(i) PATH files (binary, double precision, specifies all structures of a trajectory)
(ii) INIT. Reading formatted coordinates file for reactants and products and generating
the rest of the path by linear interpolation
(iii) INTRpolate. Given a low resolution path in PATH format, add structures in between
to refine the path.
The keywords cpth, cini and cint specify which of the styles is used; by default PATH
format is assumed.
Energy/general variables: (in square brackets - type, curly bracket default)
This module supports all energy and FREADY related parameters, see documentation of
dyna executable
All EPROP except: mors, mors alph, Dmor, alph, spec, rcut, lmda, Arep,
brep, beta, cent, gbo1, gbo2, npol, surften, ten, hscl,
cnst, ball
None of FREADY parameters
NOTE: metl reads ‘amtl’ and ‘alfa’; in EPROP these are written “amtl” and “bwal”
chmin specific variables: (in square brackets - type, curly bracket default)
cint – input is read in PATH format, with interpolation [l] {false}
dtop – integration step in simulated annealing minimization [d] {5·10^-4}
#ste – number of minimization steps [i] {1}
#pri – each #pri steps, information about the current state of the trajectory is printed out to
standard output [i]
#tes – number of steps between checks that the linear constraints are satisfied [i] {500}
#wcr – every #wcr steps the trajectory is saved to the file int_pth_NNNN.pth in PATH format
(where NNNN increases as 0001, 0002, …) [i] {0}
tmpr – starting temperature of simulated annealing of the chain (not to be confused with
the physical temperature of the system) [d] {10.0}
list – The non bonded list is updated each “list” steps in conjugate gradient minimization
(according to the middle structure). In simulated annealing mode non bonded list
is updated for every structure every single step [i] {1}
rand – random number generator seed, [i] {1}
repl – value of parameter ρ from the definition of the equidistance restraint C [d] {10^2}
lmbd – value of parameter λ from the definition of the equidistance restraint C [d] {2.0}
tolg – tolerance of the gradient used in conjugate gradient optimization [d] {10^-3}
dfpr – estimated reduction in the value of the target function T in the first step [d] {0.01}
anne – use simulated annealing [l]{false}
input file [i] {1}
select – successive pick command selects a subset of particles on which the chain
constraints will be imposed [l] {false}
debu – debug option {false}
nbfi – a flag to indicate that a soft, finite van der Waals repulsion is used for difficult
annealing (Gaussian repulsion is employed) [l]{false}
Sample chmin input
file rcrd unit=5 read
file wcrd name=(valmin.pth) bina wovr
#ste=1000 #pri=10 #wcr=1000 list=1000 repl=1000.
lmbd=2. gama=20. grid=5
rmax=9999. epsi=1. cdie cini
hvdw
action
file name=(val01.crd) read
file name=(val60.crd) read
9.1.14 fp
Purpose : Compute first passage time trajectories between Milestones and collect
statistics. Program fp ("first passage") implements milestoning in MOIL. It runs
in one of two modes. The first is "oeq mode" ("orthogonal equilibration"), in
which the peptide is constrained to the plane normal to the current milestone. The
second mode is "fp mode," in which the peptide is unconstrained and evolves
dynamically (according to the Verlet algorithm) until it makes first passage to a
neighboring milestone plane. Note that I've just used "fp" as both the name of the
program and the second mode. To avoid confusion, I'll usually say "the program"
when referring to the entire Fortran program, and I'll reserve "fp" for the name of
the second running mode.
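The statistics collected in fp mode can be summarized per milestone as in the Python sketch below. It is not part of MOIL and does not read the wfpt file, whose layout is not described here; it assumes the terminated trajectories have already been parsed into hypothetical (time, terminating milestone) pairs.

```python
from collections import Counter

def milestone_statistics(terminations):
    """Summarize first-passage trajectories started from one milestone.

    terminations -- iterable of (first_passage_time, terminating_milestone)
    Returns (mean_time, {milestone: fraction of trajectories ending there}).
    """
    records = list(terminations)
    if not records:
        raise ValueError("no terminated trajectories")
    mean_time = sum(t for t, _ in records) / len(records)
    counts = Counter(m for _, m in records)
    fractions = {m: n / len(records) for m, n in counts.items()}
    return mean_time, fractions
```

For example, four trajectories from milestone 5 that end twice on milestone 1 and twice on milestone 7 give a fraction of 0.5 for each neighbor.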
Use : fp < input > output
conn – the connectivity file. It must be produced before launching fp with the program conn
rcrd – the file where the Cartesian coordinates of the initial run are stored. Must be in
path (pth) format. In OEQ mode it is the initial structure for sampling in the plane.
In FP mode the file contains multiple coordinate sets that are used as initial
conditions for trajectories that terminate on the nearby Milestones.
mlst – the file that stores a complete list of the Milestoning images along the reaction
coordinate. If there are M milestones in the reaction path then the milestone file
must contain M structures. Must be of path format.
upfr – a file in which the projected force along the reaction coordinate is written during
sampling. Can be used to estimate the PMF along the reaction coordinate. Active
only in the OEQ phase.
upvl – a file in which the velocities along the direction of the reaction coordinate are
written. Can be used to estimate velocity memory along the reaction coordinate.
wcrd – a path formatted file where output coordinates are stored. In an OEQ run the output
is used to initiate FPT trajectories. In an FPT run the file includes sample
conformations along the terminating trajectory.
wfpt – a file with the first passage times sampled.
wfpp – a file with first passage time configurations.
wpep – a file with the trajectory
wene – a file to write standard energy output during the simulation, at present not in use.
wmom – This output file contains the distances and squared distances of the output
configurations from the initial configuration given in rcrd. The order in which the
distances are listed is the same as the order of the configurations in wcrd.
wdot – write one type of dot product into a file udot
wdt1 – write down dot products between path unit vectors to file unit udt1
wdst – write the distance from the initial structure to file unit udst
Variables
EPROP ARE AVAILABLE FOR fp EXCEPT surften, hscl, mors, repl.
orth – The current run is OEQ type (sample configurations in a Milestone) [l]{false}
tefp – the current run is FP (run terminating trajectories between Milestones) [l]{false}
ucrb – use crbm constraints to force the system to remain in the plane.
temp – assigned temperature [d]{300}
ptemp – temperature of pre-equilibration phase [d]{-1}
grid – the number of path structures minus 1 [i]{lgrid – parameter in LENGTH.BLOCK}
nene – period between printing energies (not used) [i]{100}
nrcrd – number of structures in urcrd [i]{1}
nwcrd – Frequency of writing output coordinates [i]{0}
nwpep – period for writing the trajectory on wpep
selc – select the subset of particles to define the reaction coordinate. A pick command
must follow the select : pick … done
nfrz – select particles that are not frozen (fixed). In the same line a “pick” command must
be present [l]{false}
newv – re-sample velocities from the Boltzmann distribution. Should not be used
frequently [i] {1000}
#scl – frequency between attempted velocity scaling [i]{100}
#pri – frequency of printing out progress reports. [i]{1}
#tes – frequency for testing constraints [i]{0}
step – the size of the time step [d]{0.001d0}
rand – random number seed to sample velocities from the Maxwell distribution [i]{1}
tmslt – the index of the current milestone [i]
pmslt – the index of the previous milestone. The spacing between Milestones is flexible.
We can use (for example) milestone 5 as current, milestone 1 as previous and milestone 7
as next. [i]
nmlst – the index of the next Milestone. [i]
#mxstps – maximum number of steps. In OEQ it is the actual number of steps used for
sampling. In FP run it is a maximum runtime of a trajectory. We may give up on a
termination if it runs for too long. Extremely long termination times suggest that
more Milestones should be put in between. [i] {-1}
prints a warning
nfrz – select the particles that WILL NOT be frozen. In the same line as nfrz a "pick"
command must follow
shkb - shake all bonds (alternatively one may try shkl for shaking bonds with light
particles only m<1.1, shkb is highly recommended for dynamics). This option does
not work at present if the SHAKE constraints are present for the subset of atoms
that is included in the reaction coordinate.
cgsk – matrix shake using conjugate gradient (not operational at present) (Weinbach and
Elber 2005) [l]{false}
mshk – a special constraint implementation for water molecules.
When added to the conn program, bonds and angles are excluded from the
connectivity list of the water molecules
mtol – the tolerance in mshake [d]{1.d-7}
nori – turns off the reorientation during the dynamics
nosc – avoids scaling of temperature (done by default) [l]{false}
orie – by default, the reorientation that avoids rigid body motion overlaps the structure
with the initial structure using all atoms. If orie is specified it is
possible to pick only some atoms
TORS – turns on the constraining of some torsional angles. Example:
TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100
kcns – amplitude for the torsional constraint (the larger kcns, the stronger the constraint)
cneq – equilibrium angle expressed in degrees [d]{-999.d0}
(TORS keyword must come AFTER amid if amid is used)
sym2 - A flag indicating that the box size is changing during the simulation and the final
size will be defined according to the values provided in the present line. The rate
of change will be based on linear interpolation from the value defined by the
symm command and the values found at sym2
example –
sym2 xtr2=26.8 ytr2=26.8 ztr2=26.8
[d]{0.d0}
cdie – constant dielectric (=1). [l]{true}
Sample input
file conn name=(aladip_w248.wcon) read
file mlst name=(albet.pth) binary read
file rcrd name=(oeq_M_144_1.pth) binary read
file wene name=(fp_M_144_1.wen) wovr
file wcrd name=(fp_M_144_1.pth) binary wovr
file wfpt name=(fp_M_144_1.fpt) wovr
file wpep name=(fp_pep_M_144_1.pth) binary wovr
file wfpp name=(fpp_M_144_1.pth) binary wovr
file upvl name=(fpp_M_144_1.pvl) wovr
tefp temp=303 grid=143 step=0.001 #pri=10000
#rcr=10 #wcr=10000 #pep=50 #equ=0
tmlst=144 pmlst=141 nmlst=0
rand=aq13623277 #tes=10000 #scl=50 newv=0
mshk mtol=0.0001
symm xtra=20 ytra=20 ztra=20
ewald dtol=1e-06 grdx=16 grdy=16 grdz=16
relx=8 rvmx=7 list=20 hvdw amid
action
10.1.10 DiM
Purpose: Calculating first passage times and stationary populations between states
defined as reactant and product. Program Directional Milestoning (DiM) is a variant of
milestoning (fp) in which the dividing hypersurfaces are redefined in more than one
dimension and, at the same time, the Milestone separation is made directional.
For details see (JCTC,2010,6,p1805). The program runs in one of three modes. The first is
dim_prepare, which runs trajectories between a set of pre-defined anchors. The aim is to
identify the anchors that are connected directly. The second is dim_sampleS,
which constrains the dynamics around the interface defined between two directly
connected anchors. Sampled points are further integrated backward in time to check whether they
are from the First Hitting Point Distribution (FHPD). Phase space points that belong to the FHPD are
written to a file and used in the last mode. The third mode is called dim_run, which runs
an unconstrained MD trajectory from the points saved in the previous mode and
reports the first passage time when the trajectory hits another milestone.
Input files (required) for dim_prepare
conn – the connectivity file. It must be produced beforehand with the program conn
rcrd - the file where the Cartesian coordinates of the anchors are stored. Must be in path
(pth) format.
Output files(required) for dim_prepare
wcrd – a path formatted file where output coordinates are stored. In dim_sample the
output is used to sample at the interface.
Input parameters(required) for dim_prepare
#ste – upper bound for length of each searching trajectory
stot – number of searching trajectories per cell
grid – number of cells (or anchors)
andr – a flag to use the Andersen thermostat
andC – the probability of velocity resampling of waters for the Andersen thermostat (see
Juraszek and Bolhuis 2008 for details)
cpth – the input coordinate file is a pth formatted file [l] {false}
istr – read structure number istr [i]{0}
cell – cell id that is going to be used
sele – a flag for selection of collective variables
TORS atm1=2 atm2=4 atm3=6 atm4=10 weig=1
weig – the weight of each torsional constraint [d]{1.d0}
Input example
file conn name=(sugar.wcon) read
file rcrd name=(centers.PTH) bina read
file wcrd name=(interfaces_18.PTH) bina wovr
~
#ste=10000 #pri=1 #lis=20
step=0.001 stot=500 grid=20
info=250 #scl=30
andr andC=0.2
~
cpth
cell=18
~
tmpi=300 tmpf=310
mshk mtol=1.d-6
symm xtra=24.5 ytra=24.5 ztra=24.5
ewald dtol=0.000001 grdx=32 grdy=32 grdz=32
relx=9 rvmx=8
~
sele
~ set of reduced variables here:
TORS atm1=6 atm2=33 atm3=40 atm4=64 weig=1.
action
Input files (required) for dim_sample
conn – the connectivity file, which holds the bonding information and the parameters required
for the energy calculations. The file must be produced beforehand with the program conn
rcrd - the file where the Cartesian coordinates of the anchors are stored. Must be in path
(pth) format.
rint - the file where the Cartesian coordinates of the interface is stored. Must be in path
(pth) format.
Output files(required) for dim_sample
wcrd – a path formatted file where output coordinates are stored. In dim_run this will be
used to launch trajectories for first hitting times.
wvel – a path formatted file where output velocities are stored. In dim_run this will be
used to launch trajectories for first hitting times.
Input parameters(required) for dim_sample
#ste – upper bound for length of each sampling trajectory
#sav – how often saving will be attempted
umbr - width of the umbrella sampling region/interface
appr - threshold value for the distance between a free trajectory conformation and the closest
interface
K1_U - force constant for umbrella sampling between cell1 and the conformation
K2_U - force constant for umbrella sampling between cell2 and the conformation
cell – incoming cell id that is going to be used
cel2 – outgoing cell id that is going to be used
Input parameters(required) for dim_run
#ste – upper bound for length of each searching trajectory
stot – number of searching trajectories per cell
cell – cell id that is going to be used
Input example
file conn name=(sugar.wcon) read
file rcrd name=(centers.PTH) bina read
file wcrd name=(interfaces_18.PTH) bina wovr
~
#ste=10000 #pri=1 #lis=20
step=0.001 stot=500 grid=20
info=250 #scl=30
andr andC=0.2
~
cpth
cell=18
~
tmpi=300 tmpf=310
mshk mtol=1.d-6
symm xtra=24.5 ytra=24.5 ztra=24.5
ewald dtol=0.000001 grdx=32 grdy=32 grdz=32
relx=9 rvmx=8
~
sele
~ set of reduced variables here:
action
9.1.15 scndrv (and numerical)
Purpose: Computes the second derivative matrix of the potential energy for one coordinate
system. “scndrv” computes the derivatives analytically, while “numerical” computes them
numerically by finite differences. The eigenvalues of the mass weighted second derivative
matrix are written to the standard output. The same code is also used to interface second
derivative calculations with other programs that need it.
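The finite-difference route taken by “numerical” can be pictured with the Python sketch below. It is a generic central-difference scheme under stated assumptions, not the MOIL implementation: the `potential` callable, the step size h and both function names are hypothetical.

```python
def numerical_hessian(potential, x, h=1.0e-4):
    """Central finite-difference second derivatives of potential at x."""
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]

    def shifted(i, di, j, dj):
        # Evaluate the potential with coordinates i and j displaced
        y = list(x)
        y[i] += di
        y[j] += dj
        return potential(y)

    for i in range(n):
        for j in range(i, n):
            value = (shifted(i, h, j, h) - shifted(i, h, j, -h)
                     - shifted(i, -h, j, h) + shifted(i, -h, j, -h)) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = value
    return hess

def mass_weight(hess, masses):
    """Return the mass weighted matrix H_ij / sqrt(m_i m_j)."""
    n = len(masses)
    return [[hess[i][j] / (masses[i] * masses[j]) ** 0.5 for j in range(n)]
            for i in range(n)]
```

The eigenvalues of the mass weighted matrix can then be obtained with any symmetric eigenvalue solver.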
Use: scndrv < input > output
rcrd – the file where the Cartesian coordinates of all the particles are stored: CHARMM
format
rpth – read coordinates from path file.
Output file types: Required
wene – file name for energy reporting
Variables
EPROP ARE AVAILABLE FOR SCNDRV (and Numerical) EXCEPT gbsu, ewald, vprt,
mors, repl, gbo1, gbo2, metl.
debu – a flag for printing a lot of debugging information. Do not use unless you are a
moil expert.
rdie – a flag indicating that good old Coulomb law is modified from 1/r to 1/r^2 (not
active).
Sample input
file rcon name=(valdip.wcon) read
file rcrd name=(valdip.crd) read
rmax=9999 epsi=1. cdie
action
10.2 Major options
10.2.1 LES
Purpose: LES (Locally Enhanced Sampling) is a mean field approach that enhances the
sampling of a small part of the system that we are mostly interested in. Examples are
multiplication of side chains in homology modeling, or of a peptide in a box of water. It
was introduced in a paper by Elber and Karplus (Elber and Karplus 1990) to study ligand
diffusion through proteins (by having a probability density of one ligand represented by
60 “ligand fragments”) and was extended to other applications such as global
optimization by Roitberg and Elber (Roitberg and Elber 1991) and Simmerling and
Elber (Simmerling and Elber 1994), and to free energy calculations by Verkhivker and
Elber (Verkhivker, Elber et al. 1992). The current implementation in MOIL is highly
flexible and supports all major modules that use energy calculation.
Input file
A single line needs to be added to the input file that generates the connectivity file. It
must come immediately after the “action” line. It is
MULT pick …. done #cpy=[i]{0}
The meaning of which is the following. Pick a subset of atoms by the “pick” command
and then multiply them #cpy times. The “MULT” is just an indicator to the program that
we are going to multiply some of the particles. Once the file is modified we run the
program conn as usual:
conn < conn.inp > conn.out
The connectivity file generated has some special features that tell other programs that
particles have been multiplied. Multiplied particles do not see each other (they are like
“ghost” to each other) and they interact with other particles only on the average. Strictly
speaking the multiple particles represent probability density chopped to fragments of
particles. Formally, if the probability density of a single trajectory can be represented by a
delta function, $\rho(X, P, t) = \delta(X - X_0(t), P - P_0(t))$, then the LES approach uses the
ansatz probability density

$$\rho_{LES}(X, P, t) = \delta(x - x_0(t))\,\delta(p - p_0(t))\; \frac{1}{N} \left[ \sum_{i=1,\dots,N} \delta(x - x_{i0}(t))\,\delta(p - p_{i0}(t)) \right]$$

where the first pair of delta functions, in x and p, describes the part of the system that we
do not enhance, and the sum runs over the copies of the enhanced part. The number of copies
in the above formula is N. In addition to the connectivity file we also need a coordinate file.
Such a file can be produced with the graphic interface or manually. The LES option
duplicates atoms. So if (for example) we enhance the two atoms N and H four times, the
crd file at the very beginning, in which all the multiplied atoms occupy the same position
in space, will look something like
12 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
13 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
14 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
15 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
16 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000
17 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000
18 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000
19 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000
The only differences between the lines are the atom numbers (the residue numbers
remain the same). Such a file can be prepared from an ordinary crd file by editing, or by
using the graphic interface for “Processing PDB” and selecting the LES option. The
graphic interface prepares both the connectivity and coordinate files of LES, which
makes it especially convenient for the present application.
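When the coordinate file is prepared by hand, the duplication and renumbering can be scripted. The Python sketch below is a hypothetical helper, not a MOIL tool: it treats each atom record as whitespace-separated fields with the atom number first, which is a simplification of the real crd layout, so the column alignment of the output would still need to be checked against the format MOIL expects.

```python
def multiply_atoms(crd_lines, atoms_to_copy, n_copies):
    """Duplicate selected atom records and renumber all atoms sequentially.

    crd_lines     -- atom records as whitespace-separated strings
    atoms_to_copy -- set of original atom numbers (first field) to multiply
    n_copies      -- total number of copies of each selected atom (#cpy)
    """
    out = []
    next_number = None
    for line in crd_lines:
        fields = line.split()
        original = int(fields[0])
        if next_number is None:
            next_number = original  # keep the numbering of the first record
        repeats = n_copies if original in atoms_to_copy else 1
        for _ in range(repeats):
            out.append(" ".join([str(next_number)] + fields[1:]))
            next_number += 1
    return out
```

All copies start at the same position, as in the listing above; only the atom numbers differ.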
Sample input
file poly name=(a4.poly) read
file wcon name=(a4.wcon) wovr
action
MULT pick #mon 3 3 done #cpy=4
*EOD
10.2.2 MUTA
Purpose: MUTA is a program that performs thermodynamic integration for the free
energy differences of two molecules that differ in a small number of atoms. The
algorithm picks two groups of particles, say groups 1 and 2, each belonging to
a different molecule and representing a mutated part, and eliminates all the interactions
between them, so that particles belonging to the two different groups do not see each other.
Moreover, the interactions between particles of group 1 and all the particles not
belonging to group 2 are rescaled by a parameter λ, and the interactions between
particles of group 2 and all the particles not belonging to group 1 are rescaled
by a factor (1−λ), so that the total Hamiltonian becomes:

$$H(\lambda) = K_0 + K_1 + K_2 + U_0 + \lambda U_1 + (1 - \lambda) U_2$$

The free energy difference turns out to be:

$$\Delta F = F(\lambda_2) - F(\lambda_1) = \int_{\lambda_1}^{\lambda_2} \left\langle \frac{\partial H}{\partial \lambda'} \right\rangle_{\lambda'} d\lambda' = \int_{\lambda_1}^{\lambda_2} \left\langle U_1 - U_2 \right\rangle_{\lambda'} d\lambda'$$
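Numerically, the integral is estimated from ensemble averages collected at a discrete set of λ values. The Python sketch below illustrates this with the trapezoidal rule; it is an assumption that an evenly spaced grid and this quadrature match what muta does internally, and both function names are hypothetical. The arguments of lambda_grid mirror the ilam, flam and #lam input variables.

```python
def lambda_grid(ilam, flam, nlam):
    """Evenly spaced lambda values from ilam to flam in nlam steps."""
    if nlam < 1:
        raise ValueError("need at least one lambda step")
    return [ilam + (flam - ilam) * k / nlam for k in range(nlam + 1)]

def integrate_free_energy(lambdas, mean_dh_dlambda):
    """Trapezoidal estimate of the integral of <dH/dlambda> over lambda."""
    if len(lambdas) != len(mean_dh_dlambda) or len(lambdas) < 2:
        raise ValueError("need matching lists with at least two points")
    total = 0.0
    for k in range(len(lambdas) - 1):
        width = lambdas[k + 1] - lambdas[k]
        total += 0.5 * (mean_dh_dlambda[k] + mean_dh_dlambda[k + 1]) * width
    return total
```

Each entry of mean_dh_dlambda is the average of the integrand sampled at the corresponding λ, after the equilibration steps have been discarded.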
To run a MUTA simulation a special connectivity file is required. A typical input file for
conn in preparation for MUTA calculation is:
Sample Input
file poly name=(ala-to-val.poly) read
file ubon name=(ala-to-val.addb) read
file wcon name=(ala-to-val.wcon) wovr
muta
action
MUTA pick grou 1 #prt 7 9 | grou 2 #prt 15 15 done
*EOD
The data above describes a mutation of valine (side chain atoms from 7 to 9) to alanine
(side chain atom 15). The two side chains are set such that they do not see each other and
their interactions are scaled by λ or (1 − λ). The keyword “muta” is used to alert the
program to additional MUTA input after the “action”. The mutant monomer has to
appear in the ala-to-val.poly file, and the add bond facility in moil is used to connect the
extra side chain to the same backbone atom (Cα).
The program samples configurations to estimate the average $\langle \cdots \rangle_\lambda$ and the free energy
difference. The sampling is performed with Langevin dynamics (LD). For the input and
output files see the LD documentation. The file lambda_res.log returns the results of the free
energy calculation.
Use: muta < input > output
Optional parameters:
ilam – initial value of lambda (at present it cannot be 0) [d]{0.0d0}
flam – final value of lambda (at present it cannot be 1) [d] {0.0d0}
#lam - number of lambda steps from ilam to flam [i]{0}
twai – number of time steps before starting to collect statistics for the ensemble average
[i]{0}
banf – if true make the integration go from ilam to flam, and then back to ilam. [l] {F}
Sample input:
file conn name=(ala-to-val.wcon) read
file rcrd name=(ala-to-val.crd) read
#ste=10000 step=0.0002 info=100 list=1 tmpi=300 tmpf=300
rmax=9999. epsi=1. cdie v14f=8. el14=2.
muta ilam=0.2d0 flam=0.8d0 #lam=10 twai=100 banf
hvdw
nori
eqms
acti
10.2.3 FREADY
In order to use FREADY within MOIL, you first need to generate connectivity and
coordinate files of the coarse-grained model. See the comment in the description of
program conn on how to generate a connectivity file for FREADY. Alternatively have a look
in moil.test/fready. In this directory there is a script called runme.bat which processes a
pdb file (the file name is specified in the 3rd line of the script) and generates a wcon and a
crd file for FREADY.
In order to run a module in FREADY mode you need to add the following line to your
input file (for dyna, energy, sdel, sdp, chmin, mini_pwl, …)
file CGpr name=(moil.mop/CG.PROP) read
This command at the same time turns off all atomistic energy parameters.
optional FREADY parameters
file fix2 name=(reference.crd) read – Purpose of this command is to fix the local
secondary structural elements to those observed in reference.crd. This command adds
quadratic restraints on all bond angles and bond lengths with equilibrium values set to
those seen in the reference structure. Moreover, it adds similar quadratic restraints on
backbone dihedral angles in residues that are specified to be in an alpha helix or beta
sheet conformation. The specification of secondary structure element is done through the
last column of the reference.crd file, with 0 (default value in normal crd file) – coil, 1 –
alpha helix, 2 – beta sheet.
cutC – cutoff for FREADY non-bonded interaction, recommended to keep the default
value [d] {13.5}
cutH – cutoff for hydrogen bonding interactions in FREADY [d] {8.0}
DECO – this flag leads to smoothing the hardcore part of NB interaction, useful for
ranking of DECOY structures [l] {false}
Example of an input file for dyna executable (in FREADY mode)
file conn name=(molecule.wcon) read
file rcrd name=(molecule.crd) read
file CGpr name=(../../../moil.mop/CG.PROP) read
file wcrd name=(tmp.dcd) bina wovr
file wvel name=(tmp.dvd) bina wovr
#ste=10000 #equi=5000 step=0.003 info=2000 #crd=200 #vel=10000
#lis=20 #scl=20 tmpi=1 tmpf=300 cutH=10.0
action
10.2.4 Double-well Elastic network model
It is possible to run moil action minimization routines (currently implemented only for
sdp and sdel) in a double-well elastic network model, which allows simple
models of conformational transitions to be generated. The basic elastic network model is defined as
$$U(x) = \frac{\gamma}{2} \sum_{r_{ij} < C} \left( r_{ij} - r_{ij}^{0} \right)^{2},$$
where $r_{ij}$ and $r_{ij}^{0}$ are distances between Cα atoms of residues i
and j in the structure x and in the reference structure, respectively. The parameter C is a
cutoff value, typically in the range of 6 – 12 Å. The transition between two metastable
structures $x_i$, $x_f$ is modeled by a network
$$U(x) \equiv \frac{1}{2} \left( U_i(x) + \left( U_f(x) - \alpha \right) - \sqrt{ \left( U_i(x) - \left( U_f(x) - \alpha \right) \right)^{2} + 4 \beta^{2} } \right).$$
The two new parameters α and β define the relative energy difference between the two
minima and the smoothness/height of the barrier between the minima, respectively.
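The two energy expressions of this section can be illustrated with a short Python sketch. It is not MOIL code: the function names and the three-bead coordinates are made up for illustration, and the contact list is taken from the reference structure, which is an assumption about how the cutoff C is applied. The default values mirror the cutE, gamE, alFh and bEta defaults of this section.

```python
import math

def enm_energy(coords, ref, cutoff=7.0, gamma=1.0):
    """Single-well ENM: (gamma/2) * sum over contacts of (r_ij - r_ij0)^2.

    Assumption: a pair (i, j) is a contact when its distance in the
    reference structure is below the cutoff.
    """
    total = 0.0
    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            r0 = math.dist(ref[i], ref[j])
            if r0 < cutoff:
                r = math.dist(coords[i], coords[j])
                total += (r - r0) ** 2
    return 0.5 * gamma * total

def double_well(u_i, u_f, alpha=0.0, beta=10.0):
    """Mix two single-well energies into one smooth double-well surface."""
    shifted = u_f - alpha
    return 0.5 * (u_i + shifted
                  - math.sqrt((u_i - shifted) ** 2 + 4.0 * beta ** 2))

# Two hypothetical three-bead end structures (coordinates in Angstrom).
x_i = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
x_f = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

def mixed_energy(x, alpha=0.0, beta=10.0):
    """Double-well ENM energy of structure x between x_i and x_f."""
    return double_well(enm_energy(x, x_i), enm_energy(x, x_f), alpha, beta)
```

With beta set to zero the mixed energy reduces to the lower of the two wells, min(U_i, U_f − α); a larger beta smooths the crossing region between them.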
In order to run sdp/sdel with the mixed elastic network model, a coarse description of the
system (connectivity file and coordinate files) is required. The connectivity file can be
generated in the same way as for FREADY, with the only modification being that all residues
(except CGTR) should be named GLYZ. Sidechain positions are not considered in this
mixed elastic network model. Once the connectivity and coordinate/PATH files of your
system are ready, you can use standard sdp/sdel input files with the following additions:
ENM2 – this required switch tells the system that a mixed elastic network model is used
for the calculation. [l] {false}
cutE – specifies cutoff used in definition of a contact (C parameter in the equations
above) [d] {7.0}
alFh - α parameter from the last equation [d] {0.0}
bEta - β parameter from the last equation [d] {10.0}
gamE – force constant gamma in the definition of simple ENM [d] {1.0}
Example of input file for sdpS with double-well ENM model
file conn name=(CG.wcon) read
file rcrd unit=5 read
file wcrd name=(output.PTH) bina unit=12 wovr
~
#ste=60000 list=60000 tolg=0.0001 grid=100
cini cuto=12.d0 alph=0.d0 beta=1.d2 proc=1
gama=1.d0 gamC=5000 hami=1.d-5
ENM2
~
tmpr=1.d-1 dtop=2.d-3 #pri=100
action
file name=(conf1_CA.crd) read
file name=(conf2_CA.crd) read
10.2.5 LD (Langevin Dynamics)
Purpose : Perform a stochastic dynamics simulation using the Langevin equation of motion.
A frictional term, with a memory function proportional to a Dirac delta in time, and a
random term are introduced. The friction coefficient gamma can be controlled by input.
The algorithm used here is discussed in “Computer Simulation of Liquids - M.P. Allen &
D.J. Tildesley”.
Use : LD < input > output
rcrd – the file where the cartesian coordinates of all the particles are stored. These are the
initial conditions to solve the Newton equations. The possible formats for this
coordinates file are:
ctyp=(charm) – coordinates written in charmm format (default);
ctyp=(pdb) – coordinates taken from a pdb coordinate file;
Input file types: Optional
rtet – coordinates file for tethering particles to their initial coordinates during MD. Hence
no significant deviation from initial coordinates is allowed. Useful when only part
of the system requires optimization. These coordinates are read in charmm format
(default).
rvel – file with initial velocities if you do not want to extract them randomly. Velocities
are written in charmm format.
wcrd – a binary file where the cartesian coordinates of the particles are stored.
wvel – a binary file where the velocities of the particles are stored.
rest – a file with the last dynamics step saved for the restart.
rstr – a file where recent coordinates for restart are stored.
EPROP ARE AVAILABLE FOR LD EXCEPT surften, gbsu, cnst, mors, repl.
bigb – if found the spring constant for any bond is modified according to newb. [l]{false}
newb – new bond spring constant [d]{500.d0}
wfly –check if water molecules fly away from the main system (for simulation of
solvation shell). [l] {false}
tstd – turns on a check for hard collisions for pairs in a neighbor listing in the present
structure. Pairs with a distance shorter than 1.5A are reported. A single structure
evaluation. No dynamics will be run [l]{false}
nfrz – select the particles that WILL NOT be frozen. In the same line as
nfrz a "pick" command must follow [l]{false}
shkb – shake all bonds (alternatively one may try shkl for shaking
bonds with light particles only m<1.1, shkb is highly recommended for dynamics.)
[l] {false}
cgsk – matrix shake using conjugate gradient determination of the Lagrange’s multipliers
mshk – matrix shake for water molecules. When added to the conn program, bonds and
angles are excluded from the connectivity list of the water molecules [l]{false}
mtol – tolerance of error for mshk [d]{1.d-7}
shkl – turns on the shaking of bonds with light particles ( m < 1.1) [l]{false}
nori – turns off the reorientation during the dynamics [l]{false}
nosc – avoids scaling of temperature (done by default) [l]{false}
orie – Avoid rigid body motion. Overlap current structure with the initial structure.
Selection of a subset of atoms for overlap is also possible. [l]{false}
TORS – Apply torsional angle constraints
example - TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100
kcns amplitude for torsional constraint (the larger kcns, the stronger the constraint)
cneq equilibrium angle expressed in degrees [d]{-999.d0}
(TORS keyword must come AFTER amid if amid is used)
spec – option of switching between different energy surfaces
example - spec lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0
lmda is the range parameter for continuous potential shifts between empirical
energy surfaces [d]{3.d0}
rcut is the range distance employed in the switching function between different
forms of the heme [d]{5.d0}
repl – it replaces van der Waals wall by an exponential repulsion which is primarily used
in Landau Zener modeling of curve crossing
example - repl Arep=80.0 beta=1.0 Brep=4.
The exponential repulsion is of the form Arep*exp(-beta*r)+Brep. The parameters
should be given as variables as in the above example: Arep [d], beta [d], Brep [d]
swit – this flag makes possible the passage between different energy curves in Landau-Zener calculations
example - switch Rcro=3.53181 dRcr=0.05 Forc=5.59951 delt=0.287
Rcro is the position of the crossing point [d]{3.d0}
dRcr is the interval (in angstrom) of significant interaction between two electronic
curves that cross (i.e. the range in which a transition probability between the two
electronic curves is evaluated) [d]{1.d0}
Forc is the difference in the forces at the crossing point of the two electronic
curves [d]{0.1d0}
delt indicates the time interval during which curve crossing is
felt [d]{0.1}
(like conjugate gradient) [l]{true for minimization, false otherwise}
nbfi - A flag to indicate that a soft, finite van der Waals repulsion is used for difficult
annealing (Gaussian repulsion is employed) [l]{false}
sym2 - A flag indicating that the box size is changing during the simulation.
[l]{false} The final size is defined according to the values provided in the present
line. The rate of change is based on a linear interpolation from the value defined
by the symm command and the values found at sym2
[d]{0.d0}
tthr - A flag indicating that the tether option (some atoms harmonically linked to fixed
positions in space) is set in the present line. A “pick” command to select restrained
atoms is possible. [l]{false}
frcc Force constant for tether constraints (linking particles to specific positions in
space) [d]{1.d0}
mult - Multiple temperatures are present [l]{false}
Picked temperatures are used in different velocity scaling of selected parts of the
system. The default is that all particles belong to temperature 1.
Useful in annealing part of the system. Or in LES simulations in which
equipartition is violated and different scaling are used for enhanced and regular
parts
eqms –all masses are set to 10.0 [l] {false}
debug – print a lot of debugging information [l] {false}
#ste – number of MD steps [i]{1}
info – number of steps between writing information on standard output [i]{1}
#crd – number of steps between writing coordinate sets [i]{0} (=0 means do not write
coordinates)
#vel - number of steps between writing velocities sets [i]{0} (=0 means do not write
velocities)
#lis - number of steps between regeneration of the non bonded list [i]{1}
#scl - velocity is rescaled to the target temperature if the current kinetic energy violates
the expected one (Boltzmann average) by more than #scl Kelvins. [d]{0.d0}
#tmp - number of temperatures in the system (useful for LES simulation) . If larger than
one more input is required to define different domains with different temperatures
[i] {1}
tmpi - initial temperature [d] {300}
tmpf - final temperature [d] {300}
For multiple temperatures just list the values following tmpi or tmpf, e.g., tmpi
300 30 tmpf 300 300 for a system with two temperatures. Group 1 will start and
end at 300K while group 2 starts at 30K and ends up at 300K.
rand - a seed for the random number generator for velocity sampling. [i]{1}
step - time step in ps [d] {0.001d0}
gama – Langevin dynamics friction coefficient in (internal MOIL time unit)^-1, so
gama=3.d0 corresponds to approximately 60 ps^-1 [d]{3.d0} [** should be
converted to MOIL units **]
#rig – number of steps between rigid body overlaps of current structure and the initial
reference structure to remove overall rotations and translations.
fmax – if the norm of the force exceeds fmax, do steepest descent minimization. Its value is the
threshold above which steepest descent iterations are performed to stabilize
the system [i]{-1}
strt – for restart, is the starting step for dynamics
shac – maximal error allowed for bond constraints (coordinates) [d]{1.d-7}
shav – maximal error allowed for bond constraints (velocities) [d]{1.d-7}
itsh – maximum number of allowed iterations for SHAKE convergence [i]{100}
cgpt – maximum number of iterations in conjugate gradient SHAKE iteration for
Lagrange’s multipliers of particle positions [i]{NA}
cgvl – maximum number of iterations in conjugate gradient SHAKE iteration for
Lagrange’s multipliers of velocities [i]{NA}
cdie – turns true the use of constant dielectric constant (=1). Otherwise, distance
dependent is used [l] {true}
rdie – No longer operational (turns on a dielectric linear with distance). [l]{true}
FREADY works with LD (see FREADY documentation).
Sample LD input
It is the same as in the dyna sample input, just add a definition of the friction constant,
e.g.
gama=60.0d0.
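To make the role of the friction coefficient concrete, the following Python sketch advances one degree of freedom by a single Langevin step. It is a plain Euler-Maruyama update written for illustration, not the Allen & Tildesley algorithm that LD uses, and the function name, argument names and unit system are assumptions.

```python
import math
import random

def langevin_step(x, v, force, mass, gamma, kT, dt, rng):
    """One Euler-Maruyama step of the Langevin equation for one coordinate.

    m dv = (F(x) - m*gamma*v) dt + sqrt(2*m*gamma*kT) dW
    All quantities are assumed to be in one consistent unit system.
    """
    # Standard deviation of the random velocity kick over one time step.
    sigma = math.sqrt(2.0 * gamma * kT * dt / mass)
    v_new = v + dt * (force(x) / mass - gamma * v) + sigma * rng.gauss(0.0, 1.0)
    x_new = x + dt * v_new
    return x_new, v_new

# Usage: a harmonic spring with a reproducible random stream.
rng = random.Random(1)
x, v = 1.0, 0.0
for _ in range(1000):
    x, v = langevin_step(x, v, lambda q: -q, 1.0, 3.0, 0.6, 0.001, rng)
```

Setting gamma to zero removes both the friction and the random kick, so the update falls back to a deterministic step; a larger gamma couples the system more strongly to the heat bath.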
10.2.6 dynapress
Purpose: Calculates pressure of a biomolecular system enclosed in a cubic box. It works
exactly as the dyna program, except that the definition of a rectangular periodic box is mandatory.
Pressure is printed out every info steps. Microscopic pressure shows significant
oscillations and thus a longer (at least 100 ps) averaging is recommended in order to
estimate accurately the pressure of a system. The module is recommended with keywords
nobo and shkb, which constrain all bond lengths to their ideal values. The special water
shake algorithm (keyword mshk) is currently not supported.
Use: dynapress < input > output
10.2.7 PME
Purpose: Calculating the long range electrostatic interactions by using Particle Mesh
Ewald Summation. The current version uses code from Darden (Darden, York et al.
1993).
Input parameters(required)
ewald the ONLY necessary keyword for PME
Input parameters(optional)
dtol – tolerance for direct space summation - it essentially sets up the Ewald coefficient
(for a given direct space cutoff) [f]{(5*10E-5) }
grdx, grdy, grdz (function of box sizes) – grid dimensions defining the accuracy of PME.
One grid point per Angstrom is recommended. Choose powers of 2,3 or 5 if
possible
iord – another parameter defining accuracy of PME, namely interpolation order (replaced
by order in many places in the code) - notice, the interpolation order is equal to
(iord – 1) [i]{0}
sgdx, sgdy, sgdz – additional scaling parameters for further adjustment of automatically
chosen grdx, grdy, grdz
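The grid recommendation above (about one grid point per Angstrom, with dimensions built from the factors 2, 3 and 5) can be turned into a small helper. The Python sketch below is only an illustration: the function names are hypothetical, and reading "powers of 2, 3 or 5" as "products of powers of 2, 3 and 5" is an assumption.

```python
import math

def is_235_smooth(n):
    """True if n has no prime factors other than 2, 3 and 5."""
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1

def suggest_grid(box_length, points_per_angstrom=1.0):
    """Smallest 2-3-5-smooth grid size with at least the requested density."""
    n = max(1, math.ceil(box_length * points_per_angstrom))
    while not is_235_smooth(n):
        n += 1
    return n

# Usage: candidate values for grdx, grdy, grdz of a 70.0 x 106.3 x 50.0 box.
grid = [suggest_grid(length) for length in (70.0, 106.3, 50.0)]
```

The helper only proposes candidates; the values finally used for grdx, grdy and grdz remain a choice of the user.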
Input example
file conn name=(memb5.WCON) read
file rcrd name=(memb5_4.CRD) read
file wcrd name=(memb_16.DCD) bina wovr
file wvel name=(memb_16.VCD) bina wovr
relx=12. rvmx=8. epsi=1. cdie v14f=8. e14f=2.
step=0.001
#ste=300 #equ=300 info=1 #crd=500 #vel=200 #lis=5
mshk mtol=1.d-12
shkb shac=1.d-12 shav=1.d-12 itsh=2000
symm xtra=70.0 ytra=106.3 ztra=50
ewald dtol=0.000001 iord=6 grdx=81 grdy=81 grdz=64
~ the above line is relevant for PME
rand=1111111
tmpi=300. tmpf=300.
action
10.2.8 dynapt (parallel program for parallel tempering, requires MPI)
Purpose: Compute molecular dynamics trajectories of replicas of the same system at
different temperatures and swap the coordinates and velocities of two neighboring replicas according to
a MC criterion. This method allows the low temperature system of interest to escape from
local free energy minima where it might otherwise be trapped (see Ref (Sugita and
Okamoto 1999)).
Use: mpirun -np numproc exe/dynapt < numrep_file
here, numproc is the number of processors that will be used to calculate the trajectories
and numrep_file is an input file that includes the number of replicas that will be run.
Input file types
To run parallel jobs with different temperatures one needs to prepare a separate input file
for each temperature. These input files are similar to the dyna input file, except that the
swap frequency is set to a desired value. The input files are named
dyna_0000.inp, dyna_0001.inp, ...
for the first replica, the second replica, and so forth.
Secondly, one needs another input file, numrep_file, which is already mentioned above.
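The naming scheme for the per-replica files can be generated rather than typed by hand. The Python sketch below only reproduces the zero-padded names given in this section; the function name is hypothetical and the sketch does not write any MOIL input.

```python
def replica_files(num_replicas):
    """Return (input, output) file name pairs for replicas 0 .. num_replicas-1."""
    return [(f"dyna_{i:04d}.inp", f"dyna_{i:04d}.out")
            for i in range(num_replicas)]

# Usage: names for a run with four replicas.
for inp, out in replica_files(4):
    print(inp, out)
```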
Output file types
dyna output is automatically named by the program as dyna_0000.out, dyna_0001.out, ...
and standard .dcd and .dvd files are generated exactly the same as the dyna program.
Everything is the same as the dyna output file. Additionally, the acceptance ratio is
written in the dyna output file. One can extract them with grep as
grep "Acc" dyna_0002.out
[** not clear what does the output mean? **]
a standard Acc Ratio is given here:
Acc. ratio in steps= 12144 temps = 300.00<-->350.00 1.00
Acc. ratio in steps= 17664 temps = 300.00<-->350.00 0.50
Acc. ratio in steps= 18768 temps = 300.00<-->350.00 0.33
Acc. ratio in steps= 19872 temps = 300.00<-->350.00 0.25
Variables
All variables are the same as in the dyna program with an addition of
swfr – it is the attempt frequency for swapping. A number greater than 0 turns on parallel
tempering [i] {0}
Further notes
Compile dynapt with mpif77 (or another MPI compatible compiler).
Please note that in MOIL's implementation of parallel tempering, every swfr steps one
replica is chosen randomly and the swap criterion is computed. Thus each replica swaps
configuration at a different time.
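The swap criterion itself is not spelled out in this section. A common choice in replica exchange is the Metropolis rule of Sugita and Okamoto, sketched below in Python. This is an illustration under stated assumptions, not the dynapt source: the function names are hypothetical and energies are assumed to be in kcal/mol.

```python
import math
import random

KB = 0.0019872  # Boltzmann constant in kcal/(mol K); assumed unit system

def swap_probability(e_i, t_i, e_j, t_j):
    """Metropolis acceptance probability for exchanging replicas i and j.

    e_i, e_j are potential energies; t_i, t_j are temperatures in Kelvin.
    """
    beta_i = 1.0 / (KB * t_i)
    beta_j = 1.0 / (KB * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(e_i, t_i, e_j, t_j, rng):
    """Return True if the swap is accepted for this random draw."""
    return rng.random() < swap_probability(e_i, t_i, e_j, t_j)

# Usage: cold replica at 300 K, warm replica at 350 K.
rng = random.Random(3451187)
accepted = attempt_swap(-100.0, 300.0, -90.0, 350.0, rng)
```

A swap that hands the lower energy to the colder replica is always accepted; the opposite move is accepted with a probability that decays exponentially with the energy gap.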
An example input file
file conn name=(val.wcon) unit=10 read
file rcrd name=(val.crd) unit=11 read
file wcrd name=(300.dcd) bina unit=12 wovr
file wvel name=(300.dvd) bina unit=13 wovr
#ste=200000 #equ=10000 info=1000 #crd=1000 #vel=1000 #lis=2000
swfr=1000 #scl=20 rand=3451187 step=0.001 tmpi=300 tmpf=300
relx=12. rvmx=9.
cdie epsi=1.
action
10.3 Utilities
10.3.1 addion
Purpose: To correctly use Ewald summation of a periodic system, the system must be
neutral. This program takes a coordinate set of a solvated system and “mutates” water
molecules to the desired ions. The water molecules are chosen at random. This code is
most conveniently used through the graphic interface.
Use: addion < input > output
conn – the connectivity file that lists the molecular topology and parameters required for
energy and force calculations. The connectivity file is for the molecule
WITHOUT the added ions
rcrd – the file from which the current coordinate system will be read (CHARMM format)
wcrd – where the coordinates (with the ions added) will be written
poly – a polymerization file which includes the ions to be written. It will be used by the
conn program to generate a connectivity file appropriate for the new coordinate
file.
Variables
iona – the name of the ion particle (atom). We support at present only monatomic ions for
an ion monomer. This makes sense since it replaces a single water molecule. A
large ion may not fit. [c]{NONE}
ionm – the name of the ion monomer [c]{NONE}
#ion – the number of ions to be inserted [i]{0}
rand – random number seed to select water molecules to be replaced by ions. [i]{-1}
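The random replacement of water molecules can be pictured with the Python sketch below. It only illustrates seeded sampling without replacement; the function name and the use of monomer indices are assumptions, and the random number generator is not the one addion uses, so the same seed will not select the same waters.

```python
import random

def pick_waters(water_ids, n_ions, seed):
    """Choose n_ions distinct water molecules to be replaced by ions."""
    if n_ions > len(water_ids):
        raise ValueError("more ions requested than water molecules available")
    rng = random.Random(seed)
    return sorted(rng.sample(list(water_ids), n_ions))

# Usage: replace 3 of 200 waters (hypothetical monomer numbers 101..300).
chosen = pick_waters(range(101, 301), 3, seed=12345)
```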
10.3.2 boat
Purpose: Compute Bonds Angles and Torsions (internal coordinates) from Cartesian
coordinates of one or a series of structures. Only internal degrees of freedom that are
defined in the connectivity file are computed. A single structure is in CRD format and
multiple structures are in PTH or DCD format.
Use: boat < input > output
Input file types (required)
conn or rcon – connectivity file
rcrd – coordinate file
Output file types (required)
boat – output file with values of internal coordinates
coor – type of coordinates to be followed by a space and the type of coordinate files to
be read. Options are CHAR PATH or DYNA
lpst & lpen – [i] {0} the starting (lpst) and ending (lpen) indices of the structures to be
read from PATH or DYNA files.
acti – start executing
Sample input
file rcon name=(valdip.wcon) unit=10 read
file rcrd name=(admap.pth) binary unit=11 read
file boat name=(valdip_boat_pth.out) unit=12 wovr
coor PATH lpst=1 lpen=3
action
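The three kinds of internal coordinates that boat reports can be computed from Cartesian positions as in the Python sketch below. It is a generic geometric illustration with hypothetical function names; the sign convention of the torsion angle may differ from the one boat uses.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def bond(a, b):
    """Distance between two atoms."""
    return norm(sub(a, b))

def angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u, v = sub(a, b), sub(c, b)
    cosang = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))

def torsion(a, b, c, d):
    """Torsion a-b-c-d in degrees, in the range (-180, 180]."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(norm(b2) * dot(b1, n2), dot(n1, n2)))
```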
10.3.3 ccrd
Purpose: Convert CooRDinates between different formats (CHAR, PATH, and DYNA).
Many (confusing) options are available so be aware! For example, it is possible to take a
list of CHAR files from the standard input (after the “action”) and convert them to a
DCD file.
Use:
ccrd < input > output
Input file types (required)
rcrd – primary coordinate file (the meaning of which will be explained below)
Input file types (optional)
rcr1 – secondary coordinate file
Output file types
cmbn – combined files. The file declaration must be followed by assignment of an integer
variable comb=[i]{0} which is the total number of structures to be read.
wcrd – output written coordinate (in the new desired coordinate format). In most
applications we extract a CHAR file from DYNA or PATH files, or convert between
DYNA and PATH files, which is done directly between the rcrd file and the wcrd file.
Some more interesting applications are also possible. For example, it is possible to
combine several CHAR files into a single DYNA or PATH file. This is done by setting rcrd
to the standard input (unit=5) and reading a number of crd files after the action. Another
interesting option is an output from cmbn, in which several DYNA files are merged
together. After the action (and before *EOD) a list of DYNA files is given in the usual
format with an explicit statement of the number of structures to be read from each file. For
example, the two lines below mean to read 30 structures (10 from 1.dcd and 20 from
2.dcd). Note that the number 30 must match the comb variable of cmbn (the total number of
structures to be written to the output file).
file rcrd name=(1.dcd) bina read lpst=10
file rcrd name=(2.dcd) bina read lpst=20
wpck – Indicating a pick command for writing coordinates (only picked atoms will be
written). The format should be of the form “wpck pick pick_selection done” in one line.
For “pick” syntax see 7.4 [l]{false}
opck – Indicating a pick command for overlapping the structures when writing output.
The whole structure will translate and rotate as rigid body but the rotation matrix will be
computed according to the selected set of atoms. The format should be of the form “opck
pick pick_selection done”. For “pick” syntax see section 7.4 [l]{false}
fpth fdyn fchr fxyz – logical, indicating the format of the input coordinates are either
PATH, DYNA, CHAR, FREE respectively. FREE has x,y,z coordinates in a single line
assuming that each line correspond to an atom and the line are ordered exactly as in the
connectivity file. [l]{false}
tpth tdyn tchr – the output coordinate format options: PATH, DYNA, CHAR
respectively. [l] {false}
wsub –write subset of the coordinates [l]{f}
ovlp –overlap coordinates according to selection [l]{f}
acti – start computing
Example
file conn name=(aladip_w248.wcon) read
file rcrd name=(oeq_M_96.pth) binary read
file wcrd name=(oeq_M_96.dcd) binary wovr
fpth tdyn lpst=1 lpen=10
action
10.3.4 crd2pdb
Purpose: convert from CHAR (CRD) format to PDB
Use: crd2pdb < crdfile > pdbfile
Further input is not required
10.3.5 con_specl
Purpose: Generating a secondary connectivity file that is used in simulation of curve
crossing (based on the Landau-Zener model). The secondary connectivity file is extracted
from a regular file and includes only particles that are involved in the curve crossing.
Use: con_specl < input > output
Input file types (required):
rcon – regular connectivity file
Output file types (required):
wcon – file with connectivity of a subset of atoms that participate in curve crossing
Variables
spec – turn on the Landau Zener model [l]{false}
mos1 – each curve crossing is modeled by a Morse potential that crosses a potential
energy of exponential repulsion. These degrees of freedom are covalently coupled to
other degrees of freedom (e.g. a bond of CO to the iron in the heme, is coupled to the
heme degrees of freedom). We allow up to 4 curve crossing centers (mos1 mos2 mos3
mos4) that were used in the past to model hemoglobin. If found, the mos[i] command
must be followed by a selection, for example:
chem mono HEME | chem mono CO | #mon 95 95 done
Sample input
file rcon name=(mbco_m.wcon) read
file wcon name=(mbco_s.wcon) wovr
specl
mos1 pick chem mono HEME | chem mono CO | #mon 95 95 done
action
*EOD
10.3.6 memeqns
Purpose: This program takes in results from a milestoning calculation (program fp),
postprocesses them, and calculates kinetics/equilibrium information about the system. It
runs in interactive mode, where the user chooses the kind of analysis to be done on the fly.
One can either calculate equilibrium properties (answer y to the question “Equilibrium run?
(y/n)”) or mean first passage times (MFPT mode).
In the equilibrium run, one can opt for QK analysis (question “Perform QK
integration?”) of the data (1). In the MFPT mode one specifies one of the milestones and
the mean first passage time (mfpt) from this milestone to the last milestone is calculated.
Optionally the reverse mfpt is calculated as well. The MFPT provides the overall first
passage time of the process which is the most straightforward and easy calculation to do
(West, Elber 2007). The QK formulation integrates the (integral) equation and provides
the most detailed information, including p(i,t) the probability of being at milestone i and
time t (ref. (Faradjian and Elber 2004))
A milestoning run produces files from the program fp, one file per milestone. Each file
includes a list of termination times of trajectories initiated on Milestone i and terminating
on Milestones i+/-1. Transition times to i+1 are recorded as positive and transition times to
i-1 as negative. Note that the current version of Milestoning handles only sequential
Milestones. Extensions for general arrangements of Milestones are in progress. The user
should also prepare another file that lists the filenames of all milestoning result files;
memeqns will ask for the location of this file during the input collection from the user.
The results of the analysis are printed to the standard output. In the case of QK
integration, iterative evolution of transition probability vector q(i,t) is written to the file
fort.10 (the probability to make a transition to Milestone i at exactly time t) and evolution
of p(i,t) vector to the file fort.11. See (Faradjian and Elber 2004; West, Elber et al. 2007)
for definitions of p and q probability vectors.
Use: memeqns (and reply to the queries that follow)
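The sign convention of the fp result files (positive times for transitions to milestone i+1, negative times for transitions to i-1) can be unpacked as in the Python sketch below. The function name and the returned quantities are illustrative assumptions; the sketch does not reproduce the analysis that memeqns performs.

```python
def summarize_milestone(times):
    """Split signed termination times of one milestone into simple statistics.

    Positive entries are transitions to milestone i+1, negative entries
    are transitions to milestone i-1.
    """
    forward = [t for t in times if t > 0]
    backward = [-t for t in times if t < 0]
    n = len(forward) + len(backward)
    if n == 0:
        raise ValueError("no terminated trajectories")
    return {
        "p_forward": len(forward) / n,      # fraction ending on i+1
        "p_backward": len(backward) / n,    # fraction ending on i-1
        "mean_lifetime": (sum(forward) + sum(backward)) / n,
    }

# Usage with four hypothetical termination times.
stats = summarize_milestone([1.0, -2.0, 3.0, 2.0])
```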
10.3.7 reconstruct
Purpose: This program takes in an all-atom representation of two different conformations
(files in CHARM format) of the same molecule (single wcon file). It also takes a coarse-grained
(CA and CTERM particles only) representation of a trajectory between these two
endpoints. It generates an all-atom representation of the trajectory by using the
reconstruction algorithm described in (Majek, Elber et al. 2009) in section VI.
Use: reconstruct < input
conn – connectivity file of all-atom model (water, ions, etc should not be included)
rcr1 – read all-atom coordinates of the 1st configuration
rcr2 - read all-atom coordinates of the 2nd configuration
rpth – read binary (in double precision) representation of a trajectory from structure 1 to
structure 2. This trajectory is in a coarse model which specifies only CA particles
and second oxygen of all CTER residues.
wpth – writes all-atom representation of the trajectory into this binary file (in double
precision format)
#str – number of structures in the input binary file [i] {1}
join – if present, the binary input file is assumed to be in the order 1 ,2 , ..., n-1, n, n-1, ...,
2, 1. The same order of frames is preserved on output. This order is useful for
visualizing the trajectory, since there is no jump in the movie if you play it in loops.
[l] {false}
Sample reconstruct input
file conn name=(all_atom.wcon) read
file rcr1 name=(start.crd) read
file rcr2 name=(end.crd) read
file rpth name=(start_to_end.PTH) bina read
file wpth name=(all_atom.PTH) bina wovr
#str=100
action
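The frame order assumed by the join option (1, 2, ..., n-1, n, n-1, ..., 2, 1) can be produced from a one-way list of frames as in the Python sketch below; the function name is hypothetical and frames are represented by plain list items.

```python
def join_order(frames):
    """Return frames followed by the same frames reversed, without repeating the last."""
    frames = list(frames)
    return frames + frames[-2::-1]

# Usage: a four-frame trajectory becomes a seven-frame loop.
looped = join_order([1, 2, 3, 4])
```

A trajectory of n frames becomes 2n-1 frames, so a movie played in a loop returns smoothly to its first frame.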
10.3.8 path_eqw
Purpose: This program takes a set of solvated structures (CHARM format) with the same solute
and box size, possibly solvated by different numbers of water molecules. It merges all the
structures into a single binary file (double precision) ready for path/free energy/fp
calculations. It further sets the number of water molecules to be equal in all the
structures. It does so by assuming that water molecules are at the very end of the input
files (which is typically the case) and removing any extra entries from the end of each input
structure. If a structure in the input has fewer water molecules,
extra water molecules are added with dummy coordinates set to 9999. In contrast to other
moil programs, path_eqw accepts its input only in the exact pre-specified order (see below)
and does not support commented lines (~).
Use: path_eqw < input
Sample path_eqw input
file name=(template.wcon) read
#str=3
file name=(output.PTH) bina wovr
file name=(structure_1.wcon) read
file name=(structure_1.crd) read
In the first line, the template connectivity file is specified; the desired number of water
molecules in the output file is set to the number of water molecules in this connectivity
file. In the second line, the number of structures (N) to follow is specified. The next line
assigns the output coordinate file. Then 2N lines specifying the N structures follow:
a connectivity file followed by a coordinate file, repeated N times.
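Because path_eqw depends on a strict line order, it can be convenient to generate its input from a list of structures. The Python sketch below writes the lines in the order described above; the function name and the file names in the usage example are hypothetical.

```python
def path_eqw_input(template_wcon, output_path, structures):
    """Build path_eqw input text.

    structures is a list of (connectivity_file, coordinate_file) pairs.
    """
    lines = [
        f"file name=({template_wcon}) read",   # line 1: template connectivity
        f"#str={len(structures)}",             # line 2: number of structures N
        f"file name=({output_path}) bina wovr" # line 3: merged binary output
    ]
    for wcon, crd in structures:               # then 2N lines, one pair each
        lines.append(f"file name=({wcon}) read")
        lines.append(f"file name=({crd}) read")
    return "\n".join(lines) + "\n"

# Usage with two hypothetical structures.
text = path_eqw_input(
    "template.wcon", "output.PTH",
    [("structure_1.wcon", "structure_1.crd"),
     ("structure_2.wcon", "structure_2.crd")],
)
```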
10.3.9 ovrlp_trj
Purpose : Align structures in a trajectory with respect to a reference structure, minimizing
the mass weighted rmsd from a given structure in the trajectory to the reference. The
output is a binary file where the aligned structures are stored.
Use : ovrlp_trj < input > output
Input file types
conn – connectivity file.
rcrd – the file where the Cartesian coordinates of all the particles are stored.
This file stores the coordinates of the reference system.
The only possible format for this coordinates file is:
ctyp=(CHARM) – coordinates taken from a charmm format file
rdyc – File where the dynamic coordinates are stored. The only format allowed is DCD.
Output file types
wcrd – a binary file where the aligned coordinates of the atoms are stored. The only
allowed format is DCD
Other instructions
Variables: (in square brackets – type, in curly brackets – default)
norw – Do not rewind the coordinate file to be read. By default it rewinds the
file.[l]{false}
#str – number of structures to be looking at in a dynamic or path file. [i]{0}
pick – particles that you choose to align the structures.
10.3.10 Numerical
Purpose : Calculate numerically the second derivative of the energy. It can be computed
for all energy terms, or for a selected subset of them. It returns on
standard output the matrix with all the second derivatives (3 directions per particle,
squared, so 9npt^2 elements).
Use : numerical < input > output
Input file types
rcon or conn: wcon file obtained calling conn which has all the information regarding the
molecular topology and the parameters required for energy calculations
rcrd - the file where the Cartesian coordinates of all the particles are stored. Only CRD
format is supported:
Output file types
wene – file where the energy output is stored
Variables
EPROP ARE AVAILABLE FOR numerical.
debu – Prints a lot of debugging information.
shif - A flag indicating different style of cutoff which is no longer used.
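A numerical second-derivative matrix of the kind this program prints can be obtained with central finite differences. The Python sketch below shows the idea for a generic energy function; it is not the numerical source code, and the function name and step size h are assumptions.

```python
def numerical_hessian(energy, x, h=1.0e-3):
    """Central-difference Hessian of energy(x) for a flat coordinate list x.

    For npt particles x has 3*npt entries, so the result has 9*npt^2 elements.
    """
    n = len(x)
    hess = [[0.0] * n for _ in range(n)]

    def shifted(i, si, j, sj):
        # Energy with coordinate i moved by si*h and coordinate j by sj*h.
        y = list(x)
        y[i] += si * h
        y[j] += sj * h
        return energy(y)

    for i in range(n):
        for j in range(i, n):
            value = (shifted(i, 1, j, 1) - shifted(i, 1, j, -1)
                     - shifted(i, -1, j, 1) + shifted(i, -1, j, -1)) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = value   # the Hessian is symmetric
    return hess

# Usage on a quadratic test function whose Hessian is [[2, 3], [3, 4]].
H = numerical_hessian(lambda q: q[0] ** 2 + 3.0 * q[0] * q[1] + 2.0 * q[1] ** 2,
                      [0.5, -1.0])
```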
10.3.11 solvatecrd
Purpose : This program takes a file with a solute and a file with a water box and solvates
the solute, avoiding overlap of water with the solute itself and cutting off water
particles whose oxygen is not inside the input box. This program is used most effectively
in the GUI while converting a PDB file to a solvated structure.
Use : solvatecrd < input > output
The graphic interface of moil (moil.tcl found in ~/moil/moil.gui/) provides nice and
convenient input to convert a PDB file coordinates to internal MOIL coordinates. It
generates a connectivity file and solvated the system in a water box in a single moil.tcl
submission (mouse stroke).
rcrd – read coordinate, can only be CHAR format. It contains the coordinates of the
solute.
rwbx – read coordinate file containing the coordinates of the water box (CHAR format).
Output files: Required
wcrd - write the coordinates of the solvated system (only CHAR format)
wpol - write the poly file corresponding to the solvated system
xbex, ybex, zbex. The x, y and z coordinates of the center of the box [d] {0d0}.
xwbx, ywbx, zwbx are the x, y and z lengths of the rectangular simulation box [d] {1.0}.
selc – A pick command to select particles for which the center of mass of the solute is
computed [l] {false}.
Sample input file:
file conn name=(1mbd.wcon) read
file rcrd name=(1mbd.crd) read ctyp=(CHARM)
file wcrd name=(1mbd_solv.crd) wovr ctyp=(CHARM)
file rwbx name=(../../../moil.crd/watbox.crd) read
file wpol name=(1mbd_solv.poly) wovr
xbex=0.0 xwbx=50.0
ybex=0.0 ywbx=50.0
zbex=0.0 zwbx=50.0
~debug
action
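The selection rule described in the Purpose paragraph (keep a water molecule only if its oxygen lies inside the box and does not overlap the solute) can be sketched in Python as follows. The function name and the overlap distance min_dist are hypothetical; the actual overlap criterion of solvatecrd is not given in this section.

```python
import math

def keep_water(oxygen, solute, center, lengths, min_dist=2.5):
    """Decide whether one water molecule survives solvation.

    oxygen  : (x, y, z) of the water oxygen
    solute  : list of (x, y, z) solute atom positions
    center  : box center (xbex, ybex, zbex)
    lengths : box edge lengths (xwbx, ywbx, zwbx)
    min_dist: assumed minimal oxygen-solute distance in Angstrom
    """
    inside = all(abs(o - c) <= 0.5 * l
                 for o, c, l in zip(oxygen, center, lengths))
    if not inside:
        return False
    return all(math.dist(oxygen, atom) >= min_dist for atom in solute)

# Usage with a one-atom solute at the origin of a 50 Angstrom cubic box.
solute = [(0.0, 0.0, 0.0)]
box_center, box_lengths = (0.0, 0.0, 0.0), (50.0, 50.0, 50.0)
waters = [(1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
kept = [w for w in waters if keep_water(w, solute, box_center, box_lengths)]
```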
10.3.12 pdb2puth
Purpose: It reads a pdb file and makes some changes to the file, including the addition of
the C- and N-terminals, removing duplicate coordinates of the same atom in the crystal
structure, editing some atom or residue names, etc. The program outputs an edited pdb
file. Currently, the functions of this program are more easily used through the moil.tcl
graphic interface.
Use: pdb2puth < input > output
rcrd – read PDB file.
wcrd – edited PDB file
wpol - write the poly file corresponding to the edited PDB file.
MOLC- it provides the molecular name, a maximum of 4 characters
Sample input
file rcrd name=(3SDH.pdb) read ctyp=(pdb)
file wpol name=(3SDH.poly) wovr
file wcrd name=(3SDH-1.pdb) wovr ctyp=(pdb)
MOLC=(3SDH)
action
*EOD
10.4 Analyses
10.4.1 av_dif
Purpose : Compute water properties from MD simulations: average diffusion constant
Use: av_dif < input > output
rcrd – read coordinate, can be either PATH or DYNA format (the keywords DYNA or
PATH must be present in the same line).
norw – do not rewind the coordinate file for each read (ensures faster reading)
lbox – length of the water box [d]{0.d0}
lpst – the first structure index in the coordinate file to be analyzed [i]{1}
lpen – the last structure index [i]{1}
tau1 – the time window used to estimate the diffusion constant [d]{1.d0}
dens0 – upper bound for the density [d]{1.5d0}
nrmono – number of solute monomers [i]{0}
nratom – number of solute atoms [i]{0}
A pick command for the OH2 (water oxygens) must be present
Output files
None. Results are written to the standard output.
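The way tau1 enters the estimate is not spelled out above. A minimal sketch, assuming the usual Einstein relation D = <|r(t+tau) - r(t)|^2> / (6 tau) for three-dimensional diffusion and coordinates that are already unwrapped across the periodic box; the function name and data layout are hypothetical and are not part of MOIL:

```python
def diffusion_constant(frames, dt, tau):
    """Estimate D from the mean square displacement over a window tau.

    frames: list of frames, each a list of (x, y, z) tuples for the picked
            water oxygens (assumed unwrapped, same atom order in every frame)
    dt:     time between consecutive frames
    tau:    time window; rounded to a whole number of frames
    """
    lag = round(tau / dt)
    if lag < 1 or lag >= len(frames):
        raise ValueError("tau must span between 1 and len(frames)-1 frames")
    total = 0.0
    count = 0
    # Average the squared displacement over all time origins and all atoms.
    for start in range(len(frames) - lag):
        for a, b in zip(frames[start], frames[start + lag]):
            total += sum((q - p) ** 2 for p, q in zip(a, b))
            count += 1
    msd = total / count
    # Einstein relation in three dimensions (an assumption of this sketch).
    return msd / (6.0 * lag * dt)
```

The result carries the units of length squared over time of the input, so coordinates in angstrom and times in ps would give angstrom^2/ps.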
10.4.2 Contacts Purpose: calculate the distances and collision numbers for a picked group within a
selected set of atoms (for example, a diatomic ligand diffusing in a protein). These are
computed for a sequence of dynamics structures.
Use: contacts < input > output
rcrd – coordinate file (MUST be of dcd type)
wsum – write a summary file of all collisions
wave – write average collision numbers
norw - do not rewind the dcd file after each read (used for faster reading) [l]{f}
rcut - cutoff distance to define a collision [d]{5}
#str - number of structures in the dcd file [i]{0}
A pick command for the subset of colliding particles must be present.
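The role of rcut can be illustrated with a short sketch. It assumes that a "collision" is any pair (picked particle, other particle) closer than rcut in a given structure; the function names and the data layout are hypothetical and the actual bookkeeping in contacts may differ:

```python
import math

def count_collisions(picked, others, rcut=5.0):
    """Number of (picked, other) pairs closer than rcut in one structure."""
    n = 0
    for p in picked:
        for o in others:
            if math.dist(p, o) < rcut:
                n += 1
    return n

def average_collisions(trajectory, rcut=5.0):
    """Average collision number over a sequence of structures.

    trajectory: list of (picked, others) pairs of coordinate lists,
                one pair per structure.
    """
    counts = [count_collisions(p, o, rcut) for p, o in trajectory]
    return sum(counts) / len(counts)
```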
10.4.3 dxdl Purpose: Computes a trajectory as a function of the arc-length (instead of as a function
of time) using the initial value formulation and compares the results to boundary value
calculations.
Use: dxdl < input > output
rcrd – coordinate file (path format)
other input options in the code:
norw – Files are read from a binary file without “rewinding” the file, which is usually a
lot faster for structures read sequentially [l]{f}
coor – [character] {unkw} Acceptable value for coor are the three different internal
coordinate formats CHAR DYNA and PATH. If the formats are PATH or DYNA
and the number of structures is different from one then the variables lpst and lpen
(see below), MUST be in the same line
#str – number of coordinate frames in the file. [i]{0}
list – number of steps between updates of the non-bonded list [i]{20}
hvdw – [l]{false} set finite van der Waals radius for hydrogen atoms (usually zero in
OPLS). Helps to avoid numerical instabilities at high temperature simulations or
when the initial structure is highly distorted
rmax –A single cutoff for all non-bonded interactions. Used to indicate no cutoff , i.e.
rmax=9999. Not used anymore to indicate actual cutoff and kept for past
consistency [d] {-1.d0}.
epsi – [d] {1.d0} dielectric constant. Most applications do not use it and its impact is
pre-computed to the connectivity file.
cdie – Use constant dielectric (=1). [l]{true}
gbsa – turn on Generalized Born Surface Area calculations (Tsui and Case, 2000). [l]
{false}
gbsu – frequency of updating the gbsa neighbor list [i] {0}
A pick command is possible
wtor – Output coordinates
10.4.4 eff_difdens Purpose: Computes spatial density and diffusion constants for water molecules using a
grid of a rectangular periodic box. There is some overlap of the present module with the
module av_dif
Use: eff_difdens < input > output
rcrd – coordinate file. Only type PATH or DYNA are allowed.
Output files types: required
None. Results are written to the standard output and to Fortran file indices 102 and 103
lbox –dimension of cubic box [d]{0.d0}
lpst –a starting index for structures in the file [i]{1}
lpen – an ending index for structures in the file [i]{1}
tau1 – the time interval used to estimate the diffusion constant, typically 1ps [d] {0.d0}
dens0 – the maximum value for the density [d] {0.d0}
ddens – an increment for the density [d] {0.d0}
nrmono – number of solute monomers (to generate PDB file) [i] {0}
nratom – number of solute atoms (to generate PDB file) [i] {0}
norw – Files are read from a binary file without “rewinding” the file, which is usually a
lot faster for structures read sequentially [l]{f}
A pick command selecting the OH2 atoms (water oxygens) is required. See 8.4 for a
description of the pick command.
10.4.5 Fluc Purpose: computing fluctuations and RMSD difference for a molecular dynamics
trajectory and a reference structure.
Use: fluc < input > output
rcrd – coordinate file for reference system in CHARM format
rdyc – dynamic coordinates for analysis
wrms – write rms values as a function of time (time, rms) compared to the reference
coordinate
wave – write rms with respect to the average structure (time, rms)
wflu – time-averaged thermal fluctuation at different residue positions (B factors)
norw – Do not rewind a sequential binary file for next read. Usually faster for reading
sequential frames of a trajectory [l] {f}
#crd – number of Molecular Dynamics steps before writing a coordinate set to the DCD
file. [i]{0}
#ste – number of molecular dynamics steps [i]{0}
step – the size of the time step [d]{0.01}.
pick – command for selection of a subset of atoms for rms or fluctuation calculations. See
section 8.4 for an explanation of the options in the pick command.
action – stop reading input and start processing
Sample input
file conn name=(*.wcon) read
file rcrd name=(*.crd) read
file rdyc name=(*.dcd) bina read
file wrms name=(*.rms) wovr
file wave name=(*.ave) wovr
file wflu name=(*.flu) wovr
#ste=200000
#crd=1000
step=0.001
~ pick only protein particles (no water, no ions)
pick pick #mon 1 298 done
action
*EOD
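The three numeric keywords in the sample above determine how many frames the dynamics file is expected to hold and how far apart they are in time. A small worked example, assuming frames are written every #crd steps as described above (the function name is hypothetical):

```python
def trajectory_layout(n_steps, steps_per_frame, step_size):
    """Return (number of saved frames, time between saved frames).

    n_steps         corresponds to #ste
    steps_per_frame corresponds to #crd
    step_size       corresponds to step
    """
    if steps_per_frame <= 0:
        raise ValueError("steps_per_frame must be positive")
    n_frames = n_steps // steps_per_frame
    frame_spacing = steps_per_frame * step_size
    return n_frames, frame_spacing

# Values from the sample input: #ste=200000, #crd=1000, step=0.001
n_frames, spacing = trajectory_layout(200000, 1000, 0.001)
print(n_frames, spacing)
```

The spacing is expressed in the same time units as step.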
10.4.6 rgyr Purpose: Compute the radius of gyration of a coordinate system.
Use: rgyr < input > output
rcrd – coordinate file in CHARM or PATH format (default CHAR)
wcln – write radius of gyration sequentially for molecular frames
norw –if true do not rewind a sequential binary file for next read. Usually faster for
reading sequential frame of a trajectory [l] {f}
#str – number of coordinate frames in the file. [i]{0}
#crd – number of frames between writing up coordinates [i]{0}
step – size of time step. [d]{0.d0}
pick – command for selection of a subset of atoms for the calculation.
See section 8.4 for an explanation of the options in the pick command.
action – stop reading input and start processing
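For reference, a minimal sketch of the quantity itself. It assumes the mass-weighted definition Rg^2 = sum m_i |r_i - r_cm|^2 / sum m_i; whether rgyr weights by mass is an assumption here, and the function name is hypothetical:

```python
import math

def radius_of_gyration(coords, masses=None):
    """Radius of gyration of a set of points.

    coords: list of (x, y, z) tuples for the picked particles
    masses: optional list of masses; equal weights are used if omitted
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Center of mass of the selection.
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    # Mass-weighted mean square distance from the center of mass.
    s = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
            for m, c in zip(masses, coords))
    return math.sqrt(s / total)
```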
10.4.7 rms_2crd Purpose: compute the mean square distance between two crd sets (same number of atoms, no
alignment) sharing the same connectivity file. One subset of atoms can be selected
for the overlap calculation (Kabsch [reference]) and a second for the
calculation of the distance
rcc1 – first coordinate file (CHAR only)
rcc2 – second coordinate file (CHAR only)
Output files: None
sel1 – select first group of atoms for overlap followed by a “pick” command
sel2 – select a second group of atoms for RMSD calculation followed by a “pick”
command
10.4.8 rms_2path Purpose: Compare the RMSD of two CHARMm files or of all pairs of structures from a
PATH set.
rcc1 – first coordinate file (only CHAR)
rcc2 – second coordinate file (only CHAR)
rpc1 – first coordinate file in PATH
rpc2 – second coordinate file in PATH
wrms – write rmsd output data.
st1s – index of the first structure of set 1 to use [i]{1}
st2s – index of the first structure of set 2 to use [i]{1}
st1e – index of the last structure of set 1 to use [i]{1}
st2e – index of the last structure of set 2 to use [i]{1}
#rmp – write rms of #rmp monomer [i]{1}
frzp – pick frozen particles: particles that are frozen are given mass zero and do not
participate in the orientation and rmsd calculations.
10.4.9 rms_p2p Purpose: Compute rms of all structures in one PATH file against all structures in another
PATH file. The output is a matrix of rmsd for all (i,j) pairs.
rpt1 – first coordinate file ( PATH only)
rpt2 – second coordinate file (PATH only)
Output files: None
len1 – [i]{1} length (number of frames) of file no.1
len2 – [i]{2} length (number of frames) of file no. 2
sele – a pick command for selection of particles for overlap and rmsd calculations. See
8.4 for the syntax of the pick command.
10.4.10
rms_resd Purpose: Compute rmsd of a trajectory compared to the average structure and the B
factors
rcrd – reference coordinate (CHAR)
rdyc – trajectory (DYNA) file
wrms – writing rms results
#str – [i]{1} number of structures
pick – a pick command for selection of particles for overlap and rmsd calculations. See
8.4 for the syntax of the pick command.
10.4.11
SuperTMscore Purpose : This tool compares two structures and finds the
superposition that has the maximum TM-score (reference: Yang Zhang, Jeffrey Skolnick,
Proteins 2004 57:702-10). It then overlaps the first structure with the second such that
their mass weighted rms is a minimum.
Use : superTMscore < input > output
rcrd - the file where the Cartesian coordinates of all the particles are stored.
These are the coordinates that are “moved” in the overlap operation.
The only possible format for this coordinate file is ctyp=(CHARM). This file is
used only if there is no rdyc file, and it is used only to minimize the rms, not the
TM-score.
rcor - this file stores the coordinates that are “fixed” in the overlap operation.
This file is used only if there is no rdyc file and must be of CRD format. Finally it
is only used to minimize the rms, not the TM-score
rdyc - file where the coordinate set from dynamics run are stored. Also in this case the
only format allowed is DCD. If this file is found, both rms and TM-score are
computed with the coordinates in this file.
wrms – file where the TM-score is stored if there is a rdyc file, the rms otherwise
#str – number of structures that will be considered in the TM-score calculation [i]{1}
10.4.12
superback Purpose : This tool carries out a calculation of the fluctuations per residue, and aligns the
structures on a dynamic file minimizing the rms.
Use : superback < input > output
rdyc - file where the coordinates of the dynamics run are stored. The only format
allowed is DCD. The content of this file is used to compute fluctuations. The
coordinates in this file are aligned with the coordinates in reference structure
before computing the fluctuations.
the coordinates are the reference structure.
wdyc – file where the dynamic coordinates after alignment are stored
wflu – file where the fluctuations are stored. Enables also the calculation of the
fluctuations
wave – file where the average rms is stored. Enables the print out of the average rms
wrms - file storing rmsd calculations
wave – file with the average rms
subs – select the particles for rms calculations [l]{false}
norw – do not rewind the coordinates file while computing the average of the
fluctuations [l]{true}
nstr – number of structures for which we compute the average of the fluctuations [i]{1}
jump – step in reading rdyc coordinate file [i]{1}
10.4.13
superrms Purpose : Overlap two structures such that their mass weighted rms is a minimum.
Use : superrms < input > output
Coordinates must be CRD format file. This file is used only if there is no rdyc
file.
rcor - the Cartesian coordinates of all the particles of a reference coordinate system. The
format is CRD.
rdyc - file where the dynamics coordinates are stored. Format is DCD. The rms is
computed with the coordinates in this file kept as a reference, the coordinates
taken from the rcrd file are moved.
wrms – file where the TM-score is stored if there is a rdyc file, if not, the rms is the
output
#str – number of structures in the dynamic coordinate file used for the rms
minimization [i]{1}
CAon – turns on the calculation of the rms only for Cα atoms, by setting to zero the mass
of all the other particles
self – if this flag is found, the rms is computed only within the rdyc file, without using as
a reference the rcrd file (which is the default).
pick – command to pick a subset of particles. The rms will be computed only within this
subset. If no pick is found, all the particles are used.
10.4.14
str_measures Purpose : Compute the tensor of inertia, a measure of sphericity, and a “shape” measure
for a collection of point masses. The diagonal elements of the tensor of inertia are:
$$T_{xx} = \sum_i m_i \left( r_i^2 - x_i^2 \right)$$
and the off diagonal elements:
$$T_{xy} = -\sum_i m_i x_i y_i$$
If we call $a_1, a_2, a_3$ the three eigenvalues of the tensor of inertia, and $\bar{a}$ is their
average, the sphericity parameter becomes:
$$D = \frac{3 \sum_i \left( a_i - \bar{a} \right)^2}{2 \left( \sum_i a_i \right)^2}$$
The limit of zero D means that all the eigenvalues are equal, and the collection of point
masses is spherical.
The “shape” measure is:
$$s = 27 \, \frac{\prod_i \left( a_i - \bar{a} \right)}{\left( \sum_i a_i \right)^2}$$
A positive s means that the “shape” of the protein is flat (“pita like”); a value smaller than
zero means that it is a cylinder (“cigar”).
Use : str_measures < input > output
rdyc - read coordinates in DCD format
rpth – coordinates are read in path format
Output file types
wtab – formatted file where the elements of the tensor of inertia are stored
wdel - formatted file where the value of the sphericity coefficient is stored
wris - formatted file where the value of the “shape” coefficient is stored
Other instructions
pick – particles that you choose to compute the tensor of inertia. If no particles are picked
by an external selection, the program enforces the default in which all particles are
selected.
#crd – number of coordinate sets (frames) in the dynamics file [i]{1}
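The tensor of inertia and the sphericity parameter D defined above can be sketched in a few lines. The sketch assumes that coordinates are taken relative to the center of mass (not stated explicitly above) and uses the identities sum a_i = trace(T) and sum (a_i - a)^2 = trace(T^2) - trace(T)^2 / 3, so that no eigenvalue solver is needed. Function names are hypothetical:

```python
def inertia_tensor(coords, masses):
    """3x3 tensor of inertia about the center of mass, as a list of rows."""
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        d = [c[k] - com[k] for k in range(3)]
        r2 = sum(x * x for x in d)
        for i in range(3):
            for j in range(3):
                # Diagonal: m (r^2 - x^2); off diagonal: -m x y
                t[i][j] += m * ((r2 if i == j else 0.0) - d[i] * d[j])
    return t

def sphericity(t):
    """D = 3 sum (a_i - abar)^2 / (2 (sum a_i)^2), from tensor invariants."""
    trace = t[0][0] + t[1][1] + t[2][2]
    trace_sq = sum(t[i][j] * t[j][i] for i in range(3) for j in range(3))
    return 3.0 * (trace_sq - trace * trace / 3.0) / (2.0 * trace * trace)
```

Six equal masses placed symmetrically on the three axes give D = 0 (spherical), while two masses on a line give D = 0.25.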
10.4.15
tmalign (Zhang and Skolnick 2005) Purpose : Align a PDB structure (hereafter ‘structure.pdb’) to a target PDB (hereafter
‘target.pdb’). A transformation matrix is produced giving the translation and rotation to
be applied to structure.pdb. A score, “TM-score” is also produced which assigns a metric
to the resulting alignment. Detailed information about the alignment and scoring
procedure can be found at:
Zhang & Skolnick, Nucl. Acid Res.2005 33, 2303-9
The program was written by the above authors. A simple addition was made for
inclusion in the MOIL package (see below).
Use: (the following instructions are produced if tmalign is run without arguments)
1. Align 'structure.pdb' to 'target.pdb'
(By default, TM-score is normalized by the length of 'target.pdb')
>tmalign structure.pdb target.pdb
2. Run TM-align and output the superposition to 'TM.sup' and
'TM.sup_all':
>tmalign structure.pdb target.pdb -o TM.sup
To view the superimposed structures of the aligned regions by
rasmol:
>rasmol -script TM.sup
To view the superimposed structures of all regions by rasmol:
>rasmol -script TM.sup_all
3. If you want TM-score normalized by an assigned length, e.g. 100 aa:
>tmalign structure.pdb target.pdb -L 100
If you want TM-score normalized by the average length of two
structures:
>tmalign structure.pdb target.pdb -a
If you want TM-score normalized by the shorter length of two
structures:
>tmalign structure.pdb target.pdb -b
If you want TM-score normalized by the longer length of two
structures:
>tmalign structure.pdb target.pdb -c
* A new option added for the MOIL version: ‘-t’ may be supplied to force a “trivial”
alignment of the two structures (target and structure should be the same length).
** The tmalign program is also utilized by Zmoil for doing alignments of PDB or CRD
coordinate files, allowing the visualization of a number of alignments, as well as the
saving of new coordinates based on the alignment. In the case of CRD formatted files,
the MOIL program crd2pdb is first run to produce a temporary structure in PDB format
on which tmalign can operate.
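The effect of the normalization options can be illustrated with the TM-score formula of Zhang and Skolnick (2004): the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the normalization length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch below is not the tmalign code; it takes the distances of an already superimposed alignment as given, and it simply rejects short chains for which d0 is not positive. Function names and the mode strings are hypothetical:

```python
def normalization_length(len_structure, len_target, mode="target", assigned=None):
    """Length used to normalize the score, mirroring the options above."""
    if mode == "target":        # default
        return len_target
    if mode == "assigned":      # -L
        return assigned
    if mode == "average":       # -a
        return (len_structure + len_target) / 2.0
    if mode == "shorter":       # -b
        return min(len_structure, len_target)
    if mode == "longer":        # -c
        return max(len_structure, len_target)
    raise ValueError("unknown mode: " + mode)

def tm_score(distances, l_norm):
    """TM-score from distances between aligned residue pairs (angstrom)."""
    if l_norm <= 15:
        raise ValueError("normalization length too short for this sketch")
    d0 = 1.24 * (l_norm - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        raise ValueError("normalization length too short for this sketch")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_norm
```

Because the sum runs only over aligned pairs, a perfect superposition of 50 pairs normalized by a length of 100 scores 0.5 rather than 1.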
10.4.16
Torstat Purpose : Torsion statistics for a set of protein structures. The program picks the relevant
atom sets and calculates the phi, psi and, if it exists, chi angle of each residue in the protein
structure. If the chi angle is not present (glycine, alanine, and proline), chi is set to -999.0
Use : torstat < input > output
coor – [character] {unkw} Acceptable value for coor are the three different internal
coordinate formats CHAR DYNA and PATH. If the formats are PATH or DYNA
and the number of structures is different from one then the variables lpst and lpen
(see below), MUST be in the same line
lpst– [i] {1} the starting index of a structure in unformatted coordinate file
lpen– [i] {1} the ending index of a structure in unformatted coordinate file
tors – file where the torsions are stored.
Sample input
file rcon name=(molecule.wcon) unit=10 read
file rcrd name=(AtoB.pth) binary unit=11 read
file tors name=(tors.out) unit=12 wovr
coor PATH lpst=1 lpen=25
action
Sample output
2 GLY  45.033 -53.495 -999.000 -999.000 0
3 ASN -54.236 -52.080    1.002 -121.378 0
4 ASN -70.088 -44.782   31.763  -75.281 0
5 GLN -49.863 -17.781   50.472 -112.609 0
10.4.17
xangle Purpose : Extract angles from dynamics file
Use : xangle < input > output
rcrd - the file where the Cartesian coordinates of all the particles are stored (dcd format).
wcrd – file where the angles are stored
Variables
pick–flag to indicate that the present line is for selection of a subset of particles
norew – do not perform a rewind on a file [l]{false}
#str – number of structures in rcrd file. [i]{0}
Sample Input
file rcrd name=(valpath.dcd) bina read
file wang name=(angle.out) wovr
pick pick #prt 1 1 | #prt 5 5 | #prt 8 8 done
#str=10
action
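The sample above picks three particles. A minimal sketch of the angle they define, assuming the second picked particle is the vertex and that the result is reported in degrees (both are assumptions; the function name is hypothetical):

```python
import math

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u = [a[k] - b[k] for k in range(3)]
    v = [c[k] - b[k] for k in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))
    # Clamp to guard against round-off slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))
```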
10.4.18
xcrd Purpose : Extract coordinates from dynamics file
Use : xcrd < input > output
rcrd – the file where the Cartesian coordinates of all the particles are stored (dcd format).
wcrd – file where the distances are stored
Variables
pick – selection of a subset of particles for write to wcrd
norew – do not perform a rewind on a file [l]{false}
#str - number of structures in rcrd file. [i]{1}
str1 – the first structure to be read [i]{1}
#mon – running monomer index for write [i]{0}
Sample input
file wcrd name=(crd.out) wovr
pick pick #prt 1 1 | #prt 5 5 done
#str=10
action
10.4.19
xtors Purpose : program to extract the torsion along a trajectory
Use : xtors < input > output
rcrd - read Cartesian coordinates in DCD format.
wtor – write to wtor output torsions
Parameters
pick –flag to indicate that the present line is for selection of a subset of particles
#str – number of structures to be read from a dynamics or path file.
norew – do not rewind a DCD file in sequential reads
Sample input
file wtors name=(tors.out) wovr
pick pick #prt 1 1 | #prt 5 5 | #prt 6 6 | #prt 10 10 done
#str=10
action
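The sample above picks four particles, which define one torsion. A minimal sketch of the torsion angle for four points, using the standard atan2 formulation; the sign convention and the use of degrees are assumptions of this sketch and the function names are hypothetical:

```python
import math

def _sub(a, b):
    return [a[k] - b[k] for k in range(3)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def torsion(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal to the plane p1, p2, p3
    n2 = _cross(b2, b3)          # normal to the plane p2, p3, p4
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

With this convention a planar trans arrangement gives 180 degrees and a planar cis arrangement gives 0.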
11 MOIL files 11.1 monomer The monomer file is where the connectivity of the particles in a monomer is listed and
where the rules for joining monomers are defined. It is an input to the "conn" program in
order to generate the connectivity file for the complete molecule. Typically it is NOT
input prepared by the user and existing databases are used.
The structure of the file is as follows:
The first non-comment line of the file must be:
MONO LIST
Following the title the different monomers are listed. Each monomer starts with the line
MONO=(NAME) #prt=5 chrg=0.
where NAME is the name of the monomer type, which can be at most four characters (all
character assignments must be enclosed in parentheses (…), and this of course includes
the monomer name). #prt is the number of particles in the monomer. This includes also
virtual particles used to link the monomer to next or previous monomers. chrg is the
charge of the total monomer INCLUDING the virtual particles. The virtual particles are
included since their type is the one that will be finally used (see below). The total charge
is used only for test purposes.
After that line, a list of particle names (unique within that monomer) and their types is
provided. The last three assignments (SAID, PCHG and divi) are optional. HERE is
assumed implicitly unless PREV or NEXT is found:
~ unique nam  type        link surface chrg     divisions
UNIQ=(UNAM)   PRTC=(PTYP) HERE SAID=16 PCHG=0.1 divi=1
UNIQ=(B)      PRTC=(BTYP) PREV SAID=1           divi=2
UNIQ=(C)      PRTC=(CTYP) NEXT SAID=7           divi=2
where the UNIQ command assigns a unique particle name (unique to that monomer). The
name can be at most four characters and must appear only once within a monomer.
The PRTC defines the particle type and it is matched against the PNAM data from the
property file. This is an essential match to determine the parameters for energy
calculations. Failure to match particle types results in program termination.
The link information makes it possible to have a set of monomers and to link them to a
polymer in automated fashion. The following keywords are available for the link action
(note that only four characters from a keyword are actually used):
HERE (or blank) - This is a normal particle of the present monomer
NEXT - This particle belongs to the next monomer when the connectivity file
is generated by linking connectivity information between monomers. The NEXT particle
is identified in the monomer that follows (NEXT) according to its UNIQ name. This
allows (for example) attaching an N-terminal residue that consists of the three hydrogen
atoms and a NEXT nitrogen atom. We prefer the NEXT facility over the
PREVIOUS option, since typically we start with the N terminal, though the results would
be (of course) identical.
If a NEXT particle is not found a warning is issued. This warning is not terminal since it
is possible that a NEXT particle will be missing (for example when attaching a C
terminal to an amino acid). In that case the extra particle is removed. All the bonds of a
NEXT particle to the current monomer atoms are kept when a matching is made. If there
is a conflict between the particle types between the NEXT and actual UNIQ particles the
NEXT or PREV (see below) assignments take precedence. For example, in the N
terminal we have three identical and charged hydrogen atoms. However the default
structure of an amino acid includes the usual amide hydrogen atom. The last is replaced
by a hydrogen atom type that is equivalent to the other two hydrogens in N terminal.
PREVIOUS - The particle belongs to the previous monomer. When the
connectivity file is generated, the bonds of the previous particle are transferred to the
corresponding (identical UNIQ name) atom in the previous monomer.
DNXT - Remove a particle in the NEXT monomer. This option is not used at
present in the ALL.MONO file.
DPRV - Remove a particle in the PREVious monomer. This option is not used
in the present ALL.MONO file.
After the link list the “surface” expression is optional. The keyword SAID=[i]{0} implies
that the surface attached to this particular atom belongs to type SAID (Surface Area
IDentification). This is useful in calculations of hydrophobicity which is modeled as
proportional to the solvent exposed surface area. This expression is optional. Obviously it
is completely unnecessary if explicit water molecules are used.
Yet another option is the use of PCHG. By default the charge of a particle is stored in the
property file and the charge is assigned to a UNIQ atom according to the atom type
PRTC. However, to allow greater diversity and for consistency with other force fields
that assign a new set of charges for each monomer while keeping all other parameters (van
der Waals, bonds, angles, etc.) the same, we may assign charges at the monomer level.
This assignment overrides an assignment by the PROP file. To use it, simply add
PCHG=[d]{NONE} to the line of the (re)charged atom.
The final entry in the particle line is that of divi. Non-bonded lists are computed in MOIL
in two steps. First, a division neighbor list is generated and then an atom neighbor list is
created based on the coarser division list. The divisions are groups of atoms for which the
center of mass is computed and used to generate a coarse division list. How to define
these groups? The default is to use the monomers as the division. Each monomer as
defined in ALL.MONO is one division. Alternatively one may use the divi=[i]{1}
optional entry to divide the monomer entry into multiple divisions. For example, to increase
accuracy in estimating neighbors of the heme group (which is a pretty large monomer), 9
divisions of the HEM are used in MOIL.
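The two-step construction of the non-bonded list can be sketched as follows. This is an illustration of the idea only, not the MOIL implementation: the two cutoff names (r_div for division centers, r_atom for atoms) are hypothetical, and exclusions of bonded pairs are ignored.

```python
import math

def neighbor_pairs(coords, masses, division_of, r_div, r_atom):
    """Atom pairs within r_atom, searched only between nearby divisions.

    division_of[i] is the division index of atom i.
    """
    # Step 1: center of mass of every division and a coarse division list.
    members = {}
    for i, d in enumerate(division_of):
        members.setdefault(d, []).append(i)
    centers = {}
    for d, atoms in members.items():
        total = sum(masses[i] for i in atoms)
        centers[d] = [sum(masses[i] * coords[i][k] for i in atoms) / total
                      for k in range(3)]
    divs = sorted(members)
    div_pairs = [(a, b) for n, a in enumerate(divs) for b in divs[n:]
                 if math.dist(centers[a], centers[b]) <= r_div]
    # Step 2: atom list, restricted to pairs of neighboring divisions.
    pairs = []
    for a, b in div_pairs:
        for i in members[a]:
            for j in members[b]:
                if (a != b or i < j) and math.dist(coords[i], coords[j]) <= r_atom:
                    pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)
```

Splitting a large monomer into several divisions makes each center of mass a better proxy for the positions of its atoms, which is the accuracy argument made above for the heme group.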
The unique particle list ends with DONE
A bond list follows the particle list. A Bond is defined by two unique particle names and
a dash in between (e.g. A-B is a bond between A and B). Note that special particles for
which the connectivity action is different from HERE (i.e. PREV or NEXT) are denoted
by *. Example following the definition of particles above is.
BOND
UNAM-B* UNAM-C*
DONE
Special particles must come second in a bond, and a monomer cannot be used to define a
bond between two special particles; one of the particles must be HERE (or default, no
entry). Note that the * is really needed to avoid ambiguity, since it is possible to have
(for example) a particle with UNIQ name A as HERE and also UNIQ A as PREV.
This also means that there are some restrictions on the connectivity. Currently it is not
possible to make a reference from a given monomer to the same particle name at
PREVious and NEXT monomers. It is hard to imagine however a case in which it is truly
needed. It is also not possible to create a bond between PREV and NEXT particles.
Yet another restriction is that a regular UNIQ name that ends with “*” is unacceptable,
since a confusion with PREV and NEXT particles is likely.
We re-emphasize that cases in which a NEXT particle is defined but not found are
possible; a warning will be issued but it is not fatal. In fact it can be quite
convenient to define the peptide link as the carbonyl carbon attached to the NEXT nitrogen
for all amino acids in the protein chain. This however fails at the C terminus. To treat the
C-terminal correctly, the virtual nitrogen at the C terminus is deleted, which brings
everything back to normal. A warning about that nitrogen is issued to the
standard output; that warning can be ignored.
The angles, torsions, and improper torsions are generated once the bond structure of the
complete molecule is formed, and they are generated comprehensively, i.e. all possible
angles, torsions and improper torsions are formed. Some torsions are then eliminated
(those with zero energy contribution). One consequence is that all possible bonds, angles
and improper torsions MUST be defined in the property file. If a torsion is not found a
yellow alert is issued (non-fatal warning) and that torsion is ignored.
The file ends in the traditional way, i.e.
*EOD
Finally we give a complete example to define alanine (with minimal essential-only
information)
MONO=(ALA) #prt=7 chrg=-0.57
UNIQ=(N)  PRTC=(NH)
UNIQ=(H)  PRTC=(HN)
UNIQ=(CA) PRTC=(CAH)
UNIQ=(CB) PRTC=(CH3)
UNIQ=(C)  PRTC=(CO)
UNIQ=(O)  PRTC=(OC)
UNIQ=(N)  PRTC=(NH) NEXT
DONE
BOND
C-O C-N* C-CA CA-CB CA-N N-H
DONE
Other examples for a monomer file can be found in moil/moil.mop/*.MONO . The most
widely used version is ALL.MONO
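The rules of this section (declared particle count, link keywords, special particles marked with * and placed second in a bond) can be checked mechanically. The following is a minimal sketch of such a check for a single monomer entry; it is not the parser used by conn, it handles only the keywords shown in the alanine example, and all function names are hypothetical:

```python
import re

ALA = """\
MONO=(ALA) #prt=7 chrg=-0.57
UNIQ=(N)  PRTC=(NH)
UNIQ=(H)  PRTC=(HN)
UNIQ=(CA) PRTC=(CAH)
UNIQ=(CB) PRTC=(CH3)
UNIQ=(C)  PRTC=(CO)
UNIQ=(O)  PRTC=(OC)
UNIQ=(N)  PRTC=(NH) NEXT
DONE
BOND
C-O C-N* C-CA CA-CB CA-N N-H
DONE
"""

def parse_monomer(text):
    """Return (name, declared #prt, [(uniq, link)], [(first, second)])."""
    # A ~ anywhere in a line makes it a comment.
    lines = [ln.strip() for ln in text.splitlines()
             if ln.strip() and "~" not in ln]
    name = re.search(r"MONO=\((\w+)\)", lines[0]).group(1)
    n_prt = int(re.search(r"#prt=(\d+)", lines[0]).group(1))
    particles, bonds, section = [], [], "particles"
    for ln in lines[1:]:
        if ln == "DONE":
            section = None
        elif ln == "BOND":
            section = "bonds"
        elif section == "particles":
            uniq = re.search(r"UNIQ=\(([^)]+)\)", ln).group(1)
            link = "HERE"                      # implicit default
            for tok in ln.split():
                if tok[:4] in ("PREV", "NEXT"):  # only four characters are used
                    link = tok[:4]
            particles.append((uniq, link))
        elif section == "bonds":
            bonds.extend(tuple(b.split("-")) for b in ln.split())
    return name, n_prt, particles, bonds

def check_monomer(n_prt, particles, bonds):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    if len(particles) != n_prt:
        problems.append("#prt does not match the number of UNIQ lines")
    for first, second in bonds:
        if first.endswith("*"):
            problems.append("special particle must come second: %s-%s" % (first, second))
    return problems
```

For the alanine entry above, `parse_monomer(ALA)` yields seven particles, the last of which is the NEXT nitrogen, and `check_monomer` reports no problems.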
11.2 property The property file stores the parameters of the particles, bonds, angles, torsions and
improper torsions. It is an input to the connectivity program which builds the molecular
connectivity file. The default file can be found in moil.mop and its name is ALL.PROP.
Typically prepared files are read instead of generating the atomic properties from scratch.
One such prepared file is ALL.PROP.
The property file is built from sequential sections which MUST come in the following
order:
PRTC - individual particle properties
1-4P - scaling parameters for non-bonded 1-4 interactions.
BOND - bond parameters
ANGLE - angle parameters
TORSION - torsion parameters
IMPROPER - improper torsion parameters
It is possible not to provide all the information: a property file with PRTC only is
legal; however, one with PRTC and ANGLE only is not. If you provide only PRTC the program
will issue a warning (yellow alert). Ignore it unless you meant to provide bonds and
somehow the bonds were not read correctly.
Each subsection (e.g. PRTC, BOND, TORSION) must end with
DONE
The file must end with
*EOD
The DONE and *EOD are general termination features used in other data files.
Another general feature shared between different data files is the comment line. ~
ANYWHERE in the line makes it a comment. These lines are echoed by the interpreter
and otherwise ignored.
Below details on the syntax are provided: The explanations will be written as comment
lines as in a "real" property file
~ This is a first line of a property file. The first exe line must be
PRTC
~ The following line lists properties of an individual particle
~ name    mass     charge    epsilon    sigma
PNAM=(NX) PMAS=14. PCHG=-0.3 PEPS=0.170 PSGM=3.250
~ The example above provides the data for the particle type NX (nitrogen
~ of the N-terminal). Characters (like PNAM - the name of the particle
~ type) must be enclosed in parentheses. Each of the expressions (i.e.
~ A=B) must be separated from other expressions by space(s). No spaces
~ within an expression are allowed.
~ epsilon and sigma are the van der Waals well depth
~ and the hard core radius respectively. Obviously this type of
~ line is repeated as needed for different particle types.
~ The data base for particle properties is based on the OPLS potential
~ Jorgensen and Tirado-Rives JACS 110,1657(1988)
~
~ Now end the particle part by DONE
DONE
~ The following part specifies 1-4 scaling parameters of the force field for van der Waals
~ forces (v14f) and electrostatic forces (el14). These parameters are used for scaling
~ the above mentioned interactions between pairs of atoms separated by exactly 3 bonds.
1-4P
v14f=0.125
el14=0.5
~ Finish this section by DONE keyword
DONE
~
~ The next part lists the bond properties. The first Bond line must be
BOND
~ Bond energy is set to be K(r - req)^2
~ Below we provide the names of the two particle types, the force constant
~ (in kcal/mol angstrom^-2) and the equilibrium distance in angstrom
~ Note the different style of i/o: different expressions are still
~ separated by spaces but no equality is used. This requires the data
~ to be placed in exactly the same order, i.e. do not exchange the equilibrium
~ position and the force constant.
~ The covalent part of the potential (excluding improper torsions)
~ is taken from AMBER
~ Weiner et al JACS 106,765(1984)
~ particle particle force-constant equilibrium distance
NX   HX 434.0 1.01
CANX NX 337.0 1.449
~
~ Pictorially NX-HX
~ end the BOND with DONE
DONE
~
~ Angles are similar to bonds in format style
ANGLE
~ K (theta -theta(eq))^2
~ name name name K(kcal/mol radians^-2) theta(eq) (degrees)
HX NX HX 35.0 109.5
~
~ Pictorially HX-NX-HX
DONE
~
~ And here are the torsions. The format style is similar to BOND and ANGLE
TORSION
~ (Pictorially CAH-CO-NH-CAH)
~ however the energy function is more complex:
~ E = sum k(n)*(1 + cos(n*phi+gamma))
~ (gamma should be a function of n too and will be added to the program
~ soon). Currently the format is
~ name name name name k(1) k(2) k(3) n cos(gamma1) cos(gamma2) cos(gamma3)
CAH CO NH CAH 0.0 2.5 0.0 2 -1.0 -1.0 -1.0
~ There is an option in TORSION (only) to use a wild card by X, e.g.
X CANX CX X 0.0 0.0 0.0 3 0.0 0.0 0.0
~ where X means "any atom".
~ **** ALL TORSIONS MUST BE DEFINED IN THE PROPERTY FILE ***
~ However in many cases the energy is set identically to zero.
~ This is done by setting cos(gamma)=0. When the program matches
~ such a torsion, it is skipped and NOT included in the
~ connectivity file
~
DONE
~
~ Improper torsions are four body interactions in which one atom is sitting
~ in the center, Pictorially
~      B
~      |
~      A
~     / \
~    C   D
IMPROPER
~ The internal degree of freedom - phi, is the angle between the normal
~ to the ABC plane and the normal to the BCD plane. To obtain consistent
~ values A must be first and D must be last
~
~ The energy function is rather messy...
~ If the equilibrium angle is far from zero we use simply harmonic term
~ E = K1(phi-phi(eq))^2 (K1 kcal/mol radian^-2 ; phi(eq) degrees)
~ If the equilibrium angle equals zero then the above energy expression
~ is singular, we therefore use
~ E = K2(cos(phi) - cos(phi(eq)))^2 (K2 kcal/mol ; phi(eq) degrees)
~ Note that the units of K are different in the two cases. Note also
~ that at phi(eq)=0, Taylor expansion shows that E is quartic in phi
~ therefore to maintain comparable restoring force K2 > K1
~
~ The atom in the center is always first, the last atom must also be chosen
~ with care since it determines the sign. The other two in the middle
~ can be interchanged
~ name name name name K1/K2 phi(eq)
CANX NX CO CH3 55.0 35.26
DONE
~ End it all
*EOD
PROPERTY files are kept in
moil/moil.mop/*.PROP
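The energy expressions quoted in the comment lines of the example property file can be written out as a short sketch. The parameter values used in the tests are the NX-HX bond and HX-NX-HX angle entries from the example; the switch between the two improper forms at exactly phi(eq) = 0 is an assumption of this sketch, and the function names are hypothetical:

```python
import math

def bond_energy(r, k, r_eq):
    """K (r - req)^2, with K in kcal/mol angstrom^-2 and r in angstrom."""
    return k * (r - r_eq) ** 2

def angle_energy(theta_deg, k, theta_eq_deg):
    """K (theta - theta(eq))^2, with K in kcal/mol radians^-2.

    Angles are given in degrees, as in the property file, and converted
    to radians before use.
    """
    d = math.radians(theta_deg - theta_eq_deg)
    return k * d * d

def improper_energy(phi_deg, k, phi_eq_deg):
    """Harmonic form for phi(eq) != 0, cosine form for phi(eq) == 0."""
    phi = math.radians(phi_deg)
    phi_eq = math.radians(phi_eq_deg)
    if phi_eq_deg == 0.0:
        return k * (math.cos(phi) - math.cos(phi_eq)) ** 2
    return k * (phi - phi_eq) ** 2
```

For small phi and phi(eq) = 0 the cosine form behaves as K2 phi^4 / 4, which is the quartic behavior noted in the comments and the reason K2 is chosen larger than K1.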
11.3 poly This (short) file is typically prepared by the user as input to the conn program (generating
a connectivity file or wcon). It contains a list of the monomers that form the
molecule of interest.
The file poly is accessed in the conn program and the conn input looks something like
file poly name=(ala3.poly) read
A simple example is below
MOLC=(BIG) #mon=3
NTER ALA CTER
*EOD
MOLC is the molecule name (called BULK in the conn file); it has four characters at most.
#mon is the number of monomers.
The line that follows includes the monomer names. The number of monomers found
should match the number of monomers declared in the first line (#mon).
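The consistency rule above (the monomer count must equal #mon, and the molecule name has at most four characters) can be checked before running conn. The following is a minimal sketch, assuming the simple two-line layout of the example; the function name parse_poly is hypothetical and is not part of MOIL.

```python
import re

def parse_poly(text):
    """Return (molecule_name, monomer_names) from the text of a simple poly file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name_match = re.search(r"MOLC=\((\w+)\)", lines[0])
    count_match = re.search(r"#mon=(\d+)", lines[0])
    if name_match is None or count_match is None:
        raise ValueError("first line must define MOLC=(name) and #mon=n")
    name = name_match.group(1)
    if len(name) > 4:
        raise ValueError("molecule name is longer than four characters")
    monomers = []
    for ln in lines[1:]:
        if ln.startswith("*EOD"):  # end-of-data marker closes the file
            break
        monomers.extend(ln.split())
    declared = int(count_match.group(1))
    if len(monomers) != declared:
        raise ValueError("found %d monomers but #mon=%d" % (len(monomers), declared))
    return name, monomers

if __name__ == "__main__":
    print(parse_poly("MOLC=(BIG) #mon=3\nNTER ALA CTER\n*EOD\n"))
```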
11.4 addbond
Sometimes it is necessary to add a bond to a connectivity file since the automated
generation of bonds cannot cover everything. For example, the binding of carbon
monoxide to a heme iron is not modeled with the usual tools of MOIL. It is therefore
useful to have a tool to explicitly add bonds between pairs of atoms. The file for bond
addition is an input to the conn program.
file ubon name=(mb10co.addb) read
The addbond file includes one (or more) line(s) identifying added bond(s). Each line
follows the general syntax
bond select-one-atom select-a-second-atom
The selection is slightly different from the pick command. For example
bond chem HIS 95 NE2 HEM1 157 FE
which is interpreted as “a bond between the unique atom NE2 of histidine residue
number 95 and the unique atom FE of the heme residue number 157”.
Morse bonds (using the function D(exp(-2*a*(r-r0))-2*exp(-a*(r-r0)))) can also be added.
morse atm1=[i]{0} atm2=[i]{0}
where the integer entries are the indices of the atoms within the connectivity file. The
syntax is not ideal since the parameters are set via the standard input of dyna, energy, or
mini. The Morse energy parameters are not read from the prop file.
The file ends with
*EOD
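For orientation, the Morse function quoted in this section can be evaluated directly. The sketch below only illustrates the formula; the parameter values are hypothetical and the function name is not a MOIL routine.

```python
import math

def morse_energy(r, d, a, r0):
    """D*(exp(-2a(r-r0)) - 2*exp(-a(r-r0))): minimum of -D at r = r0, zero at large r."""
    x = math.exp(-a * (r - r0))
    return d * (x * x - 2.0 * x)

if __name__ == "__main__":
    d, a, r0 = 30.0, 2.0, 1.9  # hypothetical well depth, width and equilibrium distance
    for r in (1.7, 1.9, 2.5, 10.0):
        print(r, morse_energy(r, d, a, r0))
```

Unlike a harmonic bond, the energy levels off to zero at large separation, which is why the form is suited to bonds that can break, such as a ligand bound to the heme iron.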
11.5 edit
Sometimes it is useful to remove some terms from the connectivity data structure as
generated automatically with conn. The file edit serves that purpose. It is used during
a call to connect. The prime keyword is “remo”. Only a bond or an angle can be removed.
A typical line to remove a bond looks like
remo bond atm1=[i]{0} atm2=[i]{0}
The missing id numbers are the indices of the atoms within the connectivity file. A nicer
expression is:
remo bond chem HEM1 157 NA HEM1 157 FE
The remove command is useful in the process of substituting a harmonic bond by a
Morse bond. We first remove the usual harmonic bond and then add the Morse term via
the addbond option.
Similarly we can remove an angle from the list of angles generated automatically. Some
of these generated angles are undesired. The example below is typical for the removal of
angles. The iron in heme is bonded to 4 nitrogens in a planar arrangement. Two of the
angles are linear (180 degrees). It is possible to maintain the structure with the 90 degree
angles only. Moreover, the 180 degree angles are bad news for the force field: the derivative
involves a division by the sine of the angle, and 180 degrees causes a singularity. We
therefore eliminate the 180 degree angles as illustrated below.
*EOD
~
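The singularity at 180 degrees mentioned in the edit section above comes from the chain rule: when the angle is obtained from its cosine, d(theta)/d(cos theta) = -1/sin(theta). A short numerical sketch (the function name is hypothetical) shows how quickly this factor grows as the angle approaches linearity.

```python
import math

def dtheta_dcos(theta_deg):
    """Chain-rule factor d(theta)/d(cos theta) = -1/sin(theta), angle given in degrees."""
    return -1.0 / math.sin(math.radians(theta_deg))

if __name__ == "__main__":
    for theta in (90.0, 170.0, 179.0, 179.99):
        print(theta, dtheta_dcos(theta))
```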
11.6 connectivity
The connectivity file is where the complete information to compute the energy of a set of
coordinates of a molecule is found (the coordinates are provided separately). It typically
ends with the extension *.wcon (written connectivity). The file is created by the program
conn based on a list of residues (sequence) provided by the user (i.e. the *.poly file) and
the generic database of monomer and particle properties (typically the ALL.MONO and
ALL.PROP files). A list of all the covalent energy terms and their parameters is provided
and also a list of the nonbonded parameters. The file is formatted and users are
discouraged from editing it or trying to create it while bypassing the conn program.
11.7 Coordinates
11.7.1 PDB file interpretation in MOIL
The PDB files are the standard entries of the Protein Data Bank (www.rcsb.org). Zmoil
views this structure “as is” by building bonds from distance proximity between atoms as
well as CONECT records for hetero atoms (see PDB format). The file can be processed
through the menu-based interface moil.tcl (see the file get_started.pdf in moil.doc for an
introduction to the moil.tcl graphic interface) and is converted to a MOIL CRD file, which
shares a lot of similarities with the CRD format of CHARMM. The MOIL version is
more restricted than the CHARMM version. In MOIL the title is fixed and is not available
for the user to edit. MOIL interprets only the ATOM and HETATM records of the PDB
and ignores the rest. The atom entry is assumed to be of the following format
zevel(1:4),j1,char2,char1,i1,xtmp,ytmp,ztmp,moretmp
char1 and char2 are the atom and residue unique names that are extracted from the file
and are compared to the residue (monomer) list available in the poly file and the atom
list available in the connectivity file. The monomer list of the connectivity file must
match (in order) the monomers read from the atom records of the PDB. Within a residue
all the heavy atoms (non-hydrogens) must find a match with the unique atom of the
residue as defined in the connectivity data structure (and the ALL.MONO file). The
unique atoms within a residue need not be in the same order in the two files. A mismatch
or missing residue or atom name causes a termination of the read.
At present MOIL does not support insertion and modeling of atoms or residues, with the
exception of hydrogens. This has to be done externally to MOIL. If the match is
good the coordinate vector is filled with xtmp, ytmp, and ztmp. The value moretmp is
stored internally in MOIL in the “more” array. For a PDB file its value is the B factor.
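As an illustration of which fields are used, the sketch below extracts the atom name, residue name, residue number, coordinates and B factor from ATOM and HETATM records and skips everything else. It assumes the standard fixed-column PDB layout rather than MOIL's own read statement, and the function and dictionary key names are hypothetical.

```python
def parse_pdb_atoms(text):
    """Collect the fields of ATOM/HETATM records; all other record types are ignored."""
    atoms = []
    for line in text.splitlines():
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        line = line.ljust(66)  # pad short lines so the slices below are safe
        bfac = line[60:66].strip()
        atoms.append({
            "atom": line[12:16].strip(),     # PDB columns 13-16
            "residue": line[17:20].strip(),  # PDB columns 18-20
            "resid": int(line[22:26]),       # PDB columns 23-26
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            "more": float(bfac) if bfac else 0.0,  # B factor, PDB columns 61-66
        })
    return atoms
```

A caller would then compare the residue and atom names, in order, against the monomer list of the connectivity file, as described above.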
11.7.2 CRD file
The CRD file is a standard internal MOIL coordinate file which is very similar to the
CHARMM format except that it is more limited in its options. The file starts with title lines.
A title line starts with “*” followed by a comment. A title line with only a “*” and
nothing after it denotes the end of the title. Note that in MOIL the user cannot modify the
title content with the program (only externally).
The title is followed by a single line that includes the number of atoms in the file, read as
an integer. The number of atoms must match the number of atom records in the file. A
mismatch results in termination.
The number of atoms is followed by lines that provide the coordinates of the atoms and
“more”.
The following format is used:
i5, i5, 1x, a4, 1x, a4, 3(f10.5), 1x, a4, 1x, a4, f10.5
for the following variables:
atom id, monomer id, monomer name, particle name, x, y, z, nothing, optional vector
The atom id is read but is ignored. MOIL does its own counting to match the number
of lines read to the number of atoms stated in the line that immediately follows the title.
The residue id is however a must. It is read and compared to the internal id of the
monomers read from the connectivity file. The name of the monomer of a particular id
must match the name of the residue with the same id in the connectivity data structure. A
mismatch results (yes, here we go again) in program termination.
The atom unique name in the CRD file must match one of the unique names of a particle
in the corresponding monomer. If required the atom list within a monomer is searched to
find a match. If a match for an atom cannot be found the program terminates. MOIL
expects all the coordinates of the atoms of the monomer just read (as defined by the
connectivity data structure) to be read and found. If an atom in the connectivity data
structure is not found in the coordinate file, (checked monomer by monomer) the
program reports a missing atom and exits.
After a monomer and particles are matched, the coordinates x, y, z of the particle are read
from the file. Note that the format of f10.5 is better than that of the PDB but it is still
limited compared to double precision numbers. For maximum precision the PTH (path)
format is the most desired. The records named FREE below are ignored at present in
MOIL. It is possible to leave the records after the coordinates simply blank. The read will
go on just fine. The final f10.5 vector contains a single number per atom which can be
(for example) the B factor of crystallography.
A sample of the start of a crd file is below
* title for CHARMM coordinates
*
29026
    1    1 DMPC C1   -36.34491   5.90324 -29.94611 memb FREE   0.00000
    2    1 DMPC O1   -37.38061   6.06437 -29.30440 memb FREE   0.00000
    3    1 DMPC C11  -36.37354   4.54210 -30.62652 memb FREE   0.00000
    4    1 DMPC C12  -35.53492   3.34375 -30.19138 memb FREE   0.00000
...
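The fixed-width format given above (i5, i5, 1x, a4, 1x, a4, 3(f10.5), 1x, a4, 1x, a4, f10.5) maps directly onto string slices. The following is a minimal sketch of writing and reading one atom line; the function names are hypothetical, the left justification of the a4 fields is an assumption, and blank trailing fields are accepted because the text states that they may be left empty.

```python
def format_crd_line(atom_id, mono_id, mono_name, atom_name, x, y, z,
                    field1="", field2="", more=0.0):
    """Write one atom line following i5,i5,1x,a4,1x,a4,3f10.5,1x,a4,1x,a4,f10.5."""
    return "%5d%5d %-4s %-4s%10.5f%10.5f%10.5f %-4s %-4s%10.5f" % (
        atom_id, mono_id, mono_name, atom_name, x, y, z, field1, field2, more)

def parse_crd_line(line):
    """Read one atom line; the two a4 fields after z are skipped."""
    line = line.ljust(70)  # pad so that missing trailing fields read as blank
    more = line[60:70].strip()
    return {
        "atom_id": int(line[0:5]),        # read, but not relied upon
        "mono_id": int(line[5:10]),
        "mono_name": line[11:15].strip(),
        "atom_name": line[16:20].strip(),
        "x": float(line[20:30]),
        "y": float(line[30:40]),
        "z": float(line[40:50]),
        "more": float(more) if more else 0.0,
    }

if __name__ == "__main__":
    text = format_crd_line(1, 1, "DMPC", "C1", -36.34491, 5.90324, -29.94611,
                           "memb", "FREE", 0.0)
    print(text)
    print(parse_crd_line(text))
```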
11.7.3 DCD and DVD files
The dcd and dvd files are Dynamics CoorDinate and Dynamics Velocity files.
Coordinates and velocities are completely interchangeable and their format is identical.
They are typically used to store coordinates and velocities during Molecular Dynamics
simulations. They are based on a format developed for CHARMM. The option in MOIL
is a subset of what is available in CHARMM but seems to be sufficient for the tasks that
we are after. The coordinates are written in single precision. This is probably all that we
need for Molecular Dynamics simulations. However, many applications in MOIL require
double precision. The files are unformatted with the intention to save space. The format
of the dcd/dvd files is as follows.
The first record is a header and an integer vector of twenty variables. The header is a
character of length 4 that is never used in MOIL. From the vector of 20 integers only the
first (the number of coordinate sets to be read) and the ninth (the number of frozen
particles) entries are used.
The second record consists of an integer and a character of length one. Neither is used.
The third record includes one integer, which is the number of particles of a molecular
frame in the dcd (or dvd) file. This number must match the number of atoms of the
connectivity data structure. Otherwise the program terminates with a message on a
non-matching number of particles.
The fourth record includes the pointer to particles that are not frozen. The file is formatted
in such a way that only the first coordinate set is complete. In the following coordinate
sets only the particles that are not frozen are written into the file. The pointer to
non-frozen particles is written as (nofreez(i),i=1,inofrz) where nofreez is the pointer, i.e.
nofreez(i) is the i-th particle that is not frozen and inofrz is the number of unfrozen
particles.
The fifth, sixth and seventh records are the X, Y, Z coordinates of all the particles of
the system. They are written in single precision and translated internally to double
precision numbers. Obviously some precision is lost between write and read.
Follow-up records include the X, Y, Z records of the selected particles only. Triplets of
records (for X, Y, Z coordinates) continue until all the coordinate sets are read.
An important keyword that can be set in some programs is “norw”. The DCD/DVD files can
be rewound (or not, if norw is .true.). If the file undergoes a “rewind” at every read operation,
reading a large number of DCD records can take a LOT of time (the read is sequential).
Therefore the “norw” option is recommended for multiple reads and for use in analysis
programs.
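The frozen-particle bookkeeping described above can be illustrated without touching the binary layout. In the sketch below the first coordinate set is complete and each later set holds only the non-frozen particles; full frames are rebuilt by copying the frozen coordinates from the first set. The function name is hypothetical, and nofreez is assumed to hold 1-based (Fortran style) particle indices.

```python
def expand_frames(first_frame, nofreez, partial_frames):
    """Rebuild complete coordinate sets from a full first frame and partial later frames.

    first_frame    : list of (x, y, z) for all particles
    nofreez        : 1-based indices of the particles that are not frozen
    partial_frames : list of frames, each a list of (x, y, z) for non-frozen particles
    """
    frames = [list(first_frame)]
    for partial in partial_frames:
        if len(partial) != len(nofreez):
            raise ValueError("frame size does not match the number of unfrozen particles")
        frame = list(first_frame)  # frozen particles keep their first-frame coordinates
        for index, xyz in zip(nofreez, partial):
            frame[index - 1] = xyz
        frames.append(frame)
    return frames

if __name__ == "__main__":
    first = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
    later = [[(0.5, 0.0, 0.0), (2.5, 2.0, 2.0)]]  # particle 2 is frozen
    for frame in expand_frames(first, [1, 3], later):
        print(frame)
```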
11.7.4 PTH files
The path file (extension .pth) is a MOIL “invention”. Path files are not compatible with other
programs but are useful for calculations that produce multiple structures and retain the
full double precision of the coordinates. They are unformatted but in a very simple way.
There is no title or internal test (we assume that you know what you are doing). Every
record is identical and consists of the following sequence of double precision numbers
energy_value(if available, zero if not), ((coor(j,i),i=1,npt),j=1,3)
coor(j,i) is the Cartesian coordinate j of particle i.
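The implied loop above means that, after the energy, a record holds all x values, then all y values, then all z values. The sketch below packs and unpacks the numbers of one record in that order. It covers the payload only: the record framing written by Fortran unformatted output is not handled, and the little-endian 8-byte layout is an assumption, as are the function names.

```python
import struct

def pack_pth_record(energy, coords):
    """Order the values as energy, x(1..npt), y(1..npt), z(1..npt) and pack as doubles."""
    values = [energy]
    for j in range(3):                      # outer loop over the Cartesian component
        values.extend(c[j] for c in coords)  # inner loop over the particles
    return struct.pack("<%dd" % len(values), *values)

def unpack_pth_record(payload):
    """Inverse of pack_pth_record: return (energy, list of (x, y, z))."""
    count = len(payload) // 8
    values = struct.unpack("<%dd" % count, payload)
    npt = (count - 1) // 3
    coords = [(values[1 + i], values[1 + npt + i], values[1 + 2 * npt + i])
              for i in range(npt)]
    return values[0], coords
```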
11.7.5 wene and wmin files
wene and wmin are the output files of the energy and the mini_pwl programs, an output
format that is widely used in MOIL. It is therefore useful to see it once and briefly
describe the meaning of the different terms. All the terms are described in the energy
section of the documentation. For completeness, we briefly describe them below
Parameters for energy calculation
Constant dielectric will be used. elec. Cutoff= 9999.00000
vdW cutoff 9999.00000
ENERGIES: E total =  -39.919
E bond =    0.754  E angl =    3.482  E tors =    2.029
E impr =    1.844  E vdw  =    3.445  E elec =    -83.6
E 14el =   28.889  E 14vd =    3.234
E cnst =    0.000  E evsym=    0.000  E elsym=    0.000
E centr=    0.000  E hydro=    0.000
Norm Force =    4.254
Number of neighbours for short range int.     8
Number of uncharged vdW interactions         19
Number of elec. only interactions             7
Number of wat-wat shrt. range neighbors       0
Number of wat-wat long range neighbors        0
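When many wene or wmin outputs have to be compared, the energy terms can be pulled out with a small script. The following is a minimal sketch, assuming that each term appears as "E label = value" as in the sample output of this section; the function name is hypothetical and the sample text inside the script is abbreviated.

```python
import re

# An energy entry is a standalone "E", a label, "=", and a number.
ENERGY_TERM = re.compile(r"\bE\s+(\w+)\s*=\s*(-?\d+(?:\.\d+)?)")

def parse_energies(text):
    """Return a dictionary mapping energy labels (total, bond, ...) to their values."""
    return {label: float(value) for label, value in ENERGY_TERM.findall(text)}

if __name__ == "__main__":
    sample = """ENERGIES: E total =  -39.919
E bond =    0.754  E angl =    3.482  E tors =    2.029
E impr =    1.844  E vdw  =    3.445  E elec =    -83.6
E 14el =   28.889  E 14vd =    3.234
Norm Force =    4.254
"""
    energies = parse_energies(sample)
    parts = sum(v for k, v in energies.items() if k != "total")
    print(energies["total"], round(parts, 3))
```

Summing the individual terms and comparing the result with E total is a quick sanity check on a parsed file; small differences are expected from the rounding of the printed values.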
The constant dielectric is the only option currently available in MOIL. The cutoff
distances of 9999 indicate that no cutoff is used. E total provides the potential energy
(not including kinetic). E bond is the bond energy and similarly E angl, E tors, E impr, E
vdw, E elec are the angle, torsion, improper torsion, van der Waals and electrostatic energies.
E 14el, E 14vd are the electrostatic and van der Waals 1-4 interactions.
E cnst corresponds to the energy of the constraints, E evsym and E elsym are van der
Waals and electrostatic energies that result from translational symmetry operations
(periodic boundary conditions). E centr is a restraint added to a set of selected
coordinates to avoid diffusion, E hydro is an approximate hydrophobic energy term. The
norm of the force is the normalized length of the force vector, sqrt(∇U^t ⋅ ∇U / 3n), where n is
the number of particles in the system.
11.8 Standard input and output in MOIL
Most of the programs direct some output to the standard output that should be read in
addition to the specifically designed output such as *wene and *wmin. Error messages
are always directed to the standard output.
The standard input is used to provide file lists and to initialize variables, as discussed earlier
in this document.
11.9 Other special files
None
12 Credit
Thanks to all who were involved in different phases of code development and testing.
Code developers:
Alfredo Cardenas, Ron Elber, Avijit Ghosh, Robert Goldstein, Chen Keasar, Serdal
Kirimizialtin, Haiying Li, Peter Májek, Jaroslaw Meller, Debasisa Mohanty, Mauro
Mugnai, Roberto Olender, Felicia Pitici, Adrian Roitberg, Amena Siddiqi, Carlos
Simmerling, Ileana Stoica, Alex Ulitsky, Gennady Verkhivker, Yael Weinbach, Anthony
West, Veaceslav Zaloj
GUI developers:
Thomas Blom, Baohua Wang, Avijit Ghosh,
Current code keeper: Thomas Blom
We made use of the following generously provided codes:
(1) The Householder diagonalization routine written by Ryszard Czerminski.
(2) The truncated Newton-Raphson minimization by Stephen G. Nash
(3) The Particle Mesh Ewald of Darden and co-workers,
J. Chem. Phys. 98,10089(1993)
(4) The Spherical Solvent Boundary Potential of Beglov and Roux
J. Chem. Phys. 100,9050(1994)
(5) Generalized Born model from Tsui and Case
Biopolymers 56, 275(2000)
(6) TMalign code of Zhang and Skolnick, Nucleic Acids Research 33, 2302-2309 (2005).
13 References
The reference to the general code is given below (numerous other references describe concrete
applications and modules; consult the main text):
R. Elber, A. Roitberg, C. Simmerling, R. Goldstein, H. Li, G. Verkhivker,
C. Keasar, J. Zhang and A. Ulitsky "MOIL: A program for simulations
of macromolecules", Computer Physics Communications, 91, 159-189 (1995)
Beglov, D. and B. Roux (1994). "FINITE REPRESENTATION OF AN INFINITE
BULK SYSTEM - SOLVENT BOUNDARY POTENTIAL FOR
COMPUTER-SIMULATIONS." Journal of Chemical Physics 100(12): 9050-9063.
Cardenas, A. E. and R. Elber (2003). "Kinetics of cytochrome C folding: Atomically
detailed simulations." Proteins-Structure Function and Genetics 51(2): 245-257.
Czerminski, R. and R. Elber (1990). "Self avoiding walk between 2 fixed end points
as a tool to calculate reaction paths in large molecular systems."
International Journal of Quantum Chemistry: 167-186.
Czerminski, R. and R. Elber (1991). "Computational study of ligand diffusion in
globins 1 Leghemoglobin." Proteins-Structure Function and Genetics 10(1):
70-80.
Darden, T., D. York, et al. (1993). "PARTICLE MESH EWALD - AN N.LOG(N)
METHOD FOR EWALD SUMS IN LARGE SYSTEMS." Journal of
Chemical Physics 98(12): 10089-10092.
Elber, R. (1990). "Calculation of the potential of mean force using molecular
dynamics with linear constraints -An application to a conformational
transition in a solvated dipeptide." Journal of Chemical Physics 93(6): 4312-4321.
Elber, R. and A. Cardenas (2004). "From reaction pathways to classical
trajectories." Biophysical Journal 86(1): 34A-34A.
Elber, R., A. Ghosh, et al. (2002). "Long time dynamics of complex systems."
Accounts of Chemical Research 35(6): 396-403.
Elber, R. and M. Karplus (1990). "Enhanced sampling in molecular dynamics - use
of the time dependent hartree approximation for a simulation of carbon
monoxide diffusion through myoglobin." Journal of the American Chemical
Society 112(25): 9161-9175.
Elber, R. and D. Shalloway (2000). "Temperature dependent reaction coordinates."
Journal of Chemical Physics 112(13): 5539-5545.
Faradjian, A. K. and R. Elber (2004). "Computing time scales from reaction
coordinates by milestoning." Journal of Chemical Physics 120(23): 10880-10889.
Ghosh, A., R. Elber, et al. (2002). "An atomically detailed study of the folding
pathways of protein A with the stochastic difference equation." Proceedings
of the National Academy of Sciences of the United States of America 99(16):
10394-10398.
Gibson, Q. H., R. Regan, et al. (1992). "DISTAL POCKET RESIDUES AFFECT
PICOSECOND LIGAND RECOMBINATION IN MYOGLOBIN - AN
EXPERIMENTAL AND MOLECULAR-DYNAMICS STUDY OF
POSITION 29 MUTANTS." Journal of Biological Chemistry 267(31): 22022-22034.
Honeycutt, J. D. and D. Thirumalai (1989). "STATIC PROPERTIES OF
POLYMER-CHAINS IN POROUS-MEDIA." Journal of Chemical Physics
90(8): 4542-4559.
Hornak, V., R. Abel, et al. (2006). "Comparison of multiple amber force fields and
development of improved protein backbone parameters." Proteins-Structure
Function and Bioinformatics 65(3): 712-725.
Jorgensen, W. L. and J. Tiradorives (1988). "THE OPLS POTENTIAL
FUNCTIONS FOR PROTEINS - ENERGY MINIMIZATIONS FOR
CRYSTALS OF CYCLIC-PEPTIDES AND CRAMBIN." Journal of the
American Chemical Society 110(6): 1657-1666.
Kaminski, G., R. Friesner , et al. (2001). "Evaluation and reparameterization of the
OPLS-AA force field for proteins via comparison with accurate quantum
chemical calculations on peptides." The Journal of Physical Chemistry B
105(28): 6474-6487.
Li, H. Y., R. Elber, et al. (1993). "MOLECULAR-DYNAMICS SIMULATION OF
NO RECOMBINATION TO MYOGLOBIN MUTANTS." Journal of
Biological Chemistry 268(24): 17908-17916.
Majek, P. and R. Elber (2009). "A coarse grained potential for fold recognition and
molecular dynamics simulations of proteins." Proteins, Structure, Function
and Bioinformatics: accepted.
Majek, P., R. Elber, et al. (2009). Pathways of Conformational Transitions in
Proteins, Crc Press-Taylor & Francis Group.
Mohanty, D., R. Elber, et al. (1997). "Kinetics of peptide folding: Computer
simulations of SYPFDV and peptide variants in water." Journal of
Molecular Biology 272(3): 423-442.
Olender, R. and R. Elber (1996). "Calculation of classical trajectories with a very
large time step: Formalism and numerical examples." Journal of Chemical
Physics 105(20): 9299-9315.
Olender, R. and R. Elber (1997). "Yet another look at the steepest descent path."
Theochem-Journal of Molecular Structure 398: 63-71.
Onufriev, A., D. Bashford, et al. (2004). "Exploring protein native states and large-scale conformational changes with a modified generalized born model."
Proteins-Structure Function and Bioinformatics 55(2): 383-394.
Pranata, J., S. G. Wierschke, et al. (1991). "OPLS POTENTIAL FUNCTIONS FOR
NUCLEOTIDE BASES - RELATIVE ASSOCIATION CONSTANTS OF
HYDROGEN-BONDED BASE-PAIRS IN CHLOROFORM." Journal of the
American Chemical Society 113(8): 2810-2819.
Roitberg, A. and R. Elber (1991). "MODELING SIDE-CHAINS IN PEPTIDES
AND PROTEINS - APPLICATION OF THE LOCALLY ENHANCED
SAMPLING AND THE SIMULATED ANNEALING METHODS TO FIND
MINIMUM ENERGY CONFORMATIONS." Journal of Chemical Physics
95(12): 9277-9287.
Simmerling, C. and R. Elber (1994). "HYDROPHOBIC COLLAPSE IN A
CYCLIC HEXAPEPTIDE - COMPUTER-SIMULATIONS OF CHDLFC
AND CAAAAC IN WATER." Journal of the American Chemical Society
116(6): 2534-2547.
Steinberg, M. Z., K. Breuker, et al. (2007). "The dynamics of water evaporation
from partially solvated cytochrome c in the gas phase." Physical Chemistry
Chemical Physics 9(33): 4690-4697.
Sugita, Y. and Y. Okamoto (1999). "Replica-exchange molecular dynamics method
for protein folding." Chemical Physics Letters 314(1-2): 141-151.
Ulitsky, A. and R. Elber (1993). "The thermal equilibrium aspects of the time-dependent hartree and the locally enhanced sampling approximations - formal properties, a correction, and computational examples for rare gas
clusters." Journal of Chemical Physics 98(4): 3380-3388.
Verkhivker, G., R. Elber, et al. (1992). "Locally enhanced sampling in free-energy
calculations - application of mean field approximation to accurate calculation
of free energy differences." Journal of Chemical Physics 97(10): 7838-7841.
Weinbach, Y. and R. Elber (2005). "Revisiting and parallelizing SHAKE." Journal
of Computational Physics 209(1): 193-206.
West, A. M. A., R. Elber, et al. (2007). "Extending molecular dynamics time scales
with milestoning: Example of complex kinetics in a solvated peptide."
Journal of Chemical Physics 126(14).
Yang, Z., P. Majek, et al. (2009). "Allosteric Transitions of Supramolecular Systems
Explored by Network Models: Application to Chaperonin GroEL." Plos
Computational Biology 5(4).
Zhang, Y. and J. Skolnick (2005). "TM-align: a protein structure alignment
algorithm based on the TM-score." Nucleic Acids Research 33(7): 2302-2309.
Zichi, D. A. (1995). "MOLECULAR-DYNAMICS OF RNA WITH THE OPLS
FORCE-FIELD - AQUEOUS SIMULATION OF A HAIRPIN
CONTAINING A TETRANUCLEOTIDE LOOP." Journal of the American
Chemical Society 117(11): 2957-2969.
