moil documentation - Center for Computational Life Sciences and
MOIL DOCUMENTATION
19 April 2011

1 Table of Contents

2 Obtaining MOIL and Getting Started
3 Goals
4 Force field
5 Units
6 Typical order of events
7 Directory structure
8 Basic syntax of the MOIL interpreter
  8.1 Overview
  8.2 Variable selections (integer, real, double, character, logical)
  8.3 Syntax of file assignment
  8.4 Syntax of the “pick” command
  8.5 Comment lines
  8.6 Line continuation
  8.7 The “action” and “*EOD” commands
9 Brief description of MOIL modules
  9.1 Special characters in description of programs:
  9.2 Major programs:
  9.3 Some noted options
  9.4 Utilities
  9.5 Analysis programs
  9.6 The moil.tcl, cmoil, and zmoil programs
  9.7 Parameters for energy and force calculations - EPROP
10 MOIL in depth
  10.1 The major programs
    10.1.1 conn
    10.1.2 puth
    10.1.3 energy
    10.1.4 mini_pwl and mini_tn
    10.1.5 dyna
    10.1.6 dyna_prl (parallel program, requires MPI)
    10.1.7 freee
    10.1.8 therm
    10.1.9 mfep
    9.1.10 umbr
    9.1.11 sdel (parallel version, requires MPI) / sdelS (serial version)
    9.1.12 sdp (parallel version, requires MPI) / sdpS (serial version)
    9.1.13 chmin
    9.1.14 fp
    10.1.10 DiM
    9.1.15 scndrv (and numerical)
  10.2 Major options
    10.2.1 LES
    10.2.2 MUTA
    10.2.3 FREADY
    10.2.4 Double-well Elastic network model
    10.2.5 LD (Langevin Dynamics)
    10.2.6 dynapress
    10.2.7 PME
    10.2.8 dynapt (parallel program for parallel tempering, requires MPI)
  10.3 Utilities
    10.3.1 addion
    10.3.2 boat:
    10.3.3 ccrd
    10.3.4 crd2pdb
    10.3.5 con_specl
    10.3.6 memeqns
    10.3.7 reconstruct
    10.3.8 path_eqw
    10.3.9 ovrlp_trj
    10.3.10 Numerical
    10.3.11 solvatecrd
    10.3.12 pdb2puth
  10.4 Analyses
    10.4.1 av_dif
    10.4.2 Contacts
    10.4.3 dxdl
    10.4.4 eff_difdens
    10.4.5 Fluc
    10.4.6 rgyr
    10.4.7 rms_2crd
    10.4.8 rms_2path
    10.4.9 rms_p2p
    10.4.10 rms_resd
    10.4.11 SuperTMscore
    10.4.12 superback
    10.4.13 superrms
    10.4.14 str_measures
    10.4.15 tmalign (Zhang and Skolnick 2005)
    10.4.16 Torstat
    10.4.17 xangle
    10.4.18 xcrd
    10.4.19 xtors
11 MOIL files
  11.1 monomer
  11.2 property
  11.3 poly
  11.4 addbond
  11.5 edit
  11.6 connectivity
  11.7 Coordinates
    11.7.1 PDB file interpretation in MOIL
    11.7.2 CRD file
    11.7.3 DCD and DVD files
    11.7.4 PTH files
    11.7.5 wene and wmin files
  11.8 Standard input and output in MOIL
  11.9 Other special files
12 Credit
13 References

2 Obtaining MOIL and Getting Started

Moil is available either as a “packaged” version, comprised of source code with prebuilt binaries for popular operating systems, or as source only via a public software version control server from which the very latest source code may be retrieved at any time. Instructions for both can be found at https://wiki.ices.utexas.edu/clsb/wiki. The present document provides a reference for MOIL; the file get_started.pdf in the moil.doc folder is a good place to start if you prefer to jump in immediately and begin performing computations and visualizations and learn and work simultaneously.

3 Goals

Moil is a suite of integrated modular (FORTRAN) programs that perform a variety of biomolecular calculations and simulations using molecular mechanics force fields. It takes PDB files as input and performs energy, energy minimization, dynamics, free energy, reaction path, kinetic, and thermodynamic calculations. These calculations help bridge the gap between structure, dynamics, and function. Emphasis is placed on unique features developed in the Elber laboratory, among which are reaction path calculations, simulations of long-time approximate trajectories, calculations of kinetics and thermodynamics along reaction coordinates, and Locally Enhanced Sampling.

All programs are available through the command line interface. A good selection of programs is also available through moil.tcl, a menu-based graphic interface that drives many of the MOIL modules. This interface, written in tcl/tk, is referred to in this document by several names: moil.tcl, the “menu-based interface”, the “graphical interface”, or the “GUI”. It exists simply to provide a menu-based paradigm for setting up the required input and running the MOIL command-line programs, sometimes accomplishing a series of tasks with one mouse click. It should not be confused with Zmoil, an OpenGL-based program for visualizing molecules and trajectories with 3D graphics, which is documented elsewhere. Nevertheless, Zmoil is most easily accessed from moil.tcl.

The prime purpose of MOIL is to serve as an engine for High Performance Computing (HPC) applications on clusters of computers. Emphasis is therefore placed on command line usage and on the syntax of concrete input files. The graphic interface is used in many cases to generate these input files in a more transparent way. However, the graphic interface helps with the initial setup and with the analysis of the results, not with monitoring HPC applications.
Massively distributed or parallel applications are not supported through the moil.tcl menu-based interface. A tutorial for the use of the graphic interface can be found in the file get_started.pdf.

4 Force field

MOIL uses the OPLS (Jorgensen and Tirado-Rives 1988; Pranata, Wierschke et al. 1991; Zichi 1995), OPLS-AA (Kaminski, Friesner et al. 2001) and AMBER (Hornak, Abel et al. 2006) force fields. A recent implementation of force fields for nucleic acids is also available. Conversion to other force fields such as CHARMM can be done externally by converting the database (text) files, with no changes to the source code. See section 11 on MOIL files for more details. At present there is no source code support for electrostatic polarization. However, MOIL supports the addition of charges away from the positions of the nuclei using the vprt facility. MOIL also implements the coarse grained force field FREADY (Majek and Elber 2009) and a double Gaussian network model that allows for simple calculations of conformational transitions (Yang, Majek et al. 2009).

5 Units

Moil units are angstrom (length), kcal/mol (energy), and atomic mass (the mass of hydrogen is equal to 1). Externally MOIL accepts time steps in picoseconds and converts them to its internal time unit. Temperature is expressed in Kelvin. Angles are input in degrees.

6 Typical order of events

A MOIL application for a particular system starts with the conn program, which generates the data structure necessary for energy and force calculations. (The data structure is written into a connectivity file customarily called *.wcon, where the “*” denotes a wild card for a molecule-specific name.) If the graphic interface moil.tcl is used, then in parallel with the generation of the *.wcon file we may convert a PDB file to a MOIL workable coordinate file (customarily called *.crd). If the command line interface is used, the *.crd file is generated after *.wcon with the programs puth and (if a solvated molecule is desired) solvatecrd.
The files *.wcon and *.crd form the input core for any follow-up calculation that uses the energy function of MOIL. See get_started.pdf in the moil.doc folder for a description of generating .wcon and .crd files from a PDB file.

With the *.wcon and *.crd files at hand numerous applications are possible. For example, energy minimization can proceed with mini_pwl, followed by Molecular Dynamics simulations with dyna. The output of the dyna program is a *.dcd file: a set of Cartesian coordinates of the system written sequentially as a function of time.

The last step is the analysis of the results. Here is where creativity might be tested; however, a few simple routine tools are available in MOIL. For example, xcrd extracts the coordinates of a selected group of atoms, making it easier to analyze and understand a subset of the motions.

7 Directory structure

After successfully installing the source code you should have the following directories in the designated slot. The moil directory is at the head, and under it you will find the files:

1. README, which discusses installation issues.
2. ReleaseNotes_moil11.doc, with a few highlights of the current release (11).
3. version. This file contains information about the version of the code at the time it was packaged from the Subversion (SVN) version control system. This file is only included in packaged distributions; if instead you pulled the source code directly from our version control system, the same information is available via the “svn info” command.

More importantly, there are a few directories that store source code and databases. Going down alphabetically (and not necessarily in order of importance) we have:

make_distribution – this directory includes high level scripts to compile the code. For example, to compile the code on Mac OS X go to the make_distribution directory and simply type ./make_distro_osx.
More information on how to compile the code can be found at https://wiki.ices.utexas.edu/clsb/wiki/BuildingMoil
moil.amber – a set of perl scripts and inputs to convert from the AMBER force field to the MOIL representation. The subdirectory data includes output appropriate for use in moil.
moil.crd – includes files of sizable water boxes to solvate biological molecules.
moil.doc – where the documentation of MOIL can be found.
moil.exe – where execution files are stored (ONLY in the Windows version).
moil.gui – where the graphic interface (moil.tcl) resides.
moil.input – where sample input files for different programs can be found. In practice it is always easier to take an existing sample input and modify it for your own needs than to create your own from scratch.
moil.mop – where the basic database files, monomers and properties, reside. For example, ALL.MONO stores primarily the properties of monomers (e.g. which atoms make up the residue ALA(nine) and how they are covalently connected).
moil.source – where all the source code files reside. For all operating systems (with the exception of the Windows version) the execution files are in the moil.source/exe directory. Of particular significance is moil.source/COMMON, which includes all the global variables shared (via COMMON) between subroutines and programs. This is also the place where the length of the program arrays is defined (in COMMON/LENGTH.BLOCK).

List of directories under moil.source follows:

analysis – analysis programs
boat – Bond Angle and Torsions.
Compute internal coordinates for analysis
ccrd – convert Cartesian coordinate files between formats
chain – The chmin program for computing reaction coordinate based on the SPW functional (Czerminski and Elber 1990)
cmoil – the original c-language opengl-based visualization program for MOIL
comm – a set of communication subroutines for MPI parallel code
comm_dummy – dummy communication routines to allow for serial compilation
comm_t – communication for a special machine (Terra)
COMMON – where all the shared COMMON blocks (global variables) reside
connect – the source code for the program conn
coupledDyna – code to run multiple coupled copies of dyna
DEE – Dead End Elimination code (not operational)
dynamics – the source for straightforward molecular dynamics and Langevin equation simulations
exe – all the moil execution files reside in this directory with the exception of the Windows version (programs are in moil/moil.exe for windows)
fp – the home directory of Milestoning source code (Faradjian and Elber 2004)
fready – the coarse grained model of proteins implemented into moil (Majek and Elber 2009)
free_e – compute free energy differences by free energy perturbation method along a reaction coordinate (Elber 1990)
GB – Generalized Born Surface Area code, implemented according to (Onufriev, Bashford et al. 2004)
generic – generic tools used by many programs, e.g. matrix diag.
intrprtr – line interpreter, extract expression and values from command line
LES – Locally Enhanced Sampling (Elber and Karplus 1990; Roitberg and Elber 1991; Simmerling and Elber 1994)
memeqns – memory equation solver (for Milestoning (Faradjian and Elber 2004))
mfep – minimum free energy path using a string method
mini_tn – minimization with truncated Newton-Raphson algorithm.
mini_pwl – minimization with conjugate-gradient algorithm with the Powell restart option
muta – free energy differences between mutated molecules
path – general code for path calculation algorithms; it is used by both the sdel and sdp programs
pot – potential energy and forces
prepcrd – solvate the solute in a box of water
puth – read PDB file, place missing hydrogens and write CRD file
s2d – second derivative of the potential
sdel – stochastic difference equation in length
sdp – compute the steepest descent path (parallel code) (Olender and Elber 1997)
sdpS – steepest descent path (serial code)
steep – steepest descent minimization
stochpath – sdet algorithm (not operational)
symm – periodic boundary symmetry operation
therm – thermodynamic integration for alchemical changes
tmalign – align two structures and compute TM score (Zhang and Skolnick code (Zhang and Skolnick 2005), modified to add “trivial alignment” flag)
umbrella – umbrella sampling
vopt – vector operations
zmoil – a c++ opengl-based visualization program; successor to cmoil (see above)

moil.test – a significant number of tests that can be executed semi-automatically, together with a number of examples of how MOIL is used. A useful script is run_tests.pl, a perl script that runs all the tests, or a subset of tests with the syntax “perl run_tests.pl test1 test2”. Specific tests reside in different directories (briefly discussed below). Each test directory includes a runme.bat file (a script to run the test) and an Output directory that stores correct output files to be compared to the files generated during the test. Note that, despite the runme.bat filename, these are unix-style scripts. The run_tests.pl script will run these tests on a windows system by doing some syntax translation as required. Note that when running the tests, some of the “errors” detected are a result of floating-point rounding and some are due to different formatting.
Manual examination of the outputs is therefore advised before crying Wolf! Nevertheless, a significant level of comparison is done completely automatically, which is a plus. Another advantage of the tests is that “real” runs can be developed from the examples, which in many cases is a lot easier than reading manuals.

A list of the tests/directories and a brief explanation follows:

ad_map – compute adiabatic map (phi,psi) for a dipeptide
ala3 – a set of minimizations and dynamic calculations (including GBSA) for trialanine
bench_vp – simulation with virtual (no mass) charged particles (extension of the force field)
connectivity – generate a connectivity file
diffdens – compute diffusion constant and density fluctuations of water from dynamic simulation
dyna_ssbp – use of the spherical solvent boundary potential of Roux (Beglov and Roux 1994)
eball – keeping a spherical constraint on water molecules
fp – Milestoning run
fready – coarse grained model of proteins
free_e – compute free energy difference
hb – simulate hemoglobin
memeqns – solve memory equation (last step of Milestoning)
metal – simulate metal wall (with image charges)
mini_tn – use truncated Newton-Raphson algorithm for energy minimization
more_water – simulate internal water in gramicidin
mult_pept – multiply a part of a peptide with Locally Enhanced Sampling (LES)
myo – simulate the classic (myoglobin)
nuc_acid – simulate nucleotides
path – run chmin boundary-value path calculation
pdb2CG – generate a coarse grained model from PDB structure
pep21_mini – simulate 21 amino acid peptide
prep_solv – solvate a solute (place it at the center of a water box while removing overlapping water molecules)
pressure – monitoring pressure during simulations
prlltemp – replica exchange simulations
read_pdb – read PDB file, convert to CRD and generate a connectivity file.
s2d – generate second derivative matrix of the potential
Sdel – stochastic difference equation in length
Sdelave – stochastic difference equation with some variables thermally averaged.
sdp – compute the steepest descent path between two fixed points.
special – Landau Zener curve crossing
sto_sval – stochastic difference equation in time
str_measures – different shape measures
symm – periodic boundary condition of solvated system
symm2 – another example for periodic boundary condition simulation
symm2_prl – a parallel implementation of the periodic boundary condition for straightforward MD
therm_cycle – thermodynamics integration
umbrella – umbrella sampling along a reaction coordinate
valdip – molecular dynamics simulations on valine dipeptide.
wfly – capture (and stop) evaporated water molecule for finite system simulations.

8 Basic syntax of the MOIL interpreter

8.1 Overview

MOIL has a flexible line interpreter that picks variables from a command line in a consistent way for a range of different MOIL programs. While there are a few rules that must be followed, the syntax allows reasonably convenient use and the building of complex inputs and relationships. All commands and variables are case sensitive. The basic structure consists of file assignments at the beginning of the input file, followed by a list of initializations of different variables. Typically the series of input lines is terminated by a single line with the keyword “action”, suggesting that it is time to stop reading and start doing. In some MOIL programs a secondary terminator of the command line, *EOD, is needed. It is OK to have an extra *EOD at the end of the file since the interpreter ignores extra material. The line interpreter is used to read input lines prepared by the user as well as the data files that determine the potential energy; the hope is that data files are correspondingly (relatively) readable too.
A typical structure of a moil input file is therefore something like

  ~ This is a sample input
  file …
  file …
  x=… y=... z=…
  action
  *EOD

8.2 Variable selections (integer, real, double, character, logical)

A value is assigned to a variable with an equal sign, forming an expression. Expressions must be separated by space(s). For example, x=5 y=3 means that the value 5 will be assigned to x and y will be 3. Unacceptable expressions are x=5,y=3 or x=5<tab>y=3 (<tab> is the tab character). Some of the variables are designed to be integers, so i=5 is legal but i=5.5 is not.

The variable names are case sensitive, so do not flip between upper and lower case. They are typically made of letters and special characters. For example, #ste=10 assigns 10 to #ste, which is the number of steps to execute a particular algorithm. The interpreter reads at most the first 4 characters of a variable name, so abcdef=5 is interpreted as abcd=5. Writing longer names in the input is sometimes useful to increase clarity. For example, #step=10 has the same meaning as the shorter expression above, and “action” can be written as “acti”. In case of ambiguity (multiple assignments of the same variable), the last assignment counts. For example, “y=5 y=3” will assign 3 to y.

Number assignments are either integer, real, or double precision. When a variable is assigned an integer we will denote it by [i], a real number by [r], and a double precision number by [d] (double precision numbers are written, for example, as 1.0d1, which means 10). Character variables are written between parentheses, i.e. character=(….). For example, name=(input). Logical variables are set on and off by their presence (no equal sign). For example, prll turns on the parallel option in the code.
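
The assignment rules above can be illustrated with a short Python sketch. It is not MOIL's FORTRAN interpreter: the names parse_line and convert_number are hypothetical, and comment lines, line continuation, and per-variable type checks (such as rejecting i=5.5 for an integer variable) are deliberately not modelled.

```python
def convert_number(raw):
    """Convert an integer, real, or FORTRAN-style double (1.0d1) to a Python number."""
    try:
        return int(raw)
    except ValueError:
        # FORTRAN writes double precision exponents with 'd'; Python expects 'e'.
        return float(raw.replace("d", "e").replace("D", "e"))


def parse_line(line, keylen=4):
    """Hypothetical sketch of the rules in section 8.2 (not MOIL source code)."""
    values = {}
    # Only spaces separate expressions; a tab or comma stays inside the token
    # and makes the number conversion fail with ValueError.
    for token in line.split(" "):
        if not token:
            continue
        name, equals, raw = token.partition("=")
        key = name[:keylen]              # only the first 4 characters count
        if not equals:
            values[key] = True           # logical variable: set by its presence
        elif raw.startswith("(") and raw.endswith(")"):
            values[key] = raw[1:-1]      # character variable: name=(input)
        else:
            values[key] = convert_number(raw)
        # A later assignment to the same key overwrites the earlier one.
    return values


print(parse_line("#step=10 y=5 y=3 name=(input) prll rmax=15.d0"))
# {'#ste': 10, 'y': 3, 'name': 'input', 'prll': True, 'rmax': 15.0}
```

The dictionary makes the "last assignment counts" rule automatic, and truncating the key before storing it is what makes #step and #ste refer to the same variable.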
An example from an input file with a collection of variables assigned is below:

  #ste=6000 #equ=1000 info=1000 rmax=15.d0 ovlp #crd=1000 #vel=1000 #lis=2000

8.3 Syntax of file assignment

A file assignment gives the program access to existing information and a place to write output. Once a file is assigned, it is read either immediately after the assignment line or only after the action keyword is detected. The order of events usually does not matter to the user. The command to assign a file to the project is

  file file_type name=(file_name) rw_status format [more_options]

The “file” keyword tells the interpreter that this is a file assignment command and that this line must be interpreted accordingly.

“file_type” is the type of the file to be opened. There are numerous types of files in MOIL; some are used across many programs and some are specific to a particular module. Below we mention the most common ones. More examples can be found in the descriptions of individual programs. Examples of file_type:

prop – property file
mono – monomer file
poly – polymerization file
addb – add bond
edit – edit connectivity
rcrd – read coordinates
wcrd – write coordinates
wvel – write velocities
wmin – write minimization results
wene – write energy output

“name” is the name of the file. For example: name=(/usr/guest/guest_file). Note that the file name must not include spaces, since the interpreter uses spaces to break the command line into expressions. Windows users who access files directly from the Desktop, with filenames that include the folder “My Documents” and similar, may suffer. For Windows users it is probably better to start from C:\ and have no spaces in the name of the working directory (the default install location for Windows is typically C:\Moil11; installing to the desktop is not recommended for the above reasons).

“rw_status” is the read/write status.
A file can be opened with “read” status, in which case it must exist and cannot be overwritten. It is opened at the beginning of the file. The second most frequent option is “wovr”, in which a file is opened with status “unknown”: if it does not exist it is created and written; if it exists it is overwritten. It is also possible to open a file as “writ”. In that case the program exits with “RED ALERT” if the file already exists. This option is for the true collectors who cannot delete or write over existing files and insist on showing only growth in their disk usage.

“format” is the way the file is written. There are two basic formats in MOIL: text and unformatted (FORTRAN style). If no keyword is provided, the default text format is used. Otherwise the logical keyword “bina” should be added, which means write an unformatted (“binary”) file. MOIL expects a certain format for each type of file, so most of the time this is a forced choice rather than a free one. For example, dynamics coordinate files must be written as unformatted files.

“more_options” are connected to specific programs. Some programs require more details on the type of the file (besides being formatted or unformatted). For example, the program energy accepts the DYNA and PATH keywords to indicate that the dcd and pth formats are used (respectively). More details on the additional options are given in the descriptions of the individual programs.

8.4 Syntax of the “pick” command

MOIL has a pick subroutine that is used by essentially all modules (the syntax of pick in the moil.tcl menu-based interface to MOIL is slightly different and friendlier). The basic command line for "pick" looks something like:

    pick pick #prt 1 60 | chem mono ALA & chem prtc CA != #mon 2 3 done

The command line includes a series of selections interconnected by logical instructions. It is read from left to right until "done" is detected.
No brackets are allowed, and hopefully this is not too severe a restriction. The “pick” command can take only part of a line, and other instructions may be included before the first “pick” or after the “done” (but not in the middle of the pick segment). On output the “pick” subroutine returns a vector whose length is the number of atoms and that contains zeroes (atoms not picked) and positive integers (selected). Most of the time the positive integer is 1; however, with the “group” keyword (see below) it is possible to further partition the selected particles into different groups, each group with a different integer value. These vectors are invisible to the user, and this paragraph is provided as general background for the operation of the pick command.

The double “pick” at the beginning is required unless stated otherwise. It is a little confusing and perhaps will be fixed in the future, but for now we are stuck with it. It is therefore useful to understand the source of the double “pick”. The first thing MOIL does to an input line is to parse it into expressions with “rline”. If the keyword “pick” is found, the line is forwarded to the pick subroutine. The pick subroutine needs to determine where the “pick” starts and where it ends, and here comes the second “pick”. So there is one “pick” for rline and another “pick” for the pick subroutine.

The above line is interpreted sequentially from left to right (the corresponding expression is given in brackets): pick particles 1 to 60 (#prt 1 60), or (|) use chemical notation (chem) to pick all the particles of the monomers (mono) alanine (ALA) anywhere along the sequence. The particles selected so far are now subject to a test of .and. (&): from the set selected so far, pick according to chemical notation (chem) the particles (prtc) that are called (CA). Then remove (!=) from the selection the monomer numbers (#mon) from 2 (2) to 3 (3), and this is all (done).
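The strictly left-to-right evaluation described above can be mimicked with Python sets. The sketch below is not the MOIL pick subroutine; the function evaluate_pick and the seven-particle toy system are hypothetical and serve only to show how |, & and != combine without brackets or operator precedence.

```python
def evaluate_pick(selections, operators):
    """Fold selections left to right; no brackets and no operator precedence."""
    result = set(selections[0])
    for op, sel in zip(operators, selections[1:]):
        if op == "|":
            result |= sel      # union
        elif op == "&":
            result &= sel      # keep the common part
        elif op == "!=":
            result -= sel      # remove members of sel
        else:
            raise ValueError(f"unknown logical expression: {op}")
    return result


# Hypothetical toy system: particle index -> (monomer index, monomer name, particle name)
atoms = {
    1: (1, "GLY", "N"), 2: (1, "GLY", "CA"),
    3: (2, "ALA", "N"), 4: (2, "ALA", "CA"),
    5: (4, "ALA", "N"), 6: (4, "ALA", "CA"),
    7: (5, "SER", "CA"),
}
prt_1_2 = {i for i in atoms if 1 <= i <= 2}                   # like "#prt 1 2"
mono_ala = {i for i, a in atoms.items() if a[1] == "ALA"}     # like "chem mono ALA"
prtc_ca = {i for i, a in atoms.items() if a[2] == "CA"}       # like "chem prtc CA"
mon_2_3 = {i for i, a in atoms.items() if 2 <= a[0] <= 3}     # like "#mon 2 3"

picked = evaluate_pick([prt_1_2, mono_ala, prtc_ca, mon_2_3], ["|", "&", "!="])
print(sorted(picked))
```

In this toy system the CA of the serine (particle 7) is not selected, because it is neither in the particle range nor part of an alanine, and the CA of the alanine in monomer 2 is removed by the final != step.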
The net result is the selection of CA of all ALAnine residues, as well as CA found in the range of particles from 1 to 60 (whether from ALA or not), but excluding any in the range of residues 2 to 3.

The pick line is of the format

    pick A lexp B lexp C lexp ... done

where A-C are selection commands and lexp are logical expressions.

Available keywords:

    pick - beginning of a pick line. If not found (remember the double “pick”), the subroutine returns with the default selection (all particles selected)
    done - denotes end of selection, time to return the selection made

Logical expressions:

    A | B - standing for A U B (A union B)
    A & B - standing for A /\ B (common parts of A and B are kept)
    A != B - only parts of A that are not in B are kept

Selection commands:

    #prt - defines a range of numerical indices of particles to be picked, e.g. #prt 1 6
    #mon - defines a range of numerical indices of monomers to be picked, e.g. #mon 1 3
    chem - indicates that "chemical" notation will be used
        prtc - select by chemical name of particle
        mono - select by chemical name of monomer
        examples: "chem prtc CH3E", "chem mono HEME"
    grou - a selection of groups will be made, i.e. not only zero/one will be assigned but also group numbers. For example, "pick grou 2 #prt 1 4 done" picks particles 1 to 4 as belonging to group 2

8.5 Comment lines

A line that starts with the character “~” is a comment and is not interpreted. Example

    ~ This is a comment line and here A=5 will not assign 5 to A

8.6 Line continuation

Continuation lines can be used (up to 300 characters per line) by ending the line to be continued with a space and a dash, “ -”. Example

    pick pick #prt 1 60 | chem mono ALA & chem prtc -
    CA != #mon 2 3 done

is equivalent to

    pick pick #prt 1 60 | chem mono ALA & chem prtc CA != #mon 2 3 done

8.7 The “action” and “*EOD” commands

Most of the line commands of MOIL are interpreted within a “read-line” loop in which variables are read, assigned and stored and no (numerical) action takes place.
The action command (“acti”) is a single keyword on a line that instructs the program to leave the main reading loop and start executing instructions. In most cases it is the last input command to a program. However, in some cases some operations are needed (after exiting the main read loop) before a decision about new variables can be made (e.g. files that are read only based on information from the primary input, assignment of LES particles, etc.). Hence, in some cases one finds additional input lines after the “action” was initiated.

The *EOD keyword also appears on a separate line, at the end of the input file. It denotes the End Of Data that the user provided. It is a useful signal for the program since it indicates that no more input is expected. If something is missing, the program has a chance of exiting gracefully instead of attempting another read and crashing (ungracefully) while attempting to read beyond the end of the file.

9 Brief description of MOIL modules

9.1 Special characters in description of programs:

• * Novel scientific features developed by the MOIL team
• + run in parallel
• & available via moil.tcl menu-based graphic interface

Special characters used in detailed explanation of input to programs. The special character must be replaced by the appropriate variable when running the program.

• * -- a wild card, typically replaces the name of a molecule in a file name
• [i] -- an integer number
• [r] -- a real value number
• [l] -- a logical variable. If found, set to true
• [d] -- a double precision number (e.g. 1.d1 which means 10 in double precision)
• ([c]) -- a character variable (must be enclosed in (…) )

9.2 Major programs:

conn& – generate a connectivity file. The connectivity file stores all information necessary for an energy-based calculation.

chmin*& – compute approximate reaction coordinates following the SPW algorithm of Czerminski and Elber (Czerminski and Elber 1990).
Input files: connectivity and coordinates of the end points, or a prior path. Output: a sequence of structures along the reaction coordinate.

dyna& – compute a molecular dynamics trajectory. Inputs: a coordinate file and a connectivity file. dyna produces a sequence of coordinate (and velocity) vectors in unformatted files.

energy& – a single energy evaluation of a system with known coordinate and connectivity files. Output is in a *.wene file.

fp* – first passage time calculation (West, Elber et al. 2007). Sample trajectories at Milestones and simulate trajectories between Milestones. Compute kinetics and thermodynamics along a reaction path. Input: connectivity and reaction path. Output: distribution of first passage times.

freee – compute a free energy profile by free energy perturbation and/or thermodynamic integration along a reaction coordinate.

mfep – compute a minimum free energy path by the string method [Eric}

mini_tn& and mini_pwl& – truncated Newton-Raphson and Powell conjugate gradient minimizations. Input: coordinate and connectivity files. Output: energy listing and a file of minimized coordinates.

sdpS* and sdp*+ – compute the steepest descent path following the scalar work algorithm of Olender and Elber (Olender and Elber 1997). Input files: connectivity and coordinates of the end points, or a prior path. Output: a sequence of structures along the reaction coordinate.

puth& – add hydrogens to a Protein Data Bank file and convert to a format accessible to MOIL calculations (same as CRD file in CHARMM). Inputs: PDB file and connectivity file. Output: a CRD file with the additional hydrogens.

scndrv – compute second derivatives of the potential. Useful for normal mode calculations and highly refined optimization. Inputs: coordinate and connectivity files. Output: the second derivative matrix. A variant is used as a subroutine.

sdelS* and sdel*+ – large time step trajectories computed with an optimization of an action (Elber, Ghosh et al.
2002; Ghosh, Elber et al. 2002; Cardenas and Elber 2003; Elber and Cardenas 2004). Inputs: connectivity and end point coordinate files (or a prior path file).

therm – compute free energy difference by thermodynamic integration varying Hamiltonian parameters. Input: two connectivity files and a coordinate file.

umbr – compute the potential of mean force along a reaction coordinate using umbrella sampling. Inputs: connectivity and reaction coordinate file.

9.3 Some noted options

dynapress – run dyna with pressure monitoring.

FREADY* – run dynamics and energy minimization with a coarse grained potential for proteins (Fold REcognition And Dynamics, (Majek and Elber 2009)).

LES*& – Locally Enhanced Sampling (Elber and Karplus 1990; Roitberg and Elber 1991; Simmerling and Elber 1994). Allows for enhancing the sampling of a small part in a large system (a ligand in a protein (Elber and Karplus 1990; Czerminski and Elber 1991; Gibson, Regan et al. 1992), a side chain in a protein (Roitberg and Elber 1991), a solvated peptide (Simmerling and Elber 1994; Mohanty, Elber et al. 1997)). Modifies the coordinate and connectivity files. Used in conn, energy, mini_tn, mini_pwl and dyna.

dynapt+ – replica exchange simulations for better equilibrium sampling (Kirmizialtin and Elber, submitted).

PME& – sum long-range electrostatic forces with Particle Mesh Ewald (we use the Darden code (Darden, York et al. 1993)).

LD – run Langevin dynamics.

con_specl& – prepare dynamics with electronic surface crossing using the Landau-Zener model. con_specl generates a second energy surface to allow surface crossing (Li, Elber et al. 1993).
MUTA – compute free energy differences of mutants.

9.4 Utilities

addion& – add ions to a solvation box to make the system neutral (required for Ewald calculations)

boat& – compute BOnd Angle and Torsion for a structure (check and list all the internal coordinates)

ccrd& – convert coordinates between different formats (CRD/DCD/PTH)

crd2pdb& – convert a crd file (CHARMM coordinate file format) to PDB format.

memeqns – use local first passage time distributions (output of fp) to compute the overall first passage time.

numerical and test_drv – compute derivatives of the potential numerically (by finite difference). test_drv tests first derivatives, numerical tests second derivatives. Useful for testing the eforce and scndrv analytical code.

ovrlp_trj& – overlap structures of a dynamics file (*dcd) with respect to a reference structure.

path_eqw – take a reaction path (a set of structures in a single *pth file) in vacuum, and convert it to a solvated path. Also generates a poly file that is used to create a new connectivity file with the new water molecules (TIP3P) included.

pdb2puth – edit a pdb file to include terminal monomers and change atom/monomer names. This program is deprecated. A more complete “processing” of a PDB file to produce MOIL input files (.crd, .wcon, etc) is possible via the moil.tcl “Process PDB” menu item; see get_started.pdf for a description of using the moil.tcl interface to interactively process a PDB file in this manner. Alternatively, moil.tcl may be used as a non-interactive command-line tool to process a PDB file with the syntax:

    moil.tcl procpdb -drop <name of pdb-file>

This syntax may be used with two options:

    -drop: unknown monomers will be dropped without prompting
    -nodrop: unknown monomers will halt processing of the PDB, without prompting

If neither option is given, it is assumed the script is being run on an interactive workstation and a dialog box will ask the user how to handle the unknown monomer.
reconstruct – code to reconstruct an atomically detailed model from a coarse grained model.

solvatecrd& – water box solvation of vacuum structures.

9.5 Analysis programs

av_dif – compute the average diffusion coefficient from path or dynamics files.

contacts& – follow contacts for a selected subset of atoms along a trajectory

eff_diffdens& – compute diffusion constants and densities for water molecules on a spatial grid.

fluc& – compare (root mean square distance) rms with respect to a reference structure and to the time averaged structure from a Molecular Dynamics trajectory.

rgyr& – compute the radius of gyration for a sequence of structures from a Molecular Dynamics trajectory.

rms_2crd rms_2path rms_p2p – all kinds of root mean square distance calculations. 2crd compares structures in two crd files. 2path and rms_p2p analyze two separate path files.

rms_resd& – overlap structures in Molecular Dynamics and compute the average rms for each residue (to compare with B factors, for example).

str_measures – provides shape descriptors that relate to the three eigenvalues of the tensor of inertia. These descriptors were developed by (Honeycutt and Thirumalai 1989).

superTMscore – compute the TM score for structural alignment of two 3D objects. A measure invented and programmed by (Zhang and Skolnick 2005) that we use with small adjustments to the MOIL format.

superback, superrms – variants of the rms program that provide additional options.

tmalign – the structural alignment program from the Jeff Skolnick group (Zhang and Skolnick 2005) that we use with a minor modification to allow forcing a trivial alignment.

torstat – extracts statistics for backbone torsions from a dynamics or path file.

xangle& xcrd& xtors& – extract a specific angle, coordinate or torsion from a dynamics or path file.

9.6 The moil.tcl, cmoil, and zmoil programs

This document focuses on the use of the keyboard to prepare input to the molecular modeling suite of programs MOIL.
moil.tcl is a menu-based graphical interface written in tcl/tk, available on Windows, OS/X, and Linux, that facilitates the generation of many types of input files. It is a convenient choice for processing a protein data bank (PDB) file into the different files (coordinate and connectivity) required by other MOIL programs. Zmoil is the OpenGL-based molecular visualization component of MOIL and has separate documentation. It is the successor to the program cmoil, which, while still included in the MOIL distribution, is no longer actively developed.

9.7 Parameters for energy and force calculations - EPROP

This set of parameters appears in numerous MOIL modules. We collect these parameters in a single section below and refer to this paragraph using the name EPROP in the descriptions of the other programs. Default values are in curly brackets {…}. Note that not all options listed below are available in all programs.

Parameters for curve crossing and Landau-Zener model for electronic curve crossing (Li, Elber et al. 1993).

mors – a flag stating that the present line defines a Morse bond [l] {false} – initiates Morse bond parameters. The connectivity file must include an entry about the number of Morse bonds. Morse bonds are added to the connectivity file via the addb file. The mors line specifies the energy parameters (here for 4 Morse bonds), all double precision:

    mors alph=2.0 Dmor=30. alph=2. Dmor=30. alph=2. Dmor=30. alph=2. Dmor=30.

The equilibrium distance for the above bonds is read when a Morse bond is declared in the addb file.

Dmor – the dissociation energy of a Morse bond (kcal/mol) [d]{30.0d0}

alph – range parameter for a Morse bond. Used in dyna in simulations that use Morse energy [d]{1.0}

spec – a flag for switching between different electronic energy surfaces

rcut – It is the range distance employed in the switching function between different forms of the heme.
[d]{5.0}

lmda – range parameter for the adiabatic potential flip between two Born Oppenheimer surfaces (part of the Landau Zener model) [d]{3.0}

repl – [l]{false} – introduces exponential repulsion terms between two atoms (Morse and exponential repulsion represent the two electronic energy surfaces for simulating crossing). The repl functional form is $A e^{-\beta r} + B$. The number of repulsions is typically the same as the number of Morse bonds and it is read from the repl line (example with 3 repulsions), all double precision:

    repl Arep=80. beta=1. Brep=4. Arep=80. beta=1.0 Brep=4. Arep=80. beta=1. Brep=4.

Arep – the excited curve pre-exponential factor, which is used in the curve crossing calculation [d]{100.0}

Brep – the asymptotic value of the repulsion curve at large distances, i.e. the potential looks like V(r)=Arep*exp(-beta*r)+Brep. It is necessary for curve crossing (influences the crossing point). [d]{100.0}

beta – range parameter for the exponential repulsion bond [d]{1.0}

cent – [l]{false} – restrains the geometric center of a selected set of atoms to a fixed point. The following parameters are optional (must be in the same line): kcnt, which is the force constant; xeqm, yeqm, zeqm, which are the coordinates of the fixed point; and a “pick” command to select the subset of restrained atoms, e.g.,

    cent kcnt=[d]{10.d0} xeqm=[d]{0.} yeqm=[d]{0.} zeqm=[d]{0} pick… done

rvmx – [d] {6.d0} cutoff for Van der Waals interactions. Note that the default is very low.

rvbg – [d] {-1.d0} a second (larger) cutoff for Van der Waals interactions for buffer computations. The default is negative because, unless explicitly assigned, it is computed as rvmx+2.

relx – [d] {8.d0} cutoff for electrostatic calculations. Note that relx MUST be larger than rvmx.

rebg – [d] {-1.d0} a second (larger) cutoff for electrostatic interactions. The default is negative because, unless given explicitly, it is computed as relx+2.
cutm – [d] {-1.d0} cutoff for monomer-monomer distance (used in the intermediate calculation of the non-bonded list). If not explicitly given, computed as relx*1.2.

rmax – [d] {-1.d0}. A single cutoff for all non-bonded interactions. Used to indicate no cutoff, i.e. rmax=9999. No longer used to indicate an actual cutoff; kept for consistency with past input.

gbsa – [l] {false} turn on Generalized Born Surface Area calculations (Tsui and Case, 2000).

gbo1 – one of the options for the gbsa calculation

gbo2 – a second option for the gbsa calculation

npol – the surface area non-polar component of gbsa

surften or sten – [d] {0.005d0} surface tension coefficient.

gbsu – [i] {0} frequency of updating the gbsa neighbor list.

epsi – [d] {1.d0} dielectric constant. Most applications do not use it and its impact is precomputed into the connectivity file.

hscl – [d] {1.d0} scale for the hydrophobic potential between Cbeta atoms (old potential, no longer used).

nobo noan noto noim novd noel nohy – different logical variables ([l]{false} each) turning off different energy terms: nobo is no bonds, noan is no angles, noto is no torsions, noim is no improper torsions, novd is no Van der Waals, noel is no electrostatics, nohy is no hydrophobicity phenomenological term.

hvdw – [l]{false} set a finite van der Waals radius for hydrogen atoms (usually zero in OPLS). Helps to avoid numerical instabilities in high temperature simulations or when the initial structure is highly distorted.

cnst – [l]{false} turn on constraint energy

symm – [l]{false} symmetry operations. Periodic boundary conditions for rectangular boxes. A definition of the box size, xtra=[d]{0.d0} ytra=[d]{0.d0} ztra=[d]{0.d0}, must come in the same line.

ewald – [l]{false} apply the Particle Mesh Ewald sum.
The following parameters are optional and should come in the same line as the ewald declaration: error tolerance dtol=[d]{0.d0}; grid in x,y,z (typically 32) grdx=[i]{0} grdy=[i]{0} grdz=[i]{0}; more scaling parameters sgdx=[d]{1.d0} sgdy=[d]{1.d0} sgdz=[d]{1.d0}

vprt – [l]{false} virtual particles are present to better model the charge distribution. Examples are TIP4P (not currently supported in MOIL) and carbon monoxide. The keyword gcnt[l]{false} in the same line means that the geometric center will be used to determine the coordinate of the charge.

amid – add a harmonic constraint on the torsions of the amide planes to ensure the trans configuration. kamd – value of the force constant [d]{100.d0}.

ball – [l]{false} to indicate a restraining spherical potential on water molecules. In the same line the optional variables are: (i) the force constant for the harmonic restraint that keeps the water in the spherical boundary, fbal=[d]{0.d0}; (ii) the radius of the sphere, rball=[d]{0.d0}; (iii) the origin of the sphere, xbal=[d]{0.d0} ybal=[d]{0.d0} zbal=[d]{0.d0}

metl – metal boundary condition for a box. It includes a repulsion A/(y-y_start)^6 along the Y axis, a voltage term, and image charges. The exact position of the interface is bwal. Other parameters in the same line are amtl=[d]{50.d0}; bwal=[d]{ytra}; the voltage v_el=[d]{0.d0}.

10 MOIL in depth

10.1 The major programs

For more details on file formats see section 11 Moil files

10.1.1 conn

Purpose: Generate a connectivity file. This file is a necessary input for all programs that use energy or force calls. Also used to obtain the internal coordinates in general. In the moil.tcl menu-based interface to Zmoil the connectivity file is used to extract the atom and bond identifiers.

Use : conn < input > output

Input file types: Required

prop – define atom names, atom charge, Lennard Jones parameters, bond length and equilibrium distance, angle, torsion, improper torsion parameters.
Prepared versions of this file exist in the directory moil.mop with a large list of predefined monomers (all amino acids, nucleotides, water, ions, etc.). The most widely used file is ALL.PROP. This version should be used unless there is a need for energy terms not defined in the existing file.

mono – define monomer names, the atoms that belong to the monomers, the monomers that are connected to them and the bonds between atoms. Definitions of atom charges and surface areas are optional in the monomer file. If defined in the mono file, they overwrite the properties listed in the prop file. Pre-prepared versions of this file exist in moil.mop. The one used most frequently is ALL.MONO. This version should be used unless there is a need to build a new monomer.

poly – the file with the sequence (list of monomers) that we wish to study. It is provided by the user and is (obviously) specific to the molecule we wish to study. This file can be created using the GUI.

Input file types: optional

ubon – list of bonds to add to the default generation determined by the mono file. For example, adding S-S bonds or an iron ligand bond is done here. An option in MOIL is to add a Morse bond $D\left[e^{-2\alpha(r-r_0)} - 2e^{-\alpha(r-r_0)}\right]$, where $r_0$ is the minimum energy position. A typical line adding a Morse bond between atom 1299 and atom 1346 is

    mors atm1=1299 atm2=1346 requ=1.743

uedit – changes to the bond structure determined by the mono file. For example, to remove some of the HEME angles the uedit file would look as follows:

    remo angl chem HEM1 157 NA HEM1 157 FE HEM1 157 NC
    remo angl chem HEM1 157 NB HEM1 157 FE HEM1 157 ND
    *EOD

Output file types: required

wcon – the (written) connectivity file that includes information on the molecular topology and the parameters that are required for energy calculation.

Output file types: optional

wco2 – a second connectivity file for the same molecule. Useful for curve crossing calculation.
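As a numerical illustration of the Morse bond form quoted for the ubon file, the following Python sketch evaluates the energy at a few distances. The function name morse_energy is hypothetical; the default values are taken from the EPROP defaults for Dmor and alph and from the requ value in the example line, and are used here only as sample numbers.

```python
import math


def morse_energy(r, d_mor=30.0, alph=1.0, r0=1.743):
    """Morse bond energy D*[exp(-2a(r-r0)) - 2*exp(-a(r-r0))].

    d_mor: dissociation energy (kcal/mol), alph: range parameter,
    r0: minimum energy position. Values are illustrative only.
    """
    x = math.exp(-alph * (r - r0))
    return d_mor * (x * x - 2.0 * x)


for r in (1.5, 1.743, 2.5, 10.0):
    print(f"r = {r:6.3f}  E = {morse_energy(r):9.4f}")
```

With this form the energy equals -D at r = r0 and approaches zero at large separation, so the zero of energy corresponds to the dissociated bond.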
Variables: (in square brackets - type, curly bracket default)

mshk – turn on SHAKE for water molecules. Do not initiate flexible bonds for water molecules. [l] {false}

prll – initiate parallel run (starting from Moil11 this parameter is obsolete) [l] {false}

debu – print a lot of debugging information [l] {false}

hydr – turn on hydrophobic potential of Sippl [reference] [l] {false}

arit – change the combination rule for LJ sigma (from $\sigma_i \sigma_j$ to $0.5(\sigma_i + \sigma_j)$), allowing for an easy flip between force fields such as AMBER and OPLS. [l] {false}

mdiv – allows refinement of the nonbonded list. The generation is based (to begin with) on monomers. If the monomers are large, inaccuracies may occur since the distance between the centers of mass of the monomers will not reflect atomic distances near the edges. mdiv makes it possible to use fragments of monomers (defined in the ALL.MONO file) in the calculation of the list. The fragments are defined in the monomer file. An example in which mdiv is used is the definition of phospholipids. [l] {true}

nomd – explicitly turn off mdiv [l] {false}

hvdw – add a small Van der Waals radius to hydrogen to avoid Coulombic explosion [l] {false}

muta – prepares the program to read which particles are involved in a mutation, as stated with the command MUTA.

acti – final read; signals the program to stop reading and start executing.

Post action instructions

Optional:

The LES command multiplies a subset of particles selected in a pick command:

    MULT(iply) pick pick_expression done #cpy=[i]{1}

MUTA establishes a free energy calculation interpolating between two chemical species with potentials $U_1$ and $U_2$. For the interpolation we use $U(\lambda) = (1-\lambda)U_1 + \lambda U_2$, where the interpolation parameter $\lambda \in [0,1]$. A selection defines the first species as group 1 and the second species as group 2, following the syntax below:

    MUTA pick grou 1 #prt 7 9 | grou 2 #prt 15 15 done

Particles 7 to 9 form group 1 (species 1) and particle 15 forms group 2 (species 2).
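The linear interpolation used by MUTA can be written out in a few lines of Python. This is only a sketch of the formula above, not of how MOIL evaluates the two potentials internally; the function name interpolated_energy and the sample energies are hypothetical.

```python
def interpolated_energy(u1, u2, lam):
    """Return U(lambda) = (1 - lambda) * U1 + lambda * U2 for lambda in [0, 1]."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("the interpolation parameter must lie in [0, 1]")
    return (1.0 - lam) * u1 + lam * u2


# Hypothetical energies of the two species (kcal/mol)
u_species_1, u_species_2 = -10.0, -2.0
for lam in (0.0, 0.25, 1.0):
    print(lam, interpolated_energy(u_species_1, u_species_2, lam))
```

At lambda = 0 the energy is that of species 1 alone and at lambda = 1 that of species 2 alone; intermediate values mix the two linearly.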
Use with FREADY

In order to prepare a connectivity file for FREADY (a coarse grained model of proteins), extend all names of amino acid residues in the poly file with the fourth letter Z (e.g. ALA => ALAZ). Remove the NTER monomers and change all CTER monomers to CGTR monomers. This will work with monomers/properties defined in ALL.MONO and ALL.PROP from the moil.mop distribution directory.

Sample conn input (running multiple LES copies of CO in myoglobin)

File name : mb10co_conn.inp

    file mono name=(../../moil.mop/ALL.MONO) read
    file prop name=(../../moil.mop/ALL.PROP) read
    file poly name=(mb10co.poly) read
    file ubon name=(mb10co.addb) read
    file uedi name=(mb10co.edit) read
    file wcon name=(mb10co.wcon) wovr
    mdiv
    action
    MULT pick chem mono CO done #cpy=10
    *EOD

Sample user supplied input files

File name : mb10co.poly

    MOLC=(MYOG) #mon=158
    NTER MET VAL LEU SER GLU GLY GLU TRP GLN LEU VAL LEU HIS VAL TRP ALA LYS VAL GLU ALA ASP VAL ALA GLY HIS GLY GLN ASP ILE LEU ILE ARG LEU PHE LYS SER HIS PRO GLU THR LEU GLU LYS PHE ASP ARG PHE LYS HIS LEU LYS THR GLU ALA GLU MET LYS ALA SER GLU ASP LEU LYS LYS HIS GLY VAL THR VAL LEU THR ALA LEU GLY ALA ILE LEU LYS LYS LYS GLY HIS HIS GLU ALA GLU LEU LYS PRO LEU ALA GLN SER HIS ALA THR LYS HIS LYS ILE PRO ILE LYS TYR LEU GLU PHE ILE SER GLU ALA ILE ILE HIS VAL LEU HIS SER ARG HIS PRO GLY ASN PHE GLY ALA ASP ALA GLN GLY ALA MET ASN LYS ALA LEU GLU LEU PHE ARG LYS ASP ILE ALA ALA LYS TYR LYS GLU LEU GLY TYR GLN GLY CTRG HEM1 CO
    *EOD

File name : mb10co.addb

    bond chem HIS 95 NE2 HEM1 157 FE
    *EOD

File name : mb10co.edit

    remo angl chem HEM1 157 NA HEM1 157 FE HEM1 157 NC
    remo angl chem HEM1 157 NB HEM1 157 FE HEM1 157 ND
    *EOD

10.1.2 puth

Purpose: Add hydrogens to a structure that includes only atoms significantly heavier than hydrogen (e.g. C, N, and O). Lack of hydrogens is typical in PDB structures that are determined by X-ray crystallography.
Use : puth < input > output

Input file types: Required

conn – a file *.wcon obtained by calling conn first. It contains molecular topology data and the parameters required for energy calculations.

rcrd – read coordinate file. Must include ctype=(pdb) in the line since the only coordinate file type supported for puth is PDB format.

Input file types: Optional

None

Output file types: Required

wcrd – written coordinate file with the hydrogens built in. Type must be CHARMM, ctyp=(CHARM)

Output file types: Optional

None

Variables

None

Post action parameters

None

Sample puth input

    file conn name=(ery.wcon) read
    file rcrd name=(ery.pdb) read ctyp=(pdb)
    file wcrd name=(ery.crd) wovr ctyp=(CHARM)
    action

10.1.3 energy

Purpose: Calculate and output the energy of one or a set of coordinates

Use : energy < input > output

Input file types: Required

conn – a file *.wcon obtained by calling conn first. It contains molecular topology data and the parameters required for energy calculations.

rcrd – the file where the Cartesian coordinates of all the particles are stored. The possible formats for this coordinate file are: CHARM – coordinates written in charmm format (default); DYNA – coordinates taken from a dynamics DYNA file; PATH – coordinates taken from a path format file (binary).

Output file types: Required

wene – the file in which the energy listings are summarized

Variables

debu – a flag for printing a lot of debugging information. Do not use unless you are a MOIL expert.

EPROP ARE AVAILABLE FOR energy CALCULATIONS.

cdie – constant dielectric (the default, and presently the only option available)

rdie – a flag indicating that good old Coulomb law is modified from 1/r to 1/r2 (not active)

shif – a flag indicating a different style of cutoff which brings the interaction energy to zero continuously (not active)

gcnt – used when virtual particles are present. If found, the Geometric CeNTer is used instead of the center of mass (default).
#str or #ste – number of structures if a dynamics coordinate or path file is used. [i]{1}

Sample input

    ~
    ~ energy calculation:
    ~
    file rcon name=(../conn-ala3/ala3.wcon) unit=10 read
    file rcrd name=(ala3_min.crd) unit=12 read
    file wene name=(ala3-gbsa.ene) unit=13 wovr
    rmax=9999. epsi=1. cdie gbsa
    action

The output file ala3-gbsa.ene has a standard MOIL structure

    Parameters for energy calculation
    Constant dielectric will be used.
    elec. Cutoff= 9999.00000 vdW cutoff 9999.00000
    GB polarized solvation energy required (Hawkins)
    ENERGIES:
    E total =  939.756
    E bond  =  968.827   E angl  =  163.960   E tors   =    1.916
    E impr  =   61.795   E vdw   =   -0.498   E elec   = -190.8
    E 14el  =  117.013   E 14vd  =   -0.194
    E cnst  =    0.000   E evsym =    0.000   E elsym  =    0.000
    E centr =    0.000   E hydro =    0.000
    E gbsa  = -182.283   E pol   = -182.283   E nonpol =    0.000
    Norm Force = 85.274
    Number of neighbours for short range int.   40
    Number of uncharged vdW interactions        25
    Number of elec. only interactions           51
    Number of wat-wat shrt. range neighbors      0
    Number of wat-wat long range neighbors       0

10.1.4 mini_pwl and mini_tn

Purpose: Minimize the energy of a given structure with respect to Cartesian coordinates. Coarse grained FREADY input is supported. See the FREADY documentation for details.

10.1.4.1 mini_pwl:

Purpose: Use the conjugate gradient algorithm with the Powell restart option to locally optimize the energy as a function of Cartesian coordinates.

Use : mini_pwl < input > output

Input file types: required

conn – a file *.wcon obtained by calling conn first. It contains molecular topology data and the parameters required for energy calculations.

rcrd – read coordinate file. The default type of coordinates is CHARM. The keyword cstyl can change it (see cstyl below) to PATH (pth) and DYNA (dcd) formats.

Input file types: optional

con1 & con2 – the possibility of multiple connectivity files that describe (each) a different Born Oppenheimer surface.
Output file types: required wcrd – write output coordinates in this file wmin – report progress of the minimization in this file. Output file types: optional wpth – write minimized coordinates in pth format in this file (replaces wcrd). Variables: (in square brackets –type, curly bracket default) EPROP ARE AVAILABLE FOR mini_pwl EXCEPT vprt. gcnt – the keyword gcnt in the same line with vprt means that the geometric center will be used to determine the coordinate of the charge [l]{false} Extra variables for minimization cpth –the input coordinate file is a pth formatted file read structure number istr=[i]{0} [l] {false} cdyn –the input coordinate file is a dcd (dyna) formatted file read structure number istr=[i]{0}[l] {false} DYNA – read and minimize a set of structures from a dcd (dyna) formatted file The range of sequential structures are from lpst=[i]{1} to lpen=[i]{-1} PATH – read and minimize a set of structures from a pth formatted file 27 The range of sequential structures are from lpst=[i]{1} to lpen=[i]{-1} wpth – write out coordinate file in path format tolf –tolerance of force. When tolf is reached during the minimization, terminate[d]{0.d0}. mistep –number of minimization steps. When mistep is reached, terminate. [i] {100} list or #lis – [i]{20} number of steps between updates of the non-bonded list [i]{100}. TORS – minimization is possible with torsional constraints. Note that the constraint’s flag must be set previously by the keyword cnst (shared by ENERGY). Torsion is specified with 4 atoms listed in the same line: atm1=[i]{0} atm2=[i]{0} atm3=[i]{0} atm4=[i]{0} then we specify a force constant kcns=[d]{0.d0} and angle (in degrees) cneq=[d]{-999.d0}. Another important option is loop[l]{false} which indicates that a loop on torsion values will be computed (useful in generation of adiabatic maps). 
The loop is specified with a start position strt=[d]{0.d0}, a stop position stop=[d]{0.d0} and a step size step=[d]{1.d0}

Sample input file
file rcon name=(valdip.wcon) read
file rcrd name=(a.crd) read
file wpth name=(admap.pth) binary wovr
file wmin name=(valdip_mini.out) wovr
rmax=9999. tolf=1.d-4 mistep=5000 list=500 cnst
TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100. loop strt=-180 stop=180 step=30
TORS atm1=4 atm2=6 atm3=10 atm4=12 kcns=100. loop strt=-180 stop=180 step=30
action
*EOD

10.1.4.2 mini_tn:

Purpose: Use the truncated Newton-Raphson algorithm to locally optimize the energy as a function of Cartesian coordinates. It is much slower than mini_pwl and has fewer options (e.g. no gbsa and no metl). It provides a very accurate minimum, which is important (for example) in calculations of normal modes.

Use: mini_tn < input > output

Input file types: required
conn – a *.wcon file obtained by running conn first. It contains the molecular topology data and the parameters required for energy calculations.
rcrd – the coordinate file to read. The default coordinate type is CHARM. The keyword cstyl can change it (see cstyl below) to the PATH (pth) and DYNA (dcd) formats.

Input file types: optional
None

Output file types: required
wcrd – write the output coordinates to this file
wmin – report the progress of the minimization in this file.

Output file types: optional
wpth – write the minimized coordinates in pth format to this file (replaces wcrd).

Variables
EPROP ARE AVAILABLE FOR mini_tn EXCEPT vprt.
gcnt – when the keyword gcnt appears on the same line as vprt, the geometric center is used to determine the coordinate of the charge [l]{false}

Extra variables for minimization
cpth – the input coordinate file is a pth formatted file; read structure number istr=[i]{0} [l]{false}
cdyn – the input coordinate file is a dcd (dyn) formatted file; read structure number istr=[i]{0} [l]{false}
DYNA – read and minimize a set of structures from a dcd (dyn) formatted file. The range of sequential structures is from lpst=[i]{1} to lpen=[i]{-1}
PATH – read and minimize a set of structures from a pth formatted file. The range of sequential structures is from lpst=[i]{1} to lpen=[i]{-1}
wpth – write out the coordinate file in path format
tolf – tolerance of the force. When tolf is reached during the minimization, terminate [d]{0.d0}
mistep – number of minimization steps. When mistep is reached, terminate. [i]{100}
list or #lis – [i]{20} number of steps between updates of the non-bonded list [i]{100}
TORS – minimization is possible with torsional constraints. A torsion is specified with 4 atoms listed on the same line: atm1=[i]{0} atm2=[i]{0} atm3=[i]{0} atm4=[i]{0}, followed by a force constant kcns=[d]{0.d0} and an angle (in degrees) cneq=[d]{-999.d0}. Another important option is loop [l]{false}, which indicates that a loop over torsion values will be computed (useful for generating adiabatic maps). The loop is specified with a start position strt=[d]{0.d0}, a stop position stop=[d]{0.d0} and a step size step=[d]{1.d0}

Sample input file
file conn name=(val.wcon) read
file rcrd name=(val.crd) read
file wcrd name=(valmin.crd) wovr
file wmin name=(valmin.out) wovr
mistep=100000 tolg=0.000001 rmax=9999.
action

10.1.5 dyna

Purpose: Integrate Newton's equations of motion using the "Velocity Verlet" algorithm (see for instance "Understanding molecular simulations: from algorithms to applications", D. Frenkel, B. Smit).
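
The Velocity Verlet scheme referred to above can be sketched in a few lines. This is a generic textbook version and not the dyna source code: the function name velocity_verlet, the list-based data layout and the unit-free time step are assumptions made for the illustration.

```python
def velocity_verlet(x, v, masses, force, dt, nsteps):
    """Propagate positions x and velocities v for nsteps of size dt.

    x, v and masses are lists with one entry per degree of freedom;
    force(x) returns the list of forces at positions x.
    """
    x, v = list(x), list(v)
    f = force(x)
    for _ in range(nsteps):
        # Half-step velocity update with the current forces.
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
        # Full-step position update with the half-step velocities.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # New forces at the updated positions.
        f = force(x)
        # Second half-step velocity update with the new forces.
        v = [vi + 0.5 * dt * fi / m for vi, fi, m in zip(v, f, masses)]
    return x, v
```

Only one force evaluation is needed per step, and positions and velocities are available at the same time point, which is convenient when coordinates and velocities are both written out during a run.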
This program takes as an input the coordinate file and the connectivity file and gives as an output the trajectory of the protein (which can be visualized using zmoil). Use : dyna < input > output Input file types: required conn – a file *.wcon obtained by calling conn first. It contains molecular topology data and the parameters required for energy calculations. rcrd – the file where the Cartesian coordinates of all the particles are stored. These are the initial conditions to solve the Newton’s equations. The possible formats for the coordinate file are: ctyp=(charm) – coordinates written in charmm format (default); ctyp=(path) – coordinates taken from a path format file (binary). Input file types: optional rtet - coordinates file for tethering particles to their initial coordinates during MD. Harmonic springs are attached to the position of the particles as defined in rtet. The coordinates are in charmm format (default). ucon1 & ucon2 – two special connectivity files to model two electronic energy curves and curve crossing with the Landau-Zener model. 30 rvel – file with initial velocities if you do not want to sample them randomly from the Boltzmann distribution. Velocities are written in charmm format. Output file types: required None Output file types: optional wcrd – a binary file where the Cartesian coordinates of the particles are stored. wvel – a binary file where the velocities of the particles are stored. rest – a file that stores the last step saved for the restart. rstr – a file where recent coordinates for restart are stored. Variables EPROP ARE AVAILABLE FOR dyna. eqms –turn on the logical variable eqms=true [l]{f}. All masses will be set to 10. The idea is to allow for more efficient equilibrium simulations. bigb – Set a spring constant for the bond between any two particles to the value newb. Bonds are selected with pick newb – new bond spring constant [d]{500.d0} wfly – For solvation shell simulations without spherical constraints. 
Check if water molecules fly away from a solvation shell (use to study molecules in vacuum with a solvation layer (Steinberg, Breuker et al. 2007)) and stop it from reaching infinity to avoid numerical problems. [l]{false} tstd – turns on the check on acceptable distances between atoms. If it is shorter than 1.5A prints a warning. No dynamics will be run. It is a single structure evaluation to find bad contacts. [l]{false} nfrz – select the particles that WILL NOT be frozen. In the same line as nfrz a "pick" command must follow cgsk – matrix shake in conjugate gradient mshk – a flag to indicate the use of special constraint protocol matrix shake for water molecules When added to the conn program, bond and angles are excluded from connecticity list of the water molecules mtol – is the tolerance in mshake [d]{1.d-7} shkl – turns on shaking of bonds with light particles ( m < 1.1) shkb – turns on shaking of all bonds shac – maximal error allowed for bond constrained (coordinates) [d]{1.d-7} shav – maximal error allowed for bond constrained (velocities) [d]{1.d-7} itsh – maximum number of allowed iterations for SHAKE convergence [i]{100} cgpt – maximum number of iterations in conjugate gradient in a matrix shake(Weinbach and Elber 2005) (not active) [i]{NA} cgvl – maximum number of iterations in conjugate gradient in a matrix shake (not active) [i]{NA} 31 nori – turns on the reorientation during the dynamics nosc – avoids scaling of temperature (done by default) orie – by default to do the reorientation to avoid rigid body motion, the overlapping of the structure with the initial structure is done using all atoms. 
If call orie it is possible to pick only some atoms TORS – turns on the constraining of some torsional angles example - TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100 atm1 atm2 atm3 atm4 are the constrained atoms [i]{0} kcns – amplitude for torsional constraint (the larger kcns, the stronger the constraint) [d]{0.d0} cneq – equilibrium angle expressed in degrees [d]{-999.d0} ( TORS keyword must come AFTER amid if amid is used) spec – option of switching between different energy surfaces example: spec lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0 lmda – is the range parameter for continuous potential shifts from in plane to out of plane configuration of the heme iron [d]{3.d0} rcut – is the range distance employed in the switching function between different forms of the heme [d]{5.d0} swit – this flag makes possible the passage between different energy curves in LandauZener calculations example: switch Rcro=3.53181 dRcr=0.05 Forc=5.59951 delt=0.287 Rcros – is the position at the crossing point [d]{3.d0} dRcr – the interval (in angstrom) of significant interaction between two curves that cross (i.e. the range in which a transition probability between the two electronic curves is evaluated) [d] {1.d0} Forc – the difference in the forces at the crossing point of the two electronic curves [d]{0.1d0} indicate the time interval during which curve crossing is felt [d]{0.1} nocut – Non bonded interactions are computed in full according to existing lists. No additional distance check is used in the energy routines. Important (and the default) in minimization that requires continuous and differential energy function (like conjugate gradient) shif – a flag indicating different style of cutoff which is no longer used. Ignore at present, may return sometime in the future nbfi – a flag to indicate that a soft, finite Van der Waals repulsion (Gaussian repulsion is employed). 
It is useful when initial structure is bad with many overlaps since it avoids hard core energy singularities. sym2 – A flag indicating that the box size is changing during the simulation and the final size is determined in the present line. The rate of change is linear in the simulation time from the value defined by the symm command and the values found at sym2 example - sym2 xtr2=26.8 ytr2=26.8 ztr2=26.8 xtr2, ytr2, ztr2 – are the final sizes of the box that is changing during the simulation [d]{0.d0} tthr – Selected atoms are harmonically linked to fix positions in space. A pick command is expected in the same line 32 frcc – Force constant for tether constraints (linking particles to specific position in space){d}[1.d0] mult – Multiple temperatures are present Picked temperatures are used for velocity scaling of subsets of the system. The default is that all particles belong to temperature 1 (have the same temperature). Useful in annealing in which only part of the system is bad, or in LES in which equipartition is violated and different scaling is used for enhanced and regular parts(Ulitsky and Elber 1993) #tmp – number of temperatures in the system (useful for LES simulation) . If larger than one more input is required to define different domains with different temperatures [i] {1} tmpi – initial temperature [d] {300} tmpf – final temperature [d] {300} For a number of tmpi and tmpf for #tmp>1 rand – a random number seed for (Boltzmann) velocity generator. [i]{1} step – time step in ps [d] {0.001d0} newv – how many steps you need before assigning new velocities.[i]{0} #rig – number of steps between two rigid rotations to remove rigid body motion.[1]{10} fmax – if >0 turns true sdyes, which means that you do steepest descend. 
Its value is the threshold above which you should perform steepest descend iterations [i]{-1} strt – for restart debu – print a lot of debugging information [l] {false} pdeb – print virial term of the pressure on standard output [l] {f} #ste – number of MD steps [i]{1} #equ – number of equilibration steps [i]{1} info – number of steps between writing information on standard output [i]{1} #crd – number of steps between writing coordinate sets [i]{0} (=0 means do not write coordinates) #vel – number of steps between writing velocities sets [i]{0} (=0 means do not write velocities) #lis – number of steps between regeneration of the non bonded list [i]{1} #scl –if temperature(s) deviates more than this value it is being rescaled [d]{0} (=0 means isokinetic ensemble) cont – Used to denote that this is a continuation of a dynamics run [l] {false} strt - is the starting step for dynamics[i]{1} rdie – DIstance dependent (linear) dielectric – no longer active [l]{false} FREADY Coarse grained model for protein interactions works with dyna (see FREADY documentation 9.2.3). Sample dyna input file conn name=(mb10co.wcon) read file rcrd name=(mb10comin.crd) read file wcrd name=(mb10co.dcd) bina wovr 33 file wvel name=(mb10co.dvd) bina wovr #ste=500 info=100 #equ=10 #crd=100 #vel=10 #lis=20 #scl=10 rand=-3451187 step=0.002 tmpi=10 tmpf=300 relx=11. rvmx=8. epsi=1. cdie shkb shac=1.d-6 shav=1.d-6 itsh=500 action 10.1.6 dyna_prl (parallel program, requires MPI) Purpose: The executable dyna_prl is a parallel version of the program dyna in which the calculations of the forces are done in parallel by splitting the interaction lists. It takes exactly the same input as dyna. The only difference is that the input for the simulation is always taken from a file named Input and the textual output is returned in a series of files called dyn_out_x.out, where x is the process number. Please consult the file dyn_out_0000.log for the most detailed output. 
Openmpi needs to be installed on the system in order to run dyna_prl. Use: mpirun –np number_of_processes dyna_prl 10.1.7 freee Purpose : Executable to perform free energy calculation by free energy perturbation. Use : freee < input > output Input file types: required conn – a file *.wcon obtained by calling conn first. It contains molecular topology data and the parameters required for energy calculations. rcrd – the file where the Cartesian coordinates of all the particles are stored. The only possible format for this coordinates file is: ctyp=(path) – coordinates taken from a path format file (binary). Output file types: optional wcrd – a binary file where the Cartesian coordinates of the particles are stored wslo –The slope of the reaction coordinate wfrc – The forces in direction of the reaction path direction to make it possible to compute force-force correlation function and approximate friction kernel wene – Energy output Variables (in square brackets – type, curly bracket default) EPROP ARE AVAILABLE FOR freee EXCEPT srften,gbsu, hscl, mors, repl, gbsa, gbo1, gbo2, npol, ball. debu – prints out on the debug file a lot of debugging information [l]{false} 34 rvrs –, compute the free energy from reactants to products AND vice versa [l]{false} nocut – Non bonded interactions are computed in full according to existing lists. No additional distance check is used in the energy routines. Important (and the default) in minimization that requires continuous and differential energy function (like conjugate gradient) [l]{true for minimization, false otherwise} cdie – Use constant dielectric (=1). [l]{true} rdie – Distance dependent (linear) dielectric (not active). [l]{false} nori – Global translation and rotation of the system are not eliminated [l]{false for vacuum simulations}. mshk - A special constraint protocol will be used for water (Matrix SHAKE)[l]{false} nfrz – select the particles that WILL NOT be frozen. 
In the same line as nfrz a "pick" command must follow toiy – A flag indicating that the Tensor Of Inertia and rotational contribution to the potential of mean force should be calculated [l]{false} selc – A pick command to select particles for which the potential of mean force/free energy difference is computed example: selc pick #prt 1 12 done ssbp – Spherical solvent boundary potential is used(Beglov and Roux 1994) [l]{false} nmul, diec, drdi, drca are parameters related with ssbp hvdw – use finite van der Waals radius for hydrogen atoms which are not water. Normally they are zero but this may cause stability problem since charges of opposite sign may overlap in space. [l]{true} temp – Desired temperature (the temperature is determined from the kinetic energy7) [d]{300.d0} grid – Number of grid points of the path [i]{200} #ste – number of sampling points for a given reaction coordinate value [i]{1} #equ – number of equilibration steps [i]{0} #sve – period of velocity scaling [i]{100} #tes – Number of steps between checks that the linear constraints are satisfied [i]{500} #wcr – Frequency of writing coordinates to a file (binary form) [i]{500} dt – step in pico-second for time integration [d]{0.001d0} bgin – Index of first path coordinates used to do free energy calculations [i]{1} fina – Index of last path coordinates used to do free energy calculations [i]{1} rand – a seed for random number generator to sample velocities. [i]{1} newv – Frequency of assigning new velocities.[i]{1000} #pri – number of steps between writes of simulation report [i]{1} list – period between updates of non bonded list [i]{1} Sample free_e input file conn name=(sypfdv.wcon) read file rcrd name=(sypfdv.pth) bina read file wcrd name=(sypfdv_fin.pth) bina wovr file wslo name=(sypfdv.slo) wovr file wfrc name=(sypfdv.frc) wovr file wene name=(sypfdv.ene) wovr 35 relx=999.0 rvmx=999.0 epsi=1. cdie v14f=8. e14f=2. 
#ste=10 #equ=2 #pri=1 #wcr=1000 #list=101 #tes=50 #sve=1 step=0.001 mshk mtol=1.d-12 rand=-30379267 bgin=1 fina=5 grid=5 nfrz pick chem mono TIP3 done selc pick #prt 1 61 done temp=300.0 ssbp nmul=15 diec=78.4 drdi=2.8 drca=2.6 pick chem prtc OH2 done hvdw action 10.1.8 therm Purpose : Executable to perform free energy differences by thermodynamic perturbation model Use : therm < input > output Input file types: Required conr – connectivity file for the reactants along the lambda free energy calculations. conp - connectivity file for the products along the lambda free energy calculations. rcrd - the initial (crd) coordinate file. Output file types: Optional wcrd – an unformatted file of output Cartesian coordinates. wvel - an unformatted output file of Cartesian velocities. Variables (in square brackets – type, curly bracket default) EPROP ARE AVAILABLE FOR therm EXCEPT srften, gbsu, hscl, mors, repl, cent, gbsa, gbo1, gbo2, npol, ball. nocut – Non bonded interactions are computed in full according to existing lists. No additional distance check is used in the energy routines. Important (and the default) in minimization that requires continuous and differential energy function (like conjugate gradient) [l]{true in minimizations} shif - A flag indicating different style of cutoff which is no longer used. Ignored at present, may return sometime in the future [l]{false} cdie – Constant dielectric (=1). [l]{true} rdie – Distance dependent (linear) dielectric. Not active. 
[l]{false} shkb – Shake all bonds [l]{false} cpth – Initial coordinates are read in PATH format [l]{false} cchr - Initial coordinates are read in CHARM format [l]{false} 36 temp – Desired temperature (the temperature is determined by kinetics) [d]{300.d0} #ste – number of sampling points for a given reaction coordinate [i]{1} #equ – number of equilibration steps [i]{1} #eqv – number of steps before scaling the temperature according to assign temperature [i]{10} #sve - A different keyword indicating period of velocity scaling that is used [i]{100} #pri – Frequency of reports on computation progress [i]{1} #tes - Number of steps between checks that the linear constraints are satisfied [i]{500} #wcr – Frequency of writing down coordinates to a file [i]{500} rand - a seed for a random number generator that samples velocities. Seed must be positive [i]{1} step - time step in ps [d] {0.001d0} newv – Frequency of assigning new velocities.[i]{0} list – period between updates of non bonded list [i]{1} firs – initial value of lambda point in thermodynamic integration [d]{0.d0} fina – final value of lambda point in thermodynamic integration [d]{1.d0} sted – change in lambda, the thermodynamic integration variable [d]{5.d-2} Sample therm input file conr name=(val1.wcon) read file conp name=(val2.wcon) read file rcrd name=(val.crd) read file wcrd name=(mut50-60.dcd) bina unit=12 wovr file wvel name=(mut50-60.dvd) bina unit=13 wovr #ste=1000 #equ=900 #pri=50 #wcr=1000 list=20 #tes=50 newv=1100 #eqv=1 #sve=1 rand=3451187 step=0.001 temp=300. relx=9.0 rvmx=6. epsi=1. cdie v14f=8. e14f=2. shkb firs=50.d-2 fina=60.d-2 sted=10.d-2 hvdw action 37 10.1.9 mfep Purpose : Program mfep ("minimum free energy path ") is a variant of the finite temperature string method of Eric Vanden-Eijnden (JCP 123, 134109, 2005). The selected subset of coordinates at k th milestone, {X k } are used to define hyperplane ϕ k and the unit vector normal to the plane as in the chmin program. 
The planes are then updated by calculating the average configuration on the plane over τ time steps (eq. 1). After averaging, the planes are gradually updated to the average configuration in Δ steps (eq. 2). The subscript indices are plane numbers, while the superscripts are iteration steps. To prevent instabilities due to noise while averaging (especially at the beginning of the calculations), the new plane $\varphi_k^{(n+1)}$ is smoothed by eq. 3, where s is the degree of smoothing. A linear reparametrization is also used to distribute the distances between the planes evenly; see Ref. (JCP 125, 024106, 2006) for more details.

$$\varphi_k^{(m+1)} = \frac{1}{\tau} \int_0^{\tau} X_k \, dt \tag{1}$$

$$\varphi_k^{(n+1)} = \varphi_k^{(n)} + \Delta \left( \varphi_k^{(m+1)} - \varphi_k^{(n)} \right) \tag{2}$$

$$\varphi_k^{(n+1)} = s \, \varphi_k^{(n+1)} + \frac{s}{2} \left( \varphi_{k+1}^{(n+1)} + \varphi_{k-1}^{(n+1)} \right) \tag{3}$$

mfep requires multiple processors that communicate by MPI. The input file for each point should be prepared and named in such a way that the first processor sees inp_0001, the second inp_0002, etc.

Use: mpirun -np number_of_processes mfep

Standard output for every processor is written to pth_00XX

Input file types: Required
conn – the wcon file has all the information on the molecular topology (what is bound to what) and the parameters required for energy calculations. The file must be produced with the program conn before launching fp.
mlst – the file in which the individual milestoning image along the reaction coordinate is stored. Must be in path format.
rcrd – the file in which the Cartesian coordinates of the initial run are stored. Must be in path (pth) format. It is the initial structure for sampling in the plane. It can be a copy of the mlst file.

Output file types: Required
upfr – a file to which the projected force along the reaction coordinate is written during sampling.
wcrd – a path formatted file in which the output coordinates are stored.
wpep – a file in which the average coordinates are stored
wene – a file to which the standard energy output is written during the simulation.
wmom– a file to write distance of the milestone from the initially started one. Variables nwav – number of time steps for averaging configurations for new plane update frep – number of steps used to update to the new milestone [i]{2000} smth – degree of smoothing of the new milestones (reaction path) [d]{0.1d0} orth – The current run is OEQ type[l]{true} ucrb – use crbm constraints to force the system to remain in the plane[l]{true}. temp – assigned temperature[d]{300.d0} ptmp – temperature of equilibration phase (if #equ not zero) [d]{100.d0} grid – the number of path structures minus 1 [i]{0} #wcr – period for writing output coordinates[i]{100} #nen – period for writing ostandart energies [i]{100} #upfr – period for writing the projected force [i]{100} #mom – period for writing the distance from the initial configuration nwpep – period for writing the trajectory on wpep [i]{100} selc – select the subset of particles to define the reaction coordinate. A pick command must follow the select : pick … done newv – re-sample velocities from the Boltzmann distribution. Should not be used frequently [i]{10000} #scl – frequency between attempted velocity scaling [i]{20} #equ – number of equilibration steps (standard MD with no constraints) [i]{0} #tes – frequency for testing constraints [i]{0} step – the size of the time step [d]{0.005d0} rand – random number to initiate velocities from the Maxwell distribution #mxstps – maximum number of steps. [i]{10000} cent – restraining the geometric center of a selected set of atoms to be at the origin. The following parameters are optional in the cent command in the same line: kcnt, which is the force constant; xeqm, yeqm ,zeqm which are the coordinates to restraint the geometric center of the selected subset of coordinates. [l] {false} cent kcnt=[d]{10.d0} xeqm=[d]{0.} yeqm=[d]{0.} zeqm=[d]{0} pick… done The last part is a pick command to select the subset of coordinates. 
Check the documentation for the pick command if unfamiliar with it.
tstd – turns on the check of acceptable distances between atoms. If a distance is shorter than 1.5 Å, a warning is printed [l]{false}
cgsk – matrix shake in conjugate gradient [l]{false}
mshk – a flag to indicate the use of the special constraint protocol matrix shake for water molecules. When added to the conn program, bonds and angles are excluded from the connectivity list of the water molecules, and therefore the same flag must be used in follow-up codes. [l]{false}
mtol – the tolerance in mshake [d]{1.d-7}
shkl – turns on shaking of bonds with light particles (m < 1.1)
shkb – turns on shaking of all bonds [l]{false}
nori – turns on the reorientation during the dynamics [l]{false}
nosc – avoids scaling of temperature (done by default) [l]{false}
orie – by default, to avoid rigid body motion for simulations in vacuum, the overlap of the structure with the initial structure is done using all atoms. If orie is called explicitly, it is possible to pick only some atoms [l]{false}
sym2 – a flag indicating that the box size is changing during the simulation and that the final size will be defined according to the values provided on the present line. The rate of change is based on linear interpolation between the value defined by the symm command and the values found at sym2
example - sym2 xtr2=26.8 ytr2=26.8 ztr2=26.8
xtr2, ytr2, ztr2 are the final sizes of the box that is changing during the simulation [d]{0.d0}
cdie – turns on the use of a constant dielectric constant (=1). Default and currently the only option [l]{true}

Sample output
The norm of the difference between the current milestone and the initial one, $d_k = \lVert \varphi_k^{n} - \varphi_k^{0} \rVert$, is written to each pth_out_XXXX.log file except for pth_out_0001.log, where $\sum_k d_k$ is reported instead. The third and fourth columns of the pth_out_XXXX.log files are the norm of the vector between milestones k and k-1, $l_{k,k-1} = \lVert \varphi_k - \varphi_{k-1} \rVert$, and the norm of the vector between the k-1 and k+1 milestones.
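
The three distances described above can be illustrated with a short sketch. It assumes that each milestone is available as a flat list of the selected Cartesian coordinates; the function names norm_diff and milestone_distances and the 0-based index k are hypothetical and are not part of mfep.

```python
import math


def norm_diff(a, b):
    """Euclidean norm of the difference between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def milestone_distances(current, initial, k):
    """Distances for an interior milestone k (0-based index).

    current and initial are lists of milestones; each milestone is a
    flat list of coordinates. Returns a tuple with
      d_k    : distance of milestone k from its initial position,
      l_prev : distance between milestones k-1 and k,
      l_span : distance between milestones k-1 and k+1.
    """
    d_k = norm_diff(current[k], initial[k])
    l_prev = norm_diff(current[k], current[k - 1])
    l_span = norm_diff(current[k + 1], current[k - 1])
    return d_k, l_prev, l_span
```

Under these assumptions, the value reported in place of d_k for the first file would be the sum of d_k over all milestones.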
more pth_out_0002.out
…
-------- PME - setting up --------
ewald_cof: 0.277516754261821
ewald coeff. = 0.277517 actuall cutoff = 8.50
contrib. to direct sum on cutoff sphere= 0.1000E-03
desired direct space cutoff for "exact" = 21.62
total HEAP storage needed for PME = 1005
total STACK storage needed for PME= 568386
Grid dimensions in PME: x = 64 y = 64 z = 64 ; intrpord = 3
-------- PME - end of setting up --------
RANLUX DEFAULT INITIALIZATION: 314159265
RANLUX DEFAULT LUXURY LEVEL = 3 p = 223
Time    init mlst    (i-1)-i    (i-1)-(i+1)
-----   ---------    -------    -----------
100     0.00000      0.02549    0.03790
200     0.00000      0.02549    0.03790
300     0.02226      0.01062    0.02120
400     0.02498      0.00996    0.01989
…..

Sample input file
file conn name=(ala.wcon) read
file mlst name=(ala_1.pth) binary read
file rcrd name=(ala_1.pth) binary read
file wene name=(ala_1.wene) wovr
file wpep name=(pep_1.pth) binary wovr
file wcrd name=(crd_1.pth) binary wovr
file wmom name=(ala_1.mom) wovr
orth ucrb smth=0.1 frep=1000 nwav=20000
ptemp=100 temp=300 grid=7 step=0.0005
#nen=500 #mom=100 #wcr=100
#equ=200 #mxs=50000 #tes=2000 #scl=40 newv=8000
mshk mtol=0.000001
symm xtra=20 ytra=20 ztra=20
ewald dtol=1e-06 grdx=16 grdy=16 grdz=16
selc pick #prt 2 2 | #prt 4 4 | #prt 6 6 | #prt 8 8 | #prt 10 10 done
relx=8.5 rvmx=8 list=20 epsi=1 cdie hvdw mdiv
action

9.1.10 umbr

Purpose: Executable to perform an umbrella sampling calculation.

Use: umbr < input > output

Input file types: Required
conn – a *.wcon file obtained by running conn first. It contains the molecular topology data and the parameters required for energy calculations.
rcrd – initial Cartesian coordinates in PATH (pth) format.

Output file types: Optional
quni – the sampled value of the umbrella coordinate
wcrd – unformatted Cartesian coordinates of the sampled configurations (PATH format).
wvel – unformatted file of velocities.
Variables: (in square brackets – type, curly bracket default) EPROP ARE AVAILABLE FOR umbr EXCEPT surften, hscl, hvdw, cnst, symm, Mors, repl, cent, gbo1, gbo2, npol nocut – Non bonded interactions are computed in full according to existing lists. No additional distance check is used in the energy routines. Important (and the default) in minimization that requires continuous and differential energy function (like conjugate gradient) [l]{true for minimization, false otherwise} cdie – constant dielectric (=1). [l]{false} rdie – distance dependent (linear) dielectric (not active). [l]{false} temp – desired temperature (the temperature is determined by the average kinetic energy) [d]{300.d0} grid – number of grid points in the path [i]{200} istp – the size of the step between sequential points along umbrella path [i]{1} strt – It is the starting value of the thermodynamic variable along which we integrate in the computations of the potential of mean force [i]{1} finl – the index of the final coordinate set in a collection of structures along the reaction coordinate[i] {-1} #ste – number of sampling points for a given reaction coordinate [i]{1} #equ – number of equilibration steps [i]{1} #sve – frequency of velocity scaling [i]{100} npri – frequency of writing a summary of computation progress [i]{1} #tes – Number of steps between checks that the linear constraints are satisfied [i]{500} #wcr – frequency of writing down coordinate sets to a file [i]{500} rvrs – if true, compute the free energy backwards [l]{false} effm – Compute the effective mass of the reaction coordinate [l]{false} forc – is the difference in the forces at the crossing point of the two electronic curves [d]{0.1d0} rand – a seed for a random number generator for sampling velocities (must be positive). [i]{1}. 
step – time step in ps [d]{0.001d0}
newv – frequency of assigning new velocities. [i]{0}
list – period between updates of the non-bonded list [i]{1}

Sample umbrella input (a conformational transition in valine dipeptide)
file conn name=(val.wcon) read
file quni name=(q.data) wovr
file rcrd name=(valmin.dcd) bina wovr
file wcrd name=(valumb.dcd) bina wovr
file wvel name=(valumb.dvd) bina wovr
#ste=2000 #equ=2 #pri=10 #wcr=5 list=20 #sve=4000
rand=-3451187 step=0.001 grid=1 #str=2 forc=5.0d0 rvrs=-1 temp=300.
rmax=9. epsi=1. cdie v14f=8. e14f=2. hvdw
action

9.1.11 sdel (parallel version, requires MPI) / sdelS (serial version)

Purpose: This module searches for a trajectory between two specified structures by action minimization. Given the two end structures $x_i$, $x_f$, the target function minimized by sdel is

$$T = \sum_{j=2}^{N-1} \left( \frac{\partial S}{\partial x_j} \right)^2 + C,$$

where S is the functional

$$S\left[\{x_j\}\right] = \sum_{j=1}^{N-1} \frac{1}{2} \left( \sqrt{2\left(E - U(x_j)\right)} + \sqrt{2\left(E - U(x_{j+1})\right)} \right) \Delta l_{j,j+1}, \qquad \Delta l_{j,j+1} = \left\lVert M^{1/2} x_j - M^{1/2} x_{j+1} \right\rVert \tag{1}$$

and C is a restraint that ensures that the configurations $x_j$ are distributed approximately uniformly along the pathway:

$$C = \eta_1 \sum_{j} \left( \Delta l_{j,j+1} - \langle \Delta l \rangle \right)^2, \qquad \langle \Delta l \rangle = \frac{1}{N-1} \sum_{j=1}^{N-1} \Delta l_{j,j+1}.$$

The target function T is minimized by simulated annealing.

Use:
mpirun -np #procs sdel (parallel version)
sdelS (serial version)

The executable takes input from the files inp_0000, inp_0001, …, inp_(#proc-1), which should be identical. The number of processors also needs to be specified in the input files (see below). Each processor then writes its output to the file pth_out_(#procID).log

Input file types: Required
conn – connectivity file
rcrd – the starting input trajectory in PATH format (the flag cpth needs to be specified) or two structures (the end points) in CHARM format (the flag cini needs to be specified). When the two end structures are specified, the initial trajectory for the pathway search is set to a linear interpolation between the two end points.
This interpolation scheme is not recommended with typical all-atom potentials, since the energies of intermediate structures would be too large due to hard-core overlap and one would get negative values under the square roots in Eq. (1). It is better to start with a partially refined path, such as a minimum energy path.

Output file types: Required
wcrd – the output file for the minimized trajectory; this is a binary file in PATH format

Variables: (type in square brackets, default in curly brackets)
EPROP ARE AVAILABLE FOR sdel
This module supports FREADY related parameters.

sdel specific variables: (type in square brackets, default in curly brackets)
cpth – input is read in PATH format [l]{true}
cini – input is read as two CHARMM format structures that are interpolated [l]{false}
dtop – integration step in simulated annealing minimization [d]{10^-5}
grid – number of images that represent the trajectory (including the end points) [i]{10}
#ste – total number of minimization steps in the simulated annealing algorithm [i]{1}
#pri – every #pri steps, information about the current state of the trajectory is printed to standard output [i]{1}
#wcr – if #wcr ≠ 0, every #wcr steps the trajectory is saved to the file int_pth_NNNN.pth in PATH format (where NNNN increases as 0001, 0002, …) [i]{0}
tmpr – starting temperature of simulated annealing (not to be confused with the physical temperature) [d]{10.0}
list – the frequency of updating the non-bonded list; in this module it also specifies the cooling schedule of simulated annealing. The total number of steps is divided into small cycles of length list. The starting temperature of each cycle decreases linearly from tmpr (see above) to 0.0.
The temperature within each cycle also changes linearly from its starting temperature to 0.0 [i]{1}
rand – random number generator seed; if more than one processor is used, the seed of each processor is set to rand + procID [i]{1}
pdqe – total energy of the system E [d]{0.0}
ctmp – constant temperature run, instead of linear cooling [l]{false}
proc – number of processors used in the calculation (1 ≤ proc ≤ grid-2) [i]{1}
gama – value of the parameter $\eta_1$ from the definition of the equidistance restraint C [d]{10^2}
clog – introduces an additional restraint $-\eta_2 \sum_j \log(\Delta l_{j,j+1})$ that helps to avoid collapse of the trajectory. The parameter $\eta_2$ is linearly scaled from clog to 0 [d]{0.0}
fene – parameter that helps to avoid structures with U > E by adding a restraint 0.1 fene N ∑_j (E − U_j) to the target function. [d]{1.0}
noRA – the simulated annealing is not initiated with random momenta corresponding to the given temperature; instead the momenta are set to 0 [l]{false}
itpl – interpolate mode, allows for interpolation/skipping of structures from the input. It is useful when refining the trajectory description. If itpl=0 the whole input trajectory is used as is. If itpl=1 a new structure is interpolated between every pair of neighboring structures, changing the total number of structures from igrid to 2 igrid – 1. If itpl=2, a structure is kept, then skno (see below) structures are skipped, and so on. Nonzero values work only in the serial version (proc=1). [i]{0}
ovlp – overlap the structures with respect to each other after the initial trajectory is read from the input [l]{false}
select – a successive pick command selects a subset of particles for the calculation [l]{false} CURRENTLY DOES NOT WORK!
skno – when the input trajectory is read in interpolate=2 mode, this parameter specifies how many structures are skipped after reading a structure from the input file [i]{1}

Sample sdel input

~debu
file conn name=(val.wcon) read
file rcrd name=(valmin200.pth) bina read
file wcrd name=(output.PTH) bina wovr
#ste=5000 #pri=100 list=500 gama=2000.0 grid=200 pdqe=-42.2
~
proc=49 cpth tmpr=3.0 dtop=1.0d-4 fene=1.d0 rmax=9999. gbsa hvdw amid
action

9.1.12 sdp (parallel version, requires MPI) / sdpS (serial version)

Purpose: This module searches for an overdamped trajectory between two specified structures by action minimization. Given the two end structures $x_i$ and $x_f$, the target function minimized by sdp is

$$T(x_2, \dots, x_{N-1} \mid x_1 = x_i,\, x_N = x_f) = \sum_{j=1}^{N-1} \sqrt{H_S + \left( \frac{\partial U}{\partial x_j} \right)^2}\; \left| x_{j+1} - x_j \right| + C, \qquad (2)$$

where C is a restraint that ensures that the configurations $x_j$ are distributed approximately uniformly along the pathway:

$$C = \eta_1 \sum_j \left( \Delta l_{j,j+1} - \overline{\Delta l} \right)^2, \qquad \overline{\Delta l} = \frac{1}{N-1} \sum_{j=1}^{N-1} \Delta l_{j,j+1}.$$

The target function T is minimized by simulated annealing in the parallel version (sdp) or by conjugate gradient local minimization (sdpS) (Olender and Elber 1996; Elber and Shalloway 2000; Majek, Elber et al. 2009).

Use:
mpirun -np #procs sdp (parallel version)
sdpS (serial version)

The executable takes its input from the files inp_0000, inp_0001, …, inp_(#proc-1), which should be identical. The number of processors must also be specified in the input files (see below). Each processor then writes its output to the file pth_out_(#procID).log

Input file types: Required
conn – connectivity file
rcrd – the starting input trajectory in PATH format (flag cpth needs to be specified) or two structures (the end points) in CHARMM format (flag cini needs to be specified). If the two end structures are specified, the initial trajectory in the pathway search is set to a linear interpolation between the two end points.
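The linear interpolation between two end structures mentioned for the cini input style can be sketched as follows. This is an illustrative Python sketch, not MOIL code: the function name `interpolate_path` and the representation of a structure as a flat list of Cartesian coordinates are assumptions made here for the example.

```python
def interpolate_path(x_start, x_end, grid):
    """Return `grid` structures, end points included, spaced linearly.

    x_start, x_end: flat lists of Cartesian coordinates (hypothetical layout).
    grid: number of images including both end points (must be >= 2).
    """
    if grid < 2:
        raise ValueError("grid must include both end points")
    if len(x_start) != len(x_end):
        raise ValueError("end structures must have the same number of coordinates")
    path = []
    for j in range(grid):
        t = j / (grid - 1)  # 0.0 at the first end point, 1.0 at the last
        path.append([(1.0 - t) * a + t * b for a, b in zip(x_start, x_end)])
    return path


# Five images between two one-atom "structures"
path = interpolate_path([0.0, 0.0, 0.0], [4.0, 0.0, 2.0], 5)
```

Intermediate images built this way can contain overlapping atoms, so a partially refined path is usually a better starting point when one is available.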
Output file types: Required
wcrd – the output file for the minimized trajectory; this is a binary file in PATH format

Variables: (type in square brackets, default in curly brackets)
EPROP ARE AVAILABLE FOR sdp and sdpS
This module supports FREADY related parameters.
cpth – input is read in PATH format [l]{true}
cini – input is read as two CHARMM format structures that are interpolated [l]{false}
dtop – integration step in simulated annealing minimization [d]{10^-5}
grid – number of images that represent the trajectory (including the end points) [i]{10}
#ste – total number of minimization steps in the simulated annealing algorithm [i]{1}
#pri – every #pri steps, information about the current state of the trajectory is printed to standard output [i]{1}
#wcr – if #wcr ≠ 0, every #wcr steps the trajectory is saved to the file int_pth_NNNN.pth in PATH format (where NNNN increases as 0001, 0002, …) [i]{0}
tmpr – starting temperature of simulated annealing (not to be confused with the physical temperature of the system) [d]{10.0}
list – the frequency of updates of the non-bonded lists. Here it also specifies the cooling schedule of simulated annealing. The total number of steps is divided into small cycles of length list. The starting temperature of each cycle decreases linearly from tmpr (see above) to 0.0. The temperature within each cycle also changes linearly from its starting temperature to 0.0 [i]{1}
rand – random number generator seed; if more than one processor is used, the seed of each processor is set to rand + procID [i]{1}
ctmp – constant temperature run, instead of linear cooling [l]{false}
proc – number of processors used in the calculation (1 ≤ proc ≤ igrid-2) [i]{1}
gama – value of the parameter $\eta_1$ from the definition of the equidistance restraint C [d]{10^2}
clog – introduces an additional restraint $-\eta_2 \sum_j \log(\Delta l_{j,j+1})$ that helps to avoid collapse of the trajectory.
The parameter $\eta_2$ is linearly scaled from clog to 0 [d]{0.0}
noRA – the simulated annealing is not initiated with random momenta corresponding to the given temperature; instead the momenta are set to 0 [l]{false}
hami – value of $H_S$ from Eq. (2) [d]{0.0}
itpl – interpolate mode, allows for interpolation/skipping of structures from the input. It is useful when refining the trajectory description. If itpl=0 the whole input trajectory is used as is. If itpl=1 a new structure is interpolated between every pair of neighboring structures, changing the total number of structures from igrid to 2 igrid – 1. If itpl=2, a structure is kept, then skno (see below) structures are skipped, and so on. Nonzero values work only in the serial version (proc=1). [i]{0}
ovlp – overlap the structures with respect to each other after the initial trajectory is read from the input [l]{false}
skno – when the input trajectory is read in interpolate=2 mode, this parameter specifies how many structures are skipped after reading a structure from the input file [i]{1}
select – a successive pick command selects a subset of particles for the calculation [l]{false} CURRENTLY DOES NOT WORK!
anne – use annealing in sdpS [l]{false}
tolg – tolerance of the gradient. When tolg is reached during the minimization, terminate [d]{1.d-3}
dfpr – estimated reduction in energy during the first step [d]{1d-2}

Sample sdp input

file conn name=(val.wcon) read
file rcrd name=(valmin200.pth) bina read
file wcrd name=(output.PTH) bina wovr
#ste=100 #pri=1 list=100 anne gama=2000.0 grid=30 hami=1.d-5
~
proc=1 cpth tmpr=30.0 dtop=1.0d-4
~
rmax=9999. gbsa hvdw amid
action

9.1.13 chmin (this module runs in serial mode only)

Purpose: This module searches for a trajectory between two specified structures by action minimization.
Given the two end structures $x_i$ and $x_f$, the target function minimized by chmin is

$$T(x_2, \dots, x_{N-1} \mid x_1 = x_i,\, x_N = x_f) = \sum_{j=2}^{N-1} U(x_j) + C, \qquad (3)$$

where C is a restraint that ensures that the configurations $x_j$ are distributed approximately uniformly along the pathway:

$$C = \eta_1 \sum_j \left( \Delta l_{j,j+1} - \overline{\Delta l} \right)^2 + \frac{\rho}{\lambda} \sum_j \exp\left( -\lambda\, \frac{\Delta l_{j,j+2}^2}{\overline{\Delta l}^{\,2}} \right), \qquad \overline{\Delta l} = \frac{1}{N-1} \sum_{j=1}^{N-1} \Delta l_{j,j+1}.$$

The target function T is minimized by either simulated annealing or conjugate gradient local minimization.

Use: chmin < input > output

Input file types: Required
conn – connectivity file
rcrd – the starting input trajectory for the algorithm. The following input styles are supported:
(i) PATH files (binary, double precision; specify all structures of a trajectory)
(ii) INIT. Read formatted coordinate files for reactants and products and generate the rest of the path by linear interpolation
(iii) INTRpolate. Given a low resolution path in PATH format, add structures in between to refine the path.
The keywords cpth, cini, cint specify which style is used; by default PATH format is assumed.
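The equidistance restraint C in the chmin target function of Eq. (3) can be illustrated with a short sketch. It takes C as $\eta_1 \sum_j (\Delta l_{j,j+1} - \overline{\Delta l})^2 + (\rho/\lambda) \sum_j \exp(-\lambda\, \Delta l_{j,j+2}^2 / \overline{\Delta l}^{\,2})$. This is not the MOIL implementation: the function name `chain_restraint`, the flat-list structure layout, the use of plain Euclidean distances without mass weighting, and the range of the second sum (j = 1 … N−2) are all assumptions of this example.

```python
import math


def chain_restraint(path, eta1, rho, lam):
    """Equidistance restraint C for a chain of structures.

    path: list of structures, each a flat list of coordinates (hypothetical layout).
    eta1, rho, lam: the parameters eta_1, rho and lambda of the restraint.
    Distances are plain Euclidean (no mass weighting), an assumption here.
    """
    def dist(a, b):
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    n = len(path)
    if n < 3:
        raise ValueError("need at least three structures")
    # Nearest-neighbour distances and their mean
    dl = [dist(path[j], path[j + 1]) for j in range(n - 1)]
    mean_dl = sum(dl) / (n - 1)
    if mean_dl == 0.0:
        raise ValueError("all structures coincide; mean distance is zero")
    # First term: penalise deviations from equal spacing
    spacing = eta1 * sum((d - mean_dl) ** 2 for d in dl)
    # Second term: repulsion between next-nearest neighbours
    repulsion = (rho / lam) * sum(
        math.exp(-lam * dist(path[j], path[j + 2]) ** 2 / mean_dl ** 2)
        for j in range(n - 2)
    )
    return spacing + repulsion


# Four equally spaced one-dimensional "structures": the spacing term vanishes
c = chain_restraint([[0.0], [1.0], [2.0], [3.0]], eta1=20.0, rho=1000.0, lam=2.0)
```

For an equally spaced straight chain only the repulsion term contributes, and it grows when the chain folds back so that next-nearest neighbours approach each other.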
Output file types: Required
wcrd – the output file for the minimized trajectory; this is a binary file in PATH format

Energy/general variables: (type in square brackets, default in curly brackets)
This module supports all energy and FREADY related parameters, see the documentation of the dyna executable
All EPROP except: mors, mors alph, Dmor, alph, spec, rcut, lmda, Arep brep, beta, cent, gbo1, gbo2, npol, surften, ten, hscl, cnst, ball
None of the FREADY parameters
NOTE: metl reads ‘amtl’ and ‘alfa’; in EPROP it is written “amtl” and “bwal”

chmin specific variables: (type in square brackets, default in curly brackets)
cpth – input is read in PATH format [l]{true}
cini – input is read as two CHARMM format structures that are interpolated [l]{false}
cint – input is read in PATH format, with interpolation [l]{false}
dtop – integration step in simulated annealing minimization [d]{5·10^-4}
grid – number of images that represent the trajectory (including the end points) [i]{10}
#ste – number of minimization steps [i]{1}
#pri – every #pri steps, information about the current state of the trajectory is printed to standard output [i]{1}
#tes – number of steps between checks that the linear constraints are satisfied [i]{500}
#wcr – every #wcr steps the trajectory is saved to the file int_pth_NNNN.pth in PATH format (where NNNN increases as 0001, 0002, …) [i]{0}
tmpr – starting temperature of simulated annealing of the chain (not to be confused with the physical temperature of the system) [d]{10.0}
list – the non-bonded list is updated every “list” steps in conjugate gradient minimization (according to the middle structure).
In simulated annealing mode the non-bonded list is updated for every structure at every step [i]{1}
rand – random number generator seed [i]{1}
gama – value of the parameter $\eta_1$ from the definition of the equidistance restraint C [d]{10^2}
repl – value of the parameter $\rho$ from the definition of the equidistance restraint C [d]{10^2}
lmbd – value of the parameter $\lambda$ from the definition of the equidistance restraint C [d]{2.0}
tolg – tolerance of the gradient used in conjugate gradient optimization [d]{10^-3}
dfpr – estimated reduction in the value of the target function T in the first step [d]{0.01}
anne – use simulated annealing [l]{false}
ovlp – overlap the structures with respect to each other after the initial trajectory is read from the input [l]{false}
skno – when the input trajectory is read in interpolate=2 mode, this parameter specifies how many structures are skipped after reading a structure from the input file [i]{1}
select – a successive pick command selects a subset of particles on which the chain constraints will be imposed [l]{false}
debu – debug option {false}
nbfi – a flag to indicate that a soft, finite van der Waals repulsion is used for difficult annealing (a Gaussian repulsion is employed) [l]{false}

Sample chmin input

file conn name=(val.wcon) read
file rcrd unit=5 read
file wcrd name=(valmin.pth) bina wovr
#ste=1000 #pri=10 #wcr=1000 list=1000 repl=1000. lmbd=2. gama=20. grid=5 rmax=9999. epsi=1. cdie cini hvdw
action
file name=(val01.crd) read
file name=(val60.crd) read

9.1.14 fp

Purpose: Compute first passage time trajectories between Milestones and collect statistics. Program fp ("first passage") implements milestoning in MOIL. It runs in one of two modes. The first is "oeq mode" ("orthogonal equilibration"), in which the peptide is constrained to the plane normal to the current milestone.
The second mode is "fp mode," in which the peptide is unconstrained and evolves dynamically (according to the Verlet algorithm) until it makes first passage to a neighboring milestone plane. Note that "fp" is used both as the name of the program and as the name of the second mode. To avoid confusion, the text below usually says "the program" when referring to the entire Fortran program and reserves "fp" for the second running mode.

Use: fp < input > output

Input file types: Required
conn – the file wcon has all the information on the molecular topology (what is bound to what) and the parameters required for energy calculations. The file must be produced with the program conn before launching fp
rcrd – the file where the Cartesian coordinates of the initial run are stored. Must be in path (pth) format. In OEQ mode it is the initial structure for sampling in the plane. In FP mode the file contains multiple coordinate sets that are used as initial conditions for trajectories that terminate on the nearby Milestones.
mlst – the file that stores a complete list of the Milestoning images along the reaction coordinate. If there are M milestones in the reaction path then the milestone file must contain M structures. Must be in path format.

Output file types: Required
upfr – a file in which the projected force along the reaction coordinate is written during sampling. Can be used to estimate the PMF along the reaction coordinate. Active only in the OEQ phase.
upvl – a file in which the velocities along the direction of the reaction coordinate are written. Can be used to estimate velocity memory along the reaction coordinate.
wcrd – a path formatted file where output coordinates are stored. In an OEQ run the output is used to initiate FPT trajectories. In an FPT run the file includes sample conformations along the terminating trajectory
wfpt – a file with the first passage times sampled.
wfpp – a file with first passage time configurations.
wpep – a file with the trajectory
wene – a file to write standard energy output during the simulation, at present not in use.

Output file types: Optional
wmom – this output file contains the distances and squared distances of the output configurations from the initial configuration given in rcrd. The order in which the distances are listed is the same as the order of the configurations in wcrd.
wdot – write one type of dot product into a file udot
wdt1 – write dot products between path unit vectors to file unit udt1
wdst – write the distance from the initial structure to file unit udst

Variables
EPROP ARE AVAILABLE FOR fp EXCEPT surften, hscl, mors, repl.
orth – the current run is OEQ type (sample configurations in a Milestone) [l]{false}
tefp – the current run is FP (run terminating trajectories between Milestones) [l]{false}
ucrb – use crbm constraints to force the system to remain in the plane.
temp – assigned temperature [d]{300}
ptemp – temperature of the pre-equilibration phase [d]{-1}
grid – the number of path structures minus 1 [i]{lgrid – parameter in LENGTH.BLOCK}
nene – period between printing energies (not used) [i]{100}
nrcrd – number of structures in urcrd [i]{1}
nwcrd – frequency of writing output coordinates [i]{0}
nwpep – period for writing the trajectory on wpep
selc – select the subset of particles to define the reaction coordinate. A pick command must follow the select: pick … done
nfrz – select particles that are not frozen (fixed). A “pick” command must be present in the same line [l]{false}
newv – re-sample velocities from the Boltzmann distribution. Should not be used frequently [i]{1000}
#scl – frequency of attempted velocity scaling [i]{100}
#equ – number of equilibration steps [i]{0}
#pri – frequency of printing out progress reports.
[i]{1}
#tes – frequency for testing constraints [i]{0}
step – the size of the time step [d]{0.001d0}
rand – random number seed to sample velocities from the Maxwell distribution [i]{1}
tmslt – the index of the current milestone [i]
pmslt – the index of the previous milestone. The spacing between Milestones is flexible. We can use (for example) milestone 5 as current, milestone 1 as previous and milestone 7 as next. [i]
nmlst – the index of the next Milestone. [i]
#mxstps – maximum number of steps. In OEQ it is the actual number of steps used for sampling. In an FP run it is the maximum runtime of a trajectory. We may give up on a termination if it runs for too long. Extremely long termination times suggest that more Milestones should be placed in between. [i]{-1}
tstd – turns on the check on acceptable distances between atoms. If a distance is shorter than 1.5 Å, a warning is printed
nfrz – select the particles that WILL NOT be frozen. A "pick" command must follow in the same line as nfrz
shkb – shake all bonds (alternatively one may try shkl for shaking bonds with light particles only, m < 1.1; shkb is highly recommended for dynamics). This option does not work at present if SHAKE constraints are present for the subset of atoms that is included in the reaction coordinate.
cgsk – matrix shake using conjugate gradient (not operational at present) (Weinbach and Elber 2005) [l]{false}
mshk – a special constraint implementation for water molecules. When added to the conn program, bonds and angles are excluded from the connectivity list of the water molecules
mtol – the tolerance in mshake [d]{1.d-7}
shkl – turns on shaking of bonds with light particles (m < 1.1)
nori – turns off the reorientation during the dynamics
nosc – avoids scaling of temperature (done by default) [l]{false}
orie – by default, the reorientation that avoids rigid body motion overlaps the structure with the initial structure using all atoms.
If orie is called, it is possible to pick only some atoms
TORS – turns on the constraining of some torsional angles
example – TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100
atm1 atm2 atm3 atm4 – the constrained atoms [i]{0}
kcns – amplitude of the torsional constraint (the larger kcns, the stronger the constraint) [d]{0.d0}
cneq – equilibrium angle expressed in degrees [d]{-999.d0}
(The TORS keyword must come AFTER amid if amid is used)
sym2 – a flag indicating that the box size is changing during the simulation and that the final size is defined by the values provided in the present line. The rate of change is based on linear interpolation between the values defined by the symm command and the values given at sym2
example – sym2 xtr2=26.8 ytr2=26.8 ztr2=26.8
xtr2, ytr2, ztr2 – the final sizes of the box that is changing during the simulation [d]{0.d0}
cdie – constant dielectric (=1). [l]{true}

Sample input

file conn name=(aladip_w248.wcon) read
file mlst name=(albet.pth) binary read
file rcrd name=(oeq_M_144_1.pth) binary read
file wene name=(fp_M_144_1.wen) wovr
file wcrd name=(fp_M_144_1.pth) binary wovr
file wfpt name=(fp_M_144_1.fpt) wovr
file wpep name=(fp_pep_M_144_1.pth) binary wovr
file wfpp name=(fpp_M_144_1.pth) binary wovr
file upvl name=(fpp_M_144_1.pvl) wovr
tefp temp=303 grid=143 step=0.001 #pri=10000 #rcr=10 #wcr=10000 #pep=50 #equ=0
tmlst=144 pmlst=141 nmlst=0 rand=aq13623277 #tes=10000 #scl=50 newv=0
mshk mtol=0.0001
symm xtra=20 ytra=20 ztra=20
ewald dtol=1e-06 grdx=16 grdy=16 grdz=16
selc pick #prt 1 12 done
relx=8 rvmx=7 list=20 hvdw amid
action

10.1.10 DiM

Purpose: Calculating first passage times and stationary populations between states defined as reactant and product. Directional Milestoning (DiM) is a variant of milestoning (fp) in which the dividing hypersurfaces are redefined in more than one dimension and, at the same time, the separation between Milestones is made directional. For details see (JCTC, 2010, 6, p. 1805).
The program runs in one of three modes. The first is dim_prepare, which runs trajectories between a set of pre-defined anchors. The aim is to identify the anchors that are directly connected. The second is dim_sampleS, which constrains the dynamics around the interface defined between two directly connected anchors. Sampled points are further integrated backward in time to check whether they belong to the First Hitting Point Distribution (FHPD). Phase space points that belong to the FHPD are written to a file and used in the last mode. The third mode is called dim_run; it runs an unconstrained MD trajectory from the points saved in the previous mode and reports the first passage time when the trajectory hits another milestone.

Input files (required) for dim_prepare
conn – the file wcon has all the information on the molecular topology (what is bound to what) and the parameters required for energy calculations. The file must be produced with the program conn before launching dyna
rcrd – the file where the Cartesian coordinates of the anchors are stored. Must be in path (pth) format.

Output files (required) for dim_prepare
wcrd – a path formatted file where output coordinates are stored. In dim_sample the output is used to sample at the interface.
Input parameters (required) for dim_prepare
#ste – upper bound for the length of each searching trajectory
stot – number of searching trajectories per cell
grid – number of cells (or anchors)
andr – a flag to use the Andersen thermostat
andC – the probability of velocity resampling of waters for the Andersen thermostat (see Juraszek and Bolhuis 2008 for details)
cpth – the input coordinate file is a pth formatted file; read structure number istr=[i]{0} [l]{false}
cell – cell id that is going to be used
sele – a flag for selection of collective variables
TORS – turns on the constraining of some torsional angles
example – TORS atm1=2 atm2=4 atm3=6 atm4=10 weig=1
atm1 atm2 atm3 atm4 – the constrained atoms [i]{0}
weig – the weight of each torsional constraint [d]{1.d0}

Input example

file conn name=(sugar.wcon) read
file rcrd name=(centers.PTH) bina read
file wcrd name=(interfaces_18.PTH) bina wovr
~
#ste=10000 #pri=1 #lis=20 step=0.001 stot=500 grid=20 info=250 #scl=30 andr andC=0.2
~
cpth cell=18
~
tmpi=300 tmpf=310
mshk mtol=1.d-6
symm xtra=24.5 ytra=24.5 ztra=24.5
ewald dtol=0.000001 grdx=32 grdy=32 grdz=32
relx=9 rvmx=8
~
sele
~ set of reduced variables here:
TORS atm1=6 atm2=33 atm3=40 atm4=64 weig=1.
TORS atm1=33 atm2=40 atm3=64 atm4=71 weig=2.
TORS atm1=40 atm2=64 atm3=71 atm4=94 weig=2.
TORS atm1=64 atm2=71 atm3=94 atm4=101 weig=1.
action

Input files (required) for dim_sample
conn – the file wcon has all the information on the molecular topology (what is bound to what) and the parameters required for the energy calculations. The file must be produced with the program conn before launching dyna
rcrd – the file where the Cartesian coordinates of the anchors are stored. Must be in path (pth) format.
rint – the file where the Cartesian coordinates of the interface are stored. Must be in path (pth) format.

Output files (required) for dim_sample
wcrd – a path formatted file where output coordinates are stored.
In dim_run this will be used to launch trajectories for first hitting times.
wvel – a path formatted file where output velocities are stored. In dim_run this will be used to launch trajectories for first hitting times.

Input parameters (required) for dim_sample
#ste – upper bound for the length of each sampling trajectory
#sav – how often saving will be attempted
umbr – width of the umbrella sampling region/interface
appr – threshold value for the distance between a free trajectory conformation and the closest interface
K1_U – force constant for umbrella sampling between cell1 and the conformation
K2_U – force constant for umbrella sampling between cell2 and the conformation
grid – number of cells (or anchors)
andr – a flag to use the Andersen thermostat
andC – the probability of velocity resampling of waters for the Andersen thermostat (see Juraszek and Bolhuis 2008 for details)
cpth – the input coordinate file is a pth formatted file; read structure number istr=[i]{0} [l]{false}
cell – incoming cell id that is going to be used
cel2 – outgoing cell id that is going to be used
sele – a flag for selection of collective variables
TORS – turns on the constraining of some torsional angles
example – TORS atm1=2 atm2=4 atm3=6 atm4=10 weig=1
atm1 atm2 atm3 atm4 – the constrained atoms [i]{0}
weig – the weight of each torsional constraint [d]{1.d0}

Input parameters (required) for dim_run
#ste – upper bound for the length of each searching trajectory
stot – number of searching trajectories per cell
grid – number of cells (or anchors)
andr – a flag to use the Andersen thermostat
andC – the probability of velocity resampling of waters for the Andersen thermostat (see Juraszek and Bolhuis 2008 for details)
cpth – the input coordinate file is a pth formatted file; read structure number istr=[i]{0} [l]{false}
cell – cell id that is going to be used
sele – a flag for selection of collective variables
TORS – turns on the constraining of some torsional angles
example – TORS atm1=2 atm2=4 atm3=6 atm4=10
weig=1
atm1 atm2 atm3 atm4 – the constrained atoms [i]{0}
weig – the weight of each torsional constraint [d]{1.d0}

Input example

file conn name=(sugar.wcon) read
file rcrd name=(centers.PTH) bina read
file wcrd name=(interfaces_18.PTH) bina wovr
~
#ste=10000 #pri=1 #lis=20 step=0.001 stot=500 grid=20 info=250 #scl=30 andr andC=0.2
~
cpth cell=18
~
tmpi=300 tmpf=310
mshk mtol=1.d-6
symm xtra=24.5 ytra=24.5 ztra=24.5
ewald dtol=0.000001 grdx=32 grdy=32 grdz=32
relx=9 rvmx=8
~
sele
~ set of reduced variables here:
TORS atm1=6 atm2=33 atm3=40 atm4=64 weig=1.
TORS atm1=33 atm2=40 atm3=64 atm4=71 weig=2.
TORS atm1=40 atm2=64 atm3=71 atm4=94 weig=2.
TORS atm1=64 atm2=71 atm3=94 atm4=101 weig=1.
action

9.1.15 scndrv (and numerical)

Purpose: Computes the second derivative matrix of the potential energy for one coordinate system. “scndrv” computes the derivatives analytically, while “numerical” computes them numerically by finite differences. The eigenvalues of the mass weighted second derivative matrix are written to the standard output. The same code is also used to interface second derivative calculations with other programs that need them.

Use: scndrv < input > output

Input file types: Required
conn – a file *.wcon obtained by calling conn first. It contains molecular topology data and the parameters required for energy calculations.
rcrd – the file where the Cartesian coordinates of all the particles are stored: CHARMM format
rpth – read coordinates from a path file.

Output file types: Required
wene – file name for energy reporting

Variables
EPROP ARE AVAILABLE FOR scndrv (and numerical) EXCEPT gbsu, ewald, vprt, mors, repl, gbo1, gbo2, metl.
debu – a flag for printing a lot of debugging information. Do not use unless you are a MOIL expert.
rdie – a flag indicating that the Coulomb law is modified from 1/r to 1/r^2 (not active).

Sample input

file rcon name=(valdip.wcon) read
file rcrd name=(valdip.crd) read
rmax=9999 epsi=1.
cdie
action

10.2 Major options

10.2.1 LES

Purpose: LES (Locally Enhanced Sampling) is a mean field approach that enhances the sampling of the small part of the system that is of most interest. Examples are the multiplication of side chains in homology modeling, or of a peptide in a box of water. It was introduced by Elber and Karplus (Elber and Karplus 1990) to study ligand diffusion through proteins (with the probability density of one ligand represented by 60 “ligand fragments”) and was extended to other applications such as global optimization by Roitberg and Elber (Roitberg and Elber 1991) and Simmerling and Elber (Simmerling and Elber 1994), and to free energy calculations by Verkhivker and Elber (Verkhivker, Elber et al. 1992). The current implementation in MOIL is highly flexible and supports all major modules that use energy calculation.

Input file
A single line needs to be added to the input file that generates the connectivity file. It must come immediately after the “action” line. It is

MULT pick …. done #cpy=[i]{0}

The meaning is the following: pick a subset of atoms with the “pick” command and then multiply them #cpy times. “MULT” is just an indicator to the program that some of the particles are going to be multiplied. Once the file is modified, we run the program conn as usual:

conn < conn.inp > conn.out

The connectivity file generated has some special features that tell other programs that particles have been multiplied. Multiplied particles do not see each other (they are like “ghosts” to each other) and they interact with other particles only on the average. Strictly speaking, the multiple particles represent a probability density chopped into fragments of particles.
Formally, if the probability density of a single trajectory can be represented by a delta function,

$$\rho(X, P, t) = \delta\big(X - X_0(t),\, P - P_0(t)\big),$$

then the LES approach uses the ansatz probability density

$$\rho_{LES}(X, P, t) = \delta\big(x - x_0(t)\big)\,\delta\big(p - p_0(t)\big)\, \frac{1}{N} \left[ \sum_{i=1,\dots,N} \delta\big(x - x_{i0}(t)\big)\,\delta\big(p - p_{i0}(t)\big) \right],$$

where x and p are the part of the system that we do not enhance. The number of copies in the above formula is N.

In addition to the connectivity file we also need a coordinate file. Such a file can be produced with the graphic interface or manually. The LES option duplicates atoms. So if (for example) we enhance the two atoms N and H four times, the crd file at the very beginning, in which all the multiplied atoms occupy the same position in space, will look something like

12 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
13 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
14 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
15 3 ALA N -0.12170 1.12091 -1.33955 ALA3 FREE 0.00000
16 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000
17 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000
18 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000
19 3 ALA H -1.09464 1.29356 -1.15315 ALA3 FREE 0.00000

The only differences between the lines are the atom numbers (the residue numbers remain the same). Such a file can be prepared from an ordinary crd file by editing, or by using the graphic interface for “Processing PDB” and selecting the LES option. The graphic interface prepares both the connectivity and coordinate files of LES, which makes it especially convenient for the present application.
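Preparing the multiplied coordinate records by hand amounts to repeating each selected atom and renumbering. The following Python sketch illustrates that bookkeeping only; it is not a MOIL tool. The function name `multiply_atoms` and the tuple layout of an atom record are hypothetical, and the ordering (all copies of an atom placed consecutively) is taken from the example listing in the text.

```python
def multiply_atoms(atoms, selected, ncopies):
    """Duplicate the selected atoms `ncopies` times and renumber all atoms.

    atoms: list of (number, residue_number, residue_name, atom_name, x, y, z),
           a hypothetical in-memory form of crd records.
    selected: set of atom numbers (in the input numbering) to multiply.
    All copies start at the position of the original atom.
    """
    if ncopies < 1:
        raise ValueError("ncopies must be at least 1")
    out = []
    next_number = atoms[0][0] if atoms else 1
    for number, resnum, resname, name, x, y, z in atoms:
        copies = ncopies if number in selected else 1
        for _ in range(copies):
            # Residue number and coordinates are unchanged; only the atom number moves on
            out.append((next_number, resnum, resname, name, x, y, z))
            next_number += 1
    return out


# Atoms N and H of residue 3, assumed here to be numbered 12 and 13 before multiplication
atoms = [
    (12, 3, "ALA", "N", -0.12170, 1.12091, -1.33955),
    (13, 3, "ALA", "H", -1.09464, 1.29356, -1.15315),
]
les_atoms = multiply_atoms(atoms, {12, 13}, 4)
```

With four copies of each atom the result is numbered 12 to 19, four N records followed by four H records.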
Sample input

file prop name=(../../moil.mop/ALL.PROP) read
file mono name=(../../moil.mop/ALL.MONO) read
file poly name=(a4.poly) read
file wcon name=(a4.wcon) wovr
action
MULT pick #mon 3 3 done #cpy=4
*EOD

10.2.2 MUTA

Purpose: MUTA is a program that performs thermodynamic integration for the free energy difference of two molecules that differ in a small number of atoms. The algorithm picks two groups of particles, say groups 1 and 2, each belonging to a different molecule and representing a mutated part, and eliminates all the interactions between them, so that particles belonging to two different groups do not see each other. Moreover, the interactions between particles of group 1 and all the particles not belonging to group 2 are rescaled by a parameter λ, and all the interactions between particles belonging to group 2 and all the particles not belonging to group 1 are rescaled by a factor (1−λ), so that the total Hamiltonian becomes:

$$H(\lambda) = K_0 + K_1 + K_2 + U_0 + \lambda U_1 + (1 - \lambda) U_2$$

The free energy difference turns out to be:

$$\Delta F = F(\lambda_2) - F(\lambda_1) = \int_{\lambda_1}^{\lambda_2} \left\langle U_2 - U_1 \right\rangle_{\lambda'} \, d\lambda'$$

To run a MUTA simulation a special connectivity file is required. A typical input file for conn in preparation for a MUTA calculation is:

Sample input

file prop name=(../../moil.mop/ALL.PROP) read
file mono name=(../../moil.mop/ALL.MONO) read
file poly name=(ala-to-val.poly) read
file ubon name=(ala-to-val.addb) read
file wcon name=(ala-to-val.wcon) wovr
muta
action
MUTA pick grou 1 #prt 7 9 | grou 2 #prt 15 15 done
*EOD

The data above describes a mutation of valine (side chain atoms from 7 to 9) to alanine (side chain atom 15). The two side chains are set such that they do not see each other, and their interactions are scaled by λ or (1 − λ). The keyword “muta” is used to alert the program to additional MUTA input after the “action”.
The mutant monomer has to appear in the ala-to-val.poly file, and the add-bond facility in moil is used to connect the extra side chain to the same backbone atom (Cα). The program samples configurations to estimate the average \langle \ldots \rangle_{\lambda} and the free energy difference. The sampling is performed with Langevin dynamics (LD). For the input and output files see the LD documentation. The file lambda_res.log returns the results of the free energy calculation.

Use: muta < input > output

Optional parameters:
ilam – initial value of lambda (at present it cannot be 0) [d]{0.0d0}
flam – final value of lambda (at present it cannot be 1) [d]{0.0d0}
#lam – number of lambda steps from ilam to flam [i]{0}
twai – number of time steps before starting to collect statistics for the ensemble average [i]{0}
banf – if true, make the integration go from ilam to flam, and then back to ilam. [l]{F}

Sample input:
file conn name=(ala-to-val.wcon) read
file rcrd name=(ala-to-val.crd) read
#ste=10000 step=0.0002 info=100 list=1 tmpi=300 tmpf=300
rmax=9999. epsi=1. cdie v14f=8. el14=2.
muta ilam=0.2d0 flam=0.8d0 #lam=10 twai=100 banf hvdw nori eqms
acti

10.2.3 FREADY
In order to use FREADY within MOIL, you first need to generate connectivity and coordinate files of the coarse-grained model. See the comment in the description of the program conn on how to generate a connectivity file for FREADY. Alternatively, have a look in moil.test/fready. In this directory there is a script called runme.bat which processes a pdb file (the file name is specified in the 3rd line of the script) and generates wcon and crd files for FREADY.

In order to run a module in FREADY mode you need to add the following line to your input file (for dyna, energy, sdel, sdp, chmin, mini_pwl, …)

file CGpr name=(moil.mop/CG.PROP) read

This command at the same time turns off all atomistic energy parameters.
Optional FREADY parameters
file fix2 name=(reference.crd) read – the purpose of this command is to fix the local secondary structural elements to those observed in reference.crd. This command adds quadratic restraints on all bond angles and bond lengths with equilibrium values set to those seen in the reference structure. Moreover, it adds similar quadratic restraints on backbone dihedral angles in residues that are specified to be in an alpha helix or beta sheet conformation. The secondary structure element is specified through the last column of the reference.crd file, with 0 (default value in a normal crd file) – coil, 1 – alpha helix, 2 – beta sheet.
cutC – cutoff for FREADY non-bonded interaction; it is recommended to keep the default value [d]{13.5}
cutH – cutoff for hydrogen bonding interactions in FREADY [d]{8.0}
DECO – this flag leads to smoothing the hardcore part of the NB interaction, useful for ranking of DECOY structures [l]{false}

Example of an input file for the dyna executable (in FREADY mode)
file conn name=(molecule.wcon) read
file rcrd name=(molecule.crd) read
file CGpr name=(../../../moil.mop/CG.PROP) read
file wcrd name=(tmp.dcd) bina wovr
file wvel name=(tmp.dvd) bina wovr
#ste=10000 #equi=5000 step=0.003 info=2000 #crd=200 #vel=10000 #lis=20 #scl=20
tmpi=1 tmpf=300 cutH=10.0
action

10.2.4 Double-well Elastic network model
It is possible to run moil action minimization routines (currently implemented only for sdp and sdel) in a double-well elastic network model, which allows simple models of conformational transitions to be generated. The basic elastic network model is defined as

\[ U(x) = \frac{\gamma}{2} \sum_{r_{ij} < C} \left(r_{ij} - r_{ij}^{0}\right)^{2}, \]

where r_{ij} and r_{ij}^{0} are distances between Cα atoms of residues i and j in the structure x and in the reference structure, respectively. The parameter C is a cutoff value, typically in the range of 6 – 12 Å.
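The basic elastic network energy above can be evaluated with a short Python sketch. It is not MOIL code and rests on two assumptions that the text does not settle: each pair (i, j) is counted once, and the contact list is built from the reference distances (the cutoff condition under the sum could also be read as applying to the current distances). The function name enm_energy and the argument names are hypothetical.

```python
import math
from itertools import combinations


def enm_energy(coords, ref_coords, cutoff, gamma):
    """Elastic network energy of `coords` relative to `ref_coords`.

    coords, ref_coords : lists of (x, y, z) tuples for the C-alpha atoms
    cutoff             : contact cutoff C, in the same length unit as the coordinates
    gamma              : force constant
    """
    if len(coords) != len(ref_coords):
        raise ValueError("both structures must have the same number of atoms")
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):   # each pair counted once (assumption)
        r0 = math.dist(ref_coords[i], ref_coords[j])
        if r0 < cutoff:                                # contact from the reference (assumption)
            r = math.dist(coords[i], coords[j])
            energy += 0.5 * gamma * (r - r0) ** 2
    return energy


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    stretched = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
    print(enm_energy(stretched, reference, cutoff=7.0, gamma=1.0))
```

By construction the energy vanishes at the reference structure and grows quadratically as contact distances deviate from their reference values.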
The transition between two metastable structures x_i, x_f is modeled by a network

\[ U(x) \equiv \frac{1}{2}\left( U_i(x) + \left(U_f(x) - \alpha\right) - \sqrt{\left(U_i(x) - \left(U_f(x) - \alpha\right)\right)^{2} + 4\beta^{2}} \right). \]

The two new parameters α and β define the relative energy difference between the two minima and the smoothness/height of the barrier between the minima, respectively.

In order to run sdp/sdel with the mixed elastic network model, a coarse description of the system (connectivity file and coordinate files) is required. The connectivity file can be generated in the same way as for FREADY, with the only modification that all residues (except CGTR) should be named GLYZ. Sidechain positions are not considered in this mixed elastic network model.

Once the connectivity and coordinate/PATH files of your system are ready, you can use standard sdp/sdel input files with the following additions:
ENM2 – this required switch tells the system that a mixed elastic network model is used for the calculation. [l]{false}
cutE – specifies the cutoff used in the definition of a contact (C parameter in the equations above) [d]{7.0}
alFh – α parameter from the last equation [d]{0.0}
bEta – β parameter from the last equation [d]{10.0}
gamE – force constant gamma in the definition of the simple ENM [d]{1.0}

Example of input file for sdpS with double-well ENM model
file conn name=(CG.wcon) read
file rcrd unit=5 read
file wcrd name=(output.PTH) bina unit=12 wovr
~
#ste=60000 list=60000 tolg=0.0001 grid=100 cini
cuto=12.d0 alph=0.d0 beta=1.d2 proc=1 gama=1.d0 gamC=5000 hami=1.d-5 ENM2
~
tmpr=1.d-1 dtop=2.d-3 #pri=100
action
file name=(conf1_CA.crd) read
file name=(conf2_CA.crd) read

10.2.5 LD (Langevin Dynamics)
Purpose: Perform a stochastic dynamics simulation using the Langevin equation of motion. A frictional term, with a memory function proportional to a Dirac delta in time, and a random term are introduced. The friction coefficient gamma can be controlled by input.
The algorithm used here is discussed in “Computer Simulation of Liquids”, M.P. Allen & D.J. Tildesley.

Use: LD < input > output

Input file types: Required
conn – a file *.wcon obtained by calling conn first. It contains molecular topology data and the parameters required for energy calculations.
rcrd – the file where the Cartesian coordinates of all the particles are stored. These are the initial conditions for solving the equations of motion. The possible formats for this coordinate file are:
    ctyp=(charm) – coordinates written in charmm format (default);
    ctyp=(pdb) – coordinates taken from a pdb coordinate file;
    ctyp=(path) – coordinates taken from a path format file (binary).

Input file types: Optional
rtet – coordinate file for tethering particles to their initial coordinates during MD. Hence no significant deviation from initial coordinates is allowed. Useful when only part of the system requires optimization. These coordinates are read in charmm format (default).
rvel – file with initial velocities if you do not want to draw them randomly. Velocities are written in charmm format.

Output file types: Optional
wcrd – a binary file where the Cartesian coordinates of the particles are stored.
wvel – a binary file where the velocities of the particles are stored.
rest – a file with the last dynamics step saved for the restart.
rstr – a file where recent coordinates for restart are stored.

Variables: (in square brackets – type, curly bracket default)
EPROP are available for LD except surften, gbsu, cnst, mors, repl.
bigb – if found, the spring constant for any bond is modified according to newb. [l]{false}
newb – new bond spring constant [d]{500.d0}
wfly – check if water molecules fly away from the main system (for simulation of a solvation shell). [l]{false}
tstd – turns on a check for hard collisions for pairs in a neighbor listing in the present structure. Pairs with a distance shorter than 1.5 Å are reported. A single structure evaluation; no dynamics will be run. [l]{false}
nfrz – select the particles that WILL NOT be frozen. A "pick" command must follow on the same line as nfrz. [l]{false}
shkb – shake all bonds (alternatively one may try shkl for shaking bonds with light particles only, m<1.1; shkb is highly recommended for dynamics). [l]{false}
cgsk – matrix shake using conjugate gradient determination of the Lagrange multipliers
mshk – matrix shake for water molecules. When added to the conn program, bonds and angles are excluded from the connectivity list of the water molecules. mtol is the tolerance in mshk. [l]{false}
mtol – tolerance of error for mshk [d]{1.d-7}
shkl – turns on the shaking of bonds with light particles (m < 1.1) [l]{false}
nori – turns off the reorientation during the dynamics [l]{false}
nosc – avoids scaling of temperature (done by default) [l]{false}
orie – avoid rigid body motion. Overlap the current structure with the initial structure. Selection of a subset of atoms for the overlap is also possible. [l]{false}
TORS – apply torsional angle constraints
    example - TORS atm1=2 atm2=4 atm3=6 atm4=10 kcns=100
    atm1 atm2 atm3 atm4 are the constrained atoms [i]{0}
    kcns is the amplitude of the torsional constraint (the larger kcns, the stronger the constraint) [d]{0.d0}
    cneq is the equilibrium angle expressed in degrees [d]{-999.d0}
    (the TORS keyword must come AFTER amid if amid is used)
spec – option of switching between different energy surfaces
    example - spec lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0 lmda=5.d0 rcut=3.d0
    lmda is the range parameter for continuous potential shifts between empirical energy surfaces [d]{3.d0}
    rcut is the range distance employed in the switching function between different forms of the heme [d]{5.d0}
repl – replaces the van der Waals wall by an exponential repulsion, which is primarily used in Landau-Zener modeling of curve crossing
    example - repl Arep=80.0 beta=1.0 Brep=4.
    The exponential repulsion is of the form Arep*exp(-beta*r)+Brep. The parameters should be given as variables as in the above example: Arep [d], beta [d], Brep [d]
swit – this flag makes possible the passage between different energy curves in Landau-Zener calculations
    example - switch Rcro=3.53181 dRcr=0.05 Forc=5.59951 delt=0.287
    Rcro is the position of the crossing point [d]{3.d0}
    dRcr is the interval (in angstrom) of significant interaction between two electronic curves that cross (i.e. the range in which a transition probability between the two electronic curves is evaluated) [d]{1.d0}
    Forc is the difference in the forces at the crossing point of the two electronic curves [d]{0.1d0}
    delt indicates the time interval during which the curve crossing is felt [d]{0.1}
nocut – non-bonded interactions are computed in full according to existing lists. No additional distance check is used in the energy routines. Important (and the default) in minimization that requires a continuous and differentiable energy function (like conjugate gradient) [l]{true for minimization, false otherwise}
nbfi – a flag to indicate that a soft, finite van der Waals repulsion is used for difficult annealing (Gaussian repulsion is employed) [l]{false}
sym2 – a flag indicating that the box size is changing during the simulation. [l]{false} The final size is defined according to the values provided in the present line. The rate of change is based on a linear interpolation from the value defined by the symm command to the values found at sym2
    example - sym2 xtr2=26.8 ytr2=26.8 ztr2=26.8
    xtr2, ytr2, ztr2 are the final sizes of the box that is changing during the simulation [d]{0.d0}
tthr – a flag indicating that the tether option (some atoms harmonically linked to fixed positions in space) is set in the present line. A “pick” command to select restrained atoms is possible. [l]{false}
frcc – force constant for tether constraints (linking particles to specific positions in space) [d]{1.d0}
mult – multiple temperatures are present [l]{false} Picked temperatures are used in different velocity scaling of selected parts of the system. The default is that all particles belong to temperature 1. Useful in annealing part of the system, or in LES simulations in which equipartition is violated and different scalings are used for enhanced and regular parts
eqms – all masses are set to 10.0 [l]{false}
debug – print a lot of debugging information [l]{false}
#ste – number of MD steps [i]{1}
#equ – number of equilibration steps [i]{1}
info – number of steps between writing information on standard output [i]{1}
#crd – number of steps between writing coordinate sets [i]{0} (=0 means do not write coordinates)
#vel – number of steps between writing velocity sets [i]{0} (=0 means do not write velocities)
#lis – number of steps between regenerations of the non-bonded list [i]{1}
#scl – velocity is rescaled to the target temperature if the current kinetic energy violates the expected one (Boltzmann average) by more than #scl Kelvin. [d]{0.d0}
#tmp – number of temperatures in the system (useful for LES simulation). If larger than one, more input is required to define different domains with different temperatures [i]{1}
tmpi – initial temperature [d]{300}
tmpf – final temperature [d]{300}
    For multiple temperatures just list the values following tmpi or tmpf, e.g., tmpi 300 30 tmpf 300 300 for a system with two temperatures. Group 1 will start and end at 300K while group 2 starts at 30K and ends at 300K.
rand – a seed for the random number generator for velocity sampling. [i]{1}
step – time step in ps [d]{0.001d0}
newv – frequency of assigning new velocities. [i]{0}
gama – Langevin dynamics friction coefficient in (internal MOIL time unit)^-1, so gama=3.d0 corresponds to approximately 60 ps^-1 [d]{3.d0} [** should be converted to MOIL units **]
#rig – number of steps between rigid body overlaps of the current structure and the initial reference structure to remove overall rotations and translations.
fmax – if the norm of the force exceeds fmax, do steepest descent minimization. Its value is the threshold above which steepest descent iterations are performed to stabilize the system [i]{-1}
strt – for restart, the starting step for dynamics
shac – maximal error allowed for bond constraints (coordinates) [d]{1.d-7}
shav – maximal error allowed for bond constraints (velocities) [d]{1.d-7}
itsh – maximum number of allowed iterations for SHAKE convergence [i]{100}
cgpt – maximum number of iterations in conjugate gradient SHAKE iteration for the Lagrange multipliers of particle positions [i]{NA}
cgvl – maximum number of iterations in conjugate gradient SHAKE iteration for the Lagrange multipliers of velocities [i]{NA}
cdie – turns on the use of a constant dielectric constant (=1). Otherwise, a distance dependent dielectric is used [l]{true}
rdie – no longer operational (turns on a dielectric linear with distance). [l]{true}

FREADY works with LD (see FREADY documentation).

Sample LD input
It is the same as the dyna sample input; just add a definition of the friction constant, e.g. gama=60.0d0.

10.2.6 dynapress
Purpose: Calculates the pressure of a biomolecular system enclosed in a cubic box. It works exactly as the dyna program, only the definition of a rectangular periodic box is mandatory. Pressure is printed out every info steps. Microscopic pressure shows significant oscillations, and thus a longer averaging (at least 100 ps) is recommended in order to estimate the pressure of a system accurately.
The module is recommended with the keywords nobo and shkb, which constrain all bond lengths to their ideal values. The special water shake algorithm (keyword mshk) is currently not supported.

Use: dynapress < input > output

10.2.7 PME
Purpose: Calculating the long range electrostatic interactions by using Particle Mesh Ewald summation. The current version uses code from Darden (Darden, York et al. 1993).

Input parameters (required)
ewald – the ONLY necessary keyword for PME

Input parameters (optional)
dtol – tolerance for direct space summation; it essentially sets up the Ewald coefficient (for a given direct space cutoff) [f]{(5*10E-5)}
grdx, grdy, grdz (function of box sizes) – grid dimensions defining the accuracy of PME. One grid point per Angstrom is recommended. Choose powers of 2, 3 or 5 if possible
iord – another parameter defining the accuracy of PME, namely the interpolation order (replaced by order in many places in the code). Notice that the interpolation order is equal to (iord – 1) [i]{0}
sgdx, sgdy, sgdz – additional scaling parameters for further adjustment of the automatically chosen grdx, grdy, grdz

Input example
file conn name=(memb5.WCON) read
file rcrd name=(memb5_4.CRD) read
file wcrd name=(memb_16.DCD) bina wovr
file wvel name=(memb_16.VCD) bina wovr
relx=12. rvmx=8. epsi=1. cdie v14f=8. e14f=2.
step=0.001 #ste=300 #equ=300 info=1 #crd=500 #vel=200 #lis=5
mshk mtol=1.d-12 shkb shac=1.d-12 shav=1.d-12 itsh=2000
symm xtra=70.0 ytra=106.3 ztra=50
ewald dtol=0.000001 iord=6 grdx=81 grdy=81 grdz=64
~ the above line is relevant for PME
rand=1111111 tmpi=300. tmpf=300.
action

10.2.8 dynapt (parallel program for parallel tempering, requires MPI)
Purpose: Compute molecular dynamics trajectories of replicas of the same system at different temperatures and swap the coordinates and velocities of two neighboring replicas by a MC criterion. This method allows the low temperature system of interest to escape from local free energy minima where it might otherwise be trapped.
See (Sugita and Okamoto 1999).

Use: mpirun -np numproc exe/dynapt < numrep_file
Here, numproc is the number of processors that will be used to calculate the trajectories and numrep_file is an input file that includes the number of replicas that will be run.

Input file types
To run parallel jobs with different temperatures one needs to prepare a separate input file for each temperature. These input files are similar to the dyna input file except that the swap frequency is set to a desired value. The input files are named dyna_0000.inp, dyna_0001.inp, ... for the first replica, the second replica and so forth. In addition, one needs the input file numrep_file mentioned above.

Output file types
The dyna output is automatically named by the program as dyna_0000.out, dyna_0001.out, ... and standard .dcd and .dvd files are generated exactly as in the dyna program. Everything is the same as in the dyna output file. Additionally, the acceptance ratio is written in the dyna output file. One can extract it with grep as
grep "Acc" dyna_0002.out
[** not clear what does the output mean? **] A standard Acc Ratio is given here:
Acc. ratio in steps= 12144 temps = 300.00<-->350.00 1.00
Acc. ratio in steps= 17664 temps = 300.00<-->350.00 0.50
Acc. ratio in steps= 18768 temps = 300.00<-->350.00 0.33
Acc. ratio in steps= 19872 temps = 300.00<-->350.00 0.25

Variables
All variables are the same as in the dyna program with the addition of
swfr – the attempt frequency for swapping. A number greater than 0 turns on parallel tempering [i]{0}

Further notes
Compile dynapt with mpif77 (or another MPI compatible compiler).
Please note that in MOIL's implementation of parallel tempering, at every swfr steps one replica is chosen randomly and the swap criterion is computed. Thus each replica swaps configurations at a different time.
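The MC criterion used for a swap attempt is not spelled out in this section. The sketch below shows the standard Metropolis acceptance rule for exchanging two replicas in temperature replica exchange, as an illustration under stated assumptions: it is not taken from the dynapt source, the function name swap_probability is hypothetical, and energies are assumed to be potential energies in kcal/mol with the Boltzmann constant in matching units.

```python
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K); assumes energies in kcal/mol


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of swapping replicas i and j.

    Accept with probability min(1, exp[(beta_i - beta_j) * (E_i - E_j)]).
    """
    beta_i = 1.0 / (KB * temp_i)
    beta_j = 1.0 / (KB * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the swap is accepted for one random draw."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)


if __name__ == "__main__":
    # Illustrative energies for replicas at 300 K and 350 K.
    print(swap_probability(-105.0, 300.0, -100.0, 350.0))
    print(attempt_swap(-100.0, 300.0, -105.0, 350.0))
```

A swap that hands the lower energy configuration to the colder replica is always accepted; the opposite swap is accepted only with a Boltzmann-weighted probability.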
An example input file
file conn name=(val.wcon) unit=10 read
file rcrd name=(val.crd) unit=11 read
file wcrd name=(300.dcd) bina unit=12 wovr
file wvel name=(300.dvd) bina unit=13 wovr
#ste=200000 #equ=10000 info=1000 #crd=1000 #vel=1000 #lis=2000
swfr=1000 #scl=20 rand=3451187 step=0.001 tmpi=300 tmpf=300
relx=12. rvmx=9. cdie epsi=1.
action

10.3 Utilities

10.3.1 addion
Purpose: To correctly use Ewald summation of a periodic system, the system must be neutral. This program takes a coordinate set of a solvated system and “mutates” water molecules to the desired ions. The water molecules are chosen at random. This code is most conveniently used through the graphic interface.

Use: addion < input > output

Input file types: Required
conn – the connectivity file that lists the molecular topology and parameters required for energy and force calculations. The connectivity file is for the molecule WITHOUT the added ions
rcrd – the file from which the current coordinate system will be read (CHARMM format)

Output file types: Required
wcrd – where the coordinates (with the ions added) will be written
poly – a polymerization file which includes the ions to be written. It will be used by the conn program to generate a connectivity file appropriate for the new coordinate file.

Variables
iona – the name of the ion particle (atom). We support at present only monatomic ions for an ion monomer. This makes sense since the ion replaces a single water molecule. A large ion may not fit. [c]{NONE}
ionm – the name of the ion monomer [c]{NONE}
#ion – the number of ions to be inserted [i]{0}
rand – random number seed to select the water molecules to be replaced by ions. [i]{-1}

10.3.2 boat
Purpose: Compute Bonds, Angles and Torsions (internal coordinates) from the Cartesian coordinates of one or a series of structures. Only internal degrees of freedom that are defined in the connectivity file are computed. A single structure is in CRD format and multiple structures are in PTH or DCD format.
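For readers unfamiliar with internal coordinates, the three quantities named above can be computed from Cartesian positions as in the following Python sketch. It is a generic illustration, not the boat source: the function names are hypothetical, angles are returned in degrees by choice, and the torsion sign follows the common IUPAC convention, which may differ from what boat prints.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def bond(p1, p2):
    """Distance between two atoms."""
    return math.sqrt(_dot(_sub(p2, p1), _sub(p2, p1)))


def angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 at the vertex."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cosine = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def torsion(p1, p2, p3, p4):
    """Torsion p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    a, b, c, d = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)
    print(bond(a, b), angle(a, b, c), torsion(a, b, c, d))
```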
Use: boat < input > output

Input file types (required)
conn or rcon – connectivity file
rcrd – coordinate file

Output file types (required)
boat – output file with values of internal coordinates

Variables (in square brackets – type, curly bracket default)
coor – type of coordinates, to be followed by a space and the type of coordinate files to be read. Options are CHAR, PATH or DYNA
lpst & lpen – the starting (lpst) and ending (lpen) indices of the structures to be read from PATH or DYNA files. [i]{0}
acti – start executing

Sample input
file rcon name=(valdip.wcon) unit=10 read
file rcrd name=(admap.pth) binary unit=11 read
file boat name=(valdip_boat_pth.out) unit=12 wovr
coor PATH lpst=1 lpen=3
action

10.3.3 ccrd
Purpose: Convert CooRDinates between different formats (CHAR, PATH, and DYNA). Many (confusing) options are available, so be aware! For example, it is possible to take a list of CHAR files from the standard input (after the “action”) and convert them to a DCD file.

Use: ccrd < input > output

Input file types (required)
conn – connectivity file
rcrd – primary coordinate file (the meaning of which will be explained below)

Input file types (optional)
rcr1 – secondary coordinate file

Output file types
cmbn – combined files. The file declaration must be followed by the assignment of an integer variable comb=[i]{0}, which is the total number of structures to be read.
wcrd – output coordinate file (in the new desired coordinate format).

In most applications we extract a CHAR file from DYNA or PATH files, or convert between DYNA and PATH files, which is done directly between the rcrd file and the wcrd file. Some more interesting applications are also possible. For example, it is possible to combine several CHAR files into a single DYNA or PATH file. This is done by setting rcrd to the standard input (unit=5) and reading a number of crd files after the action. Another interesting option is an output from cmbn, in which several DYNA files are merged together.
After the action (and before *EOD) a list of DYNA files is given in the usual format, with an explicit statement of the number of structures to be read from each file. For example, the two lines below mean to read 30 structures (10 from 1.dcd and 20 from 2.dcd). Note that the number 30 must match the cmbn parameter (the total number of structures to be written to the output file).
file rcrd name=(1.dcd) bina read lpst=10
file rcrd name=(2.dcd) bina read lpst=20

Variables (in square brackets – type, curly bracket default)
wpck – indicates a pick command for writing coordinates (only picked atoms will be written). The format should be of the form “wpck pick pick_selection done” in one line. For “pick” syntax see section 8.4 [l]{false}
opck – indicates a pick command for overlapping the structures when writing output. The whole structure will translate and rotate as a rigid body, but the rotation matrix will be computed according to the selected set of atoms. The format should be of the form “opck pick pick_selection done”. For “pick” syntax see section 8.4 [l]{false}
fpth fdyn fchr fxyz – logical, indicating that the format of the input coordinates is PATH, DYNA, CHAR or FREE, respectively. FREE has x, y, z coordinates in a single line, assuming that each line corresponds to an atom and the lines are ordered exactly as in the connectivity file. [l]{false}
tpth tdyn tchr – the output coordinate format options: PATH, DYNA, CHAR respectively. [l]{false}
wsub – write a subset of the coordinates [l]{f}
ovlp – overlap coordinates according to selection [l]{f}
acti – start computing

Example
file conn name=(aladip_w248.wcon) read
file rcrd name=(oeq_M_96.pth) binary read
file wcrd name=(oeq_M_96.dcd) binary wovr
fpth tdyn lpst=1 lpen=10
action

10.3.4 crd2pdb
Purpose: Convert from CHAR (CRD) format to PDB.

Use: crd2pdb < crdfile > pdbfile
Further input is not required.

10.3.5 con_specl
Purpose: Generating the secondary connectivity file that is used in the simulation of curve crossing (based on the Landau-Zener model). The secondary connectivity file is extracted from a regular file and includes only particles that are involved in the curve crossing.

Use: con_specl < input > output

Input file types (required):
rcon – regular connectivity file

Output file types (required):
wcon – file with the connectivity of the subset of atoms that participate in curve crossing

Variables
spec – turn on the Landau-Zener model [l]{false}
mos1 – each curve crossing is modeled by a Morse potential that crosses a potential energy of exponential repulsion. These degrees of freedom are covalently coupled to other degrees of freedom (e.g. a bond of CO to the iron in the heme is coupled to the heme degrees of freedom). We allow up to 4 curve crossing centers (mos1 mos2 mos3 mos4), which were used in the past to model hemoglobin. If found, the mos[i] command must be followed by a selection, for example: chem mono HEME | chem mono CO | #mon 95 95 done

Sample input
file rcon name=(mbco_m.wcon) read
file wcon name=(mbco_s.wcon) wovr
specl
mos1 pick chem mono HEME | chem mono CO | #mon 95 95 done
action
*EOD

10.3.6 memeqns
Purpose: This program takes the results of a milestoning calculation (program fp), postprocesses them, and calculates kinetic and equilibrium information about the system. It runs in interactive mode, where the user chooses the kind of analysis to be done on the fly.
One can either calculate equilibrium properties (answer y to the question “Equilibrium run? (y/n)”) or mean first passage times (MFPT mode). In the equilibrium run, one can opt for QK analysis (question “Perform QK integration?”) of the data (1). In the MFPT mode one specifies one of the milestones, and the mean first passage time (mfpt) from this milestone to the last milestone is calculated. Optionally the reverse mfpt is calculated as well. The MFPT provides the overall first passage time of the process, which is the most straightforward and easiest calculation to do (West, Elber et al. 2007). The QK formulation integrates the (integral) equation and provides the most detailed information, including p(i,t), the probability of being at milestone i at time t (Faradjian and Elber 2004).

A milestoning run produces files from the program fp, one file per milestone. Each file includes a list of termination times of trajectories initiated on Milestone i and terminating on Milestones i+/-1. Transition times to i+1 are recorded as positive and transition times to i-1 as negative. Note that the current version of Milestoning handles only sequential Milestones. Extensions for general arrangements of Milestones are in progress.

The user should also prepare another file that lists the filenames of all milestoning result files; memeqns will ask for the location of this file while collecting input from the user.

The results of the analysis are printed to the standard output. In the case of QK integration, the iterative evolution of the transition probability vector q(i,t) (the probability to make a transition to Milestone i at exactly time t) is written to the file fort.10, and the evolution of the p(i,t) vector to the file fort.11. See (Faradjian and Elber 2004; West, Elber et al. 2007) for definitions of the p and q probability vectors.
Use: memeqns (and reply to the queries that follow)

10.3.7 reconstruct
Purpose: This program takes all-atom representations of two different conformations (files in CHARM format) of the same molecule (single wcon file). It also takes a coarse-grained (CA and CTERM particles only) representation of a trajectory between these two endpoints. It generates an all-atom representation of the trajectory by using the reconstruction algorithm described in section VI of (Majek, Elber et al. 2009).

Use: reconstruct < input

Input file types (required):
conn – connectivity file of the all-atom model (water, ions, etc. should not be included)
rcr1 – read all-atom coordinates of the 1st configuration
rcr2 – read all-atom coordinates of the 2nd configuration
rpth – read the binary (double precision) representation of a trajectory from structure 1 to structure 2. This trajectory is in a coarse model which specifies only the CA particles and the second oxygen of all CTER residues.

Output file types (required):
wpth – writes the all-atom representation of the trajectory into this binary file (in double precision format)

Variables: (in square brackets – type, curly bracket default)
#str – number of structures in the input binary file [i]{1}
join – if present, the binary input file is assumed to be in the order 1, 2, ..., n-1, n, n-1, ..., 2, 1. The same order of frames is preserved on output. This order is useful for visualizing the trajectory, since there is no jump in the movie if you play it in loops. [l]{false}

Sample reconstruct input
file conn name=(all_atom.wcon) read
file rcr1 name=(start.crd) read
file rcr2 name=(end.crd) read
file rpth name=(start_to_end.PTH) bina read
file wpth name=(all_atom.PTH) bina wovr
#str=100
action

10.3.8 path_eqw
Purpose: This program takes a set of solvated structures (CHARM format) with the same solute and box size, possibly solvated by different numbers of water molecules.
It merges all the structures into a single binary file (double precision) ready for path/free energy/fp calculations. It further sets the number of water molecules to be equal in all the structures. It does so by assuming that the water molecules are at the very end of the input files (which is typically the case) and removing any extra entries from the end of each input structure. If a structure in the input has fewer water molecules, extra water molecules are added with dummy coordinates set to 9999. In contrast to other moil programs, path_eqw accepts its input only in the exact pre-specified order (see below) and does not support comment lines (~).

Use: path_eqw < input

Sample path_eqw input
file name=(template.wcon) read
#str=3
file name=(output.PTH) bina wovr
file name=(structure_1.wcon) read
file name=(structure_1.crd) read
file name=(structure_2.wcon) read
file name=(structure_2.crd) read
file name=(structure_3.wcon) read
file name=(structure_3.crd) read

In the first line, the template connectivity file is specified; the desired number of water molecules in the output file is set to the number of water molecules in this connectivity file. In the second line, the number of structures (N) to follow is specified. The next line assigns the output coordinate file. Then 2N lines specifying N structures follow: a connectivity file followed by a coordinate file, repeated N times.

10.3.9 ovrlp_trj
Purpose: Align the structures in a trajectory with respect to a reference structure by minimizing the mass weighted rmsd from each structure in the trajectory to the reference. The output is a binary file where the aligned structures are stored.

Use: ovrlp_trj < input > output

Input file types
conn – connectivity file.
rcrd – the file where the Cartesian coordinates of all the particles are stored. This file stores the coordinates of the reference system.
The only possible format for this coordinate file is: ctyp=(CHARM) – coordinates taken from a charmm format file
rdyc – file where the dynamic coordinates are stored. The only format allowed is DCD.

Output file types
wcrd – a binary file where the aligned coordinates of the atoms are stored. The only allowed format is DCD.

Other instructions
Variables: (in square brackets – type, curly bracket default)
norw – do not rewind the coordinate file to be read. By default it rewinds the file. [l]{false}
#str – number of structures to be looked at in a dynamic or path file. [i]{0}
pick – particles that you choose to align the structures.

10.3.10 Numerical
Purpose: Calculate numerically the second derivatives of the energy. They can be computed for all energy terms, or for a selection of terms while discarding the others. The program returns on standard output the matrix with all the second derivatives (3 directions per particle, squared, so 9npt^2 elements).

Use: numerical < input > output

Input file types
rcon or conn – wcon file obtained by calling conn, which has all the information regarding the molecular topology and the parameters required for energy calculations
rcrd – the file where the Cartesian coordinates of all the particles are stored. Only CRD format is supported.

Output file types
wene – file where the energy output is stored

Variables
EPROP are available for numerical.
debu – prints a lot of debugging information.
shif – a flag indicating a different style of cutoff which is no longer used.

10.3.11 solvatecrd
Purpose: This program takes a file with a solute and a file with a water box and solvates the solute, avoiding overlap of water with the solute itself and cutting off water particles whose oxygen is not inside the input box. This program is used most effectively in the GUI while converting a PDB file to a solvated structure.
Use : solvatecrd < input > output
The graphic interface of moil (moil.tcl found in ~/moil/moil.gui/) provides a convenient way to convert PDB file coordinates to internal MOIL coordinates. It generates a connectivity file and solvates the system in a water box in a single moil.tcl submission (mouse stroke).
Input file types: required
conn – connectivity file.
rcrd – read coordinates, can only be CHAR format. It contains the coordinates of the solute.
rwbx – read coordinates of the water box (CHAR format).
Output files: Required
wcrd – write the coordinates of the solvated system (only CHAR format)
wpol – write the poly file corresponding to the solvated system
Variables: (in square brackets – type, curly brackets – default)
xbex, ybex, zbex – the x, y and z coordinates of the center of the box [d] {0d0}.
xwbx, ywbx, zwbx – the x, y and z lengths of the rectangular simulation box [d] {1.0}.
selc – a pick command to select the particles for which the center of mass of the solute is computed [l] {false}.
Sample input file:
file conn name=(1mbd.wcon) read
file rcrd name=(1mbd.crd) read ctyp=(CHARM)
file wcrd name=(1mbd_solv.crd) wovr ctyp=(CHARM)
file rwbx name=(../../../moil.crd/watbox.crd) read
file wpol name=(1mbd_solv.poly) wovr
xbex=0.0 xwbx=50.0 ybex=0.0 ywbx=50.0 zbex=0.0 zwbx=50.0
~debug
action

10.3.12 pdb2puth
Purpose: Reads a PDB file and makes some changes to it, including the addition of the C- and N-terminals, removal of duplicate coordinates of the same atom in the crystal structure, editing of some atom or residue names, etc. The program outputs an edited PDB file. Currently, the functions of this program are more easily used through the moil.tcl graphic interface.
Use: pdb2puth < input > output
Input file types: required
rcrd – read PDB file.
Output files: Required
wcrd – edited PDB file
wpol – write the poly file corresponding to the edited PDB file.
Variables: (in square brackets – type, curly brackets – default)
MOLC – the molecular name, a maximum of 4 characters
Sample input
file rcrd name=(3SDH.pdb) read ctyp=(pdb)
file wpol name=(3SDH.poly) wovr
file wcrd name=(3SDH-1.pdb) wovr ctyp=(pdb)
MOLC=(3SDH)
action
*EOD

10.4 Analyses

10.4.1 av_dif
Purpose : Compute water properties from MD simulations: average diffusion constant
Use: av_dif < input > output
Input file types: required
conn – connectivity file.
rcrd – read coordinates, can be either PATH or DYNA format (the keyword DYNA or PATH must be present in the same line).
Variables: (in square brackets – type, curly brackets – default)
norw – do not rewind the coordinate file for each read (ensures faster reading)
lbox – length of the water box [d]{0.d0}
lpst – the first structure index in the coordinate file to be analyzed [i]{1}
lpen – the last structure index [i]{1}
tau1 – the time window used to estimate the diffusion constant [d]{1.d0}
dens0 – upper bound for the density [d]{1.5d0}
nrmono – number of solute monomers [i]{0}
nratom – number of solute atoms [i]{0}
A pick command for the OH2 (water oxygens) must be present.
Output files
None. Results are written to the standard output.

10.4.2 Contacts
Purpose: Calculate the distance and collision numbers for a picked group of a selected set of atoms (for example a diatomic ligand diffusing in a protein). This is computed for a sequence of dynamic structures.
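As a rough illustration of what a per-frame collision count can look like, the Python sketch below counts, for each frame, the atoms outside a picked group that lie within a cutoff of any picked atom. The function name, the data layout (a list of frames, each a list of (x, y, z) tuples) and this particular definition of a collision are assumptions made for illustration; they are not taken from the contacts source code. The 5.0 default mirrors the rcut default listed for this program.

```python
import math

def count_collisions(frames, picked, rcut=5.0):
    """For each frame, count non-picked atoms within rcut of any picked atom.

    frames : list of frames, each a list of (x, y, z) tuples (hypothetical layout)
    picked : indices of the picked particles (e.g. a diatomic ligand)
    """
    picked_set = set(picked)
    counts = []
    for frame in frames:
        n = 0
        for j, atom in enumerate(frame):
            if j in picked_set:
                continue  # picked atoms do not collide with themselves
            if any(math.dist(atom, frame[i]) <= rcut for i in picked):
                n += 1
        counts.append(n)
    return counts

# Two toy frames: atoms 0 and 1 form the picked pair.
frames = [
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (9.0, 0.0, 0.0), (20.0, 0.0, 0.0)],
]
print(count_collisions(frames, picked=[0, 1]))
```

In the toy data the third atom is within the cutoff only in the first frame, so the count drops from one frame to the next.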
Use: contacts < input > output
Input file types: required
conn – connectivity file
rcrd – coordinate file (MUST be of dcd type)
Output files: Required
wsum – write a summary file of all collisions
wave – write average collision numbers
Variables (in square brackets – type, curly brackets – default)
norw – do not rewind the dcd file after each read (used for faster reading) [l]{f}
rcut – cutoff distance to define a collision [d]{5}
#str – number of structures in the dcd file [i]{0}
A pick command for the subset of colliding particles must be present.

10.4.3 dxdl
Purpose: Computes a trajectory as a function of the arc-length (instead of as a function of time) using the initial value formulation and compares the results to boundary value calculations.
Use: dxdl < input > output
Input file types: required
conn – connectivity file
rcrd – coordinate file (path format)
Other input options in the code:
norw – files are read from a binary file without “rewinding” the file, which is usually a lot faster for structures read sequentially [l]{f}
coor – [character] {unkw} Acceptable values for coor are the three different internal coordinate formats CHAR, DYNA and PATH. If the format is PATH or DYNA and the number of structures is different from one, then the variables lpst and lpen MUST be in the same line
#str – number of coordinate frames in the file. [i]{0}
list – number of steps between updates of the non-bonded list [i]{20}
hvdw – [l]{false} set a finite van der Waals radius for hydrogen atoms (usually zero in OPLS). Helps to avoid numerical instabilities in high temperature simulations or when the initial structure is highly distorted
rmax – a single cutoff for all non-bonded interactions. Used to indicate no cutoff, i.e. rmax=9999. No longer used to indicate an actual cutoff and kept for backward consistency [d] {-1.d0}.
epsi – [d] {1.d0} dielectric constant. Most applications do not use it; its effect is pre-computed into the connectivity file.
cdie – use constant dielectric (=1). [l]{true}
gbsa – turn on Generalized Born Surface Area calculations (Tsui and Case, 2000). [l] {false}
gbsu – frequency of updating the gbsa neighbor list [i] {0}
A pick command is possible.
Output file types: required
wtor – output coordinates

10.4.4 eff_difdens
Purpose: Computes spatial density and diffusion constants for water molecules using a grid of a rectangular periodic box. There is some overlap of the present module with the module av_dif.
Use: eff_difdens < input > output
Input file types: required
conn – connectivity file
rcrd – coordinate file. Only types PATH or DYNA are allowed.
Output file types: required
None. Results are written to the standard output and to Fortran file indices 102 and 103.
Variables (in square brackets – type, curly brackets – default)
lbox – dimension of cubic box [d]{0.d0}
lpst – a starting index for structures in the file [i]{1}
lpen – an ending index for structures in the file [i]{1}
tau1 – the time interval used to estimate the diffusion constant, typically 1ps [d] {0.d0}
dens0 – the maximum value for the density [d] {0.d0}
ddens – an increment for the density [d] {0.d0}
nrmono – number of solute monomers (to generate PDB file) [i] {0}
nratom – number of solute atoms (to generate PDB file) [i] {0}
norw – files are read from a binary file without “rewinding” the file, which is usually a lot faster for structures read sequentially [l]{f}
A pick command for selecting OH2 atoms (water oxygens) is required. See 8.4 for a description of the pick command.

10.4.5 Fluc
Purpose: Compute fluctuations and RMSD differences for a molecular dynamics trajectory and a reference structure.
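The two kinds of quantities named in the purpose can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the fluc implementation: the function names are hypothetical, the frames are assumed to be already overlapped on the reference (no alignment is done here), and no mass weighting is applied.

```python
import math

def rms_to_reference(frame, ref):
    """Root-mean-square distance between one frame and the reference coordinates."""
    total = sum(math.dist(a, b) ** 2 for a, b in zip(frame, ref))
    return math.sqrt(total / len(ref))

def fluctuations(frames):
    """Per-atom rms fluctuation about the time-averaged position."""
    nframes = len(frames)
    natoms = len(frames[0])
    result = []
    for i in range(natoms):
        # average position of atom i over all frames
        mean = tuple(sum(f[i][k] for f in frames) / nframes for k in range(3))
        msd = sum(math.dist(f[i], mean) ** 2 for f in frames) / nframes
        result.append(math.sqrt(msd))
    return result

# Toy data: two atoms, two frames; only the second atom moves.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print([rms_to_reference(f, ref) for f in frames])
print(fluctuations(frames))
```

The first list is a time series of rms values relative to the reference; the second is a per-atom profile of the kind that underlies B-factor style plots.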
Use: fluc < input > output
Input file types (required):
conn – connectivity file
rcrd – coordinate file for the reference system in CHARM format
rdyc – dynamic coordinates for analysis
Output file types (required):
wrms – write rms values as a function of time (time, rms) compared to the reference coordinates
wave – write rms with respect to the average structure (time, rms)
wflu – time-averaged thermal fluctuation at different residue positions (B factors)
Variables (in square brackets – type, curly brackets – default)
norw – do not rewind a sequential binary file for the next read. Usually faster for reading sequential frames of a trajectory [l] {f}
#crd – number of Molecular Dynamics steps before writing a coordinate set to the DCD file. [i]{0}
#ste – number of molecular dynamics steps [i]{0}
step – the size of the time step [d]{0.01}
pick – command for selection of a subset of atoms for rms or fluctuation calculations. See section 8.4 for an explanation of the options of the pick command.
action – stop reading input and start processing
Sample input
file conn name=(*.wcon) read
file rcrd name=(*.crd) read
file rdyc name=(*.dcd) bina read
file wrms name=(*.rms) wovr
file wave name=(*.ave) wovr
file wflu name=(*.flu) wovr
#ste=200000 #crd=1000 step=0.001
~ pick only protein particles (no water, no ions)
pick pick #mon 1 298 done
action
*EOD

10.4.6 rgyr
Purpose: Compute the radius of gyration of a coordinate system.
Use: rgyr < input > output
Input file types (required):
conn – connectivity file
rcrd – coordinate file in CHARM or PATH format (default CHAR)
Output file types (required):
wcln – write the radius of gyration sequentially for molecular frames
Variables (in square brackets – type, curly brackets – default)
norw – if true, do not rewind a sequential binary file for the next read. Usually faster for reading sequential frames of a trajectory [l] {f}
#str – number of coordinate frames in the file.
[i]{0}
#crd – number of frames between writing up coordinates [i]{0}
step – size of time step. [d]{0.d0}
pick – command for selection of a subset of atoms for rms or fluctuation calculations. See section 8.4 for an explanation of the options of the pick command.
action – stop reading input and start processing

10.4.7 rms_2crd
Purpose: Compute the mean square distance of two crd sets (same number of atoms, no alignment) sharing the same connectivity file. One selection is possible for a subset of atoms to be used in the overlap calculations (Kabsch [reference]) and a second one for the calculation of the distance.
Input file types (required):
conn – connectivity file
rcc1 – first coordinate file (CHAR only)
rcc2 – second coordinate file (CHAR only)
Output file types (required):
None
Variables (in square brackets – type, curly brackets – default)
sel1 – select the first group of atoms for overlap, followed by a “pick” command
sel2 – select a second group of atoms for RMSD calculation, followed by a “pick” command

10.4.8 rms_2path
Purpose: Compare the RMSD of two CHARMm files or of all pairs of structures from a PATH set.
Input file types (required):
conn – connectivity file
rcc1 – first coordinate file (only CHAR)
rcc2 – second coordinate file (only CHAR)
rpc1 – first coordinate file in PATH
rpc2 – second coordinate file in PATH
Output file types (required):
wrms – write rmsd output data.
Variables (in square brackets – type, curly brackets – default)
st1s – index of the first structure of set 1 [i]{1}
st2s – index of the first structure of set 2 [i]{1}
st1e – index of the last structure of set 1 [i]{1}
st2e – index of the last structure of set 2 [i]{1}
#rmp – write rms of #rmp monomer [i]{1}
frzp – pick frozen particles: particles that are frozen are given mass zero and do not participate in the orientation and rmsd calculations.

10.4.9 rms_p2p
Purpose: Compute the rms of all structures in one PATH file against all structures in another PATH file.
The output is a matrix of rmsd for all (i,j) pairs.
Input file types (required):
conn – connectivity file
rpt1 – first coordinate file (PATH only)
rpt2 – second coordinate file (PATH only)
Output file types (required):
None
Variables (in square brackets – type, curly brackets – default)
len1 – [i]{1} length (number of frames) of file no. 1
len2 – [i]{2} length (number of frames) of file no. 2
sele – a pick command for selection of particles for overlap and rmsd calculations. See 8.4 for the syntax of the pick command.

10.4.10 rms_resd
Purpose: Compute the rmsd of a trajectory compared to the average structure, and the B factors
Input file types (required):
conn – connectivity file
rcrd – reference coordinates (CHAR)
rdyc – trajectory (DYNA) file
Output file types (required):
wrms – write rms results
Variables (in square brackets – type, curly brackets – default)
#str – [i]{1} number of structures
pick – a pick command for selection of particles for overlap and rmsd calculations. See 8.4 for the syntax of the pick command.

10.4.11 SuperTMscore
Purpose : This tool compares two structures and finds the superposition that has the maximum TM-score (reference: Yang Zhang, Jeffrey Skolnick, Proteins 2004 57:702-10). It then overlaps the first structure with the second such that their mass weighted rms is a minimum.
Use : superTMscore < input > output
Input file types: required
conn – connectivity file.
rcrd – the file where the Cartesian coordinates of all the particles are stored. These are the coordinates that are “moved” in the overlap operation. The only possible format for this coordinate file is: ctyp=(CHARM). This file is used only if there is no rdyc file, and it is used only to minimize the rms, not the TM-score
rcor – this file stores the coordinates that are “fixed” in the overlap operation. This file is used only if there is no rdyc file and must be of CRD format.
Finally it is only used to minimize the rms, not the TM-score
rdyc – file where the coordinate sets from a dynamics run are stored. Also in this case the only format allowed is DCD. If this file is found, both rms and TM-score are computed with the coordinates in this file.
Output file types: required
wrms – file where the TM-score is stored if there is a rdyc file, the rms otherwise
Variables: (in square brackets – type, curly brackets – default)
#str – number of structures that will be considered in the TM-score calculation [i]{1}

10.4.12 superback
Purpose : This tool calculates the fluctuations per residue, and aligns the structures in a dynamics file by minimizing the rms.
Use : superback < input > output
Input file types: required
conn – connectivity file.
rdyc – file where the coordinates of the dynamics run are stored. The only format allowed is DCD. The content of this file is used to compute fluctuations. The coordinates in this file are aligned with the coordinates of the reference structure before computing the fluctuations.
rcrd – the file where the Cartesian coordinates of all the particles are stored. These coordinates are the reference structure. The only possible format for this coordinate file is: ctyp=(CHARM) – coordinates taken from a charmm format file
Output file types: required
wdyc – file where the dynamic coordinates after alignment are stored
wflu – file where the fluctuations are stored. Also enables the calculation of the fluctuations
wave – file where the average rms is stored.
Enables the print out of the average rms
wrms – file storing rmsd calculations
wave – file with the average rms
Variables: (in square brackets – type, curly brackets – default)
subs – select the particles for rms calculations [l]{false}
norw – do not rewind the coordinate file while computing the average of the fluctuations [l]{true}
nstr – number of structures for which the average of the fluctuations is computed [i]{1}
jump – step in reading the rdyc coordinate file [i]{1}

10.4.13 superrms
Purpose : Overlap two structures such that their mass weighted rms is a minimum.
Use : superrms < input > output
Input file types: required
conn – connectivity file.
rcrd – the file where the Cartesian coordinates of all the particles are stored. Coordinates must be in a CRD format file. This file is used only if there is no rdyc file.
rcor – the Cartesian coordinates of all the particles of a reference coordinate system. The format is CRD.
rdyc – file where the dynamics coordinates are stored. Format is DCD. The rms is computed with the coordinates in this file kept as a reference; the coordinates taken from the rcrd file are moved.
Output file types: required
wrms – file where the TM-score is stored if there is a rdyc file; if not, the rms is the output
Variables: (in square brackets – type, curly brackets – default)
#str – number of structures in the dynamic coordinate file used for the rms minimization. [i]{1}
CAon – turns on the calculation of the rms only for Cα, setting to zero the mass of all the other particles
self – if this flag is found, the rms is computed only within the rdyc file, without using the rcrd file as a reference (which is the default).
pick – command to pick a subset of particles. The rms will be computed only within this subset. If no pick is found, all the particles are used.

10.4.14 str_measures
Purpose : Compute the tensor of inertia, a measure of sphericity, and a “shape” measure for a collection of point masses.
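As a worked illustration of these quantities, the Python sketch below builds a tensor of inertia with diagonal elements Txx = Σ m_i (r_i^2 − x_i^2) and off-diagonal elements Txy = −Σ m_i x_i y_i, and evaluates a sphericity ratio D = (3/2) Σ (a_i − a)^2 / (Σ a_i)^2, where a_i are the eigenvalues of the tensor and a is their average. It is a sketch under assumptions, not the str_measures code: the function names are hypothetical, and measuring coordinates from the center of mass is an assumption. The eigenvalues are never computed explicitly; the code relies on Σ a_i = trace(T) and Σ a_i^2 = trace(T^2).

```python
def inertia_tensor(masses, coords):
    """3x3 tensor of inertia of point masses, taken about their center of mass."""
    mtot = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / mtot for k in range(3)]
    t = [[0.0] * 3 for _ in range(3)]
    for m, c in zip(masses, coords):
        d = [c[k] - com[k] for k in range(3)]
        r2 = sum(x * x for x in d)
        for a in range(3):
            for b in range(3):
                if a == b:
                    t[a][b] += m * (r2 - d[a] * d[a])   # diagonal element
                else:
                    t[a][b] -= m * d[a] * d[b]          # off-diagonal element
    return t

def sphericity(t):
    """D = 1.5 * sum((a_i - mean)^2) / (sum a_i)^2, using traces instead of eigenvalues."""
    trace = t[0][0] + t[1][1] + t[2][2]                                # sum of eigenvalues
    sum_sq = sum(t[a][b] ** 2 for a in range(3) for b in range(3))     # sum of squared eigenvalues
    return 1.5 * (sum_sq - trace * trace / 3.0) / (trace * trace)

# Six unit masses on the coordinate axes (spherical) versus two on a line (rod).
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
rod = [(0, 0, 1), (0, 0, -1)]
print(sphericity(inertia_tensor([1.0] * 6, octahedron)))
print(sphericity(inertia_tensor([1.0] * 2, rod)))
```

The symmetric arrangement gives D = 0, while the two-point rod gives a clearly non-zero value.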
The diagonal elements of the tensor of inertia are:
\[ T_{xx} = \sum_i m_i \left( r_i^2 - x_i^2 \right) \]
and the off diagonal elements:
\[ T_{xy} = -\sum_i m_i x_i y_i \]
If we call $a_1, a_2, a_3$ the three eigenvalues of the tensor of inertia, and $\bar{a}$ their average, the sphericity parameter becomes:
\[ D = \frac{3}{2} \, \frac{\sum_i \left( a_i - \bar{a} \right)^2}{\left( \sum_i a_i \right)^2} \]
The limit of zero D means that all the eigenvalues are equal, and the collection of point masses is spherical. The “shape” measure is:
\[ s = 27 \, \frac{\prod_i \left( a_i - \bar{a} \right)}{\left( \sum_i a_i \right)^2} \]
A positive s means that the “shape” of the protein is flat (“pita like”); a value smaller than zero means that it is a cylinder (“cigar”).
Use : str_measures < input > output
Input file types: required
conn – connectivity file.
Input file types: optional
rcrd – the file where the Cartesian coordinates of all the particles are stored. The only possible format for this coordinate file is: ctyp=(CHARM) – coordinates taken from a charmm format file
rdyc – read coordinates in DCD format
rpth – coordinates are read in path format
Output file types
wtab – formatted file where the elements of the tensor of inertia are stored
wdel – formatted file where the value of the sphericity coefficient is stored
wris – formatted file where the value of the “shape” coefficient is stored
Other instructions
pick – particles chosen to compute the tensor of inertia. If no particles are picked by an external selection, the program enforces the default in which all particles are selected.
Variables: (in square brackets – type, curly brackets – default)
#crd – number of coordinate sets in the dynamics file [i]{1}

10.4.15 tmalign (Zhang and Skolnick 2005)
Purpose : Align a PDB structure (hereafter ‘structure.pdb’) to a target PDB (hereafter ‘target.pdb’). A transformation matrix is produced giving the translation and rotation to be applied to structure.pdb. A score, the “TM-score”, is also produced, which assigns a metric to the resulting alignment.
Detailed information about the alignment and scoring procedure can be found at: Zhang & Skolnick, Nucl. Acid Res. 2005 33, 2303-9. The program was written by the above authors. A simple addition was made for inclusion in the MOIL package (see below).
Use: (the following instructions are produced if tmalign is run without arguments)
1. Align 'structure.pdb' to 'target.pdb' (By default, TM-score is normalized by the length of 'target.pdb')
>tmalign structure.pdb target.pdb
2. Run TM-align and output the superposition to 'TM.sup' and 'TM.sup_all':
>tmalign structure.pdb target.pdb -o TM.sup
To view the superimposed structures of the aligned regions by rasmol:
>rasmol -script TM.sup
To view the superimposed structures of all regions by rasmol:
>rasmol -script TM.sup_all
3. If you want TM-score normalized by an assigned length, e.g. 100 aa:
>tmalign structure.pdb target.pdb -L 100
If you want TM-score normalized by the average length of two structures:
>tmalign structure.pdb target.pdb -a
If you want TM-score normalized by the shorter length of two structures:
>tmalign structure.pdb target.pdb -b
If you want TM-score normalized by the longer length of two structures:
>tmalign structure.pdb target.pdb -c
* A new option added for the MOIL version: ‘-t’ may be supplied to force a “trivial” alignment of the two structures (target and structure should be the same length).
** The tmalign program is also utilized by Zmoil for doing alignments of PDB or CRD coordinate files, allowing the visualization of a number of alignments, as well as the saving of new coordinates based on the alignment. In the case of CRD formatted files, the MOIL program crd2pdb is first run to produce a temporary structure in PDB format on which tmalign can operate.

10.4.16 Torstat
Purpose : Torsion statistics for a set of protein structures. The program picks the relevant atom sets and calculates the phi, psi and, if it exists, chi angle of each residue in the protein structure.
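A torsion angle such as phi or psi is defined by four consecutive atoms. The Python sketch below uses a common atan2-based construction for the angle; the function names are hypothetical, and the sign convention is an assumption that may differ from the one used by torstat.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(p0, p1, p2, p3):
    """Torsion angle (degrees) about the p1-p2 bond for atoms p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # components of the outer bonds perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# cis (0 degrees), trans (180 degrees) and perpendicular arrangements
print(torsion_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(abs(torsion_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0))))
print(abs(torsion_deg((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))))
```

Absolute values are printed for the last two cases because the sign depends on the chosen convention.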
If the chi angle is not present (glycine, alanine, and proline) chi is set to -999.0
Use : torstat < input > output
Input file types: required
conn – connectivity file.
rcrd – the file where the Cartesian coordinates of all the particles are stored.
coor – [character] {unkw} Acceptable values for coor are the three different internal coordinate formats CHAR, DYNA and PATH. If the format is PATH or DYNA and the number of structures is different from one, then the variables lpst and lpen (see below) MUST be in the same line
lpst – [i] {1} the starting index of a structure in the unformatted coordinate file
lpen – [i] {1} the ending index of a structure in the unformatted coordinate file
Output file types: required
tors – file where the torsions are stored.
Sample input
file rcon name=(molecule.wcon) unit=10 read
file rcrd name=(AtoB.pth) binary unit=11 read
file tors name=(tors.out) unit=12 wovr
coor PATH lpst=1 lpen=25
action
Sample output
2 GLY  45.033 -53.495 -999.000 -999.000 0
3 ASN -54.236 -52.080    1.002 -121.378 0
4 ASN -70.088 -44.782   31.763  -75.281 0
5 GLN -49.863 -17.781   50.472 -112.609 0

10.4.17 xangle
Purpose : Extract angles from a dynamics file
Use : xangle < input > output
Input file types: required
conn – connectivity file.
rcrd – the file where the Cartesian coordinates of all the particles are stored (dcd format).
wcrd – file where the angles are stored
Variables
pick – flag to indicate that the present line is for selection of a subset of particles
norew – do not perform a rewind on a file [l]{false}
#str – number of structures in the rcrd file. [i]{0}
Sample input
file conn name=(val.wcon) unit=10 read
file rcrd name=(valpath.dcd) bina read
file wang name=(angle.out) wovr
pick pick #prt 1 1 | #prt 5 5 | #prt 8 8 done
#str=10
action

10.4.18 xcrd
Purpose : Extract coordinates from a dynamics file
Use : xcrd < input > output
Input file types: required
conn – connectivity file.
rcrd – the file where the Cartesian coordinates of all the particles are stored (dcd format).
wcrd – file where the distances are stored
Variables
pick – selection of a subset of particles to write to wcrd
norew – do not perform a rewind on a file [l]{false}
#str – number of structures in the rcrd file. [i]{1}
str1 – the first structure to be read [i]{1}
#mon – running monomer index for write [i]{0}
Sample input
file conn name=(val.wcon) unit=10 read
file rcrd name=(valpath.dcd) bina read
file wcrd name=(crd.out) wovr
pick pick #prt 1 1 | #prt 5 5 done
#str=10
action

10.4.19 xtors
Purpose : Extract the torsions along a trajectory
Use : xtors < input > output
Input file types: required
conn – connectivity file.
rcrd – read Cartesian coordinates in DCD format.
wtor – write output torsions to wtor
Parameters
pick – flag to indicate that the present line is for selection of a subset of particles
#str – number of structures to examine in a dynamics or path file.
norew – do not rewind a DCD file in sequential reads
Sample input
file conn name=(val.wcon) unit=10 read
file rcrd name=(valpath.dcd) bina read
file wtors name=(tors.out) wovr
pick pick #prt 1 1 | #prt 5 5 | #prt 6 6 | #prt 10 10 done
#str=10
action

11 MOIL files

11.1 monomer
The monomer file lists the connectivity of the particles in a monomer and defines the rules for joining monomers. It is an input to the "conn" program, which generates the connectivity file for the complete molecule. Typically it is NOT prepared by the user; existing databases are used instead. The structure of the file is as follows:
The top of the file (or the first non-comment line) must be:
MONO LIST
Following the title the different monomers are listed. Each monomer starts with the line
MONO=(NAME) #prt=5 chrg=0.
where NAME is the name of the monomer type, which can be at most four characters (all character assignments must be enclosed in parentheses (…), and this of course includes the monomer name). #prt is the number of particles in the monomer.
This also includes the virtual particles used to link the monomer to the next or previous monomers. chrg is the total charge of the monomer INCLUDING the virtual particles. The virtual particles are included since their type is the one that will finally be used (see below). The total charge is used only for test purposes. After that line, a list of particle names (unique to that monomer) and their types is provided. The last three assignments, SAID, PCHG and divi, are optional. HERE is assumed implicitly unless PREV or NEXT is found:
~ unique nam   type          link   surface   chrg       divisions
UNIQ=(UNAM)    PRTC=(PTYP)   HERE   SAID=16   PCHG=0.1   divi=1
UNIQ=(B)       PRTC=(BTYP)   PREV   SAID=1               divi=2
UNIQ=(C)       PRTC=(CTYP)   NEXT   SAID=7               divi=2
where the UNIQ command assigns a unique particle name (unique to that monomer). The name can be at most four characters and must appear only once within a monomer. PRTC defines the particle type, and it is matched against the PNAM data from the property file. This is an essential match to determine the parameters for energy calculations. Failure to match particle types results in program termination. The link information makes it possible to have a set of monomers and to link them into a polymer in an automated fashion. The following keywords are available for the link action (note that only four characters from a keyword are actually used):
HERE (or blank) - This is a normal particle of the present monomer
NEXT - This particle belongs to the next monomer. When the connectivity file is generated by linking connectivity information between monomers, the NEXT particle is identified in the monomer that follows (NEXT) according to its UNIQ name. This allows (for example) attaching an N-terminal residue that consists of the three hydrogen atoms and a NEXT nitrogen atom. We prefer the NEXT facility over the PREVIOUS option, since typically we start with the N terminal, though the results would be (of course) identical.
If a NEXT particle is not found a warning is issued. This warning is not terminal, since it is possible that a NEXT particle will be missing (for example when attaching a C terminal to an amino acid). In that case the extra particle is removed. All the bonds of a NEXT particle to the current monomer atoms are kept when a match is made. If there is a conflict between the particle types of the NEXT and the actual UNIQ particles, the NEXT or PREV (see below) assignments take precedence. For example, in the N terminal we have three identical and charged hydrogen atoms. However, the default structure of an amino acid includes the usual amide hydrogen atom. The latter is replaced by a hydrogen atom type that is equivalent to the other two hydrogens in the N terminal.
PREVIOUS - The particle belongs to the previous monomer. When the connectivity file is generated, the bonds of the previous particle are transferred to the corresponding (identical UNIQ name) atom in the previous monomer.
DNXT - Remove a particle in the next monomer. This option is not used at present in the ALL.MONO file.
DPRV - Remove a particle in the PREVious monomer. This option is not used in the present ALL.MONO file.
After the link list the “surface” expression is optional. The keyword SAID=[i]{0} implies that the surface attached to this particular atom belongs to type SAID (Surface Area IDentification). This is useful in calculations of hydrophobicity, which is modeled as proportional to the solvent exposed surface area. Obviously it is completely unnecessary if explicit water molecules are used.
Yet another option is the use of PCHG. By default the charge of a particle is stored in the property file, and the charge is assigned to a UNIQ atom according to the atom type PRTC.
However, to allow greater diversity and for consistency with other force fields that assign a new set of charges for each monomer while keeping all other parameters (van der Waals, bonds, angles, etc.) the same, we may assign charges at the monomer level. This assignment overrides the assignment by the PROP file. To use it, simply add PCHG=[d]{NONE} to the line of the (re)charged atom.
The final entry in the particle line is divi. Non-bonded lists are computed in MOIL in two steps. First, a division neighbor list is generated, and then an atom neighbor list is created based on the coarser division list. The divisions are groups of atoms for which the center of mass is computed and used to generate the coarse division list. How to define these groups? The default is to use the monomers as the divisions: each monomer as defined in ALL.MONO is one division. Alternatively one may use the optional divi=[i]{1} entry to divide the monomer into multiple divisions. For example, to increase accuracy in estimating neighbors to the heme group (which is a pretty large monomer), 9 divisions of the HEM are used in MOIL.
The unique particle list ends with
DONE
A bond list follows the particle list. A bond is defined by two unique particle names with a dash in between (e.g. A-B is a bond between A and B). Note that special particles, for which the connectivity action is different from HERE (i.e. PREV or NEXT), are denoted by *. An example following the definition of particles above is:
BOND
UNAM-B* UNAM-C*
DONE
Special particles must come second in a bond, and a monomer cannot be used to define a bond between two special particles; one of the particles must be HERE (or default – no entry). Note that the * is really needed to avoid ambiguity, since it is possible to have (for example) a particle with UNIQ name A as HERE and also UNIQ A as PREV. This also means that there are some restrictions on the connectivity.
Currently it is not possible to make a reference from a given monomer to the same particle name at both the PREVious and NEXT monomers. It is hard to imagine, however, a case in which this is truly needed. It is also not possible to create a bond between PREV and NEXT particles. Yet another restriction is that a regular UNIQ name that ends with “*” is unacceptable, since confusion with PREV and NEXT particles is likely.
We re-emphasize that cases in which a NEXT particle is defined but not found are possible; a warning will be issued, but the warning is not fatal. In fact it can be quite convenient to define the peptide link as the carbonyl carbon attached to the NEXT nitrogen for all amino acids in the protein chain. This however fails at the C terminus. To treat the C-terminal correctly, the virtual nitrogen at the C terminus is deleted, which brings everything back to normal. A warning about that nitrogen is issued to the standard output; that warning can be ignored.
The angles, torsions, and improper torsions are generated once the bond structure of the complete molecule is formed, and they are generated comprehensively, i.e. all possible angles, torsions and improper torsions are formed. Some torsions are then eliminated (those with zero energy contribution). One consequence is that all possible bonds, angles and improper torsions MUST be defined in the property file. If a torsion is not found a yellow alert is issued (non-fatal warning) and that torsion is ignored.
The file ends in the traditional way, i.e.
*EOD
Finally we give a complete example that defines alanine (with the minimal, essential-only information):
MONO=(ALA) #prt=7 chrg=-0.57
UNIQ=(N) PRTC=(NH)
UNIQ=(H) PRTC=(HN)
UNIQ=(CA) PRTC=(CAH)
UNIQ=(CB) PRTC=(CH3)
UNIQ=(C) PRTC=(CO)
UNIQ=(O) PRTC=(OC)
UNIQ=(N) PRTC=(NH) NEXT
DONE
BOND
C-O C-N* C-CA CA-CB CA-N N-H
DONE
Other examples of a monomer file can be found in moil/moil.mop/*.MONO .
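To make the rules of the monomer format concrete, the following Python sketch reads the alanine entry and checks a few of the constraints described in this section: #prt equals the number of UNIQ lines, starred bond partners are declared as NEXT or PREV particles, and the special particle comes second in a bond. The parser, its function names and the returned dictionary layout are hypothetical and written only for illustration; the conn program has its own interpreter and may be more permissive.

```python
import re

ALA = """\
MONO=(ALA) #prt=7 chrg=-0.57
UNIQ=(N) PRTC=(NH)
UNIQ=(H) PRTC=(HN)
UNIQ=(CA) PRTC=(CAH)
UNIQ=(CB) PRTC=(CH3)
UNIQ=(C) PRTC=(CO)
UNIQ=(O) PRTC=(OC)
UNIQ=(N) PRTC=(NH) NEXT
DONE
BOND
C-O C-N* C-CA CA-CB CA-N N-H
DONE
"""

def parse_monomer(text):
    """Parse one monomer entry into a dict (illustrative layout, not a MOIL API)."""
    # a ~ anywhere in a line makes it a comment
    lines = [ln.strip() for ln in text.splitlines() if ln.strip() and "~" not in ln]
    header = lines[0]
    mono = {
        "name": re.search(r"MONO=\((\w+)\)", header).group(1),
        "nprt": int(re.search(r"#prt=(\d+)", header).group(1)),
        "particles": [],   # (unique name, particle type, link keyword)
        "bonds": [],       # (first, second, second_is_special)
    }
    i = 1
    while lines[i] != "DONE":
        uniq = re.search(r"UNIQ=\((\w+)\)", lines[i]).group(1)
        ptype = re.search(r"PRTC=\((\w+)\)", lines[i]).group(1)
        link = "HERE"                      # HERE is implicit
        for tok in lines[i].split():
            if tok[:4] in ("HERE", "NEXT", "PREV"):   # only four characters are used
                link = tok[:4]
        mono["particles"].append((uniq, ptype, link))
        i += 1
    i += 1                                 # skip the DONE that closes the particle list
    if lines[i] != "BOND":
        raise ValueError("expected BOND after the particle list")
    i += 1
    while lines[i] != "DONE":
        for tok in lines[i].split():
            first, second = tok.split("-")
            mono["bonds"].append((first, second.rstrip("*"), second.endswith("*")))
        i += 1
    return mono

def check_monomer(mono):
    """Return True if the particle count and the bond list are self-consistent."""
    here = {u for u, _, link in mono["particles"] if link == "HERE"}
    linked = {u for u, _, link in mono["particles"] if link != "HERE"}
    ok = mono["nprt"] == len(mono["particles"])
    for first, second, special in mono["bonds"]:
        ok = ok and first in here and second in (linked if special else here)
    return ok

ala = parse_monomer(ALA)
print(ala["name"], len(ala["particles"]), check_monomer(ala))
```

The sketch treats C-N* as a bond from the HERE carbon to the NEXT nitrogen, which is how the peptide link is expressed in the alanine entry.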
The most widely used version is ALL.MONO

11.2 property
The property file stores the parameters of the particles, bonds, angles, torsions and improper torsions. It is an input to the connectivity program, which builds the molecular connectivity file. The default file can be found in moil.mop and its name is ALL.PROP. Typically, prepared files such as ALL.PROP are read instead of generating the atomic properties from scratch.
The property file is built from sequential sections which MUST come in the following order:
PRTC - individual particle properties
1-4P - scaling parameters for non-bonded 1-4 interactions
BOND - bond parameters
ANGLE - angle parameters
TORSION - torsion parameters
IMPROPER - improper torsion parameters
It is possible not to provide all the information, i.e. a property file with PRTC only is legal; however, PRTC and ANGLE is not. If you provide only PRTC the program will issue a warning (yellow alert); ignore it unless you meant to provide bonds and somehow the bonds were not read correctly.
Each subsection (e.g. PRTC, BOND, TORSION) must end with
DONE
The file must end with
*EOD
DONE and *EOD are general termination features used in other data files. Another general feature shared between different data files is the comment line. ~ ANYWHERE in the line makes it a comment. These lines are echoed by the interpreter and otherwise ignored.
Below, details on the syntax are provided. The explanations are written as comment lines, as in a "real" property file.
~ This is a first line of a property file. The first exe line must be
PRTC
~ The following line lists properties of an individual particle
~ name mass charge epsilon sigma
PNAM=(NX) PMAS=14. PCHG=-0.3 PEPS=0.170 PSGM=3.250
~ The example above provides the data for the particle type NX (Nitrogen
~ of the N-terminal). Characters (like PNAM - the name of particle
~ type) must be enclosed in brackets. Each of the expressions (i.e.
~ A=B) must be separated from other expressions by space(s). No spaces
~ within an expression are allowed.
~ epsilon and sigma are the van der Waals well depth
~ and the hard core radius respectively. Obviously this type of
~ line is repeated as needed for different particle types.
~ The data base for particle properties is based on the OPLS potential
~ Jorgensen and Tirado-Rives JACS 110,1657(1988)
~
~ Now end the particle part by DONE
DONE
~ The following part specifies 1-4 scaling parameters of the force field for van der Waals
~ forces (v14f) and electrostatic forces (el14). These parameters are used for scaling
~ the above mentioned interactions between pairs of atoms separated by exactly 3 bonds.
1-4P v14f=0.125 el14=0.5
~ Finish this section by DONE keyword
DONE
~
~ The next part lists the bond properties. The first Bond line must be
BOND
~ Bond energy is set to be K(r - req)^2
~ Below we provide the names of the two particle types, the force constant
~ (in kcal/mol angstrom^-2) and the equilibrium distance in angstrom
~ Note the different style of i/o: different expressions are still
~ separated by spaces but no equality is used. This requires the data
~ to be placed in exactly the same order. I.e. do not exchange equilibrium
~ position and force constant.
~ The covalent part of the potential (excluding improper torsions)
~ is taken from AMBER
~ Weiner et al JACS 106,765(1984)
~ particle particle force-constant equilibrium distance
NX HX 434.0 1.01
CANX NX 337.0 1.449
~
~ Pictorially NX-HX
~ end the BOND with DONE
DONE
~
~ Angles are similar to bonds in format style
ANGLE
~ K (theta - theta(eq))^2
~ name name name K(kcal/mol radians^-2) theta(eq) (degrees)
HX NX HX 35.0 109.5
~
~ Pictorially HX-NX-HX
DONE
~
~ And here are the torsions.
~ The format style is similar to BOND and ANGLE
TORSION
~ (Pictorially CAH-CO-NH-CAH)
~ however the energy function is more complex:
~ E = sum k(n)*(1 + cos(n*phi+gamma))
~ (gamma should be a function of n too and will be added to the program
~ soon). Currently the format is
~ name name name name k(1) k(2) k(3) n cos(gamma1) cos(gamma2) cos(gamma3)
CAH CO NH CAH 0.0 2.5 0.0 2 -1.0 -1.0 -1.0
~ There is an option in TORSION (only) to use a wild card by X, e.g.
X CANX CX X 0.0 0.0 0.0 3 0.0 0.0 0.0
~ where X means "any atom".
~ **** ALL TORSIONS MUST BE DEFINED IN THE PROPERTY FILE ***
~ However in many cases the energy is set identically to zero.
~ This is done by setting cos(gamma)=0. When the program matches
~ this torsion, it is skipped and NOT included finally in the
~ connectivity file
~
DONE
~
~ Improper torsions are four body interactions in which one atom is sitting
~ in the center, Pictorially
~      B
~      |
~      A
~     / \
~    C   D
IMPROPER
~ The internal degree of freedom - phi, is the angle between the normal
~ to the ABC plane and the normal to the BCD plane. To obtain consistent
~ values A must be first and D must be last
~
~ The energy function is rather messy...
~ If the equilibrium angle is far from zero we use simply harmonic term
~ E = K1(phi-phi(eq))^2 (K1 kcal/mol radian^-2 ; phi(eq) degrees)
~ If the equilibrium angle equals zero then the above energy expression
~ is singular, we therefore use
~ E = K2(cos(phi) - cos(phi(eq)))^2 (K2 kcal/mol ; phi(eq) degrees)
~ Note that the units for K are different in both cases. Note also
~ that at phi(eq)=0, Taylor expansion shows that E is quartic in phi
~ therefore to maintain comparable restoring force K2 > K1
~
~ The atom in the center is always first, the last atom must also be chosen
~ with care since it determines the sign.
~ The other two in the middle
~ can be interchanged
~ name name name name K1/K2 phi(eq)
CANX NX CO CH3 55.0 35.26
DONE
~ End it all
*EOD

PROPERTY files are kept in moil/moil.mop/*.PROP

11.3 poly

This (short) file is typically prepared by the user as input to the conn program (which generates a connectivity file, or wcon). It contains a list of the monomers that form the molecule of interest. The file poly is accessed in the conn program, and the conn input looks something like

file poly name=(ala3.poly) read

A simple example is below

MOLC=(BIG) #mon=3
NTER ALA CTER
*EOD

MOLC is the molecule name (called BULK in the conn file), four characters at most. #mon is the number of monomers. The line that follows lists the monomer names. The number of monomers found should match the number of monomers declared in the first line (#mon).

11.4 addbond

Sometimes it is necessary to add a bond to a connectivity file, since the automated generation of bonds cannot cover everything. For example, the binding of carbon monoxide to a heme iron is not modeled with the usual tools of MOIL. It is therefore useful to have a tool to explicitly add bonds between pairs of atoms. The file for bond addition is an input to the conn program.

file ubon name=(mb10co.addb) read

The addbond file includes one (or more) line(s) identifying the added bond(s). Each line follows the general syntax

bond select-one-atom select-a-second-atom

The selection is slightly different from the pick command. For example

bond chem HIS 95 NE2 HEM1 157 FE

is interpreted as “a bond between the unique atom NE2 of histidine residue number 95 and the unique atom FE of the heme residue number 157”.

Morse bonds (using the function D(exp(-2*a*(r-r0)) - 2*exp(-a*(r-r0)))) can also be added.

morse atm1=[i]{0} atm2=[i]{0}

where the integer entries are the indices of the atoms within the connectivity file. The syntax is not ideal, since the parameters are set via the standard input of dyna, energy, or mini.
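As a small illustration of the Morse function quoted above, the following Python sketch evaluates D(exp(-2*a*(r-r0)) - 2*exp(-a*(r-r0))). It is not taken from MOIL. The function name morse_energy and the parameter values in the example are assumptions chosen only to show the shape of the curve.

```python
import math


def morse_energy(r, depth, a, r0):
    """Morse bond energy D*(exp(-2a(r-r0)) - 2*exp(-a(r-r0))).

    depth (D) is the well depth, a controls the width of the well and
    r0 is the equilibrium distance. Units follow whatever the caller uses.
    """
    x = math.exp(-a * (r - r0))
    return depth * (x * x - 2.0 * x)


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    depth, a, r0 = 30.0, 2.0, 1.9
    for r in (1.7, 1.9, 2.5, 10.0):
        print(r, morse_energy(r, depth, a, r0))
```

With this form the energy equals -D at r = r0 and approaches zero at large separation, so the bond can dissociate, unlike the harmonic bond term.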
The Morse energy parameters are not read from the prop file. The file ends with

*EOD

11.5 edit

Sometimes it is useful to remove some terms from the connectivity data structure as generated automatically with conn. The file edit serves that purpose. It is used during a call to connect. The prime keyword is “remo”. Only a bond or an angle can be removed. A typical line to remove a bond looks like

remo bond atm1=[i]{0} atm2=[i]{0}

The missing id numbers are the indices of the atoms within the connectivity file. A nicer expression is:

remo bond chem HEM1 157 NA HEM1 157 FE

The remove command is useful in the process of substituting a harmonic bond by a Morse bond. We first remove the usual harmonic bond and then add the Morse term via the addbond option.

Similarly we can remove an angle from the list of angles generated automatically. Some of these generated angles are undesired. The example below is typical for the removal of angles. The iron in heme is bonded to 4 nitrogens in a planar arrangement. Two of the angles are linear (180 degrees). It is possible to maintain the structure with the 90 degree angles only. Moreover, the 180 degree angles are bad news for the force field: the derivative involves a division by the sine of the angle, and 180 degrees causes a singularity. We therefore eliminate the 180 degree angles as illustrated below.

remo angl chem HEM1 157 NA HEM1 157 FE HEM1 157 NC
remo angl chem HEM1 157 NB HEM1 157 FE HEM1 157 ND
*EOD

11.6 connectivity

The connectivity file is where the complete information needed to compute the energy of a set of coordinates of a molecule is found (the coordinates are provided separately). It typically ends with the extension *.wcon (written connectivity). The file is created by the program conn based on a list of residues (sequence) provided by the user (i.e. the *.poly file) and the generic database of monomer and particle properties (typically the ALL.MONO and ALL.PROP files).
A list of all the covalent energy terms and their parameters is provided, and also a list of the nonbonded parameters. The file is formatted, and users are discouraged from editing it or trying to create it by bypassing the conn program.

11.7 Coordinates

11.7.1 PDB file interpretation in MOIL

The PDB files are the standard entries of the protein data bank, www.rcsb.org. Zmoil views this structure “as is” by building bonds from distance proximity between atoms as well as from CONECT records for hetero atoms (see PDB format). The file can be processed through the menu-based interface moil.tcl (see the file get_started.pdf in moil.doc for an introduction to the moil.tcl graphic interface) and is converted to a MOIL CRD file, which shares a lot of similarities with the CRD format of CHARMM. The MOIL version is more restricted than the CHARMM version. In MOIL the title is fixed and is not available for the user to edit.

MOIL interprets only the ATOM and HETATM records of the PDB and ignores the rest. The atom entry is assumed to be of the following format

zevel(1:4),j1,char2,char1,i1,xtmp,ytmp,ztmp,moretmp

char1 and char2 are the atom and residue unique names that are extracted from the file and are compared to the residue (monomer) list available in the poly file and the atom list available in the connectivity file. The monomer list of the connectivity file must match (in order) the monomers read from the atom records of the PDB. Within a residue all the heavy atoms (non-hydrogens) must find a match with a unique atom of the residue as defined in the connectivity data structure (and the ALL.MONO file). The unique atoms within a residue need not be in the same order in the two files. A mismatch or a missing residue or atom name causes a termination of the read. At present MOIL does not support insertion and modeling of atoms or residues, with the exception of hydrogens. This will have to be done externally to MOIL.
If the match is good the coordinate vector is filled with xtmp, ytmp, and ztmp. The value moretmp is stored internally in MOIL in the “more” array. For a PDB file its value is the B factor.

11.7.2 CRD file

The CRD file is a standard internal MOIL coordinate file which is very similar to the CHARMM format except that it is more limited in its options. The file starts with title lines. A title line starts with “*”, after which a comment is written. A title line with only a “*” and nothing after it denotes the end of the title. Note that in MOIL the user cannot modify the title content with the program (only externally).

The title is followed by a single line that includes the number of atoms in the file, read as an integer. The number of atoms must match the number of atom records in the file. A mismatch results in termination. The number of atoms is followed by lines that provide the coordinates of the atoms and “more”.

The following format is used:

i5, i5, 1x, a4, 1x, a4, 3(f10.5), 1x, a4, 1x, a4, f10.5

for the following variables:

atom id, monomer id, monomer name, particle name, x, y, z, nothing, optional vector

The atom id is read but ignored. MOIL does its own counting to match the number of lines read to the number of atoms stated in the line that immediately follows the title. The residue id is, however, a must. It is read and compared to the internal id of the monomers read from the connectivity file. The name of the monomer of a particular id must match the name of the residue with the same id in the connectivity data structure. A mismatch results (yes, here we go again) in program termination.

The atom unique name in the CRD file must match one of the unique names of a particle in the corresponding monomer. If required, the atom list within a monomer is searched to find a match. If a match for an atom cannot be found the program terminates.
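The fixed-width CRD atom record described above (i5, i5, 1x, a4, 1x, a4, 3(f10.5), 1x, a4, 1x, a4, f10.5) can be read by slicing columns. The Python sketch below is an illustration only and is not part of MOIL. The names CrdAtom, format_crd_atom_line and parse_crd_atom_line are hypothetical, and returning 0.0 for a blank trailing field is an assumption of this sketch.

```python
from typing import NamedTuple


class CrdAtom(NamedTuple):
    atom_id: int        # read, but MOIL ignores it and counts on its own
    monomer_id: int
    monomer_name: str
    particle_name: str
    x: float
    y: float
    z: float
    more: float         # optional per-atom value, e.g. a B factor


def format_crd_atom_line(atom, tag1="", tag2=""):
    """Write one record as i5,i5,1x,a4,1x,a4,3(f10.5),1x,a4,1x,a4,f10.5."""
    return "%5d%5d %-4s %-4s%10.5f%10.5f%10.5f %-4s %-4s%10.5f" % (
        atom.atom_id, atom.monomer_id, atom.monomer_name,
        atom.particle_name, atom.x, atom.y, atom.z, tag1, tag2, atom.more)


def parse_crd_atom_line(line):
    """Slice one record by column; blank trailing fields are tolerated."""
    line = line.rstrip("\n").ljust(70)
    more_field = line[60:70].strip()
    return CrdAtom(
        atom_id=int(line[0:5]),
        monomer_id=int(line[5:10]),
        monomer_name=line[11:15].strip(),
        particle_name=line[16:20].strip(),
        x=float(line[20:30]),
        y=float(line[30:40]),
        z=float(line[40:50]),
        # Assumption of this sketch: a blank "more" field is read as 0.0.
        more=float(more_field) if more_field else 0.0,
    )


if __name__ == "__main__":
    atom = CrdAtom(1, 1, "DMPC", "C1", -36.34491, 5.90324, -29.94611, 0.0)
    record = format_crd_atom_line(atom, "memb", "FREE")
    print(record)
    print(parse_crd_atom_line(record))
```

A full reader would also skip the title lines that start with “*”, read the atom count, and compare each monomer id and name with the connectivity data. Those checks are omitted here.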
MOIL expects all the coordinates of the atoms of the monomer just read (as defined by the connectivity data structure) to be read and found. If an atom in the connectivity data structure is not found in the coordinate file (checked monomer by monomer), the program reports a missing atom and exits.

After a monomer and its particles are matched, the coordinates x, y, z of the particle are read from the file. Note that the format of f10.5 is better than that of the PDB, but it is still limited compared to double precision numbers. For maximum precision the PTH (path) format is the most desirable.

The records named FREE below are ignored at present in MOIL. It is possible to leave the records after the coordinates simply blank. The read will go on just fine. The final f10.5 vector contains a single number per atom which can be (for example) the B factor of crystallography.

A sample of the start of a crd file is below

* title for CHARMM coordinates
*
29026
    1    1 DMPC C1   -36.34491   5.90324 -29.94611 memb FREE   0.00000
    2    1 DMPC O1   -37.38061   6.06437 -29.30440 memb FREE   0.00000
    3    1 DMPC C11  -36.37354   4.54210 -30.62652 memb FREE   0.00000
    4    1 DMPC C12  -35.53492   3.34375 -30.19138 memb FREE   0.00000
...

11.7.3 DCD and DVD files

The dcd and dvd files are Dynamics CoorDinate and Dynamics Velocity files. Coordinate and velocity files are completely interchangeable and their format is identical. They are used typically to store coordinates and velocities during Molecular Dynamics simulations. They are based on a format developed for CHARMM. The option in MOIL is a subset of what is available in CHARMM but seems to be sufficient for the tasks that we are after. The coordinates are written in single precision. This is probably all that we need for Molecular Dynamics simulations. However, many applications in MOIL require double precision. The files are unformatted with the intention of saving space.

The format of the dcd/dvd files is as follows.
The first record is a header and an integer vector of twenty variables. The header is a character string of length 4 that is never used in MOIL. From the vector of 20 integers only the first entry (the number of coordinate sets to be read) and the ninth entry (the number of frozen particles) are used.

The second record consists of an integer and a character of length one. Neither is used.

The third record includes one integer, which is the number of particles of a molecular frame in the dcd (or dvd) file. This number must match the number of atoms of the connectivity data structure. Otherwise the program terminates with a message on a nonmatching number of particles.

The fourth record includes the pointer to the particles that are not frozen. The file is formatted in such a way that only the first coordinate set is complete. In the following coordinate sets only the particles that are not frozen are written into the file. The pointer to nonfrozen particles is written as

(nofreez(i),i=1,inofrz)

where nofreez is the pointer, i.e. nofreez(i) is the i-th particle that is not frozen, and inofrz is the number of unfrozen particles.

The fifth, sixth and seventh records are the X, Y, and Z coordinates of all the particles of the system. They are written in single precision and translated internally to double precision numbers. Obviously some precision is lost between write and read. Subsequent records include the X, Y, Z coordinates of the unfrozen particles only. Triplets of records (for X, Y, Z coordinates) continue until all the coordinate sets are read.

An important keyword that can be set in some programs is “norw”. The DCD/DVD files can be rewound (or not, if norw is .true.). If the file is rewound before every read operation, reading a large number of DCD records can take a LOT of time (the read is sequential). Therefore the “norw” option is recommended for multiple reads and for use in analysis programs.

11.7.4 PTH files

The path file (extension .pth) is a moil “invention”.
Path files are not compatible with other programs but are useful for calculations that produce multiple structures and need to retain the full double precision of the coordinates. They are unformatted, but in a very simple way. There is no title or internal test (we assume that you know what you are doing). Every record is identical and consists of the following sequence of double precision numbers

energy_value(if available, zero if not), ((coor(j,i),i=1,npt),j=1,3)

coor(j,i) is the Cartesian coordinate j of particle i.

11.7.5 wene and wmin files

wene and wmin are the output files of the energy and the mini_pwl programs, an output format that is widely used in MOIL. It is therefore useful to see it once and briefly describe the meaning of the different terms. All the terms are described in the energy section of the documentation. For completeness, we briefly describe them below

Parameters for energy calculation
Constant dielectric will be used.
elec. Cutoff= 9999.00000 vdW cutoff 9999.00000
ENERGIES: E total = -39.919
E bond =   0.754   E angl =   3.482   E tors =   2.029
E impr =   1.844   E vdw  =   3.445   E elec = -83.6
E 14el =  28.889   E 14vd =   3.234
E cnst =   0.000   E evsym=   0.000   E elsym=   0.000
E centr=   0.000   E hydro=   0.000
Norm Force = 4.254
Number of neighbours for short range int.    8
Number of uncharged vdW interactions        19
Number of elec. only interactions            7
Number of wat-wat shrt. range neighbors      0
Number of wat-wat long range neighbors       0

The constant dielectric is the only option currently available in MOIL. The cutoff distances of 9999 indicate that no cutoff is used. E total provides the potential energy (not including kinetic). E bond is the bond energy, and similarly E angl, E tors, E impr, E vdw, E elec are the angle, torsion, improper torsion, van der Waals and electrostatic energies. E 14el, E 14vd are the electrostatic and van der Waals 1-4 interactions.
E cnst corresponds to the energy of the constraints. E evsym and E elsym are the van der Waals and electrostatic energies that result from translational symmetry operations (periodic boundary conditions). E centr is a restraint added to a set of selected coordinates to avoid diffusion. E hydro is an approximate hydrophobic energy term. The norm of the force is the normalized length of the force vector, $\sqrt{\nabla U^{t}\cdot\nabla U / 3n}$, where n is the number of particles in the system.

11.8 Standard input and output in MOIL

Most of the programs direct some output to the standard output, which should be read in addition to the specifically designed output such as *wene and *wmin. Error messages are always directed to the standard output. The standard input is used to provide file lists and to initialize variables, as discussed earlier in this document.

11.9 Other special files

None

12 Credit

Thanks to all who were involved in different phases of code development and testing.

Code developers: Alfredo Cardenas, Ron Elber, Avijit Ghosh, Robert Goldstein, Chen Keasar, Serdal Kirmizialtin, Haiying Li, Peter Májek, Jaroslaw Meller, Debasisa Mohanty, Mauro Mugnai, Roberto Olender, Felicia Pitici, Adrian Roitberg, Amena Siddiqi, Carlos Simmerling, Ileana Stoica, Alex Ulitsky, Gennady Verkhivker, Yael Weinbach, Anthony West, Veaceslav Zaloj

GUI developers: Thomas Blom, Baohua Wang, Avijit Ghosh

Current code keeper: Thomas Blom

We made use of the following generously provided codes:
(1) The Householder diagonalization routine written by Ryszard Czerminski.
(2) The truncated Newton-Raphson minimization by Stephen G. Nash.
(3) The Particle Mesh Ewald of Darden and co-workers, J. Chem. Phys. 98, 10089 (1993).
(4) The Spherical Solvent Boundary Potential of Beglov and Roux, J. Chem. Phys. 100, 9050 (1994).
(5) Generalized Born model from Tsui and Case, Biopolymers 56, 275 (2000).
(6) TMalign code of Zhang and Skolnick, Nucleic Acids Research 33, 2302-2309 (2005).
13 References

The reference to the general code is (numerous other references describe concrete applications and modules, consult the main text):

R. Elber, A. Roitberg, C. Simmerling, R. Goldstein, H. Li, G. Verkhivker, C. Keasar, J. Zhang and A. Ulitsky, "MOIL: A program for simulations of macromolecules", Computer Physics Communications, 91, 159-189 (1995)

Beglov, D. and B. Roux (1994). "Finite representation of an infinite bulk system - solvent boundary potential for computer-simulations." Journal of Chemical Physics 100(12): 9050-9063.
Cardenas, A. E. and R. Elber (2003). "Kinetics of cytochrome C folding: Atomically detailed simulations." Proteins-Structure Function and Genetics 51(2): 245-257.
Czerminski, R. and R. Elber (1990). "Self avoiding walk between 2 fixed end points as a tool to calculate reaction paths in large molecular systems." International Journal of Quantum Chemistry: 167-186.
Czerminski, R. and R. Elber (1991). "Computational study of ligand diffusion in globins 1 Leghemoglobin." Proteins-Structure Function and Genetics 10(1): 70-80.
Darden, T., D. York, et al. (1993). "Particle mesh Ewald - an N.log(N) method for Ewald sums in large systems." Journal of Chemical Physics 98(12): 10089-10092.
Elber, R. (1990). "Calculation of the potential of mean force using molecular dynamics with linear constraints - An application to a conformational transition in a solvated dipeptide." Journal of Chemical Physics 93(6): 4312-4321.
Elber, R. and A. Cardenas (2004). "From reaction pathways to classical trajectories." Biophysical Journal 86(1): 34A-34A.
Elber, R., A. Ghosh, et al. (2002). "Long time dynamics of complex systems." Accounts of Chemical Research 35(6): 396-403.
Elber, R. and M. Karplus (1990). "Enhanced sampling in molecular dynamics - use of the time dependent Hartree approximation for a simulation of carbon monoxide diffusion through myoglobin." Journal of the American Chemical Society 112(25): 9161-9175.
Elber, R. and D. Shalloway (2000). "Temperature dependent reaction coordinates." Journal of Chemical Physics 112(13): 5539-5545.
Faradjian, A. K. and R. Elber (2004). "Computing time scales from reaction coordinates by milestoning." Journal of Chemical Physics 120(23): 10880-10889.
Ghosh, A., R. Elber, et al. (2002). "An atomically detailed study of the folding pathways of protein A with the stochastic difference equation." Proceedings of the National Academy of Sciences of the United States of America 99(16): 10394-10398.
Gibson, Q. H., R. Regan, et al. (1992). "Distal pocket residues affect picosecond ligand recombination in myoglobin - an experimental and molecular-dynamics study of position 29 mutants." Journal of Biological Chemistry 267(31): 22022-22034.
Honeycutt, J. D. and D. Thirumalai (1989). "Static properties of polymer-chains in porous-media." Journal of Chemical Physics 90(8): 4542-4559.
Hornak, V., R. Abel, et al. (2006). "Comparison of multiple amber force fields and development of improved protein backbone parameters." Proteins-Structure Function and Bioinformatics 65(3): 712-725.
Jorgensen, W. L. and J. Tirado-Rives (1988). "The OPLS potential functions for proteins - energy minimizations for crystals of cyclic-peptides and crambin." Journal of the American Chemical Society 110(6): 1657-1666.
Kaminski, G., R. Friesner, et al. (2001). "Evaluation and reparameterization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides." The Journal of Physical Chemistry B 105(28): 6474-6487.
Li, H. Y., R. Elber, et al. (1993). "Molecular-dynamics simulation of NO recombination to myoglobin mutants." Journal of Biological Chemistry 268(24): 17908-17916.
Majek, P. and R. Elber (2009). "A coarse grained potential for fold recognition and molecular dynamics simulations of proteins." Proteins, Structure, Function and Bioinformatics: accepted.
Majek, P., R. Elber, et al. (2009). Pathways of Conformational Transitions in Proteins, CRC Press-Taylor & Francis Group.
Mohanty, D., R. Elber, et al. (1997). "Kinetics of peptide folding: Computer simulations of SYPFDV and peptide variants in water." Journal of Molecular Biology 272(3): 423-442.
Olender, R. and R. Elber (1996). "Calculation of classical trajectories with a very large time step: Formalism and numerical examples." Journal of Chemical Physics 105(20): 9299-9315.
Olender, R. and R. Elber (1997). "Yet another look at the steepest descent path." Theochem-Journal of Molecular Structure 398: 63-71.
Onufriev, A., D. Bashford, et al. (2004). "Exploring protein native states and large-scale conformational changes with a modified generalized born model." Proteins-Structure Function and Bioinformatics 55(2): 383-394.
Pranata, J., S. G. Wierschke, et al. (1991). "OPLS potential functions for nucleotide bases - relative association constants of hydrogen-bonded base-pairs in chloroform." Journal of the American Chemical Society 113(8): 2810-2819.
Roitberg, A. and R. Elber (1991). "Modeling side-chains in peptides and proteins - application of the locally enhanced sampling and the simulated annealing methods to find minimum energy conformations." Journal of Chemical Physics 95(12): 9277-9287.
Simmerling, C. and R. Elber (1994). "Hydrophobic collapse in a cyclic hexapeptide - computer-simulations of CHDLFC and CAAAAC in water." Journal of the American Chemical Society 116(6): 2534-2547.
Steinberg, M. Z., K. Breuker, et al. (2007). "The dynamics of water evaporation from partially solvated cytochrome c in the gas phase." Physical Chemistry Chemical Physics 9(33): 4690-4697.
Sugita, Y. and Y. Okamoto (1999). "Replica-exchange molecular dynamics method for protein folding." Chemical Physics Letters 314(1-2): 141-151.
Ulitsky, A. and R. Elber (1993). "The thermal equilibrium aspects of the time-dependent Hartree and the locally enhanced sampling approximations: formal properties, a correction, and computational examples for rare gas clusters." Journal of Chemical Physics 98(4): 3380-3388.
Verkhivker, G., R. Elber, et al. (1992). "Locally enhanced sampling in free-energy calculations - application of mean field approximation to accurate calculation of free energy differences." Journal of Chemical Physics 97(10): 7838-7841.
Weinbach, Y. and R. Elber (2005). "Revisiting and parallelizing SHAKE." Journal of Computational Physics 209(1): 193-206.
West, A. M. A., R. Elber, et al. (2007). "Extending molecular dynamics time scales with milestoning: Example of complex kinetics in a solvated peptide." Journal of Chemical Physics 126(14).
Yang, Z., P. Majek, et al. (2009). "Allosteric Transitions of Supramolecular Systems Explored by Network Models: Application to Chaperonin GroEL." Plos Computational Biology 5(4).
Zhang, Y. and J. Skolnick (2005). "TM-align: a protein structure alignment algorithm based on the TM-score." Nucleic Acids Research 33(7): 2302-2309.
Zichi, D. A. (1995). "Molecular-dynamics of RNA with the OPLS force-field - aqueous simulation of a hairpin containing a tetranucleotide loop." Journal of the American Chemical Society 117(11): 2957-2969.