3DSig 2008 - Najmanovich Research Group
3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting July 18-19, 2008 Toronto, Canada 3DSig Organizing Committee: Ilan Samish, University of Pennsylvania Melissa Landon, Brandeis University Rafael Najmanovich, European Bioinformatics Institute John Moult, University of Maryland Biotechnology Institute 3DSig Scientific Committee: John Moult, University of Maryland Biotechnology Institute Brian Shoichet, UCSF Ivet Bahar, Pittsburgh U. Phil Bourne, UCSD Tanja Kortemme, UCSF Alfonso Valencia, CNIO-Madrid Tamar Schlick, New York University 1 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada Table of Contents Program 4 Keynote Abstracts 6 Oral Presentation Abstracts 7 Laptop Presentation Abstracts 28 73 List of Registrants 76 Index by Abstract Number 2 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada At Roche we contribute to improving people’s health and quality of life by developing and marketing innovative therapeutic and diagnostic products and services. Your ideas could help shape tomorrow’s innovations in healthcare. Plans are life’s roadmap to the future. Come realise your plans with us: www.careers.roche.ch 3 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada Protein Structure, Function and Dynamics Day 1 Time ID Title Presenting author 8:55 Opening remarks – Ilan samish 9:05 9:35 Session 1 Predicting, analyzing and evaluating dynamic function (Chair: Melissa Landon) Toward elucidating allosteric mechanisms of function via K1 Ivet Bahar structure-based analysis of protein dynamics Dariya S. Glazer, Randall J. Radmer & 64 4D Structure-based Function Prediction Russ B. 
Altman 9:55 68 An Automatic Server for Function Prediction Evaluation Michael Tress, Alfonso Valencia, Michael Sternberg & Mark Wass Page 5 8 9 Coffee 10:15 Session 2 Joint with Automatic Function Prediction SIG (Chair: John Moult) 10:35 11:15 On the nature of protein fold space: extracting functional Donald Petrey, Markus Fischer & Barry information from apparently remote structural neighbors Honig Assessing functional novelty of PSI structures via Benoît H Dessailly, Oliver C Redfern & AFP structurefunction analysis of large and diverse Christine A Orengo superfamilies K2 11:35 16 The evolution of protein function driven by a multidomain repertoire (MGMS awardee) 11:55 K3 Prediction of functional sequence and structure 12:35 15:00 15:30 15:50 16:10 characteristics based 6 Syed Ali & Michael Sternberg 10 Alfonso Valencia 6 on Lunch & Poster/Laptop session (odd ID numbers) Session 3 Protein – nucleic acid complexes (Chair: Chakra Chennubhotla) Chromatin structure insights revealed by mesoscale 6 Tamar Schlick K4 modeling Predicting DNA-binding affinity of modularly designed Peter Zaback, Jeffry D. Sander, J. Keith 11 66 Joung, Daniel, F. Voytas & Drena Dobbs zinc finger proteins Remo Rohs, Sean West, Peng Liu & Barry 29 Minor groove electrostatics and binding specificity 12 Honig Ben A. Lewis Mateusz Kurcinski, Deepak Combining Predictions of Protein Structure and ProteinReyon, Jae-Hyung Lee, Vasant Honavar, 60 RNA Interaction to Model the Structure of the Human 13 Robert L. Jernigan, Andrzej Kolinski, Telomerase Complex Andrzej Kloczkowski & Drena Dobbs 16:30 Coffee Session 4 From protein structure to mechanism (Chair: Roland Dunbrack) 17:00 17:20 17:40 Channeling protein structure analysis towards Ilan Samish & William F. DeGrado understanding cough dynamics Classification of mechanistically diverse enzyme 31 superfamilies according to similarities in reaction Daniel Almonacid & Patricia C. Babbitt mechanism Discussion Panel I: Dynamics is all? 
(Moderators: Ivet Bahar, Tanja Kortemme & Yaoqi Zhou ) 67 14 15 18:30 19:00 Dinner (7:00 Reception, 7:45 Dinner) 21:00 K5 I am not a PDBid I am a Biological Macromolecule 4 Philip Bourne 7 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada Zooming in: proteins, residues, atoms, cofactors and drugs Day 2 Time ID Title Presenting author Page Session 5 From protein stability and flexibility to folding and design (Chair: Yaoqi Zhou) 9:00 K6 9:30 26 9:50 50 10:10 9 Conformational flexibility and computational protein design sequence diversity 11:30 11:50 12:10 15:30 7 Ivelin Georgiev, Cheng-Yu Chen & 16 Bruce Randall Donald Poing: a fast and simple model for protein structure Benjamin Jefferys, Lawrence Kelley & 17 prediction Michael Sternberg Proteins: coexistence of stability and flexibility (MGMS Shlomi Reuveni Rony Granek & Joseph 18 awardee) Klafter Coffee Session 6 Ligand binding prediction and analysis (Chair: Warren Gallin) Hits, Leads & Artifacts from Virtual and High-Throughput Brian Shoichet K7 Screening Predicting small ligand binding sites on proteins using low17 Andrew Bordner resolution structures Scoring confidence index: statistical evaluation of ligand Maria Zavodszky, Andrew Stumpff19 Kan, David Lee & Michael Feig binding mode predictions Functional insights from binding sites similarities 22 complement existing methods for prediction of protein Rafael Najmanovich & Janet Thornton function 12:30 15:00 Tanja Kortemme Algorithms for protein design 10:30 11:00 in 7 19 20 20 Lunch & Poster/Laptop session (even ID numbers) Session 7 New algorithms - from docking to drug discovery (Chair: Graham Wood) Michael Sternberg, Stephen Muggleton, Ata Amini, Huma Lodhi, 14 Logic-based drug discovery David Gough & Paul Shrimpton Conformational free energy of protein structures: computing Hetunandan Kamisetty & Christopher 43 upper and lower bounds Langmead 15:50 6 Crystal contacts as nature's 
docking solutions 16:10 18 Vibin Ramakrishnan, Saeed Salem, Geofold: a mechanistic model to study the effect of topology Saipraveen Srinivasan, Wilfredo Colon, on protein unfolding pathways and kinetics Mohammed Zaki & Chris Bystroff Eugene Krissinal 16:30 22 23 24 25 Coffee Session 8 Residue level structure prediction (Chair: BK Lee) 17:00 38 The next generation of the backbone dependent rotamer library 17:20 65 Two stage residue-residue contact predictor 17:40 Discussion II - Ligand binding (Moderator: Brian Shoichet) 18:25 Closing Remarks – Rafael Najmanovich & John Moult 18:30 End of 3DSig 2007 5 Maxim Shapovalov & Roland Dunbrack 26 George Shackelford & Kevin Karplus 27 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada proteins without clear similarities with proteins of known structure and function. Surprisingly far less attention has been dedicated to the prediction of function, i.e. binding sites, in proteins with clear homologs of known structure (homology based function prediction), a non-trivial problem that is of direct interest for experimental biologists. In this presentation I will review some of the methods and resources that my group has developed in this area (López G, Valencia A, Tress ML. firestar--prediction of functionally important residues using structural templates and alignment reliability. Nucleic Acids Res. 2007 Jul;35(Web Server issue):W573-7. Lopez G, Valencia A, Tress M. FireDB--a database of functionally important residues from proteins of known structure. Nucleic Acids Res. 2007 Jan;35(Database issue):D219-23.) A second large area of activity is the one related with the prediction of functional sites and more specifically the detection of binding specificity sites that regulate the differential interaction of proteins with specific substrates/effectors in the context of large protein families. 
A full range of methods for the analysis of variation in multiple sequence alignments has been published. Still, it is fair to say that in general we do not yet sufficiently understand the basic principles behind the organization of specificity sites. I will present here our recent efforts to analyze systematically the characteristics of specificity sites in large collections of protein families and structures (Rausell et al., in preparation). Finally, the third field in which significant progress has been made in recent years is the extraction of functional information directly from the scientific literature. I will review the current status of the text-mining methodology applied to biological problems (Krallinger M, Hirschman L, Valencia A. Linking genes to literature: text-mining, information extraction and retrieval applications for Biology. Genome Biology 2008, in press), describe some of the current efforts to integrate text-mining methods in function prediction pipelines (Krallinger M, Rojas AM, Valencia A. Creating reference datasets for Systems Biology applications using text mining. New York Acad Sci. 2008. In press), and its application to specific biological problems (Krallinger et al., in preparation). The integration of the methods developed in these three areas with the many other new and old function prediction strategies certainly remains a key future challenge.

K4: CHROMATIN STRUCTURE INSIGHTS REVEALED BY MESOSCALE MODELING
Tamar Schlick and Gaurav Arya in collaboration with S. Grigoryev, S. Correll, and C. Woodcock (New York University)
Eukaryotic chromatin is the fundamental protein/nucleic acid unit that stores the genetic material. Understanding how chromatin fibers fold and unfold in physiological conditions (divalent ions, with linker histones) is important for interpreting fundamental biological processes like DNA replication and transcription regulation.
Using a mesoscopic model of oligonucleosome chains and tailored sampling protocols, we elucidate the energetics of oligonucleosome folding/unfolding and the role of each histone tail, linker histones, and divalent ions in regulating chromatin structure.

KEYNOTE ABSTRACTS

K1: TOWARD ELUCIDATING ALLOSTERIC MECHANISMS OF FUNCTION VIA STRUCTURE-BASED ANALYSIS OF PROTEIN DYNAMICS
Ivet Bahar (University of Pittsburgh)

K2: ON THE NATURE OF PROTEIN FOLD SPACE: EXTRACTING FUNCTIONAL INFORMATION FROM APPARENTLY REMOTE STRUCTURAL NEIGHBORS
Donald Petrey, Markus Fischer & Barry Honig (Howard Hughes Medical Institute and Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University)
It has become increasingly apparent that geometric relationships often exist between regions of two proteins that have quite different global topologies. In this report, we examine whether such relationships can be used to infer a functional and evolutionary connection between the two proteins in question. Our results indicate that there are often unexpected functional similarities between proteins that would normally be considered to be structurally dissimilar. This suggests that, in analogy to protein sequence motifs, locally similar geometric regions can be used to infer functional relationships. The development of methods that can detect common structural motifs should significantly enhance our ability to extract information from structural and functional databases.

K3: PREDICTION OF FUNCTIONAL CHARACTERISTICS FROM STRUCTURE, SEQUENCE AND PAPERS
Alfonso Valencia (Spanish National Cancer Research Centre)
The limitations of the current function prediction methodology, which is essentially based on the extrapolation from database annotation of similar sequences, are well known (López G, Rojas A, Tress M, Valencia A.
Assessment of predictions submitted for the CASP7 function prediction category. Proteins. 2007;69 Suppl 8:165-74 and Valencia A. Automatic annotation of protein function. Curr Opin Struct Biol. 2005 Jun;15(3):267-74. Review). Considerable scientific efforts are dedicated to the development of computational methods that work outside this paradigm and extract information from alternative sources. I will focus in this talk on three areas of Bioinformatics in which my group has made recent contributions. The extrapolation of functional annotations, in particular binding and catalytic sites, from the analysis of conserved structural features is one of the more challenging fields of Structural Bioinformatics. The availability of large collections of proteins of known structure and poorly characterized function has channeled most of the efforts towards the very hard problems of predicting function for
can begin to answer. Questions such as, how pervasive are references to structure across the biomedical literature? What can be extracted that provides valuable automated annotation? What previously unexpected and meaningful associations can be made between structures by virtue of their co-occurrence in the literature? The majority of the structural biology literature has not been open until now, so these are more questions for the future than today, but we will discuss some initial findings and work that is being done to leverage these associations [4]. Providing more of an identity to a macromolecular structure does not necessarily come from the literature, but can come from the community at large. Efforts such as Proteopedia [5] exemplify this wisdom of crowds approach.
Yet another approach that puts a human face on a structure is the notion of a mashup where the traditional content as found in a database or journal article is combined with multimedia content to create a different kind of learning experience [6,7]. Will this profoundly change how we study 3D structure in the future? Time will provide the answer to this question, but we at least believe that in the vernacular of The Prisoner, number six will be identified for who he really is in the next few years. [1] K. Henrick, Z. Feng, W.F. Bluhm, D. Dimitropoulos, J.F. Doreleijers, S. Dutta, J.L. Flippen-Anderson, J. Ionides, C. Kamada, E. Krissinel, C.L. Lawson, J.L. Markley, H. Nakamura, R. Newman, Y. Shimizu, J. Swaminathan, S. Velankar, J. Ory, E.L. Ulrich, W. Vranken , J. Westbrook, R. Yamashita, H. Yang, J. Young, M. Yousufuddin, H.M. Berman 2008 Nucleic Acids Research. 36: D426-D433. [2] N. Deshpande, K.J. Addess, W.F. Bluhm, J.C. MerinoOtt, W.Townsend-Merino, Q. Zhang, C. Knezevich, L. Chen, Z. Feng, R. Kramer Green, J.L. Flippen-Anderson, J. Westbrook, H.M. Berman and P.E. Bourne 2005 The RCSB Protein Data Bank: A Redesigned Query System and Relational Database Based on the mmCIF Schema Nucleic Acids Research. 33: D233-D237. [3] P.E. Bourne 2005 In the Future will a Biological Database Really be Different from a Biological Journal? PLoS Comp. Biol. 1(3) e34. [4] J.L.Fink, S. Kushch, P. Williams and P.E.Bourne 2008 BioLit: Integrating Biological Literature with Databases Nucleic Acids Research. 36: W385-W389. http://biolit.ucsd.edu. [5] Eran Hodis, Eric Martz, Jaime Prilusky and Joel Sussman 2008 http://www.proteopedia.org. [6] J.L. Fink and P.E.Bourne 2007 Reinventing Scholarly Communication for the Electronic Age. CT Watch, 3, 26-31. [7] P.E.Bourne, J.L.Fink, M.Gerstein 2008 Open Access: Taking Full Advantage of the Content PLoS Comp. Biol. 4(3) e1000037. 
K6: CONFORMATIONAL FLEXIBILITY AND SEQUENCE DIVERSITY IN COMPUTATIONAL PROTEIN DESIGN Tanja Kortemme (New York University) The overall compact topologies reconcile features of the zigzag model with straight linker DNAs with the solenoid model with bent linker DNAs for optimal fiber organization and reveal a dynamic synergism of internal and external factors in chromatin compaction. K5: I AM NOT A PDBID I AM A BIOLOGICAL MACROMOLECULE Philip E. Bourne, Parker Williams and J. Lynn Fink (Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego) For the few in this audience who can see the parody between the title of this talk and the quotes from The Prisoner "I am not a number I am a person" or “I am not a number I am a free man” the theme of my talk may be apparent. For the remainder, may I first suggest you take a look, as we all do for so many things these days, at the Wikipedia page http://en.wikipedia.org/wiki/-The_Prisoner . Somewhat of a parody within a parody as the wisdom of crowds is also featured in this talk. In The Prisoner, Patrick McGoohan (left) strived to have his true identity recognized, so it is with a macromolecular structure in the Protein Data Bank (PDB). While strides have been made to create a better identity for a PDBid through the wwPDB remediation effort [1], and these will be summarized, PDB entries remain somewhat featureless, some would say unannotated with respect to function, structural features interactions with other proteins and so on. Each site supporting the same primary raw PDB data creates something of an identity, typically through the associated UniProt sequence which provides the necessary association to a variety of biological resources [2]. Notwithstanding, either little else is known about the structure, as is true of many structures determined not through a functional motivation, but via structural genomics, or what is known is found only in the literature. 
In other words the data resides in one place, typically a database, and the knowledge associated with that data resides somewhere else, typically in one or more journal articles [3]. This makes comprehending the full meaning of a structure more difficult than it need be. The issue becomes how to break this tradition either pre or post the deposition/publication process? Pre is hard because it involves changing scientists’ perceptions of what constitutes a database entry versus a publication. Post has just been made a little easier with the emergence of open access (OA) publishing. Among other things OA implies that journal articles will contain associated metadata amenable to manipulation by computer. This is not quite the same as text mining, which relies solely on establishing syntactic and semantic relationships in written text. Here additional tagging at the various stages of authoring and publication can be brought to bear. This provides some interesting prospects and raises some interesting questions which we

K7: HITS, LEADS, AND ARTIFACTS FROM VIRTUAL AND HIGH-THROUGHPUT SCREENING
Brian Shoichet (UCSF)

Our work aims to improve structure-based function prediction methods, such as FEATURE [1], by coupling them to structural diversity generating methods, such as Molecular Dynamics (MD) simulations. Our test function was Ca2+ binding and our test set consisted of 5 molecules, with two structures for each molecule: a HOLO (with Ca2+ present in the structure) and an APO (no Ca2+ present in the structure). For each structure, a 1 nanosecond MD simulation with explicit solvent was run using the GROMACS [2] software suite. For each system, 401 structures were extracted from the simulation trajectory, one every 2.5 picoseconds.
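The snapshot count quoted above follows from the trajectory length and the sampling stride, provided both end points of the trajectory are kept. A minimal sketch of that arithmetic (the function name and the inclusive-end-point convention are assumptions made for illustration, not details taken from the abstract):

```python
def snapshot_times_ps(total_ps=1000.0, stride_ps=2.5):
    """Return the sampling times in picoseconds, keeping both t = 0 and t = total_ps."""
    n_intervals = round(total_ps / stride_ps)
    return [i * stride_ps for i in range(n_intervals + 1)]


times = snapshot_times_ps()
print(len(times), times[0], times[-1])
```

A 1 ns (1000 ps) run sampled every 2.5 ps has 400 intervals, which gives 401 structures once the starting frame is included.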
Based on physico-chemical properties across several concentric spherical shells, FEATURE determines whether a 3D structure contains a local environment that resembles a site of interest, which for this work is a calcium binding site. Using FEATURE we scanned the structures generated over the course of the simulation on a 1 Å grid, identifying potential centers of calcium binding. Several such points were identified in each structure. This posed a new challenge: to determine whether the identified points represented a single putative calcium binding site or several, within a single structure and among all structures generated by MD for each ensemble. Such analysis is important in order to identify true positive results. In general terms this challenge exists whenever sites need to be tracked within a set of structures generated by methods that explore conformational space. Slight side chain deviations preclude simple geometric comparisons between points in Cartesian space within different structures in the structural ensemble generated by MD simulations. We propose the following clustering scheme as a plausible solution to this challenge. First, FEATURE hits are compared within the bounds of their respective structures. A Wilcoxon distance (z-value) between all the pairs of hits within each structure is calculated using the paired Wilcoxon rank sum test based on the 50 atoms closest to each of the hits. Given a Wilcoxon distance cut-off, all the hits for this structure are clustered, and the Cartesian coordinates of the centers of the newly formed clusters are calculated. Second, the cluster centers from all structures are compared. A Wilcoxon distance between all the pairs of cluster centers is calculated using the paired Wilcoxon rank sum test based on the 50 atoms closest to those centers in respective structures. Then the cluster centers are clustered based on a given Wilcoxon distance cut-off to form super-clusters.
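The two-stage scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which atomic quantity enters the test, so the sketch describes each point by the sorted distances to its k nearest atoms (`profile`), uses the unpaired rank-sum z-value in normal approximation (`rank_sum_z`), and groups points by single linkage (`cluster`). All names, the toy coordinates, and the values k=4 and cutoff=1.5 are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def rank_sum_z(x, y):
    """Rank-sum z-value (normal approximation, average ranks for ties, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for m in range(i, j + 1):
            ranks[m] = (i + j) / 2 + 1  # average 1-based rank of the tied run
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    return (w - n1 * (n1 + n2 + 1) / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)


def profile(point, atoms, k):
    """Assumed environment descriptor: sorted distances to the k nearest atoms."""
    return sorted(dist(point, a) for a in atoms)[:k]


def cluster(items, distance, cutoff):
    """Single-linkage grouping: items closer than the cutoff share a cluster."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if distance(items[i], items[j]) <= cutoff:
            parent[find(i)] = find(j)
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def super_clusters(structures, k, cutoff):
    """structures: one (atoms, hits) pair per snapshot; returns the super-clusters."""
    def env_distance(a, b):
        return abs(rank_sum_z(a[1], b[1]))

    centers = []
    for atoms, hits in structures:  # stage 1: cluster hits within each structure
        items = [(h, profile(h, atoms, k)) for h in hits]
        for group in cluster(items, env_distance, cutoff):
            c = centroid([point for point, _ in group])
            centers.append((c, profile(c, atoms, k)))
    return cluster(centers, env_distance, cutoff)  # stage 2: across structures


def shifted(points, d):
    return [tuple(c + d for c in p) for p in points]


# Toy data: a tight shell of atoms around the origin, a loose shell around x = 20.
atoms = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0),
         (23, 0, 0), (17, 0, 0), (20, 3, 0), (20, -3, 0)]
hits = [(0, 0, 0), (0.1, 0, 0), (20, 0, 0)]
snapshots = [(atoms, hits), (shifted(atoms, 5), shifted(hits, 5))]
sites = super_clusters(snapshots, k=4, cutoff=1.5)
print(len(sites))
```

The second toy snapshot is a rigid translation of the first, so its hits have different Cartesian coordinates but identical environments; comparing environments rather than coordinates is what lets the second stage merge them.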
These super-clusters represent the number of independent sites identified by FEATURE coupled to MD simulations as putative calcium binding sites. Additionally, super-clusters can be related to Ca2+ binding sites through the location of the bound Ca2+ ions in the HOLO structures. In our dataset, there were 12 Ca2+ binding sites in the HOLO structures and 11 equivalent Ca2+ binding sites in the APO structures; one site in a single APO structure is destroyed by mutations. By itself, FEATURE identified 7 sites in the HOLO and 3 sites in the APO structures. When coupled with structural ensembles, FEATURE identified 10 sites in the HOLO and 6 sites in the APO structures. As such, we observed a 60% improvement in sensitivity when

ORAL PRESENTATION ABSTRACTS

64: 4D STRUCTURE-BASED FUNCTION PREDICTION
Dariya S. Glazer (Genetics Department, Stanford University, USA), Randall J. Radmer (SIMBIOS National Center, Stanford University, USA) and Russ B. Altman (Departments of Bioengineering and Genetics, Stanford University, USA).
Structural dynamics of molecules play an important role in function execution, and as such should be considered by structure-based function prediction methods. We demonstrate the value of coupling molecular dynamics to function prediction methods, and propose a solution to the challenge of comparing 3D environments in equivalent structures. There are numerous computational methods which may assist experimental efforts in predicting molecular function. These methods rely on sequence and/or structural similarity, which can exist in molecules that perform similar functions. Structure-based methods depend on the correctness of 3D structural models generated by X-ray crystallography or Nuclear Magnetic Resonance (NMR) spectroscopy. Unfortunately, the validity of many such structures suffers from inherent limitations of the methods used to generate them: crystal packing conditions, experimental modifications, averaging of coordinates, solvent composition.
With the increasing number of structures being solved by the Structural Genomics initiatives, which do not bear similarity to already known folds, it is imperative that function prediction methods overcome limitations imposed on them by imperfect static structures. Typically, in order to assign putative function, function prediction methods scan a single 3D structure of a molecule. However, molecules are not static entities, and the intramolecular dynamics are very important for molecular function. Therefore, coupling function prediction methods to molecular dynamics may improve their performance. Several methods exist that explore conformational space of molecules. These methods generate ensembles of structures that allow glimpses at the dynamic motions of molecules. When coupled to structure diversity generating methods, function prediction algorithms would examine many structures for each molecule, and thus have many opportunities to assign function correctly. 8 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada molecular dynamics were considered by the function prediction method. In order to validate these results, we explored another Ca2+ binding site prediction method based on valence by Nayal et al. [3]. In the same dataset the valence method identified 1 site in the HOLO and 0 sites in the APO structures. When coupled to the same structural ensembles as FEATURE examined, the valence method identified 10 sites in the HOLO and 1 site in the APO structures. With this method, we observed a 1000% improvement in sensitivity when it took into account dynamics of the molecules. In our work, we have demonstrated that performance of structure-based function prediction methods can be improved by considering the dynamic nature of molecules. Additionally, we proposed a solution to the challenge of identifying equivalent 3D environments in spatially distributed structures that are otherwise identical. 
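The quoted sensitivity gains can be reproduced from the site counts reported in this abstract. The sketch below assumes that HOLO and APO sites are pooled (12 + 11 = 23 sites in total) and that "improvement" means the relative increase in the number of sites recovered; both readings are assumptions, and the names are illustrative.

```python
TOTAL_SITES = 12 + 11  # HOLO sites + APO sites in the dataset


def sensitivity(found):
    """Fraction of the known Ca2+ sites that were recovered."""
    return found / TOTAL_SITES


def relative_gain(found_static, found_ensemble):
    """Relative increase in recovered sites when MD ensembles are used."""
    return (found_ensemble - found_static) / found_static


feature_gain = relative_gain(7 + 3, 10 + 6)   # FEATURE: 10 sites, then 16
valence_gain = relative_gain(1 + 0, 10 + 1)   # valence method: 1 site, then 11
print(f"FEATURE: {sensitivity(10):.2f} to {sensitivity(16):.2f} (+{feature_gain:.0%})")
print(f"Valence: {sensitivity(1):.2f} to {sensitivity(11):.2f} (+{valence_gain:.0%})")
```

Under these assumptions FEATURE goes from 10 to 16 of 23 sites (a 60% relative gain) and the valence method from 1 to 11 of 23 (a 1000% relative gain), matching the figures in the text.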
REFERENCES:
1. Halperin, I., Glazer, D.S., Wu, S., and Altman, R.B., The Feature Framework for Protein Function Annotation: Modelling New Functions, Improving Performance, and Extending to Novel Applications. BMC Genomics, 2008 (in press).
2. Lindahl, E., Hess, B., and Spoel, D.v.d., Gromacs 3.0: A Package for Molecular Simulation and Trajectory Analysis. J Mol Modeling, 2001. 7: p. 306.
3. Nayal, M. and Cera, E.D., Predicting Ca2+-Binding Sites in Proteins. Proc Natl Acad Sci USA, 1994. 91: p. 817.

Dramatic improvements in high-throughput sequencing technologies have led to a substantial increase in whole-genome sequencing projects. The rapid growth in sequenced genomes is leading to radical changes in our understanding of genomics and provides unparalleled opportunities for research. However, while genome-sequencing projects are generating almost unimaginable numbers of protein sequences, these sequences are not annotated with functional information. The spectacular increase in unannotated sequences is widening the gap between sequenced genes and known protein functions. Experimental procedures for characterising protein function are expensive, time consuming and difficult to automate, so researchers are turning increasingly to computational annotation to close the gap. Providing functional annotations for the torrent of new sequence information is one of the greatest challenges facing computational biology today and it is clear that function prediction is becoming an increasingly important field. Function assignment is far from simple. Although functional annotations can be transferred by homology, a common evolutionary origin does not guarantee identical function and the more distant the evolutionary relationship, the less reliable the transfer will be. Although protein 3D structure can be of use in predicting function, predicting function for proteins with known structure still presents researchers with problems.
While structure may be conserved within a superfamily of proteins, it is not always true that function is conserved to the same extent. Function prediction was included in CASP6 for the first time with the aim of discovering whether computational methods could use 3D structure to add useful molecular or biological information to the target proteins. However, CASP is an experiment that evaluates the state of structure prediction and is based on structures that can be hidden from the predictors, thus making predictions blind. The same cannot be done with the function prediction category. The assessment of function was hampered by the lack of new functional information. In fact, with the exception of bound ligands, the assessors had no more functional information at the end of the experiment than was available to the predictors during the experiment. One other somewhat surprising development was the low number of predicting groups that entered the function prediction experiment. The prediction of function is an important and growing field, as evinced by the numbers of GO-based prediction servers that are already working or in development, so it was unfortunate that so few groups were prepared to participate in the experiment. It is almost certainly true that the slow release of functional information that hampered the assessment was also the cause of this low turnout. There are a number of difficulties in running a function prediction assessment in CASP, and the CASP assessment format and the slow release of functional information is not ideal for a rapidly developing field where predictors need to make use of the results and the evaluation in order to refine their methods. 
The main problem for an experiment like

68: AN AUTOMATIC SERVER FOR FUNCTION PREDICTION EVALUATION
Michael Tress (Spanish National Cancer Research Centre, Spain), Alfonso Valencia (Protein Design Group, Centro Nacional de Biotecnologia, Madrid, Spain), Michael Sternberg (Imperial College London, UK) and Mark Wass (Imperial College London, UK)
Whole-genome sequencing projects are generating unannotated sequences in increasing numbers. There is a great deal of interest in predicting function for these proteins and many groups are developing methods to predict GO functional terms. Here we present a server that will perform a continuous assessment of structure-based function prediction methods.

in proteins [3], they are very clade-specific and unique enough to build accurate evolutionary trees [4]. A larger fraction of proteins in eukaryotes than in prokaryotes are multi-domain, a trend formally known as ‘domain accretion’. Domain accretion is believed to reflect the increasing complexity brought about by domain multiplicity and the formation of novel domain combinations [5]. Most studies of multi-domain proteins to date have been confined to the structural basis of their formation with limited analysis of the relationship to function e.g. [3, 6, 7]. The function approach was taken to some extent by George and co-workers, with the integration of catalytic activity (a small fraction of the function space) to protein domains [8]. However, their work was focused on single domain functions and ignored the interaction between domains, an important consideration in a predominantly multi-domain repertoire. Here, we have developed a novel domain-function map, encompassing single and multi-domain combinations, providing the first comprehensive examination of the structure-function relationship from a multi-domain perspective.
The study has allowed a numerical approach for the systematic analysis of the structural basis for functional change, which has been made possible by the recent development of a graph-based function ontology database, the Gene Ontology resource [9]. The domain-function map integrates a domain-to-sequence and a function-to-sequence map, using co-occurrence scores to associate domain combinations with functions. SWISSPROT sequences [10] are assigned and represented as a combination of SCOP domains [2] using homology detection procedures [11,12,13,14], and functionally annotated using the GOA [15] database. The sequences are clustered using CD-HIT [16] such that no two sequences have greater than 40% sequence identity, to prevent over-representation of domain combinations. The domain combinations are associated with functional terms using co-occurrence ratios to normalise for convergent (functions may be performed by more than one domain combination) and divergent (domain combinations may perform multiple functions) evolution. The SCOP superfamily database [2] is used for domain representation, while the Gene Ontology resource [9] provides functional description. Functional diversity follows a Pareto distribution with most domain combinations encoding a few functions, while a few are very functionally diverse. The domains central in the domain-function network are also central in the protein interaction network [17] and in taxonomic distribution [18]. The most functionally promiscuous domains in the repertoire include the P-loop NTP hydrolase and Rossmann domains.
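To make the idea of a co-occurrence map concrete, here is a minimal sketch. The abstract does not give the scoring formula, so the Jaccard-style ratio used here (shared proteins divided by proteins carrying either the combination or the term) is a stand-in chosen because it penalises both convergent and divergent cases; the function names and the toy labels "A", "B", "F1" and "F2" are hypothetical.

```python
from collections import Counter


def cooccurrence_scores(proteins):
    """proteins: iterable of (domain_combination, set_of_function_terms) pairs.

    Returns {(combination, term): score}, with
    score = n(c, t) / (n(c) + n(t) - n(c, t)).
    """
    combo_counts, term_counts, pair_counts = Counter(), Counter(), Counter()
    for combo, terms in proteins:
        combo_counts[combo] += 1
        for term in set(terms):
            term_counts[term] += 1
            pair_counts[(combo, term)] += 1
    return {
        (combo, term): n / (combo_counts[combo] + term_counts[term] - n)
        for (combo, term), n in pair_counts.items()
    }


def functional_diversity(scores):
    """Number of distinct function terms associated with each domain combination."""
    return Counter(combo for combo, _ in scores)


# Toy, non-redundant protein set: two single-domain proteins and one two-domain protein.
toy = [(("A",), {"F1", "F2"}), (("A",), {"F1"}), (("A", "B"), {"F1"})]
scores = cooccurrence_scores(toy)
diversity = functional_diversity(scores)
for key in sorted(scores):
    print(key, round(scores[key], 3))
```

In this toy set the single-domain combination is linked to two terms and the two-domain combination to one, the kind of per-combination count from which a functional diversity distribution could be tallied.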
Our results show that functional diversity decreases with increasing number of domains within a combination (multi-domain combinations perform a more limited set of functions compared to single domains), while the architectural specificity of functions (the number of domain combinations that perform a particular function) increase when coded by combinations with increasing number of domains (multi-domain combinations perform more architecturally specific functions, with less evolutionary convergence, than single domains). CASP is the fact that it may take several years for functional annotations to be known. After CASP 6 and 7 the need to organize a more effective blind function prediction category is obvious. The prediction of function is important and it is crucial that it is properly assessed. We are developing a server to assess the prediction of function in a continuous fashion. The server will be similar in concept to the EVA/LiveBench structure prediction evaluation servers in that the assessment will be automatic and built on updates from the PDB. Servers will have to predict GO terms for each of the sequences, but since the function will not immediately be known the assessment will take place some time after the release and will be revisited periodically. The predictions will be assessed with a range of methods, since there is no single definitive method to assess GO term prediction. The server will assess the prediction of GO Molecular Function, Cellular Component and Biological Process terms where possible, and targets will be handicapped by prediction difficulty. 16: THE EVOLUTION OF PROTEIN FUNCTION DRIVEN BY A MULTI-DOMAIN REPERTOIRE Syed Ali & Michael Sternberg (Imperial College London, UK) We present a novel map of protein domain (SCOP) combinations to functions (GO) using co-occurrence scores, to allow a pan-genomic analysis of functional evolution. 
Using simple metrics to define change in domain organisation and function we analysed functional transfer via domains, showing a clear correlation between domain combination and function. During the course of evolution, forms of life with increasing complexity have arisen, the driving force behind which has been the expansion of the protein repertoire giving rise to proteins with novel functions. Proteins in the repertoire have been formed by the genetic mechanisms of gene divergence, duplication and recombination [1]. These mechanisms are paralleled at the proteome level with the processes of domain divergence, duplication and recombination, where a domain is an evolutionary unit that folds independently in protein structure space [2]. The earliest evolution of the repertoire began with the ab initio formation of protein domains giving rise to single-domain proteins. However, later in the evolutionary process as domain recombination became a major force, multi-domain proteins became more prominent in protein space. Although only a tiny fraction (<0.5%) of all possible domain combinations are observed 10 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada Furthermore, based on taxonomic diversity and the protein interaction network, we confer domain age and propose a simplistic simulation for the evolution of the domain repertoire. The simulation is used to measure the functional effects of domain divergence, duplication and recombination (defined on the basis of domain arrangements within proteins). Our analysis suggests that early in the evolutionary process domain divergence is the leading mechanism for functional diversity, while domain recombination becomes the major force as the repertoire expands. 
This suggests that the current stage of evolution is focused on domain multiplicity where domains are reused, possibly a more evolutionary ‘cost-effective’ approach for expanding the function space, and provides an explanation for the multi-domain protein repertoire we see today. Consequently, scientists have begun to discover genomes in which most proteins are the product of extensive recombination [6, 19]. With this focus on domain multiplicity it is important to understand the functional consequence of domain interactions within proteins. Using a modified hamming distance to calculate change in domain combination (that calculates the number of domains added or removed to change from one combination to another), and the directed acyclic graph provided by the Gene Ontology (GO) resource to measure functional change (as the shortest distance between two GO terms via a common ancestor), we show that change in domain composition causes a correlated change in function (see Figure), providing evidence for an evolutionary unit of function within the structural domain. The relation between domain combinations highlights the difficulty of inferring function for multi-domain proteins from knowledge of any single domain; it is important to decipher the set of domains within a sequence associated with a specific function. As such our domain combinationto-function map can provide a valuable tool for improved function prediction. REFERENCES: 1. Chothia C, Gough J, Vogel C, Teichmann SA. Science 1998, 300: 1701-1703. 2. Murzin AG, Brenner SE, Hubbard T, Chothia C. J. of Molecular Biology 1995, 247: 536-540. 3. Apic G, Huber W, Teichmann SA. J. of Structure & Functional Genomics 2003, 4: 67-78. 4. Yang S, Doolittle RF, Bourne PE. PNAS 2004, 102(2): 362-378. 5. Koonin E, Aravind L, Kondrashov A. Cell 2000, 101(6): 573-576. 6. Teichmann SA, Park J, Chothia C. PNAS 1998, 95: 14658-14663. 7. Apic G, Gough J, Teichmann SA. J. of Molecular Biology 2001, 310: 311-325 8. 
George RA, Spriggs RV, Thornton JM, Al-Lazikani B, Swindells MB. ISMB Bioinformatics 2004, 20 Suppl 1: I130-I136. 9. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C et. al Nuc. Acids Research 2004, 32: D258-261. 10. Wu CH, Apweiler R, Bairoch A, Natale DA, Barker WC, Boeckmann B, Ferro S, Gasteiger E, Huang H, Lopez R et. al Nuc. Acids Research 2006, 34: D187-189. 11. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Nuc. Acids Research 1997, 25: 3389-3402. 12. Schäffer AA, Wolf YI, Ponting CP, Koonin EV, Aravind L, Altschul SF. Bioinformatics 1999, 15(12): 10001011. 13. Eddy SE. Bioinformatics 1998, 14: 755-763. 14. Bennet-Lovsey RM, Herbert AD, Sternberg MJ, Kelley LA. Proteins 2008, 70(3): 611-625. 15. Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R. Nuc. Acids Research 2004, 32(1): D262-266. 16. Li W, Godzik A. Bioinformatics 2006, 22: 1658-1659. 17. Park J, Lappe M, Teichmann SA. J. of Molecular Biology 2001, 307: 929-938. 18. Park J, Bolser D. Genome Informatics 2001, 12: 135140. 19. Gough J, Karplus K, Hughey R, Chothia C. J. of Molecular Biology 2001, 313(4): 903-919. 66: PREDICTING DNA-BINDING AFFINITY OF MODULARLY DESIGNED ZINC FINGER PROTEINS Peter Zaback1 Jeffry D. Sander1, J. Keith Joung2, Daniel F. Voytas2 & Drena Dobbs1 (1Iowa State University, USA, 2 Center for Cancer Research, and Center for Computational and Integrative Biology, Massachusetts General Hospital; Department of Pathology, Harvard Medical School, USA 3 Department of Genetics, Cell Biology & Development and Beckman Center for Genome Engineering, University of Minnesota, Minneapolis, USA) Consisting of modular nucleic acid binding domains, C2H2 zinc finger proteins provide an excellent framework for engineering “customized” sequencespecific DNA binding proteins. 
We present new methods that accurately predict both in vivo and in vitro efficacies of zinc finger proteins engineered by modular design.

Replication and regulated expression of information encoded in genomes require proteins that bind to DNA with high sequence specificity. Researchers have long sought to fully understand the molecular mechanisms underlying this specificity, with the goal of developing powerful new tools for both research and gene therapy. Zinc finger proteins (ZFPs) bind specific DNA motifs using highly similar helical “finger” domains that recognize adjacent DNA triplets [1]. In the “modular assembly” approach to engineering novel zinc finger proteins, individual modules are assembled into a three-finger array expected to target a specific 9 bp target sequence. In practice, however, ZFPs engineered using this approach display a wide range of binding specificities and affinities and function with highly variable success rates [2]. Due to this erratic behavior, it was previously impossible to predict whether or not a particular ZFP would function in vivo.

Here, we demonstrate that it is possible to predict which combinations of zinc finger modules are most likely to successfully target specific sites in genomic DNA, based on existing in vitro binding data for individual modules. Using previously characterized GNN-specific modules [3] in the standardized framework provided by the Zinc Finger Consortium [4] (http://zincfingers.org), we designed and assembled 27 different three-finger arrays and assessed their binding to cognate target sites in vivo using a quantitative bacterial two-hybrid assay. For 7 of the assembled ZFP arrays, we also directly measured binding affinities in vitro using fluorescence anisotropy.
Our predicted DNA binding affinities were highly correlated with binding constants measured in vitro (r = 0.91) and in vivo (r = 0.80). Similar accuracy was achieved on an independently generated and tested set of 23 zinc finger proteins [2]. By providing the first validated system for ranking genomic target sites, this work should lead to significantly enhanced success rates for modularly designed zinc finger proteins. An updated server that facilitates ZFP design is available at: http://bindr.gdcb.iastate.edu/ZiFiT/ [5].

REFERENCES:
[1] Miller, J., McLachlan, A. D. & Klug, A. (1985) Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J, 4, 1609-1614.
[2] Ramirez, C.L., Foley, J.E., Wright, D.A., Muller-Lerch, F., Rahman, S.H., Cornu, T.I., Winfrey, R.J., Sander, J.D., Fu, F., Townsend, J.A. et al. (2008) Unexpected failure rates for modular assembly of engineered zinc fingers. Nat Methods, 5, 374-375.
[3] Segal, D.J., Dreier, B., Beerli, R.R. and Barbas, C.F., 3rd. (1999) Toward controlling gene expression at will: selection and design of zinc finger domains recognizing each of the 5'-GNN-3' DNA target sequences. Proc Natl Acad Sci U S A, 96, 2758-2763.
[4] Wright, D.A., Thibodeau-Beganny, S., Sander, J.D., Winfrey, R.J., Hirsh, A.S., Eichtinger, M., Fu, F., Porteus, M.H., Dobbs, D., Voytas, D.F. et al. (2006) Standardized reagents and protocols for engineering zinc finger nucleases by modular assembly. Nat Protoc, 1, 1637-1652.
[5] Sander, J.D., Zaback, P., Joung, J.K., Voytas, D.F. and Dobbs, D. (2007) Zinc Finger Targeter (ZiFiT): an engineered zinc finger/target site design tool. Nucleic Acids Res, 35, W599-605.

29: MINOR GROOVE ELECTROSTATICS PROVIDES A MOLECULAR ORIGIN FOR PROTEIN-DNA SPECIFICITY

Remo Rohs (HHMI & Columbia University, USA), Sean West (Columbia University, USA), Peng Liu (Columbia University, USA), and Barry Honig (HHMI & Columbia University, USA).
Hox proteins confer specificity by reading the structure and electrostatic potentials of the minor groove. Local shape recognition is distinct from known readout mechanisms and is used by proteins binding AT-rich DNA. Base sequence induces structures that enhance negative electrostatic potentials and attract basic side chains into the minor groove.

The molecular basis for protein-DNA recognition and its specificity is still largely unknown. Complexes of proteins from various families bound to DNA have been solved by means of X-ray crystallography and NMR spectroscopy. However, the molecular mechanisms through which proteins specifically recognize their DNA binding sites are only partially understood. Direct readout through specific contacts between amino acids and bases dominates recognition within the DNA major groove. Different base pairs account for specific patterns of hydrogen bond donors and acceptors in the major groove with thymine offering, in addition, a methyl group for hydrophobic contacts. Direct readout in the minor groove is limited because there is no differentiation in terms of the location of hydrogen bond donors or acceptors between A-T and T-A or between G-C and C-G base pairs. Indirect readout accounts for the recognition of the overall shape of a DNA binding site by proteins. Overall shape is a function of base sequence and comprises global deformation effects such as DNA bending. It has been shown for the papillomavirus E2 protein, for example, that its binding affinity is affected by base pairs which are not contacted by the protein but which facilitate bending that enables protein contacts with base pairs in other regions of the binding site [1, 2]. In a recent study of the Hox family of transcription factors, we have identified a third mode of protein-DNA recognition that involves recognition of minor groove shape [3, 4].

Hox proteins bind DNA by making nearly identical major groove contacts via the recognition helices of their homeodomains. In vivo specificity, however, depends on extended and unstructured regions that link Hox homeodomains to a DNA-bound cofactor, Extradenticle (Exd). Crystal structures were determined for one of the eight Drosophila Hox proteins, Sex combs reduced (Scr), bound to its specific DNA sequence (fkh250) and a consensus Hox-Exd site (fkh250con*). Comparison of the structures of these two Hox-Exd-DNA ternary complexes demonstrates that the overall arrangement of the proteins is similar but additional Scr residues are ordered in the fkh250 complex. The intrusion of these residues into the minor groove is shown in Figures A and B with the accessibility surface of the DNA binding sites color-coded for shape. Specifically, an Arg and His residue insert into a narrow region of the fkh250 minor groove whereas they are disordered when presented with the fkh250con* sequence. Arg5 also inserts into the minor groove in a region where the groove is narrow in both sequences (blue plots in Figures C and D).

The electrostatic potential is affected by the shape and charge distribution of macromolecules [5]. For both the fkh250 and fkh250con* sequences, there is a near-perfect correlation between minor groove width and the magnitude of the negative electrostatic potential (red plots in Figures C and D). This data reveals a relationship between groove geometry and the insertion of basic amino acids into the minor groove. This finding is particularly important as both minor groove contacts only seen in the fkh250 complex (Arg3 and His-12) were shown to be critical for specific in vitro and in vivo Scr properties.

The recognition of local shape by a single protein implies that the DNA conformation being recognized is an intrinsic property of the base sequence, and thus, already prevalent in unbound DNA rather than induced by protein binding. That is, since the fkh250 and fkh250con* complexes only differ in their DNA sequence, the distinct minor groove shape in each must be a property of the base sequence. All-atom Monte Carlo simulations [6] of the free DNA binding sites predict a similar sequence-dependence of minor groove shape as seen in the crystal structures [3]. These simulations predict a single minor groove width minimum in fkh250con* and two minima in fkh250 (green plots in Figures C and D) as a result of different locations of the TpA base pair step in both sequences. Our results on Hox-DNA recognition indicate that the intrinsically narrow minor groove of fkh250 induces an enhanced negative electrostatic potential, which in turn attracts the positively charged Arg/His pair.

Our current studies focus on the question of whether the local shape recognition that we found for Hox proteins is of a more general nature. Electrostatics calculations along with MC structure predictions of DNA binding sites indicate that homeodomain proteins are an example of a family that employs this readout mechanism. Homeodomains bind to A-tracts, which are rigid AT-rich DNA regions of three or more consecutive ApT or ApA (TpT) base pair steps. Narrow minor grooves are a common structural feature of A-tracts. TpA steps break A-tract structure since they act as flexible hinges due to unfavorable stacking interactions. Our studies on Hox proteins have shown that the location of a TpA step is key for the intrinsic structure of a binding site. Strikingly, our data shows a correlation of A-tract sequence and structure with electrostatic potential in the DNA minor groove as a result of shape-induced electrostatic focusing. Local shape recognition also explains the avoidance of TpA base pair steps in some transcription factor binding sites, an observation that we validated for the tumor suppressor protein p53. Our observation of the causal relationship between minor groove structure and enhanced negative electrostatic potentials reveals the biological function of A-tract motifs. In addition, our results suggest recognition of local DNA shape as a novel readout mechanism crucial for proteins that bind DNA with narrow minor groove regions.

[1] R. Rohs, H. Sklenar, and Z. Shakked, Structure 13, 1499-509 (2005).
[2] Commentary on [1]: T. Siggers, T. Silkov, and B. Honig, Structure 13, 1400-1 (2005).
[3] R. Joshi, J. M. Passner, R. Rohs, R. Jain, A. Sosinsky, M. A. Crickmore, V. Jacob, A. K. Aggarwal, B. Honig, and R. S. Mann, Cell 131, 530-43 (2007).
[4] Commentary on [3]: S. C. Harrison, Nat. Struct. Mol. Biol. 14, 1118-9 (2007).
[5] B. Honig and A. Nicholls, Science 268, 1144-9 (1995).
[6] H. Sklenar, D. Wustner, and R. Rohs, J. Comput. Chem. 27, 309-15 (2006).

60: COMBINING PREDICTIONS OF PROTEIN STRUCTURE AND PROTEIN-RNA INTERACTIONS TO MODEL HUMAN TELOMERASE STRUCTURE

Ben A. Lewis1, Mateusz Kurcinski2, Deepak Reyon1, JaeHyung Lee1, Vasant Honavar1, Robert L Jernigan1, Andrzej Kolinski2, Andrzej Kloczkowski2 & Drena Dobbs1 (1 Iowa State University, USA; 2 University of Warsaw, Poland)

Telomerase is a ribonucleoprotein enzyme pivotal in cellular senescence and aging. Despite its importance, high resolution structures of the enzyme with or without its RNA component have proved difficult to obtain. This study uses machine learning predictions of RNA binding sites, along with template-based and de novo protein structure prediction, to develop a tentative model for the holoenzyme.

Telomerase is a ribonucleoprotein enzyme that adds telomeric DNA repeat sequences to the ends of linear chromosomes.
The enzyme is pivotal in cellular senescence and aging, and because it is overexpressed in ~90% of human cancers, it is also a potential therapeutic target. Despite its importance, a high-resolution structure of the telomerase enzyme has been elusive, with high-resolution structures of only two of its four protein domains having been determined: those of the N-terminal domain (TEN) and RNA binding domain (TRBD) from the telomerase reverse transcriptase subunit (TERT) of Tetrahymena thermophila (1,2). Structures of the reverse transcriptase (RT) and C-terminal (TEC) domains have not yet been reported. Moreover, while secondary and tertiary structural elements within the human telomerase RNA component (hTERC) have been identified through NMR spectroscopy (3), cocrystallization of telomerase with its intrinsic RNA component has not yet been accomplished. We have used sequence-based machine learning classifiers (Naive Bayes and SVM) to identify amino acid residues in telomerase that are likely to make direct contact with either DNA or RNA (4). More recently, we generated structural models for the human and yeast TEN domains by homology modeling and threading, using the experimentally-determined Tetrahymena TEN structure as a template (5) and, based on comparative analyses, suggested that the RNA-binding surfaces of the human and Tetrahymena enzymes are likely conserved.
Building on these initial studies, here we present:
- structural models for all four telomerase protein domains, generated using an ultrafast coarse-grained CABS approach (6) for template-based modeling of each domain, followed by accurate all-atom molecular dynamics simulations for structural refinement
- a comparison of our models of TEN and TRBD with experimentally-determined structures of the corresponding Tetrahymena protein domains
- a preliminary model for the complete human TERT complex (lacking the RNA subunit), generated using a rigid docking procedure
- a refined model for the complete human TERT complex, generated by performing CABS simulations covering all TERT domains, followed by a model selection procedure based on hierarchical clustering and all-atom refinement to produce the final model
- preliminary results in which RNA-binding residue predictions are used to position folded portions of the human telomerase RNA component (hTERC) structure within the modeled protein complex

Taken together, these results indicate that computational approaches can be used to gain valuable insight into the structure and function of ribonucleoprotein complexes for which high-resolution structural information is incomplete.

(1) Jacobs et al., Nat. Struct. Mol. Biol. (2006), 13:218-225
(2) Rouda and Skordalakes, Structure (2007), 15:1403-1412
(3) Theimer et al., Mol. Cell. (2005), 17:671-682
(4) Terribilini et al., RNA (2006), 12:1450-1462
(5) Lee et al., Pac Symp. Biocomput. (2008), 13:501-512
(6) Kurcinski and Kolinski, J. Steroid Biochem. Mol. Biol. (2007), 103:357-360

67: CHANNELING PROTEIN STRUCTURE ANALYSIS TOWARDS UNDERSTANDING COUGH DYNAMICS

Ilan Samish & William F. DeGrado (Department of Chemistry and Department of Biochemistry and Biophysics, University of Pennsylvania, USA)

The M2 influenza proton channel is a major drug target of the flu virus as well as a model structure for membrane protein channels.
Following recent X-ray and NMR structural elucidation, we utilized an array of structural bioinformatics methods to understand, and suggest a dynamic mechanism of this slow-conducting channel. Influenza virus infection is a major public health concern, causing significant morbidity, mortality, and economic losses worldwide. Not less important, this ion channel is among the smallest bona fide channels with full properties of ion selectivity and activation, thus providing a minimal model for studying channels. Mechanistically, the influenza virion is engulfed by a lung epithelial cell and compartmentalized into an endosome. The low pH of the endosome induces proton leakage into the virion via the M2 proton channel resulting in uncoating of the viral RNA [1]. Indeed M2 blockers, e.g. amantadine, were utilized as influenza drugs till the emergence of new strains that are generally resistant to this once commonly prescribed drug. Following the recent elucidation of the protein structure via X-ray crystallography [2] and via NMR [3] we aimed at gaining insight into a possible mechanism; especially as the proposed models exhibit marked differences [4] and as the full dynamic mechanism is yet to be deciphered. We were aided by the fact that each one of the four transmembrane helices was crystallized in a different conformation within the assymetric tetramer, thus enabling to construct four symmetric models, each in a different conformation. As the specific focus was the dynamic properties, special emphasis was put on bioinformatic datamining of the crystallographic snapshots towards dynamic insight. Methods included distribution of normalized B-factors along the transmembrane helices, analysis of hydrogen bonds energetics and dynamics, analysis of local structural deformation with an emphasis on backbone tilting, normal mode analysis, distribution of pore radii and the local flexibility of the pore lining atoms. 
A comparative analysis was conducted to the different available structural models as well as to a hybrid of the high resolution crystal structure and the more complete NMR model was constructed. Further comparison was conducted to a structure that was simulated via molecular dynamics for 20 nanoseconds with explicit solvation. Cumulatively, the analysis suggests a dynamic mechanism for this slow channel that may act more like a transporter than like a channel. Backbone regions of elevated dynamics exhibit local deformations in the helical structures including 14 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada To capture information about enzymes that follow “chemistry-constrained” evolution, our group developed the Structure-Function Linkage Database (SFLD) [3]. The SFLD holds detailed information on the reactions catalyzed by members of six different mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, enolase, haloacid dehalogenase, terpene cyclase, and vicinal oxygen chelate. In total, the SFLD covers 6499 sequences, 392 structures and 165 different reactions. Each superfamily member maintains the ability to catalyze a key mechanistic step that is mediated by conserved active site residues and/or cofactors, but different families in these superfamilies use that common mechanistic step in different chemical reactions and/or with different substrates. Enzymes are typically classified using measures of similarity relating to sequence, structure and overall function. In the SFLD, enzymes are classified into superfamilies according to sequence and structure conservation and conservation of unique residues that catalyze the superfamily’s common mechanistic step. Conservation of additional residues is used to define subgroups and families within each superfamily. 
Recently, O’Boyle and colleagues developed a novel method that measures similarity of enzymes based upon the explicit mechanism of the catalyzed reaction [4]. This method opens a new avenue for classification of enzymes. Here, we have used measurements of reaction mechanism similarity to classify enzymes of the mechanistically diverse superfamilies in the SFLD. Each overall reaction is described as a sequence of mechanistic steps (or partial reactions). Each step is then represented as the set of bond changes occurring in the transformation from substrate(s) to product(s) in that step. Similarity between sets of bond changes for each possible combination of steps among two reactions is computed using Tanimoto coefficients and stored in a similarity matrix. To obtain the total similarity between sequences of steps (“step similarity” or “mechanism similarity”), an alignment of the steps is performed using the Needleman-Wunsch algorithm. To take into account the maximum possible similarity that can be calculated given the two reaction sequences under comparison, a new Tanimoto coefficient is computed using the number of steps in each reaction and the NeedlemanWunsch similarity as inputs. Additionally, similarity of overall reactions is computed, also using Tanimoto coefficients, by representing the set of bond changes occurring in the transformation of the overall substrate(s) to overall product(s) of the reaction catalyzed (‘overall similarity”). In our study, reversibility of enzyme reactions is considered explicitly by inverting the bond changes in each set of bonds and by inverting the order of the steps in the reaction sequences. Our results quantitatively show that for mechanistically diverse enzyme superfamilies, the overall reactions can vary greatly, but the similarity among reaction steps is always high. We use as an example chloromuconate cycloisomerase and dipeptide epimerase, both members of the muconate cycloisomerase subgroup of the enolase superfamily. 
The former enzyme catalyzes the cycloisomerization of dynamic bifurcated hydrogen bonds. Unlike other four-helix bundel channels, this protein does not exhibit large concerted backbone-mediated interhelical sliding motions and does not exhibit a constitutive 'open' conformation The mechanism agrees with previous experiments and provides a starting point for further mutational analysis and biophysical characterization. Moreover, the newly derived local structure-function-dynamics relationships provide important insight for the continuing efforts to develop drugs to this important disease. REFERENCES 1. Pinto, L.H. and R.A. Lamb, The M2 proton channels of influenza A and B viruses. J Biol Chem, 2006. 281(14): p. 8997-9000. 2. Stouffer, A.L., et al., Structural basis for the function and inhibition of an influenza virus proton channel. Nature, 2008. 451(7178): p. 596-9. 3. Schnell, J.R. and J.J. Chou, Structure and mechanism of the M2 proton channel of influenza A virus. Nature, 2008. 451(7178): p. 591-5. 4. Miller, C., Ion channels: coughing up flu's proton channels. Nature, 2008. 451(7178): p. 532-3. 31: CLASSIFICATION OF MECHANISTICALLY DIVERSE ENZYME SUPERFAMILIES ACCORDING TO SIMILARITIES IN REACTION MECHANISM Daniel E. Almonacid & Patricia C. Babbitt (UCSF, USA). We classify enzymes from mechanistically diverse superfamilies in the StructureFunction Linkage Database using a novel algorithm that quantifies similarity in reaction mechanisms. We conclude that traditional approaches of classification of enzymes based on structure and function similarity are effectively complemented by clustering according to reaction mechanism. During evolution, gene duplication and sequence divergence generates functionally different but structurally related proteins. 
Gerlt and Babbitt cite three possible strategies that lead to divergence of function in homologous proteins [1]: (i) substrate specificity-constrained evolution (substrate specificity is conserved whilst chemistry changes); (ii) chemistryconstrained evolution (chemistry is conserved whilst the substrate specificity is changed); and (iii) active site-constrained evolution (neither chemistry nor substrate specificity is maintained, and the conserved active site residues support different reactions). Structural and functional analysis of the known protein universe suggests that of the three strategies, chemistry-constrained evolution is dominant [2]. 15 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada Biochemistry, Duke University, USA) & Bruce Randall Donald (Computer Science Department, Duke University, USA) chlorinated muconates by forming a C-O bond to create a 5membered ring, and eliminating HCl by cleaving a C-Cl bond. The latter enzyme, instead, catalyzes the epimerization of dipeptides, with the preferred substrate often L-Ala-D/L-Glu. This overall transformation is attained by the cleavage of a C-H bond, and its re-formation from the opposite face of the double bond in the intermediate. The overall reaction similarity for this pair of reactions is zero according to our measure as no bond changes are shared between the reactions. In terms of mechanistic steps, however, the reactions are highly similar. Chloromuconate cycloisomerase catalyzes a two-step reaction, and dipeptide epimerase a three-step reaction. Despite the different number of steps, both reactions share an identical step consisting of the abstraction of the α-proton to a carboxylic acid in the substrate resulting in a stabilized enolate anion intermediate. Furthermore, the enol-to-keto tautomerization that occurs in the step after the proton abstraction is also shared by both enzymes. 
Compared to the traditional approach of classifying enzymes according to overall reaction similarity (such as that of the Enzyme Commission), the method based on step similarity is better able to capture these elements of functional conservation. Our results also indicate that divergence of sequence and active site residues does not necessarily imply divergence of reaction mechanism. This is the case, for instance, for D-tartrate dehydratase, enolase and o-succinylbenzoate synthase. These three enzymes are highly divergent and belong to different subgroups within the enolase superfamily, yet they share identical sets of bond changes in each of their two mechanistic steps. Conversely, we found that not all members of the same subgroup within a superfamily use the same mechanism to perform catalysis, as in the case of chloromuconate cycloisomerase and dipeptide epimerase discussed above. This implies that the relationship between sequence/structure and function is yet more complicated than previously envisaged. As chemistry-constrained evolution is the major player in divergent evolution, we expect our study to be useful for guiding functional annotation of new homologues of known superfamilies. To provide access to these results, work is underway to create a knowledgebase to validate and predict overall transformations and mechanisms of enzyme reactions and to help guide engineering of enzyme functions by identifying enzyme templates capable of catalyzing the key mechanistic step of a transformation.
1. Gerlt, J.A. and Babbitt, P.C. Annu. Rev. Biochem., 2001, 70: 209-246.
2. Bartlett, G.J.; Borkakoti, N. and Thornton, J.M. J. Mol. Biol., 2003, 331: 829-860.
3. Pegg, S.C.-H.; Brown, S.D.; Ojha, S.; Seffernick, J.; Meng, E.C.; Morris, J.H.; Chang, P.J.; Huang, C.C.; Ferrin, T.E. and Babbitt, P.C. Biochemistry, 2006, 45: 2545-2555.
4. O'Boyle, N.M.; Holliday, G.L.; Almonacid, D.E. and Mitchell, J.B.O. J. Mol. Biol., 2007, 368: 1484-1499.
26: ALGORITHMS FOR PROTEIN DESIGN
Ivelin Georgiev (Duke University, Computer Science Department, USA), Cheng-Yu Chen (Department of
We present a suite of provably accurate algorithms for computational protein design developed in our lab. We report the application of our algorithms to switch the substrate specificity of a nonribosomal peptide synthetase (NRPS) enzyme. Experimental tests on a set of the top in silico predictions showed the desired improvement in substrate specificity, confirming the feasibility of our approach.
Background and Motivation
Protein redesign aims at improving target protein properties, such as increasing the stability of the protein, switching an enzyme's specificity towards a non-cognate substrate, or redesigning the protein so that it will perform a completely novel function. Exhaustively testing protein mutations in vitro is infeasible, due to the enormous size of the space of possible mutations. Computational in silico approaches can efficiently and accurately explore the combinatorial space of candidate solutions, and have proven valuable for protein redesign and protein engineering. Typically, structure-based protein design approaches aim at identifying the single global minimum energy conformation (GMEC) for an input model consisting of a rigid protein backbone, rigid rotamers, and a pairwise energy function. Here we present K* (pronounced "K-Star") [1,2], a provably accurate ensemble-based (as opposed to GMEC-based) algorithm for protein design and protein-ligand binding prediction. We further present MinDEE [2] and BD [3], provably accurate enhancements to the traditional Dead-End Elimination (DEE) algorithms that guarantee the identification of the GMEC with, respectively, continuously flexible rotamers and a flexible backbone. We describe additional techniques and approaches that are combined with our K*, MinDEE, and BD algorithms into a general suite for computational protein design.
Approach
K*: K* [1,2] is a statistical mechanics-derived algorithm that computes Boltzmann-weighted partition functions over energy-minimized conformational ensembles and generates a provably accurate approximation to Kd, the binding constant for a given protein-ligand complex. For a given protein, a set of mutations, and a target substrate, K* computes a Kd approximation score for each candidate mutant with the target substrate (for computational efficiency, MinDEE (see below) and sophisticated pruning filters are applied during the mutation search). Mutants are then ranked according to the computed scores; top-ranked mutants are predicted to have the desired specificity.
MinDEE: The MinDEE theorems [2] extend their traditional DEE analogs to achieve provable correctness even when rotamers are not rigid and are allowed to flex and minimize from their initial conformations (as given by the rotamer library used). In our algorithm, rotameric energy minimization is performed by allowing the rotamer chi angles to flex within a predefined continuous voxel. The main difference between MinDEE and traditional DEE is that, by computing voxel-constrained ranges of energies instead of rigid energies, MinDEE takes into account possible energy changes during rotameric energy minimization.
BD: Unlike traditional DEE and MinDEE, BD is provably accurate with backbone flexibility. BD places restraining boxes around each residue in a protein, in order to define a continuous family of backbone conformations with small phi/psi changes that nonetheless can cause global shifts in the backbone coordinates. Upper and lower bounds on the pairwise rotameric energy interactions are then precomputed within the defined restraining boxes and used to determine which rotamers are provably not part of the respective GMEC.
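As a rough illustration of how upper and lower energy bounds can support provable pruning, the sketch below applies the simplest dead-end elimination test to interval energies: a rotamer is discarded when even its best-case energy is worse than the worst-case energy of a competing rotamer at the same position. This is not the published MinDEE or BD criterion, which is more refined; the function name `can_prune`, the dictionary layout, and the toy numbers are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) on one energy term


def can_prune(
    pos: int,
    r: str,
    t: str,
    self_e: Dict[Tuple[int, str], Interval],
    pair_e: Dict[Tuple[int, str, int, str], Interval],
    rotamers: Dict[int, List[str]],
) -> bool:
    """Return True if rotamer r at position pos is provably worse than rotamer t.

    Hypothetical, simplified test: compare the lowest energy r could ever
    contribute with the highest energy t could ever contribute.
    """
    best_case_r = self_e[(pos, r)][0]    # lower bound on the self energy of r
    worst_case_t = self_e[(pos, t)][1]   # upper bound on the self energy of t
    for other_pos, other_rots in rotamers.items():
        if other_pos == pos:
            continue
        # Most favourable partner for r, least favourable partner for t.
        best_case_r += min(pair_e[(pos, r, other_pos, s)][0] for s in other_rots)
        worst_case_t += max(pair_e[(pos, t, other_pos, s)][1] for s in other_rots)
    return best_case_r > worst_case_t


# Toy system: position 0 has rotamers "a" and "b", position 1 has "x" and "y".
rotamers = {0: ["a", "b"], 1: ["x", "y"]}
self_e = {(0, "a"): (0.0, 1.0), (0, "b"): (10.0, 11.0)}
pair_e = {
    (0, "a", 1, "x"): (-1.0, 0.0),
    (0, "a", 1, "y"): (0.0, 1.0),
    (0, "b", 1, "x"): (2.0, 3.0),
    (0, "b", 1, "y"): (1.0, 2.0),
}

print(can_prune(0, "b", "a", self_e, pair_e, rotamers))  # "b" can be eliminated
print(can_prune(0, "a", "b", self_e, pair_e, rotamers))  # "a" cannot
```

Because the test only ever compares a lower bound against an upper bound, it errs on the side of keeping rotamers: widening the voxels or restraining boxes widens the intervals and prunes less, but never removes a rotamer that could belong to the minimum.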
An analogous approach, but for finite sets of backbone conformations defined by backrub-type motions, will be presented as part of the main conference program of ISMB 2008 [4]. Both MinDEE and BD are fully compatible with K*, and can thus be used as pre-processing filters to prune the majority of the candidate mutations and conformations that must be subsequently evaluated by the K* ensemble-based partition function computation algorithm. We will describe additional computational and modeling approaches incorporated into our algorithms for improved computational efficiency and prediction accuracy.
Results
In computational tests, allowing additional rotamer/backbone flexibility as part of the protein design algorithms was shown to result in significantly lower-energy conformations than those generated by the rigid-rotamer/rigid-backbone traditional DEE-based algorithms. We applied our K* algorithm in a redesign to switch the substrate specificity of the adenylation domain of the NRPS enzyme GrsA-PheA from the wild-type substrate Phe towards several noncognate substrates. Experimental tests on a set of the top in silico predictions showed the desired improvement in substrate specificity, confirming the feasibility of our approach.
REFERENCES
50: POING - A FAST AND SIMPLE MODEL FOR PROTEIN STRUCTURE PREDICTION
Benjamin Jefferys, Lawrence Kelley & Michael Sternberg (Imperial College London, UK)
Poing is a fast new model for template-free protein structure prediction based upon Langevin dynamics with novel models for physicochemical effects. We have tested it on a benchmark set and on the template-free CASP 7 targets, and we have found that its performance is comparable to the best fragment folding methods. Over the last two years we have been developing a simplified approach to modelling protein folding. The original aim was to model protein evolution, but the model is also producing successful protein structure predictions.
The model developed thus far reduces a protein structure to a string of C-alpha points, and for each of these a sidechain point representing the mean location of all the non-hydrogen sidechain atoms. This is similar to the Levitt & Warshel and the Scheraga approaches. The novelty in comparison to these methods lies in the increased detail of the force field and the solvent model, developed to represent the biophysical effects driving protein folding. The model is designed to predict structures through iterative simulation of a folding pathway which enforces a number of heuristic constraints inspired by biophysical effects known to be important for in vivo protein folding. It enforces these constraints using classical mechanics under the Langevin equation, involving forces between the particles representing the protein structure. A notable feature of the force field is that it is designed to maintain the stability of the native state. Three features of protein folding are modelled in a novel way. The steric force is designed to capture some of the subtleties of molecule packing in a complex force field between particles. The repulsive interaction between two particles depends upon the probability that atoms in an all-atom model of those particles a given distance apart would clash sterically, based upon analysis of sidechain and backbone conformations in the PDB. The polar interactions of the backbone (i.e. hydrogen bonds) are modelled by initially calculating the likely position of the O and H atoms involved in the interactions. Forces between the relevant backbone particles aim to bring the notional O and H atoms closer together. This novel model for polar interactions is a compromise between adding more particles representing the O and H sidechains, slowing down simulation, and having a simpler associative force between backbone points, which would ignore some important restrictions on how strands can be arranged into sheets. [1] R. Lilien, B. Stevens, A. 
Anderson, and B. R. Donald. J. Comput. Biol., 12(6-7):740-761, 2005.
[2] I. Georgiev, R. Lilien, and B. R. Donald. J. Comput. Chem., [Epub ahead of print, 2008 Feb 21]. PMID: 18293294
[3] I. Georgiev and B. R. Donald. Bioinformatics, 23, i185-i194, 2007. Special issue on ISMB 2007, Vienna, Austria.
[4] I. Georgiev, D. Keedy, J. S. Richardson, D. C. Richardson, and B. R. Donald. Bioinformatics. Special issue on ISMB 2008, Toronto, Canada.
Debye density of low frequency modes. Recently, however, it became clear that proteins can be described as fractals; namely, geometrical objects that possess self-similarity [3,5]. Adopting the fractal point of view for proteins makes it possible to describe, within the same framework, essential information regarding topology and dynamics using three parameters: the number of amino acids along the protein backbone, the spectral dimension and the fractal dimension. Based on a generalization of the Landau-Peierls instability criterion and on a melting criterion for proteins, we derive a relation between the spectral dimension, the fractal dimension and the number of amino acids along the protein backbone. In words, this relation states that for every protein the sum of the inverse of the fractal dimension and twice the inverse of the spectral dimension is equal to unity plus a constant, denoted "b", times the inverse of the logarithm of the number of amino acids. Deviations from this equation may render a protein unfolded. The fractal nature of proteins is shown to bridge their seemingly conflicting properties of stability and flexibility. The spectral dimension governs the density of low frequency normal modes, obtained using the Gaussian Network Model (described later), of a fractal/protein.
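Written symbolically, the relation stated in words above takes the following form. The symbols are introduced here only for illustration and are not taken from the abstract: $d_s$ is the spectral dimension, $d_f$ the fractal dimension, $N$ the number of amino acids, and the logarithm is assumed to be the natural logarithm.

```latex
\frac{2}{d_s} + \frac{1}{d_f} = 1 + \frac{b}{\ln N}
```

Since the right-hand side tends to unity as $N$ grows, the correction term $b/\ln N$ matters most for short chains.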
More precisely, a power law relation, with the spectral dimension as exponent, holds between the cumulative density of modes and the frequency. Describing the mass fractal dimension is most convenient using a three dimensional example. Draw a sphere of radius "R" enclosing some lattice points in space and calculate their mass, increase "R" and calculate again. Do this several times and if the mass as a function of "R" scales as R to some power this power is called the mass fractal dimension. For a regular 3D lattice both spectral and fractal dimensions coincide with the usual dimension of 3. For proteins however, it is usually found that the spectral dimension is smaller than 2 and that the fractal dimension is smaller than 3 but larger than 2 leading to an excess of low frequency modes and a sparser fill of space. The parameter "b" in our equation weakly depends on temperature and interaction parameters and hence may be considered almost constant. Analysing the harmonic vibrations spectrum of proteins we rely on the Gaussian Network Model (GNM) [4]. The GNM considers proteins to be elastic networks whose nodes correspond to the positions of the alpha-carbons in the native structure and the interactions among nodes are modelled as homogeneous harmonic springs. An interaction between two nodes exists only if the nodes are separated by less than a prefixed distance known as the interaction cutoff. The cutoff distance is usually taken in the range of six to seven angstrom, based on the radius of the first coordination shell around residues observed in PDB structures. The only information required to implement the method is the knowledge of the native structure. GNM has been widely applied because it yields results in agreement with X-ray spectroscopy and NMR experiments. The physics behind our equation has its roots in a paper generalizing the Landau-Peierls criterion. 
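The two constructions described above can be sketched in a few lines. The first function builds the GNM connectivity (Kirchhoff) matrix from alpha-carbon coordinates with a distance cutoff; the second estimates a mass fractal dimension as the log-log slope of enclosed mass against sphere radius. This is a minimal sketch under stated assumptions: the function names, the default 7 angstrom cutoff, the unit mass per point, and the least-squares fit are illustrative choices, and the normal modes themselves (the eigenvalues of the Kirchhoff matrix) are not computed here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def kirchhoff_matrix(ca_coords: Sequence[Point], cutoff: float = 7.0) -> List[List[int]]:
    """Connectivity matrix of an elastic network with identical springs.

    Off-diagonal entries are -1 for node pairs closer than the cutoff,
    diagonal entries count the contacts of each node.
    """
    n = len(ca_coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


def mass_fractal_dimension(
    points: Sequence[Point], center: Point, radii: Sequence[float]
) -> float:
    """Least-squares slope of log(mass) versus log(radius).

    Each point carries unit mass (an assumption); every radius must
    enclose at least one point so that the logarithm is defined.
    """
    xs, ys = [], []
    for radius in radii:
        mass = sum(1 for p in points if math.dist(p, center) <= radius)
        xs.append(math.log(radius))
        ys.append(math.log(mass))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx


# Three nodes on a line, 5 angstroms apart: only neighbours are in contact.
print(kirchhoff_matrix([(0, 0, 0), (5, 0, 0), (10, 0, 0)]))

# A regular cubic lattice should give a dimension close to 3.
lattice = [(x, y, z) for x in range(-10, 11) for y in range(-10, 11) for z in range(-10, 11)]
print(mass_fractal_dimension(lattice, (0, 0, 0), [4, 6, 8]))
```

For a real protein the centre and the range of radii would have to be chosen with care, since the scaling regime is limited by the finite size of the molecule.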
Burioni et al showed that thermodynamic instability also appears in inhomogeneous structures and is determined by the spectral We have enhanced the standard implicit solvent model of the Langevin equation by ensuring that drag and kicks only act upon parts of a particle exposed to solvent. This ensures that the internal parts of a protein are not subject to solvent effects, a key advantage of modelling an explicit solvent. The solvent-accessible surface of each particle is modelled by a sphere centered on the particle position. For particles modelling the sidechains, the sphere radius depends upon the amino acid type. The backbone particle radii are all the same. The precise radii used have been optimised to maximise the difference in accessible area between known native and a set of non-native states for a small test set of proteins, the aim being to destabilise non-native states. The simplicity of the model leads to very fast folding which can be viewed as it proceeds using a custom visualisation tool. Proteins up to 90 residues can fold to a stable state in 5-20 minutes. Our current predictions use 100 fold replicates and require between 8 and 30 CPU hours. In CASP-like testing conditions, for our test set of 30 domains less than 90 amino acids, we predict structures exceeding TM-score 0.3 for 24 domains, and exceeding 0.4 for 4 domains. We also tested the system on 12 of the template-free targets from CASP 7, and we equal or better the world leading Rosetta server for six of the targets, using tens of hours of CPU time as compared to thousands of CPU hours used by Rosetta. 9: PROTEINS: COEXISTENCE OF STABILITY AND FLEXIBILITY Shlomi Reuveni (School of Chemistry, Tel-Aviv University, Israel), Rony Granek (Department of Biotechnology Engineering, Ben-Gurion University, Israel), Joseph Klafter (Tel Aviv University, Israel). 
We introduce an equation for protein native topology based on GNM analysis of PDB data and a generalization of the Landau-Peierls instability criterion for fractals. The equation relates the number of amino acids to the fractal and spectral dimensions describing the protein fold and was tested successfully on 543 proteins [1]. Two seemingly conflicting properties of native proteins, such as enzymes and antibodies, are known to coexist. While proteins need to keep their specific native fold structure thermally stable, the native fold displays the ability to perform flexible motions that allow proper function [2]. This conflict cannot be bridged by compact objects which are characterized by small amplitude vibrations and by a
dimension. They demonstrated that for a spectral dimension smaller than 2, the mean square displacement of a structural unit (for example a single amino acid) in a system composed of N elements diverges in the limit of large N. This result, when used in tandem with a proper melting criterion for proteins, leads to an equation describing the native protein fold. We are led to our equation by two independent approaches to protein melting. The first approach utilizes the Gaussian Network Model (GNM). The melting of a protein is treated in this approach in a way similar to the melting of a solid crystal, with an additional assumption: surface residues initiate the melting process in proteins. The second approach is motivated by the viewpoint of a folded protein as a collapsed polymer. It introduces a non-Lindemann criterion and a bond bending Hamiltonian rather than the GNM Hamiltonian used in the first approach. In order to test the validity of our equation, we calculated the spectral and fractal dimensions for a data set of 543 proteins.
Calculations were performed on known protein structures; all structures were downloaded from the Protein Data Bank (PDB). The proteins that were chosen may differ in function and/or source organism and represent a wide length scale ranging from 100 to 3000 residues. Statistical analysis of the data gathered reveals satisfying agreement with our equation. Furthermore, in contrast to [3], where the authors suggested a relation similar to ours, we are able to recover empirically the unity on the right hand side of our equation. The results are shown in the figure attached. One may wonder what will happen if a protein is forced to strongly deviate from our equation and how artificial deformations of the protein fold may lead to a breakdown of our relation. Strong deformations of the protein fold may actually happen in vivo as part of a natural process. A possible example is GroEL, a protein chaperone that is required for the proper folding of many proteins. Recent molecular dynamics simulations demonstrate the unfolding action of GroEL on a protein substrate. Our work provides a theoretical framework that may help understand GroEL-induced unfolding. In addition, our work opens new possibilities for nanoscale and biologically inspired engineering of catalysts, emphasizing the importance of internal motion.
REFERENCES
1. S. Reuveni, R. Granek and J. Klafter, Proteins: Coexistence of Stability and Flexibility, Phys. Rev. Lett. 100, 208101 (2008).
2. D. Joseph, G.A. Petsko, M. Karplus, Science, 249, 1425, (1990).
3. R. Burioni, D. Cassi, F. Cecconi & A. Vulpiani, Proteins 55, 529 (2004).
4. T. Haliloglu, I. Bahar & B. Erman, Phys. Rev. Lett. 79: 3090 (1997).
5. R. Granek & J. Klafter, Phys. Rev. Lett. 95, 098106(1), (2005).
17: PREDICTING SMALL LIGAND BINDING SITES ON PROTEINS USING LOW-RESOLUTION STRUCTURES
Andrew Bordner (Mayo Clinic, USA)
The SitePredict method uses Random Forests to predict which protein residues bind specific metal ions and small molecules based on evolutionary conservation and spatial clustering of residue types. Because it requires only a backbone structure, the method performs well for unbound structures and can be applied to unrefined homology models. Specific non-covalently bound metal ions and small ligands, such as nucleotides and cofactors, are essential for the function and regulation of many proteins. However, the identity of the natural ligands and their binding sites on a particular protein are often unknown, even if a high-resolution structure is available. Computational prediction of ligand binding sites can be used to guide their experimental verification and thus save considerable effort. SitePredict is a machine-learning-based method for predicting binding sites of different metal ions and small ligands on low-resolution protein structures. Because ligands generally bind to residues that are non-contiguous in the amino acid sequence, prediction methods that use a protein structure, when available, are expected to perform better than sequence-only methods. Furthermore, because only residue-level information is required, the method works well with apo structures and may be directly applied to homology models without side chain refinement. SitePredict uses Random Forests trained on binding site properties that include neighboring residue pair counts, local enrichment of residue types, evolutionary conservation, and a rough measure of solvent accessibility. Sites for metal ions are 10-residue clusters located throughout the entire protein, whereas only surface pockets are considered as sites for small molecules.
Additional information on the shape of the pocket, namely its volume and principal components, is also included for small molecule binding site prediction. Prediction training and validation were performed using a comprehensive non-redundant set of protein-ligand structures. A sufficient number of structures were found for six different metal ions and six different small molecule ligands. Prediction performance was assessed by the area under the ROC curve (AUC) calculated from 10-fold cross-validation results. In addition, matching cross-validation training and test sets contained data for proteins from distinct sets of Pfam families in order to ensure their independence. While prediction performance varied for different ligands, the AUC remained at least 0.80 for all ligands. A realistic test using apo structures for binding site predictions resulted in only a small decrease in prediction performance, as expected for a method that does not rely on detailed atomic level structural information. Although the Random Forest classifier gives a binary prediction (binding or non-binding site), the output score contains more information. Higher values above the cutoff indicate more confident binding site predictions. A likelihood ratio derived from score histograms for each class was used to calculate the prediction confidence. These values can be used for prioritizing predictions for experimental testing. The Random Forest method can also estimate the relative contribution of each site property to the overall prediction accuracy. The top 10 most important properties were examined for each ligand. Evolutionary conservation was among the most important properties for all metal ions but appeared in the top properties only for ATP and NAD among the small molecules.
Even so, removing conservation from the input data resulted in a small decrease in performance, showing that while it is important relative to other variables it does not contribute inordinately to the accuracy by itself. This is advantageous since about 20% of the proteins did not have enough homologous sequences for calculating evolutionary conservation. SASA and specific residue propensity and residue pairs were also found to be important for the metal ion predictions. The residue types contributing most to prediction accuracy were different for each ion and agreed with a previous analysis of common coordinating residues (Harding 2004). Properties related to the surface pocket shape and size were important for 4 out of the 6 small molecules. Interestingly, no residue propensities were among the most important properties for any small molecule, possibly due to the larger size of these sites compared with those for metal ions. However, residue types appearing in important residue pairs for ATP and NAD binding sites agreed with those in previously identified sequence motifs. Discrimination between different ligands was assessed by cross-prediction in which a model trained on one ligand is used to make predictions for proteins that bind a different ligand. The ability of SitePredict to distinguish between two different ligands was found to be non-symmetric, i.e. depend on which one was used for training. Calcium and magnesium were the most difficult to distinguish metal ions. This is probably related to the fact that some proteins can bind either ion at the same site. ATP and AMP were the most difficult to distinguish small molecules, presumably due to their chemical similarity. As a demonstration of the usefulness of SitePredict for function annotation, binding site predictions for uncharacterized proteins from PSI structural genomics projects were examined. 
Several examples were found in which the binding site predictions corroborated independent experimental evidence and led to a consistent functional assignment.
19: SCORING CONFIDENCE INDEX: STATISTICAL EVALUATION OF LIGAND BINDING MODE PREDICTIONS
Maria Zavodszky, Andrew Stumpff-Kan, David Lee, Michael Feig (Michigan State University, USA).
We developed a statistical approach to quantify the confidence users can have in the ability of a scoring function to rank docked ligand poses correctly without relying on any knowledge about correct binding modes. The method can successfully differentiate between protein-ligand complexes with funnel-like and flat binding energy landscapes. Protein-ligand docking programs can generate a large number of possible binding orientations for each ligand. The challenge is to identify the orientations closest to the native binding mode using a scoring method. We developed a confidence measure of scoring performance in ranking the docked ligand poses that does not rely on any knowledge about the correct binding mode. The method exploits the fact that an adequately performing scoring function captures the roughly funnel-like shape of the binding energy landscape, with scores generally improving as the docked ligand orientations get closer to the correct binding mode. For such cases, the correlation coefficient of scores versus distances is expected to be the highest when the most native-like orientation is used as a reference. This correlation coefficient, called the correlize score, was calculated for each docked ligand pose and was found to be a good indicator of how far the docked pose is from the orientation corresponding to the global minimum of the binding energy. The correlation coefficient between the original scores and correlize scores, as well as the range of correlize scores, were found to be good measures of scoring performance.
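The correlize score described above can be sketched as follows: each docked pose is taken in turn as the reference, and the Pearson correlation between the docking scores and the distances of all poses to that reference is computed. This is a minimal sketch under several assumptions not stated in the abstract: lower scores are taken to be better (so a funnel gives a positive correlation), distances are supplied as a precomputed pairwise matrix (for example ligand RMSD), the reference pose itself is included with distance zero, and the names `pearson` and `correlize_scores` are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlize_scores(
    scores: Sequence[float], distances: Sequence[Sequence[float]]
) -> List[float]:
    """One correlation per pose: scores versus distances to that pose."""
    return [pearson(scores, distances[i]) for i in range(len(scores))]


# Toy funnel: four poses on a line, score grows with distance from pose 0.
scores = [0.0, 1.0, 2.0, 3.0]
distances = [[abs(i - j) for j in range(4)] for i in range(4)]

correlize = correlize_scores(scores, distances)
best_reference = max(range(len(correlize)), key=correlize.__getitem__)
print(correlize, best_reference)
```

In this toy landscape the pose at the bottom of the funnel receives the highest correlize score and the pose farthest from it the lowest, which is the behaviour the abstract relies on.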
These two measures were combined into a single quantity, called the Scoring Confidence Index, to quantify the confidence the user can have in the ability of a scoring function to rank the docked poses correctly. The diagnostic ability of the Scoring Confidence Index was tested on 50 protein-ligand complexes scored with three commonly employed scoring functions: AffiScore, DrugScore and X-Score. Binding mode predictions were found to be three times more reliable for complexes with Scoring Confidence Index values above 0.8 than for cases with lower values. This new confidence measure of scoring performance is expected to be a valuable tool for virtual screening applications.
22: FUNCTIONAL INSIGHTS FROM BINDING SITE SIMILARITIES COMPLEMENT EXISTING METHODS FOR THE PREDICTION OF PROTEIN FUNCTION
More recently, we utilized IsoCleft to ask whether non-homologous binding sites that bind similar ligands can be discriminated by means of their binding site similarities (Najmanovich et al., 2008). This study showed that there exists a certain level of uniqueness across non-homologous binding sites. The ability to predict binding sites is very sensitive to knowing the identity of the binding site atoms within the cleft. As this information becomes less accurate, it becomes more difficult to determine what ligand the protein may bind. In the present work we describe a database of cognate binding sites. Each binding site in the IsoCleft Database is defined as the atoms in contact with a cognate ligand and includes the respective residue’s C-alpha atoms. The database is a subset of the Procognate database (Bashton et al., 2008), selecting one example for each Pfam family/ligand combination where the ligand is at least 95% similar to a ligand present in the KEGG reaction for that protein family.
For each Pfam cognate-ligand combination, we select the example with the lowest solvent accessible surface area (McConkey et al., 2002). The IsoCleft Database contains 1198 examples comprising 508 Pfam families and 486 ligands. To demonstrate the usefulness of the IsoCleft method and database in providing complementary information to existing methods, we show here results for particular cases of structural genomics proteins with unknown function, for which the variety of current state-of-the-art methods for the prediction of function from sequence and structure present in the ProFunc server (Laskowski et al., 2005) do not offer functional clues. The first example is that of PDB code 2pd0, a Cryptosporidium parvum protein of unknown function. A search against the IsoCleft database detects, as the top two distinct Pfam hits, the product and substrate analogs of the same purine nucleoside phosphorylase reaction in humans and E. coli respectively. The second and third examples correspond to cases where a specific function could not be suggested, yet clear ligand similarities exist between the cognate ligands that bind the top scoring binding sites and may serve as initial guesses for rational drug design or for narrowing down the space of potential functions. In the case of 1sed, a hypothetical protein from Bacillus subtilis, the three top distinct Pfam hits are bound to D-glutamic acid, glutamine and fumarate, three very similar molecules. In the case of 3d0j, a protein of unknown function and origin, the three top hits are all bound to cofactors containing the AMP moiety. We are currently working on setting up a web-based interface to query the IsoCleft Database. While no method can be accurate in all possible cases, the examples shown here were specifically chosen to show the potential of the IsoCleft method and associated database as a valuable complement to the myriad of existing methods in the quest for ever more accurate predictions of function from structure.
REFERENCES: Allali-Hassani A, Pan PW, Dombrovski L, Najmanovich R, Tempel W, Dong A, Loppnau P, Martin F, Thornton JM,
Rafael Najmanovich & Janet M. Thornton (European Bioinformatics Institute, UK)
The detection of binding site similarities may help pinpoint protein function and serve as a starting point for rational drug design. In the present work we describe the use of IsoCleft, a program to compare binding sites, and the associated IsoCleft database on structural genomics targets of unknown function. Current computational methods for the prediction of function from structure are focused on the detection of similarities and subsequent transfer of functional annotation. Such similarities may reflect a distant evolutionary relationship as well as unique physico-chemical constraints necessary for binding similar ligands. IsoCleft is a graph-matching based method for the detection of 3D atomic similarities that introduces two innovations which allow us to extend its applicability to the analysis of large all-atom binding site models. IsoCleft does not require atoms to be connected either in sequence or space. The first innovation is to perform the graph matching in two stages. In the first stage, an initial superimposition is performed via the detection of the largest clique in an association graph constructed using only C-alpha atoms of equivalent residues in the two clefts. This superimposition is used as a means to simplify the second all-atom graph matching stage, in which only atoms within a certain distance threshold are considered as potentially correspondent. The second innovation is to exploit the fact, noted by Bron & Kerbosch (1973), that the algorithm has the tendency to produce the larger cliques first, in order to implement what we call Approximate Bron & Kerbosch.
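The first matching stage and the early-stopping clique search can be sketched roughly as follows. This is not the IsoCleft implementation: the equivalence rule (identical residue type), the distance tolerance `tol`, the pivoting rule, and the function names are assumptions chosen for illustration. Nodes of the association graph pair one C-alpha atom from each cleft, edges connect pairs whose inter-atomic distances agree in both clefts, and the search returns the first maximal clique that a pivoting Bron & Kerbosch recursion produces instead of enumerating all cliques.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[float, float, float]]  # (residue type, C-alpha coordinates)
Node = Tuple[int, int]                         # (index in cleft A, index in cleft B)


def association_graph(
    cleft_a: List[Atom], cleft_b: List[Atom], tol: float = 1.0
) -> Dict[Node, Set[Node]]:
    """Pair equivalent atoms; connect pairs with compatible distances."""
    nodes = [
        (i, j)
        for i, (type_a, _) in enumerate(cleft_a)
        for j, (type_b, _) in enumerate(cleft_b)
        if type_a == type_b  # assumed equivalence rule
    ]
    adj: Dict[Node, Set[Node]] = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # an atom may be matched only once
        dist_a = math.dist(cleft_a[i][1], cleft_a[k][1])
        dist_b = math.dist(cleft_b[j][1], cleft_b[l][1])
        if abs(dist_a - dist_b) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def first_maximal_clique(adj: Dict) -> Set:
    """Stop Bron & Kerbosch (with pivoting) at the first maximal clique."""

    def expand(r: Set, p: Set, x: Set) -> Optional[Set]:
        if not p and not x:
            return r  # r cannot be extended: report it and stop
        pivot = max(sorted(p | x), key=lambda v: len(adj[v] & p))
        for v in sorted(p - adj[pivot]):
            found = expand(r | {v}, p & adj[v], x & adj[v])
            if found is not None:
                return found
            p = p - {v}
            x = x | {v}
        return None

    result = expand(set(), set(adj), set())
    return result if result is not None else set()


# Two copies of the same three-residue cleft, the second one translated.
cleft_a = [("GLY", (0.0, 0.0, 0.0)), ("ALA", (3.0, 0.0, 0.0)), ("SER", (0.0, 4.0, 0.0))]
cleft_b = [("GLY", (10.0, 0.0, 0.0)), ("ALA", (13.0, 0.0, 0.0)), ("SER", (10.0, 4.0, 0.0))]
print(first_maximal_clique(association_graph(cleft_a, cleft_b)))
```

The returned clique is maximal but not guaranteed to be the largest one; the approach trades that guarantee for speed, relying on the tendency of the search to visit large cliques early.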
In the Approximate Bron & Kerbosch, the first clique is selected as the solution (and the search procedure stopped) rather than detecting all cliques in order to find the largest. Approximate Bron & Kerbosch allows us to obtain an optimal or nearly optimal solution in a fraction of the time that would be needed without noticeable effects on the results. In the past we have used IsoCleft to study the relation of binding site similarities and experimentally determined functional similarities within members of the Human Sulfotransferase family (Najmanovich et al., 2007; Allali-Hassani et al., 2007). Edwards AM, Bochkarev A, Plotnikov AN, Vedadi M, Arrowsmith CH. Structural and Chemical Profiling of the Human Cytosolic Sulfotransferases. PLoS Biology (2007) vol. 5 (5) pp. e97. Bashton M, Nobeli I, Thornton JM. PROCOGNATE: a cognate ligand domain mapping for enzymes. Nucleic Acids Research (2008) vol. 36 (Database issue) pp. D618-22. Bron C & Kerbosch J. Algorithm 457: finding all cliques of an undirected graph. Communications of the ACM (1973) vol. 16 (9) pp. 575-577. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y. KEGG for linking genomes to life and the environment. Nucleic Acids Research (2008) vol. 36 (Database issue) pp. D480-4. Laskowski RA, Watson JD, Thornton JM. ProFunc: a server for predicting protein function from 3D structure. Nucleic Acids Research (2005) vol. 33 (Web Server issue) pp. W89-93. McConkey BJ, Sobolev V, Edelman M. Quantification of protein surfaces, volumes and atom-atom contacts using a constrained Voronoi procedure. Bioinformatics (2002) vol. 18 (10) pp. 1365-73. Najmanovich R, Kurbatova N, Thornton JM. Detection of 3D atomic similarities and their use in the discrimination of small-molecule protein binding sites. Bioinformatics (2008) vol. 
24 (18) in press. Najmanovich R, Allali-Hassani A, Morris RJ, Dombrovsky L, Pan PW, Vedadi M, Plotnikov AN, Edwards AM, Arrowsmith CH, Thornton JM. Analysis of binding site similarity, small-molecule similarity and experimental binding profiles in the human cytosolic sulfotransferase family. Bioinformatics (2007) vol. 23 (2) pp. e104-9. 14: LOGIC-BASED DRUG DISCOVERY derive rules from experimental data of ligand activity. In blind trials on two GPCR targets, 50% of novel virtual hits exhibited inhibitory activity upon experimental screening. The application of Quantitative Structure Activity Relationship (QSAR) is a central tool in drug discovery and development due to its role in identifying key structural features for activity or toxicity. A variety of methods have been developed and each has its merits and limitations. Over the last few years we have been using a form of logic-based machine learning based on Inductive Logic Programming (ILP) combined with regression to derive QSARs. The approach is able to identify key chemical features from large datasets and to learn rules which can be understood by medicinal chemists. This talk will present a series of studies using logic-based machine learning in drug discovery. The first set of studies combined ILP with support vector programming in an approach termed SVILP. A QSAR describing inhibition of thermolysin had an Rsquared-CV (cross-validated squared Pearson correlation coefficient) of 0.79 compared to an industry-standard method Comparative Molecular Field Analysis (CoMFA) of 0.55. The learnt rules based on the structures of the inhibitors correctly identified features of thermolysin inhibition in accord with protein crystallographic results (see Figure). SVILP was also used to derive predictive rules for toxicology from the DSSTOX dataset of fathead minnow toxicity. SVILP yielded Rsquared-CV of 0.57 compared to an industry standard TOPKAT which yielded 0.26 (ref 1). 
The learnt rules provided insight into key chemical alerts for toxicity. The SVILP approach has also been applied to model protein-ligand interaction (3). Despite the increased use of protein-ligand docking in the drug discovery process due to advances in computational power, the difficulty of accurately ranking the binding affinities of a series of ligands docked to a protein remains largely unsolved. This problem has led to the development of scoring functions tailored to rank the binding affinities of a series of ligands to a specific system. We have used SVILP to produce binding affinity predictions of a series of ligands to a particular protein. Our results show that SVILP performs comparably with other state-of-the-art methods such as CoMFA on five protein-ligand systems. The ability to display graphically and understand the SVILP-produced rules is demonstrated. The above studies demonstrated the applicability of SVILP to generate accurate QSARs. A major challenge in drug discovery is to use a QSAR to identify active molecules from a database of possible molecules and thus to suggest novel molecules for progression through a hit to lead programme (i.e. virtual screening). In virtual screening, one aim is to identify molecules that are sufficiently chemically different from the currently known ligands (i.e. a novel chemotype) so that they can be patented. In addition, novel chemotypes may exhibit different pharmacological effects including adverse side effects of current molecules. The SVILP approach has recently been developed into a more general logic-based approach, known as Michael Sternberg (Imperial College London, UK), Stephen Muggleton (Department of Computing, Imperial College London, UK), Ata Amini (Equinox Pharma Ltd, UK), Huma Lodhi (Department of Computing, Imperial College London, UK), David Gough (Equinox Pharma Ltd, UK), Paul Shrimpton (Structural Bioinformatics Group, Imperial College London, UK). 
A powerful approach to identify new chemotypes for drug discovery by virtual screening is presented. The approach is based on logic-based machine learning to INDDEx™. We have used INDDEx™ in two blind trials to find novel chemotypes against two series A GPCR (G-protein coupled receptor) targets. In trial 1, the training data consisted of 479 active antagonist molecules (<1000 nM inhibition) and 209 inactives. The learnt QSAR was used to screen a subset of 400,000 drug-like molecules from the ZINC database. An initial screen identified 500 potential hits with a predicted inhibition of <1000 nM. From this list, molecules were removed that were chemically similar to any molecule in the training data, quantified as requiring a Tanimoto coefficient of <0.8. 157 in silico hits were then purchased and tested in primary screens yielding 76 actives, i.e. a hit rate of just under 50%. From these, 40 diverse molecules were selected for secondary screening and 30 had an IC50 of <12 µM. Of these 30, 28 were quite different from any of the training data (Tanimoto coefficient <0.7) and it is unlikely that their inhibitory activity could have been predicted by expert inspection. Broadly similar results were obtained for target 2. There are several features of INDDEx that are responsible for our high predictive accuracy of c. 50% in these two blind trials. In particular, INDDEx can learn from a large dataset and use information from both actives and inactives. In addition, INDDEx is not based on global superposition but rather identifies sub-structures that are important for activity. Further blind studies are in progress to explore the power of INDDEx to discover novel antagonists and agonists using logic-based machine learning. Amini, A., Muggleton, S.H., Lodhi, H. and Sternberg, M.J. 
(2007) A novel logic-based approach for quantitative toxicology prediction, J Chem Inf Model, 47, 998-1006. Amini, A., Shrimpton, P.J., Muggleton, S.H. and Sternberg, M.J. (2007) A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming, Proteins, 69, 823-831. FIGURE LEGEND An example of a learnt logic rule describing the structural features of an inhibitor of thermolysin. 43: CONFORMATIONAL FREE ENERGY OF PROTEIN STRUCTURES: COMPUTING UPPER AND LOWER BOUNDS Hetunandan Kamisetty & Christopher Langmead (Carnegie Mellon University, USA) We describe an approach to compute the Conformational Free Energy (G) of a protein with a given backbone conformation. Our technique models protein structures with a fixed backbone as a complex probability distribution over a set of torsion angles, represented by a set of rotamers. Specifically, we model protein structures using undirected probabilistic graphical models, also known as Markov Random Fields. Our representation is complete in that it models every atom in the protein. A probabilistic representation confers several advantages, including that it provides a framework for predicting changes in free energy in response to internal or external changes. For example, structural changes due to changes in temperature, ligand binding, and mutation can all be cast as inference problems over the model. Existing inference algorithms can then be used to efficiently solve these problems. In theory, the energy of interaction between any two residues of the protein is non-zero. However, due to the nature of these interactions, this energy is negligible if the two residues are distally located. Also, if all the residues that directly influence a pair of residues are in specific conformations, then the random variables corresponding to these residues become conditionally independent of each other. 
These conditional independencies can be compactly encoded using a Markov Random Field (MRF). In general, an MRF encodes the following conditional independencies: each vertex is conditionally independent of every other set of vertices in the graph, given its immediate neighbors in the graph. While an MRF allows for a compact encoding of the probability distribution, performing statistical inference exactly can still be expensive. In fact, even computing exact marginals is NP-Hard if the graph, like the MRF described above, has cycles. The Junction Tree algorithm for exact inference has a running time that is exponential in the tree width of the graph, which can be prohibitively expensive in large graphs. However, recent advances within the Machine Learning community on approximate algorithms for inference now allow efficient computation of approximations to the free energy. In particular, Generalized Belief Propagation (GBP) gives estimates of free energy that have been shown to work well in practice, even though they have no theoretical guarantees [4]; mean field and other variational approximations [3] give upper bounds on the free energy; and the methods of [2], which we shall refer to as Tree-reweighted BP, give lower bounds on the free energy. Since the log partition function and the free energy differ only in the sign, we will use both in our results. When comparing different algorithms with each other, we use estimates of the log partition function, and when comparing them with experimental results, we will use the negatives of the log partition estimates as free energy estimates. We will present results showing that simple bounds obtained using Naive Mean Field and Tree Reweighted Belief Propagation are reasonably tight. While GBP isn't guaranteed to give good estimates, we showed that it outperforms the two other approaches on most datapoints, often significantly. 
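The relation between a mean-field estimate and the exact free energy can be checked on a toy problem. The sketch below is only an illustration under stated assumptions: three hypothetical "residues" with two "rotamers" each, made-up energies in units of kT, and a plain coordinate-ascent naive mean field. It compares the variational free energy with the exact value obtained by brute-force enumeration, which is feasible only for such tiny models.

```python
import itertools
import math

# Hypothetical toy model: 3 residues in a cycle, 2 rotamers each (energies in kT).
unary = [[0.0, 0.5], [0.2, 0.0], [0.0, 0.3]]
pair = {
    (0, 1): [[-1.0, 0.4], [0.4, -0.6]],
    (1, 2): [[-0.8, 0.2], [0.3, -0.5]],
    (0, 2): [[0.1, -0.7], [-0.4, 0.2]],
}

def energy(state, unary, pair):
    """Total energy of one joint rotamer assignment."""
    e = sum(unary[i][s] for i, s in enumerate(state))
    return e + sum(tbl[state[i]][state[j]] for (i, j), tbl in pair.items())

def exact_free_energy(unary, pair):
    """G = -log Z by enumerating every joint state."""
    k = len(unary[0])
    z = sum(math.exp(-energy(s, unary, pair))
            for s in itertools.product(range(k), repeat=len(unary)))
    return -math.log(z)

def mean_field_free_energy(unary, pair, sweeps=200):
    """Naive mean field: minimise <E>_q - H(q) over fully factorised q."""
    n, k = len(unary), len(unary[0])
    q = [[1.0 / k] * k for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            # Effective field on residue i from the current beliefs of neighbours.
            field = list(unary[i])
            for (a, b), tbl in pair.items():
                for s in range(k):
                    if a == i:
                        field[s] += sum(q[b][t] * tbl[s][t] for t in range(k))
                    elif b == i:
                        field[s] += sum(q[a][t] * tbl[t][s] for t in range(k))
            lowest = min(field)
            w = [math.exp(-(f - lowest)) for f in field]
            q[i] = [x / sum(w) for x in w]
    avg_e = sum(q[i][s] * unary[i][s] for i in range(n) for s in range(k))
    avg_e += sum(q[a][s] * q[b][t] * tbl[s][t]
                 for (a, b), tbl in pair.items()
                 for s in range(k) for t in range(k))
    entropy = -sum(p * math.log(p) for qi in q for p in qi if p > 0)
    return avg_e - entropy

g_exact = exact_free_energy(unary, pair)
g_mf = mean_field_free_energy(unary, pair)
print(g_exact, g_mf)
```

For any factorised distribution q, the quantity <E>_q - H(q) is at least -log Z, so the mean-field value is an upper bound on the free energy whether or not the iteration has converged; the sketch does not implement the tree-reweighted lower bound or GBP.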
Admittedly it is possible to find better bounds, using for example, a structured variational approach We describe an approach to compute the Conformational Free Energy (G) of a protein with a fixed backbone, by posing it as a statistical inference problem in a Markov Random Field [1]. Using this framework, we shall describe fast algorithms with strong theoretical guarantees to compute lower and upper bounds for G. instances, macromolecular complexes do not change during crystallization and therefore crystal packing should reflect significant, or biologically relevant, macromolecular interactions. This assumption is exploited in most, if not all, studies where structural aspects of protein interactions are inferred from crystals. However, crystals exemplify thermodynamic systems in a global minimum of free energy, taking into account both significant, biologically relevant interfaces found within biological complexes, and artifactual, inter-complex contacts that originate from the structure of crystal packing. Therefore, it is possible that crystals may misrepresent natural, in-solvent, interactions by sacrificing their binding energy if it is outweighed by the formation of more energetically favourable inter-complex contacts. Although this point is conceptually clear, no systematic study has been performed so far in which the correspondence between natural and in-crystal interactions was studied. The present work aims to approach the outlined problem by analyzing dimeric protein complexes obtained from crystallographic PDB entries. Two goals are pursued. First, we would like to find out to what degree our understanding of macromolecular interactions allows one to reproduce these complexes outside the crystal context. Secondly, we would like to see whether, and if so under what conditions, these complexes may be misrepresented by crystals. 
I will report the results of a massive docking experiment, which included docking of 4065 non-redundant dimeric protein complexes identified in crystal packings using PISA software [1]. Before the docking, the complexes were disassembled and their monomeric units were randomly oriented in order to exclude the possibility of docking by trivial translation. Then, a specially written program was used to find the single most energetically favourable contact of the units. Unlike in many other docking programs, no geometrical scoring of docking quality was used. The optimal docking position was identified solely by the minimum of the Gibbs free energy of generated complexes, calculated in exactly the same way as in PISA software [1]. Obviously, the described experiment corresponds to the simplest case of bound docking, and if docking calculations were exact and crystal dimers were identical to complexes in solution then all dimeric structures would be reproduced. It was found, however, that in 38% of instances, the top-rated orientation of docked subunits was different from that of the original crystal dimers (an 8 Å r.m.s.d. threshold was used to identify successful dockings). This unexpectedly high rate of failures demonstrates a sound dependence on the Gibbs free energy of dissociation seen in the Figure (red line). At zero dissociation energy, when no energetically preferable orientation of docked units may be found and successful dockings emerge by chance, the success rate is estimated at 10%. This suggests that an average protein chain may form about 10 geometrically suitable contacts, which agrees reasonably well with an average of 8 interfaces per chain in the PDB. With increasing free energy of dissociation, the rate of failures shows a remarkable exponential decrease, for lower bounds. While that is out of the scope of this study, it is a promising direction for future studies. REFERENCES [1] Hetunandan Kamisetty, Eric P. Xing and Chris J. 
Langmead, "Free Energy Estimates of All-atom Protein Structures using Generalized Belief Propagation." Proceedings of the Eleventh Annual International Conference on Research in Computational Molecular Biology (RECOMB 2007), pp. 366-380. [2] M. J. Wainwright, Tommi S. Jaakkola, Alan S. Willsky, "A new class of upper bounds on the log partition function", IEEE Trans. on Information Theory, vol. 51, pp. 2313-2335. [3] Michael I. Jordan, Zoubin Ghahramani, Tommi S. Jaakkola, Lawrence K. Saul, "An Introduction to Variational Methods for Graphical Models", Learning in Graphical Models, 1998. [4] Yedidia, J.S., Freeman, W.T., Weiss, Y., "Characterizing Belief Propagation and its generalizations", http://www.merl.com/reports/TR2002-35/, 2002. 6: CRYSTAL CONTACTS AS NATURE'S DOCKING SOLUTIONS Eugene Krissinel (EBI, Genome Campus, Hinxton, Cambridge CB10 1SD, UK). The assumption that crystal contacts reflect natural macromolecular interactions forms a basis for many studies in structural biology. However, the crystal state may correspond to a global minimum of free energy where biologically relevant interactions are sacrificed in favour of unspecific contacts. A large-scale docking experiment was performed in order to assess the extent of misrepresentation of natural complexes by crystal packing. The ability of proteins to interact with each other and form complexes forms a basis of many important biochemical processes. In general, protein interactions are thought to be specific, which means that a given protein manifests sound interaction only with particular types of proteins and at particular spots on the protein surface. This specificity is important for research and applications, and a considerable amount of effort in both experimental and theoretical studies is applied to the identification of structural aspects of protein binding. A solution to this problem may bring about a better understanding of protein function and give a clue for drug discovery and design. 
Most of today’s knowledge on the geometry of macromolecular interactions comes from protein crystallography. It is commonly assumed that, in most and no docking failures have been recorded at dissociation energies higher than 50 kcal/mol. In order to rationalize the obtained results, a theoretical model for docking failures has been developed. This model considers a finite number of geometrically suitable contacts for each pair of docked proteins, and assumes that, firstly, a crystal may capture geometrically different complexes with a probability dependent on their free energy of dissociation, and, secondly, free energy is calculated with a normal error. Fitting to experimental results in the Figure suggests an average of 10 suitable docking contacts (geometrically different complexes) for each pair of proteins and a calculation error of 2.3 kcal/mol (green line). Assuming the average of 10 contacts to be a property of crystal packing, the model gives a finite rate of failures even at zero calculation error, as shown by the magenta line. This line indicates the measure of misrepresentation of dimeric complexes by crystal packing. The weaker the protein interaction in the complex, the higher the chance that it will be completely lost at crystallization due to the emergence of unspecific, inter-complex interactions. The numbered spots in the Figure indicate CAPRI targets with failure rate probabilities reported in [2]. Usually, the low success of CAPRI dockings is attributed to algorithmic imperfection and difficulties of bound docking. However, the present study suggests that most CAPRI targets have been chosen in the region where complexes are very likely to be misrepresented by crystals. 
Therefore, it is possible that, in some cases, computational docking yields correct, lowest-energy dimers that are not found in crystal packings, while docking solutions that were rated as successful could be a result of mere chance. Only two higher-energy CAPRI targets, shown as diamonds, have been successfully docked by the program used in this study. It may also be noted that a considerable number of CAPRI targets appear to be unstable complexes, for which no docking solution should be sought in the first place. [1] E. Krissinel & K. Henrick (2007) J. Mol. Biol. 372:774-797 [2] S. Vajda (2005) Proteins, 60:176-180 estimating the barriers we construct a kinetic model capable of determining the pathway and rate of unfolding. The configurational space of a polypeptide chain is astronomically large, yet the folding of most proteins is completed within a fraction of a second. This paradoxical observation suggests that a pathway for folding, dictated by physical interactions and topological constraints, must exist. Protein topology is the primary determinant of the folding energy landscape [1], and routes through this landscape are like trees [2] where branches represent condensation events, merging two substructures into one, and the branch order represents the topological dependence of these events. The discovery of a direct linear correlation between folding rate and relative contact order was the first indication of a topological dependence of the folding rate, and hints at a mechanistic description of folding [3], but analytical models cannot capture the details of the pathway. Recent experimental observations by Colon endorse the view of topology-dependent unfolding rates in a survey of kinetically stable proteins [4]. In that study the most kinetically stable proteins are multimeric and have complex geometry, with features such as buried terminal strands, and "latches" that wrap around the protein like a belt. 
Our earlier model, called UNFOLD, described a protein as a weighted secondary structure element graph [5]. Contact energies were defined between secondary structure elements, and min-cuts were found such that the graph was hierarchically partitioned in the lowest energy way at each step. In the new model, GeoFold, we refine the energy expression, adding configurational and sidechain entropy, and writing the solvation energy as a function of denaturant concentration. Also new in GeoFold is the ability to carry out a kinetic simulation. To do so, we have defined reasonable estimates of the rates of transition between kinetic intermediates in the unfolding pathway. A kinetic simulation is carried out by moving concentrations along the edges of the unfolding graph. The unfolding graph is composed of elemental subsystems (Figure 1a) where substructure f is partitioned into two substructures, u1 and u2, passing over an energy barrier in the process. It is assumed that the two substructures, u1 and u2, are solvated before they are separated. This means that an energy barrier can be calculated if we can estimate just the solvation energy and the gain in configurational entropy. The solvation energy is assumed to be unfavorable and proportional to the change in buried surface area, whereas the configurational entropy change is assumed to be positive, and its magnitude depends on the number of degrees of freedom gained by unfolding. Topology defines the allowable unfolding motions, which in turn define the entropy gain at each step. Three topological operators can be defined to describe all non-distorting linear transformations on a chain (Figure 1b). 1) If the chain crosses only once from u1 to u2, then the allowable motion is a pivot, which is the set of all rotations around a point. If the chain crosses twice, the two crossing points define a hinge, allowing rotations only around one axis. 
If the chain 18: GEOFOLD: A MECHANISTIC MODEL TO STUDY THE EFFECT OF TOPOLOGY ON PROTEIN UNFOLDING PATHWAYS AND KINETICS Vibin Ramakrishnan (Institute for Bioinformatics and Applied Biotechnology, Bangalore, India), Saeed Salem, Saipraveen Srinivasan, Wilfredo Colon, Mohammed Zaki & Chris Bystroff (Rensselaer Polytechnic Institute, USA) We seek to explain the effect of protein topology on kinetic stability using a graph-based model for unfolding. Proteins open to an unfolded state by either pivoting, hinging or separating chains. By does not cross from u1 to u2, then the model consists of multiple chains or disjoint segments of one chain, and the motion is a simple translation, called a break in this study. A break is assigned the highest entropy gain, followed by pivots, then hinges. The structure of the unfolding graph (Figure 1c) depends on the topology of the protein. If there are strong topological dependencies on the possible ways to unfold the protein, then the unfolding graph will contain one or more bottleneck edges. The rate of passage through an edge depends on the amount of buried surface exposed and the type of unfolding motion. A series of bottleneck edges would lead to slower unfolding in general. GeoFold was applied to several proteins, some that unfold fast (factor for inversion stimulation FIS, 1F36; protein-G, 2IGD) and others that are extremely kinetically stable (papain, 1PPN; cyanase, 1DWK), having unfolding half-lives measured in years or decades. An unfolding graph was generated for each protein by finding all topologically possible pivots, hinges and breaks, recursively, starting with a graph node representing the complete native structure (N) and ending in graph nodes that contain single residues. 
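The effect of a bottleneck edge on the flow of concentration through an unfolding graph can be sketched with a toy simulation. This is not the GeoFold code: the three-node graph, the barrier heights, the prefactor and the treatment of every step as first-order are all hypothetical simplifications chosen only to show how a single high barrier slows passage from N to the unfolded state.

```python
import math

RT = 0.593  # kcal/mol, approximately room temperature

def rate(barrier, prefactor=1.0e6):
    """Transition-state-style rate constant (1/s); the prefactor is an assumption."""
    return prefactor * math.exp(-barrier / RT)

def simulate(edges, conc, dt, steps):
    """Move concentration along graph edges with explicit Euler steps.

    edges: list of (source, target, forward_barrier, reverse_barrier).
    conc:  dict mapping node name to its starting concentration.
    """
    conc = dict(conc)
    for _ in range(steps):
        change = {name: 0.0 for name in conc}
        for src, dst, fwd, rev in edges:
            net = (rate(fwd) * conc[src] - rate(rev) * conc[dst]) * dt
            change[src] -= net
            change[dst] += net
        for name in conc:
            conc[name] += change[name]
    return conc

start = {"N": 1.0, "I": 0.0, "U": 0.0}
# Hypothetical barriers in kcal/mol; the second graph has a bottleneck edge I -> U.
fast_graph = [("N", "I", 6.0, 14.0), ("I", "U", 6.0, 14.0)]
slow_graph = [("N", "I", 6.0, 14.0), ("I", "U", 9.0, 14.0)]

fast = simulate(fast_graph, start, dt=1.0e-3, steps=5000)
slow = simulate(slow_graph, start, dt=1.0e-3, steps=5000)
print(fast["U"], slow["U"])
```

With these made-up numbers the graph containing the higher barrier accumulates less unfolded material in the same simulated time, which is the qualitative behaviour attributed to bottleneck edges above. The time step must stay small compared with the inverse of the fastest rate for the explicit Euler update to remain stable.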
The concentration of N was initialized to a non-zero molarity and all other nodes were set to zero molar to start the simulation. Concentration changes were calculated using transition state theory, where the barrier height for unfolding was set to the solvation energy times a Hammond factor (theta), and the barrier height to folding was set to the configurational energy gain minus the solvation energy times the difference Hammond factor (1 - theta). Concentration changes were then calculated until the whole system reached an equilibrium state. A solvation energy factor (omega) was used to calculate the solvation free energy from the buried surface area. The configurational entropies of a break (S_break), pivot (S_pivot) and hinge (S_hinge) motion were also user defined, allowing us to empirically fit the unfolding rates to real experimental values. Unfolding simulations were carried out at various values of omega near the melting point. The unfolding rates in pure water were found by linearly extrapolating the rates to the solvation value for pure water (omega_H2O). This is the same method that is used experimentally. The initial results of simulations on fast unfolders and kinetically stable proteins show the expected trend. FIS, a dimer, unfolds the fastest and shows 3-state kinetics, with a dimeric intermediate state that dominates at the melting omega. A dimeric equilibrium intermediate has been shown experimentally for this protein [6]. Avidin (1RAV) has a beta barrel structure that forces the protein to unfold by way of an unfavorable hinge motion. Avidin unfolds much more slowly than FIS. Papain (1PPN), a monomeric protein having a complex topology with N- and C-terminal latches, unfolds even more slowly, and exhibits 2-state behavior. The simulation data fit the experimental data qualitatively and provide a detailed look at the unfolding pathway. [1] Baker, D., Nature 2000, 405, 39-42. [2] Hockenmaier, J. J., K A. and Dill, K A., Proteins 2007, 66, 1-15. 
[3] Makarov, D. E., Plaxco, K. W., Protein Sci 2003, 12, 17-26. [4] Xia, K., Manning, M., Hesham, H., Lin, Q., et al., Proc Nat Acad Sci 2007, 104, 17329-17334. [5] Zaki, M. J., Nadimpally, V., Bardhan, D., Bystroff, C., Bioinformatics (Oxford, England) 2004, 20, i386-393. [6] Meinhold, D., Boswell, S., Colon, W., Biochemistry 2005, 44, 14715-14724. 38: THE NEXT GENERATION OF THE BACKBONE-DEPENDENT ROTAMER LIBRARY Maxim Shapovalov and Roland Dunbrack (Fox Chase Cancer Center, USA). We present the next generation of the backbone-dependent rotamer library, which is widely used in structure prediction and protein design programs. We have used adaptive kernel density estimation to achieve smooth, differentiable phi,psi-dependent probability estimates and angles. These libraries are useful in methods that account for backbone flexibility. As the number of high-resolution X-ray structures has increased, it has become possible to develop more detailed statistical analyses of side-chain conformational data. The backbone-dependent rotamer library, which provides rotamer frequencies and the means and variances of dihedral angles, is used in many homology modeling programs and most protein design methods. As part of improving homology modeling using the SCWRL side-chain prediction program and other programs, we have developed the next generation of a backbone-dependent rotamer library. Our central goal in releasing a new rotamer library was to provide smooth estimates of the rotamer probabilities as a function of the backbone dihedrals phi and psi. Previous versions of the library were quite bumpy due to a lack of smoothing on the phi,psi grid, especially in regions of the Ramachandran map that are not densely populated. As these probabilities (or rather logs thereof) are often used as energy functions in programs that allow backbone flexibility (e.g., Rosetta), it is important that the density estimates have well-behaved derivatives with respect to the backbone dihedrals. 
We present a purely non-parametric approach to generate a smooth, differentiable backbone-dependent rotamer library for all standard protein residue types. We applied our electron-density based method (Shapovalov and Dunbrack, 2007) to remove unreliable conformations in a set of 3000 protein chains. We used a recently developed program, siocs, to flip Asn, Gln, and His residues according to hydrogen bonding patterns within each crystal. To derive probabilities of the different rotamers for each side-chain type, we used adaptive kernel density estimation to calculate p(phi,psi | r) for each rotamer r and Bayes’ rule to generate p(r | phi,psi). The figure shows the probability of serine g+ (+60) rotamer vs. phi and psi. A kernel is a Gaussian-like function used to spread out single data points; the amount of smoothing depends on the width of the kernel function, with greater smoothing generated by wider kernels. An adaptive kernel varies the width of the kernel depending on the local density of points, such that there is greater smoothing in sparse regions of the data set. The adaptive kernel thus reduces noise from outliers in the data. We use both data-adaptive kernels that vary from data point to data point depending on a local pilot density estimate, as well as query-point adaptive kernels, where all data points have the same kernel width but the kernel width varies with the density near the query point (in this case, phi,psi). The rotamer probabilities use data-adaptive kernels. To calculate mean angles and their variances as a function of phi and psi, we used an adaptive kernel regression, using query-adaptive kernels. The kernel used in all of these calculations is a von Mises function, which is the analogue of the normal distribution for periodic variables (i.e., angles). 
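The combination of a von Mises kernel density estimate with Bayes' rule can be sketched in a few lines. This is a simplified illustration, not the library's procedure: it uses a single fixed concentration parameter kappa instead of adaptive kernel widths, and the sample points and rotamer labels are invented for the example.

```python
import math

def von_mises_kernel(dphi, dpsi, kappa):
    """Unnormalised product of two von Mises kernels (arguments in radians).

    The constant shift of -2 keeps the value in (0, 1]; with one shared
    kappa the normalisation cancels when Bayes' rule is applied below.
    """
    return math.exp(kappa * (math.cos(dphi) + math.cos(dpsi) - 2.0))

def rotamer_probabilities(samples, phi, psi, kappa=20.0):
    """Estimate p(r | phi, psi) from (phi_deg, psi_deg, rotamer) samples.

    p(phi,psi | r) is a kernel average over the N_r samples of rotamer r
    and the prior is p(r) = N_r / N, so their product is proportional to
    the kernel sum over the samples of rotamer r.
    """
    weights = {}
    for s_phi, s_psi, rot in samples:
        w = von_mises_kernel(math.radians(phi - s_phi),
                             math.radians(psi - s_psi), kappa)
        weights[rot] = weights.get(rot, 0.0) + w
    total = sum(weights.values())
    return {rot: w / total for rot, w in weights.items()}

# Hypothetical observations: (phi, psi, rotamer label).
toy_samples = [
    (-60.0, -40.0, "g+"), (-65.0, -45.0, "g+"), (-58.0, -38.0, "g+"),
    (-62.0, -42.0, "t"),
    (-120.0, 130.0, "t"), (-125.0, 135.0, "t"), (-118.0, 128.0, "g-"),
]
helix = rotamer_probabilities(toy_samples, -60.0, -40.0)
wrapped = rotamer_probabilities(toy_samples, 300.0, -40.0)
print(helix)
```

Because the kernel depends on the cosine of the angular difference, a query at phi = 300 degrees gives the same answer as one at phi = -60 degrees, which is the periodicity property that motivates the von Mises kernel. A data-adaptive version would assign each sample its own kappa from a pilot density estimate and would then need the proper normalisation of each kernel.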
One particularly difficult statistical problem is backbone-dependent density estimates for non-rotameric dihedral degrees of freedom, such as chi2 of Asp and Asn and chi3 of Glu and Gln. This is effectively a regression of a density estimate; that is, we provide p(chi2 | phi,psi,r1) for Asp and Asn, where r1 is the chi1 rotamer. We have solved this problem with a novel combination of query-adaptive kernels for the backbone angles on the one hand and data-adaptive kernels for the side-chain dihedrals on the other. Effectively, in sparse or empty regions of the Ramachandran map p(chi2 | phi,psi,r1) looks like a backbone-independent estimate. In populated parts of the Ramachandran map, the local data contribute strongly and the estimate varies significantly from the backbone-independent estimate. We have also applied these methods to the aromatic chi2 dihedral angles. The new rotamer libraries improve structure prediction in SCWRL, in particular for the aromatic amino acids and the non-rotameric degrees of freedom. We believe the new rotamer library will be an important step toward improving protein structure prediction and modeling with SCWRL, and especially for programs that rely on continuous and differentiable energy functions such as Rosetta. 65: A TWO-STAGE RESIDUE-RESIDUE CONTACT PREDICTOR Actual residue-residue contacts comprise only about 3% of possible pairs in a sequence. We use one neural network to provide an enriched set of pairs for training a second neural network. While the results show higher accuracy, the gains appear to come from pairs with low separation. Protein structure prediction continues to be a challenge despite the gains from model builders such as Modeller, Rosetta, and undertaker. The best predictions today depend on templates, known protein structures whose sequence is sufficiently similar in part or in whole to the target sequence. These templates provide important constraints in building accurate models. 
However there are target sequences which have no templates. For these targets, there is a need for other constraints especially in terms of the super-secondary structure, that aspect of structure between the secondary structure and the actual tertiary structure. Knowing that two residues are in close proximity to each other when the two residues are actually far apart in the sequence is part of such information so accurate predictions of these residue-residue contacts may help in building models for such difficult targets. We developed a predictor for CASP7 using local structure predictions along with paired statistics including a novel correlation statistic. Its predictions were assessed as the best for CASP7. Since then we have developed a new neural network for CASP8 that employs more inputs. While developing the new predictor, we discovered that by just using local structure predictions, we could build a good predictor. Until then we had assumed that the paired statistics were the main source of predictability and the local structure predictions added only a small amount. With this new result, we revisited an issue that arises in developing a contact predictor: the sparseness of positive examples. Actual contacts are only about 3% of the total possible pairs of residues. Originally we dealt with the sparseness by reducing the number of negative examples to get a better balance of negative and positive examples while training. The new two-stage predictor resolves this issue by providing a second stage neural network with an enriched set of predictions where the positive examples comprise about 10% of the total examples; no balancing is required. The first stage uses only local structure predictions and regularized amino acid composition as inputs. We limit resulting predictions to 10*sequence_length. Then paired statistics are calculated for this restricted set of pairs. 
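The enrichment step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the score dictionary stands in for first-stage network outputs, and the names enrich_pairs and positive_fraction are hypothetical.

```python
def enrich_pairs(scores, seq_len, per_residue=10):
    # Keep only the per_residue * seq_len highest-scoring residue pairs.
    # `scores` maps (i, j) pairs to first-stage scores (stand-ins for network outputs).
    limit = per_residue * seq_len
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [pair for pair, _ in ranked[:limit]]

def positive_fraction(pairs, contacts):
    # Fraction of the given pairs that are true contacts.
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if p in contacts) / len(pairs)

# Hypothetical toy example; per_residue is lowered so the cut-off is visible.
scores = {(0, 5): 0.9, (1, 7): 0.1, (2, 9): 0.8, (3, 8): 0.2}
contacts = {(0, 5), (2, 9)}
kept = enrich_pairs(scores, seq_len=1, per_residue=2)
before = positive_fraction(list(scores), contacts)
after = positive_fraction(kept, contacts)
```

In this toy case the fraction of positives rises from 0.5 to 1.0 after filtering; in the abstract the corresponding change is from about 3% to about 10%.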
These statistics, along with the log(rank) of the first stage predictions and matching local structure predictions, provide the inputs for training a second neural network. The result is a gain of about 3% in overall accuracy. This gain comes at a cost in the quality of the predictions. To explain what is meant by quality, we present a new measure called weighted accuracy. Given two residues, we define separation as the absolute difference between the indices of the two residues. Residue pairs with low separation have a significantly higher probability of contact than pairs with high separation (> 50). CASP assessors deal with this issue by dividing predictions into three categories:

George Shackelford & Kevin Karplus (UCSC, USA)

Surface, as defined by H. Edelsbrunner at the end of the 90’s [8]. This particular surface definition is referred to as the Molecular Skin Surface (MSS) when applied to molecular assemblies. It is basically comparable to the MS representation but provides additional smoothness and decomposability. Used in MetaMol, the MSS provides further advantages as compared to other molecular surface definitions. First, the surface does not self-intersect and is everywhere tangent continuous [8]. Moreover, the MSS is composed of quadrics (whereas the MS comprises torus slices), which simplifies calculations. Another advantage of the MSS is that the nature of the surface depends on a single parameter, the shrink factor. Adjusting it allows the surface to evolve in real time from a van der Waals surface to the MSS (close to the MS) and finally to a simplified surface that can be very useful for coarse-grained protein docking.
Some works already triangulate MSS [9-11], but this has two drawbacks: (1) the surface topology must be preserved, making the algorithm complicated (and slow); and (2) as previously, at a certain level of zoom, the triangles generate display artifacts. To overcome these limitations, we use a ray-casting method. This has two advantages: (1) the raycasting algorithm directly uses the MSS equation and does not need to resample it; and (2) pixelaccurate images are generated. In order to speed up the calculations significantly, we implemented GPU ray-casting, which has already been used to represent simple molecular models as “CPK” or “Balls and Sticks” [12, 13] but, to our knowledge, our program is the first one that achieves GPU ray-casting to the more complicated case of MSS. As a result MetaMol is able to display MSS interactively and with the best rendering quality. Furthermore, MetaMol provides sophisticated lighting effects that enhance the displaying quality and it is possible to visualize the MSS deformations with smooth transitions, which may be used for displaying, in real time, molecular surface movement during Molecular Dynamics simulations. See also: http://www.loria.fr/~chavent/metamol.htm REFERENCES: 1. Connolly, M. L., molecular surface triangulation, Journal of Applied Crystallography, 1985, 18, pp. 499-505. 2. Varshney, A., Brooks, F. P. J. and Wright, W. V., Linearly Scalable Computation of Smooth Molecular, Invited submission,, IEEE Computer Graphics and Applications, 1994. 3. Sanner, M. F., Olson, A. J. and Spehner, J. C., Reduced surface: an efficient way to compute molecular surfaces., Biopolymers, 1996, 38, pp. 305-320. 4. Can, T., Chen, C.-I. and Wang, Y.-F., Efficient molecular surface generation using levelset methods., J Mol Graph Model, 2006, 25, pp. 442-454. 5. Bates, P. W., Wei, G. W. and Zhao, S., Minimal molecular surfaces and their applications, J Comput Chem, 2007. 6. Vorobjev, Y. N. 
and Hermans, J., SIMS: computation of a smooth invariant molecular surface, Biophys J, 1997, 73, pp. 722-32.

those with separation of 6 or greater, 12 or greater, and 24 or greater. Accuracy is measured in all three categories for assessment. Correct predictions with large separation can be considered more valuable than those with small separation. The new measure, weighted accuracy, takes the impact of separation into account. Weighted accuracy for a prediction is C(i,j)/p(|i-j|), where C(i,j) is 1 if residues i and j are in contact and 0 otherwise, and p(|i-j|) is the probability that residues with that separation are in contact. This provides a higher value for correct predictions when the separation is large. Using this measure we show that the two-stage predictor may provide better accuracy but lower weighted accuracy. This can be explained if we assume that the two-stage predictor makes more correct predictions but that these predictions have smaller separations than those of a single-stage predictor.

LAPTOP PRESENTATION ABSTRACTS

2: METAMOL: HIGH QUALITY VISUALIZATION OF MOLECULAR SKIN SURFACE

Matthieu Chavent (France CNRS), Bruno Levy (France INRIA), Bernard Maigret (France CNRS).

MetaMol is a new program that generates high-quality 3D representations in interactive time. In contrast with existing software that discretizes the surface with triangles or grids, our program is based on a GPU-accelerated ray-casting algorithm that directly uses the piecewise-defined algebraic equation of the Molecular Skin Surface. The Solvent Excluded Surface (SES) or Molecular Surface (MS) is the most widely used surface for representing macromolecular assemblies. Starting from the pioneering algorithm proposed by Connolly [1], numerous works have been devoted to the improvement of the related methods, in order to provide fast and robust generation of high quality pictures of MS. In 1994, Varshney et al. developed a program that was easily parallelizable [2].
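For the weighted accuracy measure C(i,j)/p(|i-j|) defined earlier in this passage for contact predictions, a minimal Python sketch is given here. It is an illustration only: the table p_contact of contact probabilities by separation is hypothetical, and averaging the per-prediction values over a prediction list is an assumption, since the abstract defines the measure for a single prediction.

```python
def weighted_accuracy(predictions, contacts, p_contact):
    # Mean of C(i,j) / p(|i-j|) over the predicted pairs.
    # `contacts` holds true contacts as (low, high) index pairs.
    # `p_contact` maps a separation |i-j| to a contact probability (hypothetical table).
    if not predictions:
        return 0.0
    total = 0.0
    for i, j in predictions:
        in_contact = 1.0 if (min(i, j), max(i, j)) in contacts else 0.0
        total += in_contact / p_contact[abs(i - j)]
    return total / len(predictions)

# Hypothetical toy values: short-range contacts are ten times more likely a priori.
contacts = {(1, 7), (1, 50)}
p_contact = {6: 0.2, 49: 0.02}
short_range = weighted_accuracy([(1, 7)], contacts, p_contact)
long_range = weighted_accuracy([(1, 50)], contacts, p_contact)
```

With these toy numbers a correct long-range prediction scores ten times higher than a correct short-range one, which is the behaviour the measure is meant to capture.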
The year after, Sanner proposed a method based on reduced surfaces [3] to visualize large molecules (more than 10,000 atoms). More recently, using a grid associated with a marching front algorithm, Can et al. proposed a level-set-based method [4] while Bates et al. defined a Minimal Molecular Surface [5]. All these approaches are efficient but suffer from precision problems: in the Varshney and Sanner algorithms, the molecular surface is triangulated while, for the Can and Bates algorithms, the surface is represented as the union of cubes, so that a level of zoom is always found where triangles or cubes appear. Furthermore, the generated MS is not exempt from singularities due to self-intersections [6, 7, 3, 5]. With MetaMol, we tackle the problems of precision and singularities by using the Skin

on average from the native conformation) were refined to within 2A. This happened despite the lack of hydrogen-bonding or any orientation-dependent term in the DFIRE energy function. However, success deteriorates significantly as the initial structures of the helical/strand segments deviate more from their respective native conformations. Here, we propose a “dipolar” DFIRE (dDFIRE) energy function based on the orientation angles involved in dipole-dipole interactions. This is done by treating each polar atom as a dipole. The orientation of the dipole is defined by the bond vectors that connect the polar atom with other heavy atoms. The dDFIRE energy function is then extracted from protein structures based on the distance between two atoms and the three angles involved in dipole-dipole interactions. This approach takes into account the hydrogen-bonding interaction via the physical dipole-dipole interaction.
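The geometric variables mentioned above (one distance and three angles per pair of polar atoms) can be illustrated with the short Python sketch below. The abstract does not give the exact definitions, so two assumptions are made and should be read as such: the dipole direction is taken as the sum of unit vectors pointing from each bonded heavy atom to the polar atom, and the three angles are taken as dipole 1 versus the interatomic vector, dipole 2 versus the reversed interatomic vector, and dipole 1 versus dipole 2. All function names are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def unit(v):
    n = norm(v)  # a zero vector (perfectly symmetric bonding) would need special handling
    return tuple(x / n for x in v)

def angle(u, v):
    # Angle in radians between two vectors, clamped against rounding error.
    c = sum(x * y for x, y in zip(unit(u), unit(v)))
    return math.acos(max(-1.0, min(1.0, c)))

def dipole_direction(polar, bonded_heavy):
    # Assumed definition: sum of unit vectors from each bonded heavy atom to the polar atom.
    d = (0.0, 0.0, 0.0)
    for h in bonded_heavy:
        d = tuple(a + b for a, b in zip(d, unit(sub(polar, h))))
    return d

def pair_geometry(p1, bonded1, p2, bonded2):
    # Returns (distance, theta1, theta2, theta12) for two polar atoms treated as dipoles.
    d1 = dipole_direction(p1, bonded1)
    d2 = dipole_direction(p2, bonded2)
    r12 = sub(p2, p1)
    return norm(r12), angle(d1, r12), angle(d2, sub(p1, p2)), angle(d1, d2)

# Two dipoles pointing at each other along the x axis, 3 units apart.
geometry = pair_geometry((0.0, 0.0, 0.0), [(-1.0, 0.0, 0.0)],
                         (3.0, 0.0, 0.0), [(4.0, 0.0, 0.0)])
```

A statistical potential would then be built by histogramming these four variables over known structures; that step is not shown here.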
More importantly, it provides a consistent treatment for the possible orientation-dependent interactions between polar and nonpolar atoms and between polar atoms that are nonhydrogen-bonded. Moreover, an integrated treatment of distance and angle dependence produces a parameter-free statistical energy function. Existing orientation-dependent knowledge-based energy functions are limited to either hydrogen bonding or geometry-based orientation in coarsegrained models. This all-atom statistical energy function was employed to fold protein terminal regions with secondary-structures. Folding completely unfolded terminal segments is challenging because it requires the restoration of both mainchain and sidechain conformations. Moreover, compared to internal regions, terminal regions are more flexible and often exposed. This test is necessary because native-like fragment structures are difficult to produce by contemporary energy functions and the prevailing structureprediction techniques is to mix and/or match known native structures either in whole (template-based modeling) or in part (fragment assembly). The ab initio refolding of a completely unfolded segment also has its own biological significance, as protein folding assisted by a prefolded domain (pro-domain) is common in many proteins. It is important to learn which orientation-dependent interaction is responsible for the success of the dDFIRE energy function in segment refolding. Three orientation components of the dDFIRE energy function are employed to refold five terminal regions (two single helix segments, one two-helix bundle, one strand, and one beta hairpin of five separate proteins). The three dDFIRE components are the orientation dependence involving hydrogen-bonded polar atoms only, polar-nonpolar atoms only, and polar atoms only. (Note that the last one includes hydrogen-bonded atoms.) 
The three individual orientation components can restore single helix in 2guzb and 1i2ta as accurately as the full dDFIRE energy function. However, they produced slightly less accurate structures (1.5A to 1.7A in global rmsd) than the dDFIRE (0.8A) for the terminal two-helix bundle in 1o82a. While every single orientation component can refold helix-containing segments with reasonable accuracy, they cannot restore the structures of strandcontaining segments well. The orientation components 7. Geng, W., Yu, S. and Wei, G., Treatment of charge singularities in implicit solvent models, J Chem Phys, 2007, 127, pp. 114106. 8. Edelsbrunner, H., Deformable Smooth Surface Design., Discrete & Computational Geometry, 1999, 21, pp. 87-115. 9. Kruithof, N. and Vegter, G., Approximation by skin surfaces, SM '03: Proceedings of the eighth ACM symposium on Solid modeling and applications, ACM Press, New York, NY, USA, 2003, pp. 86-95. 10. Cheng, H.-L. and Shi, X., Guaranteed Quality Triangulation of Molecular Skin Surfaces, Proceedings of the conference on Visualization '04, IEEE Computer Society, 2004, pp. 481-488. 11. Cheng, H.-L. and Shi, X., Quality Mesh Generation for Molecular Skin Surfaces Using Restricted Union of Balls, vis, 2005, 00, pp. 51-57. 12. Toledo, R. and Levy, B., Extending the graphic pipeline with new GPU-accelerated primitives, Tech report, 2004. 13. Sigg, C., Weyrich, T., Botsch, M. and Gross, M., GPUBased Ray-Casting of Quadratic Surfaces, Symposium on Point-Based Graphics, 2006, pp. 56-65. 3: SPECIFIC INTERACTIONS FOR AB INITIO FOLDING OF PROTEINS Yuedong Yang (Indiana University School of Informatics, USA), Yaoqi Zhou (Indiana University, USA) Proteins interact via orientation-dependent interactions between aminoacid residues. We propose a statistical potential that consistently treats the orientation and distance dependence of interactions between all polar atoms and between polar and nonpolar atoms (in addition to hydrogen-bonded atoms). 
The potential is tested by ab initio refolding of protein terminal regions. The best-known specific interaction in proteins is hydrogen bonding. Little attention, however, has been paid to the orientation dependence of interactions between polar atoms that are not hydrogen bonded, despite evidence of their role in the formation of alpha helices and beta sheets. Moreover, the possible orientation dependence of interactions between polar and nonpolar atoms is ignored even though the hydrophobic effect is caused by the reorientation of water molecules near a hydrophobic surface. Recently, Zhu, Xie, and Honig compared several statistical energy functions and physics-based energy functions and analyzed their respective abilities to refold partially unfolded helices or strands. They found that among the energy functions tested, the most effective one is an all-atom, distance-dependent, pairwise statistical energy function based on a Distance-scaled, Finite-Ideal gas Reference (DFIRE) state. In one test, more than 80% of conformations from 104 segments of 81 proteins (4A rmsd,

between hydrogen-bonded atoms and between polar and nonpolar atoms failed to fold the C-terminal beta strand of 1fltx within 2A global rmsd. Additionally, none of the three individual components can refold the C-terminal beta-hairpin of 2extb (in a dimeric form) to within 2A in global rmsd. The results reported here underline the importance of orientation-dependent interactions, in addition to the well-studied hydrogen-bonding interaction, for the successful restoration of specific structural segments of proteins. The absence of orientation dependence leads to short helices or coils rather than secondary-structure elements. These results confirm the importance of orientation preference between non-hydrogen-bonded atoms in the formation of secondary structures of proteins.
Additionally, the results call for the attention to the relative orientation between polar and nonpolar atoms. So far, orientation-dependent interactions other than hydrogen bonding have been ignored in constructing all-atom knowledge-based or empirical energy functions. This explains why contemporary energy functions are difficult to produce native-like fragment structures. Thus, this work has significant implications for developing more specific energy function for folding and molecular recognition. Fig. 1 compares five native structures to structures whose fragments in five different structural elements are refolded by DFIRE and by dDFIRE, respectively. There is a clear difference between the structures refolded by dDFIRE and those by DFIRE. For example, dDFIRE can refold the Cterminal single helix segment of 1i2ta very well, while DFIRE breaks it into two segments. A similar phenomenon is observed for 2guzb and 1u84. In addition, unlike dDFIRE, DFIRE fails to yield two helices in 1r690 (as shown) and 1o82a. Moreover, for single strand, DFIRE produces either a strand that is coil-like (2ptl, as shown, 1fltx, 1csp) or even a helix (2extb, as shown) while dDFIRE produces strands that have a more normal structural pattern. There is a marked difference in the quality of the secondary-structure segments refolded by the two energy functions as indicated by the local rmsd values. REFERENCES: [1] H. Zhou and Y. Zhou, Distance-scaled, finite ideal-gas reference state improves structure-derived potentials of mean force for structure selection and stability prediction, Protein Science, 11, 2714--2726 (2002). [2] Y. Yang and Y. Zhou, Specific interactions for ab initio folding of protein terminal regions with secondary structures., Proteins 71, Published Online: Feb 7 2008 11:56AM DOI: 10.1002/prot.21968. Fig. 
1 The segment structures (in red) refolded by DFIRE (left) and dDFIRE (center) for five proteins as labeled are compared to their respective native conformations (right). The fixed portion of each protein is colored in light green.

4: STRUCTURE DETERMINATION OF PROTEIN-PROTEIN COMPLEXES USING PARAMETERS OF THEIR OVERALL ROTATIONAL DYNAMICS AVAILABLE VIA NMR RELAXATION DATA

Yaroslav Ryabov & Charles Schwieters (NIH, USA).

Structure and dynamics of proteins have obvious mutual relationships. For example, the size and shape of a protein determine rates of its overall rotational tumbling. We present a computational approach which utilizes parameters of this tumbling encoded in experimental NMR relaxation data for structure determination of single domain proteins and protein-protein complexes. This work presents a further step in utilization of Nuclear Magnetic Resonance (NMR) data in the Xplor-NIH structure determination package [1]. Namely, we report a new algorithm which uses dynamic information encoded in NMR relaxation times for protein structure determination. The first attempt to use the ratio of longitudinal (T1) and transverse (T2) NMR relaxation times for refinement of NMR protein structures was undertaken by Tjandra et al. [2]. The authors of that work used the residue-specific dependence of NMR relaxation times on the angle between an NH bond and the longer principal axis of the protein diffusion tensor. While that approach provided some improvement of protein structure quality, it suffered from a number of limitations, primarily because the diffusion tensor anisotropy was estimated from the dispersion of the T1/T2 ratios. Recently, a fast method for direct calculation of protein diffusion tensor components has become available [3]. This method employs an ellipsoidal approximation to the protein’s shape. In particular, it considers the solvent accessible surface of a hydrated protein structure mapped by the SURF tessellation method [4].
The original algorithm [3] treats the vertices of the tessellated mesh with Principal Component Analysis (PCA) [5] to obtain the dimensions of an equivalent ellipsoid and then applies Perrin’s equations [6] to calculate the components of the protein diffusion tensor for the equivalent ellipsoid shell approximating the protein’s shape. This method is about 500 times faster than conventional bead algorithms [7] with comparable accuracy. In other words, it is fast and accurate enough to be incorporated in an integrative structure calculation procedure. Preliminary work [8] used this method in combination with a Simplex search algorithm to position domains in two-domain protein complexes when only the translational degrees of freedom were searched. In that case, orientations of the protein domains were derived from other considerations. The present contribution reports the implementation of this fast method for calculating components of the protein rotational diffusion tensor within the Xplor-NIH structure determination package. To achieve this goal the method was modified to calculate gradients of the chi-square function with respect to the positions of all protein atoms. Thus, it is

[2] Tjandra, N., Garrett, D.S., Gronenborn, A.M., Bax, A., Clore, G.M. (1997) Nature Struct. Bio. 4, 443 - 449.
[3] Ryabov Y.E., Geraghty C., Varshney A., Fushman D. (2006) JACS, 128, 15432 - 15444.
[4] Varshney, A., Brooks, F. P., Jr., Wright, W. V. (1994) IEEE Comput. Graphics Appl. 14, 19-25.
[5] Jolliffe, I. T. Principal Component Analysis; Springer-Verlag: New York, 1986.
[6] Perrin, F. (1934) J. Phys. Radium, 5, 497-511; Perrin, F. (1936) J. Phys. Radium, 7, 1-11.
[7] Garcia de la Torre, J.; Huertas, M. L.; Carrasco, B. (2000) J. Magn. Reson. B147, 138-146.
[8] Ryabov Y., Fushman D., (2007) JACS 129, 7894-7902.
[9] Tjandra, N., Wingfield, P., Stahl, S., Bax A. (1996) J. Biomol.
NMR 8, 273-284.
[10] Yamazaki, T., Hinck, A.P., Wang, Y.X., Nicholson, L.K., Torchia, D.A., Wingfield, P., Stahl, S.J., Kaufman, J.D., Chang, C.H., Domaille, P.J., Lam, P.Y.S. (1996) Protein Sci. 5, 495 - 506.

now able to explore all degrees of freedom used in protein structure elucidation. Components of the protein diffusion tensor, derived from experimentally measured T1 and T2 NMR relaxation times, are used as structural restraints for standard Xplor-NIH simulated annealing protocols. Therefore, this method essentially restrains the overall shape of a protein molecule, making it conceptually different from the previous approach [2]. The ability to restrain the overall shape of a protein may help to resolve the problem of poor packing density of NMR protein structures. Here, however, we utilize these overall shape restraints for positioning and orienting domains in multi-domain protein complexes. This is especially important for NMR based methods of structure elucidation when a small number of inter-domain NOE distance restraints make positioning the subunits difficult. Figure 1 illustrates application of this method for the particular case of the HIV-1 protease homodimer. In this case, centers of gravity of both domains, which were treated as rigid bodies, were initially superimposed. Then, the position and orientation of one domain was randomized within a cube of 60 x 60 x 60 angstrom dimensions to prepare an ensemble of 512 different random initial conditions to start the standard Xplor-NIH structure refinement protocol. We used previously measured [9] components of HIV-1 protease’s rotational diffusion tensor as the only experimental structural restraints. The refined structures were sorted in ascending order with respect to the values of chi-square differences between measured components of diffusion tensor and those calculated for refined structures.
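The ranking step described above can be sketched in a few lines of Python. This is only an illustration of sorting candidate structures by a chi-square term; the exact form of the chi-square function used in Xplor-NIH is not given in the abstract, so the uncertainty-weighted sum of squared differences, the function names, and the toy tensor components are assumptions.

```python
def chi_square(measured, calculated, sigma):
    # Assumed form: sum of squared differences, each scaled by an experimental uncertainty.
    return sum(((m - c) / s) ** 2 for m, c, s in zip(measured, calculated, sigma))

def rank_structures(measured, sigma, candidates):
    # `candidates` maps a structure name to its calculated diffusion tensor components.
    # Returns (chi_square, name) tuples in ascending order of chi-square.
    scored = [(chi_square(measured, calc, sigma), name)
              for name, calc in candidates.items()]
    return sorted(scored)

# Hypothetical tensor components in arbitrary units.
measured = (1.0, 1.2, 2.0)
sigma = (0.1, 0.1, 0.1)
candidates = {
    "model_a": (1.0, 1.2, 2.0),
    "model_b": (1.1, 1.2, 2.0),
    "model_c": (1.5, 1.0, 2.5),
}
ranking = rank_structures(measured, sigma, candidates)
```

The best-fitting structures appear at the head of the list, so a sharp jump in chi-square between neighbouring entries can serve as a cut-off for accepting a cluster of solutions.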
The first 5 percent of the sorted list contain structures which are practically equivalent to each other (less than 0.001 angstrom of Root Mean Square Deviation (rmsd) for alpha-carbon positions) and very close to the reference HIV-1 structure [10] (pdb code 1BVG) derived from NOE inter-domain restraints (0.3 angstrom of alpha-carbon rmsd). The average value of the chi-square function terms corresponding to the diffusion tensor restraints for these 30 structures with lowest rmsd is about 5 times lower than the same chi-square term for the structure immediately following them in the list. This makes these structures reliably recognizable among others and demonstrates the ability of the method to obtain the correct arrangement of a protein’s subunits in cases where a reference structure is not available a priori. The algorithm is rather fast, requiring about 200 seconds for refinement of a single structure on a single core of a standard desktop. Parallelization makes calculation time on an 8 core cluster less than 4 hours.

ACKNOWLEDGEMENTS

The authors acknowledge stimulating discussions with Drs. G.M. Clore and J.J. Kuszewski. Y.R. is supported by the National Research Council Associateship Program (Award # 0710430). C.D.S. is supported by the Intramural Research Program of CIT, NIH.

[1] C.D. Schwieters, J.J. Kuszewski, N. Tjandra and G.M. Clore, (2003) 160, 66-74; C.D. Schwieters, J.J. Kuszewski, and G.M. Clore, (2006) Progr. NMR Spectroscopy 48, 47-62.

5: FOCUSED DOCKING: A COMPUTATIONAL APPROACH TO IMPROVE SMALL-MOLECULE DOCKING INTO PROTEIN STRUCTURES

Dario Ghersi & Roberto Sanchez (Mount Sinai School of Medicine, USA).

A computational protocol that combines protein binding site detection and docking is presented here and evaluated on a set of 77 cases. The comparison with blind docking shows that our protocol achieves a higher rate of binding site detection, produces more accurate results and requires significantly less computational time.
The goal of protein-ligand docking is to predict the position and orientation of a ligand (usually a small molecule) when it is bound to a receptor protein. When the binding site to be targeted by the small molecule is known, selecting a reasonably small docking box around this site facilitates docking by focusing sampling of the translational, rotational and torsional degrees of freedom of the ligand. This is the usual situation in lead optimization, where predicting the binding mode or pose of the ligand is needed for rational design of improved potency and selectivity, and in hit identification through virtual screening, where the goal is the discovery of ligands, out of a large library, that are likely to bind a protein target. The reverse question is more difficult to address. Given a ligand, is it possible to discover its most likely target? In this “reverse virtual screening” case, since the binding site is not known, it becomes necessary to explore the entire protein surface by docking, a procedure that has been named “blind docking”. Since the space where blind docking takes place must accommodate the entire protein and is therefore much larger than a regular docking

of protein-ligand docking for those cases where the binding site is unknown. This approach is especially relevant in applications such as reverse virtual screening and structure-based functional annotation of proteins, since it requires only the knowledge of the three-dimensional structure of the target proteins and can allow for the discovery of unexpected interactions that may occur at previously unidentified binding sites.
7: SUPPORT VECTOR MACHINE-BASED TRANSMEMBRANE PROTEIN TOPOLOGY PREDICTION Tim Nugent and David Jones (University College London, UK) box, the number of energy evaluations carried out by the docking program is usually set up to a proportionally higher value, with a corresponding increase in the running time. This shortcoming has been partially overcome by using known protein binding sites as targets for reverse-virtual screening. While this approach enables faster reverse virtual screening, it limits the universe of candidate targets to those proteins that have clearly identified binding sites and only to those sites within the protein. Ideally, a reverse virtual screening approach would require only the knowledge of the three-dimensional structure of the candidate target proteins and would allow for the discovery of unexpected interactions that may occur at previously unidentified binding sites. The use of predicted binding sites is evaluated here as a tool to focus the docking of small molecule ligands into protein structures, simulating cases where the real binding sites are unknown. The resulting approach consists of few independent docking jobs carried out on small boxes that are centered on the predicted binding sites, as opposed to one larger blind docking job that samples the complete protein structure. The assumption behind the use of a few predicted binding sites is that only a handful of possible smallmolecule binding sites exist on protein structures, and that these sites can be reliably identified. Therefore, it is not necessary to explore a very large number of sites and a gain in speed is possible without a significant loss in coverage. Tested on a set of 77 protein-ligand complexes and compared with blind docking this approach is shown to: (1) identify the correct binding site more frequently than blind docking; (2) produce more accurate docking poses for the ligand; (3) require less computational time. 
Additionally, the results show that very few real binding sites are missed in spite of focusing on 3 predicted binding sites per protein. We also illustrate the performance of the binding site detection algorithm on comparative models, simulating a scenario where an experimental structure of a protein is not available. We present another approach for biasing the docking toward the predicted binding sites that is alternative to running independent docking experiments with smaller grids centered on the predicted site. The approach consists in masking the regions that are outside a sphere of 11.0Å radius centered at the predicted sites by assigning to them extremely high energy. We tested this alternative protocol by masking all but the first three predicted sites, with a resulting overall accuracy that is still much lower than with any of the focused docking protocols. As a control, we repeated the same experiment by masking one site at a time, and we yielded results that were indistinguishable from the ones produced by the focused docking protocol. Therefore, we conclude that the simultaneous presence of the hot-spots regions is suboptimal for achieving a thorough exploration of the correct binding site, and that there is an advantage in exploring individually the predicted sites one at a time. Overall the results indicate that, by improving the sampling in regions that are likely to correspond to binding sites, the focused docking approach increases accuracy and efficiency Due to the paucity of alphahelical transmembrane protein crystal structures, in silico approaches are essential for structural analysis. We present a support vector machine-based topology predictor that integrates both signal peptide and re-entrant helix prediction, and present the results of application to a number of complete genomes. 
Alpha-helical transmembrane (TM) proteins constitute roughly 30% of a typical genome and are involved in a wide variety of important biological processes including cell signaling, transport of membrane-impermeable molecules and cell recognition. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely underrepresented in structural databases, making up only 1% of known structures in the PDB. Given the biological and pharmacological importance of TM proteins, an understanding of their topology - the total number of TM helices, their boundaries and in/out orientation relative to the membrane - is essential for structural and functional analysis, and directing further experimental work. In the absence of structural data, bioinformatic strategies thus turn to sequence-based prediction methods. Early prediction methods, based on the physicochemical principle of a sliding window of hydrophobicity combined with the 'positive-inside' rule [1], have been superseded by machine learning approaches which prevail due to their probabilistic orientation. These include Hidden Markov models (HMMs), neural networks (NNs) and more recently, support vector machines (SVMs). While NNs and HMMs are capable of producing multiple outputs, SVMs are binary classifiers; therefore, multiple SVMs must be employed to classify the numerous residue preferences before being combined into a probabilistic framework. While multiclass ranking SVMs do exist, they are generally considered unreliable, since in many cases no single mathematical function exists to separate all classes of data from one another.

Biol. 1992 May 20;225(2):487-94.
[2] Jannick Dyrløv Bendtsen, Henrik Nielsen, Gunnar von Heijne and Søren Brunak. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol., 340:783-795, 2004.
[3] Emanuelsson O, Brunak S, von Heijne G, Nielsen H. Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc. 2007;2(4):953-71. [4] Käll L, Krogh A, Sonnhammer EL. A combined transmembrane topology and signal peptide prediction method. J Mol Biol. 2004 May 14;338(5):1027-36. [5] Käll L, Krogh A, Sonnhammer EL. An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics. 2005 Jun;21 Suppl 1:i251-7. [6] Viklund H, Granseth E, Elofsson A. Structural classification and prediction of reentrant regions in alpha-helical transmembrane proteins: application to complete genomes. J Mol Biol. 2006 Aug 18;361(3):591-603. [7] Jones DT. Improving the accuracy of transmembrane protein topology prediction using evolutionary information. Bioinformatics. 2007 Mar 1;23(5):538-44.
10: DOMAIN REARRANGEMENT AND DOMAIN CREATION IN THE EVOLUTION OF NEW PROTEINS
However, SVMs are capable of learning complex relationships among the amino acids within a given window with which they are trained, particularly when provided with evolutionary information, and are also more resilient to the problem of over-training compared to other machine learning methods. One problem faced by modern topology predictors is the discrimination between TM helices and other features composed largely of hydrophobic residues. These include targeting motifs such as signal peptides and signal anchors, amphipathic helices, and re-entrant helices – membrane penetrating helices that enter and exit the membrane on the same side, common in many ion channel families. The high similarity between such features and the hydrophobic profile of a TM helix frequently leads to crossover between the different types of predictions. Should these elements be predicted as TM helices, the ensuing topology prediction is likely to be disrupted. 
Some prediction methods, such as SignalP [2] and TargetP [3], are effective in identifying signal peptides, and may be used as a pre-filter prior to analysis using a TM topology predictor. Phobius [4] uses an HMM to successfully address the problem of signal peptides in TM protein topology prediction, while PolyPhobius [5] further increases accuracy by including homology information. Other methods such as TOP-MOD [6] have attempted to incorporate identification of re-entrant regions into a TM topology predictor, but there is significant room for improvement. A key element when constructing any prediction method is the use of a high quality data set for both training and validation purposes. Extracting a training set from available databases requires a number of critical decisions to be made. As an example in the case of TM proteins, searches of databases such as the PDB using the keyword 'transmembrane' will return both genomically encoded TM proteins and TM proteins that are not native, such as venoms and bacterial colicins. Furthermore, orientation and helix boundary errors in databases are not infrequent and add an element of noise. While such noise is often well tolerated by machine learning methods, the problem is more significant in smaller data sets. We thus present a new TM topology predictor trained and benchmarked with full cross-validation on a novel data set of 131 sequences, with topologies derived solely from crystal structures. The method uses evolutionary information and four SVMs, combining the outputs using a dynamic programming algorithm, to return a list of predicted topologies ranked by overall likelihood, and incorporates signal peptide and re-entrant helix prediction. Overall, the method predicted the correct topology and location of TM helices for 88% of the test set, an improvement of 11% on our previous NN-based method [7]. 
An additional SVM has been trained to discriminate between TM and globular proteins with a low false positive rate of 0.4%, making this method highly suitable for whole genome analysis.
REFERENCES [1] von Heijne G. Membrane Protein Structure Prediction, Hydrophobicity Analysis and the Positive-inside Rule. Mol
Diana Ekman, Åsa K. Björklund and Arne Elofsson (Biochemistry and Biophysics, Stockholm University, Sweden) The metazoan lineage has unusually high rates of domain architecture creation, and the architectures contain relatively large numbers of domains. The introduction of domains amenable to exon shuffling seems to explain some of the increase. Further, most domain families are ancient and de novo domain creation is a rare event. Duplication, domain rearrangement and de novo creation are some of the mechanisms involved in evolution of new proteins. Domain rearrangements are interesting since new functionalities can be created through a single event, frequently insertion of a domain at either terminus. We have found that the rates of domain architecture creation are similar in different phylogenetic groups and have remained roughly constant throughout evolution. An exception is the metazoan lineage where the rates are clearly elevated, and their domain architectures also contain relatively large numbers of domains. The introduction of a set of domains amenable to exon shuffling seems to have been an important
ligands into account during the comparison. Here we propose a novel methodology that combines the advantages of both approaches, without being impaired by the abovementioned limitations. Comparison method Our method identifies structural motifs in protein binding pockets in a ligand-dependent manner and does not require the proteins, or their bound ligands, to be similar. 
Therefore the algorithm enables the pair-wise comparison of structures containing different ligands that interact with different protein folds. The procedure comprises two steps. We first identify local similarities shared by the input structures [3]. Subsequently we analyse the coordinates of the bound ligands, looking for the largest common fragment that has a similar position in space. To this end the ligands belonging to each binding pocket are superimposed according to the same roto-translation used for the protein residues. The algorithm then enumerates all the possible combinations of fragments (subset of connected atoms) using a recursive depth-first procedure and identifies the one with the highest score. Such score is defined as a trade-off between the size of the common fragment and the fact that it should be present in the highest possible number of bound ligands. Benchmark We devised a benchmark to test the assumption that the presence of specific protein residues implies a discernible preference for certain ligand fragments. To this end we identified a set of non-redundant pairwise structural similarities between binding pockets belonging to proteins of different folds. Each similarity implies a roto-translation of the binding sites and, accordingly, of the bound ligands. Using the LIGANDSCOUT software we identified a total of 3161 pharmacophoric groups in the 210 ligands considered and identified 450 pairs of pharmacophores which are superimposed by the above-mentioned roto-translation. 364 pairs involved pharmacophores with compatible chemical roles while 86 involved non compatible pairs. The result of this analysis shows that the fraction of compatible pairs of pharmacophores tends to decrease as the distance from the protein residues increases. Moreover it is interesting to note that the fraction of compatible pairs drops when the distance exceeds the threshold value for the formation of hydrogen bonds. 
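The recursive depth-first enumeration of connected fragments described in the comparison method can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the ligand is given as an adjacency map of atom indices, that `matched_in[atom]` lists the ligands having an equivalent atom at that position after superposition, and that the score is simply fragment size times the number of supporting ligands. The names `connected_fragments`, `best_common_fragment` and `matched_in`, and the scoring function itself, are hypothetical.

```python
def connected_fragments(adjacency):
    """Yield every connected subset of atoms (as a frozenset) of a small
    molecular graph, growing fragments one bonded atom at a time (depth-first)."""
    seen = set()

    def extend(fragment):
        if fragment in seen:          # the same subset can be reached by several paths
            return
        seen.add(fragment)
        yield fragment
        neighbours = {n for atom in fragment for n in adjacency[atom]} - fragment
        for n in sorted(neighbours):
            yield from extend(fragment | {n})

    for atom in sorted(adjacency):
        yield from extend(frozenset([atom]))


def best_common_fragment(adjacency, matched_in):
    """Return (fragment, score) maximising size x number of supporting ligands.
    The product is an assumed stand-in for the size/coverage trade-off."""
    best, best_score = None, -1
    for fragment in connected_fragments(adjacency):
        # Ligands that contain every atom of this fragment.
        support = set.intersection(*(matched_in[a] for a in fragment))
        score = len(fragment) * len(support)
        if score > best_score:
            best, best_score = fragment, score
    return best, best_score


# Toy example: a three-atom chain 0-1-2; atom 2 is matched in only one ligand.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
matched_in = {0: {"A", "B", "C"}, 1: {"A", "B", "C"}, 2: {"A"}}
fragment, score = best_common_fragment(adjacency, matched_in)
```

Exhaustive enumeration grows quickly with ligand size, so a sketch like this is only practical for small molecular graphs.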
This benchmark shows that the correspondences we identify have a functional significance, because the matching pharmacophores are those that effectively interact with the residues involved in the superimposition. Identification of structural motifs in the PDB To demonstrate the usefulness of our approach we performed a comparative analysis of all the binding pockets in the PDB structures classified in SCOP (6.5 × 10^4 binding sites). We focused on binding sites belonging to proteins of different folds, involved in binding similar as well as different ligands. We used sequence identity together with the SCOP and CATH classifications to discard all the matches involving homologous structures. This large-scale comparison resulted in the identification of 657 protein structural motifs associated with specific ligand fragments, despite a high variability in the structure of the ligand as a whole. In addition to that a lesser number (570) of motifs
factor behind this explosion of new domain architectures in metazoa. In contrast to the domain architectures, most known domain families existed already in the last eukaryotic common ancestor. However, many proteins have incomplete domain coverage, and may hence contain domains created de novo. To investigate this, we have studied the amount of innovation in Saccharomyces cerevisiae and found that at least two thirds of the residues are aligned to homologs in non-fungi, whereas only a minor fraction is specific to S. cerevisiae. In addition, the species-specific regions are often short, disordered sequences located at either the N- or C-terminus.
11: A NOVEL METHOD FOR THE DETECTION OF PROTEIN LOCAL STRUCTURAL MOTIFS BINDING SPECIFIC LIGAND FRAGMENTS
Gabriele Ausiello1, Pier Federico Gherardini1, Elena Gatti1, Ottaviano Incani2, & Manuela Helmer-Citterich3 (1 Dept. of Biology, University of Rome Tor Vergata, Italy, 2 Dept. 
of Chemistry, University of Rome Tor Vergata, Italy, 3 Centro di Bioinformatica Molecolare University of Rome Tor Vergata, Italy) We present an algorithm for the comparison of protein binding pockets that identifies small structural motifs binding specific ligand fragments. We applied this method to all proteins of known structure, identifying 657 motifs. Some of these are present in as many as 60 folds. Introduction In order to understand the rules underpinning the interaction of proteins with small ligands, a wealth of information can be derived from the comparative analysis of binding pockets of known structure. Such analysis can be performed starting from either the ligand or the protein. In the former case a number of sites that bind a molecule of interest are selected. The ligand moieties are subsequently superimposed in order to identify similarities and differences in the neighbouring protein atoms [1]. This approach necessarily limits the analysis to pockets that bind ligands with an overall similar structure, since these are used as a reference to guide the superimposition of the binding pockets. Conversely, if the analysis starts from the protein side, one can mine the PDB, looking for binding motifs which are present in non-homologous proteins. Since such motifs have evolved independently multiple times, they should represent particularly favourable modes of interaction between protein residues and ligand moieties [2]. However this approach does not systematically take the structure of the bound
with known structure, structural alignments of the fragments were made using STRUCTAL (5). If the score for a certain fragment alignment was above a cut-off, the fragments were considered as an internal repeat. In case of overlapping hits in the same protein, the fragment pair with the highest score was considered the correct one. 
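The rule for accepting internal repeats (score above a cut-off, highest-scoring pair wins among overlapping hits) can be sketched as a greedy filter. This is only an illustration under stated assumptions: the `Hit` record, the representation of fragments as inclusive ranges of TM helix indices, the definition of overlap and the toy scores and cut-off are all hypothetical and do not come from the study.

```python
from typing import NamedTuple, Tuple

class Hit(NamedTuple):
    first: Tuple[int, int]    # helix range of the first fragment, inclusive
    second: Tuple[int, int]   # helix range of the second fragment, inclusive
    score: float              # alignment score of the fragment pair

def ranges_overlap(a, b):
    """True if two inclusive integer ranges share at least one helix."""
    return a[0] <= b[1] and b[0] <= a[1]

def hits_overlap(h1, h2):
    """Two hits overlap if any fragment of one shares helices with any
    fragment of the other (an assumed definition of 'overlapping')."""
    return any(ranges_overlap(f1, f2)
               for f1 in (h1.first, h1.second)
               for f2 in (h2.first, h2.second))

def select_repeats(hits, cutoff):
    """Keep hits scoring above `cutoff`; among overlapping hits keep only
    the highest-scoring one (greedy, best score first)."""
    kept = []
    candidates = sorted((h for h in hits if h.score > cutoff),
                        key=lambda h: h.score, reverse=True)
    for hit in candidates:
        if not any(hits_overlap(hit, other) for other in kept):
            kept.append(hit)
    return kept

# Toy data for one 12-helix protein (scores and cut-off are made up).
hits = [
    Hit((1, 6), (7, 12), 10.0),   # 6+6 repeat
    Hit((1, 3), (4, 6), 6.0),     # 3+3 repeat inside the first half
    Hit((2, 5), (8, 11), 2.0),    # below the cut-off
]
repeats = select_repeats(hits, cutoff=5.0)
```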
In the human genome the most common duplication seems to be 6+6 TMH. Of all 12 TMH proteins, two thirds contain such a duplication. For smaller proteins (6-8 TMH) an internal repeat was found in less than 20%. This indicates that longer proteins are more likely to contain internal repeats. However, it might also be that longer repeats are easier to detect. The same trend can be seen in yeast (S. cerevisiae), E. coli and in the test set of proteins with known structure: the more transmembrane helices in the chains, the larger the fraction of chains that have an internal duplication. Internal repeats seem to be a bit more common in E. coli than in yeast, especially in 9-11 TMH chains, and in the test set a higher fraction of the small proteins (6 TMH) contains internal repeats. Since no duplication events are found in the large G-protein coupled receptor family, the fraction of 7 TMH proteins containing an internal repeat is lower in human than in the other species, lowering the overall fraction of duplicated genes from 34% to 22%. One of the most evident examples of an internal repeat with known structure is an acriflavin resistance protein with 12 membrane spanning segments (1oye). Although the sequence identity between the two halves is less than 20%, they are structurally very similar to each other (STRUCTAL score 1834). In search of homologues of different lengths the sequence was blasted against a database of almost 600 bacterial genomes. After three rounds of PSI-BLAST (6) we found frequency peaks both at homologues with 12 and 6 TM segments. The majority of the 6 TMH hits proved to be two peptide chains involved in the Sec-complex, but some examples were found where the proteins were a part of the Acr family. A phylogenetic tree containing 6 and 12 TMH homologues from ten genomes was made in order to find out how the different proteins are related. The longer proteins were split in two parts. 
The tree clearly separates Sec proteins from proteins in the Acr family and almost all N-terminal parts are clearly separated from the C-terminal parts. The homologues with 6 TMH which are not Sec proteins group together with the N-terminal or the C-terminal parts. They are always found in pairs in the genomes, and if one homologue is located in the N-terminal clade of the tree, the other is found in the C-terminal clade. This suggests an evolutionary model where a 6 TMH protein is duplicated and then the two copies fuse together to form a larger protein, while in the cases with two short homologues the fusion has not taken place (yet).
REFERENCES: 1. Abramson J, Smirnova I, Kasho V, Verner G, Kaback HR, Iwata S: Structure and mechanism of the lactose permease of Escherichia coli. Science 2003, 301:610-615 2. Murakami S, Nakashima R, Yamashita E, Yamaguchi A: Crystal structure of bacterial multidrug efflux transporter AcrB. Nature 2002, 419:587-593
were identified on the structure but no common fragment was found in the bound molecules. Overall these figures suggest that the presence of specific residues in a binding pocket confers a discernible preference in the identity and position of a number of ligand atoms. Each motif is found in at least 2 folds. 104 motifs map to three folds, 90 to 4-10 folds and a few exceptional cases involve from 17 up to 63 different folds. Such fragments are usually small compared to the whole ligand. The 330 motifs associated with two or more ligand atoms have been manually analysed in order to categorise the types of fragments recognised. The results of this classification show that the vast majority of motifs are involved in the binding of anions (phosphate and carboxyl groups, 215 motifs) and nucleotides (35). Other highly represented motifs bind metals (14) and heme groups (10). Overall these figures confirm that our methodology is sound. 
Most of our results comprise motifs that are already known in the literature as having widespread occurrence in fold space. More importantly, this analysis highlights that no other motifs occur with comparable frequency in the PDB. A more in-depth analysis of nucleotide binding sites showed for the first time their modular nature. The same portion of the nucleotide can be recognised by different motifs and these are variously combined in proteins with different folds.
REFERENCES 1 Nobeli I et al. 2001; Nucleic Acids Res. 29(21):4294-309 2 Kinoshita K et al. 1999; Protein Eng. 12(1):11–14 3 Ausiello G et al. 2008; BMC Bioinformatics. 9 Suppl 2:S2
12: HOW COMMON ARE INTERNAL REPEATS IN ALPHA-HELICAL MEMBRANE PROTEINS?
Jenny Falk & Arne Elofsson (Biochemistry and Biophysics, Stockholm University, Sweden) In a genomic scan for membrane proteins that contain internal repeats we found that 40% of all TM-proteins with more than 6 predicted TM-regions contain a detectable duplication, in agreement with structural data. In addition, only in a few examples was it possible to detect the existence of the parts as separate genes. After structure determination it has been noticed that some membrane proteins have an internal symmetry (1,2). The most likely explanation is that there has been a duplication of genes, where either the whole gene or a part of it is duplicated and added to the already existing protein-encoding gene. This would result in a longer protein which traverses the membrane more times than the original protein. This would provide a possibility to circumvent the constraints the hydrophobic environment imposes on the evolution of membrane proteins. In this study a search for such membrane proteins has been performed, using sequence-based as well as structure-based methods. In the sequence-based search transmembrane helices (TMH) were predicted by PRODIV-TMHMM (3) and protein profiles were made. 
The profiles were then split into fragments according to the predicted topology, and the fragments were aligned to each other using the profile-profile alignment method SHRIMP (4). In addition, for pairs
electrostatic expansions up to polynomial order L=30 on a 2 Gb personal computer. As expected, 3D correlations are found to be considerably faster than the former 1D Hex correlations but, surprisingly, 5D correlations are often slower than 3D correlations. Nonetheless, we show that 5D correlations will be advantageous when calculating multi-term knowledge-based interaction potentials. When docking the 84 complexes of the Protein Docking Benchmark, blind 3D shape-based correlations take around 30 minutes on a contemporary personal computer and find acceptable solutions within the top 20 in 6 cases. However, applying a simple angular constraint to focus the calculation around the receptor binding site and adding electrostatics to the correlation produces acceptable solutions within the top 20 in 28 cases. Further constraining the search to the ligand binding site gives up to 48 solutions within the top 20, with calculation times of just a few minutes per complex. Hence the approach described provides a practical and fast tool for rigid body protein-protein docking, especially when some prior knowledge about one or both binding sites is available. Hex is available under a no-cost academic licence from: http://www.csd.abdn.ac.uk/hex/
15: AN APPROACH TO TRANSMEMBRANE PROTEIN STRUCTURE PREDICTION WITH STOCHASTIC DYNAMICAL SYSTEMS USING BACKWARD SMOOTHING
3. Viklund H, Elofsson A: Best alpha-helical transmembrane protein topology predictions are achieved using hidden Markov models and evolutionary information. Protein Sci 2004, 13:1908-1917 4. 
Bernsel A, Viklund H, Elofsson A: Remote homology detection of integral membrane proteins using conserved sequence features. Proteins, in press 5. Gerstein M, Levitt M: Comprehensive assessment of automatic structural alignment against a manual standard, the SCOP classification of proteins. Protein Sci 1998, 7:445-456 6. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402
13: ACCELERATING AND FOCUSING PROTEIN-PROTEIN DOCKING CORRELATIONS USING MULTI-DIMENSIONAL ROTATIONAL FFT GENERATING FUNCTIONS
Dave Ritchie (University of Aberdeen, UK), Dima Kozakov (Boston University, USA), Sandor Vajda (Boston University, USA). We have recently developed an analytic 6D polar Fourier correlation expression for rigid-body FFT protein-protein docking. This approach can rapidly calculate 3D and 5D rotational correlations, and is well suited for focusing and accelerating the calculation around known or hypothesised binding sites when such information is available. Predicting how proteins interact at the molecular level is a computationally intensive task. Many protein docking algorithms begin by using FFT correlation techniques to find putative rigid body docking orientations. Most such approaches use 3D Cartesian grids and are therefore limited to computing 3D translational correlations. However, translational FFTs can speed up the calculation in only three of the six rigid body degrees of freedom, and they cannot easily incorporate prior knowledge about a complex to focus and hence further accelerate the calculation. Furthermore, several groups have developed multi-term interaction potentials and others use multi-copy approaches to simulate protein flexibility, both of which add to the computational cost of FFT-based docking algorithms. Hence there is a need to develop more powerful and more versatile FFT docking techniques. 
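The principle behind the conventional translational FFT correlations mentioned above can be shown with a one-dimensional toy. This sketch is not the polar Fourier method of the abstract and is not how Hex works; it only illustrates the correlation theorem that Cartesian-grid docking codes apply in 3D: the scores for all shifts are obtained from one forward transform of each profile and one inverse transform. The function names and the toy profiles are hypothetical, and the radix-2 FFT assumes the length is a power of two.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def circular_correlation_fft(receptor, ligand):
    """score[s] = sum_i receptor[(i + s) % n] * ligand[i], for all shifts s at once."""
    n = len(receptor)
    if n != len(ligand) or n & (n - 1):
        raise ValueError("profiles must have equal power-of-two length")
    fr = fft([complex(v) for v in receptor])
    fl = fft([complex(v) for v in ligand])
    # Correlation theorem: multiply one spectrum by the conjugate of the other.
    product = [a * b.conjugate() for a, b in zip(fr, fl)]
    return [v.real / n for v in fft(product, inverse=True)]

def circular_correlation_direct(receptor, ligand):
    """Reference O(n^2) evaluation of the same scores."""
    n = len(receptor)
    return [sum(receptor[(i + s) % n] * ligand[i] for i in range(n))
            for s in range(n)]

# Toy profiles: the two-cell "ligand" fits the two-cell "receptor" feature at shift 3.
receptor = [0, 0, 0, 1, 1, 0, 0, 0]
ligand = [1, 1, 0, 0, 0, 0, 0, 0]
scores = circular_correlation_fft(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
```

In this toy only the shift is searched; rotations would have to be sampled separately, which is the limitation of translational FFTs that the abstract points out.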
We have recently developed a closed-form 6D spherical polar Fourier correlation expression from which arbitrary multidimensional multi-property multi-resolution FFT correlations may be generated. The approach has been implemented in the Hex docking program to calculate 3D and 5D rotational correlations of protein shape and
Takashi Kaburagi and Takashi Matsumoto (Waseda University, Japan) A backward smoothing approach utilizing a stochastic dynamical system with two-dimensional vector trajectories is used to predict transmembrane protein structures. Given a sequence of amino acids with unknown structures, the presence/absence of each residue in a transmembrane region is predicted by the backward smoothing process. In this study, we have developed a machine learning algorithm for prediction of the structures of a single class of protein: transmembrane proteins. Transmembrane proteins have long been considered to be a critical factor in understanding biological functions such as
Since the model structure employs a left-to-right topology, the proposed scheme is expected to yield better results than our previous prediction scheme. From a biological point of view, this scheme can be explained as follows: The proposed scheme is designed to predict the annotation of the target protein from the N-terminus. Since the translation process in protein biosynthesis starts from the N-terminus, this scheme is expected to yield good results. Moreover, as an amino acid chain grows in the translation process, amino acids are added at the carboxyl end of the chain. The growing chain immediately tends to fold into a particular conformation. 
Because of this tendency, when predicting the state at position “t,” it seems natural to use the sequence information from position “t” to the end of the sequence and not from the beginning of the sequence to position “t-1.” In this study, we have also presented the performances of five other prediction methods applied to our proposed model. The five methods that we used are as follows: (i) The proposed method (backward smoothing) (ii) Our previous method (iii) The standard Viterbi method (iv) A standard smoothing method (v) A “forward” smoothing method It should be noted that in order to perform experiments accurately, it is necessary to use appropriate data sets. Currently, one of the most difficult problems in protein structure prediction in general and in transmembrane protein structure prediction in particular, is the difficulty in obtaining appropriate data sets for experiments. We selected two publicly available data sets collected for benchmarking various algorithms. In this study, we also discuss the accuracy of the predictions of our algorithm, which predicts whether particular amino acids are present in a transmembrane region. The evaluation methods that we have followed are the same as those used in Moller et al. For the purpose of comparison, we have tested the performance of TMHMM, HMMTOP, and SOSUI, which are three well-known transmembrane structure prediction tools, using the same test data sets. We observed that the proposed backward smoothing method has a prediction accuracy of 92.3%. It should be noted that the five prediction methods applied to our proposed model used the same model and the same parameters. It was observed that among the five methods, the proposed backward smoothing method had the best performance. Precise comparisons with other prediction algorithms are difficult because the sequences used for their training may have been different. 
However, for comparison purposes, we tested the same test sequences against three well-known tools for predicting transmembrane helices. In this experiment, the “backward’’ smoothing scheme had a better performance compared to the other three well-known prediction tools. In this study, we have proposed a novel scheme (the backward smoothing scheme) to predict transmembrane regions utilizing a finite-state stochastic dynamical system. cell signaling, ion transport, and intercellular communication. Because of the biological and pharmaceutical importance, the identification of transmembrane helices in membrane proteins is a priority. Although promising methods in X-ray crystallography and nuclear magnetic resonance (NMR) have begun to open avenues to the determination of these structures, the number of known three-dimensional structures remains small. Therefore, reliable algorithms to predict transmembrane protein structures would be very useful. There are two basic methods for predicting protein structures. The first method is to use algorithms that are based solely on the construction principles of proteins associated with the physicochemical properties of amino acids. The algorithms do not involve any sort of training. In this method, windowed averages of physicochemical quantities are calculated. There are several successful examples of algorithms of this type. The second method is to collect data sets of known structures, to extract the features from the data set, and to apply machine learning algorithms to make predictions. Some improvements have been made in using this second method, but further development of algorithms is necessary to improve the reliability of predictions. We used a novel machine learning algorithm to predict protein structures, and we also evaluated the reliability of the predictions. A machine learning algorithm assumes that there are models and associated parameters behind the available data sets. 
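The first class of methods described above, windowed averages of a physicochemical quantity, can be sketched in a few lines. The example below uses the Kyte-Doolittle hydropathy scale as an assumed choice of quantity; the window length of 19 and the threshold of 1.6 are values commonly quoted for that scale and are assumptions here, not parameters taken from this abstract. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def windowed_hydropathy(sequence, window=19):
    """Average hydropathy in a window centred on each residue.
    Windows are truncated at the sequence ends."""
    half = window // 2
    profile = []
    for i in range(len(sequence)):
        lo = max(0, i - half)
        hi = min(len(sequence), i + half + 1)
        values = [KYTE_DOOLITTLE[aa] for aa in sequence[lo:hi]]
        profile.append(sum(values) / len(values))
    return profile

def candidate_tm_positions(sequence, window=19, threshold=1.6):
    """Indices whose windowed average exceeds the (assumed) threshold."""
    profile = windowed_hydropathy(sequence, window)
    return [i for i, value in enumerate(profile) if value > threshold]

# Toy sequence: a 21-residue leucine stretch flanked by aspartates.
toy = "DDD" + "L" * 21 + "DDD"
profile = windowed_hydropathy(toy)
candidates = candidate_tm_positions(toy)
```

No training is involved, which is the defining property of this class of methods; the machine learning approaches discussed next replace the fixed scale and threshold with parameters estimated from known structures.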
Generally, the degree of success of a machine learning algorithm depends on two factors: how well the model structure characterizes the target molecule from which the data was taken and how well the learning algorithm incorporates the available data sets. The major features of the proposed algorithm are as follows: (i) The hidden Markov model (HMM) topology used in the proposed scheme consists of open-loop connections of submodels. The submodels are of two types: a transmembrane region submodel and a loop region submodel. A stochastic dynamical system runs concurrently with the inner state dynamics so that once a dynamical system leaves a particular state, it does not return to that state. In contrast, some of the previous HMM-based algorithms were designed to have five or seven states that could be revisited. (ii) The proposed scheme utilizes a finite state stochastic dynamical system with two-dimensional vector trajectories consisting of a hydropathy index and formal charge. For a given sequence of amino acids of unknown structure, the presence of each residue in a transmembrane region was predicted by a backward smoothing process. The proposed prediction scheme based on backward smoothing emphasizes the dependency more on the previous state than the previous observation data once the previous state has been estimated.
[2] Laurie ATR, Jackson RM, Bioinformatics, 21, 1908-1916 (2005)
21: COIL WITHIN THE MEMBRANE: STRUCTURAL ANOMALY FOR FUNCTIONAL NEEDS
Anni Kauko, Kristoffer Illergård & Arne Elofsson (Center for Biomembrane Research, Stockholm University, Sweden) To model and understand membrane proteins, an understanding of their different substructures is crucial. We have for the first time analysed coil within the membrane core. These polar segments make up 7% of core residues and are buried or located in polar cavities. 
They are conserved and functional, particularly in transporters, where coil can introduce the polarity and flexibility required for function. Introduction Membrane proteins perform many essential functions. They make up 25% of the proteome and over half of drug targets. However, due to experimental difficulties only 1% of structures in the PDB are membrane proteins. Therefore it is of special importance to predict different aspects of membrane protein structure. For this purpose it is crucial to understand the properties of membrane protein substructures. Traditionally helical membrane proteins have been seen as simple regular alpha-helix bundles. However, recent structures have shown various substructures that differ from this view, including reentrant regions, interfacial helices and marginally hydrophobic helices. Coil is polar because of its backbone polar groups, and the coil that makes up 7% of the membrane core has been ignored so far. Here we present the first systematic analysis of coil in the membrane core. Structural properties Random coil segments within the deep membrane core can be divided into three separate classes (reentrants, breaks and kinks). Reentrants are coil segments present in reentrant regions that enter and exit the membrane from the same side. Breaks are longer coil segments that clearly interrupt the regular structure of a transmembrane helix. Kinks represent small distortions of the helix geometry. Coil has a higher preference for polar and charged sidechains in the membrane core. This probably reflects the preference of inherently polar coil for polar environments. Moreover, glycine and proline are more common in coil, regardless of whether the coil is located in the membrane or in a globular region. Further, major coil segments are typically buried or located in polar cavities, which prevents the polar backbone groups from being exposed to the membrane. All these preferences are more pronounced in reentrants and breaks than in kinks. 
The proposed prediction scheme emphasizes the dependency more on the previous state than the previous observation data once the previous state has been estimated. Since the model structure employs a left-to-right topology, the proposed scheme is expected to yield better results than the previous scheme. The experimental results suggest that the backward smoothing scheme has a reasonably good performance.
20: i-SITE: ENERGY-BASED METHOD FOR PREDICTING LIGAND-BINDING SITES ON PROTEIN STRUCTURES
Mizuki Morita, Tohru Terada, Shugo Nakamura & Kentaro Shimizu (The University of Tokyo, Japan). We have developed a method for predicting the ligand-binding sites on protein structures. It is a simple energy-based method and delivers high performance with apo protein structures. We could also improve the accuracy of prediction by re-ranking with amino acid conservation scores. Identifying ligand-binding sites on protein surfaces is the first step of drug design and of the improvement of protein functions. We have developed a simple energy-based method for predicting the locations of ligand-binding sites on protein 3D structures [1]. A notable feature of our method is that it is successful when applied to ligand-unbound (apo) as well as bound (holo) forms of the proteins. In our approach, the protein surface is coated with multiple layers of probes to calculate the van der Waals interaction energies between these probes and the protein. Energetically favorable probes are then clustered and the resulting clusters are ranked based on their total interaction energies. Our method was applied to two datasets from Laurie & Jackson: 134 proteins were used to tune the parameters, and the best parameters were used to examine a set of 35 holo/apo protein pairs. The results are compared to those of two alternative methods: Q-SiteFinder [2] and PocketFinder [2]. 
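The probe-based procedure (compute probe-protein van der Waals energies, keep favourable probes, cluster them, rank clusters by total energy) can be sketched as below. This is a simplified illustration under stated assumptions, not the published method: a single 12-6 Lennard-Jones term with one hypothetical `epsilon` and `sigma` for all atoms, a hypothetical energy cut-off, single-linkage clustering with a hypothetical linking distance, and probe positions supplied by the caller rather than generated in layers on the surface.

```python
import math

def lj_energy(probe, atoms, epsilon=0.2, sigma=3.5):
    """12-6 Lennard-Jones energy between one probe and all protein atoms.
    epsilon and sigma are illustrative values, identical for every atom."""
    energy = 0.0
    for atom in atoms:
        r = math.dist(probe, atom)
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy

def rank_sites(probes, atoms, energy_cutoff=-0.15, link_distance=2.0):
    """Return clusters of favourable probes as (total_energy, probe_list),
    sorted from most to least favourable total energy."""
    scored = [(p, lj_energy(p, atoms)) for p in probes]
    favourable = [(p, e) for p, e in scored if e <= energy_cutoff]

    clusters = []
    unassigned = list(range(len(favourable)))
    while unassigned:
        stack = [unassigned.pop()]
        members = []
        while stack:                      # single-linkage growth of one cluster
            i = stack.pop()
            members.append(i)
            near = [j for j in unassigned
                    if math.dist(favourable[i][0], favourable[j][0]) <= link_distance]
            for j in near:
                unassigned.remove(j)
            stack.extend(near)
        total = sum(favourable[i][1] for i in members)
        clusters.append((total, [favourable[i][0] for i in members]))
    return sorted(clusters, key=lambda c: c[0])

# Toy "protein": four atoms forming a shallow pocket plus one isolated atom.
atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (5.0, 5.0, 0.0),
         (30.0, 0.0, 0.0)]
probes = [(2.5, 2.5, 2.0), (2.5, 2.5, 3.0),   # above the pocket
          (30.0, 0.0, 4.0),                   # above the isolated atom
          (15.0, 15.0, 15.0)]                 # far from everything
ranked = rank_sites(probes, atoms)
```

In this toy the two probes above the four-atom pocket interact with several atoms at once, so their cluster ranks ahead of the single probe near the isolated atom.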
In 80% (28/35) of the test cases, the ligand-binding site was successfully predicted on a ligand-bound structure, and in 77% (27/35) it was successfully predicted on an unbound structure. This represents a significant improvement over conventional methods in detecting ligand-binding sites on uncharacterized proteins. We could also improve the accuracy of prediction by re-ranking candidate ligand-binding sites with amino acid conservation scores.
REFERENCES: [1] Morita M, Nakamura S, Shimizu K, Proteins, in press

Conserved and Functional
Coil within the membrane core is conserved, especially in the case of breaks and reentrants. While substitution rates in globular regions are equal for helices and coil, in the membrane coil has significantly lower substitution rates. Further, within the core, indel frequencies are equally low for helices and coil. Even if accessibility is taken into account, coil has lower substitution rates than helix in the membrane core. Thus membrane coil is not more conserved because of its lower accessibility, but most likely because of its functional importance. A functional role was found for ~60% of all reentrants, ~30% of all breaks and a small fraction of kinks. All functional coils (except one enzyme) are from channels and transporters. In the classical case of potassium channels and aquaporins, an exposed coil backbone from a reentrant region forms a rigid selectivity filter. The second, and perhaps most typical, case of coil functionality is a coil segment that both forms a flexible binding site for the transported substance and is involved in the large conformational changes required for transport, as exemplified by calcium ATPase. Finally, a coil segment can form a flexible hinge required for gating, as suggested for the ATP/ADP carrier.
Taken together, coil can provide the polarity and flexibility required for transport. Thus coil within the membrane represents a structural anomaly that serves the needs of function.

23: MOLECULAR DYNAMICS SIMULATIONS USING AN ALPHA-CARBON-ONLY KNOWLEDGE-BASED FORCE FIELD FOR PROTEIN STRUCTURE PREDICTION
Patrick Buck and Chris Bystroff (Rensselaer Polytechnic Institute, USA)

dependencies of I-sites motifs in the protein structure database. The existence of strong sequence-structure correlations in the database should enable us to develop and test folding potentials for template-free protein structure prediction. Knowledge-based potentials based on the statistical occurrences of structural properties in native proteins have proven to be the most successful approach to protein structure prediction [8, 9]. Many of these approaches attempt to discretize conformational space by fragment-insertion Monte Carlo [10] or chain build-up [11, 12] in folding simulations. Although quite successful, these methods may ignore intermediates along the folding pathway by strictly optimizing the global fold energy [13]. Modeling folding pathways is essential to the understanding of folding kinetics and kinetic stability. Non-native intermediates along the folding pathways may be required in the folding of some knotted proteins [14]. In this study we use a reduced protein representation for folding simulations in an alpha-carbon-only knowledge-based potential. Peptide residues are treated as beads on a string, with the backbone atoms of each residue lumped into a single interaction center located at the position of its alpha-carbon. Such a model can significantly reduce the cost of computing trajectories to visualize long time-scale dynamics such as protein folding [15]. Recently, there has been increased interest in simulating the physical folding process using reduced protein representations and coarse-grained potentials [16-18].
To our knowledge, no alpha-carbon-only statistical potential for folding by molecular dynamics simulations has been tried before, as very few of the published statistical potentials act solely on alpha-carbons [19-21], hinting at the difficulty of calculating a realistic energy using a reduced model. Our new knowledge-based energy function includes potentials for virtual bond opening and dihedral angles, hydrogen bond donor and acceptor probability fields, and a local-structure-dependent pairwise potential. All are position-specific and conditional on their unique amino acid sequences, which differs somewhat from the more common residue-specific potentials. As a first test of our energy function, we folded, via Brownian dynamics, 27 short protein segments of length 12 that were predicted to be autonomous folding units. This set of protein segments represented a variety of secondary structures including helix N-caps, beta-hairpins, and a mix of loops and turns. Most of the native structural preference was accounted for by local virtual bond angle preferences and predicted contacts, but the inclusion of a hydrogen bond probability field significantly increased the observed frequency of the native state. Additionally, the confidence of our predictions was assessed by determining how much of the simulation was spent in the largest cluster compared to all other clusters. If more than half of the structures submitted for clustering fell into the largest cluster, the protein segment was regarded as having a structural preference. Of the 27 protein segments predicted, 19 were found to have trajectories where more than half of the total simulation could be clustered into one conformation. Additionally, 15

Folding initiation sites are short protein segments that fold independently of their three-dimensional context. We used Brownian dynamics to fold peptides represented as alpha-carbon positions only, guided by a knowledge-based force field.
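The confidence criterion described above for the folding simulations (a segment has a structural preference when more than half of the clustered structures fall into the largest cluster) can be written as a small sketch. The function name, the label representation and the `threshold` parameter are assumptions for illustration; the clustering itself is taken as already done.

```python
from collections import Counter

def largest_cluster_fraction(cluster_labels, threshold=0.5):
    """Given one cluster label per trajectory snapshot, return the largest
    cluster's label, the fraction of snapshots it holds, and whether that
    fraction exceeds the threshold (strictly more than half by default)."""
    counts = Counter(cluster_labels)
    label, size = counts.most_common(1)[0]
    fraction = size / len(cluster_labels)
    return label, fraction, fraction > threshold

# Hypothetical labels for ten snapshots of one peptide trajectory.
label, fraction, preferred = largest_cluster_fraction(["A"] * 6 + ["B"] * 3 + ["C"])
```

A segment whose snapshots split evenly between two clusters would not pass, because the comparison is strict.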
The simulations are extremely fast and accurately predict the structures of folding initiation site peptides. Peptide sequences less than 20 residues in length can have strong structural preferences that are independent of nonlocal interactions, as shown by NMR [1-3] and simulation studies [4, 5]. It is thought that sequence patterns for these peptides in the context of a parent sequence become structured early in folding and exist in their native conformation in unfolded proteins. Some of these short sequence patterns, 3-19 residues in length, have been captured in a structural motif library called I-sites (initiation sites) [6] and the associated hidden Markov model HMMSTR [7], which describes the adjacencies and

of the 19 protein segments found to have structural preferences had cluster centers that were at least 2.5 Å away from native. Scatter plots of energy versus RMSD to native for all 27 protein segments showed, in many cases, that the densest sampling was both closer to native and lower in energy. For many of the best predicted structures, a correlation was observed between the energy and the distance from native. A strong correlation suggests a funnel-like landscape, which is advantageous for minimizing frustration during simulations. Initial studies, using an alpha-carbon-only potential based on backbone virtual angles, van der Waals repulsion and contact energy terms, but without an orientation-dependent hydrogen bond term, showed errors in strand alignment of beta-sheets and other irregularities that could be traced to poor hydrogen bonding geometry. For example, three beta-strands, all rich in non-polar side-chains, would arrange themselves in a collagen-like triple helix rather than in a sheet. The backbone angles permitted this, and the contact energies favored this structure, since it increases the total number of strand-strand contacts.
To capture the directional nature of hydrogen bonds, three-dimensional energy fields were created by binning the positions of alpha-carbons whose backbone nitrogen donates a hydrogen around the acceptor alpha-carbon position, after transforming the donor coordinates into the acceptor alpha-carbon frame of reference (Figure 1). Leaving out the hydrogen bond energy term did not significantly change the RMSD of the largest cluster center compared to native. However, the larger size of native-like (< 2.0 Å) clusters affirmed that the hydrogen bond energy significantly stabilized the native structure relative to all other structures when compared to simulations without hydrogen bond energy (p=0.001). Folding a diverse set of short protein segments is a prerequisite to developing a hierarchical folding model for larger proteins. It has long been thought that proteins fold locally first, forming secondary structures which are then able to nucleate tertiary contacts [22]. Recently, it has been reported that this type of folding mechanism could be implemented in a procedure called zipping and assembly [23]. The success of folding a diverse set of protein segments in the current study indicates that the zipping and assembly technique could also be implemented with our energy function. In finding the native conformation in simulations of several different structural motifs that are expected to fold autonomously, the force field passes a test for generality and provides hope that our simplified model could be used to fold larger sequences.

24: ENVIRONMENT-SPECIFIC SUBSTITUTION TABLES FOR MEMBRANE PROTEINS
Sebastian Kelm (University of Oxford, UK), Jiye Shi (UCB group, USA) & Charlotte M. Deane (University of Oxford, UK)

membrane proteins differ from soluble proteins on the molecular evolution level. Integral membrane proteins constitute about 30% of all known proteins and play key functional roles in cells.
Their function is essential for a wide range of physiological events, such as neurotransmitter transport, cell recognition and nerve impulse transmission. Membrane proteins are therefore important potential drug targets. Despite their importance, experimentally determined structures are rare as they are both difficult and expensive to attain. The value of modelling the structures of these proteins is therefore large. However, there are no fully automated tools developed specifically for the structure prediction of membrane proteins as opposed to their globular soluble counterparts. The existing state of the art in membrane protein structure prediction is based on the use of tools developed and trained on globular proteins and then relies on manual manipulation and specialist expertise to generate models. In this project we are utilizing the specific structural features that membrane proteins exhibit to develop a toolkit directed at modelling them more accurately in a fully automated fashion. We have created a procedure to generate environment-specific substitution matrices for membrane proteins. In the first instance, by comparing these matrices to those generated from globular proteins, it is possible to gain valuable information about the molecular evolution of membrane proteins. In particular we can examine the environment specific substitution rates in and out of the membrane, as well as compare them between membrane and soluble proteins. Furthermore, we are investigating the contribution of biological parameters to our substitution tables and how these compare to those of globular proteins. In the next step, our substitution matrices shall be used for membrane protein model validation and, ultimately, for structural prediction, for example by homology modelling. 
25: IDENTIFICATION OF NOVEL INHIBITORS FOR UBIQUITIN C-TERMINAL HYDROLASE-L3 BY VIRTUAL SCREENING
Kazunori Hirayama (Department of Electrical Engineering and Bioscience, Japan, Graduate School of Advanced Science and Engineering, Waseda University, Japan), Shunsuke Aoki (Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Japan), Kaori Nishikawa (Department of Degenerative Neurological Diseases, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Japan), Takashi Matsumoto (Department of Electrical Engineering and Bioscience, Graduate School of Advanced Science and Engineering, Waseda University, Japan), Keiji Wada (Department of Degenerative Neurological Diseases,

We present our environment-specific substitution tables for membrane proteins, a first step towards modelling their structure. We compare our tables to those of soluble proteins. Our results shed new light on just how

DOCK (Ewing et al., J. Comput. Aided Mol. Des. 2001, 15, 411-428), GOLD (CCDC, Cambridge, UK), and FlexX (BioSolveIT, GmbH, Germany). BCR-ABL tyrosine kinase inhibitors (IC50 values from 10 to 200 microM) were successfully identified by virtual screening of 200,000 compounds against crystal structures using DOCK (Peng et al., Bioorg. Med. Chem. Lett. 2003, 13, 3693-3699) and an anchor-and-grow algorithm taking into account ligand flexibility. Human thymidine phosphorylase inhibitor (IC50 = 77 microM) was also identified by virtual screening of 250,521 compounds using DOCK (McNally et al., Bioorg. Med. Chem. Lett. 2003, 13, 3705-3709). In addition, metallo-beta-lactamase inhibitors (IC50 values less than 15 microM) were identified by virtual screening using GOLD (Olsen et al., Bioorg. Med. Chem. 2006, 14, 2627-2635), using a genetic algorithm taking into account ligand flexibility.
The advantage of chaining different docking programs was evaluated, and the results showed that virtual ligand screening can be performed with reasonable accuracy, and more rapidly, using chained screening than using a single program with default parameters (Miteva, J. Med. Chem. 2005, 48, 6012-6022). In this study, the results of chained docking to the UCH-L3 crystal structure were examined using a UCH-L3 hydrolysis activity assay to confirm the efficacy of the DOCK-GOLD SBDD method. We identified three inhibitors (IC50 = 100 to 150 microM) of UCH-L3 using DOCK-GOLD virtual screening of 32,799 compounds. Human UCH-L3 and ubiquitin vinylmethylester (Ub-VME) complex crystal structure data (PDB code 1XD3) were obtained from the Protein Data Bank (PDB) (Misaghi et al., J. Biol. Chem. 2005, 280, 1512-1520). Hydrogens were added to the UCH-L3-ubiquitin complex using the CVFF99 force field in the Biopolymer module of the Insight II 2000 suite (Accelrys, Inc., San Diego, CA). Energy was minimized using the Discover 3 module of the same suite with all heavy atoms (that is, atoms other than hydrogen) restrained, to exclude short contacts. To use the UCH-L3 protein structure in the following docking simulations, the structure of the UCH-L3 and Ub-VME complex was divided into its components. In the 3D structure of the UCH-L3-ubiquitin complex, the ubiquitin C-terminus is buried in the cleft of the active site among four active site residues of UCH-L3: Gln89, Cys95, His169, and Asp184 (Johnston et al., EMBO J. 1997, 16, 3787-3796; Misaghi et al., J. Biol. Chem. 2005, 280, 1512-1520). In the virtual screening process using DOCK and GOLD, the protein-ligand interaction site was restricted to the binding site of the three ubiquitin C-terminal amino acid residues, so that the outcome could be verified using a ubiquitin C-terminal hydrolase enzymatic assay.
The first DOCK screening was performed on the 32,799 compounds in the CNS-Set, which was pre-filtered by RPBS using the least stringent filtering conditions (Miteva, Nucleic Acids Res. 2006, 34, W738-744). Virtual screening experiments were performed using UCSF DOCK 5.4.0 (Ewing et al., J. Comput. Aided Mol. Des. 2001, 15, 411-428) and GOLD 3.0.1 (CCDC, Cambridge, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Japan). We screened for compounds with potential inhibitory activity of UCH-L3 (ubiquitin C-terminal hydrolase-L3), an apoptosis-associated de-ubiquitinating enzyme, using the UCH-L3 structure (1XD3) and the ChemBridge Compound Library. Using DOCK and GOLD software, we identified ten candidate compounds, and by enzymatic assay, we determined that three compounds are UCH-L3 inhibitors. Structure-based drug design (SBDD) is used to identify potentially useful drugs because it enables faster drug candidate identification than in vitro or in vivo biological assays. The computer-based approach to drug screening using molecular docking, is a shortcut method that can be employed when the crystal structure of a target protein is known. UCH-L3 (ubiquitin C-terminal hydrolase-L3) is a de-ubiquitinating enzyme that is a component of the ubiquitin-proteasome system and is known to be involved in programmed cell death. A previous high-throughput drug screening identified an isatin derivative as a UCH-L3 inhibitor. In this study, we screened for novel inhibitors having a different structural basis. We used in silico structure-based drug design using human UCH-L3 crystal structure data (PDB code 1XD3) and a virtual compound library (ChemBridge CNS-Set) of 32,799 chemicals. In a two-step virtual screening using DOCK software (first screening) and GOLD software (second screening), we identified ten candidate compounds with GOLD scores over 60. 
To determine whether these compounds exhibited inhibitory effects on the de-ubiquitinating activity of UCH-L3, we performed an enzymatic assay using ubiquitin-7-amido-4-methylcoumarin (Ub-AMC) as the substrate. Among the ten candidate compounds, we identified three compounds with similar basic dihydro-pyrrole skeletons as UCH-L3 inhibitors with IC50 values of 100-150 microM (Hirayama et al., Bioorg. Med. Chem. 2007, 15, 6810-6818). Experimentally determined IC50 values were 103 microM for compound 1, 154 microM for compound 6, and 123 microM for compound 7. UCH-L3 is involved in the protection of programmed cell death in germ cells and photoreceptor cells in vivo (Kwon et al., Am. J. Pathol. 2004, 165, 1367-1374; Sano et al., Am. J. Pathol. 2006, 169, 132-141). Thus, the structural information we determined regarding the UCH-L3 inhibitors may be useful in the development of apoptosis-inducing anti-cancer drugs. Key methodologies for docking small molecules with proteins were developed in the early 1980s (Kuntz et al., J. Mol. Biol. 1982, 161, 269-288), and various types of docking simulation software are now available, such as

traditional atom-based interaction scoring that is typical of most empirical, force-field-based and statistical scoring methods. We have introduced a novel concept of scoring interactions based on Interacting Surface Points (ISP) that are represented by their 3D positions, normal vectors and 23 chemical feature types, including H-bond donor/acceptor, aromatic Pi electrons and hydrophobic groups. A statistically derived empirical scoring function is constructed using a 4-parameter geometric description of the relationship between ISP pairs. The parameters include the distance between the pairs of ISPs and the angles between the normal vectors.
The energy associated with each possible ISP pair is deduced from statistics based on an inverse application of the Boltzmann distribution function. During the statistics collection temperature factors were considered with the corresponding Gaussian functions applied to the atom positions to account for the variable uncertainty of the atom positions in the Protein Data Bank (PDB) X-ray structures. More accurate geometric statistics have been collected from the Cambridge Structure Database and recently incorporated into the PDB data. Certain atoms, for example, the nitrogen atom in the imidazole ring, may participate in very different types of interactions at the same time (H-bonding and aromatic Pi-stacking). The ISP representation can describe these interactions better than the atom-based approach by having multiple ISPs associated with the same atom but pointing in different directions. The advantage of the statistically driven ISP scoring function is demonstrated on a case study using the Acetylcholine Binding Protein (AChBP) which has a key cation-Pi interaction observed crystallographically for several substrates (e.g. CCE, Nicotine, Lobeline, Epibatidine)[2]. Empirical and force-field based scoring functions fail to rank the correct binding pose highest even when using DFT-6-31**B3LYP charges. In contrast, eHiTS produces the correct pose with the best score even when using the default statistical table and weighting scheme for which no example from this protein family was included. When the automated training script is run to include the family in the knowledge base then the energy separation between the correct pose and other generated poses improves and provides very cleanly distinguished clusters. Furthermore, the eHiTS score gives a good correlation with the experimentally measured log(Kd) values for the series, correctly rank ordering the actives. 
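The inverse Boltzmann step mentioned above for ISP pair statistics can be sketched as follows: observed counts per geometric bin are compared with reference counts and converted to energies through E = -kT ln(p_obs / p_ref). This is a generic illustration under stated assumptions, not the eHiTS scoring function: the reference state, the pseudocount, the temperature and the function name are all hypothetical, and the Gaussian weighting by temperature factors described in the abstract is not modelled.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def inverse_boltzmann(observed, reference, temperature=300.0, pseudocount=0.0):
    """Convert per-bin observed and reference counts into energies (kcal/mol).
    Bins seen more often than the reference get negative (favorable) energies.
    With pseudocount=0 every bin must have nonzero counts in both lists."""
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed, reference):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-K_B * temperature * math.log(p_obs / p_ref))
    return energies

# Two hypothetical distance bins for one ISP type pair.
energies = inverse_boltzmann(observed=[30, 10], reference=[20, 20])
```

A nonzero pseudocount is one common way to keep sparsely populated bins from producing infinite energies.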
A simple count of the various ISP types present on a ligand provides a very compact descriptor for the ligand's interaction activity profile. We have used these descriptors via a machine learning technique to create a very rapid ligand-based VHTS filter - called LASSO (Ligand Activity in Surface Similarity Order)[3]. The descriptor is independent of 3D conformation and is focused on the interaction properties rather than connectivity or structural similarity. It is therefore capable of scaffold hopping, the process of retrieving active ligands with different underlying structures. LASSO is demonstrated to achieve high enrichment rates for all families included in the DUD benchmark set[4]. LASSO offers an extremely rapid UK) (Jones et al., J. Mol. Biol. 1997, 267, 727-748). In the first screening using DOCK, the substrate-binding site was defined by selecting ligand-atom-accessible spheres and describing molecular surfaces using the SPHERE_GENERATOR program in the DOCK suite. All spheres within 6 angstroms of the root mean square deviation (RMSD) from each atom of the three C-terminal residues of energy-minimized ubiquitin were selected by the SPHERE_SELECTOR program in the DOCK suite. Following the first screening with rigid ligand conditions, 1,780 compounds with binding energy scores of less than 30 kcal/mol were selected for a second screening using GOLD. Using GOLD, the virtual tripeptide structure composed of three C-terminal residues of the energy-minimized ubiquitin was set as the reference ligand to define the ligand-binding site. All protein atoms within 5 angstroms of each ligand atom were used to define the binding site. As a result, the binding site was modeled as having 174 active atoms (automatically selected by GOLD software). Ligands predicted to be tight-binders by both DOCK and GOLD were then evaluated by further in vitro experiments. 
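The distance-based binding-site definition described above for the GOLD stage (all protein atoms within 5 angstroms of any atom of the reference ligand) can be sketched in a few lines. This is only an illustration of the selection rule and not a call into GOLD or DOCK; the function name and the coordinate representation are assumptions.

```python
import math

def atoms_near_ligand(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return indices of protein atoms lying within `cutoff` angstroms
    of at least one reference-ligand atom."""
    return [
        index
        for index, atom in enumerate(protein_atoms)
        if any(math.dist(atom, lig) <= cutoff for lig in ligand_atoms)
    ]

# Hypothetical coordinates (angstroms): three protein atoms, one ligand atom.
protein = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0)]
site = atoms_near_ligand(protein, ligand)
```

For real structures a spatial grid or k-d tree would avoid the all-pairs distance loop, but the selection rule is the same.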
27: A NOVEL SCORING FUNCTION IN eHITS AND LASSO
Zsolt Zsoldos, Danni Harris, Mehdi Mirzazadeh, Aniko Simon (Simulated Biomolecular Systems, Canada)

A novel statistical scoring function for flexible ligand docking is presented based on Interacting Surface Points (ISP). Results of a case study on AChBP with cation-Pi interactions are shown. A QSAR descriptor based on the ISP provides a 3D conformation independent ligand activity filtering tool, ideal for scaffold hopping. The primary goal of most virtual screening experiments is to identify new lead compounds as a starting point for developing a drug discovery pipeline. There are two typical approaches that are sometimes combined to develop a screening funnel: ligand-based approaches (2D similarity, 3D pharmacophore, fingerprint, surface or other QSAR descriptor) and structure-based flexible ligand docking and scoring approaches. The latter is often considered too slow for the large scale screening of databases of millions of structures, while the former approach does not provide 3D coordinates or estimated binding energies. The fragment-based exhaustive flexible ligand docking engine of eHiTS has been published previously[1]. We are now focusing our efforts on developing an innovative scoring function for eHiTS, one which departs from the

filtering tool in excess of a million ligands per minute on a single CPU. eHiTS flexible docking has proved to be among the most accurate pose prediction tools[5] and combined with the LASSO ligand based filter it provides one of the highest enrichment factors based on comparative evaluation studies[6]. While LASSO can rapidly and efficiently reduce the number of candidates to be docked to a few percent of the total database, accurate flexible docking with eHiTS used to take several minutes of CPU time per ligand on traditional hardware architectures.
The algorithm has recently been redesigned and coded to take advantage of the Cell B/E accelerator architecture, providing a 30- to 100-fold speed-up[7] and bringing the runtime down to a few seconds per ligand on a Sony Playstation PS3 gaming machine, or even less on an IBM Cell Blade, while still producing the most accurate flexible docking. The revolutionary hardware technology requires new computational methods, replacing approximate precomputed grids with proximity look-up and explicit pairwise interaction computation. As a result, the calculation is not only orders of magnitude faster, but it also provides more accurate energy predictions. The emerging technologies presented could also be applied to speed up other molecular modeling related problems, e.g. QM or MD simulations and protein folding, by multiple orders of magnitude.
[1] Z. Zsoldos, D. Reid, A. Simon, S.B. Sadjad, A.P. Johnson: eHiTS: a new fast, exhaustive flexible ligand docking system. J. Mol. Graph. Model. 26(1), 2007, 198-212.
[2] S.B. Hansen, G. Sulzenbacher, T. Huxford, P. Marchot, P. Taylor, Y. Bourne: Structures of Aplysia AChBP complexes with nicotinic agonists and antagonists reveal distinctive binding interfaces and conformations. The EMBO Journal (2005) 24, 3635-3646. doi:10.1038/sj.emboj.7600828
[3] D. Reid, B.S. Sadjad, Z. Zsoldos, A. Simon: LASSO ligand activity by surface similarity order: a new tool for ligand based virtual screening. Journal of Computer-Aided Molecular Design,
[4] N. Huang, B.K. Shoichet, J.J. Irwin: Benchmarking sets for molecular docking. J. Med. Chem. 49(23), 6789-801
[5] M. Kontoyianni, L.M. McClellan, G.S. Sokol: Evaluation of Docking Performance: Comparative Data on Docking Algorithms. J. Med. Chem. 2004; 47(3), 558-565. eHiTS results for the same test case added by Fedor Zhuravlev, Assist.Prof., Technical University of Denmark: http://www.simbiosys.ca/ehits/ehits_validation.html
[6] G.B. McGaughey, R.P. Sheridan, C.I. Bayly, C. Culberson, C.
Kreatsoulas, S. Lindsley, V. Maiorov, J. Truchon, W.D. Cornell: Comparison of Topological, Shape, and Docking Methods in Virtual Screening. J. Chem. Inf. Model. 2007; 47(4), 1504-1519. eHiTS results added by Merck: http://www.simbiosys.ca/ehits/ehits_enrichment.html
[7] http://www.bio-itworld.com/inside-it/2008/05/gta4-andlife-sciences.html

28: SE: AN ALGORITHM FOR DERIVING SEQUENCE ALIGNMENT FROM SUPERIMPOSED STRUCTURES
Chin-Hsien Tai (1), James J. Vincent (2), Changhoon Kim (1) & Byungkook Lee (1) ((1) National Cancer Institute, NIH, USA; (2) Vermont Genetics Network, Department of Biology, University of Vermont, USA)

The Seed Extension (SE) algorithm produces more accurate sequence alignments from superimposed structures than three other tested programs that use a dynamic programming algorithm. SE does not require a gap penalty and also uses less CPU time, which makes it suitable for large-scale structural comparisons. It can be implemented in other structure comparison programs. Generating sequence alignments from superimposed structures is an important part of structural comparison programs and structure-based sequence alignments. The accuracy of the alignment affects structural classification and comparisons, and possibly function prediction. Many programs use a dynamic programming algorithm to generate a sequence alignment from a pair of superimposed structures. This procedure requires a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here, we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds “seeds”, pairs of residues, one from each structure, that meet a certain set of criteria for being unambiguously equivalent. Three consecutive seeds form seed-segments, which are extended along the diagonal of the alignment matrix in both directions.
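The seed and seed-segment steps described above can be illustrated with a minimal sketch. The abstract does not state the actual seed criteria, so the rule used here (mutual nearest alpha-carbon neighbours within a distance cutoff) is an assumption made only for illustration, as are the function names and the 3.0 angstrom cutoff; the extension and conflict-resolution steps are not shown.

```python
import math

def find_seeds(ca_a, ca_b, cutoff=3.0):
    """Residue pairs (i, j) that are mutual nearest neighbours and closer than
    `cutoff` after superposition. This criterion is a stand-in, not SE's own."""
    def nearest(point, others):
        return min(range(len(others)), key=lambda k: math.dist(point, others[k]))

    seeds = set()
    for i, coord_a in enumerate(ca_a):
        j = nearest(coord_a, ca_b)
        if nearest(ca_b[j], ca_a) == i and math.dist(coord_a, ca_b[j]) <= cutoff:
            seeds.add((i, j))
    return seeds

def seed_segments(seeds, length=3):
    """Start positions of runs of `length` consecutive seeds on one diagonal,
    i.e. (i, j), (i+1, j+1), ... are all seeds."""
    return sorted(
        (i, j) for (i, j) in seeds
        if all((i + k, j + k) in seeds for k in range(length))
    )

# Hypothetical superimposed alpha-carbon traces (angstroms) along a line.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
ca_b = [(50.0, 0.0, 0.0), (0.5, 0.0, 0.0), (4.3, 0.0, 0.0), (8.1, 0.0, 0.0)]
seeds = find_seeds(ca_a, ca_b)
segments = seed_segments(seeds)
```

Because seeds are paired by geometry alone, no gap penalty enters at this stage, which matches the property the abstract emphasises.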
Distance and amino acid similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. SE is simple to implement and does not require a gap penalty. The manually curated alignments in NCBI’s Conserved Domain Database were used as reference alignments to compare the sequence alignments generated from pairs of superimposed structures by the SE algorithm and by three other programs that use a dynamic programming algorithm: Chimera, LSQMAN and SHEBA. The SE algorithm performed best among the four programs tested. It gave an average accuracy of 95.9% over 582 pairs of superimposed proteins. The average accuracies of Chimera, LSQMAN and SHEBA were 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural homology, the SE algorithm produced alignments that were up to 18% more accurate, on average, than the next best scoring program. Improvement was most pronounced when the two superposed structures contained equivalent helices or beta-strands that crossed at an angle. SE also used considerably less CPU time than the dynamic programming algorithm used in the original SHEBA. When SE is implemented in SHEBA, replacing a standard dynamic programming algorithm, the alignment accuracy improves by 10% on average for protein structure pairs with RMSD between 2 and 4 angstroms. The program is also, on average, two times faster than with the dynamic programming routine for protein pairs with about 200 residues, and more than 10 times faster when larger structures are compared. An example of sequence alignment generated by SE and the dynamic programming routine used in SHEBA is shown in the representative figure. This pair of three-helix bundle structures belongs to the cd03439 family in CDD.
SE generated three aligned regions corresponding to the three helices; the alignment was identical to that of CDD (100% accuracy). In contrast, the original SHEBA with the dynamic programming algorithm produced only two well-aligned regions; the third region had many gaps and a small number of inaccurately aligned residues. The Seed Extension algorithm is available as a software package for implementation in other structural comparison programs.

protein design simulations, with newfound exclusion of 3-10 helix ends. Computational studies of proteins such as homology modeling and protein design involve the difficult task of predicting the conformational effects of mutations. The change of a single sidechain can have subtle, far-reaching effects that are difficult to model accurately. The use of discrete “rotamers” simplifies the search over sidechain conformational space, but protein backbone cannot in general be so easily reduced. One exception to this rule is the “backrub,” a low-amplitude, hinge-like motion of a dipeptide coupled to sidechain rotamer jumps. The backrub was documented by examining very high-resolution electron density for alternate conformation sidechains and inferring the backbone changes that must be involved (Davis 2006, Structure 14:265). The backrub has now been employed to good effect in protein design studies (Smith & Kortemme 2008, J Mol Biol; Georgiev 2008, Bioinformatics). Importantly, however, no direct evidence has so far been presented to support the assumption implicit in these designs: that this dynamic, low-energy backbone motion on the timescale of rotamer transitions for single sidechains is also relevant on the evolutionary timescale of sidechain mutations. To address this point, we have used our Top5200 structure dataset to examine two different cases for which populations of otherwise similar local conformation are related by a single amino acid difference that alters an H-bond or van der Waals contact with a neighboring chain.
Both cases show sequence-dependent bimodal backbone distributions that are well described by the backrub motion. The first case is 4320 Phe, Tyr, or Trp residues with plus chi1 rotamers on antiparallel beta sheet, which places the aromatic ring directly over a sidechain on the adjacent strand. If that sidechain is a Gly, then the aromatic residue hinges downward to touch the Gly H, while the Cbeta group of any other amino acid on the opposite strand pushes the aromatic ring upward. The second case is alpha-helix N-cap residues that form classic sidechain-backbone N-cap H-bonds to the i+3 NH (Richardson 1988, Science 240:1648). 4906 of the N-caps were Asn or Asp (233 with psi from 165 to 170 degrees shown in green below), and 7405 were Ser or Thr (1554 of same subset in blue). The backbone conformations differed consistently, where the longer N/D sidechains rotate the first turn’s backbone away from residue i+3, while the shorter S/T sidechains pull the first turn’s backbone toward i+3. When examples are superimposed on the 3 atoms marked in red below, the average N/D vs. S/T N-cap Calpha positions are about 0.3 Angstroms apart in a backrub rotation of about 10 degrees, similar to shifts typical of rotamer backrubs. For the helix N-caps the sequence change and the backrub occur at the same residue, as seen earlier for rotamer changes, while for beta aromatics the sequence change on the adjacent strand causes a backrub shift at the aromatic. These findings validate the inclusion of empirically observed backbone motions such as the backrub as part of the repertoire of “moves” for protein design and other modeling efforts.
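As a rough illustration of the geometry behind the numbers above, the following sketch rotates a point about an axis using Rodrigues' rotation formula, one standard way to express a hinge-like rotation such as the backrub. The coordinates are made up, the function name `rotate_about_axis` is hypothetical, and taking the axis through the two flanking Calpha atoms is an assumption made for the example; none of it is taken from the Top5200 data.

```python
import math


def rotate_about_axis(point, axis_start, axis_end, angle_deg):
    """Rotate `point` about the line through axis_start and axis_end.

    Uses Rodrigues' rotation formula; all coordinates are 3-tuples.
    """
    # Unit vector along the rotation axis.
    k = [e - s for s, e in zip(axis_start, axis_end)]
    norm = math.sqrt(sum(c * c for c in k))
    k = [c / norm for c in k]
    # Position of the point relative to a point on the axis.
    v = [p - s for p, s in zip(point, axis_start)]
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    rotated = [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]
    # Shift back into the original coordinate frame.
    return [r + s for r, s in zip(rotated, axis_start)]


# Hypothetical coordinates in Angstroms (illustrative only).
ca_prev = (0.0, 0.0, 0.0)   # Calpha of residue i-1
ca_next = (5.5, 0.0, 0.0)   # Calpha of residue i+1
ca_mid = (2.75, 1.7, 0.0)   # Calpha of residue i, 1.7 A off the axis

moved = rotate_about_axis(ca_mid, ca_prev, ca_next, 10.0)
shift = math.dist(ca_mid, moved)
print(f"Calpha shift for a 10 degree rotation: {shift:.2f} A")
```

For a point at perpendicular distance d from the axis, a rotation by angle t moves it by 2·d·sin(t/2), so a shift of a few tenths of an Angstrom for a rotation of about 10 degrees corresponds to an atom lying between one and two Angstroms from the hinge axis.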
If we allow nature to inform our notion of

30: CO-EVOLUTION OF STRUCTURAL BIOINFORMATICS AND PROTEIN DESIGN FOR NCAP BACKRUBS
Daniel Keedy (Department of Biochemistry, Duke University, USA), Ed Triplett (Duke University, USA), David Richardson (Department of Biochemistry, Duke University, USA), Jane Richardson (Department of Biochemistry, Duke University, USA), Ivelin Georgiev (Computer Science Department, Duke University, USA), Cheng-Yu Chen (Department of Biochemistry, Duke University, USA) and Bruce Randall Donald (Duke University, USA).

The “backrub” motion, a previously described dipeptide rotation coupled to rotamer jumps, is now documented to occur for helix N-cap residues and for beta-sheet aromatics related by single amino acid substitutions. Backrubs are thus suitable in a repertoire of moves for

library or a lattice model is discrete, which is inconsistent with the continuous characteristics of protein backbone torsion angles. This discrete nature may restrict the search space and cause loss of prediction accuracy. The subject of this abstract lies in protein conformation sampling in real space, that is, the exploration of the continuous conformational space compatible with a given protein sequence using a probabilistic graphical model. In particular, we develop a Conditional Random Fields (CRF) model [1], called CRFSampler, to learn the complicated relationship between protein sequence and structure and then sample the conformations of a protein using this CRF model. CRFSampler models the sequence-structure relationship using approximately one million parameters and estimates them using a sophisticated discriminative learning method.
Given a protein sequence, the occurrence probability of a potential conformation (i.e., all the backbone angles) can be accurately estimated by CRFSampler and thus the protein conformation space can be efficiently explored. Instead of using fragments as basic building blocks of a protein conformation, CRFSampler directly samples the backbone angles at each position according to their occurrence probability calculated from the CRF model. Unlike fragment assembly methods and lattice models, CRFSampler uses directional statistics to model the distribution of protein backbone angles at each position and thus can sample backbone angles from a continuous space. The distribution parameters of angles at each backbone position are sampled by CRFSampler using sequence information and PSIPRED-predicted secondary structure. CRFSampler is guaranteed to search through the whole continuous conformation space so that the native structure of a protein will not be missed. On the other hand, CRFSampler is also efficient because it is biased towards those conformations with high occurrence probability. CRFSampler uses a graph to model the relationship between sequence and backbone angles. The backbone angles at a single position depend on residues and secondary structures at many positions of the target protein to be folded. CRFSampler also models the dependency between the angles at three or more consecutive positions. In CRFSampler, a sophisticated model topology (see Figure 1 for an example) and feature set can be defined to describe the dependency between sequence and structure without worrying about learning of model parameters. CRFSampler is much more expressive than the FB5-HMM model [2], in which the angles at a single position depend directly only on the residue type at this position and only the interdependence between two adjacent positions is captured. Second, CRFSampler also naturally captures the interaction between primary sequence and secondary structure.
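To make the idea of drawing backbone angles from a continuous distribution concrete, here is a minimal sketch. It substitutes independent von Mises distributions on phi and psi for the directional distribution that CRFSampler actually uses, and the per-position parameters are made-up numbers standing in for values that the CRF would predict from sequence and secondary structure. The names `AngleParams` and `sample_backbone` are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class AngleParams:
    """Hypothetical per-position parameters (stand-ins for CRF output)."""
    phi_mean_deg: float
    psi_mean_deg: float
    kappa: float  # concentration: larger values give a tighter distribution


def _sample_angle_deg(mean_deg, kappa, rng):
    # random.vonmisesvariate expects the mean in radians within [0, 2*pi).
    mu = math.radians(mean_deg) % (2.0 * math.pi)
    deg = math.degrees(rng.vonmisesvariate(mu, kappa))
    # Map onto the conventional (-180, 180] range used for phi/psi.
    return deg - 360.0 if deg > 180.0 else deg


def sample_backbone(params, rng):
    """Draw one (phi, psi) pair per position from continuous distributions."""
    return [
        (
            _sample_angle_deg(p.phi_mean_deg, p.kappa, rng),
            _sample_angle_deg(p.psi_mean_deg, p.kappa, rng),
        )
        for p in params
    ]


# Illustrative parameters only: three helix-like positions, one looser one.
helix_like = [AngleParams(-60.0, -45.0, 50.0)] * 3 + [AngleParams(-90.0, 120.0, 5.0)]

rng = random.Random(0)  # fixed seed so that the run is repeatable
for phi, psi in sample_backbone(helix_like, rng):
    print(f"phi={phi:7.1f}  psi={psi:7.1f}")
```

Because each angle is drawn from a continuous density, no conformation is excluded in advance, while a high concentration parameter keeps most samples close to the predicted mean. That is the trade-off between coverage and efficiency described above, in a much simplified form.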
CRFSampler can automatically learn the relative importance of primary sequence and secondary structure, as opposed to the FB5-HMM model that assumes primary sequence and secondary structure are equally important. Finally, CRFSampler can easily incorporate sequence profile (i.e., position-specific frequency matrix) and predicted secondary structure likelihood scores into the model to further improve sampling

backbone motion by using structurally observed backbone distributions encoding “protein-like” behavior, we can implicitly incorporate aspects of protein biophysics subtler than the field has so far been able to model accurately. To this end, we have also begun to utilize backrubs for computational redesign of N-caps in GrsA PheA using a new algorithm, BRDEE (see Ivelin Georgiev’s talk in the main ISMB session). While studies by Fersht, Matthews, Kallenbach, Presta, and others have found it possible to introduce stabilizing N-caps where none existed before, on the basis of the findings described above we suspected that explicitly accounting for possible backrubs at the N-cap position could improve the success rate of designs. A new issue, highlighted by feedback between informatics and design, is the difference between helix N-cap preferences for 3-10 vs. alpha-helical conformations. “Traditional” N-caps (with sidechain-backbone H-bonds to residue i+3) appear to be significantly less compatible with 3-10 helix starts.

32: EFFICIENT PROTEIN CONFORMATION SAMPLING IN REAL SPACE
Jinbo Xu (Toyota Technological Institute of Chicago, USA).

Protein conformation sampling poses a major bottleneck for ab initio folding. This abstract presents CRFSampler, a protein conformation sampling algorithm built upon a probabilistic graphical model, Conditional Random Fields. CRFSampler models the sequence-structure relationship using about a million parameters. Preliminary results indicate that CRFSampler can efficiently generate protein-like conformations.
Ab initio folding has made exciting progress in the past decade, as exemplified by the fragment assembly method implemented in Rosetta and the hybrid method (i.e., hybrid of fragment assembly and lattice model) implemented in TASSER and I-TASSER. Many other groups have developed a variety of fragment assembly methods and lattice models for protein structure prediction, and demonstrated success. Although these two popular structure prediction methods achieved exciting results, several important issues remain with protein conformation sampling. First, due to the limited number of experimental protein structures in PDB, it is still very difficult to have a library of even moderate-sized fragments that can cover all the possible local conformations of a sequence stretch. Second, the conformational space defined by a fragment

performance. Although extremely expressive, CRFSampler can avoid overfitting of the model parameters by regularizing its parameters using a Gaussian prior, allowing the user to achieve a balance between model complexity and expressivity. Our experimental results indicate that using CRFSampler, protein-like conformations can be efficiently sampled in real space without using fragment assembly. Using only compactness and self-avoiding constraints, CRFSampler can quickly generate native-like conformations with quality better than those generated by the FB5-HMM model and Levitt's lattice model [3]. Please refer to our paper [4] for a detailed comparison of CRFSampler, FB5-HMM, Levitt's lattice model and Rosetta. Currently we are developing a method for ab initio protein structure prediction by combining CRFSampler with a distance-dependent statistical potential and a hydrogen bonding energy.
Using the DOPE statistical potential and BMKhbond (only backbone and C-beta atoms considered), we can successfully fold a variety of alpha and beta proteins such as 1FC2, 1ENH, 2CRO, 1NKL, 1TRL, 1BG8, 2GB1, 1SRO, 1PGB, 1FGP and 1DKT. Figure 1: An example CRF model for protein conformation sampling. In this example, the angles (represented as the middle level of this figure) at position i depend on the residues and secondary structure types at positions i-2, i-1, i, i+1 and i+2 and any nonlinear combinations of them. There is also interdependence among angles in three consecutive positions. This CRF model can also be extended to incorporate long-range interdependence between angles and make use of more information such as PSIBLAST profile and alignments generated from comparative modeling. [1] John Lafferty, Andrew Mccallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282–289. Morgan Kaufmann, San Francisco, CA, 2001. [2] Thomas Hamelryck, John T T. Kent, and Anders Krogh. Sampling realistic protein conformations using local structural bias. PLoS Comput Biology, 2(9), September 2006. [3] Y. Xia, E. S. Huang, M. Levitt, and R. Samudrala. Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol, 300(1):171–185, June 2000. [4] F. Zhao, S. Li, B. Sterner and J. Xu. Discriminative Learning for Protein Conformation Sampling. PROTEINS. 2008 Apr 15. [Epub ahead of print]. 33: MODELING THE INTERACTION OF MAP KINASE PHOSPHATASE 3 WITH A NOVEL INHIBITOR BY ACCOUNTING FOR CONFORMATIONAL FACTORS Ahmet Bakan1*, Gabriela Molina2*, Andreas Vogt3*, Michael Tsang2 and Ivet Bahar1 (1Departments of Computational Biology, 2Microbiology and Molecular Genetics, 3Pharmacology and Chemical Biology, University of Pittsburgh, USA) * These authors contributed equally. 
We employ flexible ligand and side-chain docking to multiple target conformations in a two-step procedure to pinpoint an unknown inhibitor binding site and to assess the related mechanism of inhibition. We present its application to the interaction of MAP kinase phosphatase 3 with a novel inhibitor.

Molecular docking is the primary method to probe protein-ligand interactions. The rigid-target assumption is the major limitation to its success. In recent years, flexible ligand and side-chain docking to multiple target conformations has emerged as a practical approach to improve pose accuracy and scoring. We employ this approach in a two-step procedure to pinpoint an unknown inhibitor binding site and to assess the related mechanism of inhibition. We present its application to the interaction of MAP kinase phosphatase 3 (MKP3) with a novel inhibitor (BCI) identified from zebrafish chemical screens [1]. Based on the computational modeling, we proposed that BCI is an allosteric inhibitor and supported the allosteric inhibition mechanism by in vitro experiments. The first step of the computational procedure is identification of potential binding sites on the target protein. To this end, unbiased rigid protein docking simulations are performed for all known distinct conformational states of the target protein using AutoDock [2]. The resulting poses are clustered to identify energetically favorable docking sites. Favorable sites are further explored by allowing the protein to undergo structural fluctuations in the neighborhood of the predefined conformational states. In the second step, flexible ligand and side-chain docking to multiple target conformations is employed to reveal the most favorable site. When a crystallographic structure of the target is available, normal mode analysis is used for efficient sampling of conformational fluctuations. When the structure of the target is not known, multiple homology models are used as an ensemble of accessible conformations.
In the former case, normal modes of internal motions of the target protein are calculated using the anisotropic network model (ANM), a simple elastic network model at residue level resolution [3]. ANM modes relevant to the functional motions of the protein or those affecting the geometry of a potential binding site are selected from the low frequency regime of the spectrum of modes. Protein conformations are sampled along the selected modes by jointly optimizing backbone and side-chains using an all-atom molecular mechanics force field and harmonic restraints. For each conformation, a diverse set of ligand docking poses is generated using GOLD [4]. Resulting poses, reaching a total number on the order of thousands for each potential site, are clustered using an agglomerative clustering scheme. Well populated and high scoring clusters are analyzed to reveal the most likely binding site. Finally, based on the normal mode analysis of the dynamics of the target and the location of the most favorable binding site, an inhibition mechanism is proposed. Altogether, these steps incorporate into scoring the conformational factors that are generally omitted. As opposed to selecting the highest scoring docking pose, this approach is able to pinpoint the inhibitor binding site. MKP3 is a member of the MKP family, which has been implicated in the development of cancer [5]. MKP3 dephosphorylates extracellular signal-regulated kinase 2 (ERK2) and regulates developmental processes. Upon binding to ERK2, MKP3 is catalytically activated [6]. A selective and potent inhibitor of MKP3 is lacking. This approach was used to reveal the inhibition mechanism of a novel MKP3 inhibitor, BCI (Fig. panel A). BCI was identified in zebrafish chemical screens. The first step of the procedure was applied to two known states of MKP3: the low-activity state (Fig.
panel B) [7] and the high-activity state. The second step of the procedure found that BCI preferentially binds a crevice between the general acid loop and the nearby helix alpha7, rather than interacting directly with the catalytic residues Asp262, Cys293, or Arg299 (Fig. panel C). At this putative binding site, a close interaction with Trp264, Asn335 and Phe336 was observed. To assist in our understanding of the potential inhibition mechanism, we explored the ANM modes of motion that induce conformational changes at the general acid loop. Our analysis showed that MKP3 possesses a tendency to reorient its general acid loop to facilitate the catalytic interactions of Asp262. We proposed that BCI binding to the accessible crevice in the low-activity state effectively blocks the flexibility of this loop, thereby restricting the movement of Asp262 towards the phosphatase loop (Fig. panel D) and inhibiting the catalytic activation induced upon ERK binding. This inhibition mechanism was supported by follow-up experiments using a fluorescent small-molecule substrate of MKP3 and ERK2. BCI was used to probe the role of MKP3 in the development of the zebrafish embryo. It constitutes a basis for the development of selective inhibitors of members of the MKP family. This work demonstrates a practical and efficient approach to identify the binding site of an inhibitor with an unknown inhibition mechanism. The future aim of this study is to develop this approach as a method for lead optimization, an application area in which a practical structure-based approach is lacking.

REFERENCES:
[1] G. A. Molina, S. C. Watkins, M. Tsang, BMC Dev Biol 7, 62 (2007).
[2] G. M. Morris et al., J Comp Chem 19, 1639 (1998).
[3] A. R. Atilgan et al., Biophys J 80, 505 (Jan, 2001).
[4] G. Jones et al., J Mol Biol 267, 727 (Apr 4, 1997).
[5] A. Bakan, J. S. Lazo, P. Wipf, K. M. Brummond, I. Bahar, Curr Med Chem, Manuscript submitted (2008).
[6] M. Camps et al., Science 280, 1262 (May 22, 1998).
[7] A.
E. Stewart, Nat Struct Biol 6, 174 (Feb, 1999).

34: HOW GOOD CAN TEMPLATE-BASED MODELLING BE?
Braddon K. Lance (Macquarie University, Australia), Graham R. Wood (Macquarie University, Australia), Charlotte M. Deane (Oxford, UK).

We quantify the best possible predictions achievable in template-based modelling when using rigid fragments from a single template. Achieving the optimum positioning of template fragments yields median improvements of 0.3 Å RMSD and 4% GDT-HA, with the upper quartile yielding improvements of over 0.7 Å RMSD and 10% GDT-HA.

The accuracy with which a template approximates a target is strongly related to sequence identity, a relationship which is well understood (Chothia and Lesk, 1986). A long-standing challenge in template-based modelling is generating protein structure predictions better than the best template. In template-based modelling, the position of the template fragments is often modified in an attempt to improve the prediction beyond that of the template structure. The magnitude of improvements that can be achieved via movement of template fragments alone has not previously been studied. We have recently quantified these possible improvements (Lance et al., 2008). The magnitude of improvements that may be achieved by optimal positioning of template fragments was quantified using CASP7 targets (Moult et al., 2007) and the HOMSTRAD database (Mizuguchi et al., 1998). In the CASP7 tests we used the best template for each target structure as listed on the CASP website. With the HOMSTRAD database we carried out comprehensive tests using all structure pairs within each HOMSTRAD family, arbitrarily assigning the role of target and template to each member of the pair. Structure alignments giving corresponding sequence alignments were calculated using TM-align (Zhang and Skolnick, 2005).
Within the sequence alignment, contiguous stretches of four or more amino acids that were aligned in the target and template sequence were considered to define the template fragments useful for approximating the target. In addition to the standard (unmodified) template, a fragment-optimized template was created, in which each template fragment was independently least-squares superposed onto the corresponding target fragment to give its optimum positioning. The structural similarity of both the standard and fragment-optimized templates to the target structure was compared using RMSD, GDT-HA, GDT-TS and HBScore. The suitability of a predicted structure for loop modelling increases with greater accuracy of the three terminal

We have developed a predictor for residue-residue contacts in alpha-helical TM proteins, utilizing data on sequence space separation, amino acid content and correlated mutations of residues. Additional data include features unique to alpha-helical TM regions such as the predicted distance of a residue to the membrane center. Our predictor uses a trained classifier based on support vector machines, a statistical method with a good track record, which is well equipped for the diverse data available. A challenge in these kinds of calculations is the size of the data the model has to digest, which results in extensive computation times. We are addressing this issue by studying the influence of the different input data separately and in combination, to gauge what would give the best compromise between speed and predictive performance. A conclusion from this work is that the predicted distance to the membrane center is a valuable addition to the more traditional sorts of input previously tried for soluble proteins.
Our method's results are on par with previous methods for soluble proteins for predictions on individual chains and satisfactory also for whole, multi-chained, proteins. Future prospects involve using the contact predictions as input for other tasks, e.g. prediction of helix binding sites or as initial input for fragment assembly algorithms.

REFERENCES
[1] E. Wallin and G. von Heijne, “Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms,” Protein Science, vol. 7, pp. 1029–1038, 1998.
[2] E. Granseth, H. Viklund, and A. Elofsson, “Zpred: predicting the distance to the membrane center for residues in alpha-helical membrane proteins,” Bioinformatics, vol. 22, pp. e191–e196, Jul 2006.
[3] V. Vapnik, The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
[4] C. J. C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, no. 2, pp. 121–167, 1998.

36: COMPUTATIONAL METHODS TO ADVANCE FROM CRYSTALLOGRAPHIC MODEL TO ENZYME MECHANISM AND STRUCTURE-FUNCTION RELATIONSHIPS
Troy Wymore & Adam Kraut (National Resource for Biomedical Supercomputing, USA)

residues on each conserved fragment, the anchor regions (Fiser et al., 2000). To gauge the effect of fragment movement upon loop modelling, the RMSD of the anchor regions (the three terminal residues of each fragment) in the standard and fragment-optimized templates was also calculated. Our results demonstrate that optimal independent fragment movement gives improvements over the template structure, with mean improvement in RMSD, GDT-TS and GDT-HA of 0.7 Angstroms, 5.4% and 6.3% respectively. For a minority of models these improvements are substantial, with the upper quartile showing improvements of 0.8 Angstroms RMSD, 8.25% GDT-TS and 10% GDT-HA. Little change was observed in the hydrogen bonding as measured using HBScore.
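For readers unfamiliar with the scores quoted above, the sketch below computes RMSD and simplified GDT-TS and GDT-HA values from a list of per-residue Calpha deviations. It is an assumption-laden simplification: a full GDT calculation searches over many superpositions to maximize the residue count at each cutoff, whereas this version takes the deviations from a single, already-fixed superposition. The deviation values are invented and the function names are hypothetical.

```python
import math

# Standard distance cutoffs in Angstroms for the two GDT variants.
GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt_score(deviations, cutoffs):
    """Average, over the cutoffs, of the percentage of residues within each."""
    if not deviations:
        raise ValueError("no residue deviations given")
    n = len(deviations)
    fractions = [sum(d <= c for d in deviations) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def rmsd(deviations):
    """Root-mean-square of the per-residue deviations."""
    if not deviations:
        raise ValueError("no residue deviations given")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Invented Calpha deviations (Angstroms) between a model and its target,
# measured after one fixed superposition.
deviations = [0.3, 0.4, 0.9, 1.5, 2.5, 3.0, 5.0, 9.0]

print(f"RMSD   = {rmsd(deviations):.2f} A")
print(f"GDT-TS = {gdt_score(deviations, GDT_TS_CUTOFFS):.1f} %")
print(f"GDT-HA = {gdt_score(deviations, GDT_HA_CUTOFFS):.1f} %")
```

Because GDT-HA uses cutoffs half the size of those in GDT-TS, it is the stricter score, and a single badly placed residue (the 9.0 A value here) affects RMSD far more than either GDT variant.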
The scope for improvement upon the template by rigid fragment movement varies as a function of template quality, with templates showing approximately 80% coverage of the target offering the greatest scope for improvement. Median change in anchor RMSD was close to zero; however, the magnitude of reductions was generally greater than that of the increases in anchor RMSD, indicating that the fragment-optimized template is better for loop modelling overall. These results demonstrate that there is still scope for much greater improvement over the template structure via fragment movement than is currently being realised in even the best template-based modelling techniques.

REFERENCES
Chothia C, Lesk AM: The relation between the divergence of sequence and structure in proteins. EMBO J 1986, 5(4):823-826.
Fiser A, Do RKG, Sali A: Modeling of loops in protein structures. Protein Sci 2000, 9:1753-1773.
Lance BK, Wood GR, Deane CM: How good can template-based modelling be? In Preparation.
Mizuguchi K, Deane CM, Blundell TL, Overington JP: HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci 1998, 7:2469-2471.
Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A: Critical assessment of methods of protein structure prediction - Round VII. Proteins 2007, 69(S8):3-9.
Zhang Y, Skolnick J: TM-align: a protein structure alignment algorithm based on TM-score. Nucleic Acids Res 2005, 33:2302-2309.

35: CONTACT PREDICTION FOR MEMBRANE PROTEINS
Aron Hennerdal & Arne Elofsson (Stockholm University, Sweden)

Due to difficulties in the experimental determination of the structure of transmembrane (TM) proteins, only relatively few such structures have been deposited in the protein databank (PDB). The alpha-helical class of TM proteins is the most common and in many ways the most interesting since it contains many novel drug targets.
Structure prediction for this class is in its infancy and many strategies that have been proven useful for soluble proteins have yet to be implemented and possibly modified.

complex model, we experimented with aspects of the Lys-Tyr-Ser catalytic triad including alternative protonation states and side chain orientation through classical MD simulations. We found that it was necessary to manually manipulate the Lys159 side chain conformation from the one present in the crystal structure in order for it to be optimally placed to assist in the stabilization of the tyrosinate, Tyr155. MD simulations of several nanoseconds (ns) were insufficient to observe this very small change. The reactive configurations were then used in umbrella sampling QM/MM simulations to determine the function of active site residues and the role that water molecules play in this reaction. The importance of these water molecules is not easily obtained except through these specialized simulations. Through evolution, an S-HPCDH has arisen to catalyze the oxidation of S-HPC. R- and S-HPCDH are one of two cases in which pairs of stereospecific dehydrogenases act in concert in one metabolic pathway. S-HPCDH shares only 41% sequence identity with R-HPCDH and its structure is unknown. Therefore, several comparative modeling and docking programs were examined for their ability to output an approximate Michaelis complex model that not only identifies the evolved binding site but is also useful for detailed atomic simulation. Our results show that side chain placement by the program SCWRL3 was critical for subsequent docking and simulation. Docking S-HPC into the active site with the program AUTODOCK was generally successful in determining the new location of the sulfonate-binding site.
Finally, use of these models for classical MD simulation and subsequent QM/MM simulations of the reactions will be presented.

37: MOLECULAR SURFACE ABSTRACTION
Gregory Cipriano, George Phillips & Michael Gleicher (University of Wisconsin-Madison, USA)

We present tools to study protein interfaces. Our approach uses abstracted representations of the shape of the molecular surface and the physico-chemical properties around it at various levels of scale. We demonstrate three applications: visual inspection, crystal contact complementarity, and ligand pocket morphology.

Computational methods and strategies for constructing a Michaelis complex model in two evolutionarily related Dehydrogenases (one with a known crystallographic model and one without) as well as the accuracy of subsequent enzymatic reaction simulations with hybrid Quantum Mechanical/Molecular Mechanical (QM/MM) methods from these different models will be presented.

Elucidating the mechanism of enzymatic reactions and reproducing experimental reaction rates through computational methods remains an enormous challenge despite notable advances in computational methods and protein crystallography. In most cases, significant modifications of the enzyme crystallographic model must be undertaken in order to obtain a Michaelis complex model with all the critical interactions between substrate and enzyme present. These modifications minimally require the appropriate addition of protons to heavy atoms but could also include docking of the natural substrate into the active site, addition or reorientation of water molecules and alternative placement of key side chains. Finally, an appropriate and relatively computationally expensive Quantum Mechanical (QM) method in conjunction with simulation methods must be employed to obtain free energy profiles of competing mechanisms. The task of generating such a model is made even more challenging if the structure of the enzyme has not been determined through crystallography.
Yet, use of comparative modeling techniques is sufficient in cases of trivial sequence similarity to generate protein models that recapitulate several aspects of the actual structure. Unfortunately, the efficacy of comparative protein models for subsequent investigations with classical molecular dynamics (MD) simulations that employ molecular mechanical (MM) force fields is questionable, and even more so if resolution of mechanistic controversies is sought with QM/MM methods. In this presentation, we will first describe computational strategies for simulating R-Hydroxypropylthioethanesulfonate Dehydrogenase (HPCDH) enzymatic reactions starting from a crystallographic model (1.8 Å resolution) of the enzyme that contains a reaction product. In order to obtain a Michaelis

The goal of this project is to create tools for understanding and characterizing protein surfaces. In particular, we aim to enable the study of the interfaces between proteins and their interaction partners to enable the understanding and prediction of function based on protein structure. A premise of this analysis is that the shape of the protein's surface in the interacting region, and the physical and chemical

polar regions occupying a different percentage of the total contact area. For these we were able to confirm a hypothesized correlation between high salt concentration at the time of crystallization, and highly polar crystal-contact regions.

Morphological Study of Ligand Binding Pockets: We apply surface abstraction to study the binding pockets of common ligands. Abstracted representations are created for over 100 PDB entries with known bound ATP ligands. Examining the portions of the surfaces in proximity to the ligand reveals both diversity and common patterns in the active sites.
This study is enabled by the surface abstractions that afford statistical characterization of the diverse patches. In the future, we expect these statistical characterizations of protein surfaces can be applied to larger corpora of proteins to provide tools for automated characterization, annotation, and classification.

[1] Greg Cipriano, Michael Gleicher. "Molecular Surface Abstraction." IEEE Transactions on Visualization and Computer Graphics (Proceedings Visualization 2007). October 2007.

39: HIGH-THROUGHPUT CRYSTAL STRUCTURE PREDICTION OF DRUG-LIKE MOLECULES
Bashir Sadjad, Zsolt Zsoldos and Aniko Simon (Simulated

properties around it, are central to any interaction as they form the interface to partners. Tools for studying protein interaction, therefore, must consider these functional surfaces. However, due to their size and complexity, protein surfaces can be difficult both to assess visually and to characterize quantitatively. Therefore, abstracted (simplified) representations of the functional surface are important components of tools for studying protein interfaces. Abstracted representations afford easier visual inspection, more robust shape analysis, parameterizations for encoding properties on/around the surface, and areal descriptors that allow for statistical aggregation. We have developed molecular surface abstractions that provide a multi-scale representation of the molecular surface shape and physical properties around it [1]. These abstractions simplify the functional surface by first selectively removing high-frequency detail in the surface geometry. Other physico-chemical properties (e.g. charge, hydrophobicity) are then aggregated and smoothed, to produce a coarse representation of the original fields. To avoid bias, fields are sampled onto the surface, and then aggregated according to the overall smoothing amount.
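The aggregation step described above can be pictured with a small sketch: a scalar field sampled at surface points is replaced, at each point, by a Gaussian-weighted average of the values at nearby points. This is only an illustration under simplifying assumptions. It uses straight-line (Euclidean) distance between samples rather than distance along the surface, the sample points and charges are invented, and `smooth_field` is a hypothetical name, not a function from the authors' software.

```python
import math


def smooth_field(points, values, sigma):
    """Gaussian-weighted average of `values` at every sample point.

    `points` are 3D coordinates of surface samples, `values` the field
    (e.g. charge) at those samples, and `sigma` the kernel width.
    """
    if len(points) != len(values):
        raise ValueError("points and values must have the same length")
    smoothed = []
    for p in points:
        # Weight every sample by its distance from the current point.
        weights = [
            math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2)) for q in points
        ]
        total = sum(weights)  # always > 0: the point itself has weight 1
        smoothed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return smoothed


# Invented samples along a strip of surface, with alternating charges.
points = [(float(i), 0.0, 0.0) for i in range(6)]
charges = [1.0, -1.0, 1.0, -1.0, 1.0, 1.0]

for sigma in (0.5, 2.0):
    coarse = smooth_field(points, charges, sigma)
    print(sigma, [round(c, 2) for c in coarse])
```

A small kernel leaves the alternating pattern largely intact, while a larger kernel averages it away and keeps only the broad trend, which is the sense in which the abstraction emphasizes major regions and suppresses detail.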
For surface analysis, this process can be repeated over multiple smoothing kernel sizes, producing a hierarchy of features for a given surface point. To date, we have explored molecular surface abstraction in three applications:

Visual Abstraction of Protein Surfaces: We provide a tool for visual inspection of the functional surfaces of proteins that displays abstracted views [1]. These views depict the surface with detail suppressed, using coloring, surface textures and symbols. The included figure shows striped yellow patches to denote regions of the surface in contact with known ligands, 'H' symbols to highlight potential hydrogen bonding regions, and surface coloring to indicate electrostatic charge, which has been abstracted to emphasize major positive and negative regions. The abstractions facilitate comparison: the figure depicts ribonuclease proteins from two frog species (1M07 and 215S) whose enzymatic activity varies by a factor of 100. The important similarities and differences between these contact regions are readily apparent because extraneous detail has been removed. Also note that, though the charge distribution differs between the two surfaces, it remains essentially the same in the contact regions.

Complementarity Analysis of Crystal Contacts: We apply surface abstraction to study protein-protein interaction in the context of crystal contacts in a packed crystal structure. In these cases, abstracting the surface can help to 'flatten' the contact patch, as the geometric smoothing step removes high-frequency detail that can confound surface parameterization. This, in turn, simplifies the task of assessing the properties of each patch, and allows registration of one patch with another to study both sides of a contact region. We show such a registration in the accompanying figure. As a proof of concept, we looked at four crystallizations of myoglobin (1BZP, 1DTI, 1JW8, 1U7R), each occupying a different space group and having

Biomolecular Systems, Canada).
We are developing a method to predict crystal structures of drug-like molecules. This supports the inclusion of physical/material properties in the lead optimization process. The initial results for rigid molecules show that our method is capable of producing crystal structures very close to the observed experimental ones, preserving key interactions.

Computational methods are widely used by pharmaceutical companies. High-throughput screening (HTS) and its 'in silico' virtual version (VHTS) have improved the process of finding 'hits' by enabling discovery chemists to test big libraries of molecules against a target. While drug potency and selectivity have traditionally been the main targets of the lead optimization stage, recently there has been a shift toward including the physical properties of the drug form (e.g., solubility) in this stage [1]. We are trying to develop a computational method to predict the possible crystal structures first and their corresponding lattice energies second. This is analogous to the VHTS tools used for docking, and we call it high-throughput crystal prediction or HTCP. There are many groups that are trying

shows two sample crystal structures of a cyclic amide overlaid. Our closest predicted structure is shown by molecules with green bonds and the experimental structure is the CSD refcode RUVZEN (the picture is generated using 'mercury', the visualization software of CCDC). The RMSD between the 6 closest neighbors is 1.02 angstroms in this example. The key hydrogen bonds are also shown and it can be seen that they are preserved in the predicted structure. The closest structure generated by our method has an average RMSD of 1.08 angstroms for the test set we used. There are two major issues that we are currently working on. The first is the ranking of the generated structures.
It is important to use a fast scoring function in the initial search. However, once a subset of good candidates is selected for further optimization, a more accurate scoring function is required to properly rank them. The second issue is to add flexibility to our search, to be able to optimize the conformation and unit cell parameters at the same time. In future we are planning to extend the current method to work for more complex asymmetric unit cells as well. This means inclusion of salts or water molecules in the unit cell or extending the asymmetric unit cell to more than one molecule.

REFERENCES
[1] Gardner, C.R., Walsh, C.T. and Almarsson, O., Drugs as materials: valuing physical form in drug discovery, Nature Reviews Drug Discovery, 2004, 3(11), 926-34.
[2] Day, G. M. et al., A third blind test of crystal structure prediction, Acta Crystallographica Section B, 2005, 61(Pt. 5), 511-527.
[3] Zsoldos, Z., et al., eHiTS: a new fast, exhaustive flexible ligand docking system, Journal of Molecular Graphics and Modelling, 2007, 26(1), 198-212.
[4] Allen, F. H., The Cambridge Structural Database: a quarter of a million crystal structures and rising, Acta Crystallographica Section B, 2002, 58(1 Pt. 3), 380-388.
[5] Chan, T.M. and Sadjad, B.S., Geometric optimization problems over sliding windows, International Journal of Computational Geometry and Applications, 2006, 16, 145-157.

40: THE JENA LIBRARY OF BIOLOGICAL MACROMOLECULES - JENALIB: NEW FEATURES Rolf Huehne, Frank-Thomas Koch and Juergen Suehnel (Fritz Lipmann Institute, Germany)

to build such a computational method and there have been several blind tests held by the Cambridge Crystallographic Data Centre (CCDC). The third of these, in 2004, showed that there is still a long way to go to reliably predict crystal structures, especially for flexible molecules [2].
From the geometric search perspective, some of the current tools use stochastic methods, while others are more systematic but use grids too coarse to reliably produce structures close enough to the experimental ones. We are trying to develop a method that can guarantee a certain accuracy level while having an acceptable speed. There are some fundamental new elements in our search approach. We start from a pair of neighbor molecules in a hypothetical crystal structure. For this step we rely on the fast shape fitting engine developed for the eHiTS docking software [3]. Fixing a pair of molecules puts some constraints on the crystal space group and unit cell parameters. On the other hand, we only generate pairs that satisfy a set of geometric and energetic constraints. These constraints are extracted from statistics collected from the Cambridge Structural Database (CSD) [4]. Some of these constraints are purely geometric; for example, we know that for each molecule there should be another molecule where a certain ratio of the two molecules' surfaces is in contact with each other. Some other constraints depend on the physicochemical properties of the molecules and the type of interactions they have. The bottom line is that all these constraints are statistically validated using the vast information stored in the CSD. Our scoring function is also based on statistics collected from the CSD. We define a set of interaction types. For each occurrence of an interaction type in the CSD, we collect the relative geometry of the participating atoms. Estimating the expected probability for each configuration (i.e., an interaction type plus the relative geometry of participating atoms in it), we assign an energy value to each configuration using ideas inspired by the Boltzmann distribution. Efficiency is a major concern for our HTCP tool, mainly because the number of structures that we generate is huge (hundreds of thousands to millions).
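The Boltzmann-inspired scoring idea mentioned above can be illustrated with a minimal sketch. The abstract does not give the actual functional form, binning, or reference state, so everything below is an assumption: the function name, the use of an inverse-Boltzmann ratio of observed to expected probability, the pseudocount, and the example distance bins are all hypothetical.

```python
import math
from collections import Counter

def inverse_boltzmann_energies(observed_bins, reference_probs, kT=1.0, pseudocount=1.0):
    """Turn binned observations into energy-like scores (hypothetical helper).

    observed_bins: iterable of bin labels, one per observed interaction geometry.
    reference_probs: dict mapping bin label -> expected probability of that bin.
    Returns a dict mapping bin label -> -kT * ln(P_observed / P_expected).
    """
    counts = Counter(observed_bins)
    # The pseudocount keeps unobserved bins from producing log(0).
    total = sum(counts.values()) + pseudocount * len(reference_probs)
    energies = {}
    for bin_label, p_ref in reference_probs.items():
        p_obs = (counts.get(bin_label, 0) + pseudocount) / total
        energies[bin_label] = -kT * math.log(p_obs / p_ref)
    return energies

# Made-up example: one interaction type, two distance bins (in angstroms).
observed = ["2.8-3.0"] * 8 + ["3.0-3.2"] * 2
reference = {"2.8-3.0": 0.5, "3.0-3.2": 0.5}
energies = inverse_boltzmann_energies(observed, reference)
```

Geometries seen more often than the reference predicts receive negative (favorable) scores, and rarer ones receive positive scores.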
For this reason we have developed special structures and algorithms to process molecules. For example, we use a set of vectors for systematic sampling of the shape of a molecular fragment. The shape of the fragment is represented by the lengths of these vectors from the center of the fragment to the point where the vector intersects the surface of the molecule. This allows fast generation of non-clashing neighbor molecules as the initial pairs used for crystal structure generation. It can also guarantee a certain level of geometric accuracy because of the bounds proven for the vector set used (this is the basis of some of the geometric approximation algorithms [5]). Our tests on a set of rigid molecules show that our method is capable of generating a structure very close to the experimental one. To compare two crystal structures we followed the method used in the aforementioned blind tests, which is to overlay one central molecule and calculate the RMSD value for a set of neighbor molecules. The figure

The JenaLib (www.fli-leibniz.de/IMAGE.html) offers value-added information for all PDB and NDB database entries, e.g.: PDB/NDB atlas pages, QuickSearch, PDB/UniProt alignments,

Disulfides are generally viewed as structurally stabilizing elements in proteins. However, it is well known that some disulfides are redox-active and capable of being reduced under physiological conditions. The enzymatic role of redox-active disulfides in thiol-disulfide reductases is generally appreciated, but it is less well known that redox-active disulfides also act as redox-sensitive switches of protein function [1,2]. Thiol-based regulatory control of proteins has been demonstrated to be an important physiological control mechanism in response to changing redox conditions.
In particular, redox control of disulfides has been shown to mediate the oxidative stress response via control of transcription factors and other signalling molecules. It is likely to be important in pathological conditions involving abnormal redox states such as cardiovascular failure and ageing [3]. The ability to distinguish between structural and redox-active disulfides is important for elucidating protein function. Experimentally, the two types of disulfide can be distinguished by their redox potentials. Disulfide redox potentials measured in thiol-disulfide oxidoreductases range from -120 mV to -270 mV [4-7]. For disulfides serving structural purposes, the redox potential can be as low as -470 mV [8]. However, individual measurements of this kind are difficult and time consuming. A computational approach that can identify and characterize redox-active disulfides will contribute significantly to our understanding of disulfide redox activity. Our work seeks to understand the physical principles of disulfide redox activity in protein structure. It has been observed that sources of strain in a protein structure, such as residues in forbidden regions of the Ramachandran plot and cis-peptide bonds, are found in functionally important regions of the protein and warrant further investigation [9-11]. We hypothesize that disulfides that disobey known rules of protein stereochemistry have functional importance via redox activity. The Thornton-Richardson rules of disulfide stereochemistry specify that disulfide bonds should not be found between cysteine pairs [12,13]: A. on adjacent β-strands; B. in a single helix or strand; C. on non-adjacent strands of the same β-sheet; D. adjacent in the sequence. In previous work in our lab, we have characterized the cross-strand disulfide (CSD): a likely redox-active disulfide that violates rule A [14-16].
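As an illustration of how rules A to D above might be checked programmatically, here is a minimal sketch. It is not the authors' implementation: the `Cys` record, the `violated_rules` function, and the `adjacent_strands` input are hypothetical, and the sketch assumes that secondary structure elements, sheet membership, and strand adjacency have already been assigned by an external tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Set

@dataclass(frozen=True)
class Cys:
    """Hypothetical record for one half-cystine."""
    chain: str
    resnum: int
    ss_type: str = "C"                 # 'H' helix, 'E' strand, 'C' other
    element_id: Optional[int] = None   # id of the helix/strand holding the residue
    sheet_id: Optional[int] = None     # id of the sheet holding the strand

def violated_rules(a: Cys, b: Cys, adjacent_strands: Set[FrozenSet[int]]) -> List[str]:
    """Return the letters of the rules (A-D) that the disulfide a-b violates."""
    rules = []
    ids_known = a.element_id is not None and b.element_id is not None
    same_element = (ids_known and a.ss_type == b.ss_type
                    and a.ss_type in ("H", "E") and a.element_id == b.element_id)
    both_strand = a.ss_type == "E" and b.ss_type == "E"
    if both_strand and ids_known and not same_element:
        if frozenset((a.element_id, b.element_id)) in adjacent_strands:
            rules.append("A")          # adjacent strands
        elif a.sheet_id is not None and a.sheet_id == b.sheet_id:
            rules.append("C")          # non-adjacent strands, same sheet
    if same_element:
        rules.append("B")              # single helix or strand
    if a.chain == b.chain and abs(a.resnum - b.resnum) == 1:
        rules.append("D")              # adjacent in the sequence
    return rules

# Made-up example: strands 1 and 2 are neighbours in sheet 1.
neighbours = {frozenset((1, 2))}
cross_strand = violated_rules(Cys("A", 10, "E", 1, 1), Cys("A", 25, "E", 2, 1), neighbours)
```

A disulfide returning an empty list obeys all four rules in this simplified model; any returned letter marks it as a candidate "forbidden" disulfide.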
CSDs come in two flavors: antiparallel (aCSDs), which straddle antiparallel β-strands and, more rarely, parallel (pCSDs), which bridge parallel β-strands. aCSDs are by far the most common type of forbidden disulfide in solved protein structures. Here we identify seven additional subtypes that violate the Thornton-Richardson rules of disulfide stereochemistry and examine evidence for their involvement with functional redox activity [17].

[1] Choi, H.J., Kim, S.J., Mukhopadhyay, P., Cho, S., Woo, J.R., Storz, G. and Ryu, S.E. (2001). Cell 105, 103-13.
[2] Littler, D.R. et al. (2004). J Biol Chem 279, 9298-305.

Jmol-based molecule viewer, SNP and PROSITE motif mapping. Most recent new features are: PFAM domain mapping, sequence pattern search, integration of various data (SAPs, Exons, Domains etc.) into PDB/UniProt alignments.

The Jena Library of Biological Macromolecules (JenaLib, www.fli-leibniz.de/IMAGE.html) offers value-added information for all entries included in the Protein Data Bank (PDB) and Nucleic Acid Database (NDB), e.g.:
- PDB/NDB atlas pages and entry lists
- PDB sequence information extracted from atomic coordinates
- PDB/UniProt alignments that clearly indicate gaps, mutations, numbering irregularities and modified residues
- Integration of data on single amino acid polymorphisms (SAPs), PROSITE motifs with PDB, GO and taxonomy information
- Platform-independent Jmol-based molecule viewer that offers integrated viewing of ligand, site, SAP, PROSITE and SCOP information both for asymmetric and biological units
- QuickSearch option that allows searching for PDB/NDB code, UniProt ID/Accession and other search terms in one input field

The most recent new features are:
- PFAM domain mapping, classification tree browser and visualization
- Integration of various data into PDB/UniProt alignment view, e.g.: SCOP/CATH/PFAM domains, PROSITE motifs, SAPs, Exons
- Sequence homology search option (BLAST)
- Sequence pattern search option

Offering all this information
and analysis tools in one place makes JenaLib a unique resource for the dissemination of 3D structural information on biological macromolecules.

41: COMPUTATIONAL INSIGHTS INTO REDOX-ACTIVE DISULFIDES IN PROTEIN STRUCTURES Samuel Fan, Richard George, Naomi Haworth & Merridee Wouters (Structural and Computational Biology Program, Victor Chang Cardiac Research Institute, Australia).

We are characterizing potentially redox-active disulfides in structures. Our previous studies investigated disulfide torsional energies and a structural motif associated with redox activity: the cross-strand disulfide, which links adjacent beta-strands. Here we searched for other "forbidden" disulfides which violate rules of protein stereochemistry and examine evidence supporting redox activity.

[3] Humphries, K.M., Szweda, P.A. and Szweda, L.I. (2006). Free Radical Res. 40, 1239-43.
[4] Huber-Wunderlich, M. and Glockshuber, R. (1998). Folding & Design 3, 161-71.
[5] Krause, G. and Holmgren, A. (1991). J Biol Chem 266, 4056-66.
[6] Lin, T.Y. and Kim, P.S. (1989). Biochemistry 28, 5282-7.
[7] Wunderlich, M. and Glockshuber, R. (1993). Protein Sci 2, 717-26.
[8] Gilbert, H.F. (1990). Adv Enzymol Relat Areas Mol Biol 63, 69-172.
[9] Gunasekaran, K., Ramakrishnan, C. and Balaram, P. (1996). J Mol Biol 264, 191-8.
[10] Pal, D. and Chakrabarti, P. (2002). Biopolymers 63, 195-206.
[11] Herzberg, O. and Moult, J. (1991). Proteins 11, 223-9.
[12] Richardson, J.S. (1981). Adv Protein Chem 34, 167-339.
[13] Thornton, J.M. (1981). J Mol Biol 151, 261-87.
[14] Haworth, N.L., Feng, L.L. and Wouters, M.A. (2006). J Bioinform Comput Biol 4, 155-68.
[15] Haworth, N.L., Gready, J. E., George, R.A., Wouters, M.A. (2007). in press.
[16] Wouters, M.A., Lau, K.K. and Hogg, P.J. (2004). BioEssays 26, 73-9.
[17] Wouters, M. A. George, R.
A., Haworth, N.L (2007) Current Peptide and Protein Science 8, 000

prediction, assessment, and web-based visualization of thousands of candidate models. A critical first step in comparative modeling is the accurate alignment of the target with the template structure. Errors introduced in the alignment phase will ultimately lead to incorrect models that cannot be improved without an aggressive and time-consuming refinement phase. It has been shown that by sampling along the alignment path with stochastic dynamic programming, so-called 'suboptimal' alignments can actually yield alignments as good as the structural alignment. Between 2,000 and 5,000 alternative alignments were generated as inputs to Modeller. By thoroughly sampling alignment space in this manner, we have identified several methods that can reliably identify the most native-like models among an ensemble. This approach exercises structural assessment rather than sequence-based homology measures. Our database, called SA-COMPAS, contains a detailed repository of template-based protein structures generated from alternative alignments. Targets were chosen from the CASP6 and CASP7 TBM category. Each model in the dataset has many assessment scores calculated. Assessments currently included in SA-COMPAS are the DOPE, DFIRE, and ProsaII atomic statistical potentials, Pcons and ProQres for global and local quality, the Rosetta energy score, and the CHARMM energy coupled with a Generalized Born implicit solvent model (MMGB). Two additional scores, Psipredpercent and Psipredweight, describe the agreement between the secondary structure predicted by Psipred and the model's actual secondary structure as derived by DSSP. Our results indicate that the scores based on secondary structure are the most effective for discriminating models with incorrect alignments. A low Psipred score generally means that secondary structures of the final model are not well formed.
In order to compare the effectiveness of these assessment scores, every predicted model in the database has been compared to the crystallographic coordinates in several ways. Perhaps the most important similarity score calculated is the GDT_TS score, which is standard in the CASP assessments. Other measures included are TMscore, MaxSub, RMSD, fraction of correct native contacts, and percent correct chi, psi, and rho values. Continuing efforts to add value to the databank will include adding more targets to enumerate the entire known fold space as well as calculating more quality assessment scores as they become available. Recently we have added comparative modeling experiments that considered multiple template structures. Given the increasing size of the PDB today, it is not uncommon to have cases where several good templates exist and it is nontrivial to select which one will ultimately produce a better model. Our dataset indicates whether or not structural assessment can consistently choose the best structure from ensembles having models built from multiple templates. The core capabilities of the SA-COMPAS database resource include searching the databank by keywords (literature), database accession numbers (PDB, UniProt), protein function, and sequence similarity (BLAST). Researchers interested in comparative modeling are able to graph scores

42: SA-COMPAS: A RESOURCE FOR PREDICTION, ASSESSMENT, AND WEB-BASED VISUALIZATION OF COMPARATIVE PROTEIN MODELS Adam Kraut and Troy Wymore (National Resource for Biomedical Supercomputing, Pittsburgh, USA)

Here we present a resource to manage the prediction, assessment, and web-based visualization of large ensembles of predicted protein structures. Over 300,000 structures have so far been predicted and assessed with model quality assessment programs, statistical potentials, and molecular mechanics energy calculations to compare their effectiveness in the context of comparative modeling.
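To make the GDT_TS measure mentioned above concrete, the following is a simplified sketch. It assumes the model and native C-alpha coordinates are already superimposed and matched residue by residue; the full GDT procedure searches over many superpositions for each distance cutoff, which this sketch does not do, so it gives only a lower-bound style approximation. The function name is hypothetical.

```python
import math

def gdt_ts_fixed_superposition(model_ca, native_ca, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS (0-100) from pre-superimposed C-alpha coordinates.

    model_ca, native_ca: equal-length sequences of (x, y, z) tuples in angstroms.
    """
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    distances = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]
    # Fraction of residues within each cutoff, averaged over the four cutoffs.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)

# Made-up example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
native = [(0.0, 0.0, 0.0)] * 4
model = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
score = gdt_ts_fixed_superposition(model, native)
```

In this example the per-cutoff fractions are 0.25, 0.5, 0.75 and 0.75, so the score is 56.25.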
Assessing the quality of predicted protein structures is an important problem of theoretical and practical interest. The most effective protein structure prediction methods are ones that generate large ensembles of possibilities and then employ statistical potentials or scoring functions to identify the best models among these ensembles. We have developed a set of resources called SA-COMPAS that manages the

actual interfaces that are involved in oligomerization are inferred from X-ray crystallographic structures using assumptions about interface surface areas and physical properties. In many cases, these hypothetical interfaces are correct, but in other cases they may not be. Our previous study showed that annotations on biological units in the Protein Data Bank (PDB) and the Protein Quaternary Structure server (PQS) agree only about 80% of the time. We thoroughly examined the interfaces in crystals of single homologous proteins in SCOP families. We attempted to answer several questions. First, when are two crystals of the same or similar proteins really the same crystal form and when are they not? We find, surprisingly, that having the same space group, asymmetric unit size, and very similar cell dimensions and angles (within 1%) does not guarantee that two PDB entries are actually the same crystal form, that is, that they contain similar relative orientations and interactions within the crystal. Conversely, two crystals in different space groups may be quite similar in terms of all of the interfaces within each crystal. Similar crystal forms can be combined into a crystal form group if all interfaces with ASA ≥ 200 Å² in one entry have corresponding interfaces in another entry and at least 2/3 of the interfaces with ASA ≥ 200 Å² in the second entry are similar to some interfaces of the first entry.
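The grouping criterion stated above can be sketched as a small predicate. This is an illustrative reading of the sentence, not the authors' code: the abstract does not define how two interfaces are judged similar, so `is_similar` is a hypothetical caller-supplied function, and the dictionary layout of an interface is likewise assumed.

```python
def same_crystal_form(interfaces_a, interfaces_b, is_similar,
                      min_area=200.0, min_fraction=2 / 3):
    """Check the crystal-form-group criterion for entry A against entry B.

    interfaces_a, interfaces_b: lists of dicts with at least an "area" key (in square angstroms).
    is_similar: hypothetical function (interface, interface) -> bool.
    """
    large_a = [i for i in interfaces_a if i["area"] >= min_area]
    large_b = [i for i in interfaces_b if i["area"] >= min_area]
    # Every large interface of A must have a counterpart somewhere in B.
    all_a_matched = all(any(is_similar(a, b) for b in interfaces_b) for a in large_a)
    if not large_b:
        fraction_b = 1.0
    else:
        matched_b = sum(any(is_similar(b, a) for a in interfaces_a) for b in large_b)
        fraction_b = matched_b / len(large_b)
    # At least two thirds of the large interfaces of B must be found in A.
    return all_a_matched and fraction_b >= min_fraction

# Made-up example in which interfaces are "similar" when their labels match.
by_label = lambda p, q: p["label"] == q["label"]
entry_a = [{"label": "x", "area": 500}, {"label": "y", "area": 300}, {"label": "z", "area": 100}]
entry_b = [{"label": "x", "area": 480}, {"label": "y", "area": 310}, {"label": "w", "area": 250}]
entry_c = [{"label": "x", "area": 480}, {"label": "w", "area": 250}, {"label": "v", "area": 400}]
```

As written the check is directional (A is the "one entry", B the "second entry"); whether the original analysis applied it symmetrically is not stated in the abstract.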
PDB entries in a family are then divided into different crystal form groups (“CFGs”). Second, we examined the hypothesis used by many crystallographers to infer biological interactions: observation of the same interface in different crystal forms of a protein (or members of the same family) suggests that the interface may be biologically relevant. We compared all interfaces in the available CFGs in each family and determined those shared by two or more CFGs. We determined the number of CFGs with a common interface, M, compared to the total number of different CFGs in the same family, N. The usefulness of these numbers is evaluated with prior benchmarks on oligomeric interactions as well as with NMR structures. NMR structures and the benchmark of PDB crystallographic entries consisting of 126 dimers and larger structures and 132 monomers were used to determine whether the existence or lack of existence of common interfaces across multiple crystal form groups can be used to predict whether a protein is an oligomer or not, and to identify oligomeric interfaces if they exist. Monomeric proteins tend to have common interfaces across only a minority of crystal form groups (M<<N), while higher order structures exhibit common interfaces across a majority of available crystal form groups (See the figure which plots the number of interfaces we find in the benchmark for dimers and oligomers vs M and N). The data can be used to estimate the probability that an interface is biological if two or more crystal form groups are available. We find 36 families in which all N out of N crystal form groups contain a particular interface, where N≥10. These interfaces are very likely to be physiological. Third, we examined the usefulness of evolutionary information in evaluating interfaces appearing in more than one crystal form. 
It often occurs that different crystal forms of identical proteins contain common interfaces but that

within and across ensembles, examine the correlation between assessment scores, and visually inspect the structure and the alignment within the web interface. We also provide tools for generating stochastic alignments as well as building models from alignments using Modeller9v4. Models, alignments, and all additional calculated data can be downloaded per target or in batch from our server. We have also cross-linked our database entries with external resources such as PDBsum, CATH, SCOP, and the RCSB PDB. Also of interest to researchers is the modern software architecture used for the web interface. The implementation of SA-COMPAS leverages many high-quality open-source technologies. The relational database engine is MySQL and currently contains almost 1,000,000 individual records. Server-side tasks are written in the Ruby programming language and follow a Model-View-Controller (MVC) pattern. Client-side interaction is handled by the jQuery JavaScript framework. Two well-established Java applets are used as additional components of the web interface: Jmol provides 3D molecular graphics and Jalview provides sequence alignment editing and graphics. SA-COMPAS is available at http://sacompas.cb.nrbsc.org

44: STATISTICAL ANALYSIS OF INTERFACES IN CRYSTALS OF HOMOLOGOUS PROTEINS Qifang Xu and Roland Dunbrack (FCCC, Philadelphia, USA)

Many proteins function as homooligomers. Examination of interfaces across different PDB entries in SCOP families identified common interfaces that exist in different crystal forms. Tests against benchmark data and NMR structures indicate that these interfaces are likely to be biological, and they can be used to predict the oligomeric status of a protein.

Many proteins function as homooligomers and are regulated via their oligomeric state.
Homooligomerization may be part of allosteric regulation, or contribute to conformational and thermal stabilities and to higher binding affinity with other molecules. Mutations in these interfaces may be associated with disease. For instance, Caffey disease is a genetic disorder caused by abnormal dimeric chain due to a missense mutation in exon of the gene encoding the a1 chain. For some proteins, the stoichiometry of homooligomeric states under various conditions has been studied using gel filtration or analytical ultracentrifugation experiments. The interfaces involved in these assemblies may be identified using crosslinking and mass spectrometry, solution-state NMR, and other experiments. But for most proteins, the

plasticity, learning and memory, and is believed to be the target through which the noble gas xenon produces general anesthesia. The NMDA receptor is a hetero-oligomer composed of three types of subunits: NR1 and NR2 (or NR3). Its activation requires binding of the neurotransmitter glutamate to the NR2 subunits and simultaneous binding of the co-agonist glycine to the NR1 subunits. With both of these native agonists bound, the S1S2 clefts in the extracellular binding domains are closed and the ion channel is opened, allowing permeation of ions. When an antagonist, such as 5,7-dichlorokynurenic acid (DCKA), is bound to NR1, the S1S2 cleft is opened and the ion channel is closed. Here, the opened or closed model refers to the structure with the S1S2 cleft (rather than the ion channel) opened or closed. This study focuses on the interaction of xenon with the ligand-binding domains of the NMDA receptor to gain insights into the possible mechanisms of xenon's anesthetic action. We chose two X-ray NMDA receptor structures and performed xenon docking using Autodock (1) and over 20 ns of MD simulations using NAMD2 (2) in the absence and presence of xenon.
The structure 1PBQ (two NR1 subunits complexed with the antagonist DCKA) represents the opened NMDA model (3), whereas 2A5T (a dimer of one NR1 subunit with glycine bound and one NR2 subunit with glutamate bound) represents the closed model (4). Our study revealed several potential xenon-binding sites in the ligand-binding domain, including the interface between Domain 1 of the two subunits and the hinge region of the S1S2 cleft adjacent to the glycine- and glutamate-binding sites. Our comparative study of the molecular dynamics of the binding domains indicates that xenon has different effects on the closed and the opened conformations of the S1S2 cleft. A previous investigation suggested that xenon's anesthetic effect might arise from xenon's competition with the native agonists for the binding sites (5). Although xenon binding sites exist near the glutamate and glycine binding sites, xenon occupation of these sites does not displace the native agonists in the 20-ns simulations, and the closed model (2A5T) is unchanged and seems insensitive to xenon. In contrast, in the opened conformation (1PBQ), where the native agonists are absent, xenon enhances the opening of the S1S2 cleft, suggesting that xenon's competition with glycine is indirect: it stabilizes the opened conformation of the S1S2 cleft and thereby makes glycine binding to the open cleft less favorable. Another possible mechanism of xenon action on the NMDA receptor involves disruption of the "communication" between two subunits. Previous studies suggest that the functional formation of the dimer occurs due to interaction between the Domain 1 of two subunits (6). Our MD simulations revealed that xenon binding at the Domain 1 interface near the hinge of the S1S2 cleft is stable over the course of the simulation. This interaction may also contribute to xenon's effects on NMDA receptor function. This study was supported by a grant from the NIH (R01GM066358 and R01GM056257) and NCSA through PSC.
REFERENCES these usually appear in only 2 or 3 such forms and are not shared by homologous proteins. That is, they are probably only formed under non-physiological crystallization conditions including high protein concentration, peculiar pH, and the presence of nonphysiological ligands. This has previously been observed for T4 lysozyme, which has been studied in many crystal forms. The benchmark data indicate that when an interface is shared in as few as two different crystal forms by divergent proteins (<90% identity), then the interface is very likely to be biologically important. This highlights the importance of solving structures of related proteins. We also find that in large families, some interfaces are restricted to one branch of a family, indicating the evolution of an interface in one branch of the family and/or loss in another. Finally, we compared interfaces common to multiple crystal forms with the annotations found in the PDB, PQS, and PISA. With an increasing number of crystal form groups that contain a given interface, it becomes increasingly likely that the available annotations agree that such an interface is part of a biologically relevant assembly. PISA is found to be the most reliable in identifying interfaces for which the evidence, in terms of number of crystal forms containing the interface, seems very high. PISA is therefore the best source of biological assembly information when only one or two crystal forms are currently available. The PDB in particular is missing highly likely biological interfaces in its biological unit files for about 10% of PDB entries. 
45: XENON EFFECTS ON LIGAND BINDING DOMAIN OF NMDA RECEPTOR Lu Liu (University of Pittsburgh School of Medicine, USA), Yan Xu (Department of Anesthesiology, Pharmacology, Structural Biology, and Computational Biology, University of Pittsburgh School of Medicine, USA) & Pei Tang (Department of Anesthesiology, Pharmacology, Structural Biology, and Computational Biology, University of Pittsburgh School of Medicine, USA)

MD simulations of the interaction of xenon with its putative anesthesia target, the N-methyl-D-aspartate (NMDA) receptor, suggest two possible mechanisms of action: to enhance opening of the agonist binding domains to prevent agonist binding; and to reside at the domain interface to disrupt the functionally important interplay between subunits.

The N-methyl-D-aspartate (NMDA) receptor is a member of the excitatory neurotransmitter receptors essential for synaptic

1. Huey, R., Morris, G. M., Olson, A. J. and Goodsell, D. S. (2007) J. Computational Chemistry 28, 1145-1152.
2. James C. Phillips, R. B., Wei Wang, James Gumbart, Emad Tajkhorshid, Elizabeth Villa, Christophe Chipot, Robert D. Skeel, Laxmikant Kale, and Klaus Schulten. (2005) Journal of Computational Chemistry 26, 1781-1802.
3. Furukawa, H. and Gouaux, E. (2003) The EMBO Journal 22, 2873-2885.
4. Furukawa, H., Singh, S. K., Mancusso, R. and Gouaux, E. (2005) Nature 438, 185-192.
5. Dickinson, R., Peterson, B. K., Banks, P., Simillis, C., Martin, J. C., Valenzuela, C. A., Maze, M., and Franks, N. P. (2007) Anesthesiology 107, 756-767.
6. Armstrong, N., Jasti, J., Beich-Frandsen, M. and Gouaux, E. (2006) Cell 127, 85-97.

molecule is coupled with three sodium ions, one proton and followed by the counter transport of one potassium ion yielding an electrogenic uptake.
Five members of the human glutamate transporter family (EAAT1-5) have been characterized after the first cloning of three rat transporters, GLAST, GLT1 and EAAC1. This family also includes two neutral amino acid transporters, ASCT1 and ASCT2, as well as a number of homologous prokaryotic amino acid and dicarboxylate transporters (2). Topology models based on cysteine-scanning accessibility studies of the mammalian and bacterial carriers were recently advanced by the crystal structure of an archaeal transporter, GltPh (3). The first half of the protein forms six α-helical transmembrane (TM) segments, and the second half comprises two reentrant loops (HP1 and HP2), a seventh TM helix interrupted by a β linker, and an amphipathic TM8 helix (Fig.).
Top Fig. panel. Trimeric structure of the glutamate transporter GltPh (1xfh). Top view from the extracellular side (left) and side view perpendicular to the membrane bilayer (right).
Molecular dynamics simulations based on the crystal structure of GltPh showed that the HP2 loops possess a strong tendency to move away from the substrate binding site in the absence of a substrate. Gaussian network models (GNM) (4) and anisotropic network models (ANM) (5), on the other hand, suggest large-scale motions of the extracellular region of glutamate transporters, which would facilitate the cross-linking of the single cysteine mutants made in this region. The low-frequency modes within these models have frequently been identified to be functionally important (6). Here, the first non-degenerate mode reveals a symmetric opening/closing of the extracellular vestibule (top panel). This kind of motion perhaps plays a significant role in substrate recognition and binding.
Bottom Fig. panel. Symmetric opening/closing of GltPh in the first non-degenerate ANM mode. Left and right figures display the ANM-predicted closed and open conformations, respectively.
In the central figure, corresponding to the X-ray structure, the basin is exposed to the EC aqueous environment, while in the closed form contacts between neighboring subunits occur (see for example the L34 loops colored red).
This was confirmed by cysteine cross-linking experiments in the cysteine-less version of EAAT1, leading to functional defects in the glutamate transporter. A series of single cysteine mutants made in HP2b residues formed intersubunit cross-links spontaneously and/or when catalyzed by an oxidizing agent, copper phenanthroline (CuPh). The substrate accumulation activity of these mutants is completely inhibited after cross-linking, which can be reversed by treatment with DTT. With mutants V449C and V453C, we found that substrate or its analog D,L-threo-β-benzyloxyaspartate (TBOA) can prevent the inhibition of uptake activity during cross-linking. Conformational changes of biomolecules at different time scales are associated with their functions. Ideally, researchers would like to watch individual atoms moving within a protein. However, it is experimentally impossible at
47: LARGE SCALE MOTIONS IN GLUTAMATE TRANSPORTERS REVEALED BY ELASTIC NETWORK MODELS AND CYSTEINE CROSS-LINKING STUDIES
Indira H Shrivastava1, Jie Jiang2, Susan G. Amara2 & Ivet Bahar1 (1Department of Computational Biology & 2Department of Neurobiology, University of Pittsburgh, USA).
We examined the most cooperative motions of the glutamate transporter using elastic network models. Our study suggests that the three subunits of the protein undergo concerted fluctuations that alternately increase/decrease the accessibility of the central aqueous basin to the extracellular region. These large-scale motions are supported by cysteine cross-linking experiments in the mutants V449C and V453C of the cysteine-less version of the human excitatory amino acid transporter (EAAT1).
Glutamate transporters, also termed excitatory amino acid transporters (EAATs), belong to a secondary active transporter family which utilizes the free energy stored in ion or solute gradients. These membrane proteins remove excess glutamate from the neuronal synapse, ensuring precise synaptic communication between neurons and preventing glutamate toxicity. Malfunction of glutamate transporters has been implicated in neurological diseases and psychiatric disorders (1). The influx of one glutamate
presents significant challenges in understanding how sequence ultimately conveys both structure and function. As concluded from CASP, CAPRI, and similar experiments, the best structure often does not get the top score. To more fully succeed at these tasks, critical insights into the protein sequence-structure-function relationships may be obtained through the characterization of the sequence space compatible with a protein structure. The task of engineering a protein to assume a target three-dimensional structure is known as protein design. Practical applications of design include modifications of existing proteins to affect such characteristics as stability or binding affinity. A more ambitious goal is to design protein sequences that will assume novel structures or acquire new functionalities. Computational search algorithms are devised to predict a minimal energy amino acid sequence for a particular structure. In practice, however, an ensemble of low energy sequences is often sought. Primarily, this is done because an individual predicted low energy sequence may not necessarily fold to the target structure, due to both inaccuracies in modeling protein energetics and the non-optimal nature of the search algorithms employed. Also, some low energy sequences may be overly stable and thus lack the dynamic flexibility required for biological functionality.
Thus, a thorough understanding of the low energy sequence space will enhance protein design efforts and also allow designers to focus on structural positions with high potential to be successfully mutated. Moreover, the investigation of low energy sequence ensembles will provide crucial insights into the pseudo-physical energy force fields that have been derived to describe structural energetics for protein design. Significantly, numerous studies have predicted low energy sequences, which were subsequently synthesized and demonstrated to fold to desired structures. However, the characterization of the sequence space defined by such energy functions as compatible with a target structure has not been performed in full detail. Thus, we are interested in exploring the near-optimal sequence space induced by a widely used energy function (in this case, the Rosetta function), in an attempt to comprehend the predictive consequences of using such energy functions for protein design.
Methods and Results
In this work, we present a conceptually novel algorithm that rapidly predicts the set of lowest energy sequences for a target structure. Based on the theory of probabilistic graphical models, our algorithm performs efficient inspection and partitioning of the near-optimal sequence space, without making any assumptions of positional independence. Specifically, the underlying computational tool we utilize is the representation of the protein design energy optimization problem as a probabilistic graphical model and subsequent application of the max-product loopy belief propagation algorithm for finding high probability sequences. We thus efficiently find minimal energy sequences when the underlying search space includes numerous possible rotamers (discrete side-chain
present and the dynamics of proteins are mostly inferred from sophisticated biophysical methods which measure physical properties.
Alternatively, mutagenesis studies may provide some valuable information on the in-depth mechanism of proteins if mutants or sulfhydryl modifications of cysteine substitutions lock biomolecules in specific conformational states. In contrast, computational simulations have the advantage of describing protein dynamics in full detail, since they can follow the precise position of each atom at any instant in time, provided a high-resolution crystal structure of the protein is known. Therefore, the combination of experimental studies with computational simulations is of great importance for elucidating allosteric motions bearing functional significance.
REFERENCES
1. Amara, S. G., and Fontana, A. C. (2002) Neurochem Int 41(5), 313-318.
2. Slotboom, D. J., Konings, W. N., and Lolkema, J. S. (1999) Microbiol Mol Biol Rev 63(2), 293-307.
3. Yernool, D., Boudker, O., Jin, Y., and Gouaux, E. (2004) Nature 431(7010), 811-818.
4. Bahar, I., Atilgan, A. R., and Erman, B. (1997) Fold Des 2(3), 173-181.
5. Xu, C., Tobi, D., and Bahar, I. (2003) J Mol Biol 333(1), 153-168.
6. Bahar, I., and Rader, A. J. (2005) Curr Opin Struct Biol 15(5), 586-592.
48: ACCURATE PREDICTION OF THE NEAR-OPTIMAL SEQUENCE SPACE FOR ATOMIC-LEVEL PROTEIN DESIGN
Menachem Fromer (The Hebrew University of Jerusalem, Israel) & Chen Yanover
Characterization of the sequence space compatible with a protein structure will provide insights into the sequence-structure-function relationship. We present a novel algorithm, based on probabilistic graphical modeling, to obtain near-optimal sequences for protein design. Our approach obtains lower energy ensembles as compared to state-of-the-art methods and suggests intriguing biological insights.
Introduction
After decades of computational and experimental research on protein structure-function relationships, the 'protein folding problem' is solved to a certain degree.
Nevertheless, while prediction methods often find the correct fold (and even a structure with low RMSD to the targeted structure), the additional step of identifying the top-scoring structure
conformations) for each amino acid type, without considering any sequence more than once. We benchmark the performance of our novel algorithm on a diverse set of protein design examples taken from the literature and show that it consistently yields sequences of lower energy than those derived from state-of-the-art techniques (e.g. DEE, A*, Monte Carlo simulated annealing). Thus, we find that previously presented search techniques do not depict the low energy space as precisely. We also observe that for cases when the complete set of lowest energy sequences can be exhaustively enumerated, the algorithm empirically obtains this set. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as to not incur significant energetic penalties. We investigate this high degree of sequence and biochemical similarity and demonstrate how more diverse near-optimal sequences can be predicted by our algorithm in order to systematically overcome this bottleneck for computational design. Furthermore, we exploit our in-depth analysis of a collection of low energy sequences to generate novel biological hypotheses. This is possible since, in effect, a set of low energy sequences for the target structure characterizes sequences well-suited to fold to the structure. This information is summarized in a sequence profile (position-specific scoring matrix, PSSM) that tabulates the positional amino acid probabilities for sequences predicted to fold to the structure.
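The sequence profile described above can be illustrated with a minimal sketch. It assumes that the predicted low energy sequences are available as equal-length strings over the 20 standard amino acids and that every sequence carries equal weight; the abstract does not state how sequences are weighted. The function name `sequence_profile` and the optional pseudocount are hypothetical, introduced only for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_profile(sequences, pseudocount=0.0):
    """Tabulate per-position amino acid probabilities for an ensemble.

    Assumption: all sequences have the same length and equal weight.
    Returns one dict (residue -> probability) per position.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    for seq in sequences:
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        if not set(seq) <= set(AMINO_ACIDS):
            raise ValueError("non-standard residue in " + seq)

    total = len(sequences) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        # Count how often each residue occurs at this position.
        counts = Counter(seq[pos] for seq in sequences)
        profile.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile
```

For example, `sequence_profile(["ACD", "ACE", "GCD", "ACD"])` gives alanine a probability of 0.75 at the first position and cysteine a probability of 1.0 at the second. A position where one residue dominates in this way corresponds to the highly selective positions noted in the text.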
These profiles were then studied and used to suggest an interpretation of previously observed experimental design results for the calmodulin (CaM) protein. In conclusion, the novel methodologies introduced here accurately portray the sequence space compatible with a protein structure, thus providing a powerful instrument for future work on protein design. In addition, we have supplied a generic and customizable scheme to yield heterogeneous low energy sequences predicted to fold to a target structure. By providing an arsenal of varied (yet near-optimal) sequences, this protocol adds a layer of robustness to the design process and can thus play a critical role in enabling protein scientists to successfully continue developing novel proteins at an ever-increasing pace and scale.
CATH database. We found that the distribution of protein conformational diversity is very heterogeneous, even at the S60 level of homologous superfamilies. We found that this distribution is correlated with functional diversification.
It is well established that the native state of a protein is better described by a set of conformers with about the same energy and in dynamic equilibrium. This conformational diversity is a key feature of proteins for understanding their functions and sequence-structure relationship. Since the pioneering experiments of Max Perutz in the early 1960s, with his studies on the T and R forms of hemoglobin, the study of protein conformations has had a central role in several areas of structural biology, such as functional characterization, drug design, the development of docking and structural alignment techniques, and the understanding of protein evolution. Here we study the extent and distribution of conformational diversity in proteins. To this end, we have used proteins with more than one crystallographic structure as derived from the CATH structural database. We collected all the proteins sharing the first 7 codes of the CATH structural classification.
In those cases where different chains of a given oligomeric structure were present, a single representative domain was chosen randomly. After this, we obtained 7700 proteins, 45% of which have at least 2 crystallographic structures. Using this derived database we estimated the conformational diversity for each protein, using different measures of structural similarity such as RMSD, TM-score, GDT and MaxSub scores. An all-versus-all calculation of those structural similarity scores was performed using the structures for each protein in the derived database mentioned above. Then, for each score the maximum structural dissimilarity was registered and these values were used to estimate conformational diversity. This information was complemented with properties taken from other databases, such as protein length, oligomeric state (PQS and PDB), taxonomy (NCBI taxonomy database), functional annotation (GO terms and EC), and presence of ligands (PDB). In general we found that conformational diversity does not depend on protein length or on the number of crystallized structures for each protein. Using the CATH structural classification, each of the three main structural classes (mainly alpha, mainly beta and mixed alpha and beta) seems to have a similar content of conformational diversity. However, architectures and topologies within each class showed a clear heterogeneity in the extent of conformational diversity. At this level we found a strong correlation with functional diversity using the GO terms classification of proteins. These results reflect the great diversity found when the homologous superfamily level was evaluated. At this level, and in spite of the great structural similarity, the heterogeneity in conformational diversity is observed up to the S60 level (homologous families with more than 60% identity). We also found that the extent of conformational diversity does not depend on the presence/absence of ligands.
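The all-versus-all step described above, in which the maximum pairwise dissimilarity among the structures of one protein is taken as its conformational diversity, can be sketched as follows. This is a minimal illustration under stated assumptions: each conformer is a list of (x, y, z) coordinates for the same atoms in the same order, and the conformers have already been optimally superposed, so the `rmsd` helper below does no fitting. The function names are hypothetical, and any other dissimilarity measure could be passed in place of `rmsd`.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-superposed conformers.

    Assumption: both lists hold the same atoms in the same order and the
    structures are already superposed (no fitting is performed here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformers must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def conformational_diversity(conformers, dissimilarity=rmsd):
    """Maximum pairwise dissimilarity over all conformers of one protein."""
    if len(conformers) < 2:
        return 0.0  # a single structure carries no diversity information
    return max(dissimilarity(a, b) for a, b in combinations(conformers, 2))
```

With three two-atom conformers whose second atom sits at (1, 0, 0), (1, 1, 0) and (1, 3, 0), the largest pairwise RMSD is between the first and the third, so the returned value is sqrt(4.5).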
49: DISTRIBUTION AND EXTENSION OF PROTEIN CONFORMATIONAL DIVERSITY
Ezequiel Iván Juritz, Sebastián Fernández Alberti and Gustavo Parisi (Quilmes National University, Argentina)
We studied the extension and distribution of protein conformational diversity as derived from the analysis of
probes (small organic molecules or functional groups), each in a very large number of poses. The binding free energy expression includes truncated van der Waals interactions (both attractive and repulsive), a simplified PB electrostatic term, and a structure-based pairwise potential. This function provides adequate accuracy and can be written in the form of a sum of correlations, which is suitable for calculation using FFT. After adapting the energy function to model probe binding to integral membrane proteins, the mapping applied to the open and closed channel structures provided very informative results. The open structure (determined by X-ray crystallography) has its energetically most important “hot spot” at the experimental drug binding site. No other binding site is found, as some residues at the external binding sites are not present in the structure, but there is a strong free energy “field” toward the internal site. We also docked amantadine and have shown that it binds at the experimentally determined position inside the channel. For the closed (NMR-derived) structure the mapping finds “hot spots” both at the internal site and at all four external sites. Docking of amantadine shows that it does not fit inside the closed structure, in spite of the strong hot spot, but binds outside as seen in the NMR structure. These results suggest that at high pH the four externally bound inhibitors improve the stability of the closed state, but as the pH is decreased and the channel opens, the inhibitor may shift to the internal site, blocking proton transfer.
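The statement that the energy function can be written as a sum of correlations suitable for FFT can be made concrete with a short worked form. The notation below is an assumption introduced for illustration and is not taken from the abstract: $R_p$ and $L_p$ are the $p$-th receptor and probe grids (one pair per energy term), and $(\alpha, \beta, \gamma)$ is the translation of the probe on the grid. Grids are taken to be real-valued and periodic, which in practice means padding them to avoid wrap-around.

```latex
% Energy of a probe pose as a sum of P cross-correlations on a grid
E(\alpha,\beta,\gamma)
  = \sum_{p=1}^{P} \sum_{i,j,k}
    R_p(i,j,k)\, L_p(i+\alpha,\; j+\beta,\; k+\gamma)

% Correlation theorem: each term is obtained for all translations at once
\sum_{i,j,k} R_p(i,j,k)\, L_p(i+\alpha,\; j+\beta,\; k+\gamma)
  = \mathcal{F}^{-1}\!\left[\,
      \overline{\mathcal{F}[R_p]} \cdot \mathcal{F}[L_p]
    \,\right](\alpha,\beta,\gamma)
```

For a grid with $N$ points per side, evaluating every translation directly costs on the order of $N^6$ operations per probe rotation, whereas the transform route costs on the order of $N^3 \log N$ per energy term. This is what makes a global search over a very large number of poses tractable.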
52: EXPLORING THE ACTIVATION MECHANISM OF A G-PROTEIN-COUPLED PROTEIN RECEPTOR, RHODOPSIN, USING NORMAL MODES FROM COARSE-GRAINED ELASTIC NETWORK MODELS IN MOLECULAR DYNAMICS SIMULATIONS
Our results indicate that the distribution of protein conformational diversity is not uniform in the structural space provided by the CATH database. There is a strong heterogeneity in the extent of protein conformational diversity, even within the S60 level. Although several reports indicate that homologous proteins with the same overall fold share their dynamics and structural deformations, here we found that the extent of the conformational diversity reached by a given protein is strongly influenced by functional constraints during evolution.
51: ANALYSIS OF POTENTIAL PROTON CHANNEL INHIBITION MECHANISMS BY COMPUTATIONAL PROTEIN MAPPING
Dima Kozakov, Gwo-Yu Chuang, Dmitry Beglov, Ryan Brenke and Sandor Vajda (Boston University, USA).
The influenza A virus proton channel in the open state binds an inhibitor in the middle of the four-helix channel, whereas in the closed state it binds four inhibitor molecules on the outer surface. Computational solvent mapping, a technique developed to determine “hot spot” regions of proteins, resolves the apparent controversy between the two binding mechanisms.
The integral membrane protein M2 of influenza virus forms a pH-gated proton channel which is necessary for infection and hence is an important drug target. A recent X-ray structure of the transmembrane region captures the channel in the open state. The channel was also crystallized with the inhibitor amantadine, bound in the middle of the four-helix bundle. An NMR structure, published at the same time, shows the channel in the closed state, and reveals four amantadine-like inhibitor molecules binding at the channel’s lipid-exposed outer surface.
Despite similarities in the structure, the different binding of the inhibitor in the open and closed states implies different mechanisms of inhibition. Based on the X-ray structure, in the open state the drug blocks the channel and prevents proton transfer, whereas according to the NMR structure the four drug molecules bound on the outside stabilize the closed state. In view of this contradiction it is not clear how well the in vitro structures represent the in vivo mechanism of inhibition. To study this controversy we applied computational protein mapping, a technique developed for the characterization of protein binding sites. The method performs an efficient global search based on the Fast Fourier Transform (FFT) correlation approach to evaluate the binding of a number of
Basak Isin1, Klaus Schulten2, Emad Tajkhorshid2, & Ivet Bahar1 (1Department of Computational Biology, University of Pittsburgh, USA, 2Beckman Institute, Department of Physics, University of Illinois at Urbana-Champaign, USA).
Rhodopsin is a member of the pharmaceutically relevant G-protein-coupled receptor (GPCR) family and serves as a prototype for understanding their activation. We studied functional motions of rhodopsin at atomic detail in the presence of water and lipids by proposing a new molecular dynamics protocol that utilizes normal modes derived from the anisotropic network model.
G protein-coupled receptors (GPCRs) are involved in a number of clinically important ligand-receptor processes
surface. We seek to explore the global dynamics, while incorporating the effects of explicit residues and interactions with lipid and water at atomic detail. We propose for this purpose an algorithm, referred to as ANM-restrained MD, which uses the deformations derived from ANM analysis as restraints in MD trajectories.
This permits us to sample the collective motions that are otherwise beyond the range of conventional MD simulations. With this new approach, we seek to incorporate the realism and accuracy of MD into ENM analysis while taking advantage of ENM to accelerate MD simulations. The steps of ANM-restrained MD can be summarized as follows (9):
1. Normal modes are generated using ANM. A subset of low-frequency, global modes that are sufficiently decoupled from the others is selected.
2. Starting from the first mode, associated with the lowest eigenvalue, harmonic restraints are applied in two opposite directions (plus and minus) in MD simulations.
3. The resulting two conformations are then subjected to energy minimization to relieve possible unrealistic distortions caused by the restraints. The conformer with the lower energy is then selected as the starting structure for the application of the next mode as new harmonic restraints.
4. When all modes in the subset have been utilized in MD, a new set of modes is generated by ANM for the next cycle of ANM-restrained MD, and the procedure described above is repeated using the new subset of modes.
Figure 1 (left panel) shows the ribbon diagram of rhodopsin color-coded by the residue RMSDs between the starting and end conformations of the simulations, from red (least mobile) to blue (most mobile). TM helices and the cytoplasmic loops are labeled. We identify two highly stable regions in rhodopsin, one clustered near the chromophore, the other near the cytoplasmic ends of transmembrane helices H1, H2 and H7. The hinge site in the vicinity of the chromophore (Figure 1, right, bottom panel) includes residues that are directly affected by the isomerization of retinal, as well as those stabilizing the all-trans conformation. We compared hinge site residues in the chromophore binding pocket with the experiments investigating the decay rate of active rhodopsin (10).
These experiments have been useful in estimating the role of a given amino acid in the structure and function of rhodopsin. Eleven of the 16 hinge-site residues were studied in these experiments and found to affect the stability of the active state. Along with the validated hinges, 5 untested residues are proposed to be critical for active state stability and good candidates for decay experiments. In the second stable region (Figure 1, top, right panel), we found that two water molecules located in the cavity between helices H1, H2 and H7 connect the highly conserved NPXXY motif on H7 to the highly conserved N-D pair on H1 and H2. This supports previous suggestions that water molecules in the interior of GPCRs could play critical roles in regulating their activity (11,12). The CP ends of H3, H4, H5 and H6, and the connecting loops CL2 and CL3 at the CP region, are highly mobile, with high RMSDs leading to the exposure of the ERY motif crucial for G-protein binding (Figure 1, left panel).
and perform diverse functions including responses to light, odorant molecules, neurotransmitters, and hormones. Crystal structures of inactive states are available for only two GPCRs, rhodopsin and the beta-adrenergic receptor, and no structure has yet been determined for an active state of any GPCR (1). Rhodopsin, the vertebrate dim-light photoreceptor, is one of the best-characterized members of the GPCR family. The structure-function studies of rhodopsin provide the fundamental basis for understanding how members of the GPCR family work. Like all GPCRs, rhodopsin comprises cytoplasmic (CP), transmembrane (TM) and extracellular (EC) domains and contains a bundle of seven TM helices (H1-H7) (2). The CP region includes three CP loops (CL1-CL3), a soluble helix (H8) and the C-terminus (C) (see Figure 1, left panel). Seven helices (H1-H7) span the TM region.
This TM bundle encloses the chromophore, 11-cis-retinal, which is covalently bound to Lys296 on H7; 11-cis-retinal (colored orange in Figure 1) acts as an antagonist in the dark. The EC region consists of three loops (EL1-EL3) and the N-terminus (N). Light absorption by rhodopsin isomerizes 11-cis-retinal to all-trans. Then, in the chromophore binding pocket, structural perturbations trigger the rearrangement of helices and the exposure of critical sites for G-protein binding on the CP domain (3,4). Despite the extensive biophysical and biochemical data on rhodopsin activation, details about how the conformational changes for activation are triggered and the molecular mechanisms explaining the experimental data on the active state of rhodopsin still remain unknown. For exploring the biologically relevant, long-timescale motions of large structures, elastic network models (ENMs) such as the anisotropic network model (ANM) have been successfully used while avoiding expensive computations (5,6). ENMs are based on the topology of inter-residue contacts in the native structure. They assume that many functional mechanisms of proteins are intrinsically defined by their three-dimensional structure. Interactions between residues in close proximity are represented by harmonic potentials with a uniform spring constant, and network junctions are usually identified by the Cα atoms. Low-frequency motions, also referred to as ‘global’ modes, are insensitive to the details of the models and energy parameters used in normal mode analyses. Despite their numerous insightful applications, ENM methods have limitations. They lack information on residue specificities, atomic details, side chain motions, and the effects of interactions with the environment, such as the lipids and water molecules, on proteins. On the other hand, Molecular Dynamics (MD) simulations provide atomic-level detail with high temporal resolution for both harmonic and anharmonic motions.
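As a minimal sketch of the elastic network description given here (harmonic springs with a uniform constant between nearby Cα atoms), the following builds the ANM Hessian from a list of Cα coordinates. The distance cutoff of 15 Å and the unit spring constant are assumptions chosen for illustration, since the abstract specifies neither, and the function name `anm_hessian` is hypothetical. The normal modes would be the eigenvectors of this matrix, which any symmetric eigensolver can provide; for a well-connected three-dimensional structure the six zero-eigenvalue modes correspond to rigid-body motion.

```python
def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """Build the 3N x 3N ANM Hessian for N nodes (e.g. C-alpha atoms).

    coords : list of (x, y, z) tuples
    cutoff : interaction distance (assumed value, same unit as coords)
    gamma  : uniform spring constant (assumed value)
    """
    n = len(coords)
    hessian = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = sum(c * c for c in d)
            # Skip coincident nodes and pairs beyond the cutoff.
            if dist2 == 0.0 or dist2 > cutoff ** 2:
                continue
            for a in range(3):
                for b in range(3):
                    # Off-diagonal super-element: -gamma * d_a * d_b / r^2
                    block = -gamma * d[a] * d[b] / dist2
                    hessian[3 * i + a][3 * j + b] += block
                    hessian[3 * j + a][3 * i + b] += block
                    # Diagonal super-elements are minus the sum of the
                    # off-diagonal ones in the same row of blocks.
                    hessian[3 * i + a][3 * i + b] -= block
                    hessian[3 * j + a][3 * j + b] -= block
    return hessian
```

Every row of the resulting matrix sums to zero, which reflects the fact that translating the whole structure costs no energy in this model.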
However, standard MD is not efficient for sampling large conformational changes spanning periods of time longer than microseconds, especially for large macromolecules (7,8). Here, our aim is to find at atomic detail the biologically relevant conformations of rhodopsin which couple retinal isomerization to conformational changes in both the TM domain and the critical G-protein binding sites on the CP
REFERENCES
1. Kobilka, B. and G. F. Schertler. 2008. New G-protein-coupled receptor crystal structures: insights and limitations. Trends Pharmacol. Sci. 29:79-83.
2. Palczewski, K., T. Kumasaka, T. Hori, C. A. Behnke, H. Motoshima, B. A. Fox, I. Le Trong, D. C. Teller, T. Okada, R. E. Stenkamp, M. Yamamoto, and M. Miyano. 2000. Crystal structure of rhodopsin: A G protein-coupled receptor. Science 289:739-45.
3. Isin, B., A. J. Rader, H. K. Dhiman, J. Klein-Seetharaman, and I. Bahar. 2006. Predisposition of the dark state of rhodopsin to functional changes in structure. Proteins 65:970-983.
4. Klein-Seetharaman, J. 2002. Dynamics in rhodopsin. Chembiochem 3:981-6.
5. Atilgan, A. R., S. R. Durell, R. L. Jernigan, M. C. Demirel, O. Keskin, and I. Bahar. 2001. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys. J. 80:505-515.
6. Bahar, I., A. R. Atilgan, and B. Erman. 1997. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold. Des. 2:173-181.
7. Sotomayor, M. and K. Schulten. 2007. Single-molecule experiments in vitro and in silico. Science 316:1144-1148.
8. Tajkhorshid, E., A. Aksimentiev, I. Balabin, M. Gao, B. Isralewitz, J. C. Phillips, F. Zhu, and K. Schulten. 2003. Large scale simulation of protein mechanics and function. Adv. Protein Chem. 66:195-247.
9. Isin, B., K. Schulten, E. Tajkhorshid, and I. Bahar. 2008.
Mechanism of Signal Propagation upon Retinal Isomerization: Insights from Molecular Dynamics Simulations of Rhodopsin Restrained by Normal Modes. Biophys. J.
10. Farrens, D. L. and H. G. Khorana. 1995. Structure and function in rhodopsin. Measurement of the rate of metarhodopsin II decay by fluorescence spectroscopy. J. Biol. Chem. 270:5073-6.
11. Lehmann, N., U. Alexiev, and K. Fahmy. 2007. Linkage between the intramembrane H-bond network around aspartic acid 83 and the cytosolic environment of helix 8 in photoactivated rhodopsin. J. Mol. Biol. 366:1129-1141.
12. Okada, T., Y. Fujiyoshi, M. Silow, J. Navarro, E. M. Landau, and Y. Shichida. 2002. Functional role of internal water molecules in rhodopsin revealed by X-ray crystallography. Proc. Natl. Acad. Sci. U. S. A. 99:5982-5987.
53: TOPS++FATCAT: FAST FLEXIBLE STRUCTURAL ALIGNMENT USING CONSTRAINTS DERIVED FROM TOPS+ STRINGS MODEL
Mallika Veeramalai (Joint Center for Molecular Modeling, Burnham Institute for Medical Research, USA), Yuzhen Ye (School of Informatics, Indiana University, Bloomington, USA), & Adam Godzik (Joint Center for Molecular Modeling, Burnham Institute for Medical Research, USA).
TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, a step towards interactive structure similarity searches.
Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses.
We developed the TOPS++FATCAT algorithm, which uses an intuitive description of the proteins’ structures, as captured in the popular TOPS diagrams, to limit the search space of the aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT [1] algorithm. Here we explore constraints obtained from the TOPS+ strings alignment, which identifies topologically equivalent secondary structure elements (alpha helices, beta strands, and loops), for this purpose. For benchmarking and comparison, we have used the PDB40 dataset of 1,901 protein domain pairs (DP) corresponding to SCOP version 1.61 from the ASTRAL database [4]. The TOPS++FATCAT algorithm is faster than FATCAT by more than an order of magnitude with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings models [2,3] contain important information on the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). The overall results for all protein classes show that TOPS++FATCAT performance is only slightly lower (3%-7% AUC value difference) as compared to FATCAT while providing a significant, more than 10-fold speedup. We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions. TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up the possibility of interactive protein structure similarity searches.
Figure 1 - Schematic illustration of FATCAT structural alignment by chaining AFPs in a constrained alignment region defined by TOPS alignment output. (a) In FATCAT, two fragments form an AFP (shown as a line in the graph) according to the criteria (see text).
(b) The alignment of secondary structure elements from the TOPS+ comparison is used to define the constrained area for AFP detection, in which each pair of aligned secondary structure elements defines an “eligible” block (shown as filled squares). These blocks may be disconnected, and we need to connect them with connecting blocks (shown as open squares). (c) We add a buffer area surrounding the constrained area defined in (b) (shown as the area enclosed by dashed lines) to get the constrained alignment region for FATCAT alignment (shown as the area enclosed by dark lines). (d) Only those AFPs within the constrained alignment region are used in the dynamic programming algorithm for chaining.

TOPS++FATCAT: a fast flexible structural alignment using constraints derived from TOPS+ strings models. Intuitive topological constraints help to prune the search space involved in the FATCAT comparison process.

REFERENCES
1. Ye Y, Godzik A: Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics 2003, 19 Suppl 2:II246-II255.
2. Veeramalai M: A novel method for comparing topological models of protein structures enhanced with ligand information. PhD Degree Thesis. Department of Computing Science: University of Glasgow; 2005.
3. Veeramalai M, Gilbert D: A Novel Method for Comparing Topological Models of Protein Structures Enhanced with Ligand Information (Bioinformatics – under review).
4. Chandonia J-M, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE: ASTRAL compendium enhancements. Nucleic Acids Research 2002, 30:260-263.

54: CONFORMATIONAL DIVERSITY MODULATES PROTEIN SEQUENCE DIVERGENCE

We studied how the presence of conformational diversity constrains protein sequence evolution. We found that in 60% of the cases one of the conformers dominates the structural constraints on sequence divergence.
This also indicates the importance of structural deformations in designing new models of protein evolution.

We have used a set of 75 proteins with different conformers taken from the literature and from the database of macromolecular movements. For each conformer we have used the SCPE to obtain a whole set of site-specific substitution matrices. Using maximum likelihood calculations performed with the program HYPHY, a set of homologous proteins, and a phylogenetic tree, we studied how well the different conformers reproduce the substitution pattern found in the alignment. As a null hypothesis we have used the JTT model of protein evolution, which does not take protein structure into account in the derivation of substitution matrices. For model comparison we have used a likelihood ratio test, which provides a statistical framework for evaluating how well the models reproduce the sequence divergence pattern found in the alignments. Also, using distance matrix comparison between the conformers, we estimated which of the structures is the “open” and which is the “closed” form.

First of all, we found that in 80% of the proteins the SCPE model outperforms the JTT model, which is in good agreement with previous results. We then compared how well each of the conformers for a given protein describes the substitution pattern found in the alignment. This was evaluated using SCPE runs for each of the conformers. We found that in over 60% of the cases one conformer is a better model than the other. This is an important result that indicates that, over the set of conformers studied, there is one that dominates the constraints on sequence divergence. Moreover, in 60% of these cases, the “open” form is the one that better describes the sequence divergence of the homologous proteins. We were unable to find a clear correlation between the RMS calculated between conformers and the maximum likelihood performance.
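The likelihood ratio test mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual chi-square approximation for the test statistic; the number of degrees of freedom is a placeholder, since the abstract does not state how many parameters separate the SCPE and JTT models, and the function names are hypothetical. It is not the HYPHY procedure used by the authors.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: Q(1/2, y) for odd df, Q(1, y) for even df.
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null, lnl_alt, df=1):
    """Return (statistic, p_value) for two maximized log likelihoods.

    lnl_null: log likelihood of the null model (here, a structure-free model).
    lnl_alt:  log likelihood of the alternative model (here, a conformer model).
    df:       assumed difference in free parameters (an assumption of this sketch).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)
```

A small p-value would favor the alternative model for that alignment; comparing the statistics obtained for the "open" and "closed" conformers of one protein then indicates which conformer describes the substitution pattern better under these assumptions.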
This may be related to the fact that SCPE could be more sensitive than RMS in detecting conformational changes. Our results indicate that conformational diversity constrains protein sequence evolution and also indicate the importance of protein dynamics or structural deformations in designing new models of protein evolution. A molecular evolution model that takes into account a combination of the structural constraints of a set of conformers for a given protein is able to describe the sequence divergence within a protein family with superior accuracy.

It is well established that the conservation of protein structure during evolution modulates sequence divergence. However, recent evidence supports the view that the native state of any protein is better described as an ensemble of protein conformations. Here we study how the different conformations of a given protein constrain the substitution pattern observed in the sequence. To study how the conservation of protein structure constrains sequence divergence, we have developed the Structurally Constrained Protein Evolution (SCPE) model. The SCPE simulates sequence divergence with special consideration to the conservation of protein structure. These simulations allow the derivation of site-specific substitution matrices that we found outperform protein evolution models that do not consider protein structure explicitly.

55: RENAMING DIASTEREOTOPIC ATOMS FOR CONSISTENT PDB-WIDE ANALYSES
Christopher Bottoms & Dong Xu (University of Missouri-Columbia, USA)

Biological chemistry is very stereospecific. However, diastereotopic atoms of small molecules are often given names that

Ezequiel Iván Juritz, Sebastián Fernández Alberti and Gustavo Parisi (Quilmes National University, Argentina)

method of Cieplak and Wisniewski [13] for spatial comparisons.
However, instead of using CIP priorities, we take advantage of the inherent “chirality” of atom names. This allows for use of idealized ligands in any conformation for naming diastereotopic atoms of query ligands in any other, or the same, conformation. It is also less computationally expensive than attempting to superpose an ideal and a query ligand.

REFERENCES
1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank (http://www.rcsb.org/). Nucleic Acids Res 2000, 28(1):235-242.
2. Faig M, Bianchet MA, Winski S, Hargreaves R, Moody CJ, Hudnott AR, Ross D, Amzel LM: Structure-based development of anticancer drugs: complexes of NAD(P)H:quinone oxidoreductase 1 with chemotherapeutic quinones. Structure 2001, 9(8):659-667.
3. Bressi JC, Verlinde CL, Aronov AM, Shaw ML, Shin SS, Nguyen LN, Suresh S, Buckner FS, Van Voorhis WC, Kuntz ID, Hol WG, Gelb MH: Adenosine analogues as selective inhibitors of glyceraldehyde-3-phosphate dehydrogenase of Trypanosomatidae via structure-based drug design. J Med Chem 2001, 44(13):2080-2093.
4. Newsletter 1984. European Journal of Biochemistry 1984, 138(1):5-7.
5. Eckstein F: Nucleoside phosphorothioates. Annu Rev Biochem 1985, 54:367-402.
6. Cech TR, Herschlag D, Piccirilli JA, Pyle AM: RNA catalysis by a group I ribozyme. Developing a model for transition state stabilization. J Biol Chem 1992, 267(25):17479-17482.
7. Padgett RA, Podar M, Boulanger SC, Perlman PS: The stereochemical course of group II intron self-splicing. Science 1994, 266(5191):1685-1688.
8. Domanico PL, Rahil JF, Benkovic SJ: Unambiguous stereochemical course of rabbit liver fructose bisphosphatase hydrolysis. Biochemistry 1985, 24(7):1623-1628.
9. Tsai MD: Use of phosphorus-31 nuclear magnetic resonance to distinguish bridge and nonbridge oxygens of oxygen-17-enriched nucleoside triphosphates. Stereochemistry of acetate activation by acetyl coenzyme A synthetase.
Biochemistry 1979, 18(8):1468-1472.
10. Schultze P, Feigon J: Chirality errors in nucleic acid structures. Nature 1997, 387(6634):668.
11. Waszkowycz B: Towards improving compound selection in structure-based virtual screening. Drug Discov Today 2008, 13(5-6):219-226.
12. Good A: Structure-based virtual screening protocols. Curr Opin Drug Discov Devel 2001, 4(3):301-307.
13. Cieplak T, Wisniewski J: A new effective algorithm for the unambiguous identification of the stereochemical characteristics of compounds during their registration in databases. Molecules 2001, 6:915-926.
14. Bottoms C, Xu D: Wanted: Unique names for unique atom positions. PDB-wide analysis of diastereotopic atom

do not uniquely distinguish them from each other. We describe a tool for renaming their diastereotopic atoms based on idealized ligands.

Often accompanying the macromolecules deposited in the Protein Data Bank (PDB) [1] are smaller molecules of biological importance. Some of these are energy-carrying cofactors, such as ATP, coenzyme A, and nicotinamide adenine dinucleotide (NAD). Some analogs of these molecules are either drugs or can be used in drug design [2, 3]. Like other biologically relevant molecules, many of these small molecules contain chiral or prochiral centers. An atom is a chiral center if four different chemical groups are attached to it. A chiral configuration can be designated R or S, depending on the arrangement of the attached groups. If, however, two of these groups are identical, then the center atom is prochiral, meaning that it would become chiral if either of the identical groups were replaced by a unique group. These two groups are called diastereotopic, i.e., if either were replaced with a unique group, the molecule would become one or another diastereomer.
Within a pair of diastereotopic atoms, one is designated pro-R and the other pro-S, indicating the configuration the chiral center would have if that diastereotopic atom were replaced with a group of higher priority than its identical partner. The pro-S and pro-R oxygen atoms of nucleic acid strands are named “OP1” and “OP2”, respectively [4]. Many enzymes treat the pro-R and pro-S oxygen atoms of DNA and RNA differently [5]. These diastereotopic oxygen atoms are also treated differently in RNA-intron splicing [6, 7]. Small diphosphate-containing molecules also participate in enzymatic reactions in which the distinction between diastereotopic atoms or groups is important [5, 8, 9]. Unfortunately, many of these diastereotopic atoms do not have standardized names (see the figure, which shows diphosphate groups from two different NAD molecules of the PDB file 2OHX). Consistent naming of diastereotopic atoms is necessary when performing all-atom superpositioning or all-atom root mean square deviation (RMSD) calculations [10]. It is also necessary for data mining in the PDB, e.g., structure-based virtual screening for drug candidates [11, 12].

Using the determinant algorithm of Cieplak and Wisniewski [13], we conducted a systematic PDB-wide analysis of the diastereotopic oxygen atom names of small molecules containing diphosphate [14]. The lack of standardized naming conventions for diastereotopic atoms of small molecules has left the ad hoc names assigned to many of these atoms non-unique, which may create problems in data mining of the PDB. Therefore, researchers designing PDB-wide analyses need to consider this issue to avoid spurious results. We previously provided a tool for renaming diastereotopic oxygen atoms of diphosphate-containing molecules (http://digbio.missouri.edu/ddan/DDAN.htm), but at this conference we present a more general tool. This tool compares the naming conventions of idealized ligands and query ligands.
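The comparison of naming conventions between an idealized and a query ligand can be sketched with a signed-volume (determinant) test. This is a minimal illustration in the spirit of the determinant approach the abstract cites, not the authors' tool: the function names and the atom names `P`, `O1`, `O2`, `O3` are hypothetical, and coordinates are assumed to be available as plain name-to-position mappings.

```python
def signed_volume(center, a, b, c):
    """Determinant of the vectors center->a, center->b, center->c.

    The sign is unchanged by rotation or translation of the molecule and
    flips when two of the three atoms exchange positions.
    """
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return (ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx))


def needs_swap(ideal, query, center, pair, reference):
    """True if the named pair has opposite handedness in ideal and query.

    ideal, query: dict mapping atom name -> (x, y, z)
    center:       name of the prochiral center
    pair:         names of the two diastereotopic atoms
    reference:    name of a third substituent on the same center
    """
    first, second = pair
    v_ideal = signed_volume(ideal[center], ideal[first], ideal[second], ideal[reference])
    v_query = signed_volume(query[center], query[first], query[second], query[reference])
    return v_ideal * v_query < 0


def rename_to_ideal(ideal, query, center, pair, reference):
    """Return a copy of query with the pair's names swapped if required."""
    renamed = dict(query)
    if needs_swap(ideal, query, center, pair, reference):
        first, second = pair
        renamed[first], renamed[second] = query[second], query[first]
    return renamed
```

Because only the sign of the determinant is compared, the ideal and query ligands do not have to be superposed first, which is consistent with the abstract's remark that the approach is cheaper than superposition.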
Names of diastereotopic atom pairs in query ligands are swapped, as needed, to make them conform to the idealized ligands. Like our previous tool, this uses the

molecular bonds, and electrostatic potentials of known eh1-like motifs. For example, the experimentally determined structure of an eh1 peptide bound to the WD domain and a previous study on eh1-like motifs were used. Models of the WD domain interacting with eh1 motifs were generated using the Deep View program and the Swiss-Model server. Swiss-Model used specialized software and up-to-date databases to build models of putative eh1 sequences and evaluate their quality. Information gained on bond lengths, binding sites, local shape complementarity, interaction potentials, energies, and models' stabilities was used to devise a scoring function. Theoretical eh1-like sequences were assessed and ranked using the scoring function. This was used to predict which putative eh1-like motifs likely had a potential repressive function. Motif recognition techniques and bioinformatics searches for experimental data within the NCBI protein databases were employed to verify the predictions.

The results showed that the scoring function captured a general correspondence between putative motifs' characteristics and the likelihood that they were found in transcription factors of various species. Conversely, the scoring function was very reliable in predicting which putative sequences were not found in nature. This report discusses findings on inter-motif bonds, charge and polarity of residues, the secondary structure of eh1-like motifs, bonds between the motifs and the WD domain, and the like. The study indicated that mutations in motifs' residues could produce only limited changes in the tertiary structures and still preserve motifs' functionality. This study identified several new eh1-like motifs.
NCBI Blast searches confirmed that these motifs were conserved in transcription factors of several species, implying that they likely had transcriptional roles. The results of this study may be used to predict other regulatory motifs. Given the importance of transcriptional regulation, this report on the prediction and evaluation of new eh1-like motifs will facilitate further studies of transcriptional and regulatory mechanisms.

names of small molecules containing diphosphate. BMC Bioinformatics 2008, 9(Suppl 9):S16.

56: PREDICTING NEW ENGRAILED HOMOLOGY MOTIFS FROM STRUCTURAL AND ENERGY STUDIES OF THE WD PROPELLER DOMAIN BINDINGS TO KNOWN MOTIFS
Danielle S. Dalafave (The College of New Jersey, USA)

Data mining and computational techniques were used to predict new eh1-like motifs and evaluate their structure, stability, and functionality. To the best of the author's knowledge, this is the first report on using structural and energetic considerations to predict eh1-like motifs that bind to WD domains of Gro/TLE transcriptional corepressors.

Data mining and computational techniques were used to predict new engrailed homology-1 (eh1)-like motifs and evaluate their 3D structures, amino acid sequences, stability, and possible functionality. Eh1-like motifs bind to WD domains of the Gro/TLE corepressors to provide transcriptional repressive functions. To the best of the author's knowledge, this is the first study that uses a combination of compositional, structural, and energetic considerations to predict new eh1 motifs that bind to WD domains. Reliable methods that predict molecules involved in proteins' binding would greatly enhance our understanding of proteins' capacities for selective recognition and could potentially lead to new disease intervention methods. At times, experimental studies may be difficult to perform and computational methods need to be employed.
Transcription factors are proteins with important roles in controlling the transcription of genetic information from DNA to RNA. The Gro/TLE protein family performs its gene repression functions via transcription factors, rather than through direct interactions with DNA. Gro/TLE can bind to diverse transcription factors, some of which belong to systems whose abnormal activities may lead to cancers. The WD domain is a highly conserved region of Gro/TLE. X-ray studies showed that the WD domain forms a beta-propeller, which recognizes specific transcription factors. Experiments had suggested that eh1-like motifs bind to the pore region of the WD propeller to provide their repressive function. A consensus motif sequence is FSBXXBBX, where F = Phe, S = Ser, B = branched hydrophobic amino acid residue, and X = nonpolar or charged residue. When Y (Tyr) or H (His) is substituted in the first position, the new motif also binds Gro/TLE corepressors.

As a first step in this study, available experimental and theoretical information was analyzed to gain insights into structural and sequence constraints, inter- and intra-

57: USE OF EVOLUTIONARY INFORMATION IN MODEL QUALITY EVALUATION FOR PROTEIN STRUCTURE PREDICTION
Nicolas Palopoli (Universidad Nacional de Quilmes, Argentina), Diego Gomez Casati (IIB-INTECH, Argentina)

have selected a number of decoys publicly available on the Web (5,6,7). They have been built by comparative modeling from a template structure, which was taken as the native structure and served as reference for our comparisons.
We ran the SCPE for the template and decoy structures and assessed the results through different scoring functions, including global maximum likelihood comparisons, estimation under different cutoffs of the number of structurally constrained sites (defined as sites where the Z-score of the log likelihood for the site against the distribution of log likelihoods for the same site exceeds the desired cutoff), and the partial sum of log likelihoods for structurally constrained sites, which has proven to be the most successful measure. When comparing the results, without considering any particular scoring scheme, we have found that the native structure is ranked among the top three decoys in 74% of the cases, while it is selected as the best structure in 51% of them. Our results indicate that the use of evolutionary information could indeed aid in the discrimination of native structures when combined with structural information.

The SCPE has been shown to be very promising as a tool for the validation step in protein structure prediction. We still need to address some important issues before our method comes into common use. We found that SCPE works better with structures longer than a hundred residues, where the number of structurally constrained residues is statistically significant. We also depend on the availability of adequate structural alignments (in terms of the number of sequences they comprise). We are currently working on a single scoring function that would allow us to rank decoy structures in the absence of a reference native structure.

REFERENCES:
(1) Tramontano A. An account of the Seventh Meeting of the Worldwide Critical Assessment of Techniques for Protein Structure Prediction. FEBS Journal 2007, 274(7):1651-1654.
(2) Tan CW. Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction. BMC Bioinformatics 2008, Feb 11;9:94.
(3) Parisi G, Echave J.
Generality of the Structurally Constrained Protein Evolution model: assessment on representatives of the four main fold classes. Gene 2005, 345(1):45-53.
(4) Sander C, Schneider R. The HSSP database of protein structure-sequence alignments. Nucleic Acids Res. 1994 Sep;22(17):3597-9.
(5) Samudrala R, Levitt M. Decoys 'R' Us: a database of incorrect conformations to improve protein structure prediction. Protein Sci. 2000 Jul;9(7):1399-401.
(6) David Baker's Lab, http://www.bakerlab.org
(7) CASP6, http://www.predictioncenter.org/casp6/Casp6.html

58: COMPUTATIONAL DISCOVERY OF SMALL MOLECULAR WEIGHT PROTEIN INTERACTION INHIBITORS
Lidio Meireles (Department of Computational Biology, University of Pittsburgh, USA), Alexander Doemling

& Gustavo Parisi (Universidad Nacional de Quilmes, Argentina)

We present a novel method for validation of protein models using evolutionary information based on our SCPE program. By testing it with a set of publicly available decoys we found that our method is suitable for discriminating among models, thus being useful for protein structure prediction.

It is widely known that, while the primary sequence of a protein drives the adoption of a characteristic tertiary structure, it is not as conserved as its three-dimensional structure, which mostly accounts for the function of the protein. Thus, knowledge of the three-dimensional structure of a protein can often be very helpful for understanding its biological activity. When the structure has not been experimentally solved, reliable computational methods capable of predicting the structure of a protein from its amino acid sequence become extremely helpful. Different techniques are suitable for this task, such as comparative modeling, fold recognition or ab initio predictions, each of them displaying its own strengths and limitations. No matter which is taken, a critical stage when predicting protein structure involves discriminating among the proposed structural models or decoys (1).
Common approaches involve energy, statistical potentials describing interactions among atoms, and structural comparisons and clustering between the proposed decoys. In addition, many model quality assessment programs provide some sort of scoring function capable of ranking decoys according to different structural features. Though the inclusion of evolutionary information has proved to be helpful in decoy discrimination (2), it has not been extensively and explicitly used in prediction methods.

Here we present a novel method for decoy discrimination and model selection based on structural features combined with evolutionary information. It is based on the Structurally Constrained Protein Evolution (SCPE) program developed in our group (3). The SCPE algorithm simulates protein evolution by introducing random mutations into the evolving sequences and selecting against those that cause too much structural perturbation. By running SCPE for different decoys we could obtain decoy-specific, site-specific sets of substitution matrices; they represent the evolution of sites under the constraints imposed by each decoy structure. The models could then be compared through their sets of matrices, by estimating the likelihood of the evolutionary model for a given set of sequences and a fixed topology (both derived from the HSSP database (4) of structure-based sequence alignments). Under our criteria, the best structural model would be the one that best explains the sequence divergence in the set of homologous sequences.

For the SCPE to be able to discriminate between correct and wrong models, it needs to take proper account of evolutionary information while being sensitive to structural dissimilarity. We would expect to develop a proper ranking scheme which would lead us to rank native-like structures among the top results, while leaving quite dissimilar decoys (at higher RMSD) far behind.
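The site-based scoring described for this method (counting structurally constrained sites by a Z-score cutoff and summing their log likelihoods) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distribution the Z-score is taken against, so the sketch assumes the distribution of log likelihoods for the same site across all structures being compared. The function name, input layout and default cutoff are hypothetical.

```python
from statistics import mean, pstdev


def constrained_site_scores(site_lnl, cutoff=1.0):
    """Count constrained sites and sum their log likelihoods per structure.

    site_lnl: dict mapping structure name -> list of per-site log likelihoods;
              all lists must have the same length.
    cutoff:   Z-score above which a site counts as structurally constrained
              (an assumed default, not a value from the abstract).
    Returns a dict mapping structure name -> (n_constrained, partial_sum).
    """
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    counts = {name: 0 for name in names}
    sums = {name: 0.0 for name in names}
    for i in range(n_sites):
        column = [site_lnl[name][i] for name in names]
        mu, sd = mean(column), pstdev(column)
        if sd == 0:
            continue  # the site does not discriminate between structures
        for name in names:
            z = (site_lnl[name][i] - mu) / sd
            if z > cutoff:
                counts[name] += 1
                sums[name] += site_lnl[name][i]
    return {name: (counts[name], sums[name]) for name in names}
```

How the two returned quantities are turned into a final ranking is left open here, since the abstract states that a single scoring function is still under development.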
As a dataset for testing our method we

determined by comparing the SASA of the residues of each protein in the free and in the bound state.

Virtual Library Generation
We generate a library of small molecular weight compounds specifically designed to mimic the chemistry and structure of the deeply buried anchor residues identified in the previous step. To generate the compounds, we use a fragment-based approach associated with multicomponent reaction scaffolds (MCR). MCR are convergent and efficient ways to access and generate a large and diverse chemical space using only one reaction step (“one-pot”) [4]. By specifying basic molecular scaffolds and small molecular fragments, including anchor analog fragments, compounds are synthesized in silico by the software Chemaxon Reactor [5].

Virtual Screening
Virtual compounds incorporating anchor analogs are predocked by fitting the anchor analog to the anchor residue provided by the protein-protein complex structure, significantly simplifying and expediting the virtual pipeline compared to de novo small molecule docking. Following anchor fitting, the compounds are energy minimized in the context of the acceptor protein and the top compounds are predicted from the set of best ranked structures.

Our methodology is innovative and includes several benefits. For example, our virtual library is not restricted to any specific database or commercially available compounds, but instead it is constructed on demand, biased by the specific protein target and the anchors revealed by the protein-protein complex. Moreover, because the library is constructed from multicomponent reaction scaffolds, the compounds are straightforward and efficient to synthesize using standard protocols. Docking is extremely fast, on the order of a second per compound, as it merely involves anchor fitting followed by energy minimization.
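The anchor identification step of this pipeline (ranking residues by the solvent-accessible surface area they bury on complexation) can be sketched as below. It assumes per-residue SASA values for the free and bound states have already been computed by some external tool; the function name, residue identifiers and the `top_n` parameter are hypothetical and only serve the illustration.

```python
def rank_anchor_candidates(sasa_free, sasa_bound, top_n=3):
    """Rank residues by the SASA they lose on going from free to bound state.

    sasa_free, sasa_bound: dict mapping residue id -> SASA in square angstroms.
    Only residues present in both mappings are considered.
    Returns up to top_n (residue id, buried SASA) tuples, largest first.
    """
    buried = {
        res: sasa_free[res] - sasa_bound[res]
        for res in sasa_free
        if res in sasa_bound
    }
    ranked = sorted(buried.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]
```

Residues at the top of the returned list are the anchor candidates in this sketch; any threshold on the buried area would be a further assumption.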
However, perhaps most important is the fact that our compounds are designed to mimic the chemistry and structure of anchors of PPIs that were favored by nature. Since anchors bury the most SASA upon binding, their conformation is critical not only for binding in vivo but also in computational docking methods.

REFERENCES
[1] Wells JA, McClendon CL. Reaching for high-hanging fruit in drug discovery at protein-protein interfaces. Nature 2007;450(7172):1001-1009.
[2] Rajamani D, Thiel S, Vajda S, Camacho CJ. Anchor residues in protein-protein interactions. Proc Natl Acad Sci U S A 2004;101(31):11287-11292.
[3] Clackson T, Wells JA. A hot spot of binding energy in a hormone-receptor interface. Science 1995;267(5196):383-386.
[4] Doemling A. Recent progress in isocyanide based multicomponent reaction chemistry. Chemical Reviews 2006, 106, 17.
[5] György Pirok, Nóra Máté, Jenő Varga, József Szegezdi, Miklós Vargyas, Szilárd Dóránt, and Ferenc Csizmadia. Making "real" molecules in virtual space. J. Chem. Inf. Model. 2006; 46, 563-568.

(Departments of Pharmacy and Chemistry, University of Pittsburgh, USA) & Carlos Camacho (Department of Computational Biology, University of Pittsburgh, USA).

Protein-protein interactions have proven to be difficult targets for drug discovery. To face this challenge, we propose a computational pipeline involving novel and chemically accessible target-specific libraries, which by design include sidechain analogs that mimic anchor residues from protein-protein complex structures.

Protein-protein interactions (PPIs) constitute an emerging class of targets for pharmaceutical intervention, with the PDB providing a highly valuable source of structural information on protein interactions [1]. However, the diversity of PPIs does not fit well in the current drug discovery paradigm, which focuses almost exclusively on screening large historical collections of (commercially available) small molecular weight compounds.
Despite computational limitations on the sampling of chemical space and scoring of protein-small molecule docked conformations, in silico screening methods continue to be developed and improved as credible and complementary alternatives to high-throughput biochemical compound screening. In order to overcome the aforementioned limitations, we have developed a virtual screening technology of virtual libraries that by design have a built-in amino acid hot spot, or “anchor”, burying deep into acceptor proteins. Key to our methodology is the concept of anchor residues or hot spots, which have been shown to play an important role in the early stages of molecular recognition [2, 3]. Moreover, there is good evidence that in many cases the anchoring grooves are relatively unchanged upon complexation [2], thus providing a uniquely well characterized starting point for docking a small molecule.

Our computational pipeline for discovery of small molecular weight inhibitors of protein interactions starts with the PDB structure of a protein-protein complex and ends with a ranked list of compounds likely to inhibit the underlying protein interaction. The method can be described in three steps, as follows:

Anchor Identification
Anchor residues are reliably identified as those residues undergoing the largest change in solvent-accessible surface area (SASA) upon complexation. This can be quickly

transmembrane dimer at a one angstrom R.M.S.D. without the aid of experimental data. We have applied it to obtain a near atomistic structural model of the betaglycan transmembrane homodimer, a type III TGF-beta receptor family member. The top ranking model is in excellent agreement with our mutational data obtained with TOXCAT, an in vivo assay for association of helices in the E. coli membrane.
The positions most susceptible to disrupting dimerization all map to the interaction interface, and the effects of experimental and in silico mutagenesis on association equilibria are in very good agreement. A second round of mutagenesis, performed on a selection of mutations that were predicted to be either tolerable or disruptive, further confirms the model. These included a complementary double mutant that was successfully predicted to rescue a disruptive single mutation according to the structural model. Hence, as this case study demonstrates, our novel in silico modeling protocol assists in understanding extensive mutational and biophysical data on this important transmembrane protein. In a complementary manner, the modeling assists in focusing the experimental efforts on the key loci in this protein and can also be used for designing altered structures as required.

REFERENCES
1 Senes A, Engel DE, DeGrado WF. "Folding of helical membrane proteins: the role of polar, GxxxG-like and proline motifs." Curr Opin Struct Biol. 2004 14(4), 465-79
2 Senes A, Ubarretxena-Belandia I, Engelman DM. "The Cα–H···O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions." Proc Natl Acad Sci U S A. 2001 98(16), 9056-61
3 Senes A, Gerstein M, Engelman DM. "Statistical analysis of amino acid patterns in transmembrane helices: the GxxxG motif occurs frequently and in association with beta-branched residues at neighboring positions." J Mol Biol. 2000 296(3), 921-36
4 Walters RF, DeGrado WF. "Helix-packing motifs in membrane proteins." PNAS 2006, 103, 13658-63.

61: COMPARING SEQUENCE AND STRUCTURE-BASED CLASSIFIERS FOR PREDICTING RNA BINDING SITES IN SPECIFIC FAMILIES OF RNA BINDING PROTEINS
Michael Terribilini, Cornelia Caragea, Deepak Reyon, Ben Lewis, Li Xue, Jeffry Sander, Jae-Hyung Lee, Robert L Jernigan, Vasant Honavar, Krishna Rajan & Drena Dobbs (Iowa State University, USA).
We evaluate machine learning classifiers for predicting RNA binding residues in proteins, using either sequence-based information only, or a combination of sequence- and structure-derived information, and quantitate relative contributions of these different input types to

59: A FRAGMENT BASED METHOD FOR THE PREDICTION OF ATOMISTIC MODELS OF TRANSMEMBRANE HELIX-HELIX INTERACTION
Alessandro Senes1,2, Dan W Kulp1, David T. Moore1 & William F DeGrado1. (1 Department of Biophysics and Biochemistry, University of Pennsylvania, USA; 2 present address: University of Wisconsin, Madison, USA)

We present a general method for modeling transmembrane proteins based on combinatorial fragment-based libraries of natural proteins. It can be applied for ab initio modeling and can also make use of experimental constraints. A derived model of the TGF-beta receptor betaglycan transmembrane dimer supports our extensive mutational data and biophysical characterization.

Membrane proteins are a large and medically important class of proteins. However, their structural characterization by X-ray crystallography, NMR and other biophysical techniques is generally challenging and thus they are significantly under-represented in the structural database. We present a computational method that can provide atomistic structural predictions of interacting transmembrane helices. The factors that stabilize membrane protein folding and association are different from those that apply in solution, and therefore dedicated computational methods need to be developed. The largest group of membrane proteins has a helical bundle topology, and the association of the hydrophobic helices is driven by detailed complementary packing and hydrogen bonding (1), often including networks of weak Cα–H···O hydrogen bonds (2), which are favored by sequence motifs comprising a patch of small interfacial residues (3).
The helical pairs often adopt one of a small number of frequent interhelical geometries, as observed in the available crystal structures (4). Our method is based on a large fragment-based combinatorial library of natural backbones that is biased to sample these common interaction motifs. The method consists of two phases. First, the most likely structural candidates for the primary sequence are selected from the comprehensive pools of backbone templates using a sequence-based score derived from sequence alignment statistics and structural constraints, e.g. sterics and hydrogen bonding potential. This stage can also incorporate experimentally derived information, such as mutagenesis data. Second, highly detailed three-dimensional models are assembled from the candidate backbones by placing the side chains from large dedicated rotamer libraries that have been pre-optimized for each individual backbone. The ensemble of the most energetically favorable models is screened for compatibility with any available experimental evidence, and used to guide further experimental validation. The method is very rapid and can produce very detailed complementary packing. For example, the method reproduced the structure of the glycophorin A

evaluated ensemble classifiers that use combinations of the input data types described above.

Results
Our results, partially summarized in Table 1 below, indicate that when classifiers are evaluated on the basis of AUC for ROC curves, the best “overall” performance is obtained using ensemble classifiers that use amino acid sequence information in combination with either: i) PSSMs (derived from sequence homologs identified using BLAST); or ii) spatial neighbor information (extracted from PDB structures of proteins).
Results obtained using “custom” classifiers trained to predict RNA-binding residues in specific families of RNA binding proteins, comparisons of our results with those published by others, and comparisons of our predictions with available experimental data for several clinically important ribonucleoprotein complexes will also be presented.

Table 1: Comparison of Classifiers for RNA-binding Site Prediction
Classifier         AUC for ROC
Sequence-based     0.74
Structure-based    0.77
PSSM-based         0.80
Ensemble           0.81

REFERENCES
1) Terribilini, M., Lee, J.H., Yan, C., Jernigan, R., Honavar, V., and Dobbs, D. 2006. Prediction of RNA binding sites in proteins based on amino acid sequence. RNA 12: 1450-1462.
2) Terribilini, M., Lee, J.H., Yan, C., Jernigan, R., Carpenter, S., Honavar, V., and Dobbs, D. 2006. Identifying interaction sites in “recalcitrant” proteins: predicted protein and RNA binding sites in Rev proteins of HIV-1 and EIAV agree with experimental data. Proc. Pac. Symp. Biocomput.

62: R-ALIGN: A ROBUST STATISTICS BASED SUPERPOSITION ALGORITHM FOR PROTEINS
Chakra Chennubhotla and Ivet Bahar (Department of Computational Biology, University of Pittsburgh, USA)

overall prediction performance. We also present novel classifiers optimized for specific families of RNA binding proteins.

Introduction
Protein-RNA interactions play critical roles in a wide range of biological processes. Previously, we developed a machine learning approach for predicting which amino acids in an RNA-binding protein mediate protein-RNA interactions, using only the amino acid sequence of the protein as input (http://bindr.gdcb.iastate.edu/RNABindR/) (1,2). Here we report an evaluation of the relative contributions of sequence, structural features, and evolutionary information to the performance of algorithms for predicting RNA-binding residues in proteins.
In this study, we train and test multiple classifiers using several benchmark datasets, including a non-redundant dataset of 181 RNA-binding polypeptide chains with <30% sequence identity (RB181), and “custom” datasets comprising sets of related RNA-binding proteins. We systematically compare results obtained using simple classifiers that use only one type of information as input (e.g., a Naïve Bayes classifier using only amino acid sequence as input) with results obtained using ensemble classifiers that exploit specific combinations of input information (e.g., an ensemble of Naïve Bayes classifiers that use the amino acid sequence, information from sequence homologs and/or the identities of spatial neighbors in known structures as input). We also attempt to generate “custom” classifiers for predicting RNA binding sites in specific families of RNA binding proteins (i.e., those sharing similar sequences or structures).

Methods
Interfaces from known protein-RNA complexes in the PDB were extracted to generate a non-redundant set of 181 RNA-binding protein chains (RB181). The input to the sequence-based classifier was a window of amino acid identities for contiguous residues in the protein sequence. A Naïve Bayes classifier was trained using leave-one-out cross validation: one sequence was chosen as the test case and all other proteins in the dataset were used as the training set. This procedure was repeated until every protein had been used as the test case. The input to the structure-based classifier was a window of amino acid identities for spatial neighbors within the protein structure. To generate the input, we calculated the distance between each pair of residues in the structure. The window for each residue was built from the amino acid identities of the nearest n neighboring residues. A Naïve Bayes classifier was then trained and tested using the same leave-one-out cross validation procedure as for the sequence-based classifier.
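The spatial-neighbor window and the leave-one-out split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the use of one representative coordinate per residue, and the choice to prefix each window with the residue's own identity are assumptions made here for the example.

```python
import numpy as np

def spatial_neighbor_windows(sequence, coords, n_neighbors):
    """Build one window per residue from its n nearest spatial neighbors.

    sequence: one-letter amino acid string.
    coords: (N, 3) array with one representative point per residue
            (assumption: e.g. a C-alpha position; the abstract does not say).
    Returns strings: the residue itself followed by its neighbors,
    nearest first (including the residue itself is an assumption).
    """
    coords = np.asarray(coords, dtype=float)
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    # Distance between each pair of residues in the structure.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    windows = []
    for i in range(len(sequence)):
        order = np.argsort(dist[i], kind="stable")
        neighbors = [j for j in order if j != i][:n_neighbors]
        windows.append(sequence[i] + "".join(sequence[j] for j in neighbors))
    return windows

def leave_one_protein_out(protein_ids):
    """Yield (training_ids, test_id) pairs, holding out one protein at a time."""
    ids = list(protein_ids)
    for held_out in ids:
        yield [p for p in ids if p != held_out], held_out

# Toy usage: four residues placed on a line.
toy_windows = spatial_neighbor_windows(
    "ACDE", [[0, 0, 0], [1, 0, 0], [3, 0, 0], [7, 0, 0]], n_neighbors=2
)
toy_splits = list(leave_one_protein_out(["p1", "p2", "p3"]))
```

Each window string could then be encoded as categorical features for a Naïve Bayes classifier; splitting by protein rather than by residue keeps residues of the held-out chain out of the training set.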
The input to the PSSM-based classifier was a window of PSSM vectors for residues contiguous in the protein sequence. PSSMs were generated using PSI-BLAST against the NCBI nr database. A support vector machine classifier was then trained and tested using ten-fold cross validation. The protein sequences were split into ten disjoint sets; for each round of cross validation, one set was used as the test set and the other nine sets were used for training. We also

We present R-Align, a robust statistics based superposition algorithm for finding 3D similarities in protein structures over a hierarchy of scales, from global to local. R-Align (1) distinguishes core residues from flexible ones; (2) identifies rigidly moving domains along with linker regions; and (3) provides a metric for ranking similarities.

Problem
To visualize and understand structural variation in flexible proteins, the first step is to superimpose the two

robust than measuring the standard deviation). Then, noting that the median of the absolute values of samples from a Normal distribution is roughly 2/3 of its standard deviation, we set s = 1.5*median(absolute value of errors).
4 Compute weights for each error term using the weight function W(e; s). Weight the corresponding terms in the Kabsch least-squares estimation algorithm. Solve the weighted least-squares problem to find the rotation and translation parameters for superposition. The weights start to emphasize core regions that are structurally more stable than flexible regions.
5 Repeat until convergence. Given a converged solution, decide if there is sufficient support from the data to accept the superposition (e.g. the sum of weights has to be greater than a threshold). Discard the solution if there is insufficient support.
6 For each verified solution, separate inliers from outliers.
In particular, the scale parameter s affects the point at which the influence of the outliers begins to decrease. By examining the influence function Psi(e; s) we can deduce that outlier rejection begins where the second derivative of Rho(e; s) is zero. This means that an error e greater than s/sqrt(3) has a reduced influence and will be viewed as an outlier. From this, a threshold on weights can be used to identify inliers and outliers.
7 Repeat steps 2 to 7 until there is no more conformational change that needs to be explained, i.e. all the residues have been accounted for.
In summary, we discussed how to estimate the scale parameter s from the data (step 3); how core regions are emphasized over flexible portions in an automatic way (step 4); and how to identify multiple rigidly moving domains and consequently the hinge regions (steps 6 and 7). The weights derived from any given alignment help us rank the level of similarity between two structures.

Results
We applied R-Align successfully to many dynamic proteins with two known conformations. Fig. 1 shows the first six snapshots needed for R-Align to identify a rigidly moving domain in GroEL while aligning chain A of 1AON (blue) to chain A of 1OEL (red). In the full implementation of the algorithm, we use several different methods for generating initial guesses for the rotation and translation parameters. Additionally, we address the issue of leverage points: these are residues having extreme influence on the estimator for some initial guesses. We reduce leverage problems by controlling the spatial support of the robust superposition algorithm. We highlight several errors that can potentially arise in comparative modeling and fold recognition targets, including over-segmentation (breaking the protein into several substructures each having roughly similar motion parameters) and under-segmentation (joining two or more independently moving substructures into one rigid domain).

REFERENCES
1. Kabsch, W. 1976.
A solution for the best rotation to relate two sets of vectors. Acta Crystallogr. A32:922-923.
2. Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and W. A. Stahel. Robust Statistics: The Approach Based on

conformations of a protein. A standard least-squares superposition estimates the optimal rotation and translation parameters by minimizing the squared error between the coordinates of the corresponding atoms in the two conformations [1]. In the language of robust statistics, the least-squares solution is sensitive to gross errors or outliers [2,3], i.e., a large deviation arising from even a single residue can greatly distort the estimation of the transformation matrix parameters. In fact, a least-squares superposition often produces a physically inappropriate result, as it fails to distinguish core residues that are structurally stable from flexible residues that move a lot between multiple conformations. Additionally, the conformational change may involve rigid movement of more than one domain. A least-squares formulation that seeks a single set of rotation and translation parameters ignores this possibility [4]. We address these problems by introducing R-Align, a robust statistics based alignment algorithm.

Robust Statistics and Robust Estimators
The goal is to estimate the rigid-body parameters that can explain the (rigid) motion of the bulk of the residues and to identify deviating substructures for further treatment. To this end, we introduce an estimator function Rho(e; s) which provides a cost for any error e at a given scale s. We choose the robust estimator of Geman and McClure, whose shape is such that it assigns quadratic cost to low errors (just as least-squares does) but a fixed cost to large deviations. If the scale parameter s is very large, the Geman-McClure function behaves like a least-squares estimator.
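As a rough illustration of how such an estimator can drive a reweighted superposition, the sketch below uses one common form of the Geman-McClure cost, Rho(e; s) = e^2 / (s^2 + e^2), together with its derivative and the derived weight, inside a weighted Kabsch loop. The specific functional form, all function names, the uniform-weight initialization, the `min_scale` floor and the convergence test are assumptions made for this example; it is not the R-Align implementation and omits the support check and the multi-domain search.

```python
import numpy as np

def rho(e, s):
    """Assumed Geman-McClure cost: ~e^2/s^2 for small e, tends to 1 for large e."""
    return e**2 / (s**2 + e**2)

def psi(e, s):
    """Influence function, the first derivative of rho with respect to e."""
    return 2.0 * e * s**2 / (s**2 + e**2) ** 2

def weight(e, s):
    """Weight function psi(e; s) / e, written so that e = 0 is safe."""
    return 2.0 * s**2 / (s**2 + e**2) ** 2

def weighted_kabsch(P, Q, w):
    """Rotation R and translation t minimizing sum_i w_i |R p_i + t - q_i|^2."""
    w = w / w.sum()
    p0 = (w[:, None] * P).sum(axis=0)
    q0 = (w[:, None] * Q).sum(axis=0)
    H = (w[:, None] * (P - p0)).T @ (Q - q0)   # weighted covariance
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q0 - R @ p0

def robust_superpose(P, Q, n_iter=100, tol=1e-10, min_scale=1e-3):
    """Iteratively reweighted least-squares superposition of P onto Q."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    w = np.ones(len(P))   # assumption: start from a plain least-squares fit
    prev = None
    for _ in range(n_iter):
        R, t = weighted_kabsch(P, Q, w)
        errors = np.linalg.norm(P @ R.T + t - Q, axis=1)
        # Scale from the median absolute error; floor is a numerical guard.
        s = max(1.5 * np.median(np.abs(errors)), min_scale)
        if prev is not None and np.max(np.abs(errors - prev)) < tol:
            break
        w = weight(errors, s)
        prev = errors
    inliers = errors <= s / np.sqrt(3.0)  # psi peaks at e = s / sqrt(3)
    return R, t, w, s, inliers

# Toy usage: a rigid motion with four displaced ("flexible") points.
rng = np.random.default_rng(0)
P = rng.normal(scale=10.0, size=(40, 3))
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 3.0])
Q = P @ R_true.T + t_true
Q[:4] += np.array([5.0, 0.0, 0.0])
R, t, w, s, inliers = robust_superpose(P, Q)
```

In this toy case the displaced points receive near-zero weight after a few iterations, so the fit is determined by the rigid core; with the assumed form of Rho, the influence function peaks at e = s/sqrt(3), which is the inlier threshold used in the last step.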
Given the estimator, we define two new functions: an influence function Psi(e; s), given by the first derivative of the estimator Rho(e; s), and a weight function W(e; s) = Psi(e; s)/e. The Geman-McClure influence function Psi(e; s) increases linearly for small values of the error e. Then, as the deviations increase further, the influence function eventually stops increasing and begins to decrease. By decreasing, it gives less influence to residues with particularly large deviations (we call these outliers). Importantly, unlike in the least-squares formulation, the influence function goes to zero as the error goes to infinity. Interestingly, W(e; s) has a bell shape similar to a Gaussian, implying low weights for large errors [5]. In comparison, the weight function for the least-squares estimator is a fixed quantity! We next outline the various steps involved in using the robust estimator function in finding structural similarities.

R-Align: An Iteratively Reweighted Least Squares Superposition Algorithm
1 Start from an initial guess for the rotation and translation parameters. For this, use the Kabsch least-squares algorithm on a random set of spatially close residues (local alignment).
2 Measure deviations (i.e. errors), each being the distance between the coordinates of a given atom j in molecule 1 and in molecule 2 after aligning the two conformations.
3 Update the scale parameter using the deviations. We assume that the bulk of the residues undergoing a coherent rigid motion have deviations that are normally distributed. However, because of the mix of core and flexible regions, we first measure the median of the errors (which is more

Influence Functions. John Wiley and Sons, New York, NY, 1986.
3. Allan Jepson, Foundations of Computer Vision, CSC487, University of Toronto. ftp://ftp.cs.utoronto.ca/pub/jepson/teaching/vision/2503/robustEstimation.pdf
4.
Zemla, A. LGA: a method for finding 3D similarities in protein structures. Nucleic Acids Research, 2003, 31(13):3370-3374.
5. Damm, K. L. and H. A. Carlson. Gaussian-Weighted RMSD Superposition of Proteins: A Structural Comparison for Flexible Proteins and Predicted Protein Structures. Biophys. J. 90:4558-4573.

63: PHYLOGENY-BASED SCORING OF STRUCTURAL COVERAGE IN PROTEIN FAMILIES
Natasha Sefcovic (USCD, USA), Christian Zmasek (Burnham Institute, USA) & Adam Godzik (Burnham Institute, USA)

Samuel Flores, Chris Bruns & Russ B. Altman (Stanford University, USA)

RNAs play a pervasive role in gene expression and regulation. Their structure and dynamics are crucially important for understanding function. An internal coordinate representation allows us to freeze bond length and angle vibrations and rigidify secondary structure, leading us to recover observed motions of RNA at low computational cost.

RNAs play a pervasive role in gene expression and regulation. Their structure and dynamics are crucially important for understanding function. Attempts to solve these have been stymied by the long time scales involved, and the lack of an accurate treatment of counterions. Freezing bond length and angle vibrations and rigidifying secondary structure using internal coordinates leads to a significant reduction in the number of energy evaluations and an increase in the permissible length of time steps. The effect of solvent and counterions can be treated by implicit or explicit means. By combining these methods, experimentally observed motions of RNA can be recovered. The results suggest extensibility to large systems such as the ribosome, which are difficult to study by conventional means.

The recently announced SimTK library for multibody dynamics contains many tools to make macromolecular simulations tractable. It is possible to rigidify arbitrary portions of the molecule or molecules, creating extended bodies whose internal interactions need not be calculated.
Forces and rubber-band-like elements can be applied at arbitrary points. Molecules, fragments, or atoms can be constrained to a hypothetical ground or to each other, in one or more of their six degrees of freedom. Contact elements can keep specified molecules within given spatial boundaries. The time evolution can be controlled by choosing the time integrator (including variable step size integrators), and by adding a thermostat and velocity dampers. These tools can potentially be used to model systems much larger than are usually considered tractable. In this work we lay the groundwork for ribosomal dynamics by predicting the structure and dynamics of the HIV-1 Transactivation Response Element (TAR), a small molecule used as a model system for RNA dynamics. In the first stage, we economically generate a large number of conformations of TAR by rigidifying the two helices in the molecule and allowing bond rotations in the junction connecting the helices. The time evolution is computed with Coulomb and van der Waals interactions turned off. We then evaluate the ability of a Knowledge Based potential

Proteins can be organized into families based on their common ancestry, which we can recognize from sequence similarity. If the three-dimensional structure of at least one protein in the family has been solved, this family is considered to have "structural coverage," since we can usually predict structures of all of the other proteins in the family by comparative modeling. However, the accuracy of such models critically depends on the level of structural similarity between templates and modeling targets; therefore, the quality of the structural coverage depends on the number and distribution of the proteins with solved structures in the family. While this is intuitively obvious, so far no quantitative measure of the quality of structural coverage has been developed.
Here, we propose a quantitative measure of the structural coverage of a protein family, in which we use distances along a phylogenetic tree to calculate and compare the impact of specific proteins on the structural coverage. We explain our measure using several examples and compare it to several alternatives. We show that the choice of proteins that have been solved for their own individual reasons, as recorded to date in the Protein Data Bank, does not provide optimal coverage of the family as a whole. With such a measure, we can now begin to provide exact answers to questions such as: How many experimental structures do we need to achieve a specific level of model quality? What is the optimal order for target selection? Would solving structures of some subsets of proteins provide better modeling coverage of the family than solving others? A quantitative measure of structural coverage for protein families is, therefore, needed to have a rational and meaningful discussion of the goals and achievements not only of structural genomics, but also of researchers interested in protein structure, function and evolution, as well as those in the modeling community. We trust that our proposal of such specific scoring systems will begin to add substance to this discussion.

69: INTERNAL COORDINATE METHODS FOR MACROMOLECULAR STRUCTURE AND DYNAMICS

trained on known protein structures, to find correct conformations from among the thus-generated decoys. In a second stage, we use an all-atom SimTK analysis to model the motion of TAR under explicit water, explicit counterion conditions. We show how using a small amount of water around each molecule compares to the use of an extensive water environment. We comment on the effect of the ion environment on the conformational dynamics of TAR.
The results show how the dynamics of small RNAs can be computed accurately and economically using our novel internal coordinate dynamics code. Possible extensions to larger systems are discussed.

Because of bacterial resistance to current antibiotic drugs, there is special interest in developing antibacterial peptide agents to address the resistance problem. Antibacterial peptides have several advantages over current drugs: they have rapid microbicidal activity, a consequence of their natural role in responding to pathogenic challenges. They have multiple targets, either within the cell or in the bacterial cell membrane; because of their net positive charge, they are attracted electrostatically to the cytoplasmic membrane and interact with it. They interact with the charged components in the outer layer of the bacterial surface: with phosphate groups in the lipopolysaccharides of Gram-negative bacteria or with the lipoteichoic acids on Gram-positive surfaces [2]. Structural studies of AMEs have shown that although there is no common folding motif for binding aminoglycosides, antibiotic-binding pockets are consistently lined with negatively charged residues so as to effectively attract and bind the positively charged drugs. Wright et al. [3] examined cationic peptides to study their function as broad-spectrum inhibitors of AMEs. The antimicrobial peptide Indolicidin and analogs thereof were found to inhibit APH (3')-IIIa [4], AAC (6')-Ii [5], and the bi-functional enzyme APH (2")-AAC (6'). This signifies that in principle it may be possible to develop broad-spectrum inhibitors of AMEs so as to combat aminoglycoside antibiotic resistance. However, peptides are unsuitable as therapeutic agents due to their poor pharmacokinetic properties, and as such Indolicidin and its analogs must be viewed as leads for the development of peptidomimetics.
To advance the design of such peptidomimetics it is imperative that detailed information on the interactions between cationic peptides and different AMEs is obtained. For this reason, a series of Monte Carlo based conformational searches [6,7] was performed on a group of available cationic antimicrobial peptides (Indolicidin, its two analogs CP10A and CP11CN, and an ensemble of Indolicidin derivatives shorter than 14 amino acids, GW11-GW28) against two different AMEs, APH(3')-IIIa and AAC(6')-Ii. The predicted binding sites of these peptides are evaluated by calculating binding affinities and analyzing their interaction modes with the key sites of the substrate binding pockets. The calculated binding affinities of peptides in complex with APH (3')-IIIa correlate highly with the experimental data for peptides longer than ten or shorter than eight amino acids, whereas the calculated binding affinities for the peptides against AAC (6')-Ii are in complete agreement with the corresponding experimental data. These observations validate the efficiency of our computational approach, which was then applied in a set of docking studies on a new group of designed peptides. The peptides either partly occupy both the ATP and aminoglycoside binding pockets, forming a bridge-like binding site between them, or locate only in the vicinity of the aminoglycoside binding site, and thereby interfere with the antibiotic rather than with ATP (e.g., two peptides in ribbon representation in Figure).

70: MONTE CARLO CONFORMATIONAL SEARCH ON CATIONIC PEPTIDE INHIBITORS OF ANTIBIOTIC RESISTANCE ENZYMES
Laleh Alisaraie, Albert M. Berghuis (Department of Biochemistry, McGill University, Montreal, Canada)

Results of investigations on the cationic peptide inhibitors are presented. Binding sites of the peptides are identified and evaluated.
The calculated binding affinities are validated against the experimental data and, based on these observations, a novel potential peptide inhibitor is introduced as a lead compound for the development of a peptidomimetic adjuvant that can inhibit enzyme-mediated resistance to aminoglycoside antibiotics.

Aminoglycosides are often prescribed as broad-spectrum antibiotics. The data available from recent decades show increasing bacterial resistance to the available aminoglycoside antibiotic classes. Resistance to aminoglycosides is usually due to a dramatic increase in the enzymatic activity of aminoglycoside modifying enzymes (AMEs): aminoglycoside acetyltransferases (AACs), nucleotidyltransferases (ANTs) and phosphotransferases (APHs). Aminoglycoside resistance often appears as a result of plasmid-borne genes encoding AMEs. Aminoglycosides, in their different groups, target the A site of the 30S bacterial ribosome. Chemical modification of these antibiotics is catalyzed by AMEs, among which AACs and APHs are the main culprits because of the wide variety of chemical modifications they perform, not only in terms of the sites modified but also in their several unique resistance profiles and protein structure designations [1].

Among the inhibitors in the designed database, GW31, with a length of six amino acids (ILAWAW), demonstrates a high binding affinity and strong interactions with the key amino acids in the substrate binding pockets of both APH (3')-IIIa and AAC (6')-Ii. In Figure, GW31 (spheres) is shown occupying the binding site of ATP (green stick) in complex with APH (3')-IIIa. GW31 is introduced as a novel lead compound for the development of a peptidomimetic adjuvant that can inhibit enzyme-mediated resistance to aminoglycoside antibiotics.

REFERENCES:
1. K. J. Shaw, P. N. Rather, R. S. Hare, G. H. Miller., Microbiol.
Rev., 1993, 57, 138-163 2. R. E. Hanock, A. Rozek., FEMS Microbiol. Lett., 2002, 206, 143-149 3. D. Boehr, K. Draker, K. Koteva, M. Bains, R. Hancock, G. Wright., Chemistry & Biology, 2003, 10, 189-196 4. D. H. Fong, A. M. Berghuis., EMBO J., 2002, 21, 23232331 5. D. L. Burk, B. Xiong, C. Breitbach, Berghuis, A.M., Acta Crystallogr., Sect. D, 2005, 61, 1273-1279 6. C. McMartin, R. Bohacek, J. Computer-Aided. Mol. Des, 1997, 11, 333-344 7. L. Alisaraie, L. Haller, G. Fels., Chem. Inf. Model., 2006, 46, 1174 -118 72 3DSig 2008: Structural Bioinformatics and Computational Biophysics Satellite Meeting, July 18-19, 2008, Toronto, Canada Last Name Chuang Cipriano Dalafave del Sol Dintyala Subrahmanya Dixit Doxey Dunbrack First Name Syed (Nabil) Daniel Gabriele Ivet Ahmet Andrew Christopher Phil Mitchell Patrick Forbes Chris Matthieu Prof. Chakra Gwo-Yu Gregory Danielle Antonio Venkata Ravikant Surjit Andrew Roland Durek Pawel Ekman Ellis Falk Fan Flores Fromer Gallin Georgiev Ghersi Gippert Glazer Hall Hart Heifets Heilbut Hennerdal Hirayama Hou Huehne Isin Jaitly Jefferys Jiang Juritz Kaburagi Kamisetty Kauko Keedy Diana Jonathan Jenny Samuel Samuel Menachem Warren Ivelin Dario Garry Dariya David Reece Abraham Adrian Aron Kazunori Zhenglin Rolf Dr. Basak Navdeep Benjamin Bo Ezequiel Takashi Hetunandan Anni Daniel Ali Almonacid Ausiello Bahar Bakan Bordner Bottoms Bourne Brittnacher Buck Burkowski Bystroff Chavent Chennubhotla Country Affiliation Imperial College University of California University of Rome - Tor Vergata University of Pittsburgh University of Pittsburgh Mayo Clinic Univ of Missouri-Columbia University of California University of Washington Rensselaer Polytechnic Institute University of Waterloo Rensselaer Polytechnic Institute CNRS/INRIA University of Pittsburgh Boston University University of Wisconsin-Madison The College of New Jersey Fujirebio Inc Cornell University Zymeworks Inc. 
University of Waterloo Fox Chase Cancer Center Max-Planck-Institute of Molecular Plant Physiology The Arrhenius Laboratories Statistics Department. DEFS The Arrhenius Laboratories Victor Chang Cardiac Research Institute Stanford University The Hebrew University of Jerusalem University of Alberta Duke University Mount Sinai School of Medicine Novozymes A/S Stanford University Boston University Genentech, Inc University of Toronto Boston University Center for Biomembrane Research Waseda University Pioneer, A DuPont Company Fritz Lipmann Institute (FLI) University of Pittsburgh University of Toronto Imperial College University of Massachusetts Amherst Unidad de Fisicoquimica - CEI Waseda University University of Pittsburg Stockholm University Duke University 73 UK USA Italy USA USA USA USA USA USA USA Canada USA France USA USA USA USA Japan USA Canada Canada USA Germany Sweden Australia Sweden Australia USA Israel Canada USA USA Denmark USA USA USA Canada USA Sweden Japan USA Germany USA Canada UK USA Argentina Japan USA Sweden USA Email s.ali05@imperial.ac.uk daniel.almonacid@ucsf.edu gabriele.ausiello@gmail.com bahar@pitt.edu ahb12@pitt.edu bordner.andrew@mayo.edu cab8d2@mizzou.edu bourne@sdsc.edu mbrittna@u.washington.edu buckp@rpi.edu fjburkow@plg.uwaterloo.ca bystrc@rpi.edu chavent@loria.fr chakracs@pitt.edu gychuang@bu.edu gregc@cs.wisc.edu dalafave@tcnj.edu ao-mesa@fujirebio.co.jp ravid@ices.utexas.edu sdixit@zymeworks.com acdoxey@uwaterloo.ca Roland.Dunbrack@fccc.edu durek@mpimp-golm.mpg.de diaek@sbc.su.se jenny.falk@cbr.su.se s.fan@victorchang.edu.au scflores@stanford.edu fromer@cs.huji.ac.il wgallin@ualberta.ca ivelin@cs.duke.edu dario.ghersi@mssm.edu gpgi@novozymes.com dsglazer@stanford.edu drhall@bu.edu reece@harts.net aheifets@cs.toronto.edu aheilbut@gmail.com aron.hennerdal@cbr.su.se hirayama.densei@ruri.waseda.jp zhenglin.hou@pioneer.com bai2@pitt.edu ndjaitly@yahoo.com benjamin.jefferys@imperial.ac.uk boj@engin.umass.edu 
kaburagi@matsumoto.elec.waseda.ac.jp hetu@cs.cmu.edu anni.kauko@sbc.su.se daniel.keedy@duke.edu

Kelm | Sebastian | University of Oxford | UK | kelm@stats.ox.ac.uk
Korkin | Dmitry | University of Missouri | USA | korkin@korkinlab.org
Kortemme | Tanja | University of California, San Francisco | USA | kortemme@cgl.ucsf.edu
Kozakov | Dima | Boston University | USA | midas@bu.edu
Kraut | Adam | National Resource for Biomedical Supercomputing | USA | kraut@psc.edu
Krissinel | Eugene | European Bioinformatics Institute | UK | keb@ebi.ac.uk
Landon | Melissa | Brandeis University | USA | mlandon@brandeis.edu
Lario | Paula | Zymeworks, Inc. | Canada | plario@zymeworks.com
Lee | Byungkook (BK) | National Institutes of Health | USA | bk@nih.gov
Lewis | Benjamin | Iowa State University | USA | balewis@iastate.edu
Lilien | Ryan | University of Toronto | Canada | lilien@cs.toronto.edu
Liu | Lu | University of Pittsburgh | USA | luliucmu@gmail.com
Meireles | Lidio | University of Pittsburgh | USA | lmm85@pitt.edu
Mettu | Ramgopal | University of Massachusetts Amherst | USA | mettu@ecs.umass.edu
Moll | Mark | Rice University | USA | mmoll@cs.rice.edu
Morita | Mizuki | The University of Tokyo | Japan | mizuki@iu.a.u-tokyo.ac.j
Moult | John | University of Maryland Biotechnology Institute | USA | jmoult@tunc.org
Najmanovich | Rafael | European Bioinformatics Institute | UK | rafael.najmanovich@ebi.ac.uk
Nugent | Timothy | University College London | UK | t.nugent@cs.ucl.ac.uk
Palopoli | Nicolas | Unidad de Fisicoquimica - CEI | Argentina | npalopoli@graduados.unq.edu
Peterson | Matthew | The MITRE Corporation | USA | mpeterson@mitre.org
Qiu | Dr. Yang | GlaxoSmithKline | China | yang.x.qiu@gsk.com
Reuveni | Shlomi | Tel Aviv University | Israel | shlomire@post.tau.ac.il
Reyon | Deepak | Iowa State University | USA | dreyon@iastate.edu
Ritchie | Dave | University of Aberdeen | UK | d.w.ritchie@abdn.ac.uk
Rohs | Remo | Howard Hughes Medical Institute | USA | mail@remo-rohs.de
Rux | John | Wistar Institute | USA | rux@wistar.upenn.edu
Ryabov | Yaroslav | NIH | USA | yryabov@mail.nih.gov
Safi | Maria | University of Toronto | Canada | maryamirza@gmail.com
Samish | Ilan | University of Pennsylvania | USA | samish@sas.upenn.edu
Schlick | Tamar | New York University | USA | schlick@nyu.edu
Sefcovic | Natasha | Joint Center for Structural Genomics | USA | natasha.sefcovic@gmail.com
Senes | Alessandro | University of Wisconsin, Madison | USA | asenes@mail.med.upenn.edu
Seto | Marian | Bayer HealthCare Pharmaceuticals Inc. | USA | marian.seto@bayer.com
Shackelford | George | University of California, Santa Cruz | USA | ggshack@soe.ucsc.edu
Shoichet | Brian | UCSF | USA | shoichet@ucs.edu
Shrivastava | Indira | University of Pittsburgh | USA | ihs2@pitt.edu
Sierk | Michael | Saint Vincent College | USA | michael.sierk@email.stvincent.edu
Sternberg | Michael | Imperial College | UK | m.sternberg@imperial.ac.uk
Stogios | Peter J | University of Toronto | Canada | pstogios@uhnres.utoronto.ca
Taylor | Todd | NIST | USA | compbiology@yahoo.com
Tomb | Jean-Francois | E.I.DuPont De Nemours & Co., Inc | USA | jeanfrancois.tomb@usa.dupont.com
Tress | Michael | Spanish National Cancer Research Centre | Spain | mtress@cnio.es
Vajda | Sandor | Boston University | USA | vajda@bu.edu
Valencia | Alfonso | Spanish National Cancer Research Centre | Spain | valencia@cnb.uam.es
Veeramalai | Mallika | Burnham Institute for Medical Research | USA | iscbscmallika@gmail.com
Wallach | Izhar | University of Toronto | Canada | izharw@cs.toronto.edu
Wood | Graham | Macquarie University | Australia | gwood@efs.mq.edu.au
Wymore | Troy | National Resource for Biomedical Supercomputing | USA | wymore@psc.edu
Xu | Jinbo | Toyota Tech Inst at Chicago | USA | j3xu@tti-c.org
Xu | Qifang | Fox Chase Cancer Center | USA | qifang.xu@fccc.edu
Yamanishi | Yoshihiro | Ecole des Mines de Paris | France | Yoshihiro.Yamanishi@ensmp.fr
Zaback | Peter | Iowa State University | USA | petez@iastate.edu
Zavodszky | Maria | Michigan State University | USA | zavodszk@msu.edu
Zhang | Yi | University of Massachusetts | USA | yi@engin.umass.edu
Zhou | Ming | Columbia University | USA | mz2140@columbia.edu
Zhu | Hongbo | Max-Planck-Institut für Informatik | Germany | hzhu@mpi-inf.mpg.de
Zsoldos | Zsolt | SimBioSys Inc. | Canada | zsolt@simbiosys.ca

Index by abstract ID

ID | Title | Presenting author | Page
K1 | Toward elucidating allosteric mechanisms of function via structure-based analysis of protein dynamics | Ivet Bahar | 6
K2 | On the nature of protein fold space: extracting functional information from apparently remote structural neighbors | Donald Petrey, Markus Fischer & Barry Honig | 6
K3 | Prediction of functional characteristics based on sequence and structure | Alfonso Valencia | 6
K4 | Chromatin structure insights revealed by mesoscale modeling | Tamar Schlick | 6
K5 | I am not a PDBid I am a Biological Macromolecule | Philip Bourne | 7
K6 | Conformational flexibility and sequence diversity in computational protein design | Tanja Kortemme | 7
K7 | Hits, Leads & Artifacts from Virtual and High-Throughput Screening | Brian Shoichet | 7
2 | MetaMol: High quality visualization of Molecular Skin Surface | Matthieu Chavent, Bruno Levy & Bernard Maigret | 28
3 | Specific interactions for ab initio folding of proteins | Yuedong Yang & Yaoqi Zhou | 29
4 | Structure determination of protein-protein complexes using parameters of their overall rotational dynamics available via NMR relaxation data | Yaroslav Ryabov & Charles Schwieters | 30
5 | Focused docking: a computational approach to improve small-molecule docking into protein structures | Dario Ghersi & Roberto Sanchez | 31
6 | Crystal contacts as nature's docking solutions | Eugene Krissinel | 24
7 | Support Vector Machine-based Transmembrane Protein Topology Prediction | Tim Nugent & David Jones | 32
9 | Proteins: coexistence of stability and flexibility (MGMS awardee) | Shlomi Reuveni, Rony Granek & Joseph Klafter | 18
10 | Domain rearrangement and domain creation in the evolution of new proteins | Diana Ekman, Åsa K. Björklund and Arne Elofsson | 33
11 | A novel method for the detection of protein local structural motifs binding specific ligand fragments | Gabriele Ausiello, Pier Federico Gherardini, Elena Gatti, Ottaviano Incani & Manuela Helmer-Citterich | 34
12 | How common are internal repeats in alpha-helical membrane proteins? | Jenny Falk & Arne Elofsson | 35
13 | Accelerating and Focusing Protein-Protein Docking Correlations Using Multi-Dimensional Rotational FFT Generating Functions | Dave Ritchie, Dima Kozakov & Sandor Vajda | 36
14 | Logic-based drug discovery | Michael Sternberg, Stephen Muggleton, Ata Amini, Huma Lodhi, David Gough & Paul Shrimpton | 22
15 | An Approach to Transmembrane Protein Structure Prediction with Stochastic Dynamical Systems using Backward Smoothing | Takashi Kaburagi & Takashi Matsumoto | 36
16 | The evolution of protein function driven by a multidomain repertoire (MGMS awardee) | Syed Ali & Michael Sternberg | 10
17 | Predicting small ligand binding sites on proteins using low-resolution structures | Andrew Bordner | 19
18 | Geofold: a mechanistic model to study the effect of topology on protein unfolding pathways and kinetics | Vibin Ramakrishnan, Saeed Salem, Saipraveen Srinivasan, Wilfredo Colon, Mohammed Zaki & Chris Bystroff | 25
19 | Scoring confidence index: statistical evaluation of ligand binding mode predictions | Maria Zavodszky, Andrew Stumpff-Kan, David Lee & Michael Feig | 20
20 | i-SITE: Energy-based method for predicting ligand-binding sites on protein structures | Mizuki Morita, Tohru Terada, Shugo Nakamura & Kentaro Shimizu | 37
21 | Coil within the membrane: Structural anomaly for functional needs | Anni Kauko, Kristoffer Illergård & Arne Elofsson | 37
22 | Functional insights from binding sites similarities complement existing methods for prediction of protein function | Rafael Najmanovich & Janet Thornton | 20
23 | Molecular Dynamics Simulations Using An Alpha-Carbon-Only Knowledge-Based Force Field For Protein Structure Prediction | Patrick Buck and Chris Bystroff | 39
24 | Environment-Specific Substitution Tables for Membrane Proteins | Sebastian Kelm, Jiye Shi & Charlotte M. Deane | 40
25 | Identification of novel inhibitors for ubiquitin C-terminal hydrolase-L3 by virtual screening | Kazunori Hirayama, Shunsuke Aoki, Kaori Nishikawa, Takashi Matsumoto, Keiji Wada | 40
26 | Algorithms for protein design | Ivelin Georgiev, Cheng-Yu Chen & Bruce Randall Donald | 16
27 | A novel scoring function in eHiTS and LASSO | Zsolt Zsoldos, Danni Harris, Mehdi Mirzazadeh, Aniko Simon | 42
28 | SE: An algorithm for deriving sequence alignment from superimposed structures | Chin-Hsien Tai, James J. Vincent, Changhoon Kim & Byungkook Lee | 43
29 | Minor groove electrostatics and binding specificity | Remo Rohs, Sean West, Peng Liu & Barry Honig | 12
30 | Co-Evolution of Structural Bioinformatics and Protein Design for N-cap Backrubs | Daniel Keedy, Ed Triplett, David Richardson, Jane Richardson, Ivelin Georgiev, Cheng-Yu Chen & Bruce R. Donald | 44
31 | Classification of mechanistically diverse enzyme superfamilies according to similarities in reaction mechanism | Daniel Almonacid & Patricia C. Babbitt | 15
32 | Efficient Protein Conformation Sampling in Real Space | Jinbo Xu | 45
33 | Modeling the Interaction of MAP Kinase Phosphatase 3 with a Novel Inhibitor by Accounting for Conformational Factors | Ahmet Bakan, Gabriela Molina, Andreas Vogt, Michael Tsang & Ivet Bahar | 46
34 | How good can template-based modelling be? | Braddon K. Lance, Graham R. Wood & Charlotte M. Deane | 47
35 | Contact Prediction for Membrane Proteins | Aron Hennerdal & Arne Elofsson | 48
36 | Computational Methods to Advance from Crystallographic Model to Enzyme Mechanism and Structure-Function Relationships | Troy Wymore & Adam Kraut | 48
37 | Molecular Surface Abstraction | Gregory Cipriano, George Phillips & Michael Gleicher | 49
38 | The next generation of the backbone dependent rotamer library | Maxim Shapovalov & Roland Dunbrack | 26
39 | High-Throughput Crystal Structure Prediction of Drug-Like Molecules | Bashir Sadjad, Zsolt Zsoldos and Aniko Simon | 50
40 | The Jena Library of Biological Macromolecules JenaLib: New Features | Rolf Huehne, Frank-Thomas Koch & Juergen Suehnel | 51
41 | Computational insights into redox-active disulfides in protein structures | Samuel Fan, Richard George, Naomi Haworth & Merridee Wouters | 52
42 | SA-COMPAS: A resource for prediction, assessment, and web-based visualization of comparative protein models | Adam Kraut and Troy Wymore | 53
43 | Conformational free energy of protein structures: computing upper and lower bounds | Hetunandan Kamisetty & Christopher Langmead | 23
44 | Statistical analysis of interfaces in crystals of homologous proteins | Qifang Xu and Roland Dunbrack | 54
45 | Xenon Effects on Ligand Binding Domain of NMDA Receptor | Lu Liu, Yan Xu & Pei Tang | 55
47 | Large Scale Motions in Glutamate Transporters Revealed by Elastic Network Models and Cysteine Cross-linking Studies | Indira H Shrivastava, Jie Jiang, Susan G. Amara & Ivet Bahar | 56
48 | Accurate Prediction of the Near-Optimal Sequence Space for Atomic-Level Protein Design | Menachem Fromer & Chen Yanover | 57
49 | Distribution and extension of protein conformational diversity | Ezequiel Iván Juritz, Sebastián Fernández Alberti & Gustavo Parisi | 58
50 | Poing: a fast and simple model for protein structure prediction | Benjamin Jefferys, Lawrence Kelley & Michael Sternberg | 17
51 | Analysis of potential proton channel inhibition mechanisms by computational protein mapping | Dima Kozakov, Gwo-Yu Chuang, Dmitry Beglov, Ryan Brenke & Sandor Vajda | 59
52 | Exploring the Activation Mechanism of a G-Protein-Coupled Protein Receptor, Rhodopsin, Using Normal Modes from Coarse-grained Elastic Network Models in Molecular Dynamics Simulations | Basak Isin, Klaus Schulten, Emad Tajkhorshid & Ivet Bahar | 59
53 | TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model | Mallika Veeramalai, Yuzhen Ye & Adam Godzik | 61
54 | Conformational Diversity modulates protein sequence divergence | Ezequiel Iván Juritz, Sebastián Fernández Alberti & Gustavo Parisi | 62
55 | Renaming diastereotopic atoms for consistent PDB-wide analyses | Christopher Bottoms & Dong Xu | 62
56 | Predicting new engrailed homology motifs from structural and energy studies of the WD propeller domain bindings to known motifs | Danielle S. Dalafave | 64
57 | Use of evolutionary information in model quality evaluation for protein structure prediction | Nicolas Palopoli, Diego Gomez Casati & Gustavo Parisi | 64
58 | Computational Discovery of Small Molecular Weight Protein Interaction Inhibitors | Lidio Meireles, Alexander Doemling & Carlos Camacho | 65
59 | A fragment based method for the prediction of atomistic models of transmembrane helix-helix interaction | Alessandro Senes, Dan W Kulp, David T. Moore & William F DeGrado | 67
60 | Combining Predictions of Protein Structure and Protein-RNA Interaction to Model the Structure of the Human Telomerase Complex | Ben A. Lewis, Mateusz Kurcinski, Deepak Reyon, Jae-Hyung Lee, Vasant Honavar, Robert L. Jernigan, Andrzej Kolinski, Andrzej Kloczkowski & Drena Dobbs | 13
61 | Comparing Sequence and Structure-based Classifiers for Predicting RNA Binding Sites in Specific Families of RNA Binding Proteins | Michael Terribilini, Cornelia Caragea, Deepak Reyon, Ben Lewis, Li Xue, Jeffry Sander, Jae-Hyung Lee, Robert L Jernigan, Vasant Honavar, Krishna Rajan & Drena Dobbs | 67
62 | R-Align: A Robust Statistics Based Superposition Algorithm for Proteins | Chakra Chennubhotla & Ivet Bahar | 68
63 | Phylogeny-based scoring of structural coverage in protein families | Natasha Sefcovic, Christian Zmasek & Adam Godzik | 70
64 | 4D Structure-based Function Prediction | Dariya S. Glazer, Randall J. Radmer & Russ B. Altman | 8
65 | Two stage residue-residue contact predictor | George Shackelford & Kevin Karplus | 27
66 | Predicting DNA-binding affinity of modularly designed zinc finger proteins | Peter Zaback, Jeffry D. Sander, J. Keith Joung, Daniel F. Voytas & Drena Dobbs | 11
67 | Channeling protein structure analysis towards understanding cough dynamics | Ilan Samish & William F. DeGrado | 14
68 | An Automatic Server for Function Prediction Evaluation | Michael Tress, Alfonso Valencia, Michael Sternberg & Mark Wass | 9
69 | Internal coordinate methods for macromolecular structure and dynamics | Samuel Flores, Chris Bruns & Russ B. Altman | 70
70 | Monte Carlo Conformational Search on Cationic Peptide Inhibitors of Antibiotic Resistance Enzymes | Laleh Alisaraie, Albert M. Berghuis | 71