13 estruct secund termodinam ADN ARN
Prediction of the 2D and 3D Structure of DNA/RNA
Bioinformatics 2007-I
Francis Crick, Alex Rich, Leslie Orgel, James Watson

Characteristics of Life
• Complexity
• Ability to extract, transform, and use energy
• Ability to replicate

Examples of systems that replicate
• Crystals
  – replicate, use energy (?), are NOT complex
• “Microorganisms” (viruses, viroids, prions)
  – Same, except usually require a host

Common chemistry in Life
• Monomeric subunits which are multifunctional
  – amino acids (21) → proteins
  – monosaccharides (30-40?) → carbohydrate polymers
  – mononucleotides (10-15) → RNA, DNA

Bioenergetics
• Equilibrium
  – Characteristics of living systems
  – Characteristics of equilibrium
• Steady state
  – inputs, outputs in balance
  – cellular concentrations
• Entropy
  – Complex structures (lowered entropy)

Thermodynamics
• A closed system at constant T, P eventually reaches equilibrium and minimizes G (Gibbs free energy).
• Living systems are OPEN; a continuous input of energetic (high electronic potential) compounds maintains constant concentrations of materials far from their equilibrium concentrations.

Water and the hydrophobic effect
• Water
  – Polar liquid
  – Tends to exclude nonpolar materials

Organizing principle for living matter
1. Advantageous molecular architectural building blocks form spherical containers for aqueous compartments without being incorporated.
2. Hydrophobic (nonpolar) materials within the compartment are segregated by the polar medium (water).

[Figure 4-7]

Hydrophobic interactions
• Polar materials (water) on balance have attractive intermolecular forces.
• Nonpolar materials have no particular tendency to associate.
  – Result: Nonpolar materials within an aqueous compartment associate because they are excluded by the mutually attractive forces between the molecules of the medium.

“Weak” Interactions
• Chemical bonds (covalent) have energies of ~10^1 to 10^2 kcal/mole.
  – STRONG
• Weak interactions
  – Hydrogen bonds ~5 kcal/mole
  – Van der Waals interactions ~1 kcal/mole
• “Hydrophobic” interactions
  – Less classifiable, a consequence of other forces

Table 3-2 Covalent and Noncovalent Chemical Bonds

                                           Strength (kcal/mole)*
Bond Type                    Length (nm)   In Vacuum   In Water
Covalent                     0.15          90          90
Ionic                        0.25          80          3
Hydrogen                     0.30          4           1
van der Waals attraction     0.35          0.1         0.1
  (per atom)

*The strength of a bond can be measured by the energy required to break it, here given in kilocalories per mole (kcal/mole). (One kilocalorie is the quantity of energy needed to raise the temperature of 1000 g of water by 1 °C. An alternative unit in wide use is the kilojoule, kJ, equal to 0.24 kcal.) Individual bonds vary a great deal in strength, depending on the atoms involved and their precise environment, so the above values are only a rough guide. Note that the aqueous environment in a cell will greatly weaken both the ionic and the hydrogen bonds between nonwater molecules (Panel 3-1, pp. 92-93). The bond length is the center-to-center distance between the two interacting atoms; the length given here for a hydrogen bond is that between its two nonhydrogen atoms.

Nucleic Acid Basics
• Nucleic acids are polymers
• Each monomer consists of three units:
  – Nucleotide = a base + a ribose sugar + a phosphate
  – Nucleoside = a base + a ribose sugar
• A base can be one of five rings (next):

Nucleic Acid Bases
• Pyrimidines
• Purines
• Pyrimidines and purines can base-pair (Watson-Crick pairs)

Modified Uridines in Eukaryotic tRNAs
Useful Website: http://medlib.med.utah.edu/RNAmods/

Nucleic Acids As Heteropolymers
• Nucleosides, Nucleotides
• Single-stranded DNA, drawn 5’ to 3’
• A single-stranded RNA will have OH groups at the 2’ positions
• Note the directionality of DNA or RNA

Structure Overview of Nucleic Acids
• Unlike the three-dimensional structures of proteins, DNA molecules assume simple double-helical structures independent of their sequences.
  There are three kinds of double helices that have been observed in DNA: type A, type B, and type Z, which differ in their geometries. The double-helical structure is essential to the coding function of DNA. Watson (a biologist) and Crick (a physicist) proposed the double-helix structure in 1953, based on X-ray diffraction data.
• RNA, on the other hand, can have structures as diverse as those of proteins, as well as a simple double helix of type A. The ability to be both informational and structurally diverse suggests that RNA was the prebiotic molecule that could function in both replication and catalysis (the RNA World Hypothesis). In fact, some viruses encode their genetic material in RNA (e.g., retroviruses).

Focus on 2° Structure
• Difficulty: 3° structure prediction is complicated.
  – a biopolymer can assume a large number of conformations.
  – much important information is contained in details of the 2° structure.
• Simplification: Focus on secondary (2°) structure.
  – limits the problem to transitions between well-defined states.
  – For polypeptides:
    • between a random coil and an α-helix.
    • alternatively, between an α-helix and a β-strand.
  – For polynucleotides:
    • between a pair of random coils and a double strand.

Our Focus: the Helix-Coil Transition in DNA
• In particular, we focus on two related processes:
  – DNA melting
    • B helix to two coils.
  – DNA annealing
    • two coils to a B helix.
• Understanding these:
  – aids in modeling more complicated transitions.
    • e.g., many species.

Stabilizing Interactions
• The DNA B-helix structure is stabilized by:
  – hydrogen bonding between bases (minor).
  – stacking between H-bonded base pairs (primary).
    • induced dipole moments in the π clouds of adjacent heterocyclic rings.
    • stacking also sequesters hydrophobic rings.
    • results in the characteristic helix shape.
• In DNA melting…the helix is destabilized
  – generally implemented by increasing temperature, T.
  – destabilizes the stacks…unwinding the helix.
  – unwound helix separates into free ssDNAs (‘coils’).

Monitoring the Helix-Coil Transition
• Degree of stacking is experimentally observable:
  – Let ΘB = mean fraction of stacked base pairs.
  – Ultraviolet absorbance at 260 nm (A260)
    • decreases as ΘB increases.
    • the ‘hypochromicity’.
  – DNA melting is accompanied by a ≅ 40% increase in A260.
  – A260 vs. T yields ΘB vs. T (melting curve).

DNA Melting Curves
• ΘB decreases monotonically from 1 to 0 as T increases.
  – sigmoidal shape indicates that DNA melting is cooperative.
  – The temperature at which ΘB = ½ is the melting temperature (Tm).
  – Width (∆T) is non-zero (e.g., for 10-mers, ∆T ≅ 10 °C).
• Melting curves of longer DNAs show more structure:
  – several independently melting regions (AT-rich regions are less stable).
  – the melting curve is then a combination of several sigmoids.

DNA Renaturation
• Renaturation is the ‘reverse’ of DNA melting.
  – also called DNA ‘annealing’ or ‘hybridization’.
• DNA renaturation is a much more complicated process:

DNA reassociation (renaturation)
• Denatured, single-stranded DNA → double-stranded DNA
  – Slower, rate-limiting, second-order process (rate constant k2) of finding complementary sequences to nucleate base-pairing
  – Faster, zippering reaction to form long molecules of double-stranded DNA
Source: http://www3.kumc.edu/jcalvet/PowerPoint/bioc801b.ppt

Reversibility of DNA Melting
• Melting of short DNAs is strictly reversible.
• Reversibility of DNA melting:
  – measured by a lack of ‘hysteresis’ in the melting curve.
  – DNA melting curve = DNA renaturation curve.
• Validity of an equilibrium model of melting assumes:
  – melting slow enough to maintain equilibrium at each T.
    • relatively slow heating/cooling (0.1-0.2 °C/minute).
  – failure to maintain equilibrium = hysteresis in the melting curve.
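
The melting-curve quantities above (ΘB, Tm, and the ≅ 40% rise in A260) can be illustrated with a minimal two-state sketch. It is not taken from the slides: it assumes an all-or-none helix ⇌ coil equilibrium with a temperature-independent melting enthalpy and ignores strand concentration, so it is closest to a short hairpin-like molecule. The function names and the values Tm = 330 K and ∆H = 70 kcal/mol are hypothetical, chosen only to give a visible sigmoid.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_stacked(temp_k, tm_k, dh_melt):
    """Two-state estimate of Theta_B at temp_k (kelvin).

    Assumption: K = [coil]/[helix] follows a van't Hoff relation with a
    constant, positive melting enthalpy dh_melt (kcal/mol), and K = 1 at Tm.
    """
    ln_k = -(dh_melt / R) * (1.0 / temp_k - 1.0 / tm_k)
    return 1.0 / (1.0 + math.exp(ln_k))


def a260(theta_b, a_helix=1.0, hyperchromicity=0.40):
    """Absorbance assumed linear in the unstacked fraction (1 - Theta_B).

    hyperchromicity=0.40 encodes the ~40% increase quoted in the slides.
    """
    return a_helix * (1.0 + hyperchromicity * (1.0 - theta_b))


if __name__ == "__main__":
    tm, dh = 330.0, 70.0  # hypothetical illustration values
    for t in range(300, 361, 10):
        theta = fraction_stacked(float(t), tm, dh)
        print(f"T = {t} K  Theta_B = {theta:.3f}  A260 = {a260(theta):.3f}")
```

A larger ∆H gives a narrower transition (smaller ∆T), which is one way to picture the cooperativity mentioned above.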
Three Dimensional Structures of Double Helices
[Figure: A-DNA and A-RNA helices, with the minor groove and major groove labeled]

Forces That Stabilize Nucleic Acid Double Helix
• There are two major forces that contribute to the stability of helix formation
  – Hydrogen bonding in base-pairing
  – Hydrophobic interactions in base stacking
[Figure: antiparallel strands (5’ to 3’ and 3’ to 5’) showing same-strand stacking and cross-strand stacking]

Types of DNA Double Helix
• Type A: major conformation of RNA, minor conformation of DNA
• Type B: major conformation of DNA
• Type Z: minor conformation of DNA
[Figure: A (narrow, tight), B (wide, less tight), Z (left-handed, least tight)]

A-form helix
A-form and B-form helices differ in:
• Rotational displacement per bp (30-33° for A; 36-40° for B)
• Displacement of base pairs from the axis.
• Major and minor grooves: the A-form has a deep and narrow major groove, leaving little opportunity for a protein to distinguish among base pairs.

Structural Transition of DNA under stress
The new structures of DNA obtained in numerical simulations when pulling on the molecule (R. Lavery). Left: the usual B-DNA structure. Middle: if the molecule is pulled by its 5’ ends, it keeps a double-helical structure with inclined bases. Right: if the DNA is pulled by its 3’ ends, the final structure resembles a ladder.

Monte Carlo Implementation of Supercoiled Double-Stranded DNA
The conformation of a DNA molecule of N straight cylinder segments is specified by the spatial positions of the vertices of its central axis, r_i = (x(i), y(i), z(i)) in a three-dimensional Cartesian coordinate system, and by the folding angle of the sugar-phosphate backbones around the central axis, θ_i, i = 1, 2, ..., N. The length of the i-th segment satisfies [equation not recovered], where ⟨ ⟩_0 denotes the thermal average for a relaxed DNA molecule and n_bp is the number of base pairs.
[Figure: the configuration of the discrete DNA chain in the model]

Trial Motions of the DNA Chain during Monte Carlo Simulations
(a) The folding angle in the i-th segment, θ_i, is changed into θ_i + λ1.
All segments between the i-th vertex and the free end are translated by the distance |∆s_i − ∆s_i’|.
(b) A portion of the chain is rotated by an angle λ2 around the axis connecting the two ends of the rotated portion.
(c) The segments from a randomly chosen vertex to the free end are rotated by an angle λ3 around an arbitrarily oriented axis that passes through the chosen vertex.
The current conformation of the DNA central axis is shown by solid lines and the trial conformation by dashed lines.

Common Structural Elements of RNA
[Figure: secondary structure and tertiary structure elements]

3D Structures of RNA: Transfer RNA Structures
[Figure: secondary and tertiary structure of tRNA, showing the TψC loop, variable loop, anticodon stem, D loop, and anticodon loop]

3D Structures of RNA: Ribosomal RNA
[Figure: secondary structure of large ribosomal RNA and tertiary structure of the large ribosomal subunit (Ban et al., Science 289 (905-920), 2000); labels: 70S, 50S, 30S, 16S rRNA, 23S rRNA, 50S proteins, 30S proteins, A-site tRNA, P-site tRNA, E-site tRNA]

3D Structures of RNA: Catalytic RNA
[Figure: secondary and tertiary structure of a self-splicing RNA]

Secondary Structures of Nucleic Acids
• DNA is primarily in duplex form.
• RNA is normally single-stranded and can adopt diverse secondary structures other than the duplex.

RNA SS: recursive definition
Nussinov (1978), remade from Durbin et al., 1997
• Secondary structure: a set of paired positions on the interval [i,j].
• A-U and C-G can base-pair. Some other pairings can occur, and triple interactions exist.
• Pseudoknot: non-nested pairing, i < j < k < l with pairs i-k and j-l.
• The structure on [i,j] is built from one of four cases:
  – i,j pair: structure on [i+1, j-1] plus the pair i-j
  – i unpaired: structure on [i+1, j]
  – j unpaired: structure on [i, j-1]
  – bifurcation: structure on [i, k] plus structure on [k+1, j]

More Secondary Structures
Pseudoknots
Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) THE RNA WORLD. Cold Spring Harbor Laboratory Press.
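
The four-case recursive definition above (i,j pair; i unpaired; j unpaired; bifurcation) can be sketched as a small dynamic program. This is a Nussinov-style base-pair maximization, not the mFOLD energy model: each allowed pair simply scores +1. The function names, the restriction to A-U and G-C pairs, and the minimum separation j − i > 4 are assumptions made for this illustration.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(a, b):
    """Watson-Crick pairs only (assumption: G-U wobble pairs are ignored)."""
    return (a, b) in PAIRS


def fill(seq, min_sep=4):
    """W[i][j] = maximum number of nested pairs on the interval [i, j]."""
    n = len(seq)
    W = [[0] * n for _ in range(n)]
    for span in range(min_sep + 1, n):          # require j - i > min_sep
        for i in range(n - span):
            j = i + span
            best = max(W[i + 1][j], W[i][j - 1])       # i or j unpaired
            if can_pair(seq[i], seq[j]):
                best = max(best, W[i + 1][j - 1] + 1)  # i,j pair
            for k in range(i + 1, j):                  # bifurcation
                best = max(best, W[i][k] + W[k + 1][j])
            W[i][j] = best
    return W


def fold(seq, min_sep=4):
    """Trace back one optimal structure and return it in dot-bracket form."""
    n = len(seq)
    W = fill(seq, min_sep)
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_sep:
            continue                      # too short to contain any pair
        if W[i][j] == W[i + 1][j]:
            stack.append((i + 1, j))
        elif W[i][j] == W[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(seq[i], seq[j]) and W[i][j] == W[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if W[i][j] == W[i][k] + W[k + 1][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(fold("GGGAAAAACCC"))
```

Because only nested cases are considered, pseudoknots (i < j < k < l with pairs i-k and j-l) cannot be produced by this recursion.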
rRNA Secondary Structure Based on Phylogenetic Data

Self complementary methods

Predicting RNA Secondary Structures
• By Thermodynamics Method
  – Minimize Gibbs Free Energy
• By Phylogenetic Comparison Method
  – Compare RNA Sequences of Identical Function From Different Organisms
• By Combination of the Above Two Methods
  – In principle, this could be the most powerful method

Thermodynamics
∆G = ∆H - T∆S
• Gibbs Free Energy, G
  – Describes the energetics of biomolecules in aqueous solution.
  – The change in free energy, ∆G, for a chemical process, such as nucleic acid folding, can be used to determine the direction of the process:
    ∆G = 0: equilibrium
    ∆G > 0: unfavorable process
    ∆G < 0: favorable process
• ∆H is the change in enthalpy, ∆S is the change in entropy, and T is the temperature in kelvin.
• Thus the natural tendency for biomolecules in solution is to minimize the free energy of the entire system (biomolecules + solvent).
• Molecular interactions, such as hydrogen bonds, van der Waals and electrostatic interactions, contribute to the ∆H term. ∆S describes the change of order of the system. Thus, both the molecular interactions and the order of the system determine the direction of a chemical process.
• For any nucleic acid solution, it is extremely difficult to calculate the free energy from first principles.
• Biophysical methods can be used to measure free energy changes.

The Equilibrium Partition Function
• For a population of structures S, a partition function Z and the probability P(s) of a particular folding s can be calculated:
  $Z = \sum_{s \in S} e^{-\Delta G_s / RT}$
  $P(s) = \dfrac{e^{-\Delta G_s / RT}}{Z}$

Energy Minimization Method (mFOLD – RNA Structure)
• An RNA sequence is written R = {r_1, r_2, r_3, …, r_n}, where r_i is the i-th ribonucleotide and belongs to the set {A, U, G, C}.
• A secondary structure of R is a set S of base pairs i.j which satisfies:
  – 1 ≤ i < j ≤ n;
  – j − i > 4 (a loop cannot contain fewer than 4 nucleotides);
  – If i.j and i’.j’ are two base pairs (assume i ≤ i’), then either
    » i = i’ and j = j’ (same base pair),
    » i < j < i’ < j’ (i.j precedes i’.j’), or
    » i < i’ < j’ < j (i.j includes i’.j’)
    (this excludes pseudoknots, for which i < i’ < j < j’).
• If e(i,j) is the energy for the base pair i.j, the total energy for R is
  $E(S) = \sum_{(i,j) \in S} e(i,j)$
• The objective is to minimize E(S).

Representations (cont.)
• Hydrogen bonds between intra-chain pairs are represented by circular arcs.
• All representations are equivalent.

Free Energy Parameters
• An extensive database of free energies for the following RNA units has been obtained (the so-called “Tinoco Rules” and “Turner Rules”):
  – Single-strand stacking energy
  – Canonical (AU, GC) and non-canonical (GU) base pairs in duplexes
• Accurate free energy parameters are still lacking for:
  – Loops
  – Mismatches (AA, CA, etc.)
• Using these energy parameters, the current version of mFOLD – RNA Structure can predict ~73% of phylogenetically deduced secondary structures.

Dynamic Programming (mFOLD)
• A matrix W(i,j) is computed that is dependent on the experimentally measured base-pair energy e(i,j).
[Figure: an example of W(i,j)]
• Recursion begins with i = 1, j = n:
  – If W(i+1,j) = W(i,j), then i is not paired.
Set i = i+1 and start the recursion again.
  – If W(i,j-1) = W(i,j), then j is not paired. Set j = j-1 and start the recursion again.
  – If W(i,j) = W(i,k) + W(k+1,j), the fragment k+1…j is put on a stack and the fragment i…k is analyzed by setting j = k and going back to the beginning of the recursion.
  – If W(i,j) = e(i,j) + W(i+1,j-1), a base pair is identified and added to the list; set i = i+1 and j = j-1.

Suboptimal Folding (mFOLD)
• For any sequence of N nucleotides, the expected number of structures is greater than 1.8^N.
• A sequence of 100 nucleotides has about 3×10^25 foldings. If a computer can calculate 1000 structures per second, it would take about 10^15 years!
• mFOLD generates suboptimal foldings whose free energies fall within a certain range of values. Many of these structures differ only in trivial ways. These suboptimal foldings can still be useful for designing experiments.

Energy dot-plot
[Figure]

Predicting RNA 3D Structures
• Currently available RNA 3D structure prediction programs make use of the fact that a tertiary structure is built upon preformed secondary structures.
• So once a solid secondary structure can be predicted, it is possible to predict its 3D structure.
• The chances of obtaining a valid 3D structure can be increased by known spatial constraints among the different secondary segments (e.g., cross-linking, NMR results).
• However, there are far fewer thermodynamic data on 3D RNA structures, which makes 3D structure prediction challenging.

RNA-protein Interactions
• There is currently no computational method that can predict RNA-protein interaction interfaces.
• Statistical methods have been applied to identify structural features at the protein-RNA interface. For instance, ENTANGLE finds that most atoms contributed by a protein to recognizing an RNA are from main chains (C, O, N, H), not from side chains! But much remains to be done.
• Electrostatic potential is of primary importance in protein-RNA recognition because of the negatively charged phosphate backbones.
Efforts are made to quantify the electrostatic potential at the molecular surface of a protein and an RNA in order to predict the site of RNA interaction. This often provides a good prediction, at least for the site on the protein.

Fundamentals of SSCP
SSCP of the complete PZase gene (P7-P8)
[Figure: gel lanes labeled R and S; H37RV (sensitive); R: Wayne negative (resistant); M. bovis (resistant)]

Calculation procedure for extinction (absorption) coefficient of DNA
The extinction coefficient at 260 nm, 25 °C, and neutral pH for single-stranded DNA is determined by the nearest-neighbor method.
The following table contains extinction coefficients [L/(mmol·cm)]:

Stack or monomer   Extinction coefficient
pdA                15.4
pdC                 7.4
pdG                11.5
pdT                 8.7
dApdA              13.7
dApdC              10.6
dApdG              12.5
dApdT              11.4
dCpdA              10.6
dCpdC               7.3
dCpdG               9.0
dCpdT               7.6
dGpdA              12.6
dGpdC               8.8
dGpdG              10.8
dGpdT              10.0
dTpdA              11.7
dTpdC               8.1
dTpdG               9.5
dTpdT               8.4

http://biotools.idtdna.com/gateway/
http://www.owczarzy.net/biodata.htm
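
The table above can be turned into a small calculator. The slides name the nearest-neighbor method but do not spell out the formula, so this sketch assumes the commonly used form in which the stack values are per-nucleotide and are therefore doubled: ε260 = 2 × (sum of stack values over adjacent pairs) − (sum of monomer values over internal bases). The function name `extinction_260` is hypothetical.

```python
# Values copied from the table above, in L/(mmol*cm).
MONOMER = {"A": 15.4, "C": 7.4, "G": 11.5, "T": 8.7}
STACK = {
    "AA": 13.7, "AC": 10.6, "AG": 12.5, "AT": 11.4,
    "CA": 10.6, "CC": 7.3, "CG": 9.0, "CT": 7.6,
    "GA": 12.6, "GC": 8.8, "GG": 10.8, "GT": 10.0,
    "TA": 11.7, "TC": 8.1, "TG": 9.5, "TT": 8.4,
}


def extinction_260(seq):
    """Estimate the extinction coefficient of a single-stranded DNA.

    Assumption (not stated in the slides): stack values are per nucleotide,
    so each adjacent pair contributes twice its table value, and every
    internal base is subtracted once because it was counted in two stacks.
    """
    seq = seq.upper()
    if not seq or any(base not in MONOMER for base in seq):
        raise ValueError("sequence must be a non-empty string over A, C, G, T")
    if len(seq) == 1:
        return MONOMER[seq]
    stacks = sum(STACK[seq[i:i + 2]] for i in range(len(seq) - 1))
    internal = sum(MONOMER[base] for base in seq[1:-1])
    return 2.0 * stacks - internal


if __name__ == "__main__":
    for oligo in ("A", "AC", "ACG", "ACGTACGT"):
        print(oligo, round(extinction_260(oligo), 1))
```

For a dinucleotide such as dApdC this gives 2 × 10.6 = 21.2, below the 22.8 obtained by adding the two monomer values, which reflects the hypochromic effect of stacking.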