13 estruct secund termodinam ADN ARN

Prediction of the
2D and 3D structure of
DNA/RNA
Bioinformatics 2007-I
Francis Crick
Alex Rich
Leslie Orgel
James Watson
Characteristics of Life
• Complexity
• Ability to extract, transform, and use
energy
• Ability to replicate
Examples of systems that
replicate
• Crystals
– replicate, use energy (?), are NOT complex
• “Microorganisms” (viruses, viroids, prions)
– Same, except usually require host
Common chemistry in Life
• Monomeric subunits that are
multifunctional
– amino acids (21) → proteins
– monosaccharides (30-40?) → carbohydrate
polymers
– mononucleotides (10-15) → RNA, DNA
Bioenergetics
• Equilibrium
– Characteristics of living systems
– Characteristics of equilibrium
• Steady state
– inputs, outputs in balance
– cellular concentrations
• Entropy
– Complex structures (lowered entropy)
Thermodynamics
• A closed system at constant T and P eventually
reaches equilibrium, minimizing G (the Gibbs
free energy).
• Living systems are OPEN: a continuous
input of energetic compounds (high electronic
potential) maintains constant concentrations
of materials far from their equilibrium
concentrations.
Water and the hydrophobic
effect
• Water
– Polar liquid
– Tends to exclude nonpolar materials
Organizing principle for living matter
1. Advantageous molecular architectural
building blocks form spherical containers for
aqueous compartments without being
incorporated.
2. Hydrophobic (nonpolar) materials within the
compartment segregated by polar medium
(water)
Figure 4-7
Hydrophobic interactions
• Polar materials (water) on balance have
attractive intermolecular forces.
• Nonpolar materials have no particular
tendency to associate.
– Result: Nonpolar materials within an aqueous
compartment associate because they are
excluded by the mutually attractive forces
between the molecules of the medium
“Weak” Interactions
• Chemical bonds (covalent) have energies
of ~10^1 to 10^2 kcal/mole. STRONG
• Weak interactions
– Hydrogen bonds ~ 5 kcal/mole
– Van der Waals interactions ~1 kcal/mole
• “Hydrophobic” interactions
– Less classifiable, a consequence of other
forces
Table 3-2 Covalent and Noncovalent Chemical Bonds

Bond Type                             Length (nm)   Strength (kcal/mole)*
                                                    In Vacuum   In Water
Covalent                              0.15          90          90
Ionic                                 0.25          80          3
Hydrogen                              0.30          4           1
van der Waals attraction (per atom)   0.35          0.1         0.1

*The strength of a bond can be measured by the energy required to break it, here given in kilocalories per mole (kcal/mole). (One kilocalorie is the quantity of energy needed to raise the temperature of 1000 g of water by 1 °C. An alternative unit in wide use is the kilojoule, kJ, equal to 0.24 kcal.) Individual bonds vary a great deal in strength, depending on the atoms involved and their precise environment, so that the above values are only a rough guide. Note that the aqueous environment in a cell will greatly weaken both the ionic and the hydrogen bonds between nonwater molecules (Panel 3-1, pp. 92-93). The bond length is the center-to-center distance between the two interacting atoms; the length given here for a hydrogen bond is that between its two nonhydrogen atoms.
Nucleic Acid Basics
• Nucleic Acids Are Polymers
• Each Monomer (Nucleotide) Consists of Three Units:
A Base + A Ribose Sugar + A Phosphate
(Base + Ribose Sugar alone = Nucleoside)
• A Base Can Be One of Five Rings
(next):
Nucleic Acid Bases
• Pyrimidines
• Purines
• Pyrimidines and Purines Can Base-Pair (Watson-Crick Pairs)
Modified Uridines in Eukaryotic tRNAs
Useful Website: http://medlib.med.utah.edu/RNAmods/
Nucleic Acids As Heteropolymers
• Nucleosides,
Nucleotides
• Single Stranded DNA
5’
3’
• A single-stranded RNA has OH
groups at the 2’ positions
• Note the directionality of DNA or RNA
Structure Overview of Nucleic Acids
• Unlike the three-dimensional structures of proteins, DNA
molecules assume simple double-helical structures
largely independent of their sequences. Three kinds of
double helices have been observed in DNA: type A,
type B, and type Z, which differ in their geometries. The
double-helical structure is essential to the coding
function of DNA. Watson (biologist) and Crick (physicist)
first proposed the double-helix structure in 1953, based on
X-ray diffraction data.
• RNA, on the other hand, can adopt structures as diverse
as those of proteins, as well as a simple double helix of type A. The
ability to be both informational and structurally diverse
suggests that RNA was the prebiotic molecule that could
function in both replication and catalysis (the RNA World
Hypothesis). In fact, some viruses encode their genetic
material in RNA (e.g., retroviruses).
Focus on 2° Structure
• Difficulty: 3° structure prediction is complicated.
– a biopolymer can assume a large number of conformations.
– much important information is contained in the details of the
2° structure.
• Simplification: focus on secondary (2°) structure.
– limits the problem to transitions between well-defined
states.
– For polypeptides:
• between a random coil and an α-helix.
• alternatively, between an α-helix and a β-strand.
– For polynucleotides:
• between a pair of random coils and a double strand.
Our Focus: the Helix-Coil
Transition in DNA
• In particular, we focus on two related processes:
– DNA melting
• B helix to two coils.
– DNA annealing
• two coils to a B helix.
• Understanding these:
– aids in modeling more complicated transitions.
• e.g., many species.
Stabilizing Interactions
• The DNA B-helix structure is stabilized by:
– hydrogen bonding between bases (minor), plus
– stacking between H-bonded base pairs (primary).
• induced dipole moments in the π clouds of
adjacent heterocyclic rings.
• stacking also sequesters the hydrophobic rings.
• results in the characteristic helix shape.
• In DNA melting, the helix is destabilized:
– generally implemented by increasing the temperature, T.
– this destabilizes the stacks, unwinding the helix.
– the unwound helix separates into free ssDNAs (‘coils’).
Monitoring the Helix-Coil Transition
• The degree of stacking is experimentally observable:
– Let ΘB = mean fraction of stacked base pairs.
– Ultraviolet absorbance at 260 nm (A260)
• decreases as ΘB increases (stacked bases absorb less).
• this is the ‘hypochromicity’.
– DNA melting is accompanied by a ≅ 40% increase in A260.
– A260 vs. T yields ΘB vs. T (melting curve).
DNA Melting Curves
• ΘB decreases monotonically from 1 to 0.
– the sigmoidal shape indicates that DNA melting is cooperative.
– the temperature at which ΘB = ½ is the melting temperature (Tm).
– the width (∆T) is non-zero (e.g., for 10-mers, ∆T ≅ 10 °C).
• Melting curves of longer DNAs show more structure:
– several independently melting regions (AT-rich regions are less stable).
– the melting curve is then a combination of several sigmoids.
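The step from A260 vs. T to ΘB vs. T and then to Tm can be sketched as follows. This is only a minimal illustration: it assumes that A260 rises linearly as stacking is lost and that the lowest and highest readings correspond to ΘB = 1 and ΘB = 0. The function names and the absorbance readings are hypothetical, not measured data.

```python
def fraction_stacked(a260):
    """Convert A260 readings to theta_B.

    Assumption: absorbance rises linearly as stacking is lost, and the
    minimum/maximum readings correspond to theta_B = 1 and theta_B = 0.
    """
    a_min, a_max = min(a260), max(a260)
    return [(a_max - a) / (a_max - a_min) for a in a260]


def melting_temperature(temps, theta):
    """Linearly interpolate the temperature at which theta_B crosses 1/2."""
    points = list(zip(temps, theta))
    for (t0, th0), (t1, th1) in zip(points, points[1:]):
        if th0 >= 0.5 >= th1 and th0 != th1:
            return t0 + (th0 - 0.5) * (t1 - t0) / (th0 - th1)
    raise ValueError("theta_B never crosses 0.5")


# Hypothetical readings (temperature in degrees C, A260 in arbitrary units);
# the 1.00 -> 1.40 rise mimics the ~40% increase on melting.
temps = [30, 40, 50, 60, 70, 80]
a260 = [1.00, 1.02, 1.10, 1.30, 1.38, 1.40]

theta = fraction_stacked(a260)
tm = melting_temperature(temps, theta)
print(f"Tm is approximately {tm:.1f} C")
```

Real melting curves usually need sloping baselines fitted to the helix and coil regions, which this sketch leaves out.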
DNA Renaturation
• Renaturation is the ‘reverse’ of DNA melting.
– also called DNA ‘annealing’ or ‘hybridization’.
• DNA renaturation is a much more complicated
process:
[Figure: DNA reassociation (renaturation). Denatured, single-stranded DNA becomes double-stranded DNA in two steps:
1. Slower, rate-limiting, second-order process (rate constant k2) of finding complementary sequences to nucleate base-pairing.
2. Faster zippering reaction to form long molecules of double-stranded DNA.]
http://www3.kumc.edu/jcalvet/PowerPoint/bioc801b.ppt
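Because the rate-limiting nucleation step is second order, the fraction of DNA still single-stranded can be sketched with the standard integrated rate law for dC/dt = -k2·C², namely C/C0 = 1/(1 + k2·C0·t). The sketch below assumes equal concentrations of the two complementary strands and ignores the fast zippering step. The function names and numerical values are hypothetical.

```python
def fraction_single_stranded(k2, c0, t):
    """Fraction C/C0 of strands still unpaired after time t.

    Integrated second-order rate law for dC/dt = -k2 * C**2,
    assuming equal concentrations of the two complementary strands.
    """
    return 1.0 / (1.0 + k2 * c0 * t)


def half_time(k2, c0):
    """Time at which half of the strands have reassociated: 1 / (k2 * C0)."""
    return 1.0 / (k2 * c0)


# Hypothetical values: k2 in 1/(M*s), c0 in M, t in s.
k2, c0 = 2.0e5, 1.0e-6
t_half = half_time(k2, c0)
print(t_half, fraction_single_stranded(k2, c0, t_half))
```

The half-time scales as 1/C0, which is why renaturation experiments are often described in terms of the product C0·t.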
Reversibility of DNA Melting
• Melting of short DNAs is strictly reversible.
• Reversibility of DNA melting:
– measured by a lack of ‘hysteresis’ in the melting curve.
– DNA melting curve = DNA renaturation curve.
• The validity of an equilibrium model of melting
assumes:
– melting slow enough to maintain equilibrium at each T.
• relatively slow heating/cooling (0.1-0.2 °C/minute).
– failure to maintain equilibrium = hysteresis in the melting
curve.
Three Dimensional Structures of Double
Helices
[Figure: A-DNA and A-RNA double helices, with the minor groove and major groove labeled]
Forces That Stabilize Nucleic Acid
Double Helix
• There are two major forces that contribute to
the stability of helix formation:
– Hydrogen bonding in base-pairing
– Hydrophobic interactions in base stacking
[Figure: antiparallel strands (5’ to 3’ and 3’ to 5’) showing same-strand stacking and cross-strand stacking]
Types of DNA Double Helix
• Type A: major conformation of RNA, minor
conformation of DNA;
• Type B: major conformation of DNA;
• Type Z: minor conformation of DNA
[Figure: A, B, and Z double helices with antiparallel 5’/3’ strands. Labels: A (narrow, tight); B (wide, less tight); Z (left-handed, least tight).]
A-form helix
A-form and B-form helices differ in:
• Rotational displacement per bp (30-33° for A; 36-40° for B-form DNA)
• Displacement of base pairs from the axis.
• Major and minor grooves: the A-form has a deep and narrow major
groove, leaving little opportunity for a protein to distinguish among base pairs.
[Figure: A-form and B-form helices]
Structural Transition of DNA under stress
New structures of DNA obtained in numerical simulations when pulling on
the molecule (R. Lavery).
Left: the usual B-DNA structure.
Middle: if the molecule is pulled by its 5’ ends, it keeps a double-helical
structure with inclined bases.
Right: if the DNA is pulled by its 3’ ends, the final structure resembles a
ladder.
Monte Carlo Implementation of Supercoiled Double-Stranded DNA
The conformation of a DNA molecule of N straight
cylinder segments is specified by the spatial positions
of the vertices of its central axis, r_i = (x(i), y(i), z(i)) in a
three-dimensional Cartesian coordinate system, and
by the folding angle of the sugar-phosphate backbones
around the central axis, θ_i, i = 1, 2, ..., N.
The length of the i-th segment satisfies [equation missing in the source],
where ⟨ ⟩_0 means the thermal average for a relaxed
DNA molecule and n_bp is the number of base pairs.
[Figure: the configuration of the discrete DNA chain in the
model.]
Trial Motions of the DNA Chain during Monte Carlo Simulations
(a) The folding angle in the i-th segment, θ_i, is
changed into θ_i + λ_1. All segments between the i-th vertex
and the free end are translated by the
distance |∆s_i - ∆s_i’|.
(b) A portion of the
chain is rotated by an angle λ_2 around the
axis connecting the two ends of the rotated portion.
(c) The segments from a randomly chosen
vertex to the free end are rotated by an angle
λ_3 around an arbitrarily oriented axis that
passes through the chosen vertex.
The current
conformation of the DNA central axis is
shown by solid lines and the trial conformation
by dashed lines.
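Trial motion (c) can be sketched as a rigid rotation of the chain tail using Rodrigues' rotation formula. This is only an illustration of generating a trial conformation: the energy function and the acceptance rule of the simulation are not given here, so none is shown. All function names, the straight starting chain, and the angle bound are hypothetical.

```python
import math
import random


def rotate_about_axis(p, origin, axis, angle):
    """Rotate point p by `angle` about a unit `axis` through `origin`
    (Rodrigues' rotation formula)."""
    v = [p[k] - origin[k] for k in range(3)]
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(axis[k] * v[k] for k in range(3))
    cross = [axis[1] * v[2] - axis[2] * v[1],
             axis[2] * v[0] - axis[0] * v[2],
             axis[0] * v[1] - axis[1] * v[0]]
    return tuple(origin[k] + v[k] * c + cross[k] * s + axis[k] * dot * (1 - c)
                 for k in range(3))


def random_unit_vector(rng):
    """Uniformly random direction, by rejection sampling inside the unit ball."""
    while True:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v))
        if 1e-6 < norm <= 1.0:
            return [x / norm for x in v]


def pivot_move(vertices, pivot_index, max_angle, rng):
    """Trial motion (c): rotate every vertex after `pivot_index` about a
    randomly oriented axis through the pivot vertex."""
    axis = random_unit_vector(rng)
    angle = rng.uniform(-max_angle, max_angle)
    origin = vertices[pivot_index]
    head = list(vertices[:pivot_index + 1])
    tail = [rotate_about_axis(p, origin, axis, angle)
            for p in vertices[pivot_index + 1:]]
    return head + tail


def segment_lengths(vertices):
    """Lengths of consecutive segments along the chain."""
    return [math.dist(a, b) for a, b in zip(vertices, vertices[1:])]


rng = random.Random(0)
chain = [(float(i), 0.0, 0.0) for i in range(6)]  # hypothetical straight chain
trial = pivot_move(chain, 2, 0.5, rng)
print(segment_lengths(trial))
```

Because the tail moves as a rigid body, every segment length is preserved, so a move of this kind changes only the bending at the pivot vertex.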
Common Structural Elements of
RNA
[Figure: secondary structure and tertiary structure elements]
3D Structures of RNA:
Transfer RNA Structures
[Figure: secondary structure and tertiary structure of tRNA. Labels: TψC loop, variable loop, anticodon stem, D loop, anticodon loop.]
3D Structures of RNA:
Ribosomal RNA
[Figure: secondary structure of the large ribosomal RNA and tertiary structure of the large ribosomal subunit. Labels: 70S ribosome, 50S and 30S subunits, 16S rRNA, 23S rRNA, 50S proteins, 30S proteins, A-site tRNA, P-site tRNA, E-site tRNA.]
Ban et al., Science 289 (905-920), 2000
3D Structures of RNA:
Catalytic RNA
[Figure: secondary structure and tertiary structure of a self-splicing RNA]
Secondary Structures of Nucleic Acids
• DNA is primarily in duplex form.
• RNA is normally single-stranded and can adopt
diverse secondary structures other than the duplex.
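RNA SS: recursive definition
Nussinov (1978), remade from Durbin et al., 1997
Secondary structure: set of paired
positions on the interval [i,j].
A-U and C-G can base pair. Some
other pairings can occur, and triple
interactions exist.
Pseudoknot: non-nested pairing,
i < j < k < l with pairs i-k and j-l.
The structure on [i,j] is built recursively from four cases:
• i,j pair: i pairs with j, enclosing the structure on [i+1, j-1]
• i unpaired: structure on [i+1, j]
• j unpaired: structure on [i, j-1]
• bifurcation: structures on [i, k] and [k+1, j]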
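The four recursive cases translate directly into a dynamic-programming fill. The sketch below is a minimal version of the Nussinov base-pair maximization: it counts pairs rather than energies, allows only A-U and C-G pairs, and by default imposes no minimum loop size. The function name and the `min_loop` parameter are illustrative choices.

```python
# Allowed pairs in this sketch: only the A-U and C-G pairs named above.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_max_pairs(seq, min_loop=0):
    """Maximum number of nested base pairs in `seq`.

    g[i][j] holds the best score for the subsequence i..j (0-based,
    inclusive). `min_loop` is the minimum number of unpaired bases that
    a pair must enclose (0 reproduces the plain recursion).
    """
    n = len(seq)
    if n == 0:
        return 0
    g = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Cases "i unpaired" and "j unpaired".
            best = max(g[i + 1][j], g[i][j - 1])
            # Case "i,j pair".
            if (seq[i], seq[j]) in PAIRS and span > min_loop:
                inner = g[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            # Case "bifurcation": split into i..k and k+1..j.
            for k in range(i + 1, j - 1):
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g[0][n - 1]


print(nussinov_max_pairs("GGGAAAUCC"))
```

The fill takes O(n³) time because of the bifurcation loop; recovering the actual pairs requires a traceback through the same matrix.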
More Secondary Structures
Pseudoknots:
Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F.
(1993) THE RNA WORLD. Cold Spring Harbor Laboratory Press.
rRNA Secondary Structure Based on Phylogenetic Data
Self complementary methods
Predicting RNA Secondary
Structures
• By Thermodynamics Method
– Minimize Gibbs Free Energy
• By Phylogenetic Comparison Method
– Compare RNA Sequences of Identical Function
From Different Organisms
• By Combination of the Above Two
Methods
– In principle, this could be the most powerful
method
Thermodynamics
∆G = ∆H - T∆S
• Gibbs free energy, G, describes the energetics of
biomolecules in aqueous solution.
• The change in free energy, ∆G, for a
chemical process, such as nucleic
acid folding, can be used to
determine the direction of the
process:
∆G = 0: equilibrium
∆G > 0: unfavorable process
∆G < 0: favorable process
• ∆H is the change in enthalpy, ∆S is the change in entropy, and T
is the temperature in Kelvin.
• Molecular interactions, such as hydrogen
bonds, van der Waals and electrostatic
interactions, contribute to the ∆H term. ∆S
describes the change of order of the
system.
• Thus, both molecular interactions and
the order of the system determine the
direction of a chemical process.
• Thus the natural tendency
for biomolecules in solution
is to minimize the free energy of
the entire system
(biomolecules + solvent).
• For any nucleic acid solution, it is
extremely difficult to calculate the free
energy from first principles.
• Biophysical methods can be used to
measure free energy changes.
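As a small worked example of ∆G = ∆H - T∆S, the sketch below evaluates ∆G at two temperatures and classifies the process. The ∆H and ∆S values are hypothetical round numbers chosen only to show the sign change, not measured parameters for any real nucleic acid, and the function names are illustrative.

```python
def gibbs_free_energy(delta_h, delta_s, temp_k):
    """Delta G = Delta H - T * Delta S (T in Kelvin, consistent energy units)."""
    return delta_h - temp_k * delta_s


def classify(delta_g, tol=1e-9):
    """Direction of the process from the sign of Delta G."""
    if abs(delta_g) <= tol:
        return "equilibrium"
    return "favorable" if delta_g < 0 else "unfavorable"


def zero_crossing_temperature(delta_h, delta_s):
    """Temperature at which Delta G = 0, i.e. T = Delta H / Delta S."""
    return delta_h / delta_s


# Hypothetical folding process: Delta H in kcal/mol, Delta S in kcal/(mol*K).
dh, ds = -60.0, -0.17
for temp in (310.15, 400.0):
    dg = gibbs_free_energy(dh, ds, temp)
    print(f"T = {temp} K: Delta G = {dg:.2f} kcal/mol ({classify(dg)})")
print(f"Delta G = 0 near {zero_crossing_temperature(dh, ds):.1f} K")
```

With both ∆H and ∆S negative, the process is favorable at low temperature and becomes unfavorable once T∆S outweighs ∆H, which is the thermodynamic picture behind melting.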
The Equilibrium Partition
Function
• For a population of structures, S, a
partition function Z and the probability P(s) of a
particular folding, s, can be calculated:
$$Z = \sum_{s \in S} e^{-\Delta G_s / RT}, \qquad P(s) = \frac{e^{-\Delta G_s / RT}}{Z}$$
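The partition function and the folding probabilities can be evaluated directly for a small, explicitly listed set of structures. In the sketch below the structure labels and their ∆G values are hypothetical; R is the gas constant in kcal/(mol·K), and the default temperature of 310.15 K (37 °C) is an assumption.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(delta_g, temp_k=310.15):
    """Return (Z, {structure: probability}) for a dict of free energies.

    delta_g maps a structure label to its Delta G_s in kcal/mol.
    Z = sum over s of exp(-Delta G_s / RT); P(s) = exp(-Delta G_s / RT) / Z.
    """
    rt = R_KCAL * temp_k
    weights = {s: math.exp(-dg / rt) for s, dg in delta_g.items()}
    z = sum(weights.values())
    return z, {s: w / z for s, w in weights.items()}


# Hypothetical ensemble of three foldings (kcal/mol).
ensemble = {"hairpin_1": -5.0, "hairpin_2": -4.0, "unfolded": 0.0}
z, probs = boltzmann_probabilities(ensemble)
for name, p in probs.items():
    print(f"{name}: {p:.4f}")
```

A difference of only 1 kcal/mol changes the relative population roughly fivefold at 37 °C, which is why structures close to the minimum free energy still matter.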
Energy Minimization Method
(mFOLD – RNA Structure)
• An RNA sequence is written R = {r_1, r_2, r_3, …, r_n}, where r_i is the i-th ribonucleotide and
belongs to the set {A, U, G, C}.
• A secondary structure of R is a set S of base pairs, i.j, which satisfies:
– 1 ≤ i < j ≤ n;
– j - i > 4 (a loop cannot contain fewer than 4 nucleotides);
– If i.j and i’.j’ are two base pairs (assume i ≤ i’), then either
» i = i’ and j = j’ (same base pair), or
» i < j < i’ < j’ (i.j precedes i’.j’), or
» i < i’ < j’ < j (i.j includes i’.j’)
(this excludes pseudoknots, which have i < i’ < j < j’).
• If e(i,j) is the energy for the base pair i.j, the total energy for R is
$$E(S) = \sum_{i.j \in S} e(i,j)$$
• The objective is to minimize E(S).
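The conditions on S and the energy sum E(S) can be checked mechanically. The sketch below uses 1-based positions to match the definition; the function names, the example sequence, and the per-pair energies are hypothetical placeholders rather than measured parameters.

```python
def is_valid_structure(pairs, n, min_sep=4):
    """Check the secondary-structure conditions for a set of pairs (i, j).

    Positions are 1-based. Every pair needs 1 <= i < j <= n and
    j - i > min_sep, and any two distinct pairs must either follow one
    another (i < j < i' < j') or be nested (i < i' < j' < j).
    """
    ordered = sorted(set(pairs))
    for i, j in ordered:
        if not (1 <= i < j <= n) or j - i <= min_sep:
            return False
    for a, (i, j) in enumerate(ordered):
        for i2, j2 in ordered[a + 1:]:
            precedes = j < i2
            includes = i < i2 and j2 < j
            if not (precedes or includes):
                return False  # shared base or pseudoknot (i < i' < j < j')
    return True


# Hypothetical pair energies in kcal/mol, keyed by the unordered pair of bases.
PAIR_ENERGY = {frozenset("GC"): -3.0, frozenset("AU"): -2.0, frozenset("GU"): -1.0}


def total_energy(seq, pairs):
    """E(S) = sum of e(i, j) over all pairs i.j in S (1-based positions)."""
    return sum(PAIR_ENERGY[frozenset((seq[i - 1], seq[j - 1]))] for i, j in pairs)


seq = "GGAAAAAACC"
structure = [(1, 10), (2, 9)]
print(is_valid_structure(structure, len(seq)), total_energy(seq, structure))
```

The pairwise check also rejects two pairs that share a base, since such pairs can be neither consecutive nor nested.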
Representations (cont.)
• Hydrogen bonds between intra-chain pairs are
represented by circular arcs
• All representations are equivalent
Free Energy Parameters
• An extensive database of free energies for the
following RNA units has been obtained (the so-called
“Tinoco Rules” and “Turner Rules”):
– Single-strand stacking energy
– Canonical (AU, GC) and non-canonical (GU) base pairs in
duplexes
• Accurate free energy parameters are still lacking
for
– Loops
– Mismatches (AA, CA, etc.)
• Using these energy parameters, the current
version of mFOLD – RNA Structure can
correctly predict ~73% of phylogenetically deduced
secondary structures.
Dynamic Programming
(mFOLD)
• A matrix W(i,j) is computed that
is dependent on the
experimentally measured
base-pair energy e(i,j).
• [Figure: an example of W(i,j)]
• Recursion begins with i=1, j=n:
1. If W(i+1,j)=W(i,j), then i is not
paired. Set i=i+1 and start the
recursion again.
2. If W(i,j-1)=W(i,j), then j is not
paired. Set j=j-1 and start the
recursion again.
3. If W(i,j)=W(i,k)+W(k+1,j), the
fragment k+1…j gets put on a
stack and the fragment i…k is
analyzed by setting j=k and
going back to the beginning of the
recursion.
4. If W(i,j)=e(i,j)+W(i+1,j-1), a
base pair is identified and is
added to the list by setting i=i+1
and j=j-1.
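The fill of W(i,j) and the four-step traceback can be sketched as below. This is a simplified illustration, not the mFOLD implementation: it scores only individual base pairs with hypothetical energies (no stacking or loop terms), uses 0-based indices, and requires j - i > 4 for a pair. All names are illustrative.

```python
import math

# Hypothetical pair energies (kcal/mol); NOT real Tinoco/Turner parameters.
PAIR_ENERGY = {frozenset("GC"): -3.0, frozenset("AU"): -2.0, frozenset("GU"): -1.0}


def e(seq, i, j, min_sep=4):
    """Energy of pair i.j; infinite if disallowed or if j - i <= min_sep."""
    if j - i <= min_sep:
        return math.inf
    return PAIR_ENERGY.get(frozenset((seq[i], seq[j])), math.inf)


def fill(seq):
    """W[i][j] = minimum energy of any structure on fragment i..j (0-based)."""
    n = len(seq)
    W = [[0.0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = min(W[i + 1][j], W[i][j - 1], e(seq, i, j) + W[i + 1][j - 1])
            for k in range(i + 1, j - 1):
                best = min(best, W[i][k] + W[k + 1][j])
            W[i][j] = best
    return W


def traceback(seq, W):
    """Recover base pairs by following steps 1-4, with an explicit stack."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        while i < j:
            if W[i + 1][j] == W[i][j]:        # step 1: i is not paired
                i += 1
            elif W[i][j - 1] == W[i][j]:      # step 2: j is not paired
                j -= 1
            else:
                # step 3: look for a bifurcation point k
                k = next((k for k in range(i + 1, j - 1)
                          if W[i][j] == W[i][k] + W[k + 1][j]), None)
                if k is not None:
                    stack.append((k + 1, j))  # analyze fragment k+1..j later
                    j = k
                else:                         # step 4: i.j is a base pair
                    pairs.append((i, j))
                    i += 1
                    j -= 1
    return sorted(pairs)


seq = "GGGAAAAACCC"
W = fill(seq)
print(W[0][len(seq) - 1], traceback(seq, W))
```

Exact equality tests on W are safe here only because the hypothetical energies are small exact floats; with real parameters a tolerance or integer energy units would be a more robust choice.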
Suboptimal Folding (mFOLD)
• For any sequence of N nucleotides, the
expected number of structures is greater than
1.8^N.
• A sequence of 100 nucleotides has about 3×10^25
foldings. If a computer could evaluate 1000 structures per second, it would take about 10^15 years!
• mFOLD generates suboptimal foldings whose
free energies fall within a certain range of values.
Many of these structures differ only in trivial
ways. These suboptimal foldings can still be
useful for designing experiments.
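The arithmetic behind these estimates is easy to reproduce. The sketch below uses the 1.8^N lower bound and the rate of 1000 structures per second quoted above; the function names and the 365.25-day year are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def structure_count_lower_bound(n):
    """Lower bound 1.8**n on the number of foldings of an n-nucleotide sequence."""
    return 1.8 ** n


def years_to_enumerate(n, structures_per_second=1000.0):
    """Years needed to evaluate every folding at the given rate."""
    return structure_count_lower_bound(n) / structures_per_second / SECONDS_PER_YEAR


count = structure_count_lower_bound(100)
years = years_to_enumerate(100)
print(f"{count:.1e} foldings, {years:.1e} years")
```

Exhaustive enumeration is therefore hopeless even for short RNAs, which is the motivation for dynamic programming and for sampling only foldings near the minimum free energy.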
Energy
dot-plot
Predicting RNA 3D Structures
• Currently available RNA 3D structure prediction
programs make use of the fact that a tertiary structure is
built upon preformed secondary structures.
• So once a reliable secondary structure has been predicted, it
becomes possible to predict the 3D structure.
• The chances of obtaining a valid 3D structure can be
increased by known spatial constraints among the
different secondary segments (e.g., from cross-linking or NMR
results).
• However, there are far fewer thermodynamic data on 3D
RNA structures, which makes 3D structure prediction
challenging.
RNA-protein Interactions
• There is currently no computational method that can
predict RNA-protein interaction interfaces.
• Statistical methods have been applied to identify
structural features at the protein-RNA interface. For
instance, ENTANGLE finds that most atoms
contributed by a protein to recognizing an RNA
are from the main chain (C, O, N, H), not from side
chains! But much remains to be done.
• Electrostatic potential is of primary importance in
protein-RNA recognition because of the negatively
charged phosphate backbone. Efforts are being made to
quantify the electrostatic potential at the molecular
surface of a protein and an RNA in order to predict the
site of RNA interaction. This often provides a good
prediction, at least for the site on the protein.
Fundamentals of SSCP
SSCP of the complete PZase gene
(P7-P8)
[Figure: SSCP gel; lanes labeled R (resistant) and S (susceptible)]
H37Rv (Susceptible)
R: Wayne negative (Resistant)
M. bovis (Resistant)
Calculation procedure for the extinction (absorption)
coefficient of DNA
The extinction coefficient at 260 nm, 25 °C, and neutral pH
for single-stranded DNA is determined by the nearest-neighbor method.
The following table contains extinction coefficients [L/(mmol·cm)]:

Stack or monomer   Extinction coefficient
pdA                15.4
pdC                7.4
pdG                11.5
pdT                8.7
dApdA              13.7
dApdC              10.6
dApdG              12.5
dApdT              11.4
dCpdA              10.6
dCpdC              7.3
dCpdG              9.0
dCpdT              7.6
dGpdA              12.6
dGpdC              8.8
dGpdG              10.8
dGpdT              10.0
dTpdA              11.7
dTpdC              8.1
dTpdG              9.5
dTpdT              8.4
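A nearest-neighbor calculation with these values can be sketched as follows. The combining formula is not stated above, so the sketch assumes the commonly used form in which the stack values are tabulated per nucleotide: twice the sum over all adjacent stacks, minus the sum over the internal (non-terminal) monomers. The function name is hypothetical.

```python
# Values from the table above, in L/(mmol*cm), read 5' to 3'.
MONOMER = {"A": 15.4, "C": 7.4, "G": 11.5, "T": 8.7}
STACK = {
    "AA": 13.7, "AC": 10.6, "AG": 12.5, "AT": 11.4,
    "CA": 10.6, "CC": 7.3, "CG": 9.0, "CT": 7.6,
    "GA": 12.6, "GC": 8.8, "GG": 10.8, "GT": 10.0,
    "TA": 11.7, "TC": 8.1, "TG": 9.5, "TT": 8.4,
}


def extinction_coefficient(seq):
    """Extinction coefficient at 260 nm of a single-stranded DNA sequence.

    ASSUMED formula (stack values taken as per-nucleotide):
        2 * sum(stack values) - sum(internal monomer values)
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) == 1:
        return MONOMER[seq]
    stacks = sum(STACK[seq[i:i + 2]] for i in range(len(seq) - 1))
    internal = sum(MONOMER[base] for base in seq[1:-1])
    return 2 * stacks - internal


print(extinction_coefficient("ACG"))
```

For the trinucleotide ACG this gives 2 × (10.6 + 9.0) - 7.4 = 31.8 L/(mmol·cm) under the assumed formula.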
http://biotools.idtdna.com/gateway/
http://www.owczarzy.net/biodata.htm