- PINE Server
ARECA@NMRFAM sftp://chianina/raid/data/mani/website/ARECA/m... 04/27/2015 04:34 PM

ARECA Server
Validation of Protein NMR Chemical Shift Assignments against NOE Data

Contents:

Section 0. Applications of ARECA
Section 1. Inputs
    1.1. Input files
    1.2. How to prepare the input files
        1.2.1. Option A: Peak lists
            1.2.1.1. Chemical shift assignments
            1.2.1.2. NOESY peak lists
            1.2.1.3. Optimizing ARECA's calculation
        1.2.2. Option B: NOESY spectra
            Packed input file
            Optimizing ARECA's calculation
Section 2. Outputs
    2.1. Short report via email
    2.2. Simple report
    2.3. Comprehensive report
    2.4. NOESY peak lists
Section 3. FAQ
    3.1. How to interpret ARECA's probabilities
        3.1.1. Truth model
        3.1.2. Overall probability of a proton
        3.1.3. Overall probability of a heavy atom
    3.2. What are these erroneous atoms?
    3.3. What does it mean when the percentage of erroneous atoms is higher than 5%?
    3.4. How to find an atom's overall probability?
    3.5. How to find the atoms with low probabilities?
    3.6. How to find the reasoning behind a probability?
    3.7. Who to contact for further questions and comments?

Section 0. Applications of ARECA

Section 1. Inputs

1.1. Input files

To validate the chemical shift assignments, ARECA uses:

1. Chemical shift (CS) assignments.
2. At least one of the common NOESY experiments:
    1. 15N-NOESY
    2. 13C-NOESY
    3. 13C-NOESY (Aromatic)
    4. 13C-NOESY (D2O)

1.2. How to prepare the input files

There are two options for preparing the input files.

1.2.1. Option A: Peak lists

To use NOESY peak lists, prepare the chemical shift assignments and the NOESY peak lists in one of the following ways.

1.2.1.1. Chemical shift assignments

You can prepare the chemical shift assignments in either the BMRB NMR-STAR or the XEASY format.

1. BMRB NMR-STAR formats. Extensive descriptions of NMR-STAR 2.1 and 3.1 are available from the BMRB.
    1. NMR-STAR 2.1: The assignment file should start with a header that declares the format in which the assignments are stored. ARECA explicitly looks for the following tags:
        _Residue_seq_code
        _Residue_label
        _Atom_name
        _Chem_shift_value
    2. NMR-STAR 3.1: The assignment file should start with a header that declares the format in which the assignments are stored. ARECA explicitly looks for the following tags:
        _Atom_chem_shift.Seq_ID
        _Atom_chem_shift.Comp_ID
        _Atom_chem_shift.Atom_ID
        _Atom_chem_shift.Val
2. XEASY format. An XEASY .prot file must be accompanied by an additional file that gives the sequence of the protein. ARECA needs the following two files:
    1. Assignment .prot file:
        Col1: assignment index
        Col2: chemical shift
        Col3: error estimate
        Col4: atom name
        Col5: residue index
       Example:
        1  62.755 0.000 CA 2
        2  34.367 0.000 CB 3
        3   7.689 0.000 H  3
        4 107.846 0.000 N  3
    2. Sequence file (three-letter codes with indices):
        Col1: three-letter residue code
        Col2: residue index
       Example:
        GLY 1
        SER 2
        LYS 3

1.2.1.2. NOESY peak lists

Peak lists can be in SPARKY or XEASY format. A SPARKY file starts with a header, usually

    Assignment w1 w2 w3 Data Height

and an XEASY file contains the peak information as follows:

    # Number of dimensions 3
    #INAME 1 13C
    #INAME 2 H1
    #INAME 3 1H
    4525 119.578 4.684 7.067 2 U 8.400e+02 0.00e+00 m 0 0 0 0 0
    4526 119.578 3.206 7.067 2 U 1.201e+03 0.00e+00 m 0 0 0 0 0
    4527 119.578 3.092 7.067 2 U 1.399e+03 0.00e+00 m 0 0 0 0 0

1.2.1.3. Optimizing ARECA's calculation

As indicated in Section 0 (Applications of ARECA), ARECA's probability calculation can be narrowed down to specific NOE contacts. In this section you can tune the calculation to match your NOESY data.

1.2.2. Option B: NOESY spectra

Packed input file

Use the PONDEROSA Client program to prepare a 'packed-input.txt' file.
This client performs an advanced 'restricted' peak picking and generates one compact file (packed-input.txt). This file contains the CS assignments and the NOESY peak lists and is ready to be submitted to the ARECA website.

Section 2. Outputs

For the complete list of output files, see ARECA's publication. Here we describe some of them.

2.1. Short report via email

As soon as ARECA finishes the validation process, an email is sent to you with the following information:

1. A short report, which contains:
    1. Number of missing assignments (residues): the number of residues without any chemical shift assignments.
    2. Number of missing assignments (atoms): the number of atoms with missing chemical shift assignments.
    3. Number of missing NOESY strips: For every heavy atom and its directly attached proton, ARECA looks for peaks in the given NOESY peak lists within a tolerance (heavy atom 0.4 ppm, proton 0.03 ppm, entire HNOE dimension). When ARECA cannot find any peak, the heavy atom is marked as a ‘missing NOESY strip’. The ‘Number of missing NOESY strips’ is the number of such heavy atoms.
    4. Number of erroneous atoms: ARECA calculates a probability of correctness for every atom. Atoms with a probability lower than 50% are considered erroneous. See Section 3.1 for more information about these probabilities.
    5. Percentage of erroneous atoms: the ratio of the number of erroneous atoms (item 4) to the total number of assigned atoms. Note: when this percentage is more than 5%, the assignments are not consistent with the NOESY spectra and the spectra should be investigated manually. When it is less than 5%, there are some incorrect assignments that should be reconsidered. Section 3.4 explains where to find them.
2. A URL to a compressed file that contains the complete report.
3. A URL to a bar plot of the ‘Overall backbone heavy atoms probabilities’. The x-axis shows the residues and the y-axis shows the probabilities.
4. A URL to a bar plot of the ‘Overall protons probabilities’. The x-axis shows the residues and the y-axis shows the probabilities.

2.2. Simple report

This report is in xml format. For every residue it lists the atoms with a low probability (less than 0.5) or with missing assignments.

2.3. Comprehensive report

This report, in pdf format, contains all the information needed to recalculate the probabilities and to investigate the reasoning behind them.

2.4. NOESY peak lists

ARECA maps the given assignments onto the NOESY peak lists, which are reported along with the calculated probability for every assignment. These probabilities are listed under the 'Note' column of the Sparky peak list table (two-letter code 'lt'). When ARECA's extension in NMRFAM-SPARKY is used to load the peak lists, it automatically colors the peaks (acceptable assignments: green and blue; suspicious assignments: red and yellow).

Section 3. FAQ

3.1. How to interpret ARECA's probabilities

This section explains the meaning of ARECA’s probabilities and then discusses the different representations of these probabilities.

3.1.1. Truth model

The truth model gives the probability of observing a NOESY contact (a cross-peak) between every two protons of an amino acid (intra-residue contacts), and between two protons of two sequentially adjacent amino acids (inter-residue contacts). These probabilities are explained in ARECA’s manuscript and can be found on the “NOESY Contacts Probabilities (NCP)” server.

3.1.2. Overall probability of a proton

For a proton, the truth model provides a list of heavy atoms whose directly attached protons are expected to have a NOESY contact with that proton. These expected NOESY contacts have a probability higher than 95% in the truth model. For each expected contact, ARECA looks for a peak in the given NOESY peak lists and assigns a probability to the contact. This probability is 1 if there is a peak with a ppm difference of less than 0.03 ppm, and 0 for differences higher than 0.04 ppm. For ppm differences between these two cutoffs, the probability is calculated with the linear function shown in Fig 3.1.f1.

Fig 3.1.f1. Probability function for observing an expected NOESY contact in the given NOESY peak list. The x-axis shows the minimum ppm difference between the expected position of a NOESY peak and the peaks in the NOESY peak lists.

To calculate the overall probability of a proton, ARECA performs the following steps.

1. For a proton (H), extract all the intra- and inter-residue protons that are expected to form a NOESY contact with it. Call the set of these protons h.
2. For every proton in h:
    1. Find its directly attached heavy atom.
    2. Find the set of peaks (from the given peak list) that are within a tolerance of 0.4 ppm of the heavy atom and 0.03 ppm of the proton.
    3. Calculate the minimum ppm difference between H and the peaks in this set.
    4. Assign a probability according to Fig 3.1.f1.
3. Average the probabilities to obtain the overall intra- or inter-residue probability.

As an example, we calculate the overall probability of E10HG2 of the protein HR8254A (CASD-NMR [https://www.wenmr.eu/wenmr/casd-nmr-data-sets]). The raw peak list is used for this example. ARECA reported an overall probability of 0.918 for this proton; here we go through the process step by step. To calculate the intra- and inter-residue probabilities we consider the triplet Glu9-Glu10-Gln11.

1. E10HG2 observed with the heavy atoms of Glu9 (overall inter-residue probability)

According to the truth model there is no expected NOESY contact between the protons of Glu9 and E10HG2. The truth-model probabilities can be found at http://pine.nmrfam.wisc.edu/NCP. Therefore ARECA expects no NOESY contacts, and the overall inter-residue probability between E10HG2 and the protons of Glu9 is 1.

2. E10HG2 observed with the intra-residue heavy atoms (overall intra-residue probability)

The probabilities of expecting NOESY contacts between the intra-residue protons of Glu are shown in Table 3.1.T1.

Table 3.1.T1.
    Intra-residue   CS assignments   Min. ppm difference   Truth model probability   ARECA’s probability
    H               7.83             0.010                 0.98                      1
    HA              3.92             0.017                 0.98                      1
    HB2             2.00             0.019                 1.00                      1
    HB3             2.32             0.442                 0.98                      0
    HG2             2.34             0.001                 1.00                      1
    HG3             1.92             0.003                 0.98                      1
    HE              NA               -                     0.03                      -

Five of the six expected NOESY contacts could be confirmed with the given NOESY peak lists. Thus the overall intra-residue probability for E10HG2 is 5/6 = 0.83.

3. E10HG2 observed with the heavy atoms of Gln11 (overall inter-residue probability)

Table 3.1.T2 shows the expected probabilities and the minimum ppm differences between the chemical shift of E10HG2 and the peaks in the strip plots of the heavy atoms of Gln11.

Table 3.1.T2.
    Protons of Gln(i+1)   CS assignments   Min. ppm difference   Truth model probability   ARECA’s probability
    H                     8.608            0.016                 0.99                      1
    HA                    4.117            -                     0.42                      -
    HB2                   2.227            -                     0.23                      -
    HB3                   2.370            -                     0.14                      -
    HE21                  7.990            -                     0.14                      -
    HE22                  6.243            -                     0.09                      -
    HG2                   2.406            -                     0.22                      -
    HG3                   2.303            -                     0.19                      -

According to the truth model, ARECA expects to observe a NOESY cross-peak between the amide of Gln11 and E10HG2. Since the minimum ppm difference between E10HG2 and the peaks in the strip plot of the Gln11 amide is less than 0.03 ppm, ARECA assigns an overall inter-residue probability of 1.00 to E10HG2 being observed by the heavy atoms of Q11.

The overall probability of E10HG2 is a weighted sum of these overall intra- and inter-residue probabilities. The weight of the intra-residue probability is 50% and the weight of each inter-residue probability is 25%. Therefore, the overall probability of E10HG2 is 0.25*1 + 0.5*0.83 + 0.25*1 = 0.915. If this overall probability is less than 0.50, ARECA flags the assignment of this atom as suspicious.

This overall probability is reported in the comprehensive report (pdf file), chapter 3, table “Chemical Shift Distributions and Assignment Probabilities”. The overall intra- and inter-residue probabilities are reported in the pdf file, table “Atom overall intra/inter-residue assignment probabilities”. The ppm differences can be found in the strip plots in chapter 3 of the comprehensive report.

3.1.3. Overall probability of a heavy atom

To explain these probabilities, assume we want to calculate the overall intra-residue probability of the amide nitrogen of an amino acid. The process is as follows.

1. For a heavy atom (e.g. N), find its directly attached proton (e.g. H).
2. From the given chemical shift assignments, find the chemical shift of the directly attached proton.
3. From the truth model, find all the intra-residue protons that are expected to form a NOESY contact with the directly attached proton of the heavy atom. Call the set of these protons h*.
4. From the given NOESY peak lists, extract the peaks that are within a tolerance of 0.4 ppm of the chemical shift of the heavy atom and 0.03 ppm of the chemical shift of its directly attached proton. Call the set of these peaks P.
5. For every proton p in h*, calculate the minimum ppm difference between p and the peaks in P.
6. Convert these differences to probabilities using the probability function (Fig 3.1.f1).
7. Assign the average of these probabilities as the overall intra-residue probability of the heavy atom.

The same steps apply to other heavy atoms and to inter-residue probabilities. As an example, let us calculate the overall probability of the amide nitrogen of Arg70 from the triplet Ser69-Arg70-Ala71 of the protein HR8254A (CASD-NMR [https://www.wenmr.eu/wenmr/casd-nmr-data-sets]). The raw peak lists of the protein are used in this example.

1. Amide nitrogen of Arg70 observing protons of Ser69 (overall inter-residue probability)

The probabilities of expecting a NOESY contact between the protons of Ser69 and the amide proton of Arg70 are shown in Table 3.2.T1 (“Truth model probability”). The proton S69HG is not assigned; therefore ARECA does not consider it in the calculation of the probabilities. However, this atom will be reported as “a missing assignment”. The assigned chemical shifts of the other three protons are shown in the table. The minimum ppm differences (from step 5) are calculated and reported in the table as well. The probability function (Fig 3.1.f1) was used to assign a probability to each of these differences (“ARECA’s probability”). The overall probability of the amide nitrogen observing the protons of Ser69 is the average of these probabilities, (0 + 1 + 1)/3 = 0.66.

Table 3.2.T1.
    Protons of Ser(i-1)   CS assignments   Min. ppm difference   Truth model probability   ARECA’s probability
    H                     8.10             0.072                 1.000                     0
    HA                    4.34             0.015                 1.000                     1
    HB                    3.86             0.011                 1.000                     1
    HG                    NA               -                     0.969                     -

2. Amide nitrogen of Arg70 observing protons of Arg70 (overall intra-residue probability)

Table 3.2.T2 reports the values needed to calculate the overall intra-residue probability of the amide nitrogen of Arg70. Since the expectation probability of HD is less than 95%, ARECA does not consider this proton in its probability calculation. Of the remaining assigned and expected atoms, two have a probability of 0 (ppm difference greater than 0.04). The average of the assigned probabilities is (1 + 0 + 0.4 + 0 + 1)/5 = 0.48.

Table 3.2.T2.
    Intra-residue   CS assignments   Min. ppm difference   Truth model probability   ARECA’s probability
    H               8.01             0.014                 1.000                     1
    HA              4.27             0.050                 0.995                     0
    HB2             1.84             0.036                 0.977                     0.4
    HB3             1.72             0.090                 0.993                     0
    HD              3.14             0.258                 0.849                     -
    HE              NA               -                     0.591                     -
    HG              1.60             0.007                 0.977                     1
    HH#             NA               -                     0.240                     -

3. Amide nitrogen of Arg70 observing protons of Ala71 (overall inter-residue probability)

The same steps are followed to calculate the overall probability of the amide nitrogen observing the protons of Ala71. Table 3.2.T3 shows the information needed for this calculation. As indicated in the table, the only Ala71 proton that is expected (truth model probability > 95%) to form a NOESY contact with the amide proton of Arg70 is A71H. Since the minimum ppm difference for A71H is less than 0.03 ppm, ARECA’s probability is 1.

Table 3.2.T3.
    Protons of Ala(i+1)   CS assignments   Min. ppm difference   Truth model probability   ARECA’s probability
    H                     8.04             0.013                 1.000                     1
    HA                    4.23             0.072                 0.888                     -
    HB                    1.33             0.054                 0.472                     -

Next, to calculate the overall probability of R70N, we take a weighted sum of the calculated overall intra- and inter-residue probabilities. The weight of the intra-residue probability is 50% and the weight of each inter-residue probability is 25%. Therefore, the overall probability of R70N is 0.25*0.66 + 0.50*0.48 + 0.25*1 = 0.655 (about 0.65).

3.2. What are these erroneous atoms?

Erroneous (flagged) atoms are those with an overall probability lower than 0.50.
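The 0.50 cutoff can be checked by hand for any atom. The following is a minimal sketch, not ARECA's actual code, of the calculation described in Section 3.1: the linear probability function between 0.03 and 0.04 ppm, the averaging over expected contacts, and the 25% / 50% / 25% weighted sum. All function and variable names are hypothetical, and the handling of the exact boundary values (0.03, 0.04 and 95%) is an assumption. The input rows are the (minimum ppm difference, truth model probability) pairs from the Arg70 amide nitrogen example, with None standing for an unassigned proton.

```python
from statistics import mean

FULL_MATCH_PPM = 0.03   # differences up to this value give probability 1
NO_MATCH_PPM = 0.04     # differences from this value upward give probability 0
EXPECTED_CUTOFF = 0.95  # truth-model probability above which a contact is expected
FLAG_THRESHOLD = 0.50   # overall probabilities below this are flagged


def contact_probability(min_ppm_diff):
    """Linear ramp from 1 (at 0.03 ppm) down to 0 (at 0.04 ppm)."""
    if min_ppm_diff <= FULL_MATCH_PPM:
        return 1.0
    if min_ppm_diff >= NO_MATCH_PPM:
        return 0.0
    return (NO_MATCH_PPM - min_ppm_diff) / (NO_MATCH_PPM - FULL_MATCH_PPM)


def group_probability(rows):
    """Average over expected, assigned contacts of one residue.

    rows holds (min_ppm_diff or None, truth_model_probability) pairs.
    Unassigned protons (None) and contacts at or below the 95% cutoff
    are skipped. With no expected contact the probability is 1.
    """
    probs = [contact_probability(diff)
             for diff, truth in rows
             if diff is not None and truth > EXPECTED_CUTOFF]
    return mean(probs) if probs else 1.0


def overall_probability(prev_rows, intra_rows, next_rows):
    """Weighted sum: 25% residue i-1, 50% intra-residue, 25% residue i+1."""
    return (0.25 * group_probability(prev_rows)
            + 0.50 * group_probability(intra_rows)
            + 0.25 * group_probability(next_rows))


def is_flagged(probability):
    """True when the atom would be counted as erroneous."""
    return probability < FLAG_THRESHOLD


# Rows copied from the Arg70 amide nitrogen example (Ser69-Arg70-Ala71).
ser69 = [(0.072, 1.000), (0.015, 1.000), (0.011, 1.000), (None, 0.969)]
arg70 = [(0.014, 1.000), (0.050, 0.995), (0.036, 0.977), (0.090, 0.993),
         (0.258, 0.849), (None, 0.591), (0.007, 0.977), (None, 0.240)]
ala71 = [(0.013, 1.000), (0.072, 0.888), (0.054, 0.472)]

r70n = overall_probability(ser69, arg70, ala71)
print(round(r70n, 3), "flagged" if is_flagged(r70n) else "not flagged")
```

With unrounded intermediate values the sketch gives roughly 0.657 for R70N, whereas the worked example in Section 3.1.3 arrives at about 0.65 after rounding the Ser69 average to 0.66. Either way the value is above 0.50, so this atom is not flagged.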
When the overall probability is less than 50%, more than half of the expected peaks could not be found in the NOESY spectra.

3.3. What does it mean when the percentage of erroneous atoms is higher than 5%?

Most probably there is something wrong with the NOESY peak lists, such as bad peak picking or low-resolution spectra, and you should check the quality of your spectra and peak lists. However, when the percentage is less than 5%, there are probably some incorrect assignments, and you should check the assignments.

3.4. How to find an atom's overall probability?

If the overall probability is less than 50%, ARECA considers the assignment erroneous. The overall probability is reported in:

1. The comprehensive report (pdf file), Section 3, table “Chemical Shift Distributions and Assignment Probabilities”.
2. The short summary (xml file), if the probability is lower than 50%.
3. The 15N- or 13C-NOESY peak list in the “Assigned NOESY Peaks” folder.

The overall intra- and inter-residue probabilities are reported in the comprehensive report (pdf file), in the table “Atom overall intra/inter-residue assignment probabilities”. The ppm differences are shown in the NOESY plots in the comprehensive report (pdf file), Section 3, simulated strip plots.

3.5. How to find the atoms with low probabilities?

There are several ways to find these atoms. An easy way is to open the summary.xml file under Output/txt. If you want to check them on the spectra, open the provided peak lists using the two-letter code ‘ar’ in NMRFAM-SPARKY. This loads the peaks onto the NOESY spectra and colors them; the low-probability assignments are colored yellow and red. Alternatively, load ARECA’s peak lists with the ‘rp’ command (in NMRFAM-SPARKY) and open the peak list with the ‘lt’ command. The probabilities of the assignments are shown under the ‘Note’ column.

3.6. How to find the reasoning behind a probability?

Follow the steps in Section 3.1 (3.1.2 for protons, 3.1.3 for heavy atoms). The necessary information is provided in the comprehensive report (pdf file) and on the NCP web site.

3.7. Who to contact for further questions and comments?

Feel free to contact us: areca.nmrfam @ gmail.com

NMRFAM©