Day 1 Databases

Introduction to Bioinformatics applied to genomics
Day 1 Databases
May-June 2012
Romain Guyot & Christine Tranchant

Databases
http://www.biodbs.info/

Databases: The main sequence databases
General DB:
  NCBI http://www.ncbi.nlm.nih.gov/
  EMBL http://www.ebi.ac.uk/embl/
  DDBJ http://www.ddbj.nig.ac.jp/
Specialized DB (specific organisms):
  FlyBase http://flybase.org/
  SGD http://www.yeastgenome.org/
  TAIR http://www.arabidopsis.org/
  Ensembl http://www.ensembl.org/index.html
Proteins:
  UniProt http://www.uniprot.org/
  Swiss-Prot http://web.expasy.org/
  PDB http://www.rcsb.org/pdb/home/home.do
Publications:
  PubMed

Databases: Exponential Growth of GenBank (http://en.wikipedia.org/wiki/GenBank)
As of 15 April 2012, GenBank release 189.0 has 151,824,421 loci, 139,266,481,398 bases, from 151,824,421 reported sequences

Databases: GenBank (http://www.ncbi.nlm.nih.gov/genbank/)

Databases: GenBank Database Divisions (http://www.ncbi.nlm.nih.gov/genbank/)
http://www.ncbi.nlm.nih.gov/books/NBK21105/#GenBank_ASM

Databases: RefSeq
http://www.ncbi.nlm.nih.gov/projects/RefSeq/

Databases: UniProt
http://www.uniprot.org/
Protein databases in two sections:
  UniProtKB
    Swiss-Prot: manually annotated and reviewed (smaller than TrEMBL)
    TrEMBL: automatically annotated and not reviewed
  UniRef
    Sequence clusters

Databases: Data Access
Easy-to-use query interface
Description of keys for searching: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusB
e.g. a search for all nucleotide sequences from Coffea canephora (organism) with a sequence length between 1000 bp and 10000 bp gave 16 results
  Taxonomy
  Download sequences in various formats
  Publication

Databases: Formats

Make your own Database with GenBank and SRS!
Go to GenBank http://www.ncbi.nlm.nih.gov/
Use the GenBank search fields
Enter search keys such as Coffea canephora [ORGN] and a genomic sequence length range from 100 bp to 1000 bp: 100:1000 [SLEN]

Record field         Search field
Locus name           [ACCN]
Sequence length      [SLEN]
Molecule Type        [PROP]
GenBank Division     [PROP]
Modification Date    [MDAT]
Definition           [TITL]
Accession            [ACCN]
Version              All fields
GI                   All fields
Keywords             [KYWD]
Source               [ORGN]
Organism             [ORGN]
Reference            [TITL][AUTH][JOUR]
Features             [FKEY]
CDS                  [FKEY]
gene                 [FKEY]
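
The search keys above can also be assembled in a script. The following is only a minimal sketch: it combines the [ORGN] and [SLEN] tags from the table into one query term and builds a request URL for the NCBI E-utilities esearch service. The use of E-utilities, the nuccore database name, and the helper names build_term and build_esearch_url are assumptions for illustration; the slides only describe the web search form. No network request is made.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities "esearch" endpoint (not mentioned on the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(organism, min_len, max_len):
    """Combine an organism name and a length range using the field tags above."""
    return f"{organism}[ORGN] AND {min_len}:{max_len}[SLEN]"


def build_esearch_url(term, db="nuccore", retmax=20):
    """Return the esearch URL for a query term (the URL is built, not fetched)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Same example as on the slide: Coffea canephora, 100 bp to 1000 bp.
term = build_term("Coffea canephora", 100, 1000)
url = build_esearch_url(term)
print(term)
print(url)
```

The same term string can be pasted directly into the GenBank search box; the URL form is only useful when the search has to be repeated or automated.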

Pair-wise sequence comparisons: Make your own with GenBank and SRS!
Go to EMBL SRS http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+top
Select EMBL (Nucleotide database)
Extended query form (left)
Select all genomic DNA > 1 kb for the genus Coffea
Download sequences to create a database of Coffea sequences

Exercises Day 1
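
For the SRS exercise (genomic DNA > 1 kb for the genus Coffea), the length criterion can be checked again on the downloaded file. This is a minimal sketch, assuming the sequences were saved in FASTA format; the helper names read_fasta and filter_by_length and the two example records are made up for illustration and are not real Coffea sequences.

```python
def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=1000):
    """Keep records whose sequence is longer than min_len bases (> 1 kb by default)."""
    return [(h, s) for h, s in records if len(s) > min_len]


# Made-up records: one of 4 bases, one of 1200 bases.
example = ">short_seq\nACGT\n>long_seq\n" + "ACGT" * 300 + "\n"
kept = filter_by_length(read_fasta(example))
print([h for h, _ in kept])
```

Only the records passing the filter would go into the local Coffea database.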