Day 1 Databases
Introduction to Bioinformatics applied to genomics
Day 1 Databases
May-June 2012, Romain Guyot & Christine Tranchant

Databases
http://www.biodbs.info/

Databases: The main sequence databases
General DB:
    NCBI        http://www.ncbi.nlm.nih.gov/
    EMBL        http://www.ebi.ac.uk/embl/
    DDBJ        http://www.ddbj.nig.ac.jp/
Specialized DB (specific organisms):
    FlyBase     http://flybase.org/
    SGD         http://www.yeastgenome.org/
    TAIR        http://www.arabidopsis.org/
    Ensembl     http://www.ensembl.org/index.html
Proteins:
    UniProt     http://www.uniprot.org/
    Swiss-Prot  http://web.expasy.org/
    PDB         http://www.rcsb.org/pdb/home/home.do
Publications:
    PubMed

Databases: Exponential growth of GenBank (http://en.wikipedia.org/wiki/GenBank)
As of 15 April 2012, GenBank release 189.0 has 151,824,421 loci and 139,266,481,398 bases, from 151,824,421 reported sequences.

Databases: GenBank (http://www.ncbi.nlm.nih.gov/genbank/)

Databases: GenBank Database Divisions (http://www.ncbi.nlm.nih.gov/genbank/)
http://www.ncbi.nlm.nih.gov/books/NBK21105/#GenBank_ASM

Databases: RefSeq
http://www.ncbi.nlm.nih.gov/projects/RefSeq/

Databases: UniProt
http://www.uniprot.org/
Protein databases in two sections:
    UniProtKB
        Swiss-Prot: manually annotated and reviewed (smaller than TrEMBL)
        TrEMBL: automatically annotated and not reviewed
    UniRef: sequence clusters

Databases: Data Access
Easy-to-use query interface.
Description of the search keys: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusB
Example: a search for all nucleotide sequences from Coffea canephora (organism) with a sequence length between 1,000 bp and 10,000 bp gave 16 results.
From the results page:
    Taxonomy
    Download sequences in various formats
    Publication

Databases: Formats

Make your own Database with GenBank and SRS!
Go to GenBank: http://www.ncbi.nlm.nih.gov/
Use the GenBank search fields.
Enter search keys such as Coffea canephora [ORGN] and a genomic sequence length range from 100 bp to 1,000 bp: 100:1000 [SLEN]

    Record field        Search key
    Locus name          [ACCN]
    Sequence length     [SLEN]
    Molecule type       [PROP]
    GenBank division    [PROP]
    Modification date   [MDAT]
    Definition          [TITL]
    Accession           [ACCN]
    Version             All fields
    GI                  All fields
    Keywords            [KYWD]
    Source              [ORGN]
    Organism            [ORGN]
    Reference           [TITL] [AUTH] [JOUR]
    Features            [FKEY]
    CDS                 [FKEY]
    gene                [FKEY]

Pair-wise sequence comparisons: Make your own with GenBank and SRS!
Go to EMBL SRS: http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+top
Select EMBL (Nucleotide database)
Extended query form (left)
Select all genomic DNA > 1 kb for the genus Coffea
Download sequences to create a database of Coffea sequences

Exercises Day 1
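
As a starting point for the exercises, the GenBank search keys from the slides (Organism [ORGN], Sequence length [SLEN]) can also be assembled into a query string in a script. The sketch below is only an illustration and is not part of the course material: the function names are hypothetical, and the use of NCBI's E-utilities esearch endpoint with the nuccore database is an assumption, since the slides only describe the web interface. It builds the query and the search URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities search endpoint (the slides only use the web form).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(organism, min_len, max_len):
    """Combine an organism key and a sequence-length range into one query.

    [ORGN] and [SLEN] are the search keys listed in the slides; AND is the
    boolean operator that joins the two conditions.
    """
    return f"{organism}[ORGN] AND {min_len}:{max_len}[SLEN]"


def build_search_url(query, db="nuccore", retmax=20):
    """Return a search URL for the query; nothing is downloaded here.

    db and retmax are illustrative defaults (nucleotide database, at most
    20 identifiers returned).
    """
    params = {"db": db, "term": query, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Same example as in the slides: Coffea canephora, 100 bp to 1,000 bp.
query = build_query("Coffea canephora", 100, 1000)
print(query)
print(build_search_url(query))
```

The printed query can be pasted directly into the GenBank search box. The URL form is only useful if the search is to be repeated from a script.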