The Continually Expanding Internet: how to find Quality Information NOLUG Presentation 27

The Continually Expanding Internet: how to
find Quality Information
NOLUG Presentation
27th February 2009
Presented by Karen Blakeman
http://www.rba.co.uk/nolug/
Photo: Oslo University College http://www.flickr.com/photos/damiel/1534329928/
27 May 2010
Karen Blakeman
www.rba.co.uk
1
This presentation is licensed under a Creative Commons Attribution 3.0 License
Karen Blakeman
RBA Information Services
Tel: +44 118 947 2256
karen.blakeman@rba.co.uk
http://www.rba.co.uk/
blog: http://www.rba.co.uk/wordpress/
Facebook – Karen Blakeman
Twitter: karenblakeman
What Google's homepage may look like in 2084
www.nytimes.com/imagepages/2005/10/10/opinion/1010opart.html
Two points to remember...
1. Google et al. do not exist to help you find information
2. Search engines, and in particular Google, are temperamental beasts
Do not attempt to apply logic to the way they work – therein lies the path to madness
Types of search tools
 Humans
– is a colleague or friend working, or has one worked, in the subject area?
– who have you met at meetings and conferences?
– discussion lists, trade/professional associations, bloggers, LinkedIn, Facebook etc.
 Search engines
– different options for different types of information e.g. news, images
 Evaluated listings, subject listings, types of information
 Databases and peer reviewed sources
 Multi search engine tools
– search many search tools at once
– or type in your search once and click on each search tool in turn
How up to date are search engines?
 Not very
 You are searching an out of date index of the web and not the live web itself
 It may take days to months for a site to be added to the index
 Hierarchy of sites for updating
 Some tools keep links to dead pages for a long time
 Least up to date:
– Google
 Most up to date:
– Live Search, Yahoo
A search engine's results may vary
 In content and presentation
– from one minute to the next
– different server being used
– testing out different search and ranking algorithms
 Country versions
– different emphasis
– local content
– different interface
– different search features
General search techniques
 By default, the major search tools look for all of your terms in a page
 Use double quote marks around phrases
– e.g. “climate change”
 To exclude pages containing a term, precede the term with a minus sign (-) – use with care
 Boolean search
– OR, AND, NOT
– must use capital letters for the operators
– only OR works in Google, and even that does not work well, but it is still worth trying more complex searches
– Live.com, MSE360 and Exalead are best (Yahoo has withdrawn NOT, and nested searches no longer work correctly)
– for example (directory OR directories OR database) AND (oil OR petroleum) AND Norway
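A nested Boolean strategy like the one above is easy to mistype, so it can help to assemble it from lists of alternatives. The following is only a minimal sketch: the function names `or_group` and `boolean_query` are made up for illustration, and whether a given search engine honours the resulting query still has to be tested by hand.

```python
def or_group(terms):
    """Join alternative terms with OR; phrases containing spaces are quoted."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    if len(quoted) == 1:
        return quoted[0]          # a single term needs no parentheses
    return "(" + " OR ".join(quoted) + ")"


def boolean_query(groups):
    """AND together several groups of alternatives (operators in capitals)."""
    return " AND ".join(or_group(group) for group in groups)


query = boolean_query([
    ["directory", "directories", "database"],
    ["oil", "petroleum"],
    ["Norway"],
])
print(query)
```

The operators are written in capital letters because, as noted on the slide, the search tools require that.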
General search techniques (2)
 Focus your search on areas of the document
– inurl: for example inurl:"climate change"
• looks for your terms in the URL
– intitle: for example intitle:"climate change"
• looks for your terms in the title of the page
 Search sites or domains using the site: command
– chocolate labelling regulations site:europa.eu
 Imagine what you would like to appear in your ideal document
and include those terms in your strategy
 Partially answer your question in your strategy
– “A hippopotamus can run at”
 Use the file formats and domain search to refine your search
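The commands on this slide are just text typed into the search box, so a strategy can also be put together and turned into a link programmatically. This is a rough sketch, not an official API: `build_query` and `search_url` are hypothetical helper names, and the base address `https://www.google.com/search` with a `q` parameter is an assumption about the public search page rather than something stated in the presentation.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, intitle=None, inurl=None):
    """Combine plain terms with the site:, intitle: and inurl: commands."""
    parts = list(terms)
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    if inurl:
        parts.append(f'inurl:"{inurl}"')
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Percent-encode the query and append it to the (assumed) search address."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query(["chocolate", "labelling", "regulations"], site="europa.eu")
print(query)
print(search_url(query))
```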
File format search
 Use advanced search options to limit your search to file types or formats:
– pdf or doc for government or industry/market reports
– xls for data and statistics
– ppt or pdf for presentations
– search in at least Google and Yahoo, also consider Live.com
 Looking for experts on a topic, presentations, a “how to” guide, general background on a subject, information on an organisation
– advanced search ppt or pdf format
– Slideshare http://www.slideshare.net/
– authorSTREAM http://www.authorstream.com/
– YouTube http://www.youtube.com/
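As an alternative to the advanced search screen, the same limits can be typed as a filetype: command. A small sketch, assuming the search tool in question supports filetype: (the function name `filetype_queries` is invented for this example):

```python
def filetype_queries(topic, extensions):
    """Return one query per file format, using the filetype: command."""
    return [f"{topic} filetype:{ext}" for ext in extensions]


# Reports are often pdf or doc; data and statistics are often xls.
for q in filetype_queries("chocolate market report", ["pdf", "doc"]):
    print(q)
```

Running each query separately makes it easier to see which format actually yields the reports, data or presentations you are after.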
Advanced Search options can vary depending on the country version of Google
General search techniques (3)
 Repeat your key search terms in your strategy
– chocolate production france belgium austria
– chocolate production austria france belgium belgium belgium
• these give different results
• Google accepts up to 32 terms, Yahoo up to 250 characters
 Change the order of your terms
– chocolate production france belgium austria
– production france belgium austria chocolate
• these also give different results
• See the summary and comparison chart for the major search engines at http://www.rba.co.uk/search/compare.pdf and http://www.rba.co.uk/search/compare.shtml
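Since term order and repetition change the results, it can be worth generating a few variants of a strategy and running each one. The sketch below is illustrative only: the helper names are hypothetical, and the two limits are simply the figures quoted on the slide, which may no longer hold.

```python
from itertools import islice, permutations

GOOGLE_MAX_TERMS = 32   # figure quoted on the slide
YAHOO_MAX_CHARS = 250   # figure quoted on the slide


def reorderings(terms, limit=5):
    """Return the first `limit` orderings of the terms as query strings."""
    return [" ".join(p) for p in islice(permutations(terms), limit)]


def fits_limits(query):
    """Check a query against the term and character limits quoted above."""
    return {
        "google": len(query.split()) <= GOOGLE_MAX_TERMS,
        "yahoo": len(query) <= YAHOO_MAX_CHARS,
    }


variants = reorderings(["chocolate", "production", "france"], limit=2)
print(variants)
print(fits_limits("chocolate production austria france belgium belgium belgium"))
```

If a term is repeated in the input, some of the generated orderings will be identical; that is harmless for this purpose.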
Unique Google search features
 Automatically looks for variations on your terms
– to force an exact match precede your terms with plus signs e.g. air +pollution
 Synonym search
– precede your search terms with a tilde (~) e.g. ~banking
– only works on English terms
 Numeric range search
– can be weights, distances, years, prices
– use the Advanced Search screen
– or the search box on the Google home page
– search term(s) first value..second value unit of measurement
– toblerone 1..5 kg
– TV advertising spend forecasts 2009..2015
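The numeric range pattern (search terms, then first value..second value, then an optional unit) can be written out as a tiny helper. This is a sketch with an invented function name, `numeric_range_query`; it only builds the text of the query and says nothing about how Google interprets it.

```python
def numeric_range_query(terms, low, high, unit=""):
    """Build 'terms low..high unit', leaving the unit out when it is empty."""
    parts = [terms, f"{low}..{high}"]
    if unit:
        parts.append(unit)
    return " ".join(parts)


print(numeric_range_query("toblerone", 1, 5, "kg"))
print(numeric_range_query("TV advertising spend forecasts", 2009, 2015))
```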
Unique Google search features (2)
 Proximity
– use the asterisk (*) to stand in for one or more terms
– macular * degeneration picks up
• macular retinal degeneration
• macula disciform degeneration
• macular choroidal degeneration
• macular vitelliform degeneration
• macular pigmentary degeneration
– separates the terms by one or more words
• no information on the maximum number of terms of separation
Google - What's New
 Knol – “A unit of knowledge”
– competing with Wikipedia
– http://knol.google.com/
 Google results may now include images, books, news, site summaries and links
– varies depending on country version of Google
 Much improved Google Finance, a worthy competitor to Yahoo Finance
– http://www.google.com/finance
– BUT country coverage of share prices is not as good as Yahoo's, e.g. for Norway
Google Finance
Yahoo Finance
Google SearchWiki
 Enables you to customise your results
– move pages up or down the ranking, delete pages from your list
– add comments to a page
 Must be signed in with a Google account
 Can interfere with Firefox add-ons such as Customize Google
 Not available in all country versions
Google plug-ins and add-ons
 Google Toolbar for both Firefox and IE
– search from your browser
– direct search for highlighted terms
– fully customisable
 Firefox add-on
– Customize Google http://www.customizegoogle.com/
– adds numbers to results
– can “stream” results: keep scrolling down the page to see more results instead of clicking on the next page
– links to other search engines at the top of the results list; engines vary depending on search type e.g. web, news, images
Design your own search engine
 For
– regularly searched sites
– selected sites on a topic
– searching sites on a reading list
 Rollyo
– http://www.rollyo.com/
– max 25 sites
 Google Custom Search Engines
– http://www.google.com/coop/cse
– at least hundreds of sites, maybe thousands!
– can import lists of sites
 Cannot search password protected sources or sites where you
have to fill in a form to access the information
Google CSE
 Examples:
– Netting the Evidence
• http://www.google.com/coop/cse?cx=004326897958477606950%3Adjcbsrxkatm
– AlacraSearch
• http://www.alacra.com/alacrasearch
– pipl
• http://www.pipl.com/
– Chipwrapper
• http://www.chipwrapper.co.uk/
 can be hosted on your own site or on Google
– http://www.rba.co.uk/sources/energy.shtml
– http://www.google.com/coop/cse?cx=014304212364962740038:tui4ebh5r_a
Create your own Google CSE on Google
..or host it on your own web site or blog
Other search engines...
 Different coverage
– level of indexing on web sites
– sites included in the index
– update frequency
– amount of a page that is indexed
 Different search features
 Different algorithms for sorting results
 Compare search engines
– http://ranking.thumbshots.com/
http://ranking.thumbshots.com/
Ask
 http://www.ask.com/, http://www.ask.co.uk/
 Recent changes resulted in loss of features
 Suggests related topics
 Particularly good for searching blogs (but you need to do a web search first to see the More option)
 New Q&A tab/more answers
Exalead
 http://www.exalead.com/search/
 Supports wild cards
– asterisk (*) at the end of a word
• pollut* finds pollute, pollutant, polluting etc.
 NEAR - finds words within 16 terms of one another
– NEAR/n finds words within n terms of one another
• climate NEAR/3 change
 Approximate spelling, phonetic search (?)
 Regular expression (internal masking of letters)
 Feedback from users is that there is more European content, which seems to be given priority
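To make the wild card and NEAR/n ideas concrete, here is a small sketch that applies them to a piece of local text. It is not Exalead's implementation: the tokenizer, the function names and the reading of "within n terms" as "word positions differ by at most n" are all assumptions made for illustration.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wildcard_hits(pattern, text):
    """Return the words in text that match a pattern such as 'pollut*'."""
    return [w for w in tokenize(text) if fnmatchcase(w, pattern.lower())]


def near(text, first, second, n=16):
    """True if the two words occur at most n word positions apart."""
    words = tokenize(text)
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= n for i in pos_first for j in pos_second)


print(wildcard_hits("pollut*", "Polluting firms pollute; each pollutant is politely listed"))
print(near("climate and energy change", "climate", "change", n=3))
```

The default of 16 mirrors the figure given on the slide for plain NEAR.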
http://www.exalead.com/
iSEEK
 http://www.iseek.com/
 Clusters results into topics, people, places, organisations, date & time
 Search on a person gives priority to social media profiles
 “Education” option – more research oriented pages
Live Search
 http://www.live.com/
 Results tend to be more consumer oriented
 Has the most up to date database
 Possibly has the most extensive database of web pages
 Good image search option
 Blogs & RSS search http://search.live.com/feeds/
 Revamped interface but no improvement in advanced search screen – best results by using commands e.g. filetype: and Boolean search
 Link commands, Books and Academic Live all gone
MSE360.com
 http://www.mse360.com/
 See reviews at
– http://www.rba.co.uk/wordpress/2008/10/05/mse360-search/
– http://www.rba.co.uk/wordpress/2008/10/06/update-on-mse360/
 Full Boolean nested search options
 Advanced search screen offers country, phrase, excluding
terms, domain/site search
 Can use commands e.g. filetype:, site:
 Results show web, video, images, Wikipedia and blogs
 Quick to respond to bug reports and fix problems
Yahoo!
 http://search.yahoo.no/ http://search.yahoo.com/
 Results are ranked in a different order to Google
 Boolean AND, OR
– NOT no longer available – use the minus sign.
– parentheses no longer work
 Indexes first 500 K of a document (Google 101 K)
 Region command (inherited from Inktomi)
– region: e.g. region:europe, region:mediterranean
– others are africa, asia, centralamerica, northamerica, southamerica, mideast, southeastasia, downunder
Yahoo!
Compare search engines
 Graball.com
– http://www.graball.com/
– compares two search engines of your choice side by side
 TripleMe
– http://www.tripleme.com/
– compares Google, Yahoo and Live side by side
 FuzzFind
– http://www.fuzzfind.com/
– searches Google, Yahoo, Live, Del.icio.us
 Zuula
– http://www.zuula.com
– runs your search through a range of search tools one by one –
order can be customised
 Browsys Powersearch (was Intelways/Crossengine)
– http://www.browsys.com/powersearch/
– runs your search through a plethora of search tools one by one
FuzzFind
http://www.fuzzfind.com/
Zuula
 http://www.zuula.com/
http://www.browsys.com/powersearch/
Evaluated listings and customised search
 Evaluated subject listings
 Some examples:
– Alacrawiki Industry Spotlights– http://www.alacrawiki.com/
– Intute – http://www.intute.ac.uk/
– Pinakes – http://www.hw.ac.uk/libWWW/irn/pinakes/pinakes.html
 Heavy human involvement
– evaluation and assessment of content
– only the home page or relevant section of a site is listed
 Customised search engines
– AlacraSearch - http://www.alacra.com/alacrasearch/
– Chipwrapper – http://www.chipwrapper.co.uk/
– Pipl - http://www.pipl.com/
http://www.alacrawiki.com/ - spotlights
http://www.alacra.com/alacrasearch/
Specialist search tools
 Think type of information
– news, official company information, statistics, scientific, biomedical?
 Reference sources and peer reviewed, for example:
– Wikipedia.org (yes, I know there can be quality issues!)
– Scirus.com
– TechXtra.ac.uk
– Google Scholar (possible quality issues)
– Google Books – especially for older material
 Structured databases e.g. Web of Science, Scopus, STN, Factiva, LexisNexis – often priced
Scientific/Technical & Peer Reviewed Resources
 RefSeek – http://www.refseek.com/
 Ten Science Search Engines http://hwlibrary.wordpress.com/2008/09/22/science-searchengines/
– Scirus – http://www.scirus.com/
– Scitopia.org – http://www.scitopia.org/
– Science.gov – http://www.science.gov/
– ScienceResearch.com - http://www.scienceresearch.com/
– Scitation - http://scitation.aip.org/
– WorldWideScience.org - http://worldwidescience.org/
– Science Accelerator - http://www.scienceaccelerator.gov/
– TechXtra – http://www.techxtra.ac.uk
– search.optics.org - http://search.optics.org/
Scientific/Technical & Peer Reviewed Resources
 Highwire Press http://highwire.stanford.edu/
 PubMed Central Homepage
http://www.pubmedcentral.nih.gov/
 UK PubMed Central http://ukpmc.ac.uk/
 DeepDyve http://mysearch.deepdyve.com/start.php
 Google Scholar – http://scholar.google.com/
– use with caution
Google Scholar
 http://scholar.google.com/
 No source list
 Both peer-reviewed and un-reviewed articles, pre-prints,
institutional repositories, references to books, citations
 Excludes Reed Elsevier
 Author search unreliable, search on year of publication
unreliable
 But
– And the winner is: Google Scholar!
http://74120.weblog.leidenuniv.nl/2009/02/24/and-the-winneris-google-scholar
– Google Scholar Search Performance: Comparative Recall
and Precision
– http://tinyurl.com/c7ta6s
Google Scholar
 “Google Scholar is brain damaged”
Péter Jacsó, Trends in Professional and Academic Online Information Services, presented at Inforum, 22nd May 2007, Prague
 Does not use publishers' metadata
 Cannot differentiate between author, affiliation, geographic location, titles and headings
– author:bagsvaerd 115
– author:acknowledgements 158
– author:glossary 471
 Cannot differentiate between publication year and page numbers
Google Scholar
2540 documents published in 2011 or 2012!
Scirus
 http://www.scirus.com/
 Scientific, scholarly, technical and medical information
 Reed Elsevier journals
 Also web sites, patents and pre-prints
 Good advanced search features
– date searching, author searching etc.
Scirus
TechXtra
 http://www.techxtra.ac.uk
 ICBL and the Library at Heriot-Watt University, Edinburgh
 Articles, key web sites, theses and dissertations, books,
industry news, new job announcements, technical reports,
eprints
 Engineering, mathematics and computing
 Free information and pay per view
Books
 Amazon
 Google Books http://books.google.com/
– can sometimes search inside the book and look at individual pages
– useful for older texts and suppliers of the book
– Advanced search - search by year, author, title, ISBN
 Open Library http://openlibrary.org/
– 23,044,231 books, 1,064,822 with full-text
 Project Gutenberg http://www.gutenberg.org/
– different editions may be available e.g. Darwin's Origin of Species
 viaLibri http://www.vialibri.net/ Rare books from over 20,000 booksellers
 Book swap schemes
– Turning over an old leaf
– http://www.guardian.co.uk/environment/2008/may/01/ethicalliving.recycling
– e.g. http://www.bookmooch.com/
News
 BBC – http://news.bbc.co.uk/
 Search engine news options e.g. Google
– last 30 days of free news
– no source list, key industry publications may not be included
– use country versions for prioritised local content
 Google News Archive http://www.google.com/archivesearch
– some sources going back 200 years
– many articles are priced (before you buy check other
sources)
 Silobreaker - http://www.silobreaker.com/
 Individual newspaper sites
– http://www.abyznewslinks.com/
Silobreaker http://www.silobreaker.com
 covers free resources
 news, blogs, video, images
 market trends
 geographical location of stories
 people
 networks
Images
 TASI
– http://www.tasi.ac.uk/advice/using/finding.html
 images.google.com
 search.yahoo.com – images tab
 Ask – images tab
 Live.com - images
 Flickr.com
– check the license
– http://www.flickr.com/creativecommons
 Morguefile
– http://www.morguefile.com
 Wikimedia Commons
– http://commons.wikimedia.org/
 Freefoto
– http://www.freefoto.com/
 US government web sites
 NASA
– http://www.nasa.gov/
Audio & Video
 Google Video
 YouTube
 Yahoo
 Exalead
 Live.com
 Blinkx for news
– http://www.blinkx.com/
 Browsys Powersearch (formerly Intelways/Crossengine)
– http://www.browsys.com/powersearch/
– click on the video tab
Audio & Video
Blogs as sources of information
 Blogs by industry gurus and experts are a good way of
keeping up to date with what is happening in a sector
 Look for the Blogroll or List of Links on a relevant blog
 Google Blogsearch http://www.google.com/blogsearch
– use advanced search to search within an individual blog
 Ask http://www.ask.com/ – Blogs and feeds
 Blog search engines and directories
– http://www.technorati.com/
– http://www.blogpulse.com/
Blogpulse search and trends
Click on the graph to see ‘trends’
Blogpulse Trends
Shows how often your search terms occur in postings – can compare up to three searches
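The idea behind a trends graph can be illustrated by counting, per day, how many postings mention a term. This is only a toy sketch: the postings below are invented sample data, `mentions_per_day` is a made-up name, and it makes no claim about how Blogpulse itself computes its figures.

```python
from collections import Counter
from datetime import date

# Hypothetical sample postings: (publication date, text)
postings = [
    (date(2009, 2, 25), "Google Scholar compared with Scopus"),
    (date(2009, 2, 25), "Why I still use Scopus"),
    (date(2009, 2, 26), "Google Scholar wins again"),
    (date(2009, 2, 26), "Notes on Web of Science"),
]


def mentions_per_day(posts, term):
    """Count the postings per day whose text contains the term."""
    counts = Counter()
    for day, text in posts:
        if term.lower() in text.lower():
            counts[day] += 1
    return dict(sorted(counts.items()))


# Compare up to three searches side by side, as on the trends page.
for term in ["Google Scholar", "Scopus", "Web of Science"]:
    print(term, mentions_per_day(postings, term))
```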
Twitter
 http://www.twitter.com/
 Microblogging – postings are called ‘tweets’ and are up to 140 characters long
 See who is ‘following’ whom
 Monitor conferences, what people are saying about companies, products, services
 http://search.twitter.com/
Twitter
 Reputation management
 What are people saying about you?
– Oh dear!
pipl
 http://www.pipl.com/
 Review at
http://www.rba.co.uk/wordpress/2007/05/05/pipl-peoplesearch-beta/
 Searches ‘hidden’ web + Google search
– blog search, Google Groups, LinkedIn, Flickr, Google Scholar, Electoral Roll, Directories, Amazon, Hoovers, Zoominfo etc.
– Google web search results are not the same as an ordinary Google search – they incorporate terms such as resume, CV
LinkedIn
Facebook
123People
 http://www.123people.com/
 Searches
– image sections of major search engines
– Flickr
– Facebook
– LinkedIn
– Blogs
– Web
– Videos
– Email addresses
Search visualisation tools
 Different ways of visualising results
 Show links between documents, search terms, people,
organisations
 Can help identify alternative search terms, search topics
kartoo.com
Cluuz
 http://www.cluuz.com/
 “Cluuz … core technology understands the relationship between the entities, terms, or persons searched leading to more relevant, easy to understand search results”
 Not totally intuitive but the network visualisation is ‘cool’
 The links in the network visualisation do not always relate to the same person or organisation but they are usually working in a similar field or subject area
 Results change from one day to the next, one hour to the next, but still worth a look
Cluuz
Quintura.com
AllPlus.com
‘Disappearing’ pages
 Search engine cache copies
– Google, Yahoo, Live, Ask, Exalead
 Firefox users
– install the Resurrect Pages add-on
 Wayback Machine
– http://www.archive.org/
– from 1996 to about 6 months ago
– navigate the archived site or type in the full URL of the document if known
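If the full URL of the vanished document is known, a link to its archived copies can be built directly rather than typed into the Wayback Machine form. A minimal sketch follows, assuming the commonly used `web.archive.org/web/<timestamp>/<url>` address pattern (an assumption of this example, not something given in the presentation); `wayback_url` is a hypothetical helper name, and nothing is fetched.

```python
from datetime import date


def wayback_url(page_url, when=None):
    """Build a Wayback Machine link for a page.

    With no date, '*' asks for the list of all captures; with a date,
    the archive is expected to show the capture closest to that day.
    """
    if when is None:
        return f"https://web.archive.org/web/*/{page_url}"
    return f"https://web.archive.org/web/{when:%Y%m%d}/{page_url}"


print(wayback_url("http://www.rba.co.uk/"))
print(wayback_url("http://www.rba.co.uk/", date(2009, 2, 27)))
```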
Wayback Machine