Expanding the usability of recorded lectures

A new age in teaching and classroom instruction
E.L. de Moel
for the degree of:
Master of Science in Computer Science
Date of submission: 26 February 2010
Date of defense: 3 March 2010
Committee:
dr.ir. D. Hiemstra (University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science)
Dipl. Wirtsch.-Info R. Aly (University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science)
dr. T.H.S. Eysink (University of Twente, Faculty of Behavioural Sciences)
Chair Databases
Department Computer Science
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
Summary
Background
At present, Delft University of Technology records around 10% of their lectures. This number
is expected to increase in the following years. Having these recorded lectures opens the door
to all kinds of new ideas and improvements for their educational program. At this moment
they employ a video streaming system called Collegerama, which allows viewers with an
active Internet connection to watch their lectures online. It combines a video stream of the
lecturer with a series of screenshots of the accompanying PowerPoint slides.
Research
The main research question for this project is: “How can we efficiently and effectively present
recorded lectures and course material to students at universities?” This can be divided into
three sub-questions:
• How can we increase the accessibility and availability of the recorded lectures in
Collegerama?
• How can we make recorded lectures easier to follow, especially for foreign speaking
students?
• How can we effectively and efficiently navigate and search within recorded lectures?
The research approach for this project was to study the individual questions separately. In
the second phase of the project, the individual results were combined into a set of integrated
recommendations for further development in a short term implementation project.
Accessibility and availability
To increase the availability of the lectures, it is recommended to create a single video file
from the Collegerama recordings. This will allow for the distribution over many other popular
online multimedia platforms, such as YouTube-Edu and iTunes-U. A single video file
distribution allows for offline viewing without an active broadband Internet connection (for
example, while sitting in the train). This is not possible within the current Collegerama system.
In this research project, a Collegerama lecture has been converted into a single video stream,
after careful review of several layout designs and technical specifications. This lecture has
been published on YouTube. Several other formats have been created, so that the lecture can
also be distributed on all kinds of distribution platforms. This includes a smaller-sized version
created specifically for mobile devices, which has been tested on Apple’s latest iPhone.
Easier to follow
To make lectures easier to follow, we show that creating and displaying subtitles is
useful. These subtitles can be translated automatically using machine translation. For this
research project, Google Translate has been used, which currently supports translation into 52
different languages. The quality of these translations is decent, depending on the target language that
has been chosen. If necessary, the generated text can be enhanced by manual post-processing.
Current speech recognition technology has also been evaluated for the
generation of proper subtitles, using SHoUT, the speech recognition engine created by University of
Twente. It is concluded that this system is not yet sufficient to generate proper
subtitles and that manual post-processing to improve the output is always required.
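As an illustration of the step from a timed transcript to displayable subtitles, the following is a minimal sketch in Python. It assumes the transcript is already available as (start, end, text) cues in seconds and writes them in the widely used SubRip (SRT) layout. Both the cue format and the choice of SRT are assumptions made for this example; they are not taken from the SHoUT output or from the tools used in this project.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Turn (start_s, end_s, text) cues into SRT text, numbering cues from 1."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

A manually corrected or machine-translated transcript could be passed through the same function, since only the text of each cue changes and the timing stays the same.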
Navigation and search
This research project has shown that to properly navigate through the available recorded
lectures, the input from teachers is important. They need to provide the lecture title and
divide their lectures into several chapters with a proper chapter title, based on separate
timeframes (start time and end time). These chapters together with the slide titles and slide
content form the foundation for navigation and searching. The search element can be further
expanded by the available subtitles. For the purpose of this research project, all lecture titles
and chapters provided by the lecturer, slide titles and content and the generated SHoUT
transcripts for all 14 lectures (28 lecture videos) have been collected. The slide metadata has
been digitally and automatically extracted from the original PowerPoint files.
All this new information and metadata has been stored in a multimedia database, so that the
retrieval options for the lecture content could be researched. This database serves as the
source for all the additional options for navigation and searching:
• generating a static and/or interactive table of contents for each lecture (based on lecture
chapters)
• generating tag clouds
• displaying subtitles in several different languages
• searching within lecture material
To demonstrate its functionality, a prototype for a Collegerama lecture search engine has
been developed. This is an online web application that can be accessed from any location
with an active Internet connection and searches within all the above-mentioned data linked to
a lecture. Every search result provides a link to Collegerama, so users can immediately see
the related part of the lecture.
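The idea of time-linked metadata feeding both a table of contents and a search function can be sketched as follows. This is a simplified in-memory illustration in Python, not the prototype itself. The `Fragment` record, its field names, the lecture identifier and the sample rows are all made up for this example, and the matching is plain substring search rather than a real retrieval model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    lecture: str   # hypothetical lecture identifier
    source: str    # "chapter", "slide_title", "slide_content" or "transcript"
    start_s: int   # start of the timeframe, in seconds
    end_s: int     # end of the timeframe, in seconds
    text: str


def format_time(seconds):
    """Format seconds as H:MM:SS for display."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def table_of_contents(fragments, lecture):
    """List (start time, chapter title) pairs for one lecture, in time order."""
    chapters = [f for f in fragments
                if f.lecture == lecture and f.source == "chapter"]
    return [(format_time(f.start_s), f.text)
            for f in sorted(chapters, key=lambda f: f.start_s)]


def search(fragments, query):
    """Return fragments whose text contains every query word, earliest first."""
    words = query.lower().split()
    hits = [f for f in fragments if all(w in f.text.lower() for w in words)]
    return sorted(hits, key=lambda f: (f.lecture, f.start_s))


# Made-up sample rows, only to show the shape of the data.
FRAGMENTS = [
    Fragment("L15", "chapter", 0, 600, "Introduction to filtration"),
    Fragment("L15", "chapter", 600, 2400, "Rapid sand filters"),
    Fragment("L15", "slide_title", 660, 780, "Filter bed design"),
    Fragment("L15", "transcript", 700, 706, "the filter bed is washed with water"),
]
```

Because every hit carries a start time, a result can point the viewer to the matching moment in the recording, which is the behaviour described for the search prototype.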
Future developments
It is concluded that a better system for recording slides needs to be developed. Looking at
the future of education and the increasing developments in technology, it’s clear that
presentations are going to be supported by more animation and video. This means that an
old screenshot recording system will no longer be sufficient to properly record PowerPoint
slides.
To further increase the usability of the recorded lectures, a new interactive way to discuss
lectures with the teacher and other students can be introduced. It promotes the asking and
answering of questions, not just by the teacher but also by fellow classmates. This can be
done through the use of a dynamic message board that is linked to the timeline of each
lecture. Students can comment on and discuss the different topics in the lecture. To support
such a system, an extension of the current multimedia database is required, so that the
messages along with their optional timeframes can be stored.
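A message board linked to the lecture timeline could be modelled roughly as in the sketch below. The `Message` record, its field names and the rule for messages without a timeframe (shown at any playback position) are assumptions made for illustration; the sample messages are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    lecture: str                   # hypothetical lecture identifier
    author: str
    text: str
    start_s: Optional[int] = None  # optional timeframe start, in seconds
    end_s: Optional[int] = None    # optional timeframe end, in seconds


def messages_at(messages, lecture, position_s):
    """Return the messages to show at a playback position of one lecture."""
    visible = []
    for m in messages:
        if m.lecture != lecture:
            continue
        if m.start_s is None:
            visible.append(m)      # no timeframe: applies to the whole lecture
        elif m.end_s is None:
            if position_s >= m.start_s:
                visible.append(m)  # open-ended: shown from its start onwards
        elif m.start_s <= position_s < m.end_s:
            visible.append(m)
    return visible


# Invented sample messages.
BOARD = [
    Message("L15", "student A", "Is there a reading list for this lecture?"),
    Message("L15", "student B", "Could this slide be explained again?", 700, 760),
    Message("L15", "lecturer", "See the next slide for a worked example.", 700, 760),
    Message("L14", "student C", "Question about another lecture", 10, 20),
]
```

Storing the timeframe as two optional fields keeps general questions and time-specific questions in the same table, which matches the idea of optional timeframes per message.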
With these recommendations, it is possible to use recorded lectures as a foundation for future
online-given courses without the need for live lectures.
Abstract
The status of recorded lectures at Delft University of Technology has been studied in order to
expand their usability in the university’s present and future educational environment. Possibilities for the
production of single-file vodcasts have been tested. These videos allow for increased
accessibility of the recorded lectures through other distribution platforms.
Furthermore, the production of subtitles has been studied. This was done with an ASR system
called SHoUT, developed at University of Twente, and with machine translation of subtitles into
other languages. SHoUT-generated transcripts always require post-processing for subtitling.
Machine translation could produce translated subtitles of sufficient quality.
Navigation of recorded lectures needs to be improved, which requires input from the lecturer.
Collected metadata from lecture chapter titles, slide data (titles, content and notes) as well as
ASR results have been used for the creation of a lecture search engine, which also produces
interactive tables of contents and tag clouds for each lecture.
Recorded lectures could further be enhanced with time-based discussion boards, for the
asking and answering of questions. Further improvements have been proposed for allowing
recorded lectures to be re-used in recurring online-based courses.
Preface
This report has been written as a result of my research project at University of Twente, in
cooperation with Delft University of Technology. It was originally a project that started at TU
Delft, in which my father was involved. He has been active at the university for the past 8
years at the chair of drinking water engineering, to improve their educational programs.
When the development of a system for recording and sharing recorded lectures started
(Collegerama), they were one of the first chairs at the university that started recording all
their lectures.
At that time, I was active as an online poker instructor, teaching enthusiastic players ways to
improve their game. I told my father the techniques we used to teach and instruct students
all over the world through the use of the Internet, either one on one or via online streaming
recorded lectures. We began exchanging ideas about this subject and started to see the
remarkable potential that lies ahead with this new form of online multimedia education. That
is how I got involved with this research project.
The goal of this project is to research possibilities for expanding the usability of recorded
lectures at TU Delft and University of Twente and improve the means for distributing and
sharing lecture and course material to students at universities.
I would like to thank the following people:
• Djoerd Hiemstra, my first supervisor, for assisting and guiding me during my research
project
• Robin Aly, my second supervisor, for providing ideas about evaluating certain topics in my
thesis
• Peter de Moel, for providing feedback and for bouncing ideas back and forth about the
future of online recorded lectures
• Thijs Verschoor, for generating the subtitles for the 28 lecture videos of course CT3011,
using the SHoUT engine developed by University of Twente
• Willem Jansen, for helping me with the automatic conversion from pdf PowerPoint sheets
to an Excel data sheet and a SQL 2005 database using a C++ script
• Koen Zomers, for his useful Visual Studio 2008 tips while programming the Collegerama
lecture search engine
• my parents, sister, family and friends for their support during the past 9 months and
during the course of my master
February 2010
Erwin de Moel
Table of contents
Summary
Abstract
Preface
List of figures
List of tables
1. Introduction
2. Existing systems for digitally recorded lectures
2.1 Massachusetts Institute of Technology
2.2 Delft University of Technology
2.3 University of Twente
2.4 Summary
3. Distribution platforms
3.1 YouTube
3.2 iTunes
3.3 Portable Document Format (PDF)
3.4 Conclusions
4. Subtitling
4.1 Subtitling process
4.2 Subtitles from speech recognition
4.3 Machine translation for subtitles
4.4 Text-to-speech for translated subtitles
4.5 Conclusions
5. Navigation and searching
5.1 Meta-data for navigation and search
5.2 Metadata sources
5.3 Metadata storage
5.4 Course and lecture navigation
5.5 Collegerama lecture search
5.6 Conclusions
6. Proposed improvements
6.1 Lecture accessibility
6.2 Navigation and searching
6.3 Student interaction
6.4 Increasing course frequency
6.5 Pilot project for further development
7. Conclusions
List of references
List of URLs
Annexes
Accompanying material
List of figures
Figure 1.1: Screenshot of a recorded lecture at TU Delft in Collegerama
Figure 2.1: Prof. Walter H.G. Lewin, the YouTube superstar
Figure 2.3: Older Collegerama lectures (TN2012) recorded in 2004 had a smaller video size
Figure 2.4: Collegerama recording using a Tablet PC as an interactive blackboard
Figure 2.5: Stationary and mobile recording unit of Collegerama
Figure 2.6: Screenshot of a Collegerama lecture with too many screenshots because of mouse movement
Figure 2.7: Examples of the three presentation options
Figure 2.8: Collegerama screenshots of the three different presentation options
Figure 2.9: The Collegerama lecture at Mechanical Engineering is streamed to the “movie theater” next door
Figure 2.10: The Collegerama live streamed online recordings (CT2011) were announced as “lectures in bed”
Figure 2.11: Collegerama recording (214020) at University of Twente in November 2007
Figure 2.12: Overview of video lectures (214020) in Blackboard (2nd quarter of study year 2009-2010)
Figure 3.1: Combining the Collegerama components into a single video file enables broader distribution of recorded lectures
Figure 3.2: The two main components of Collegerama, video and slides
Figure 3.3: Layout of Collegerama elements within the resolution constraints for YouTube movies (1280x720)
Figure 3.4: Collegerama as a vodcast for YouTube (1280x720)
Figure 3.5: Converting PowerPoint slides into a movie file by recording the Collegerama slide display
Figure 3.6: Vodcast of a Collegerama recording converts a small-sized video into a HD movie with room for proper subtitles
Figure 3.7: A typical PowerPoint slide at iPod resolution (320x240)
Figure 3.8: Collegerama vodcasts with different options for the video component at iPod aspect ratio
Figure 3.10: PowerPoint slide in TU design at iPod size, with and without inserted movie components (20%)
Figure 3.11: Adobe Presenter allows the creation of lectures based on PowerPoint
Figure 3.12: Screenshot of lecture CT3011 implemented within Adobe Presenter
Figure 4.1: Creation process for subtitles
Figure 4.2: Screenshot of the program SubCreator
Figure 4.3: Translated subtitles improve the learning environment for non-native speaking students
Figure 4.4: Word correctness of SHoUT for the CT3011 lectures, clustered by speaker
Figure 4.5: Subtitles created from the SHoUT transcript
Figure 4.6: The performance of some English to German translation engines compared to human translation (=ref) [6]
Figure 4.7: Translated subtitles from Dutch to English in YouTube
Figure 4.8: Will automatic real-time translation engines become available within the next decade? (Source: http://www.meglobe.com)
Figure 5.1: Catalog of recorded lectures in a course
Figure 5.2: Schematic view of a multimedia information retrieval system [14]
Figure 5.3: Searching in parallel metadata of videos [18]
Figure 5.6: Interactive TOC for recorded lecture #15 in course CT3011, generated from the Collegerama data system
Figure 5.7: Tag cloud for recorded lecture #15 in course CT3011, generated by Wordle, with and without deleted words by prof J.C. van Dijk [54]
Figure 6.1: Online viewing (YouTube) and available downloads and links for a recorded lecture
Figure 6.2: Tools created from the Collegerama database (slide navigator, tag clouds and search application) will significantly improve the accessibility of recorded lectures
Figure 6.3: Time-lined online discussions on recorded lectures are common practice for the online educational poker community
Figure 6.4: Multiple scheduling of courses with recorded lectures and online/moderated assistance by a lecturer
Figure 6.5: Examples of multiple scheduled courses
Figure 6.6: Online poker courses are scheduled on specific days, in order to enlarge the attendance and to promote live online discussion (Source: http://www.deucescracked.com/)
Figure 6.7: Recorded lectures are embedded in a Multimedia Information Retrieval System, containing multimedia content and structured course and lecture metadata
List of tables
Table 2.1: Number of slides/screenshots for the three presentation options
Table 4.1: Quality assessment of word correctness by speech recognition on lectures
Table 4.2: Some popular machine translation engines
Table 5.1: Primary metadata for selecting of and navigating in recorded lectures
Table 5.2: Analogy of navigation in DVDs and recorded lectures
Table 5.3: Database table Content
Table 5.4: Database table Lectures
Table 5.5: List of Text_types and the amount of records and words in the database for course CT3011
Table 5.7: Occurrences of the 15 most used nouns from ASR versus human-made subtitles
Table 5.9: Video length per data source in Collegerama lecture search for course CT3011
Table 5.10: Precision and recall measurement for different data sources on 3 important words of lecture #15
Table 6.1: Current situation and goals for future academic courses
Table 6.2: Additional products for expanded usability of recorded lectures
1. Introduction
Background
For the past 10 years, there has been little or no change in the way that lectures are given at
universities throughout the world. With the emergence of new technologies, there
are numerous new possibilities for improving the way in which information is shared
between student and teacher. Through the Internet, an incredible amount
of additional material can be found in order to delve even deeper into the subject matter.
Most universities already employ an online community and messaging system where lecture
sheets, additional subject material and practice exams are shared.
Every year, a teacher of a course gives a similar lecture compared to the previous year, while
a new group of students follows the course. As long as both the course and lecture material
don’t go through a significant change, this seems somewhat redundant. In the past year, TU
Delft has also been faced with a capacity problem: the number of registrants for certain courses
exceeds the maximum capacity of the largest available classroom.
TU Delft has been developing its own system for the production and streaming of digitally
recorded lectures, called Collegerama. They stream lectures on a web server to further
support their learning programs. Figure 1.1 shows an example of a lecture given at
TU Delft that has been recorded and can be viewed online.
Figure 1.1: Screenshot of a recorded lecture at TU Delft in Collegerama
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=7548f752-101b-417e-a4e7-58aebc595376)
The recorded lectures on Collegerama contain the following elements:
• a video stream of the lecturer
• screenshots of the presentation sheets or an interactive screen (tablet PC) on which the
presenter writes notes
• navigation tools for scrolling through the video and/or slides
• controls for play/pause, full screen mode and to modify the playback speed
Goals
TU Delft would like to investigate possibilities of expanding the usability of their recorded
lectures. They have several ideas for achieving this:
• subtitling the lectures for students with hearing problems
• subtitling in other languages (English subtitles for Dutch lectures and vice versa)
• translated subtitles spoken over the original video stream (with or without subtitles)
• searching in lecture content (whether in transcripts/subtitles)
• searching in lecture content (by handmade content overview and/or computer generated
keywords)
• distribution as a vodcast (for PDA, iPod, iPhone or another type of mobile phone)
Research questions
The main research question for this project is:
How can we efficiently and effectively present recorded lectures and course material to
students at universities?
This question can be divided into three sub-questions:
• How can we increase the accessibility and availability of the recorded lectures in
Collegerama?
• How can we make recorded lectures easier to follow, especially for foreign speaking
students?
• How can we effectively and efficiently navigate and search within recorded lectures?
Project boundaries
This research project will not include investigations of user preferences (teachers and
students), best educational practices, optimal teaching methods for recorded lectures, etc. It
will be restricted to alternatives that are technically feasible within the E-learning and ICT
environment of TU Delft. This is not restricted to presently used ICT tools, but might include
commercially available products that can be implemented within the TU Delft environment.
The starting point for this research project is the set of presently produced Collegerama recordings. No
other methods of recording lectures will be evaluated. This research project takes a
technology-centered approach to the subject. A user-centered approach might be taken in a
subsequent research project, evaluating the proposed products and
applications for using recorded lectures and the benefits and problems they might bring to the
different user groups (teachers, local students, foreign exchange students, the University,
etc.).
Report outline
The entire report can be separated into three parts; chapters 2 and 3 give an introduction
into systems currently used for the distribution of recorded lectures, followed by chapters 4, 5
and 6 in which possibilities for expanding the usability of these recorded lectures are
discussed. Chapter 7 contains the conclusions of the entire research project.
In chapter 2, a detailed history of recorded lectures is described, starting with the way that
the Massachusetts Institute of Technology produces and distributes its lectures. This is
followed by the current Collegerama system that is used by both University of Twente and
Delft University of Technology. Chapter 3 describes several formats for producing and storing
recorded lectures. It mentions several audio and video formats that can be used, in what way
a timeline can be determined for a lecture and how to handle the link between the video
stream of the lecturer and the slides that accompany the presentation. After this introduction
into the world of online recorded lectures, chapter 4 discusses new possibilities for expanding
its usability through the means of subtitling and translation. In chapter 5, a description of
several different methods for navigating and searching through the various recorded lectures
is given. Chapters 4 and 5 are concluded in chapter 6, by describing a list of proposed
improvements. An actual prototype for a lecture search engine has been designed and a new
way of browsing through a group of lectures for a single course is demonstrated. Finally, the
conclusions of the entire report are presented in chapter 7.
The printed version of the report does not include the annexes. The annexes are only
included in the electronic version of the report. An accompanying DVD includes the
intermediate and final products of this research project.
2. Existing systems for digitally recorded lectures
Ten years ago (in 1999), the Massachusetts Institute of Technology started broadcasting
several unique physics lectures over a local TV channel. This was primarily done to gain more
exposure for their educational programs. The response was very positive, which led
them to start recording more lectures from other sciences and distributing them over the
Internet. As time went on, the recorded lectures were also used to improve and expand
the learning programs for their own students by publishing them on the Internet.
Several years later, a trend started to emerge and several other large universities in the
United States, such as Berkeley and Stanford, started to do the same. In 2000, Delft
University of Technology, followed by University of Twente, started its own lecture recording
program. After running a few successful pilots, both are now recording more and more
lectures each year. It would not be surprising if, within the next couple of years,
all Dutch universities were doing the same with their courses and lectures.
In this chapter, the history, recording process and developments with regards to recorded
lectures are discussed. The differences between the techniques used at MIT, TU Delft and
University of Twente will be shown and several drawbacks in the current system that both
Dutch universities are using will be described.
For further background information about the research on this topic, see Annex A and B.
2.1 Massachusetts Institute of Technology
Massachusetts Institute of Technology (MIT) is a private research university located in
Cambridge, Massachusetts in the United States. It has five schools and one college,
containing a total of 32 academic departments, with a strong emphasis on scientific and
technological research. It is one of the most prestigious technical universities in the world.
Their reputation is based on their scientific output through the publishing of scientific articles
and reports and the awards received by their staff. Seventy-three members of the MIT
community have won the Nobel Prize, including seven current faculty members.[24]
MIT enrolled 4,232 undergraduates and 6,152 graduate students during the fall of 2009–2010.[25]
It employs about 1,000 faculty members. Its endowment and annual research
expenditures are among the largest of any American university. 75 Nobel Laureates, 47
National Medal of Science recipients and 31 MacArthur Fellows are currently or have
previously been affiliated with the university.[24] The aggregated revenues of companies
founded by MIT alumni would, taken together, rank as the seventeenth largest economy in the world.[26][27]
OpenCourseWare
In 2000, MIT started the concept of publishing their course material on the Internet, where it
would be publicly available to everyone. They called this project OpenCourseWare (OCW).
The first proof-of-concept site was published in 2002, containing 50 courses. By November
2007, MIT had completed the initial publication of almost their entire curriculum, which contained
over 1,800 courses in 33 academic disciplines.[29]
MIT also publishes some of their courses in one or more translated versions and has
formally partnered with four organizations that are translating OCW course material into
Spanish, Portuguese, Simplified Chinese, Traditional Chinese and Thai. Their material has
already been translated into at least 10 different languages, including French, German,
Vietnamese, and Ukrainian.
Since 2008, MIT has added audio and video-taped lectures to their OCW website. These
lectures were recorded between 1999 and 2008 and have been published on YouTube,
iTunes and VideoLectures.net. The OCW concept has received an enormous amount of
attention from all over the world, both from students as well as from universities. In 2005,
the OpenCourseWare Consortium was established to advance education and empower people
through open courseware. At present, about 200 higher education institutions and associated
organizations from around the world are members of this organization, including TU Delft,
the Dutch Open University and HAN University of Applied Sciences (Hogeschool van Arnhem
en Nijmegen). Because of the positive response to their OCW activities, MIT employs a
special OCW office where close to 20 people work every day.[30]
Walter Lewin
In 1999, MIT started recording the lectures of their most popular courses. Professor Walter
Lewin is one of the most well-known lecturers today, who has been made famous through TV
and the Internet. He is an extremely enthusiastic physics teacher who received his Ph.D.
degree in nuclear physics in 1965 at Delft University of Technology. He joined MIT in January
of 1966 as a post-doctoral associate and became an assistant professor later that year.[28]
Figure 2.1: Prof. Walter H.G. Lewin, the YouTube superstar
(Source: http://bibliotematica.wordpress.com/2009/06/05/walter-lewin-quiero-morir-en-una-clase/
and http://www.pbs.org/kcet/wiredscience/blogs/2007/12/free-to-be-mit.html)
Even before the advent of MIT OpenCourseWare, Lewin’s lectures could be found on UWTV
in Seattle, where he reached an audience of about four million people, and on MIT Cable TV,
where he helped freshmen with their weekly homework assignments. Lewin’s lectures on
“Newtonian Mechanics, Electricity and Magnetism” and on “Vibrations and Waves” comprise
some of the most popular content on MIT OpenCourseWare. He consistently holds a spot in
the most downloaded videos on Apple’s iTunes-U as well as on YouTube-Edu. His unique
style of teaching has captured the attention of a broad range of students, educators and self-learners.[28]
Thanks to the various distribution channels that MIT OCW employs, the lectures
of Walter Lewin now receive about 3,000 views a day, from people all over the world.
Online distribution
YouTube is the most popular website for online video content in the world. Nearly 20% of all
global Internet users visit YouTube, with an average of 16 page views per visit. In October 2009,
it was ranked 4th in the top 500 websites list, right after Google, Facebook and
Yahoo.[33]
Since March 2009, YouTube has had a special section for education called YouTube-Edu. By April
of 2009, about 150 universities and colleges in the United States had submitted around
25,000 educational videos. Eight months later, in December of 2009, there were already 298
participating universities. Not all videos on YouTube-Edu are recorded lectures; there are also
short movies (6 to 12 minutes).[34]
iTunes-U is a part of the Apple iTunes Store. The service was created to manage, distribute,
and control access to educational audio and video content for students within a college or
university or for outside viewers. The member institutions are given their own iTunes-U site
that makes use of Apple’s iTunes Store infrastructure. The online service is without cost to
those uploading or downloading material. Content includes course lectures, language lessons,
lab demonstrations, sports highlights and campus tours provided by many top colleges and
universities from the US, United Kingdom, Australia, Canada, Ireland and New Zealand.[35]
In November of 2009, iTunes-U held over 200,000 educational audio and video files from
top universities, museums and public media organizations around the world. About 200
international universities and colleges had published content on iTunes-U, including MIT,
Yale, Stanford, UC Berkeley, Oxford, Cambridge, Freiburg, Lausanne, TU Aachen and
Melbourne. The number of participating universities, as well as the number of audio and
video files, had doubled in the previous 7 months.
Apart from iTunes-U and YouTube, which are commercial services, there are also a few
websites that offer their services for other reasons. A popular example of this is
VideoLectures.net. Their main purpose is “to provide free and open access of high quality
video lectures presented by distinguished scholars and scientists at the most important and
prominent events like conferences, summer schools, workshops and science promotional
events from many fields of Science. The portal is aimed at promoting science, exchanging
ideas and fostering knowledge sharing by providing high quality didactic contents not only to
a scientific community but also to a general public.”[32]
A recent addition to this group is Academic Earth, which launched in March of 2009. Their
mission statement, as stated on their website: “Academic Earth is an organization founded
with the goal of giving everyone on earth access to a world-class education”.[31]
Video composition
Every MIT video is shot from a camera aimed at the front of the classroom. Most of
the time a professor walks in front of a blackboard while explaining several course topics.
The video camera follows the professor and zooms in and out on the blackboard whenever
the professor is writing on it. At times, parts of the surrounding classroom
are visible and you can see students sitting down and/or people walking in.
Most MIT professors only use the blackboard; PowerPoint slides, overhead projectors or
projected illustrations are rarely used. When they are used, the content of the slides
is included in the video by zooming in on the projection screen, or the recorded video may
show a text screen referring to the lecture material. These slides are published as a pdf file
under the “Lecture notes”.
Figure 2.2: MIT lecture with a professor using slides, which are also included in the recorded video
(Source: http://www.youtube.com/watch?v=R90sohp6h44)
The MIT lectures were initially recorded with two cameras; one camera was used for the
overview and one camera took care of the close-ups of the blackboard. More recent
recordings included two more cameras in the back of the classroom to provide a wider
overview of the lecturer in front of the class. All these multi-camera lecture recordings had to
undergo some form of post-production to work out the different camera angles, so that a
single continuous video could be constructed combining all of the different recorded footage.
Transcripts, captions and annotations
About 60% of the recorded lectures at MIT are provided with a transcript. The transcripts are
presented on the MIT-OCW website, on the page of the related lecture under the embedded
YouTube movie. Most of the time these transcripts are also available as a pdf file.
On YouTube, these transcripts are used for the YouTube Caption option, which shows subtitles
in the bottom part of the movie. Captions or subtitles have been available on YouTube since August
of 2008.
2.2 Delft University of Technology
Development of Collegerama
In the year 2000, the section Multimedia Services (MMS) of Delft University of Technology
started with the development of Collegerama in a pilot project on streaming media.[38][39] The
main goal of this pilot was the recording of lectures which could be viewed by students within
Blackboard, their digital learning environment. These “web lectures” were regarded as
instruments to improve study results and to increase the efficiency of the education at the
university.
MMS selected the commercially available Mediasite system, created by Sonic Foundry, as a
basis for Collegerama. The term Collegerama is a private brand created by TU Delft, so that
they could be independent from the technical infrastructure for their web lectures. Selecting a
standard product avoids the high development cost for creating a new system. By using an
existing solution, the university also has the added benefit of getting new updates and
features within the Mediasite platform.
The early years
In April and May of 2004, Professor Barend Thijsse taught the BSc course TN2012
Quantum mechanics. He was giving the course for the last time, because he was leaving the
university. Since he was recognized as an outstanding teacher, TU Delft wanted to record his
lectures while they still had the chance. He gave the course and lectures together with his
successor, Professor Leo Kouwenhoven.
Figure 2.3: Older Collegerama lectures (TN2012) recorded in 2004 had a smaller video size
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=735a8c5902864988b01157c16f8e632e)
Mediasite was used for recording the 25 lectures (40-45 minutes) of the BSc course. A Tablet
PC functioned as a blackboard to write notes on and both lecturers had a speaker microphone
attached to their jackets. The recorded courses were used during the succeeding years as a
reference until a drastic curriculum change in September of 2008.
After this successful project, 3 additional presentations were recorded using Mediasite
from September until December of 2004, as part of tests for the technical infrastructure of
Collegerama. These web lectures were filmed with poor audio recording equipment (no
special microphone for the speaker) and a small video size (256x192 or
240x180). At that time, 240x180 was the standard video size for Mediasite recordings.
In January of 2006, Collegerama was used for the recording of the closing speech by the
Rector Magnificus, Prof. Dr. Ir. J.T. Fokkema, at the 164th Dies Natalis of TU Delft. This was
the start of a yearly tradition where all the Dies Natalis speeches were recorded. The video
was recorded at a higher resolution of 320x240, which is still the standard Collegerama video
resolution in 2009.
Between September and December of 2006, the 30 lectures (40-45 minutes) of the BSc
course TN2545 Signals and Systems by Professor Lucas van Vliet were recorded. This course
was normally given in Dutch, but for the sake of the recordings the lectures were given in
English to allow non-Dutch speaking students to follow the course. The recorded lectures
consist of videos showing the lecturer and synchronized screenshots of a Tablet PC, used as
an interactive blackboard. These recorded lectures were used for several years, until
a new lecturer took over the course in September 2009. They are currently available on
Blackboard as reference material.
Figure 2.4: Collegerama recording using a Tablet PC as an interactive blackboard
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)
Collegerama recording
Collegerama offers two ways of recording lectures: a stationary
setup that has been installed in a few classrooms at TU Delft, or a mobile
station that can be used at any given location. Both systems include a stationary
webcam that can be operated remotely with a joystick. The operator, usually a student
assistant, makes sure that the camera is always pointed at the lecturer while he or she is moving
around the classroom. The laptop that comes with the presenter unit is connected to a
projector, so that the PowerPoint slides can be viewed in the classroom and recorded by the
system. The recording system takes screenshots of the projected screen, based on computer
activity. Every 1 to 4 seconds, the system checks for a change on the screen. If a different
slide has been loaded or the position of the mouse has changed, a new screenshot is
saved as a jpeg image file.
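The change-triggered capture rule described above can be illustrated with a minimal Python sketch. It is not the Mediasite implementation: the ScreenSample structure, the use of a string as a stand-in for the screen contents and the choice to always save the first sample are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScreenSample:
    """One poll of the presenter laptop (hypothetical structure)."""
    time_s: float               # seconds since the start of the recording
    screen_id: str              # stand-in for the screen contents, e.g. a hash
    mouse_pos: Tuple[int, int]  # mouse position in pixels


def capture_times(samples: List[ScreenSample]) -> List[float]:
    """Return the times at which a screenshot would be saved.

    Assumption: the first sample is always saved; after that a screenshot
    is saved whenever the screen contents or the mouse position differ
    from the previous sample.
    """
    saved: List[float] = []
    previous = None
    for sample in samples:
        state = (sample.screen_id, sample.mouse_pos)
        if state != previous:
            saved.append(sample.time_s)
        previous = state
    return saved


samples = [
    ScreenSample(0.0, "slide-1", (100, 100)),
    ScreenSample(2.0, "slide-1", (100, 100)),  # nothing changed
    ScreenSample(4.0, "slide-1", (140, 120)),  # only the mouse moved
    ScreenSample(6.0, "slide-2", (140, 120)),  # a new slide was loaded
]
print(capture_times(samples))  # [0.0, 4.0, 6.0]
```

In this example the sample at 4.0 seconds is saved although the slide itself did not change, because the rule treats a mouse movement in the same way as a slide change.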
Figure 2.5: Stationary and mobile recording unit of Collegerama
The current system used to record the slides of the lectures relies on the assumption that changes on
the screen always correspond to a change in the presentation. This is clearly not the case, and
several scenarios can cause a faulty screenshot to be taken:
• a video is played within a PowerPoint slide
• the lecturer inadvertently moves the mouse
• the lecturer leaves PowerPoint to demonstrate an application on his PC
This recording flaw creates a problem, because many Collegerama lectures contain a large number of
redundant images that were accidentally saved. Some of these lectures contain 400
screenshots, when in fact the original PowerPoint presentation only had about 50 slides.
During playback, the interface relies for navigation on the screenshots that were created during the
recording. The problem with this navigation system is that once a lecture
contains an excess of useless slides, the only remaining way to browse through the lecture
is the video timeline. An example of the navigation element in such a lecture is
shown in Figure 2.6.
Figure 2.6: Screenshot of a Collegerama lecture with too many screenshots because of mouse movement
After the lecture has been given, the data is sent to the presentation server. It will process
the different data sources and create three different outputs:
• an audio/video stream (wmv file)
• pictures of the different PowerPoint slides or computer screenshots (jpeg files)
• different settings and additional information about the lecture (xml file)
The presentation server will synchronize all the different elements and will store the required
information in the xml file. This information will later be used to correctly display the video in
combination with the screenshots. When the presentation has been processed, it is written to
the Collegerama web server and is now available for students with Internet access all over
the world.[42]
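To make the role of the xml file more concrete, the sketch below reads a small manifest and builds the time index that links screenshots to the video. The layout of the real Mediasite xml file is not described here, so the element and attribute names (lecture, slide, time, file, video) are purely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest; the real Mediasite xml layout may look very different.
MANIFEST = """
<lecture video="lecture.wmv">
  <slide time="0.0" file="slide_0001.jpg"/>
  <slide time="95.5" file="slide_0002.jpg"/>
  <slide time="240.0" file="slide_0003.jpg"/>
</lecture>
"""


def load_time_index(xml_text):
    """Return (video_file, [(start_seconds, image_file), ...]) sorted by time."""
    root = ET.fromstring(xml_text)
    slides = [(float(s.get("time")), s.get("file")) for s in root.findall("slide")]
    slides.sort()
    return root.get("video"), slides


video, index = load_time_index(MANIFEST)
print(video)
for start, image in index:
    print(f"{start:7.1f} s  {image}")
```

The point of the sketch is only that the three outputs stay separate files and that the xml file is what ties them together.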
Presentation options
During the presentation, the lecturer is provided with three different presenting options:
• blackboard
The lecturer uses the blackboard or an overhead projector to give his lecture, while the
video camera records the content.
• PowerPoint
This works in combination with a prepared set of PowerPoint slides that will be displayed
while the presentation is being given.
• screen capturing
The contents of the computer screen will be displayed during the presentation, which
allows for the lecturer to use external software such as computer simulations or written
text on a Tablet PC and record the results as separate screenshots.
Figure 2.7: Examples of the three presentation options
Each of these presentation options uses the same storage system, which is based on screen
activity. Especially with the screen capturing option, for example when notes are written on a
Tablet PC, a large number of images will be stored, since every mouse movement and every
change on the screen will cause a new screenshot to be saved. Collegerama uses a uniform
view for all three presenting options, as is shown in the examples given in Figure 2.8.
Figure 2.8: Collegerama screenshots of the three different presentation options
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=ca42dce5-bb51-4c39-93de-50528dd6b880
and http://collegerama.tudelft.nl/mediasite/Viewer/?peid=724886f7-cfd0-441d-ae85-1fae0cbb28a1
and http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)
Figure 2.8 illustrates that Collegerama is well suited to lectures in which a
PowerPoint presentation or a Tablet PC is used (middle and right screenshot). In these cases
the most detailed information is presented in the presentation block. For a lecture that uses
only the blackboard (left screenshot), the Collegerama system adds little. The three
presentation options differ significantly in the number of slides (or screenshots). This
difference is illustrated in Table 2.1.
Table 2.1: Number of slides/screenshots for the three presentation options
Presenting option    Number of slides        Navigation pages (list - small - large)
Blackboard           0 (no slides picture)   0 - 0 - 0
PowerPoint           30                      2 - 3 - 5
Screen capturing     308                     12 - 15 - 29
With respect to navigation, only lectures with PowerPoint slides seem to be suitable for a
Collegerama recording. Blackboard lectures lack navigation by slides/screenshots, while
Tablet PC lectures have too many screenshots for proper navigation. For the latter, the
screenshots can be clustered in chapters as part of the post-processing of a
Collegerama recording.
Collegerama as a service
Starting in September of 2007, Collegerama became part of the regular facilities for education
at TU Delft, under the responsibility of the University Corporate Office for Education and
Student Affairs (O&S). This office is also responsible for the electronic learning system
Blackboard. As a consequence, recording of lectures was financed by the Corporate Office
and became free for the lecturers at the different faculties. Before that time recordings were
made at a rate of € 500,- per recorded session of 45 minutes. The scheduling of recording
units and operators is now organized by O&S and lecturers can apply there to have their
lectures recorded. This service has resulted in a huge increase of recorded lectures. In
September and October of 2009 alone, around 60 to 75 lectures were recorded each week
(30 to 40 sessions of 2 lectures of 45 minutes each). This amounts to 5% of all lecture hours
given each week at TU Delft.
In September of 2009, the faculty of Mechanical Engineering was faced with a large student
overflow. The 500 first-year students did not fit in the largest lecture room available, which
had a capacity of 300 students. To overcome this problem, two lecture rooms were used. In
one lecture room, the lecturer gave the live lecture, which was recorded using a high quality
camera. This recording was streamed to the other lecture room over a larger data stream
to accommodate the higher quality. The recorded lectures were afterwards also available at a
lower quality via Blackboard and Collegerama. The faculty called this service “lectures in a
movie theater”.
Figure 2.9: The Collegerama lecture at Mechanical Engineering is streamed to the “movie theater” next door
(Source: Delta 27, 17 September 2009)
Collegerama live
The mobile recording units of Collegerama have their own storage unit. After the lecture
recording has been completed, the stored data is uploaded to the central Collegerama server.
It is also possible to stream the recording to the server while it is being made, thus
generating a live stream to the outside world. This live streaming process has a 5 to 10
second delay between recording and broadcasting. In the Collegerama setup, a URL for a
Collegerama lecture is automatically created 4 hours before the recording. This URL is
published before the lecture starts, so that every student can watch it from their own room or
any other location that has Internet access.
This live streaming system was used for the course CT2011 Watermanagement in
September-October of 2009. The course was moved within the curriculum from the third year
to the second year, which caused the number of enrolled students to double to about 500.
This again far exceeded the seating capacity of the largest classroom available
at the faculty of Civil Engineering (it holds only 350 students). To reduce the number of
students attending, the lectures were scheduled on Monday and Friday during the first two
lecture hours. The lectures were also announced to be broadcast live and received wide
media attention under the title “lectures in bed”. The system was a huge success. After the
initial lecture, the number of attending students dropped to around 100, with a
large number of online viewers during lecture hours or several hours after the lecture. The
movie theater lecture room stayed empty after the first lecture.
Figure 2.10: The Collegerama live streamed online recordings (CT2011) were announced as “lectures in bed”
(Source: Delta 27, 17 September 2009)
OpenCourseWare
In March 2007, TU Delft started its own OpenCourseWare pilot project.[40] In this pilot project
the course material of about 20 MSc courses from 6 different disciplines was published.
Collegerama lecture recordings were part of this material. This initiative was very well
received and students found the Collegerama recordings to be of extraordinary quality.
Because of the national and international response, TU Delft decided in 2008 to continue its
OpenCourseWare program at a more extensive scale.
In October 2009, TU Delft hosted the yearly conference of the OpenCourseWare consortium,
in which more than 200 universities worldwide are active.[30] The Director of Education and
Student affairs of TU Delft is a member of the board of the OCW Consortium.
In January 2010, TU Delft renewed its OpenCourseWare website. One of the goals of the
update was to give the recorded lectures more pronounced exposure and to give the site the look
and feel of the original Blackboard courses. A month later, TU Delft created an iTunes-U
account and started to publish recorded lectures, partly as a result of the work that is
described in this research report. Annex C gives further information on this subject.
Future developments
In 2010 a new viewer for Collegerama will be implemented at TU Delft (Mediasite version
5.2). This Silverlight player combines the look and feel of YouTube, including a small
slide viewer for navigation, with the dynamic screen changes known from iPhone or Windows 7
screens. This new viewer will still suffer from the major drawback of an overload of
useless slides, because this problem is related to the recording process, not to the viewer.
2.3 University of Twente
In 2007, University of Twente started a pilot project on recorded lectures. This pilot project
used the experience of TU Delft with its Collegerama system. The same technical
infrastructure of Collegerama was also used at University of Twente. Within the pilot project,
the lectures of 10 BSc courses have been recorded. One of these was the course Algorithms,
Data structures and Complexity (214020), which is also a pre-master course for the master
Computer Science. Between November 2007 and January 2008, 8 of its lecture sessions
were recorded. Afterwards, the recording of the 7th lecture session turned out to be
unavailable due to technical difficulties. The recorded sessions include two lecture hours
(40 minutes each) and the intermediate coffee break (a 15 minute recording of a clock).
Figure 2.11: Collegerama recording (214020) at University of Twente in November 2007
(Source: http://videolecture.utwente.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb)
After each recorded course, an evaluation form was used to register the opinion of the
students. Based on the positive results of the pilot project it was decided to continue the
project. Since September 2008, lectures at University of Twente can be recorded with
Collegerama i.e. Mediasite.[41] At University of Twente, two lecture rooms are available with
recording facilities for Collegerama/Mediasite (Horst C101 and Cubicus B209). There is also
one mobile recording unit available (Spiegel, for room 1, 2, 4 and 5). This unit can also be
used in other buildings and lecture rooms, if requested. The service for recording lectures is
free of charge and is provided by the ICT Service Centre of University of Twente.
In September 2009, University of Twente started using Blackboard as its digital learning
system as a replacement for Teletop. At present, TU Delft and University of Twente use the
same technical infrastructure for their digital learning environment as well as their lecture
recording and streaming system.
Figure 2.12: Overview of video lectures (214020) in Blackboard (2nd quarter of study year 2009-2010)
(Source: http://blackboard.utwente.nl/webapps/blackboard/content/listContent.jsp?course_id=_758_1&content_id=_92264_1)
2.4 Summary
Massachusetts Institute of Technology (MIT) was the first university to start recording their
lectures back in 1999, by taking a video camera into the classroom and video-taping the
lecturer as he was teaching. Several years later, TU Delft started doing the same, using a
more sophisticated system that simultaneously records the slides along with a video stream of
the lecturer. This made the recorded lectures easier to follow, but also added a problem. The
lectures were no longer contained within a single video file, which severely limits the
possibilities for different online distribution channels.
In 2007, University of Twente decided to use the same system for the recording and
distribution of their lectures as TU Delft. After several pilot projects, they purchased 2
stationary recording units and 1 mobile recording unit of Collegerama.
The current Collegerama system has several problems:
• navigation, since it is based on inconsistent screenshots of slides
• not distributable through a single (video) file
• no easy way of browsing/searching through a lecture
3. Distribution platforms
Collegerama / Mediasite player
At present, lectures recorded in Collegerama can only be viewed as streaming video with an
Internet connection to the Collegerama server. The movies are played within the custom Java
player developed by Mediasite. This setup has several advantages:
• no distribution channels required, avoiding their institutional and technical requirements
• single point of entry, so that both the content and the player can be updated in one place
• no storage required at the point of viewing/listening
Aside from these advantages, there are also a number of severe drawbacks to the current
Collegerama distribution platform:
• limited distribution options
• no offline viewing
• compatibility
• limited expansion options
Limited distribution options
The current Collegerama system can be divided into two parts:
• video stream of the lecturer (wmv)
• screenshots of PowerPoint slides (jpg)
During playback, the web player will update the screenshots based on a time index that is
stored in the configuration file of the lecture. Basically, a video stream is played and the
corresponding pictures are reloaded on the right side of the viewer during playback. Virtually
all online distribution platforms operational today require a video file to be uploaded. This file
will usually be re-encoded using a specific codec compatible with that player. YouTube for
instance uses mp4 as the way of storing its online video files. Unfortunately this poses a
problem when distributing lectures stored within the Collegerama server over any of these
other multimedia platforms. It is possible to upload the video stream, since that component is
stored in a video file format, but without the lecture slides to accompany it the lecture will
miss most of its important content.
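The playback mechanism described above amounts to a lookup in a sorted list of slide start times. The sketch below shows one way such a lookup could work; the function name and the example time index are assumptions for illustration and do not describe the actual Mediasite player.

```python
from bisect import bisect_right


def slide_at(start_times, playback_time):
    """Return the index of the slide visible at playback_time.

    start_times must be sorted in ascending order (seconds). Returns None
    when playback_time lies before the first slide.
    """
    position = bisect_right(start_times, playback_time)
    return position - 1 if position > 0 else None


# Hypothetical time index: three slides starting at these offsets (seconds).
starts = [0.0, 95.5, 240.0]
print(slide_at(starts, 100.0))  # 1, the second slide is on screen
print(slide_at(starts, 240.0))  # 2, the third slide has just started
```

Because the slides only exist as separate images tied to such an index, a platform that accepts nothing but a single video file has no place to store them.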
No offline viewing
Since all the lectures are streamed over the Internet, it is not possible to view the
Collegerama lectures without an active Internet connection. This means that it’s not possible
to store the lectures and view them later on your laptop, iPhone/iPod or other mobile
multimedia device.
Compatibility
The current player that is used within Mediasite is based on Microsoft Silverlight, which
is poorly supported on other operating systems such as Linux. A custom-made
version created by Novell is available, but this solution will not always work when Mediasite
releases a new version of its player. Users are dependent on the developments by Novell to
keep their system up to date.[43]
Limited expansion options
At present, the Mediasite player cannot be easily integrated with (multi-language) subtitles.
This might be improved in future versions, but Collegerama is dependent on the Mediasite
developments in order to add custom functionality. Other channels such as YouTube do
provide these options as a default and are ahead of Mediasite in this area.
Figure 3.1: Combining the Collegerama components into a single video file enables broader distribution of recorded lectures
Other distribution platforms
In this chapter, the options and capabilities of two important platforms, YouTube and
iTunes, are examined. These two have been selected for the following
reasons:
• their worldwide exposure
• the acceptance of their technical specifications by other external platforms
• the experiences of MIT (see Annex A)
• the compatibility of these technical specifications on TU Delft’s own Blackboard learning
environment, the OpenCourseWare website and other web platforms
The distribution of recorded lectures through these platforms requires the creation of a single
video file, which can be uploaded to their servers. To create such a Collegerama
vodcast/podcast, the following aspects should be examined:
• content (slides, audio, video, subtitles and any combination of these)
• presentation of the content (lay-out, introduction tune/movie, branding)
• video quality (resolution, frame rate)
• format of video file (mov, wmv, flv, mp4, codec etc)
• audio quality (stereo/mono, frequency range)
• format of audio file (mov, mp3, mp4, codec etc)
The above-mentioned technical specifications (quality, codec) primarily determine the file size.
The technical specification should strike a balance between quality (usability) and quantity
(download time and storage requirements).
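The trade-off between quality and file size can be estimated with simple arithmetic: file size is roughly the total bit rate multiplied by the duration. The sketch below assumes a 45-minute lecture and three illustrative quality levels; the bit rates are example values chosen for this sketch, not measured settings, and container overhead is ignored.

```python
def estimated_size_mb(video_kbps, audio_kbps, duration_min):
    """Approximate file size in megabytes (1 MB = 1,000,000 bytes).

    Bit rates are in kilobits per second; container overhead is ignored.
    """
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_min * 60
    return total_bits / 8 / 1_000_000


# Illustrative quality levels (assumed values) for a 45-minute lecture.
for label, video_kbps, audio_kbps in [("low", 350, 20),
                                      ("medium", 800, 64),
                                      ("high", 2000, 128)]:
    size = estimated_size_mb(video_kbps, audio_kbps, 45)
    print(f"{label:6s} {size:7.1f} MB")
```

Under these assumptions the three levels come to roughly 125 MB, 292 MB and 718 MB per lecture, which shows how quickly download time and storage grow with quality.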
Outline
This chapter will focus on the distribution of Collegerama over various platforms.
The popular audio/video sharing media YouTube and iTunes will be covered first. After
that, a new lecture creation tool called Adobe Presenter will be demonstrated. Each of these
platforms will be thoroughly examined and a conclusion will be drawn about the quality of
each of these systems.
For further background information about the research on this topic, see Annex C.
3.1 YouTube
YouTube is a video sharing website where users can upload and share their videos. Three
former PayPal employees created YouTube in February of 2005. In November 2006, YouTube
was bought by Google Inc. for $1.65 billion and is now operated as a subsidiary of Google. It
uses Adobe Flash Video technology to display a wide variety of user-generated video content
and is currently the biggest distributor of streaming online video content.
Unregistered users can watch the videos, while registered users are permitted to upload an
unlimited number of videos. Videos that are considered to contain potentially offensive
content are available only to registered users over the age of 18. The uploading of videos
containing copyright violations is prohibited by YouTube’s terms of service. Accounts of
registered users are called “channels”.[44]
In the last few years YouTube has become a medium on which several universities publish their
recorded lectures. One of the first was MIT (Massachusetts Institute of Technology), which
joined in October of 2005. Later, other universities like Purdue (2006), Stanford (2006), UC
Berkeley (2007) and Harvard Business (2007) started publishing recorded lectures and course
material via this popular Internet medium.
Video formats for YouTube
YouTube’s video playback technology, based on the Adobe Flash Player, allows the site to
display videos with quality comparable to more established video playback technologies such
as Windows Media Player, QuickTime, and RealPlayer. These formats generally require the
user to download and install a web browser plug-in to view video content. Viewing Flash
video also requires a plug-in, but market research from Adobe Systems has found that its
Flash plug-in is installed on over 95% of the personal computers around the world.[45]
Videos uploaded to YouTube are limited to ten minutes in length and a file size of 2
gigabytes.[47] When YouTube was first launched in 2005, it was possible for any user to upload
videos longer than ten minutes, but YouTube’s help section now states: “You can no longer
upload videos longer than ten minutes regardless of what type of account you have. Users
who had previously been allowed to upload longer content still retain this ability, so you may
occasionally see videos that are longer than ten minutes.”[46] The ten minute limit was
introduced in March 2006, after YouTube found that the majority of videos exceeding this
length were unauthorized uploads of television shows and films.
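One practical consequence of the ten minute limit is that a lecture of about 45 minutes cannot be uploaded as one video from a regular account and would have to be cut into parts. The sketch below computes such cut points using simple fixed-length parts. This is an illustration of the constraint only, not a description of how Collegerama lectures are actually published, and the function name is hypothetical.

```python
def split_into_parts(duration_s, max_part_s=600):
    """Return (start, end) pairs in whole seconds that cover the recording.

    Every part is at most max_part_s long; the last part may be shorter.
    """
    return [(start, min(start + max_part_s, duration_s))
            for start in range(0, duration_s, max_part_s)]


parts = split_into_parts(45 * 60)  # a 45-minute lecture
print(len(parts))  # 5
print(parts)       # [(0, 600), (600, 1200), (1200, 1800), (1800, 2400), (2400, 2700)]
```

A cut at a fixed time will often fall in the middle of a sentence or a slide, so in practice the cut points would probably need to be adjusted by hand.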
Video formats and quality
YouTube accepts videos uploaded in most formats, including .WMV, .AVI, .MKV, .MOV,
MPEG, .MP4, DivX, .FLV, and .OGG. It also supports 3GP, allowing videos to be uploaded
directly from a mobile phone.
YouTube originally offered its videos in only one format, but now uses three main formats, as
well as a “mobile” format for viewing on mobile phones. The original format, now labeled
“standard quality”, displays videos at a resolution of 320x240 pixels using the Sorenson Spark
codec, with mono MP3 audio. This was, at the time, the standard for streaming online videos.
“High quality” videos, introduced in March 2008, are shown at a resolution of up to 860x480
pixels with stereo AAC sound. This format offers a significant improvement over the standard
quality. In November 2008, 720p HD support was added. At the same time, the YouTube
player was changed from an aspect ratio of 4:3 to widescreen 16:9. 720p videos
are shown at a resolution of 1280x720 pixels and encoded with the H.264 video codec. They
also feature stereo audio encoded with AAC.
Collegerama components
A Collegerama lecture has screenshots of the PowerPoint slides and a video of the lecturer
giving the lecture. On the web interface, these have been split up into separate parts. If the
recorded lectures in Collegerama are to be published as a vodcast, the different elements
need to be combined into a single multimedia file format.
In the current video system of Collegerama, the following elements are kept in sync:
• video of the lecturer
• audio
• PowerPoint slides
• closed captions/subtitles (not currently used at TU Delft)
Figure 3.2: The two main components of Collegerama, video and slides
Video of the lecturer
The video part of Collegerama usually shows the lecturer, but might occasionally be switched
to a recording of the display screen for animations, movies etc. Collegerama publishes the
video stream using the following quality settings:
Resolution: 320 x 240 (ratio 4:3)
Frame rate: 25 fps
Bit rate: 370 kb/s
Codec: wmv3
In short: Windows Media Video 9 / 320x240 / 25.00fps / 341kbps
Audio
Audio is an important part of the vodcast. It contains all the spoken text and explanations by
the lecturer. A lecture can be followed by only having an audio recording without video, but
not the other way around. This is shown by podcasts of lectures. A video stream without
audio doesn’t make any sense. Collegerama publishes the audio stream using the following
quality settings:
Channels: 2 (Stereo)
Sampling rate: 22050 Hz (22 kHz)
Bit depth: 16 bits/sample
Bit rate: 20 kb/s
Codec: wma2
In short: Windows Media Audio 9.2 / 20 kbps / 22 kHz / stereo (1-pass CBR)
PowerPoint slides
The slides of a presentation contain the most detailed information. They are important for the
viewers since they provide a guideline to the story. Fortunately the slides mostly contain keywords
at a fairly large font size, which means that the quality and resolution do not have to be
high for them to be readable. Collegerama publishes PowerPoint slides using the following
specifications:
Resolution: 1024 x 768 (ratio 4:3)
Bit depth: 24 bits/pixel (full color)
Codec: jpg
Closed captions / subtitles
There are different ways of publishing closed captions or subtitles on video. The most
commonly used method is a text file containing the spoken sentences along with their
corresponding timestamps. Closed captions and subtitles for Collegerama lectures are
described elsewhere. For the production of a vodcast, the subtitle files are not relevant since
they will be attached to the vodcast based on the internal timestamps of the movie file.
Publishing Collegerama on YouTube
A vodcast for YouTube should comply with YouTube’s resolution restrictions. A
general strategy for this is to develop a vodcast at the best video quality supported by
YouTube, with the following considerations and constraints:
• movie size is limited to 2 gigabytes
• movie length is limited to 10 minutes for the general public, unlimited for channel
managers like YouTube-Edu
• YouTube gives the viewer the option to display at a lower quality when bandwidth is a
limiting factor
• producing a vodcast at the highest quality enables the production of “child products” for
other platforms with a lower quality, which results in smaller file sizes or bandwidth
requirements
• YouTube converts movies with non-normalized resolution by downsizing to the nearest
standard heights of 360, 480, 720 or 1080 pixels
Within these constraints, the best quality of a Collegerama vodcast for YouTube can be
achieved by following these steps:
• reduce the size of the slides from 1024x768 to 960x720 (downsizing to 94%, keeping the
display ratio 4:3)
• leave the video resolution at 320x240
• put both elements alongside each other, giving an overall size of 1280x720 (HD720,
widescreen, display ratio 16:9)
• fill the remaining area (320x480) with related info or navigation tools, or leave it blank
[Figure 3.3 panels: Slide (960x720), Video (320x240), Related info (320x480)]
Figure 3.3: Layout of Collegerama elements within the resolution constraints for YouTube movies (1280x720)
A layout according to this setup is given in Figure 3.3. The video is located on the right-hand
side of the slides, to give a more balanced overall picture for left to right reading. The overall
view could be mirrored to obtain an overall picture which resembles the original Collegerama
view, where the video is located on the left. A screenshot of a lecture converted to the
resolution requirements and uploaded to YouTube can be seen in Figure 3.4.
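The layout arithmetic described above can be checked with a short sketch. The function name `youtube_layout` and the choice to place the video at the top of the right-hand column are assumptions made for illustration; the text only fixes the element sizes and the side-by-side arrangement.

```python
from fractions import Fraction


def youtube_layout(slide=(1024, 768), video=(320, 240), frame=(1280, 720)):
    """Return (x, y, width, height) boxes for a side-by-side HD720 frame."""
    frame_w, frame_h = frame
    # Scale the slide to the full frame height while keeping its 4:3 ratio.
    scale = Fraction(frame_h, slide[1])
    slide_w = int(slide[0] * scale)
    video_w, video_h = video
    if slide_w + video_w > frame_w:
        raise ValueError("slide and video do not fit side by side")
    return {
        "scale": float(scale),
        "slide": (0, 0, slide_w, frame_h),
        # Assumption: the video sits at the top of the right-hand column.
        "video": (slide_w, 0, video_w, video_h),
        # Whatever is left below the video is free for related info.
        "related_info": (slide_w, video_h, frame_w - slide_w, frame_h - video_h),
    }


layout = youtube_layout()
for name, box in layout.items():
    print(name, box)
```

With the default values the slide is scaled by 0.9375 to 960x720, and the area left below the video measures 320x480, matching Figure 3.3.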
Figure 3.4: Collegerama as a vodcast for YouTube (1280x720)
Vodcast production
Single audio files are often referred to as "podcast" files. The term podcast originates from
the iPod, as iPod-broadcasting. In the wake of this term, single movie files are often
referred to as "vodcast" files. Originally these were downloaded files since iPod and iTunes
did not support streaming content. The meaning of these terms has since shifted towards
"audio on demand" or "video on demand (VOD)", in combination with an RSS feed. This
audio or video can also be streaming audio or video, without actual distribution of a real file.
The most important step in the production of a downloadable vodcast out of a Collegerama
recording is the conversion of the PowerPoint slides into a movie. This can be achieved with
the help of screen capturing systems such as Camtasia Screen Recorder. These systems
record an assigned part of the display screen into a movie file. By playing a Collegerama
lecture, the slides can be recorded as a movie with the right time-framing. Figure 3.5 gives an
impression of such a screen recording.
Figure 3.5: Converting PowerPoint slides into a movie file by recording the Collegerama slide-display
This screen recording resulted in a movie file of 39 MB (1024x768, 15 fps, wmv3), which is
only 6.3 times the total file size of the 29 slides (1024x768, jpg). The wmv3 compression
proves to be efficient when recording still pictures, since the original 29 pictures have been
converted into over 40,000 picture frames. The captured slides movie and the Collegerama
movie have been combined into a single HD movie file of only 88 MB (1280x720, 15 fps). This
is only (88/117=) 75% of the original small sized Collegerama movie (320x240, 25 fps). The
reduction is caused by a lower frame rate and the efficient compression of the wmv3 codec
for still pictures. Converting this movie file into the H264 codec increases the file size to over
500 MB. This suggests that, for this type of movie with large areas of still pictures, the H264
encoding compresses less well than the wmv3 encoding.
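The figures quoted above can be reproduced with a few lines of arithmetic. This is only a sanity check of the reported numbers; the lecture length of 45 minutes is the duration given for the example lecture in the conclusions of this chapter, and the variable names are chosen for illustration.

```python
LECTURE_MINUTES = 45   # length of the example lecture
CAPTURE_FPS = 15       # frame rate of the screen recording
SLIDE_MOVIE_MB = 39    # captured slide movie (1024x768, wmv3)
SIZE_RATIO = 6.3       # slide movie size relative to the 29 jpg slides
HD_MOVIE_MB = 88       # combined 1280x720 movie
ORIGINAL_MB = 117      # original 320x240 Collegerama movie

# Number of picture frames the 29 slides are expanded into.
frames = LECTURE_MINUTES * 60 * CAPTURE_FPS
# Implied total size of the 29 jpg slides.
slides_total_mb = SLIDE_MOVIE_MB / SIZE_RATIO
# Size of the HD movie relative to the original movie.
hd_vs_original = HD_MOVIE_MB / ORIGINAL_MB

print(f"frames in slide movie: {frames}")
print(f"total size of the 29 slides: {slides_total_mb:.1f} MB")
print(f"HD movie relative to original: {hd_vs_original:.0%}")
```

The frame count of 40,500 agrees with the "over 40,000 picture frames" mentioned in the text, and the 29 slides together come to roughly 6.2 MB.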
Scientific research on the compression efficiency of these two codecs shows less significant
differences.[7][8] The common opinion is that the compression of these two codecs is similar,
but wmv3 (VC-1) would require less processor power for encoding and decoding. The
differences in architecture might result in larger differences in specific situations. Moreover,
the achieved compression with these codecs is also influenced by the efficiency of the
encoding software. Wmv3 (VC-1) has more advanced features for motion compensation with
a more flexible block sizing, which might be the main cause of the observed differences. The
creation of a HD movie from a Collegerama recording increases the movie resolution by a
factor of 4 horizontally and 3 vertically (from 320x240 to 1280x720), allowing for much better
display of subtitles as is shown in Figure 3.6.
Figure 3.6: Vodcast of a Collegerama recording converts a small-sized video into a HD movie with room for proper
subtitles
The vodcast production described above is rather labor-intensive and time-consuming. A more or
less similar result could be obtained in a one-step recording session, in which the overall
Collegerama display is recorded by Camtasia.
3.2 iTunes
iTunes is an application that allows the user to manage audio and video on a personal
computer, acting as a front-end for Apple’s QuickTime media player. Officially, iTunes is
required in order to manage the audio of an Apple iPod portable audio player (although
alternative software does exist). Users can organize their music into playlists within one or
more libraries, copy files to a digital audio player, purchase music and videos through its
built-in music store, download free podcasts and encode music into a number of different
audio formats. There is also a large selection of free internet radio stations to listen to.
Version 4.9 of iTunes, released on June 28th 2005, added built-in support for podcasting. It
allows users to subscribe to podcasts for free using the iTunes Music Store or by entering the
RSS feed URL. Once subscribed, the podcast can be set to download automatically. Users can
choose to update podcasts weekly, daily, hourly or manually. It is also possible to select
podcasts to listen to from the Podcast Directory, to which anyone can submit their podcast
for placement. The front-page of the directory displays high-profile podcasts from commercial
broadcasters and independent podcasters. It also allows users to browse the podcasts by
category or popularity and to submit new podcasts to the directory.
Video content available from the store used to be encoded as 540 kbit/s protected MPEG-4
video (H.264) with a 128 kbit/s AAC audio track. Many videos and video podcasts currently
require the latest version of QuickTime, version 7, which is incompatible with older versions
of Mac OS (only v10.3.9 and later are supported). On September 12th 2006, the resolution of
video content sold on the iTunes Store was increased from 320x240 (QVGA) to 640x480
(VGA). The higher resolution video content is encoded as 1.5 Mbit/s (minimum) protected
MPEG-4 video (H.264) with a minimum of 128 kbit/s AAC for the audio track.
Video formats for iTunes
The main focus of iTunes is to distribute content to the Apple iPod and its successors. The
iPod did not have a screen suitable for movie display until October of 2005.
The iPod Nano received a movie display in September 2007. The screen sizes of the iPod
family are shown in Table 3.1.
Table 3.1: Screen sizes of the iPod and its successors
Type                                Introduction date   Screen size   Aspect ratio
iPod video                          October 2005        320 x 240     1.33 (4:3)
iPhone                              June 2007           480 x 320     1.5 (3:2)
iPod Touch                          September 2007      480 x 320     1.5 (3:2)
iPod Nano                           September 2007      320 x 240     1.33 (4:3)
iPod Nano (new)                     September 2009      376 x 240     1.57
Supported video (external screen)   -                   640 x 480     1.33 (4:3)
HD movies                           -                   -             1.78 (16:9)
Over the years, the different iPod versions have evolved to larger screen sizes and wider
screens (higher aspect ratio). If the iPhone aspect ratio is compared to the HD widescreen
ratio used today, the iPhone is somewhere in between the traditional TV and the HD
widescreen standards. All iPods support a video display of a maximum of 640x480 by use of
an external screen. As widescreen HD video has become more or less the standard nowadays,
it seems likely that Apple will eventually also move to larger video displays with HD
specifications.
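The aspect ratios discussed here follow directly from the screen sizes. The sketch below recomputes them and confirms that the iPhone ratio lies between the traditional TV ratio and HD widescreen. The 320x240 size for the iPod video is the iPod resolution used elsewhere in this chapter, and the 1280x720 entry is the HD720 size from the YouTube section; the dictionary itself is only an illustration.

```python
from fractions import Fraction

# Screen sizes in pixels (width, height).
screens = {
    "iPod video": (320, 240),
    "iPhone": (480, 320),
    "iPod Touch": (480, 320),
    "iPod Nano": (320, 240),
    "iPod Nano (new)": (376, 240),
    "External screen": (640, 480),
    "HD720": (1280, 720),
}

for name, (width, height) in screens.items():
    ratio = Fraction(width, height)
    print(f"{name:16s} {width}x{height}  {float(ratio):.2f} "
          f"({ratio.numerator}:{ratio.denominator})")

traditional_tv = Fraction(4, 3)
iphone = Fraction(*screens["iPhone"])
hd_widescreen = Fraction(16, 9)
# The iPhone sits between the traditional TV and HD widescreen ratios.
assert traditional_tv < iphone < hd_widescreen
```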
iPod constraints for Collegerama vodcasts
For the development of a Collegerama vodcast for iTunes (and iPods), the following aspects
are of concern:
• the rather low resolution of the screen
• the different aspect ratio
These constraints have consequences for the following design aspects:
• the size of the display
• the size of the video component
• the location of the video component (upper/lower and left/right corner)
Low resolution
The resolution of the iPod is the same as that of the Collegerama video component. This
would allow for simple distribution, using just the video stream as a vodcast and leaving out
the presentation slides. Such a setup is used at MIT and many other universities. However,
the slides in Collegerama provide a lot of the lecture content, since the keywords and a large
part of the subject matter are on them.
In an alternative setup, the vodcast might include the slide part of Collegerama with the
audio of the video component. This is only an adequate alternative if the slides are readable
at this low resolution. Figure 3.7 gives an example of a typical PowerPoint slide at a normal
iPod resolution. It shows that the smaller fonts in a presentation are no longer readable at
the low iPod resolution, but the typical PowerPoint fonts can still be read quite well. The iPod
resolution is around (320/1024=) 30% of the maximum slide size in Collegerama and
(320/640=) 50% of the slide size in an overall Collegerama display.
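The two scale factors mentioned above, and their effect on text height, can be sketched as follows. The 32-pixel text line is a hypothetical example and not a measurement from an actual slide; the helper names are likewise chosen for illustration.

```python
IPOD_WIDTH = 320            # iPod display width in pixels
FULL_SLIDE_WIDTH = 1024     # maximum slide width in Collegerama
OVERALL_SLIDE_WIDTH = 640   # slide width in the overall Collegerama display


def scale_factor(source_width, target_width=IPOD_WIDTH):
    """Linear scale factor when a slide is shown at the target width."""
    return target_width / source_width


def scaled_height(height_px, factor):
    """Height in pixels of a text line after scaling."""
    return height_px * factor


full = scale_factor(FULL_SLIDE_WIDTH)
overall = scale_factor(OVERALL_SLIDE_WIDTH)
print(f"scale relative to the full-size slide: {full:.4f}")
print(f"scale relative to the overall display: {overall:.2f}")
# Hypothetical example: a text line 32 pixels tall on the 1024x768 slide.
print(scaled_height(32, full), "pixels tall at iPod resolution")
```

A line of text that is 32 pixels tall on the full-size slide shrinks to 10 pixels on the iPod, which illustrates why only the larger fonts remain readable.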
Figure 3.7: A typical PowerPoint slide at iPod resolution (320x240)
Different aspect ratio
The iPod aspect ratio is the same as both the slides and the movie components in
Collegerama. Therefore combining these two components in a widescreen view, as is done in
the previous YouTube vodcast, is not possible. Alternative solutions are:
• the slide components are not included (video only)
• the video component is not included (audio only)
• the video component is included at a smaller size (picture-in-picture)
• the video component is included at a smaller size (side-by-side) with unequal scaling
Figure 3.8 gives an impression of the latter three options.
Figure 3.8: Collegerama vodcasts with different options for the video component at iPod aspect ratio
From Figure 3.8, it is concluded that the most convenient option for including the movie
component is the picture-in-picture layout. This is based on the following considerations:
• the slides should be shown at a maximum size for proper readability (no side-by-side)
• the movie component can be reduced to a small size (thumbnail) and still remain properly
visible
• the audio component without the video component misses a focus point for the viewer
(the movements of the lecturer give a better understanding of the lecture)
Size of display
An important aspect in the design of a vodcast for iTunes is the display resolution selected for
the production and distribution. The design strategy for creating the smallest file size looks
most promising, for the following reasons:
• vodcasts for iTunes should be downloaded to and stored on the iPod of the viewers
(download time and storage capacity are relevant factors now, which is not the case in a
streaming video setup)
• small file-sized vodcasts will minimize the requests for other small sized output options
like podcasts (audio only), which would require additional production and distribution
efforts (time, costs, organization)
• a small sized design gives a larger differentiation to the YouTube HD quality design
• iTunes-U uses the H264 codec, which is not as efficient in video compression as the wmv3
codec used in the YouTube design, so a smaller display size will be more relevant for a
less efficient compression
• the smallest display design allows for viewing on the older iPods, which are still the
majority of the iPods currently in use
For the above-mentioned reasons, a vodcast for iTunes will be produced with a display size of
320x240 pixels.
Size of video component
The video in Collegerama shows the lecturer talking to the attendees. For this function a
small video size is sufficient as the most important aspect of such a movie is its audio
component (spoken words). This is shown in Figure 3.9, in which the original video resolution
(320x240) is downsized to 30%, 20% and 10% of its original size. It shows that downsizing the
Collegerama video to 20% (64x48) still gives supporting visibility of the speaking lecturer.
In some recorded lectures, the lecturer is writing text on the blackboard or is presenting
experiments. Both circumstances require a larger display size for proper viewing and a full
switch from the slide view to the video component might be useful. The downside is that this
will require an extensive video-editing process which might also need the input of the lecturer.
These constraints are not within the scope of a vodcast production out of a Collegerama
recording. Production of a vodcast should be possible within a fully automated production
process.
Figure 3.9: Collegerama video in original size (320x240) and reduced to 30%, 20% and 10%
The video component in a picture-in-picture design with the slides on the background will
cover part of the slides, reducing their readability. This can be minimized by doing the following:
• selecting a small video component (10%-20%)
• making the video component (partly) transparent, still allowing for a background view
(this setup might allow for a larger video size than a non-transparent movie, 20%-30%
instead of 10%-20%)
• placing the video component in an area with the lowest disturbance of the slide view
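To give a feel for how much of the slide a picture-in-picture video hides, the sketch below computes the thumbnail size and the covered share of the 320x240 display for the scale percentages mentioned above. The function names are chosen for illustration.

```python
DISPLAY = (320, 240)   # iPod vodcast frame in pixels
VIDEO = (320, 240)     # original Collegerama video in pixels


def thumbnail(scale_percent, video=VIDEO):
    """Size of the video after scaling both dimensions to the given percentage."""
    return video[0] * scale_percent // 100, video[1] * scale_percent // 100


def covered_fraction(thumb, display=DISPLAY):
    """Share of the display area hidden behind an opaque thumbnail."""
    return (thumb[0] * thumb[1]) / (display[0] * display[1])


for percent in (30, 20, 10):
    size = thumbnail(percent)
    print(f"{percent}%: {size[0]}x{size[1]} pixels, "
          f"covers {covered_fraction(size):.0%} of the slide")
```

Because both dimensions shrink, the covered area falls with the square of the scale: a 20% video (64x48) hides only 4% of the slide area.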
Location of the video component
The video component should be located on the least disturbing part of the slide. Figure 3.10
gives an impression of these locations for a TU Delft PowerPoint slide at iPod resolution. It
shows that the upper-left corner and the lower-right corner are unsuitable for movie insertion.
The upper-left corner hides the important slide title, while the lower-right corner hides the slide
number. The lower-left corner hides the TU Delft logo and the upper-right corner might hide
part of the slide title. Both of these locations can be deemed acceptable.
Figure 3.10: PowerPoint slide in TU design at iPod size, with and without inserted movie components (20%)
The lower-left corner might have a small advantage since this resembles the general lecture
room layout at TU Delft, in which the lecture desk is in the front-left and the projection
screen is located in the upper-center or upper-right part of the lecture room. This lecture
room layout results in many Collegerama recordings showing the lecturer looking to his/her
upper-left. With a movie component in the upper-right corner, the lecturer often seems to
look up into the “sky”.
It should be noted that not all lecturers use the standard TU Delft PowerPoint design. If
lecturers are made aware that their Collegerama recording will be transformed into an
iPod vodcast, they might adjust the slides to keep a certain corner of the slide empty.
Therefore a uniform predesigned position of the movie component is important.
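A uniform position is easy to express as a single function that maps a corner name to pixel coordinates. The sketch below assumes a 20% thumbnail (64x48) on a 320x240 frame with the origin in the upper-left corner; the function name and the optional `margin` parameter are hypothetical.

```python
CORNERS = {"upper-left", "upper-right", "lower-left", "lower-right"}


def overlay_position(corner, thumb=(64, 48), display=(320, 240), margin=0):
    """Top-left pixel coordinate of the video thumbnail for a given corner."""
    if corner not in CORNERS:
        raise ValueError(f"unknown corner: {corner}")
    vertical, horizontal = corner.split("-")
    thumb_w, thumb_h = thumb
    display_w, display_h = display
    x = margin if horizontal == "left" else display_w - thumb_w - margin
    y = margin if vertical == "upper" else display_h - thumb_h - margin
    return x, y


# The two corners deemed acceptable for the TU Delft slide design.
print(overlay_position("lower-left"))
print(overlay_position("upper-right"))
```

Fixing the corner in one place like this keeps the position identical across all vodcasts, so lecturers know which part of the slide to leave empty.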
3.3 Portable Document Format (PDF)
Portable Document Format (pdf) is a file format created by Adobe Systems in 1993 for
document exchange. It is used for representing two-dimensional documents in a manner
independent of the application software, hardware and operating system. Each pdf file
encapsulates a complete description of a fixed-layout 2D document that includes the text,
fonts, images and 2D vector graphics which compose the document.[48] The main advantage
of pdf files is that all the data of a document is frozen and “digitally printed”, so
that it cannot be edited and all the layout properties are fixed. Over the years, it has become
the standard medium for distributing and sharing documents online.
A more recent development at Adobe is the release of Adobe Acrobat Connect Pro (formerly called
Macromedia Breeze). It allows for a new way of creating general presentations, online
training materials, web conferencing, learning modules and user desktop sharing. The entire
product is Adobe Flash based.[49] The module for creating lectures based on PowerPoint
presentations is a plug-in called Adobe Presenter.
Figure 3.11: Adobe Presenter allows the creation of lectures based on PowerPoint
There are several advantages that come with the use of Adobe Presenter, as opposed to
Collegerama:
• better navigation
• higher slide quality
• distributable through a single pdf file
Navigation
As you can see in Figure 3.12, the Adobe Presenter interface creates an automatic index
based on the different slides. On the right side you can see each slide title, which is
automatically extracted from the PowerPoint file. The great thing about this feature is the fact
that there’s a clear way of navigating through a lecture based on keywords taken from the
lecture material. This is an option that Collegerama does not have.
Figure 3.12: Screenshot of lecture CT3011 implemented within Adobe Presenter
Adobe Presenter has a different navigation system compared to Collegerama. Instead of
having a video stream that has several images of PowerPoint slides linked to it, it uses a
different approach by placing the PowerPoint presentation at the heart of the interface. This
means that there is no long video of 45 minutes with a main timeline. It splits the
presentation up into separate timelines per slide. Each of these has its own short video
attached to it with a separate timeline. As you can see in Figure 3.12, a video of 7 minutes
and 25 seconds is playing along with the first introductory slide. The problem with such a
system is that it requires the video recording of the lecturer to be split up into smaller
segments and linked to each separate slide. This is a time consuming process.
Slide quality
Since the Adobe Presenter system makes use of the original PowerPoint presentation, it has
all the slides digitally available at the highest quality. Once the lecture is converted to a
shareable format, the quality of the sheets is no longer limited to a set resolution (1024x768
for Collegerama), but is stored as a vector oriented image. This means that the viewing
quality is incredibly high compared to Collegerama.
Distributable through a single pdf file
There are two ways of distributing the recorded lectures with Adobe Presenter:
• server-based streaming
• single pdf file distribution
The obvious problem with the server-based streaming is the same as that of the current
Collegerama system. It is not possible to distribute the lectures through a standard
video-sharing and streaming medium such as YouTube or iTunes. This means that the distribution
options are severely limited.
When choosing the single pdf file distribution, all the data that is required to view the lecture
(the audio and video stream and the PowerPoint slides) is packed into one single pdf
file. This offers the option of playing it on an offline device that has Adobe Reader installed.
Once downloaded, the lecture can also be played an unlimited number of times, without
having to be connected to the server.
Unfortunately the same distribution problem arises when choosing the pdf option. Currently
none of the online streaming servers support the playing of pdf files. This means that for
other distribution channels to be available, the lecture needs to be converted back to a single
file video format.
3.4 Conclusions
Timeline
There are two approaches to creating recorded lectures:
• video-based
• slide-based
The difference between these two types is the timeline on which the lecture is based. The
video-based system is the standard Collegerama method, where a video file of the lecturer
exists and several screenshots are linked to the timeline of this video. An example of a
slide-based system is Adobe Presenter. Here, the PowerPoint slides serve as a logical timeline for
the whole lecture and audio and video streams are linked to each slide.
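The relation between the two timelines can be sketched in a few lines. In a slide-based recording each slide carries its own clip; laying these clips end to end gives the slide start times on a single video-based timeline. The function names and the example durations are hypothetical (only the 7 minutes and 25 seconds of the first slide is taken from Figure 3.12).

```python
from bisect import bisect_right


def slide_start_times(durations):
    """Start offset in seconds of each slide when the per-slide clips play back to back."""
    starts, elapsed = [], 0
    for duration in durations:
        starts.append(elapsed)
        elapsed += duration
    return starts


def slide_at(seconds, starts):
    """Index of the slide shown at a given position on the global timeline."""
    return bisect_right(starts, seconds) - 1


durations = [445, 120, 300]   # 445 s = 7 min 25 s; the other values are made up
starts = slide_start_times(durations)
print(starts)                 # start time of every slide on the video timeline
print(slide_at(500, starts))  # slide index shown 500 seconds into the lecture
```

The first function converts a slide-based recording into the offsets a video-based system needs; the second is the lookup a video-based player performs to find the slide for the current playback position.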
Navigation
The current navigation system within Collegerama does not work well. It relies on the
screenshots of the PowerPoint slides that are displayed during the lecture. The problem is
that during the recording of these lectures, a screenshot is taken every 1 to 4 seconds
whenever a change on the screen has been detected. When the lecturer inadvertently moves
the mouse or plays a video in the presentation, a lot of redundant screenshots are taken and
the efficiency of navigation is greatly decreased.
Collegerama as vodcast
It is clear that if Collegerama lectures are going to be distributed through the current popular
video-sharing mediums, it is required to convert the lectures to a single video file. This is the
standard input that is required and accepted by all platforms. To do this, the two elements of
a lecture need to be combined:
• video stream of the lecturer (wmv)
• screenshots of PowerPoint slides (jpg)
A lot of thought has to go into what screen resolution to use, where to place each element
within the video stream and how to fill up any extra unused space in the newly created video.
The size of the video depends on the medium through which it is shared. If a vodcast stream
for an iPod or iPhone is being created, the resolution is obviously going to be a lot different
compared to a video that is created for a high definition YouTube video.
It is concluded that a video-based system is a lot better for distribution, since virtually all
popular online distribution channels do not offer support for pdf files or a server-based
infrastructure to share lectures (YouTube, iTunes-U, Academic Earth etc). By creating high
definition movies from the original Collegerama recordings, all other video versions with
different pixel sizes can be derived (for instance, vodcasts designed to fit on mobile media
players such as the iPhone or Blackberry). This HD movie has a smaller file size than the
original Collegerama recordings, due to the efficient compression of still pictures (slides as
movie). In the example lecture of 45 minutes, the file size is 88 MB instead of 117 MB.
4. Subtitling
Subtitles form the foundation for a lot of extra functionality options, such as tag cloud
indexing, searching and translation. In this chapter, the methods for creating subtitles, the
reasons for wanting to do so and ways of translating subtitles for foreign-speaking students are
discussed.
There are several reasons why the addition of subtitles for Collegerama lectures is useful:
• lectures are easier to follow
• lectures are available to foreign-speaking students
• lectures can be made searchable
Lectures are easier to follow
If a lecture contains subtitles during playback, it will be possible for the deaf and people with
a hearing problem to understand what is being said. These special subtitles for the hearing
impaired are called “closed captions” or are sometimes also referred to as “subtitles for the
hard of hearing”. The term “closed” in closed captioning indicates that not all viewers see the
captions, only those who choose to decode or activate them. This distinguishes them from “open
captions” (sometimes called “burned-in” or “hardcoded” captions), which are visible to all
viewers.
Most countries in the world do not distinguish captions from
subtitles. In the United States and Canada, these terms do
have different meanings. Subtitles assume the viewer can
hear but cannot understand the language or accent, or the
speech is not entirely clear, so they only transcribe dialogue
and some on-screen text. Captions aim to describe all
significant audio content—spoken dialogue and non-speech
information such as the identity of speakers and occasionally
their manner of speaking—along with music or sound effects
using words or symbols.
Lectures are available to foreign-speaking students
Subtitles are generally used to display the spoken words in a video on the screen. For every
different language, a new subtitle track has to be created. Most DVD movies that are released
in Europe contain at least the subtitle tracks for the languages German, French and English.
During production these subtitles are mostly created by hand using professional translators.
An alternative for generating different subtitle tracks is to use an automated computer system.
An example of such a service that is publicly available is Google Translate. It is a beta
service provided by Google Inc. to translate a section of text or a webpage into another
language. As of December 2009, the system supports 52 different languages from around the
world. Like other automatic translation tools, it has its limitations. While it can help the reader
understand the general content of a foreign language text, it does not always deliver accurate
translations. Some languages produce better results than others.[37]
Lectures can be made searchable
Every Collegerama lecture consists of a single video stream. Without some sort of indexing
system, the only element offered is a 45-minute long video that offers no way of skipping
to relevant parts based on a certain topic.
For further background information about the research on this topic, see Annex D.
4.1 Subtitling process
Subtitles for translation and searching are only composed of spoken text. This is created from
the audio track that has been extracted from the video stream. The creation method is shown
in Figure 4.1.
Figure 4.1: Creation process for subtitles
There are several ways of creating subtitles:
• manual subtitling
• real-time subtitling
• speech recognition
Manual subtitling
Many different programs can be used to manually create subtitles for a movie, but the overall
usage of them is generally the same. You start by typing in the lines of text that are spoken
in the movie. Once these are finished, the transcript needs to be matched to the time
sequences of the movie. For every line of text, a timestamp is added so that the subtitle
generator can later show the appropriate text at the right timeframe.
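As an illustration of such a timestamped transcript, the sketch below writes subtitle lines in the widely used SubRip (.srt) layout: a sequence number, a start and end time, and the text. The choice of SubRip is an assumption made for this example (the text does not prescribe a file format), and the cue texts and function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Turn (start, end, text) tuples into the text of a SubRip file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


cues = [
    (0.0, 3.5, "Welcome to this lecture."),
    (3.5, 8.0, "Today we will look at drinking water treatment."),
]
print(to_srt(cues))
```

Each cue only stores what the manual process produces: a line of text plus the moment it should appear and disappear.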
Figure 4.2: Screenshot of the program SubCreator
(Source: http://www.radioactivepages.com/index.php?section=software&docid=subcreator)
The advantage of this method is the easy editing of the subtitles. Everyone who can
understand the language that is being spoken can write out the transcripts of a given video
stream. Unfortunately, this process is very time consuming and therefore relatively expensive.
Real-time subtitling
Real-time subtitles have to be created within 2 or 3 seconds of the broadcast. There are
people specializing in this sort of work, called Communication Access Real-Time Translation
stenographers. They use a specialized keyboard that is specifically designed to support
shorthand writing, called a stenotype or velotype typewriter.
Real-time stenographers are the most highly skilled in their profession. Stenography is a
system of rendering words phonetically, and English, with its multitude of homophones (e.g.
there, their, they’re), is particularly unsuited for easy transcriptions. They must deliver their
transcriptions accurately and immediately.[23]
Speech recognition (ASR)
At the moment, speech recognition technology or Automated Speech Recognition (ASR) is still
a long way from achieving fully automatic subtitles for any program. There are still many
errors in generating text and several challenges such as background noise, different accents
and multiple simultaneous speakers make the process difficult. Speech recognition
technologies do have their place in the world of modern subtitling. ASR systems are already
used in live subtitling systems for sports, news and politics.
Translated subtitles
The previously described methods for creating subtitles can also be applied to the creation of
subtitles in languages other than the spoken language. In general, two ways of creating
translated subtitles can be distinguished:
• human translation of the spoken text (either offline or live)
• machine translation from subtitles of the spoken text
Figure 4.3: Translated subtitles improve the learning environment for non-native speaking students
At present, machine translation is not able to produce high quality subtitles. The produced
quality is either accepted as an improvement over “no translation” or used as a starting point
for human post-processing. Google Translate is a well known example of machine translation,
but many other systems are presently available. Machine translation is a booming industry
supported by an enormous amount of scientific research programs, executed at nearly every
university in the world.
4.2 Subtitles from speech recognition
Automated speech recognition (ASR)
ASR is a sub-field of computational linguistics that investigates the use of computers to
convert spoken words into computer data, ranging from text (speech-to-text) to input control
(voice-controlled machines). The fast development of more powerful computers has boosted this
field in the last decade, sometimes ironically leading to disastrous overvaluation, such as the
Lernout & Hauspie collapse.[57]
Speech recognition systems have been and are being developed by universities as well as by
commercial companies. Some major international institutions on ASR:
• LIMSI - Spoken language processing group (France)
• Speech research group at University of Cambridge (UK)
• Raytheon - BBN Technologies (USA)
• SRI - Speech Technology and Research (STAR) Laboratory (USA)
For recent research on ASR, see the publications of the International Speech
Communication Association (ISCA). The most recent conference of the ISCA was held
from September 6th to 10th 2009 in Brighton (UK). This 10th annual conference
(Interspeech 2009) included 38 oral sessions, 39 poster sessions and 10 special sessions,
resulting in 762 reviewed and accepted papers.[9]
Performance evaluation of speech recognition
Speech recognition engines are developed for a certain language and most often a certain
environment, such as telephone conversations, voicemails, news readings, movies etc. The
performance of an ASR engine differs not only based on environment, but also on the
different speakers (male/female voice, dialect, intonation etc).
The standard evaluation metric used to measure the accuracy of an ASR engine is the Word
Error Rate ($WER$). The word error rate is defined as the ratio of word errors over the total
number of words in the correct reference transcript $N_{ref}$. The number of word errors is the
sum of the number of deletions $D$, insertions $I$ and substitutions $S$:[12]
$$WER = \frac{D + I + S}{N_{ref}} \cdot 100\%$$
Note that the word error rate can be higher than 100%, for example when the ASR output
contains more words than the reference transcript and all of these words are incorrect. In this
case the number of substitutions equals the number of words in the reference transcript, and
the surplus words count as insertion errors on top of that. For ASR, a $WER$ of 50% is often
considered an adequate baseline for retrieval.[10] Modern ASR engines have a $WER$ between 10%
and 60%. Human-made transcripts have a $WER$ between 2% and 4%.[11]
Word accuracy ($WA$) is defined as the complement of the word error rate:[12]
$$WA = 100\% - WER$$
The word accuracy is not just the fraction of words correctly recognized, because the latter
does not include insertions.
Determining the $WER$ value requires a reference transcript. In the absence of such a transcript,
the quality of ASR can be indicated by the Word Correctness ($WC$). The $WC$ value is defined
as the ratio of the number of correct words $N_c$ over the number of output words $N_{out}$:[12]
$$WC = \frac{N_c}{N_{out}} \cdot 100\%$$
Word accuracy and word correctness can be used interchangeably in case the ASR engine
does not produce deletions ($D = 0$) and insertions ($I = 0$), or only in a negligible number
(less than 5% to 10%). This is often true for modern ASR engines with a good performance.
In this case, the ASR output only includes correct and incorrect words and the number of
words in the reference transcript is equal to the number of words in the ASR output
($N_{ref} = N_{out}$), so:
$$WC = WA = 100\% - WER$$
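As an illustration of the three metrics defined above, the following minimal Python sketch counts deletions, insertions and substitutions along a minimum-edit-distance (Levenshtein) word alignment and derives the word error rate, word accuracy and word correctness from those counts. The alignment procedure and all function names are assumptions made for this example; the text does not state which tool was used to score the SHoUT output.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    deletions: int
    insertions: int
    substitutions: int
    correct: int


def align_counts(reference: list[str], hypothesis: list[str]) -> ErrorCounts:
    """Count D, I, S and correct words along one minimum-edit-distance alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = reference[i - 1] != hypothesis[j - 1]
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # deletion
                cost[i][j - 1] + 1,             # insertion
            )

    # Walk back from the bottom-right corner and classify each step.
    d = ins = s = c = 0
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and reference[i - 1] != hypothesis[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                s += 1
            else:
                c += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return ErrorCounts(d, ins, s, c)


def word_error_rate(counts: ErrorCounts, n_ref: int) -> float:
    """WER = (D + I + S) / N_ref * 100%."""
    errors = counts.deletions + counts.insertions + counts.substitutions
    return 100.0 * errors / n_ref


def word_accuracy(counts: ErrorCounts, n_ref: int) -> float:
    """WA = 100% - WER."""
    return 100.0 - word_error_rate(counts, n_ref)


def word_correctness(counts: ErrorCounts, n_out: int) -> float:
    """WC = N_c / N_out * 100%."""
    return 100.0 * counts.correct / n_out
```

For a reference of one word and an output of three incorrect words, this sketch counts one substitution and two insertions, which gives a word error rate of 300% and shows that the metric is not bounded by 100%.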
Speech recognition for recorded lectures
The 28 recorded lectures of course CT3011 (TU Delft) have been used as input for ASR (see
Annex E). These lectures were given in the Dutch language. Speech recognition was done
with SHoUT, a speech recognition engine for the Dutch language developed at University of
Twente by Marijn Huijbregts, as part of his PhD research.[13] SHoUT is an acronym for the
Dutch project name “Spraak Herkennings Onderzoek Universiteit Twente” (in English: Speech
recognition research University of Twente).
Table 4.1 gives some data on the lectures and the quality assessment. Figure 4.4 gives the
word correctness per lecturer.
Table 4.1: Quality assessment of word correctness by speech recognition on lectures
Item                            | Range         | Average per lecture | Total
Number of recorded lectures     |               |                     | 28
Duration of lectures (hh:mm:ss) | 23:26 – 53:51 | 41:20               | 19:17:33
Number of words in output       | 3,581 – 9,392 | 6,748               | 188,957
Sample size for assessment      | 4.2% – 11.0%  | 6.1%                | 6.1%
Word correctness                | 23% – 73%     | 50%                 | 50%
Figure 4.4: Word correctness of SHoUT for the CT3011 lectures, clustered by speaker
The results of the quality assessment of the SHoUT output can be discussed for the following
items:
• number of words
• word correctness
Number of words
For one of the 28 recorded lectures, subtitles have been created manually.
These subtitles contain 6,970 words. SHoUT produces 7,351 words for the same lecture. This
5% increase is probably due to the rather low speaking rate in lectures, which causes SHoUT
to split long words into smaller words.
Word correctness
The average word correctness of SHoUT amounts to 50%, with a variation between 23% and
73%. The word correctness differs significantly for different lecturers. For the word
correctness, no correlation was found with either the gender of the lecturer (male or female
voice) or the age (lowered voice).
Subtitles from speech recognition output
For one lecture the SHoUT output was used for creating subtitles. This required substantial
input to cluster the individual words and sentences into proper subtitles. The result of this
conversion is shown in Figure 4.5.
Figure 4.5: Subtitles created from the SHoUT transcript
In the produced subtitles, no word correction was done. Such a correction is essential
before SHoUT output can be used as subtitles for real lectures. In the result set of Figure 4.5,
only 7 out of the 48 test sentences (15% of the subtitle sentences) have a word correctness of 100%.
Speech recognition engines like SHoUT might be extended with a statistical post-processor
which clusters the generated words into subtitle sentences. This post-processor will be fed by
a huge collection of subtitle sentences, in a way similar to collections used for machine
translation (see paragraph 4.3). Statistical post-processing will not only produce sentences
instead of isolated words, but might also increase the word correctness of the ASR engine by
analyzing these complete sentences. In this way, statistical post-processing will reduce the
efforts required for the production of high quality subtitles.
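As a far simpler baseline than the statistical post-processor suggested above, time-stamped ASR words can be grouped into subtitle lines with two heuristics: a new subtitle is started after a long pause, or when the line would become too long. The sketch below is an assumption for illustration only. The thresholds (a 500 ms pause and 42 characters per line) and the `Word` and `Subtitle` types are hypothetical and are not taken from SHoUT or from the conversion used for Figure 4.5.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start_ms: int
    end_ms: int


@dataclass
class Subtitle:
    text: str
    start_ms: int
    end_ms: int


def _to_subtitle(group: list[Word]) -> Subtitle:
    """Join a group of words into one subtitle spanning first start to last end."""
    return Subtitle(" ".join(w.text for w in group), group[0].start_ms, group[-1].end_ms)


def cluster_words(words: list[Word], max_gap_ms: int = 500, max_chars: int = 42) -> list[Subtitle]:
    """Group time-stamped words into subtitles by pause length and line length."""
    subtitles: list[Subtitle] = []
    current: list[Word] = []
    for word in words:
        if current:
            gap = word.start_ms - current[-1].end_ms
            length = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            # Close the running subtitle on a long pause or an overlong line.
            if gap > max_gap_ms or length > max_chars:
                subtitles.append(_to_subtitle(current))
                current = []
        current.append(word)
    if current:
        subtitles.append(_to_subtitle(current))
    return subtitles
```

Such a heuristic only segments the output; it does not correct any misrecognized words, so the manual correction discussed above remains necessary.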
ASR time-coding of transcript
An alternative approach to using ASR for the creation of subtitles is the time-coding of
human-made transcripts. The ASR engine analyzes the entire transcript and tries to match
the known words to similar words that it picks up in the audio. These words are then linked to
the proper time-code. The website Radio Oranje[51] shows a demo of this method for a speech
broadcast on the radio by Queen Wilhelmina during World War II. The existing transcripts of
the broadcast were available and each individual word has been time-coded by SHoUT.
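The idea of time-coding an existing transcript can be approximated at the text level by aligning the transcript words with the time-stamped ASR words and copying the time-codes of the words that match. The sketch below uses `difflib.SequenceMatcher` from the Python standard library for this alignment. This is an assumption for illustration: an ASR engine such as SHoUT aligns against the audio signal itself, which is not reproduced here, and the function name `time_code` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def time_code(
    transcript: list[str],
    asr_words: list[tuple[str, int]],
) -> list[tuple[str, Optional[int]]]:
    """Attach ASR start times (ms) to transcript words that the ASR output matches.

    transcript: words of the human-made transcript, in order.
    asr_words:  (word, start_ms) pairs produced by speech recognition.
    Words without a matching ASR word keep the time-code None.
    """
    reference = [w.lower() for w in transcript]
    recognized = [w.lower() for w, _ in asr_words]
    matcher = SequenceMatcher(a=reference, b=recognized, autojunk=False)

    times: list[Optional[int]] = [None] * len(transcript)
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words in both sequences.
        for k in range(block.size):
            times[block.a + k] = asr_words[block.b + k][1]
    return list(zip(transcript, times))
```

Words that remain without a time-code could, for example, be interpolated between their time-coded neighbours; that step is left out of this sketch.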
4.3 Machine translation for subtitles
Machine Translation (MT)
Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of
computational linguistics that investigates the use of computer software to translate text or
speech from one natural language to another. At its basic level, MT performs simple
substitution of words in one natural language for words in another. Using corpus techniques,
more complex translations may be attempted, allowing for better handling of differences in
linguistic typology, phrase recognition and translation of idioms, as well as the isolation of
anomalies.
Machine translation can be divided into two main approaches:
• rule based translation
• statistical translation
Rule-based machine translation relies on countless built-in linguistic rules and millions of
bilingual dictionaries. It focuses on translating separate words and afterwards correcting the
grammar by using dictionary grammar rules. The translation is predictable, but the translation
results may lack the fluency readers expect.
Statistical machine translation utilizes statistical translation models whose parameters come
from the analysis of monolingual and bilingual collections of texts. Building statistical
translation models is a quick process, but the technology relies heavily on existing multilingual
documents. A minimum of 2 million words for a specific domain and even more for general
languages are required. Statistical MT provides good quality when large and qualified text
data is available. The translation is fluent, meaning it reads well and therefore meets users’
expectations. However, the translation is neither predictable nor consistent.[1]
Phrase-based statistical machine translation has emerged as the dominant paradigm in
machine translation research.[1] In order to obtain the benefits of both approaches, existing
rule-based translation systems are presently being extended with statistical post-processing.[2]
For further recent research on machine translation, see the publications of the
Association for Computational Linguistics (ACL).[50] The ACL is the most prominent international
scientific and professional society for people working on problems involving natural language
and computation. Regional associations related to the ACL include:
• EACL: The European Chapter of the ACL
• NAACL: The North American Chapter of the ACL
The most recent conference of EACL was held between March 30th and April 3rd 2009 in
Athens, Greece. This 12th conference included several special workshops. Many international
researchers on this subject presented their most recent findings at the conference and
workshops.[3][4] Research on machine translation in Europe is heavily funded by the European
Union. Its research programs on machine translation are:
• EuroMatrix Project (Sept. 2006-Febr. 2009)[52]
Project motto: Statistical and Hybrid Machine Translation Between All European
Languages
• EuroMatrixPlus (March 2009-February 2012)[53]
Project motto: Bringing Machine Translation for European Languages to the User
A special aspect of machine translation is machine transliteration. Transliteration is the
conversion from one writing system to another, with a different script. Translation from
English to Chinese is an example of a language pair that requires this. The most recent
workshop on this subject (2009 Named Entities Workshop: Shared Task on Transliteration) was
held in Singapore on August 7th 2009 as part of the conference of the Association for
Computational Linguistics.[5] The proceedings of this workshop can be found on the Internet.
Many machine translation engines have been developed by universities as well as by
commercial companies. These machine translation engines compete in the quality of the
produced translation. Table 4.2 gives some examples of these engines.
Table 4.2: Some popular machine translation engines
Product     | Owner               | Type (*) | Start year | Languages
SYSTRAN     | SYSTRAN             | R        | 1968       | 21
Babelfish   | Yahoo               | R (+S?)  | 1990       | 13
Translate   | Google              | S        | 2004       | 53
MOSES       | Open source         | S        | 2006       | toolkit
Asia Online | Asia Online (MOSES) | S        | 2006       | 516
Bing        | Microsoft           | S        | 2009       | 20
(*) Type: R = rule-based, S = statistical
(Source: http://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications)
Performance of Machine Translation
The performance of statistical machine translation depends strongly on the size and quality of
its data (corpus). This performance might differ with the direction of translation: translation
from Dutch to English might differ in quality from translation from English to Dutch, even
though it is produced by the same translation engine. Moreover, the performance will be
different in restricted domains: the translation of news bulletins might differ in quality from
the translation of scientific articles produced by the same translation engine.
Several automatic metric scores have been developed for evaluating machine translation
performance, such as BLEU, METEOR, TER (Translation Error Rate), HTER (Human-targeted
Translation Error Rate), MaxSim, ULC, and many others. However, automatic measures are
considered to be an imperfect substitute for human assessment of translation quality.[6] The
performance of some English to German machine translation engines is shown in Figure 4.6.
These results were obtained from a quality assessment by 160 translators for English and five
other languages (German, Spanish, French, Czech and Hungarian).[6] The translators were
asked to rank the outcome of 26 MT engines on 38,000 sentences (1,500-7,000 per language
pair). They were also asked to edit about 9,000 isolated sentences, coming from the MT
engines, into fluent and correct sentences without looking at the original source. This should
reflect people’s understanding of the output. The edited output was used in the
evaluation, even in instances where the translators were unable to improve the output
because it was too incomprehensible. The edited output was given a value for the percentage
of the time that each MT system was judged to produce an acceptable translation. This value
can be considered as a value for “understandability”, not as a real measurable value, but as a
relative figure for comparison of different systems. The reference system is an online
human-made translation. Around 20% to 50% of the time, adequate edited translations were
obtained with machine translation.
Figure 4.6: The performance of some English to German translation engines compared to human translation (=ref) [6]
Assessments showed that language pairs for which large and reliable corpora are available
are translated better.[5]
Differences in evaluation of ASR and MT
Under the present state of development, the values for word accuracy of ASR engines are of
the same order as the values for understandability of MT engines. However, the first is
regarded as “far from sufficient for subtitling”, while the latter is often considered as
“adequate for subtitling”. This phenomenon can be explained by the big difference in
awareness of the viewer. Hearing the speaker while watching the ASR subtitles puts a lot
more emphasis on differences between the two. Most of these differences are noticed by the
viewer and can be seen as a serious shortcoming. Bad subtitles will also result in badly
translated subtitles through the use of MT.
ASR can be used for searching, since mistakes in sentences are not visible to the user and are
not a big problem for the search functionality. A word accuracy of 50% is considered suitable and
is obviously a lot better than having no reference data for search. For MT, the criteria are not
as demanding, since the reference situation is that of a student who is trying to find a spoken
word in a bilingual dictionary for his native language, while the lecturer is continuing with
his lecture.
Google Translate in YouTube
If there is at least one subtitle track available, YouTube provides a translation service that can
automatically convert the subtitles to another language. This is done through the Google
Translate service mentioned above. On the bottom-right of the YouTube interface, a button with the
CC logo (the official logo which stands for Closed Captions) is available to turn the subtitles
on or off. It also opens a submenu from which you can access the translation menu. When
the translation menu has been opened, the user can choose from 52 different languages that
are available under the dropdown menu. Once a language has been chosen, the subtitles will
be automatically sent to the Google Translate engine and YouTube will display the results.
Figure 4.7: Translated subtitles from Dutch to English in YouTube
Google Translate coverage has been expanded dramatically. It now supports the translation
between any of the following languages: English, Arabic, Bulgarian, Chinese, Croatian, Czech,
Danish, Dutch, Finnish, French, German, Greek, Hindi, Italian, Japanese, Korean, Norwegian,
Polish, Portuguese, Romanian, Russian, Spanish and Swedish. Google Translate now supports
56 language pairs and has become the most comprehensive online translation tool available
for free. In November 2009, YouTube announced that they will expand their services for
translating subtitles. Users will be able to post-process the subtitles generated by Google
Translate. This service also includes the use of Google’s ASR system for generating
time-tagged subtitles for YouTube-Edu channels (initially only available in English). As part of this
service it will be possible to upload transcripts which will be time-tagged by Google’s ASR
system.
Human post-processing
The present automatic translation by Google Translate results in translated subtitles which
are readable for 20% to 80% of the time. It is expected that this quality will improve
significantly over the next few years. This quality improvement is obtained with the help of
larger and better data sets.
For recorded lectures at TU Delft, the following translation pairs are most significant:
• Dutch to English (BSc courses for non-Dutch-speaking MSc students)
• English to Dutch (MSc courses for Dutch professionals, as life-long learning material)
• English to any other language (MSc courses for non-native English speaking students)
For these target areas the present quality of machine translation might be considered to be
insufficient. Manual post-processing might be used in these cases for improving the quality of
the machine translation output.
4.4 Text-to-speech for translated subtitles
Having proper translated subtitles opens the door for spoken subtitles in the native language
of the student. Dubbing of lectures is possible by using text-to-speech engines. In the chain
from spoken words to speech recognition (speech to text) to machine translation (text to
translated text) to spoken translated words (text to speech), this part has been most
developed. IBM’s ViaVoice Text-To-Speech is an example of such a service, which is available
online. It should be noted that “Real-Time Translation Service” will be a major research goal
for the near future. Another example is MeGlobe, which is an instant messaging service with
real-time translation to and from over 15 languages (see Figure 4.8). For educating foreign
speaking students, such developments will be a serious boost. This futuristic development is
not further elaborated within the scope of this research project.
Figure 4.8: Will automatic real-time translation engines become available within the next decade?
(Source: http://www.meglobe.com)
4.5 Conclusions
It is concluded that producing subtitles for a video lecture opens up a lot of new possibilities.
Having the option of turning subtitles on in the same language as the spoken text could make
lectures easier to follow for certain students. For Dutch students who follow an English
master course, it adds to their learning experience if those lectures are subtitled in Dutch.
Subtitles can also be useful as a basis for searching of lecture content.
The present state of development in speech recognition for producing subtitles, and machine
translation for producing translated subtitles, has been investigated in this research project.
The current speech recognition technology has also been evaluated for the generation of
proper subtitles. For this, the speech engine created at University of Twente, called SHoUT,
has been used.
With this ASR engine a word correctness of 23% to 73% (50% on average) was observed for the
28 Dutch spoken lectures that were tested. It is concluded that this system is not yet sufficient
to generate proper subtitles and that manual post-processing is always required. Machine translation
allows for a decent translation, which is always better than having no translation at all. Using
it professionally in the education program still requires substantial post-processing.
A problem that most universities currently have is that certain master courses have a
prerequisite bachelor course that is given in Dutch. Foreign students who are only
going to do a master's programme need to know the subject matter of these courses, but are not
able to review those lectures. With subtitles and MT technology, it becomes possible for
them to at least follow part of the lecture (dependent on the quality of translation).
5. Navigation and searching
Presently, Collegerama does not provide any form of search functionality. The Collegerama
catalog shows an overview of all recorded lectures in a course in a crude form. An example of
this catalog is shown in Figure 5.1.
Figure 5.1: Catalog of recorded lectures in a course
(Source: http://collegerama.tudelft.nl/mediasite/Catalog/?cid=16b5f5fa-0745-4b8b-9f02-f79a03abf50a)
The lecture titles and the name of the lecturer are usually wrong. The only correct metadata
of a lecture are the recording date and time (announced as air date and time) and the
duration of the recording. Searching for a particular lecture in Collegerama can only be done
by sorting on this improper metadata. This form of searching is far from sufficient. Due
to this inadequate metadata, the lecturer usually creates a URL link to a particular lecture
recording within Blackboard, the digital learning environment. In Blackboard the lecturers
have full access to the published course material.
Within a lecture, the only navigation and/or search facility of Collegerama is the overview of
slides. Using this thumbnail table during playback hides the view of the current slide. The
main drawback of the Collegerama navigator is the disturbance caused by screen actions:
mouse movements, PowerPoint animations and writing on an electronic blackboard each produce
additional screenshots. This enormous number of screenshots makes this slide-based system
completely unsuitable for navigation (see Figure 2.6).
This description clearly shows the need for improvement of navigation and searching facilities
in the Collegerama environment. In this chapter the possibilities for searching in and
browsing through recorded lectures in a course will be presented. Initially, navigation in
movies and DVDs is presented, as well as the scientific research on multimedia retrieval
systems.
In the next paragraph, the following sources of information are presented:
• lecturer (lecture titles, lecture chapters)
• slides (slide titles, slide content, slide notes)
• spoken words (transcripts, subtitles and/or speech recognition output)
Afterwards, the different products are presented:
• search engine on lecture data in a course
• tables of content (for courses and lectures)
• tag cloud presentations of lecture content
Finally the results will be evaluated in order to determine a proposal for searching facilities,
i.e. required sources and proposed output. For further background information about the
research on this topic, see Annex E and F.
5.1 Metadata for navigation and search
Before elaborating on the improvement of navigation and search within lectures recorded by
Collegerama, it is useful to investigate these aspects in parallel environments or
disciplines. For navigation of videos, the navigation within DVD and Blu-ray movies can be
evaluated. These movies are considered to be the most commonly accepted development in
user accessibility. For search, the latest developments in multimedia retrieval have been
studied.
Selection of and navigation in DVD movies
The selection and navigation process for (recorded) lectures could be compared to the
selecting (buying/hiring) and viewing of DVD movies. The movie box sets containing movies
from a TV series can be considered as comparable to courses containing recorded lectures.
To make a proper selection, the potential viewer requires further information on the actual
content of the movie box set and its movies. This metadata is normally printed on the movie
box set and on the cover of the individual movies. With this concept in mind, the primary
metadata of courses, lectures and lecture content is presented in Table 5.1.
Table 5.1: Primary metadata for selecting of and navigating in recorded lectures
Course              | Lecture                     | Lecture content
University          | Lecture title               | Table of contents
Course name         | Name of lecturer            | Tag cloud of content
Responsible teacher | Course name (and year)      | Screenshots (picture story)
Course code         | Date of recording           | Short description
(Academic) Year     | Initial slide (picture)     |
Academic discipline | Tag cloud of content        |
Faculty             | Screenshots (picture story) |
Logo                | Short description           |
Not all metadata is text. Screenshots, logos and tag clouds are pictures which give a better
impression of the movie box (course) and its movies (lectures) than text in titles and
descriptions. For navigation within a movie itself, Table 5.2 gives the analogy for recorded
lectures.
Table 5.2: Analogy of navigation in DVDs and recorded lectures
Element             | DVDs     | Recorded lectures
Main menu           | Chapters | Chapters
Submenu per chapter | Scenes   | Slides
Search in movies
Searching in movies is studied in the research discipline of computational multimedia
information retrieval.[14] Such video information retrieval focuses on searching in video
collections by using various methods of abstracting information from video recordings. The
abstraction of spoken text (speech recognition) for data retrieval or the detection of shot
changes for segmenting can be mentioned as examples of these methods. Figure 5.2 gives
an overview of a multimedia information retrieval system, as described in the book
Multimedia Retrieval.[14]
Figure 5.2: Schematic view of a multimedia information retrieval system [14]
Specific elements of multimedia information retrieval with relevance for recorded lectures are:
• languages for metadata[15]
• presentation of search results[16]
• evaluation of Multimedia Retrieval Systems[17]
An important element in searching within multimedia data is the relation between the video
content and the metadata. For recorded lectures, this relation can be fixed by using
time-tagging. With time-tagging, the metadata is related to a certain time interval in the
multimedia content. Subtitles with a particular begin and end time are a typical example of this.
Other items such as slide views (pictures/scenes) and chapters can also be time-tagged. Figure
5.3 gives an impression of searching in multiple parallel metadata of recorded lectures.[18]
Figure 5.3: Searching in parallel metadata of videos [18]
Searching in a multimedia system will give a result set. The user will be confronted with this
result set in order to further select one or more of the results for actual viewing. For this
selection, it might be essential that the user is able to see the context of the result element.
If a user looks for the keyword “water”, the result set will show multiple occurrences of this
word. Information about the context of the search result or the source type that the data
came from might be relevant for evaluating the search results. This constraint requires
context-preserving information retrieval.[19]
5.2
Metadata sources
Input from lecturer
The Collegerama recording system is based on input from a video camera and input from
screenshots at the display-computer. These screenshots should be regarded as a low level of
screen recording with a maximum frame rate of 1 fps. Thumbnails of screenshots are used
for navigation in Collegerama/Mediasite. For this the individual screenshots can be clustered
in a group showing only one thumbnail in the navigation screen. This clustering is done
automatically during recording. At TU Delft, this results in the generation of far too many
thumbnails. Further clustering can be done in a manual post-processing session, but this is
currently never done. The lecture recording department is understaffed to handle this task
and the lecturer does not have access to the Collegerama server. The ultimate result is that
the recorded lectures often lack a proper navigation.
The lecturer should get access to the Collegerama server so that the overhead of screenshots
in the recorded lectures can be corrected. As an alternative approach, the recording
department may develop an offline tool (or web based data collection system) in which this
clustering can be done. Such a system could be used for collecting all data from the lecturer,
such as:
• proper lecture title
• accurate name of the lecturer or lecturers
• time based chapter titles of a lecture
• time based correlation between recording and original PowerPoint slides
• original PowerPoint presentation (either as ppt or as pdf file)
The main purpose of the data collected from the lecturer is to create a proper table of
contents for the recorded lecture. To accomplish this, the lecture should be divided into 3 to
8 “chapters” for a 45 minute lecture. This provides each lecture with chapter durations of
approximately 5 to 15 minutes. The lecturer should at least create a “text slide” per chapter
in case the lecturer does not use a PowerPoint presentation or an equivalent presentation tool
(such as an electronic blackboard). This text slide is used as an equivalent to a presentation slide
and is shown during the playback of the whole chapter. The collected data can be
incorporated into a database per course (Collegerama data system) and might also be used
to improve the original Collegerama recording/navigation. This database can also be used to
generate a proper table of contents (TOC), containing all recorded lectures of a course. This
might replace the original Collegerama catalog. The word correctness of text information from
data collected from the lecturer is estimated at 90% to 100%. The text itself has completely
been recovered from the PowerPoint slides, but the lecturer might have made mistakes while
creating them.
Input from slides
Text on PowerPoint slides forms a rich source of data for recorded lectures. The text data from
slides can be divided into:
• slide titles
• slide content
• slide notes
The text data of PowerPoint slides can automatically be retrieved from a digital file, either
from the ppt/pptx file and/or the “printed” pdf file. This data can then be inserted into the
Collegerama data system. The slide titles form a table of contents (TOC) of the lecture based
on the timing input of the lecturer. Every word in the text itself is automatically retrieved;
however, text that is shown in pictures requires a different technique (Optical Character
Recognition, OCR). In this research project, OCR has not been used.
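The automatic retrieval of slide text can be sketched as follows for the pptx format, which is a ZIP archive containing one XML file per slide. The sketch collects all text runs per slide file using only the Python standard library. It is a minimal illustration under stated assumptions, not the tool used in this project: the function name `slide_texts` is hypothetical, the older binary ppt format is not covered, slide titles are not distinguished from other text, and the number in the file name does not necessarily equal the presentation order.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of DrawingML, in which text runs are stored as <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def slide_texts(pptx_file) -> dict[int, list[str]]:
    """Return {slide file number: [text runs]} for a .pptx path or file object."""
    result: dict[int, list[str]] = {}
    with zipfile.ZipFile(pptx_file) as archive:
        for name in archive.namelist():
            match = re.fullmatch(r"ppt/slides/slide(\d+)\.xml", name)
            if not match:
                continue  # skip layouts, notes, media and other parts
            root = ET.fromstring(archive.read(name))
            runs = [t.text for t in root.iter(A_NS + "t") if t.text]
            result[int(match.group(1))] = runs
    return dict(sorted(result.items()))
```

Slide notes are stored in separate parts of the archive and would need the same treatment with a different file name pattern.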
Input from spoken words
The spoken text in recorded lectures might be available in one of the following forms:
• transcript (full text, without timestamp)
• subtitles (time-stamped per sentence)
• words (time-stamped per word)
For the sample course CT3011, the following sources are available:
• subtitles and transcript of the sample lecture #15 of this course (transcript generated
from human-made subtitles)
• words of all lectures retrieved through speech recognition (SHoUT)
For the speech recognition by SHoUT, the word correctness of all recorded lectures has been
determined (see Annex E). The mean word correctness is 50%, with values between 23%
and 73% (standard deviation 14.6%).
5.3
Metadata storage
All the collected metadata can be incorporated into a Collegerama data system. For this
research project this database is restricted to only the recorded course. The database shown
consists of 2 tables:
• lectures, containing all metadata related to the lecture as a whole
• content, containing all metadata within the course, on a time-based level (start and end
time in milliseconds)
A visual representation of each table, its columns and their corresponding data type is given
in Table 5.3 and Table 5.4.
Table 5.3: Database table Content
Field name        Data type
Content_id        int
Lecture_id        int
Start_time        int
End_time          int
Text_type         int
Text              nvarchar(MAX)

Table 5.4: Database table Lectures
Field name        Data type
Lecture_id        int
Lecture_nr        int
Title             nvarchar(100)
Lecturer          nvarchar(50)
Air_date          datetime
Collegerama_id    nvarchar(50)
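The two tables can be tried out locally with a small sketch. The following is an illustration only, assuming SQLite instead of the SQL Server database used in this project; the primary keys, the foreign key and the helper names `create_database` and `add_content` are assumptions that do not appear in the tables above.

```python
import sqlite3

# Hypothetical SQLite mirror of Tables 5.3 and 5.4 (the project itself uses SQL Server).
# Keys and constraints are assumptions; only column names and types follow the tables.
SCHEMA = """
CREATE TABLE Lectures (
    Lecture_id     INTEGER PRIMARY KEY,
    Lecture_nr     INTEGER,
    Title          TEXT,
    Lecturer       TEXT,
    Air_date       TEXT,
    Collegerama_id TEXT
);
CREATE TABLE Content (
    Content_id INTEGER PRIMARY KEY,
    Lecture_id INTEGER REFERENCES Lectures(Lecture_id),
    Start_time INTEGER,  -- start of the fragment in milliseconds
    End_time   INTEGER,  -- end of the fragment in milliseconds
    Text_type  INTEGER,  -- numeric code of the text type
    Text       TEXT
);
"""


def create_database(path=":memory:"):
    """Create an empty metadata database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_content(conn, lecture_id, start_ms, end_ms, text_type, text):
    """Insert one time-based content record and return its Content_id."""
    cur = conn.execute(
        "INSERT INTO Content (Lecture_id, Start_time, End_time, Text_type, Text) "
        "VALUES (?, ?, ?, ?, ?)",
        (lecture_id, start_ms, end_ms, text_type, text),
    )
    return cur.lastrowid
```

Every record, whether a slide title or a single spoken word, is stored in the same Content table and is distinguished only by its Text_type and time interval.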
For this project, only text data has been included in these tables. A future addition could be
a thumbnail per record, so that a characteristic screenshot preserves the context of the
information. This screenshot might be taken at a time $T_{s}$ within the time interval
($T_{start}$ to $T_{end}$), at a fixed fraction $a$ of the elapsed time:
$$T_{s} = T_{start} + a \cdot (T_{end} - T_{start})$$
In the latest YouTube movies, the typical screenshot for a movie shown at selection is taken
at 33% of the length ($a = 0.33$). Instead of storing the screenshot itself, the value of $a$
could be stored in the metadata tables, in case the movie and metadata are stored in a multimedia
retrieval system. The value of $a$ per record could be flexible, giving additional selection
freedom to the lecturer.
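The screenshot moment follows directly from the start and end time of a record. A minimal sketch, assuming times in milliseconds as in the Content table; the function name `screenshot_time_ms` and the rounding to whole milliseconds are assumptions made for illustration.

```python
def screenshot_time_ms(start_ms, end_ms, a=0.33):
    """Return the screenshot moment at fraction `a` of the interval, in milliseconds."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie between 0 and 1")
    if end_ms < start_ms:
        raise ValueError("end time must not precede start time")
    # T_s = T_start + a * (T_end - T_start), rounded to a whole millisecond
    return round(start_ms + a * (end_ms - start_ms))
```

For a fragment running from 0 to 60 seconds, the default fraction gives a screenshot at 19.8 seconds.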
Figure 5.4 and Table 5.5 give an impression of the data collected in the Collegerama data
system.
Figure 5.4: Source and number of records in the Collegerama data system for the course CT3011 (assuming subtitles
for all lectures)
Table 5.5: List of Text_types and the amount of records and words in the database for course CT3011
ID   Text type               Nr of records   Nr of words   Nr of characters
1    Lecture title                      28           129                917
2    Lecture chapter                   116           300              2,526
3    Slide title                     1,183         3,900             28,741
4    Slide content                   1,042        15,943            129,195
5    Slide notes                       280        22,512            142,856
6    Transcript (lecture)               28     * 179,480          * 768,058
7    Transcript (slide)              1,183     * 179,480          * 768,058
8    Transcript (sentence)          21,812     * 179,480          * 768,058
9    Transcript (word)             118,926       188,926            808,482
* 95% of the total number of words generated by SHoUT, based on the comparison between the human-made
subtitles and the SHoUT subtitles
5.4 Course and lecture navigation
Tables of content
The Collegerama data system can be used as a generator for a table of contents (TOC) for:
• an overview of recorded lectures in a course
• an overview of content in a recorded lecture
For this research project, two prototypes of TOCs have been evaluated:
• a static TOC (list)
• an interactive TOC (based on Macromedia Flash technology)
Static table of contents
Figure 5.5 gives an impression of a static TOC generated from the Collegerama data system.
Figure 5.5: Table of contents for recorded lectures in course CT3011, generated from the Collegerama data system
The generated TOC lists all lecture titles, the lecturer and the duration. The TOC also contains
a hyperlink to the related Collegerama recording. The lecturer had previously made a TOC by hand
as an improvement over the Collegerama catalog; the generated TOC improves on that hand-made TOC
for the following reasons:
• a uniform layout for the whole university
• possibility for automatically updating after modification of the content within the
Collegerama data system
Interactive table of contents
Figure 5.6 gives an impression of an interactive TOC generated from the Collegerama data
system. In this example a Flash movie is generated containing:
• time slider for all chapters, including chapter titles
• time slider for all slides, including slide titles
• screenshots of HD movie ($a = 0.1$)
Figure 5.6: Interactive TOC for recorded lecture #15 in course CT3011, generated from the Collegerama data system
The generated TOC shows the screenshot of the HD movie whenever the user's mouse hovers
over the related time slider section. This is synchronized with the related chapter. The
chapter slider does the opposite, showing the first slide in the chapter. This TOC gives a
good overview of the content of the lecture and is a great improvement over the Collegerama
thumbnail navigation.
A similar interactive TOC can be generated for each course. This interactive TOC might show
additional metadata such as:
• lecture name
• date and time of recording (air date/time)
• short description of the lecture
• tag cloud of the lecture
The Flash technology allows for relatively large amounts of text information. Flash movies
contain vector based text which keeps it sharp at all magnifications (for example at full
screen display).
Tag clouds
A tag cloud is a selection of tags or a list of relevant words from a document, in which the
size of each tag is based on its frequency of occurrence. The Collegerama data system can be
used as a generator for Tag clouds for a certain lecture. These are considered to be a useful
representation of the content of a lecture. Annex F evaluates different forms of tag clouds. A
basic relationship between frequency and word size is:
$$S = \frac{C - C_{min}}{C_{max} - C_{min}} \cdot R + B$$
In which:
$S$ = font size of word (pixels)
$C$ = frequency count for the word (or tag)
$C_{min}$ = frequency count for the least popular word (or tag)
$C_{max}$ = frequency count for the most popular word (or tag)
$R$ = largest font size minus smallest font size for words (pixels)
$B$ = smallest font size for words (pixels)
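The linear relation between frequency and font size can be sketched in a few lines. This is an illustration only: the function name `font_size`, the default values of 10 and 30 pixels, and the handling of the case in which all words occur equally often are assumptions.

```python
def font_size(count, c_min, c_max, base=10, size_range=30):
    """Linear font size in pixels for a word that occurs `count` times.

    base       = smallest font size (pixels), assumed default
    size_range = largest minus smallest font size (pixels), assumed default
    """
    if c_max == c_min:
        # All words are equally frequent; fall back to the smallest size.
        return float(base)
    return (count - c_min) / (c_max - c_min) * size_range + base
```

With these defaults, the least frequent word is shown at 10 pixels and the most frequent word at 40 pixels.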
In practice, more sophisticated relations are also applied, such as logarithmic or different
non-linear relations as well as all kinds of clustering algorithms. Tag clouds have been studied
on various other aspects, such as order of words, layout of words, color usage etc.[22] Tag
clouds are often produced using specialized websites, such as MakeCloud, Wordle or
ToCloud.[54][55][56] The tag clouds for this research project have been produced via the website
Wordle. Figure 5.7 gives an impression of such a web-generated tag cloud.
Figure 5.7: Tag cloud for recorded lecture #15 in course CT3011, generated by Wordle, with and without deleted
words by prof J.C. van Dijk [54]
Evaluation of tag clouds
In this research project, tag clouds have been produced from different data sources or text
types (subtitles, ASR output, slide titles, slide content). These tag clouds were evaluated in
order to determine rules for creating the tag clouds that best represent the
content of a lecture. All tag clouds have been produced in black and white with the same font
face, so that font size is the only distinctive element. In most cases the words for
the tag clouds have initially been “cleaned” by removing “common Dutch words” (common
according to wordle.net) or by selecting only the nouns. These tag clouds and the
assessment experiments are reported in more detail in Annex E and Annex F.
This assessment was done in 2 steps:
• quality assessment of 10 tag clouds with 15 to 100 words from different sources
• quality assessment of 10 uniform tag clouds with 15 words from the same sources
The second step was based on the remarks by the lecturer on the first step:
• tag clouds with 100 words are always unacceptable since these are unreadable
• tag clouds with 25 to 35 words contain too many irrelevant words
In the second step, the lecturer was asked to assign a sequential ranking of the 10 tag clouds
(actually 9, as #6 was identical to #7) and to mark irrelevant words in each tag cloud for
deletion, in order to obtain a better representation for the lecture. In Table 5.6, the results of
this second assessment are shown. It contains two rankings, the first is the ranking as given
by the lecturer, the second is this ranking combined with a ranking based on the number of
deleted irrelevant words.
Table 5.6: Tag cloud assessment of modified tag clouds (all 15 words)
                                                  Lecturer assessment results
ID    Source                         Cleaning     General appearance                  Rank   Nr of deleted   Total
                                     method *                                                words/rank      rank
1     Slide titles                   1            Many same sized (small) words       5      4 (=1)          6
2     Slide content                  1            Too many same sized (small) words   7      7 (=6)          13
3     Slide titles and               1            Too many same sized (small) words   8      5 (=2)          10
      slide content
4     Slide notes                    1                                                6      9 (=7)          13
5     Human subtitles                -                                                9      15 (=9)         18
6/7   Human subtitles                1                                                4      11 (=8)         12
8     Human subtitles                2                                                1      5 (=2)          3
9     Human subtitles                2                                                3      5 (=2)          5
10    SHoUT output                   2            Word “chloor” is missing            2      6 (=5)          7
* 1 = after removing common Dutch words; 2 = nouns only
Table 5.6 shows that the two tag clouds from nouns in the subtitles (#8 and #9) have the
best overall ranking. These two tag clouds contain the same words, but differ in font
and layout of the words. The lecturer preferred the more readable font (Coolvetica)
over the less readable font (Vigo). The lowest number of deleted words was obtained from the
slide titles. However, the produced tag cloud has a low variance in font size, so the
differences in word frequency are hardly visible. The variance in word count in subtitles is
much larger, giving a more pronounced picture. The tag cloud from the SHoUT output has a lower
overall ranking than the noun-based human subtitles (#8 and #9), because it misses an important
word (“chloor”) and has more deleted words. The other produced tag clouds were significantly
less appreciated.
The following conclusions have been drawn from these results:
• tag clouds should contain fewer than 15 words
• tag clouds should be obtained from “nouns only”
• tag clouds from subtitles (or speech recognition) are preferred over tag clouds from slide
titles (or slide content / slide notes), because of their larger variance in font size
• tag clouds need a “best readable font”
• tag clouds could be improved by removing bad words chosen by the lecturer
The use of colored tag clouds is not evaluated, since this might be largely dependent on the
personal preference of a lecturer.
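The cleaning and selection steps described above can be sketched as follows. This is a minimal illustration, not the procedure used with Wordle: the function name `select_tags` and its parameters are assumptions, words are compared in lower case, and the selection of nouns (which would require a part-of-speech tagger) is not covered.

```python
from collections import Counter


def select_tags(words, stop_words=(), deleted=(), limit=15):
    """Return the `limit` most frequent words as (word, count) pairs.

    stop_words = common words to remove before counting
    deleted    = words marked as irrelevant by the lecturer
    """
    excluded = {w.lower() for w in stop_words} | {w.lower() for w in deleted}
    counts = Counter(w.lower() for w in words if w.lower() not in excluded)
    return counts.most_common(limit)
```

The resulting counts can then be mapped to font sizes to draw the tag cloud.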
5.5 Collegerama lecture search
The collected data is the source for the Collegerama lecture search engine. Figure 5.8 gives
an impression of this.
Figure 5.8: Collegerama search engine produced for the course CT3011
The produced search engine allows for selecting each individual data source. The user can
search for a certain word or word combination in the selected sources, either
over all lectures or within a particular lecture. It is also possible to browse
all the available content by leaving the keyword empty, which results in:
• a table of contents (TOC) of the course (by selecting only the lecture titles)
• a table of contents (TOC) of the lecture (by selecting only the slide titles in a particular
lecture)
The output of the Collegerama Lecture Search, shown in Figure 5.8, presents the following
context-preserving data:
• data source (subtitles, slide titles, etc)
• lecture number (ID)
• lecture title
• lecturer
• time interval (begin, end)
• queried keyword, with 30 preceding and 30 subsequent letters
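The last item of the list above, the keyword with its surrounding letters, can be sketched as a small helper. The function name `keyword_in_context` is an assumption, and the sketch matches substrings without regard to case, so a query for “water” also matches inside “grondwater”.

```python
def keyword_in_context(text, keyword, width=30):
    """Return each hit of `keyword` with `width` characters before and after it."""
    snippets = []
    needle = keyword.lower()
    if not needle:
        return snippets
    lowered = text.lower()
    pos = lowered.find(needle)
    while pos != -1:
        start = max(0, pos - width)
        end = min(len(text), pos + len(needle) + width)
        snippets.append(text[start:end])
        # Continue searching one character further to find later hits.
        pos = lowered.find(needle, pos + 1)
    return snippets
```

Near the beginning or end of a record the snippet is simply shorter than the full width.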
Evaluation of search engine
The performance of search engines on recorded lectures is studied in the research discipline
of Spoken Document Retrieval (SDR).[20] SDR involves the retrieval of excerpts from
recordings of speech using a combination of automatic speech recognition and information
retrieval techniques. Movies and videos form a subdomain of spoken documents. Special
workshops on the evaluation of information retrieval systems for movies and videos have been
organized under the name TRECVid (TREC Video Retrieval Evaluation).[21]
For this research project, the query results for certain important keywords have been
analyzed. The following tests have been done:
• comparing query results from ASR output versus human-made subtitles
• comparing query results from all data sources
• analyzing the video length of search results, based on different data sources in
Collegerama lecture search
• “precision and recall” measurement[17]
• analyzing multiple keyword queries
Comparing query results from ASR versus human-made subtitles
The query results of the 15 most-used nouns on both data sources are presented in Table 5.7.
The data has been extracted from lecture #15. In determining the query results of the word
“water”, compounds such as “drinkwater”, “drinkwatervoorziening”, “grondwater” and
“oppervlaktewater” have not been included (likewise, “drinkwater” has not been counted within
“drinkwatervoorziening”). This table also shows the 5 words that were marked for deletion by the
lecturer as less relevant in the assessment of tag clouds (see chapter 5.4), leaving the ten
most important words (marked by “ok” in the table) as selected by the lecturer.
Table 5.7: Occurrences of the 15 most used nouns from ASR versus human-made subtitles
Keyword                  Lecturer   ASR             Human-made        $WA$ for single
                         check      (occurrences)   subtitles (ref)   word (%)
                                                    (occurrences)
chloor                   ok           0              10                 0%
drinkwatervoorziening    ok           4              16                25%
boek                     ok           5              15                33%
oppervlaktewater         ok           7              15                47%
plaatje                  deleted      6              11                55%
vragen                   ok           6              10                60%
soort                    deleted      7               9                78%
water                    ok          33              39                85%
stoffen                  ok           8               9                89%
grondwater               ok          20              21                95%
Nederland                ok          35              36                97%
dingen                   deleted     17              16               106%
keer                     deleted     16              13               123%
drinkwater               ok          16              13               123%
jaar                     deleted     28              16               175%
Total                               208             249                84%
Total ok words                      134             184                73%
The most remarkable result in the occurrences is the word “chloor”, which has been indicated
by the lecturer as one of the ten most important words. SHoUT has not recognized this word,
as it is an uncommon word in the Dutch language. This word or item is therefore
not retrieved from the lecture if no correct subtitles are available.
The word accuracy ($WA$) for “jaar”, “keer” and “drinkwater” is above 100%, which suggests
that ASR also reports these words where the lecturer used them as part of a compound word.
For searching compound words in the ASR output, it is therefore better to search for word
components instead of full words. This is illustrated by the low $WA$ of the word
“drinkwatervoorziening”. A $WA$ above 50% is generally regarded as the expected quality level
for ASR engines. The word “boek” has a lower $WA$ in the ASR output, which shows that
this word is difficult for SHoUT to decode. This word has also been indicated by the lecturer
as one of the ten most important words.
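The word accuracy per keyword is the number of ASR occurrences divided by the number of occurrences in the human-made subtitles. A minimal sketch, assuming whole-word counting so that “water” is not counted inside compounds; the function names `count_whole_word` and `word_accuracy` are assumptions.

```python
import re


def count_whole_word(text, keyword):
    """Count occurrences of `keyword` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def word_accuracy(asr_count, reference_count):
    """Return the ASR count as a percentage of the reference count.

    Returns None when the word does not occur in the reference subtitles.
    """
    if reference_count == 0:
        return None
    return 100.0 * asr_count / reference_count
```

Values above 100% occur when ASR reports a word more often than the reference subtitles contain it.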
Comparing query results from different data sources
In order to evaluate the different data sources, the search results of the ten most important
keywords of lecture #15 (as determined by the lecturer) have been compared. The results
are shown in Table 5.8.
Table 5.8: Occurrences of the 10 most important keywords (as determined by lecturer) from different data sources
Keyword (lecturer)       Subtitles   ASR    Slide    Slide   Slide   Slide   Lecture   Lecture
                                            titles   cont.   t+c     notes   title     chapter
water                       39        33      5        2       7       0       0         0
Nederland                   36        35      0        0       0       3       1         0
grondwater                  21        20      0        1       1       1       0         0
drinkwatervoorziening       16         4      0        0       0       2       1         0
boek                        15         5      0        1       1       0       0         0
oppervlaktewater            15         7      0        1       1       2       0         0
drinkwater                  13        16      1        0       1       2       0         0
chloor                      10         0      0        0       0       0       0         0
vragen                      10         6      0        0       0       0       0         0
stoffen                      9         8      0        0       0       0       0         0
Total occurrences          184       134      6        5      11      10       2         0
Nr retrieved keywords       10         9      2        4       5       5       2         0
% retrieved keywords      100%       90%    20%      40%     50%     50%     20%        0%
The results of Table 5.8 show that for searching in lectures, the lecture chapter titles are of
no importance, since none of the important keywords are retrieved. The slide titles and the
lecture titles only retrieve 20% of the keywords. These three text types are particularly
suitable for navigation, but clearly not for searching. To a lesser extent, the same holds true
for slide content and slide notes, which retrieve 40% to 50% of the keywords.
The overall ASR word correctness of this lecture is 46%, as shown in Annex E. The word
correctness over the keywords is 73% (= 134 / 184). When comparing the keywords
themselves, 90% of them are retrieved by ASR. These results show that ASR gives a drastic
increase in search results over slide data. Having human-made subtitles will further increase
the search results to an assumed 100% value. The results of Table 5.8 can partly be
explained by the fact that transcripts, either from ASR, subtitles or other, contain around ten
times more words than the slide content. This is shown in Table 5.5.
Video length per data source
The query results indicate how many of the items are found in a search, but not how long the
accompanying video length is for each item. Searching an item in (non time-tagged)
transcripts may indicate the lecture in which the item is used, but the user has to watch/listen
to the whole lecture to actually come across the correct video segment. Assuming a constant
speaking rate might give a best guess to jump to the equivalent time-frame, but in most
cases this is not suitable for the user. The time correctness of a search is related to the video
length or duration (end time minus start time) of the related video fragment. The video
length per data source in Collegerama lecture search is shown in Table 5.9.
Table 5.9: Video length per data source in Collegerama lecture search for course CT3011
Data source (text type)   Description            Minimum (sec)   Maximum (sec)   Mean (sec)
Lecture title             Lecture recording      1,351           3,231           2,451
Transcript (lecture)      Lecture recording      1,351           3,231           2,451
Lecture chapter           Chapters by lecturer   15              2,197           592
Slide title               Slide data             2               611             55
Slide content             Slide data             2               611             55
Slide notes               Slide data             2               611             55
Transcript (slide)        Slide data             2               611             55
Transcript (sentence)     Subtitles              0.6             6.0             3.4
Transcript (word)         ASR output             0.0             3.4             0.3
Table 5.9 shows that the video length for slides may vary between 2 seconds and 7:28
minutes, with a mean value of 58 seconds. This means that on average the user has to wait
for nearly 1 minute to encounter the searched item. This video length might be acceptable for
recorded lectures, as most spoken text is surrounded by relevant text. In general, all spoken
text belongs to that particular slide, as the lecturer explains the slide content. More detailed
searching for a specific sentence can be achieved by searching in subtitles or time-tagged
words (such as the ASR output of SHoUT). With time-tagged words, it is possible to show
karaoke-type subtitling, with sentences and coloring of the spoken word. An example of this
can be seen at the website for Radio Oranje, in which old transcripts have been time-tagged
by ASR (SHoUT).[51]
Precision and recall measurement
The effectiveness of an information retrieval system is often measured by the combination of
“precision” and “recall”.[17] Precision is the fraction of retrieved objects that is relevant. Recall
is the fraction of relevant objects that is retrieved. These values can be defined in the
following formulas:
$$Precision = \frac{r}{n}$$
$$Recall = \frac{r}{R}$$
In which:
$r$ = number of relevant documents retrieved
$n$ = number of documents retrieved
$R$ = total number of relevant documents
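Both measures can be computed from two sets of document identifiers. A minimal sketch, in which the function name `precision_recall` is an assumption and an undefined value (no documents retrieved, or no relevant documents) is returned as None.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    r = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = r / len(retrieved) if retrieved else None  # r / n
    recall = r / len(relevant) if relevant else None       # r / R
    return precision, recall
```

As an illustration with made-up slide numbers: if slides 1 to 5 are relevant and slides 1 to 6 are retrieved, the recall is 100% and the precision is 5/6, or 83%.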
The measurements require a set of objects or documents for which the number of relevant
objects is known. For searching in recorded lectures, slides can be considered as documents
since they give an overview of the subject matter. For the Collegerama lecture search
engine, these tests can be executed on the data of lecture #15. The slides of the lecture can
be used as the objects for these tests. A slide is regarded as a complete subset of a
lecture in which the related subject matter is explained. A detailed description of this test is
given in Annex F, while the results are shown in Table 5.10. The test was done on 3 of the 10
“important words” of lecture #15: “stoffen”, “grondwater” and “chloor”.
The words “stoffen” and “grondwater” were selected because of their high ASR accuracy and
their low occurrence in all lectures. It is assumed that these 2 keywords will also give a high
ASR accuracy for the other lectures, despite the fact that most of these lectures were given
by other lecturers. The low occurrence will result in more pronounced results. The word “chloor”
was selected because it is missing from the ASR output.
Table 5.10: Precision and recall measurement for different data sources on 3 important words of lecture #15
Item                                       Data source      Keyword
                                                            stoffen   grondwater   chloor
Occurrences                                subtitles        9         21           9
Number of related slides ($R$)             subtitles        4         5            2
Number of slides retrieved ($n$)           ASR              4         6            0
                                           slide titles     0         0            0
                                           slide content    0         1            0
                                           slide notes      0         1            0
Number of related slides retrieved ($r$)   ASR              4         5            0
                                           slide titles     0         0            0
                                           slide content    0         0            0
                                           slide notes      0         0            0
Recall ($r / R$)                           ASR              100%      100%         0%
                                           slide titles     0%        0%           0%
                                           slide content    0%        0%           0%
                                           slide notes      0%        0%           0%
Precision ($r / n$)                        ASR              100%      83%          -
                                           slide titles     -         -            -
                                           slide content    -         0%           -
                                           slide notes      -         0%           -
Table 5.10 shows that for the keyword “stoffen”, the slide recall and slide precision for ASR
are both 100%, despite the fact that the retrieval rate for this keyword was only 89% (8 out
of 9, according to Table 5.8). Although ASR missed 1 occurrence, this was not relevant on a
slide (document) level. Slide data does not give any recall for this
keyword and consequently no precision. For the keyword “grondwater”, the slide recall for ASR
is also 100%, although the retrieval rate was 95% (20 out of 21, according to Table 5.8). The
precision is 83%, since 1 additional slide is
retrieved by ASR. Again, the slide data is of no importance for recall and precision. The
keyword “chloor” was missed by ASR and is not shown on the slides. Consequently the recall
is 0%. The recall and precision measurements above show higher values on slide
level than were obtained in the preceding keyword search on individual occurrences.
Multiple-keyword search
Multiple-keyword searching on individual subtitles will not give a positive result, as these
keywords are never used in one particular sentence and won’t be retrieved as one record in
the database. The same holds true for searching on individual words from ASR. A solution to
this problem is offered by storing all spoken text belonging to a slide, called a slide transcript.
The time-code contains a start and end time for the slide. The same is done for an entire
lecture. This will allow for the searching of combined keywords. The student can use the slide
or lecture timeframe as the starting point for further viewing.
Searching within spoken text per slide is included in the database but not implemented in the
prototype for the web interface. Evaluation of this feature has been done directly on the
database. This approach results in the storing of the same data in multiple records.
Transcripts per lecture could be searched by a search engine using the transcript per word
(ASR output). The approach used gives additional flexibility in the layout of transcripts, which
enables more sophisticated output options. A lecture transcript can be printed in a more
convenient way if additional line breaks are included. This option is not available if lecture
transcripts are automatically abstracted from word transcripts.
If a multiple-keyword search is done on the ASR data for the words “stoffen” and
“grondwater” in lecture #15, 8 results are returned. When clustering this result set by slide,
there are only 2 slides out of a total of 29 slides that contain both keywords. The slide
timeframe 24:07-25:09 gives 1 paired result and the slide timeframe 29:47-33:35 gives 4
paired results. The total viewing time for the combined results is reduced from the lecture
duration of 45:09 minutes to only 4:50 minutes.
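Clustering the time-tagged words per slide can be sketched as follows. This is an illustration only: the function names and the in-memory data layout are assumptions, whereas the prototype stores slide transcripts as separate records in the database.

```python
def slides_with_all_keywords(slides, words, keywords):
    """Return the slide timeframes in which every keyword is spoken.

    slides   = list of (start_ms, end_ms) per slide
    words    = list of (time_ms, word) from the time-tagged transcript
    keywords = words that must all occur within the same slide
    """
    wanted = {k.lower() for k in keywords}
    hits = []
    for start, end in slides:
        spoken = {w.lower() for t, w in words if start <= t < end}
        if wanted <= spoken:  # all keywords occur within this slide
            hits.append((start, end))
    return hits


def total_viewing_time_ms(timeframes):
    """Sum the durations of the given (start_ms, end_ms) timeframes."""
    return sum(end - start for start, end in timeframes)
```

Applied to the two slide timeframes mentioned above (24:07-25:09 and 29:47-33:35), the total viewing time is 62 + 228 = 290 seconds, or 4:50 minutes.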
Table 5.11: Occurrence of combinations of 2 important words in all lectures
Lecture #            stoffen   grondwater   stoffen + grondwater   stoffen + grondwater
                                            in lecture             in slide
15                      8         20           8                   1+4
17                     11         12          11                   1+1+1
19                     11         26          11                   1+1+2+1+2
20                      3          2           2                   0
21                      4         75           4                   1+1+1
23                     15         21          15                   1+2+1
25                      1         10           1                   1
Total occurrence       54        223          52                   23
Number of lectures      8         18           7                   6
Number of slides        -          -         270                   17
Total duration                                 5:22:05             20:57
These 2 words can be searched in all lectures. Both keywords are present in 7 lectures.
Without a multiple-keyword search per slide, this will require a total viewing time of 5:22:05
hours in order to see all results. If a search is done on slide level, only 6 lectures will be
retrieved, with a total of 17 slides in which the combination of keywords is found. This
reduces the viewing time to only 20:57 minutes. Searching on slide level reduces the total
viewing time to 6.5%, or a reduction of 93.5%.
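The reduction in viewing time can be verified with a short calculation. A minimal sketch in which the function names `to_seconds` and `remaining_fraction` are assumptions.

```python
def to_seconds(clock):
    """Convert a duration written as 'h:mm:ss' or 'mm:ss' to seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def remaining_fraction(reduced, full):
    """Return the reduced duration as a fraction of the full duration."""
    return to_seconds(reduced) / to_seconds(full)
```

Here 20:57 corresponds to 1,257 seconds and 5:22:05 to 19,325 seconds, so the remaining viewing time is 1,257 / 19,325 = 6.5%.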
Ranked search results (implementation)
The search results have to be ordered according to a certain criterion. In this research project,
two ordering options have been evaluated:
• time-based
• rank based
Time-based
In this ordering method, all the results are sorted in chronological order. This makes sense for
recorded lectures, assuming the sequential explanation of key items in lectures. Later in the
course, the key items are explained in further detail. In SQL Server, this can be accomplished
by ordering the query results on Lecture_nr and Start_time. A query that can be used for
this is shown below:
-- Full-text search for 'stoffen' in lecture 15, in chronological order
SELECT *
FROM Content
INNER JOIN Lectures ON Content.Lecture_id = Lectures.Lecture_id
WHERE CONTAINS(Text, 'stoffen')
  AND Content.Lecture_id = 15  -- qualified, since both tables have a Lecture_id column
ORDER BY Lectures.Lecture_nr, Content.Start_time, Content.Text_type;
Rank based
SQL Server has a function (CONTAINSTABLE) that ranks search results based on several factors:
• text length
• number of occurrences of search words/phrases
• proximity of search words/phrases in proximity search
• user-defined weights
A query that can be used for this is shown below:
-- Full-text search for 'stoffen' in lecture 15, highest rank first
SELECT *
FROM Content AS FT_TBL
INNER JOIN CONTAINSTABLE(Content, Text, 'stoffen') AS KEY_TBL
    ON FT_TBL.Content_id = KEY_TBL.[KEY]
WHERE FT_TBL.Lecture_id = 15
ORDER BY KEY_TBL.RANK DESC;
Evaluation
With the rank based approach, the ASR results (text_type = 9, see Table 5.5) are ranked higher than the
other types, because each record contains only one single word. According to the relevance
ranking system Okapi BM25[58], these will be evaluated as being of high relevance. Similar
results can be expected for subtitles and slide titles in comparison with slide notes and slide
transcripts. These effects might be corrected by using user-defined weights for different text
types. However, this has not been tested in this research project. The current search engine
uses time-based ordering.
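The effect of record length on the ranking can be illustrated with a commonly used textbook form of the Okapi BM25 term score. This sketch is not the ranking code of SQL Server; the function name `bm25_term_score`, the variant of the inverse document frequency and the parameter values $k_1 = 1.2$ and $b = 0.75$ are assumptions.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score of one query term for one record, in a textbook BM25 form.

    tf          = occurrences of the term in the record
    doc_len     = number of words in the record
    avg_doc_len = average number of words per record
    n_docs      = total number of records
    doc_freq    = number of records that contain the term
    """
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: longer records need more hits for the same score.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm
```

With one hit in each, a record that consists of a single word scores higher than a record of 200 words, which matches the observation that single-word ASR records end up at the top of the ranking.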
5.6 Conclusions
Table of contents
A table of contents (TOC) of a recorded lecture is an important element in the navigation of
recorded lectures. Such a table of contents should be drafted by the lecturer or the assisting
staff. It is useful to present such a TOC in an interactive way, based on screenshots from the
HD movie, the timeline and the available text data. This could be generated automatically
as a Flash movie by accessing the database content and the related HD lecture movie.
Tag clouds
The accessibility of a recorded lecture could be further improved by creating tag clouds per
lecture (limited to 15 words). The following conclusions have been drawn from the assessment:
• tag clouds should contain fewer than 15 words (nouns only)
• the best source of information for tag clouds are human-made subtitles
• tag clouds from subtitles (or speech recognition) are preferred over tag clouds from slide
titles (or slide content / slide notes), because of their larger variance in font size
• tag clouds need a “best readable font”
• tag clouds could be improved by removing bad words chosen by the lecturer (in our
examples, 25%-40% of the words were removed)
Search engine
A search engine on the course database is a useful element for improving the accessibility of
the course and its lectures. It forms an additional component alongside the navigation tools
such as table of contents and tag clouds. The following conclusions can be drawn:
• the best source of information for searching is human-made subtitles, followed by ASR
output
• chapter titles and slide content have a low importance for searching
• chapter titles and slide titles are only relevant for the generation of tables of contents
• by clustering subtitles or ASR per slide, multiple-keyword searching is largely improved
because of shorter viewing times in the search results (in our example course, the viewing
time was reduced from 5:22 hours to 21 minutes)
For the proper operation of a search engine, the output of a speech recognition system with
sufficient word correctness is required. Better retrieval rates can be obtained with full
subtitles. In view of the other beneficial elements of subtitles (for machine translation, and
for better following of the lectures) these subtitles are considered as an essential part of all
recorded lectures.
Future extensions
A text-based database per course can also serve as a basic container for a course discussion
board, using time-stamped remarks (“questions and answers”, discussions). A further
extension to the database and search engine could be the addition of other course material,
such as readings (books, lecture notes), activities (assignments, tests, lab tests) and practice
exams.
6. Proposed improvements
From the information and knowledge derived in this research project, as described in the
previous chapters, it can be concluded that the usability of recorded lectures can be
expanded. However, to increase the usability, it will be necessary to improve and extend the
existing lecture recording and storage system. These improvements and extensions can be
divided into these four categories:
• improved lecture accessibility
• improved navigation and searching
• addition of online discussion
• re-using recorded lectures to increase the course frequency
In paragraphs 6.1 through 6.4, each of these elements will be discussed and the
accompanying recommendations for improvement are mentioned. These improvements are a
combination of conclusions from this research project, as well as suggestions and
recommendations for future developments.
Paragraph 6.5 will give the outline of a pilot project for further development of these
proposed improvements. This project can be regarded as a practical approach for
implementing the conclusions and recommendations of the previous paragraphs.
6.1 Lecture accessibility
Improving the accessibility focuses on giving more students access to recorded lectures,
independent of their location, computer device or operating system. The ultimate goal is to
offer all lectures in several different video formats, as well as a small sized version that is
designed specifically for mobile devices like the iPhone or Windows Smartphones.
All lectures need to have subtitles of the spoken language, as well as translated subtitles for
the most common foreign languages such as English, Spanish, French, German and Chinese.
This will support the student exchange programs that are available at most universities in the
Netherlands.
We can divide the improvement of accessibility into these three general categories:
• vodcast distribution
• subtitling
• translation
Vodcast distribution
Since TU Delft wants to offer lectures to any student, regardless of his or her location,
several vodcast versions need to be produced. At the moment the only way to watch
recorded lectures is by having access to a broadband Internet connection that has enough
bandwidth to support the online streaming of videos. This makes it impossible to watch
lectures while on the train or bus, where a fixed high-speed Internet connection isn’t
available (mobile GPRS and EDGE data networks do not suffice).
A prototype for the integration between streaming and downloadable recorded lectures within
the Blackboard environment is shown in Figure 6.1. This figure shows the different
downloadable video formats in which this sample lecture is available, as well as the related
course items.
Figure 6.1: Online viewing (YouTube) and available downloads and links for a recorded lecture
(Source: http://blackboard.tudelft.nl CT3011-OpenCourseWare – Lecture new – demo-version)
Subtitling and translation
Subtitling has proved to be a substantial improvement to the online viewing experience of
lectures. It is therefore recommended to display the subtitles of the spoken language for all
different lectures in Collegerama. Furthermore, Dutch lectures (in the BSc phase) should be
subtitled in proper English whenever the course is regarded as a useful resource for English
speaking MSc students. For this goal, an automated translation as offered by Google
Translate might be of insufficient quality. Additionally, English spoken courses could be
subtitled in the Dutch language as a service to people who have trouble understanding
English.
Having subtitles available in one or two languages enables automated subtitling in other languages.
Such an automated subtitling system could be convenient for students who are not native English
speakers. This service reduces the need for using a dictionary in order to understand the
lecture, which is common for Chinese students in their first MSc year.
Subtitles in the original spoken language can be created with the help of an ASR system such
as SHoUT. The word error rate of SHoUT is rather high (30%-70%); however, such systems
do provide an accurate timing of the spoken words. Human post-processing should improve
the generated text and should divide the text into sentences, as is needed for proper subtitling.
Figure 6.1 gives an impression of subtitles in the spoken language of a lecture. Subtitling and
translated subtitles are further described in chapter 4.
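The reuse of ASR word timings for subtitling can be illustrated with a short Python sketch. It rests on several assumptions: the input format (start time, end time, word) is hypothetical and is not the actual SHoUT output format, and the thresholds max_chars and max_gap are illustrative values only. The output follows the common SubRip (.srt) layout, which keeps the subtitles in a separate text file.

```python
from typing import List, Tuple

# (start_s, end_s, word): a hypothetical representation of timed ASR output
Word = Tuple[float, float, str]


def fmt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_cues(words: List[Word], max_chars: int = 40, max_gap: float = 1.0):
    """Group timed words into cues.

    A new cue starts after a long pause or when the line would become too long.
    Both thresholds are illustrative assumptions.
    """
    groups: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current:
            text_len = len(" ".join(x[2] for x in current)) + 1 + len(w[2])
            gap = w[0] - current[-1][1]
            if text_len > max_chars or gap > max_gap:
                groups.append(current)
                current = []
        current.append(w)
    if current:
        groups.append(current)
    return [(g[0][0], g[-1][1], " ".join(x[2] for x in g)) for g in groups]


def to_srt(cues) -> str:
    """Render (start_s, end_s, text) cues as SubRip text."""
    blocks = []
    for i, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{fmt_time(start)} --> {fmt_time(end)}\n{text}\n")
    return "\n".join(blocks)
```

A human editor would still have to correct the recognized words and move the cue boundaries to sentence boundaries; the sketch only shows how the timing information could be reused.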
6.2 Navigation and searching
Recorded lectures have an average duration of 30 minutes for a short lecture and 100
minutes for a double lecture session (discounting the break time). For first-time viewing this
might be considered acceptable, since it resembles the live course environment. However, for
reviewing lectures at a later time, better browsing, navigation and search capabilities are
required. This is especially true for students who are studying for the exam and are browsing
through the course material and/or doing course assignments.
Students also need a much better indication of the content of a given lecture. The only
piece of metadata available is the lecture title. Searching for specific course content
is not possible. The following improvements are recommended:
• browsing the lectures and their content through a course navigator and/or table of contents
• searching the course content (online search engine)
• indication of the course content by presenting a tag cloud for each lecture
The contours of a search engine and the creation of tag clouds have been described in
chapter 5. A course and slide navigator could be produced from the content of the search
engine. Such navigators function as an interactive table of contents. Figure 6.2 shows the
improved navigation and searching within the Blackboard environment.
Figure 6.2: Tools created from the Collegerama database (slide navigator, tag clouds and search application) will
significantly improve the accessibility of recorded lectures
(Source: http://blackboard.tudelft.nl CT3011-OpenCourseWare – Lecture (new) – demo-version)
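As an illustration of how a tag cloud for a lecture could be derived from its text (subtitles or slide titles), the following Python sketch counts term frequencies and maps them to font sizes. It is a minimal sketch under stated assumptions: the stop word list, the font size range and the function name tag_cloud are illustrative and are not taken from the prototype described in chapter 5. The simple tokenizer only handles unaccented letters.

```python
import re
from collections import Counter

# Illustrative stop word list; a real system would use a full list per language.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}


def tag_cloud(text, top_n=10, min_size=10, max_size=32):
    """Return (term, count, font_size) for the most frequent terms in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    top = counts.most_common(top_n)
    if not top:
        return []
    highest, lowest = top[0][1], top[-1][1]
    span = max(highest - lowest, 1)  # avoid division by zero when all counts are equal
    return [
        (term, count, min_size + (count - lowest) * (max_size - min_size) // span)
        for term, count in top
    ]
```

The linear scaling between the lowest and highest count is one possible design choice; logarithmic scaling is a common alternative when a few terms dominate.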
6.3 Student interaction
Live lectures given in a lecture room allow for a direct form of communication between
students and lecturer. This communication is two-way. The lecturer might ask the students
some questions and receive feedback in order to test his educational performance. The rest
of his lecture will then be based on this response. When a recorded lecture is used, this form
of communication is no longer available.
A similar kind of discussion can be achieved by employing an online message board linked to
each recorded lecture. Students will be able to ask questions, discuss events and topics
in the lecture and receive feedback from the lecturer. During a live lecture the frequency
of such questions is very low when student attendance is very high: students are either too
far away from the lecturer or dislike interrupting a large classroom and drawing
attention to themselves. An online messaging system also promotes student-to-student
discussion and interaction that is not possible during a live lecture, since it would disturb
the other classmates. In general, an online discussion board linked to recorded lectures could
greatly increase the frequency with which students ask questions.
An online discussion board will have even more value when the discussions are moderated by
the lecturer or someone from the teaching staff. This moderation could include answering
questions and removing unrelated remarks. This form of discussion can be
complemented by adding the option to post time-tagged questions and comments. This
means that the student can ask a question based on a certain timeframe within the lecture to
which the question is relevant. With such a form of time-lined discussion, other students
might look for specific remarks. These time-based discussions could be accessed by means of
a search engine and/or a time slider that gives a popup whenever a discussion is related to
that moment within the lecture. Figure 6.3 gives an impression of such a time-based
discussion for online poker lectures.
Figure 6.3: Time-lined online discussions on recorded lectures are common practice for the online educational poker
community
(Source: http://www.deucescracked.com/videos/1210-Episode-Seven)
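The time slider popup described above can be sketched as a lookup of time-tagged comments for the current playback position. This is a minimal Python sketch, assuming a hypothetical Comment record and an illustrative default window of 30 seconds for comments that only carry a start time; none of these names come from Collegerama.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comment:
    author: str
    text: str
    start_s: Optional[float] = None  # None: the comment concerns the lecture as a whole
    end_s: Optional[float] = None    # None: only a start time was given


def comments_at(comments: List[Comment], position_s: float,
                window_s: float = 30.0) -> List[Comment]:
    """Return the time-tagged comments that apply to the given playback position."""
    hits = []
    for c in comments:
        if c.start_s is None:
            continue  # untimed comments are never shown as a popup
        end = c.end_s if c.end_s is not None else c.start_s + window_s
        if c.start_s <= position_s <= end:
            hits.append(c)
    return sorted(hits, key=lambda c: c.start_s)
```

A player could call comments_at on every slider update and show a popup whenever the result is not empty.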
6.4 Increasing course frequency
Recorded lectures with improved accessibility and provided with online communication
facilities could allow for the repeating of a course in the same academic year. These recurring
courses might be of importance in the following situations:
• students following a minor program in another faculty (all scheduled in the first academic
semester) might miss courses in their own faculty
• students with deadlines for BSc or MSc exams might encounter problems when preferred
courses are not available in the current and/or next course period
These students can now be given the option of following and trying to pass the course
through self-study, since all recorded lectures and accompanying material can now be shared.
It could facilitate better study results and shorter study durations. A moderating lecturer can
provide students with the required assistance and help by answering questions via the online
communication facilities. Figure 6.4 gives a visual representation of these recurring courses.
Figure 6.4: Multiple scheduling of courses with recorded lectures and online/moderated assistance by a lecturer
Time-critical courses
If TU Delft wants to apply this program of recurring courses within the same academic year,
then this multiple scheduling is beneficial to the following types of time-critical courses:
• final-year BSc courses
• minor-program courses (inside/outside faculty in first semester)
• courses for exchange students (Erasmus Mundus exchange in 1 semester)
• courses in cooperation with other universities (non-parallel scheduling)
• intensive courses (3 full weeks instead of 10 weeks at 30%)
• courses in graduate school (for starting PhD students, multiple starting moments)
Giving students more freedom in choosing when to follow a certain course within an
academic year should have a positive influence on the time it takes them to complete
their education. Often, a student has to wait several months before he or she can
follow a specific course that is required to finish the curriculum.
Figure 6.5 shows a visual representation of the current lecture situation, along with 3 possible
ways to execute such a recurring course system. The green bars represent a live lecture that
is given in front of students in a classroom. The yellow bars represent a course that is given
primarily online, in which no live lectures are available. The red dots constitute the moments
of examination.
Figure 6.5: Examples of multiple scheduled courses (panels: present situation; extra recorded course before second exam; extra recorded course in other semester; full year course)
These additional online courses without live lectures should be provided with an online
discussion board, so that students can ask questions and post comments and the
lecturer can answer them. This further promotes students helping each other and starting a
dialogue about the presented course material. The lecturer also acts as a moderator for this
discussion board.
Scheduling
When all lectures are pre-recorded and available, the simplest option is to give students
access to all the lectures at once. In that fashion, they can decide for themselves when to
watch a lecture. Another option is to create scheduled releases of pre-recorded lectures.
This means that all lectures are initially hidden and are then released at set intervals
(for instance, every week). Such a system simulates the experience of following a live
course in which students go to the classroom every week.
This form of scheduled releasing of lectures might give the following advantages:
• improving the weekly attendance by students (fixation in calendars of students)
• increased concurrent attendance by concentrating students into virtual classrooms
• allowing for moderation by lecturers (supporting the virtual classroom)
In the online poker teaching community, such a system is already employed: pre-recorded
lectures are released on a recurring weekly basis. An impression of such a
schedule, as offered at a poker instruction website called Deuces Cracked, is shown in
Figure 6.6.
Figure 6.6: Online poker courses are scheduled on specific days, in order to enlarge the attendance and to promote
live online discussion
(Source: http://www.deucescracked.com/)
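A scheduled release of pre-recorded lectures comes down to computing a release date per lecture and hiding every lecture whose date has not yet passed. The following Python sketch illustrates this under simple assumptions: a fixed interval in days, lectures numbered from 1, and hypothetical function names.

```python
from datetime import date, timedelta


def release_schedule(n_lectures, first_release, interval_days=7):
    """Return the release date of each lecture, one lecture per interval."""
    return [first_release + timedelta(days=i * interval_days) for i in range(n_lectures)]


def visible_lectures(schedule, today):
    """Return the numbers (starting at 1) of the lectures released on or before today."""
    return [i + 1 for i, release in enumerate(schedule) if release <= today]


# Example: four weekly lectures starting on 1 February 2010
schedule = release_schedule(4, date(2010, 2, 1))
print(visible_lectures(schedule, date(2010, 2, 16)))  # lectures released by 16 February
```

Setting interval_days to 0 corresponds to the free agenda approach, in which all lectures are visible from the first release date.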
6.5 Pilot project for further development
Goals
The improvements described above can best be developed in a pilot project in a real
educational environment. The goals for the pilot project are summarized in Table 6.1. This
table shows both the required short-term improvements (1-3 years) and the long-term
goals (5-10 years).
Table 6.1: Current situation and goals for future academic courses
Current situation    Short term improvements           Long term goals
1 time a year        2 times a year (each semester)    5 times a year
1 location           between 1-3 locations (3TU)       3-10 locations (associated universities)
1 language           Dutch and English (subtitled)     plus 1 or 2 other local languages
The developments in Table 6.1 are based on two alternative approaches:
• classroom courses, with a live moderating lecturer
• scheduled self-study courses, with an online moderating lecturer
It is recommended that these are developed within the scope of a pilot project and run
alongside the ongoing TU Delft OpenCourseWare project. A similar concurrent pilot project
could also be run at the University of Twente. The project should include about 5 to 10 courses,
giving enough content to apply for a YouTube-Edu account and/or an iTunes-U account.
These platforms require a minimum volume of around 100 video lectures organized in 5 to 10
courses.
The pilot project should focus on expanding the scheduling of courses from once a year to at
least once per semester (repeated courses with recorded lectures) and on expanding the
course locations from only Delft or Twente to at least one other location (simultaneous
distance learning, with live streaming and the playing of recorded lectures). This approach
covers a classroom environment. A classroom approach is preferred for this pilot since it
deviates least from the current curriculum and allows for the maximum amount
of feedback from the students.
In a second phase, the focus could be shifted more towards individual self-learners. In this
phase it should be established whether a scheduled organization gives better results than a
free agenda approach.
Developments of new products
Several new products have to be developed in order to achieve the above-mentioned
goals within this pilot project. Table 6.2 gives an overview of these products, and
Figure 6.7 shows the relations between them.
Table 6.2: Additional products for expanded usability of recorded lectures
Item                          Addition to / Replacement for   Responsible      Number per course
Videos
- HD-video (YouTube)          Collegerama online view         MMS              5 - 30
- Mini-video (iTunes) *       Collegerama online view         MMS              5 - 30
Table of contents
- Course (Flash)              Collegerama catalog             Lecturer         1
- Lectures (Flash)            Collegerama slide navigator     Lecturer         5 - 30
Subtitles
- Course language             Lectures                        MMS              5 - 30
- NL / EN (optional)          NL in EN / EN in NL             MMS              5 - 30
Search
- Course search               TOC course/lectures             MMS              1
- Tag clouds                  TOC lectures                    MMS              5 - 30
Discussion board
- Course discussion board     Course                          MMS / Lecturer   1
- Lecture discussion board    Lectures                        MMS / Lecturer   5 - 30
* if design differs from HD-videos
Figure 6.7: Recorded lectures are embedded in a Multimedia Information Retrieval System, containing multimedia
content and structured course and lecture metadata
Requirements
The usability of recorded lectures can be expanded with the following requirements and/or
additional provisions:
• proper recording
• HD movie creation
• post-processing by lecturer
• post-processing by data creator
Proper recording
It is concluded that a better system for recording slides needs to be developed. Looking at
the future of education and the increasing developments in technology, it seems clear that
presentations are going to be supported by animation and video. This means that an old
screenshot recording system will no longer be sufficient to properly record PowerPoint slides.
Re-using recorded lectures requires proper recording of a lecture. For this the following
guidelines can be given:
• record lectures in a natural classroom environment (“recorded for a live audience”, no
“talking head” recording)
• no slides, no recording (otherwise, slides have to be created during post-processing)
• use full audio recording (minimum of 1 extra microphone, for introducing speaker and/or
for the lecture room audience)
• add full screen recording options to Collegerama, for animations, electronic drawing
boards, movies, computer demos (minimum of 5 fps, preferably 10-25 fps)
• original Collegerama camera (small size movie) should follow the lecturer at all times,
never the projected slide or PowerPoint material
HD movie creation
The creation of an HD movie from a Collegerama recording will allow for the distribution of
recorded lectures via YouTube, iTunes and Blackboard. For the creation of this HD movie, the
following conclusions and guidelines can be presented:
• Collegerama recordings can be used as a basis for the creation of an HD movie (minimum
of 1280x720)
• an HD movie is preferred for streaming and distribution
• a uniform design of HD movies is proposed
• several LQ movies can be derived from this HD movie for distribution on alternative
platforms (mobile phones, mobile media players, iPod/iPhone)
• the HD movie is prepared for subtitles (no hard-coded subtitles, always as separate
subtitle text files)
Post-processing by the lecturer
Recorded lectures require post-processing with the following guidelines:
• provide lectures with proper lecture titles, speaker names etc
• divide lectures into 2-10 chapters (5-15 minutes per chapter)
• connect the video time-frame to the original PowerPoint slides
• if necessary, improve the slides and/or add slides (explaining text)
This post-processing should be done either within the Collegerama system (by using special
login access for lecturers) or in a new recorded lecture data system.
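The chapter guideline above (2-10 chapters of 5-15 minutes each) lends itself to an automatic check when a lecturer enters the chapter division. The following Python sketch is one possible form of such a check. The tuple format, the function name check_chapters and the additional requirement that chapters are contiguous and cover the whole lecture are assumptions made for this illustration.

```python
from typing import List, Tuple

# (title, start_s, end_s): a hypothetical representation of a lecture chapter
Chapter = Tuple[str, int, int]


def check_chapters(chapters: List[Chapter], lecture_length_s: int,
                   min_chapters: int = 2, max_chapters: int = 10,
                   min_len_s: int = 5 * 60, max_len_s: int = 15 * 60) -> List[str]:
    """Return a list of problems; an empty list means the division follows the guideline."""
    problems = []
    if not min_chapters <= len(chapters) <= max_chapters:
        problems.append(
            f"expected {min_chapters}-{max_chapters} chapters, got {len(chapters)}")
    expected_start = 0
    for title, start_s, end_s in chapters:
        if not title.strip():
            problems.append(f"chapter starting at {start_s}s has no title")
        if start_s != expected_start:
            # Assumption: chapters follow each other without gaps or overlap.
            problems.append(f"'{title}' starts at {start_s}s, expected {expected_start}s")
        length = end_s - start_s
        if not min_len_s <= length <= max_len_s:
            problems.append(f"'{title}' lasts {length}s, outside {min_len_s}-{max_len_s}s")
        expected_start = end_s
    if chapters and expected_start != lecture_length_s:
        problems.append(
            f"chapters end at {expected_start}s, lecture ends at {lecture_length_s}s")
    return problems
```

Returning a list of messages rather than raising an error lets the lecturer see all problems with a chapter division at once.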
Post-processing by Collegerama services
Recorded lectures require post-processing by a data creator with the following guidelines:
• import the slide data into a database (slide titles, slide content, slide notes)
• create tag clouds based on subtitles or slide titles for each lecture
• create subtitles for the lectures (at least in the spoken language, preferably in the
additional Dutch or English language)
• create interactive tables of contents (both for the lectures in a course as well as for the
chapters and slides in the individual lectures)
• create a search engine for course content and lectures
• create a discussion board for the course and the individual course lectures
• provide these elements within the Blackboard environment of the course
The post-processing needed for creating subtitles might be greatly reduced when
better-performing ASR systems, which include statistical post-processing of the result set
produced by the word decoder of the system, become available.
7. Conclusions
At present, Delft University of Technology records around 10% of its lectures. This number
is expected to increase in the coming years. Having these recorded lectures opens the door
to all kinds of new ideas and improvements for its educational programs. At this moment
the university employs a video streaming system called Collegerama, which allows viewers with an
active Internet connection to watch lectures online. It combines a video stream of the
lecturer with a series of screenshots of the accompanying PowerPoint slides.
In this thesis, a broad spectrum of possibilities for expanding the usability of recorded
lectures has been examined and evaluated. The main research question for this project is:
How can we efficiently and effectively present recorded lectures and course material to
students at universities?
This main research question has been divided into three sub-questions, which are discussed
below.
How can we increase the accessibility and availability of the recorded lectures in Collegerama?
To increase the availability of the lectures, it is recommended to create a single video file
from the Collegerama recordings. This will allow for distribution over many other popular
online multimedia platforms, such as YouTube-Edu and iTunes-U. Distribution as a single video
file allows for offline viewing without an active broadband Internet connection (for
example, while sitting on the train or lying on the beach). This is not possible within the
current Collegerama system.
In this research project, a Collegerama lecture has been converted into a single video stream,
after careful review of several layout designs and technical specifications. This lecture has
been published on YouTube. Several other technical formats have been created, so that the
lecture can also be distributed elsewhere. These include a smaller-sized version that was
created specifically for mobile devices and has been tested on Apple's latest iPhone.
How can we make recorded lectures easier to follow, especially for foreign speaking students?
To make lectures easier to follow, it is concluded that the creation and displaying of subtitles
is useful. These subtitles can automatically be translated using machine translation. For this
research project, Google Translate has been used, which currently supports translation into 52
different languages. Although the quality of these translations has not been tested on
Collegerama, evaluations in EACL show that machine translation yields adequate edited
translations around 20% to 50% of the time. If necessary, the generated text can be enhanced by
manual post-processing. The current speech recognition technology has also been evaluated
for the generation of proper subtitles, using SHoUT, the speech engine created by the
University of Twente. It has an average word error rate of 50%, and it is concluded that this
system is not yet sufficient to generate proper subtitles: manual post-processing to
improve the output is always required.
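For reference, the word error rate used above is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the ASR output into the reference transcript, divided by the number of words in the reference. The following Python sketch computes it with a standard edit distance on words. It is an illustration of the metric, not the evaluation tool used in this project, and it assumes whitespace tokenization without any text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing in the output
                curr[j - 1] + 1,     # insertion: extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, an output that replaces one word and drops another in a four-word reference has a word error rate of 50%. Because insertions are counted as well, the rate can exceed 100%.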
How can we effectively and efficiently navigate and search within recorded lectures?
This research project has shown that to properly navigate through the available recorded
lectures, the input from teachers is important. They need to provide the lecture title and
divide their lectures into several chapters with a proper chapter title, based on separate
timeframes (start time and end time). These chapters together with the slide titles and slide
content form the foundation for navigation and searching. The search element can be further
expanded by the available subtitles. For the purpose of this research project, all lecture titles
and chapters provided by the lecturer, slide titles and content and the generated SHoUT
transcripts for all 14 lectures (28 lecture videos) have been collected. The slide metadata has
been digitally and automatically extracted from the original PowerPoint files.
All this new information and metadata has been stored in a multimedia database, so that the
retrieval options for the lecture content could be researched. This database will serve as the
source for all the additional options for navigation and searching:
• generating a static and/or interactive table of contents for each lecture (based on the
lecture chapters)
• generating tag clouds
• displaying subtitles in several different languages
• searching within lecture material
To demonstrate its functionality, a prototype for a Collegerama lecture search engine has
been developed. This is an online web application that can be accessed from any location
with an active Internet connection and searches within all the above mentioned data linked to
a lecture. Every search result provides a link to Collegerama, so users can immediately see
the related part of the lecture. The following conclusions can be made:
• the best source of information for searching is human-made subtitles, followed by ASR
output
• chapter titles and slide content have a low importance for searching
• chapter titles and slide titles are only relevant for the generation of tables of contents
• by clustering subtitles or ASR per slide, multiple-keyword searching is largely improved
because of shorter viewing times in the search results (in our example lecture, it was
reduced from 5.3 hours to 21 minutes)
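The effect of clustering subtitles per slide on multiple-keyword searching can be illustrated with a small Python sketch. Each subtitle line is assigned to the slide that was on screen when it was spoken, and a query matches a slide when all of its keywords occur somewhere in that slide's cluster, even if they come from different subtitle lines. The data layout and the function names are assumptions made for this illustration; they do not describe the prototype search engine.

```python
from bisect import bisect_right
from collections import defaultdict


def cluster_by_slide(slide_starts, subtitles):
    """Group subtitle lines per slide.

    slide_starts: sorted start times (in seconds) of the slides
    subtitles:    (time_s, text) pairs
    Returns a dict mapping slide index to the concatenated subtitle text.
    """
    clusters = defaultdict(list)
    for time_s, text in subtitles:
        idx = bisect_right(slide_starts, time_s) - 1  # last slide started at or before time_s
        if idx >= 0:
            clusters[idx].append(text)
    return {idx: " ".join(lines) for idx, lines in clusters.items()}


def search(clusters, slide_starts, query):
    """Return (slide index, start time) for slides whose cluster contains all keywords."""
    terms = query.lower().split()
    hits = []
    for idx in sorted(clusters):
        words = set(clusters[idx].lower().split())
        if all(term in words for term in terms):
            hits.append((idx, slide_starts[idx]))
    return hits
```

Each hit points to the start of a single slide, so the viewer only has to watch that slide's segment instead of every fragment in which one of the keywords occurs.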
For the proper operation of a search engine, the output of a speech recognition system with
sufficient word correctness is required. Better retrieval rates can be obtained with full
subtitles. In view of the other benefits of subtitles (for machine translation, and
for making lectures easier to follow), these subtitles are considered an essential part of all
recorded lectures.
Future developments
It is concluded that a better system for recording slides needs to be developed. Looking at
the future of education and the increasing developments in technology, it’s clear that
presentations are going to be supported by more animation and video. This means that an
old screenshot recording system will no longer be sufficient to properly record PowerPoint
slides.
To further increase the usability of the recorded lectures, a new interactive way to discuss
lectures with the teacher and other students needs to be introduced. It promotes the asking
and answering of questions, not just by the teacher but also by fellow classmates. This can
be done through the use of a dynamic message board that is linked to the timeline of each
lecture. Students can comment on and discuss the different topics in the lecture. To support
such a system, an extension of the current multimedia database is required, so that the
messages along with their optional timeframes can be stored.
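What such a database extension might look like is sketched below in Python with SQLite. The table and column names are hypothetical and do not describe the multimedia database built in this project; the point is only that a message can carry an optional start and end time and that messages overlapping a given part of the lecture can be retrieved with a single query.

```python
import sqlite3

SCHEMA = """
CREATE TABLE lecture (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE message (
    id         INTEGER PRIMARY KEY,
    lecture_id INTEGER NOT NULL REFERENCES lecture(id),
    parent_id  INTEGER REFERENCES message(id),  -- NULL for the first message of a thread
    author     TEXT NOT NULL,
    body       TEXT NOT NULL,
    start_s    REAL,                            -- NULL when no timeframe is attached
    end_s      REAL,
    CHECK (end_s IS NULL OR (start_s IS NOT NULL AND end_s >= start_s))
);
"""


def open_database(path=":memory:"):
    """Create a database with the (hypothetical) discussion schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_message(conn, lecture_id, author, body, start_s=None, end_s=None, parent_id=None):
    """Store a message and return its id."""
    cur = conn.execute(
        "INSERT INTO message (lecture_id, parent_id, author, body, start_s, end_s) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (lecture_id, parent_id, author, body, start_s, end_s),
    )
    conn.commit()
    return cur.lastrowid


def messages_between(conn, lecture_id, from_s, to_s):
    """Return (author, body) of time-tagged messages overlapping [from_s, to_s]."""
    rows = conn.execute(
        "SELECT author, body FROM message "
        "WHERE lecture_id = ? AND start_s <= ? AND COALESCE(end_s, start_s) >= ? "
        "ORDER BY start_s, id",
        (lecture_id, to_s, from_s),
    )
    return rows.fetchall()
```

Messages without a timeframe are stored with empty time columns and are therefore left out of time-based queries, while they remain available for a regular per-lecture discussion view.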
With these recommendations, it is possible to use recorded lectures as a foundation for future
online courses without the need for live lectures.
List of URL’s
[24] MIT Facts 2009: Faculty and Staff [online] [Cites 20 October, 2009]
Access: http://web.mit.edu/facts/faculty.html
[25] Enrollment Statistics: MIT Office of the Registrar [online] [Cites 7 November, 2009]
Access: http://web.mit.edu/registrar/stats/yrpts/index.html
[26] Research – The Center for Measuring University Performance [online] [Cites 7
November, 2009]
Access: http://mup.asu.edu/research_data.html
[27] Kauffman Foundation study [online] [Cites 7 November, 2009]
Access: http://web.mit.edu/newsoffice/2009/kauffman-study-0217.html
[28] MIT [online]. Instructor Profile: Walter Lewin [Cites 20 October, 2009]
Access: http://ocw.mit.edu/OcwWeb/web/courses/instructors/lewin/lewin.htm
[29] Our History – MIT OpenCourseWare [online] [Cites 20 October, 2009]
Access: http://ocw.mit.edu/OcwWeb/web/about/history/index.htm
[30] Home Page - OpenCourseWare Consortium [online] [Cites 20 October, 2009]
Access: http://www.ocwconsortium.org/
[31] About – Academic Earth [online] [Cites 21 October, 2009]
Access: http://academicearth.org/about
[32] About VideoLectures.Net [online] [Cites 21 October, 2009]
Access: http://videolectures.net/site/about/
[33] Alexa Top 500 Global Sites [online] [Cites 21 October, 2009]
Access: http://www.alexa.com/topsites
[34] YouTube Blog: Higher Education for All [online] [Cites 21 October, 2009]
Access: http://www.youtube.com/blog?entry=uvxBVPuf4A8
[35] Apple Announces iTunes-U on the iTunes Store [online] [Cites 7 November, 2009]
Access: http://www.apple.com/pr/library/2007/05/30itunesu.html
[36] TAG - definition of TAG by the Free Online Dictionary [online] [Cites 7 November,
2009]
Access: http://www.thefreedictionary.com/TAG
[37] Google Translate [online] [Cites 7 November, 2009]
Access: http://translate.google.com/
[38] Collegerama.nl [online] [Cites 28 October, 2009]
Access: http://www.collegerama.nl/
[39] Paul Copier – LinkedIn [online] [Cites 28 October, 2009]
Access: http://www.linkedin.com/in/paulcopier
[40] Open Course Ware: Home [online] [Cites 28 October, 2009]
Access: http://ocw.tudelft.nl/
[41] Inleiding Videolectures [online] [Cites 28 October, 2009]
Access: http://www.utwente.nl/videolecture/
[42] Collegerama Etalage [online] [Cites 28 October, 2009]
Access: http://collegerama.tudelft.nl
[43] Silverlight-op-Linux plugin Moonlight 1.0 beschikbaar [online] [Cites 2 November, 2009]
Access: http://webwereld.nl/nieuws/55091/silverlight-op-linux-plugin-moonlight-1-0beschikbaar.html
[44] YouTube Community Guidelines [online] [Cites 7 November, 2009]
Access: http://www.youtube.com/t/community_guidelines
[45] Adobe – Flash Player Statistics [online] [Cites 2 November, 2009]
Access: http://www.adobe.com/products/player_census/flashplayer/
[46] Learn More: Longer videos. - YouTube Help [online] [Cites 2 November, 2009]
Access: http://help.youtube.com/support/youtube/bin/answer.py?hl=en&answer=7167
3
[47] Getting Started: Video length and size - YouTube Help [online] [Cites 2 November,
2009]
Access: http://www.google.com/support/youtube/bin/answer.py?hl=en&answer=5574
3
List of URL’s
79
[48] PDF Reference – sixth edition [online] [Cited 7 November, 2009]
Access: http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
[49] Adobe Acrobat Connect Pro [online] [Cited 7 November, 2009]
Access: http://www.adobe.com/products/acrobatconnectpro/
[50] The Association for Computational Linguistics [online] [Cited 25 November, 2009]
Access: http://www.aclweb.org/
[51] Radio Oranje demo [online] [Cited 7 December, 2009]
Access: http://hmi.ewi.utwente.nl/choral/radiooranje.html
In: Choral (http://hmi.ewi.utwente.nl/choral/index.html), University of Twente, 2005-2010
[52] The EuroMatrix Project (Sept. 2006 – Febr. 2009) [online] [Cited 14 December, 2009]
Access: http://www.euromatrix.net/
[53] EuroMatrixPlus – Bringing Machine Translation for European Languages to the User
[online] [Cited 14 December, 2009]
Access: http://www.euromatrixplus.net/
[54] Wordle – Beautiful Word Clouds [online] [Cited 14 December, 2009]
Access: http://www.wordle.net
[55] Make A Tag Cloud [online] [Cited 17 December, 2009]
Access: http://www.makecloud.com
[56] Keyword Cloud Generator [online] [Cited 17 December, 2009]
Access: http://www.tocloud.com/
[57] How High-Tech Dream Shattered In Scandal at Lernout & Hauspie [online] [Cited 14
December, 2009]
Access: http://web.archive.org/web/20060420133237/http://www2.gol.com/users/coynerhm/how_high.htm
[58] Integrating the Probabilistic Model BM25/BM25F into Lucene [online] [Cited 18 January,
2009]
Access: http://nlp.uned.es/~jperezi/Lucene-BM25/
Annexes
A. Recorded lectures at MIT
B. Recorded lectures in Collegerama
C. Collegerama as vodcast
D. Subtitling of Collegerama
E. Speech recognition
F. Searching
Accompanying material
DVD 1. Lectures CT3011
Lectures CT3011 (28 lectures)
Folders per lecture; each folder contains the following material:
| Item | Description | Files | Total size (MB) |
| --- | --- | --- | --- |
| Lectures (video) | Collegerama videos of all lectures | 28 | 3,140 |
| Slides | PowerPoint presentations and abstracts | 622 | 76 |
| SHoUT | SHoUT output and transcripts | 117 | 272 |
| Total | | 767 | 3,488 |
Lectures CT3011, Lecture no. 15 Civiele Gezondheidstechniek
Folders for additional material produced from Lecture no. 15 Civiele Gezondheidstechniek:
| Item | Description | Files | Total size (MB) |
| --- | --- | --- | --- |
| Lecture presentation | PowerPoint file (plus pdf) | 2 | 19 |
| Collegerama_CT3011_11 | Collegerama files | 296 | 125 |
| Vodcast | Vodcast try-outs and results | 42 | 3,081 |
| Vodcast Camtasia | Screen captures Camtasia | 3 | 148 |
| Vodcast_TU | Vodcasts by TU Delft | 2 | 229 |
| Presenter | Presenter try-outs and results | 59 | 269 |
| Flash | Try-out for Flash | 31 | 981 |
| Subtitles | Subtitle files and try-outs | 24 | 122 |
| Speech recognition | Initial try-outs and evaluation of SHoUT | 12 | 13 |
| Total | | 471 | 4,987 |
Annex A. Recorded lectures at MIT
1. Massachusetts Institute of Technology
    Introduction
    Departments
    Students
    Staff
    OpenCourseWare
    Translated courses at MIT
2. Recorded lectures at MIT
    Lectures
    Recorded lectures in MIT OCW
    Composition
    Camera setup
    Transcripts, captions and annotations
    Technical specifications of videos
3. External publishing of MIT recorded lectures
    YouTube
    iTunes
    VideoLectures.net
    Academic Earth
1. Massachusetts Institute of Technology
Introduction
Massachusetts Institute of Technology (MIT) is a private research university located in
Cambridge, Massachusetts in the United States. It is one of the most prestigious technical
universities in the world. Its reputation rests on its scientific output, in the form of
articles and reports, and on the awards won by its staff. Seventy-three members of the MIT
community have won the Nobel Prize, including seven current faculty members.
According to their website, the mission of MIT is to advance knowledge and educate students
in science, technology and other areas of scholarship that will best serve the nation and the
world in the 21st century.
(Source: http://web.mit.edu/registrar/stats/yrpts/index.html,
http://web.mit.edu/facts/faculty.html, http://www.universityportal.net/2007/09/worlduniversity-ranking-of-engineering.html and http://web.mit.edu/aboutmit)
Figure 1.1: MIT logo
Departments
Education at MIT is organized into 6 "schools", which together comprise 30 departments,
sections and programs (http://web.mit.edu/education).
Table 1.1: MIT Schools with departments, sections or programs
| School | Department, section or program | ID |
| --- | --- | --- |
| Engineering | Civil and Environmental Engineering | 1 |
| | Mechanical Engineering | 2 |
| | Materials Science and Engineering | 3 |
| | Electrical Engineering and Computer Science | 6 |
| | Chemical Engineering | 10 |
| | Aeronautics and Astronautics | 16 |
| | Biological Engineering | 20 |
| | Nuclear Science and Engineering | 22 |
| | Engineering Systems Division | ESD |
| Science | Chemistry | 5 |
| | Biology | 7 |
| | Physics | 8 |
| | Brain and Cognitive Sciences | 9 |
| | Earth, Atmospheric, and Planetary Sciences | 12 |
| | Mathematics | 18 |
| Management (Sloan School) | Business / Management | 15 |
| Architecture and Planning | Architecture | 4 |
| | Urban Studies and Planning | 11 |
| | Media Arts and Sciences (Media Lab) | MAS |
| Humanities, Arts, and Social Sciences | Economics | 14 |
| | Political Science | 17 |
| | Anthropology | 21A |
| | Foreign Languages and Literatures | 21F |
| | History | 21H |
| | Literature | 21L |
| | Music and Theater Arts | 21M |
| | Writing and Humanistic Studies | 21W |
| | Comparative Media Studies | CMS |
| | Science, Technology, and Society | STS |
| Health Sciences and Technology (Whitaker College) | Health Sciences and Technology | HST |
The department IDs are numbered in the approximate order in which the departments were
founded.
Students
MIT has around 10,000 students (http://web.mit.edu/facts/enrollment.html), of whom some
25% are MSc students and some 35% are PhD students.
Table 1.2: MIT students per school and per degree level (2009-2010)
| Schools (Faculties) | Undergraduate | Graduate: MSc | Graduate: PhD | Special |
| --- | --- | --- | --- | --- |
| Sophomores (pre-academic), others | 1,092 | - | - | - |
| Engineering | 1,851 | 1,070 | 1,636 | 101 |
| Science | 827 | 13 | 1,047 | 11 |
| Management | 174 | 879 | 114 | 12 |
| Architecture and Planning | 69 | 402 | 179 | 3 |
| Humanities, Arts, and Social Sciences | 140 | 33 | 277 | 7 |
| Health Sciences and Technology | - | 23 | 337 | 2 |
| Total | 4,153 | 2,420 | 3,590 | 136 |
(Source: http://web.mit.edu/facts/enrollment.html)
There are around 3,000 international students registered at MIT (30%), most of them as
graduate students (85%) (http://web.mit.edu/facts/international.html). About 50% of these
international students originate from Asia and about 25% from Europe.
The total number of students at MIT is comparable to Twente and Delft. However, MIT has a
lot more PhD students and a relatively larger international population (Twente 12%, Delft
15%, for BSc and MSc only).
Table 1.3: Number of students at Twente and Delft
| University | BSc | MSc | Other | PhD |
| --- | --- | --- | --- | --- |
| Twente University (2008) | 5,409 | 2,099 | 737 | Not published (160 / year) |
| Delft University of Technology (2007) | 9,453 | 4,724 | 122 | 1,650 (229 / year) |
Twente: http://www.utwente.nl/feitenencijfers/onderwijs/totaal/inschrijvingen.doc/
Delft: http://www.tudelft.nl/live/pagina.jsp?id=dcb20543-c4f7-4e6d-891a-703b9e8f6701&lang=nl
Students who apply to MIT go through an evaluation process, which focuses on academic
potential, strong personal qualifications and outstanding interests, activities and
achievements (http://web.mit.edu/facts/admission.html). In 2008 alone, 13,396
candidates submitted their final applications for the freshman class and 1,589 (11.9%) were
offered admission. The actual first-year enrollment was 1,051 (66% of those admitted).
Applicants for graduate degree programs are evaluated for previous performance and
professional promise by the department in which they wish to register. In 2008, 17,271
candidates applied for a graduate education. Of the 3,680 candidates who received offers of
admission, 2,300 or 63 percent registered in advanced degree programs at MIT.
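The admission figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function name `rates` is hypothetical, and the inputs are the 2008 figures given in the text.

```python
def rates(applied, admitted, enrolled):
    """Return (admission rate, yield) as percentages.

    Admission rate = admitted / applied; yield = enrolled / admitted.
    """
    return 100 * admitted / applied, 100 * enrolled / admitted


# 2008 figures quoted in the text (freshman class and graduate programs).
undergraduate = rates(applied=13_396, admitted=1_589, enrolled=1_051)
graduate = rates(applied=17_271, admitted=3_680, enrolled=2_300)

print("undergraduate: %.1f%% admitted, %.1f%% of those enrolled" % undergraduate)
print("graduate:      %.1f%% admitted, %.1f%% of those enrolled" % graduate)
```

Expressing both steps as ratios makes it easy to compare the undergraduate and graduate intake on the same footing.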
Nine months' tuition and fees for 2008–2009 were $36,390. Summer term tuition in 2008 was
$12,045 for students enrolled in courses. Additionally, undergraduate room and board is
approximately $10,860, depending on the student's housing and dining arrangements. Books
and personal expenses are about $2,850 (http://web.mit.edu/facts/tuition.html). About 62%
of the undergraduates receive need-based financial aid and 87% of the graduate students are
supported by MIT fellowships, research assistantships, or teaching assistantships.
Staff
MIT employs about 11,500 people, including around 3,500 researchers, 650 professors, 213
associate professors and 146 assistant professors (http://web.mit.edu/facts/faculty.html).
This is far more than the Dutch technical universities employ, because of MIT's larger
research programs and facilities: the University of Twente employs 2,804 people (2008)
and Delft University of Technology 3,571 (2007).
OpenCourseWare
In 2000, MIT introduced the concept of publishing its course material on the internet for open
access, called OpenCourseWare (OCW) (http://ocw.mit.edu). It published the first
proof-of-concept site in 2002, containing 50 courses. By November 2007, MIT had completed the
initial publication of virtually the entire curriculum, over 1,800 courses in 33 academic disciplines.
(http://ocw.mit.edu/OcwWeb/web/about/history/index.htm)
MIT publishes some courses in one or more translated versions, including Spanish,
Portuguese, Simplified Chinese, Traditional Chinese, and Thai.
Since 2008 MIT has added audio and video-taped lectures to their OCW website. These
lectures were recorded between 1999 and 2008. These lectures are also published on
YouTube, iTunes and VideoLectures.net.
(Source: http://www.youtube.com/profile?user=MIT and
itms://deimos3.apple.com/WebObjects/Core.woa/Browse/mit.edu and
http://www.videolectures.net)
Figure 1.2: MIT OCW homepage
The OCW concept has received enormous attention worldwide, from students as well as
from universities. In 2005 the OpenCourseWare Consortium was established to advance
education and empower people worldwide through open courseware. At present about 200
higher education institutions and associated organizations from around the world are
members of this organization, including TU Delft, the Dutch Open University and HAN
University of Applied Sciences (Hogeschool van Arnhem en Nijmegen). Because of the
positive response to its OCW activities, MIT operates a rather large OCW office with close
to 20 people (http://www.ocwconsortium.org).
Translated courses at MIT
MIT has formally partnered with four organizations that are translating OCW course materials
into Spanish, Portuguese, Simplified Chinese, Traditional Chinese, and Thai. OCW materials
have been translated into at least 10 languages, including French, German, Vietnamese, and
Ukrainian.
Figure 1.3: MIT translated courses
(Source: http://ocw.mit.edu/OcwWeb/web/courses/lang/index.htm)
Our example course 18.06 Linear Algebra is available in Chinese (simplified), Portuguese and
Spanish.
Figure 1.4: Course 18.06 in Chinese
In the translated courses, the recorded lectures are not provided with subtitles. The previous
method of linking to RealMedia streams is used.
Subtitles are available for only 60% of the lectures published on YouTube. YouTube offers
the option of automatically translating all the subtitles into 43 languages using the
auto-translate function (enabled by Google Translate).
2. Recorded lectures at MIT
Lectures
Since 1999 MIT has recorded lectures for the most popular courses.
The recorded lectures are presented on MIT's internal network (http://web.mit.edu/[courseID],
presently migrating to a uniform umbrella site, http://stellar.mit.edu/). These websites function
as an e-learning system like BlackBoard or TeleTop.
Figure 2.1: E-learning for course 18.06 (past and present)
(Source: http://web.mit.edu/18.06/www/index.html and http://stellar.mit.edu/S/course/18/sp09/18.06/)
Initially these videos were presented as RealMedia files and offered in 3 different bandwidth
types: 56k, 80k and 220k. The highest quality version contains a video with a resolution of
320x240 pixels at 15 frames per second and audio sampled at 8,000 Hz and encoded at 8.5 Kbps.
Since 2005 more lectures were recorded. These lectures are published as RealMedia files
(220k), MP4 files (iTunes) and FLV (YouTube), all at 320x240 pixels at 15 frames per second.
Since 2008, lectures were recorded at a higher quality for the purpose of publishing them as
high quality videos on YouTube. These have a standard resolution of 480x360 pixels at 30
frames per second.
Some of the recorded lectures are provided with transcripts. These are also used for subtitling
and automatic subtitle translation on YouTube, where an icon labelled CC (which stands for
closed captions) lights up when captions are available.
A couple of published courses have Flash videos as further teaching material, sometimes with
voice narration added. These videos are all animations and do not contain any pre-recorded
material. They are also available in MIT-OCW under 'Lecture notes' or 'Assignments'.
Recorded lectures in MIT OCW
The recorded lectures have been included in the OpenCourseWare program since 2008. They are
published in a special section of the OCW website.
Figure 2.2: Multimedia section on MIT OCW
(Source: http://ocw.mit.edu/OcwWeb/web/courses/av/index.htm)
Figure 2.3: Video page of Lecture 33 of Course 18.06 on MIT-OCW
(Source: http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-005/VideoLectures/detail/lecture33.htm)
In OCW, most of the recorded lectures are offered as RealMedia (220k, with the
above-mentioned specifications) and MP4 (converted from the 220k RealMedia recording). For
streaming video, a link is presented to http://videolectures.net, from where the lectures can
also be downloaded.
These recorded lectures are also presented as embedded movies from YouTube. In the
YouTube player, the user can navigate to the related lectures of this course (via the standard
YouTube button at the bottom right). For some recorded lectures, transcripts are also
available and shown under the embedded YouTube movie.
Recorded lectures in MIT OCW may originate from an earlier year than the OCW course date
suggests. The above-mentioned course 18.06 is dated spring 2005, but the videos were
recorded live in the fall of 1999, and the attached reading is a book that was released in
February 2009.
More recent recordings are only presented on the MIT OCW website as MP4 downloads and
as embedded YouTube streaming video. This course is announced on VideoLectures.net (April
2009).
Figure 2.4: Video page of Lecture 1 of Course 5.60 on MIT-OCW (recent recorded lecture)
(Source: http://ocw.mit.edu/OcwWeb/Chemistry/5-60Spring-2008/VideoLectures/detail/embed01.htm)
Composition
Every MIT video is recorded with a camera aimed at the front of the classroom. You generally
see a walking professor explaining certain topics in front of a blackboard. The video camera
follows the professor and zooms in and out on the blackboard when the professor is writing
on it. Sometimes even part of the seating area of the classroom is visible and you can see
seated students and people walking in.
Figure 2.5: Screen shot for recorded lecture (Course 18.06 - Lecture 33)
(Source: http://www.youtube.com/watch?v=sWh92ZnYfZE)
Most MIT professors use only the blackboard. PowerPoint slides, overhead projectors or
projected illustrations are not often used. In some courses the professor uses an electronic
blackboard, computer projected slides or overhead projectors. The content of these slides
might be included in the video by zooming to the projected screen, or the recorded video
might show, at the relevant moment, a text screen referring to the lecture material. Most often
these slides are available as a PDF file under 'Lecture notes'.
Figure 2.6: MIT lecture with professor using slides, which are also included in the video
(Source: http://www.youtube.com/watch?v=R90sohp6h44)
Figure 2.7: MIT lecture with professor using an electronic blackboard, with related slides/screen slides (not recorded in the video)
(Source: http://www.youtube.com/watch?v=tynCH4dosA8)
Figure 2.8: Sometimes the video is blacked out for copyright reasons
(Source: http://www.youtube.com/watch?v=UxdUvyBtfXY)
The copyright issue is partly the reason why the recorded lectures at MIT do not show slides,
illustrations etc.
Camera setup
Initially, a lecture was recorded with two cameras, one camera for the overview and one
camera for the close-ups.
Figure 2.9: Two different camera angles during a lecture
(Source: http://www.youtube.com/watch?v=sWh92ZnYfZE)
More recent recordings were done with up to 4 cameras.
Figure 2.10: Two other camera angles during a lecture
(Source: http://www.youtube.com/watch?v=2x3F08_8B80)
All these multi-camera recordings were apparently done under the supervision of a movie
director, since some post-recording editing and camera angle selection would have to be
done.
Most of the latest recordings were done with a single camera. Details of demonstrations and
blackboard writings are visible in these recordings by intensively using the zoom function of
the camera:
Figure 2.11: Single camera recording
(Source: http://www.youtube.com/watch?v=kLqduWF6GXE)
Transcripts, captions and annotations
Around 60% of the recorded lectures of MIT are provided with a transcript. The transcripts
are presented on the MIT-OCW website, on the page of the related lecture under the
embedded YouTube movie. Most often transcripts are also available as a PDF file.
On YouTube these transcripts are used for the YouTube Caption option, which shows subtitles
in the bottom part of the movie. Captions or subtitles have been available on YouTube since
August 2008.
Figure 2.12: Recorded lecture with captions on YouTube
(Source: http://www.youtube.com/watch?v=sWh92ZnYfZE)
(Note the new "Turn down the lights"-button in the upper right corner)
The captions of this older recording are most probably created afterwards by computers,
considering the rather poor timing of the subtitles. More recent recordings have a much
better timing of the captions.
Figure 2.13: Recent (27 January 2009) published lecture with captions
(Source: http://www.youtube.com/watch?v=ZwpwmGP5ITM)
When a YouTube movie with captions starts, the player shows the caption language for a few
seconds in the upper left corner (English for MIT lectures). This text is also shown when the
captions are switched back on while the movie is playing.
Movies on YouTube can have captions in different languages. The viewer can select from the
available languages via the CC button. MIT lectures are uploaded to YouTube with English
captions only.
Alternatively, the viewer can start the auto-translate option, which provides an online
translation of the (selected) captions. The viewer can select from 43 languages, including
Dutch, Chinese (simplified and traditional) and Indonesian. Apparently, this option uses
Google Translate. The auto-translate option has been available on YouTube since November
2008, benefiting from the ongoing improvements and expansions of Google Translate.
(Source: http://www.youtube.com/blog?entry=oqBeXa7v_aE, http://translate.google.com/#
and http://www.google.com/intl/en/help/faq_translation.html)
Figure 2.14: Auto-translating MIT lectures in YouTube
(Source: http://www.youtube.com/watch?v=ZwpwmGP5ITM)
Figure 2.15: Online translation of captions (English to Dutch) in YouTube
(Source: http://www.youtube.com/watch?v=ZwpwmGP5ITM)
Since June 2008, YouTube has also been able to show annotations on movies, provided by the
uploader of the movie. MIT does not use this option (no annotations loaded).
(NB: since February 2009, viewers can also make annotations on viewed movies:
http://www.youtube.com/blog?entry=cfPYFjnzJIk)
Technical specifications of videos
In general the older recorded MIT lectures have a resolution of 320x240 pixels, at a frame
rate of 15 fps. VideoLectures.net uses a larger resolution for their wmv and flv files, resulting
in larger files but without any quality improvement. These files are probably converted from
the MIT MP4 files.
Table 2.1: Technical specifications of older recorded lecture movies (Course 18.06 - Lecture 33)
| Type | Location | Size (MB) | Specifications |
| --- | --- | --- | --- |
| RealMedia | MIT-OCW + VideoLectures | 59.4 | Video: RealVideo 8 / 320x240 / 15 fps / 188 Kbps; Audio: RealAudio 4.0 / 8.5 Kbps |
| MP4 | MIT-OCW + iTunes + VideoLectures | 89.8 | Video: MPEG4 Video / 320x240 (AR 1:1) / 15 fps; Audio: AAC 44100Hz / stereo / 1411 Kbps |
| FLV | MIT-OCW + YouTube | 96.7 | Video: Flash Video 1 / 320x240 / 15 fps; Audio: MPEG Audio Layer 3 22050Hz / mono / 8 Kbps |
| FLV | VideoLectures | 156.4 | Video: Flash Video 4 / 352x288 / 15 fps; Audio: MPEG Audio Layer 3 44100Hz / mono / 64 Kbps |
| WMV | VideoLectures | 366.3 | Video: Windows Media Video 9 / 352x288 / 15 fps / 650 Kbps; Audio: Windows Media Audio 44100Hz / stereo / 96 Kbps |
Length: 41:52 (rm, flv on YouTube), 41:59 (MP4, wmv, flv on VideoLectures)
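As a rough plausibility check, the size of a stream can be estimated from its bitrates and length. The sketch below does this for the RealMedia version (188 Kbps video, 8.5 Kbps audio, 41:52). It is an illustration only: the function names are hypothetical, constant bitrates are assumed, 1 Kbps is taken as 1000 bit/s, and container overhead is ignored. Under these assumptions the estimate lands within a few MB of the listed 59.4 MB, whichever definition of a megabyte is used.

```python
def estimate_size_mb(video_kbps, audio_kbps, duration_s, bytes_per_mb=1_000_000):
    """Estimate stream size in MB from constant video and audio bitrates.

    Assumes 1 Kbps = 1000 bit/s and ignores container overhead.
    """
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8 / bytes_per_mb


def mmss_to_seconds(length):
    """Convert a 'MM:SS' length such as '41:52' to seconds."""
    minutes, seconds = length.split(":")
    return int(minutes) * 60 + int(seconds)


# RealMedia version: 188 Kbps video + 8.5 Kbps audio, length 41:52.
duration = mmss_to_seconds("41:52")
print(round(estimate_size_mb(188, 8.5, duration), 1))               # decimal MB
print(round(estimate_size_mb(188, 8.5, duration, 1024 * 1024), 1))  # binary MB
```

The same check cannot be applied to the rows that do not list a video bitrate.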
In Google Video the 18.06 course is used to show that recorded lectures can be played at
double speed with the VLC media player
(http://sites.google.com/site/variablespeedlectures/).
More recent recordings have the same technical specifications for the MP4 files, but are also
available as HQ movies on YouTube, with a larger resolution and a larger frame rate.
Table 2.2: Technical specifications of recent recorded lecture movies (Course 5.60 - Lecture 1)
| Type | Location | Size (MB) | Specifications |
| --- | --- | --- | --- |
| MP4 | MIT-OCW + iTunes | 101 | Video: MPEG4 Video / 320x240 (AR 1:1) / 15 fps; Audio: AAC 44100Hz / stereo / 1411 Kbps |
| FLV | MIT-OCW + YouTube | 110 | Video: MPEG4 Video (H264) / 320x240 / 30 fps; Audio: AAC 22050Hz / stereo |
| FLV HQ | MIT-OCW + YouTube | 245 | Video: MPEG4 Video (H264) / 480x360 (AR 1:1) / 30 fps; Audio: AAC 44100Hz / stereo |
Length: 46:46 min (MP4 and flv)
The quality of the MP4 video is better than that of the older recordings, despite its comparable
technical specifications. This is most probably caused by the inferior RealMedia codecs used
in the older recordings.
On YouTube this video is available as High Quality (with HQ symbol), giving the viewer the
option to switch to 'normal quality'. The HQ YouTube recording has a larger resolution and
better audio than the MP4 file or the normal FLV file (480x360 / 44 kHz versus 320x240 / 22
kHz). This higher resolution significantly improves full screen display. The YouTube videos
have a higher frame rate than the MP4 file (30 versus 15 fps). This should give better
visibility of movements (either moving objects or camera movements).
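The difference between the two versions can be put into numbers by comparing the raw pixel counts. The snippet below is a small illustrative sketch (the helper `pixel_rate` is hypothetical). It only counts uncompressed pixels and says nothing about codec quality.

```python
def pixel_rate(width, height, fps):
    """Raw pixels displayed per second, before any compression."""
    return width * height * fps


normal = pixel_rate(320, 240, 15)  # MP4 file
hq = pixel_rate(480, 360, 30)      # HQ YouTube version

print((480 * 360) / (320 * 240))  # ratio of pixels per frame
print(hq / normal)                # ratio of raw pixels per second
```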
Since January 2009 YouTube is converting its video store from the original FLV format with
Sorenson codec into the FLV format with H.264 codec (MP4).
The video quality of the recently recorded MIT lectures is comparable to the Collegerama
lectures (wmv / 320x240 / 30 fps). The lectures on HQ YouTube have a better quality than
used in Collegerama.
The captions of the YouTube movies are supplied as either a SubViewer (*.SUB) or a SubRip
(*.SRT) file, these being the only formats supported by YouTube
(http://help.youtube.com/support/youtube/bin/answer.py?answer=100077). These formats
do not contain language information. The language of the caption file is added as part of the
YouTube upload procedure. It is possible to upload separate files for different languages to
the same movie. Apparently MIT lectures are uploaded to YouTube with English captions
only.
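To illustrate why a SubRip file carries no language information, the sketch below writes caption cues in the SubRip layout: a running number, a time range in the form HH:MM:SS,mmm --> HH:MM:SS,mmm, the caption text and a blank line. The function names and the example cues are hypothetical. This is a minimal sketch, not the tooling used by MIT or YouTube.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the SubRip timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues):
    """Build SubRip text from (start_s, end_s, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical cues, invented for illustration.
cues = [(0.0, 2.5, "First caption line"), (2.5, 6.25, "Second caption line")]
print(to_srt(cues))
```

Each block holds only a number, timing and text, so the language has to be supplied separately when the file is uploaded.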
As part of the upload procedure the uploader selects a font size for the captions.
The auto-translate option in YouTube uses the Google Translation API:
http://code.google.com/intl/nl/apis/ajaxlanguage/documentation/
3. External publishing of MIT recorded lectures
YouTube
YouTube (http://www.youtube.com) is the most popular website for video content. Nearly
20% of all global internet users visit YouTube with some 16 page views per visit (alexa.com,
ranked #3 after Google and Yahoo).
Since March 2009 (http://www.youtube.com/blog?entry=uvxBVPuf4A8) YouTube has had a
special section for Education (YouTube EDU: http://www.youtube.com/edu) in which about
150 universities and colleges from the USA have submitted some 25,000 videos (April 2009).
Not all of these are recorded lectures; they also include short movies (6 - 12 min.).
Figure 3.1: YouTube EDU section
(Source: http://www.youtube.com/members?s=ytedu_ms&gl=US)
Since January 2008 MIT has been publishing all its recorded lectures on YouTube. For this a
user section has been created (http://www.youtube.com/profile?user=MIT). With over
25,000 subscribers, MIT is the most popular EDU member. Berkeley and Stanford have
slightly fewer subscribers.
Figure 3.2: MIT channel on YouTube
(Source: http://www.youtube.com/profile?user=MIT)
At present (April 2009), a total of 893 recorded MIT lectures are available on YouTube, from
49 courses. The number of recorded lectures per course varies from 4 to 51 (on average 18).
The recorded lectures mostly have a length of around 50 minutes, but may vary between 40
and 120 minutes. Shorter videos are most often introduction videos. At present (April 2009)
around 30-40 recorded lectures are published each month.
YouTube has a play list for each course.
Figure 3.3: YouTube play list for course 18.06
(Source: http://www.youtube.com/view_play_list?p=E7DDD91010BC51F8)
The play list view gives links to all recordings of the course, including an introduction video if
available. The play list also shows the ratings and views for each of these videos.
The most viewed MIT lecture on YouTube is the first lecture of the course 18.06 Linear
Algebra (http://www.youtube.com/watch?v=gVMRuLH6FdQ). This lecture was viewed about
201,000 times between January 2008 and April 2009 (440 views per day). The succeeding
lectures were viewed roughly 5,000 to 6,000 times each (12 views per day).
Most probably only a very limited number of these viewers have watched the whole lecture.
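The views-per-day figures follow from dividing the view count by the length of the period. The sketch below reproduces them under stated assumptions: the text gives only months, so the start and end days (1 January 2008 and 1 April 2009) are assumed, as is the value of 5,500 views taken as the midpoint for the succeeding lectures. The function name is hypothetical.

```python
from datetime import date


def views_per_day(views, first_day, last_day):
    """Average number of views per day over an inclusive date range."""
    days = (last_day - first_day).days + 1
    return views / days


# Assumed period boundaries: the text only states January 2008 - April 2009.
start, end = date(2008, 1, 1), date(2009, 4, 1)

first_lecture = views_per_day(201_000, start, end)
later_lectures = views_per_day(5_500, start, end)  # assumed midpoint of 5,000-6,000

print(round(first_lecture), round(later_lectures))
```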
Figure 3.4: Views on YouTube of course 18.06 (period January 2008 - April 2009)
(Source: http://www.youtube.com/view_play_list?p=E7DDD91010BC51F8)
YouTube also allows for rating and commenting on the movies. The most viewed lecture (see
above) was rated 498 times and commented on 360 times (during 16 months).
Figure 3.5: Lecture page for the most viewed MIT lecture on YouTube
(Source: http://www.youtube.com/watch?v=gVMRuLH6FdQ)
This lecture is rated as 5 stars ('Awesome'). The comments are mainly positive ("free
lectures"). Later lectures of this course are rated and commented on much less. Lecture 33 has
received a rating of 4 and 1 comment (http://www.youtube.com/watch?v=sWh92ZnYfZE).
YouTube also registers the website from which the viewer has reached the page. For the
most viewed lecture around 10% originates from the MIT-OCW site. The rest might come
from YouTube itself, or might be unregistered.
Since February 2009, YouTube viewers can make annotations on viewed movies
(http://www.youtube.com/blog?entry=cfPYFjnzJIk). This feature can be disabled by the
uploader (owner) of the videos. Apparently, annotations are blocked by MIT.
Since February 2009, YouTube has been testing the possibility of downloading its movies via an
additional download button (http://www.youtube.com/blog?entry=Mp1pWVLh3_Y). Owners
of the movie (uploaders) may enable this option for viewers, either for free or for a fee. MIT is
not one of the testers for this option.
iTunes
iTunes is the media player of Apple. The program organizes and plays all kinds of digital
media. It forms the direct link to the "iTunes Store", from which users can buy and download
songs and other multimedia content. On January 6, 2009 Apple announced that over 6 billion
songs had been downloaded since the service first launched on April 28, 2003.
iTunes U is a part of the iTunes Store featuring free lectures, language lessons, audio books
and more. At present (April 2009) iTunes U holds over 100,000 educational audio and video
files from top universities, museums and public media organizations from around the world.
About 100 international universities and colleges have published content on iTunes U,
including MIT, Yale, Stanford, UC Berkeley, Oxford, Cambridge, Freiburg, Lausanne, TU
Aachen and Melbourne.
Figure 3.6: iTunes U in iTunes
MIT has a section on iTunes U, as one of its featured providers (alongside, among others,
Yale, Oxford and the UK Open University).
Figure 3.7: MIT section on iTunes U
In the MIT section the MIT-OCW content is clustered into 21 sections largely reflecting the
MIT departments. At present (April 2009) MIT has some 1,500 files available, including video,
audio, and transcripts.
Moreover the MIT section includes some 200 other files.
Selecting an OCW cluster shows the courses of the selected department.
Figure 3.8: MIT-OCW Mathematics
Selecting a course gives the video recording and transcripts of the selected course.
Figure 3.9: Play list of Course 18.06
Selecting a video by clicking the 'Get movie' button starts downloading the video, which is
stored in the user's library and can be played from there in iTunes.
Figure 3.10: Course 18.06 lecture 1 played in iTunes
In iTunes, the transcripts of the recorded lectures can also be downloaded (if available).
These downloadable files are the PDF-files which are also available on the MIT-OCW page of
the respective lecture. iTunes does not show the number of downloads and also does not
allow for ratings or comments by users.
VideoLectures.net
VideoLectures.net is a portal for recorded lectures (and interviews). It started in 2002 as a
project at the Jozef Stefan Institute in Slovenia. VideoLectures.net has offices in Slovenia and
the Ukraine. At present (April 2009), the portal contains nearly 7,000 videos from around 4,500
presenters all over the world.
Figure 3.11: Homepage of VideoLectures.net
(Source: http://videolectures.net/)
VideoLectures.net has no provisions for transcripts or for closed captions.
Since October 2008, MIT OpenCourseWare has its own portal site on VideoLectures.net,
which contains all recorded lectures of MIT-OCW.
Figure 3.12: MIT-OCW portal at VideoLectures.net
(Source: http://videolectures.net/mit_ocw/)
The MIT-OCW lectures are available as streams (flv and wmv) and as downloads (rm, MP4,
flv and wmv).
Each course has an overview page containing links to the MIT-OCW course and a visual
overview of all recorded lectures of that course. This overview includes the length of the
video and the number of views.
The first lecture of the 18.06-course has 681 views and the last lecture has 15 views. There
are 3 comments on this course. The course is most probably available since the beginning of
March 2009. This means some 17 views per day for the first lecture, compared with 440
views per day for this lecture on YouTube.
Figure 3.13: Homepage at VideoLectures of the 18.06 Linear Algebra course
(Source: http://videolectures.net/mit1806s05_linear_algebra/)
Each lecture has its own page, showing the available streams and downloads, as well as the
rating and comments of the viewers on the lecture itself. For rating and commenting, the
viewer should be logged in.
Figure 3.14: Video page of Lecture 33 of Course 18.06 on VideoLectures.net
(Source: http://videolectures.net/mit1806s05_strang_lec33/)
This lecture is available as:
• streams: FLV (JW-player Flash), WMV (Windows Media Player)
• downloads: FLV, MP4, RealMedia, WMV
There are no comments or ratings submitted by viewers of this lecture (nor for other lectures
of this course).
Transcripts and subtitles of the MIT lectures are not available at VideoLectures.net.
Academic Earth
Academic Earth (http://academicearth.org) is a website for recorded lectures from selected
universities. They collect videotaped lectures from the internet and present them on their own
website. Their website received more than one million visits in the first three months since its
January 18, 2009 beta launch, with more than 50% of users coming from outside of the US
(http://www.slideshare.net/academicearth/academic-earth-one-million-visits-1364816).
At present (April 2009) they show 60 videotaped courses (including 1,500 lectures) and 900
guest lectures from 6 USA universities, i.e. Berkeley, Harvard, MIT, Princeton, Stanford, and
Yale. They don't have any formal relation with these universities, but they collect the
videotaped lectures from the OpenCourseWare sites of these universities.
Figure 3.15: Homepage of Academic Earth
The courses and lectures are sorted by origin (university) and subject. Top ratings (visits) are
presented per course, playlist, lecture and instructor. Their website has a very colorful user
interface filled with pictures, which makes it very attractive to visitors.
By selecting MIT under Universities, the user gets an overview of the 23 MIT courses
included, listed by 'Subject' (Chemistry, Mathematics, etc.), as well as by 'Featured'
(Featured, Most viewed, Top rated).
Figure 3.16: MIT page on Academic Earth
(Source: http://academicearth.org/universities/mit/)
Figure 3.17: MIT courses on Mathematics at Academic Earth
(Source: http://academicearth.org/universities/mit/subject:20)
By selecting a course the viewer gets all lectures of this course.
Figure 3.18: Homepage at Academic Earth of the 18.06 Linear Algebra course
The lecture number at Academic Earth might differ from the lecture number on the original
OCW site. Lecture 33 of Course 18.06 of MIT is presented as Lecture 34 at Academic Earth
because the original course includes both a Lecture 24 and a Lecture 24B.
The video page of a lecture shows the video itself, as well as links for sharing on different
social networks (Facebook, Digg, etc.), code for embedding the video on the user's own webpage,
a citation of the original source, the available downloads, and subscription via iTunes and RSS
for video or audio.
Figure 3.19: Video page of Lecture 33 of Course 18.06 on Academic Earth
The available downloads include one video file (mp4, 240 MB, 320x240, 15 fps) and one
audio file (mp3, 12.4 MB). The downloaded video has the same specifications as the mp4 file
available at MIT OCW, but is nearly 3 times as large (240 vs. 89.8 MB). For Lecture
33, a transcript is also available at Academic Earth. The video page shows the mean rating
and the number of viewers who have rated the lecture. The actual number of views
is not shown.
Annex B. Recorded lectures in Collegerama
1. History
   Development of Collegerama
   The early years
   Collegerama as a service
   Collegerama live
   OpenCourseWare
   Collegerama for research congresses
   Collegerama for famous speakers
   Collegerama as streaming video server
   Collegerama at the University of Twente
2. Recording of a lecture
   Collegerama recording
   Presentation options
   Multimedia lectures with Collegerama recording
   Disturbed recording
3. Presentation of recorded lectures
   Course catalog
   Alternative display options
   Player options
   Navigation
4. Technical specifications
   Data storage
   Synchronization between slides and video
5. Evaluation
Annex B1. Collegerama lectures of CT3011 in OpenCourseWare
1. History
Development of Collegerama
The section Multimedia Services (MMS) of Delft University of Technology started in 2000 with
the development of Collegerama (www.collegerama.nl) in a pilot project on Streaming media
(http://www.linkedin.com/in/paulcopier). The main goal was the recording of lectures which
could be viewed by students within Blackboard (http://blackboard.tudelft.nl), the digital
learning system used at TU Delft. These 'web lectures' were regarded as instruments to
improve study results and to increase the efficiency of the education at TU Delft.
MMS selected the commercially available system Mediasite, by Sonic Foundry, as a basis for
Collegerama. The term Collegerama is a private brand created by the TU Delft, so that they
could be independent from the technical infrastructure for their web lectures.
Selecting a standard product avoids the high development cost for creating a new system. By
using an existing solution, the university also has the added benefit of getting new updates
and features within the Mediasite platform. At present Sonic Foundry claims to be the global
leader for enterprise webcasting (www.sonicfoundry.com). Mediasite is their flagship product
(www.sonicfoundry.com/mediasite/).
Figure 1.1: Collegerama is a TU Delft brand name for weblectures, at present based on the Mediasite platform
The early years
An overview of the developments of Collegerama can be found on the Collegerama main
catalog page (http://collegerama.tudelft.nl).
From September to December of 2004, 3 presentations were recorded with Mediasite as
part of tests of the technical infrastructure of Collegerama. These web lectures were filmed
with very poor audio recording equipment (no special microphone for the speaker) and at a
small video size (256x192 or 240x180). At that time, 240x180 was the
standard video size for Mediasite recordings.
Prior to this, in April and May 2004, Mediasite had already been used for recording 25 lectures
(40-45 minutes) of the BSc course TN2012 Quantum mechanics. This was the last year in which
professor Barend Thijsse, who was recognized as an outstanding teacher, taught this course.
He gave the course together with his successor, professor Leo Kouwenhoven. They
used a tablet PC as a blackboard and had speaker microphones attached to their shirts or
jackets. The recorded lectures were used during the succeeding years as a reference until a
drastic curriculum change in September 2008.
Figure 1.2: Collegerama recording of the 25 lectures of Barend Thijsse and Leo Kouwenhoven in 2004
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=735a8c5902864988b01157c16f8e632e)
In April of 2005, Collegerama was used for the recording of 4 presentations of the Future
Design competition. These presentations, with durations between 15 and 21 minutes, were
used for promotional purposes. The presentations were recorded in the MMS studio rather
than in a classroom. The video size was still small (240x180), but the speakers' voices were
now recorded with a small head microphone.
Figure 1.3: Collegerama recording of the Future Design competition in 2005
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=04f4279ee6c14782aee3d2603143e207 )
In January of 2006, Collegerama was used for recording of the closing speech by the Rector
Magnificus, Prof. Dr. Ir. J.T. Fokkema, at the 164th Dies Natalis of Delft University of
Technology. This was the start of a yearly tradition in recording these speeches. This time the
video was recorded at a resolution of 320x240, which is still the standard Collegerama video
resolution today.
Figure 1.4: The Collegerama recording of Dies Natalis in 2006 started a yearly tradition for Collegerama recordings
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b3338ad192f043acae7193f83e2033e5)
Between September and December of 2006 Collegerama was used for recording the 30
lectures (40-45 minutes) of the BSc course TN2545 Signals and Systems by prof. Lucas van
Vliet. Normally this course was given in the Dutch language. The recorded course was given
in English to allow non-Dutch speaking students to follow this course. The recorded lectures
consist of videos showing the lecturer and synchronized screenshots of a Tablet PC, used as
an interactive blackboard. These recorded lectures were actually used for several years, until
in September 2009 a new lecturer took over the course. They are currently available on
Blackboard as reference material.
Figure 1.5: Collegerama recording of course TN2545 in fall 2006, using a Tablet PC as interactive blackboard
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)
Collegerama as a service
Starting in September of 2007, Collegerama became part of the regular educational facilities at
TU Delft, under the responsibility of the University Corporate Office for Education and
Student Affairs (O&S). This office is also responsible for the electronic learning system
Blackboard.
As a consequence, recording of lectures was financed by the Corporate Office and became
free for the lecturers at the TU faculties. Before that time recordings were made at a rate of
€ 500 per recorded session of 45 minutes. Moreover the scheduling of recording units and
operators is now organized by O&S and lecturers can apply there to have their lectures
recorded.
This service has resulted in a dramatic increase in the number of recorded lectures. In
September and October of 2009, around 60 to 75 lectures were recorded each week (30 to 40
sessions of 2 lectures of 45 minutes each). This amounts to 5% of all lecture hours given
each week at TU Delft.
Eight mobile recording units are available for Collegerama. These units can be used at all
faculties of TU Delft. At the faculty of Mechanical Engineering, 2 lecture rooms are provided
with Collegerama recording equipment. In September 2009 this faculty was faced with a
huge student overflow. The 500 first-year students did not fit in the largest lecture room
available, which had a capacity of 300 students. To overcome this problem, they used two
lecture rooms for the first-year lectures. In one lecture room the lecturer gives the lecture
which is recorded and streamed to the other lecture room. The recorded lectures are
available afterwards via Blackboard. The faculty calls this service: "lectures in a movie
theater".
Figure 1.6: The Collegerama recording is streamed to the 'movie theater' next door at Mechanical Engineering
(Source: Delta 27, 17 September 2009)
By September 2009 a total of around 5,000 lectures had been recorded at TU Delft. A
Collegerama lecture requires approximately 200 MB of server storage capacity. The total
storage requirement of the Collegerama server is therefore around 1 TB, which is about 10% of
the storage requirement of the Blackboard system at TU Delft.
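The storage figures above can be checked with a few lines of arithmetic. The sketch below uses only the approximate numbers quoted in the text; the variable names and the use of decimal units (1 TB = 1,000,000 MB) are assumptions made for illustration.

```python
# Back-of-the-envelope check of the storage figures quoted above.
LECTURES_RECORDED = 5_000   # approximate total, September 2009
MB_PER_LECTURE = 200        # approximate server storage per lecture

total_mb = LECTURES_RECORDED * MB_PER_LECTURE
total_tb = total_mb / 1_000_000  # assumption: decimal units, 1 TB = 1,000,000 MB

# If 1 TB is about 10% of the Blackboard storage, Blackboard needs about 10 TB.
blackboard_tb = total_tb / 0.10

print(f"Collegerama: about {total_tb:.1f} TB")
print(f"Blackboard (implied): about {blackboard_tb:.0f} TB")
```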
Collegerama live
Each mobile recording unit of Collegerama has its own storage unit. After a lecture has
been recorded, the recording is uploaded to the central Collegerama server. It is also possible
to stream the recording to the server while it is being made, thus generating a live stream
to the outside world. This live streaming process has a delay of 5 to 10 seconds between
recording and broadcasting.
In the Collegerama setup, a URL for a Collegerama lecture is automatically created 4 hours
before the recording. This URL is published before the lecture starts, so that every student
can watch it from their own room or any other study location that has live Internet access.
This live streaming system was used for the course CT2011 Watermanagement in
September-October 2009. The course was moved within the curriculum from the third year to
the second year, which caused the student attendance to double to about 500 students. This
far exceeded the seating capacity of the largest classroom available at the
faculty of Civil Engineering (it holds only 350 students). To reduce the number of students
attending the lectures, they were scheduled on Monday and Friday during the first two
lecture hours. Moreover, it was announced that the lectures would be broadcast live. The
lectures received rather wide media attention under the title 'lecture from your bed'.
The system was a huge success. After the initial lecture, the number of attending students
dropped to around 100, with a large number of viewers during lecture hours or
several hours after the lecture. The movie theater lecture room stayed empty after the first
lecture.
Figure 1.7: The Collegerama-live recordings in September 2009 for the course CT2011 were announced as 'lectures
from your bed'
(Source: Delta 27, 17 September 2009)
OpenCourseWare
In March 2007, the TU Delft started a pilot project for OpenCourseWare
(http://ocw.tudelft.nl). In this pilot project the course material of around 20 MSc courses
from 6 different disciplines was published. Collegerama lecture recordings were part of this
material. This initiative was very well received and the Collegerama recordings were said to
be of extraordinary quality. Because of the national and international response, the TU Delft
decided in 2008 to continue its OpenCourseWare program on a larger scale.
Figure 1.8: Collegerama recordings form an important element in the OpenCourseWare program of TU Delft
(Source: http://ocw.tudelft.nl)
In October 2009, the TU Delft hosted the yearly conference of the OpenCourseWare
consortium, in which more than 200 universities worldwide are active
(www.ocwconsortium.org). The Director of Education and Student affairs of TU Delft is a
member of the board of the OCW Consortium.
TU Delft is working on the renewal of its OpenCourseWare website. One of the goals is to
give the recorded lectures more prominent exposure on the OCW website and to give the site
the look and feel of the original Blackboard courses.
Collegerama for research congresses
Collegerama is not only used for lectures.
The first scientific conference fully recorded with Collegerama at TU Delft was held in June
2007.
Figure 1.9: The first scientific conference fully recorded in Collegerama was held in June 2007
(Source: http://drinkwater.tudelft.nl)
Since then, more congresses, inauguration speeches and PhD defenses have been recorded with
Collegerama. One example is the yearly vacation course in Drinking water and
Wastewater. Every year over 400 water professionals take part in this
course to listen to presentations by national and international experts. The entire courses of
January 2008 and January 2009 were recorded with Collegerama. These recordings are
publicly available at http://drinkwater.tudelft.nl.
Collegerama for famous speakers
Collegerama offers the opportunity to record the speeches of famous speakers. Examples are
the recordings of the Dutch astronaut André Kuipers (in September 2004), of the famous
Italian designer Alberto Alessi (in December 2004), of the President of the Republic of
Mozambique Mr Armando Emilio Guebuza (in February 2008) and of the Dutch Crown Prince
Willem-Alexander (in June 2009) (http://collegerama.tudelft.nl).
Figure 1.10: The Dutch Crown prince as a speaker at the conference in Delft on Sustainable built environments in
June 2009
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=4af4ce446fe142189e76e920acfcdc15)
Collegerama as streaming video server
Collegerama is not only used for recorded lectures. It is also used as the streaming video
server for all kinds of video recordings. In 2006 the docusoap 'Delft Blauw' was broadcast on
RTL5 TV. This TV show followed the daily life of 9 students at the TU Delft who lived in a
boarding house in the old city of Delft. It was made in cooperation with the university to show
different aspects of its educational and research programs. A total of 13 episodes were
made, which are still available at the TU Delft website in the form of Collegerama
recordings.
Figure 1.11: The TV series Delft Blauw (recorded in 2006) can still be seen in Collegerama at the TU Delft website
(Source: http://www.tudelft.nl/live/pagina.jsp?id=260550da-bf3a-48a1-9a44-94e367d7f041)
In September 2009 the Executive Board of TU Delft started using Collegerama for video
messages, to communicate with employees and students of TU Delft. This is done in the form
of recorded monologues or interviews with an average duration of 5 to 10 minutes. At present,
no slides are used for these recordings.
The video messages and interviews are in Dutch and are provided with English
subtitles (burned into the video recording). The Collegerama recording is embedded in
the TU Delft website. The videos have also been uploaded to YouTube within the TU Delft channel
(http://www.youtube.com/user/tudelft). Apparently the university prefers embedding the
Collegerama viewer over embedding the YouTube upload (avoiding YouTube branding).
Figure 1.12: The TU Delft board uses Collegerama for video messages (October 2009)
(Source: http://www.tudelft.nl/live/pagina.jsp?id=e323c397-d00a-4721-9f15-67fb278e1b36 and
http://www.tudelft.nl/live/pagina.jsp?id=4e31e711-82fe-4971-8d33-53998efeadc8)
Videos of more than 10 minutes are only available at the Collegerama platform. At the time of
writing (October 2009), the TU Delft does not have a YouTube Edu account which would
allow the uploading of longer videos.
Collegerama at the University of Twente
In 2007 the University of Twente started a pilot project on videotaped lectures. This pilot
project used the experience of TU Delft with its Collegerama system. The same technical
infrastructure of Collegerama is also used at the University of Twente.
Within the Twente pilot project, the lectures of 10 BSc courses were recorded. One of
these was the course 214020 Algorithms, Data structures and Complexity. Between
November 2007 and January 2008, 8 of its lecture sessions were recorded.
Afterwards, the 7th lecture session turned out to be unavailable due to technical difficulties.
The recorded sessions include two lecture hours (40 minutes each) and the intermediate coffee
break (a 15 minute recording of a clock).
Initially these recorded lectures were hosted on the TU Delft Collegerama server:
http://collegerama.tudelft.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb
The URLs of these lectures were published on Teletop, the digital learning system of the
University of Twente. At some point in the summer of 2009 the recordings were moved to the
University of Twente servers (or domain):
http://videolecture.utwente.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb
Figure 1.13: Collegerama recording of the course 214020 at University of Twente in November 2007
(Source: http://videolecture.utwente.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb)
After each recorded course, an evaluation form was used to register the opinion of the
students. Based on the positive results of the pilot project it was decided to continue the
project. Since September 2008, lectures at the University of Twente can be recorded with
Collegerama, i.e. Mediasite (http://www.utwente.nl/videolecture/).
Figure 1.14: Videolectures as a service at the University of Twente since September 2008
(Source: http://www.utwente.nl/videolecture/)
At the University of Twente, 2 lecture rooms are available with recording facilities for
Collegerama/Mediasite (Horst C101 and Cubicus B209). There is also one mobile recording
unit available (Spiegel, for room 1, 2, 4 and 5). This mobile set can also be used in other
buildings and lecture rooms, if requested. The service for recording lectures is free of charge
and is provided by the ICT Service Centre of the University of Twente.
In September 2009, the University of Twente replaced Teletop with Blackboard as its digital
learning system. At present, the TU Delft and the University of Twente
use the same technical infrastructure for both the digital learning environment and the
system for streaming recorded lectures.
2. Recording of a lecture
Collegerama recording
Collegerama has two possibilities for recording lectures: a stationary
setup that has been installed in a few classrooms at the TU Delft, or the mobile
station, which can be used at any given location. Both systems include a fixed webcam
which can be operated remotely with a joystick. The operator, usually a student assistant,
makes sure that the camera is always pointed at the lecturer while the lecturer is moving
around the classroom.
Figure 2.1: Stationary recording unit for Collegerama
Figure 2.2: Mobile recording unit for Collegerama
The laptop that comes with the presenter unit is connected to a beamer, so that the
PowerPoint slides can be viewed in the classroom and recorded by the system. The recording
system takes screenshots of the computer screen that is visible on the beamer, based on
computer activity. Every 1 to 4 seconds, the system checks for a change on the computer
screen. If a different slide has been loaded or the position of the mouse has changed, a
new screenshot is saved as a JPEG image file. The disadvantage of this is that a lot of
redundant images might be saved and used in the eventual Collegerama presentation, which
is later published online.
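The change-based capturing described above can be illustrated with a small sketch. This is not the Mediasite implementation, which is not documented here. It assumes that each polling moment yields the raw screen content as bytes and that any difference from the previously saved screen triggers a new screenshot. The function name and the sample data are hypothetical.

```python
import hashlib
from typing import Iterable, List, Tuple


def capture_changed_screens(
    samples: Iterable[Tuple[float, bytes]],
) -> List[Tuple[float, bytes]]:
    """Keep only the samples whose screen content differs from the previous one.

    `samples` yields (timestamp_in_seconds, raw_screen_bytes) pairs, one per
    polling moment (every 1 to 4 seconds in the description above).
    """
    saved = []
    previous_digest = None
    for timestamp, screen in samples:
        digest = hashlib.sha256(screen).hexdigest()
        if digest != previous_digest:
            # The screen changed (new slide or mouse movement): store it.
            saved.append((timestamp, screen))
            previous_digest = digest
    return saved


# Hypothetical polling results: a mouse movement alone causes an extra image.
polled = [
    (0.0, b"slide 1"),
    (2.0, b"slide 1"),
    (4.0, b"slide 1 + mouse moved"),
    (6.0, b"slide 2"),
    (8.0, b"slide 2"),
]
saved = capture_changed_screens(polled)
print([t for t, _ in saved])  # prints [0.0, 4.0, 6.0]
```

In this example two slides produce three stored images, which shows on a small scale how mouse activity inflates the number of screenshots.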
After the lecture has been given, the data will be sent to the presentation server. It will
process the different data sources and create three different outputs:
• an audio/video stream (wmv file)
• pictures of the different PowerPoint slides or computer screenshots (jpeg files)
• different settings and additional information about the lecture (xml file)
The presentation server will synchronize all the different elements and will store the required
information in the XML file. This information will later be used to correctly display the video in
combination with the screenshots. When the presentation has been processed, it is written to
the Collegerama web server (http://collegerama.tudelft.nl) and is then available to students
with Internet access all over the world.
Presentation options
During the presentation, the lecturer is provided with three different presenting options:
• blackboard
  The lecturer uses the black board or an overhead projector to give his lecture, while the
  video camera records the content.
• PowerPoint
  This works in combination with a prepared set of PowerPoint slides that will be displayed
  while the presentation is being given.
• screen capturing
  The contents of the computer screen will be displayed during the presentation, which
  allows for the lecturer to use external software such as computer simulations or written
  text on a Tablet PC and record the results as separate screenshots.
Figure 2.3: Examples of the three presentation options
Each of these presentation options uses the same storage system, which is based on screen
activity. Every 1 to 4 seconds, the system checks the screen that is visible on the
beamer and, if it has changed, stores it as an image. Especially when a Tablet PC is used as a
blackboard or the desktop is captured, a very large number of images will be stored, since
every mouse movement and every change on the screen, such as writing down notes, causes a
new screenshot to be saved.
Collegerama uses a uniform view for all three presenting options, as is shown in the examples
given below.
Figure 2.4: Collegerama with black board
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=ca42dce5-bb51-4c39-93de-50528dd6b880)
Figure 2.5: Collegerama with PowerPoint
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=724886f7-cfd0-441d-ae85-1fae0cbb28a1)
Figure 2.6: Collegerama with Tablet PC
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)
The three presenting options differ significantly in the number of slides (or screenshots).
Table 2.1: Number of slides/screenshots for the three Collegerama examples
Presenting option   Number of slides        Navigation pages (list – small – large)
Blackboard          0 (no slides picture)   0 - 0 - 0
PowerPoint          30                      2 - 3 - 5
Screen capturing    308                     12 - 15 - 29
The figures show that Collegerama is very suitable for recording lectures in which a
PowerPoint presentation or a Tablet PC is used. In these cases the most detailed information
is presented in the presentation block. For a lecture with a blackboard only, the Collegerama
system is a little superfluous, but still gives a proper record of the lecture.
With respect to navigation, only lectures with PowerPoint seem to be suitable for a
Collegerama recording. Blackboard lectures lack the navigation by slides/screen shots, while
Tablet PC lectures have too many screenshots for a proper navigation. For the latter, the
screenshots can be clustered in chapters as part of the post processing of a Collegerama
recording.
Multimedia lectures with Collegerama recording
A multimedia lecture can be recorded within Collegerama by either recording the projected
movie or recording the screen (full time screen capturing).
The Collegerama operator can then choose between recording either the camera or the
presentation PC (full screen recording). During the full screen recording, the slide in the
Collegerama slide section is either the previous slide or screen shots of the movie.
Figure 2.7: Screen recording in Collegerama for multimedia lectures
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=f9379fda-848c-4a6f-8210-a5fe2b91edb7)
Figure 2.8: Screen recording in Collegerama for multimedia lectures
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b8c63b74-be43-4864-8a6c-17cf6a8130d4)
Disturbed recording
Recording of slides can be disturbed by unwanted screen activity caused by unintended
mouse movements. In one such case, a recording with 575 screenshots was obtained from 20
slides.
Figure 2.9: Collegerama recording with screen disturbance (575 screenshots from 20 slides)
(Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b8c63b74-be43-4864-8a6c-17cf6a8130d4)
3. Presentation of recorded lectures
Course catalog
All recorded lectures of a course are grouped in a 'Catalog' which can be accessed by its own
URL. This catalog functions as a directory of the recorded lectures. Figure 3.1 shows an
example of the catalog for course CT3011.
Figure 3.1: Course catalog in Collegerama
(Source: http://collegerama.tudelft.nl/mediasite/Catalog/?cid=16b5f5fa-0745-4b8b-9f02-f79a03abf50a)
Each lecture can be started by clicking the lecture hyperlink. The catalog gives metadata of
the lectures such as name of the lecture, the presenter, the recording (air) date and the
duration. The recorded lectures can be sorted by lecture name, date and presenter. The URL
of the catalog can be saved both as an RSS feed and as a browser link.
Most courses at the TU Delft do not use the catalog. Most often the lecture links are included
in the Blackboard site of the course, giving the lecturer more flexibility in presenting related
information, such as handouts, downloads, links to other course items etc.
Some lecturers give a short content list of a lecture in Blackboard, giving the student more
information about the content of the lecture. Each item in such a list can link directly into
the lecture by including the start time (in milliseconds) in the URL:
http://collegerama.tudelft.nl/mediasite/Viewer/?peid=7548f752-101b-417e-a4e7-58aebc595376&playFrom=1218000
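A direct link of this kind can be generated from a start time in minutes and seconds. The sketch below is only an illustration: the helper name `deep_link` and the placeholder lecture identifier are hypothetical, and it assumes that `peid` and `playFrom` are the only parameters needed, as in the example URL above (where 1218000 ms corresponds to 20 minutes and 18 seconds).

```python
from urllib.parse import urlencode

VIEWER_URL = "http://collegerama.tudelft.nl/mediasite/Viewer/"


def deep_link(peid: str, minutes: int, seconds: int) -> str:
    """Return a viewer URL that starts playback at the given offset."""
    play_from_ms = (minutes * 60 + seconds) * 1000  # playFrom is in milliseconds
    query = urlencode({"peid": peid, "playFrom": play_from_ms})
    return VIEWER_URL + "?" + query


# "your-lecture-peid" is a placeholder for the identifier of a real lecture.
print(deep_link("your-lecture-peid", 20, 18))
```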
Figure 3.2: Recorded lectures for course CT3011 in Blackboard
(Source: http://blackboard.tudelft.nl/webapps/blackboard/content/listContent.jsp?mode=reset&course_id=_24753_1&content_id=_1048479_1)
Alternative display options
Collegerama is displayed within the Mediasite player, which is based on Windows Media
Player and uses JavaScript to store configuration information. The display settings of the
player can be modified by changing the configuration files, which are JavaScript documents.
The size of the slides and video screen and their position within the player are determined by
the values in the file 'standalone-layout.js'. Additional settings are included in the file
'standalone-manifest.js'.
The position of the slides and video screen is controlled by the value
LayoutOptions.DefaultPosition. Possible values are:
1. video in left upper corner
2. video in right upper corner
3. video in right lower corner
4. video in left lower corner
Figure 3.3: Collegerama display with video in right upper corner
The display size of the slides and the video is controlled by the values of
LayoutOptions.VideoHeight, LayoutOptions.VideoWidth, LayoutOptions.SlideHeight and
LayoutOptions.SlideWidth. These settings should be consistent with the player size, which is
controlled by the values of LayoutOptions.PlayerHeight and LayoutOptions.PlayerWidth.
The default values for the video and slide sizes are 240, 320, 480 and 640 respectively, with
a player height and width of 584 and 983. These settings correspond to displaying the video
at 100% of its size and the slides at 62% of their size.
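The percentages follow from comparing the layout values with the native sizes of the recorded material (320x240 for the video and 1024x768 for the slides, as listed in the technical specifications). The sketch below reproduces this calculation. The dictionary is a Python stand-in for the layout values and does not reflect the actual syntax of 'standalone-layout.js'. The side-by-side width check is an assumption about how the player arranges video and slides.

```python
# Native sizes of the recorded material (width, height in pixels).
VIDEO_NATIVE = (320, 240)
SLIDE_NATIVE = (1024, 768)

# Default layout values quoted above (Python stand-in, not the real file syntax).
layout = {
    "VideoWidth": 320, "VideoHeight": 240,
    "SlideWidth": 640, "SlideHeight": 480,
    "PlayerWidth": 983, "PlayerHeight": 584,
}

video_scale = layout["VideoWidth"] / VIDEO_NATIVE[0]
slide_scale = layout["SlideWidth"] / SLIDE_NATIVE[0]

print(f"video shown at {video_scale:.1%}")   # prints "video shown at 100.0%"
print(f"slides shown at {slide_scale:.1%}")  # prints "slides shown at 62.5%"

# Assumption: video and slides sit side by side, so their combined width
# must not exceed the player width.
fits = layout["VideoWidth"] + layout["SlideWidth"] <= layout["PlayerWidth"]
print("fits in player:", fits)
```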
Figure 3.4: Collegerama display with a smaller video in right upper corner (50%, 120x160)
The other display settings are included in the file 'standalone-manifest.js'. This file includes
the time settings of the slides as well as the text in the viewer text block, such as title,
presenter info and time data.
Player options
The main player of Collegerama is split into three parts:
• video screen
• display of slides or Tablet PC
• information bar
Figure 3.5: Screenshot of the Collegerama web application
Video screen
The video screen shows a video recording of the lecturer talking, while the camera
follows the lecturer around. This is the only part of the lecture that shows continuous motion.
Around the video player are several controls that allow you to customize your lecture viewing
experience. You have the option of pausing the video and skipping back to the beginning.
You can also adjust the volume and enlarge the video to full-screen.
The last interesting option that is offered is the playback speed. In December of 2001, a
student at Brigham Young University in Utah wrote a paper called 'Variable Speed Playback of
Digitally Recorded Lectures', in which he analyzed the usefulness of altering the playback
speed. A plugin was released that allowed the video to be played at twice the original speed.
Afterwards he interviewed 625 students who took the course that was published via digitally
recorded lectures; 256 students in this group (41%) ended up purchasing and using the
plugin. Most of the buyers were very enthusiastic about the program and were very content
with it.
Figure 3.6: Screenshot of the video screen
Display of slides or presentation aid
The slide selector of Collegerama is located on the right side of the screen. You have three
separate options for selecting these slides:
• a list of slide numbers, their corresponding time and the title of the slide
• an overview of the following 16 slides in full color
• a larger overview of the following 6 slides in full color
Figure 3.7: Screenshot of the three different slide selection screens
By clicking on a slide, the video stream will automatically jump to the beginning of that
slide. The timing information is retrieved from the XML file that contains all the configuration
information. It is noticeable when using Collegerama that the focus has been placed on the
video stream. When the viewer clicks on a slide, it updates the video screen but not the
current slide view. It is necessary to leave the slide selection screen to see the current slide
in full-screen again. This is very distracting, because the most important content and
information about the presentation is contained in the presentation slides.
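The link between slides and video time can be sketched as a lookup in both directions: from a clicked slide to a video position, and from a video position back to the slide that should be shown. The code below is a minimal illustration and not the player's actual logic. The start times are hypothetical values of the kind that would be read from the XML file.

```python
from bisect import bisect_right

# Hypothetical slide start times in seconds, in ascending order.
slide_starts = [0, 95, 310, 640, 1218]


def seek_time_for_slide(index: int) -> int:
    """Clicking slide `index` moves the video to the start of that slide."""
    return slide_starts[index]


def slide_at(playback_seconds: float) -> int:
    """Return the index of the slide that is on screen at a given video time."""
    return bisect_right(slide_starts, playback_seconds) - 1


position = seek_time_for_slide(3)  # the viewer clicks the fourth slide
current = slide_at(position)       # the slide view could be updated to match
print(position, current)           # prints "640 3"
```

With the second lookup available, a viewer could update the slide view together with the video after a click, which would address the distraction described above.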
Information bar
This part of the interface contains some additional information about the lecturer, the title,
date and length of the lecture and some optional information. After the lecture has been
uploaded, the text on this screen can be updated. The problem is that lecturers at
the TU Delft generally don't have the opportunity to modify these text fields on the
Collegerama server. That means that this field is usually left empty and is therefore of
little use.
Figure 3.8: Screenshot of the information bar
Navigation
A characteristic about the navigation in Collegerama is that it's based on the video stream.
The foundation of the lecture is the webcam recording, to which the slides have been linked
based on time codes. This means that a movie is required to use the Collegerama system and
a simple collection of slides with possibly some audio narration is not be sufficient.
The only real way to navigate through the lecture is by using the slide navigation screens.
These show an overview of the different slides and allow the viewer to fast forward to them.
This works very well when the lecturer makes use of a PowerPoint presentation, because the
lecturer then generally talks for a couple of minutes about what is shown on each slide.
When the lecturer uses the Tablet PC option, in which they write down notes, the navigation
becomes problematic. Collegerama checks for changes in the screen during the presentation
every 1 to 4 seconds. If the lecturer is writing down notes, a large number of slides will be
generated since the screen is constantly changing. When the lecture is put on the server,
this type of presentation contains about 150 slides on average, most of them about 10
seconds long. This makes navigation through the lecture virtually impossible, because a slide
no longer represents a change in the subject but simply a small change on the screen.
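The effect can be made visible by computing slide durations from the time codes. The following is a rough sketch only: the time stamps are invented, the 15-second threshold is a hypothetical choice, and the function names are not part of Collegerama.

```python
def slide_durations(start_times_ms, lecture_length_ms):
    """Return the duration of each slide in milliseconds."""
    boundaries = sorted(start_times_ms) + [lecture_length_ms]
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


def count_short_slides(start_times_ms, lecture_length_ms, threshold_ms=15_000):
    """Count slides that are shown for less than the (assumed) threshold."""
    durations = slide_durations(start_times_ms, lecture_length_ms)
    return sum(1 for duration in durations if duration < threshold_ms)


if __name__ == "__main__":
    # Invented example: three short "handwriting" slides followed by two longer ones.
    starts = [0, 9_000, 20_000, 28_000, 240_000]
    print(slide_durations(starts, 300_000))
    print(count_short_slides(starts, 300_000))
```

A lecture in which most slides fall below such a threshold is a sign that the slide list follows screen changes rather than changes of subject.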
4. Technical specifications
Data storage
Every presentation is split up into several parts:
• video
• audio
• slides
• XML
• presentation viewer
If you compare the file sizes of the individual parts, a considerable difference can be
observed. The video stream takes up by far the most space, while the slides are relatively
small. If only a combination of slides and audio were available, the complete lecture would be
roughly one fifth of the size it has with the video stream included.
Table 4.1: Storage and quality information of a Collegerama lecture (PowerPoint method)
Stream | Data size | Length / amount | Recording quality
Video | 91.4 MB | 45:09 minutes | Windows Media Video 9 codec; 320x240 pixels; 30 fps; 350 kbps bitrate
Audio | 20.6 MB | 45:09 minutes | Windows Media Audio 9 codec; 48 kbps; 44 kHz mono (A/V); CBR encode mode; 48024 bitrate; 44100 sample rate
Slides | 6.32 MB | 29 slides | 1024x768 pixels; 96 pixels/inch
XML | 21 KB | - | -
Presentation viewer | 1.75 MB | - | -
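The size comparison can be checked against the figures in Table 4.1 with a small calculation. This is only an illustrative sketch: the dictionary and function names are made up for this example, and the XML size is converted under the assumption that 1 MB equals 1000 KB.

```python
# Sizes in MB as listed in Table 4.1 (XML converted from 21 KB, assuming 1 MB = 1000 KB).
SIZES_MB = {
    "video": 91.4,
    "audio": 20.6,
    "slides": 6.32,
    "xml": 0.021,
    "viewer": 1.75,
}


def share_without(parts_to_drop, sizes=SIZES_MB):
    """Return the remaining size as a fraction of the full lecture size."""
    total = sum(sizes.values())
    kept = sum(size for name, size in sizes.items() if name not in parts_to_drop)
    return kept / total


if __name__ == "__main__":
    # Everything except the video stream.
    print(f"without video:       {share_without({'video'}):.1%}")
    # Only slides and audio, as in the comparison in the text.
    print(f"slides + audio only: {share_without({'video', 'xml', 'viewer'}):.1%}")
```

With these figures both fractions come out between one fifth and one quarter of the full lecture size.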
Synchronization between slides and video
The different contents of the presentation need to be synchronized, so that when the lecture
is viewed, all the streams (video, audio, slides) are displayed at the correct time. This is
accomplished using an XML file. Apart from the timing information, it also contains all
relevant information concerning the lecture. The markup file is divided up into 9 sections:
• presentation
• folders
• profile
• slides
• chapters
• presenters
• polls
• external links
• viewer
Presentation element
This element contains all the information about the presentation. It defines the owner, shows
a creation and modification date, the title of the lecture and information about the time zone
of the recording. There's also a link included to the location of the lecture on the server and
in what online directory the file is located.
<Id>
<Value>f33ba7ff-0160-4259-bd94-7ee0d9c5a461</Value>
<EntityType>Presentation</EntityType>
</Id>
<Owner>Kees</Owner>
<CreationDate>2008-11-05T09:56:29Z</CreationDate>
<LastModified>2008-11-05T09:56:29Z</LastModified>
<RootId>
<Value>f33ba7ff-0160-4259-bd94-7ee0d9c5a461</Value>
<EntityType>Presentation</EntityType>
</RootId>
<Title>CT3011/11</Title>
<FolderId>
<Value>16b5f5fa-0745-4b8b-9f02-f79a03abf50a</Value>
<EntityType>Folder</EntityType>
</FolderId>
Figure 4.1: XML snippet of the <presentation> element
Folders element
A recorded lecture is usually stored in more than one folder, since most lectures
belong to several bachelor or master programs. Each folder entry also contains an owner ID, a
creation and modification date of the folder and an identifier that can be used to locate the
folder on the web server.
<Folder>
<Id>
<Value>b2d9aa4e-c67a-4200-9297-3670289bfea7</Value>
<EntityType>Folder</EntityType>
</Id>
<Owner>Jos-pc</Owner>
<CreationDate>2009-02-03T13:29:28Z</CreationDate>
<LastModified>2009-02-03T14:11:28Z</LastModified>
<Name>TU Delft COLLEGES</Name>
<ParentId>
<Value>11cd7471-86a3-4d32-a5cd-300dca2a78bf</Value>
<EntityType>Folder</EntityType>
</ParentId>
<Type>Folder</Type>
</Folder>
Figure 4.2: XML snippet of the <folders> element
Profile
The profile holds all the relevant information about the video and audio streams that have
been stored. For the video stream it holds the creation date, the frame rate, the resolution
and the video codec used (in this case the Windows Media Video 9 codec). For the audio
stream it contains the bit rate, sample rate, number of channels and the encoding mode.
Slides
This part of the configuration file holds the timing information for the presentation. Every
slide has a unique identifier, an entity type (such as Presentation), a number, the filename
and the timing index in milliseconds. With this information, the viewer application knows at
which timeframe the image in the slide viewer needs to change. The timeline of the video
stream functions as the foundation for the whole Collegerama presentation system and all
slide components rely on it to be correct.
<Slide>
<PresentationId>
<Value>f33ba7ff-0160-4259-bd94-7ee0d9c5a461</Value>
<EntityType>Presentation</EntityType>
</PresentationId>
<Number>1</Number>
<Time>47</Time>
<SlideImageFileNameTemplate>slide_{0:D4}_full.jpg</SlideImageFileNameTemplate>
</Slide>
Figure 4.3: XML snippet of the <slide> element
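How a viewer application could use this timing information can be sketched as follows. This is an illustration, not Collegerama's actual implementation: the `<Slides>` wrapper element, the function names and the sample times of slides 2 and 3 are assumptions. Only the `<Slide>`, `<Number>`, `<Time>` and `<SlideImageFileNameTemplate>` elements are taken from Figure 4.3, and the `{0:D4}` placeholder is read here as a zero-padded four-digit slide number.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical sample data: a <Slides> wrapper around <Slide> elements shaped like Figure 4.3.
SAMPLE_XML = """
<Slides>
  <Slide><Number>1</Number><Time>47</Time>
    <SlideImageFileNameTemplate>slide_{0:D4}_full.jpg</SlideImageFileNameTemplate></Slide>
  <Slide><Number>2</Number><Time>95000</Time>
    <SlideImageFileNameTemplate>slide_{0:D4}_full.jpg</SlideImageFileNameTemplate></Slide>
  <Slide><Number>3</Number><Time>210500</Time>
    <SlideImageFileNameTemplate>slide_{0:D4}_full.jpg</SlideImageFileNameTemplate></Slide>
</Slides>
"""


def load_slides(xml_text):
    """Return a list of (time_ms, number, template) tuples sorted by time."""
    root = ET.fromstring(xml_text)
    slides = []
    for slide in root.iter("Slide"):
        slides.append((
            int(slide.findtext("Time")),
            int(slide.findtext("Number")),
            slide.findtext("SlideImageFileNameTemplate").strip(),
        ))
    return sorted(slides)


def image_name(template, number):
    """Expand '{0:D4}' as a zero-padded four-digit number (assumed meaning)."""
    return template.replace("{0:D4}", f"{number:04d}")


def slide_at(slides, position_ms):
    """Return the image file name of the slide visible at a playback position."""
    times = [time_ms for time_ms, _, _ in slides]
    index = bisect.bisect_right(times, position_ms) - 1
    if index < 0:
        return None  # playback has not reached the first slide yet
    _, number, template = slides[index]
    return image_name(template, number)


if __name__ == "__main__":
    slides = load_slides(SAMPLE_XML)
    print(slide_at(slides, 120_000))
```

Because the lookup is driven entirely by the playback position, the sketch also shows why every slide component depends on the video timeline being correct.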
Chapters
This element is used to store additional information about each slide, such as a title and
additional text. This information is displayed in the slide listing of the Collegerama
application while selecting another slide. At present, this element is not used very often,
since the information has to be added manually after the presentation has been published.
Presenters
Since every lecture is given by one or more presenters, this information is also stored with
the presentation. It contains some general contact information and provides a link to the
photo of the lecturer. This photo is displayed during playback of the given lecture on the
Collegerama web application.
<Presenter>
<Id>
<Value>0cdd3b98-9bd5-4275-8afc-38ada3eb03a7</Value>
<EntityType>Presenter</EntityType>
</Id>
<Owner>MediasiteAdmin</Owner>
<CreationDate>2008-11-05T09:46:56Z</CreationDate>
<LastModified>2008-11-10T15:33:28Z</LastModified>
<FirstName>J.C.</FirstName>
<MiddleName>van</MiddleName>
<LastName>Dijk</LastName>
<BioUrl>http://tudelft.nl/live/pagina.jsp?id=e1c07f7e-f00d-4aa1-837a-27798eab23c6</BioUrl>
<EmailAddress>J.C.vanDijk@tudelft.nl</EmailAddress>
<ImageName>0cdd3b98-9bd5-4275-8afc-38ada3eb03a7.JPG</ImageName>
<ImageUrl>http://collegerama.tudelft.nl/mediasite/MediasiteData/Presenters/0cdd3b98-9bd5-4275-8afc-38ada3eb03a7/0cdd3b98-9bd5-4275-8afc-38ada3eb03a7.JPG</ImageUrl>
<AdditionalInfo />
</Presenter>
Figure 4.4: XML snippet of the <presenter> element
Polls
Here you can enter polls and questions about the lecture. The TU Delft does not make use of
this feature.
ExternalLinks
Here you can add external links that are relevant for the lecture. The TU Delft does not make
use of this feature.
Viewer
This contains information about the viewer, such as dimensions, resolution, a link to the title
banner etc. You can also update the images used while loading, slide start, slide end and
other pictures.
5. Evaluation
Collegerama provides a good and stable platform for the distribution of online video lectures.
Its biggest advantage is that only a video stream is needed to upload a lecture; additional
slides or screenshots are optional, since the system relies on the video stream as its
basic timeline.
The disadvantage is that the navigation is weak. The creators built the system with
the idea that the main focus should be on the video and not on the slides. This is a
questionable choice, since the main storyline and structure of most lectures are based on the
keywords that are placed on the PowerPoint slides. When navigating through the slides, the
viewer loses sight of the current slide, which shows where the lecturer is in the
story.
Another problem is that it is not possible to search within the lectures. The text
slides are converted to pictures, which makes searching them difficult. The quality
of these screenshots is also reduced, since the slides are no longer vector-based but
converted to still images.
Overall, Collegerama is a good system, but it has a few flaws that could be
fixed to make it easier to work with, and several aspects could be enhanced
to improve its usability for students.
Annex B1. Collegerama lectures of CT3011 in OpenCourseWare
Annex C. Collegerama as single movie/audio file
1. Single movie or audio file
   Benefits of a single movie or audio file?
   Podcast, vodcast or what's in the name?
   Multimedia container files
   Developments in multimedia container formats
   Developments in multimedia systems for movies and home theaters
   Technical specification for video streams/files
   Technical specification for audio streams/files
2. Collegerama for YouTube
   What is YouTube?
   Video formats for YouTube
   Components of a Collegerama vodcast
   Collegerama as vodcast for YouTube
   Vodcast production
   One step recording for vodcast production
   Uploading to YouTube
   Downloading of vodcasts from YouTube
   Conclusions and recommendations for vodcasts on YouTube
3. Collegerama for iTunes
   What is iTunes?
   Video formats for iTunes
   iPod constraints for Collegerama vodcasts
   Components of a Collegerama vodcast for iTunes
   Vodcast production
   Vodcast production by TU Delft
   Uploading to and downloading from iTunes
   Conclusions and recommendations for vodcasts on iTunes
4. Evaluation
   iTunes versus YouTube
   Alternative download options
   Future developments of Collegerama
1. Single movie or audio file
Benefits of a single movie or audio file?
At present, lectures recorded in Collegerama can only be viewed as streaming video with an
Internet connection to the Collegerama server (http://collegerama.tudelft.nl).
This setup has several advantages, such as:
• no distribution channels required, which avoids their institutional and technical requirements
• single point of entry, which simplifies updating (of the content as well as the player)
• no storage required at the point of viewing/listening
Severe drawbacks of this setup are:
• no distribution via popular channels, such as YouTube and iTunes (and therefore none of
their worldwide exposure)
• no offline viewing/listening, so the lecture has to be streamed again for every view
In order to overcome these drawbacks, a single movie or audio file has to be created
from a Collegerama recording, which can then be used in most distribution channels. A single
video file also makes it easier to add options such as subtitling or voice narration.
Each distribution channel requires its own specific technical specifications. To avoid the
production of a wide range of files, only the following distribution channels are taken into
consideration:
• movie files: YouTube and iTunes
• audio files: iTunes
These two platforms have been selected based on:
• their worldwide exposure
• the acceptance of their technical specifications by other external platforms
• the experiences of MIT (see separate document)
• the usability of these technical specifications in TU Delft's own BlackBoard, OCW and other
web platforms
Figure 1.1: Combining the Collegerama components into a single video file enables distribution of recorded lectures
For creating a single movie and/or audio file, the following aspects should be determined:
• content (slides, audio, video, subtitles, and any combination of these)
• presentation of the content (layout, introduction tune/movie, branding)
• video quality (resolution, frame rate)
• format of video file (mov, wmv, flv, mp4, codec etc.)
• audio quality (stereo/mono, frequency range)
• format of audio file (mov, mp3, mp4, codec etc.)
The above-mentioned technical specifications (quality, codec) primarily determine the file size.
The technical specification should strike a balance between quality/usability and quantity
(download time and storage requirements).
Podcast, vodcast or what's in the name?
Single audio files are often referred to as "podcast" files. The term podcast originates from
the iPod, as a contraction of iPod and broadcasting. In the wake of this term, single movie
files are often referred to as "vodcast" files. Originally these were downloaded files, since
iPod and iTunes did not support streaming content. The meaning of these terms later shifted to
"audio on demand" or "video on demand (VOD)", in combination with an RSS feed. This
audio or video can also be streamed, without actual distribution of a file.
Figure 1.2: The RSS logo (background, brown) is combined with the movie icon to indicate a vodcast distribution
A video file always contains a digital audio stream and a digital video stream, combined into
a single file. The advantage of downloading the complete vodcast (or video podcast) is that it
can later be played offline on a PC or some other portable multimedia device. Once the
complete video has been downloaded, it can be watched multiple times without causing additional
bandwidth usage on the server, or even connecting to the server (offline viewing or listening).
Streaming allows for skipping parts of the video without downloading the whole content, but
users may have to face pauses in playback due to slow transfer speeds. Downloaded files
respond much faster to actions by the user. Recorded lectures are often
watched more than once, and often with user actions such as skipping or rewinding some
passages.
Multimedia container files
Digital multimedia files consist of at least two digital streams:
• video stream
• audio stream
These two streams are synchronized by an Audio to Video Synchronization process (A/V-sync).
The video and audio streams, together with the synchronization information, are stored in a
single multimedia container file or stream. In these container files, the digital streams are
stored in a compressed form in order to reduce the file or stream size. Several different
codecs (compression-decompression systems) have been developed for the compression of
these streams. Decompression is done in the digital media player during display.
Table 1.1 gives an overview of the most popular container formats. A container file format
supports one or more codecs, as indicated in this table. Some container file formats can also
include subtitle files, menu systems and/or metadata files. For other container formats this
information is provided in attached files.
Table 1.1: Most popular multimedia container file formats
File extension | Owner | Video formats | Audio formats
avi | Microsoft | Almost anything through VFW; H.264/AVC is problematic due to the limited B-frame support | Almost anything through ACM; Vorbis is problematic
rm | RealNetworks | RealVideo 8, 9, 10 | (HE)-AAC, Cook Codec, Vorbis, RealAudio Lossless
mpg | MPEG | MPEG-1, MPEG-2 | MPEG-1 Layers I, II, III (mp3), other formats only in private streams: LPCM
mov | Apple | Limited to what is available to the QuickTime codec manager | Limited to what is available to Sound Manager or CoreAudio
asf, wma, wmv | Microsoft | Almost anything through VFW or DMO; H.264/AVC is problematic | Almost anything through ACM or DMO; Vorbis is problematic
flv | Adobe | Sorenson, VP6, Screen Video, H.264/MPEG-4 AVC | MP3, Nellymoser, ADPCM, Linear PCM, AAC, Speex
mp4 | MPEG | MPEG-4 ASP, H.264/MPEG-4 AVC, H.263, VC-1, Dirac, others | MPEG-1 Layers I, II, III (MP3), MPEG-2/4 (HE)-AAC, AC-3, Vorbis (with private objectTypeIndication), Apple Lossless, others
mkv | public domain | virtually anything | virtually anything
(Source: http://en.wikipedia.org/wiki/Container_format_(digital) and
http://en.wikipedia.org/wiki/Comparison_of_container_formats)
Developments in multimedia container formats
Multimedia container formats have been under development since the 1980s and 1990s,
when PCs became sufficiently powerful to display videos and movie files. Over time much
better movie quality was achieved, driven by the following developments:
• better codecs resulting in smaller files, enabled by faster and more powerful processors
• larger file sizes, enabled by cheaper data storage
• larger stream sizes, enabled by faster Internet connections
These developments were incorporated into the container formats of each of the respective
suppliers/owners, and are reflected in the successive version numbering of these
formats. Microsoft's wmv format, for example, is internally indicated as Windows Media Video 9,
labeled wmv3 for short. Similar developments can be seen for the other suppliers.
(Source: http://www.microsoft.com/windows/windowsmedia/forpros/codecs/video.aspx#WindowsMediaVideo9VCM)
The only exception is the development of the mp4 extension. This is an open standard
supported by many suppliers. To distinguish it from the older mpg extension (for MPEG-1
and MPEG-2), a new extension has been introduced.
Developments in multimedia systems for movies and home theaters
In 1977, the Video Home System (better known as VHS) was introduced by JVC. There
were several rival formats that were competing to be the leading video format, with Sony's
Betamax being its fiercest competitor. JVC's standard offered a longer playing time and had
the advantage of a far less complex tape transport mechanism. Early VHS machines could
also rewind and fast forward the tape considerably faster than a Betamax VCR.
(Source: http://besser.tsoa.nyu.edu/impact/f96/Projects/jchyung/)
By the 1990s, the VHS format had become the standard for distributing movies and videos
throughout the consumer market. The problem with this format was that it carries
an analogue signal. In 1982, Sony and Philips developed a new standard for audio and data
storage called Compact Disc (CD). In that same year, Sony Music Entertainment released the
first music album on the new digital Compact Disc format. From then on, it became the
standard for the distribution of digital audio, while video was still being released in an
analogue format.
This trend continued for the following ten years, until in 1993 a new video format was
developed based on Sony's original CD technology, called Compact Disc Digital Video or Video
CD (VCD). It contained an MPEG-1 video stream with a resolution of 352x240 for NTSC and
352x288 for PAL. The overall picture quality was intended to be comparable to VHS video
(although poorly compressed VCD video can sometimes have a lower quality than VHS
video). The advantage of VCD was that its imperfections take the form of block artifacts
rather than analog noise, and that the picture does not deteriorate further with each use.
(Source: http://www.philipsmuseumeindhoven.nl/phe/products/e_cd.htm and
https://www.ip.philips.com/view_attachment/2450/sl00812.pdf)
In 1997, a new disc format called Digital Video Disc (DVD) was introduced, which offered
about 7 times the storage capacity of a CD. This increase in space allowed for the distribution
of movies at a much higher quality and video resolution. These videos were stored using a
new format called MPEG-2, at a resolution of 720x480 for NTSC and 720x576 for PAL. It also
supported wide-screen video with an aspect ratio of 16:9. DVD became, and at the time of
writing still is, the standard for all public movie and video productions.
In 2006, a new format called Blu-ray Disc (BD), designed by Sony, Philips and Panasonic, was
released as the successor to DVD. However, unlike previous format changes (e.g. audio tape
to compact disc, VHS videotape to DVD), there is no immediate indication that production of
the standard DVD will gradually wind down, as DVDs still dominate with around 87% of video
sales and approximately one billion DVD player sales worldwide.
Table 1.2 gives the technical details for the above-mentioned systems. The developments
within the movie piracy scene are displayed in Table 1.3. The main differences from the
commercial systems are the use of new technology on older media, enabled by better processors
and cheaper media, and the use of single-file systems, which make playback on the end
user's PC easier.
From these developments it can be seen that future digital movie systems will be more and
more based on MPEG-4 within the Blu-ray specifications.
Table 1.2: Developments in digital video for movies and home theaters
Property | VCD | DVD | Blu-ray
Year of introduction | 1993 | 1997 | 2006
Base size | 700 MB | 4.7 GB | 20 GB
Max size | 700 MB (mode 1); 800 MB (mode 2/xa) | 4.7 GB (single-sided, single-layer); 8.54 GB (single-sided, dual-layer); 17.08 GB (double-sided, dual-layer) | 20 GB (single layer); 50 GB (dual layer)
Video encoding | MPEG-1 | MPEG-2 | MPEG-4 with H264; VC1; MPEG-2 (backward compatible)
Audio encoding | MPEG-1 Audio Layer II; up to 44.1 kHz; dual channel or stereo | DVD-Audio; up to 192 kHz; up to 6.1 surround | AAC; up to 192 kHz; up to 7.1 surround
Resolution | 352x240 (NTSC); 352x288 (PAL) | 720x480 (NTSC); 720x576 (PAL) | 1920x1080
Bitrate | 1,150 kbits/sec | - | 29.4 Mbit/sec
Aspect ratio | 4:3 | 4:3 or 16:9 | 16:9
FPS | 29.97 or 23.976 (NTSC); 25 (PAL) | 29.97 (NTSC); 25 (PAL) | -
Creator | Sony, Philips, Matsushita and JVC | Sony, Panasonic, Toshiba, Fox Studios, Warner Brothers, Philips | Blu-ray Disc Association (Sony, Panasonic, Pioneer, Philips, Thomson, LG Electronics, Hitachi, Sharp and Samsung)

Table 1.3: Developments in digital video within the movie piracy scene
Property | DivX | XviD | Matroska
Year of introduction | 1998 | 2001 | 2006
Purpose | Compression of MPEG-4 | Compression of MPEG-4 | Container for MPEG-4 files
File extension | *.avi | *.avi | *.mkv
Technical specification for video streams/files
The digital video stream contains pictures (or frames) played at a certain rate (frame rate).
The picture or frame is composed of pixels (resolution).
(Source: http://www.equasys.de/videoformats.html)
Resolution or frame size
Since every frame is an orthogonal bitmap digital image, it consists of a raster of pixels. If
it has a width of W pixels and a height of H pixels, the frame size is stated as WxH. The
frame size should fit within the size of the digital display. Figure 1.3 gives an overview of
common digital display modes.
Figure 1.3: Overview of common display resolutions
(Source: http://en.wikipedia.org/wiki/Display_resolution)
The development in display resolution is aimed towards larger resolutions (from 320x240 to
1920x1080) and towards wider screens (ratio 4:3 to 16:9, i.e. 1.33 to 1.78). These example
resolutions increase the number of pixels per frame from 76.8k to 2.07M, an increase by a
factor of 27.
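The figures in this paragraph follow directly from the frame sizes. A minimal worked example (the helper names are chosen for this illustration only):

```python
from fractions import Fraction


def pixels(width, height):
    """Number of pixels in one frame of size width x height."""
    return width * height


def aspect_ratio(width, height):
    """Aspect ratio as a reduced fraction, e.g. 4/3 or 16/9."""
    return Fraction(width, height)


if __name__ == "__main__":
    small, large = (320, 240), (1920, 1080)
    print(pixels(*small))                    # 76800
    print(pixels(*large))                    # 2073600
    print(pixels(*large) / pixels(*small))   # 27.0
    print(aspect_ratio(*small), aspect_ratio(*large))  # 4/3 16/9
```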
Frame rate
In a digital video, the digital images or frames are displayed in rapid succession at a constant
rate. The rate at which these frames are displayed is measured in frames per second, or
FPS. The frame rate may vary from 1 to 100, depending on the actual motion within the
movie.
(Source: http://spng.se/frame-rate-test/)
Table 1.4 gives the frame rates for the most popular video and television systems.
Table 1.4: Frame rates for television and movie
System | Frame rate (FPS) | Remarks
Initial games | 6 | Accepted by players in 3D games
Cinema film | 24 |
TV-PAL | 25 |
TV-NTSC | 29.97 |
Blu-ray | 24-p / 23.976-p | progressive
Blu-ray | 59.94-i / 50-i | interlaced
Monitor | 60 / 100 |
More modern systems can also handle a variable frame rate.
For recording lectures, a very high frame rate is not required. A frame rate of around 15
frames per second seems more than sufficient.
Video compression
Video compression is achieved not only on a frame to frame basis (such as bmp to jpg
compression), but also over successive frames. Video compression typically operates on
square-shaped groups of neighboring pixels, often called macro blocks. These pixel groups or
blocks of pixels are compared from one frame to the next and the video compression codec
(encode/decode scheme) sends only the differences within those blocks. This works
extremely well if the video has no motion. A still frame of text for example can be repeated
with very little transmitted data. In areas of video with more motion, more pixels change
from one frame to the next. When more pixels change, the video compression scheme must
send more data to keep up with the larger number of pixels that are changing. Very good
video compression is flexible, making the actual frame rate of minor importance.
(Source: http://www.videsignline.com/howto/showArticle.jhtml?articleID=185301351)
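The idea of sending only the blocks that differ between successive frames can be illustrated with a toy example. This is a strong simplification and not how any particular codec works: real codecs add motion compensation, transforms and lossy quantization. The block size of 4 pixels and the function name are assumptions made for this sketch.

```python
BLOCK = 4  # hypothetical block size; real codecs typically use larger macro blocks


def changed_blocks(previous, current, block=BLOCK):
    """Return the (row, col) origins of blocks that differ between two frames.

    Frames are lists of rows of pixel values with identical dimensions.
    """
    height, width = len(current), len(current[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            if any(previous[r][left:left + block] != current[r][left:left + block]
                   for r in rows):
                changed.append((top, left))
    return changed


if __name__ == "__main__":
    still = [[0] * 8 for _ in range(8)]      # an 8x8 frame, e.g. a still text slide
    moved = [row[:] for row in still]
    moved[5][6] = 255                        # a single pixel changes
    print(changed_blocks(still, still))      # nothing to send
    print(changed_blocks(still, moved))      # only one block has to be sent
```

For a still slide the list of changed blocks is empty, which is why lecture slides compress so well compared with video that contains a lot of motion.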
Video codecs used in multimedia files are often identified by a four-character code (FourCC).
Table 1.5 provides a list of the codes that are most used around the world.
Table 1.5: Some popular codecs for video compression
Code | Owner | Alternative name | Description, products using the codec, etc.
RAW | - | | Full Frames (Uncompressed)
avc1 | Apple | H.264 | Apple's version of the MPEG4 part 10/H.264 standard apparently.
MP4V | Apple | | Apple QuickTime MPEG-4 native
H264 | | | Intel ITU H.264
FLV1 | | | FLV1 codec (supported by ffdshow)
TSCC | | | TechSmith Screen Capture Codec
WMV3 | Microsoft | Windows Media Video 9 | The codec implements the Simple and Main modes of the VC-1 codec standard
WVC1 | Microsoft | Windows Media Video 9 Advanced Profile | DMO-based codec. VC-1 compliant format. Fully compliant Advanced Profile of the VC-1 codec standard. WVC1 is included in Windows Media Player 11.
DIVX | OpenDivX | DivX | This FOURCC code is used for versions 4.0 and later of the DivX codec.
XVID | | | XviD MPEG-4 codec
XVIX | | | Based on XviD MPEG-4 codec
(Source: http://www.videsignline.com/howto/showArticle.jhtml?articleID=185301351, http://www.fourcc.org/,
http://abcavi.kibi.ru/fourcc.php)
Table 1.6 gives the compression efficiency for MPEG compression. More recent codecs such
as VC1 (in wmv3 or Windows Media Video 9) give even better compression than MPEG-4.
Table 1.6: Global compression efficiency for MPEG video codecs
Compression | Absolute | Relative
MPEG-1 | 7 - 15 | 1
MPEG-2 | 15 - 30 | 2
MPEG-4 | 50 - 100 | 6
Technical specification for audio streams/files
A digital audio stream contains one or more channels (mono/stereo/surround). For each
channel the analog signal is converted into a digital signal at a given sampling rate and bit
resolution or bit depth. Generally speaking, the higher the sampling rate and bit depth, the
higher the fidelity, at the cost of an increase in the amount of digital data.
Number of channels
An audio stream can be mono (1 channel), stereo (2 channels) or surround sound (3 to 7
channels).
Sampling rate
The sampling rate, sample rate, or sampling frequency defines the number of samples per
second (or per other unit) taken from a continuous signal to make a discrete signal. For
time-domain signals, it can be measured in samples per second or hertz (Hz).
Bit depth
In digital audio, bit depth describes the number of bits of information recorded for each
sample. Bit depth directly corresponds to the resolution of each sample in a set of digital
audio data. Common examples of bit depth include CD audio, which is recorded at 16 bits,
and DVD-Audio, which can support up to 24-bit audio.
Bit rate
Bit rate refers to the amount of data, specifically bits, transmitted or received per second
(bit/s or bps).
The bit rate is often quantified in conjunction with an SI prefix such as kilo- (kbit/s or kbps),
mega- (Mbit/s or Mbps), giga- (Gbit/s or Gbps) or tera- (Tbit/s or Tbps). 1 kbit/s has almost
always meant 1,000 bit/s, not 1,024 bit/s.
The bit rate can be calculated in the following way:
Bit rate = (bit depth) x (sampling rate) x (number of channels).
One of the most common bit rates given is that for compressed audio files. For example, an
MP3 file might be described as having a bit rate of 160 kbit/s or 160000 bits/second. This
indicates the amount of compressed data needed to store one second of music.
The standard audio CD is said to have a data rate of 44.1 kHz/16, implying the audio data
was sampled 44,100 times per second, with a bit depth of 16. CD tracks are usually stereo,
using a left and right track, so the amount of audio data per second is double that of mono,
where only a single track is used. The bit rate is then 44100 samples/second x 16 bits/sample
x 2 = 1,411,200 bit/s or 1.4 Mbit/s.
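The formula and the CD example can be reproduced with a few lines of code. The sketch below is illustrative; the function names are invented here, and the size helper assumes 1 MB = 1,000,000 bytes.

```python
def pcm_bit_rate(bit_depth, sampling_rate_hz, channels):
    """Bit rate of uncompressed audio: bit depth x sampling rate x channels."""
    return bit_depth * sampling_rate_hz * channels


def stream_size_mb(bit_rate_bps, duration_s):
    """Size of a stream in MB (assuming 1 MB = 1,000,000 bytes)."""
    return bit_rate_bps * duration_s / 8 / 1_000_000


if __name__ == "__main__":
    cd = pcm_bit_rate(bit_depth=16, sampling_rate_hz=44_100, channels=2)
    print(cd)                          # 1411200 bit/s, i.e. about 1.4 Mbit/s
    print(stream_size_mb(cd, 60))      # one minute of CD audio
    print(stream_size_mb(160_000, 60)) # one minute of 160 kbit/s MP3
```

Comparing the last two values shows how much a compressed format such as MP3 reduces the amount of data per minute of audio.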
Audio compression
Audio compression algorithms are implemented in computer software as audio codecs. The
best-known audio codec is MP3 (MPEG-1 Audio Layer 3), a patented digital audio encoding
format using a form of lossy data compression. It is a common audio format for consumer
audio storage, as well as a de facto standard of digital audio compression for the transfer and
playback of music on digital audio players.
Designed to be the successor of the MP3 format, AAC (Advanced Audio Coding) generally
achieves better sound quality than MP3 at similar bit rates. AAC's best-known use is as the
default audio format of Apple's iPhone, iPod and iTunes, and of the MPEG-4 video standard (MP4).
Table 1.7 gives an overview for some audio codecs.
Table 1.7: Some popular codecs for audio compression
Code | Creator | Year | Latest version | Remarks
MP3 | ISO/IEC MPEG Audio Committee | 1993 | ISO/IEC 11172-3, ISO/IEC 13818-3 | Patent rights are disputed
AAC | ISO/IEC MPEG Audio Committee | 1997 | ISO/IEC 14496-3 | iTunes DRM; audio for MP4
wma | Microsoft | 1999 | 11 | Free for Windows licensees
Ogg | Xiph.Org Foundation | 2001 | 1.2 | Free to use
(Source: http://en.wikipedia.org/wiki/Comparison_of_audio_codecs)
2. Collegerama for YouTube
What is YouTube?
YouTube is a video sharing website on which users can upload and
share videos. Three former PayPal employees created YouTube in
February 2005. In November 2006, YouTube, LLC was bought by
Google Inc. for $1.65 billion, and is now operated as a subsidiary of
Google. It is the biggest distributor of streaming online video content.
(Source: http://news.bbc.co.uk/2/hi/business/6411017.stm)
The company is based in San Bruno, California, and uses Adobe Flash Video technology to
display a wide variety of user-generated video content, including movie clips, TV clips, and
music videos, as well as amateur content such as video blogging and short original videos.
Most of the content on YouTube has been uploaded by individuals, although media
corporations including CBS, the BBC, UMG and other organizations offer some of their
material via the site, as part of the YouTube partnership program.
Unregistered users can watch the videos, while registered users are permitted to upload an
unlimited number of videos. Videos that are considered to contain potentially offensive
content are available only to registered users over the age of 18. The uploading of videos
containing defamation, pornography, copyright violations, and material encouraging criminal
conduct is prohibited by YouTube's terms of service. Accounts of registered users are called
"channels".
In the last few years, YouTube has become a medium on which several universities publish their
recorded lectures. One of the first was MIT (Massachusetts Institute of Technology), which
joined in October of 2005. Later, other universities such as Purdue (2006), Stanford (2006), UC
Berkeley (2007) and Harvard Business (2007) started publishing recorded lectures and course
material via this popular Internet medium.
Video formats for YouTube
YouTube's video playback technology for web users is based on the Adobe Flash Player. This
allows the site to display videos with quality comparable to more established video playback
technologies (such as Windows Media Player, QuickTime, and RealPlayer) that generally
require the user to download and install a web browser plug-in to view video content.
Viewing Flash video also requires a plug-in, but market research from Adobe Systems has
found that its Flash plug-in is installed on over 95% of personal computers.
Videos uploaded to YouTube are limited to ten minutes in length and a file size of 2 GB.
When YouTube was launched in 2005, it was possible for any user to upload videos longer
than ten minutes, but YouTube's help section now states: "You can no longer upload videos
longer than ten minutes regardless of what type of account you have. Users who had
previously been allowed to upload longer content still retain this ability, so you may
occasionally see videos that are longer than ten minutes." The ten minute limit was
introduced in March 2006, after YouTube found that the majority of videos exceeding this
length were unauthorized uploads of television shows and films.
Video formats
YouTube accepts videos uploaded in most formats, including .WMV, .AVI, .MKV, .MOV, MPEG,
.MP4, DivX, .FLV, and .OGG. It also supports 3GP, allowing videos to be uploaded directly
from a mobile phone.
Video quality
YouTube originally offered videos in only one format, but it now has three main formats, as
well as a "mobile" format for viewing on mobile phones. The original format, now labeled
"standard quality", displays videos at a resolution of 320x240 pixels using the Sorenson Spark
codec, with mono MP3 audio. This was, at the time, the standard for streaming online videos.
"High quality" videos, introduced in March 2008, are shown at up to 864x480 pixels with
stereo AAC sound. This format offers a significant improvement over standard quality. In
November 2008 720p HD support was added. At the same time, the YouTube player was
changed from an aspect ratio of 4:3 to a widescreen 16:9 resolution. 720p videos are shown
at 1280x720 pixels resolution and encoded with the H.264 video codec. They also feature
stereo audio encoded with AAC.
Components of a Collegerama vodcast
A Collegerama lecture has screenshots of the PowerPoint slides and a video of the lecturer
giving the lecture. On the web interface, these have been split into separate parts. If
Collegerama is going to be published as a vodcast, a conversion of the different Collegerama
elements into a single video (multimedia) file format is necessary.
In the current video system of Collegerama, the following elements are kept in sync:
• video (with audio)
• PowerPoint slides
• closed captions (not currently used at the TU Delft)
Video
The video part of Collegerama usually shows the lecturer, but might occasionally be switched
to a recording of the display screen for animations, movies etc. Collegerama publishes the
video stream using the following quality settings:
Resolution: 320 x 240 (ratio 4:3)
Frame rate: 25 fps
Bit rate: 370 kb/s
Codec: wmv3
In short: Windows Media Video 9 / 320x240 / 25.00fps / 341kbps
Audio
Audio is a very important part of the vodcast. It contains all the explanations by the lecturer
and is a main part of the lecture. A lecture can be followed by only having an audio recording
without video. This is shown by podcasts of lectures. A video stream without audio doesn't
make any sense.
Collegerama publishes the audio stream using the following quality settings:
Channels: 2 (Stereo)
Sampling rate: 22050 Hz (22 kHz)
Bit depth: 16 bits/sample
Bit rate: 20 kbit/s
Codec: wma2
In short: Windows Media Audio 9.2 / 20 kbps / 22 kHz / stereo (1-pass CBR)
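To put these audio settings in perspective, the uncompressed bit rate of this audio stream can be compared with the published, compressed rate. The sketch below assumes that the published rate is 20 kbit/s (20,000 bit/s), as given in the summary line; the names used are illustrative only.

```python
def pcm_bit_rate(bit_depth: int, sampling_rate_hz: int, channels: int) -> int:
    """Uncompressed bit rate in bit/s: bit depth x sampling rate x channels."""
    return bit_depth * sampling_rate_hz * channels


# Collegerama audio: 16 bits/sample, 22,050 Hz, stereo
raw_rate = pcm_bit_rate(bit_depth=16, sampling_rate_hz=22_050, channels=2)

# Assumption: the published stream runs at 20 kbit/s = 20,000 bit/s
compressed_rate = 20_000

compression_ratio = raw_rate / compressed_rate
print(raw_rate)                    # 705600 bit/s
print(f"{compression_ratio:.1f}")  # 35.3
```

Under this assumption the audio codec reduces the data rate by a factor of roughly 35 compared with uncompressed audio of the same sampling rate and bit depth.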
Slides
The slides of a presentation contain the most detailed information. They are important for the
viewers since they give a guideline as to where the lecturer is in his story. Fortunately, the
slides mostly contain keywords at a fairly large font size, which means that the quality and
resolution do not have to be very high for them to be readable. Collegerama publishes
PowerPoint slides using the following specifications:
Resolution: 1024 x 768 (ratio 4:3)
Bit depth: 24 bits/pixel (full colour)
Codec: jpg
Closed captions / Subtitles
There are different ways of publishing closed captions or subtitles on video. The most
commonly used is a text file containing time stamps and corresponding spoken sentences.
Closed captions and subtitles for Collegerama lectures are described elsewhere. For the
production of a vodcast, these subtitles are not relevant, since they will be attached to the
vodcast based on the internal timestamps of the movie file.
Collegerama as vodcast for YouTube
A vodcast for YouTube should comply with the restrictions for resolution of YouTube. A
general strategy for this is to develop a vodcast at the best video quality supported by
YouTube, with the following considerations and constraints:
• YouTube movies are limited to a file size of 2GB and a display time (10 minutes for the
general public, unlimited for channel managers like YouTube Edu)
• YouTube gives the viewer the option to display at a lower quality when bandwidth is a
limiting factor
• producing a vodcast at the highest quality enables the production of "child products" for
other platforms with a lower quality, which results in smaller file sizes or bandwidth
requirements
• YouTube converts movies with non-normalized resolution by downsizing to the nearest
standard heights of 360, 480, 720 or 1080 pixels
The best quality of a Collegerama vodcast for YouTube within these constraints can be
achieved by:
• reducing the size of the slides from 1024x768 to 960x720 (downsizing to 94%, keeping
the display ratio 4:3)
• keeping the video resolution of 320x240
• putting both elements next to each other giving an overall size of 1280x720 (HD720,
widescreen, display ratio 16:9)
• filling the remaining area with related info and/or navigation tools (or blank)
Figure 2.1: Layout of Collegerama elements within resolution constraints for YouTube movies (1280x720): slide (960x720), video (320x240) and related info (320x480)
A layout according to this setup is given in Figure 2.1. The video is located on the right hand
side of the slides, to give a more balanced overall picture for left to right reading. The overall
view might be mirrored to obtain an overall picture which resembles the original Collegerama
view (video left).
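The geometry of this layout follows directly from the sizes given above. The following sketch recomputes it; the helper `fit_height` and the dictionary keys are illustrative names, and the pixel coordinates assume an origin in the upper left corner of the 1280x720 canvas.

```python
from fractions import Fraction


def fit_height(width: int, height: int, target_height: int):
    """Scale (width, height) to target_height while keeping the aspect ratio."""
    scale = Fraction(target_height, height)
    return int(width * scale), target_height, float(scale)


CANVAS_W, CANVAS_H = 1280, 720   # HD720, 16:9
VIDEO_W, VIDEO_H = 320, 240      # Collegerama video, kept at original size

# Slides: 1024x768 scaled to the full canvas height
slide_w, slide_h, slide_scale = fit_height(1024, 768, CANVAS_H)

# Boxes as (x, y, width, height); video to the right of the slide
layout = {
    "slide": (0, 0, slide_w, slide_h),
    "video": (slide_w, 0, VIDEO_W, VIDEO_H),
    "info": (slide_w, VIDEO_H, CANVAS_W - slide_w, CANVAS_H - VIDEO_H),
}

print(slide_scale)  # 0.9375, i.e. downsizing to about 94%
for name, box in layout.items():
    print(name, box)
```

The remaining area below the video comes out at 320x480 pixels, which is the space available for related info or navigation tools.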
Figure 2.2 gives this layout for the Collegerama lecture in the course CT3011 Water
management.
Figure 2.2: Collegerama as vodcast for YouTube (1280x720)
Vodcast production
A high quality vodcast for YouTube can be produced by following these steps:
• convert the slides into a movie file
• convert the slide movie into an HD resolution
• combine the slide movie and the lecture movie
• convert to other file formats and codecs (if beneficial)
Convert the slides into a movie file
The most important step in the production of a vodcast out of a Collegerama recording is the
conversion of the slides into a movie. This can be achieved with the help of screen capturing
systems such as Camtasia. These systems record an assigned part of the display screen into
a movie file. By playing a Collegerama lecture, the slides can be recorded as a movie with the
right time framing.
Figure 2.3 gives an impression of such a recording. In the actual recording of the
Collegerama slides the full display mode of Collegerama has been recorded, instead of the
slide segment in the overall view.
Figure 2.3: Converting the slides into a movie file by recording the Collegerama slide display
Table 2.1 gives the results of the conversion of the slides into a movie file.
Table 2.1: Properties of original slides and the single movie file obtained by screen recording with Camtasia
Property                   Original          Result of conversion
Number of files            29 picture files  1 movie file
Resolution                 1024x768          1024x768
Frame rate                 -                 15 fps
Codec                      jpg               wmv3
Duration                   -                 45:09 min. (2,709 s)
Total number of frames     29                15 x 2,709 = 40,635
Total storage              6.3 MB *          39.8 MB
Increase of file size      1                 (39.8 / 6.3 =) 6.32
Increase of frames         1                 (40,635 / 29 =) 1,401
Efficiency of compression  1                 (1,401 / 6.3 =) 222
* original PowerPoint presentation was 16.1 MB (text and higher quality pictures)
Table 2.1 shows the high efficiency of the video compression. The 29 frames (single
pictures) are converted into more than 40,000 frames, while the file size increases by only a
factor of 6.3. This indicates that the wmv3 codec is very efficient in compressing such still
picture movies.
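The derived figures of Table 2.1 can be recomputed from the measured values. The sketch below does this; the variable names are chosen for illustration, and the "efficiency of compression" is taken, as in the table, to be the increase in frames divided by the increase in file size.

```python
# Measured values from Table 2.1
original_frames = 29     # 29 slide pictures
original_mb = 6.3        # total size of the jpg files
frame_rate_fps = 15
duration_s = 2_709       # 45:09 min.
movie_mb = 39.8          # size of the recorded slide movie

movie_frames = frame_rate_fps * duration_s
size_increase = movie_mb / original_mb
frame_increase = movie_frames / original_frames
efficiency = frame_increase / size_increase

print(movie_frames)             # 40635
print(round(size_increase, 2))  # 6.32
print(round(frame_increase))    # 1401
print(round(efficiency))        # 222
```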
Convert the slide movie into a HD resolution
The slide movie has the original slide resolution (1024x768). This should be downsized
to the nearest HD resolution (in this case HD720). This conversion can be executed by
various video editing systems, including Camtasia.
Figure 2.4 and Table 2.2 give the results of this conversion/editing step in which additionally
the text area is incorporated at the right hand lower side.
Figure 2.4: Converting the slides into a movie file by recording the Collegerama slide display
Table 2.2: Properties of converting the single movie file to HD720, with additional text area (Camtasia)
Property                   Original                       Result of conversion
Number of files            1 movie file + 1 picture file  1 movie file
Resolution                 1024x768 + 320x480             1280x720
Frame rate                 15 fps                         15 fps
Codec                      wmv3                           wmv3
Duration                   45:09 min. (2,709 s)           45:09 min. (2,709 s)
Total number of frames     15 x 2,709 = 40,635            15 x 2,709 = 40,635
Total storage              39.8 MB + 0.1 MB               39.8 MB
Increase of file size      1                              (39.8 / 39.8 =) 1.0
Increase of resolution     1                              (1280x720 / 1024x768 =) 1.17
Efficiency of compression  1                              (1.17 / 1.0 =) 1.17
Table 2.2 shows that the 17% increase in resolution does not result in a larger file size, owing to
the compression efficiency of the wmv3 codec.
Combining the slide movie and the lecture movie
Finally the slide movie should be combined with the Collegerama lecture movie. This editing
can be executed by various video editing systems, including Camtasia.
Figure 2.5 and Table 2.3 give the results of this editing step.
Figure 2.5: Combining the slide movie with the lecture movie
Table 2.3: Properties of the movie files before and after the incorporation of the lecture movie (Camtasia)
Property                 Original                Result of conversion
Number of files          2 movie files           1 movie file
Resolution               1280x720 + 320x240      1280x720
Frame rate               15 fps + 25 fps         15 fps
Codec                    wmv3                    wmv3
Duration                 45:09 min. (2,709 s)    45:09 min. (2,709 s)
Total number of frames   15 x 2,709 = 40,635     15 x 2,709 = 40,635
Total storage            (39.8 + 117 =) 157 MB   87.8 MB
Increase of file size    1                       (87.8 / 157 =) 0.56
Increase of resolution   1                       1
Reduction in frame rate  1                       1 resp. 15/25 = 0.6
Table 2.3 shows that the resulting file size is 75% of that of the original
Collegerama movie part (87.8 MB versus 117 MB), although the resolution is 12 times larger.
Comparing the file size of the HD720 movie without the Collegerama movie part to the result
file shows an increase of (87.8-39.8=) 48.0 MB. This size reflects the costs for the dynamic
movie part over the static picture part in the upper right hand side of the vodcast.
The reduction in file size can only partly be explained by the lower frame rate (15 versus 25
fps). It suggests an efficient use of the wmv3 codec within the Camtasia movie editing system.
The reduction in frame rate does not visibly reduce the quality of the movie. Lecture
movies might therefore be recorded at a lower frame rate than that used in Collegerama recordings.
The file size of the vodcast is even smaller than the sum of the basic ingredients (1 movie, 29
slides and 1 information picture). The produced vodcast therefore requires less bandwidth for
streaming, or less download time for distribution, of this Collegerama lecture without losing
information or resolution.
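The bandwidth requirement mentioned above can be estimated as an average bit rate, i.e. the file size divided by the playing time. The following is a rough sketch; it assumes that 1 MB equals 1,000,000 bytes (the annex does not state which convention the file sizes use), and the function name is illustrative.

```python
def average_bit_rate_kbps(size_mb: float, duration_s: float,
                          bytes_per_mb: int = 1_000_000) -> float:
    """Average bit rate in kbit/s for a file of size_mb that plays for duration_s."""
    return size_mb * bytes_per_mb * 8 / duration_s / 1000


DURATION_S = 2_709  # 45:09 min.

vodcast_kbps = average_bit_rate_kbps(87.8, DURATION_S)    # combined HD720 vodcast
original_kbps = average_bit_rate_kbps(117.0, DURATION_S)  # original Collegerama movie

print(f"vodcast:  {vodcast_kbps:.0f} kbit/s")
print(f"original: {original_kbps:.0f} kbit/s")
print(f"ratio:    {vodcast_kbps / original_kbps:.2f}")
```

Under this assumption the estimate for the original movie is of the same order as the bit rate of the Collegerama video stream listed earlier, and the vodcast needs about three quarters of that rate.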
Conversion to other file formats and codecs (if beneficial)
The result file might be converted to other file formats, frame rates and codecs in order to
investigate the compression efficiency of alternative codecs. The results of these conversions
are shown in Table 2.4. All the file formats and codecs in this table are accepted by YouTube.
Table 2.4: Collegerama vodcast (HD720, 45:09 min.) in alternative specifications (Camtasia)
File format  Video codec  Frame rate (fps)  File size (MB)  File size (ratio)
wmv          wmv3         15                87.8            1.0
mp4          H264         10                507             5.8
mp4          H264         24                828             9.4
flv          On2 VP6      15                465             5.3
f4v          H264         15                408             4.6
Table 2.4 shows that the wmv3 codec compresses this material far better than the other codecs,
including the H264 codec as used in the Blu-ray mp4 specification. Apparently the mp4 codec
is quite sensitive to frame rate, which is remarkable in view of the relatively still movie (slides
make up the largest part of the picture). The new Flash codec (H264 in f4v) is slightly more
efficient than the older On2 codec, but both are more efficient than the H264 codec in mp4.
These results suggest that the wmv3 codec is the best option for vodcast production for YouTube.
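Because the files in Table 2.4 use different frame rates, it can help to look at the file size per frame in addition to the file size ratio. The sketch below computes both; it assumes 1 MB = 1,000 kB, and the helper names are illustrative.

```python
DURATION_S = 2_709  # 45:09 min.

# (file format, video codec, frame rate in fps, file size in MB), from Table 2.4
ENCODINGS = [
    ("wmv", "wmv3", 15, 87.8),
    ("mp4", "H264", 10, 507.0),
    ("mp4", "H264", 24, 828.0),
    ("flv", "On2 VP6", 15, 465.0),
    ("f4v", "H264", 15, 408.0),
]


def size_ratio(size_mb: float, baseline_mb: float = 87.8) -> float:
    """File size relative to the wmv3 baseline."""
    return size_mb / baseline_mb


def kb_per_frame(size_mb: float, fps: int, duration_s: int = DURATION_S) -> float:
    """Average size per frame in kB (assumes 1 MB = 1,000 kB)."""
    return size_mb * 1000 / (fps * duration_s)


for file_format, codec, fps, size_mb in ENCODINGS:
    print(f"{file_format:4} {codec:8} {fps:2d} fps  "
          f"ratio {size_ratio(size_mb):4.1f}  "
          f"{kb_per_frame(size_mb, fps):5.1f} kB/frame")
```

On a per-frame basis the 24 fps mp4 file is smaller than the 10 fps mp4 file, so the larger total size of the 24 fps file stems from the larger number of frames rather than from larger individual frames.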
One step recording for vodcast production
The production of a vodcast as described above is rather labor-intensive and time-consuming.
A more or less similar result might be obtained by a one step recording session, in which the
overall Collegerama display is recorded by Camtasia.
Figure 2.6 gives an impression of such a recording. Table 2.5 shows the results.
Figure 2.6: One step vodcast production for a Collegerama lecture (left: recording; right: result)
Table 2.5: Properties of original Collegerama components and the screen recording by Camtasia
Property                 Original                         Result of screen recording
Number of files          1 movie file, 29 slides, player  1 movie file
Resolution               320x240 + 640x480 + ..           972x480
Frame rate               25 fps + 0.01 fps                15 fps
Codec                    wmv3                             wmv3
Duration                 45:09 min. (2,709 s)             45:09 min. (2,709 s)
Total number of frames   25 x 2,709 = 67,725              15 x 2,709 = 40,635
Total storage            (117 + 6.3 =) 123 MB             65.1 MB
Increase of file size    1                                (65.1 / 123 =) 0.53
Increase of resolution   1                                1
Reduction in frame rate  1                                15/25 = 0.6
The results of Table 2.5 are comparable to the results of Table 2.3.
The file size of the vodcast is around (65.1/117=) 56% of the original Collegerama movie,
although the resolution is (972*480/(320*240) =) 6.1 times larger.
The produced vodcast resembles the view of the Collegerama viewer, which might be more
recognizable for the viewer. However, the Collegerama navigation buttons for the movie part
are shown but not functional.
The production method is rather efficient since only 1 recording/editing session is required.
The resolution of the directly recorded vodcast does not match the normalized sizes for HD
movies. Moreover, the resolution of the slides is smaller than that available at the Collegerama
server. These drawbacks might be overcome by modifying the slide display sizes of the
Collegerama player and recording the enlarged display. In this modification, the movie
section of the Collegerama display might also be moved to the upper right hand side in
accordance with the vodcast of Figure 2.4.
Uploading to YouTube
The produced vodcasts can be uploaded to YouTube if your YouTube account allows for
uploading unlimited movie lengths, as is the case for YouTube Edu accounts. Since the TU
Delft does not have such an account, a normal user account has been used in order to
investigate the uploading and storage facilities at the YouTube server. Such an account is
limited to a maximum movie length of 10 minutes.
Table 2.6 shows the results for uploading the (slightly modified) vodcast of Table 2.3.
Table 2.6: Properties of vodcasts uploaded to and streamed/downloaded from YouTube
Property                              Uploaded movie          Stream/download Non-HD           Stream/download HD
Container type                        wmv                     flv                              mp4
Duration                              10:28 min.              10:28 min.                       10:28 min.
File size                             19.5 MB                 15.0 MB                          70.4 MB
Video                                 1280x720, 15 fps, wmv3  640x360, 15 fps, H264            1280x720, 15 fps, H264
Audio                                 mono, 44.1 kHz, wma2    stereo, 22.1 kHz,                stereo, 44.1 kHz,
                                                              mp4a (AAC-SBR)                   mp4a (AAC-SBR)
Displayed on YouTube (in web player)  -                       854x480                          854x480
Increase of file size                 1                       (15.0/19.5 =) 0.77               (70.4/19.5 =) 3.61
Increase of resolution                1                       (640*360/(1280*720) =) 0.25      1.0
Increase in compression efficiency    1                       (0.25/0.77 =) 0.32               (1.0/3.61 =) 0.28
(Source: http://www.youtube.com/watch?v=9na2hHJmmvE)
Table 2.6 shows that YouTube converts the uploaded movie into its own file formats and
movie quality. YouTube stores and/or streams the uploaded vodcast in 2 file types: flv for
"normal quality" and mp4 for "HD quality". Both quality streams are displayed in the same
sized player. The HD quality becomes more relevant when displaying in full-screen mode.
YouTube uses the H264 codec for both qualities. For this upload, the codec is around 3 times
less efficient than the uploaded wmv3 codec. Nevertheless, YouTube uses this less efficient
codec as it is playable within the Flash player, which has the highest installation coverage on
viewer PCs.
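The efficiency figures in Table 2.6 relate the change in resolution to the change in file size, both relative to the uploaded movie. The sketch below recomputes them from the listed resolutions and file sizes; the function and dictionary names are illustrative. Because it uses unrounded intermediate values, the results can differ in the last digit from the rounded values in the table.

```python
def relative_efficiency(upload: dict, stream: dict) -> float:
    """Resolution ratio divided by file size ratio, both relative to the upload."""
    resolution_ratio = (stream["width"] * stream["height"]) / (
        upload["width"] * upload["height"])
    size_ratio = stream["size_mb"] / upload["size_mb"]
    return resolution_ratio / size_ratio


UPLOAD = {"width": 1280, "height": 720, "size_mb": 19.5}  # wmv3 upload
NON_HD = {"width": 640, "height": 360, "size_mb": 15.0}   # flv stream
HD = {"width": 1280, "height": 720, "size_mb": 70.4}      # mp4 stream

eff_non_hd = relative_efficiency(UPLOAD, NON_HD)
eff_hd = relative_efficiency(UPLOAD, HD)

print(f"non-HD: {eff_non_hd:.2f}  (1/{1 / eff_non_hd:.1f})")
print(f"HD:     {eff_hd:.2f}  (1/{1 / eff_hd:.1f})")
```

Both streams come out at roughly one third of the efficiency of the uploaded file, which corresponds to the factor of around 3 mentioned above.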
Similar results can be obtained by uploading the vodcast of Table 2.5. These results are
shown in Table 2.7.
Table 2.7: Properties of vodcasts uploaded to and streamed/downloaded from YouTube
Property                              Uploaded movie                   Stream/download Non-HD          Stream/download HD
Container type                        wmv                              flv                             flv
Duration                              10:26 min.                       10:26 min.                      10:26 min.
File size                             (65.1*626/2710 =) 15.0 MB        14.2 MB                         33.9 MB
Video                                 972x480, 15 fps, wmv3            640x320, 15 fps, H264           864x432, 15 fps, H264
Audio                                 mono, 44.1 kHz, wma2             stereo, 22.1 kHz,               stereo, 44.1 kHz,
                                                                       mp4a (AAC-SBR)                  mp4a (AAC-SBR)
Displayed on YouTube (in web player)  -                                640x315                         854x421
Increase of file size                 1                                (14.2/15.0 =) 0.95              (33.9/15.0 =) 2.26
Increase of resolution                1                                (640*320/(972*480) =) 0.44      (864*432/(972*480) =) 0.8
Increase in compression efficiency    1                                (0.44/0.95 =) 0.46              (0.8/2.26 =) 0.35
(Source: http://www.youtube.com/watch?v=otGN0NUYs5w)
Table 2.7 also shows that YouTube converts the uploaded movie into its own file format and
movie quality.
The uploaded resolution is downsized into 2 standard sizes, either with a width of 640 pixels
(normal quality) or a width of 864 pixels (HD quality). The heights are in accordance with the
original aspect ratio.
In this case YouTube stores and/or streams the uploaded vodcast only in the flv format, both for
"normal quality" and for "HD quality". The mp4 container file is apparently used only for
HD720 files and higher. The different quality streams are displayed in different sized players.
YouTube uses the H264 codec for both qualities. For this upload, the codec is about 2-3 times
less efficient than the uploaded wmv3 codec, a slightly better result than for the previous upload.
Downloading of vodcasts from YouTube
For recorded lectures, a download of the Collegerama vodcast might be beneficial to students
for use in areas without Internet access, such as trains, beaches, parks etc. YouTube only
provides streaming videos. Downloading these videos is not offered by YouTube itself;
however, YouTube movies can easily be downloaded with the help of third party tools and/or
websites.
Table 2.8 gives the result of the download via http://www.youtubedownload.nl. This website
uses the YouTube URL as input and modifies this URL into a URL for direct playback with
save options from the original YouTube server (http://v6.lscache1.c.youtube.com/).
Table 2.8: Properties of downloaded vodcast via youtubedownload.nl
Property                              flv                              mp4                              3gp
Container type                        flv                              mp4                              3gp
Duration                              10:28 min.                       10:28 min.                       10:28 min.
File size                             23.3 MB                          25.8 MB                          4.3 MB
Video                                 640x360, 15 fps, flv1            480x270, 15 fps, H264            176 x 144, 12 fps, mp4v
Audio                                 stereo, 22.1 kHz, mp3            stereo, 44.1 kHz,                mono, 22.2 kHz, AAC
                                                                       mp4a (AAC-SBR)
Increase of file size*                (23.3/19.5 =) 1.19               (25.8/19.5 =) 1.32               (4.3/19.5 =) 0.22
Increase of resolution*               (640*360/(1280*720) =) 0.25      (480*270/(1280*720) =) 0.14      (176*144/(1280*720) =) 0.03
Increase in compression efficiency*   (0.25/1.19 =) 0.21               (0.14/1.32 =) 0.11               (0.03/0.22 =) 0.14
* compared to the original wmv upload
(Source: http://www.youtube.com/watch?v=9na2hHJmmvE and http://www.youtubedownload.nl)
The results of Table 2.8 show that this download service is focused on the "normal quality"
videos of YouTube, in view of the smaller file sizes. Apparently YouTube files can also be
presented in a 3gp file format, which is most suitable for mobile phones.
The efficiency of the codecs used in these downloads is 5-10 times lower than that of the
original wmv3 codec. The downloaded flv file is around (23.3/15.0=) 1.6 times larger than the
directly obtained non-HD movie with similar technical specifications. This download service
offers ease of use and flexible file formats, but no efficient downloads.
Alternative YouTube download services are:
• Moyea FLV Downloader
• http://www.downloadyoutubevideos.com/
• http://www.viddownloader.com
• and many, many more
YouTube does not encourage the direct download of their movies as this reduces their
viewing rate resulting in reduced advertisement revenues. In February 2009, YouTube
announced a test service, allowing some partners to offer video downloads for free or for a
fee paid through Google Checkout. It is very likely that this will also be part of the YouTube
Edu channel.
Conclusions and recommendations for vodcasts on YouTube
The production of Collegerama vodcasts has led to the following conclusions:
• a high quality vodcast can be produced at HD720 specifications with a file size which is
even smaller than the original Collegerama video recording
• the wmv3 codec is much more efficient for the recorded lectures than the H264 or flv1
codec
The upload of these vodcasts to YouTube has led to the following conclusions:
• the high quality vodcasts can be uploaded to and displayed from YouTube without loss of
quality
• the video codecs used by YouTube are less efficient than the wmv3 codec, resulting in
larger download files if downloaded from YouTube
• uploading Collegerama lectures to YouTube requires a YouTube Edu account which
allows for uploading movies over 10 minutes
3. Collegerama for iTunes
What is iTunes?
iTunes is an application that allows the user to manage audio and video
on a personal computer, acting as a front-end for Apple's QuickTime
media player. Officially, iTunes is required in order to manage the audio
of an Apple iPod portable audio player, although alternative software
does exist. Users can organize their music into playlists within one or
more libraries, edit file information, record Compact Discs, copy files to
a digital audio player, purchase music and videos through its built-in
music store, download free podcasts, back up songs onto a CD or DVD,
run a visualizer to display graphical effects in time to the music, and
encode music into a number of different audio formats. There is also a
large selection of free internet radio stations to listen to.
Version 4.9 of iTunes, released on June 28, 2005, added built-in support for podcasting. It
allows users to subscribe to podcasts for free in the iTunes Music Store or by entering the
RSS feed URL. Once subscribed, the podcast can be set to download automatically. Users can
choose to update podcasts weekly, daily, hourly, or manually.
Users can select podcasts to listen to from the Podcast Directory, to which anyone can submit
their podcast for placement. The front page of the directory displays high-profile podcasts
from commercial broadcasters and independent podcasters. It also allows users to browse
the podcasts by category or popularity, and to submit new podcasts to the directory.
Video content available from the store used to be encoded as 540 kbit/s Protected MPEG-4
video (H.264) with an approximately 128 kbit/s AAC audio track. Many videos and video
podcasts currently require the latest version of QuickTime, QuickTime 7, which is
incompatible with older versions of Mac OS (only v10.3.9 and later are supported).
On September 12, 2006, the resolution of video content sold on the iTunes Store was
increased from 320x240 (QVGA) to 640x480 (VGA). The higher resolution video content is
encoded as 1.5 Mbit/s (minimum) Protected MPEG-4 video (H.264) with a minimum 128
kbit/s AAC audio track.
Video formats for iTunes
The main focus of iTunes is to distribute content to the Apple iPod and its successors. The
original iPod was not provided with a video screen for movie display until October of 2005.
The iPod Nano got a movie display in September 2007. The screen sizes of the iPod family are
shown in Table 3.1.
Table 3.1: Screen sizes of the iPod and its successors
Type                               Introduction date  Screen size  Aspect ratio
iPod video                         October 2005       320 x 240    1.33 (4:3)
iPhone                             June 2007          480 x 320    1.5 (3:2)
iPod Touch                         September 2007     480 x 320    1.5 (3:2)
iPod Nano                          September 2007     320 x 240    1.33 (4:3)
iPod Nano                          September 2009     376 x 240    1.57
Supported video (external screen)  -                  640 x 480    1.33 (4:3)
The iPod family has developed into larger screen sizes and wider screens (higher aspect
ratio). If the iPhone aspect ratio is compared to the HD widescreen ratio used today, the
iPhone is somewhere in between the traditional TV standard and HD widescreen. All iPods
support a video display of maximum 640x480 by use of an external screen. It is expected
that future iPhone models will at least support the HD 720p and 1080i output modes for
external display with its 16:9 aspect ratio. As widescreen HD video has become more or less
the standard nowadays, it looks like Apple will also transform into larger video displays with
HD specifications.
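The comparison of aspect ratios can be made explicit with a short calculation. The following sketch reduces each screen size to its simplest ratio; the function name `aspect_ratio` is illustrative, and the 1280x720 entry is added here only as the HD widescreen reference.

```python
from fractions import Fraction


def aspect_ratio(width: int, height: int) -> Fraction:
    """Width-to-height ratio, reduced to lowest terms."""
    return Fraction(width, height)


SCREENS = {
    "320 x 240 (iPod video)": (320, 240),
    "480 x 320 (iPhone, iPod Touch)": (480, 320),
    "376 x 240": (376, 240),
    "640 x 480 (external screen)": (640, 480),
    "1280 x 720 (HD 720p reference)": (1280, 720),
}

for name, (width, height) in SCREENS.items():
    ratio = aspect_ratio(width, height)
    print(f"{name}: {ratio.numerator}:{ratio.denominator} = {float(ratio):.2f}")
```

The 3:2 ratio of the iPhone lies between the traditional 4:3 ratio and the 16:9 widescreen ratio.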
iPod constraints for Collegerama vodcasts
For the development of a Collegerama vodcast for iTunes (and iPods), the following aspects
are of major concern:
• the rather low resolution of the screen
• the different aspect ratio
These constraints have consequences for the following design aspects:
• size of display
• size of the video component
• location of the video component (upper/lower/left/right corner)
Low resolution
The resolution of the iPod is the same as that of the Collegerama video component. This would
allow for simply distributing this video component as a vodcast, leaving out the presentation
slides. Such a setup is used at MIT and many other universities. However, the slides provide
the most viewable information in a Collegerama recording and form an important part
of a Collegerama vodcast.
In an alternative setup, the vodcast might include the slide part of Collegerama with the
audio of the video component. This is only an adequate alternative whenever the slides are
readable at this low resolution.
Figure 3.1 gives an example of a typical PowerPoint slide at iPod resolution.
Figure 3.1: A typical PowerPoint slide at iPod resolution (320x240)
Figure 3.1 shows that the smaller fonts in a presentation are no longer readable at iPod
resolution, but the typical PowerPoint fonts can still be read quite well. The iPod resolution is
around (320/1024=) 30% of the maximum slide size in Collegerama and (320/640=) 50% of
the slide size in an overall Collegerama display.
Different aspect ratio
The iPod aspect ratio is the same as that of both the slide and the movie components in
Collegerama. Therefore combining these two components in a widescreen view, as in the
previous YouTube vodcast, is not possible.
Alternative solutions are:
• the slide component is not included (video only)
• the video component is not included (audio only)
• the video component is included at a rather small size (picture-in-picture)
• the video component is included at a rather small size (side-by-side) with unequal scaling
Figure 3.2 gives an impression of the latter three options.
Figure 3.2: Collegerama vodcasts including slides with different options for the video component, at iPod aspect ratio
From Figure 3.2, it is concluded that the most convenient option for including the movie
component is the picture-in-picture layout. This is based on the following considerations:
• the slides should be shown at maximum size for proper readability (no side-by-side)
• the movie component can be reduced to a rather small size (thumbnail) while still
remaining properly visible
• including the audio component without the video component misses a focus point for the
viewer (the movements of the lecturer give a better understanding of the lecture)
Size of display
An important aspect in the design of a vodcast for iTunes is the display resolution selected for
the production and for the distribution. The design strategy for creating the smallest file size
looks most promising in this case, for the following reasons:
• vodcasts for iTunes should be downloaded to and stored at the iPod of the viewers
(download time and storage capacity are relevant factors now, which is not the case in a
streaming video setup)
• small file sized vodcasts will minimize the requests for other small sized output options
like podcasts (audio only), which would require additional production and distribution
efforts (time, costs, organization)
• a small sized design gives a larger differentiation to the YouTube HD quality design
• iTunes uses the H264 codec, which is not as efficient in video compression as the wmv3
codec used in the YouTube design, so a smaller display size will be more relevant for a
less efficient compression
• the smallest display design allows for viewing on the older iPods, which still form the
majority of the iPods currently in use
For the reasons mentioned above, a vodcast for iTunes will be produced with a display size of
320x240 pixels.
Size of video component
The video in Collegerama shows the lecturer talking to the attendees. For such oration, a very
small size is sufficient for viewing as the most important aspect of such a movie is its audio
component (spoken words). This is shown in Figure 3.3, in which the original video resolution
(320x240) is downsized in steps to 10% of its original size.
Figure 3.3: Collegerama video in original size (320x240), and reduced to 30, 20 and 10%
Figure 3.3 shows that downsizing the Collegerama video to 20% (64x48) still gives
sufficient visibility of a speaking lecturer.
However in some recorded lectures the lecturer is writing text on the blackboard or is
presenting experiments. Both situations require a larger display size for proper viewing. For
these situations, a full switch from the slide view to the video component might be
suggested.
However this will require an extensive video editing procedure which also might require the
input of the lecturer. These constraints are not within the scope of a vodcast production out
of a Collegerama recording. Production of a vodcast from a Collegerama recording should be
possible within a fully automated production process.
The video component in a picture-in-picture design with the slides in the background will
cover part of the slide, reducing its readability. This can be minimized by doing the following:
• selecting a very small video component (10-20%)
• making the video component (partly) transparent, still allowing for a background view
(this setup might allow for a larger video size than a non-transparent movie, 20-30%
instead of 10-20%)
• placing the video component in an area with the lowest disturbance of the slide view
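The effect of these choices on the slide area can be estimated with a few lines of Python. The sketch below is only an illustration: the function names `pip_rect` and `covered_fraction`, the corner names and the `margin` parameter are assumptions made for this example and are not part of the Collegerama tooling.

```python
# Minimal sketch: geometry of a picture-in-picture video component.
# All names are hypothetical and chosen for this illustration only.

VALID_CORNERS = {"upper left", "upper right", "lower left", "lower right"}


def pip_rect(frame_w, frame_h, scale, corner, margin=0):
    """Return (x, y, w, h) of the video overlay inside the slide frame.

    scale is the linear scale factor (0.20 means 20% of width and height).
    """
    if corner not in VALID_CORNERS:
        raise ValueError(f"unknown corner: {corner!r}")
    w = round(frame_w * scale)
    h = round(frame_h * scale)
    x = margin if "left" in corner else frame_w - w - margin
    y = margin if "upper" in corner else frame_h - h - margin
    return x, y, w, h


def covered_fraction(frame_w, frame_h, rect):
    """Fraction of the slide area hidden by a non-transparent overlay."""
    _, _, w, h = rect
    return (w * h) / (frame_w * frame_h)


rect = pip_rect(320, 240, 0.20, "lower left")
print(rect)                                # overlay position and size in pixels
print(covered_fraction(320, 240, rect))    # share of the slide that is covered
```

Because the percentage is a linear scale factor, a 20% video component covers only about 4% of the slide area, and a 30% component about 9%.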
Location of the video component
The video component should be located on the least disturbing part of the slide. Figure 3.4
gives an impression of these locations for a TU Delft PowerPoint slide at iPod resolution.
Figure 3.4: PowerPoint slide in TU design at iPod size, without and with inserted movie components (20%)
Figure 3.4 shows that the upper left corner and the lower right corner are unsuitable for
movie insertion. The upper left corner hides the important slide title, while the lower right
corner hides the slide number. The lower left corner hides the TU Delft logo and the upper right
corner might hide part of the slide title. Both of these locations are regarded as acceptable.
The lower left corner might have a small advantage since this resembles the general lecture
room layout at TU Delft, in which the lecture desk is at the left front and the projection screen
is located either in the upper center of the lecture room or to the upper right. This lecture
room layout results in many Collegerama recordings showing the lecturer looking to his/her
upper left. With a movie component in the upper right corner the lecturer often seems to look
into the "sky".
It should be noted here that not all lecturers use the standard TU Delft PowerPoint design.
However, lecturers who are aware that their Collegerama recording is transformed into an
iPod vodcast might adjust their slides to keep a certain corner of the slide empty.
Therefore a uniform predesigned position of the movie component is important.
Components of a Collegerama vodcast for iTunes
Small file sizes can also be obtained by excluding parts of the Collegerama recording.
To determine the relative importance of Collegerama components, vodcasts have been
produced with different components, such as:
• slides with audio (no video component)
• slides with audio and subtitles (no video component)
• slides with audio and video
• slides with audio, video and subtitles
These vodcasts have been compared to other output forms of a lecture such as:
• audio only (podcast)
• slides only (pdf)
The vodcasts and other output forms have been uploaded to the TU Delft E-learning system
(BlackBoard) for evaluation by TU Delft employees. These output forms are produced at a
video resolution of the original slides (1024x768) with Microsoft codecs (wma, wmv). The
selected video resolution is larger than supported by iTunes and iPod.
This video resolution has been selected to enable the evaluation at larger screen sizes. The
vodcasts have been produced for the initial 10:28 minutes of the lecture to limit the
production time.
Vodcast file size
Table 3.2 gives an overview of the file sizes for the different output forms. In this table the
file sizes have been extrapolated to a full lecture of 45:00 minutes by assuming a linear
relation with duration.
Table 3.2: File size for different Collegerama output files
Type                                       File format   File size 10:28 min. (MB)   File size 45:00 min. (MB)
Slides + audio                             wmv           9.7                         38.7
Slides + audio + subtitles                 wmv           10.7                        46.0
Slides + audio + video                     wmv           14.3                        61.5
Slides + audio + subtitles + video         wmv           15.3                        65.8
Audio only (podcast 48 kbps)               wma           3.7                         15.9
Audio only (podcast low quality 32 kbps)   wma           2.4                         10.3
Slides only                                pdf/jpg       0.1                         2.9
Slides: size 1024x768 - frame rate 15 fps
Audio: mono - 44.1 kHz - 48 kb/s
Video: size 205x154 (20%) – frame rate 15 fps (picture-in-picture)
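The linear extrapolation from the 10:28 minute sample to a 45:00 minute lecture can be written out as a short Python sketch. The function names `to_seconds` and `extrapolate_size` are assumptions made for this illustration; the only inputs taken from the text are the two durations and the measured file sizes.

```python
# Minimal sketch: scale a measured file size linearly with duration.
# Function names are hypothetical and used for illustration only.

def to_seconds(mmss):
    """Convert a 'mm:ss' duration string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def extrapolate_size(size_mb, measured="10:28", target="45:00"):
    """Extrapolate a file size (MB), assuming size is proportional to duration."""
    return size_mb * to_seconds(target) / to_seconds(measured)


for label, size in [("slides + audio + video", 14.3),
                    ("audio only, 48 kbps", 3.7)]:
    print(f"{label}: {extrapolate_size(size):.1f} MB")
```

The scale factor is 2700 s / 628 s, roughly 4.3. The linear assumption ignores the fact that the file size depends on how much movement a particular part of the lecture contains.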
The results of Table 3.2 show that Collegerama vodcasts with slides are 40 to 65 MB. All of
these vodcasts have the same resolution, the same frame rate, and the same audio
component. The file size becomes larger when the vodcast reaches a part that has some form
of increased movement.
The wmv3 codec is most efficient in its compression for slideshows (picture movies). That is
why the vodcast with slides and audio is only about 2.5 times the size of the podcast (audio
only) with similar audio specifications.
Moreover including a small sized video in the corner of the slide increases the file size much
less than might be expected. A podcast (audio only) of the lecture requires some 10-15 MB,
with the smaller size for the lower audio quality.
The vodcasts with slides are smaller than the vodcast of Table 2.4 (87.8 MB). The latter is
produced at a higher resolution of both the slides and the video component.
Vodcast information transfer efficiency
The different output forms give different results in 'understanding of the lecture'. Slides only
(as pdf) give a proper impression of the content but will not allow for understanding the
lecture. Adding the audio will give an improvement of this understanding.
Table 3.3 gives scores for the relative information transfer of the different output forms, in
which a normal Collegerama view is set to 100%. The ratio of file size over relative information
transfer gives the information transfer efficiency per output form.
Table 3.3: Relative score for understanding the lecture for different output files, and the information transfer efficiency
Type                                       File size (MB)   Relative information transfer (%)   Information transfer efficiency (MB / %)
Collegerama recording                      Stream           100                                 -
Slides + audio                             38.7             70 - 90                             0.5
Slides + audio + subtitles                 46.0             75 - 95                             0.5
Slides + audio + video                     61.5             90 - 100                            0.6
Slides + audio + subtitles + video         65.8             95 - 105                            0.6
Audio only (podcast 48 kbps)               15.9             20 - 30                             0.6
Audio only (podcast low quality 32 kbps)   10.3             20 - 30                             0.4
Slides only                                2.9              5 - 10                              0.4
Table 3.3 shows that slides and audio are the most essential parts of a vodcast for adequate
information transfer. Audio only (as in a podcast) is less suitable or unsuitable for this type
of lecture, in which a lot of information is shown on the slides in the form of pictures.
The video component in the corner of the slide area improves the information transfer,
without significantly reducing the transfer efficiency. It is therefore concluded that a vodcast
for iTunes should include the video component (at a small size).
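The efficiency column of Table 3.3 can be approximated with the sketch below. It assumes that the midpoint of each score range is used as the relative information transfer; the text does not state this explicitly, but it agrees with the tabulated values to within rounding. The function name `transfer_efficiency` is hypothetical.

```python
# Minimal sketch: information transfer efficiency as file size per percent
# of relative information transfer. Using the midpoint of the score range
# is an assumption made for this illustration.

def transfer_efficiency(size_mb, low_pct, high_pct):
    """Return file size (MB) divided by the midpoint of the score range (%)."""
    midpoint = (low_pct + high_pct) / 2
    return size_mb / midpoint


rows = [
    ("slides + audio", 38.7, 70, 90),
    ("slides + audio + video", 61.5, 90, 100),
    ("slides only", 2.9, 5, 10),
]
for label, size, low, high in rows:
    print(f"{label}: {transfer_efficiency(size, low, high):.2f} MB per %")
```

A smaller value means that fewer megabytes are needed per percent of information transferred, so lower is better.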
Lectures without the use of slides will have a larger information transfer in a podcast (audio
only) than observed in Table 3.3. Information transfer within the 80-90% range might be
obtained for some lectures, as seen in MIT OpenCourseWare, in which the lecturer gives an
oration without using slides or the blackboard.
A slide only file is quite useful despite its low information transfer. Most often it gives the title
of the lecture on the initial slide, the structure of the lecture on the content slide and the
highlights of the lecture in the titles of the slides themselves.
The slide only files have the best performance in information efficiency (small ratio of size
over information transfer).
Moreover, slide only files are easy to navigate for the viewer.
From these observations it is concluded that distributing the presentation slides (in the form
of a pdf file) is always an adequate distribution feature, even in combination with a vodcast.
Vodcast production
For the production of a vodcast for iTunes the following production strategies can be
evaluated:
• tailored vodcast production process for iTunes/iPod
• vodcast production out of the YouTube vodcast
Tailored vodcast production for iTunes/iPod
The tailored vodcast for iTunes/iPod might be produced in a similar way as described in the
chapter on a vodcast for YouTube.
This vodcast can be produced in the following steps:
• convert the slides into a movie file (same as for YouTube)
• combine the slide movie and the lecture movie, the latter in reduced resolution
• convert the produced vodcast into iPod resolution (320x240) and iPod codec (mp4)
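The three steps above could be combined in a single encoding call. The sketch below builds such a call for the FFmpeg command line tool; using FFmpeg is an assumption made purely for illustration, since the text does not say which tool was used for this study. The file names, the 64x48 overlay size (20% of 320x240), the 4 pixel margin and the function name are all hypothetical, and the `libx264` encoder is only available in FFmpeg builds that include it.

```python
# Minimal sketch: build an FFmpeg command that overlays the lecture movie
# (picture-in-picture, lower left corner) on the slide movie at iPod resolution.
# FFmpeg itself, the file names and the chosen sizes are assumptions.

def build_ipod_vodcast_command(slides_movie, lecture_movie, output,
                               frame=(320, 240), pip=(64, 48), margin=4):
    """Return the argument list for an FFmpeg picture-in-picture encode."""
    frame_w, frame_h = frame
    pip_w, pip_h = pip
    filter_graph = (
        f"[0:v]scale={frame_w}:{frame_h}[bg];"            # slides to iPod size
        f"[1:v]scale={pip_w}:{pip_h}[pip];"               # lecturer to thumbnail
        f"[bg][pip]overlay={margin}:H-h-{margin}[out]"    # lower left corner
    )
    return [
        "ffmpeg",
        "-i", slides_movie,
        "-i", lecture_movie,
        "-filter_complex", filter_graph,
        "-map", "[out]",       # combined video
        "-map", "1:a",         # audio track of the lecture movie
        "-c:v", "libx264",     # H264 video
        "-c:a", "aac",         # AAC audio
        "-r", "15",            # 15 frames per second
        output,
    ]


command = build_ipod_vodcast_command("slides.wmv", "lecture.wmv", "vodcast.mp4")
print(" ".join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where FFmpeg is installed. Building the command as a list keeps the process fully automated, in line with the requirement that no manual video editing should be needed.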
The result of the second step has been published in BlackBoard and uploaded to YouTube for
general evaluation.
BlackBoard enables display of wmv files in the Windows media player at iPod resolution (and
others).
YouTube enables viewing at iPod resolution by embedding.
The results of both display modes are shown in BlackBoard at:
http://blackboard.tudelft.nl/webapps/blackboard/content/listContent.jsp?mode=reset&course_id=_13432_1&content_id=_1085686_1#_1092564_1
Figure 3.5: Collegerama vodcast at iPod resolution
(Source: http://www.youtube.com/watch?v=5Qqe6XxbvS4)
For distribution on iTunes, the resolution should be downsized to 640x480 and might be
downsized to 320x240 for the smallest download size.
Table 3.4 gives the results for such conversion.
Table 3.4: Properties of vodcasts for iTunes/iPod
Property                             Vodcast at slide      Vodcast at iPod              Vodcast at iPod
                                     resolution            resolution                   resolution
Container type                       wmv                   wmv                          mp4
Duration                             10:28 min.            10:28 min.                   10:28 min.
File size                            14.3 MB               9.1 MB                       31.3 MB
Video                                1024x768              320x240                      320x240
                                     15 fps                15 fps                       15 fps
                                     wmv3                  wmv3                         H264
Audio                                mono                  mono                         mono
                                     44.1 kHz              22.1 kHz                     22.1 kHz
                                     wma2                  wma2                         mp4a (AAC-SBR)
Increase of file size                1                     0.64                         2.1
                                                           (9.1/14.3)                   (31.3/14.3)
Increase of resolution               1                     0.10                         0.10
                                                           (320*240)/(1024*768)         (320*240)/(1024*768)
Increase in compression efficiency   1                     0.16                         0.05
                                                           (0.10/0.64)                  (0.10/2.1)
Table 3.4 shows that downsizing to an iPod resolution in wmv3 codec reduces the file size but
far less than the reduction in resolution. At this smaller resolution the compression efficiency
is less.
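The three ratio rows of Table 3.4 follow from simple arithmetic, which the sketch below reproduces. The function name `compression_ratios` is hypothetical. The table rounds the intermediate values before dividing, so the unrounded results from the code can differ slightly in the second decimal.

```python
# Minimal sketch: ratios used to compare a converted vodcast with a reference.
# The function name is hypothetical and used for illustration only.

def compression_ratios(size_mb, ref_size_mb, resolution, ref_resolution):
    """Return (file size ratio, pixel count ratio, compression efficiency ratio)."""
    size_ratio = size_mb / ref_size_mb
    pixel_ratio = (resolution[0] * resolution[1]) / (
        ref_resolution[0] * ref_resolution[1])
    # Fewer pixels per megabyte than the reference means less efficient compression.
    return size_ratio, pixel_ratio, pixel_ratio / size_ratio


reference = (14.3, (1024, 768))          # vodcast at slide resolution (wmv3)
for label, size in [("iPod, wmv3", 9.1), ("iPod, H264", 31.3)]:
    size_r, pixel_r, eff_r = compression_ratios(
        size, reference[0], (320, 240), reference[1])
    print(f"{label}: size x{size_r:.2f}, pixels x{pixel_r:.2f}, efficiency x{eff_r:.2f}")
```

An efficiency ratio below 1 means that the file shrinks less than the number of pixels does.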
By transforming to an mp4 file with the H264 codec the file size becomes (31.3/9.1 =) 3.4 times
larger. This again shows the inferior compression of the H264 codec compared to the wmv3
codec.
Vodcast production from the HD vodcast
An iTunes/iPod vodcast might alternatively be produced from the widescreen HD vodcast for
YouTube by unequal resizing. This allows for production of different resolutions and aspect
ratios for iPod and for iPhone. This can be done by automatic conversion.
Figure 3.6 gives the different aspect ratios of these vodcasts. Table 3.5 gives the results of
this conversion.
Figure 3.6: Change in aspect ratio by resizing from HD widescreen resolution to iPhone and iPod resolution
Table 3.5: Properties of vodcasts for iTunes/iPod
Property                             Vodcast at HQ         Vodcast at iPhone            Vodcast at iPod
                                     resolution            resolution                   resolution
Container type                       wmv                   m4v                          m4v
Duration                             10:28 min.            10:28 min.                   10:28 min.
File size                            19.5 MB               19.0 MB                      15.5 MB
Video                                1280x720 (16:9)       480x320 (3:2)                320x240 (4:3)
                                     15 fps                10 fps                       10 fps
                                     wmv3                  H264                         H264
Audio                                mono                  stereo                       mono
                                     44.1 kHz              44.1 kHz                     44.1 kHz
                                     wma2                  mp4a (AAC-SBR)               mp4a (AAC-SBR)
Increase of file size                1                     0.97                         0.79
                                                           (19.0/19.5)                  (15.5/19.5)
Increase of resolution               1                     0.17                         0.083
                                                           (480*320)/(1280*720)         (320*240)/(1280*720)
Increase in compression efficiency   1                     0.18                         0.11
                                                           (0.17/0.97)                  (0.083/0.79)
Table 3.5 shows that the reduction in resolution hardly reduces the file size. This is caused by
the inferior compression efficiency of the H264 codec compared to the wmv3 codec. The loss in
compression efficiency is larger at smaller resolution. This observation is in line with the
conclusion from Table 3.4.
The file size of the iPod vodcast (15.5 MB) is smaller than that produced for Table 3.4
(31.3 MB). This difference can probably not be fully explained by the lower frame rate
(10 fps versus 15 fps).
Vodcast production by TU Delft
The department of TU Delft that is responsible for the Collegerama facilities has produced
two vodcasts out of the recordings of two lectures of the same course (CT3011 – Inleiding
Watermanagement).
Based on the results of this study, which were presented on BlackBoard for evaluation, they
decided to produce a vodcast for iPod resolution in which the video part was included as a
transparent movie section.
Figure 3.7 gives an impression of this layout. The properties of the vodcasts are given in
Table 3.6.
Figure 3.7: Vodcast for iPod resolution as produced by the Collegerama department
Table 3.6: Properties of vodcasts for iPod as produced by the Collegerama department
Property         CT3011 Lecture 3a     CT3011 Lecture 3b
Container type   mp4                   mp4
Duration         45:56 min.            39:18 min.
File size        123 MB                106 MB
Video            320x240 (4:3)         320x240 (4:3)
                 15 fps                15 fps
                 H264                  H264
                 Video 105x78 (33%)    Video 105x78 (33%)
Audio            stereo                stereo
                 48.0 kHz              48.0 kHz
                 mp4a (AAC-SBR)        mp4a (AAC-SBR)
Table 3.6 shows that the file size for lectures with a duration of 40-45 minutes is 105-125 MB.
This file size will require a relatively large bandwidth for fast downloading.
These file sizes are even larger than the HD vodcast for YouTube as presented in Table 2.2
(90 MB for a 1280x720 vodcast). As presented in Table 2.3, this is largely caused by the
inferior compression of the H264 codec compared to the wmv3 codec.
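To put the file sizes of Table 3.6 in perspective, the sketch below converts them to an average bitrate and to a download time. The 2 Mbit/s connection speed is a hypothetical value chosen for this example, the function names are assumptions, and 1 MB is taken as 10^6 bytes.

```python
# Minimal sketch: average bitrate and download time for a vodcast file.
# The bandwidth figure is hypothetical; 1 MB is taken as 10**6 bytes.

def to_seconds(mmss):
    """Convert a 'mm:ss' duration string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def average_bitrate_kbps(size_mb, duration):
    """Average bitrate in kbit/s for a file of size_mb with the given duration."""
    return size_mb * 8 * 1000 / to_seconds(duration)


def download_minutes(size_mb, bandwidth_mbps):
    """Download time in minutes at a sustained bandwidth in Mbit/s."""
    return size_mb * 8 / bandwidth_mbps / 60


for label, size, duration in [("Lecture 3a", 123, "45:56"),
                              ("Lecture 3b", 106, "39:18")]:
    print(f"{label}: {average_bitrate_kbps(size, duration):.0f} kbit/s, "
          f"{download_minutes(size, 2):.1f} min at 2 Mbit/s")
```

Both lectures come out at a similar average bitrate, which suggests that the file size of these vodcasts scales with the duration of the lecture.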
The vodcasts of Table 3.6 have a proper visibility for the slides. The visibility of the movie
component is less than for a non-transparent display despite its larger size (33% versus
20%). The part of the slides that is covered by the movie component has about 50%
readability. However, the overall view resembles a more modern iPod view. Whether a larger
transparent movie section is better than a smaller non-transparent section seems to be a
matter of taste.
The vodcasts again confirm the conclusion in the paragraph on Components in a vodcast.
Audio is very important, the readability of the slides should be good and the video component
is of minor importance for this type of lecture.
Uploading to and downloading from iTunes
Uploading to and downloading from iTunes could not be tested yet.
The handling of the account request of TU Delft for an iTunes U account is still in progress.
Conclusions and recommendations for vodcasts on iTunes
The production of Collegerama vodcasts for iTunes has resulted in the following
conclusions:
• an iPod vodcast can be produced out of a Collegerama recording with sufficient visibility
and readability despite the low resolution (320x240)
• a picture-in-picture layout is preferred over a side-by-side layout because of the larger
display size of the slides and therefore better readability
• lecturers might anticipate this picture-in-picture layout in their slide design
• the wmv3 codec is much more efficient for the recorded lectures than the H264 or flv1
codec
4. Evaluation
iTunes versus YouTube
Comparing the iTunes world (vodcasts and iTunes functionality) with the YouTube world
gives the following observations:
• YouTube allows for distributing HD vodcasts at Collegerama resolution, whereas for iTunes
lower quality vodcasts have to be produced
• YouTube gives much more functionality than iTunes (subscripts, automatic translation of
subtitles, annotation, flexibility in viewing)
• downloading is easier from iTunes, but this drawback of YouTube can be overcome by
using third party download tools and by the introduction of the announced download
option for YouTube channels
• by using third party download tools for YouTube, users are able to download vodcasts at
the requested resolution and in the requested file format (mp4, flv, 3gp)
YouTube is the preferred channel for distributing Collegerama vodcasts. Distributing
Collegerama vodcasts via iTunes is more focused on marketing. This channel
might require tailored vodcasts because of the limitations in screen sizes for iPod and iPhone.
It is expected that future iPods and iPhones (or their successors) will allow for displaying
larger resolutions, probably at HD quality.
Alternative download options
Downloads of Collegerama vodcasts could also be offered as part of a BlackBoard course, or
within the courses available in TU Delft OpenCourseWare.
Future developments of Collegerama
It might be assumed that future releases of the Collegerama server (Mediasite by Sonic
Foundry) will include options for downloads produced online at a selected resolution.
Annex D. Subtitling of Collegerama
1. Subtitles on digital media
   Why subtitles?
   Subtitle types
   Developments
   From image to text
   Subtitle formats
   How to create subtitles
2. Subtitling for recorded lectures
   Selecting of an example lecture
   Creating subtitles for the example lecture
   Subtitling in YouTube
3. Translation of subtitles for recorded lectures
   YouTube
Annex D1. Transcript lecture CT3011 (unsorted)
Annex D2. Transcript lecture CT3011 (sorted)
Annex D3. Partial transcript lecture CT3011 (incl. time frames / sentence)
Annex D4. Partial transcript lecture CT3011 (incl. time frames / word)
1. Subtitles on digital media
Why subtitles?
There are several benefits of adding subtitles to Collegerama lectures:
• lecture is easier to follow
• lecture is available to foreign speaking students
• lectures can be made searchable
Lecture is easier to follow
If a lecture contains subtitles during playback, it will be possible for deaf people and people
with a hearing impairment to understand what is being said. These special subtitles for the
hearing impaired are called "closed captions", sometimes also referred to as "subtitles for the
hard of hearing". The term "closed" in closed captioning indicates that not all viewers see the
captions, only those who choose to decode or activate them. This distinguishes them from
"open captions" (sometimes called "burned-in" or "hardcoded" captions), which are visible to
all viewers.
Most of the world does not distinguish captions from subtitles. In the United
States and Canada, these terms do have different meanings, however:
"subtitles" assume the viewer can hear but cannot understand the language
or accent, or the speech is not entirely clear, so they only transcribe
dialogue and some on-screen text. "Captions" aim to describe all significant
audio content—spoken dialogue and non-speech information such as the identity of speakers
and, occasionally, their manner of speaking—along with music or sound effects using words
or symbols.
Lecture is available to foreign speaking students
Subtitles are generally used to display the spoken words in a video on the screen. For every
different language, a new subtitle track has to be created. Most DVD movies that are released
in Europe contain at least the subtitle tracks for the languages German, French and English.
During production these subtitles are mostly created by hand by professional translators.
An alternative for generating different subtitle tracks is to use an automated computer
system. An example of such a service that is publicly available is Google Translate. It is a
beta service provided by Google Inc. to translate a section of text, or a webpage, into another
language. At the moment of writing the system supports 52 different languages from around
the world. Like other automatic translation tools, it has its limitations. While it can help the
reader to understand the general content of a foreign language text, it does not always deliver
accurate translations. Some languages produce better results than others.
Lectures can be made searchable
Every Collegerama lecture consists of a single video stream. Without some sort of indexing
system, the only thing offered is a 45 minute long video that offers no possibility of skipping
to the parts that are relevant to a certain topic.
There are several methods for indexing:
• transcript of spoken text
• time stamped transcripts of spoken text (subtitles)
• tagging
Transcript of spoken text
A transcript is a written record (usually typewritten) of dictated or recorded speech. When
this is available for a certain movie, a search engine can be used to look through its content
to see if a certain search term is mentioned somewhere. The problem with having just the
spoken text is that there is no way of knowing at which timeframe the word has actually been
spoken.
Time stamped transcripts of spoken text
Subtitles or time stamped transcripts of a video serve as the foundation for making every part
of the video searchable. It's possible to search for a certain keyword or term within the
spoken text. Along with the search results, a reference link to a certain timestamp can be
returned so that the user can fast forward the video to that part.
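A keyword search over a time stamped transcript can be sketched in a few lines of Python. The data structure (a list of start time and text pairs) and the function names are assumptions made for this illustration; they do not describe the Collegerama system or any particular search engine.

```python
# Minimal sketch: search a time stamped transcript for a keyword and
# return the timestamps to jump to. All names are hypothetical.

def search_cues(cues, term):
    """Return all (start_seconds, text) cues whose text contains the term."""
    term = term.lower()
    return [(start, text) for start, text in cues if term in text.lower()]


def format_timestamp(seconds):
    """Format a number of seconds as hh:mm:ss for use in a reference link."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


cues = [
    (0.1, "Na een ruime inlooptijd"),
    (3.3, "kunnen we beginnen"),
    (6.8, "met het tweede deel van 30-11, Watermanagement."),
]
for start, text in search_cues(cues, "watermanagement"):
    print(format_timestamp(start), text)
```

With a plain transcript only the text would be available, so a hit could be reported but not located in the video. The start time stored with each cue is what makes the jump possible.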
Tagging
In online computer systems terminology, a tag is a non-hierarchical keyword or term
assigned to a piece of information (such as an internet bookmark, digital image, or computer
file). This kind of metadata helps describe an item and allows it to be found again by
browsing or searching. Tags are chosen informally and personally by the item's creator or by
its viewer, depending on the system. For a movie, tags usually contain the title of the movie,
certain topics that are discussed and possibly chapter titles of a certain book that is being
covered.
When a movie has been described by a number of tags, it is possible to create a tag
cloud that shows the different topics that are covered. By matching the terms with the
number of times they are said in the video, a relevance weight can be assigned to them.
This results in a tag cloud where the size of the text is proportional to its relevance.
Figure 1.1: Example of a tag cloud
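The weighting behind such a tag cloud can be sketched as follows. The example assumes single-word tags and a linear mapping from relative frequency to font size; the function names, the example sentence and the font size range are hypothetical.

```python
# Minimal sketch: weight tags by how often they occur in a transcript and
# map the weight to a font size. Handles single-word tags only.
import re
from collections import Counter


def tag_weights(transcript, tags):
    """Return a weight between 0 and 1 per tag, relative to the most frequent tag."""
    words = Counter(re.findall(r"\w+", transcript.lower()))
    counts = {tag: words[tag.lower()] for tag in tags}
    top = max(counts.values(), default=0)
    return {tag: (count / top if top else 0.0) for tag, count in counts.items()}


def font_size(weight, smallest=10, largest=32):
    """Map a weight between 0 and 1 linearly to a font size in points."""
    return round(smallest + weight * (largest - smallest))


transcript = "water flows; water management needs water and dikes"
for tag, weight in tag_weights(transcript, ["water", "dikes", "pumps"]).items():
    print(f"{tag}: weight {weight:.2f}, font size {font_size(weight)} pt")
```

Tags that never occur in the spoken text get weight 0 and are shown at the smallest size; a real system would probably drop them from the cloud.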
Subtitle types
There are three different types of subtitles available:
• hard
• pre-rendered
• soft
Table 1.1 gives an overview of the basic differences between these types.
Table 1.1: Three different subtitle types
Feature                   Hard                  Pre-rendered                    Soft
Can be turned on/off      No                    Yes                             Yes
Editable                  No                    Difficult, but possible         Yes
Player requirements       None                  Most players support DVD subs   Usually requires special software
Transitions and effects   Highest               Low                             Depends on player, usually poor
Distribution              Inside video stream   Separate video stream           Small subtitle file or instructions stream
Additional overhead       None                  High                            Low
Example                   VHS video tape /      DVD movie                       Blu-ray movie
                          Karaoke CD movie
Hard
In this form, the subtitle text is merged with the original video frames and no special
software or hardware is required for playback. The most commonly known form of these is
Karaoke, where complex animations such as a bouncing ball are used to follow the lyrics. The
disadvantage is that these cannot be turned off unless the original video stream is also
available.
Pre-rendered
These are separate video frames that are overlaid on the original video stream while playing
and are used on DVDs. The general codec used for movie DVDs is called vobsub,
recognizable by the .sub and .idx files on the DVD. They can be turned on or off, and a DVD
usually includes separate streams for different languages. They're usually encoded as images with
minimal bit rate and number of colors. It's very hard to alter the subtitles, but it is possible to
convert them to "soft subtitles" using software such as SubRip (using OCR technology).
Soft (also known as softsubs or closed subtitles)
Softsubs are separate instructions that usually contain a timestamp in combination with a
piece of text that can be displayed during playback. It requires player support and there are
numerous different file formats available. They are relatively easy to create and update.
Figure 1.2: Example of a pre-rendered subtitle
Developments
Each new movie distribution technology uses a more advanced type of subtitle system. When
the first video tapes came out, they were displayed on an analog system. In order to display
the subtitles it was necessary to hardcode them onto the video stream. For every different
language, a new video tape had to be produced. Once the video was created and the
subtitles were burned into the stream, it was impossible for anyone to alter them post
production.
When the first karaoke CD movies came out, they also used a hard-coded system that burned
the subtitles directly onto the video stream. This offered the advantage of transition effects
such as word highlighting that could be incorporated in these types of videos.
Several years later video distribution moved to a digital medium (DVD). This allowed for new
possibilities, such as combining different video streams. Because of this new technology, a
single DVD could be produced for all the different countries and languages, because audio
streams and video streams of subtitles could be mixed together and chosen by the end user.
The problem with this subtitle system is the lack of flexibility in updating the subtitles.
Because the basic system is essentially the same as the VHS streams that burn the subtitles
onto the video stream, they are almost impossible to alter once the subtitle stream
has been produced.
In the past few years, a new digital medium has been produced called Blu-ray and once again
a new subtitle system was employed. Because movie producers would like more flexibility in
altering subtitle streams, it was decided that the streams should be created in a text-based
form. They were no longer displayed by combining different video streams, but by letting a
player render the subtitles as the movie is being played. The obvious disadvantage of this
system is that a piece of software is required on the player side to accomplish this. However
this is easily overcome by setting standards for the creation of Blu-ray players, that all
incorporate this simple piece of rendering software.
From image to text
Most subtitles consist purely of text characters. Since text is also some of the easiest data to
store and compress it makes sense to store subtitles as simple text files or a text stream
within a video file. Although it's normal for all subtitles to start out this way, that doesn't
mean that this is the way they are stored.
As a matter of fact, subtitles on DVDs aren't actually text. They're encoded as raster
graphics. Much like the characters on older text-based computer interfaces, they're
just a collection of dots on a grid. These images are put over the top of the video
frame when displayed.
The important thing about any text-based subtitle format is that you do have the ability to
edit subtitles easily. Since editing a text-based subtitle can generally be done with even a
simple text editor like Notepad, they're the easiest to modify and by far the easiest to create
yourself. Creating subtitles isn't exactly something most people have the inclination (or time)
to do, but if you want to do this you'll have to at least start with a text-based format.
Perhaps the biggest reason for the widespread development of text-based subtitles is their
use in AVI files. While AVI files can't contain graphic subtitles, they can have text subtitles.
AVI was/is the most common container for MPEG-4 ASP video encoded with codecs like DivX
and XviD and was also the format first added to DVD players for MPEG support.
In order to create text-based subtitles from an image-based format, a process called OCR or
Optical Character Recognition is used. OCR software essentially attempts to 'read' the text
represented by the images. The problem is that there can be big differences between two
different images of the same character. Differences in fonts and spacing make it nearly
impossible for even the most sophisticated OCR software to identify every character correctly.
Subtitle formats
There are countless different formats for displaying and storing subtitles. This is mainly
caused by different movie producers, video player makers and other software development
companies all trying to create the best technology that will become the standard format.
Some of these are more popular and more widely used than others, but each format can
be placed in one of four categories:
• image based subtitles
• text based subtitles
• HTML-based subtitles
• XML-based subtitles
Table 1.2 shows a select number of examples for each category, along with a few
characteristics that are typical for that format.
Table 1.2: Different subtitle formats
Category      Name                          Extension             Text Styling   Metadata
Image based   VobSub                        .sub + .idx           N/A            N/A
Image based   XSUB                          embedded              N/A            N/A
Text based    SubRip                        .srt                  No             No
Text based    SubViewer                     .sub                  No             Yes
Text based    AQTitle                       .aqt                  No             No
Text based    JACOSub                       .jss                  Yes            No
Text based    MicroDVD                      .sub                  No             No
Text based    MPSub                         .sub                  No             Yes
Text based    Ogg Writ                      embedded              Yes            Yes
Text based    Phoenix Subtitle              .pjs                  No             No
Text based    PowerDivX                     .psb                  No             No
Text based    (Advanced) SubStation Alpha   .ssa or .ass          Yes            Yes
HTML-based    SAMI                          .smi                  Yes            Yes
HTML-based    RealText                      .rt                   Yes            No
XML-based     MPEG-4 Timed Text             .ttxt (or embedded)   Yes            No
XML-based     Structured Subtitle Format    .ssf                  Yes            Yes
XML-based     Universal Subtitle Format     .usf                  Yes            Yes
Image based
The most commonly used and best known image based subtitle format is vobsub. It is the name
of the format for bitmap subs after they have been extracted from a VOB (Video Object) file
on a DVD. The application which extracted the subtitles, also called VobSub (now known
as VSRip), was developed by Gabest. VobSubs consist of an .idx file (the index of starting
timestamps, colors, and other basic info) and a .sub file (which contains the bitmap pictures
for the subtitles themselves). Compared to text based formats, VobSubs are usually somewhat
larger in size because the images take up more disk space.
Index code and bitmap sample:
timestamp: 00:18:40:752, filepos: 0000aa000
[Bitmap image: ct3011_001.PNG]
Figure 1.3: Sample code for vobsub (image based)
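As an illustration, an index entry such as the one in Figure 1.3 could be read with a few lines of Python. This is only a sketch: the function name parse_idx_line is hypothetical, the pattern is modelled on the single sample line above, and the file position is assumed to be a hexadecimal byte offset into the .sub file.

```python
import re
from datetime import timedelta

# Pattern modelled on the sample index line in Figure 1.3 (an assumption,
# not a complete description of the .idx file format).
IDX_LINE = re.compile(
    r"timestamp:\s*(\d+):(\d+):(\d+):(\d+),\s*filepos:\s*([0-9a-fA-F]+)"
)


def parse_idx_line(line):
    """Return (start_time, byte_offset) for one index entry, or None."""
    match = IDX_LINE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
    start = timedelta(
        hours=hours, minutes=minutes, seconds=seconds, milliseconds=millis
    )
    # Assumption: filepos is a hexadecimal offset into the .sub file.
    offset = int(match.group(5), 16)
    return start, offset
```

A player would use the returned offset to locate the bitmap in the .sub file and the start time to decide when to show it.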
Text based
SubRipText (or SRT) is a subtitle format commonly used in combination with XviD or DivX
movies. SubRip is an optical character recognition program for Windows which rips (extracts)
subtitles and their timings from video files or DVDs, recording them as a text file. It is also
the name of the subtitle format created by this software. The caption files are named with
the extension .srt. This format is supported by most software video players and subtitle
creation programs. An example of this is shown in Figure 1.4.
1
00:00:00,100 --> 00:00:03,300
Na een ruime inlooptijd

2
00:00:03,300 --> 00:00:06,800
kunnen we beginnen

3
00:00:06,800 --> 00:00:11,700
met het tweede deel van 30-11, Watermanagement.
Figure 1.4: Sample code for SRT (text based)
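Because SRT is plain text, reading it back into a program is straightforward. The sketch below assumes that cues are separated by blank lines, as in standard .srt files, and that each cue consists of a counter, a timing line and one or more text lines. The names parse_srt and to_ms are hypothetical.

```python
import re

# Timing line of an SRT cue, e.g. "00:00:00,100 --> 00:00:03,300".
TIMING = re.compile(
    r"(\d+):(\d+):(\d+),(\d+)\s*-->\s*(\d+):(\d+):(\d+),(\d+)"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_srt(text):
    """Parse SRT text into a list of (start_ms, end_ms, caption) tuples."""
    cues = []
    # Assumption: cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # need a counter, a timing line and at least one text line
        match = TIMING.match(lines[1].strip())
        if match is None:
            continue
        fields = match.groups()
        cues.append((to_ms(*fields[:4]), to_ms(*fields[4:]), "\n".join(lines[2:])))
    return cues
```

The resulting list of tuples is a convenient intermediate form for conversion to other subtitle formats or for searching.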
HTML-based
An example of an HTML-based subtitle system is SAMI, which stands for Synchronized
Accessible Media Interchange. It is a relatively rare HTML-like subtitle format in which a
start time (in milliseconds) is given for each subtitle. The structured markup language is
designed to simplify creating captions for media playback on a PC, i.e. not for broadcast
purposes. SAMI documents are plain text and can be written in any text editor, although
special utilities for creating them are available. They use the .smi or .sami file extension.
The common use of .smi for SAMI files creates a file extension collision with SMIL files. An
advantage of this format is that a single SAMI document may contain more than one language.
It is also the only subtitle format supported by Microsoft Windows Media Player. An
example of the SAMI format is shown in Figure 1.5.
<SAMI>
<HEAD>
<STYLE TYPE="Text/css"><!--
P {margin-left: 29pt; margin-right: 29pt; font-size: 14pt;
text-align: center; font-family: tahoma, arial, sans-serif;
font-weight: bold; color: white; background-color: black;}
.SUBTTL {Name: 'Subtitles'; SAMIType: CC;}
-->
</STYLE>
</HEAD>
<BODY>
<SYNC Start=0><P Class=SUBTTL><br>
<SYNC Start=100><P>Na een ruime inlooptijd
<SYNC Start=3300><P><br>
<SYNC Start=3300><P>kunnen we beginnen
<SYNC Start=6800><P><br>
<SYNC Start=6800><P>met het tweede deel van 30-11, Watermanagement.
</BODY></SAMI>
Figure 1.5: Sample code for SAMI (HTML-based)
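The body of a SAMI document like the one in Figure 1.5 could be generated from a list of timed captions. The following is a minimal sketch under stated assumptions: the function name cues_to_sami_body and the (start_ms, end_ms, text) tuple layout are hypothetical, every paragraph is given the SUBTTL class, and the caption is cleared with a <br> entry at the end of each cue, following the pattern of the figure.

```python
from html import escape


def cues_to_sami_body(cues):
    """Render (start_ms, end_ms, text) cues as the <BODY> part of a SAMI file."""
    lines = ["<BODY>"]
    for start_ms, end_ms, text in cues:
        # SAMI start values are given in milliseconds.
        lines.append(f"<SYNC Start={start_ms}><P Class=SUBTTL>{escape(text)}")
        # Clear the caption when the cue ends, as done with <br> in Figure 1.5.
        lines.append(f"<SYNC Start={end_ms}><P Class=SUBTTL><br>")
    lines.append("</BODY>")
    return "\n".join(lines)
```

The header with the style definitions is omitted here and would have to be added to obtain a complete document.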
XML-based
One of the latest subtitle types is based on XML and is called MPEG-4 Part 17 or MPEG-4
Timed Text. It is a text based subtitle format for MPEG-4 which is used on the new Blu-ray
media disc system. It is streamable, which was one of the main goals when the format was
created. It is mainly aimed at the .mp4 container, but can also be used in the .3gp
container (as 3GPP Timed Text), which is technically almost identical to .mp4 but used more
on mobile phones. 3GPP Timed Text is exactly the same as MPEG-4 Timed Text when used
in the .mp4 container.
QuickTime Pro and MP4Box can produce this kind of subtitle stream from various subtitle
input formats. MP4Box uses the fourcc tx3g for MPEG-4 Timed Text because
of its inherently higher compatibility. MPEG-4 Timed Text is heavily based on XML semantics.
Of interest is the fact that it seems a line must be defined for all times, meaning that when
there are no subtitles to be displayed, a blank line must be inserted. An example of MPEG-4
Timed Text subtitles is shown in Figure 1.6.
<tt xml:lang="en" xmlns="http://www.w3.org/2006/10/ttaf1" xmlns:tts="http://www.w3.org/2006/10/ttaf1#style">
<head>
<layout />
</head>
<body>
<div xml:id="captions">
<p begin="00:00:01" end="00:00:07"><![CDATA[Na een ruime inlooptijd]]></p>
<p begin="00:00:08" end="00:00:10"><![CDATA[kunnen we beginnen]]></p>
<p begin="00:00:10.5" end="00:00:12.5"><![CDATA[met het tweede deel van 30-11, Watermanagement.]]></p>
</div>
</body>
</tt>
Figure 1.6: Sample code for MPEG-4 timed text (XML-based)
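The observation that a line must be defined for all times can be handled by padding the gaps between captions with empty lines before the subtitle stream is written. The sketch below shows one way this might be done; it is not part of any MPEG-4 tool, and the name fill_gaps and the (start_ms, end_ms, text) tuple layout are hypothetical.

```python
def fill_gaps(cues, total_ms):
    """Insert empty cues so that every moment from 0 to total_ms is covered."""
    filled = []
    cursor = 0  # end of the time range covered so far
    for start_ms, end_ms, text in sorted(cues):
        if start_ms > cursor:
            # Nothing is spoken here: add a blank line for the gap.
            filled.append((cursor, start_ms, ""))
        filled.append((start_ms, end_ms, text))
        cursor = max(cursor, end_ms)
    if cursor < total_ms:
        filled.append((cursor, total_ms, ""))
    return filled
```

Applied to the timings of Figure 1.6, this adds blank entries before the first caption, between the captions, and after the last one up to the end of the video.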
How to create subtitles
Subtitles for translation and searching are only composed of spoken text. This is created from
the audio track extracted from the video stream. The creation method is shown in Figure 1.7.
Figure 1.7: Conversion process for creating subtitles
There are several ways of creating subtitles:
• manual post processing
• speech recognition
• live
Manual post processing
Many different programs can be used to manually create subtitles for a movie, but they all
work in broadly the same way. You start by typing in the lines of text that are spoken
in the movie. Once these are finished, the transcript needs to be matched to the timeline
of the movie. For every line of text, a timestamp is added so that the subtitle player can
later show the appropriate text at the right time.
Figure 1.8: Screenshot of the program SubCreator
The advantage of this method is that it is very easy and editing the subtitles is simple.
Everyone who can understand the language that is being spoken can write out the transcripts
of a given video stream. The problem is that such a process is very time consuming and
therefore relatively expensive.
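The time-coding step of manual post processing can be sketched as follows, assuming that each typed line receives only a start time and that a line stays on screen until the next one begins. The name stamps_to_cues and the tuple layout are hypothetical and do not describe how any particular subtitle program stores its data.

```python
def stamps_to_cues(stamped_lines, video_end_ms):
    """Turn (start_ms, text) pairs into (start_ms, end_ms, text) cues."""
    cues = []
    for index, (start_ms, text) in enumerate(stamped_lines):
        if index + 1 < len(stamped_lines):
            # Assumption: a line is shown until the next line starts.
            end_ms = stamped_lines[index + 1][0]
        else:
            # The last line is shown until the end of the video.
            end_ms = video_end_ms
        cues.append((start_ms, end_ms, text))
    return cues
```

With this convention only one timestamp per line has to be entered by hand, which keeps the manual effort as low as possible.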
Speech recognition
At the moment, speech technology is still a long way from achieving totally automatic
subtitling for any program. There are still too many errors in generating text and several
challenges such as background noise, different accents and multiple simultaneous speakers
make the process very difficult. However, speech technologies do have their place in the
world of modern subtitling. Speech recognition systems are already used in live subtitling
systems for sports, news and politics. Note that for searching within transcripts, the text does
not have to be as good as for translation.
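The remark about searching can be made concrete with a simple keyword lookup over time-coded captions. In the sketch below, a misrecognised word elsewhere in a caption does not prevent a hit, as long as the words being searched for were recognised correctly. The name search_cues and the (start_ms, end_ms, text) tuple layout are hypothetical.

```python
def search_cues(cues, query):
    """Return the start times (ms) of all cues that contain every query word."""
    words = query.lower().split()
    return [
        start_ms
        for start_ms, _end_ms, text in cues
        if all(word in text.lower() for word in words)
    ]
```

The returned start times could be used to jump directly to the matching positions in the recorded lecture.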
Live
Live subtitles have to be created within 2 or 3 seconds of the broadcast. There are people
specializing in this sort of work, called Communication Access Real-Time Translation
stenographers. They use a specialized keyboard that is specifically designed to support
shorthand writing, called a stenotype or velotype typewriter.
Figure 1.9: Two examples of a velotype typewriter
Realtime stenographers are the most highly skilled in their profession. Stenography is a
system of rendering words phonetically, and English, with its multitude of homophones (e.g.
there, their, they're), is particularly unsuited to easy transcriptions. They must deliver their
transcriptions accurately and immediately. They must therefore develop techniques for keying
homophones differently, and be unswayed by the pressures of delivering accurate product on
immediate demand.
2. Subtitling for recorded lectures
Selecting an example lecture
For the case of this project, a Dutch lecture given by J.C. (Hans) van Dijk about Sanitary
Engineering was selected for the purpose of testing certain subtitling techniques. This lecture
is from the bachelor's course CT3011, Introduction Water Management. It has the following
specifications:
Table 2.1: Sample lecture chosen for subtitling
Course             CT3011 – Introduction Water Management
Lecture            Lecture 7 – Sanitary Engineering (Civiele Gezondheidstechniek)
Lecturer           Prof. ir. J.C. (Hans) van Dijk
Duration           45:09
Number of slides   27
Collegerama link   http://collegerama.tudelft.nl/mediasite/Viewer/?peid=f33ba7ff-01604259-bd94-7ee0d9c5a461
For the creation of subtitles for this lecture, the most flexible type is soft subtitles, because
they are manageable and easy to update. Another advantage is that the subtitles are offered
in plain text, so they can easily be sent through translation engines or searched with
relative ease.
Creating subtitles for the example lecture
To manually create a soft subtitle track, there are plenty of programs which are easy to use
and all work very similarly. The one used in this example is called SubCreator (as shown
earlier). It allows the user to play, pause, fast forward and rewind the video stream and
contains a textbox where the spoken text can be entered (see Figure 2.1).
Figure 2.1: Transcript annotation using SubCreator
After the entire transcript has been typed out, the video has to be replayed from the
beginning. This is done in order to add the timecodes corresponding to the text, which are
necessary for later playback: without them, a subtitle player cannot know when to display a
certain piece of text on the screen. Within SubCreator, the shortcut ctrl+a adds a timecode
to the current line of text and automatically skips to the next line. Sometimes the sentences
are too long and have to be split up in order to fit on the screen during playback.
Usually, commas or short pauses in the text are chosen as the places to split them.
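Splitting long sentences at commas can be sketched in code. The example below is an illustration only: the name split_for_screen is hypothetical, and the limit of 42 characters per line is an assumed value, not one taken from SubCreator. Pauses in the audio are not considered here; if no comma fits, the sentence is split at the last word boundary instead.

```python
def split_for_screen(sentence, max_chars=42):
    """Split a sentence into chunks of at most max_chars, preferring commas."""
    chunks = []
    rest = sentence.strip()
    while len(rest) > max_chars:
        # One extra character so that a space right after the limit still counts.
        window = rest[:max_chars + 1]
        cut = window.rfind(", ")
        if cut != -1:
            cut += 1  # keep the comma at the end of the first chunk
        else:
            cut = window.rfind(" ")  # fall back to the last word boundary
        if cut <= 0:
            cut = max_chars  # no space at all: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each resulting chunk would then receive its own timecode during the second pass through the video.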
Another common problem is the pace at which the text is spoken. Sometimes the lecturer
speaks at such a high speed that it is hard to follow the subtitle text, because the text on the
screen is updated so quickly. In these instances it is important to try and make the text on
the screen as long as possible, so that it won't update too fast for viewers to follow, but will
still fit well on the screen.
These two problems combined make it difficult to properly create good subtitles. There are
two options that can solve this problem. The long sentence can be split up into smaller
sentences, or a really long subtitle line is created which will automatically be divided into two
rows by the subtitle player during playback. The best solution depends on the pace at which
the lecturer is speaking. If he is speaking really fast, then it is very chaotic to use one single
line, because the text will update really fast. For this situation it is better to create two rows
of text so that the subtitles become easier to follow. If the lecturer is speaking at a slower
pace, it is best to simply cut up the text into single lines, usually done in between short
pauses.
Figure 2.2: Adding timecodes to the transcript
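The choice between single lines and two rows of text depends on the speaking pace, which can be estimated as the number of characters a viewer has to read per second. The sketch below illustrates this rule of thumb. The names reading_rate and choose_layout are hypothetical, and the threshold of 17 characters per second is an assumed value that would need tuning for a real lecture.

```python
def reading_rate(text, start_ms, end_ms):
    """Characters per second the viewer must read for this piece of text."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        return float("inf")
    return len(text) / duration_s


def choose_layout(text, start_ms, end_ms, fast_cps=17.0):
    """Return 'two-row' for fast speech and 'single-line' otherwise."""
    # Assumption: fast_cps marks the pace above which single lines
    # would update too quickly to follow.
    if reading_rate(text, start_ms, end_ms) > fast_cps:
        return "two-row"
    return "single-line"
```

For fast passages the text is thus kept together in one longer subtitle of two rows, while slower passages are cut into single lines.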
Once the transcript has been completely worked out, the sentences have been properly split
up (so that they are easy to follow and aren't too long) and the timecodes have been
entered, it is time to convert them to a proper subtitle format. SubCreator offers several
different formats:
• SSA format
• SRT format
• simple time format
• frame format (MicroDVD)
SRT is the most commonly used and therefore the one used in this example (as you can see
in Figure 2.3). It converts the text and saves it to a *.srt file on your hard drive so it can be
used for playback along with the video file.
Figure 2.3: Converting the time-coded transcript to a generic subtitle format
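The conversion of a time-coded transcript to the SRT format can be sketched in a few lines. This is not SubCreator's implementation; the names format_timestamp and to_srt and the (start_ms, end_ms, text) tuple layout are hypothetical.

```python
def format_timestamp(ms):
    """Format milliseconds as an SRT timestamp, e.g. 00:00:03,300."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues):
    """Render (start_ms, end_ms, text) cues as the contents of an .srt file."""
    blocks = []
    for number, (start_ms, end_ms, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start_ms)} --> {format_timestamp(end_ms)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```

The returned string can be written to a file with the .srt extension and loaded alongside the video.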
Manually working out the whole transcript and properly time-coding every sentence takes a
typical student approximately 3 hours for a video of 45 minutes. This means that creating a
complete time-coded subtitle stream takes about 4 times the length of the video. The
results can be viewed in Annex A, B, C and D.
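The factor of 4 follows directly from the measurement: 3 hours is 180 minutes, and 180 / 45 = 4. Assuming the same ratio holds for other lectures, which has only been observed for this single example, the manual effort can be estimated as follows. The function name is hypothetical.

```python
def transcription_effort_hours(video_minutes, ratio=4.0):
    """Estimated hours of manual work for a video of the given length."""
    # Assumption: effort scales linearly with video length at the observed ratio.
    return video_minutes * ratio / 60
```

For the 45-minute example this gives 3 hours, matching the measured effort.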
Subtitling in YouTube
The online video distribution website YouTube allows for the possibility of adding subtitle
tracks to your uploaded videos. For every language, a new subtitle track needs to be
uploaded so the viewers can switch between languages (as shown in Figure 2.4). YouTube
accepts *.srt files for this.
Figure 2.4: Adding subtitles to a YouTube movie
3. Translation of subtitles for recorded lectures
Subtitles for translation and searching are only composed of spoken text. This is created from
the audio track extracted from the video stream. The creation method is shown in Figure 3.1.
Figure 3.1: Creation process for subtitles
There are several ways of creating translated subtitles:
• manual subtitling
• real-time subtitling
• speech recognition
YouTube
If there is at least one subtitle track available, YouTube provides a translation service that can
automatically convert the subtitles to another language. This is done through the Google
Translate service. On the bottom right of the YouTube interface, a button with the CC logo
(which stands for Closed Captions) turns the subtitles on or off. It also opens a submenu
from which you can access the translation menu (see
Figure 3.2).
Figure 3.2: Turning subtitles on or off in YouTube
When the translation menu has been opened, the user can choose from 52 different
languages that are available under the dropdown menu (see Figure 3.3). Once a language
has been chosen, the subtitles will be automatically sent to the Google Translate engine and
YouTube will display the results.
Figure 3.3: Google Translate menu in YouTube
Automatic translation is very difficult, as the meaning of words depends on the context in
which they're used. Scientists and computer developers are still working on this problem and
it may be some time before anyone can offer a quick and seamless translation experience.
Obviously, the translation that is offered today is far from being perfect or even coherent.
However, it's still a great way to understand the central ideas from a text. Now that Google
Translate supports so many languages, it's not hard to imagine that you'll be able to read
almost any web page in your language and maybe any application will be able to use Google
Translate's APIs to speak your language.
Figure 3.4: Translated subtitles from Dutch to English in YouTube
Google Translate's coverage has been expanded dramatically. It now supports the translation
between any of the following languages: English, Arabic, Bulgarian, Chinese, Croatian, Czech,
Danish, Dutch, Finnish, French, German, Greek, Hindi, Italian, Japanese, Korean, Norwegian,
Polish, Portuguese, Romanian, Russian, Spanish, Swedish. Having grown from 26 to 56 language
pairs, Google Translate has become the most comprehensive online translation tool
available for free.
Most state-of-the-art, commercial machine-translation systems in use today have been
developed using a rule-based approach, and require a lot of work to define vocabularies and
grammars. Google Translate takes a different approach and feeds the computer billions of
words of text, both monolingual text in the target language, and aligned text consisting of
examples of human translations between the languages. After that, statistical learning
techniques are applied to build a translation model.
Table 3.1: List of languages supported by Google Translate
Afrikaans     German        Persian
Albanian      Greek         Polish
Arabic        Hebrew        Portuguese
Belarusian    Hindi         Romanian
Bulgarian     Hungarian     Russian
Catalan       Icelandic     Serbian
Chinese       Indonesian    Slovak
Croatian      Irish         Slovenian
Czech         Italian       Spanish
Danish        Japanese      Swahili
Dutch         Korean        Swedish
English       Latvian       Thai
Estonian      Lithuanian    Turkish
Filipino      Macedonian    Ukrainian
Finnish       Malay         Vietnamese
French        Maltese       Welsh
Galician      Norwegian     Yiddish
Annex D1. Transcript lecture CT3011 (unsorted)
Na een ruime inlooptijd kunnen we beginnen met het tweede deel van 30-11, Watermanagement. Het deel over gezondheidstechniek ga ik de komende
zeven weken met jullie doornemen. En ik dacht, ik zal me eerst eens even aan jullie voorstellen, dus, mijn naam is Hans van Dijk, zoals jullie daar zien staan
en ik dacht, laat ik daar maar twee dingen voor nemen, mijn hobby en mijn werk. Nou de hobby dat zien jullie, ik ben een marathonloper. Een mooie foto
van de glorieuze binnenkomst in Rotterdam in april afgelopen periode. Marathonlopers dat zijn allemaal een beetje fanatieke lui he, echte doordouwers, die
trainen iedere dag. Die weten hun leven zodanig te organiseren dat dat allemaal kan. Dus ik loop hier ook iedere dag tussen de middag een rondje naar
Delfts hout, of langs de Schie, of een ander parcour hier. Als jullie me eens een keer in korte broek of trainingspak zien lopen dan klopt dat, dat ben ik. En
dat doe ik inmiddels met een heel groepje mensen, bij ons op de afdeling, met studenten en promovendi. En een van die studenten is hier weergegeven,
dat is Karin Teunissen. Die zat drie jaar geleden hier bij inleiding watermanagement. Was toen derde jaars, inmiddels is ze afgestudeerd en begonnen met
een promotieonderzoek bij het duinwaterbedrijf in Scheveningen. En zij is ook een fanatieke hardloper geworden, en zo hebben wij in april 42 kilometer
samen gelopen. Nou dat is een herinnering die ons beide in het geheugen gegrift zal blijven. Dan het werk. Ik heb, ik ben, ja een vraag, ben jij ook een
hardloper? - Sorry? Ben jij ook een hardloper? - Eeh, nou ja ik heb wel een vraag, maar volgens mij is dit college al gegeven. Nee - Niet? Dan weet ik niet
hoe ik dit al wist, maar... Nou ik zeg dit wel eens vaker, dus dat zou best kunnen. Waar ben je geweest? - Ja volgens mij vorig jaar, maar... Ja tuurlijk, vorig
jaar hebben we ook 30-11 gegeven ja, dat klopt haha. Maar deze foto is echt van april hoor dus dat is toch vrij recent. Wat misschien zou kunnen zijn is, ik
geef ook altijd een van de gastcolleges bij inleiding Civiele Techniek in het eerste jaar. En daar begin ik natuurlijk ook een beetje met, ja wie ben ik, dus dat
zou best kunnen, dat je het daarvan herinnert. Nou dan weet jij nog dat ik hier 30 jaar geleden ben afgestudeerd. Ik heb toen ook Civiele Techniek
gestudeerd, in '76 afgestudeerd. Daarna ben ik gaan werken bij een ingenieursbureau, bij DHV in Amersfoort, en dat kan ik jullie van harte aanraden als je
straks afgestudeerd bent om bij een ingenieursbureau te gaan werken. Dat is een geweldige ervaring, je bent met allerlei projecten over de hele wereld
bezig. In mijn geval dan drinkwater projecten. Dus het ontwerpen van zuiveringsinstallaties, bouwen van systemen, ook het doen van onderzoek. Eigenlijk
kun je alle kanten op bij een ingenieursbureau en de Nederlandse ingenieursbureaus zijn redelijk succesvol, ook op de internationale markt tegenwoordig.
Ja ik heb daar vele jaren gewerkt, totdat op een gegeven moment, inmiddels is dat alweer 17 jaar geleden, er een advertentie stond dat we een hoogleraar
zochten hier in Delft. En toen dacht ik van, nou ja, laat ik maar eens een brief schrijven, je weet het nooit, niet geschoten is altijd mis. Dus ik heb een brief
geschreven en ik dacht, ik zal het vast wel niet worden, maar ik werd het wel. Dus ook daar zit al meteen een eerste levensles in, probeer maar eens wat
en het kan altijd meevallen. Ik ben in eerste instantie vervolgens voor een dag in de week hier deeltijdhoogleraar geworden in de drinkwatervoorziening,
dat is mijn leerstoel. En ja, zo langzamerhand van het een komt het ander, je wordt voor steeds meer dingen gevraagd. Dus ik ben langzamerhand meer
dingen hier in Delft gaan doen en die aanstelling bij DHV heb ik steeds verder afgebouwd, en vanaf 1999 ben ik volledig gestopt bij DHV en ben ik hier
voltijd hoogleraar. En voltijd hoogleraar dat betekent ook, je hebt enerzijds taken op het gebied van onderwijs, anderzijds onderzoek, maar ook
management, dus management, ja, dan moet je, ik ben hoofd van een afdeling enzo en dan zit je in het managementteam of in de opleidingscommissie.
Moet je over algemene dingen meepraten en beslissen. Daar kun je natuurlijk een dagtaak van maken, dat heb ik altijd vermeden. Ik vind het toch altijd het
leukste om met het vak bezig te zijn en daarmee kom ik op het tweede plaatje wat hier staat, want het allerleukste is eigenlijk afstudeerders begeleiden.
Dat gaan jullie de komende jaren dat proces doormaken. Dat is voor ons altijd ontzettend leuk om te zien hoe studenten zich transformeren van min of
meer anonieme figuren die in de collegezaal zitten en zitten te luisteren. Min of meer absorberen wat ik in een monoloog aan het overdragen ben. Hoewel ik
overigens wel reacties van jullie zeer op prijs stel hoor en ik zal daar ook af en toe expliciet om vragen. Maar goed, de praktijk is toch dat in deze fase van
de studie zitten jullie nog vooral te luisteren en dat wordt eigenlijk steeds leuker als je verder komt in het vierde en het vijfde jaar en het hoogtepunt is dan
natuurlijk het afstuderen, waar je echt een onderwerp helemaal zelf bij de kop pakt. Ik zeg ook altijd tegen mijn afstudeerders, je moet van je
afstudeerproject je visitekaartje maken, he Doris, en dat werkt ook echt zo. Op het moment dat je klaar bent met dat afstudeeronderwerp dan weet jij het
meeste van dat onderwerp af. Meer dan wie dan ook in Nederland. Dat bewijzen we ook iedere keer weer door de afstudeercolloqui. Daar geven we veel
kenbaarheid aan, daar komen altijd mensen vanuit de waterbedrijven van KIWA, van andere researchinstituten. Die doen daar mee in de discussies en onze
afstudeerders die weten keer op keer alle vragen te beantwoorden. Misschien niet altijd 100% goed, maar toch wel 99% goed. Dat is altijd een genoegen
om mee te maken. Ik zeg ook altijd dat ik trots ben op mijn afstudeerders, en dat is ook zo. Ik heb er inmiddels een stuk of 80 gehad en soms gaat het dan
heel goed, zoals hier staat met Karin en Doris, Doris is hier trouwens in de zaal aanwezig, die dan het afgelopen jaar allebei zelfs met lof zijn afgestudeerd.
Dat betekent dus dat je het heel goed gedaan hebt, hoge cijfers gehaald hebt, en ook het afstudeerproject heel goed gedaan hebt. Ja, dat is voor ons
gewoon heerlijk om dat mee te maken. Om te zien hoe jonge mensen het vak ook leuk gaan vinden, zelf ook enthousiast worden, en hun stempel gaan
zetten op ons vakgebied. En ik hoop dat enkele van jullie ook zo ver zullen komen. Goed, dat is wat mijzelf betreft. Dan wat dit vak betreft. We gaan dat
doen aan de hand van het boek, dat staat al op blackboard aangegeven. Daar hebben we een Nederlandse en een Engelstalige versie van. Dat boek dat
moeten jullie kopen bij de secretaresse van ons, Mieke op de vierde verdieping, voor 25 euro. In de winkel kost het 50 euro, maar wij hebben een speciale
kortingsregeling. Jullie mogen zelf weten of je het Nederlandse of het Engelse boek koopt. De inhoud is vrijwel hetzelfde en in ieder geval voldoende voor
dit vak. Als jullie een advies van mij willen hebben dan zou ik zeggen, als je goed Engels kunt lezen, koop het Engelse boek, dat is iets actueler, staat iets
meer informatie in, maar het Nederlandse boek is voor dit vak zeker voldoende. Ja, zo'n boek heeft natuurlijk, behalve dat we er over gaan vragen bij het
tentamen, daar zal ik bij mijn volgende dia op terugkomen, heeft zo'n boek natuurlijk ook nog een zekere functie als naslagwerk. Als je zo'n boek eenmaal
hebt, dan heb je dat bij je, ook na je afstuderen neem je dat mee. Als je vervolgens ergens in een vreemd land een installatie moet ontwerpen, dan haal je
dat boek weer eens uit de tas en dan weet je weer het een en ander. Die functie heeft zo'n boek ook. Daar staan vraagstukken ook in, in dat boek, en we
hebben ook vraagstukken op blackboard staan. Dat zullen jullie misschien ook al gezien hebben, computer assignments. Dat is overigens niet verplicht, er is
bij ons niets verplicht. Ja, jullie moeten uiteindelijk het tentamen doen, maar we bieden materiaal aan, dus maak er gebruik van zou ik zeggen maar we
gaan dat niet controleren. Er staan daar vragen op blackboard, er zitten vragen in dat boek, de antwoorden staan er ook bij, of althans, als je die computer
assignment gemaakt hebt dan krijg je na afloop te melden welke vragen goed waren en welke vragen fout waren. Dus dat is een ondersteuning voor jullie
bij het kennismaken met de materie en het leren van de stof. En oude tentamens hebben we daar ook bij staan, dus dan kun je ook nog eens oefenen en
kijken wat er ongeveer gevraagd wordt. En dan gaan we college geven de komende periode. Oh ja, dus over het boek, jullie hoeven niet het hele boek te
kennen. Dat boek wordt zowel gebruikt bij 30-11, als bij het volgende college 34-20, wat een a keuzevak is voor de mensen die watermanagement gaan
doen, en de hoofdstukken die voor 30-11 gevraagd worden op het tentamen staan hier aangegeven. En die presentatie komt ook weer op blackboard zoals
jullie weten, inclusief deze video opname. Dan gaan we deze colleges geven, dus 7 keer de komende periode vanaf nu, en ik wil het dit jaar zo doen dat in
het eerste uur vertel ik een beetje de grote lijn van het betreffende onderwerp. De belangrijkste punten, ik probeer daar wat kleuring aan te geven. Wat is
nou belangrijk en wat minder. En het tweede uur heb ik steeds een van de promovendi, vandaag is dat Doris, die dan iets gaan vertellen over hun eigen
onderwerp, hun eigen onderzoek, hun eigen project, wat een stukje actualiteit geeft, en kleuring, verdieping, van het betreffende onderwerp. En ik heb het
zo georganiseerd dat dat steeds, als het goed is, goed op elkaar aansluit en jullie een goed beeld geven van de stof, zodat je straks het tentamen ook
makkelijk kunt maken. Dat wil niet zeggen dat alle onderdelen van de verhalen van de promovendi tentamenstof zijn. Dat zullen we zo her en der ook wel
aangeven. Ja, zo'n promotieonderzoek dat gaat natuurlijk veel dieper dan jullie nu in het derde jaar hoeven te weten, maar het gaat meer om de
beeldvorming, de kleuring en het begrip van de materie. Dan hebben we een excursie gepland naar de Berenplaat, de grote zuiveringsinstallatie bij
Rotterdam, bij Spijkenisse om precies te zijn, op 11 oktober. Ook dat is niet verplicht, alles is facultatief bij ons. Daar hebben zich tot nu toe een stuk of 60
mensen aangemeld. De inschrijving sluit op 1 oktober hebben we gezegd, omdat bij de waterbedrijven tegenwoordig ook strikte veiligheidsvereisten enzo
zijn na de aanslagen in New York. Je moet daar precies opgeven wie er allemaal komen, met naam enzo en wij moeten daar voor instaan ook, dat er geen
vervelende dingen gebeuren, en er moeten natuurlijk ook bussen gereserveerd worden en we krijgen daar lunch geserveerd. Dus de mensen die zich
opgegeven hebben die krijgen nog een mailtje binnenkort, kort na 1 oktober, met een bevestiging, en degene die zich niet opgegeven hebben die gaan niet
mee. En ik ga er ook van uit dat degenen die zich wel opgegeven hebben, dat die ook komen he, het is natuurlijk een beetje vervelend tegenover de
organisatoren als we daar met veel minder mensen zouden aankomen dan we aangemeld hebben. We zullen proberen, ik heb wat vragen gekregen over
dat er 's middags verplichte practica zouden zijn van constructieleer en statistiek geloof ik, dus we zullen proberen om tijdig weer terug te zijn. Dat zal zeker
niet om half 2 zijn, dus ik denk dat we ongeveer om half 3 terug zullen zijn, en we vertrekken gewoon na het college op donderdag, dus om half 11. Ik
weet niet of, even kijken of ik al ga beginnen, ja ik ga al beginnen dus, zijn er vragen over de organisatie en deze algemene inleiding? Okee. Nou dan ga ik
kort even iets vertellen over gezondheidstechniek, dat zal jou ook bekend voor komen want dat heb ik ook bij het eerste jaar al verteld, en dan ga ik iets
meer vertellen over de drinkwatervoorziening van Nederland en na de pauze gaat Doris dan iets vertellen over de drinkwatervoorziening in
ontwikkelingslanden, want daar is zij vooral mee bezig. We hadden natuurlijk gezondheidstechniek. Nou dat zal ieder van jullie niet onbekend zijn, dat dat
gaat over de stedelijke waterkringloop, dus de infrastructurele werken voor de voorziening van drinkwater, het winnen van grondwater, het winnen van
oppervlaktewater, het zuiveren daarvan, het vervolgens transporteren met een heel transportleidingen en distributieleidingensysteem naar ons allen toe.
Naar de huishoudens en de industrieen, de bedrijven, vervolgens het inzamelen van het afvalwater via de riolering. Het zuiveren van dat afvalwater en dat
wordt dan vervolgens weer geloosd op het oppervlaktewater. Dus alle infrastructurele werken die over die kleine stedelijke waterkringloop gaan, dat is wat
we gezondheidstechniek noemen, en ik zal hier vooral focussen op de drinkwatervoorziening, omdat we daar ook het meest duidelijke effect zien zoals hier
in deze figuur weergegeven. Het verdwijnen van besmettelijke ziekten in Nederland, doordat die niet meer overgedragen worden via besmet drinkwater. In
de rest van de wereld is dat natuurlijk nog een hele andere situatie, maar hier hebben we daar flink veel succes mee gehad in de 20e eeuw. We zien hier
een plaatje dat weergeeft de daling van de sterfte aan buiktyfus in de 20e eeuw, en dat loopt parallel aan het percentage van de mensen wat niet
aangesloten is op de drinkwatervoorziening, in diezelfde periode is in Nederland de drinkwatervoorziening aangelegd. Rond 1900, zelfs kort voor 1900, de
grote steden en zo langzamerhand ook de kleinere steden en het platteland, en vanaf 1975 zeg maar, is in Nederland iedereen op de drinkwatervoorziening
aangesloten en komen besmettelijke ziekten die door besmet drinkwater overgedragen worden ook niet meer voor. Dus het gaat bij ons om infrastructurele
werken voor een goede waterkwaliteit, dus zaken als waterwinning, waterzuivering, watertransport, waterchemie en microbiologie ook, die waterkwaliteit.
Microbiologie, enerzijds het afwezig zijn van organismen waar we ziek van kunnen worden maar anderzijds ook het gebruiken van micro organismen om de
zuivering te optimaliseren. Micro organismen kunnen ook weer verontreinigingen afbreken, bekendste voorbeeld daarvan is de afvalwaterzuivering waar we
met behulp van zuurstof en actief slib, dat is een mengsel van bacterien, de afvalstoffen in het afvalwater laten afbreken. Dus waterkwaliteit, waterchemie
en microbiologie zijn in dit deel van de civiele techniek vrij belangrijk. We maken natuurlijk ook gebruik van de algemene kennis van civiele ingenieurs en
met name dan van zaken als hydraulica, hydrologie, constructieleer, constructieve vormgeving, projectrealisatie, informatica, zijn natuurlijk allemaal dingen
die je in projecten nodig hebt. Vaak ook in teamverband, bij zo'n ingenieursbureau bijvoorbeeld. De een is meer bezig met de automatisering, de ander is
meer bezig met het constructieve deel, een derde is weer met de hydraulica bezig, en jullie kunnen afhankelijk van de specialisatie die je kiest daar een
verschillende rol in spelen. Die gezondheidstechniek is natuurlijk van groot belang voor de volksgezondheid, dat spreekt voor zich. Het gaat over relatief
grootschalige infrastructurele werken, we zien hier de zogenaamde Biesbosch bekkens. Dat is in de Brabantse Biesbosch. Bekkens die aangelegd zijn voor
de drinkwatervoorziening, en het gaat om een goed georganiseerde sector met heldere taken. Er is zelfs een aparte wetgeving voor, de waterleidingwet, als
het over de drinkwatervoorziening gaat, waarin gewoon staat precies waar alles aan moet voldoen, en dat de directeur van het waterleidingbedrijf daar
persoonlijk voor aansprakelijk is. Die riskeert gevangenisstraf als die onvoldoende water of water distribueert waar je ziek van kan worden. Dus dat is
all well organized. And we do a lot on this in Delft; my chair, the chair of drinking water supply, is the only chair in
the Netherlands in the field of drinking water supply. So that is nice; it gives us a certain exclusivity. Many of our students therefore hold, well,
leading positions in the profession: they are directors or staff officers or designers at the water companies, and many of our
engineers also go to the engineering firms. Well, a lot is going on there; now and then we even have guest lectures by Willem Alexander, who also
finds water management interesting. Finally, before I tell you more about the setup of the infrastructure in
the Netherlands, I have three more slides that illustrate some of the work in our field. So you have already seen this picture, haven't you, so you can now tell me where
this little dip comes from? - I think it has something to do with half-time. Yes, exactly, so this is the water consumption during mass events, in this case the
football World Cup, and you can see that we all behave like herd animals. From the start of the match, which is here, water consumption drops
enormously. Nobody uses water anymore; everyone is sitting in front of the TV, watching. Until half-time, when we all run to the toilet and to the
coffee machine, and we get an enormous rise in water consumption. In the second half everyone goes back to watching, and we again see a very low trough in
water consumption, with even a minimum shortly before time when the decisive goal was scored, in this case by Dennis Bergkamp, and at the
end of the match everyone runs to the toilet again. And you see the same thing in industrial consumption. Even there, operators
and so on are watching too, and everything runs at about half power. It is a highly reproducible phenomenon, these curves, a kind of
electrocardiogram of our behavior. The behavior of the population. And this little dip we indeed call the Cruijff dip. That is the moment during
half-time when Cruijff comes on to give his commentary. Then everyone runs back from the toilet for a moment to hear what Cruijff has to say, and
often the goals are replayed too, and we all watch TV again for a moment. We are of course mainly engaged in design. It is
of course often about new infrastructure works. The construction of a pumping station, the design of a treatment plant and a transport main; and
design is something you have already heard a lot about in project education and design education. It is schematizing. Forming a certain framework in your
head of how something fits together. So a filter, how do we schematize that, and how does the water flow through a plant. The hydraulic
line. We have to make a certain schema of that. We must be able to apply formulas to it. We must be able to calculate it. And above all we must
not make mistakes in doing so; that is what this picture is meant for. One of the carbon filter installations at the Rotterdam drinking water works, at Kralingen,
next to the Van Brienenoordbrug, where at one time water hammer occurred, resulting in the implosion of that carbon filter, and that is of course very
unpleasant. You often run into Murphy's law there: everything that can go wrong will go wrong at some point. So
water hammer is the phenomenon that when, for example, a pump trips, a low-pressure wave can arise, and that underpressure can indeed lead to
implosion. You can of course prevent that by fitting an air release and air inlet valve. That was done here too; on top of the carbon filter
there was such a valve, but unfortunately, just at the moment the pump failed as a result of a power outage, it was also winter with a
very severe frost and the air valve was frozen, so that no air could enter and a vacuum formed in the vessel after all, and, well,
this was the result. So design is above all being aware of things that can go wrong, which is also why hydraulics is quite important. It is
of course very unpleasant when water sprays out somewhere or goes the wrong way, so you always have to be alert to things that can go
wrong, and design is above all experience. Having seen things: how is it done in practice? That is also why we have planned the excursion to
the Berenplaat, so that you can have a first look at what such a plant looks like and what you have to
take into account. Alright, well, a few more pictures of another project, in Limburg in this case, where a large transport main was laid for
a surface water project in Panheel. That was in the context of the so-called desiccation debate. That is a debate that went on in the Netherlands for a number of
years: partly because of the abstraction of drinking water, groundwater levels drop and nature reserves dry out. So it
was said here in Limburg at the time, about 10 years ago: we have to reduce groundwater abstraction and switch to the Meuse. After all, it flows
right through Limburg, so that is fairly easy. Then a storage reservoir was constructed here. Well, constructed: it was an old gravel pit. Gravel had been
extracted there, so the pit was there anyway. It was filled with Meuse water. From that reservoir, which we see here, the Meuse water then sinks
into the ground by itself. We call that infiltration, artificial infiltration; the water sinks into the ground, and in the process a great deal of quality improvement
already takes place. All kinds of substances are filtered out between the sand of the subsoil, and the bacteria die off because of the long residence time. So you already get
a considerable improvement in water quality. Then the water is pumped up again by means of wells, which are placed at a certain distance around the
reservoir. So you are really abstracting a kind of artificial groundwater. From the Meuse water, which of course contains all kinds of
bacteria, viruses and other contaminants, you make a kind of artificial groundwater. That is then abstracted again and subsequently
treated further in the treatment plant that we see shown here. And then it went through the transport main across the whole of Limburg, to the
consumers. And finally we of course also do research, especially here at the TU. If you work at an engineering firm, well, then you don't need that much
research; you mostly use rules of thumb and design criteria. But the field keeps developing, of course, and there are new threats time and
again. Quite prominent in the news at the moment, for example, is the occurrence of pharmaceuticals in the Rhine. The contraceptive pill is present in the Rhine in
detectable concentrations; does it also end up in the drinking water, and what should we do about it? Does the treatment have to be extended
again? Those are the kinds of questions being asked. And that is when we get involved in research. With us, research often takes place on site. This is a picture of the
field practical in Luxembourg. Huub Savenije may have told you something about it last Monday, but it is also very relevant because one
water is not the other. Water is a natural substance, and the contaminants and the substances that matter depend on the source. The
interaction, the discharges of substances that may have taken place. Interaction with the soil, leaves and natural waste products that end up in the
water. So every water is different, and you should preferably do the research on site. You cannot really say: well, I will just do
tests in the laboratory. No, you always need the test of practice. Does the water behave in practice the way we
expect in theory? Some things are of course done in the lab. There is also a laboratory here, Stevin 3, the water laboratory, with all kinds of
set-ups. Filters, sedimentation units, other experimental set-ups, and later you will do practicals there yourselves if you continue in this
direction. And eventually you can even do PhD research and be awarded your doctoral diploma in the Aula. Good. Then I still have a
quarter of an hour, if I am right. Yes, and I can put that to good use to give you a first account of what is special
about the drinking water supply in the Netherlands. What do you need to know about it? For that I am using a presentation that I gave last year
in Canada, for the Canadian water companies, and there things are very different again. So there I really did my best to make clear
what is special in the Netherlands, and what you in Canada might get out of it. And I think that could be a
nice introduction to the field for you as well. Actually, it is already captured very concisely in this picture. It is a picture of
a child, Joey, drinking water from the tap, and really, as the picture shows, it is about trust. The water has to be so good that you
can trust it completely. That you even let your children drink it, and that it is above all suspicion. That is really the core of the philosophy
of drinking water supply in the Netherlands. To some extent that is of course also the case in other countries, but much less so. I don't know whether you
have been to America and Canada and countries like that. There it is more the case that drinking water, which is called tap water there,
is more something you use for the washing machine and the toilet; we do that too, mind you, but in Canada and America you don't really drink it. If you
want to drink water, you go and buy a bottle at the supermarket. Or you put a filter on your tap to give the water an extra treatment. And that is what we call
consumer confidence, which is much lower in those countries than in the Netherlands. That is partly a matter of, well, culture and tradition. In Europe we are
used to the government arranging things well, and in America they are much less used to that. There the whole of New Orleans simply floods, and then
it just gets rebuilt again, and so on. We don't do that in the Netherlands. And it is the same with drinking water. In the Netherlands we feel
that we need absolute certainty that the drinking water coming out of the tap is A always there, security of supply, and B always good,
so that our children can drink it with peace of mind, and so can we. A slide with a number of key concepts: water consumption, the fact that
we use groundwater and surface water for the drinking water supply. Groundwater in the Netherlands is often still of very good
quality. This is illustrated a little by this picture of the Veluwe, where we see rain clouds, and you can imagine what happens when that rain falls on the enormous
sandy surface of the Veluwe. Yes, that water is filtered very well, and the groundwater you abstract there is of very good quality,
of course, so groundwater is generally good. There certainly are concerns as well: here and there we of course have landfills, and not too
few of them, I believe, and those can contaminate the groundwater. And farmers of course use manure and pesticides,
and that can eventually end up in the groundwater too, but on average groundwater is still of excellent quality, so
a simple treatment suffices. Aeration and sand filtration, we will come back to that; that is usually enough. Surface water, on the other hand,
is just the other end of the spectrum, you could say. In the Netherlands we sit at the drain of Europe. The Rhine and the Meuse have flowed through
France, Germany and Belgium. All that wastewater has been discharged into them. So surface water contains a complete cocktail of every substance
you can imagine. Surface water therefore has to be treated very extensively. And that is what we do in the Netherlands. Abroad it is sometimes
referred to as double Dutch treatment. We have a great many treatment processes in series, to make sure that the water in the end
really is good. And what is very special in an international context: we do not use chlorine. Americans take it for granted that chlorine is used;
the drinking water there tastes of chlorine and smells of chlorine. Americans consider that perfectly normal. And in the Netherlands we say, no,
we don't want that. In the first place there is a substantive reason for it, namely that we know that when you apply chlorine, it reacts with certain substances
that occur naturally in water, organic compounds, and then you get what we call disinfection by-products. Chloroform is the
best-known example. And those are unwanted substances. They are substances that can be toxic. Now you could say, I can set a
certain standard for that, and maybe I can just about meet that standard, but in the Netherlands we do not consider that good enough. We say no, these are
unwanted substances, we simply don't want them. So we simply do not want to use chlorine. That is an essential principle, which
has a great many consequences, mind you, but which has been applied in the Netherlands for more than 30 years, and a lot of work, a lot of research, has gone into
it. So I think that is an important point to hold on to already in this first lecture. Chlorine simply leads to those toxic compounds, and
therefore you should not want it. At any rate, that is what we have decided in the Netherlands, that we don't want it, and so we don't do it. There is another,
practical aspect, and that is that water with chlorine tastes of chlorine, and we don't like that in the Netherlands either. We feel that water coming out of the
tap should taste good; it should not have such a nasty chlorine taste. That is a swimming-pool chlorine taste. We don't want that for drinking water.
That is probably again one of those aspects that are connected with culture and consumer confidence. We have a number of principles that we use in
the Netherlands, and I have three or four slides to go through them briefly. The focus on health I have
already said enough about. In the Netherlands we also have relatively large companies, which are a kind of mixture of public and private. They are
NVs (public limited companies). Evides, the water company here, is an NV, but the shares are held by the municipality of Rotterdam, the province and other
municipalities in the supply area. So it is really a kind of government body, but then again not quite. Semi-public, and that brings something special with it. Just
last week a new professor was appointed at TBM who has a whole argument that this is actually an ideal formula. That in this way,
let's say, the value of water... water is after all not just a market commodity, not something you can regulate as easily as other things such as cars;
water also has something to do with the idea that it belongs to all of us and that we have to manage it carefully, and such a public responsibility
within an organization under private law, an NV, which does work efficiently, yes, that has a certain appeal. And, typically for
the Netherlands, we love the polder model here, so we like doing things together. So the water sector in the Netherlands is very well organized. It has
set up a joint research institute, KIWA, where the research for the water companies is carried out, and it has founded an
interest group, the VEWIN. And it also has an association for individuals, the KVWN, of which you will all become members if you continue in this
field. And yes, there is a whole little world in which people cooperate well and exchange information. That, too, is something special about the Netherlands.
That is of course easier in the Netherlands than in America. In America you cannot so easily collaborate between Los Angeles and New York. In
the Netherlands all that is a bit easier. If we look at the setup of the infrastructure, these are the essential characteristics. To begin with, the
protection of the source; everything has to start with that, of course. Don't put the cart before the horse and start with dirty water. No, always start
with a source that is as clean as possible, and then make sure that the source stays clean. That is why everywhere, in the dunes and in the forests, you see those
little signs saying groundwater abstraction, groundwater protection area. Yes, don't pollute if you can prevent it. Use groundwater where
possible. So in the whole blue area, the north, the east and the south of the Netherlands, only groundwater is used for the
drinking water supply. Groundwater is available there, and it is of good quality. It is microbiologically reliable, so you will never get ill from it
anyway; so that is the preferred source, and that is what we use. Well, in the west of the Netherlands that is of course not possible. You know that too, don't you? - Er...
Why not? - Er, I don't remember. You've forgotten. But maybe someone else knows, because plain common sense gets you
a long way too. So just think for a moment, it is not that difficult at all. - Seawater. Seawater, salt, exactly. So the groundwater here is saline.
Desalination is very expensive, so that is not really practical. So here you cannot use groundwater, and we use surface water instead. And
we preferably do that, here in the whole dune area, by turning that surface water into artificial groundwater through infiltration. So we
pump surface water into the dunes, let it sink into the soil, and it becomes a kind of artificial groundwater. Do you want to ask a question? - Yes,
because I see that groundwater is used on the Wadden Islands, - but that is basically a dune area too. Yes. - But wouldn't there be salt in the
abstraction area there as well? Yes, then I have to say a little more. In the entire dune area it is the case that, as a result of centuries of
rain falling on the dunes, a freshwater lens floats on top of the salt water. So if you abstract that water very carefully, you can obtain fresh
water. It can go wrong quickly, mind you, so you really have to be careful, but it is just about possible. And that is also how dune water abstraction in the west of
Zuid-Holland and Noord-Holland began, in the 19th century, simply by pumping up dune water first. At a certain point it went wrong there,
they got salt water, and then they started infiltrating surface water. Well, if there really is no other way, and that happens to be the case at
Rotterdam, which is also why we are going to visit the Berenplaat, then you have to use surface water. Rotterdam has no groundwater,
Rotterdam has no dunes either, so you have to use surface water there. Then you need a very extensive treatment, so that is
not simple either. And that is what we are going to look at. So source protection really is number 1. We also regularly see the water companies placing notices in the
newspaper saying that this substance should be banned, or that restrictions should be imposed on that one. Simply making sure that what is good stays good. Well, groundwater,
that picture. I think this is quite illustrative. If you make a mental map of it: groundwater is really rainwater that has flowed through a
gigantic sand filter. Well, that is good. Surface water, well, for example here at Scheveningen. We simply pump Meuse water into the natural
dune valleys. After pre-treatment, mind you, because otherwise those dune valleys clog up immediately and we would pollute the
dune environment, and of course we don't want that. So the water is first pre-treated, then pumped into the dunes, then it sinks into the soil. Then we recover
it with wells that are placed here and there in the dunes. Then it goes to the post-treatment, which we see here, and then
into the distribution network. For surface water we have multiple barriers. For now it is enough to look at a flow diagram like this and to see that
there are a lot of steps in series. We simply have many separate treatment processes. On the one hand to be sure that if one
works a little less well, another one makes up for it. Safety, robustness, is very important. And on the other hand to be able to stop different kinds of substances with
different treatment systems. So it is always a matter of fairly extensive treatment schemes when it comes to surface water.
We use modern technology for that too. These are developments that have become possible over the past decades, let's say. Here
we see the membrane filtration plant at Heemskerk. That is the most modern and largest treatment plant of this type in Europe. It was developed in the Netherlands. Here
we see disinfection with UV light. You could think of these as ordinary fluorescent tubes, but ones that emit UV light. And bacteria
cannot withstand that; it kills them. So that is a good way to disinfect the water. Well, that was opened 2 years
ago in the presence of the Prince, and it too is a Dutch development, to improve the treatment further. Well, the result
of that is that we... At the tap, when we open it, water of high quality comes out, pure water that contains no
contaminants and no chlorine either. It is also soft. The water is softened in the Netherlands as well. We will come back to that later. And ultimately
the result, partly because of this, is that in the Netherlands we use no bottled water at all. Or at least very little. And that in turn, whether you look at it
macro-economically or even from the individual customer's point of view, is simply a very sensible thing, because bottled water is many times more expensive than
drinking water. It is 500 times more expensive. It is also much worse for the environment. The environmental burden of bottled water, calculations have been
made of that, with eco-points and so on: those bottles all have to be transported by road in trucks, and they have to be cleaned
again, and so forth. If you do those sums, the environmental burden of bottled water is 30 times as high as that of drinking water. So if you do a quick sum on the
back of a cigar box of what the Dutchman spends on water, and the Italian, then the Italian spends 2 to 3 times as much
on water as the Dutchman. And that is mainly due to the fact that they use bottled water. The drinking water supply itself, its costs, are
more or less comparable, because that is also a characteristic of this kind of large-scale infrastructure. Doing something well, multiple treatment steps, building safe
systems, is not all that much more expensive than doing it badly. Because most of the costs lie in the fact that you have to start by
building an abstraction, you need a treatment plant, you need transport mains and distribution mains; a great many of those costs you have anyway. And
doing it well is not much more expensive than doing it badly. I think I'm there; oh yes, we have other things too: we have the lowest
leakage percentage in the world and very reliable systems, and nowadays in the Netherlands we of course pay attention to saving water. Water is after all a
natural resource, which you should not waste. So water consumption in the Netherlands is not rising, it is relatively constant, and household water consumption is
even falling, because nowadays we have water-saving toilets, showers, washing machines and so on, and those are all encouraged, you get
subsidies for them, and so forth. We are all dealing with it responsibly. And then we have the last slide. So the ultimate result of that whole philosophy, and
of the things that have been done about it over the past 30 years, is that we say: we have the miracle from the tap. That was an advertising slogan of the
water companies a number of years ago. Posters were made of it, and there were commercials on radio and TV. The miracle from the tap: very good water, it is
always there. We don't get ill from it, no contaminants. We don't need bottled water. We don't need filters on the tap. We don't waste
water. So we have things well in order. Well, on the one hand that is of course a somewhat exaggerated picture. There certainly are things that
can and must be done better, and we will come back to those, but as far as the philosophy is concerned, certainly in comparison with Canada and America for example,
that is simply how it is. And on the other hand we also have to realize that it can of course be done very differently, and in a great many countries is done very differently.
And the most extreme example of that is of course the developing countries, where the basic infrastructure is simply still completely lacking, and
Doris will tell you about that after the break. We'll take a short break now, thank you.
Annex D2. Transcript lecture CT3011 (sorted)
After allowing plenty of time for everyone to come in
we can begin
with the second part of 30-11, Water Management.
I will go through the part on sanitary engineering
with you over the coming seven weeks.
And I thought, let me first introduce myself to you, so,
my name is Hans van Dijk, as you can see written there,
and I thought, let me pick two things for that,
my hobby and my work.
Well, the hobby you can see,
I am a marathon runner.
A nice photo of the glorious finish in Rotterdam
last April.
Marathon runners are all rather fanatical people, you know,
real go-getters, who train every day.
They manage to organize their lives so that all of that is possible.
So I also run here every day at lunchtime, a lap to the Delftse Hout, or along the Schie,
or some other route around here.
If you ever see me running in shorts or a tracksuit,
then that's right, that's me.
And by now I do that with a whole group of people
from our department, with students and PhD candidates.
And one of those students is shown here, that is Karin Teunissen.
Three years ago she was sitting here in Introduction to Water Management.
She was a third-year student then; in the meantime she has graduated
and started PhD research at the dune water company in Scheveningen.
And she has become a fanatical runner too, and so in April
we ran 42 kilometers together.
Well, that is a memory that will stay etched in both our minds.
Then the work.
I have, I am, yes, a question,
are you a runner too?
- Sorry?
Are you a runner too?
- Er, well, I do have a question, but I think this lecture has already been given.
No.
- No? Then I don't know how I already knew this, but...
Well, I do say this more often, so that could well be.
Where were you?
- Yes, I think last year, but...
Yes of course, we gave 30-11 last year too, that's right, haha.
But this photo really is from April, so it is fairly recent.
What it might be is that
I also always give one of the guest lectures
in Introduction to Civil Engineering in the first year.
And there I of course also start a bit with, well, who am I,
so it could well be that you remember it from there.
Well, then you still know that I graduated here 30 years ago.
I studied Civil Engineering too, back then, and graduated in '76.
After that I went to work at an engineering firm, at DHV in Amersfoort,
and I can warmly recommend that to you once you have graduated,
going to work at an engineering firm.
It is a fantastic experience;
you work on all kinds of projects all over the world.
In my case drinking water projects.
So designing treatment plants,
building systems,
and also doing research.
You can really go in any direction at an engineering firm,
and the Dutch engineering firms are reasonably successful,
on the international market too, these days.
Yes, I worked there for many years,
until at a certain point, by now that is already 17 years ago,
there was an advertisement saying that a professor was being sought here in Delft.
And then I thought, well, let me write a letter,
you never know, nothing ventured, nothing gained.
So I wrote a letter and I thought,
I probably won't get it, but I did.
So there is a first life lesson right there:
just try something, and it may well turn out better than expected.
Initially I then became
a part-time professor here for one day a week
in drinking water supply, that is my chair.
And yes, gradually one thing leads to another,
you get asked for more and more things.
So I gradually started doing more things here in Delft
and I kept scaling back my position at DHV,
and in 1999 I stopped at DHV completely and since then I have been
a full-time professor here.
And being a full-time professor also means
that on the one hand you have tasks in education,
on the other hand research, but also management,
so management, yes, I am head of a department and so on,
and then you sit on the management team or on the programme committee.
You have to join in discussions and decisions about general matters.
You can of course turn that into a full-time job;
I have always avoided that.
I still find it the most enjoyable to be busy with the subject itself,
and that brings me to the second picture shown here,
because the most enjoyable thing of all is really supervising graduation students.
That is a process you will go through in the coming years.
For us it is always tremendously nice to see
how students transform from
more or less anonymous figures who sit in the lecture hall
listening,
more or less absorbing what I am conveying in a monologue.
Although I do very much appreciate reactions from you, by the way,
and I will explicitly ask for them now and then.
But anyway, in practice, at this stage of the programme
you are still mainly listening,
and it really gets more and more enjoyable as you get further,
in the fourth and fifth year,
and the highlight is then of course the graduation project,
where you really take on a subject entirely by yourself.
I also always tell my graduation students,
you have to make your graduation project your business card,
right, Doris,
and it really works that way.
The moment you have finished your graduation topic,
you know more about that topic than anyone.
More than anyone else in the Netherlands.
We prove that time and again
at the graduation colloquia.
We give those a lot of publicity,
and people always come from the water companies,
from KIWA,
from other research institutes.
They take part in the discussions,
and our graduation students manage to answer all the questions time after time.
Maybe not always 100% correctly, but still 99% correctly.
That is always a pleasure to witness.
I also always say that I am proud of my graduation students,
and that is true.
I have had about 80 of them by now,
and sometimes it goes very well,
as it says here with Karin and Doris,
Doris is here in the room, by the way,
who both even graduated cum laude this past year.
That means you have done very well,
achieved high grades,
and also done the graduation project very well.
Yes, for us it is simply wonderful to experience that.
To see how young people come to enjoy the field too,
become enthusiastic themselves,
and start to leave their mark on our field.
And I hope that some of you will get that far as well.
Right, so much for myself.
Now as for this course.
We are going to do it on the basis of the book,
which is already indicated on Blackboard.
We have a Dutch and an English version of it.
You have to buy that book
from our secretary, Mieke, on the fourth floor,
for 25 euros.
In the shop it costs 50 euros, but we have a special discount arrangement.
You may decide for yourselves whether to buy the Dutch or the English book.
The content is virtually the same and in any case sufficient for this course.
If you want my advice, I would say,
if you read English well, buy the English book;
it is slightly more up to date
and contains slightly more information,
but the Dutch book is certainly sufficient for this course.
Yes, such a book of course, apart from the fact that we will ask questions about it in the exam,
I will come back to that on my next slide,
such a book of course also has a certain function as a reference work.
Once you have such a book, you have it with you;
you take it along after you graduate, too.
If you then have to design a plant somewhere in a foreign country,
you take that book out of your bag again
and you know a thing or two again.
Such a book has that function as well.
There are also exercises in that book,
and we have exercises on Blackboard too.
You may have seen those already,
computer assignments.
That is not compulsory, by the way;
nothing is compulsory with us.
Yes, in the end you have to take the exam,
but we offer material,
so I would say make use of it,
but we are not going to check.
There are questions on Blackboard,
there are questions in the book,
and the answers are provided too,
or at least, once you have done a computer assignment you are told afterwards
which questions were right and which were wrong.
So that supports you in getting to know the material
and learning the subject matter.
And we have put old exams there as well,
so you can practice some more and see roughly what gets asked.
And then we will be lecturing in the coming period.
Oh yes, about the book:
you do not need to know the whole book.
The book is used both for 30-11 and for the follow-up course 34-20,
which is an elective for those going into water management,
and the chapters that are examined for 30-11
are listed here.
And this presentation will also be put on Blackboard, as you know,
including this video recording.
Then we will give these lectures, so 7 times in the coming period, starting now,
and this year I want to do it so that in the first hour
I sketch the broad outline
of the topic at hand.
The main points, and I try to give them some color.
What is important and what is less so.
And in the second hour I will each time have one of the PhD students,
today that is Doris,
who will talk about their own topic,
their own research,
their own project,
which adds some topicality,
and color and depth to the topic at hand.
And I have organized it so
that, if all goes well, the two parts connect well each time
and give you a good picture of the material,
so that you can also pass the exam easily later.
That does not mean that every part
of the PhD students' talks is exam material.
We will point that out here and there.
Yes, PhD research of course goes much deeper
than you need to know now in the third year,
but it is more about forming a picture, adding color,
and understanding the material.
Then we have planned an excursion to the Berenplaat,
the large treatment plant near Rotterdam,
near Spijkenisse to be precise, on 11 October.
That too is not mandatory,
everything is optional with us.
About 60 people have signed up for it so far.
Registration closes on 1 October, we said,
because water companies nowadays also have strict security requirements and such, after
the attacks in New York.
You have to state exactly who is coming, by name and so on,
and we also have to vouch that nothing untoward happens,
and of course buses have to be booked
and we will be served lunch there.
So the people who have signed up will get an e-mail shortly,
soon after 1 October,
with a confirmation,
and those who have not signed up will not be coming along.
And I also assume that those who did sign up
will actually come,
it is of course a bit awkward toward the organizers
if we were to arrive with far fewer people than we registered.
We will try, I received some questions about there being mandatory lab sessions in the afternoon,
for structural engineering and statistics I believe,
so we will try to be back in time.
That will certainly not be at half past 1,
so I think we will be back at around half past 2,
and we simply leave after the lecture on Thursday,
so at half past 10.
I don't know whether,
let me see whether I am going to start, yes I am going to start, so,
are there any questions about the organization and this general introduction?
Okay.
Well, then I will briefly say something about sanitary engineering,
which will sound familiar to you too, because I already covered it in the first year,
and then I will say a bit more about the drinking water supply of the Netherlands,
and after the break Doris will talk
about drinking water supply in developing countries,
because that is what she mainly works on.
So, we had sanitary engineering.
Well, it will not be unfamiliar to any of you
that it deals with the urban water cycle,
so the infrastructure works for supplying drinking water,
abstracting groundwater,
abstracting surface water,
treating it,
then transporting it through a whole system of transmission mains and distribution pipes
to all of us.
To households and industries,
businesses,
then collecting the wastewater through the sewer system.
Treating that wastewater,
which is then discharged back into the surface water.
So all the infrastructure works that concern that small urban water cycle,
that is what we call sanitary engineering,
and here I will focus mainly on drinking water supply,
because that is also where we see the clearest effect,
as shown in this figure.
The disappearance of infectious diseases in the Netherlands,
because they are no longer transmitted through contaminated drinking water.
In the rest of the world the situation is of course still very different,
but here we have had a great deal of success with it
in the 20th century.
Here we see a graph that shows
the decline in deaths from typhoid fever in the 20th century,
and it runs parallel to the percentage of people
not connected to the drinking water supply;
in that same period the drinking water supply was built in the Netherlands.
Around 1900, even shortly before 1900,
the big cities, and gradually also the smaller towns and the countryside,
and from about 1975 everyone in the Netherlands has been connected to the drinking water supply,
and infectious diseases transmitted through contaminated drinking water
no longer occur.
So for us it is about infrastructure works
for good water quality,
so things like water abstraction, water treatment, water transport,
and also water chemistry and microbiology, that water quality.
Microbiology: on the one hand the absence of organisms that can make us ill,
but on the other hand also the use of microorganisms
to optimize treatment.
Microorganisms can in turn break down contaminants;
the best-known example is wastewater treatment,
where with the help of oxygen and activated sludge,
which is a mixture of bacteria,
we have the waste substances in the wastewater broken down.
So water quality, water chemistry and microbiology
are quite important in this part of civil engineering.
Of course we also draw on the general knowledge of civil engineers,
in particular things like hydraulics, hydrology,
structural engineering, structural design,
project delivery, computer science,
these are all things you need in projects.
Often in a team,
at an engineering consultancy for example.
One person works more on automation,
another works more on the structural part,
a third is busy with the hydraulics,
and depending on the specialization you choose
you can play a different role in that.
Sanitary engineering is of course of great importance for public health,
that goes without saying.
It involves relatively large-scale infrastructure works,
here we see the so-called Biesbosch reservoirs.
That is in the Brabantse Biesbosch.
Reservoirs built for the drinking water supply,
and it is a well-organized sector with clear responsibilities.
There is even separate legislation for it,
the Waterleidingwet, when it comes to drinking water supply,
which states exactly what everything has to comply with,
and that the director of the water company is personally liable for it.
He risks a prison sentence if he distributes insufficient water
or water that can make you ill.
So that is all well organized.
And we do a lot on this in Delft,
so my chair, the chair of drinking water supply,
is the only chair in the Netherlands in the field of drinking water supply.
So that is nice, it gives us a certain exclusivity.
Many of our students are therefore also,
yes, they hold leading positions in the field,
they are directors or staff officers,
or designers at the water companies,
and many of our engineers also go to the engineering consultancies.
Well, a lot is happening there;
now and then we even have guest lectures by Willem Alexander,
who also finds water management interesting.
Well, finally I have three more slides,
before I say more about the setup of the infrastructure in the Netherlands,
that illustrate a bit of the work of our field.
So you have seen this picture before, right,
so can you now tell me where this little dip comes from?
- I think it has something to do with the break.
Yes, exactly,
so this is the water consumption during mass events,
in this case the football World Cup,
and now you see that we all behave like herd animals.
From the start of the match, that is here, water consumption
drops enormously.
Nobody uses water anymore,
everybody is in front of the TV,
watching.
Until it is half-time,
then we all run to the toilet and to the coffee machine,
and we get an enormous rise in water consumption.
In the second half everybody goes back to watching,
and we again see a very low level of water consumption,
with even a minimum shortly before time, when that decisive goal,
in this case by Dennis Bergkamp, was scored,
and at the end of the match everybody runs to the toilet again.
And you see the same thing in industrial consumption, you know.
Even there, operators and so on are watching too,
and everything runs at about half power.
It is an enormously reproducible phenomenon,
these curves,
a kind of electrocardiogram of our behavior.
The behavior of the population.
And this little dip, we do indeed call that the Cruijff dip.
That is the moment during the break when Cruijff comes on to give commentary.
Then everybody quickly runs back from the toilet to listen
to what Cruijff has to say,
and often the goals are replayed then too,
and then we all look at the TV again for a moment.
Of course we are mainly concerned with design.
It is often about new infrastructure works.
The construction of a pumping station, the design of a treatment plant and transmission main,
and you have of course already had a lot about design
in the project-based courses
and the design courses.
It is schematizing.
Building a certain framework in your head of how something fits together.
So a filter,
how do we schematize that,
and how does the water flow through a plant.
The hydraulic grade line.
We have to make a certain schematic of that.
We have to be able to apply formulas to it.
We have to be able to calculate it.
And above all we must not make mistakes in doing so,
which is what this picture is meant to show.
One of the carbon filter units at the Rotterdam drinking water works,
at Kralingen, beside the Brienenoordbrug,
where water hammer once occurred back then,
resulting in the implosion of that carbon filter,
and that is of course very unpleasant.
You often run into Murphy's law there,
that everything that can go wrong will go wrong at some point.
So water hammer is the phenomenon where, for example, when a pump trips,
a negative pressure wave can arise,
and that negative pressure can indeed lead to implosion.
Well, you can of course prevent that by fitting an air release and air admission valve.
That was done here too, there was such a valve on top of the carbon filter,
but unfortunately, just at the moment the pump failed here,
as a result of a power failure, it was also winter with a very severe frost,
and the air valve was frozen,
so that no air could enter anymore
and a vacuum formed in the vessel after all,
and, well, this was the result.
So designing is above all also being aware of things that can go wrong,
which is also why hydraulics is quite important.
It is of course very unpleasant if water sprays out somewhere
or goes the wrong way,
so above all you must always stay alert to things that can go wrong,
and design is above all also experience.
Having seen things,
how is it actually done in practice?
That is also why we planned the excursion to the Berenplaat,
so you can take a first look at,
well, what does such a plant look like,
what do you have to take into account?
All right, well, a few more pictures of another project,
in Limburg in this case,
where a large transmission main was laid
for a surface water project
in Panheel.
That was in the context of the so-called desiccation debate.
That is a debate that went on in the Netherlands for a number of years:
partly because of drinking water abstraction, groundwater levels drop
and nature areas dry out.
So here in Limburg it was said, about 10 years ago,
well, we have to reduce groundwater abstraction and switch to the Meuse.
After all it flows right through Limburg, so that is fairly easy.
Then a storage reservoir was built here.
Well, built, it was an old gravel pit.
Gravel had been extracted there, so the pit was already there anyway.
It was filled with Meuse water.
From that reservoir, as we see here, the Meuse water
then sinks into the ground by itself.
We call that infiltration, artificial infiltration;
the water sinks into the ground, and in the process a lot of quality improvement already takes place.
All kinds of substances are filtered out between the sand grains of the subsoil,
and the bacteria die off because of the long residence time.
So you already get a considerable improvement in water quality.
Then the water is pumped up again by means of wells,
which are placed at a certain distance around the reservoir.
So you are in fact abstracting a kind of artificial groundwater.
You take the Meuse water, which of course contains all sorts of bacteria and viruses
and other contaminants, and turn it into a kind of artificial groundwater.
That is then abstracted again and subsequently treated further
in the treatment plant shown here.
And then it went through that transmission main
all across Limburg,
to the consumers.
And finally, of course, we also do research,
especially here at the TU.
If you work at an engineering consultancy, well, you do not need that much research,
you mostly use rules of thumb and design criteria,
but of course the field keeps developing,
and there are new threats all the time.
Currently in the news quite a bit, for example, is the occurrence of pharmaceuticals in the Rhine.
The contraceptive pill is present in the Rhine in detectable concentrations,
so does it end up in the drinking water too, and what should we do about it?
Does treatment need to be extended again?
Those are the kinds of questions that people have.
And that is what we then do research on.
Our research is often done on site.
This is a picture of the field course in Luxembourg.
Huub Savenije may have told you something about it last Monday,
but it is also very relevant because no two waters are the same.
Water is a natural substance, and the contaminants and substances that matter,
well, they depend on the source.
The interaction, any discharges of substances that may have taken place.
Interaction with the soil, leaves and natural waste that end up in the water.
So every water is different again, and preferably you do it on site.
It does not really work to say, well, I'll just do experiments in the laboratory.
No, you always need the test of practice.
Does the water behave in practice the way we think it does in theory?
Some things are done in the lab, of course.
There is a laboratory here too, Stevin 3, the water laboratory,
with all kinds of setups.
Filters, sedimentation units, other experimental setups,
and later you will do lab work there yourselves
if you continue in this direction.
And eventually you can even do PhD research and be awarded
your doctoral diploma in the auditorium.
Right. Then I have another quarter of an hour, if I am correct.
Yes, and I can put that to good use to
give a first account of
what is special about drinking water supply in the Netherlands?
What do you need to know about it?
And for that I am using a presentation I gave last year in Canada,
for the Canadian water companies,
and things are very different there.
So there I really did my best to make clear:
what is special in the Netherlands?
And what could you in Canada get out of it?
And I think it could be a nice introduction to the field for you as well.
And actually it is already captured very concisely in this picture.
So this is a picture of a child, Joey,
who drinks water from the tap and in fact, as the picture shows,
trusts it.
So the water has to be so good that you can trust it completely.
That you even let your children drink it and that it is above all suspicion.
That is really the core of the philosophy of drinking water supply in the Netherlands.
To some extent that is of course the case in other countries too,
but much less so.
I don't know whether you have been to America and Canada and countries like that.
There it is more that drinking water, which is also called tap water there,
is something you use for the washing machine and the toilet,
we do that too, mind you,
but you do not really drink it, in Canada and America.
If you want to drink water, you buy a bottle at the supermarket.
Or you put another filter on your tap to treat the water further.
And what we call consumer confidence is therefore much lower in those countries
than in the Netherlands.
And that partly has to do with, well, culture and tradition.
In Europe we are used to the government arranging things properly,
and in America they are much less used to that.
There, all of New Orleans simply floods, and then they just rebuild it again and so on.
We do not do that in the Netherlands.
And it is the same with drinking water.
In the Netherlands we feel we must have absolute certainty that the drinking water
that comes out of the tap is, A, always there, security of supply,
and B, always good,
so that our children can drink it with peace of mind,
and so can we.
A slide with a few key concepts to start with: water consumption,
the fact that we use groundwater and surface water
for the drinking water supply.
Groundwater in the Netherlands is often still of very good quality.
Illustrated a bit by this picture of the Veluwe, where we see rain clouds,
and you can imagine, when that rain falls on the enormous sandy surface of the Veluwe,
yes, that water is filtered very well, and the groundwater you abstract there
is of course of very good quality,
so groundwater is generally good.
There are certainly concerns too; here and there we of course have,
and not too few of them I believe,
landfills, and those can contaminate the groundwater.
And farmers of course use manure and pesticides,
and that can eventually end up in the groundwater as well,
but on average groundwater is still of excellent quality,
and so a simple treatment suffices.
Aeration and sand filtration, we will come to those, that is usually enough.
Surface water, on the other hand, is just the other end of the spectrum, you could say.
In the Netherlands we sit at the drain of Europe.
The Rhine and the Meuse have flowed through France, Germany and Belgium.
All that wastewater has been discharged into them.
So surface water contains a complete cocktail of every substance you can imagine.
So surface water has to be treated very extensively.
And we do that in the Netherlands.
Abroad it is sometimes referred to as double Dutch treatment.
We have a great many treatment processes in series,
just to be sure that the water is good in the end.
And very unusual internationally: we do not use chlorine.
Americans take it for granted that chlorine is used,
the drinking water tastes of chlorine there,
and smells of chlorine there.
Americans find that perfectly normal.
And in the Netherlands we say, no, we do not want that.
In the first place there is a substantive reason for it,
namely we know that if you apply chlorine,
it reacts with certain substances that occur naturally in water,
organic compounds,
and then you get what we call disinfection by-products.
Chloroform is the best-known example.
And those are undesirable substances.
Substances that can be toxic.
Well, you could say, I can set a certain standard for that,
and maybe I can just about meet that standard,
but in the Netherlands we do not consider that good enough.
We say no, these are undesirable substances, we simply do not want them.
So we simply do not want to use chlorine.
That is an essential starting point, one that has a lot of consequences, mind you,
but one that has been applied in the Netherlands for more than 30 years,
and a lot of work, a lot of research, has gone into it.
So I think that is an important point to hold on to from this first lecture.
Chlorine simply leads to those toxic compounds, and that is why you should not want it.
In any case, in the Netherlands we have decided that we do not want it,
and so we do not do it.
There is another, practical aspect, and that is that water with chlorine tastes of chlorine,
and we do not like that in the Netherlands either.
We feel that water from the tap should taste good,
it should not have that nasty chlorine taste.
That is a swimming-pool chlorine taste.
We do not want that for drinking water.
That is probably again one of those aspects tied up with culture and consumer confidence.
We have a number of principles that we use in the Netherlands,
and I have about 3 or 4 slides to run through them briefly.
Well, I have said enough about the focus on health.
In the Netherlands we also have relatively large companies
that are a kind of mix of public and private.
They are NVs. Evides, the water company here, is an NV,
but the shares are held by the municipality of Rotterdam and the province,
and other municipalities in the supply area.
So it is in fact a kind of government body, but not quite.
Semi-public, and that makes it something special.
Last week a new professor was also appointed at TBM who has a whole argument about this,
that it is actually an ideal formula.
That in this way, so to speak, the value of water,
water is after all not just any market commodity,
something you cannot regulate so easily,
like other, like cars and other things,
so water also has something to do with: it belongs to all of us,
we have to manage it carefully,
and such a public responsibility
within an organization under private law,
an NV, which therefore does work efficiently,
yes, that does have a certain appeal.
And typical of the Netherlands, we love the polder model here, so we like doing things together.
So the water sector in the Netherlands is very well organized.
It has set up a joint research institute, KIWA,
where the research for the water companies is carried out,
and it has set up an industry association, the VEWIN.
And it also has a professional society, the KVWN,
which, if you continue in this field, you will all become members of.
And yes, there is a whole community in which people work together well and so on,
and exchange information.
That is also something special about the Netherlands.
It is of course easier in the Netherlands than in America.
In America you cannot so easily just collaborate between Los Angeles and New York.
In the Netherlands that is all somewhat easier.
If we look at the setup of the infrastructure, these are the essential features.
To begin with, protection of the source; everything has to start with that, of course.
Do not put the cart before the horse and start with dirty water.
No, always start with a source that is as clean as possible,
and then make sure that source stays clean.
That is why everywhere, in the dunes and in the woodlands, you see those signs
saying groundwater abstraction, groundwater protection area.
Yes, do not pollute if you can avoid it.
Use groundwater where possible.
So in the whole blue area, the north, the east and the south of the Netherlands,
only groundwater is used for the drinking water supply.
Groundwater is available there, and it is of good quality.
It is microbiologically reliable, so you will never get ill from it anyway,
so that is the preferred source, and that is what we use.
Well, in the west of the Netherlands that is of course not possible.
You know that too, right?
- Uh...
Why not?
- Uh, I don't remember.
You have forgotten.
But maybe someone else knows, because plain common sense
gets you a long way too.
So just think for a moment, it is not that difficult at all.
- Seawater.
Seawater, salt, exactly. So the groundwater here is saline.
Desalination is very expensive, so that is not really practical.
So yes, you cannot use groundwater here, so we use surface water instead.
And we preferably do that,
here in the whole dune area, by turning that surface water
into artificial groundwater through infiltration.
So we pump surface water into the dunes, let it sink into the soil,
and then it becomes a kind of artificial groundwater.
Do you want to ask a question?
- Yes, because I see that groundwater is used on the Wadden Islands,
- but in principle that is a dune area too.
Yes.
- But surely salt would occur in the abstraction area there as well?
Yes, then I have to say a bit more.
In the whole dune area it is actually the case that,
as a result of centuries of rain falling on those dunes,
a freshwater lens floats on top of the salt water.
So if you abstract the water very carefully, you can abstract fresh water.
That can go wrong quickly, mind you, so you really have to be careful,
but it is just about possible.
And that is how dune water abstraction in the west of Zuid-Holland and Noord-Holland began,
in the 19th century, simply by pumping up dune water first.
At some point it went wrong there, they got salt water,
and then they started infiltrating surface water.
Well, if there really is no other way, and that is exactly the case at Rotterdam,
and that is also why we are going to look at the Berenplaat,
there you have to use surface water.
Rotterdam has no groundwater, and Rotterdam has no dunes either,
so you have to use surface water there.
Then you need a very extensive treatment,
so that is not simple either.
And that is what we are going to look at.
So source protection, that really is number 1.
We also regularly see water companies placing notices in the newspaper saying,
this substance should be banned, restrictions should be placed on this.
Simply making sure that what is good stays good.
Well, groundwater, that picture.
I think this is really quite illustrative.
If you make a mental map of this, of groundwater, it is really rainwater
that has flowed through a gigantic sand filter.
Well, that is good.
Surface water, well, for example here at Scheveningen.
We simply pump Meuse water into the natural dune valleys.
After pre-treatment, mind you, because otherwise those dune valleys clog up right away,
and then we would pollute the dune environment, which of course we do not want.
So the water is first pre-treated, then pumped into the dunes,
then it sinks into the soil.
Then we recover it with wells that are placed here and there in the dunes.
Then it goes to the post-treatment, which we see here,
and then on into the distribution network.
For surface water we have multiple barriers.
For now it is enough to look at a flow diagram like this
and see that there are a whole lot of steps in series.
We simply have many separate treatment processes.
On the one hand to be sure that if one works a little less well,
the other makes up for it.
Safety, robustness, is very important.
And on the other hand, to be able to stop different kinds of substances with different treatment systems.
So it always involves quite extensive treatment schemes when it comes to surface water.
We also use modern technology for that.
So those are developments that have become possible over the past decades, say.
Here we see the membrane filtration plant at Heemskerk.
It is the most modern and largest treatment plant of its type in Europe.
It was developed in the Netherlands.
Here we see disinfection with UV light.
So you could picture these simply as fluorescent tubes,
but they emit UV light.
And bacteria cannot withstand that, it kills them.
So that is a good way to disinfect the water.
Well, that was opened 2 years ago in the presence of the Prince
and that too is a Dutch development,
to improve treatment further.
Well, the result of that is that we...
At the tap, when we open it, water of high quality comes out,
pure water that contains no contaminants,
and no chlorine either.
Which is also soft. The water is also softened in the Netherlands.
We will come back to that later.
And ultimately the result is, partly because of that,
that we use no bottled water at all in the Netherlands.
Or at least, very little.
And that in turn, if you look at it macro-economically,
or even at the individual customer,
is simply a very sensible thing,
because bottled water is many times more expensive than drinking water.
It is 500 times more expensive.
It is also much worse for the environment.
The environmental footprint of bottled water, calculations were once done on that,
with eco-points and so on: those bottles all have to be transported by road in trucks,
and they have to be cleaned again, and so on.
If you do those calculations, the environmental footprint of bottled water
is 30 times that of drinking water.
So if you do a quick back-of-the-envelope calculation
of what the Dutch spend on water, and the Italians,
then an Italian spends 2 to 3 times as much on water as a Dutch person.
And that is mainly because they use bottled water.
The drinking water supply itself, its costs, are more or less comparable,
because that is another characteristic of this kind of large-scale infrastructure.
Doing something well, multiple treatment steps, building safe systems,
is not that much more expensive than doing it badly.
Because most of the costs lie in the fact that you have to start by building an abstraction,
you need a treatment plant, you need transmission mains, distribution pipes,
a lot of those costs you have anyway.
And doing it well is not much more expensive than doing it badly.
I think I am there, oh yes, we have other things as well,
so we have the lowest leakage rate in the world,
and very reliable systems,
and nowadays in the Netherlands we of course pay attention to water saving.
Water is a natural resource after all, you should not waste it.
So water consumption in the Netherlands is not rising,
it is relatively constant,
and household water consumption is even falling,
because nowadays we have water-saving toilets and showers and washing machines and so on,
and those are all being promoted, you get subsidies for them and so on.
We are all handling it responsibly.
And then we have the last slide.
So the final result of that whole philosophy
and of what has been done about it over the past 30 years
is that we say, we have the miracle from the tap.
That was an advertising slogan of the water companies a number of years ago.
Posters were made of it then, and commercials on radio and TV.
The miracle from the tap: very good water, it is always there.
We do not get ill from it, no contaminants.
We do not need bottled water.
We do not need filters on the tap.
We do not waste water.
So we have things well in order.
Well, on the one hand that is of course a somewhat exaggerated picture.
There are certainly things that can and must still be improved,
and we will come back to those,
but in terms of philosophy, certainly compared with Canada and America for example,
that is simply how it is.
And on the other hand we also have to realize that this of course
ook heel anders kan,
en in heel veel landen ook heel anders gaat.
En het meest extreme voorbeeld daarvan zijn natuurlijk de
ontwikkelingslanden,
waar gewoon de basisinfrastructuur nog volledig ontbreekt,
en daar gaat na de pauze Doris over vertellen.
We gaan even pauzeren, bedankt.
Annex D3. Partial transcript lecture CT3011 (incl. time frames /
sentence)
1
00:00:00,100 --> 00:00:03,300
Na een ruime inlooptijd
2
00:00:03,300 --> 00:00:06,800
kunnen we beginnen
3
00:00:06,800 --> 00:00:11,700
met het tweede deel van 30-11, Watermanagement.
4
00:00:11,700 --> 00:00:14,300
Het deel over gezondheidstechniek ga ik
5
00:00:14,300 --> 00:00:18,600
de komende zeven weken met jullie doornemen.
6
00:00:18,600 --> 00:00:21,400
En ik dacht, ik zal me eerst eens even aan jullie voorstellen, dus,
7
00:00:21,400 --> 00:00:25,000
mijn naam is Hans van Dijk, zoals jullie daar zien staan
8
00:00:25,000 --> 00:00:26,900
en ik dacht, laat ik daar maar twee dingen voor nemen,
9
00:00:26,900 --> 00:00:29,000
mijn hobby en mijn werk.
10
00:00:29,000 --> 00:00:30,800
Nou de hobby dat zien jullie,
11
00:00:30,800 --> 00:00:32,600
ik ben een marathonloper.
12
00:00:32,600 --> 00:00:37,400
Een mooie foto van de glorieuze binnenkomst in Rotterdam
13
00:00:37,400 --> 00:00:40,300
in april afgelopen periode.
14
00:00:40,300 --> 00:00:43,100
Marathonlopers dat zijn allemaal een beetje fanatieke lui he,
15
00:00:43,100 --> 00:00:46,600
echte doordouwers, die trainen iedere dag.
16
00:00:46,600 --> 00:00:50,800
Die weten hun leven zodanig te organiseren dat dat allemaal kan.
17
00:00:50,800 --> 00:00:54,000
Dus ik loop hier ook iedere dag tussen de middag een rondje naar
18
00:00:54,000 --> 00:00:55,800
Delfts hout, of langs de Schie,
19
00:00:55,800 --> 00:00:59,400
of een ander parcour hier.
20
00:00:59,400 --> 00:01:03,100
Als jullie me eens een keer in korte broek of trainingspak zien lopen
......................
......................
770
00:44:26,200 --> 00:44:28,800
We hebben geen filters aan de kraan nodig.
771
00:44:28,800 --> 00:44:30,700
We verspillen het water niet.
772
00:44:30,700 --> 00:44:33,000
Dus we hebben de zaken goed voor elkaar.
773
00:44:33,000 --> 00:44:35,700
Nou, dat is enerzijds natuurlijk een beetje een gechargeerd beeld.
774
00:44:35,700 --> 00:44:38,900
Er zijn best wel dingen die nog beter kunnen en beter moeten,
775
00:44:38,900 --> 00:44:40,700
en daar komen we ook wel op terug,
776
00:44:40,700 --> 00:44:44,700
maar qua filosofie zeker in vergelijking met Canada en Amerika bijvoorbeeld,
777
00:44:44,700 --> 00:44:46,800
is dat gewoon zo.
778
00:44:46,800 --> 00:44:52,200
En aan de andere kant moeten we ons ook realiseren dat dat natuurlijk ook heel anders kan,
779
00:44:52,200 --> 00:44:54,900
en in heel veel landen ook heel anders gaat.
780
00:44:54,900 --> 00:44:58,100
En het meest extreme voorbeeld daarvan zijn natuurlijk de ontwikkelingslanden,
781
00:44:58,100 --> 00:45:01,600
waar gewoon de basisinfrastructuur nog volledig ontbreekt,
782
00:45:01,600 --> 00:45:05,100
en daar gaat na de pauze Doris over vertellen.
783
00:45:05,100 --> 00:45:11,100
We gaan even pauzeren, bedankt.
Annex D4. Partial transcript lecture CT3011 (incl. time frames /
word)
Word              Begin (ms)   End (ms)
en                       110        240
aan                      240        760
uh                       760       1070
ja                      1080       1250
en                      1260       1410
daarin                  1410       2260
lopen                   2260       2540
we                      2560       2830
daar                    2880       3070
is                      3070       3160
zeker                   3170       3690
geen                    3710       3940
en                      3940       4220
uh                      4220       4670
we                      5840       6630
met                     6730       6910
het                     6910       7130
hele                    7260       7480
heelal                  7480       7820
van                     7820       8300
uh                      8300       9170
dertig                  9200       9540
elf                     9550      10010
en                     10020      10210
later                  10210      10600
naar                   10600      10800
het                    10810      10900
cement                 10900      11460
't                     11470      11700
cd                     11700      12180
lover                  12180      12710
gezondheidss           12760      13270
en                     13270      13370
niet                   13370      13670
werd                   13670      13880
hij                    13880      14180
de                     14260      14370
zeven                  14780      15170
weken                  15170      15550
met                    15550      15740
jullie                 15740      16200
uh                     16200      16460
doornemen              16460      17190
en                     18600      18730
ik                     18730      18830
dacht                  18830      19040
ik                     19040      19120
zelf                   19120      19350
mee                    19350      19520
zeker                  19520      19950
je                     19950      20050
je                     20050      20180
voorstellen            20180      20960
is                     20970      21390
mijn                   21710      21880
naam                   21880      22110
is                     22110      22250
als                    22250      22500
een                    22500      22630
tank                   22630      22940
zoals                  22940      23280
jullie                 23280      23470
daar                   23470      23690
zien                   23700      23880
staan                  23880      24350
en                     24750      24870
ik                     24870      24950
dacht                  24950      25160
laat                   25160      25360
ik                     25370      25440
daar                   25440      25610
maar                   25610      25740
twee                   25740      26000
dingen                 26000      26260
van                    26260      26470
één                    26470      26730
en                     26730      26900
een                    26900      26980
half                   26980      27180
jaar                   27190      27470
naar                   27470      27630
werk                   27630      28110
ja                     28120      28400
nou                    28970      29120
willen                 29140      29370
niet                   29380      29540
inzien                 29540      30050
ja                     30050      30450
ik                     30470      30600
ben                    30600      30820
marathonloper          30820      31920
een                    32610      32730
mooie                  32730      33080
foto                   33080      33550
van                    33550      33760
de                     33760      33820
glorieuze              33820      34800
binnenkomst            35300      36050
in                     36050      36350
uh                     36350      36570
rotterdam              36580      37360
in                     37370      37640
april                  37650      38100
afgelopen              38110      38640
periode                38640      39360
en                     40040      40210
marathonlopers         40210      41010
dat                    41010      41140
zijn                   41140      41390
allemaal               41390      41760
een                    41760      41850
beetje                 41850      42100
fanatieke              42100      42660
leidde                 42660      43000
er                     43000      43190
echter                 43200      43480
door                   43480      43720
de                     43720      43800
ouders                 43800      44270
die                    44270      44610
twee                   44990      45230
iedere                 45230      45710
dag                    45710      46290
[s]                    46290      46610
die                    46610      46920
je                     46920      47010
er                     47010      47280
beter                  47280      47640
inleven                47640      48020
zelf                   48020      48260
de                     48260      48340
aandacht               48340      48590
te                     48590      48680
organiseren            48680      49450
dat                    49450      49620
het                    49620      49750
allemaal               49750      50100
kan                    50150      50470
dus                    50920      51060
ik                     51060      51180
loop                   51180      51420
hier                   51420      51620
ook                    51620      51880
iedere                 51900      52260
dag                    52260      52490
tussen                 52490      52750
de                     52750      52830
meer                   52830      52980
dan                    52980      53280
ooit                   53280      53520
je                     53520      53670
naar                   53670      53900
delft                  53900      54250
houdt                  54250      54670
of                     54670      54790
langs                  54790      55060
deze                   55060      55290
rivier                 55290      55770
of                     55770      56050
andere                 56400      56830
koerier                56830      57580
en                     58230      58590
uh                     58590      58930
als                    59580      59750
jullie                 59750      59970
d'r                    59970      60310
is                     60320      60580
Annex E. Speech recognition

1. Speech recognition for movies
   Types of speech recognition
   Speech recognition at University of Twente
   SHoUT
2. SHoUT for example lecture in CT3011
   SHoUT on example lecture
   Number of segments and words
   Total duration of words and silences
   SHoUT compared with human made subtitles
   SHoUT as subtitling system
   SHoUT and word frequency
   SHoUT for tag cloud search
3. SHoUT for example course CT3011 (all lectures)
   SHoUT on example course
   Lectures and lecturers
   SHoUT output analysis
   Quality of word recognition
   Word correctness per lecturer
4. Evaluation
   SHoUT for word indexing
   SHoUT for tag cloud production
   SHoUT for transcripts
   SHoUT for subtitles
Annex E1. SHoUT result from lecture CT3011
Annex E2. Transcript of lecture CT3011 from speech recognition (SHoUT)
Annex E3. Speech recognition (SHoUT) compared to human made subtitles

1. Speech recognition for movies
Types of speech recognition
Speech recognition (also known as automatic speech recognition or computer speech
recognition) converts spoken words to text. The term "voice recognition" is sometimes used
for speech recognition in which the system is trained on a particular speaker, as is the case
for most desktop recognition software. Such systems contain an element of speaker
recognition: they attempt to identify the person speaking in order to better recognize what is
being said. "Speech recognition" is the broader term and also covers systems that can
recognize almost anybody's speech, such as a call centre system designed to recognize many
voices.
Speech recognition at University of Twente
Speech recognition is one of the focus points of the chair Human Media Interaction at the
University of Twente.
Figure 1.1: Logo of the chair Human Media Interaction at the University of Twente
For further information, see the websites of this chair:
• chair: http://hmi.ewi.utwente.nl
• multimedia retrieval: http://hmi.ewi.utwente.nl/topic/Multimedia%20Retrieval
SHoUT
SHoUT is a software package developed at the chair Human Media Interaction of the
University of Twente by PhD candidate Marijn Huijbregts, as part of his PhD project
"Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled".
SHoUT is a Dutch acronym that translates to "Speech Recognition Research at the University
of Twente". It is a speech recognition system based on commonly used machine learning
techniques. It is used for research on Large Vocabulary Continuous Speech Recognition
(LVCSR), but the speech/non-speech detector and the speaker diarization application can
also be used separately. It is written in C++ on a Linux platform.
(Source: http://wwwhome.cs.utwente.nl/~huijbreg/shout/)
2. SHoUT for example lecture in CT3011
SHoUT on example lecture
For this project, a Dutch lecture on Sanitary Engineering given by J.C. (Hans) van Dijk was
selected to test the speech recognition results of SHoUT.
This lecture had previously been subtitled by hand (by Erwin de Moel), which makes it
possible to evaluate the quality of the speech recognition.
The lecture is from the bachelor's course CT3011, Introduction Water Management. Further
information on this lecture is given in Table 2.1.
Table 2.1: Sample lecture chosen for speech recognition
Course:               CT3011 – Introduction Water Management
Lecture:              Lecture 8 – Sanitary Engineering (Civiele Gezondheidstechniek)
                      (#15 in Collegerama recorded lectures)
Lecturer:             Prof. ir. J.C. (Hans) van Dijk
Duration:             45:09
Number of slides:     27
Recording date/time:  29 September 2007 / 8:45 AM – 9:30 AM
Collegerama link:     http://collegerama.tudelft.nl/mediasite/Viewer/?peid=f33ba7ff-01604259-bd94-7ee0d9c5a461
The result of the speech recognition by SHoUT is given in Annex E1. This is an xml-file with
time stamps associated with each spoken word. Silences are considered as words (marked as
[s]) with a certain duration. This xml-file can be converted into a transcript by removing the
time stamps and replacing [s] with "…". The result of this conversion is shown in Annex E2.
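The conversion described above can be sketched in a few lines of Python. This is a minimal illustration and not the tool used in this project: the function name shout_to_transcript is hypothetical, and the attribute names (wordID, beginTime, endTime) are taken from the listing in Annex E1. A regular expression is used instead of an XML parser, on the assumption that the SHoUT output is not strictly well-formed XML (the listing contains elements such as <EOS-score="0.00000"/>).

```python
import re

# Matches the opening tag of each <word> element as it appears in Annex E1.
WORD_TAG = re.compile(
    r'<word wordID="([^"]*)" beginTime="([\d.]+)" endTime="([\d.]+)"'
)

def shout_to_transcript(xml_text: str) -> str:
    """Drop the time stamps and replace every [s] silence with an ellipsis."""
    tokens = []
    for word_id, _begin, _end in WORD_TAG.findall(xml_text):
        tokens.append("…" if word_id == "[s]" else word_id)
    return " ".join(tokens)

# First three word elements of the SHoUT output in Annex E1.
sample = """
<word wordID="[s]" beginTime="0.000" endTime="0.110">
<word wordID="en" beginTime="0.110" endTime="0.240">
<word wordID="aan" beginTime="0.240" endTime="0.760">
"""
print(shout_to_transcript(sample))  # … en aan
```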
Number of segments and words
The number of segments and words retrieved by SHoUT is shown in Table 2.2.
Table 2.2: Analysis of speech recognition for lecture CT3011
Property                                   SHoUT    Human subtitling
Number of speech segments                    170    779
Number of words - incl. [silence]'s        9,739    -
Number of real words (excl. [silence]'s)   7,351    6,970
Number of text blocks *                    2,223    779
* Assuming [s] or [s][s] as sentence delimiter
The SHoUT output file contains 170 speech segments, labeled SPK01-001 to SPK01-170.
SHoUT defines segments using the following procedures:
• energy detection
• speech activity detection
• smoothing (combining similar elements without long silence periods)
Nearly all speech segments begin and end with a silence (165 out of 170 segments). Silences
are shown as "[s]" in the SHoUT output file. Segments cannot be considered real
sentences: their number is much smaller than the number of subtitles (170 versus 779).
A silence cannot be regarded as a sentence delimiter either. The output file contains 165
double [s] words and 2,058 single [s] words. This total (2,223) is far larger than the
number of sentences in the subtitling file.
The number of real words in the SHoUT output is 5% higher than in the subtitle file.
Apparently, longer words are sometimes divided into several words when their syllables or
word parts are recognized as separate words.
Total duration of words and silences
Consecutive speech segments and words may be separated by an interval, in which case the
start time of an element differs from the end time of the previous segment or word.
Table 2.3 gives an overview of the duration of these intervals, as well as the duration of
silences and words.
Table 2.3: Duration of intervals, silences and words from speech recognition for lecture CT3011
Property                          Intervals   Silences   Words       Total
Number of elements                220         2,388      7,351       779
Minimum duration (milliseconds)   20          10         30          -
Maximum duration (milliseconds)   19,020      1,710      1,480
Median duration (milliseconds)    30          50         240
Total time (milliseconds)         135,410     409,150    2,162,130   2,707,690
Total time (minutes)              2:15        6:49       36:02       45:07
Total time (%)                    5%          15%        80%         100%
The interval time varies from 20 to 19,020 milliseconds. The total interval time amounts to
5% of the lecture time. The silence time varies from 10 to 1,710 milliseconds. The total
silence time amounts to 15% of the duration of the lecture.
The total time of the SHoUT elements (45:07 minutes) nearly equals the playing time of the
Collegerama lecture (45:09 minutes). The difference might be caused by the start and end
periods and/or by a difference in timing between the movie and the SHoUT speech recognition
system. It is assumed that the time stamps of the speech recognition allow for adequate
timing for use in subtitling and/or word searching.
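The statistics of Table 2.3 can be derived from the begin and end times of the recognized elements. The sketch below is an illustration under assumptions, not the procedure used for this annex: the function names are hypothetical, times are given in milliseconds, and an interval is taken to be any unlabelled gap between the end of one element and the start of the next. The sample data are the first ten elements of the SHoUT output in Annex E1.

```python
from statistics import median

SILENCE = "[s]"

def summarise(durations):
    """Count, minimum, maximum, median and total of a list of durations (ms)."""
    if not durations:
        return None
    return {
        "count": len(durations),
        "min": min(durations),
        "max": max(durations),
        "median": median(durations),
        "total": sum(durations),
    }

def duration_stats(elements):
    """elements: (label, begin_ms, end_ms) tuples in chronological order."""
    words = [end - begin for label, begin, end in elements if label != SILENCE]
    silences = [end - begin for label, begin, end in elements if label == SILENCE]
    # Interval: the next element starts later than the current one ends.
    intervals = [
        nxt[1] - cur[2]
        for cur, nxt in zip(elements, elements[1:])
        if nxt[1] > cur[2]
    ]
    return {
        "intervals": summarise(intervals),
        "silences": summarise(silences),
        "words": summarise(words),
    }

elements = [
    ("[s]", 0, 110), ("en", 110, 240), ("aan", 240, 760), ("uh", 760, 1070),
    ("[s]", 1070, 1080), ("ja", 1080, 1250), ("[s]", 1250, 1260),
    ("en", 1260, 1410), ("daarin", 1410, 2260), ("lopen", 2260, 2540),
]
stats = duration_stats(elements)
print(stats["words"])     # 7 words, 2410 ms in total
print(stats["silences"])  # 3 silences, 130 ms in total
```

In this short sample all elements are contiguous, so no intervals are found; in the full lecture, 220 such gaps occur.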
SHoUT compared with human made subtitles
The output of the SHoUT speech recognition can be compared with the human made
subtitles. For this purpose the transcript of Annex E2 has been converted into sentences as is
required for subtitles. The result of this conversion is shown in Annex E3, which shows the
human made subtitles as well as the converted SHoUT results. Additionally, Annex E3 shows
the speech segments from SHoUT.
The results are summarized in Table 2.4. The comparison has been made for the initial part
of the lecture with a total sample size of 471 words. This sample size has been reduced by
removing a conversation between the lecturer and a student. The recording of this
conversation is hampered by the absence of a microphone for students. The reduced sample
size amounts to nearly 6% of the total lecture.
Table 2.4: Recovery of words by SHoUT speech recognition compared to human made subtitles
                                                              Human subtitling     Speech recognition (SHoUT)
Set               Collection                                  (# words or lines)   (# words or lines)   (%)
Words             Lecture                                     6,970                7,351                105 %
                  Sample                                      471                  443                  94 %
                  Sample, excluding conversation              401                  411                  102 %
                  Sample, excluding conversation (%)          5.8 %                5.6 %
Recovered words   Sample, excluding conversation              401                  204                  51 %
Lines             Sample, excluding conversation              48                   48                   100 %
                  Lines with 100% word correctness            -                    7                    -
Table 2.4 shows that the total number of words recovered by the SHoUT speech recognition
system is slightly higher than the actual number of words (105%). This is probably caused
by the fact that SHoUT recognizes long words as separate smaller words. This word splitting
might be explained by the low speaking rate in lectures.
The total number of words in the reduced sample is 102% of that of the human-made
subtitles. This corresponds well to the lecture as a whole. The word correctness of
SHoUT proves to be approximately 50%. The reduced sample includes 48 subtitle lines.
Only 7 lines have a 100% word correctness. This corresponds to approximately 15% of the
subtitle lines. Given the rather low word correctness and the dramatically low sentence
correctness, the SHoUT results require substantial improvement by human intervention
before they can be used for subtitling.
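In this project the correct words were marked by hand. As an illustration only, a comparable figure can be approximated automatically by aligning the recognized words with the human-made subtitle and counting the reference words that are recovered in the right order. The sketch below uses Python's standard difflib module for the alignment; this is a swapped-in heuristic (it does not guarantee an optimal alignment), and the function name word_correctness is hypothetical.

```python
from difflib import SequenceMatcher

def word_correctness(reference: str, hypothesis: str) -> float:
    """Fraction of reference words that are recovered, in order, in the hypothesis."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)

# Human-made subtitle versus the SHoUT words for the same fragment (punctuation removed).
reference = "ik ben een marathonloper"
hypothesis = "ja ik ben marathonloper"
print(word_correctness(reference, hypothesis))  # 0.75
```

A subtitle line counts as 100% correct when this function returns 1.0 for that line.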
SHoUT as subtitling system
The output of the SHoUT speech recognition might be used for the creation of subtitles.
The comparison above showed that SHoUT recognizes around 50% of the words correctly. For
accurate subtitling, this is clearly not enough, as is illustrated by the YouTube
example: http://www.youtube.com/watch?v=otGN0NUYs5w
The subtitles from SHoUT are indicated as Interlingua-SHoUT. Figure 2.1 gives an impression
of these subtitles.
Figure 2.1: Subtitles created from the SHoUT transcript
Moreover, the quality of the speech recognition is inadequate for using these subtitles for
automated translation by Google Translate. This too is demonstrated in the previously
mentioned YouTube example.
SHoUT and word frequency
The human-made subtitles were previously analyzed on the frequency of words. This result
can be compared to the results of the SHoUT speech recognition system. Table 2.5 gives the
comparison for the top-20 words, Table 2.6 for the top-15 nouns.
Table 2.5: Top 20 most common words in transcript of lecture #15 of CT3011 (human made versus speech
recognition SHoUT)
Nr   Word   Count human made   Count speech recognition   Word accuracy
1    dat    269                218                        81%
2    de     231                276                        119%
3    en     225                359                        160%
4    het    220                155                        70%
5    is     181                270                        149%
6    een    174                158                        91%
7    van    162                132                        81%
8    in     151                192                        127%
9    ook    134                109                        81%
10   we     128                46                         36%
11   die    113                99                         88%
12   je     107                117                        109%
13   dus    93                 22                         24%
14   ik     88                 74                         84%
15   dan    80                 89                         111%
16   daar   76                 72                         95%
17   niet   55                 165                        300%
18   zijn   54                 75                         139%
19   op     54                 46                         85%
20   met    53                 59                         111%

Table 2.6: Top 15 most used nouns in transcript of lecture #15 in CT3011
Nr   Word                    Count manual   Count SHoUT   Word accuracy
1    water                   39             33            85%
2    Nederland               36             35            97%
3    grondwater              21             20            95%
4    jaar                    16             28            175%
5    drinkwatervoorziening   16             4             25%
6    dingen                  16             17            106%
7    oppervlaktewater        15             7             47%
8    boek                    15             5             33%
9    keer                    13             16            123%
10   drinkwater              13             16            123%
11   plaatje                 11             6             55%
12   vragen                  10             6             60%
13   chloor                  10             0             0%
14   soort                   9              7             78%
15   stoffen                 9              8             89%
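Table 2.5 and Table 2.6 show that speech recognition by SHoUT has a word accuracy
between 0 and 300%, where word accuracy is the SHoUT count of a word as a percentage of
its count in the human-made subtitles. Some words are never recognized (such as the word
"chloor"), and some words are recognized far too often (such as the word "niet").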
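The word accuracy in these tables is a ratio of word counts: the number of times a word occurs in the SHoUT output divided by the number of times it occurs in the human-made subtitles. A minimal sketch of this calculation is given below; the function name count_ratios is hypothetical and the input is assumed to be already split into words.

```python
from collections import Counter

def count_ratios(human_tokens, shout_tokens, top_n=20):
    """For the top_n words of the human transcript: (word, human count, SHoUT count, ratio)."""
    human = Counter(token.lower() for token in human_tokens)
    shout = Counter(token.lower() for token in shout_tokens)
    return [
        (word, n_human, shout[word], shout[word] / n_human)
        for word, n_human in human.most_common(top_n)
    ]

# Counts for "niet" and "chloor" as reported in Table 2.5 and Table 2.6.
human_tokens = ["niet"] * 55 + ["chloor"] * 10
shout_tokens = ["niet"] * 165
for word, n_human, n_shout, ratio in count_ratios(human_tokens, shout_tokens):
    print(f"{word}: {n_human} vs {n_shout} = {ratio:.0%}")
```

Note that a ratio of 100% only means that the counts are equal; it does not prove that each individual occurrence was recognized at the right position.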
SHoUT for tag cloud search
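The human-made subtitles were previously analyzed for the frequency of words. Table 2.7
gives the 15 most frequent nouns in the SHoUT output and indicates which of them also
appear in the top-15 of the human-made subtitles.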
Table 2.7: Top 15 most used nouns in speech recognition of lecture #15 in CT3011
Nr   Word               Count speech recognition   In top-15 human made subtitles
1    nederland          35                         yes
2    water              33                         yes
3    jaar               28                         yes
4    grondwater         20                         yes
5    mensen             20                         no
6    dingen             17                         yes
7    drinkwater         16                         yes
8    stoffen            8                          yes
9    jaren              7                          no
10   onderzoek          7                          no
11   oppervlaktewater   7                          yes
12   soort              7                          yes
13   kwaliteit          6                          no
14   plaatje            6                          no
15   wereld             6                          no
The top-4 words from the SHoUT system are the same as those obtained from human-made
subtitling. Of the top-10, 70% is identical (7 out of 10), and of the top-15, 60% (9 out of 15).
These results show that speech recognition is reasonably suitable for creating tag clouds.
This is demonstrated in Figure 2.2.
Figure 2.2: Tag cloud from SHoUT (left) versus tag cloud from human-made subtitles (right) with common Dutch
word removal
(Source: http://www.wordle.net)
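The overlap figures quoted above can be computed as the share of the first k SHoUT nouns that also occur in the top-15 of the human-made subtitles. The sketch below illustrates this with the word lists of Table 2.6 and Table 2.7; the function name share_in_reference is hypothetical and the comparison is made case-insensitive.

```python
def share_in_reference(ranked_words, reference_words, k):
    """Share of the first k ranked words that also occur in the reference list."""
    reference = {word.lower() for word in reference_words}
    top_k = [word.lower() for word in ranked_words[:k]]
    return sum(word in reference for word in top_k) / k

# Top-15 nouns from the human-made subtitles (Table 2.6).
human_top15 = [
    "water", "Nederland", "grondwater", "jaar", "drinkwatervoorziening",
    "dingen", "oppervlaktewater", "boek", "keer", "drinkwater",
    "plaatje", "vragen", "chloor", "soort", "stoffen",
]
# First ten nouns from the SHoUT output (Table 2.7).
shout_top10 = [
    "nederland", "water", "jaar", "grondwater", "mensen",
    "dingen", "drinkwater", "stoffen", "jaren", "onderzoek",
]
print(share_in_reference(shout_top10, human_top15, 4))   # 1.0
print(share_in_reference(shout_top10, human_top15, 10))  # 0.7
```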
3. SHoUT for example course CT3011 (all lectures)
SHoUT on example course
For this project, a BSc course taught in Dutch was selected to test the speech recognition
results of SHoUT. The course is part of the BSc program Civil Engineering and of the
TU Delft OpenCourseWare (OCW - OpenER) program. Further details of this course are given
in Table 3.1.
Table 3.1: Sample course chosen for speech recognition
Course:                    CT3011 – Introduction Water Management
Academic year:             2007 – 2008
                           (Lecture #1 to #4 are recorded in 2008/2009)
Period:                    P1 (September – November)
Lecturers:                 Prof. dr.ir. N.C. (Nick) van de Giesen
                           Prof. ir. J.C. (Hans) van Dijk
                           Plus 9 guest lecturers
Course credits:            4 ECTS
Number of recordings:      28
                           (14 double lecture sessions)
OCW links:                 Available as OCW course and also as the original Blackboard-course.
                           For links see http://drinkwater.tudelft.nl banner OpenCourseWare
Collegerama catalog link:  http://collegerama.tudelft.nl/mediasite/Catalog/?cid=16b5f5fa-0745-4b8b9f02-f79a03abf50a
Lectures and lecturers
The course CT3011 consists of 28 lectures with a nominal duration of 45 minutes, given in 14
double lecture sessions. The two responsible professors gave 18 lectures while the remaining
10 lectures were given by 9 different guest lecturers. The lectures were given in the academic
year 2007/2008, except for the first 4 lectures which were recorded a year later.
Table 3.2 gives an overview of the lectures and the lecturers. This table also includes the
gender and age of the lecturer, since this might be of relevance for evaluating the word
correctness of the speech recognition. All lectures were given in the Dutch language and all
lecturers are native Dutch speakers.
Table 3.2: Lectures and lecturers in CT3011 (lectures in Dutch by native Dutch speakers)
                                    Lecturer   Gender   Age (at time     Recorded length
Nr   Lecturer                       Nr         (*)      of recording)    (hh:mm:ss)
1    Nick van de Giesen             1          M        46               0:36:03
2    Nick van de Giesen             1          M        46               0:44:20
3    Nick van de Giesen             1          M        46               0:43:46
4    Nick van de Giesen             1          M        46               0:43:05
5    Nick van de Giesen             1          M        45               0:45:56
6    Nick van de Giesen             1          M        45               0:39:19
7    Nick van de Giesen             1          M        45               0:39:19
8    Nick van de Giesen             1          M        45               0:48:19
9    Nick van de Giesen             1          M        45               0:44:49
10   Peter-Jules van Overloop       4          M        38               0:41:27
11   Nick van de Giesen             1          M        45               0:36:50
12   Nick van de Giesen             1          M        45               0:38:35
13   Huub Savenije                  3          M        54               0:40:50
14   Huub Savenije                  3          M        54               0:40:38
15   Hans van Dijk                  2          M        53               0:45:09
16   Doris van Halem                5          F        26               0:32:25
17   Hans van Dijk                  2          M        53               0:46:28
18   Patrick Smeets                 6          M        36               0:44:22
19   Hans van Dijk                  2          M        53               0:48:22
20   Jasper Verberk                 7          M        35               0:39:56
21   Hans van Dijk                  2          M        53               0:53:51
22   Karin Teunissen                8          F        23               0:34:46
23   Hans van Dijk                  2          M        53               0:48:30
24   Anke Grefte                    9          F        27               0:23:26
25   Hans van Dijk                  2          M        53               0:39:49
26   Mirjam Blokker                 10         F        33               0:22:31
27   Hans van Dijk                  2          M        53               0:46:17
28   Jan Vreeburg                   11         M        47               0:42:20
* M Male, F Female
SHoUT output analysis
The number of words retrieved by SHoUT from all recordings is shown in Table 3.3.
Table 3.3: Words from SHoUT for all recorded lectures of CT3011
Lecture   Lecturer   Number of   Timing last   Speech rate
Nr        Nr         words       word (sec)    (words/sec)
1         1          5,716       2,161         2.65
2         1          7,149       2,658         2.69
3         1          6,930       2,625         2.64
4         1          5,966       2,584         2.31
5         1          7,792       2,752         2.83
6         1          5,803       2,309         2.51
7         1          6,492       2,358         2.75
8         1          7,776       2,897         2.68
9         1          6,768       2,688         2.52
10        4          6,371       2,486         2.56
11        1          5,857       2,207         2.65
12        1          6,596       2,619         2.52
13        3          6,152       2,445         2.52
14        3          5,923       2,436         2.43
15        2          7,351       2,706         2.72
16        5          4,520       1,892         2.39
17        2          8,139       2,831         2.87
18        6          7,320       2,665         2.75
19        2          8,693       3,056         2.84
20        7          7,370       2,390         3.08
21        2          9,392       3,229         2.91
22        8          5,652       2,027         2.79
23        2          8,642       3,029         2.85
24        9          3,740       1,358         2.75
25        2          6,946       2,387         2.91
26        10         3,581       1,350         2.65
27        2          8,188       2,776         2.95
28        11         8,132       2,532         3.21
Total     19:22:00   188,957     69,453        2.71
Table 3.3 shows that the 28 recorded lectures have a total length of almost 19.5 hours. In
total, nearly 190,000 words have been produced by SHoUT. The mean speech rate in these
lectures is 2.7 words per second, with a variation of roughly plus or minus 20%. This
variation is mainly caused by the speaking rate of the different lecturers and to a lesser
extent by the speaking pauses during the lectures. The latter are more or less absent.
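The speech rate in Table 3.3 is the number of recognized words divided by the time stamp of the last word. The short sketch below reproduces this for two lectures and also expresses the rate relative to the mean of 2.7 words per second; the function names are hypothetical.

```python
def speech_rate(number_of_words: int, last_word_seconds: float) -> float:
    """Recognized words per second over the spoken part of a recording."""
    return number_of_words / last_word_seconds

def deviation_from_mean(rate: float, mean_rate: float) -> float:
    """Relative deviation of a lecture's speech rate from the mean rate."""
    return (rate - mean_rate) / mean_rate

# Lecture 15 (Hans van Dijk) and lecture 28 (Jan Vreeburg), values from Table 3.3.
rate_15 = speech_rate(7351, 2706)
rate_28 = speech_rate(8132, 2532)
print(round(rate_15, 2))  # 2.72
print(round(rate_28, 2))  # 3.21
print(f"{deviation_from_mean(rate_28, 2.7):+.0%}")  # +19%
```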
Quality of word recognition
The quality of the speech recognition is determined by comparing a sample of the transcript
generated by SHoUT with the actual spoken words. Selection of this sample was done for all
lectures by using a selection of 25 sentences, somewhere in the beginning of the lecture. This
came out to be 5%-6% of the number of words in a lecture. The comparison was visualized
by color marking the correct words. Figure 3.1 gives a print of this color marking for the
highest and for the lowest word correctness. Table 3.4 gives the results of this comparison.
Figure 3.1: Quality check of speech recognition by marking the correct words, showing the highest word correctness
(left, 73%) and the lowest word correctness (right, 23 %)
Table 3.4: Words from SHoUT for all recorded lectures of CT3011
Lecture         Lecturer   Lecture size SHoUT   Sample size     Sample size   Word correctness
Nr              Nr         (Nr of words)        (Nr of words)   (%)           (%)
1               1          5,716                434             7.6           35
2               1          7,149                445             6.2           40
3               1          6,930                428             6.2           26
4               1          5,966                446             7.5           23
5               1          7,792                417             5.4           45
6               1          5,803                426             7.3           31
7               1          6,492                313             4.8           34
8               1          7,776                411             5.3           41
9               1          6,768                411             6.1           40
10              4          6,371                447             7.0           60
11              1          5,857                443             7.6           36
12              1          6,596                430             6.5           35
13              3          6,152                409             6.6           73
14              3          5,923                407             6.9           47
15              2          7,351                437             5.9           46
16              5          4,520                398             8.8           64
17              2          8,139                400             4.9           64
18              6          7,320                406             5.5           64
19              2          8,693                384             4.4           61
20              7          7,370                439             6.0           49
21              2          9,392                394             4.2           56
22              8          5,652                405             7.2           41
23              2          8,642                373             4.3           67
24              9          3,740                389             10.4          64
25              2          6,946                425             6.1           48
26              10         3,581                395             11.0          71
27              2          8,188                404             4.9           64
28              11         8,132                411             5.1           66
Total                      188,957              11,527          6.1
Minimum                                                         4.2           23
Maximum                                                         11.0          73
Mean                                                                          50
Std Deviation                                                                 14.6
Table 3.4 shows that the average word correctness of SHoUT amounts to 50%, with a
variation between 23 and 73%. These extremes correspond to word error rates of
approximately 77% and 27%, respectively.
Word correctness per lecturer
The word correctness may vary between speakers. Therefore the word correctness has been
clustered per lecturer. This is presented in Figure 3.2.
Figure 3.2: Word correctness clustered per lecturer shows significant differences amongst speakers
Figure 3.2 shows significant differences amongst speakers. The first speaker has an average
word correctness of 35% (variation between 23 and 45%), the second speaker has an
average word correctness of 58% (variation between 45 and 67%). This difference cannot be
related to technical differences between the recordings or to the difference in speech rate.
The differences are most likely caused by differences in pronunciation and articulation
between the two lecturers.
The guest lecturers have a word correctness within the same range as the second lecturer. No
significant differences have been observed between male (10, 18, 20, 28) and female (16,
22, 24, 26) speakers.
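The clustering per lecturer can be reproduced from the lecturer numbers and word correctness values of Table 3.4. The sketch below is an illustration only; the function name correctness_per_lecturer is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Lecturer number and word correctness (%) for lectures 1 to 28, from Table 3.4.
lecturer_nr = [1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 3, 3,
               2, 5, 2, 6, 2, 7, 2, 8, 2, 9, 2, 10, 2, 11]
correctness = [35, 40, 26, 23, 45, 31, 34, 41, 40, 60, 36, 35, 73, 47,
               46, 64, 64, 64, 61, 49, 56, 41, 67, 64, 48, 71, 64, 66]

def correctness_per_lecturer(lecturers, values):
    """Group the word correctness per lecturer: (mean, minimum, maximum)."""
    grouped = defaultdict(list)
    for lecturer, value in zip(lecturers, values):
        grouped[lecturer].append(value)
    return {
        lecturer: (mean(scores), min(scores), max(scores))
        for lecturer, scores in grouped.items()
    }

summary = correctness_per_lecturer(lecturer_nr, correctness)
for lecturer in (1, 2):
    average, lowest, highest = summary[lecturer]
    print(f"lecturer {lecturer}: mean {average:.0f}%, range {lowest}-{highest}%")
```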
4. Evaluation
Speech recognition can be used for word and text recovery from recorded lectures. Based on
the results of the quality analyses of the recorded lectures, the following conclusions can be
drawn on using SHoUT as a speech recognition system for lectures recorded with Collegerama.
SHoUT for word indexing
The word correctness of SHoUT amounts to 50% with variation between 25 and 75%. This
recovery rate allows for word indexing in cases where no better sources for word indexing,
such as correct subtitles, are available.
SHoUT for tag cloud production
The word correctness of SHoUT amounts to 50% with variation between 25 and 75%.
Testing the produced tag cloud for the most frequently used nouns shows that tag clouds
produced from SHoUT output are more or less similar to tag clouds produced from handmade subtitles. It should be noted that SHoUT is missing some uncommon words completely,
like the word "chloor" in the test lecture. For academic lectures, this might be a serious
shortcoming.
SHoUT for transcripts
The word correctness of SHoUT is too low for producing readable transcripts.
SHoUT for subtitles
The word correctness of SHoUT is too low for producing readable subtitles. The SHoUT
output however might be used as a starting point for human-made subtitles since it provides
the proper timing of all words. In such a human-based post processing these words should
be clustered in subtitle sentences and the incorrect words should be corrected.
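As an illustration of how the word timings could serve as such a starting point, the sketch below groups timed words into preliminary subtitle blocks and writes them in the subtitle format of Annex D3. It is a sketch under assumptions: a new block is started when the pause before a word exceeds a threshold or when a block becomes too long, and both thresholds (300 milliseconds and 10 words) are hypothetical values that would need tuning. The function names are hypothetical as well. Correcting the misrecognized words remains a manual task.

```python
def to_srt_time(milliseconds: int) -> str:
    """Format milliseconds as hh:mm:ss,mmm, as used in the subtitle files."""
    hours, rest = divmod(milliseconds, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

def cluster_words(words, max_gap_ms=300, max_words=10):
    """Group (text, begin_ms, end_ms) tuples into blocks, splitting on long pauses."""
    blocks, current = [], []
    for word in words:
        pause = word[1] - current[-1][2] if current else 0
        if current and (pause > max_gap_ms or len(current) >= max_words):
            blocks.append(current)
            current = []
        current.append(word)
    if current:
        blocks.append(current)
    return blocks

def to_srt(blocks) -> str:
    """Write the blocks as numbered subtitles with begin and end time."""
    lines = []
    for number, block in enumerate(blocks, start=1):
        lines.append(str(number))
        lines.append(f"{to_srt_time(block[0][1])} --> {to_srt_time(block[-1][2])}")
        lines.append(" ".join(text for text, _, _ in block))
        lines.append("")
    return "\n".join(lines)

# Word timings taken from Annex D4.
words = [
    ("ja", 30050, 30450), ("ik", 30470, 30600), ("ben", 30600, 30820),
    ("marathonloper", 30820, 31920), ("een", 32610, 32730),
    ("mooie", 32730, 33080), ("foto", 33080, 33550),
]
print(to_srt(cluster_words(words)))
```

For this sample the pause of 690 milliseconds after "marathonloper" produces two blocks: "ja ik ben marathonloper" and "een mooie foto".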
Annex E1. SHoUT result from lecture CT3011
<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
###############################################################################################
###                                                                                         ###
###   Shout is the decoder of the 'SHoUT LVCS Recognition toolkit'.                         ###
###   This toolkit is developed by:                                                         ###
###   Marijn Huijbregts, HMI, University of Twente.                                         ###
###   http://wwwhome.cs.utwente.nl/~huijbreg/                                               ###
###   marijn.huijbregts@utwente.nl                                                          ###
###                                                                                         ###
###############################################################################################
-->
<shout_metadata>
<model_info>
<AM>/home/parlevink/verschoort/projects/asr-test/shout/models/acoustic-model.16.try4.orgsil.am</AM>
<DCT>/home/parlevink/verschoort/projects/asr-test/shout/models/9904.65K.release-003.dct.bin</DCT>
<LM>/home/parlevink/verschoort/projects/asr-test/shout/models/mix-9904.3g.interpolate.v02.lowprobs_cgn-comp-fijkl.arpa.plus.bin</LM>
</model_info>
<decoding_settings>
<LM_SCALE>30.0</LM_SCALE>
<TRANSITION_PENALTY>0.0</TRANSITION_PENALTY>
<SHORT_WORD_PENALTY>0.0</SHORT_WORD_PENALTY>
<SHORT_WORD_LENGTH>3</SHORT_WORD_LENGTH>
<SIL_PENALTY>0.0</SIL_PENALTY>
<GLOBAL_BEAM>175.0</GLOBAL_BEAM>
<NODE_BEAM>60.0</NODE_BEAM>
<ENDWORD_BEAM>50.0</ENDWORD_BEAM>
<LMLA>ON</LMLA>
<LMLA_CACHESIZE>800</LMLA_CACHESIZE>
<LMLA_CLEANUPINTERVAL>100</LMLA_CLEANUPINTERVAL>
<MAX_TOKENS_PER_STATE>160</MAX_TOKENS_PER_STATE>
<MAX_TOKENS_TOTAL>35000</MAX_TOKENS_TOTAL>
</decoding_settings>
<segments>
<speech label="SPK01-001" begintime="0.00" endtime="17.67" >
<real_time milliseconds="22112" frames="1767" RTF="1.2514"/>
<EOS-score="0.00000"/>
<score COMBINED="-59866.66016"/>
<wordsequence>
<word wordID="[s]" beginTime="0.000" endTime="0.110">
<score AM="-232.13780" LM="0.00000" COMBINED="-232.13780"/>
</word>
<word wordID="en" beginTime="0.110" endTime="0.240">
<score AM="-714.45599" LM="-42.98058" COMBINED="-757.43658"/>
</word>
<word wordID="aan" beginTime="0.240" endTime="0.760">
<score AM="-2258.56299" LM="-118.15379" COMBINED="-2376.71680"/>
</word>
<word wordID="uh" beginTime="0.760" endTime="1.070">
<score AM="-2902.88232" LM="-183.18549" COMBINED="-3086.06787"/>
</word>
<word wordID="[s]" beginTime="1.070" endTime="1.080">
<score AM="-2947.88574" LM="-183.18549" COMBINED="-3131.07129"/>
</word>
<word wordID="ja" beginTime="1.080" endTime="1.250">
<score AM="-3457.09912" LM="-242.73988" COMBINED="-3699.83911"/>
</word>
<word wordID="[s]" beginTime="1.250" endTime="1.260">
<score AM="-3498.23120" LM="-242.73988" COMBINED="-3740.97119"/>
</word>
<word wordID="en" beginTime="1.260" endTime="1.410">
<score AM="-4014.74512" LM="-295.96329" COMBINED="-4310.70850"/>
</word>
<word wordID="daarin" beginTime="1.410" endTime="2.260">
<score AM="-6442.80078" LM="-409.44135" COMBINED="-6852.24219"/>
</word>
<word wordID="lopen" beginTime="2.260" endTime="2.540">
<score AM="-7443.77246" LM="-503.81482" COMBINED="-7947.58740"/>
</word>
<word wordID="[s]" beginTime="2.540" endTime="2.560">
<score AM="-7505.68408" LM="-503.81482" COMBINED="-8009.49902"/>
</word>
<word wordID="we" beginTime="2.560" endTime="2.830">
<score AM="-8327.93164" LM="-533.22229" COMBINED="-8861.15430"/>
</word>
<word wordID="[s]" beginTime="2.830" endTime="2.880">
<score AM="-8552.04395" LM="-533.22229" COMBINED="-9085.26660"/>
</word>
<word wordID="daar" beginTime="2.880" endTime="3.070">
<score AM="-9426.74512" LM="-599.57513" COMBINED="-10026.32031"/>
</word>
<word wordID="is" beginTime="3.070" endTime="3.160">
<score AM="-9854.66602" LM="-661.00763" COMBINED="-10515.67383"/>
</word>
<word wordID="[s]" beginTime="3.160" endTime="3.170">
<score AM="-9895.22363" LM="-661.00763" COMBINED="-10556.23145"/>
</word>
<word wordID="zeker" beginTime="3.170" endTime="3.690">
<score AM="-11814.48828" LM="-751.24976" COMBINED="-12565.73828"/>
</word>
<word wordID="[s]" beginTime="3.690" endTime="3.710">
<score AM="-11905.47949" LM="-751.24976" COMBINED="-12656.72949"/>
</word>
<word wordID="geen" beginTime="3.710" endTime="3.940">
<score AM="-12843.25195" LM="-789.18256" COMBINED="-13632.43457"/>
</word>
<word wordID="en" beginTime="3.940" endTime="4.220">
<score AM="-13695.13770" LM="-862.70569" COMBINED="-14557.84375"/>
</word>
<word wordID="uh" beginTime="4.220" endTime="4.670">
<score AM="-14415.94141" LM="-916.09106" COMBINED="-15332.03223"/>
</word>
<word wordID="[s]" beginTime="4.670" endTime="5.840">
<score AM="-17269.53516" LM="-916.09106" COMBINED="-18185.62695"/>
</word>
<word wordID="we" beginTime="5.840" endTime="6.630">
<score AM="-18720.78516" LM="-989.69421" COMBINED="-19710.47852"/>
</word>
<word wordID="[s]" beginTime="6.630" endTime="6.730">
<score AM="-19061.42383" LM="-989.69421" COMBINED="-20051.11719"/>
</word>
<word wordID="met" beginTime="6.730" endTime="6.910">
<score AM="-19861.14648" LM="-1053.53931" COMBINED="-20914.68555"/>
</word>
<word wordID="het" beginTime="6.910" endTime="7.130">
<score AM="-20622.81445" LM="-1095.39929" COMBINED="-21718.21289"/>
</word>
<word wordID="[s]" beginTime="7.130" endTime="7.260">
<score AM="-21291.16211" LM="-1095.39929" COMBINED="-22386.56055"/>
</word>
<word wordID="hele" beginTime="7.260" endTime="7.480">
<score AM="-22231.99609" LM="-1169.62207" COMBINED="-23401.61719"/>
</word>
<word wordID="heelal" beginTime="7.480" endTime="7.820">
<score AM="-23341.09375" LM="-1271.07471" COMBINED="-24612.16797"/>
</word>
<word wordID="van" beginTime="7.820" endTime="8.300">
<score AM="-24820.38672" LM="-1324.94214" COMBINED="-26145.32812"/>
</word>
<word wordID="uh" beginTime="8.300" endTime="9.170">
<score AM="-26474.20117" LM="-1380.89258" COMBINED="-27855.09375"/>
</word>
<word wordID="[s]" beginTime="9.170" endTime="9.200">
<score AM="-26595.05664" LM="-1380.89258" COMBINED="-27975.94922"/>
</word>
<word wordID="dertig" beginTime="9.200" endTime="9.540">
<score AM="-27844.60742" LM="-1496.64246" COMBINED="-29341.25000"/>
</word>
<word wordID="[s]" beginTime="9.540" endTime="9.550">
<score AM="-27888.40625" LM="-1496.64246" COMBINED="-29385.04883"/>
</word>
<word wordID="elf" beginTime="9.550" endTime="10.010">
<score AM="-29522.81055" LM="-1581.08472" COMBINED="-31103.89453"/>
</word>
<word wordID="[s]" beginTime="10.010" endTime="10.020">
<score AM="-29572.72656" LM="-1581.08472" COMBINED="-31153.81055"/>
</word>
<word wordID="en" beginTime="10.020" endTime="10.210">
<score AM="-30031.23047" LM="-1634.81067" COMBINED="-31666.04102"/>
</word>
<word wordID="later" beginTime="10.210" endTime="10.600">
<score AM="-31528.07227" LM="-1732.08472" COMBINED="-33260.15625"/>
</word>
<word wordID="naar" beginTime="10.600" endTime="10.800">
<score AM="-32227.48438" LM="-1785.05762" COMBINED="-34012.54297"/>
</word>
<word wordID="[s]" beginTime="10.800" endTime="10.810">
<score AM="-32279.51172" LM="-1785.05762" COMBINED="-34064.57031"/>
</word>
<word wordID="het" beginTime="10.810" endTime="10.900">
<score AM="-32755.11133" LM="-1816.83484" COMBINED="-34571.94531"/>
</word>
<word wordID="cement" beginTime="10.900" endTime="11.460">
<score AM="-34582.38672" LM="-1954.31470" COMBINED="-36536.70312"/>
</word>
<word wordID="[s]" beginTime="11.460" endTime="11.470">
<score AM="-34614.26953" LM="-1954.31470" COMBINED="-36568.58594"/>
</word>
<word wordID="'t" beginTime="11.470" endTime="11.700">
<score AM="-35245.50391" LM="-1995.90515" COMBINED="-37241.41016"/>
</word>
<word wordID="cd" beginTime="11.700" endTime="12.180">
<score AM="-36775.66016" LM="-2143.99170" COMBINED="-38919.65234"/>
</word>
<word wordID="lover" beginTime="12.180" endTime="12.710">
<score AM="-38360.26172" LM="-2316.99219" COMBINED="-40677.25391"/>
</word>
<word wordID="[s]" beginTime="12.710" endTime="12.760">
<score AM="-38546.11328" LM="-2316.99219" COMBINED="-40863.10547"/>
</word>
<word wordID="gezondheidss" beginTime="12.760" endTime="13.270">
<score AM="-40555.60938" LM="-2493.16968" COMBINED="-43048.77734"/>
</word>
<word wordID="en" beginTime="13.270" endTime="13.370">
<score AM="-40998.75391" LM="-2500.35669" COMBINED="-43499.10938"/>
</word>
<word wordID="niet" beginTime="13.370" endTime="13.670">
<score AM="-42095.47656" LM="-2567.79614" COMBINED="-44663.27344"/>
</word>
<word wordID="werd" beginTime="13.670" endTime="13.880">
<score AM="-42794.28906" LM="-2670.35693" COMBINED="-45464.64453"/>
</word>
<word wordID="hij" beginTime="13.880" endTime="14.180">
<score AM="-44075.64062" LM="-2732.96899" COMBINED="-46808.60938"/>
</word>
<word wordID="[s]" beginTime="14.180" endTime="14.260">
<score AM="-44427.12891" LM="-2732.96899" COMBINED="-47160.09766"/>
</word>
<word wordID="de" beginTime="14.260" endTime="14.370">
<score AM="-44871.51562" LM="-2785.76685" COMBINED="-47657.28125"/>
</word>
<word wordID="komende" beginTime="14.370" endTime="14.780">
<score AM="-46386.73828" LM="-2853.83130" COMBINED="-49240.57031"/>
</word>
<word wordID="zeven" beginTime="14.780" endTime="15.170">
<score AM="-47839.63281" LM="-2934.08618" COMBINED="-50773.71875"/>
</word>
<word wordID="weken" beginTime="15.170" endTime="15.550">
<score AM="-49237.71875" LM="-2973.23242" COMBINED="-52210.94922"/>
</word>
<word wordID="met" beginTime="15.550" endTime="15.740">
<score AM="-49943.36328" LM="-3037.76416" COMBINED="-52981.12891"/>
</word>
<word wordID="jullie" beginTime="15.740" endTime="16.200">
<score AM="-51646.74219" LM="-3154.68311" COMBINED="-54801.42578"/>
</word>
<word wordID="uh" beginTime="16.200" endTime="16.460">
<score AM="-52159.67188" LM="-3223.82251" COMBINED="-55383.49609"/>
</word>
<word wordID="doornemen" beginTime="16.460" endTime="17.190">
<score AM="-55033.90234" LM="-3385.21851" COMBINED="-58419.12109"/>
</word>
<word wordID="[s]" beginTime="17.190" endTime="17.670">
<score AM="-56481.44141" LM="-3385.21851" COMBINED="-59866.66016"/>
</word>
</wordsequence>
</speech>
<speech label="SPK01-002" begintime="18.13" endtime="35.00" >
<real_time milliseconds="17020" frames="1687" RTF="1.0089"/>
<EOS-score="0.00000"/>
<score COMBINED="-68474.89844"/>
<wordsequence>
<word wordID="[s]" beginTime="18.130" endTime="18.600">
<score AM="-1041.74548" LM="0.00000" COMBINED="-1041.74548"/>
</word>
<word wordID="en" beginTime="18.600" endTime="18.730">
<score AM="-1560.05615" LM="-42.98058" COMBINED="-1603.03674"/>
</word>
<word wordID="ik" beginTime="18.730" endTime="18.830">
<score AM="-1991.38696" LM="-88.98117" COMBINED="-2080.36816"/>
</word>
<word wordID="dacht" beginTime="18.830" endTime="19.040">
<score AM="-2784.23022" LM="-145.49277" COMBINED="-2929.72290"/>
</word>
………………………………….
<word wordID="ook" beginTime="2688.400" endTime="2688.560">
<score AM="-62879.57422" LM="-3356.99512" COMBINED="-66236.57031"/>
</word>
<word wordID="realiseren" beginTime="2688.560" endTime="2689.390">
<score AM="-65831.93750" LM="-3444.05444" COMBINED="-69275.99219"/>
</word>
<word wordID="[s]" beginTime="2689.390" endTime="2689.570">
<score AM="-66419.42969" LM="-3444.05444" COMBINED="-69863.48438"/>
</word>
</wordsequence>
</speech>
<speech label="SPK01-170" begintime="2689.59" endtime="2706.69" >
<real_time milliseconds="19453" frames="1710" RTF="1.1376"/>
<EOS-score="0.00000"/>
<score COMBINED="-68375.91406"/>
<wordsequence>
<word wordID="[s]" beginTime="2689.590" endTime="2689.740">
<score AM="-534.19904" LM="0.00000" COMBINED="-534.19904"/>
</word>
<word wordID="dat" beginTime="2689.740" endTime="2689.920">
<score AM="-1349.89319" LM="-44.63421" COMBINED="-1394.52734"/>
</word>
<word wordID="er" beginTime="2689.920" endTime="2690.020">
<score AM="-1821.88281" LM="-98.82831" COMBINED="-1920.71106"/>
</word>
<word wordID="natuurlijk" beginTime="2690.020" endTime="2690.440">
<score AM="-3515.03809" LM="-181.26859" COMBINED="-3696.30664"/>
</word>
<word wordID="ook" beginTime="2690.440" endTime="2690.620">
<score AM="-4123.34424" LM="-209.74142" COMBINED="-4333.08545"/>
</word>
<word wordID="[s]" beginTime="2690.620" endTime="2690.630">
<score AM="-4170.82178" LM="-209.74142" COMBINED="-4380.56299"/>
</word>
<word wordID="heel" beginTime="2690.630" endTime="2690.920">
<score AM="-5322.86133" LM="-260.55157" COMBINED="-5583.41309"/>
</word>
<word wordID="[s]" beginTime="2690.920" endTime="2690.930">
<score AM="-5364.94189" LM="-260.55157" COMBINED="-5625.49365"/>
</word>
<word wordID="anders" beginTime="2690.930" endTime="2691.370">
<score AM="-7285.50293" LM="-308.50604" COMBINED="-7594.00879"/>
</word>
<word wordID="[s]" beginTime="2691.370" endTime="2691.530">
<score AM="-8042.13965" LM="-308.50604" COMBINED="-8350.64551"/>
</word>
<word wordID="kan" beginTime="2691.530" endTime="2692.050">
<score AM="-10443.12793" LM="-372.76978" COMBINED="-10815.89746"/>
</word>
<word wordID="[s]" beginTime="2692.050" endTime="2692.140">
<score AM="-10779.51074" LM="-372.76978" COMBINED="-11152.28027"/>
</word>
<word wordID="en" beginTime="2692.140" endTime="2692.280">
<score AM="-11268.68262" LM="-425.27689" COMBINED="-11693.95996"/>
</word>
<word wordID="in" beginTime="2692.280" endTime="2692.430">
<score AM="-11783.96582" LM="-486.46951" COMBINED="-12270.43555"/>
</word>
<word wordID="een" beginTime="2692.430" endTime="2692.560">
<score AM="-12219.71777" LM="-525.48224" COMBINED="-12745.20020"/>
</word>
<word wordID="[s]" beginTime="2692.560" endTime="2692.690">
<score AM="-12731.64941" LM="-525.48224" COMBINED="-13257.13184"/>
</word>
<word wordID="heel" beginTime="2692.690" endTime="2692.920">
<score AM="-13424.57227" LM="-595.17139" COMBINED="-14019.74316"/>
</word>
<word wordID="veel" beginTime="2692.920" endTime="2693.150">
<score AM="-14489.91504" LM="-651.30042" COMBINED="-15141.21582"/>
</word>
<word wordID="[s]" beginTime="2693.150" endTime="2693.160">
<score AM="-14535.47363" LM="-651.30042" COMBINED="-15186.77441"/>
</word>
<word wordID="landen" beginTime="2693.160" endTime="2693.470">
<score AM="-15868.35742" LM="-731.72437" COMBINED="-16600.08203"/>
</word>
<word wordID="ook" beginTime="2693.470" endTime="2693.640">
<score AM="-16490.08789" LM="-794.62903" COMBINED="-17284.71680"/>
</word>
<word wordID="heel" beginTime="2693.640" endTime="2693.850">
<score AM="-17516.69141" LM="-864.57343" COMBINED="-18381.26562"/>
</word>
<word wordID="[s]" beginTime="2693.850" endTime="2693.860">
<score AM="-17571.29492" LM="-864.57343" COMBINED="-18435.86914"/>
</word>
<word wordID="anders" beginTime="2693.860" endTime="2694.160">
<score AM="-18876.98633" LM="-912.52789" COMBINED="-19789.51367"/>
</word>
<word wordID="[s]" beginTime="2694.160" endTime="2694.170">
<score AM="-18926.98828" LM="-912.52789" COMBINED="-19839.51562"/>
</word>
<word wordID="gaat" beginTime="2694.170" endTime="2694.670">
<score AM="-20875.73633" LM="-981.36896" COMBINED="-21857.10547"/>
</word>
<word wordID="[s]" beginTime="2694.670" endTime="2695.030">
<score AM="-22006.42578" LM="-981.36896" COMBINED="-22987.79492"/>
</word>
<word wordID="het" beginTime="2695.030" endTime="2695.220">
<score AM="-22716.94141" LM="-998.54010" COMBINED="-23715.48242"/>
</word>
<word wordID="meest" beginTime="2695.220" endTime="2695.470">
<score AM="-23682.84570" LM="-1089.76001" COMBINED="-24772.60547"/>
</word>
<word wordID="extreme" beginTime="2695.470" endTime="2695.890">
<score AM="-25189.66016" LM="-1165.45996" COMBINED="-26355.11914"/>
</word>
<word wordID="voorbeeld" beginTime="2695.890" endTime="2696.260">
<score AM="-26716.26367" LM="-1201.65186" COMBINED="-27917.91602"/>
</word>
<word wordID="daarvan" beginTime="2696.260" endTime="2696.590">
<score AM="-27988.59961" LM="-1258.88196" COMBINED="-29247.48242"/>
</word>
<word wordID="zijn" beginTime="2696.590" endTime="2696.780">
<score AM="-28713.11133" LM="-1301.98157" COMBINED="-30015.09375"/>
</word>
<word wordID="tien" beginTime="2696.780" endTime="2696.960">
<score AM="-29364.91016" LM="-1415.34509" COMBINED="-30780.25586"/>
</word>
<word wordID="ontwikkelingslanden" beginTime="2696.960" endTime="2697.850">
<score AM="-33020.26172" LM="-1557.82581" COMBINED="-34578.08594"/>
</word>
<word wordID="[s]" beginTime="2697.850" endTime="2698.380">
<score AM="-34633.25000" LM="-1557.82581" COMBINED="-36191.07422"/>
</word>
<word wordID="waaronder" beginTime="2698.380" endTime="2699.060">
<score AM="-37268.43750" LM="-1628.67432" COMBINED="-38897.11328"/>
</word>
<word wordID="basis" beginTime="2699.060" endTime="2699.450">
<score AM="-38777.08203" LM="-1773.71997" COMBINED="-40550.80078"/>
</word>
<word wordID="in" beginTime="2699.450" endTime="2699.600">
<score AM="-39287.77344" LM="-1822.52649" COMBINED="-41110.30078"/>
</word>
<word wordID="first" beginTime="2699.600" endTime="2699.850">
<score AM="-40235.83203" LM="-1972.55884" COMBINED="-42208.39062"/>
</word>
<word wordID="die" beginTime="2699.850" endTime="2700.040">
<score AM="-41120.54297" LM="-2045.21863" COMBINED="-43165.76172"/>
</word>
<word wordID="nog" beginTime="2700.040" endTime="2700.230">
<score AM="-41832.09766" LM="-2114.40869" COMBINED="-43946.50781"/>
</word>
<word wordID="volledig" beginTime="2700.230" endTime="2700.810">
<score AM="-43819.07422" LM="-2224.03735" COMBINED="-46043.11328"/>
</word>
<word wordID="[s]" beginTime="2700.810" endTime="2700.820">
<score AM="-43871.33594" LM="-2224.03735" COMBINED="-46095.37500"/>
</word>
<word wordID="ontbreekt" beginTime="2700.820" endTime="2701.450">
<score AM="-46094.56641" LM="-2319.25000" COMBINED="-48413.81641"/>
</word>
<word wordID="[s]" beginTime="2701.450" endTime="2701.750">
<score AM="-46920.07422" LM="-2319.25000" COMBINED="-49239.32422"/>
</word>
<word wordID="en" beginTime="2701.750" endTime="2701.850">
<score AM="-47307.50781" LM="-2377.24951" COMBINED="-49684.75781"/>
</word>
<word wordID="daar" beginTime="2701.850" endTime="2702.000">
<score AM="-47861.07812" LM="-2434.52393" COMBINED="-50295.60156"/>
</word>
<word wordID="[s]" beginTime="2702.000" endTime="2702.010">
<score AM="-47905.36328" LM="-2434.52393" COMBINED="-50339.88672"/>
</word>
<word wordID="gaat" beginTime="2702.010" endTime="2702.170">
<score AM="-48427.79688" LM="-2483.29663" COMBINED="-50911.09375"/>
</word>
<word wordID="na" beginTime="2702.170" endTime="2702.340">
<score AM="-49053.28516" LM="-2578.76147" COMBINED="-51632.04688"/>
</word>
<word wordID="de" beginTime="2702.340" endTime="2702.460">
<score AM="-49529.62500" LM="-2599.29126" COMBINED="-52128.91797"/>
</word>
<word wordID="[s]" beginTime="2702.460" endTime="2702.470">
<score AM="-49568.12109" LM="-2599.29126" COMBINED="-52167.41406"/>
</word>
<word wordID="pauze" beginTime="2702.470" endTime="2702.800">
<score AM="-50858.21875" LM="-2661.28027" COMBINED="-53519.50000"/>
</word>
<word wordID="door" beginTime="2702.800" endTime="2703.060">
<score AM="-51838.53516" LM="-2743.34717" COMBINED="-54581.88281"/>
</word>
<word wordID="is" beginTime="2703.060" endTime="2703.360">
<score AM="-52878.67578" LM="-2825.90088" COMBINED="-55704.57812"/>
</word>
<word wordID="[s]" beginTime="2703.360" endTime="2703.370">
<score AM="-52931.14453" LM="-2825.90088" COMBINED="-55757.04688"/>
</word>
<word wordID="er" beginTime="2703.370" endTime="2703.550">
<score AM="-53466.64062" LM="-2855.25171" COMBINED="-56321.89062"/>
</word>
<word wordID="[s]" beginTime="2703.550" endTime="2703.630">
<score AM="-53726.53125" LM="-2855.25171" COMBINED="-56581.78125"/>
</word>
<word wordID="welgeteld" beginTime="2703.630" endTime="2704.230">
<score AM="-56197.35938" LM="-2989.80566" COMBINED="-59187.16406"/>
</word>
<word wordID="[s]" beginTime="2704.230" endTime="2705.140">
<score AM="-59231.87109" LM="-2989.80566" COMBINED="-62221.67578"/>
</word>
<word wordID="aan" beginTime="2705.140" endTime="2705.430">
<score AM="-60231.19141" LM="-3078.66821" COMBINED="-63309.85938"/>
</word>
<word wordID="uh" beginTime="2705.430" endTime="2705.570">
<score AM="-60526.45312" LM="-3139.66064" COMBINED="-63666.11328"/>
</word>
<word wordID="[s]" beginTime="2705.570" endTime="2705.610">
<score AM="-60670.96094" LM="-3139.66064" COMBINED="-63810.62109"/>
</word>
<word wordID="even" beginTime="2705.610" endTime="2705.800">
<score AM="-61350.90234" LM="-3239.85645" COMBINED="-64590.75781"/>
</word>
<word wordID="pauzeren" beginTime="2705.800" endTime="2706.330">
<score AM="-63343.58203" LM="-3358.18481" COMBINED="-66701.76562"/>
</word>
<word wordID="met" beginTime="2706.330" endTime="2706.690">
<score AM="-64959.30859" LM="-3416.60718" COMBINED="-68375.91406"/>
</word>
</wordsequence>
</speech>
</segments>
<statistics>
<real_time milliseconds="3150020" frames="261370" RTF="1.2052"/>
</statistics>
</shout_metadata>
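The listing above can be turned into a plain transcript by collecting the `wordID` attributes and dropping the `[s]` silence markers. The sketch below does this with regular expressions instead of an XML parser, because the listing as printed is not strictly well-formed XML (for example `<EOS-score="0.00000"/>` and the elision in the middle). It also recomputes the real-time factor as processing time divided by audio duration. The frame length of 10 ms is an assumption inferred from the listing (1767 frames for a 17.67 s segment), and all function names are illustrative, not part of SHoUT.

```python
import re

WORD_RE = re.compile(
    r'<word wordID="([^"]*)" beginTime="([\d.]+)" endTime="([\d.]+)">')
REAL_TIME_RE = re.compile(
    r'<real_time milliseconds="(\d+)" frames="(\d+)" RTF="([\d.]+)"/>')


def extract_words(shout_output):
    """Return (word, begin, end) tuples, skipping the "[s]" silence markers."""
    return [(w, float(b), float(e))
            for w, b, e in WORD_RE.findall(shout_output) if w != "[s]"]


def real_time_factors(shout_output, frame_ms=10):
    """Return (reported RTF, recomputed RTF) pairs, one per <real_time> element.

    frame_ms is an assumed frame length, inferred from the listing.
    """
    return [(float(rtf), int(ms) / (int(frames) * frame_ms))
            for ms, frames, rtf in REAL_TIME_RE.findall(shout_output)]


# Shortened excerpt of the listing above.
sample = '''
<speech label="SPK01-001" begintime="0.00" endtime="17.67" >
<real_time milliseconds="22112" frames="1767" RTF="1.2514"/>
<wordsequence>
<word wordID="[s]" beginTime="0.000" endTime="0.110">
<score AM="-232.13780" LM="0.00000" COMBINED="-232.13780"/>
</word>
<word wordID="en" beginTime="0.110" endTime="0.240">
<score AM="-714.45599" LM="-42.98058" COMBINED="-757.43658"/>
</word>
<word wordID="aan" beginTime="0.240" endTime="0.760">
<score AM="-2258.56299" LM="-118.15379" COMBINED="-2376.71680"/>
</word>
</wordsequence>
</speech>
'''

transcript = " ".join(word for word, _, _ in extract_words(sample))
rtf_pairs = real_time_factors(sample)
```

Applied to the complete SHoUT output, the same joining step would yield a running text like the transcript in Annex E2.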
Annex E2. Transcript of lecture CT3011 from speech recognition (SHoUT)
en aan uh ja en daarin lopen we daar is zeker geen en uh we met het hele heelal van uh dertig elf en later naar het cement 't cd lover gezondheidss en niet
werd hij de zeven weken met jullie uh doornemen en ik dacht ik zelf mee zeker je je voorstellen is mijn naam is als een tank zoals jullie daar zien staan en
ik dacht laat ik daar maar twee dingen van één en een half jaar naar werk ja nou willen niet inzien ja ik ben marathonloper een mooie foto van de glorieuze
binnenkomst in uh rotterdam in april afgelopen periode en marathonlopers dat zijn allemaal een beetje fanatieke leidde er echter door de ouders die twee
iedere dag die je er beter inleven zelf de aandacht te organiseren dat het allemaal kan dus ik loop hier ook iedere dag tussen de meer dan ooit je naar delft
houdt of langs deze rivier of andere koerier en uh als jullie d'r is een keer in korte broek als d'r is vaccineren lopen dan klopt dat we daar niet en uh dat
door tieners met een heel groepje er mensen zijn bij ons op de afdeling met studenten en en één van die je ergens tien cent en is zeer tevreden met daarin
trainen zijn diezelfde drie jaar geleden leerde hij jij inleidingen en laat de meeste mensen was een derdejaars inmiddels is afgestudeerd dat heeft hij er twee
ik ben ja ja ja ja ja dat was nou als ik zeg niet als derde is dat zou mensen kunnen daar waar deze leest hè ja tuurlijk vorig jaar lang werden afgegeven in
alle klassen het is een stuk zeg maar dat is wat er is echter laten we deze wist dat hij zijn dus uh wat te zien zou kunnen zijn in eerste reden is ook altijd ze
er één van de gast gelezen medelijden is niet echt niet het eerste jaar en daar is natuurlijk ook een beetje net diary ben ik er dus dat zou best kunnen
missen je daar je daarvan in het nou dan weet je nou dat weet jij hier dertig jaar geleden de laatste set neer te zetten dat is een techniek verste dertien zes
en zeventig afgestudeerd en daarna de reeks werken aan werk erbij en is hier wel meer dingen zei in amersfoort en dat kan ik hier niet van harte aan uh
raden als je straks afgestudeerd ben om daarin niet hier is d'r oren te gaan werken we zeggen wel ervaring je bent met allerlei projecten over de hele
wereld bezig in mijn geval dan drinkwater projecten is het ontwerp van zijn installaties en waarom van de systemen ook het doen van onderzoek eigenlijk
kun je alle kanten op de één en hier is wel in een nederlands indië zullen zijn redelijk succesvol ook op de internationale markten tegenwoordig ja ik heb
daar uh vele jaren gewerkt en op dat andere evenementen inmiddels is dat alweer zeventien jaar geleden werd dat tensen stond dat ze later herenhuis op
vier en delft vier en toen dacht ik van nou ja maar ik maak een brief schrijven je weet het nooit uh en niet wist dat is altijd mis dus ik heb een brief
geschreven en ik dacht ik standvastig niet worden maar ik weet 't wel is ja dan meteen een eerste levenslessen in colombia maar is wat en het kan altijd
meevallen naar ik ben is dat die vervolgens voor één dag in een beetje in deeltijd hoogleraar geworden in de drinkwater zien het is mijn leerstoel en en ja
zo langzamerhand verandering komt dat andreae wordt voor steeds meer dingen gevraagd is in de laatste maand meer die nier in delft gaan doen en met
de aanstelling daarentegen werd ik steeds verder af behoud en vanaf negentien negen en negentig en ik geleden gestopt met trainen ben ik hier voor altijd
een leraar en vonden het ook weer aan de betekent ook dat je hebt enerzijds raken want niets aan onderwijs anderzijds onderzoek maar ook een incident
en is management jaagt de moeder en ik ben een hoofd afdeling en zo en dat is in het managementteam of opleiding commissie moet je over algemene
dingen meepraten en meebeslissen ja daar is in een dagtaak van maken en dat ze er heb ik altijd een eer vinden dat het leukst om met 't vaak bezig te zijn
en daarin kan ook op de tweede plaatje wat hier staat dat alleen steeds eigenlijk als het is de leider en dat ga je niet komen die ja en dat proces doormaken
met stront want als er een leuk om te zien hoe studenten zich transformeren van janine neer anonieme figuren die in de collegezaal zitten erin zitten te
luisteren en niet meer als almere wat niet in een monoloog dan overdag werden hoewel ik overigens wel reacties realiseerde hij stellen en ik zal er ook altijd
doen er expliciet om vragen aan 't maar goed de praktijk is toch dat ze in deze staat van de studie ziet hierin ofwel te luisteren en delen kijkt steeds leuker
als je verder komt in een vier en vijftien jaar en 't hoogtepunt is dan natuurlijk het zetten achter je laat echter een onderwerp helemaal zelf bij de kop wat
ik zeg ook altijd tegen mijn afstuderen is je moet van je afstudeerde leer je visitekaartje maken en door essent en en dat is dan krijgt ze al op momenten
dat je klaar en het afstuderen en ontwerpt alleen via het meeste van dat onderwerp vrijaf meer dan er niet aan ook in nederland dan de ijssel ook niet meer
door de achtste keer komen ook weer aan de aangever veel ken daar eind aan daar komen de mensen vanuit de waterbedrijven van klimaatverandering
steeds instituten dienen aan hele discussies en onze acht jaar in ere zie weet ik keer op keer als er aan de man te worden misschien niet altijd honderd
procent gelukt maar toch al één en negentig procent werd goed dan zou 't genoeg landen eten maken ik zeg alleen dat wat men ook aan af te dingen zit en
dat is ook zo en uh ik heb kunnen stickers tachtig gehad en soms gaat dat dan heel goed zoals je staat met de rk in en doris doris is niet alles in de zaal
aanwezig en dat niet dan uh het afgelopen jaar allebei zelfs net langs zijn afgestudeerd te betreden zit je ja heel goed gedaan ook stijgen schraal te zijn
dan ook de achtste keer projecten heel goed gedaan dan het jaar dat is voor ons gewoon eerlijk om dat mee te maken om me te zien hoe jonge mensen
het vaak al blijft gaan naar zinnen zelf ook enthousiaster worden en uh ja stempel gaan zitten om voor uh op ons vakgebied maar ik had dat enkele van je
nieren op z'n grenzen maar kan me de 't dan is alles en hijzelf en het recht dan dat dit vaker bedreigd gaan dat ze er doen aan de hand van naar het
boekwerk naast alle wordt aangegeven hebben en de nederlandse en engelstalige versie hier staan dan ook dat moeten jullie kant en daarin is dit het beste
van ons niet veel te zien de verdieping voor vijfentwintig euro in de winkel kosten vijftig euro maar naar de speciale korting sturen regeling en jelena
dezelfde is als je nederlands of engels een boekwerk telt de inhoud is jarenlang was ze als werk en in ieder geval voor dit uh vaak als jullie je en als die
staan heel erg en dan zou ik zeggen als je goed engels kunnen lezen koken zijn als een boek dat is niet actuele staat iets meer informatie in maar het
nederlandse boek is voor dit vak zeker ja is dan ook heeft natuurlijk alles wat er uh dat we daar over gaat vragen bij het 't en daar een en dezelfde en die je
ook terug te komen deze omroep een drietal nog onzekere vinci als naslagwerk en jullie als je zo'n boek eenmaal hebt dan heb je daarbij neer dat ook na je
afstuderen nee niet nee hè als je vervolgens ergens in een vreemd land en is te laat zien dat er van haider roepen is uit het duits en dan weet je het één en
ander het is niet te zien is zo'n boek ook daar staan ze naast de kantine ook iedereen hebben ook vraagstukken landelijk noord staan zullen jullie misschien
ook wel gezien hebben computers daar ineens zegt dat is uh over uh is niet verplicht is merels niet stuk ligt en ja je moet eigenlijk een tentamen doen maar
uh de dealer en materiaal aan 't is uh de maakt er gebruik van zelf zeggen maar een ander niet controleren bestaan daar graag aan ontbreekt wordt zitten
vragen in de broek de antwoorden staan welke bijen of althans als je die computers zijn na te maken te krijgen na de ramp met een andere vraag maar aan
welke vragen fout waren deze leer is een ondersteuning voor jullie wij het kennismaken met de materie en het leren van de stof en oude tentamens hebben
en ook bij staan deze keer ook nog is uh hoe venijn kijken wat er ongeveer gevraagd wordt en elke lezer eigenlijk alleen in ere houden en sir maar ja deze
over een half uur niet helemaal te kennen dat boek hoort zowel daarbij dertig elf als mij het commissie vier en dertig twintig jaar na keizer gemaakt is voor
de mensen die laten mensen met gaande is en uh de hoofdstukken die voor dertig elf gedragen te worden op tentamen staan ieren aangegeven en uh en
die presentatie komt ook willen we bereiken wordt zijn zoals jullie weten niet precies weet deze video-opnamen dan gaan we deze colleges schneider is
zeven keer kan de periode vanaf en niet naar nederland dit jaar zou doen dat in die eerste uren vertel ik een beetje de grote lijnen van het onderwerp is de
belangrijkste ik proberen alle kleren aan te geven wat is nou belangrijker en wat minder en het leren heb ik steeds één van de trommels er niet en daar is
dat uh door s. die daar niets aan vertellen over hun eigen onderwerp zijn eigen onderzoek naar eigen project dat een stukje actualiteit en geest en uh
werken lering verdieping van het onderwerp en ik heb 't zo organiseert dat dat steeds als het goed niet goed op elkaar aansluiten en uh ja jullie je goede
beeld geven van de stof dat je straks een tamelijk makkelijk kunt maken wil niet zeggen dat alle onderdelen van de verhalen van 't allemaal genie
tentamens of uh zijn dat zullen ze er ook wel aangeven ja ja zo'n promotieonderzoek dat gaat natuurlijk veel dieper dan je in een jaar hoeven te weten
maar gaat niet om de de beeldvorming die nu in het van de matinee de hele reeks is in de eind jaren dieren te laten grote zuivering installatie bij rotterdam
wijst naar de mis en om precies te zijn op de elf oktober ook dat is niet het verplicht alles is dat niet bij ons hebben zich tot nu toe steken zestig mensen
aangemeld uh daarin zich eveneens sluit op één oktober hebben gezegd omdat beide laten bedrijven tegenwoordig nog strikter de veiligheid van de russen
eiste nu zou zijn naam en de aanslagen in new_york medici je je moet daar uh precies op gegeven hier allemaal komen met name en zo hè maar moeten
daarvoor in staan maar ook dat er geen dingen gebeuren en er moet een tienkamp is gereserveerd worden hier en daar is gesitueerd is de mensen die zich
al gegeven hebben niet in eigen land een heel sieren binnenkort kort na één oktober met een bevestiging en degene die zich niet opgegeven hebben niet
aan de niet nee je en ik ga ervan uit dat degene die zich wel opgegeven hebben het hier ook komen er dus niet een beetje tegenover de organisatoren als
we daar met dat minder mensen zouden aankomen dan aangemeld te hebben hebben zelf te redeneren kijken wat daar in zekere zin al over dat is middags
uh de verplichte tactieken uit zouden zijn van constructie leerde statistiek geloof ik een ze zullen proberen om 't hele neer te zetten zijn dat zal zeker niet
om half twee zijn is ik denk dat we zo ongeveer om half drie terug en ze zijn naar en we kijken gewoon naar college op donderdag en dus allemaal zelf nee
dat is kijk als ik al gaat ja kralj dus zijn de vragen over de organisatie en deze algemene inleiding elke nou dan ga ik altijd iets vertellen over gezondheid ze
echt niet dat ze al jarenlang vertel dat het ook bij het eerste jaar al verteld en dat lijkt niets meer vertellen over de dingen laten zien van nederland en
nadat de oud-senator is dan iets vertellen over de dingen laten zien in ontwikkelingslanden want daar is hij dan ben je bezig uh nou we hadden natuurlijk
gezondheidss techniek en nou dat zal ieder van jullie niet uh zijn dat dat gaat over de stedelijke later uh ik kringloop is de infrastructurele werken voor de
voorziening van drinkwater te winnen van grondwater twintig al oppervlaktewater en stuiteren daarvan het gevolg te transporteren met een heel transport
leidingen en distributie leiding systeem naar ons al een toename is een oude zijn d'r industriële bedrijven vervolgens het inzamelen aansturen van het
afgeladen vier irian het zuiveren van dat afvalwater en dat wordt dan vervolgens weer geloosd op het oppervlaktewater is alle infrastructurele werken die
over die kleine stedelijke waterkering ook aan dat is wat de gezondheid zegt niet te noemen en ik zal hier voor alle strokes er op de de en
drinkwatervoorziening omdat we daarin ook een is een duidelijk effect zien zoals hier in deze steriele weergegeven het verdwijnen van besmettelijke ziekten
en in nederland kunnen die niet meer overgedragen worden via besmet werden in krachtig in het voordeel is dat dit nog heel andere situatie maar hier
hebben we daar flink veel succes meer gehad in de twintigste eeuw zien hier een plaatje dat weer geeft de daling van de sterfte aan en lijkt niet steeds in
de twintigste eeuw en dat loopt parallel aan het percentage van de mensen nog niet aangesloten is de drinkwatervoorziening in diezelfde periode is in
nederland drinken laten zien aangelegd rond negentien honderd zelfs kort voor negentien honderd drie grote steden en zo langzamerhand ook in kleinere
steden en met het platteland en in z'n al vanaf negentien vijfenzeventig zeg maar niet in nederland iedereen 't drinken aangezien aangesloten ik al een
besmettelijke ziekte die ieder het is net drinkwater over een aantal worden ook niet meer voor en deze gaat ze bij ons er om daar uh infrastructurele
werken en voor een goeie laten kwaliteit dus uh zaken en als de waterlinie water zuivering water transport laten genie en niet meeregeren ook en niet
waterkwaliteit microbioloog die één is eind het afwezig zijn van de organismen waren ziet veranderen kunnen worden maar anderzijds ook het gebruik
ervan en niet de organismen onder zijn d'r in 't uh optimaliseren en micro-organismen kunnen ook weer verontreiniging afbreken ik eis een voorbeeld
daarvan is afgeladen zijn ring waarin met behulp van zuurstof en actief leren tennissen mengsel van bacteriën 't afvalstoffen in het afvalwater laten
afbreken dus waterkwaliteit laten geen ier en microbioloog die zijn in dit deel van de civiele techniek je vrij belangrijk wij maken natuurlijk ook gebruik van
de algemene kennis van de civiele zie je er zijn met name dan van zaken als iedereen audina niet oorlogen die en die constructie een leren constructieve
vormgeving projecten realisatie en informatica aan vind ik allemaal dingen die je in projecten nodig hebt vaak al tien tien verband hebben ze niet hier is
wereldbeeld om ons nee niet meer bezig met automatisering de ander is meer bezig met het constructieve deel is weer met die naliet aanwezig en jullie
kunnen afhankelijk van de specialisatie niet kiest daar een rol in spelen en in iets anders en niet is natuurlijk van groot belang voor de volksgezondheid is
trekt gezichten het gaat over de relatief grootschalig en uh uh infrastructurele werken we zien hier de zogenaamde die scorsese de mens is in de brabantse
biesbosch mckenzie aangelegd zijn voor de drinkwatervoorziening en het gaat om een goed georganiseerde sector met z'n allen aan en zelfs een aparte
wetgeving voor de waterleiding werd was dat willink laten zien gaat men gewoon staat precies waar alles aan moet voldoen en dat de directeur van het
waterleidingbedrijf daar persoonlijk voor aansprakelijk is hier is je gevangenisstraf als die water of water distribueert waar je ziek van kan worden zeggen ze
allemaal goed georganiseerd ja ja ja en houdt aan en is die studie is de rol van mijn leerstoel drinkwater zien is de enige leerstoel in nederland op het
gebied van drinkwater zien dus dat is nog klein geeft een zekere exclusief en uh expliciet die tijd veel van onze studenten die zijn deze ook uh uh ja die
hebben positie zien niet waar de wereld die zijn directeur of staf functionarissen of ontwerper daarin uh beiden laten bedreigen en ook veel van onze indiërs
gaan naar de hier is een loser door en nou dat gebeurt een heleboel ander toe hebben zelfs ook als college willem-alexander die ook het watermanagement
interessant vindt nou daar heb ik 't is altijd nog drie dia's worden niet aan naar nog meer vertellen over de opzet van de eerste keer in nederland die je er
nog even laten registreren van dat werk van anderen ons vakgebied het is tien plaatje dat je er ook alle zien er dit jaar kan maar niet altijd wel uit dit diertje
vandaan komt rent ja precies is dit is 't water verdrijft daar is niets aan en niet in zijn geval de media voeden boog en is hier dat ons alles kunnen dieren
gedragen en dat was het van de wedstrijd is hier 't water gebruiken enorm naar mijn idee gaat niemand gelijk meer laten iedereen ziet voordelen in voor de
tv zitten te kijken tot al te drieste is dan helemaal naar de wc met koffie uitermate en een enorme stijging in 't water gebruiken om de tweede en als
garnering kijken zien we weer een zeer lage die in het water gebruikt mensen als er niet in een kort voor tijd toen dat doelpunt in dit geval de dennis
bergkamp gemaakt werd en aan de einde van de mensenrechten en iedereen die naar de wc toe en dat ze al de zie je deze ook bij die niet hier verder uit
en zelfs daar is het zo dat operators instellen die ziet er ook te kijken en alles in een beetje op als alle verkrachtte de ja ja ja dat is een enorme rij
terroriseren waar zij dan in dit werk deze de rechter is het hele strook radio dan dat van uh ons gedrag 't gedacht van de bevolking en in de jaren dat
normen inderdaad ik eis die je niet verder is het nou mensen daar is de oudste waarom juist commentaar komt geven daarin iedereen heeft het recht van
de wc en heeft een lijst een jaar is het zeggen heeft en maar de zaken op de doelpunten herhaald en enkele andere geesten is uh nou dat is één ja
natuurlijk vooral deze met ontwerpt en en 't gaat natuurlijk dus laat komen en nieuwe infrastructurele werken de bouw van een ongezonde te ontwerpen
van een zuivering sinds te laat zien transport leiding en antwerpen en in dit als er over het project onderwijs en ontwerper andere eisen dat is geen visie en
een bepaald aandeel in je hoofd maker van doen iets in elkaar zit is er veel te ver in het dat nauwe en en hoe stroomt het water door een installatie niet al
is de lijn daar moeten we een bepaald schema van maken en moeten we vermoeden zou kunnen toelaten dat moeten we kunnen berekenen en daar
moeten de vrouwen ook geen fouten bij maken en daar is dit plaatje voor uh bedoelt in één van de consulten installaties nee en drinkwater evenals een
aandenken aan in en langs de drie nooit terug maar toen ik daar tien keer laten slachten is opgetreden met als gevolg in rosie van de consultant en uh ja
natuurlijk heel vaak dat je dan ook helemaal niet dat je met die wet van murphy te maken met zijn dat alles wat fout kan gaan dat gaat ook een keer deze
laatste slag met het verschijnsel dat als bijvoorbeeld een pond af slaat dat er onder druk of kan ontstaan en die onder druk niet aan en uh ja deze
inderdaad tot in de uzi een leiden naar 't kan je team voorkomen door een uh mmm en onderricht in de lucht in zijn nieuwe aan te dringen 't is hier ook
gedaan bovenop dat al veel te zeldzame ventiel maar helaas was er net dat er dan mensen die komt hieruit viel de stroomstoring was het ook een intern
was het een hele strenge vorst en was dat de dienst in tiel de verloren waardoor er geen lucht meer kon toetreden en dat is toch vaker mond stond in dat
land en ja de liefde en niet wil goedpraten optrad is ontwerper is vooral ook de messiah van die niet niet kunnen gaan vandaar dat italie kan ook vrij
belangrijk is is niet heel tevreden dat het water en ze uitsluitsel over de verkeerde kant op gaat en je moet vooral ook steeds aan het einde op de dingen
die fout kunnen gaan en ontwerpen is vooral ook echt waarin uh dingen gezien hebben hoe iemand in de praktijk wel vandaar dat we die excursie gepland
hebben naar de wereld laten doen in juli voor de eerste keer vast is even kijken van uh ja precies een installatie naar haar gedaan moet je allemaal
rekening mee houden paula iets en dan nog alleen maar maakt zich al een ander jaar in limburg in dit geval met waren grote transporten leiding is
aangelegd bij en oppervlaktewater project nu al heel dat was in het kader van uh dezelfde namen de centen wel eens discussie is deze discussie die in
nederland een aantal jaren voor het eerst aan de orde die niet onder andere door de winning van drinkwater aan de grond laten staan allemaal
aangetreden verdroging van natuurgebieden op 't eerste hier in limburg zeg tien jaar tien geleden is er nou moeten de grondwater ingaan verminderen en
overgaan op de laatste is dan tenslotte door naar de trainer zijn is dat is vrij makkelijk is te lezen hiernaast waarbij aangelegd aangelegd 't was een hand in
hand is te laat stadium gewonnen is niet het was toch al niet zoveel met naast je later en dat laat ze laten gaat vervolgens uh faneyte dat werk en dat zien
hier zakt dat ze er vanzelf de grond in een door infiltratie kunstmatig in getraceerd water de grond in waarbij je er vast een heleboel kwaliteitsverbetering
optreedt allerlei stoffen die worden afgespeeld in het westen met zand ondergronds en dat die jullie gaan dood door de lange verblijf tijd je krijgt dan een
aanzienlijke verbetering van de late kwaliteit is dan wordt het water weer op om met behulp van peter niet aan op een bepaalde afstand rook lont albert
einstein had geplaatst dus dan dan niet eigenlijk een soort kunstmatige ontstaat en je maakt dan eigenlijk van nagelaten want natuurlijk en allerlei
bacteriën en virussen en andere verontreiniging bevat maakt een soort kunstmatige grondwater dat wordt dan weer gewonnen en wordt vervolgd nog
gezuiverd in zijn installatie die je werk zien weergegeven en dan ging 't is net niet worden leiding heel limburg erin naar de gebruikers te doen en en
tenslotte door natuurlijk ook een onderzoek van oranje op de thee in en als je mij niet hier willen werken nou net niet zo veel onderzoek nodig dan gebruik
je meestal vijftien regels en het ontwerp twee criteria maar 't vaak niet ontwikkelt zich natuurlijk ook steeds verder naar zijn niet intimideren bedreigingen
momenteel onmogelijk niels het voorkomen van geneesmiddelen en in de rij in de wereld niet aantoonbaar in concentraties in een jaar in aanwezig en komt
dat nou ook in het drinkwater terecht en dan moeten we daar doen moeten ze uit de meer uitgebreide worden dat soort vragen die leven en zijn er aan het
onderzoek bezig onderzoek gebeurt maakt er de laatste bij ons in een plaatje van 't zelfde kern in luxemburg zal ik sterven naar je misschien afgelopen
maanden wat ook verteld hebben we nu maar het is ook heel relevant omdat het één en laat het andere niet is later is er natuurlijk een stof en de
verontreiniging en de stoffen waar het om gaat ja dat is afhankelijk van de bron is de interactie de lozingen van stoffen die evenveel plaats gevonden
hebben interactie met broeder een beladen natuurlijk afvalstoffen in het water terechtkomen die je later is er anders en je moet het bij voorkeur ter plaatse
doen het is niet zo goed mogelijk om te zeggen van nou ja ik doe niet landelijk door in maart oefenen nee je hebt altijd weer de toets nodig van de praktijk
gedrag te laten zich in de praktijk uh ook zoals de theoretici denken sommige dingen gebeurt dit wel in dat laatste is hier ook een land dat alleen steeds in
drie laten laboratorium waar allerlei opstelling staan filter is er dus niet installaties anderen ertoe voorstellingen en dan krijg je je laat ze serieuzer artikelen
moet doen als je in deze richting uh doorgaat uiteindelijk kerk waar is die als een promotieonderzoek kunnen doen en in de aula aan de de dokters wil
uitgereikt werk krijgen grootste dan heb ik nog een kwartier ik het goed zegt ja en die kan ik goed gebruiken versterkt ja om uh is negen en eerste gehaald
was te geven van wat is er nou die zonder nadenken aangezien in nederland wat moeten jullie daar nou van eten en en ik maakte aangelegenheid van een
presentatie die vorig jaar gegeven en tien jaar daarvoor de canadezen laten bedreigen en daar is 't is heel anders is ik heb daar ook echt m'n best gedaan
om een beetje duidelijk te maken van wat is er nou bijzonder in nederland en ze al wat zou hier niet aan de aan de aan de trainer en ik denk dat we jullie
ook een aardige introductie zou kunnen zijn in dit vakgebied en eigenlijk is dat trouwens al heel kernachtig weergegeven met dit laatste is een plaatje van
hun hun niet zinnen en joey dieren en ja het water uit te keren aan de ind en eigenlijk zoals maakt hier een uh weergeeft vertel zijn is het water moet zo
zijn dat je de volledig op kunt vertrouwen dat het niet zelf je kinderen later in de jaren en dat 't beloofde elke verdenking verheven is dat is eigenlijk niet en
dan zich niet aan de drinken aangezien in nederland dat is natuurlijk ook bij andere landen in zekere zin wel te groot maar toch veel minder niet alleen iets
te vieren in amerika en daarna en dat soort landen geweest zijn daar is 't ja eigenlijk niet zo dat men er drinkwater dat ik daar ook tenslotte ja later dat is
niet iets dat gebruik je om de voor de was te zien en en de missie niet te doen maar je ook over maar drink intieme eigenlijk niet in canada en india je laten
wilde in kennen eigen renstal te blijven zitten maar al die is er nog een filter op die kamer omdat water nadat ze ijveren en dat noemen we het
consumenten te allen is in die landen dus veel minder dan in nederland en het heeft voor een deel te maken met ja natuur en traditie en in europa zijn dat
alle trainingen goed regelt en in een erika christensen om minder daar overstroomd gelang heel dorien zijn aan 't is een nieuwe opbouwen zouden doen
maar in nederland ook niet en zelfs dat mensen in canada ook zijn ook in nederland is 't zelf vinden dat de absolute zekerheid moeten hebben dat ding dat
dat uit de kraan komt dat er drie aan altijd is het leven niet zeker eind en trainer altijd goed is zo dat onze kinderen met een gerust hart kunnen drinken en
hijzelf ook en jawel een plaats in het uh een aantal jaren 't was 't water gebruikt het dat we gebruik maken van grondwater en oppervlaktewater voor
drinkwater zien en grondwater meer is ja ook in nederland vaak nog van een hele goeie kwaliteit uh de beetje geïllustreerd aan dit laatste is dan uh de
zenuwen waren we regelen de wolken zien en je kunt je wel voorstellen als thierry als te treden daar op dat enorme zand oppervlak van de vrije uren
stroomt via daargelaten wordt heel goed gevuld leert en dat grondwater dat je daarom niet uh hoorde je dat je daar niet het is een hele goeie kwaliteit
natuurlijk is het grondwater is niet algemeen goed zijn best ook wel zorgen over hoort er hier en daar hebben natuurlijk ik had me niet te weinig zelfs uh ze
alles ter plaatse en die in het grondwater terecht onder enige en en boeren die gebruikelijk in mest en bestrijdingsmiddelen en het uiteinde een ook in het
grondwater terecht komen maar hij gemiddeld gesproken is grondwater tal van prima kwaliteit en wiskunde ook volstaan met een eenvoudige cijfer in de
lucht in en zand op de raadslieden komen dan op 't meestal half uur en dan maar te laten daarentegen dat is juist het andere eind van 't dan zou je kunnen
zeggen en hij zit in in nederland leidt af van wat je van roken aan één en daarnaast die zijn door de frankrijk en duitsland en belgië althans voor later is ook
geloosd is de oppervlakte laten bevalt dan volledige cocktail aan alles dat als drie je kunt voorstellen is al te lang te water moet zeer uitgebreid gezuiverd
worden en dat doen we ook in nederland wordt in het buitenland was aangeduid als double darts drie mensen hebben heel veel suiker is zes achter elkaar
om dan maar zeker van te zijn dat dat water uiteindelijk toch goed is en heel bijzonder in internationaal verband gebruiken geen gehoor amerikanen die
vinden het om gehoor te gebruiken en drinkwater smaakt ook naar de order uit ook naar voren daar vinden amerikanen aan volkomen normaal en in
nederlandse en een eventuele niet in eerste plaats is daar een inhoudelijke reden voor de namen we weten dat als je door toepast dat gaat reageren met
bepaalde stoffen die van nature in water voor de oma van graniet in dingen en daar kijk je bepaalde resistentie negen producten noemen het loon voor een
is het meest voorbeeld en dat zijn deze ongewenste stoffen zijn stoffen die giftig kunnen zijn en nou daar kan je al gaan zeggen van ik kan daar nog andere
voorstellen maar misschien kan ik niet aan iedereen voldoen maar het in een in nederland onder de russen ned zijn allerlei stoffen die willen gewoon niet en
deze willen worden lang niet gebruiken dat is een bepaald ja essentieel het gaan spelen niet wat ook heel veel consequenties je meester maar wat in
nederland al meer dan dertig jaar gehanteerd wordt en daar is er veel aan gedaan stellen onderzoeken aangedaan dus dat is denk ik één belangrijk punt al
voor dit eerste college om even vasthouden aller blijft gewoon tot die giftige verbindingen en het moet je daarom niet willen hiervan hebben dat in
nederland besloten dat we dat niet willen kunnen doen is het niet te vrezen want praktisch alles kijkt en is dat later met de norm nou alles smaakt en dat
vinden in nederland ook niet aan toch wat later werd ik er aan komt dat moet lekker smaakt en dat moet niet zo fysiek loodzwaar te hebben dat is 't is en
wat ze alles maakt dat willen we voor drinkwater niet is waarschijnlijk andré de eerste keer niet met cultuur en consumentenvertrouwen samen we hebben
een aantal alle risico is iedereen nederland en gebruiken en ik heb daar een stuk of drie vier ziet voor om die kort tegen de zin te laten presteren dan niet
rokers op 't is onduidelijk al over gezegd in nederland is ook zo dat we een relatief uh de grotere bedrijven hebben en die je je in zo'n mengsel zijn van
publiek en privaat het zijn geen genetisch en niet iedereen is het water drijft wat hier is dat in het geen maar de aandelen zijn in handen van de gemeente
rotterdam en de provincie uh en andere gemeenten niet gezien is dat niet is is eigenlijk een soort altijd maar net die niet eens in je overheid en het geeft
ook niet bijzonder is het vorige week dan minimaal leraar benoemde tvm ja heel verhalen over eens dat dat eigenlijk een ideale formule is dat je niet meer
zeg maar de waarde van water en later is toch iets niet zomaar in een markt goed is zoals die niet zo makkelijk kunt reguleren zoals andere uh zoals auto's
en andere dingen is 't water heeft ook iets te maken met iets van ons allemaal moeten we niet zoveel meer ja en zo'n publieke verantwoordelijkheid dienen
en niet gaat rechterlijke organisatie nvpi deze adviseert werkt ja dat is alleen iets want een zekere aantrekkelijke kanten heeft en typisch voor nederland
een rol in je heel graag deze doet daar niets aan maar is niet laten sekte in nederland heel goed georganiseerd die heeft een gemeenschappelijke research
institute opgesteld die de aan en waren en het is werk voor de waterbedrijven wordt uitgevoerd en die hebben een belangenorganisatie opgericht de vrede
in en die hebben ook een besloten training de knvb en waren als je hier in de laatste in een ander lid van worden en ja daar is een heel wereldje waarin het
gaat het goed samengewerkt wordt en zo informatie uitgewisseld dan zou d'r iets bijzonders dan nederland is in nederland ook makkelijker lenen niet
natuurlijk en je niet ik kan je niet gemakkelijk te samenwerken te zullen zijn te lezen en milieu ik en dat gaat in nederland allemaal wat makkelijker wat een
infrastructuur werken eten en dan zijn dit en seksuele kenmerken om te de bescherming van de drol dan moet als het in mei en niet het paard achter de
wagen spannen en 't huys te laten nee ik altijd met een zo snel mogelijk een rol en zorgen dat die brons dan blijft de naam zie je overal ja ik ben niet boos
is hiervan wordt zie staan met grondwater benin grondwater bescherming geniet niet verontreinigd en als je het kunt voorkomen dat de de gebruikt
grondwater als mogelijk is te zien het helemaal niet 't noorden het oosten en het zuiden van nederland wordt alleen maar grondwater gebruikt voor de
drinkwater zien daar is grondwater beschikbaar is van goede kwaliteit dat is niet de biologische betalen maar je ziet zelden zoiets wel eens iets andere orde
is daar is de voor deze ronde die gebruiken we dan dus ook nou in het westen van nederland kan dat natuurlijk niet 't weet jij ook echt waarom niet nee
geen thema in 't als leider is niet altijd zonder gestand te bereiken ja maar ook mee en is een echte nadenken is dan een die ze moeilijk zeker laten ze oud
precies en is het grondwater hier is de oudste ontstaan is heel erg duur en is dat ik niet eigenlijk niet de actrice dus ja je kan je geen grondwater gebruiken
deze gebruiken maar oppervlaktewater en een deel dat daarvoor werken hier in het hele duingebied door van oppervlaktewater kunstmatige land laten
maken is hier niet goed raad cynischer rond op de vlakte later dit jaar één gelaten dat de bodem isaac en dan wordt het een soort kunstmatige ontstaan
willen ja ja ja ja in alle hoewel ja maar wat is er een ja dan moet ik even niets meer zeggen dan eerst een hele tijd niet geldt eigenlijk dat de door haar als
gevolg van nieuwe lang niet iedereen die op die dagen de gevallen is dat een zoet water wel op het zoute water drijft is als je heel voorzichtig te laten niet
meer als een te laten winnen dat kan snel fouten aan de orde deze moet je echt wel mee op pad zijn maar het kan net al zei orie zouden daarin waterlinie
in het westen van 't zuid holland en en noord-holland begonnen in de eeuw door gewoon eerst daar later op de grond te alleen met die ze daar fout gegaan
is is oudewater en toen deze met 't niet verteren van grote aantallen nou als je als klant en dat is nou net de hele dag mensen ergens al een aantal ook niet
meer laten kijken en daar wordt dan moet je ook laten laten gebruiken dan heeft geen grondwater rotterdamse heeft ook geen eigen die zien we daar
oppervlaktewater bereiken dan moet je niet heel uitgebreide zuiveringen hebben dus is ook niet eenvoudig en daar gaat het daar te deze man dat is en in
dat is toch nummer één zeer regelmatig dat de later richting in de krant zetten van deze stof moet verboden worden hier moeten bedenkingen aangesteld
worden alle zorgen en dat wat goed is goed blijft nou grondwater laat je denkt dat dit toch al en is dat niet is en als je een mens om met vier dan maakt
van grondwater dat eigenlijk negen later door een gigantische san siro gesteld is er nou dat is goed en over te laten nou wel bij scheveningen hier 't is
gewoon in de natuurlijke dijkstal eitjes wonderen en maal slaat en dan naar voor zuigelingen over voor wat anders te stoppen die daar gelijk in de training
en dan vond hij niet horen daar milieu heeft in het ik niet eens laten wordt eerst voor te zijn met de ander daar in ieder komt daar zakte de bodem in dan in
't weer terecht met met weetjes die her en der führer niet duidelijk geplaatste zijn dan gaat het naar de naar zijn ze niet toe nieuwenhuys te zien staan en
dan vervolgens de distributie net in de gemeenteraad en de meervoudige waar je ns dat ik mensen volstaat eigenlijk onderschat is trouwens geen maar te
zien en te zien dat daar een heleboel stappen achter mekaar zitten hebben gewoon veel afzonderlijke zaken links processen in deze eisen omdat er onder
zeker van te zijn dan zij iets minder werk dat andere ontvangt de veiligheid door die strijd is heel belangrijk en anderzijds om soorten stoffen met zijn drie
systemen tegen te kunnen houden is het woord altijd ja 't gaat altijd om een vrij uitgebreide technische thema's als het over wat water tegelijk ook
moderne technologie nee dus dat zijn dan weer uh ontwikkelingen die de afgelopen decennia zeg maar mogelijk geworden zijn je zien we de men de hand
op de laatste installatie bij heemskerk deze modernste en de grootste zuivering van en niet iedereen in europa zien nederlandse ontwikkeld hier de meest
sexy merk ik nu veel licht is 't zijn eigenlijk gewoon een hele onderwijs is daar je kunnen voorstellen maar die stralen dan is de realiteit en en bacteriën die
kunnen daar niet tegen deze anekdote van de stad is een goeie manier om deze sectie verlaten te bewerkstelligen ja dat is twee jaar geleden in
aanwezigheid van de prins en ook dat is weer een nederlandse ontwikkeling om dezelfde dingen weten te krijgen nou 't staat daar al eens aan dat wat is er
aan de gay aan als de ja op te zetten dan komt er later van een grote kijkt uit naar de zaak gelaten en wat geen verontreiniging en dat valt en ook geen
gehoor wat ook zacht is het waard is ook contacten in nederland komen later op terug en uiteindelijk is 't wild dat daarmee het daardoor dat we in
nederland dan helemaal geen vlees te laten gebruiken althans heel weinig en dat is uiteindelijk drie jaar en in een macro-economisch naar kijkt of zelfs naar
de individuele klant is dat gewoon en dat is dan de zaak want mensen laten is helemaal natuurlijk aan drinkwater en het is eigenlijk in dure is dat veel
slechter voor milieu in het milieu en de slacht van flessen later zijn is een keer zonnetje staan gemaakt met enkel punt nrc over andere niet en ze moeten
allemaal terecht gevoerd worden met acht waren ze er niet moeten niet schromen maakt worden enzovoort en als die stomme idee maakt dan is het milieu
in beslag van flessen water dertig keer zo hoog als zanger drinkwater zes één en op de achterkant van zich radeloos is zo'n beetje maakt vooral naar de
nederlander verlaat de kleintjes zijn italianen denis die italiaan twee tot drie keer zoveel tijd voor en later dan de nederlander en dat zit 'm vooral in het feit
dat mensen als ze later uit te drinkwatervoorziening zelf kosten daarvan zijn niet meer gelijk naar want en dat is ook een kenmerk van dit soort
grootschalige ingestudeerd om iets goed te doen meer gouden gezaaid geringer veilige systemen maken dus niet zo heel veel duurder dan alles te doen het
van de korsten zitten al in dat je moet met de niet te maken je moet een zuigeling en wie moet dat worden leidingen distributie leidingen heel veel kosten
hier kiezen wie ze ook niet goed is niet veel duurder dan is het slecht doet ik denk dat ik er ben al jaren doen we hebben ook nog andere dingen dus we
hebben 't laatste leek presentatie van de wereld en hele betrouwbare systemen en we letten tegenwoordig natuurlijk in nederland op water besparing water
is toch de natuurlijke grondstoffen moet je niet willen is het water gebruikt in nederland stijgt niet is relatief constante het huishoudelijk water gebruikt
daalt zelf met een tegenwoordig maar 't is maar net van het en douches en uh uh was misschien iets zou hebben en die worden ook andere stimuleert
crisisteam op enzovoort hè dus we zijn allemaal verantwoord mee mee zegt en dan hebben we de laatste tien jaar deze uit uiteindelijk deelstaat van die
hele filosofieën en indien india gedaan zijn de afgelopen dertig jaar is is dat we zijn ervan hebben 't wonderen uit de kraan waarschijnlijk aan de ene keer
iets van de latere reizen naar drie jaar geleden toen posters van gemaakt en reclame op radio en tv het zonder uit te keren aan heel goed laat en het is er
altijd worden niet ziek van de gevel treinen gingen we hebben geen flessen water nodig we hebben geen filters aandenken aan de nodige verspelen 't water
niet deze hebben is daar goed voor mekaar nou nee niet zeggen dat je een beetje als jij zie je de wereld zijn best wel dingen die nog beter kunnen en beter
moeten bijkomen wat terug maar qua stilistisch genie zeker in vergelijking met haar naar amerika bijvoorbeeld is dat gewoon zo aan de andere kant moeten
we ons ook realiseren dat er natuurlijk ook heel anders kan en in een heel veel landen ook heel anders gaat het meest extreme voorbeeld daarvan zijn tien
ontwikkelingslanden waaronder basis in first die nog volledig ontbreekt en daar gaat na de pauze door is er welgeteld aan uh even pauzeren met
Annex E3. Speech recognition (SHoUT) compared to human-made subtitles
Human transcription (Erwin)
Na een ruime inlooptijd
kunnen we beginnen
met het tweede deel van 30-11, Watermanagement.
Het deel over gezondheidstechniek ga ik
de komende zeven weken met jullie doornemen.
En ik dacht, ik zal me eerst eens even aan jullie voorstellen dus
mijn naam is Hans van Dijk, zoals jullie daar zien staan
en ik dacht, laat ik daar maar twee dingen voor nemen,
mijn hobby en mijn werk.
Nou de hobby dat zien jullie,
ik ben een marathonloper.
Een mooie foto van de glorieuze
binnenkomst in Rotterdam
in april afgelopen periode.
Marathonlopers dat zijn allemaal een beetje fanatieke lui he,
echte doordouwers, die trainen iedere dag.
Die weten hun leven zodanig te organiseren dat dat allemaal kan.
Dus ik loop hier ook iedere dag tussen de middag een rondje naar
Delfts hout, of langs de Schie,
of een ander parcour hier.
Als jullie me eens een keer in korte broek of trainingspak zien lopen
dan klopt dat, dat ben ik.
En dat doe ik inmiddels met een heel groepje mensen,
bij ons op de afdeling, met studenten en promovendi.
En een van die studenten is hier weergegeven, dat is Karin Teunissen.
Die zat drie jaar geleden hier bij inleiding watermanagement.
Was toen derde jaars, inmiddels is ze afgestudeerd
en begonnen met een promotieonderzoek bij het duinwaterbedrijf in Scheveningen.
En zij is ook een fanatieke hardloper geworden, en zo hebben wij in april
42 kilometer samen gelopen.
Nou dat is een herinnering die ons beide in het geheugen gegrift zal blijven.
Dan het werk.
Ik heb, ik ben, ja een vraag,
ben jij ook een hardloper?
- Sorry?
Ben jij ook een hardloper?
- Eeh, nou ja ik heb wel een vraag, maar volgens mij is dit college al gegeven.
Nee
- Niet? Dan weet ik niet hoe ik dit al wist, maar...
Nou ik zeg dit wel eens vaker, dus dat zou best kunnen.
Waar ben je geweest?
- Ja volgens mij vorig jaar, maar...
Ja tuurlijk, vorig jaar hebben we ook 30-11 gegeven ja, dat klopt haha.
Maar deze foto is echt van april hoor dus dat is toch vrij recent.
Wat misschien zou kunnen zijn is,
ik geef ook altijd een van de gastcolleges
bij inleiding Civiele Techniek in het eerste jaar.
En daar begin ik natuurlijk ook een beetje met, ja wie ben ik,
dus dat zou best kunnen, dat je het daarvan herinnert.
Nou dan weet jij nog dat ik hier 30 jaar geleden ben afgestudeerd.
Ik heb toen ook Civiele Techniek gestudeerd, in '76 afgestudeerd.
Daarna ben ik gaan werken bij een ingenieursbureau, bij DHV in Amersfoort,
en dat kan ik jullie van harte aanraden als je straks afgestudeerd bent
om bij een ingenieursbureau te gaan werken.
Dat is een geweldige ervaring,
je bent met allerlei projecten over de hele wereld bezig.
In mijn geval dan drinkwater projecten.
Machine speech recognition (SHoUT)
... en aan uh ... ja ... en daarin lopen ... we ...
daar is ... zeker ... geen en uh ... we ...
met het ... hele heelal van uh ... dertig ... elf ... en later naar ... het cement ...
't cd lover ... gezondheidss en niet werd hij ...
de komende zeven weken met jullie uh doornemen ...
... en ik dacht ik zelf mee zeker je je voorstellen ... is ...
mijn naam is als een tank zoals jullie daar ... zien staan ...
en ik dacht laat ... ik daar maar twee dingen van één en een half ...
jaar naar werk ...
ja ... nou ... willen ... niet inzien ja ...
ik ben marathonloper ...
een mooie foto van de glorieuze ...
... binnenkomst in uh ... rotterdam ...
in ... april ... afgelopen periode ...
en marathonlopers dat zijn allemaal een beetje fanatieke leidde er ...
echter door de ouders die ... twee iedere dag ...
die je er beter inleven zelf de aandacht te organiseren dat het allemaal ... kan ...
... dus ik loop hier ook ... iedere dag tussen de meer dan ooit je naar
delft houdt of langs deze rivier
of ... andere koerier ... en uh ...
als jullie d'r ... is een keer in korte broek als d'r is ... vaccineren lopen
dan klopt dat we daar niet ...
en uh dat door tieners met een heel groepje er mensen zijn ...
... bij ons op de afdeling met studenten en promovendi ...
en één van die je ergens tien cent en ... is zeer ... tevreden met ... daarin trainen zijn ...
diezelfde drie jaar ... geleden leerde hij jij ... inleidingen en ... laat de meeste mensen ...
was ... een derdejaars inmiddels is afgestudeerd ...
... dat ... heeft ... hij ... er twee ...
ik ben ... ja ... ja ... ja ... ja ja ... dat was ...
... nou ... als ... ik zeg niet als ... derde is ... dat zou mensen kunnen
...
daar waar deze leest hè ...
... ja ... tuurlijk vorig jaar ... lang werden ... afgegeven in alle klassen
...
... het ... is ... een ... stuk ...
... zeg ...
maar dat is ... wat er is echter laten ... we deze wist dat ... hij ... zijn
dus uh ...
... wat te zien zou kunnen zijn ...
in eerste reden ... is ook altijd ze er één van de gast gelezen
medelijden is ... niet echt niet het eerste jaar ...
en daar ... is ... natuurlijk ook een beetje net diary ben ik ...
er dus dat zou best ... kunnen missen je daar je daarvan ... in het ...
... nou dan weet je ... nou ... dat ... weet jij hier ... dertig jaar ...
geleden de laatste set neer te ...
zetten dat is een techniek verste dertien zes en zeventig ...
afgestudeerd... en ...
daarna de reeks werken aan werk erbij ... en is hier ... wel ... meer ...
dingen ... zei in amersfoort ...
en dat kan ik hier niet van harte aan uh ... raden als je straks
afgestudeerd ben
om daarin niet hier is ... d'r oren te gaan werken ...
... we zeggen wel ... ervaring
je bent met ... allerlei projecten over de hele wereld bezig
in mijn geval dan drinkwater ... projecten
<en verder>
A row in the table corresponds with a <speech>-element in the Shout output
A word (delimited with blanks) in the Shout output corresponds with a <word>-element
Annex E. Speech recognition
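Recovery of words by SHoUT

| Line nr. | Words in human subtitles | Words in SHoUT (number) | Words in SHoUT (relative) | Words recovered by SHoUT (number) | Words recovered by SHoUT (relative) |
|---|---|---|---|---|---|
| 1 | 4 | 8 | 200% | 0 | 0% |
| 2 | 3 | 7 | 233% | 1 | 33% |
| 3 | 8 | 13 | 163% | 4 | 50% |
| 4 | 6 | 8 | 133% | 1 | 17% |
| 5 | 7 | 8 | 114% | 7 | 100% |
| 6 | 13 | 11 | 85% | 5 | 38% |
| 7 | 11 | 11 | 100% | 8 | 73% |
| 8 | 11 | 14 | 127% | 9 | 82% |
| 9 | 5 | 3 | 60% | 1 | 20% |
| 10 | 6 | 6 | 100% | 1 | 17% |
| 11 | 4 | 3 | 75% | 3 | 75% |
| 12 | 6 | 6 | 100% | 6 | 100% |
| 13 | 3 | 4 | 133% | 3 | 100% |
| 14 | 4 | 4 | 100% | 4 | 100% |
| 15 | 9 | 10 | 111% | 7 | 78% |
| 16 | 6 | 8 | 133% | 3 | 50% |
| 17 | 11 | 14 | 127% | 6 | 55% |
| 18 | 13 | 14 | 108% | 10 | 77% |
| 19 | 6 | 6 | 100% | 4 | 67% |
| 20 | 5 | 5 | 100% | 1 | 20% |
| 21 | 13 | 14 | 108% | 8 | 62% |
| 22 | 6 | 6 | 100% | 3 | 50% |
| 23 | 10 | 12 | 120% | 7 | 70% |
| 24 | 9 | 9 | 100% | 9 | 100% |
| 25 | 12 | 16 | 133% | 4 | 33% |
| 26 | 9 | 13 | 144% | 3 | 33% |
| 27 | 7 | 6 | 86% | 5 | 71% |
| 28 | 10 | 0 | 0% | 0 | 0% |
| 29 | 14 | 0 | 0% | 0 | 0% |
| 30 | 4 | 0 | 0% | 0 | 0% |
| 31 | 14 | 0 | 0% | 0 | 0% |
| 32 | 3 | 5 | 167% | 1 | 33% |
| 33 | 7 | 9 | 129% | 2 | 29% |
| 34 | 5 | 0 | 0% | 0 | 0% |
| 35 | 1 | 0 | 0% | 0 | 0% |
| 36 | 5 | 0 | 0% | 0 | 0% |
| 37 | 16 | 0 | 0% | 0 | 0% |
| 38 | 1 | 0 | 0% | 0 | 0% |
| 39 | 11 | 0 | 0% | 0 | 0% |
| 40 | 12 | 12 | 100% | 6 | 50% |
| 41 | 4 | 5 | 125% | 1 | 25% |
| 42 | 6 | 0 | 0% | 0 | 0% |
| 43 | 14 | 10 | 71% | 4 | 29% |
| 44 | 0 | 4 | - | - | - |
| 45 | 0 | 1 | - | - | - |
| 46 | 14 | 16 | 114% | 2 | 14% |
| 47 | 6 | 6 | 100% | 4 | 67% |
| 48 | 8 | 13 | 163% | 5 | 63% |
| 49 | 8 | 8 | 100% | 3 | 38% |
| 50 | 13 | 11 | 85% | 8 | 62% |
| 51 | 10 | 13 | 130% | 7 | 70% |
| 52 | 13 | 17 | 131% | 8 | 62% |
| 53 | 12 | 12 | 100% | 5 | 42% |
| 54 | 12 | 16 | 133% | 4 | 33% |
| 55 | 13 | 16 | 123% | 11 | 85% |
| 56 | 7 | 10 | 143% | 4 | 57% |
| 57 | 5 | 4 | 80% | 1 | 20% |
| 58 | 10 | 10 | 100% | 10 | 100% |
| 59 | 6 | 6 | 100% | 6 | 100% |
| Total sample | 471 | 443 | 94% | 215 | 46% |
| Subset (excluding conversation) | 401 | 411 | 102% | 204 | 51% |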
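The relative columns follow directly from the word counts. The sketch below recomputes the summary percentages and shows one possible way to count recovered words for a single subtitle line. The thesis does not state here how recovered words were counted, so the multiset overlap used in `recovered` is an assumption, and the helper names are hypothetical.

```python
from collections import Counter
import string


def tokens(line: str) -> list[str]:
    """Lowercase a subtitle line and drop punctuation and '...' pause markers."""
    words = []
    for raw in line.lower().split():
        word = raw.strip(string.punctuation)
        if word:  # a bare "..." strips down to an empty string and is skipped
            words.append(word)
    return words


def recovered(human: str, machine: str) -> int:
    """Assumed measure: words of the human line that also occur in the machine line."""
    overlap = Counter(tokens(human)) & Counter(tokens(machine))
    return sum(overlap.values())


def percent(part: int, whole: int) -> int:
    """Percentage rounded to a whole number, as in the table."""
    return round(100 * part / whole)


# Line 58 of the sample: human subtitle versus SHoUT output.
human = "je bent met allerlei projecten over de hele wereld bezig."
machine = "je bent met ... allerlei projecten over de hele wereld bezig"
print(len(tokens(human)), len(tokens(machine)), recovered(human, machine))

# Summary rows: total sample and subset excluding conversation.
print(percent(443, 471), percent(215, 471))
print(percent(411, 401), percent(204, 401))
```

For line 58 this gives 10 human words, 10 SHoUT words and 10 recovered words, which agrees with the table row.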
Annex F. Searching

1. Information retrieval in multimedia content
   Subfields of information retrieval
   Indexing by means of speech recognition
2. Metadata storage
   Data levels
   Density of data levels
3. Tag clouds
   Types
   Computation of the tag size
   Foundation for search
   Creating a tag cloud
   Tag clouds based on data level
4. Assessment of tag clouds by lecturer
   Assessment approach
   Original tag clouds
   Modified tag clouds
5. Searching in recorded lectures
6. Evaluation of searching in recorded lectures
   Comparing subtitles and ASR output in search
   Searching for different text-types
   Duration per text type
   Multiple-keyword search
   Keyword search for all lectures
   Multiple keyword search for all lectures
   Precision and recall measurement
   Ranked search results
7. Evaluation
1. Information retrieval in multimedia content
Subfields of information retrieval
Information Retrieval (IR) is the discipline of finding information in collections. It can be
divided into several subfields:
• image retrieval
• video retrieval
• text retrieval
Typically, research on automatically solving the representation mismatch is done in image and
video retrieval. For text retrieval, both the query and the collection are generally text based,
so there is no representation mismatch.
Image retrieval
In Content Based Image Retrieval (CBIR), images are retrieved from a collection of images
based on an index that is generated by automatically analyzing the content of the images.
Mostly the images are retrieved by keyword/key-phrase queries or by query by example. In
the query by example task, images are retrieved that contain similar content as an example
image that is used as query. Although the query images and the images in the collection are
of the same modality, it is not possible to compare them directly. The representation of both
query and collection need to be altered. In order to compare the images, for each image a
mathematical model, or signature, is created. This signature contains low-level information
about the picture such as shape, texture or color information.
Video retrieval
Where image retrieval focuses on standalone images, the goal of content-based video retrieval
is to support searching in video collections. For this purpose, various methods of
abstracting information from the video recordings are employed. Because video consists of a
sequence of still pictures that are played in rapid succession, many image retrieval
techniques can be reused in video retrieval. Other techniques are used as well, such as
detecting scene changes or recognizing text that has been edited into the video (like
people's names). Because most videos contain people speaking, it is also possible to use
speech as a source of information.
Spoken document retrieval
Speech, in most multimedia archives, is a rich source of information for solving the
representation mismatch. Sometimes it is even the only reliable source of information. Radio
shows or telephone recordings do not contain any video. They might contain some music or
sound effects, but generally for those examples most information is in the speech.
Spoken Document Retrieval (SDR) is a subfield of information retrieval that solely focuses on
the use of speech for retrieving information from audio or video archives. In the most widely
studied form of SDR, in order to solve the representation mismatch the speech is
automatically translated into written text by Automatic Speech Recognition (ASR) technology.
The output of this process, speech transcriptions, can be used in a retrieval system (see
Figure 1.1). The transcriptions contain the exact time that each word is pronounced so that it
is possible to play back all retrieved words. This method is similar to the earlier mentioned
example of an index in a book where the page number of each word is stored. Both such an
index and speech transcriptions are often referred to as metadata. Metadata is data about
data. In the speech transcription case, the words and the timing information provide
information about the actual data, the audio recordings.
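The idea of time-stamped transcriptions acting as an index can be illustrated with a small inverted index. This is a minimal sketch, not the structure used by any particular SDR system: the function name, the sample words and the timestamps (in seconds) are all hypothetical.

```python
from collections import defaultdict


def build_index(transcript):
    """Map each word to the list of times (in seconds) at which it was spoken."""
    index = defaultdict(list)
    for word, start_time in transcript:
        index[word.lower()].append(start_time)
    return index


# Hypothetical ASR output: (word, start time in seconds).
transcript = [
    ("drinkwater", 12.4),
    ("in", 13.0),
    ("Nederland", 13.2),
    ("drinkwater", 47.9),
]

index = build_index(transcript)

# Looking up a query word returns the playback positions, like page numbers in a book index.
print(index["drinkwater"])
print(index["nederland"])
```

A player can then jump to each returned position, which is what makes the retrieved words playable.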
Figure 1.1: Solving the representation mismatch between content and query in an SDR system
The speech from multimedia documents is translated into written speech transcriptions by the
ASR component. As the query is already formulated in written text, it does not need to be
translated and can be used directly by the retrieval component to find relevant video
fragments.
If the speech transcriptions always contained exactly what is being said, the text retrieval
system would perform as well as when searching in written text. In practice, ASR systems are
not perfect, and any word that is recognized incorrectly potentially introduces errors in the
retrieval component. This was illustrated by the cross recognizer
retrieval task during the seventh Text Retrieval Conference (TREC-7) in 1998, organized by
the National Institute of Standards and Technology (NIST). Participants of the benchmark
evaluation used speech transcriptions of varying quality to perform text retrieval. The results
showed that although the speech transcriptions did not have to be perfect in order to obtain
good retrieval performance, there was a significant correlation between the quality of the
transcriptions and the performance of the retrieval system. This illustrates that the success of
an SDR system depends heavily on the performance of the ASR component.
(Source: http://wwwhome.cs.utwente.nl/~huijbreg/publications/thesis_Marijn_Huijbregts.pdf)
Indexing by means of speech recognition
The amount of metadata attached to multimedia collections that can be used for searching
depends heavily on the resources available within the organizations that create or own
the collections. Large national audiovisual institutions such as Sound&Vision in the
Netherlands put a lot of effort into archiving their assets, and they label collection items
with at least titles, dates and short content descriptions.
When creating a more detailed archive of textual data, the speech in audio is an important
information source that, once transformed into text and/or enriched with linguistic
annotation, can enable the conceptual querying of video content. The basic idea is to use
automatic speech recognition technology to generate such a textual representation or
linguistic annotation and to use this as (a source for) automatically created metadata that
can be used for searching by applying standard text-based information retrieval techniques.
(Source: Multimedia Retrieval by Henk Blanken…)
SHoUT is a software package that was developed at the University of Twente, at the
chair Human Media Interaction, by PhD candidate Marijn Huijbregts as part of his PhD
project titled "Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled".
SHoUT is a Dutch acronym that stands for "Speech Recognition Research at the University of
Twente". It is a speech recognition system based on commonly used machine learning
techniques. It is used for research on Large Vocabulary Continuous Speech
Recognition (LVCSR), but the speech/non-speech detector and the speaker diarization
application can also be used separately. It is written in C++ for the Linux platform.
(Source: http://wwwhome.cs.utwente.nl/~huijbreg/shout/)
Figure 1.2: Logo of the chair Human Media Interaction at the University of Twente
(Source: http://hmi.ewi.utwente.nl)
2. Metadata storage
Data levels
To make the lectures searchable, the data first needs to be properly structured. This can be
done by creating a database and storing the metadata in several tables. The accompanying
Entity-Relation-Diagram can be seen in Figure 2.1.
Figure 2.1: Entity-Relation-Diagram of recorded lecture database
When looking at the available lecture metadata, several data levels can be distinguished.
Each of these levels has a certain relevance factor, which is different for each lecture. The
different levels and their respective sources are shown in Table 2.1. An illustration of where
this data has been derived from can be seen in Figure 2.2.
Table 2.1: List of Text_types and their respective source

| Name | Source |
|---|---|
| Lecture title | Lecturer after post-processing |
| Lecture chapter | Lecturer after post-processing |
| Slide title | PowerPoint slides |
| Slide content | PowerPoint slides |
| Slide notes | PowerPoint slides |
| Transcript (lecture) | Video of lecture |
| Transcript (slide) | Video of lecture |
| Transcript (sentence) | Video of lecture |
| Transcript (word) | Video of lecture |
Figure 2.2: Sources for each text_type in the database
This list has been ordered by expected relevance. A slide title carries much more weight
as a keyword than a random word in the transcript. Therefore, if someone runs a
search on this data, a higher relevance should be assigned to results that come from the slide
content than to results that come from the transcript. This has been made visible in tables 3, 4, 5 and 6,
where an example of all the data levels is shown for lecture CT3011.
The advantage of this database structure is that it offers a total freedom for users to store
whatever type of metadata they want. The only required elements are a match to a video
lecture, a timeframe to which the metadata is pertinent and Text_type that constitutes the
category of the metadata stored.
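The three required elements (lecture, timeframe and Text_type) and the idea of ranking by data level can be sketched with an in-memory SQLite database. This is an illustration only: the table and column names, the weights per text type and the sample rows are assumptions, and the thesis itself uses SQL Server with the schema of Figure 2.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE text_type (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    weight REAL NOT NULL          -- assumed relevance weight per data level
);
CREATE TABLE metadata (
    id           INTEGER PRIMARY KEY,
    lecture_id   TEXT NOT NULL,   -- match to a video lecture
    start_s      INTEGER NOT NULL, -- timeframe start, in seconds
    end_s        INTEGER NOT NULL, -- timeframe end, in seconds
    text_type_id INTEGER NOT NULL REFERENCES text_type(id),
    content      TEXT NOT NULL
);
""")

conn.executemany(
    "INSERT INTO text_type (id, name, weight) VALUES (?, ?, ?)",
    [(1, "Slide title", 3.0), (2, "Slide content", 2.0), (3, "Transcript (sentence)", 1.0)],
)
# Illustrative rows; only the slide title timeframe (39:08 to 39:27) comes from the lecture.
conn.executemany(
    "INSERT INTO metadata (lecture_id, start_s, end_s, text_type_id, content) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("CT3011", 600, 604, 3, "we pump up groundwater here"),
        ("CT3011", 1800, 1900, 2, "Groundwater: aeration and sand filtration"),
        ("CT3011", 2348, 2367, 1, "Groundwater"),
    ],
)

# Hits from higher data levels are listed first, then by position in the lecture.
rows = conn.execute(
    """
    SELECT t.name, m.start_s, m.content
    FROM metadata AS m
    JOIN text_type AS t ON t.id = m.text_type_id
    WHERE lower(m.content) LIKE '%' || ? || '%'
    ORDER BY t.weight DESC, m.start_s
    """,
    ("groundwater",),
).fetchall()

for name, start_s, content in rows:
    print(name, start_s, content)
```

Because the text type is just a foreign key, new kinds of metadata can be added by inserting a row in `text_type` without changing the schema.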
Density of data levels
With all the data inserted in the database, a word count can be made on each data level. It
shows the density of words for each text type, with the number of records, words and
characters sorted by category.
Table 2.2: List of Text_types and the amount of records and words in the database for course CT3011

| Name | Nr of records | Nr of words | Nr of characters |
|---|---|---|---|
| Lecture title | 28 | 129 | 917 |
| Lecture chapter | 116 | 300 | 2,526 |
| Slide title | 1,183 | 3,900 | 28,741 |
| Slide content | 1,042 | 15,943 | 129,195 |
| Slide notes | 10 | 804 | 5,102 |
| Transcript (lecture) | 1 | 6,970 | 41,407 |
| Transcript (slide) | 28 | 6,970 | 41,383 |
| Transcript (sentence) | 779 | 6,970 | 40,623 |
| Transcript (word) | 118,926 | 188,926 | 808,482 |
The grayed out rows (Slide notes and the lecture-, slide- and sentence-level transcripts) only
contain data for lecture 15 by Hans van Dijk, since this is the first sample lecture for which
manual human-made subtitles were created. In Table 2.3, the
numbers in rows "Slide notes", "Transcript (lecture)" and "Transcript (sentence)" have been
multiplied by 28 (the total number of lectures in course CT3011). The row "Transcript (slide)"
has been adjusted to the total number of slides in course CT3011. This has been done to give
a more complete picture of the density of each category.
Table 2.3: List of Text_types and the amount of records and words in the database for course CT3011

| Name | Nr of records | Nr of words | Nr of characters |
|---|---|---|---|
| Lecture title | 28 | 129 | 917 |
| Lecture chapter | 116 | 300 | 2,526 |
| Slide title | 1,183 | 3,900 | 28,741 |
| Slide content | 1,042 | 15,943 | 129,195 |
| Slide notes | 280 | 22,512 | 142,856 |
| Transcript (lecture) | 28 | * 179,480 | * 768,058 |
| Transcript (slide) | 1,183 | * 179,480 | * 768,058 |
| Transcript (sentence) | 21,812 | * 179,480 | * 768,058 |
| Transcript (word) | 118,926 | 188,926 | 808,482 |
* 95% of the total number of words generated by SHoUT, based on the comparison between the human-made
subtitles and the SHoUT subtitles
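The extrapolated figures can be checked with a few lines of arithmetic. The sketch below only uses numbers from the two tables; the variable names are illustrative.

```python
LECTURES = 28  # total number of lectures in course CT3011

# Slide notes for the single sample lecture (records, words, characters).
slide_notes = {"records": 10, "words": 804, "characters": 5102}
scaled_notes = {key: value * LECTURES for key, value in slide_notes.items()}
print(scaled_notes)

# Subtitle sentences for the sample lecture, scaled to the whole course.
sentence_records = 779 * LECTURES
print(sentence_records)

# Starred values: 95% of the word and character totals generated by SHoUT.
asr_words, asr_characters = 188_926, 808_482
estimated_words = round(0.95 * asr_words)
estimated_characters = round(0.95 * asr_characters)
print(estimated_words, estimated_characters)
```

The results match the "Slide notes" row, the record count of "Transcript (sentence)" and the starred transcript values.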
3. Tag clouds
A tag cloud or word cloud (or weighted list in visual design) is a visual depiction of
user-generated tags, or simply the word content of a piece of text. It is mainly used to describe
the content of web sites. Tags are usually single words and are typically listed alphabetically,
and the importance of a tag is shown with font size or color, thus both finding a tag by
alphabet and by popularity is possible. The tags can become hyperlinks that lead to a
collection of items that are associated with a tag.
Figure 3.1: Example of a tag cloud with terms related to Web 2.0
(Source: http://en.wikipedia.org/wiki/Tag_cloud)
Types
There are three main types of tag cloud applications in social software, distinguished by their
meaning rather than appearance. In the first type, size represents the number of times that
tag has been applied to a single item. This is useful as a means of displaying metadata about
an item that has been democratically "voted" on and where precise results are not desired.
Examples of such use include http://last.fm (to indicate genres attributed to bands) and
http://www.librarything.nl (to indicate tags attributed to a book).
In the second, more commonly used type, size represents the number of items to which a tag
has been applied, as a presentation of each tag's popularity. Examples of this type of tag
cloud are used on the image-hosting service Flickr, blog aggregator Technorati and on Google
search results with DeeperWeb.
In the third type, tags are used as a categorization method for content items. Tags are
represented in a cloud where larger tags represent the quantity of content items in that
category. More generally, the same visual technique can be used to display non-tag data, as
in a word cloud or a data cloud.
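Computation of the tag size
$$ s_i = \left\lceil \frac{f_{\max} \cdot (t_i - t_{\min})}{t_{\max} - t_{\min}} \right\rceil \quad \text{for } t_i > t_{\min}; \quad \text{else } s_i = 1 $$
$s_i$: display font size
$f_{\max}$: max. font size
$t_i$: count
$t_{\min}$: min. count
$t_{\max}$: max. count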
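A minimal sketch of this computation is given below. It assumes the linear scaling commonly given for tag clouds, in which the size is the ceiling of fmax · (ti − tmin) / (tmax − tmin) when ti > tmin and 1 otherwise. The function name and the maximum font size of 10 are assumptions; the counts are the noun counts for "water", "Nederland", "grondwater" and "stoffen" from lecture CT3011.

```python
import math


def tag_size(count: int, min_count: int, max_count: int, max_font: int) -> int:
    """Display size of a tag: linear scaling between 1 and max_font."""
    if count > min_count and max_count > min_count:
        return math.ceil(max_font * (count - min_count) / (max_count - min_count))
    return 1  # the least frequent tags get the smallest size


counts = {"water": 39, "Nederland": 36, "grondwater": 21, "stoffen": 9}
t_min, t_max = min(counts.values()), max(counts.values())

sizes = {word: tag_size(count, t_min, t_max, max_font=10) for word, count in counts.items()}
print(sizes)
```

The extra check on `max_count > min_count` avoids a division by zero when all tags have the same count.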
Foundation for search
For searching within a video lecture, the subtitle files can be used as a basis. This transcript
includes all the words that were spoken by the lecturer, including the specific timeframe that
the word was mentioned. Below is a list of interesting statistics about the subtitles of lecture
CT3011, Introduction Water Management:
Table 3.1: Analysis of transcript for lecture CT3011

| Statistic | Value |
|---|---|
| Length | 45:09 |
| Number of words | 6,970 |
| Number of subtitle sentences | 779 |
| Average words per minute | 154.3 |
| Average words per second | 2.57 |
By inserting all the words into a SQL Server database table, it became possible to analyze
them further with the following query:
SELECT word, COUNT(word) AS word_count FROM CT3011_transcript
GROUP BY word ORDER BY word_count DESC;
The result was a list of words in order of their most frequent use, which is shown in Table
3.2.
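Table 3.2: Top 20 most common words in transcript of lecture CT3011

| Nr | Word | Count |
|---|---|---|
| 1 | dat | 269 |
| 2 | de | 231 |
| 3 | en | 225 |
| 4 | het | 220 |
| 5 | is | 181 |
| 6 | een | 174 |
| 7 | van | 162 |
| 8 | in | 151 |
| 9 | ook | 134 |
| 10 | we | 128 |
| 11 | die | 113 |
| 12 | je | 107 |
| 13 | dus | 93 |
| 14 | ik | 88 |
| 15 | dan | 80 |
| 16 | daar | 76 |
| 17 | niet | 55 |
| 18 | zijn | 54 |
| 19 | op | 54 |
| 20 | met | 53 |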
A closer look at these words shows that most of them are not very useful for searching.
The most frequent words are short function words such as "en" (and), "een" (a) and
"dat" (that), whereas people are generally interested in keywords that say something
meaningful about the subject.
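The frequency count can be reproduced on a small scale with SQLite instead of SQL Server. This is a sketch under assumptions: the one-column table layout and the sample sentence are invented for illustration, and only the table name is taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CT3011_transcript (word TEXT NOT NULL)")

# Invented sample text; the real table holds every word of the transcript.
sample = "dat is de bron en dat is het water en dat is de zuivering".split()
conn.executemany(
    "INSERT INTO CT3011_transcript (word) VALUES (?)",
    [(word,) for word in sample],
)

# One row per distinct word, most frequent first; ties are ordered alphabetically.
rows = conn.execute(
    """
    SELECT word, COUNT(*) AS word_count
    FROM CT3011_transcript
    GROUP BY word
    ORDER BY word_count DESC, word
    """
).fetchall()

for word, word_count in rows:
    print(word, word_count)
```

Even in this tiny sample the top of the list is taken by function words, while "water" and "zuivering" appear only once.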
Creating a tag cloud
Once this list of most frequently used words is available, it is possible to generate a tag
cloud based on these words. A good example of a free tag cloud generator is Wordle (Source:
http://www.wordle.net). It accepts input in two ways: either by pasting a block of text
into a single text field, or by entering a list of words, each combined with a relevance
weight (such as word count). Both of these input options are shown in Figure 3.2 and Figure
3.3.
The website then uses Java to generate an image of a tag cloud based on the
word count of each individual word. The higher the frequency, the larger the word becomes in
the generated image. Figure 3.4 shows such a tag cloud for our transcript of
lecture CT3011.
Figure 3.2: Input of a text block in Wordle
Figure 3.3: Input of weighted words in Wordle
Figure 3.4: Tag cloud of entire transcript CT3011 by Wordle
As predicted, the results of this aren't very good since the words that are deemed "most
important" based on their word count, are all the small and meaningless common connector
words. Luckily, Wordle offers the option of removing common words in each language. When
turning this feature on, the results become a lot more useful, as shown in Figure 3.5.
Figure 3.5: Tag cloud of entire transcript CT3011 by Wordle with common Dutch word removal
There are two obvious problems with this generated tag cloud. The number of words is
too large, which makes the cloud hard to comprehend, and the words selected by the generator
are not equally relevant to the subject of the lecture. In addition, the tag cloud still
includes a number of meaningless words, such as "natuurlijk", "nou" and "jullie".
The problem of a cloud that is too dense to comprehend is easily solved. Wordle
allows a limit to be set on the maximum number of words returned. This
way, there is a lot less clutter in the tag cloud and the keywords stand out much more
clearly, as is shown in Figure 3.6.
Figure 3.6: Tag cloud of entire transcript CT3011 by Wordle with common Dutch word removal and a maximum
number of 25 words
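The two filtering steps, removing common words and limiting the number of words, can be sketched as follows. Wordle's own Dutch stop word list is not known here, so the list below simply reuses the 20 most common words of the transcript as an assumption; the function name and the sample sentence are also hypothetical.

```python
from collections import Counter

# Assumed stop word list: the 20 most common words in the CT3011 transcript.
DUTCH_STOP_WORDS = {
    "dat", "de", "en", "het", "is", "een", "van", "in", "ook", "we",
    "die", "je", "dus", "ik", "dan", "daar", "niet", "zijn", "op", "met",
}


def top_keywords(words, stop_words, limit):
    """Count words case-insensitively, skip stop words, keep the `limit` most frequent."""
    counts = Counter(
        word.lower() for word in words if word.lower() not in stop_words
    )
    return counts.most_common(limit)


words = "het water in Nederland is grondwater en het water is drinkwater".split()
keywords = top_keywords(words, DUTCH_STOP_WORDS, limit=2)
print(keywords)
```

For the real transcript the limit would be 25, as in Figure 3.6. A frequency-based stop word list does not catch words such as "natuurlijk" or "jullie", which is why a further selection step is still needed.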
After showing several of these tag clouds, using different settings offered by Wordle, it seems
clear that a certain selection of words has to be made in order to increase the relevance of
the tag cloud. Using the word count to assign a relevance factor seems to work, since
many keywords about lecture CT3011 are returned, but something needs to be done about
the meaningless words that are shown.
A simple solution is to create a list of all the words in a transcript, sorted by
frequency, and to select only the nouns. This selection removes many of the smaller words
and leaves a list of genuine keywords, or words that have a decent relevance
to the text. The 15 most frequently used nouns in the transcript of lecture CT3011 are shown in
Table 3.3.
Table 3.3: Top 15 most used nouns in transcript of lecture CT3011

| Nr | Word | Count |
|---|---|---|
| 1 | water | 39 |
| 2 | Nederland | 36 |
| 3 | grondwater | 21 |
| 4 | jaar | 16 |
| 5 | drinkwatervoorziening | 16 |
| 6 | dingen | 16 |
| 7 | oppervlaktewater | 15 |
| 8 | boek | 15 |
| 9 | keer | 13 |
| 10 | drinkwater | 13 |
| 11 | plaatje | 11 |
| 12 | vragen | 10 |
| 13 | chloor | 10 |
| 14 | soort | 9 |
| 15 | stoffen | 9 |
By running this new list of keywords through the same tag cloud generator from Wordle, a
new picture is generated (see Figure 3.7). There is an obvious difference in relevance, and the
number of words shown is also much smaller, which makes the overall image much easier to
comprehend. It is no longer a chaotic mess filled with meaningless words.
Figure 3.7: Tag cloud of nouns in transcript CT3011 by Wordle with common Dutch word removal
Tag clouds based on data level
Now that several data levels have been distinguished, it's possible to create different tag
clouds based on the corresponding information sources. That way, the relevance and
effectiveness of the data can be compared by looking at the output generated by the Wordle
tag cloud generator.
Table 3.4: List of slide titles and their respective timeframe of lecture CT3011

| Nr | Title | Timeframe | Length |
|---|---|---|---|
| 1 | Civiele Gezondheidstechniek | 0:00 – 7:28 | 7:28 |
| 2 | Overzicht 3011, deel Gezondheidstechniek | 7:28 – 9:48 | 2:20 |
| 3 | Drinkwater- principes en praktijk | 9:48 – 10:20 | 0:32 |
| 4 | Colleges | 10:20 – 13:12 | 2:52 |
| 5 | Wat is Gezondheidstechniek? | 13:12 – 13:47 | 0:35 |
| 6 | Schoon water voor een gezond leven.. | 13:47 – 13:50 | 0:03 |
| 7 | Wat is Gezondheidstechniek? | 13:50 – 14:39 | 0:49 |
| 8 | Schoon water voor een gezond leven.. | 14:39 – 15:46 | 1:07 |
| 9 | Wat is Gezondheidstechniek? | 15:46 – 16:39 | 0:53 |
| 10 | Wat is Gezondheidstechniek? | 16:39 – 17:20 | 0:41 |
| 11 | Wat is Gezondheidstechniek? | 17:20 – 18:06 | 0:46 |
| 12 | Drinking water and Delft | 18:06 – 18:50 | 0:44 |
| 13 | Wat doet een waterleidingingenieur? Studies | 18:50 – 20:39 | 1:49 |
| 14 | Wat doet een waterleidingingenieur? Ontwerpen | 20:39 – 23:11 | 2:32 |
| 15 | Wat doet een waterleidingingenieur? | 23:11 – 24:07 | 0:56 |
| 16 | Wat doet een waterleidingingenieur? | 24:07 – 25:09 | 1:02 |
| 17 | Wat doet een waterleidingingenieur? Onderzoek | 25:09 – 27:08 | 1:59 |
| 18 | Dutch drinking water: principles and practices | 27:08 – 29:47 | 2:39 |
| 19 | Drinking water in the Netherlands | 29:47 – 33:35 | 3:48 |
| 20 | Principles and practices: 1 | 33:35 – 35:49 | 2:14 |
| 21 | Principles and practices: 2 | 35:49 – 38:48 | 2:59 |
| 22 | Source protection | 38:48 – 39:08 | 0:20 |
| 23 | Groundwater | 39:08 – 39:27 | 0:19 |
| 24 | Artificial recharge | 39:27 – 40:02 | 0:35 |
| 25 | Multiple barriers… | 40:02 – 40:36 | 0:34 |
| 26 | Modern technology… | 40:36 – 41:28 | 0:52 |
| 27 | Principles and practices: 3 | 41:28 – 43:23 | 1:55 |
| 28 | Principles and practices: 4 | 43:23 – 43:59 | 0:36 |
| 29 | The miracle from the tap | 43:59 – 45:09 | 1:10 |
Table 3.4 shows every slide title, the timeframe in which the slide is shown during the lecture
and the length of that timeframe in minutes and seconds. The
average time that a slide is shown in lecture CT3011 is 1 minute and 33 seconds.
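The slide lengths and the average follow from the slide change times. The sketch below derives them from the 30 boundaries in the timeframe column; the helper names are illustrative.

```python
def to_seconds(stamp: str) -> int:
    """Convert a 'minutes:seconds' stamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def to_stamp(seconds: int) -> str:
    """Convert seconds back to a 'minutes:seconds' stamp."""
    return f"{seconds // 60}:{seconds % 60:02d}"


# Start of the lecture followed by the end time of each of the 29 slides.
boundaries = [
    "0:00", "7:28", "9:48", "10:20", "13:12", "13:47", "13:50", "14:39",
    "15:46", "16:39", "17:20", "18:06", "18:50", "20:39", "23:11", "24:07",
    "25:09", "27:08", "29:47", "33:35", "35:49", "38:48", "39:08", "39:27",
    "40:02", "40:36", "41:28", "43:23", "43:59", "45:09",
]

times = [to_seconds(stamp) for stamp in boundaries]
lengths = [end - start for start, end in zip(times, times[1:])]

average = sum(lengths) // len(lengths)  # whole seconds
print(len(lengths), to_stamp(lengths[0]), to_stamp(min(lengths)), to_stamp(average))
```

The 29 slides span 45:09 in total, which gives an average of 1:33 per slide. The shortest slide is shown for only 3 seconds.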
When the same tag cloud generator is used to generate a tag cloud based on
all the slide titles, it is interesting to see the differences between the two (see Figure 3.7 and
Figure 3.8). The most obvious difference is the most frequently used word. Lecture CT3011 is
titled "Civiele gezondheidstechniek", yet neither of these words appears in the first tag
cloud, which is based on the entire transcript. In the tag cloud that only
incorporates the slide titles, the most frequently used word is
"Gezondheidstechniek".
Figure 3.8: Tag cloud of slide titles of lecture CT3011 with common Dutch word removal
Table 3.5 shows all the content that is displayed, ordered by slide. It is clear that most of
the data presented here consists of keywords for the lecture. For searching, the data that
comes out of these slides is generally very good.
Table 3.5: List of slide content of lecture CT3011
Nr
1
2
3
4
5
Content
Prof. Hans van Dijk
Boek Drinkwater-principes en praktijk verkrijgbaar bij Mieke Hubert, kamer 4.55
Vraagstukken in boek
Computer assignments op blackboard
Oude tentamens op blackboard
7 colleges conform schema
Gezondheidstechniek 3011
Drinkwaterbedrijven 3011
Planning en ontwerp 3420
Financiën 3420
Waterverbruik 3011
Waterkwaliteit 3011
Grondwater 3420
Oppervlaktewater 3420
Distributie 3011
27 sept. Inleiding gezondheidstechniek
1 okt. Waterkwaliteit 1: eisen/micro
4 okt. Waterkwaliteit 2: natuur/chemie
8 okt. Drinkwaterbedrijven 1: grondwater
11 okt. Drinkwaterbedrijven 2: oppervlaktewater
15 okt. Waterverbruik
18 okt. Distributie
Excursie naar de Berenplaat op 11 oktober na college
grondwater
drinkwater
oppervlaktewater
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
riolering
afvalwater
aantal per 100.000 inwoners
% niet- aangesloten
jaar
70 60 50 40 30 20 10 0
1919 1945
1900 1925 1950 1975
70 60 50 40 30 20 10 0
grondwater
drinkwater
oppervlaktewater
riolering
afvalwater
aantal per 100.000 inwoners
% niet- aangesloten
jaar
70 60 50 40 30 20 10 0
1919 1945
1900 1925 1950 1975
70 60 50 40 30 20 10 0
Goede waterkwaliteit ten dienste van mens en milieu
kennis van:
waterwinning
waterzuivering
watertransport
waterchemie
microbiologie
Gezondheidstechnisch ingenieur maakt gebruik van
kennis van:
hydraulica
hydrologie
constructieve vormgeving
informatica
projectrealisatie
van groot belang voor de volksgezondheid
grootschalige gespecialiseerde infrastructuur
goed georganiseerde sector met heldere taken
Prof. ir. Hans van Dijk
27 september 2009
Total volume: 1.2 x 10^9 m3/jaar
Sources
Groundwater: 2/3
Surface water: 1/3
Treatment
Groundwater: aeration and sand filtration
Surface water: very extensive treatment
Distribution
no chlorine!
Focus on public health…
Large publicly owned private companies…
With joined efforts for research and communication
Source protection
Safe groundwater when available…
Or artificial groundwater…
Or surface water with multiple barriers for micro-organisms, pollutants and nutrients…
22
23
24
25
26
27
28
29
High quality water without chlorine…
And with a low hardness…
So the customers drink water from the tap
No leakage…
Reliable systems…
Stimulate water saving…
High quality water supply
No waterborne diseases
No chlorine
No pesticides
No hard water
No corrosion and metals
No leakage
No need for home filters
No need for bottled water
No wasting of water
Figure 3.9: Tag cloud of slide content of lecture CT3011 with common Dutch word removal
Figure 3.10: Tag cloud of slide titles and content of lecture CT3011 with common Dutch word removal
In Table 3.6, the notes that Hans van Dijk added to his PowerPoint slides are presented. The
content of this data obviously differs a lot per lecturer. In this specific example, most of the
notes are parts of the story that the lecturer wants to tell, which is why the text looks a lot
like a transcript.
Table 3.6: List of slide notes of lecture CT3011
Nr
5
6
Notes
De infrastructurele werken van drinkwater en afvalwater of wel de kleine waterkringloop of
stedelijke waterketen zoals ons segment in het DC heet. Dus het onttrekken van water aan
de grote hydrologische kringloop, het zuiveren en distribueren van drinkwater, inzamelen van
afvalwater en tenslotte het zuiveren van afvalwater voordat we het weer aan de grote
kringloop teruggeven, zoals we dat eufemistisch uitdrukken, ofwel voordat we het afvalwater
weer lozen op het oppervlaktewater.
We hebben daarbij een heel helder doel, namelijk het bevorderen van de volksgezondheid en
8
9
11
12
13
14
15
20
dat we daarbij zeer succesvol zijn blijkt uit de grafiek die laat zien dat buiktyphus in de vorige
eeuw uit Nederland werd uitgebannen door de aanleg van de drinkwatervoorziening. Dat
zuiver drinkwater van groot belang is voor de volksgezondheid weten we dus al een eeuw,
maar we moeten er steeds alert op blijven, ook met nieuwe bedreigingen zoals SARS.
Als toepassingsgerichte ingenieurs maken we bij het ontwerpen van drink- en
afvalwaterinstallaties natuurlijk gebruik maken van een groot aantal civiele domeinen, zoals…
Enkele bijzondere kenmerken van de gezondheidstechniek zijn dat het een heel
gespecialiseerd en goed georganiseerd wereldje is met heldere taken en veel aandacht voor
kwaliteit.
Van oudsher spelen civiel ingenieurs een vooraanstaande rol in de sector, hoewel ook wij
geconfronteerd worden met concurrentie van de bedrijfskundigen, economen en juristen.
Vroeger behoorde een waterleidingdirecteur toch wel civiel ingenieur te zijn, nu zijn er van de
15 directeuren nog 4 civiel.
Wat doet een gezondheidstechnisch ingenieur?
Ja, net als iedere andere civiel verricht hij studies, soms op momenten dat de rest van
Nederland voor de TV zit, zoals het linker plaatje laat zien. Bij de kwart-finale van de WK van
1998 Nederland – Argentinie bleek dat het waterverbruik een perfecte indicator is van het
wedstrijdverloop. Zodra de wedstrijd begint, daalt het waterverbruik snel met minima tijdens
de doelpunten en vlak voor de rust en vlak voor het einde . Tijdens de rust en na de
wedstrijd gaat iedereen meteen naar de WC en zien we een enorme piek in het
waterverbruik. Interessant is nog de zogenaamde Cruijf-dip als JC commentaar gaat geven in
de rust; we zien dat circa 10 % van de mensen dan even terugkomt van de WC.
Ja, ontwerpen doen we als alle civielen natuurlijk het liefste. Ook bij ons gaat het dan om
schematiseren en berekenen, waarbij natuurlijk geen fouten gemaakt moeten worden want
zo'n implosie t.g.v waterslag als bij het koolfilter op de middelste foto valt natuurlijk wel op
en staat niet zo best op je conduitestaat (niet door Delfts civiel ingenieur ontworpen)
U vindt ons dikwijls op de mooiste plekjes van Nederland, zoals bos, heide en de duinen waar
we dat "Zuiver water uit een schoon milieu" denken te vinden, maar ook aangename
excursies kunnen organiseren, zoals het hooglerarenuitje van enkele jaren geleden naar
Scheveningen.
Groot project in Limburg, WPH.
Een voorbeeld van een recent project waar ik in mijn DHV -tijd nog medeverantwoordelijk
voor geweest ben is WPH, een groot project dat als doel had om de grondwaterwinning in
Limburg te verminderen (i.v.m. verdroging) en deels te vervangen door oppervlaktewater.
Nabij Heel ligt een voormalige grindwinplas (de Lange Vlieter) die we ingericht hebben als
bekken voor de drinkwatervoorziening. Vanuit het Lateraalkanaal wordt water in het bekken
gepompt, waarna het op natuurlijk wijze infiltreert in de bodem en via winputten op enige
afstand en na minimaal 60 dagen van het bekken weer wordt opgepompt. Het grote voordeel
van deze bodempassage is dat bacteriologisch betrouwbaar water gewonnen wordt. Het
water wordt vervolgens gezuiverd m.b.v. zand en koolfilters en via in totaal 100 km
transportleiding getransporteerd naar de bestaande (grondwater) pompstations in Middenen Noord Limburg, waar het vervolgens wordt opgemengd met het lokaal gewonnen
grondwater en gedistribueerd. Capaciteit 20 miljoen m3/a, Investeringen 300 miljoen, in
guldens weliswaar.
Onderzoek doen we niet altijd op van die idyllische plekjes zoals tijdens het veldpracticum in
Luxemburg op de linker foto, maar toch wel vaak op locatie omdat de ene waterkwaliteit nu
eenmaal de andere niet is en het lastig is om te slepen met water naar het laboratorium.
In het lab zelf doen we meer fundamenteel onderzoek, zoals de foto in het midden een
onderzoek naar de hydraulische verdeling van water en lucht bij het terugspoelen van
membranen. Het Lab van Gezondheidstechniek verhuis momenteel naar Stevin II , waar we
samen met Vloeistofmechanica het nieuwe Waterlab gaan vormen. We verwachten veel van
de samenwerking met VLM, daar het bij ons ook draait om de combinatie van VLM en
waterkwaliteit. En als dat onderzoek goed gaat eindigt het met een promotie en kijken we
trots en tevreden terug…
Wat betreft de bescherming van de bron, hierbij is naast onderzoek ook de nodige publiciteit
en lobby nodig om de vervuilers aan te pakken en de bronnen te saneren; de WLB nagelen
dan ook graag vervuilers aan de schandpaal. Endocrine disruptoren, pil
Figure 3.11: Tag cloud of slide notes of lecture CT3011 with common Dutch word removal
Table 3.7 shows the transcripts that correspond to the first 3 slides of the lecture. This data is
obviously the most extensive, since every spoken word is included. The large number of words
makes the data less relevant when searching.
Table 3.7: Transcript for the first 3 slides of lecture CT3011
Nr
1
2
Transcript
Na een ruime inlooptijd kunnen we beginnen met het tweede deel van 30-11, Watermanagement. Het deel over gezondheidstechniek ga ik
de komende zeven weken met jullie doornemen. En ik dacht, ik zal me eerst eens even aan jullie voorstellen, dus, mijn naam is Hans van
Dijk, zoals jullie daar zien staan en ik dacht, laat ik daar maar twee dingen voor nemen, mijn hobby en mijn werk. Nou de hobby dat zien
jullie, ik ben een marathonloper. Een mooie foto van de glorieuze binnenkomst in Rotterdam in april afgelopen periode. Marathonlopers dat
zijn allemaal een beetje fanatieke lui he, echte doordouwers, die trainen iedere dag. Die weten hun leven zodanig te organiseren dat dat
allemaal kan. Dus ik loop hier ook iedere dag tussen de middag een rondje naar Delfts hout, of langs de Schie, of een ander parcour hier.
Als jullie me eens een keer in korte broek of trainingspak zien lopen dan klopt dat, dat ben ik. En dat doe ik inmiddels met een heel groepje
mensen, bij ons op de afdeling, met studenten en promovendi. En een van die studenten is hier weergegeven, dat is Karin Teunissen. Die
zat drie jaar geleden hier bij inleiding watermanagement. Was toen derde jaars, inmiddels is ze afgestudeerd en begonnen met een
promotieonderzoek bij het duinwaterbedrijf in Scheveningen. En zij is ook een fanatieke hardloper geworden, en zo hebben wij in april 42
kilometer samen gelopen. Nou dat is een herinnering die ons beide in het geheugen gegrift zal blijven. Dan het werk. Ik heb, ik ben, ja een
vraag, ben jij ook een hardloper? - Sorry? Ben jij ook een hardloper? - Eeh, nou ja ik heb wel een vraag, maar volgens mij is dit college al
gegeven. Nee - Niet? Dan weet ik niet hoe ik dit al wist, maar... Nou ik zeg dit wel eens vaker, dus dat zou best kunnen. Waar ben je
geweest? - Ja volgens mij vorig jaar, maar... Ja tuurlijk, vorig jaar hebben we ook 30-11 gegeven ja, dat klopt haha. Maar deze foto is echt
van april hoor dus dat is toch vrij recent. Wat misschien zou kunnen zijn is, ik geef ook altijd een van de gastcolleges bij inleiding Civiele
Techniek in het eerste jaar. En daar begin ik natuurlijk ook een beetje met, ja wie ben ik, dus dat zou best kunnen, dat je het daarvan
herinnert. Nou dan weet jij nog dat ik hier 30 jaar geleden ben afgestudeerd. Ik heb toen ook Civiele Techniek gestudeerd, in '76
afgestudeerd. Daarna ben ik gaan werken bij een ingenieursbureau, bij DHV in Amersfoort, en dat kan ik jullie van harte aanraden als je
straks afgestudeerd bent om bij een ingenieursbureau te gaan werken. Dat is een geweldige ervaring, je bent met allerlei projecten over de
hele wereld bezig. In mijn geval dan drinkwater projecten. Dus het ontwerpen van zuiveringsinstallaties, bouwen van systemen, ook het
doen van onderzoek. Eigenlijk kun je alle kanten op bij een ingenieursbureau en de Nederlandse ingenieursbureaus zijn redelijk succesvol,
ook op de internationale markt tegenwoordig. Ja ik heb daar vele jaren gewerkt, totdat op een gegeven moment, inmiddels is dat alweer 17
jaar geleden, er een advertentie stond dat we een hoogleraar zochten hier in Delft. En toen dacht ik van, nou ja, laat ik maar eens een brief
schrijven, je weet het nooit, niet geschoten is altijd mis. Dus ik heb een brief geschreven en ik dacht, ik zal het vast wel niet worden, maar
ik werd het wel. Dus ook daar zit al meteen een eerste levensles in, probeer maar eens wat en het kan altijd meevallen. Ik ben in eerste
instantie vervolgens voor een dag in de week hier deeltijdhoogleraar geworden in de drinkwatervoorziening, dat is mijn leerstoel. En ja, zo
langzamerhand van het een komt het ander, je wordt voor steeds meer dingen gevraagd. Dus ik ben langzamerhand meer dingen hier in
Delft gaan doen en die aanstelling bij DHV heb ik steeds verder afgebouwd, en vanaf 1999 ben ik volledig gestopt bij DHV en ben ik hier
voltijd hoogleraar. En voltijd hoogleraar dat betekent ook, je hebt enerzijds taken op het gebied van onderwijs, anderzijds onderzoek, maar
ook management, dus management, ja, dan moet je, ik ben hoofd van een afdeling enzo en dan zit je in het managementteam of in de
opleidingscommissie. Moet je over algemene dingen meepraten en beslissen. Daar kun je natuurlijk een dagtaak van maken, dat heb ik
altijd vermeden. Ik vind het toch altijd het leukste om met het vak bezig te zijn en daarmee kom ik op het tweede plaatje wat hier staat,
want het allerleukste is eigenlijk afstudeerders begeleiden. Dat gaan jullie de komende jaren dat proces doormaken. Dat is voor ons altijd
ontzettend leuk om te zien hoe studenten zich transformeren van min of meer anonieme figuren die in de collegezaal zitten en zitten te
luisteren. Min of meer absorberen wat ik in een monoloog aan het overdragen ben. Hoewel ik overigens wel reacties van jullie zeer op prijs
stel hoor en ik zal daar ook af en toe expliciet om vragen. Maar goed, de praktijk is toch dat in deze fase van de studie zitten jullie nog
vooral te luisteren en dat wordt eigenlijk steeds leuker als je verder komt in het vierde en het vijfde jaar en het hoogtepunt is dan natuurlijk
het afstuderen, waar je echt een onderwerp helemaal zelf bij de kop pakt. Ik zeg ook altijd tegen mijn afstudeerders, je moet van je
afstudeerproject je visitekaartje maken, he Doris, en dat werkt ook echt zo. Op het moment dat je klaar bent met dat afstudeeronderwerp
dan weet jij het meeste van dat onderwerp af. Meer dan wie dan ook in Nederland. Dat bewijzen we ook iedere keer weer door de
afstudeercolloqui. Daar geven we veel kenbaarheid aan, daar komen altijd mensen vanuit de waterbedrijven van KIWA, van andere
researchinstituten. Die doen daar mee in de discussies en onze afstudeerders die weten keer op keer alle vragen te beantwoorden.
Misschien niet altijd 100% goed, maar toch wel 99% goed. Dat is altijd een genoegen om mee te maken. Ik zeg ook altijd dat ik trots ben
op mijn afstudeerders, en dat is ook zo. Ik heb er inmiddels een stuk of 80 gehad en soms gaat het dan heel goed, zoals hier staat met
Karin en Doris, Doris is hier trouwens in de zaal aanwezig, die dan het afgelopen jaar allebei zelfs met lof zijn afgestudeerd. Dat betekent
dus dat je het heel goed gedaan hebt, hoge cijfers gehaald hebt, en ook het afstudeerproject heel goed gedaan hebt. Ja, dat is voor ons
gewoon heerlijk om dat mee te maken. Om te zien hoe jonge mensen het vak ook leuk gaan vinden, zelf ook enthousiast worden, en hun
stempel gaan zetten op ons vakgebied. En ik hoop dat enkele van jullie ook zo ver zullen komen. Goed, dat is wat mijzelf betreft.
Dan wat dit vak betreft. We gaan dat doen aan de hand van het boek, dat staat al op blackboard aangegeven. Daar hebben we een
Nederlandse en een Engelstalige versie van. Dat boek dat moeten jullie kopen bij de secretaresse van ons, Mieke op de vierde verdieping,
voor 25 euro. In de winkel kost het 50 euro, maar wij hebben een speciale kortingsregeling. Jullie mogen zelf weten of je het Nederlandse
of het Engelse boek koopt. De inhoud is vrijwel hetzelfde en in ieder geval voldoende voor dit vak. Als jullie een advies van mij willen
hebben dan zou ik zeggen, als je goed Engels kunt lezen, koop het Engelse boek, dat is iets actueler, staat iets meer informatie in, maar het
Nederlandse boek is voor dit vak zeker voldoende. Ja, zo'n boek heeft natuurlijk, behalve dat we er over gaan vragen bij het tentamen, daar
zal ik bij mijn volgende dia op terugkomen, heeft zo'n boek natuurlijk ook nog een zekere functie als naslagwerk. Als je zo'n boek eenmaal
hebt, dan heb je dat bij je, ook na je afstuderen neem je dat mee. Als je vervolgens ergens in een vreemd land een installatie moet
ontwerpen, dan haal je dat boek weer eens uit de tas en dan weet je weer het een en ander. Die functie heeft zo'n boek ook. Daar staan
vraagstukken ook in, in dat boek, en we hebben ook vraagstukken op blackboard staan. Dat zullen jullie misschien ook al gezien hebben,
computer assignments. Dat is overigens niet verplicht, er is bij ons niets verplicht. Ja, jullie moeten uiteindelijk het tentamen doen, maar we
bieden materiaal aan, dus maak er gebruik van zou ik zeggen maar we gaan dat niet controleren. Er staan daar vragen op blackboard, er
zitten vragen in dat boek, de antwoorden staan er ook bij, of althans, als je die computer assignment gemaakt hebt dan krijg je na afloop te
melden welke vragen goed waren en welke vragen fout waren. Dus dat is een ondersteuning voor jullie bij het kennismaken met de materie
en het leren van de stof. En oude tentamens hebben we daar ook bij staan, dus dan kun je ook nog eens oefenen en kijken wat er
ongeveer gevraagd wordt. En dan gaan we college geven de komende periode.
3
4
Oh ja, dus over het boek, jullie hoeven niet het hele boek te kennen. Dat boek wordt zowel gebruikt bij 30-11, als bij het volgende college
34-20, wat een a keuzevak is voor de mensen die watermanagement gaan doen, en de hoofdstukken die voor 30-11 gevraagd worden op
het tentamen staan hier aangegeven. En die presentatie komt ook weer op blackboard zoals jullie weten, inclusief deze video opname.
Dan gaan we deze colleges geven, dus 7 keer de komende periode vanaf nu, en ik wil het dit jaar zo doen dat in het eerste uur vertel ik een
beetje de grote lijn van het betreffende onderwerp. De belangrijkste punten, ik probeer daar wat kleuring aan te geven. Wat is nou
belangrijk en wat minder. En het tweede uur heb ik steeds een van de promovendi, vandaag is dat Doris, die dan iets gaan vertellen over
hun eigen onderwerp, hun eigen onderzoek, hun eigen project, wat een stukje actualiteit geeft, en kleuring, verdieping, van het betreffende
onderwerp. En ik heb het zo georganiseerd dat dat steeds, als het goed is, goed op elkaar aansluit en jullie een goed beeld geven van de
stof, zodat je straks het tentamen ook makkelijk kunt maken. Dat wil niet zeggen dat alle onderdelen van de verhalen van de promovendi
tentamenstof zijn. Dat zullen we zo her en der ook wel aangeven. Ja, zo'n promotieonderzoek dat gaat natuurlijk veel dieper dan jullie nu in
het derde jaar hoeven te weten, maar het gaat meer om de beeldvorming, de kleuring en het begrip van de materie. Dan hebben we een
excursie gepland naar de Berenplaat, de grote zuiveringsinstallatie bij Rotterdam, bij Spijkenisse om precies te zijn, op 11 oktober. Ook dat
is niet verplicht, alles is facultatief bij ons. Daar hebben zich tot nu toe een stuk of 60 mensen aangemeld. De inschrijving sluit op 1 oktober
hebben we gezegd, omdat bij de waterbedrijven tegenwoordig ook strikte veiligheidsvereisten enzo zijn na de aanslagen in New York. Je
moet daar precies opgeven wie er allemaal komen, met naam enzo en wij moeten daar voor instaan ook, dat er geen vervelende dingen
gebeuren, en er moeten natuurlijk ook bussen gereserveerd worden en we krijgen daar lunch geserveerd. Dus de mensen die zich
opgegeven hebben die krijgen nog een mailtje binnenkort, kort na 1 oktober, met een bevestiging, en degene die zich niet opgegeven
hebben die gaan niet mee. En ik ga er ook van uit dat degenen die zich wel opgegeven hebben, dat die ook komen he, het is natuurlijk een
beetje vervelend tegenover de organisatoren als we daar met veel minder mensen zouden aankomen dan we aangemeld hebben. We zullen
proberen, ik heb wat vragen gekregen over dat er 's middags verplichte practica zouden zijn van constructieleer en statistiek geloof ik, dus
we zullen proberen om tijdig weer terug te zijn. Dat zal zeker niet om half 2 zijn, dus ik denk dat we ongeveer om half 3 terug zullen zijn, en
we vertrekken gewoon na het college op donderdag, dus om half 11. Ik weet niet of, even kijken of ik al ga beginnen,
4. Assessment of tag clouds by lecturer
Assessment approach
The tag clouds produced in this research project (reported in Annex E and Annex F) have
been evaluated by the lecturer of this course. These tag clouds have been produced in black
and white with the same font face, in order to have only the font size as a distinctive
element.
This assessment was done in 2 steps:
• quality assessment of the original tag clouds
• quality assessment of the modified tag clouds (uniform, max 15 words)
Original tag clouds
[Figures: original tag clouds 1 to 10]
The lecturer of the course was asked to assess the quality of the tag clouds using his own
criteria. His main criteria for this assessment were:
• a limited number of words to increase readability
• showing the proper words
The results of the first assessment are given in Table 4.1.
Table 4.1: Tag cloud assessment of original tag clouds
ID
Description
(source)
Cleaned
(*)
Nr of
words
1
Slide titles
1
35
2
3
Slide content
Slide titles and slide
content
Slide notes
Human subtitles A
Human subtitles B
Human subtitles C
1
1
100
100
1
1
1
100
100
100
25
2
4
5
6
7
8
Human subtitles,
nouns only A
9
Human subtitles,
nouns only B
10
SHoUT output, nouns
only
(*) 1 = after removing common
Assessment results
General appearance
OK
Too many little words
Not OK
Not OK
Rank
3
4
15
Not OK
Not OK
Not OK
OK
Too many irrelevant words
OK
2
15
OK
1
2
15
OK
Word "Chloor" is missing
Dutch words ; 2 = nouns only
1
2
The following conclusions were drawn from these results:
• tag clouds are only useful at a maximum of 15 words
• new tag clouds have to be produced for further assessment
Modified tag clouds
Based on the results of the first assessment, a number of new tag clouds have been
produced for re-assessment. All these modified tag clouds include 15 words.
[Figures: modified tag clouds 1 to 10]
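A minimal sketch of how a tag cloud limited to 15 words could be derived from a text source is given below. It is an illustration under stated assumptions, not the procedure used in this project: the stop-word list is a small hypothetical sample, the font size range (10 to 40) is arbitrary, and the function name `tag_cloud` is invented for this example. Word counts are mapped linearly to font sizes, so a source with little variance in word count yields little variance in font size.

```python
import re
from collections import Counter

# Hypothetical, very small sample of common Dutch words (assumption:
# the list actually used in the project is not reproduced here).
COMMON_DUTCH_WORDS = {"de", "het", "een", "en", "van", "dat", "is", "in", "we", "ik"}


def tag_cloud(text, max_words=15, min_size=10, max_size=40,
              stop_words=COMMON_DUTCH_WORDS):
    """Return {word: font_size} for the most frequent non-stop words."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    top = counts.most_common(max_words)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    span = highest - lowest
    sizes = {}
    for word, count in top:
        # Linear scaling between min_size and max_size; if every word has
        # the same count, all words get max_size.
        scale = (count - lowest) / span if span else 1.0
        sizes[word] = round(min_size + scale * (max_size - min_size))
    return sizes


example = tag_cloud("water water water chloor chloor boek de de de de")
```

In this example "water" receives the largest font size and "boek" the smallest, while "de" is dropped as a common word.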
In this second assessment the lecturer was also asked to point out the words that he
considered superfluous in the tag cloud.
ID
1
Removed words
the
leven
doet
and
2
blackboard
dijk
prof
no
Remaining words
practices
gezondheidstechniek
waterleidingingenieur
drinking
principles
principes
gezond
drinkwater
water
technology
schoon
gezondheidstechniek
niet-aangesloten
drinkwaterafvalwateroppervlaktewatergrondwater
watermanagement
sectie
per
afdeling
3
afdeling
doet
the
and
no
4
wel
groot
zoals
waar
zien
heel
natuurlijk
gaat
tijdens
we
ook
die
ik
dat
dan
een
in
dus
de
je
en
het
is
van
gaan
weer
moeten
nou
gaat
maken
zien
heel
goed
wel
jullie
plaatje
keer
jaar
soort
dingen
5
6/7
8
9
soort
keer
dingen
plaatje
leakage
water
m3/h
demand
waterleidingingenieur
groundwater
practices
watermanagement
drinking
water
gezondheidstechniek
demand
m3/h
principles
project
onderzoek
water
afvalwater
civiel
nederland
nederland
grondwater
natuurlijk
water
oppervlaktewater
grondwater
water
chloor
drinkwater
nederland
stoffen
drinkwatervoorziening
vragen
boek
grondwater
nederland
boek
drinkwater
jaar
10
drinkwatervoorziening
water
stoffen
oppervlaktewater
vragen
chloor
nederland
grondwater
oppervlaktewater
drinkwater
stoffen
wereld
onderzoek
kwaliteit
water
plaatje
soort
jaren
jaar
dingen
mensen
The results of the second assessment are given in Table 4.2. In this table the number of
deleted words has been ranked (lowest = 1, etc) and added to the appearance ranking,
giving a total rank score.
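The scoring described above can be sketched as follows. This is only an illustration: the function name `total_rank_scores` and the example values are hypothetical, and the handling of ties (equal numbers of deleted words share the same rank) is an assumption, since the text does not state how ties were ranked.

```python
def total_rank_scores(deleted_words, appearance_rank):
    """Add the rank of the number of deleted words (lowest = 1) to the
    appearance rank, per tag cloud ID.

    Assumption: tag clouds with the same number of deleted words share
    the same rank (dense ranking).
    """
    distinct = sorted(set(deleted_words.values()))
    deleted_rank = {cloud: distinct.index(n) + 1
                    for cloud, n in deleted_words.items()}
    return {cloud: deleted_rank[cloud] + appearance_rank[cloud]
            for cloud in deleted_words}


# Hypothetical example values, not taken from Table 4.2.
scores = total_rank_scores(
    deleted_words={"A": 4, "B": 7, "C": 4},
    appearance_rank={"A": 3, "B": 1, "C": 2},
)
```

A lower total score indicates a better overall ranking.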
Table 4.2: Tag cloud assessment of modified tag clouds (all 15 words)
ID
Description
(source)
Cleaned
(*)
1
Slide titles
1
Words
deleted
4
2
Slide content
1
7
3
Slide titles and slide 1
content
4
Slide notes
1
5
Human subtitles A
6/7
Human subtitles B
1
8
Human subtitles,
2
nouns only A
9
Human subtitles,
2
nouns only B
10
SHoUT output,
2
nouns only
(*)1 = after removing common Dutch words
Assessment results
5
Total
rank
6
7
13
8
10
9
15
11
5
6
9
4
1
13
18
12
3
5
3
5
6
2
7
5
General
appearance
Many same sized
(small) words
Too many same sized
(small) words
Too many same sized
(small) words
Word "Chloor" is
missing
; 2 = nouns only
Rank
Table 4.2 shows that the two tag clouds from nouns in the subtitles have the best overall
ranking. These two tag clouds contain the same words, but differ in font and layout of
the words. The lecturer preferred the more readable font (Coolvetica) over the less
readable font (Vigo).
The lowest number of "deleted words" was obtained from the slide titles. However, the
resulting tag cloud has a very low variance in font size, so it did not draw attention to
special words. The variance in word count in subtitles is much larger, giving a more
pronounced picture.
The tag cloud from SHoUT output has a lower ranking because it misses an important word
and has more "deleted words". The other tag clouds were significantly less
appreciated.
The following conclusions were drawn from these results:
• tag clouds should contain fewer than 15 words
• tag clouds should be obtained from "nouns only"
• tag clouds from subtitles (or speech recognition) are preferred over tag clouds from slide
titles (or slide content / slide notes) because of their larger variance in font size
• tag clouds need a "best readable font"
• tag clouds might be improved by removing bad words chosen by the lecturer
The use of colored tag clouds was not evaluated, since this may depend largely on the
personal preference of a lecturer.
5. Searching in recorded lectures
Now that a database is available with all the relevant data for the recorded lectures of course
CT3011, a search engine can be built to query this data. This has been done under the name
"Collegerama Lecture Search". It offers a layered search engine that allows the user to
choose the sources of data in which he/she wants to search. The interface of this search
engine is shown in Figure 5.1.
Figure 5.1: Collegerama lecture search
When the user only selects the lecture titles and chapters without any additional query text,
the system will generate a table of contents of all the lectures. This list is based on the
information provided by the lecturer during post-processing, so this information should be
100% accurate and relevant for each lecture. A generated list is shown in Figure 5.2.
Figure 5.2: Table of contents generated by Collegerama lecture search
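The layered behaviour described above can be sketched in a few lines. This is an in-memory illustration only, not the implementation of Collegerama Lecture Search: the `Record` fields, the source names and the function `search` are assumptions made for this example. Selecting only the title sources with an empty query returns every title, which corresponds to the generated table of contents.

```python
from dataclasses import dataclass


@dataclass
class Record:
    source: str    # hypothetical source names, e.g. "lecture_title", "subtitle"
    lecture: int   # lecture number
    start: float   # start time in seconds from the start of the recording
    end: float     # end time in seconds
    text: str


def search(records, sources, query=""):
    """Return records from the selected sources whose text contains the query.

    An empty query matches every record of the selected sources.
    """
    needle = query.strip().lower()
    hits = [r for r in records
            if r.source in sources and needle in r.text.lower()]
    return sorted(hits, key=lambda r: (r.lecture, r.start))


records = [
    Record("lecture_title", 2, 0.0, 2700.0, "Grondwater"),
    Record("subtitle", 1, 12.0, 15.5, "het grondwater in Nederland"),
    Record("lecture_title", 1, 0.0, 2700.0, "Inleiding"),
]
table_of_contents = search(records, {"lecture_title"})
layered_hits = search(records, {"lecture_title", "subtitle"}, "grondwater")
```

Because each hit keeps its `source`, an interface built on top of such a result set could color each row by source, as in the layered result set of the prototype.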
When the search engine returns a result set, each row is color coded based on the source of
the information that is being displayed. This gives the user an idea of the granularity of his
search results and he can choose to either remove or add more sources to his result set to
expand or limit the number of returned rows. An example of this is shown in Figure 5.3.
Figure 5.3: Layered result set returned by Collegerama Lecture Search
6. Evaluation of searching in recorded lectures
A relevant evaluation method for the Collegerama lecture search is the "known-item search".
With this method, the search engine is tested on the retrieval of known items or selected
keywords.
For this research project, known-item testing is done for:
• comparing the retrieval rate for ASR output versus full subtitling
• comparing the retrieval rate for all text types in Collegerama lecture search
Comparing subtitles and ASR output in search
Comparing the retrieval rate for ASR output versus full subtitling is done on two subsets of
"known items" or keywords:
• most-used words
• most important words
Most used words
The retrieval of the 20 most-used words from subtitles data versus ASR data is presented in
Table 6.1. The data has been extracted from lecture #15. The human-made subtitles
contain 6,970 words and the ASR output contains 7,351 words.
Table 6.1: Retrieval of 20 most used words from subtitles versus ASR
Known item   Human-made subtitles (ref)   ASR        Word accuracy /
(word)       (rank)      (number)         (number)   Retrieval rate (%)
dus          13          93               22         24%
we           10          128              46         36%
het          4           220              155        70%
dat          1           269              218        81%
van          7           162              132        81%
ook          9           134              109        81%
ik           14          88               74         84%
op           19          54               46         85%
die          11          113              99         88%
een          6           174              158        91%
daar         16          76               72         95%
je           12          107              117        109%
dan          15          80               89         111%
met          20          53               59         111%
de           2           231              276        119%
in           8           151              192        127%
zijn         18          54               75         139%
is           5           181              270        149%
en           3           225              359        160%
niet         17          55               165        300%
Total                    2,648            2,733      103%
Table 6.1 shows that the 20 most common words account for 38% (2648/6970) of the words in
the human-made subtitles and 37% (2733/7351) of the words in the ASR output. The larger
number of words in the ASR output can be explained by the tendency of SHoUT to decode long
words into smaller components.
The retrieval rate in the last column is the number of occurrences in the ASR output divided
by the number of occurrences in the subtitles.
The words "dus" and "we" have a WA-value (word accuracy) below 50%, or a WER-value
(word error rate) above 50%. When only ASR data is available, the word "dus" will be
retrieved only 24% of the time and the word "we" 36% of the time. It is assumed
that human-made subtitles have a WA-value of 100%.
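As a worked example, the retrieval rate column of Table 6.1 can be recomputed from the two count columns. The sketch below uses four rows of the table; the function name `retrieval_rate` is chosen for this example only.

```python
def retrieval_rate(asr_count, reference_count):
    """ASR count as a percentage of the reference (subtitle) count."""
    if reference_count <= 0:
        raise ValueError("reference count must be positive")
    return 100.0 * asr_count / reference_count


# word -> (count in human-made subtitles, count in ASR output), from Table 6.1
counts = {"dus": (93, 22), "we": (128, 46), "en": (225, 359), "niet": (55, 165)}
rates = {word: round(retrieval_rate(asr, ref))
         for word, (ref, asr) in counts.items()}
```

Values above 100% mean that the ASR output contains the word more often than the subtitles do.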
The words "en" and "niet" have a WA-value above 150% and a WER-value above 150%.
This means that the word "en" will be retrieved 1.6 times and the word "niet" 3 times
as often as it actually occurs. This high retrieval is again caused by the tendency of SHoUT
to split up longer words.
Important words
The retrieval of the 15 most-used nouns from subtitles data versus ASR data is presented in
Table 6.2. The data has been extracted from the same lecture. In determining the retrieval
of the word "water", compound words such as "drinkwater", "drinkwatervoorziening",
"grondwater" and "oppervlaktewater" have not been included (the same applies to "drinkwater" in
"drinkwatervoorziening"). This table also shows the 5 words that the lecturer marked
as less relevant in the assessment of tag clouds (see chapter 4), leaving the ten "most
important words" or "ok words".
Table 6.2: Retrieval of 15 most used nouns from subtitles versus ASR
Known item (word)       Lecturer   Human-made subtitles (ref)   ASR        Word accuracy /
                        check      (rank)      (number)         (number)   Retrieval rate (%)
chloor                  ok         13          10               0          0%
drinkwatervoorziening   ok         5           16               4          25%
boek                    ok         8           15               5          33%
oppervlaktewater        ok         7           15               7          47%
plaatje                 -          11          11               6          55%
vragen                  ok         12          10               6          60%
soort                   -          14          9                7          78%
water                   ok         1           39               33         85%
stoffen                 ok         15          9                8          89%
grondwater              ok         3           21               20         95%
Nederland               ok         2           36               35         97%
dingen                  -          6           16               17         106%
keer                    -          9           13               16         123%
drinkwater              ok         10          13               16         123%
jaar                    -          4           16               28         175%
Total                                          249              208        84%
Total ok words                                 184              134        73%
The most remarkable result in the retrieval rate is the word "chloor", which the lecturer
indicated as one of the ten most important words. SHoUT did not recognize this word at all,
as it is an uncommon word in the Dutch language. This item is therefore not retrieved
from the lecture if no correct subtitles are available.
A retrieval rate above 100%, as for "jaar", "keer" and "drinkwater", shows that when
searching for compound words in the ASR output, it is better to search for word components
instead of full words. This is illustrated by the low retrieval rate for the word
"drinkwatervoorziening".
A retrieval rate above 50% is expected from the ASR output, as this is the accepted
quality level for ASR engines. The word "boek" has a lower retrieval rate in the ASR output,
which shows that this word is difficult for SHoUT to decode. The lecturer also
indicated this word as one of the ten most important words.
Searching for different text-types
In order to determine the most important text type for searching, the ten most important
known items of lecture #15 have been searched. The results of this known-item search in
Collegerama lecture search are shown in Table 6.3.
Table 6.3: Retrieval of 10 most important items/word from different text types
Known item (word)       Subtitles   ASR    Slide    Slide   Slide   Slide   Lecture   Lecture
                                           titles   text    t+t     notes   title     chapter
water                   39          33     5        3       8       0       0         0
Nederland               36          35     0        0       0       3       1         0
grondwater              21          20     0        5       5       1       0         0
drinkwatervoorziening   16          4      0        0       0       2       1         0
boek                    15          5      0        1       1       0       0         0
oppervlaktewater        15          7      0        5       5       2       0         0
drinkwater              13          16     1        5       6       3       0         0
chloor                  10          0      0        0       0       0       0         0
vragen                  10          6      0        0       0       0       0         0
stoffen                 9           8      0        0       0       0       0         0
total                   184         134    6        19      25      11      2         0
                        100%        73%    3%       10%     14%     6%      1%        0%
non-retrieved words     0           1      8        5       5       5       8         10
                        0%          10%    80%      50%     50%     50%     80%       100%
The results in Table 6.3 show that the lecture titles and lecture chapter titles are of no
relevance for searching in lectures. These text types give a 0%-1% retrieval rate for the most
important words, and 80%-100% of these words give no results at all. These text types are
particularly suitable for navigation, but apparently not for searching.
To a lesser extent, the same holds true for slide content. These types give a retrieval rate of
3%-14% for the most important words and 50%-80% non-retrieved words. The retrieval rate
of ASR for the most important words is 73% and only 10% are non-retrieved words.
These results show that ASR gives a drastic increase in the retrieval rate over slide content.
The retrieval rate for the most important words is significantly higher than the overall word
correctness of ASR for this lecture (73% versus 46%). Having subtitles will further increase
the retrieval rate to an assumed 100% value, as human-made subtitles in real time-suppressed
environments have a tested word correctness of 96%-100%.
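The two summary figures per text type in Table 6.3 (total retrieval relative to the subtitles, and the share of non-retrieved words) can be recomputed as in the sketch below. The counts are copied from the table for two text types; the function name `summarize` is chosen for this example.

```python
# Occurrences of the ten most important words, in the row order of Table 6.3.
subtitles = [39, 36, 21, 16, 15, 15, 13, 10, 10, 9]
text_types = {
    "ASR": [33, 35, 20, 4, 5, 7, 16, 0, 6, 8],
    "Slide titles": [5, 0, 0, 0, 0, 0, 1, 0, 0, 0],
}


def summarize(counts, reference):
    """Return (retrieval rate %, non-retrieved words %) for one text type."""
    retrieval = round(100 * sum(counts) / sum(reference))
    non_retrieved = round(100 * sum(1 for c in counts if c == 0) / len(counts))
    return retrieval, non_retrieved


summary = {name: summarize(counts, subtitles)
           for name, counts in text_types.items()}
```

The two figures measure different things: a text type can have a moderate total while still missing most of the individual words, as the slide titles show.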
Duration per text type
The retrieval rate indicates how much of the items are found in a search, but not how long it
will take to really find this item. Searching an item in (non time-tagged) transcripts may
indicate the lecture in which the item is used, but the user has to watch/listen to the whole
lecture to really see the searched result. Assuming a constant speaking rate might give a best
guess to jump to the equivalent time-frame, but in most cases this is not suitable for the
user.
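The constant-speaking-rate guess mentioned above amounts to a linear interpolation over the word position. The sketch below is a hypothetical helper (the name `estimate_offset` is an assumption), using the 6,970 subtitle words and the 45:09 duration reported for lecture #15 as example input.

```python
def estimate_offset(word_index, total_words, duration_seconds):
    """Estimate the time offset (in seconds) of the word at word_index,
    assuming the lecturer speaks at a constant rate."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= word_index < total_words:
        raise ValueError("word_index out of range")
    return duration_seconds * word_index / total_words


# Lecture #15: 6,970 words in the subtitles, duration 45:09 (2,709 seconds).
halfway = estimate_offset(3485, 6970, 45 * 60 + 9)
```

Pauses, questions from the audience and changes in speaking pace all violate the constant-rate assumption, which is why time-tagged text is preferable.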
The time correctness of a search is related to the length or duration (end time minus start
time) of the related video fragment. The durations per text type in Collegerama lecture
search are shown in Table 6.4.
Table 6.4: Duration of text types in Collegerama lecture search for Course CT3011
Text type                                  Description            Minimum   Maximum   Mean
                                                                  (sec)     (sec)     (sec)
Lecture title, Transcript (lecture)        Lecture recording      1,351     3,231     2,451
Lecture chapter                            Chapters by lecturer   15        2,197     592
Slide title, Slide content, Slide notes,   Slide data             2         611       55
Transcript (slide)
Transcript (sentence)                      Subtitles              0.6       6.0       3.4
Transcript (word)                          ASR output             0.0       3.4       0.3
Table 6.4 shows that the duration for slides may vary between 2 seconds and 7:28 minutes,
with a mean value of 58 seconds. This means that on average the user has to wait for nearly
1 minute to encounter the searched item. This duration might be acceptable for recorded
lectures, as most spoken text is surrounded by relevant text. In general, all spoken text
belongs to that particular slide, as the lecturer more or less explains the slide content.
More detailed searching for a specific sentence can be achieved by searching in subtitles or
time-tagged words (such as the ASR output of SHoUT). With time-tagged words, it is possible
to show a kind of karaoke-type subtitling, with sentences and coloring of the spoken word.
An example of this can be seen at the website for Radio Oranje, in which old transcripts have
been time-tagged by ASR (SHoUT).
Multiple-keyword search
Students might use a search engine for recorded lectures while preparing for an exam.
They might be looking for a passage that was once mentioned in an earlier lecture or a
specific exam question that was discussed. These searches probably contain more than one
keyword, for example the keywords “stoffen” and “grondwater”.
The search engine on individual subtitles will not give a positive result, as these keywords are
never used in one particular sentence and won't be retrieved as one record in the database.
The same holds true for searching on individual words from ASR.
A solution to this problem is offered by storing all spoken text belonging to a slide, called a
slide transcript. The time-code contains a start and end time for the slide. The same is done
for an entire lecture. This will allow for the searching of combined keywords. The student can
use the slide or lecture timeframe as the starting point for further viewing.
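The idea of combining keywords at slide level can be sketched as follows. This is an illustration on in-memory data, not the database implementation of the prototype; the function name `slides_with_all_keywords` and the input layout (a list of time-tagged words plus a sorted list of slide start times) are assumptions.

```python
from bisect import bisect_right


def slides_with_all_keywords(words, slide_starts, keywords):
    """Return the indices of slides whose spoken text contains every keyword.

    words        -- list of (time_in_seconds, word) pairs, e.g. from ASR output
    slide_starts -- sorted list of slide start times in seconds
    keywords     -- the keywords that must all occur during the same slide
    """
    wanted = {k.lower() for k in keywords}
    found_per_slide = {}
    for time, word in words:
        w = word.lower()
        if w in wanted:
            # Index of the slide that is shown at this point in time.
            slide = bisect_right(slide_starts, time) - 1
            if slide >= 0:
                found_per_slide.setdefault(slide, set()).add(w)
    return sorted(s for s, found in found_per_slide.items() if found == wanted)


# Hypothetical example: three slides starting at 0, 60 and 120 seconds.
words = [(10, "stoffen"), (70, "grondwater"), (80, "Stoffen"), (130, "grondwater")]
matches = slides_with_all_keywords(words, [0, 60, 120], ["stoffen", "grondwater"])
```

Only the second slide (index 1) contains both keywords in this example, so its start time would be the entry point offered to the student.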
Spoken text per slide is included in the database but not implemented in the prototype of
the web interface. This feature has been evaluated directly on the database. This
approach results in the same data being stored in multiple records. Transcripts per lecture
could be searched by a search engine using the transcript per word (ASR output). The
approach used gives additional flexibility in the layout of transcripts, which enables more
sophisticated output options. A lecture transcript can be printed in a more convenient way if
additional line breaks are included. This option is not available if lecture transcripts are
automatically derived from word transcripts.
If a multiple-keyword search is done on the ASR data for the words “stoffen” and
“grondwater” in lecture #15, 8 results are returned. When this result set is clustered by slide,
only 2 of the 29 slides contain both keywords. The slide
timeframe 24:07-25:09 gives 1 paired result and the slide timeframe 29:47-33:35 gives 4
paired results. The total viewing time for the combined results is reduced from the lecture
duration of 45:09 minutes to only 1:02 + 3:48 = 4:50 minutes.
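The viewing-time arithmetic above can be checked with a few lines of code. This is only a sketch; the helper names `to_seconds` and `to_timecode` are hypothetical.

```python
def to_seconds(timecode):
    """Convert 'mm:ss' or 'h:mm:ss' to seconds."""
    seconds = 0
    for part in timecode.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def to_timecode(seconds):
    """Format a number of seconds as 'm:ss'."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes}:{rest:02d}"


# Time frames of the two slides that contain both keywords.
matching_frames = [("24:07", "25:09"), ("29:47", "33:35")]
viewing_seconds = sum(
    to_seconds(end) - to_seconds(start) for start, end in matching_frames
)
lecture_seconds = to_seconds("45:09")

print(to_timecode(viewing_seconds))  # 4:50
share = round(100 * viewing_seconds / lecture_seconds, 1)
print(share)  # percentage of the lecture that still has to be viewed
```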
Keyword search for all lectures
The 4 most important words of lecture #15, the ones with the best ASR accuracy, can be
used for evaluation of the search engine on all lectures of the course. It is assumed that
these 4 keywords (“stoffen”, “grondwater”, “Nederland” and “dingen”) will also give a high
accuracy for the other lectures, despite the fact that most of these lectures were given by
other lecturers.
The results of this evaluation test are shown in Table 6.5.
Table 6.5: Occurrence of 4 important words in all lectures
Lecture | Nederland | dingen | grondwater | stoffen
Total occurrence | 283 | 199 | 223 | 54
Portion in #15 | 12% | 9% | 9% | 15%
Number of lectures | 26 | 28 | 18 | 8
Occurrences per lecture for lectures 1 to 28, listed per column from top to bottom (empty cells are omitted):
Nederland: 9, 13, 5, 4, 15, 4, 10, 6, 7, 2, 11, 8, 10, 12, 35, 16, 17, 6, 6, 9, 30, 23, 10, 4, 4, 7
dingen: 7, 17, 10, 9, 8, 10, 9, 5, 13, 9, 5, 6, 8, 3, 17, 1, 8, 6, 9, 4, 6, 4, 8, 1, 4, 5, 5, 2
grondwater: 2, 3, 18, 8, 2, 5, 2, 11, 20, 2, 12, 26, 2, 75, 3, 21, 10, 1, -
stoffen: 1, 8, 11, 11, 3, 4, 15, 1, -
Table 6.5 shows that lecture #15 is the most important lecture for the keyword “Nederland”,
with the highest occurrence (35 times). However, this keyword is found in all but 2 lectures,
with a high occurrence in lecture #21 (30 times) and lecture #23 (23 times) as well.
The keyword “dingen” is found in all lectures, with equal occurrence in lecture #15 and
lecture #2.
The keyword “grondwater” is found in 18 lectures. Lecture #21 (“Grondwaterzuivering”)
seems to be the most important lecture for this item, with the highest occurrence.
The keyword “stoffen” is found in only 8 lectures, with lecture #23
(“Oppervlaktewaterzuivering”) as the most relevant lecture for this item.
Multiple keyword search for all lectures
The two keywords occurring in the lowest number of lectures (“stoffen” and “grondwater”)
have been used in a multiple keyword search. Table 6.6 gives the results of this search.
Table 6.6: Occurrence of combinations of 2 important words in all lectures
Lecture | stoffen | grondwater | stoffen + grondwater in lecture | stoffen + grondwater in slide
Total occurrence | 54 | 223 | 52 | 23
Number of lectures | 8 | 18 | 7 | 6
Number of slides | - | - | 270 | 17
Total duration | | | 5:22:05 | 20:57
Occurrences per lecture for lectures 1 to 28, listed per column from top to bottom (empty cells are omitted):
stoffen: 1, 8, 11, 11, 3, 4, 15, 1, -
grondwater: 2, 3, 18, 8, 2, 5, 2, 11, 20, 2, 12, 26, 2, 75, 3, 21, 10, 1, -
stoffen + grondwater in lecture: 8, 11, 11, 2, 4, 15, 1, -
stoffen + grondwater in slide: 1+4, 1+1+1, 1+1+2+1+2, 0, 1+1+1, 1+2+1, 1, -
Table 6.6 shows that both keywords are present in 7 lectures. Without a multiple-keyword
search per slide, seeing all results requires a total viewing time of 5:22:05 hours.
If the search is done on slide level, only 6 lectures are retrieved, with a total of 17 slides in
which the combination of keywords is found. This reduces the viewing time to only 20:57
minutes. Searching on slide level thus reduces the total viewing time to 6.5% of the original,
a reduction of 93.5%.
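Written out, with both durations from Table 6.6 converted to seconds, the reduction is:

```latex
\frac{t_{\text{slide}}}{t_{\text{lecture}}}
  = \frac{20{:}57}{5{:}22{:}05}
  = \frac{1257\ \text{s}}{19325\ \text{s}}
  \approx 0.065 = 6.5\%,
\qquad
1 - 0.065 = 0.935 = 93.5\%
```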
Precision and recall measurement
For the Collegerama lecture search engine, precision and recall measurements can be
carried out on the data of lecture #15, in which the human-made subtitles can be considered
the ground truth.
The slides of the lecture are used as the objects for these tests. A slide is regarded as a
self-contained subset of a lecture in which the related subject is explained. In this way, a slide
can be considered as an object or a document.
The test was done on 3 of the 10 “important words” of lecture #15: “stoffen”, “grondwater”
and “chloor”. The results of these tests are shown in Table 6.7, Table 6.8 and Table 6.9.
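With slides as documents, recall is the share of relevant slides that are retrieved and precision is the share of retrieved slides that are relevant. The sketch below illustrates this; the function name is hypothetical and the slide numbers are illustrative, chosen to give 5 relevant slides, 6 retrieved slides and 5 relevant slides retrieved.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of slide numbers.

    A value is None when it is undefined (nothing retrieved or nothing relevant).
    """
    relevant, retrieved = set(relevant), set(retrieved)
    found = relevant & retrieved
    precision = len(found) / len(retrieved) if retrieved else None
    recall = len(found) / len(relevant) if relevant else None
    return precision, recall


# Illustrative slide numbers: 5 relevant, 6 retrieved, 5 of them relevant.
relevant_slides = {7, 16, 19, 21, 23}
retrieved_slides = {7, 15, 16, 19, 21, 23}

precision, recall = precision_recall(relevant_slides, retrieved_slides)
print(f"recall {recall:.0%}, precision {precision:.0%}")  # recall 100%, precision 83%

# A text type that retrieves nothing has recall 0% and an undefined precision.
empty_precision, empty_recall = precision_recall(relevant_slides, set())
```

An undefined precision corresponds to the “-” entries in the tables for text types in which the keyword does not occur.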
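Table 6.7: Keyword “stoffen” per slide in different text types
Slide # | Time frame | Subtitles | ASR | Slide titles | Slide content | Slide notes | Chapter title | Lecture title
1 | 0:00 – 7:28 | | | | | | |
2 | 7:28 – 9:48 | | | | | | |
3 | 9:48 – 10:20 | | | | | | |
4 | 10:20 – 13:12 | | | | | | |
5 | 13:12 – 13:47 | | | | | | |
6 | 13:47 – 13:50 | | | | | | |
7 | 13:50 – 14:39 | | | | | | |
8 | 14:39 – 15:46 | | | | | | |
9 | 15:46 – 16:39 | | | | | | |
10 | 16:39 – 17:20 | | | | | | |
11 | 17:20 – 18:06 | | | | | | |
12 | 18:06 – 18:50 | | | | | | |
13 | 18:50 – 20:39 | | | | | | |
14 | 20:39 – 23:11 | | | | | | |
15 | 23:11 – 24:07 | | | | | | |
16 | 24:07 – 25:09 | 1 | 1 | | | | |
17 | 25:09 – 27:08 | 2 | 2 | | | | |
18 | 27:08 – 29:47 | | | | | | |
19 | 29:47 – 33:35 | 5 | 4 | | | | |
20 | 33:35 – 35:49 | | | | | | |
21 | 35:49 – 38:48 | | | | | | |
22 | 38:48 – 39:08 | | | | | | |
23 | 39:08 – 39:27 | | | | | | |
24 | 39:27 – 40:02 | | | | | | |
25 | 40:02 – 40:36 | 1 | 1 | | | | |
26 | 40:36 – 41:28 | | | | | | |
27 | 41:28 – 43:23 | | | | | | |
28 | 43:23 – 43:59 | | | | | | |
29 | 43:59 – 45:09 | | | | | | |
Occurrence | | 9 | 8 | 0 | 0 | 0 | 0 | 0
Relevant slides | | 4 | 4 | 4 | 4 | 4 | 4 | 4
Slides retrieved | | 4 | 4 | 0 | 0 | 0 | 0 | 0
Relevant slides retrieved | | 4 | 4 | 0 | 0 | 0 | 0 | 0
Recall | | 100% | 100% | 0% | 0% | 0% | 0% | 0%
Precision | | 100% | 100% | - | - | - | - | -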
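Table 6.8: Keyword “grondwater” per slide in different text types
Slide # | Time frame | Subtitles | ASR | Slide titles | Slide content | Slide notes | Chapter title | Lecture title
1 | 0:00 – 7:28 | | | | | | |
2 | 7:28 – 9:48 | | | | | | |
3 | 9:48 – 10:20 | | | | | | |
4 | 10:20 – 13:12 | | | | 1 | | |
5 | 13:12 – 13:47 | | | | | | |
6 | 13:47 – 13:50 | | | | | | |
7 | 13:50 – 14:39 | 1 | 1 | | | | |
8 | 14:39 – 15:46 | | | | | | |
9 | 15:46 – 16:39 | | | | | | |
10 | 16:39 – 17:20 | | | | | | |
11 | 17:20 – 18:06 | | | | | | |
12 | 18:06 – 18:50 | | | | | | |
13 | 18:50 – 20:39 | | | | | | |
14 | 20:39 – 23:11 | | | | | 1 | |
15 | 23:11 – 24:07 | | 1 | | | | |
16 | 24:07 – 25:09 | 2 | 1 | | | | |
17 | 25:09 – 27:08 | | | | | | |
18 | 27:08 – 29:47 | | | | | | |
19 | 29:47 – 33:35 | 7 | 7 | | | | |
20 | 33:35 – 35:49 | | | | | | |
21 | 35:49 – 38:48 | 9 | 8 | | | | |
22 | 38:48 – 39:08 | | | | | | |
23 | 39:08 – 39:27 | 2 | 2 | | | | |
24 | 39:27 – 40:02 | | | | | | |
25 | 40:02 – 40:36 | | | | | | |
26 | 40:36 – 41:28 | | | | | | |
27 | 41:28 – 43:23 | | | | | | |
28 | 43:23 – 43:59 | | | | | | |
29 | 43:59 – 45:09 | | | | | | |
Occurrence | | 21 | 20 | 0 | 1 | 1 | 0 | 0
Relevant slides | | 5 | 5 | 5 | 5 | 5 | 5 | 5
Slides retrieved | | 5 | 6 | 0 | 1 | 1 | 0 | 0
Relevant slides retrieved | | 5 | 5 | 0 | 0 | 0 | 0 | 0
Recall | | 100% | 100% | 0% | 0% | 0% | 0% | 0%
Precision | | 100% | 83% | - | 0% | 0% | - | -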
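Table 6.9: Keyword “chloor” per slide in different text types
Slide # | Time frame | Subtitles | ASR | Slide titles | Slide content | Slide notes | Chapter title | Lecture title
1 | 0:00 – 7:28 | | | | | | |
2 | 7:28 – 9:48 | | | | | | |
3 | 9:48 – 10:20 | | | | | | |
4 | 10:20 – 13:12 | | | | | | |
5 | 13:12 – 13:47 | | | | | | |
6 | 13:47 – 13:50 | | | | | | |
7 | 13:50 – 14:39 | | | | | | |
8 | 14:39 – 15:46 | | | | | | |
9 | 15:46 – 16:39 | | | | | | |
10 | 16:39 – 17:20 | | | | | | |
11 | 17:20 – 18:06 | | | | | | |
12 | 18:06 – 18:50 | | | | | | |
13 | 18:50 – 20:39 | | | | | | |
14 | 20:39 – 23:11 | | | | | | |
15 | 23:11 – 24:07 | | | | | | |
16 | 24:07 – 25:09 | | | | | | |
17 | 25:09 – 27:08 | | | | | | |
18 | 27:08 – 29:47 | | | | | | |
19 | 29:47 – 33:35 | 8 | | | | | |
20 | 33:35 – 35:49 | | | | | | |
21 | 35:49 – 38:48 | | | | | | |
22 | 38:48 – 39:08 | | | | | | |
23 | 39:08 – 39:27 | | | | | | |
24 | 39:27 – 40:02 | | | | | | |
25 | 40:02 – 40:36 | | | | | | |
26 | 40:36 – 41:28 | | | | | | |
27 | 41:28 – 43:23 | 1 | | | | | |
28 | 43:23 – 43:59 | | | | | | |
29 | 43:59 – 45:09 | | | | | | |
Occurrence | | 9 | 0 | 0 | 0 | 0 | 0 | 0
Relevant slides | | 2 | 2 | 2 | 2 | 2 | 2 | 2
Slides retrieved | | 2 | 0 | 0 | 0 | 0 | 0 | 0
Relevant slides retrieved | | 2 | 0 | 0 | 0 | 0 | 0 | 0
Recall | | 100% | 0% | 0% | 0% | 0% | 0% | 0%
Precision | | 100% | - | - | - | - | - | -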
Ranked search results
The search results have to be presented in a certain order. In this research project,
two ordering options have been evaluated:
• time-based
• rank-based
Time-based
With this ordering method, all results are sorted in chronological order. This makes sense for
recorded lectures, assuming that key items are introduced sequentially and are explained in
further detail later in the course.
In SQL Server, this can be accomplished by ordering the query results on Lecture_nr and
Start_time. The query that can be used for this is shown below:
-- full-text search for one keyword within one lecture, in chronological order
SELECT *
FROM Content
INNER JOIN Lectures ON Content.Lecture_id = Lectures.Lecture_id
WHERE CONTAINS (Text, 'stoffen')
-- Lecture_id occurs in both joined tables, so it has to be qualified
AND Content.Lecture_id = '15'
ORDER BY Lectures.Lecture_nr, Start_time, Content.Text_type
Rank-based
SQL Server has a function that ranks search results based on several factors:
• text length
• number of occurrences of search words/phrases
• proximity of search words/phrases in proximity search
• user-defined weights
The query that can be used for this is shown below:
-- CONTAINSTABLE returns the [KEY] and RANK of every matching record
SELECT *
FROM Content AS FT_TBL INNER JOIN
CONTAINSTABLE(Content, Text, 'stoffen') AS KEY_TBL
ON FT_TBL.Content_id = KEY_TBL.[KEY]
WHERE FT_TBL.Lecture_id = '15'
-- highest-ranked records first
ORDER BY KEY_TBL.RANK DESC;
Table 6.10: Ranked search results using CONTAINSTABLE for the word "stoffen"
Text type | Text | Key | Rank | Start time (ms)
9 | stoffen | 6223 | 192 | 1460970
9 | stoffen | 6514 | 192 | 1566710
9 | stoffen | 6532 | 192 | 1573320
9 | stoffen | 7516 | 192 | 1927800
9 | stoffen | 7551 | 192 | 1941580
9 | stoffen | 7553 | 192 | 1942270
9 | stoffen | 7593 | 192 | 1955460
9 | stoffen | 8875 | 192 | 2428180
8 | Allerlei stoffen die worden afgefiltreerd tussen het z… | 10115 | 192 | 1460700
8 | Water is een natuurlijke stof en de verontreiniging… | 10144 | 192 | 1563000
8 | dat gaat reageren met bepaalde stoffen die van nature… | 10235 | 192 | 1926200
8 | En dat zijn dus ongewenste stoffen. | 10239 | 192 | 1940200
8 | Dat zijn stoffen die giftig kunnen zijn. | 10240 | 192 | 1941900
7 | Een plaatje met een aantal kernbegrippen vast, het… | 89 | 137 | 1787000
6 | Na een ruime inlooptijd kunnen we beginnen met het… | 10456 | 123 | 0
8 | We zeggen nee, dat zijn ongewenste stoffen, die willen… | 10244 | 96 | 1954000
8 | En anderzijds, om verschillende soorten stoffen met… | 10376 | 96 | 2425500
8 | De interactie, de lozing van stoffen die eventueel plaats… | 10146 | 96 | 1571300
8 | Dus het oppervlaktewater bevat een volledige cocktail… | 10221 | 96 | 1879800
7 | En tenslotte doen we natuurlijk ook onderzoek, vooral… | 87 | 76 | 1509000
7 | dat zien we hier, zakt dat vanzelf de grond in. Dat… | 86 | 48 | 1447000
7 | Oppervlaktewater hebben we meervoudige barrieres. Op… | 95 | 48 | 2402000
Table 6.11: Ranked search results using CONTAINSTABLE for the word "grondwater"
Text type | Text | Key | Rank | Start time (ms)
6 | Na een ruime inlooptijd kunnen we beginnen met het… | 10456 | 240 | 0
7 | Als we naar de opzet van de infrastructuur kijken, dan z… | 91 | 205 | 2149000
9 | grondwater | 4559 | 160 | 847370
9 | grondwater | 6130 | 160 | 1425430
9 | grondwater | 6309 | 160 | 1492940
9 | grondwater | 7152 | 160 | 1797370
9 | grondwater | 7159 | 160 | 1802540
9 | grondwater | 7222 | 160 | 1825000
9 | grondwater | 7243 | 160 | 1831790
9 | grondwater | 7277 | 160 | 1843730
9 | grondwater | 7297 | 160 | 1850630
9 | grondwater | 7305 | 160 | 1853580
9 | grondwater | 8197 | 160 | 2172810
9 | grondwater | 8199 | 160 | 2174280
9 | grondwater | 8214 | 160 | 2182320
9 | grondwater | 8235 | 160 | 2190940
9 | grondwater | 8243 | 160 | 2193580
9 | grondwater | 8332 | 160 | 2227140
9 | grondwater | 8357 | 160 | 2235120
9 | grondwater | 8585 | 160 | 2316190
9 | grondwater | 8658 | 160 | 2349830
9 | grondwater | 8682 | 160 | 2357870
8 | het winnen van grondwater, | 9932 | 160 | 846900
7 | Een plaatje met een aantal kernbegrippen vast, het… | 89 | 160 | 1787000
8 | Ja, dat water wordt heel goed gefiltreerd en dat grond… | 10206 | 160 | 1822600
8 | dus grondwater is over het algemeen goed. | 10208 | 160 | 1831000
8 | en dat kan uiteindelijk ook in het grondwater terecht... | 10213 | 160 | 1849400
8 | maar gemiddeld gesproken is grondwater toch van een… | 10214 | 160 | 1851900
8 | Gebruik grondwater als het mogelijk is. | 10303 | 160 | 2181800
8 | Ja, want ik zie op de waddeneilanden wordt wel grond… | 10328 | 160 | 2254000
8 | Rotterdam heeft geen grondwater, Rotterdam heeft ook… | 10346 | 160 | 2315200
8 | Nou, grondwater dat plaatje. | 10355 | 160 | 2349500
8 | Als je een mental map hiervan maakt, van grondwater… | 10357 | 160 | 2354700
8 | kunstmatig grondwater te maken via die infiltratie. | 10324 | 160 | 2243800
7 | Nou, grondwater dat plaatje. Ik denk dat dit toch wel… | 93 | 106 | 2348000
4 | •27 sept.Inleidinggezondheidstechniek •1 okt.Waterkw… | 1731 | 80 | 620000
8 | Dus dan win je eigenlijk een soort kunstmatig grondwater. | 10120 | 80 | 1479900
8 | en andere verontreiningen bevat, maak je een soort… | 10122 | 80 | 1489400
8 | het feit dat we gebruik maken van grondwater en opper… | 10202 | 80 | 1795900
8 | Grondwater is ook in Nederland vaak nog van een hele… | 10203 | 80 | 1802700
8 | en dat wordt het een soort kunstmatig grondwater. | 10326 | 80 | 2250700
8 | wordt alleen maar grondwater gebruikt voor de drinkwa… | 10305 | 80 | 2190000
8 | Daar is grondwater beschikbaar, dat is van goede kwal... | 10306 | 80 | 2193100
8 | Zeewater, zout, precies he. Dus het grondwater hier is... | 10319 | 80 | 2225000
8 | Dus ja, hier kun je geen grondwater gebruiken, dus… | 10321 | 80 | 2234000
8 | vuilnisstortplaatsen, en die kunnen het grondwater ook … | 10211 | 80 | 1841700
7 | dat zien we hier, zakt dat vanzelf de grond in. Dat noe… | 86 | 80 | 1447000
5 | Een voorbeeld van een recent project waar ik in mijn… | 69 | 80 | 1239000
7 | We hadden natuurlijk gezondheidstechniek. Nou dat zal… | 77 | 40 | 830000
Table 6.12: Ranked search results using CONTAINSTABLE for the word "chloor"
Text type | Text | Key | Rank | Start time (ms)
7 | Een plaatje met een aantal kernbegrippen vast, het… | 89 | 288 | 1787000
8 | Amerikanen die vinden het vanzelfsprekend om chloor… | 10228 | 224 | 1907200
8 | het drinkwater smaakt ook naar chloor daar, | 10229 | 224 | 1910300
8 | ruikt ook naar chloor daar. | 10230 | 224 | 1912000
8 | namelijk we weten dat als je chloor toepast, | 10234 | 224 | 1923000
8 | Dus we willen chloor gewoon niet gebruiken. | 10245 | 224 | 1957100
8 | Er is nog een praktisch ander aspect en dat is dat water… | 10253 | 224 | 1989300
8 | en ook geen chloor. | 10394 | 224 | 2501500
6 | Na een ruime inlooptijd kunnen we beginnen met het… | 10456 | 160 | 0
8 | Chloor leidt gewoon tot die giftige verbindingen en dat… | 10250 | 112 | 1978900
8 | En heel bijzonder in internationaal verband, we… | 10227 | 112 | 1900900
7 | Nou, het resultaat daarvan is dan dat we dus... Aan de… | 97 | 44 | 2488000
Evaluation
Table 6.10, Table 6.11 and Table 6.12 show that the ASR results (text_type = 9, the
single-word records) are ranked high compared with the other types, because each word has its
own record. This means that the document length is effectively the smallest size possible.
According to the relevance ranking system Okapi BM25, such records are evaluated as being of
very high relevance.
(Source: http://nlp.uned.es/~jperezi/Lucene-BM25/)
Similar results can be expected for subtitles and slide titles in comparison with slide notes and
slide transcripts. These effects might be corrected by using user-defined weights for different
text types. However, this has not been tested in this research project. The current search
engine uses time-based ordering.
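The effect of document length can be illustrated with one common formulation of the Okapi BM25 term score. This is a sketch only: the ranking that SQL Server computes internally may differ, the collection statistics below are hypothetical, and the weights per text type are an assumed example of the user-defined weights mentioned above.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Okapi BM25 score of one query term for one document (record)."""
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)


# Hypothetical collection statistics, chosen only for illustration.
stats = dict(avg_doc_len=10.0, n_docs=1000, doc_freq=20)

# The keyword occurs once in each record; only the record length differs.
asr_word = bm25_term_score(tf=1, doc_len=1, **stats)            # one-word ASR record
subtitle = bm25_term_score(tf=1, doc_len=8, **stats)            # one subtitle sentence
slide_transcript = bm25_term_score(tf=1, doc_len=150, **stats)  # all text of a slide

# Hypothetical user-defined weights per text type.
weights = {"asr_word": 0.5, "subtitle": 1.0}
weighted_asr = weights["asr_word"] * asr_word
weighted_subtitle = weights["subtitle"] * subtitle

print(asr_word > subtitle > slide_transcript)  # True
print(weighted_asr < weighted_subtitle)        # True
```

With equal term frequency the shortest record gets the highest score. A weight below 1 for single-word records is one possible way to let subtitles rank above them.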
7. Evaluation
Since so much additional metadata is available for online recorded lectures, a
Collegerama data system is a necessary addition. It gives several new options for
searching:
• creating tables of contents
• generating tag clouds that give an insight into the subject of a lecture
• a layered search engine based on different data sources
• allowing teachers to add lecture and chapter information after the recording
has been processed and stored
The database should always contain all slide content, slide titles and slide notes (when
available). Data collected from the lecturer (lecture title, chapter title etc.) during
post-processing is also extremely useful, since it very accurately reflects the subjects and
topics covered in the lecture.
The chapter titles and the slide titles are essential for proper navigation (table of contents).
For a better understanding of a lecture, subtitles are also regarded as a beneficial
element in recorded lectures. Subtitles:
• improve viewing of the lecture (simultaneous listening to and reading of the spoken
text)
• create the option of translated subtitles in other languages by using machine translation
• enlarge the reach of a search engine, giving larger result sets for the viewer to select from