Expanding the usability of recorded lectures
A new age in teaching and classroom instruction

E.L. de Moel

For the degree of: Master of Science in Computer Science
Date of submission: 26 February 2010
Date of defense: 3 March 2010

Committee:
dr.ir. D. Hiemstra, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
Dipl. Wirtsch.-Info R. Aly, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science
dr. T.H.S. Eysink, University of Twente, Faculty of Behavioural Sciences

Chair Databases
Department Computer Science
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente

Summary

Background
At present, Delft University of Technology records around 10% of its lectures. This number is expected to increase in the following years. Having these recorded lectures opens the door to all kinds of new ideas and improvements for its educational program. At this moment the university employs a video streaming system called Collegerama, which allows viewers with an active Internet connection to watch lectures online. It combines a video stream of the lecturer with a series of screenshots of the accompanying PowerPoint slides.

Research
The main research question for this project is: “How can we efficiently and effectively present recorded lectures and course material to students at universities?” This can be divided into three sub-questions:
• How can we increase the accessibility and availability of the recorded lectures in Collegerama?
• How can we make recorded lectures easier to follow, especially for foreign speaking students?
• How can we effectively and efficiently navigate and search within recorded lectures?
The research approach for this project was to study the individual questions separately.
In the second phase of the project, the individual results were combined into a set of integrated recommendations for further development in a short-term implementation project.

Accessibility and availability
To increase the availability of the lectures, it is recommended to create a single video file from the Collegerama recordings. This allows distribution over other popular online multimedia platforms, such as YouTube-Edu and iTunes-U. A single video file also allows for offline viewing without an active broadband Internet connection (for example, while sitting in the train). This is not possible within the current Collegerama system. In this research project, a Collegerama lecture has been converted into a single video stream, after careful review of several layout designs and technical specifications. This lecture has been published on YouTube. Several other formats have been created, so that the lecture can also be distributed on other platforms. These include a smaller version created specifically for mobile devices, which has been tested on Apple’s latest iPhone.

Easier to follow
To make lectures easier to follow, we show that creating and displaying subtitles is useful. These subtitles can be translated automatically using machine translation. For this research project, Google Translate has been used, which currently supports translation to 52 different languages. The quality of these translations is decent, although it depends on the chosen target language. If necessary, the generated text can be improved by manual post-processing. Current speech recognition technology has also been evaluated for the generation of subtitles, using SHoUT, the speech recognition engine created by University of Twente. It is concluded that this system is not yet sufficient to generate proper subtitles; manual post-processing of the output is always required.
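As an illustration of the subtitle step described above, the following minimal Python sketch turns timed transcript segments into subtitle cues. It rests on assumptions that are not taken from this summary: the SubRip (SRT) layout is used as the target format, and the `Segment` structure and the function names `srt_timestamp` and `to_srt` are hypothetical. It does not reproduce the SHoUT output format or the tooling used in this project.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of transcript text (hypothetical structure)."""
    start: float  # seconds from the start of the lecture video
    end: float    # seconds from the start of the lecture video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm timestamp used by SubRip."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render timed segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for number, seg in enumerate(segments, start=1):
        cues.append(
            f"{number}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)
```

Because each cue keeps its own start and end time, a translated subtitle file could in principle be produced by replacing only the `text` field of each segment and leaving the timing untouched.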
Navigation and search
This research project has shown that input from teachers is important for proper navigation through the available recorded lectures. Teachers need to provide the lecture title and divide each lecture into chapters, each with a proper title and a timeframe (start time and end time). These chapters, together with the slide titles and slide content, form the foundation for navigation and searching. Search can be extended further with the available subtitles. For the purpose of this research project, all lecture titles and chapters provided by the lecturer, the slide titles and content, and the generated SHoUT transcripts for all 14 lectures (28 lecture videos) have been collected. The slide metadata has been extracted automatically from the original PowerPoint files. All this new information and metadata has been stored in a multimedia database, so that the retrieval options for the lecture content could be researched. This database serves as the source for all the additional options for navigation and searching:
• generating a static and/or interactive table of contents for each lecture (based on lecture chapters)
• generating tag clouds
• displaying subtitles in several different languages
• searching within lecture material
To demonstrate this functionality, a prototype for a Collegerama lecture search engine has been developed. This is an online web application that can be accessed from any location with an active Internet connection and that searches within all the above-mentioned data linked to a lecture. Every search result provides a link to Collegerama, so users can immediately see the related part of the lecture.

Future developments
It is concluded that a better system for recording slides needs to be developed. Looking at the future of education and the increasing developments in technology, it is clear that presentations are going to be supported by more animation and video.
This means that an old screenshot recording system will no longer be sufficient to properly record PowerPoint slides. To further increase the usability of the recorded lectures, a new interactive way to discuss lectures with the teacher and other students can be introduced. It promotes the asking and answering of questions, not just by the teacher but also by fellow classmates. This can be done through a dynamic message board that is linked to the timeline of each lecture. Students can comment on and discuss the different topics in the lecture. To support such a system, the current multimedia database needs to be extended, so that the messages can be stored along with their optional timeframes. With these recommendations, it is possible to use recorded lectures as a foundation for future online courses without the need for live lectures.

Abstract
The status of recorded lectures at Delft University of Technology has been studied in order to expand their usability in the university’s present and future educational environment. Possibilities for the production of single-file vodcasts have been tested. These videos increase the accessibility of the recorded lectures through other distribution platforms. Furthermore, the production of subtitles has been studied. This was done with an ASR system called SHoUT, developed at University of Twente, and with machine translation of subtitles into other languages. SHoUT-generated transcripts always require post-processing for subtitling. Machine translation could produce translated subtitles of sufficient quality. Navigation of recorded lectures needs to be improved, which requires input from the lecturer. Collected metadata from lecture chapter titles, slide data (titles, content and notes) and ASR results have been used for the creation of a lecture search engine, which also produces interactive tables of contents and tag clouds for each lecture.
Recorded lectures could further be enhanced with time-based discussion boards for asking and answering questions. Further improvements have been proposed to allow recorded lectures to be re-used in recurring online courses.

Preface
This report has been written as a result of my research project at University of Twente, in cooperation with Delft University of Technology. It was originally a project that started at TU Delft, in which my father was involved. He has been active at the university for the past 8 years at the chair of drinking water engineering, working to improve its educational programs. When the development of Collegerama, a system for recording and sharing lectures, started, this chair was one of the first at the university to record all of its lectures. At that time, I was active as an online poker instructor, teaching enthusiastic players ways to improve their game. I told my father about the techniques we used to teach and instruct students all over the world through the Internet, either one on one or via recorded lectures streamed online. We began exchanging ideas about this subject and started to see the remarkable potential of this new form of online multimedia education. That is how I got involved with this research project. The goal of this project is to research possibilities for expanding the usability of recorded lectures at TU Delft and University of Twente and to improve the means for distributing and sharing lecture and course material with students at universities.
I would like to thank the following people:
• Djoerd Hiemstra, my first supervisor, for assisting and guiding me during my research project
• Robin Aly, my second supervisor, for providing ideas about evaluating certain topics in my thesis
• Peter de Moel, for providing feedback and for bouncing ideas back and forth about the future of online recorded lectures
• Thijs Verschoor, for generating the subtitles for the 28 lecture videos of course CT3011, using the SHoUT engine developed by University of Twente
• Willem Jansen, for helping me with the automatic conversion from pdf PowerPoint sheets to an Excel data sheet and a SQL 2005 database using a C++ script
• Koen Zomers, for his useful Visual Studio 2008 tips while programming the Collegerama lecture search engine
• my parents, sister, family and friends for their support during the past 9 months and during the course of my master’s degree

February 2010
Erwin de Moel

Table of contents
Summary
Abstract
Preface
List of figures
List of tables
1. Introduction
2. Existing systems for digitally recorded lectures
2.1 Massachusetts Institute of Technology
2.2 Delft University of Technology
2.3 University of Twente
2.4 Summary
3. Distribution platforms
3.1 YouTube
3.2 iTunes
3.3 Portable Document Format (PDF)
3.4 Conclusions
4. Subtitling
4.1 Subtitling process
4.2 Subtitles from speech recognition
4.3 Machine translation for subtitles
4.4 Text-to-speech for translated subtitles
4.5 Conclusions
5. Navigation and searching
5.1 Meta-data for navigation and search
5.2 Metadata sources
5.3 Metadata storage
5.4 Course and lecture navigation
5.5 Collegerama lecture search
5.6 Conclusions
6. Proposed improvements
6.1 Lecture accessibility
6.2 Navigation and searching
6.3 Student interaction
6.4 Increasing course frequency
6.5 Pilot project for further development
7. Conclusions
List of references
List of URL’s
Annexes
Accompanying material

List of figures
Figure 1.1: Screenshot of a recorded lecture at TU Delft in Collegerama
Figure 2.1: Prof. Walter H.G. Lewin, the YouTube superstar
Figure 2.3: Older Collegerama lectures (TN2012) recorded in 2004 had a smaller video size
Figure 2.4: Collegerama recording using a Tablet PC as an interactive blackboard
Figure 2.5: Stationary and mobile recording unit of Collegerama
Figure 2.6: Screenshot of a Collegerama lecture with too many screenshots because of mouse movement
Figure 2.7: Examples of the three presentation options
Figure 2.8: Collegerama screenshots of the three different presentation options
Figure 2.9: The Collegerama lecture at Mechanical Engineering is streamed to the “movie theater” next door
Figure 2.10: The Collegerama live streamed online recordings (CT2011) were announced as “lectures in bed”
Figure 2.11: Collegerama recording (214020) at University of Twente in November 2007
Figure 2.12: Overview of video lectures (214020) in Blackboard (2nd quarter of study year 2009-2010)
Figure 3.1: Combining the Collegerama components into a single video file enables broader distribution of recorded lectures
Figure 3.2: The two main components of Collegerama, video and slides
Figure 3.3: Layout of Collegerama elements within the resolution constraints for YouTube movies (1280x720)
Figure 3.4: Collegerama as a vodcast for YouTube (1280x720)
Figure 3.5: Converting PowerPoint slides into a movie file by recording the Collegerama slide display
Figure 3.6: Vodcast of a Collegerama recording converts a small-sized video into a HD movie with room for proper subtitles
Figure 3.7: A typical PowerPoint slide at iPod resolution (320x240)
Figure 3.8: Collegerama vodcasts with different options for the video component at iPod aspect ratio
Figure 3.10: PowerPoint slide in TU design at iPod size, with and without inserted movie components (20%)
Figure 3.11: Adobe Presenter allows the creation of lectures based on PowerPoint
Figure 3.12: Screenshot of lecture CT3011 implemented within Adobe Presenter
Figure 4.1: Creation process for subtitles
Figure 4.2: Screenshot of the program SubCreator
Figure 4.3: Translated subtitles improve the learning environment for non-native speaking students
Figure 4.4: Word correctness of SHoUT for the CT3011 lectures, clustered by speaker
Figure 4.5: Subtitles created from the SHoUT transcript
Figure 4.6: The performance of some English to German translation engines compared to human translation (=ref) [6]
Figure 4.7: Translated subtitles from Dutch to English in YouTube
Figure 4.8: Will automatic real-time translation engines become available within the next decade? (Source: http://www.meglobe.com)
Figure 5.1: Catalog of recorded lectures in a course
Figure 5.2: Schematic view of a multimedia information retrieval system [14]
Figure 5.3: Searching in parallel metadata of videos [18]
Figure 5.6: Interactive TOC for recorded lecture #15 in course CT3011, generated from the Collegerama data system
Figure 5.7: Tag cloud for recorded lecture #15 in course CT3011, generated by Wordle, with and without deleted words by prof J.C. van Dijk [54]
Figure 6.1: Online viewing (YouTube) and available downloads and links for a recorded lecture
Figure 6.2: Tools created from the Collegerama database (slide navigator, tag clouds and search application) will significantly improve the accessibility of recorded lectures
Figure 6.3: Time-lined online discussions on recorded lectures are common practice for the online educational poker community
Figure 6.4: Multiple scheduling of courses with recorded lectures and online/moderated assistance by a lecturer
Figure 6.5: Examples of multiple scheduled courses
Figure 6.6: Online poker courses are scheduled on specific days, in order to enlarge the attendance and to promote live online discussion (Source: http://www.deucescracked.com/)
Figure 6.7: Recorded lectures are embedded in a Multimedia Information Retrieval System, containing multimedia content and structured course and lecture metadata

List of tables
Table 2.1: Number of slides/screenshots for the three presentation options
Table 4.1: Quality assessment of word correctness by speech recognition on lectures
Table 4.2: Some popular machine translation engines
Table 5.1: Primary metadata for selecting of and navigating in recorded lectures
Table 5.2: Analogy of navigation in DVDs and recorded lectures
Table 5.3: Database table Content
Table 5.4: Database table Lectures
Table 5.5: List of Text_types and the amount of records and words in the database for course CT3011
Table 5.7: Occurrences of the 15 most used nouns from ASR versus human-made subtitles
Table 5.9: Video length per data source in Collegerama lecture search for course CT3011
Table 5.10: Precision and recall measurement for different data sources on 3 important words of lecture #15
Table 6.1: Current situation and goals for future academic courses
Table 6.2: Additional products for expanded usability of recorded lectures

1.
Introduction

Background
For the past 10 years, there has been little or no change in the way that lectures are given at universities throughout the world. With the emergence of new technologies, there are numerous new possibilities for improving the way in which information is shared between student and teacher. Through the Internet, an incredible amount of additional material can be found for delving even deeper into the subject matter. Most universities already employ an online community and messaging system where lecture sheets, additional subject material and practice exams are shared. Every year, the teacher of a course gives lectures similar to those of the previous year, while a new group of students follows the course. As long as the course and lecture material do not change significantly, this seems somewhat redundant. In the past year, TU Delft has also been faced with a capacity problem: the number of registrants for certain courses exceeds the maximum capacity of the largest available classroom. TU Delft has been developing its own system for the production and streaming of digitally recorded lectures, called Collegerama. Lectures are streamed from a web server to further support the learning programs. Figure 1.1 shows an example of a lecture given at TU Delft that has been recorded and can be viewed online.
Figure 1.1: Screenshot of a recorded lecture at TU Delft in Collegerama (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=7548f752-101b-417e-a4e7-58aebc595376)

The recorded lectures on Collegerama contain the following elements:
• a video stream of the lecturer
• screenshots of the presentation sheets or an interactive screen (tablet PC) on which the presenter writes notes
• navigation tools for scrolling through the video and/or slides
• controls for play/pause, full screen mode and to modify the playback speed

Goals
TU Delft would like to investigate possibilities of expanding the usability of their recorded lectures. They have several ideas for achieving this:
• subtitling the lectures for students with hearing problems
• subtitling in other languages (English subtitles for Dutch lectures and vice versa)
• translated subtitles spoken over the original video stream (with or without subtitles)
• searching in lecture content (whether in transcripts/subtitles)
• searching in lecture content (by handmade content overview and/or computer generated keywords)
• distribution as a vodcast (for PDA, iPod, iPhone or another type of mobile phone)

Research questions
The main research question for this project is: How can we efficiently and effectively present recorded lectures and course material to students at universities? This question can be divided into three sub-questions:
• How can we increase the accessibility and availability of the recorded lectures in Collegerama?
• How can we make recorded lectures easier to follow, especially for foreign speaking students?
• How can we effectively and efficiently navigate and search within recorded lectures?

Project boundaries
This research project will not include investigations on user preferences (teachers and students), best educational practices, optimal teaching methods for recorded lectures etc.
It will be restricted to alternatives that are technically feasible within the E-learning and ICT environment of TU Delft. This is not restricted to presently used ICT tools, but might include commercially available products that can be implemented within the TU Delft environment. The starting point for this research project is the set of presently produced Collegerama recordings. No other methods of recording lectures will be evaluated. This research project takes a technology-centered approach to the subject. A user-centered approach might be taken in a succeeding research project, evaluating the proposed products and applications for using recorded lectures and the benefits and problems they might bring to the different user groups (teachers, local students, foreign exchange students, the University etc).

Report outline
The report can be separated into three parts: chapters 2 and 3 give an introduction to the systems currently used for the distribution of recorded lectures, followed by chapters 4, 5 and 6, in which possibilities for expanding the usability of these recorded lectures are discussed. Chapter 7 contains the conclusions of the entire research project. In chapter 2, a detailed history of recorded lectures is described, starting with the way that the Massachusetts Institute of Technology produces and distributes its lectures. This is followed by the current Collegerama system that is used by both University of Twente and Delft University of Technology. Chapter 3 describes several formats for producing and storing recorded lectures. It covers the audio and video formats that can be used, how a timeline can be determined for a lecture, and how to handle the link between the video stream of the lecturer and the slides that accompany the presentation. After this introduction into the world of online recorded lectures, chapter 4 discusses new possibilities for expanding their usability by means of subtitling and translation.
In chapter 5, several methods for navigating and searching through the recorded lectures are described. Chapters 4 and 5 are brought together in chapter 6, which describes a list of proposed improvements. An actual prototype for a lecture search engine has been designed and a new way of browsing through a group of lectures for a single course is demonstrated. Finally, the conclusions of the entire report are presented in chapter 7. The printed version of the report does not include the annexes. The annexes are only included in the electronic version of the report. An accompanying DVD includes the intermediate and final products of this research project.

2. Existing systems for digitally recorded lectures
Ten years ago (in 1999), the Massachusetts Institute of Technology started broadcasting several unique physics lectures over a local TV channel. This was primarily done to gain more exposure for its educational programs. The broadcasts received a lot of positive responses, which led MIT to start recording lectures from other sciences and distributing them over the Internet. As time went on, the recorded lectures were also used to improve and expand the learning programs for MIT’s own students by publishing them on the Internet. Several years later, a trend started to emerge and several other large universities in the United States, such as Berkeley and Stanford, started to do the same. In 2000, Delft University of Technology, followed by University of Twente, started their own lecture recording programs. After running a few successful pilots, they are now recording more and more lectures each year. It would not be surprising if, within the next couple of years, all Dutch universities were doing the same with their courses and lectures. In this chapter, the history, recording process and developments with regard to recorded lectures are discussed.
The differences between the techniques used at MIT, TU Delft and University of Twente will be shown, and several drawbacks of the current system used by both Dutch universities will be described. For further background information about the research on this topic, see Annex A and B.

2.1 Massachusetts Institute of Technology
Massachusetts Institute of Technology (MIT) is a private research university located in Cambridge, Massachusetts in the United States. It has five schools and one college, containing a total of 32 academic departments, with a strong emphasis on scientific and technological research. It is one of the most prestigious technical universities in the world. Its reputation is based on its scientific output, through the publishing of scientific articles and reports, and on the awards received by its staff. Seventy-three members of the MIT community have won the Nobel Prize, including seven current faculty members.[24] MIT enrolled 4,232 undergraduates and 6,152 graduate students during the fall of 2009–2010.[25] It employs about 1,000 faculty members. Its endowment and annual research expenditures are among the largest of any American university. 75 Nobel Laureates, 47 National Medal of Science recipients and 31 MacArthur Fellows are currently or have previously been affiliated with the university.[24] The aggregated revenues of companies founded by MIT alumni would be the seventeenth largest economy in the world.[26][27]

OpenCourseWare
In 2000, MIT started the concept of publishing its course material on the Internet, where it would be publicly available to everyone. This project was called OpenCourseWare (OCW). The first proof-of-concept site was published in 2002, containing 50 courses.
By November of 2007, MIT completed the initial publication of almost its entire curriculum, which contained over 1,800 courses in 33 academic disciplines.[29] MIT also publishes some of its courses in one or more translated versions and has formally partnered with four organizations that are translating OCW course material into Spanish, Portuguese, Simplified Chinese, Traditional Chinese and Thai. The material has already been translated into at least 10 different languages, including French, German, Vietnamese, and Ukrainian. Since 2008, MIT has added audio and video-taped lectures to its OCW website. These lectures were recorded between 1999 and 2008 and have been published on YouTube, iTunes and VideoLectures.net. The OCW concept has received an enormous amount of attention from all over the world, both from students and from universities. In 2005, the OpenCourseWare Consortium was established to advance education and empower people through open courseware. At present, about 200 higher education institutions and associated organizations from around the world are members of this organization, including TU Delft, the Dutch Open University and HAN University of Applied Sciences (Hogeschool van Arnhem and Nijmegen). Because of the positive response to its OCW activities, MIT employs a special OCW office where close to 20 people are working every day.[30]

Walter Lewin
In 1999, MIT started recording the lectures of its most popular courses. Professor Walter Lewin, who has been made famous through TV and the Internet, is one of the most well-known lecturers today. He is an extremely enthusiastic physics teacher who received his Ph.D. degree in nuclear physics in 1965 at Delft University of Technology. He joined MIT in January of 1966 as a post-doctoral associate and became an assistant professor later that year.[28]

Figure 2.1: Prof. Walter H.G.
Lewin, the YouTube superstar (Source: http://bibliotematica.wordpress.com/2009/06/05/walter-lewin-quiero-morir-en-una-clase/ and http://www.pbs.org/kcet/wiredscience/blogs/2007/12/free-to-be-mit.html)

Even before the advent of MIT OpenCourseWare, Lewin’s lectures could be found on UWTV in Seattle, where he reached an audience of about four million people, and on MIT Cable TV, where he helped freshmen with their weekly homework assignments. Lewin’s lectures on “Newtonian Mechanics, Electricity and Magnetism” and on “Vibrations and Waves” comprise some of the most popular content on MIT OpenCourseWare. He consistently holds a spot among the most downloaded videos on Apple’s iTunes-U as well as on YouTube-Edu. His unique style of teaching has captured the attention of a broad range of students, educators and self-learners.[28] Thanks to the various distribution channels that MIT OCW employs, the lectures of Walter Lewin now receive about 3,000 views a day, from people all over the world.

Online distribution
YouTube is the most popular website for online video content in the world. Nearly 20% of all global Internet users visit YouTube, with an average of 16 page views per visit. In October 2009, it was ranked 4th in the top 500 websites list, right after Google, Facebook and Yahoo.[33] Since March 2009, YouTube has had a special section for education called YouTube-Edu. By April of 2009, about 150 universities and colleges in the United States had submitted around 25,000 educational videos. Eight months later, in December of 2009, there were already 298 participating universities. The videos on YouTube-Edu are not all recorded lectures; they also include short movies (6 to 12 minutes).[34]

iTunes-U is a part of the iTunes Apple Store. The service was created to manage, distribute, and control access to educational audio and video content for students within a college or university or for outside viewers.
The member institutions are given their own iTunes-U site that makes use of Apple’s iTunes Store infrastructure. The online service is without cost to those uploading or downloading material. Content includes course lectures, language lessons, lab demonstrations, sports highlights and campus tours provided by many top colleges and universities from the US, United Kingdom, Australia, Canada, Ireland and New Zealand.[35] As of November of 2009, iTunes-U holds over 200,000 educational audio and video files from top universities, museums and public media organizations around the world. About 200 international universities and colleges have published content on iTunes-U, including MIT, Yale, Stanford, UC Berkeley, Oxford, Cambridge, Freiburg, Lausanne, TU Aachen and Melbourne. The number of participating universities, as well as the number of audio and video files, has doubled in the previous 7 months.

Apart from iTunes-U and YouTube, which are commercial services, there are also a few websites that offer their services for other reasons. A popular example of this is VideoLectures.net. Their main purpose is “to provide free and open access of high quality video lectures presented by distinguished scholars and scientists at the most important and prominent events like conferences, summer schools, workshops and science promotional events from many fields of Science. The portal is aimed at promoting science, exchanging ideas and fostering knowledge sharing by providing high quality didactic contents not only to a scientific community but also to a general public.”[32] A recent addition to this group is Academic Earth, which launched in March of 2009. Their mission statement, as stated on their website: “Academic Earth is an organization founded with the goal of giving everyone on earth access to a world-class education”.[31]

Video composition

Every MIT video has a camera angle that is fixed on the front side of the classroom.
Most of the time a professor is walking in front of a blackboard while explaining several course topics. The video camera follows the professor and zooms in and out on the blackboard whenever the professor is writing on it. Sometimes during the video, parts of the surrounding classroom are visible and you can see students sitting down and/or people walking in. Most MIT professors only use the blackboard, while PowerPoint slides, overhead projectors or projected illustrations are rarely used. When they are used, the content of these slides is included in the video by zooming in on the projected screen, or the recorded video might show a text screen referring to the lecture material. These slides are published as a pdf file under “Lecture notes”.

Figure 2.2: MIT lecture with a professor using slides, which are also included in the recorded video (Source: http://www.youtube.com/watch?v=R90sohp6h44)

The MIT lectures were initially recorded with two cameras; one camera was used for the overview and one camera took care of the close-ups of the blackboard. More recent recordings included two more cameras in the back of the classroom to provide a wider overview of the lecturer in front of the class. All these multi-camera lecture recordings had to undergo some form of post-production to work out the different camera angles, so that a single continuous video could be constructed combining all of the different recorded footage.

Transcripts, captions and annotations

About 60% of the recorded lectures at MIT are provided with a transcript. The transcripts are presented on the MIT-OCW website, on the page of the related lecture under the embedded YouTube movie. Most of the time these transcripts are also available as a pdf file. In YouTube, these transcripts are used for the YouTube Caption option that shows subtitles in the bottom part of the movie.
Captions or subtitles have been available on YouTube since August of 2008.

2.2 Delft University of Technology

Development of Collegerama

In the year 2000, the section Multimedia Services (MMS) of Delft University of Technology started the development of Collegerama in a pilot project on streaming media.[38][39] The main goal of this pilot was the recording of lectures which could be viewed by students within Blackboard, their digital learning environment. These “web lectures” were regarded as instruments to improve study results and to increase the efficiency of the education at the university. MMS selected the commercially available Mediasite system, created by Sonic Foundry, as a basis for Collegerama. The term Collegerama is a private brand created by TU Delft, so that they could be independent from the technical infrastructure for their web lectures. Selecting a standard product avoids the high development cost of creating a new system. By using an existing solution, the university also has the added benefit of getting new updates and features within the Mediasite platform.

The early years

In April and May of 2004, Professor Barend Thijsse was teaching the BSc course TN2012 Quantum mechanics. He was giving the course for the last time, because he was leaving the university. Since he was recognized as an outstanding teacher, TU Delft wanted to record his lectures while they still had the chance. He gave the course and lectures together with his successor, Professor Leo Kouwenhoven.

Figure 2.3: Older Collegerama lectures (TN2012) recorded in 2004 had a smaller video size (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=735a8c5902864988b01157c16f8e632e)

Mediasite was used for recording the 25 lectures (40-45 minutes) of the BSc course. A Tablet PC functioned as a blackboard to write notes on and both lecturers had a speaker microphone attached to their jackets.
The recorded courses were used during the succeeding years as a reference until a drastic curriculum change in September of 2008. After this successful project, 3 additional presentations were recorded using Mediasite from September until December of 2004, as part of tests for the technical infrastructure of Collegerama. These web lectures were filmed with poor audio recording equipment (no special microphone for the speaker) and a small video size (256x192 or 240x180). At that time, 240x180 was the standard video size for Mediasite recordings.

In January of 2006, Collegerama was used for the recording of the closing speech by the Rector Magnificus, Prof. Dr. Ir. J.T. Fokkema, at the 164th Dies Natalis of TU Delft. This was the start of a yearly tradition in which all the Dies Natalis speeches were recorded. The video was recorded at a higher resolution of 320x240, which is still the standard Collegerama video resolution in 2009.

Between September and December of 2006, the 30 lectures (40-45 minutes) of the BSc course TN2545 Signals and Systems by Professor Lucas van Vliet were recorded. This course was normally given in Dutch, but for the sake of the recordings it was given in English to allow non-Dutch speaking students to follow the course. The recorded lectures consist of videos showing the lecturer and synchronized screenshots of a Tablet PC, used as an interactive blackboard. These recorded lectures were actually used for several years, until a new lecturer took over the course in September 2009. They are currently available on Blackboard as reference material.

Figure 2.4: Collegerama recording using a Tablet PC as an interactive blackboard (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)

Collegerama recording

Collegerama has two possibilities for recording lectures.
They can either use a stationary setup that has been placed in a few classrooms at TU Delft, or they can use the mobile station, which can be used at any given location. Both of the systems consist of a stationary webcam which can be operated remotely with a joystick. The operator, usually a student assistant, makes sure that the camera is always pointed at the lecturer while he or she is moving around the classroom. The laptop that comes with the presenter unit is connected to a projector, so that the PowerPoint slides can be viewed in the classroom and recorded by the system.

The recording system takes screenshots of the projected screen, based on computer activity. Every 1 to 4 seconds, the system checks for a change on the screen. If a different slide has been loaded or the position of the mouse has changed, a new screenshot will be saved as a jpeg image file.

Figure 2.5: Stationary and mobile recording unit of Collegerama

The current system used to record the slides of the lectures relies on the assumption that changes on the screen always correspond to a change in the presentation. This is clearly not the case and several scenarios can cause a faulty screenshot to be taken:
• a video is played within a PowerPoint slide
• the lecturer inadvertently moves the mouse
• the lecturer leaves PowerPoint to demonstrate an application on his PC

This recording flaw creates a problem, because many Collegerama lectures contain a lot of redundant images that were accidentally saved. Some of these lectures contain 400 screenshots, when in fact the original PowerPoint presentation only had about 50 slides.

While playing the lectures, the interface relies on the screenshots that are created during the recording for navigation. The problem with this navigation system is that once the lecture contains an overflow of useless slides, there is no other way of browsing through the lecture except for the video timeline.
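The change-based capture rule described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the actual Mediasite implementation: the function name `saved_screenshots` and the byte strings standing in for the screen content at each polling moment are hypothetical.

```python
from typing import List, Optional, Sequence


def saved_screenshots(samples: Sequence[bytes]) -> List[int]:
    """Return the indices of the polling samples at which a screenshot is saved.

    Each sample stands for the full screen content at one polling moment.
    A screenshot is saved whenever the content differs from the previous
    sample, regardless of whether the slide itself changed.
    """
    saved: List[int] = []
    previous: Optional[bytes] = None
    for index, screen in enumerate(samples):
        if screen != previous:
            saved.append(index)
            previous = screen
    return saved


# Hypothetical samples: only one real slide change, but the mouse moves twice.
samples = [
    b"slide-1|mouse@(10,10)",
    b"slide-1|mouse@(10,10)",  # nothing changed, no screenshot
    b"slide-1|mouse@(40,25)",  # mouse moved, slide unchanged
    b"slide-1|mouse@(90,60)",  # mouse moved again
    b"slide-2|mouse@(90,60)",  # real slide change
]
print(saved_screenshots(samples))  # [0, 2, 3, 4]
```

In this toy run four screenshots are saved for only two slides, which mirrors how a 50-slide presentation can end up with hundreds of stored images.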
An example of the navigation element in such a lecture is shown in Figure 2.6.

Figure 2.6: Screenshot of a Collegerama lecture with too many screenshots because of mouse movement

After the lecture has been given, the data is sent to the presentation server. It will process the different data sources and create three different outputs:
• an audio/video stream (wmv file)
• pictures of the different PowerPoint slides or computer screenshots (jpeg files)
• different settings and additional information about the lecture (xml file)

The presentation server will synchronize all the different elements and will store the required information in the xml file. This information will later be used to correctly display the video in combination with the screenshots. When the presentation has been processed, it is written to the Collegerama web server and is then available for students with Internet access all over the world.[42]

Presentation options

During the presentation, the lecturer is provided with three different presenting options:
• blackboard: The lecturer uses the blackboard or an overhead projector to give his lecture, while the video camera records the content.
• PowerPoint: This works in combination with a prepared set of PowerPoint slides that will be displayed while the presentation is being given.
• screen capturing: The contents of the computer screen will be displayed during the presentation, which allows the lecturer to use external software such as computer simulations or written text on a Tablet PC and record the results as separate screenshots.

Figure 2.7: Examples of the three presentation options

Each of these presentation options uses the same storage system, which is based on screen activity.
Especially while using the blackboard or desktop methods, an excessive number of images will be stored, since every mouse movement and change on the screen, when writing down notes, will cause a new screenshot to be saved. Collegerama uses a uniform view for all three presenting options, as is shown in the examples given in Figure 2.8.

Figure 2.8: Collegerama screenshots of the three different presentation options (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=ca42dce5-bb51-4c39-93de-50528dd6b880 and http://collegerama.tudelft.nl/mediasite/Viewer/?peid=724886f7-cfd0-441d-ae85-1fae0cbb28a1 and http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)

Figure 2.8 illustrates that Collegerama is suitable for showing lectures in which a PowerPoint presentation or a Tablet PC is used (middle and right screenshot). In these cases the most detailed information is presented on the presentation block. For a lecture with blackboard only (left screenshot), the Collegerama system is a little superfluous. The three presentation options differ significantly in the number of slides (or screenshots). This difference is illustrated in Table 2.1.

Table 2.1: Number of slides/screenshots for the three presentation options

Presenting option    Number of slides         Navigation pages (list - small - large)
Blackboard           0 (no slides picture)    0 - 0 - 0
PowerPoint           30                       2 - 3 - 5
Screen capturing     308                      12 - 15 - 29

With respect to navigation, only lectures with PowerPoint slides seem to be suitable for a Collegerama recording. Blackboard lectures lack the navigation by slides/screenshots, while Tablet PC lectures have too many screenshots for proper navigation. For the latter, the screenshots can be clustered in chapters as part of the post-processing of a Collegerama recording.
Collegerama as a service

Starting in September of 2007, Collegerama became part of the regular facilities for education at TU Delft, under the responsibility of the University Corporate Office for Education and Student Affairs (O&S). This office is also responsible for the electronic learning system Blackboard. As a consequence, recording of lectures was financed by the Corporate Office and became free for the lecturers at the different faculties. Before that time, recordings were made at a rate of €500 per recorded session of 45 minutes. The scheduling of recording units and operators is now organized by O&S and lecturers can apply there to have their lectures recorded.

This service has resulted in a huge increase in recorded lectures. In September and October of 2009 alone, around 60 to 75 lectures were recorded each week (30 to 40 sessions of 2 lectures of 45 minutes each). This amounts to 5% of all lecture hours given each week at TU Delft.

In September of 2009, the faculty Mechanical Engineering was faced with a huge student overflow. The 500 first-year students did not fit in the largest lecture room available, which had a capacity of 300 students. To overcome this problem, they used two lecture rooms. In one lecture room, the lecturer gave the live lecture, which was recorded using a high quality camera. This recording was then streamed to the other lecture room via a larger data stream to accommodate the higher quality. The recorded lectures were afterwards also available at a lower quality via Blackboard and Collegerama. The faculty called this service “lectures in a movie theater”.

Figure 2.9: The Collegerama lecture at Mechanical Engineering is streamed to the “movie theater” next door (Source: Delta 27, 17 September 2009)

Collegerama live

The mobile recording units of Collegerama have a personal storage unit. After the lecture recording has been completed, the stored data is uploaded to the central Collegerama server.
It is also possible to stream this recording to the server immediately while recording, thus generating a live stream to the outside world. This live streaming process has a 5 to 10 second delay between recording and broadcasting. In the Collegerama setup, a URL for a Collegerama lecture is automatically created 4 hours before the recording. This URL is published before the lecture starts, so that every student can watch it from their own room or any other location that has live Internet access.

This live streaming system was used for the course CT2011 Watermanagement in September-October of 2009. The course was moved within the curriculum from the third year to the second year, which caused the student attendance to double to about 500 students. This again largely exceeded the maximum seating capacity of the largest classroom available at the faculty of Civil Engineering (it holds only 350 students). To reduce the number of students attending, the lectures were scheduled on Monday and Friday during the first two lecture hours. The lectures were also announced to be broadcast live and received wide media attention under the title “lectures in bed”.

The system was a huge success. After the initial lecture, the number of attending students dropped to around 100, with a large number of online viewers during lecture hours or several hours after the lecture. The movie theater lecture room stayed empty after the first lecture.

Figure 2.10: The Collegerama live streamed online recordings (CT2011) were announced as “lectures in bed” (Source: Delta 27, 17 September 2009)

OpenCourseWare

In March 2007, TU Delft started its own OpenCourseWare pilot project.[40] In this pilot project the course material of about 20 MSc courses from 6 different disciplines was published. Collegerama lecture recordings were part of this material.
This initiative was very well received and students found the Collegerama recordings to be of extraordinary quality. Because of the national and international response, TU Delft decided in 2008 to continue its OpenCourseWare program on a more extensive scale. In October 2009, TU Delft hosted the yearly conference of the OpenCourseWare Consortium, in which more than 200 universities worldwide are active.[30] The Director of Education and Student Affairs of TU Delft is a member of the board of the OCW Consortium.

In January 2010, TU Delft renewed its OpenCourseWare website. One of the goals of the update was to give the recorded lectures a more pronounced exposure and to give the website the look and feel of the original Blackboard courses. A month later, TU Delft created an iTunes-U account and started to publish recorded lectures, partly as a result of the work that has been described in this research report. Annex C gives further information on this subject.

Future developments

In 2010 a new viewer for Collegerama will be implemented at TU Delft (Mediasite version 5.2). This Silverlight player has the look and feel of viewing YouTube movies, with a small slide viewer for navigation, and of viewing iPhone or Windows 7 screens (dynamic screen changes). This new viewer will still suffer from the major drawback of having an overload of useless slides, because this problem is related to the recording process, not to the viewer.

2.3 University of Twente

In 2007, University of Twente started a pilot project on recorded lectures. This pilot project used the experience of TU Delft with its Collegerama system. The same technical infrastructure of Collegerama was also used at University of Twente. Within the pilot project, the lectures of 10 BSc courses have been recorded. One of these was the course Algorithms, Data structures and Complexity (214020), which is also a pre-master course for the master Computer Science.
Between November 2007 and January 2008, 8 of its lecture sessions were recorded. Afterwards, the 7th lecture session was not available due to technical difficulties. The recorded sessions include two lecture hours (40 minutes each) and the intermediate coffee break (a 15 minute recording of a clock).

Figure 2.11: Collegerama recording (214020) at University of Twente in November 2007 (Source: http://videolecture.utwente.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb)

After each recorded course, an evaluation form was used to register the opinion of the students. Based on the positive results of the pilot project it was decided to continue the project. Since September 2008, lectures at University of Twente can be recorded with Collegerama, i.e. Mediasite.[41]

At University of Twente, two lecture rooms are available with recording facilities for Collegerama/Mediasite (Horst C101 and Cubicus B209). There is also one mobile recording unit available (Spiegel, for room 1, 2, 4 and 5). This unit can also be used in other buildings and lecture rooms, if requested. The service for recording lectures is free of charge and is provided by the ICT Service Centre of University of Twente.

In September 2009, University of Twente started using Blackboard as its digital learning system as a replacement for Teletop. At present, TU Delft and University of Twente use the same technical infrastructure for their digital learning environment as well as their lecture recording and streaming system.

Figure 2.12: Overview of video lectures (214020) in Blackboard (2nd quarter of study year 2009-2010) (Source: http://blackboard.utwente.nl/webapps/blackboard/content/listContent.jsp?course_id=_758_1&content_id=_92264_1)

2.4 Summary

Massachusetts Institute of Technology (MIT) was the first university to start recording their lectures back in 1999, by taking a video camera into the classroom and video-taping the lecturer as he was teaching. Several years later, TU Delft started doing the same, using a more sophisticated system that simultaneously records the slides along with a video stream of the lecturer. This made the recorded lectures easier to follow, but also added a problem. The lectures were no longer contained within a single video file, which severely limits the possibilities for different online distribution channels.

In 2007, University of Twente decided to use the same system for the recording and distribution of their lectures as TU Delft. After several pilot projects, they purchased 2 stationary recording units and 1 mobile recording unit of Collegerama.

The current Collegerama system has several problems:
• navigation, since it is based on inconsistent screenshots of slides
• not distributable through a single (video) file
• no easy way of browsing/searching through a lecture

3. Distribution platforms

Collegerama / Mediasite player

At present, lectures recorded in Collegerama can only be viewed as streaming video with an Internet connection to the Collegerama server. The movies are played within the custom Java player developed by Mediasite.
This setup has several advantages:
• no distribution channels required, avoiding their institutional and technical requirements
• single point of entry, with its benefits for updating (both the content and the player)
• no storage required at the point of viewing/listening

Aside from these advantages, there are also a number of severe drawbacks to the current Collegerama distribution platform:
• limited distribution options
• no offline viewing
• compatibility
• limited expansion options

Limited distribution options

The current Collegerama system can be divided into two parts:
• video stream of the lecturer (wmv)
• screenshots of PowerPoint slides (jpg)

During playback, the web player will update the screenshots based on a time index that is stored in the configuration file of the lecture. Basically, a video stream is played and the corresponding pictures are reloaded on the right side of the viewer during playback.

Virtually all online distribution platforms operational today require a video file to be uploaded. This file will usually be re-encoded using a specific codec compatible with that player. YouTube, for instance, stores its online video files as mp4. Unfortunately this poses a problem when distributing lectures stored within the Collegerama server over any of these other multimedia platforms. It is possible to upload the video stream, since that component is stored in a video file format, but without the lecture slides to accompany it the lecture will miss most of its important content.

No offline viewing

Since all the lectures are streamed over the Internet, it is not possible to view the Collegerama lectures without an active Internet connection. This means that it is not possible to store the lectures and view them later on a laptop, iPhone/iPod or other mobile multimedia device.
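The time-index mechanism described under “Limited distribution options” can be sketched as a simple lookup: given the start time of every screenshot, the player shows the last screenshot that started at or before the current playback position. The sketch below is an illustration only; the function name `slide_at` and the example start times are hypothetical, and the actual format of the Mediasite configuration file is not reproduced here.

```python
from bisect import bisect_right
from typing import List, Optional


def slide_at(slide_start_times: List[float], playback_time: float) -> Optional[int]:
    """Return the index of the slide to show at playback_time (in seconds).

    slide_start_times must be sorted in ascending order. None is returned
    when the playback position lies before the first slide.
    """
    position = bisect_right(slide_start_times, playback_time)
    return position - 1 if position > 0 else None


# Hypothetical time index: four slides, start times in seconds.
starts = [0.0, 95.0, 240.5, 610.0]
print(slide_at(starts, 100.0))  # 1
print(slide_at(starts, 610.0))  # 3
```

A single video file has no such external index, which is why the slides have to be rendered into the picture itself before a lecture can be uploaded to other platforms.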
Compatibility

The current player that is being used within Mediasite is based on Microsoft Silverlight, which is poorly compatible with other operating systems such as Linux. There is a custom made version available created by Novell, but this solution won’t always work when Mediasite releases a new version of their player. Users are dependent on the developments by Novell to keep their system up to date.[43]

Limited expansion options

At present, the Mediasite player cannot be easily integrated with (multi-language) subtitles. This might be improved in future versions, but Collegerama is dependent on the Mediasite developments in order to add custom functionality. Other channels such as YouTube do provide these options as a default and are ahead of Mediasite in this area.

Figure 3.1: Combining the Collegerama components into a single video file enables broader distribution of recorded lectures

Other distribution platforms

In this chapter, the options and capabilities of two important platforms have been researched: YouTube and iTunes. These two have been selected for the following reasons:
• their worldwide exposure
• the acceptance of their technical specifications by other external platforms
• the experiences of MIT (see Annex A)
• the compatibility of these technical specifications with TU Delft’s own Blackboard learning environment, the OpenCourseWare website and other web platforms

The distribution of recorded lectures through these platforms requires the creation of a single video file, which can be uploaded to their server.
For the creation of such a Collegerama vodcast/podcast, the following aspects should be examined:
• content (slides, audio, video, subtitles and any combination of these)
• presentation of the content (lay-out, introduction tune/movie, branding)
• video quality (resolution, frame rate)
• format of video file (mov, wmv, flv, mp4, codec etc)
• audio quality (stereo/mono, frequency range)
• format of audio file (mov, mp3, mp4, codec etc)

The above-mentioned technical specifications (quality, codec) primarily determine the file size. The technical specification should balance quality (usability) against quantity (download time and storage requirements).

Outline

This chapter will focus on the distribution of Collegerama over various different platforms. The popular audio/video sharing platforms YouTube and iTunes will be covered first. After that, a new lecture creation tool called Adobe Presenter will be demonstrated. Each of these platforms will be thoroughly examined and a conclusion will be drawn about the quality of each of these systems. For further background information about the research on this topic, see Annex C.

3.1 YouTube

YouTube is a video sharing website where users can upload and share their videos. Three former PayPal employees created YouTube in February of 2005. In November 2006, YouTube was bought by Google Inc. for $1.65 billion and is now operated as a subsidiary of Google. It uses Adobe Flash Video technology to display a wide variety of user-generated video content and is currently the biggest distributor of streaming online video content.

Unregistered users can watch the videos, while registered users are permitted to upload an unlimited number of videos. Videos that are considered to contain potentially offensive content are available only to registered users over the age of 18. The uploading of videos containing copyright violations is prohibited by YouTube’s terms of service.
Accounts of registered users are called “channels”.[44]

In the last few years YouTube has become a medium for several universities to publish their recorded lectures on. One of the first was MIT (Massachusetts Institute of Technology), which joined in October of 2005. Later, other universities like Purdue (2006), Stanford (2006), UC Berkeley (2007) and Harvard Business (2007) started publishing recorded lectures and course material via the popular Internet medium.

Video formats for YouTube

YouTube’s video playback technology, based on the Adobe Flash Player, allows the site to display videos with quality comparable to more established video playback technologies such as Windows Media Player, QuickTime, and RealPlayer. These formats generally require the user to download and install a web browser plug-in to view video content. Viewing Flash video also requires a plug-in, but market research from Adobe Systems has found that its Flash plug-in is installed on over 95% of the personal computers around the world.[45]

Videos uploaded to YouTube are limited to ten minutes in length and a file size of 2 gigabytes.[47] When YouTube was first launched in 2005, it was possible for any user to upload videos longer than ten minutes, but YouTube’s help section now states: “You can no longer upload videos longer than ten minutes regardless of what type of account you have. Users who had previously been allowed to upload longer content still retain this ability, so you may occasionally see videos that are longer than ten minutes.”[46] The ten minute limit was introduced in March 2006, after YouTube found that the majority of videos exceeding this length were unauthorized uploads of television shows and films.

Video formats and quality

YouTube accepts videos uploaded in most formats, including .WMV, .AVI, .MKV, .MOV, MPEG, .MP4, DivX, .FLV, and .OGG. It also supports 3GP, allowing videos to be uploaded directly from a mobile phone.
They originally offered their videos in only one format, but now use three main formats, as well as a “mobile” format for viewing on mobile phones. The original format, now labeled “standard quality”, displays videos at a resolution of 320x240 pixels using the Sorenson Spark codec, with mono MP3 audio. This was, at the time, the standard for streaming online videos. “High quality” videos, introduced in March 2008, are shown at a resolution of up to 860x480 pixels with stereo AAC sound. This format offers a significant improvement over the standard quality. In November 2008, 720p HD support was added. At the same time, the YouTube player was changed from an aspect ratio of 4:3 to widescreen 16:9. 720p videos are shown at a resolution of 1280x720 pixels and encoded with the H.264 video codec. They also feature stereo audio encoded with AAC.

Collegerama components

A Collegerama lecture has screenshots of the PowerPoint slides and a video of the lecturer giving the lecture. On the web interface, these have been split up into separate parts. If the recorded lectures in Collegerama are to be published as a vodcast, the different elements need to be combined into a single multimedia file format.

In the current video system of Collegerama, the following elements are kept in sync:
• video of the lecturer
• audio
• PowerPoint slides
• closed captions/subtitles (not currently used at TU Delft)

Figure 3.2: The two main components of Collegerama, video and slides

Video of the lecturer

The video part of Collegerama usually shows the lecturer, but might occasionally be switched to a recording of the display screen for animations, movies etc. Collegerama publishes the video stream using the following quality settings:
Resolution: 320 x 240 (ratio 4:3)
Frame rate: 25 fps
Bit rate: 370 kb/s
Codec: wmv3
In short: Windows Media Video 9 / 320x240 / 25.00fps / 341kbps

Audio

Audio is an important part of the vodcast.
It contains all the spoken text and explanations by the lecturer. A lecture can be followed with only an audio recording and no video, as podcasts of lectures show, but not the other way around: a video stream without audio doesn’t make any sense. Collegerama publishes the audio stream using the following quality settings:
Channels: 2 (Stereo)
Sampling rate: 22050 Hz (22 kHz)
Bit depth: 16 bits/sample
Bit rate: 20 kb/s
Codec: wma2
In short: Windows Media Audio 9.2 / 20 kbps / 22 kHz / stereo (1-pass CBR)

PowerPoint slides

The slides of a presentation contain the most detailed information. They are important for the viewers since they give a guideline to the story. Fortunately the slides mostly contain keywords at a fairly large font size, which means that the quality and resolution do not have to be high for them to be readable. Collegerama publishes PowerPoint slides using the following specifications:
Resolution: 1024 x 768 (ratio 4:3)
Bit depth: 24 bits/pixel (full color)
Codec: jpg

Closed captions / subtitles

There are different ways of publishing closed captions or subtitles on video. The most commonly used method is a text file containing the spoken sentences along with their corresponding timestamps. Closed captions and subtitles for Collegerama lectures are described elsewhere. For the production of a vodcast, the subtitle files are not relevant since they will be attached to the vodcast based on the internal timestamps of the movie file.

Publishing Collegerama on YouTube

A vodcast for YouTube should comply with YouTube’s restrictions on resolution.
A general strategy for this is to develop a vodcast at the best video quality supported by YouTube, with the following considerations and constraints:
• movie size is limited to 2 gigabytes
• display time is limited to 10 minutes for the general public, unlimited for channel managers like YouTube-Edu
• YouTube gives the viewer the option to display at a lower quality when bandwidth is a limiting factor
• producing a vodcast at the highest quality enables the production of “child products” for other platforms at a lower quality, which results in smaller file sizes or lower bandwidth requirements
• YouTube converts movies with a non-standard resolution by downsizing to the nearest standard height of 360, 480, 720 or 1080 pixels

Within these constraints, the best quality of a Collegerama vodcast for YouTube can be achieved by following these steps:
• reduce the size of the slides from 1024x768 to 960x720 (downsizing to 94%, keeping the display ratio 4:3)
• leave the video resolution at 320x240
• put both elements alongside each other, giving an overall size of 1280x720 (HD720, widescreen, display ratio 16:9)
• fill the remaining area (320x480) with related info or navigation tools, or leave it blank

Figure 3.3: Layout of Collegerama elements within the resolution constraints for YouTube movies (1280x720): slide 960x720, video 320x240, related info 320x480

A layout according to this setup is given in Figure 3.3. The video is located on the right-hand side of the slides, to give a more balanced overall picture for left-to-right reading. The overall view could be mirrored to obtain a picture which resembles the original Collegerama view, where the video is located on the left. A screenshot of a lecture converted to the resolution requirements and uploaded to YouTube can be seen in Figure 3.4.

Figure 3.4: Collegerama as a vodcast for YouTube (1280x720)

Vodcast production
Single audio files are often referred to as "podcast" files.
The term podcast originates from the iPod, as iPod-broadcasting. In the slipstream of this term, single movie files are often referred to as "vodcast" files. Originally these were downloaded files, since the iPod and iTunes did not support streaming content. The meaning of these terms has since shifted to "audio on demand" or "video on demand (VOD)", in combination with an RSS feed. This audio or video can also be streamed, without actual distribution of a file.

The most important step in the production of a downloadable vodcast out of a Collegerama recording is the conversion of the PowerPoint slides into a movie. This can be achieved with the help of screen capturing systems such as Camtasia Screen Recorder. These systems record an assigned part of the display screen into a movie file. By playing a Collegerama lecture, the slides can be recorded as a movie with the right timing. Figure 3.5 gives an impression of such a screen recording.

Figure 3.5: Converting PowerPoint slides into a movie file by recording the Collegerama slide-display

This screen recording resulted in a movie file of 39 MB (1024x768, 15 fps, wmv3), which is only 6.3 times the total file size of the 29 slides (1024x768, jpg). The wmv3 compression proves to be efficient when recording still pictures, since the original 29 pictures have been converted into over 40,000 picture frames. The captured slides movie and the Collegerama movie have been combined into a single HD movie file of only 88 MB (1280x720, 15 fps). This is only (88/117=) 75% of the size of the original small-sized Collegerama movie (320x240, 25 fps). The reduction is caused by a lower frame rate and the efficient compression of the wmv3 codec for still pictures. Converting this movie file to the H264 codec increases the file size to over 500 MB. In this test, the H264 codec thus compresses worse than the wmv3 codec for this type of movie, which typically includes large areas with still pictures.

Scientific research on the compression efficiency of these two codecs shows less significant differences.[7][8] The common opinion is that the compression of these two codecs is similar, but that wmv3 (VC-1) requires less processor power for encoding and decoding. The differences in architecture might result in larger differences in specific situations. Moreover, the compression achieved with these codecs is also influenced by the efficiency of the encoding software. Wmv3 (VC-1) has more advanced features for motion compensation with a more flexible block sizing, which might be the main cause of the observed differences.

The creation of an HD movie from a Collegerama recording increases the movie resolution by a factor of 4 in width and 3 in height (from 320x240 to 1280x720), allowing for a much better display of subtitles, as is shown in Figure 3.6.

Figure 3.6: Vodcast of a Collegerama recording converts a small-sized video into a HD movie with room for proper subtitles

The vodcast production described above is rather labor-intensive and time-consuming. A more or less similar result could be obtained in a one-step recording session, where the overall Collegerama display is recorded by Camtasia.

3.2 iTunes
iTunes is an application that allows the user to manage audio and video on a personal computer, acting as a front-end for Apple’s QuickTime media player. Officially, iTunes is required in order to manage the audio of an Apple iPod portable audio player (although alternative software does exist). Users can organize their music into playlists within one or more libraries, copy files to a digital audio player, purchase music and videos through its built-in music store, download free podcasts and encode music into a number of different audio formats. There is also a large selection of free internet radio stations to listen to. Version 4.9 of iTunes, released on June 28th 2005, added built-in support for podcasting.
It allows users to subscribe to podcasts for free using the iTunes Music Store or by entering the RSS feed URL. Once subscribed, the podcast can be set to download automatically. Users can choose to update podcasts weekly, daily, hourly or manually. It is also possible to select podcasts to listen to from the Podcast Directory, to which anyone can submit their podcast for placement. The front page of the directory displays high-profile podcasts from commercial broadcasters and independent podcasters. It also allows users to browse the podcasts by category or popularity.

Video content available from the store used to be encoded as 540 kbit/s protected MPEG-4 video (H.264) with a 128 kbit/s AAC audio track. Many videos and video podcasts currently require the latest version of QuickTime, version 7, which is incompatible with older versions of Mac OS (only v10.3.9 and later are supported). On September 12th 2006, the resolution of video content sold on the iTunes Store was increased from 320x240 (QVGA) to 640x480 (VGA). The higher resolution video content is encoded as 1.5 Mbit/s (minimum) protected MPEG-4 video (H.264) with a minimum of 128 kbit/s AAC for the audio track.

Video formats for iTunes
The main focus of iTunes is to distribute content to the Apple iPod and its successors. The iPod did not have a screen suitable for movie display until October 2005. The iPod Nano received a movie display in September 2007. The screen size of the iPod family is shown in Table 3.1.
Table 3.1: Screen sizes of the iPod and its successors
Type                                Introduction date   Screen size   Aspect ratio
iPod video                          October 2005        320 x 240     1.33 (4:3)
iPhone                              June 2007           480 x 320     1.5 (3:2)
iPod Touch                          September 2007      480 x 320     1.5 (3:2)
iPod Nano                           September 2007      320 x 240     1.33 (4:3)
iPod Nano (new)                     September 2009      376 x 240     1.57
Supported video (external screen)   -                   640 x 480     1.33 (4:3)
HD movies                           -                   -             1.78 (16:9)

Over the years, the different iPod versions have evolved towards larger and wider screens (higher aspect ratio). Compared to the HD widescreen ratio used today, the iPhone aspect ratio is somewhere in between the traditional TV and the HD widescreen standards. All iPods support a video display of at most 640x480 on an external screen. As widescreen HD video has become more or less the standard nowadays, it seems likely that Apple will eventually move to larger video displays with HD specifications.

iPod constraints for Collegerama vodcasts
For the development of a Collegerama vodcast for iTunes (and iPods), the following aspects are of concern:
• the rather low resolution of the screen
• the different aspect ratio

These constraints have consequences for the following design aspects:
• the size of the display
• the size of the video component
• the location of the video component (upper/lower and left/right corner)

Low resolution
The resolution of the iPod is the same as that of the Collegerama video component. This would allow for simple distribution, using just the video stream as a vodcast and leaving out the presentation slides. Such a setup is used at MIT and many other universities. However, the slides in Collegerama provide a lot of the lecture content, since the keywords and a large part of the subject matter are on them. In an alternative setup, the vodcast might include the slide part of Collegerama with the audio of the video component. This is only an adequate alternative if the slides are readable at this low resolution.
Figure 3.7 gives an example of a typical PowerPoint slide at a normal iPod resolution. It shows that the smaller fonts in a presentation are no longer readable at the low iPod resolution, but the typical PowerPoint fonts can still be read quite well. The iPod resolution is around (320/1024=) 31% of the maximum slide size in Collegerama and (320/640=) 50% of the slide size in an overall Collegerama display.

Figure 3.7: A typical PowerPoint slide at iPod resolution (320x240)

Different aspect ratio
The iPod aspect ratio is the same as that of both the slides and the movie components in Collegerama. Therefore combining these two components in a widescreen view, as was done in the YouTube vodcast, is not possible. Alternative solutions are:
• the slide components are not included (video only)
• the video component is not included (slides with audio only)
• the video component is included at a smaller size (picture-in-picture)
• the video component is included at a smaller size (side-by-side) with unequal scaling

Figure 3.8 gives an impression of the latter three options.

Figure 3.8: Collegerama vodcasts with different options for the video component at iPod aspect ratio

From Figure 3.8, it is concluded that the most convenient option for including the movie component is the picture-in-picture layout. This is based on the following considerations:
• the slides should be shown at maximum size for proper readability (no side-by-side)
• the movie component can be reduced to a small size (thumbnail) and still remain properly visible
• the audio component without the video component lacks a focus point for the viewer (the movements of the lecturer give a better understanding of the lecture)

Size of display
An important aspect in the design of a vodcast for iTunes is the display resolution selected for the production and distribution.
The design strategy for creating the smallest file size looks most promising, for the following reasons:
• vodcasts for iTunes have to be downloaded to and stored on the iPod of the viewers (download time and storage capacity are relevant factors here, which is not the case in a streaming video setup)
• small file-sized vodcasts will minimize the requests for other small-sized output options like podcasts (audio only), which would require additional production and distribution efforts (time, costs, organization)
• a small-sized design gives a larger differentiation from the YouTube HD quality design
• iTunes-U uses the H264 codec, which is not as efficient in video compression as the wmv3 codec used in the YouTube design, so a smaller display size is more relevant given the less efficient compression
• the smallest display design allows for viewing on the older iPods, which are still the majority of the iPods currently in use

For the above-mentioned reasons, a vodcast for iTunes will be produced with a display size of 320x240 pixels.

Size of video component
The video in Collegerama shows the lecturer talking to the attendees. For this function a small video size is sufficient, as the most important aspect of such a movie is its audio component (spoken words). This is shown in Figure 3.9, in which the original video (320x240) is downsized to 30%, 20% and 10% of its original size. It shows that downsizing the Collegerama video to 20% (64x48) still leaves the speaking lecturer sufficiently visible. In some recorded lectures, the lecturer is writing text on the blackboard or is presenting experiments. Both circumstances require a larger display size for proper viewing, and a full switch from the slide view to the video component might be useful. The downside is that this will require an extensive video-editing process which might also need the input of the lecturer.
Such editing is not within the scope of a vodcast production out of a Collegerama recording. Production of a vodcast should be possible within a fully automated production process.

Figure 3.9: Collegerama video in original size (320x240) and reduced to 30%, 20% and 10%

The video component in a picture-in-picture design with the slides in the background will cover part of the slides, reducing their readability. This can be minimized by doing the following:
• selecting a small video component (10%-20%)
• making the video component (partly) transparent, still allowing for a background view (this setup might allow for a larger video size than a non-transparent movie, 20%-30% instead of 10%-20%)
• placing the video component in an area with the lowest disturbance of the slide view

Location of the video component
The video component should be located on the least disturbing part of the slide. Figure 3.10 gives an impression of these locations for a TU Delft PowerPoint slide at iPod resolution. It shows that the upper-left corner and the lower-right corner are unsuitable for movie insertion. The upper-left corner hides the important slide title, while the lower-right corner hides the slide number. The lower-left corner hides the TU Delft logo and the upper-right corner might hide part of the slide title. Both of these locations can be deemed acceptable.

Figure 3.10: PowerPoint slide in TU design at iPod size, with and without inserted movie components (20%)

The lower-left corner might have a small advantage since this resembles the general lecture room layout at TU Delft, in which the lecture desk is in the front-left and the projection screen is located in the upper-center or upper-right part of the lecture room. This lecture room layout results in many Collegerama recordings showing the lecturer looking to his/her upper-left. With a movie component in the upper-right corner, the lecturer often seems to look up into the “sky”.
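The two layouts discussed in this chapter can be illustrated with a short calculation. The following is only a minimal sketch under stated assumptions, not the tooling used for this project: the names `Rect`, `side_by_side_layout` and `picture_in_picture` are hypothetical, and placing the video at the top of the right-hand column in the YouTube layout is an assumption based on Figure 3.3. It computes the pixel rectangles for the side-by-side YouTube layout (1280x720) and for a picture-in-picture video component on an iPod-sized slide (320x240).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Pixel rectangle: top-left corner (x, y) plus width and height."""
    x: int
    y: int
    width: int
    height: int


CORNERS = ("upper-left", "upper-right", "lower-left", "lower-right")


def side_by_side_layout(canvas=(1280, 720), slide=(1024, 768), video=(320, 240)):
    """Slide scaled to the canvas height on the left, unscaled video to its right."""
    canvas_w, canvas_h = canvas
    scale = canvas_h / slide[1]            # 720 / 768 = 0.9375 (about 94%)
    slide_w = round(slide[0] * scale)      # 1024 * 0.9375 = 960
    if slide_w + video[0] > canvas_w or video[1] > canvas_h:
        raise ValueError("slide and video do not fit side by side")
    slide_rect = Rect(0, 0, slide_w, canvas_h)
    # Assumption: the video sits at the top of the right-hand column.
    video_rect = Rect(slide_w, 0, video[0], video[1])
    # The rest of the right-hand column is left for related info (or blank).
    info_rect = Rect(slide_w, video[1], canvas_w - slide_w, canvas_h - video[1])
    return slide_rect, video_rect, info_rect


def picture_in_picture(canvas=(320, 240), video=(320, 240), scale=0.20,
                       corner="lower-left"):
    """Rectangle of a downsized video component placed in one slide corner."""
    if corner not in CORNERS:
        raise ValueError(f"corner must be one of {CORNERS}")
    width = round(video[0] * scale)
    height = round(video[1] * scale)
    vertical, horizontal = corner.split("-")
    x = 0 if horizontal == "left" else canvas[0] - width
    y = 0 if vertical == "upper" else canvas[1] - height
    return Rect(x, y, width, height)


if __name__ == "__main__":
    for rect in side_by_side_layout():
        print(rect)
    print(picture_in_picture(scale=0.20, corner="lower-left"))
```

With the default arguments, the side-by-side layout yields a 960x720 slide, a 320x240 video and a 320x480 remaining area, matching the sizes given for Figure 3.3. A 20% video component measures 64x48 pixels, so it covers 4% of a 320x240 slide.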
It should be noted that not all lecturers use the standard TU Delft PowerPoint design. If lecturers are made aware that their Collegerama recording will be transformed into an iPod vodcast, they might adjust the slides to keep a certain corner of the slide empty. Therefore a uniform predesigned position of the movie component is important.

3.3 Portable Document Format (PDF)
Portable Document Format (pdf) is a file format created by Adobe Systems in 1993 for document exchange. It is used for representing two-dimensional documents in a manner independent of the application software, hardware and operating system. Each pdf file encapsulates a complete description of a fixed-layout 2D document that includes the text, fonts, images and 2D vector graphics which compose the document.[48] The great thing about pdf files is that all the data of a document is frozen and “digitally printed”, so that it cannot be edited and all the layout properties are fixed. Over the years, it has become the standard medium for distributing and sharing documents online.

A new development at Adobe is the release of Adobe Acrobat Connect Pro (formerly called Macromedia Breeze). It allows for a new way of creating general presentations, online training materials, web conferencing, learning modules and user desktop sharing. The entire product is Adobe Flash based.[49] The module for creating lectures based on PowerPoint presentations is a plug-in called Adobe Presenter.

Figure 3.11: Adobe Presenter allows the creation of lectures based on PowerPoint

There are several advantages that come with the use of Adobe Presenter, as opposed to Collegerama:
• better navigation
• higher slide quality
• distributable through a single pdf file

Navigation
As you can see in Figure 3.12, the Adobe Presenter interface creates an automatic index based on the different slides.
On the right side you can see each slide title, which is automatically extracted from the PowerPoint file. The great thing about this feature is the fact that there’s a clear way of navigating through a lecture based on keywords taken from the lecture material. This is an option that Collegerama does not have. Figure 3.12: Screenshot of lecture CT3011 implemented within Adobe Presenter Adobe Presenter has a different navigation system compared to Collegerama. Instead of having a video stream that has several images of PowerPoint slides linked to it, it uses a different approach by placing the PowerPoint presentation at the heart of the interface. This means that there is no long video of 45 minutes with a main timeline. It splits the presentation up into separate timelines per slide. Each of these has its own short video attached to it with a separate timeline. As you can see in Figure 3.12, a video of 7 minutes and 25 seconds is playing along with the first introductory slide. The problem with such a system is that it requires the video recording of the lecturer to be split up into smaller segments and linked to each separate slide. This is a time consuming process. Slide quality Since the Adobe Presenter system makes use of the original PowerPoint presentation, it has all the slides digitally available at the highest quality. Once the lecture is converted to a shareable format, the quality of the sheets is no longer limited to a set resolution (1024x768 for Collegerama), but is stored as a vector oriented image. This means that the viewing quality is incredibly high compared to Collegerama. Distributable through a single pdf file There are two ways of distributing the recorded lectures with Adobe Presenter: • server-based streaming • single pdf file distribution The obvious problem with the server-based streaming is the same as that of the current Collegerama system. 
It is not possible to distribute the lectures through a standard video-sharing and streaming medium such as YouTube or iTunes. This means that the distribution options are severely limited. When choosing the single pdf file distribution, all the data that is required to view the lecture (the audio and video stream and the PowerPoint slides) is packed into one single pdf file. It offers the option of playing the lecture on an offline device that has Adobe Reader installed. Once downloaded, the lecture can also be played an unlimited number of times, without having to be connected to the server. Unfortunately the same distribution problem arises when choosing the pdf option. Currently none of the online streaming servers support the playing of pdf files. This means that for other distribution channels to be available, the lecture needs to be converted back to a single-file video format.

3.4 Conclusions
Timeline
There are two approaches to creating recorded lectures:
• video-based
• slide-based

The difference between these two types is the timeline on which the lecture is based. The video-based system is the standard Collegerama method, where a video file of the lecturer exists and several screenshots are linked to the timeline of this video. An example of a slide-based system is Adobe Presenter. Here, the PowerPoint slides serve as a logical timeline for the whole lecture, and audio and video streams are linked to each slide.

Navigation
The current navigation system within Collegerama does not work well. It relies on the screenshots of the PowerPoint slides that are displayed during the lecture. The problem is that during the recording of these lectures, a screenshot is taken every 1 to 4 seconds whenever a change on the screen has been detected. When the lecturer inadvertently moves the mouse or plays a video in his presentation, a lot of redundant screenshots are taken and the efficiency of navigation is greatly decreased.
Collegerama as vodcast
It is clear that if Collegerama lectures are going to be distributed through the current popular video-sharing mediums, the lectures must be converted to a single video file. This is the standard input that is required and accepted by all platforms. To do this, the two elements of a lecture need to be combined:
• video stream of the lecturer (wmv)
• screenshots of PowerPoint slides (jpg)

A lot of thought has to go into what screen resolution to use, where to place each element within the video stream and how to fill up any unused space in the newly created video. The size of the video depends on the medium through which it is shared. If a vodcast for an iPod or iPhone is being created, the resolution is obviously going to be very different from that of a high definition YouTube video.

It is concluded that a video-based system is a lot better for distribution, since virtually none of the popular online distribution channels (YouTube, iTunes-U, Academic Earth etc.) offer support for pdf files or a server-based infrastructure to share lectures. By creating high definition movies from the original Collegerama recordings, all other video versions with different pixel sizes can be derived (for instance, vodcasts designed to fit on mobile media players such as the iPhone or Blackberry). This HD movie has a smaller file size than the original Collegerama recordings, due to the efficient compression of still pictures (slides as movie). In the example lecture of 45 minutes, the file size is 88 MB instead of 117 MB.

4. Subtitling
Subtitles form the foundation for a lot of extra functionality options, such as tag cloud indexing, searching and translation. In this chapter, the methods for creating subtitles, the reasons for wanting to do so and ways of translating subtitles for foreign-speaking students are discussed.
There are several reasons why the addition of subtitles for Collegerama lectures is useful:
• lectures are easier to follow
• lectures are available to foreign-speaking students
• lectures can be made searchable

Lectures are easier to follow
If a lecture contains subtitles during playback, deaf people and people with a hearing problem can understand what is being said. These special subtitles for the hearing impaired are called “closed captions” or are sometimes also referred to as “subtitles for the hard of hearing”. The term “closed” in closed captioning indicates that not all viewers see the captions, only those who choose to decode or activate them. This distinguishes them from “open captions” (sometimes called “burned-in” or “hardcoded” captions), which are visible to all viewers. Most countries in the world do not distinguish captions from subtitles. In the United States and Canada, these terms do have different meanings. Subtitles assume the viewer can hear but cannot understand the language or accent, or the speech is not entirely clear, so they only transcribe dialogue and some on-screen text. Captions aim to describe all significant audio content (spoken dialogue and non-speech information such as the identity of speakers and occasionally their manner of speaking), along with music or sound effects, using words or symbols.

Lectures are available to foreign-speaking students
Subtitles are generally used to display the spoken words in a video on the screen. For every different language, a new subtitle track has to be created. Most DVD movies that are released in Europe contain at least the subtitle tracks for the languages German, French and English. During production these subtitles are mostly created by hand by professional translators. An alternative for generating different subtitle tracks is to use an automated computer system. An example of such a service that is publicly available is Google Translate.
It is a beta service provided by Google Inc. to translate a section of text or a webpage into another language. As of December 2009, the system supports 52 different languages from around the world. Like other automatic translation tools, it has its limitations. While it can help the reader understand the general content of a foreign language text, it does not always deliver accurate translations. Some languages produce better results than others.[37]

Lectures can be made searchable
Every Collegerama lecture consists of a single video stream. Without some sort of indexing system, the only element offered is a 45-minute long video that offers no way of skipping to relevant parts based on a certain topic. For further background information about the research on this topic, see Annex D.

4.1 Subtitling process
Subtitles for translation and searching are only composed of spoken text. This is created from the audio track that has been extracted from the video stream. The creation method is shown in Figure 4.1.

Figure 4.1: Creation process for subtitles

There are several ways of creating subtitles:
• manual subtitling
• real-time subtitling
• speech recognition

Manual subtitling
Many different programs can be used to manually create subtitles for a movie, but their overall usage is generally the same. You start by typing in the lines of text that are spoken in the movie. Once these are finished, the transcript needs to be matched to the time sequences of the movie. For every line of text, a timestamp is added so that the subtitle generator can later show the appropriate text at the right time.

Figure 4.2: Screenshot of the program SubCreator (Source: http://www.radioactivepages.com/index.php?section=software&docid=subcreator)

The advantage of this method is the easy editing of the subtitles. Everyone who can understand the language that is being spoken can write out the transcripts of a given video stream.
Unfortunately, this process is very time consuming and therefore relatively expensive.

Real-time subtitling
Real-time subtitles have to be created within 2 or 3 seconds of the broadcast. There are people specializing in this sort of work, called Communication Access Real-Time Translation stenographers. They use a specialized keyboard that is specifically designed to support shorthand writing, called a stenotype or velotype typewriter. Real-time stenographers are the most highly skilled in their profession. Stenography is a system of rendering words phonetically, and English, with its multitude of homophones (e.g. there, their, they’re), is particularly unsuited for easy transcriptions. They must deliver their transcriptions accurately and immediately.[23]

Speech recognition (ASR)
At the moment, speech recognition technology or Automated Speech Recognition (ASR) is still a long way from achieving fully automatic subtitles for any program. There are still many errors in the generated text, and several challenges such as background noise, different accents and multiple simultaneous speakers make the process difficult. Speech recognition technologies do have their place in the world of modern subtitling. ASR systems are already used in live subtitling systems for sports, news and politics.

Translated subtitles
The previously described methods for creating subtitles can also be applied to the creation of subtitles in languages other than the spoken language. In general, two ways of creating translated subtitles can be distinguished:
• human translation of the spoken text (either offline or live)
• machine translation from subtitles of the spoken text

Figure 4.3: Translated subtitles improve the learning environment for non-native speaking students

At present, machine translation is not able to produce high quality subtitles. The produced quality is either accepted as an improvement over “no translation” or used as a starting point for human post-processing.
Google Translate is a well-known example of machine translation, but many other systems are presently available. Machine translation is a booming industry supported by an enormous number of scientific research programs, executed at nearly every university in the world.

4.2 Subtitles from speech recognition
Automated speech recognition (ASR)
ASR is a sub-field of computational linguistics that investigates the use of computers to convert spoken words into computer data, ranging from text (speech-to-text) to input control (voice-controlled machines). The fast development of more powerful computers has boosted this field in the last decade, sometimes ironically leading to disastrous overrating, such as the Lernout & Hauspie collapse.[57] Speech recognition systems have been and are being developed by universities as well as by commercial companies. Some major international institutions on ASR:
• LIMSI - Spoken language processing group (France)
• Speech research group at University of Cambridge (UK)
• Raytheon - BBN Technologies (USA)
• SRI - Speech Technology and Research (STAR) Laboratory (USA)

For recent research on ASR, reference is made to publications of the International Speech Communication Association (ISCA). The most recent conference of the ISCA was held between September 6th and 10th 2009 in Brighton (UK). This 10th yearly conference (Interspeech 2009) included 38 oral sessions, 39 poster sessions and 10 special sessions, resulting in 762 reviewed and accepted papers.[9]

Performance evaluation of speech recognition
Speech recognition engines are developed for a certain language and most often a certain environment, such as telephone conversations, voicemails, news readings, movies etc. The performance of an ASR engine differs not only based on environment, but also on the different speakers (male/female voice, dialect, intonation etc).
The standard evaluation metric used to measure the accuracy of an ASR engine is the Word Error Rate ($WER$). The word error rate is defined as the ratio of word errors over the total number of words in the correct reference transcript $N_{ref}$. The number of word errors is the sum of the number of deletions $D$, insertions $I$ and substitutions $S$:[12]

$$WER = \frac{D + I + S}{N_{ref}} \cdot 100\%$$

Note that the word error rate can be higher than 100%, for example when the result set contains more words than the reference transcript and all of these words are incorrect. In this case the number of substitutions would be equal to the number of words in the reference text. On top of that there would be insertion errors. For ASR, a $WER$ of 50% is often considered an adequate baseline for retrieval.[10] Modern ASR engines have a $WER$ between 10% and 60%. Human-made transcripts have a $WER$ between 2% and 4%.[11]

Word accuracy ($WA$) is defined as the complement of the word error rate:[12]

$$WA = 100\% - WER$$

The word accuracy is not just the fraction of words correctly recognized, because the latter does not include insertions. Determining the $WER$ value requires a reference transcript. In the absence of such a transcript, the quality of ASR can be indicated by the Word Correctness ($WC$). The $WC$ value is defined as the ratio of the number of correct words $N_c$ over the number of output words $N_{out}$:[12]

$$WC = \frac{N_c}{N_{out}} \cdot 100\%$$

Word accuracy and word correctness can be used interchangeably in case the ASR engine does not produce deletions ($D = 0$) and insertions ($I = 0$), or only in a negligible number (less than 5% to 10%). This is often true for modern ASR engines with a good performance.
In this case, the ASR output only includes correct and incorrect words, and the number of words in the reference transcript is equal to the number of words in the ASR output ($N_{ref} = N_{out}$), so:

$$WA = WC = 100\% - WER$$

Speech recognition for recorded lectures

The 28 recorded lectures of course CT3011 (TU Delft) have been used as input for ASR (see Annex E). These lectures were given in Dutch. Speech recognition was done with SHoUT, a speech recognition engine for the Dutch language developed at the University of Twente by Marijn Huijbregts as part of his PhD research.[13] SHoUT is an acronym for the Dutch project name “Spraak Herkennings Onderzoek Universiteit Twente” (in English: Speech recognition research University of Twente).

Table 4.1 gives some data on the lectures and the quality assessment. Figure 4.4 gives the word correctness per lecturer.

Table 4.1: Quality assessment of word correctness by speech recognition on lectures
Item | Range | Average per lecture | Total
Number of recorded lectures | | | 28
Duration of lectures (hh:mm:ss) | 23:26 – 53:51 | 41:20 | 19:17:33
Number of words in output | 3,581 – 9,392 | 6,748 | 188,957
Sample size for assessment | 4.2% – 11.0% | 6.1% | 6.1%
Word correctness | 23% – 73% | 50% | 50%

Figure 4.4: Word correctness of SHoUT for the CT3011 lectures, clustered by speaker

The results of the quality assessment of the SHoUT output can be discussed for the following items:
• number of words
• word correctness

Number of words

For one of the 28 recorded lectures, subtitles have been created manually. These subtitles contain 6,970 words. SHoUT produces 7,351 words for the same lecture. This 5% increase is probably due to the rather low speaking rate in lectures, which causes SHoUT to split long words into smaller words.

Word correctness

The average word correctness of SHoUT amounts to 50%, with a variation between 23% and 73%. The word correctness differs significantly between lecturers.
For the word correctness, no correlation was found with either the gender of the lecturer (male or female voice) or the age (lowered voice).

Subtitles from speech recognition output

For one lecture the SHoUT output was used for creating subtitles. This required substantial manual work to cluster the individual words and sentences into proper subtitles. The result of this conversion is shown in Figure 4.5.

Figure 4.5: Subtitles created from the SHoUT transcript

In the produced subtitles, no word correction was done. Such a correction is essential when SHoUT output is used for subtitling real lectures. In the result set of Figure 4.5, only 7 out of the 48 test sentences (15% of the subtitle sentences) have a word correctness of 100%.

Speech recognition engines like SHoUT might be extended with a statistical post-processor which clusters the generated words into subtitle sentences. This post-processor would be fed with a huge collection of subtitle sentences, in a way similar to the collections used for machine translation (see paragraph 4.3). Statistical post-processing would not only produce sentences instead of isolated words, but might also increase the word correctness of the ASR engine by analyzing these complete sentences. In this way, statistical post-processing would reduce the effort required for the production of high-quality subtitles.

ASR time-coding of transcript

An alternative approach to using ASR for the creation of subtitles is the time-coding of human-made transcripts. The ASR engine analyzes the entire transcript and tries to match the known words to similar words that it picks up from the audio. These words are then linked to the proper time-code. The website Radio Oranje[51] shows a demo of this method for a radio speech broadcast by Queen Wilhelmina during World War II. The existing transcripts of the broadcast were available and each individual word has been time-coded by SHoUT.
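The three metrics defined earlier in this paragraph (word error rate, word accuracy and word correctness) can be illustrated with a small calculation. The sketch below is not part of SHoUT or of the assessment in Annex E. It assumes that the reference transcript and the ASR output are available as lists of words, and that the errors are counted along one minimum edit distance alignment. The names `AlignmentCounts` and `align` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlignmentCounts:
    hits: int           # correctly recognized words (N_c)
    substitutions: int  # S
    deletions: int      # D
    insertions: int     # I


def align(reference: list[str], output: list[str]) -> AlignmentCounts:
    """Count hits, S, D and I along one minimum edit distance alignment."""
    rows, cols = len(reference) + 1, len(output) + 1
    # cost[i][j] = minimum number of word errors between reference[:i] and output[:j]
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # only deletions
    for j in range(cols):
        cost[0][j] = j  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = reference[i - 1] != output[j - 1]
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,  # hit or substitution
                             cost[i - 1][j] + 1,             # deletion
                             cost[i][j - 1] + 1)             # insertion
    # Walk back from the end; ties are resolved in favour of the diagonal step.
    i, j = len(reference), len(output)
    hits = subs = dels = ins = 0
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and reference[i - 1] != output[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            if mismatch:
                subs += 1
            else:
                hits += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return AlignmentCounts(hits, subs, dels, ins)


def word_error_rate(c: AlignmentCounts) -> float:
    n_ref = c.hits + c.substitutions + c.deletions  # words in the reference
    return 100 * (c.deletions + c.insertions + c.substitutions) / n_ref


def word_accuracy(c: AlignmentCounts) -> float:
    return 100 - word_error_rate(c)


def word_correctness(c: AlignmentCounts) -> float:
    n_out = c.hits + c.substitutions + c.insertions  # words in the ASR output
    return 100 * c.hits / n_out
```

For the reference `a b c d` and the output `a x c d e`, this alignment contains one substitution and one insertion, so WER = 50%, WA = 50% and WC = 60%. For the reference `a` and the output `x y z`, WER = 300%, which illustrates that the word error rate can exceed 100%. When several alignments have the same minimum cost, the split between substitutions, deletions and insertions (and therefore the number of hits) depends on the tie-breaking rule, which is a design choice of this sketch.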
4.3 Machine translation for subtitles

Machine Translation (MT)

Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition and translation of idioms, as well as the isolation of anomalies. Machine translation can be divided into two main approaches:
• rule-based translation
• statistical translation

Rule-based machine translation relies on countless built-in linguistic rules and millions of bilingual dictionaries. It focuses on translating separate words and afterwards correcting the grammar by using dictionary grammar rules. The translation is predictable, but the results may lack the fluency readers expect.

Statistical machine translation utilizes statistical translation models whose parameters come from the analysis of monolingual and bilingual collections of texts. Building statistical translation models is a quick process, but the technology relies heavily on existing multilingual documents. A minimum of 2 million words is required for a specific domain, and even more for general language. Statistical MT provides good quality when large and qualified text data is available. The translation is fluent, meaning it reads well and therefore meets users’ expectations.
However, the translation is neither predictable nor consistent.[1] Phrase-based statistical machine translation has emerged as the dominant paradigm in machine translation research.[1] In order to obtain the benefits of both approaches, existing rule-based translation systems are presently being extended with statistical post-processing.[2]

For further recent research on machine translation, reference is made to the publications of the Association for Computational Linguistics (ACL).[50] The ACL is the most prominent international scientific and professional society for people working on problems involving natural language and computation. Regional associations related to the ACL include:
• EACL: The European Chapter of the ACL
• NAACL: The North American Chapter of the ACL
The most recent conference of EACL was held between March 30th and April 3rd 2009 in Athens, Greece. This 12th conference included several special workshops. Many international researchers on this subject presented their most recent findings at the conference and workshops.[3][4]

Research on machine translation in Europe is heavily funded by the European Union. Its research programs on machine translation are:
• EuroMatrix Project (Sept. 2006-Febr. 2009)[52] Project motto: Statistical and Hybrid Machine Translation Between All European Languages
• EuroMatrixPlus (March 2009-February 2012)[53] Project motto: Bringing Machine Translation for European Languages to the User

A special aspect of machine translation is machine transliteration. Transliteration is the conversion from one writing system to another, with different scripts. Converting English names into Chinese characters is an example of this. The most recent workshop on this subject (2009 Named Entities Workshop: Shared Task on Transliteration) was held in Singapore on August 7th 2009 as part of the conference of the Association for Computational Linguistics.[5] The proceedings of this workshop can be found on the Internet.
Many machine translation engines have been developed by universities as well as by commercial companies. These machine translation engines compete on the quality of the produced translation. Table 4.2 gives some examples of these engines.

Table 4.2: Some popular machine translation engines
Product | Owner | Type (*) | Start year | Languages
SYSTRAN | SYSTRAN | R | 1968 | 21
Babelfish | Yahoo | R (+S?) | 1990 | 13
Translate | Google | S | 2004 | 53
MOSES | Open source | S | 2006 | toolkit
Asia Online | Asia Online (MOSES) | S | 2006 | 516
Bing | Microsoft | S | 2009 | 20
(*) Type: R = rule-based, S = statistical
(Source: http://en.wikipedia.org/wiki/Comparison_of_machine_translation_applications)

Performance of Machine Translation

The performance of statistical machine translation depends strongly on the size and quality of its data (corpus). This performance might differ with the direction of translation. Translation from Dutch to English might differ in quality from translation from English to Dutch, even though it is produced by the same translation engine. Moreover, the performance will differ between restricted domains. The translation of news bulletins might differ in quality from the translation of scientific articles produced by the same translation engine.

Several automatic metrics have been developed for evaluating machine translation performance, such as BLEU, METEOR, TER (Translation Error Rate), HTER (Human-targeted Translation Error Rate), MaxSim, ULC, and many others. However, automatic measures are considered to be an imperfect substitute for human assessment of translation quality.[6]

The performance of some English to German machine translation engines is shown in Figure 4.6. These results were obtained from a quality assessment by 160 translators for English and five other languages (German, Spanish, French, Czech and Hungarian).[6] The translators were asked to rank the output of 26 MT engines on 38,000 sentences (1,500-7,000 per language pair).
They were also asked to edit about 9,000 isolated sentences, coming from the MT engines, into fluent and correct sentences without looking at the original source. This should reflect people’s understanding of the output. The edited output was used in the evaluation, even in instances where the translators were unable to improve the output because it was too incomprehensible. Each MT system was then given a value for the percentage of the time that its edited output was judged to be an acceptable translation. This value can be considered a measure of “understandability”: not an absolute measure, but a relative figure for comparing different systems. The reference system is an online human-made translation. Around 20% to 50% of the time, adequate edited translations were obtained with machine translation.

Figure 4.6: The performance of some English to German translation engines compared to human translation (=ref) [6]

Assessments showed that languages for which large and reliable language pairs are available are better translated.[5]

Differences in evaluation of ASR and MT

At the present state of development, the values for word accuracy of ASR engines are of the same order as the values for understandability of MT engines. However, the former is regarded as “far from sufficient for subtitling”, while the latter is often considered “adequate for subtitling”. This phenomenon can be explained by the big difference in awareness of the viewer. Hearing the speaker while watching the ASR subtitles puts a lot more emphasis on the differences between the two. Most of these differences are noticed by the viewer and can be seen as a serious shortcoming. Bad subtitles will also result in badly translated subtitles through the use of MT.

ASR can be used for searching, since mistakes in sentences won’t be visible and aren’t a big problem for the search functionality.
A word accuracy of 50% is considered suitable and is obviously a lot better than having no reference data for search. For MT, the criteria aren’t as demanding, since the reference situation is that of a student who is trying to look up a spoken word in a bilingual dictionary for his native language while the lecturer continues with the lecture.

Google Translate in YouTube

If at least one subtitle track is available, YouTube provides a translation service that can automatically convert the subtitles to another language. This is done through the Google Translate service mentioned earlier. At the bottom-right of the YouTube interface, a button with the CC logo (the official logo for Closed Captions) is available to turn the subtitles on or off. It also opens a submenu from which the translation menu can be accessed. When the translation menu has been opened, the user can choose from the 52 languages that are available in the dropdown menu. Once a language has been chosen, the subtitles are automatically sent to the Google Translate engine and YouTube displays the results.

Figure 4.7: Translated subtitles from Dutch to English in YouTube

Google Translate coverage has been expanded dramatically. It now supports translation between any of the following languages: English, Arabic, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish and Swedish. Google Translate now supports 56 language pairs and has become the most comprehensive online translation tool available for free.

In November 2009, YouTube announced that it will expand its services for translating subtitles. Users will be able to post-process the subtitles generated by Google Translate. This service also includes the use of Google’s ASR system for generating time-tagged subtitles for YouTube-Edu channels (initially only available in English).
As part of this service it will be possible to upload transcripts which will be time-tagged by Google’s ASR system.

Human post-processing

The present automatic translation by Google Translate results in translated subtitles which are readable 20% to 80% of the time. It is expected that this quality will improve significantly over the next few years, with the help of larger and better data sets. For recorded lectures at TU Delft, the following translation pairs are most significant:
• Dutch to English (BSc courses for non-Dutch-speaking MSc students)
• English to Dutch (MSc courses for Dutch professionals, as life-long learning material)
• English to any other language (MSc courses for non-native English speaking students)
For these target areas the present quality of machine translation might be considered insufficient. Manual post-processing might be used in these cases to improve the quality of the machine translation output.

4.4 Text-to-speech for translated subtitles

Having proper translated subtitles opens the door to spoken subtitles in the native language of the student. Dubbing of lectures is possible by using text-to-speech engines. In the chain from spoken words to speech recognition (speech to text) to machine translation (text to translated text) to spoken translated words (text to speech), this last part is the most developed. IBM’s ViaVoice Text-To-Speech is an example of such a service, which is available online. It should be noted that “Real-Time Translation Service” will be a major research goal for the near future. Another example is MeGlobe, an instant messaging service with real-time translation to and from over 15 languages (see Figure 4.8). For educating foreign-speaking students, such developments will be a serious boost. This futuristic development is not further elaborated within the scope of this research project.
Figure 4.8: Will automatic real-time translation engines become available within the next decade? (Source: http://www.meglobe.com)

4.5 Conclusions

It is concluded that producing subtitles for a video lecture opens up a lot of new possibilities. Having the option of turning on subtitles in the same language as the spoken text could make lectures easier to follow for certain students. For Dutch students who follow an English master course, it adds to their learning experience if those lectures are subtitled in Dutch. Subtitles can also be useful as a basis for searching in lecture content.

The present state of development in speech recognition for producing subtitles, and in machine translation for producing translated subtitles, has been investigated in this research project. The current speech recognition technology has also been evaluated for the generation of proper subtitles. For this, the speech recognition engine created at the University of Twente, called SHoUT, has been used. With this ASR engine a word correctness of 23% to 73% was observed for the 28 Dutch spoken lectures that were tested. It is concluded that this system is not yet sufficient to generate proper subtitles and that manual post-processing is always required.

Machine translation allows for a decent translation, which is always better than having no translation at all. Using it professionally in the education program still requires substantial post-processing. A problem that most universities currently have is that certain master courses have a prerequisite bachelor course that is given in Dutch. Foreign-speaking students who are only going to do a master need to know the subject matter of these courses, but aren’t able to look back through those lectures. With subtitles and MT technology, it becomes possible for them to at least follow part of the lecture (dependent on the quality of translation).

5. Navigation and searching

Presently, Collegerama does not provide any form of search functionality.
The Collegerama catalog shows an overview of all recorded lectures in a course in a crude form. An example of this catalog is shown in Figure 5.1.

Figure 5.1: Catalog of recorded lectures in a course (Source: http://collegerama.tudelft.nl/mediasite/Catalog/?cid=16b5f5fa-0745-4b8b-9f02-f79a03abf50a)

The lecture titles and the name of the lecturer are usually wrong. The only correct metadata of a lecture are the recording date and time (shown as air date and time) and the duration of the recording. Searching for a particular lecture in Collegerama can only be done by sorting on this unreliable metadata, which is far from sufficient. Because of this inadequate metadata, the lecturer usually creates a URL link to a particular lecture recording within Blackboard, the digital learning environment. In Blackboard the lecturers have full access to the published course material.

Within a lecture, the only navigation and/or search facility of Collegerama is the overview of slides. Using this thumbnail table during playback hides the view of the current slide. The main drawback of the Collegerama navigator is the disturbance caused by screen actions, such as mouse movements, PowerPoint animations or writing on an electronic blackboard, each of which generates additional screenshots. This enormous number of screenshots makes this slide-based system completely unsuitable for navigation (see Figure 2.6).

This description clearly shows the need for improvement of the navigation and searching facilities in the Collegerama environment. In this chapter the possibilities for searching in and browsing through recorded lectures in a course are presented. First, navigation in movies and DVDs is presented, as well as the scientific research on multimedia retrieval systems.
In the next paragraph, the following sources of information are presented:
• lecturer (lecture titles, lecture chapters)
• slides (slide titles, slide content, slide notes)
• spoken words (transcripts, subtitles and/or speech recognition output)
Afterwards, the different products are presented:
• search engine on lecture data in a course
• tables of content (for courses and lectures)
• tag cloud presentations of lecture content
Finally the results will be evaluated in order to determine a proposal for searching facilities, i.e. required sources and proposed output. For further background information about the research on this topic, see Annex E and F.

5.1 Metadata for navigation and search

Before elaborating on the improvement of navigation and search within lectures recorded by Collegerama, these aspects are investigated in parallel environments and disciplines. For navigation in videos, the navigation within DVD and Blu-ray movies can be evaluated. These movies are considered to be the most commonly accepted development in user accessibility. For search, the latest developments in multimedia retrieval have been studied.

Selecting of and navigation in DVD movies

The selection and navigation process for (recorded) lectures can be compared to the selecting (buying/renting) and viewing of DVD movies. A movie box set containing the movies of a TV series can be considered comparable to a course containing recorded lectures. To make a proper selection, the potential viewer requires further information on the actual content of the movie box set and its movies. This metadata is normally printed on the movie box set and on the cover of the individual movies. With this concept in mind, the primary metadata of courses, lectures and lecture content is presented in Table 5.1.
Table 5.1: Primary metadata for selecting of and navigating in recorded lectures
Course: University; Course name; Responsible teacher; Course code; (Academic) Year; Academic discipline; Faculty; Logo
Lecture: Lecture title; Name of lecturer; Course name (and year); Date of recording; Initial slide (picture); Tag cloud of content; Screenshots (picture story); Short description
Lecture content: Table of contents; Tag cloud of content; Screenshots (picture story); Short description

Not all metadata is text. Screenshots, logos and tag clouds are pictures which give a better impression of the movie box (course) and its movies (lectures) than the text in titles and descriptions. For navigation within a movie itself, Table 5.2 gives the analogy for recorded lectures.

Table 5.2: Analogy of navigation in DVDs and recorded lectures
Element | DVDs | Recorded lectures
Main menu | Chapters | Chapters
Submenu per chapter | Scenes | Slides

Search in movies

Searching in movies is studied in the research discipline of computational multimedia information retrieval.[14] Such video information retrieval focuses on searching in video collections by using various methods of extracting information from video recordings. The extraction of spoken text (speech recognition) for data retrieval and the detection of shot changes for segmenting are examples of these methods. Figure 5.2 gives an overview of a multimedia information retrieval system, as described in the book Multimedia Retrieval.[14]

Figure 5.2: Schematic view of a multimedia information retrieval system [14]

Specific elements of multimedia information retrieval with relevance for recorded lectures are:
• languages for metadata[15]
• presentation of search results[16]
• evaluation of Multimedia Retrieval Systems[17]
An important element in searching within multimedia data is the relation between the video content and the metadata. For recorded lectures, this relation can be fixed by using time-tagging.
With time-tagging, the metadata is related to a certain time interval in the multimedia content. Subtitles with a particular begin and end time are a typical example of this. Other items such as slide views (pictures/scenes) and chapters can be time-tagged as well. Figure 5.3 gives an impression of searching in multiple parallel metadata of recorded lectures.[18]

Figure 5.3: Searching in parallel metadata of videos [18]

Searching in a multimedia system yields a result set. The user is presented with this result set in order to select one or more of the results for actual viewing. For this selection it might be essential that the user is able to see the context of each result element. When a user searches for the keyword “water”, the result set will show multiple occurrences of this word. Information about the context of the search result, or about the source type that the data came from, might be relevant for evaluating the search results. This constraint requires context-preserving information retrieval.[19]

5.2 Metadata sources

Input from lecturer

The Collegerama recording system is based on input from a video camera and input from screenshots of the display computer. These screenshots should be regarded as a low-level form of screen recording with a maximum frame rate of 1 fps. Thumbnails of screenshots are used for navigation in Collegerama/Mediasite. For this purpose the individual screenshots can be clustered into groups, with only one thumbnail per group shown in the navigation screen. This clustering is done automatically during recording. At TU Delft, this results in far too many thumbnails. Further clustering can be done in a manual post-processing session, but this is currently never done. The lecture recording department is too understaffed to handle this task and the lecturer does not have access to the Collegerama server. The ultimate result is that the recorded lectures often lack proper navigation.
The lecturer should get access to the Collegerama server so that the surplus of screenshots in the recorded lectures can be corrected. As an alternative approach, the recording department may develop an offline tool (or web-based data collection system) in which this clustering can be done. Such a system could be used for collecting all data from the lecturer, such as:
• proper lecture title
• accurate name of the lecturer or lecturers
• time-based chapter titles of a lecture
• time-based correlation between recording and original PowerPoint slides
• original PowerPoint presentation (either as ppt or as pdf file)
The main purpose of the data collected from the lecturer is to create a proper table of contents for the recorded lecture. To accomplish this, a 45-minute lecture should be divided into 3 to 8 “chapters”. This gives chapter durations of approximately 5 to 15 minutes. The lecturer should at least create a “text slide” per chapter in case the lecturer does not use a PowerPoint presentation or a comparable presentation tool (such as an electronic blackboard). This text slide is used as an equivalent to a presentation slide and is shown during the playback of the whole chapter.

The collected data can be incorporated into a database per course (Collegerama data system) and might also be used to improve the original Collegerama recording/navigation. This database can also be used to generate a proper table of contents (TOC), containing all recorded lectures of a course. This might replace the original Collegerama catalog.

The word correctness of text information from data collected from the lecturer is estimated at 90% to 100%. The text itself is recovered completely from the PowerPoint slides, but the lecturer might have made mistakes while creating them.

Input from slides

Text on PowerPoint slides forms a rich source of data for recorded lectures.
The text data from slides can be divided into:
• slide titles
• slide content
• slide notes
The text data of PowerPoint slides can automatically be retrieved from a digital file, either the ppt/pptx file or the “printed” pdf file. This data can then be inserted into the Collegerama data system. The slide titles form a table of contents (TOC) of the lecture, based on the timing input of the lecturer. Every word in the text itself is retrieved automatically; text that is shown in pictures, however, requires a different technique (Optical Character Recognition). In this research project, OCR has not been used.

Input from spoken words

The spoken text in recorded lectures might be available in one of the following forms:
• transcript (full text, without timestamp)
• subtitles (time-stamped per sentence)
• words (time-stamped per word)
For the sample course CT3011, the following sources are available:
• subtitles and transcript of the sample lecture #15 of this course (transcript generated from human-made subtitles)
• words of all lectures retrieved through speech recognition (SHoUT)
For the speech recognition by SHoUT, the word correctness of all recorded lectures has been determined (see Annex E). The mean word correctness is 50%, with values between 23% and 73% (standard deviation 14.6%).

5.3 Metadata storage

All the collected metadata can be incorporated into a Collegerama data system. For this research project this database is restricted to the recorded course only. The database consists of 2 tables:
• lectures, containing all metadata related to the lecture as a whole
• content, containing all metadata within the course, on a time-based level (start and end time in milliseconds)
A visual representation of each table, its columns and their corresponding data types is given in Table 5.3 and Table 5.4.
Table 5.3: Database table Content
Field name | Data type
Content_id | int
Lecture_id | int
Start_time | int
End_time | int
Text_type | int
Text | nvarchar(MAX)

Table 5.4: Database table Lectures
Field name | Data type
Lecture_id | int
Lecture_nr | int
Title | nvarchar(100)
Lecturer | nvarchar(50)
Air_date | datetime
Collegerama_id | nvarchar(50)

For this project, only text data has been included into these tables. A future addition could be the adding of thumbnails per record, so that a characteristic screenshot preserves the context of information. This screenshot might be taken at a certain time moment $T_{ss}$ in the time interval ($T_{start}$ to $T_{end}$), at a fixed fraction $\alpha$ of the elapsed time:

$$T_{ss} = T_{start} + \alpha \cdot (T_{end} - T_{start})$$

In the latest YouTube movies, the typical screenshot for a movie shown at selection is taken at 33% of the length ($\alpha = 0.33$). The screenshot might be replaced by storing the value for $\alpha$ in the metadata tables, in case the movie and metadata are stored in a multimedia retrieval system. The value of $\alpha$ per record could be flexible, giving additional selection freedom to the lecturer.

Figure 5.4 and Table 5.5 give an impression of the data collected in the Collegerama data system.
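To make the two tables and the time-tagged search more concrete, the following sketch rebuilds them in an in-memory SQLite database. This is an illustration only: the data types of Table 5.3 and Table 5.4 are mapped onto SQLite types, the primary and foreign keys are assumptions, and the function names `create_database`, `search` and `screenshot_time` are hypothetical. The search returns the text type and start time with every hit, so that the context of a result is preserved as discussed in paragraph 5.1.

```python
import sqlite3

# Assumption: nvarchar and datetime columns are stored as TEXT in SQLite.
SCHEMA = """
CREATE TABLE lectures (
    lecture_id     INTEGER PRIMARY KEY,
    lecture_nr     INTEGER,
    title          TEXT,
    lecturer       TEXT,
    air_date       TEXT,     -- datetime as ISO 8601 text
    collegerama_id TEXT
);
CREATE TABLE content (
    content_id INTEGER PRIMARY KEY,
    lecture_id INTEGER REFERENCES lectures(lecture_id),
    start_time INTEGER,      -- milliseconds from the start of the recording
    end_time   INTEGER,      -- milliseconds from the start of the recording
    text_type  INTEGER,      -- ID of the text type, e.g. 3 = slide title
    text       TEXT
);
"""


def create_database() -> sqlite3.Connection:
    """Create an empty in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def search(conn: sqlite3.Connection, keyword: str) -> list[tuple]:
    """Return (lecture title, text type, start time in ms, text) for every
    content record whose text contains the keyword."""
    query = """
        SELECT l.title, c.text_type, c.start_time, c.text
        FROM content AS c
        JOIN lectures AS l ON l.lecture_id = c.lecture_id
        WHERE c.text LIKE '%' || ? || '%'
        ORDER BY l.lecture_nr, c.start_time, c.text_type
    """
    return conn.execute(query, (keyword,)).fetchall()


def screenshot_time(start_ms: int, end_ms: int, alpha: float = 0.33) -> int:
    """Time of the characteristic screenshot within a record's interval."""
    return round(start_ms + alpha * (end_ms - start_ms))
```

A search for “water” would return every slide title, slide note and transcript record that contains the word, each with the time at which playback could start. Storing only `alpha` per record, instead of the picture itself, would follow the suggestion made above for a multimedia retrieval system.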
Figure 5.4: Source and number of records in the Collegerama data system for the course CT3011 (assuming subtitles for all lectures)

Table 5.5: List of Text_types and the number of records and words in the database for course CT3011
ID | Text type | Nr of records | Nr of words | Nr of characters
1 | Lecture title | 28 | 129 | 917
2 | Lecture chapter | 116 | 300 | 2,526
3 | Slide title | 1,183 | 3,900 | 28,741
4 | Slide content | 1,042 | 15,943 | 129,195
5 | Slide notes | 280 | 22,512 | 142,856
6 | Transcript (lecture) | 28 | * 179,480 | * 768,058
7 | Transcript (slide) | 1,183 | * 179,480 | * 768,058
8 | Transcript (sentence) | 21,812 | * 179,480 | * 768,058
9 | Transcript (word) | 118,926 | 188,926 | 808,482
* 95% of the total number of words generated by SHoUT, based on the comparison between the human-made subtitles and the SHoUT subtitles

5.4 Course and lecture navigation

Tables of content

The Collegerama data system can be used as a generator for a table of contents (TOC) for:
• an overview of recorded lectures in a course
• an overview of content in a recorded lecture
For this research project, two prototypes of TOCs have been evaluated:
• a static TOC (list)
• an interactive TOC (based on Macromedia Flash technology)

Static table of contents

Figure 5.5 gives an impression of a static TOC generated from the Collegerama data system.

Figure 5.5: Table of contents for recorded lectures in course CT3011, generated from the Collegerama data system

The generated TOC lists all lecture titles, the lecturer and the duration. The TOC also contains a hyperlink to the related Collegerama recording.
This generated TOC is an improvement over the TOC that lecturers create themselves (which is in turn an improvement over the Collegerama catalog), for the following reasons:
• a uniform layout for the whole university
• the possibility of automatic updating after modification of the content within the Collegerama data system

Interactive table of contents

Figure 5.6 gives an impression of an interactive TOC generated from the Collegerama data system. In this example a Flash movie is generated containing:
• time slider for all chapters, including chapter titles
• time slider for all slides, including slide titles
• screenshots of HD movie ($\alpha = 0.1$)

Figure 5.6: Interactive TOC for recorded lecture #15 in course CT3011, generated from the Collegerama data system

The generated TOC shows the screenshot of the HD movie whenever the user’s mouse hovers over the related section of the time slider. This is synchronized with the related chapter. The chapter slider does the opposite, showing the first slide in the chapter. This TOC gives a proper view of the content of the lecture and is a great improvement over the Collegerama thumbnail navigation.

A similar interactive TOC can be generated for each course. This interactive TOC might show additional metadata such as:
• lecture name
• date and time of recording (air date/time)
• short description of the lecture
• tag cloud of the lecture
The Flash technology allows for relatively large amounts of text information. Flash movies contain vector-based text, which stays sharp at all magnifications (for example at full screen display).

Tag clouds

A tag cloud is a selection of tags or a list of relevant words from a document, in which the size of each tag is based on its frequency of occurrence. The Collegerama data system can be used as a generator of tag clouds for a certain lecture. These are considered to be a useful representation of the content of a lecture. Annex F evaluates different forms of tag clouds.
A basic relationship between frequency and word size is:

$$S = \frac{C - C_{min}}{C_{max} - C_{min}} \cdot R + B$$

In which:
$S$ = font size of word (pixels)
$C$ = frequency count for the word (or tag)
$C_{min}$ = frequency count for the least popular word (or tag)
$C_{max}$ = frequency count for the most popular word (or tag)
$R$ = largest font size minus smallest font size for words (pixels)
$B$ = smallest font size for words (pixels)

In practice, more sophisticated relations are also applied, such as logarithmic or other non-linear relations, as well as all kinds of clustering algorithms. Tag clouds have been studied on various other aspects, such as order of words, layout of words, color usage etc.[22] Tag clouds are often produced using specialized websites, such as MakeCloud, Wordle or ToCloud.[54][55][56] The tag clouds for this research project have been produced via the website Wordle. Figure 5.7 gives an impression of such a web-generated tag cloud.

Figure 5.7: Tag cloud for recorded lecture #15 in course CT3011, generated by Wordle, with and without deleted words by prof J.C. van Dijk [54]

Evaluation of tag clouds
In this research project, tag clouds have been produced from different data sources or text types (subtitles, ASR output, slide titles, slide content). These tag clouds were evaluated in order to determine rules for creating tag clouds that best represent the content of a lecture. All tag clouds have been produced in black and white with the same font face, so that font size is the only distinctive element. In most cases the words for the tag clouds have initially been “cleaned” by removing “common Dutch words” (common according to wordle.net) or by selecting only the nouns. These tag clouds and the assessment experiments are reported in more detail in Annex E and Annex F.
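The linear relation between frequency and font size given above can be sketched in a few lines of Python. This is only an illustration and not the method used by Wordle: the function name `font_sizes`, the 12 to 48 pixel range, the handling of equal counts and the three example counts (subtitle occurrences of “water”, “grondwater” and “stoffen” in lecture #15) are assumptions made for the example.

```python
def font_sizes(counts, smallest=12, largest=48):
    """Map word counts to font sizes in pixels with the linear relation
    S = (C - C_min) / (C_max - C_min) * R + B,
    where R = largest - smallest and B = smallest (assumed defaults)."""
    c_min = min(counts.values())
    c_max = max(counts.values())
    size_range = largest - smallest
    if c_max == c_min:
        # Assumption: if all words are equally frequent, use the smallest size.
        return {word: float(smallest) for word in counts}
    return {
        word: (count - c_min) / (c_max - c_min) * size_range + smallest
        for word, count in counts.items()
    }


# Example counts (subtitle occurrences in lecture #15).
counts = {"water": 39, "grondwater": 21, "stoffen": 9}
sizes = font_sizes(counts)
```

With these inputs the most frequent word receives the largest size and the least frequent word the smallest; a logarithmic variant would only change the expression inside the dictionary comprehension.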
This assessment was done in 2 steps:
• quality assessment of 10 tag clouds with 15 to 100 words from different sources
• quality assessment of 10 uniform tag clouds with 15 words from the same sources
The second step was based on the remarks by the lecturer on the first step:
• tag clouds with 100 words are always unacceptable since these are unreadable
• tag clouds with 25 to 35 words contain too many irrelevant words
In the second step, the lecturer was asked to rank the 10 tag clouds sequentially (actually 9, as #6 was identical to #7) and to mark irrelevant words in each tag cloud for deletion, in order to obtain a better representation of the lecture. Table 5.6 shows the results of this second assessment. It contains two rankings: the first is the ranking as given by the lecturer, the second is this ranking combined with a ranking based on the number of deleted irrelevant words.

Table 5.6: Tag cloud assessment of modified tag clouds (all 15 words)

| ID | Source | Cleaning method * | General appearance (lecturer) | Rank | Nr of deleted words (rank) | Total rank |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | Slide titles | 1 | Many same sized (small) words | 5 | 4 (=1) | 6 |
| 2 | Slide content | 1 | Too many same sized (small) words | 7 | 7 (=6) | 13 |
| 3 | Slide titles and slide content | 1 | Too many same sized (small) words | 8 | 5 (=2) | 10 |
| 4 | Slide notes | | | 6 | 9 (=7) | 13 |
| 5 | Human subtitles | | | 9 | 15 (=9) | 18 |
| 6/7 | Human subtitles | | | 4 | 11 (=8) | 12 |
| 8 | Human subtitles | 2 | | 1 | 5 (=2) | 3 |
| 9 | Human subtitles | 2 | | 3 | 5 (=2) | 5 |
| 10 | SHoUT output | | Word “chloor” is missing | 2 | 6 (=5) | 7 |

* 1 = after removing common Dutch words; 2 = nouns only (empty cells: value not recoverable from the source layout)

Table 5.6 shows that the two tag clouds from nouns in the subtitles (#8 and #9) have the best overall ranking. These two tag clouds contain the same words, but differ in font and in the layout of the words. The more readable font (Coolvetica) was preferred by the lecturer over a less readable font (Vigo). The lowest number of deleted words was obtained from the slide titles.
However, the produced tag cloud contains a low variance in font size, so there is little distinction in occurrence. The variance in word count in subtitles is much larger, giving a more pronounced picture. The tag cloud from the SHoUT output has a lower ranking than the human subtitles, because it misses an important word (“chloor”) and has more deleted words. The other produced tag clouds were significantly less appreciated.

The following conclusions have been drawn from these results:
• tag clouds should contain fewer than 15 words
• tag clouds should be obtained from “nouns only”
• tag clouds from subtitles (or speech recognition) are preferred over tag clouds from slide titles (or slide content / slide notes), because of their larger variance in font size
• tag clouds need a “best readable font”
• tag clouds could be improved by removing bad words chosen by the lecturer
The use of colored tag clouds was not evaluated, since this might be largely dependent on the personal preference of a lecturer.

5.5 Collegerama lecture search
The collected data is the source for the Collegerama lecture search engine. Figure 5.8 gives an impression of this.

Figure 5.8: Collegerama search engine produced for the course CT3011

The produced search engine allows for selecting each individual data source. The user can search for a certain word or word combination in the selected sources. In addition, the user can search over all lectures or within a particular lecture.
It is also possible to look through all the available content by leaving the keyword empty, which results in:
• a table of contents (TOC) of the course (by selecting only the lecture titles)
• a table of contents (TOC) of the lecture (by selecting only the slide titles in a particular course)
The output of the Collegerama Lecture Search, shown in Figure 5.8, presents the following context-preserving data:
• data source (subtitles, slide titles, etc.)
• lecture number (ID)
• lecture title
• lecturer
• time interval (begin, end)
• queried keyword, with 30 preceding and 30 subsequent letters

Evaluation of search engine
The performance of search engines on recorded lectures is studied in the research discipline of Spoken Document Retrieval (SDR).[20] SDR involves the retrieval of excerpts from recordings of speech using a combination of automatic speech recognition and information retrieval techniques. Movies and videos form a subdomain of Spoken Documents. Special workshops on the evaluation of information retrieval systems for movies and videos have been organized under the name TRECVid (TREC Video Retrieval Evaluation).[21]

For this research project, the query results for a number of important keywords have been analyzed. The following tests have been done:
• comparing query results from ASR output versus human-made subtitles
• comparing query results from all data sources
• analyzing the video length of search results, based on different data sources in Collegerama lecture search
• “precision and recall” measurement[17]
• analyzing multiple keyword queries

Comparing query results from ASR versus human-made subtitles
The query results of the 15 most-used nouns on both data sources are presented in Table 5.7. The data has been extracted from lecture #15.
In determining the query results of the word “water”, compounds such as “drinkwater”, “drinkwatervoorziening”, “grondwater” and “oppervlaktewater” have not been included (likewise, occurrences of “drinkwater” inside “drinkwatervoorziening” have not been counted for “drinkwater”). This table also shows the 5 deleted words that were marked by the lecturer as less relevant in the assessment of tag clouds (see chapter 5.4), leaving the ten most important words (marked by “ok” in the table) as selected by the lecturer.

Table 5.7: Occurrences of the 15 most used nouns from ASR versus human-made subtitles

| Keyword | Lecturer check | ASR (occurrences) | Human-made subtitles (ref) (occurrences) | WA for single word (%) |
| --- | --- | --- | --- | --- |
| chloor | ok | 0 | 10 | 0% |
| drinkwatervoorziening | ok | 4 | 16 | 25% |
| boek | ok | 5 | 15 | 33% |
| oppervlaktewater | ok | 7 | 15 | 47% |
| plaatje | deleted | 6 | 11 | 55% |
| vragen | ok | 6 | 10 | 60% |
| soort | deleted | 7 | 9 | 78% |
| water | ok | 33 | 39 | 85% |
| stoffen | ok | 8 | 9 | 89% |
| grondwater | ok | 20 | 21 | 95% |
| Nederland | ok | 35 | 36 | 97% |
| dingen | deleted | 17 | 16 | 106% |
| keer | deleted | 16 | 13 | 123% |
| drinkwater | ok | 16 | 13 | 123% |
| jaar | deleted | 28 | 16 | 175% |
| Total | | 208 | 249 | 84% |
| Total ok words | | 134 | 184 | 73% |

The most remarkable result in the occurrences is the word “chloor”, which has been indicated by the lecturer as one of the ten most important words. SHoUT has not recognized this word, as it is an uncommon word in the Dutch language. This word or item is therefore not retrieved from the lecture if no correct subtitles are available. The word accuracy (WA) for “jaar”, “keer” and “drinkwater” shows that, for searching compound words in the ASR output, it is better to search for word components instead of full words. This is illustrated by the low WA of the word “drinkwatervoorziening”. A WA above 50% is expected from the ASR output, as this is the accepted quality level for ASR engines. The word “boek” has a lower WA in the ASR output, which shows that this word is difficult for SHoUT to decode. This word has also been indicated by the lecturer as one of the ten most important words.
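The per-word WA values in Table 5.7 are consistent with dividing the ASR occurrences by the occurrences in the human-made subtitles. The following Python sketch reproduces a few of them under that reading; the function name `word_accuracy` and the dictionary layout are assumptions made for the example, and the definition itself is inferred from the table values rather than stated in the text.

```python
def word_accuracy(asr_count, reference_count):
    """ASR occurrences as a percentage of reference (subtitle) occurrences."""
    if reference_count <= 0:
        raise ValueError("reference count must be positive")
    return 100.0 * asr_count / reference_count


# (ASR occurrences, human-made subtitle occurrences) taken from Table 5.7.
table_5_7 = {
    "chloor": (0, 10),
    "drinkwatervoorziening": (4, 16),
    "drinkwater": (16, 13),
    "jaar": (28, 16),
}
wa = {word: round(word_accuracy(a, ref)) for word, (a, ref) in table_5_7.items()}

# Totals over all 15 nouns and over the 10 "ok" words.
wa_total = round(word_accuracy(208, 249))
wa_ok = round(word_accuracy(134, 184))
```

Values above 100% occur when the ASR output contains a word more often than the reference, for example when a compound is decoded as separate components.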
Comparing query results from different data sources
In order to evaluate the different data sources, the search results of the ten most important keywords of lecture #15 (as determined by the lecturer) have been compared. The results are shown in Table 5.8.

Table 5.8: Occurrences of the 10 most important keywords (as determined by lecturer) from different data sources

| Keyword (lecturer) | Subtitles | ASR | Slide titles | Slide cont. | Slide t+c | Slide notes | Lecture title | Lecture chapter |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| water | 39 | 33 | 5 | 2 | 7 | 0 | 0 | 0 |
| Nederland | 36 | 35 | 0 | 0 | 0 | 3 | 1 | 0 |
| grondwater | 21 | 20 | 0 | 1 | 1 | 1 | 0 | 0 |
| drinkwatervoorziening | 16 | 4 | 0 | 0 | 0 | 2 | 1 | 0 |
| boek | 15 | 5 | 0 | 1 | 1 | 0 | 0 | 0 |
| oppervlaktewater | 15 | 7 | 0 | 1 | 1 | 2 | 0 | 0 |
| drinkwater | 13 | 16 | 1 | 0 | 1 | 2 | 0 | 0 |
| chloor | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| vragen | 10 | 6 | 0 | 0 | 0 | 0 | 0 | 0 |
| stoffen | 9 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total occurrences | 184 | 134 | 6 | 5 | 11 | 10 | 2 | 0 |
| Nr retrieved keywords | 10 | 9 | 2 | 4 | 5 | 5 | 2 | 0 |
| % retrieved keywords | 100% | 90% | 20% | 40% | 50% | 50% | 20% | 0% |

The results of Table 5.8 show that for searching in lectures, the lecture chapter titles are of no importance, since none of the important keywords are retrieved. The slide titles and the lecture titles only retrieve 20% of the keywords. These three text types are particularly suitable for navigation, but clearly not for searching. To a lesser extent, the same holds true for slide content and slide notes, which retrieve 40% to 50% of the keywords.

The overall ASR word correctness of this lecture is 46%, as shown in Annex E. The word correctness over the keywords is 73% (= 134 / 184). When comparing the keywords themselves, 90% of them are retrieved by ASR. These results show that ASR gives a drastic increase in search results over slide data. Having human-made subtitles will further increase the search results to an assumed 100% value.

The results of Table 5.8 can partly be explained by the fact that transcripts, whether from ASR, subtitles or another source, contain around ten times more words than the slide content. This is shown in Table 5.5.
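The summary rows of Table 5.8 (total occurrences, number of retrieved keywords and percentage of retrieved keywords) follow directly from the per-keyword counts. A small Python sketch for four of the data sources is given below; the function name `retrieval_summary` and the dictionary layout are assumptions made for the example, while the counts are those of Table 5.8 in keyword order.

```python
def retrieval_summary(counts):
    """Return (total occurrences, keywords retrieved, % of keywords retrieved)."""
    retrieved = sum(1 for count in counts if count > 0)
    return sum(counts), retrieved, 100.0 * retrieved / len(counts)


# Occurrences per keyword, in the keyword order of Table 5.8
# (water, Nederland, grondwater, ..., vragen, stoffen).
occurrences = {
    "Subtitles": [39, 36, 21, 16, 15, 15, 13, 10, 10, 9],
    "ASR": [33, 35, 20, 4, 5, 7, 16, 0, 6, 8],
    "Slide titles": [5, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "Slide notes": [0, 3, 1, 2, 0, 2, 2, 0, 0, 0],
}
summary = {source: retrieval_summary(c) for source, c in occurrences.items()}
```

A keyword counts as retrieved as soon as it occurs at least once in a source, which is why ASR reaches 90% of the keywords although it finds only 134 of the 184 occurrences.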
Video length per data source
The query results indicate how many of the items are found in a search, but not how long the accompanying video fragment is for each item. Searching for an item in (non time-tagged) transcripts may indicate the lecture in which the item is used, but the user has to watch or listen to the whole lecture to actually come across the correct video segment. Assuming a constant speaking rate might give a best guess for jumping to the equivalent time frame, but in most cases this is not suitable for the user. The time correctness of a search is related to the video length or duration (end time minus start time) of the related video fragment. The video length per data source in Collegerama lecture search is shown in Table 5.9.

Table 5.9: Video length per data source in Collegerama lecture search for course CT3011

| Data source (text type) | Description | Minimum (sec) | Maximum (sec) | Mean (sec) |
| --- | --- | --- | --- | --- |
| Lecture title; Transcript (lecture) | Lecture recording | 1,351 | 3,231 | 2,451 |
| Lecture chapter | Chapters by lecturer | 15 | 2,197 | 592 |
| Slide title; Slide content; Slide notes; Transcript (slide) | Slide data | 2 | 611 | 55 |
| Transcript (sentence) | Subtitles | 0.6 | 6.0 | 3.4 |
| Transcript (word) | ASR output | 0.0 | 3.4 | 0.3 |

Table 5.9 shows that the video length for slides may vary between 2 seconds and 7:28 minutes, with a mean value of 58 seconds. This means that on average the user has to wait for nearly 1 minute to encounter the searched item. This video length might be acceptable for recorded lectures, as most spoken text has relevant surrounding text. In general, all spoken text belongs to that particular slide, as the lecturer explains the slide content.

More detailed searching for a specific sentence can be achieved by searching in subtitles or time-tagged words (such as the ASR output of SHoUT). With time-tagged words, it is possible to show karaoke-type subtitling, with sentences and coloring of the spoken word.
An example of this can be seen at the website for Radio Oranje, in which old transcripts have been time-tagged by ASR (SHoUT).[51]

Precision and recall measurement
The effectiveness of an information retrieval system is often measured by the combination of “precision” and “recall”.[17] Precision is the fraction of retrieved objects that is relevant. Recall is the fraction of relevant objects that is retrieved. These values can be defined in the following formulas:

$$\mathit{Precision} = \frac{r}{n} \qquad \mathit{Recall} = \frac{r}{R}$$

In which:
$r$ = number of relevant documents retrieved
$n$ = number of documents retrieved
$R$ = total number of relevant documents

The measurements require a set of objects or documents for which the number of relevant objects is known. For searching in recorded lectures, slides can be considered as documents, since they give an overview of the subject matter. For the Collegerama lecture search engine, these tests can be executed on the data of lecture #15. The slides of the lecture can be used as the objects for these tests. A slide is regarded as a complete subset of a lecture in which the related subject matter is explained. A detailed description of this test is given in Annex F, while the results are shown in Table 5.10.

The test was done on 3 of the 10 “important words” of lecture #15: “stoffen”, “grondwater” and “chloor”. The words “stoffen” and “grondwater” were selected because of their high ASR accuracy and their low occurrence in all lectures. It is assumed that these 2 keywords will also give a high ASR accuracy for the other lectures, despite the fact that most of these lectures were given by other lecturers. The low occurrence will result in more pronounced results. The word “chloor” was selected because it is missing from the ASR output.
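The two definitions above can be made concrete with a short Python sketch that treats slides as documents. The slide numbers below are hypothetical; they are only chosen so that the counts match those reported for the keyword “grondwater” in Table 5.10 (R = 5 related slides, n = 6 slides retrieved by ASR, r = 5 related slides retrieved). The function name `precision_recall` and the use of `None` for an undefined value are likewise assumptions of this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids.

    A value is None when it is undefined (nothing retrieved, or nothing relevant).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)                            # r
    precision = hits / len(retrieved) if retrieved else None    # r / n
    recall = hits / len(relevant) if relevant else None         # r / R
    return precision, recall


# Hypothetical slide numbers reproducing the "grondwater" counts of Table 5.10.
relevant_slides = {3, 7, 12, 18, 21}        # R = 5
asr_slides = {3, 7, 12, 18, 21, 25}         # n = 6, of which r = 5 are relevant
precision, recall = precision_recall(asr_slides, relevant_slides)
```

When a data source retrieves no slides at all, the precision is undefined rather than zero, which corresponds to the “-” entries in Table 5.10.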
Table 5.10: Precision and recall measurement for different data sources on 3 important words of lecture #15

| Item | Data source | stoffen | grondwater | chloor |
| --- | --- | --- | --- | --- |
| Occurrences | subtitles | 9 | 21 | 9 |
| Number of related slides ($R$) | subtitles | 4 | 5 | 2 |
| Number of slides retrieved ($n$) | ASR | 4 | 6 | 0 |
| | slide titles | 0 | 0 | 0 |
| | slide content | 0 | 1 | 0 |
| | slide notes | 0 | 1 | 0 |
| Number of related slides retrieved ($r$) | ASR | 4 | 5 | 0 |
| | slide titles | 0 | 0 | 0 |
| | slide content | 0 | 0 | 0 |
| | slide notes | 0 | 0 | 0 |
| Recall ($r / R$) | ASR | 100% | 100% | 0% |
| | slide titles | 0% | 0% | 0% |
| | slide content | 0% | 0% | 0% |
| | slide notes | 0% | 0% | 0% |
| Precision ($r / n$) | ASR | 100% | 83% | - |
| | slide titles | - | - | - |
| | slide content | - | 0% | - |
| | slide notes | - | 0% | - |

Table 5.10 shows that for the keyword “stoffen”, the slide recall and slide precision for ASR are both 100%, despite the fact that the retrieval rate for this keyword was only 89% (8 out of 9, according to Table 5.8). Although ASR has missed 1 occurrence, this was not relevant on the object (slide, document) level. Slide data does not give any recall for this keyword and consequently no precision.

For the keyword “grondwater”, the slide recall is also 100%, while the retrieval rate at word level was 95% (20 out of 21, according to Table 5.8). The precision is 83%, since 1 additional slide is retrieved by ASR. Again, the slide data is of no importance for recall and precision.

The keyword “chloor” was missed by ASR and is not shown on the slides. Consequently the recall is 0%.

The above-mentioned recall and precision measurements show higher values on slide level than were obtained by the previous known-item (keyword) search.

Multiple-keyword search
Multiple-keyword searching on individual subtitles will not give a positive result, as these keywords are never used in one particular sentence and therefore won’t be retrieved as one record in the database. The same holds true for searching on individual words from ASR. A solution to this problem is offered by storing all spoken text belonging to a slide, called a slide transcript. The time-code contains a start and end time for the slide.
The same is done for an entire lecture. This allows for searching on combined keywords. The student can use the slide or lecture timeframe as the starting point for further viewing. Searching within spoken text per slide is included in the database but not implemented in the prototype of the web interface. Evaluation of this feature has been done directly on the database.

This approach results in the same data being stored in multiple records. Transcripts per lecture could instead be searched by a search engine using the transcript per word (ASR output). The approach used gives additional flexibility in the layout of transcripts, which enables more sophisticated output options. A lecture transcript can be printed in a more convenient way if additional line breaks are included. This option is not available if lecture transcripts are automatically derived from word transcripts.

If a multiple-keyword search is done on the ASR data for the words “stoffen” and “grondwater” in lecture #15, 8 results are returned. When clustering this result set by slide, there are only 2 slides out of a total of 29 slides that contain both keywords. The slide timeframe 24:07-25:09 gives 1 paired result and the slide timeframe 29:47-33:35 gives 4 paired results. The total viewing time for the combined results is reduced from the lecture duration of 45:09 minutes to only 4:50 minutes.

Table 5.11: Occurrence of combinations of 2 important words in all lectures

| Lecture # | stoffen | grondwater | stoffen + grondwater in lecture | stoffen + grondwater in slide |
| --- | --- | --- | --- | --- |
| 15 | 8 | 20 | 8 | 1+4 |
| 17 | 11 | 12 | 11 | 1+1+1 |
| 19 | 11 | 26 | 11 | 1+1+2+1+2 |
| 20 | 3 | 2 | 2 | 0 |
| 21 | 4 | 75 | 4 | 1+1+1 |
| 23 | 15 | 21 | 15 | 1+2+1 |
| 25 | 1 | 10 | 1 | 1 |
| Total occurrence | 54 | 223 | 52 | 23 |
| Number of lectures | 8 | 18 | 7 | 6 |
| Number of slides | - | - | 270 | 17 |
| Total duration | | | 5:22:05 | 20:57 |

These 2 words can be searched in all lectures. Both keywords are present in 7 lectures.
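The clustering of time-tagged words per slide, as used for the multiple-keyword search in lecture #15, can be sketched as follows. This is an illustration only and not the query that was run on the database: the function names, the word timestamps and the middle slide interval are hypothetical, whereas the two slide timeframes 24:07-25:09 and 29:47-33:35 are the ones reported above.

```python
from collections import defaultdict


def to_seconds(mmss):
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def slides_with_all_keywords(slides, words, keywords):
    """slides: list of (start_sec, end_sec); words: list of (time_sec, word).

    Returns the indices of the slides whose spoken text contains every keyword.
    """
    wanted = set(keywords)
    found = defaultdict(set)
    for time_sec, word in words:
        if word not in wanted:
            continue
        for index, (start, end) in enumerate(slides):
            if start <= time_sec < end:
                found[index].add(word)
                break
    return [index for index in sorted(found) if found[index] == wanted]


slides = [
    (to_seconds("24:07"), to_seconds("25:09")),
    (to_seconds("25:09"), to_seconds("29:47")),   # hypothetical slide in between
    (to_seconds("29:47"), to_seconds("33:35")),
]
# Hypothetical time-tagged ASR words (seconds from the start of the lecture).
words = [(1450, "stoffen"), (1460, "grondwater"), (1600, "grondwater"),
         (1800, "stoffen"), (1900, "grondwater")]

hits = slides_with_all_keywords(slides, words, ["stoffen", "grondwater"])
viewing_time = sum(slides[i][1] - slides[i][0] for i in hits)   # seconds
```

With the two reported timeframes the viewing time adds up to 290 seconds, that is 4:50 minutes, in line with the reduction described for lecture #15.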
Without a multiple-keyword search per slide, this will require a total viewing time of 5:22:05 hours in order to see all results. If a search is done on slide level, only 6 lectures will be retrieved, with a total of 17 slides in which the combination of keywords is found. This reduces the viewing time to only 20:57 minutes. Searching on slide level thus reduces the total viewing time to 6.5% of the original, a reduction of 93.5%.

Ranked search results (implementation)
The search results have to be ordered according to a certain norm. In this research project, two options have been evaluated:
• time-based
• rank-based

Time-based
In this ordering method, all the results are sorted in chronological order. This makes sense for recorded lectures, assuming that key items are explained sequentially in the lectures. Later in the course, the key items are explained in further detail. In SQL Server, this can be accomplished by ordering the query results on Lecture_nr and Start_time. The query that can be used for this is shown below:

    SELECT *
    FROM Content
    INNER JOIN Lectures ON Content.Lecture_id = Lectures.Lecture_id
    WHERE CONTAINS(Text, 'stoffen') AND Content.Lecture_id = '15'
    ORDER BY Lectures.Lecture_nr, Start_time, Content.Text_type

Rank-based
SQL Server has a function that ranks search results based on several factors:
• text length
• number of occurrences of search words/phrases
• proximity of search words/phrases in proximity search
• user-defined weights
The query that can be used for this is shown below:

    SELECT *
    FROM Content AS FT_TBL
    INNER JOIN CONTAINSTABLE(Content, Text, 'stoffen') AS KEY_TBL ON FT_TBL.Content_id = KEY_TBL.[KEY]
    WHERE FT_TBL.Lecture_id = '15'
    ORDER BY KEY_TBL.RANK DESC;

Evaluation
With the rank-based approach, the ASR results (text_type = 7) are ranked higher than the other types, because each record contains only one single word. According to the relevance ranking system Okapi BM25[58], these will be evaluated as being of high relevance.
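The effect of record length on the ranking can be illustrated with a common formulation of the BM25 term score. The Python sketch below is not SQL Server's internal ranking function, and whether SQL Server applies this exact variant to the query above has not been verified here; the parameter values (k1 = 1.2, b = 0.75), the collection statistics and the record lengths are assumptions chosen for the example.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """BM25 score of one query term for one document (a common variant
    with a non-negative inverse document frequency)."""
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    length_norm = 1.0 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Hypothetical collection: 1000 records, the term occurs in 10 of them,
# and the average record length is 20 words.
stats = dict(avg_doc_len=20, n_docs=1000, doc_freq=10)

single_word_record = bm25_term_score(tf=1, doc_len=1, **stats)     # one ASR word
slide_transcript = bm25_term_score(tf=1, doc_len=150, **stats)     # long record
```

Under these assumptions a one-word record that matches the query term receives a clearly higher score than a long slide transcript containing the term once, which is the kind of length effect described above.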
Similar results can be expected for subtitles and slide titles in comparison with slide notes and slide transcripts. These effects might be corrected by using user-defined weights for different text types. However, this has not been tested in this research project. The current search engine uses time-based ordering.

5.6 Conclusions
Table of contents
A table of contents (TOC) of a recorded lecture is an important element in the navigation of recorded lectures. Such a table of contents should be drafted by the lecturer or the assisting staff. It is useful to present such a TOC in an interactive way, based on screenshots from the HD movie, the timeline and the available text data. This could be generated automatically into a Flash movie by accessing the database content and the related HD lecture movie.

Tag clouds
The accessibility of a recorded lecture could be further increased by creating tag clouds per lecture (limited to 15 words). The following conclusions have been drawn from these results:
• tag clouds should contain fewer than 15 words (nouns only)
• the best source of information for tag clouds is human-made subtitles
• tag clouds from subtitles (or speech recognition) are preferred over tag clouds from slide titles (or slide content / slide notes), because of their larger variance in font size
• tag clouds need a “best readable font”
• tag clouds could be improved by removing bad words chosen by the lecturer (in our examples, 25%-40% of the words were removed)

Search engine
A search engine on the course database is a useful element for increasing the accessibility of the course and its lectures. It forms an additional component alongside navigation tools such as the table of contents and tag clouds.
The following conclusions can be made:
• the best source of information for searching is human-made subtitles, followed by ASR output
• chapter titles and slide content have a low importance for searching
• chapter titles and slide titles are only relevant for the generation of a table of contents
• by clustering subtitles or ASR per slide, multiple-keyword searching is largely improved because of shorter viewing times in the search results (in our example course, the viewing time was reduced from about 5.4 hours to 21 minutes)
For the proper operation of a search engine, the output of a speech recognition system with sufficient word correctness is required. Better retrieval rates can be obtained with full subtitles. In view of the other beneficial elements of subtitles (for machine translation, and for better following of the lectures), these subtitles are considered an essential part of all recorded lectures.

Future extensions
A text-based database per course can also serve as a basic container for a course discussion board, using time-stamped remarks (“questions and answers”, discussions). A further extension to the database and search engine could be the addition of other course material, such as readings (books, lecture notes), activities (assignments, tests, lab tests) and practice exams.

6. Proposed improvements
From the information and knowledge derived in this research project, as described in the previous chapters, it can be concluded that the usability of recorded lectures can be expanded. However, to increase the usability, it will be necessary to improve and extend the existing lecture recording and storage system.
These improvements and extensions can be divided into these four categories:
• improved lecture accessibility
• improved navigation and searching
• addition of online discussion
• re-using recorded lectures to increase the course frequency
In paragraphs 6.1 through 6.4, each of these elements will be discussed and the accompanying recommendations for improvement are given. These improvements are a combination of conclusions from this research project, as well as suggestions and recommendations for future developments. Paragraph 6.5 gives the outline of a pilot project for further development of these proposed improvements. This project can be regarded as a practical approach for implementing the conclusions and recommendations of the previous paragraphs.

6.1 Lecture accessibility
Improving the accessibility focuses on giving more students access to recorded lectures, independent of their location, computer device or operating system. The ultimate goal is to offer all lectures in several different video formats, as well as a small-sized version that is designed specifically for mobile devices like the iPhone or Windows Smartphones. All lectures need to have subtitles in the spoken language, as well as translated subtitles for the most common foreign languages such as English, Spanish, French, German and Chinese. This will support the student exchange programs that are available at most universities in the Netherlands.

We can divide the improvement of accessibility into these three general categories:
• vodcast distribution
• subtitling
• translation

Vodcast distribution
Since TU Delft would like to offer lectures to any student, regardless of his or her location, several vodcast versions need to be produced. At the moment the only way to watch recorded lectures is by having access to a broadband Internet connection that has enough bandwidth to support the online streaming of videos.
This makes it impossible to watch lectures while on a train or bus, where a fixed high-speed Internet connection isn’t available (mobile GPRS and EDGE data networks do not suffice).

A prototype for the integration of streaming and downloadable recorded lectures within the Blackboard environment is shown in Figure 6.1. This figure shows the different downloadable video formats in which this sample lecture is available, as well as the related course items.

Figure 6.1: Online viewing (YouTube) and available downloads and links for a recorded lecture (Source: http://blackboard.tudelft.nl CT3011-OpenCourseWare – Lecture new – demo-version)

Subtitling and translation
Subtitling has proved to be a substantial improvement to the online viewing experience of lectures. It is therefore recommended to display the subtitles of the spoken language for all lectures in Collegerama. Furthermore, Dutch lectures (in the BSc phase) should be subtitled in proper English whenever the course is regarded as a useful resource for English-speaking MSc students. For this goal, an automated translation as offered by Google Translate might be of insufficient quality. Additionally, courses given in English could be subtitled in Dutch as a service to people who have trouble understanding English.

Subtitles that are available in one or two languages enable automated subtitling in other languages. Such an automated subtitling system could be convenient for non-native English-speaking students. This service reduces the need for using a dictionary in order to understand the lecture, which is common for Chinese students in their first MSc year.

Subtitles in the original spoken language can be created with the help of an ASR system such as SHoUT. The word-error rate of SHoUT is rather high (30%-70%); however, these systems do provide an accurate timing of the spoken words.
Human post-processing should improve the generated text and should divide the text into sentences, as is needed for proper subtitling. Figure 6.1 gives an impression of subtitles in the spoken language of a lecture. Subtitling and translated subtitles are further described in chapter 4.

6.2 Navigation and searching
Recorded lectures have an average duration of 30 minutes for a short lecture and 100 minutes for a double lecture session (discounting the break time). For first-time viewing this might be considered acceptable, as it resembles the live course environment. However, for reviewing lectures at a later time, better browsing, navigation and search capabilities are required. This is especially true for students who are studying for the exam and are browsing through the course material and/or doing course assignments.

Students also need a much better indication of the content of a certain lecture. The only piece of metadata available is the lecture title. Searching for specific course content is not possible. The following improvements are recommended:
• browsing the lectures and their content through a course navigator and/or table of contents
• searching the course content (online search engine)
• indication of the course content by presenting a tag cloud for each lecture
The contours of a search engine and the creation of tag clouds have been described in chapter 5. A course and slide navigator could be produced from the content of the search engine. Such navigators function as an interactive table of contents. Figure 6.2 shows the improved navigation and searching within the Blackboard environment.
Figure 6.2: Tools created from the Collegerama database (slide navigator, tag clouds and search application) will significantly improve the accessibility of recorded lectures (Source: http://blackboard.tudelft.nl CT3011-OpenCourseWare – Lecture (new) – demo-version)

6.3 Student interaction
Live lectures given in a lecture room allow for a direct form of communication between students and lecturer. This communication is two-way. The lecturer might ask the students some questions and receive feedback in order to test his educational performance. The rest of his lecture will then be based on this response. When a recorded lecture is used, this form of communication is no longer available.

A similar kind of discussion can be achieved by employing an online message board linked to each recorded lecture. Students will be able to ask questions, discuss events and topics during the lecture and receive feedback from the lecturer. During a live lecture the frequency of these questions is very low when the student attendance is very high. Students are either too far away from the lecturer or dislike interrupting a large classroom and drawing a lot of attention to themselves. Such an online messaging system also promotes student-to-student discussion and interaction, which is not possible during a live lecture since it would hinder the other classmates. In general, an online discussion board linked to recorded lectures could greatly increase the frequency with which students ask questions.

An online discussion board will have even more value when the discussions are moderated by the lecturer or someone from the teaching staff. This moderation could include the answering of questions and the removal of irrelevant remarks.

This form of discussion can be complemented by adding the option to post time-tagged questions and comments.
This means that the student can ask a question tied to the timeframe within the lecture to which the question is relevant. With such a form of time-lined discussion, other students might look for specific remarks. These time-based discussions could be accessed by means of a search engine and/or a time slider that gives a popup whenever a discussion is related to that moment within the lecture. Figure 6.3 gives an impression of such a time-based discussion for online poker lectures.

Figure 6.3: Time-lined online discussions on recorded lectures are common practice for the online educational poker community (Source: http://www.deucescracked.com/videos/1210-Episode-Seven)

6.4 Increasing course frequency
Recorded lectures with improved accessibility and provided with online communication facilities could allow for the repeating of a course in the same academic year. These recurring courses might be of importance in the following situations:
• students following a minor program in another faculty (all scheduled in the first academic semester) might miss courses in their own faculty
• students with deadlines for BSc or MSc exams might encounter problems when preferred courses are not available in the current and/or next course period
These students can now be given the option of following and trying to pass the course through self-study, since all recorded lectures and accompanying material can now be shared. This could facilitate better study results and shorter study durations. A moderating lecturer can provide students with the required assistance by answering questions via the online communication facilities. Figure 6.4 gives a visual representation of these recurring courses.
Figure 6.4: Multiple scheduling of courses with recorded lectures and online/moderated assistance by a lecturer

Time-critical courses

If TU Delft wants to apply this program of recurring courses within the same academic year, then this multiple scheduling is especially beneficial to the following types of time-critical courses:
• last year BSc courses
• minor-program courses (inside/outside faculty in first semester)
• courses for exchange students (Erasmus Mundus exchange in 1 semester)
• courses in cooperation with other universities (non-parallel scheduling)
• intensive courses (3 full weeks instead of 10 weeks of 30%)
• courses in graduate school (for starting PhD students, multiple starting moments)

Giving students more freedom in choosing when to follow a certain course within an academic year should have a positive influence on the time it takes them to complete their education. Often, a student has to wait several months before he or she can follow a specific course that is required to finish the curriculum.

Figure 6.5 shows a visual representation of the current lecture situation, along with 3 possible ways to execute such a recurring course system. The green bars represent a live lecture that is given in front of students in a classroom. The yellow bars represent a course that is given primarily online, in which no live lectures are available. The red dots constitute the moments of examination.

Figure 6.5: Examples of multiple scheduled courses (panels: Present situation; Extra recorded course before second exam; Extra recorded course in other semester; Full year course)

These additional online courses without live lectures should be provided with an online discussion board, where students can ask questions and the lecturer can answer them. This further promotes students helping each other and starting a dialogue about the presented course material.
The lecturer also acts as a moderator for this discussion board.

Scheduling

When all lectures are pre-recorded and available, it is easy to simply give students access to all the lectures at once. That way, they can decide for themselves when to watch a lecture. Another option is to create scheduled releases of pre-recorded lectures. This means that all lectures are initially hidden and are released at set intervals (for instance, every week). Such a system simulates the experience of following a live course in which students go to the classroom every week. This form of scheduled release might give the following advantages:
• improving the weekly attendance by students (fixation in calendars of students)
• increased concurrent attendance by concentrating students into virtual classrooms
• allowing for moderation by lecturers (supporting the virtual classroom)

In the online poker teaching community, such a system is already employed: pre-recorded lectures are released on a recurring weekly basis. An impression of such a schedule, offered by a poker instruction website called Deuces Cracked, is shown in Figure 6.6.

Figure 6.6: Online poker courses are scheduled on specific days, in order to enlarge the attendance and to promote live online discussion (Source: http://www.deucescracked.com/)

6.5 Pilot project for further development

Goals

The improvements described above can best be developed in a pilot project in a real educational environment. The goals for the pilot project are summarized in Table 6.1. This table shows both the required short term improvements (1-3 years) as well as the long term goals (5-10 years).
Table 6.1: Current situation and goals for future academic courses
Current situation | 1 time a year | 1 location | 1 language
Short term improvements | 2 times a year (each semester) | between 1-3 locations (3TU) | Dutch and English (subtitled)
Long term goals | 5 times a year | 3-10 locations (associated universities) | plus 1 or 2 other local languages

The developments in Table 6.1 are based on two alternative approaches:
• classroom courses, with a live moderating lecturer
• scheduled self-study courses, with an online moderating lecturer

It is recommended that these are developed within the scope of a pilot project and run alongside the ongoing TU Delft OpenCourseWare project. A similar concurrent pilot project could also be done at the University of Twente. The project should include about 5 to 10 courses, giving enough content to apply for a YouTube-Edu account and/or an iTunes-U account. These platforms require a minimum volume of around 100 video lectures organized in 5 to 10 courses.

The pilot project should focus on expanding the scheduling of courses from once a year to at least once per semester (repeated courses with recorded lectures) and on expanding the course locations from only Delft or Twente to at least one other location (simultaneous distance learning, with live streaming and the playing of recorded lectures). This approach covers a classroom environment. A classroom approach is preferred for this pilot since it deviates least from the current curriculum and allows for the maximum amount of feedback from the students. In a second phase, the focus could be shifted more towards individual self-learners. In this phase it should be established whether a scheduled organization gives better results than a free agenda approach.

Development of new products

Several additional products have to be developed in order to achieve the above mentioned goals within this pilot project.
Table 6.2 gives an overview of these products, for which Figure 6.7 gives the relations.

Table 6.2: Additional products for expanded usability of recorded lectures
Item | Addition to / Replacement for | Responsible | Number per course
Videos: HD-video (YouTube) | Collegerama online view | MMS | 5 - 30
Videos: Mini-video (iTunes) * | Collegerama online view | MMS | 5 - 30
Table of contents: Course (Flash) | Collegerama catalog | Lecturer | 1
Table of contents: Lectures (Flash) | Collegerama slide navigator | Lecturer | 5 - 30
Subtitles: Course language | Lectures | MMS | 5 - 30
Subtitles: NL / EN (optional) | NL in EN / EN in NL | MMS | 5 - 30
Search: Course search | TOC course/lectures | MMS | 1
Search: Tag clouds | TOC lectures | MMS | 5 - 30
Discussion board: Course discussion board | Course | MMS / Lecturer | 1
Discussion board: Lecture discussion board | Lectures | MMS / Lecturer | 5 - 30
* if design differs from HD-videos

Figure 6.7: Recorded lectures are embedded in a Multimedia Information Retrieval System, containing multimedia content and structured course and lecture metadata

Requirements

The usability of recorded lectures can be expanded with the following requirements and/or additional provisions:
• proper recording
• HD movie creation
• post-processing by lecturer
• post-processing by data creator

Proper recording

It is concluded that a better system for recording slides needs to be developed. Looking at the future of education and the increasing developments in technology, it seems clear that presentations are going to be supported by animation and video. This means that an old screenshot recording system will no longer be sufficient to properly record PowerPoint slides. Re-using recorded lectures requires proper recording of a lecture.
For this the following guidelines can be given:
• record lectures in a natural classroom environment (“recorded for a live audience”, no “talking head” recording)
• no slides, no recording (if not, creation of slides is required during post-processing)
• use full audio recording (minimum of 1 extra microphone, for introducing speaker and/or for the lecture room audience)
• add full screen recording options to Collegerama, for animations, electronic drawing boards, movies, computer demos (minimum of 5 fps, preferably 10-25 fps)
• original Collegerama camera (small size movie) should follow the lecturer at all times, never the projected slide or PowerPoint material

HD movie creation

The creation of an HD movie from a Collegerama recording will allow for the distribution of recorded lectures via YouTube, iTunes and Blackboard. For the creation of this HD movie, the following conclusions and guidelines can be presented:
• Collegerama recordings can be used as a basis for the creation of an HD movie (minimum of 1280x720)
• an HD movie is preferred for streaming and distribution
• a uniform design of HD movies is proposed
• several LQ movies can be derived from this HD movie for distribution on alternative platforms (mobile phones, mobile media players, iPod/iPhone)
• the HD movie is prepared for subtitles (no hard-coded subtitles, always as separate subtitle text files)

Post-processing by the lecturer

Recorded lectures require post-processing with the following guidelines:
• provide lectures with proper lecture titles, speaker names etc.
• divide lectures into 2-10 chapters (5-15 minutes per chapter)
• connect the video time-frame to the original PowerPoint slides
• if necessary, improve the slides and/or add slides (explaining text)

This post-processing should be done either within the Collegerama system (by using special login access for lecturers) or in a new recorded lecture data system.
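The chapter guidelines above (2-10 chapters of 5-15 minutes each, tied to the video timeline) lend themselves to a simple automatic check at the moment a lecturer enters the chapter data. The following is a minimal sketch in Python of how such a check could look. The names `Chapter`, `check_chapters` and `chapter_at` are hypothetical and introduced only for illustration; they are not part of Collegerama or of the system built for this thesis.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chapter:
    """One lecturer-defined chapter of a recorded lecture (times in seconds)."""
    title: str
    start: float
    end: float

    @property
    def minutes(self) -> float:
        return (self.end - self.start) / 60.0


def check_chapters(chapters: List[Chapter]) -> List[str]:
    """Return warnings for violated guidelines; an empty list means all are met."""
    warnings = []
    if not 2 <= len(chapters) <= 10:
        warnings.append(f"expected 2-10 chapters, found {len(chapters)}")
    for ch in chapters:
        if not ch.title.strip():
            warnings.append(f"chapter starting at {ch.start:.0f}s has no title")
        if not 5 <= ch.minutes <= 15:
            warnings.append(f"'{ch.title}' lasts {ch.minutes:.1f} min (guideline: 5-15)")
    # Chapters should not overlap on the video timeline.
    ordered = sorted(chapters, key=lambda c: c.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.start < prev.end:
            warnings.append(f"'{prev.title}' overlaps '{nxt.title}'")
    return warnings


def chapter_at(chapters: List[Chapter], t: float) -> Optional[Chapter]:
    """Find the chapter that contains playback position t, if any."""
    for ch in chapters:
        if ch.start <= t < ch.end:
            return ch
    return None
```

A lecturer interface could call `check_chapters` before saving and show the warnings, while `chapter_at` is the kind of lookup a table of contents or slide navigator would need to highlight the current chapter during playback.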
Post-processing by Collegerama services

Recorded lectures require post-processing by a data creator with the following guidelines:
• import the slide data into a database (slide titles, slide content, slide notes)
• create tag clouds based on subtitles or slide titles for each lecture
• create subtitles for the lectures (at least in the spoken language, preferably also in Dutch or English as an additional language)
• create interactive tables of contents (both for the lectures in a course and for the chapters and slides in the individual lectures)
• create a search engine for course content and lectures
• create a discussion board for the course and the individual course lectures
• provide these elements within the Blackboard environment of the course

The post-processing effort for creating subtitles might be largely reduced when better performing ASR systems become available, including statistical post-processing of the result set produced by the word decoder of the system.

7. Conclusions

At present, Delft University of Technology records around 10% of their lectures. This number is expected to increase in the following years. Having these recorded lectures opens the door to all kinds of new ideas and improvements for their educational programs. At this moment they employ a video streaming system called Collegerama, which allows viewers with an active Internet connection to watch their lectures online. It combines a video stream of the lecturer with a series of screenshots of the accompanying PowerPoint slides.

In this thesis, a broad spectrum of possibilities for expanding the usability of recorded lectures has been examined and evaluated. The main research question for this project is: How can we efficiently and effectively present recorded lectures and course material to students at universities? This main research question has been divided into three sub-questions, which are discussed below.
How can we increase the accessibility and availability of the recorded lectures in Collegerama?

To increase the availability of the lectures, it is recommended to create a single video file from the Collegerama recordings. This will allow for distribution over many other popular online multimedia platforms, such as YouTube-Edu and iTunes-U. A single video file also allows for offline viewing without an active broadband Internet connection (for example, while sitting in the train or lying on the beach). This is not possible within the current Collegerama system. In this research project, a Collegerama lecture has been converted into a single video stream, after careful review of several layout designs and technical specifications. This lecture has been published on YouTube. Several other technical formats have been created, so that the lecture can also be distributed elsewhere. This includes a smaller sized version, created specifically for mobile devices, which has been tested on Apple’s latest iPhone.

How can we make recorded lectures easier to follow, especially for foreign speaking students?

To make lectures easier to follow, it is concluded that the creation and displaying of subtitles is useful. These subtitles can automatically be translated using machine translation. For this research project, Google Translate has been used, which currently supports translation to 52 different languages. Although the quality of these translations has not been tested on Collegerama, evaluations in EACL show that adequate edited translations were obtained with machine translation around 20% to 50% of the time. If necessary, this generated text can be enhanced by manual post-processing.

The current speech recognition technology has also been evaluated for the generation of proper subtitles, using the speech engine created by University of Twente called SHoUT.
It has an average word error rate of 50%, and it is concluded that this system is not yet sufficient to generate proper subtitles: manual post-processing to improve the output is always required.

How can we effectively and efficiently navigate and search within recorded lectures?

This research project has shown that, to properly navigate through the available recorded lectures, the input from teachers is important. They need to provide the lecture title and divide their lectures into several chapters, each with a proper chapter title and its own timeframe (start time and end time). These chapters, together with the slide titles and slide content, form the foundation for navigation and searching. The search element can be further expanded with the available subtitles. For the purpose of this research project, all lecture titles and chapters provided by the lecturer, the slide titles and content, and the generated SHoUT transcripts for all 14 lectures (28 lecture videos) have been collected. The slide metadata has been extracted automatically from the original PowerPoint files.

All this new information and metadata has been stored in a multimedia database, so that the retrieval options for the lecture content could be researched. This database will serve as the source for all the additional options for navigation and searching:
• generating a static and/or interactive table of contents for each lecture (based on the lecture chapters)
• generating tag clouds
• displaying subtitles in several different languages
• searching within lecture material

To demonstrate its functionality, a prototype for a Collegerama lecture search engine has been developed. This is an online web application that can be accessed from any location with an active Internet connection and searches within all the above mentioned data linked to a lecture. Every search result provides a link to Collegerama, so users can immediately see the related part of the lecture.
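To make the idea of searching time-stamped lecture data more concrete, the sketch below shows one simple way a multiple-keyword search over such data could work: every piece of text (slide content, subtitle or transcript fragment) is stored with the lecture it belongs to and its start time, and a query returns the fragments that contain all keywords together with a link to that moment. This is an illustration only and not the prototype described above; the `Segment` record, the function names and the example.org URL scheme are assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Segment:
    """Text shown or spoken during one part of a lecture (hypothetical record)."""
    lecture: str
    start_seconds: int
    text: str


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def search(segments: List[Segment], query: str) -> List[Segment]:
    """Return segments containing every query keyword, in lecture/time order."""
    wanted = tokenize(query)
    if not wanted:
        return []
    hits = [s for s in segments if wanted <= tokenize(s.text)]
    return sorted(hits, key=lambda s: (s.lecture, s.start_seconds))


def playback_link(segment: Segment, base_url: str = "https://example.org/lecture") -> str:
    """Build a link to the matching moment (placeholder URL, not the Collegerama format)."""
    return f"{base_url}/{segment.lecture}?t={segment.start_seconds}"
```

Because all keywords must occur within the same segment, the size of a segment determines how precise a result is: short segments point the viewer to a short part of the video, whereas a whole-lecture segment would only identify the lecture.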
The following conclusions can be made:
• the best source of information for searching is human-made subtitles, followed by ASR output
• chapter titles and slide content have a low importance for searching
• chapter titles and slide titles are only relevant for the generation of tables of contents
• by clustering subtitles or ASR output per slide, multiple-keyword searching is largely improved because of shorter viewing times in the search results (in our example lecture, the viewing time was reduced from 5.3 hours to 21 minutes)

For the proper operation of a search engine, the output of a speech recognition system with sufficient word correctness is required. Better retrieval rates can be obtained with full subtitles. In view of the other beneficial elements of subtitles (for machine translation, and for better following of the lectures), these subtitles are considered an essential part of all recorded lectures.

Future developments

It is concluded that a better system for recording slides needs to be developed. Looking at the future of education and the increasing developments in technology, it is clear that presentations are going to be supported by more animation and video. This means that an old screenshot recording system will no longer be sufficient to properly record PowerPoint slides.

To further increase the usability of the recorded lectures, a new interactive way to discuss lectures with the teacher and other students needs to be introduced. It promotes the asking and answering of questions, not just by the teacher but also by fellow classmates. This can be done through the use of a dynamic message board that is linked to the timeline of each lecture. Students can comment on and discuss the different topics in the lecture. To support such a system, an extension of the current multimedia database is required, so that the messages along with their optional timeframes can be stored.
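As an indication of what such a database extension could look like, the sketch below stores messages with an optional timeframe and retrieves the ones that apply to a given playback position, which is what a time slider with popups would need. It uses Python's standard sqlite3 module with an in-memory database purely for illustration; the table layout and the function names are assumptions for this example and do not describe the existing multimedia database.

```python
import sqlite3
from typing import List, Optional

# Hypothetical table: a message either has no timeframe (general remark)
# or has both a start and an end time in seconds, with start <= end.
SCHEMA = """
CREATE TABLE message (
    id          INTEGER PRIMARY KEY,
    lecture_id  TEXT NOT NULL,
    author      TEXT NOT NULL,
    body        TEXT NOT NULL,
    start_s     INTEGER,
    end_s       INTEGER,
    CHECK ((start_s IS NULL AND end_s IS NULL) OR
           (start_s IS NOT NULL AND end_s IS NOT NULL AND start_s <= end_s))
)
"""


def open_board() -> sqlite3.Connection:
    """Create an empty in-memory message board."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def post(conn: sqlite3.Connection, lecture_id: str, author: str, body: str,
         start_s: Optional[int] = None, end_s: Optional[int] = None) -> None:
    """Store a message, optionally tagged with a timeframe within the lecture."""
    conn.execute(
        "INSERT INTO message (lecture_id, author, body, start_s, end_s) "
        "VALUES (?, ?, ?, ?, ?)",
        (lecture_id, author, body, start_s, end_s),
    )


def messages_at(conn: sqlite3.Connection, lecture_id: str, position_s: int) -> List[str]:
    """Bodies of the time-tagged messages whose timeframe covers the playback position."""
    rows = conn.execute(
        "SELECT body FROM message "
        "WHERE lecture_id = ? AND start_s <= ? AND end_s >= ? "
        "ORDER BY start_s, id",
        (lecture_id, position_s, position_s),
    )
    return [body for (body,) in rows]
```

Messages without a timeframe are kept in the same table but are never returned by `messages_at`, so general remarks and time-tagged questions can share one discussion board.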
With these recommendations, it is possible to use recorded lectures as a foundation for future online-given courses without the need for live lectures.

List of references

[1] Improving Mid-Range Reordering using Templates of Factors. Hieu Hoang and Philipp Koehn. In: Proceedings of the 12th Conference of the European Chapter of the ACL, pages 372–379, Athens, Greece, 30 March – 3 April 2009. Association for Computational Linguistics 2009
[2] SMT and SPE Machine Translation Systems for WMT’09. Holger Schwenk, Sadaf Abdul-Rauf and Loïc Barrault. In: Proceedings of the 4th EACL Workshop on Statistical Machine Translation, pages 130–134, Athens, Greece, 30 March – 31 March 2009. Association for Computational Linguistics 2009
[3] Proceedings of the 12th Conference of the European Chapter of the ACL. Athens, Greece, 30 March – 3 April 2009. Association for Computational Linguistics 2009
[4] Proceedings of the 4th EACL Workshop on Statistical Machine Translation. Athens, Greece, 30 March – 31 March 2009. Association for Computational Linguistics 2009
[5] Proceedings of 2009 Named Entities Workshop (NEWS2009): Shared Task on Transliteration. Singapore, 7 August 2009. The Association for Computational Linguistics and The Asian Federation of Natural Language Processing, 2009
[6] Findings of the 2009 Workshop on Statistical Machine Translation. Chris Callison-Burch, Philipp Koehn, Christof Monz and Josh Schroeder. In: Proceedings of the 4th EACL Workshop on Statistical Machine Translation, pages 1–28, Athens, Greece, 30 March – 31 March 2009. Association for Computational Linguistics 2009
[7] An Overview of VC-1. Sridhar Srinivasan and Shankar L. Regunathan. In: Visual Communications and Image Processing 2005, Proc. of SPIE, 2005
[8] The VC-1 and H.264 Video Compression Standards for Broadband Video Services. Jae-Beom Lee and Hari Kalva. Springer 2008 (Google books: http://books.google.nl/books?id=MKhbLPHRb78C page 114-122)
[9] Conference Proceedings of the 10th Annual Conference of the International Speech Communication Association (Interspeech 2009, Brighton, UK) (including Abstract book and conference programme). International Speech Communication Association, 2009 (partly available at http://www.interspeech2009.org/conference/)
[10] Speech indexing. Roeland Ordelman, Franciska de Jong and David van Leeuwen. In: Multimedia Retrieval, Henk Blanken, Arjen P. de Vries, Henk Ernst Blok and Ling Feng, Springer 2007
[11] The Rich Transcription 2007 Speech-To-Text (STT) and Speaker Attributed STT (SASTT) Results. J. Fiscus and J. Ajot. In: Presentation at NIST’s Rich Transcription 2007 Meeting Recognition Workshop, National Institute of Standards and Technology, 2007
[12] Evaluation of Text and Speech Systems. Laila Dybkjær, Holmer Hemsen and Wolfgang Minker, Springer 2008
[13] Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled (PhD thesis). M. A. H. Huijbregts. University of Twente, Nov. 2008
[14] Multimedia Retrieval. Henk Blanken, Arjen P. de Vries, Henk Ernst Blok and Ling Feng. Springer 2007
[15] Languages for Metadata. Ling Feng, Rogier Brussee, Henk Blanken and Mettina Veenstra. In: Multimedia Retrieval, Henk Blanken, Arjen P. de Vries, Henk Ernst Blok and Ling Feng, Springer 2007
[16] Interaction. Erik Broertjes and Anton Nijholt. In: Multimedia Retrieval, Henk Blanken, Arjen P. de Vries, Henk Ernst Blok and Ling Feng, Springer 2007
[17] Evaluation of Multimedia Retrieval Systems. Djoerd Hiemstra and Wessel Kraaij. In: Multimedia Retrieval, Henk Blanken, Arjen P.
de Vries, Henk Ernst Blok and Ling Feng, Springer 2007
[18] A Query Model to Synthesize Answer Intervals from Indexed Video Units. Sujet Pradhan, Keishi Tajima and Katsumi Tanaka. IEEE Transactions on Knowledge and Data Engineering, Volume 13, p. 824 - 838, IEEE 2001
[19] Towards a Unified Framework for Context-Preserving Video Retrieval and Summarization. Nimit Pattanasri, Somchai Chatvichienchai, and Katsumi Tanaka. In: Proceedings of 8th International Conferences on Asian Digital Libraries (ICADL 2005), Springer 2005
[20] TREC-6 1997 Spoken Document Retrieval Track - Overview and Results. John S. Garofolo, Ellen M. Voorhees, Vincent M. Stanford and Karen Sparck Jones. In: Proceedings of the 6th Text REtrieval Conference (TREC 6), National Institute of Standards and Technology (NIST), 1997
[21] Evaluation Campaigns and TRECVid. Alan F. Smeaton, Paul Over and Wessel Kraaij. In: Proceedings of the 8th ACM international workshop on Multimedia information retrieval (MIR 2006), Association for Computing Machinery (ACM), 2006
[22] Tagging: people-powered metadata for the social web. Gene Smith. New Riders (Pearson Education), 2008
[23] Submissions to the captioning standards review. Department of Communications, Information Technology and the Arts; Australian Caption Centre

List of URLs

[24] MIT Facts 2009: Faculty and Staff [online] [Cited 20 October, 2009] Access: http://web.mit.edu/facts/faculty.html
[25] Enrollment Statistics: MIT Office of the Registrar [online] [Cited 7 November, 2009] Access: http://web.mit.edu/registrar/stats/yrpts/index.html
[26] Research – The Center for Measuring University Performance [online] [Cited 7 November, 2009] Access: http://mup.asu.edu/research_data.html
[27] Kauffman Foundation study [online] [Cited 7 November, 2009] Access: http://web.mit.edu/newsoffice/2009/kauffman-study-0217.html
[28] MIT [online].
Instructor Profile: Walter Lewin [Cited 20 October, 2009] Access: http://ocw.mit.edu/OcwWeb/web/courses/instructors/lewin/lewin.htm
[29] Our History – MIT OpenCourseWare [online] [Cited 20 October, 2009] Access: http://ocw.mit.edu/OcwWeb/web/about/history/index.htm
[30] Home Page - OpenCourseWare Consortium [online] [Cited 20 October, 2009] Access: http://www.ocwconsortium.org/
[31] About – Academic Earth [online] [Cited 21 October, 2009] Access: http://academicearth.org/about
[32] About VideoLectures.Net [online] [Cited 21 October, 2009] Access: http://videolectures.net/site/about/
[33] Alexa Top 500 Global Sites [online] [Cited 21 October, 2009] Access: http://www.alexa.com/topsites
[34] YouTube Blog: Higher Education for All [online] [Cited 21 October, 2009] Access: http://www.youtube.com/blog?entry=uvxBVPuf4A8
[35] Apple Announces iTunes-U on the iTunes Store [online] [Cited 7 November, 2009] Access: http://www.apple.com/pr/library/2007/05/30itunesu.html
[36] TAG - definition of TAG by the Free Online Dictionary [online] [Cited 7 November, 2009] Access: http://www.thefreedictionary.com/TAG
[37] Google Translate [online] [Cited 7 November, 2009] Access: http://translate.google.com/
[38] Collegerama.nl [online] [Cited 28 October, 2009] Access: http://www.collegerama.nl/
[39] Paul Copier – LinkedIn [online] [Cited 28 October, 2009] Access: http://www.linkedin.com/in/paulcopier
[40] Open Course Ware: Home [online] [Cited 28 October, 2009] Access: http://ocw.tudelft.nl/
[41] Inleiding Videolectures [online] [Cited 28 October, 2009] Access: http://www.utwente.nl/videolecture/
[42] Collegerama Etalage [online] [Cited 28 October, 2009] Access: http://collegerama.tudelft.nl
[43] Silverlight-op-Linux plugin Moonlight 1.0 beschikbaar [online] [Cited 2 November, 2009] Access: http://webwereld.nl/nieuws/55091/silverlight-op-linux-plugin-moonlight-1-0beschikbaar.html
[44] YouTube Community Guidelines [online] [Cited 7 November, 2009] Access:
http://www.youtube.com/t/community_guidelines
[45] Adobe – Flash Player Statistics [online] [Cited 2 November, 2009] Access: http://www.adobe.com/products/player_census/flashplayer/
[46] Learn More: Longer videos. - YouTube Help [online] [Cited 2 November, 2009] Access: http://help.youtube.com/support/youtube/bin/answer.py?hl=en&answer=71673
[47] Getting Started: Video length and size - YouTube Help [online] [Cited 2 November, 2009] Access: http://www.google.com/support/youtube/bin/answer.py?hl=en&answer=55743
[48] PDF Reference – sixth edition [online] [Cited 7 November, 2009] Access: http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
[49] Adobe Acrobat Connect Pro [online] [Cited 7 November, 2009] Access: http://www.adobe.com/products/acrobatconnectpro/
[50] The Association for Computational Linguistics [online] [Cited 25 November, 2009] Access: http://www.aclweb.org/
[51] Radio Oranje demo [online] [Cited 7 December, 2009] Access: http://hmi.ewi.utwente.nl/choral/radiooranje.html In: Choral (http://hmi.ewi.utwente.nl/choral/index.html) University of Twente, 2005-2010
[52] The EuroMatrix Project (Sept. 2006 – Febr.
2009) [online] [Cited 14 December, 2009] Access: http://www.euromatrix.net/
[53] EuroMatrixPlus – Bringing Machine Translation for European Languages to the User [online] [Cited 14 December, 2009] Access: http://www.euromatrixplus.net/
[54] Wordle – Beautiful Word Clouds [online] [Cited 14 December, 2009] Access: http://www.wordle.net
[55] Make A Tag Cloud [online] [Cited 17 December, 2009] Access: http://www.makecloud.com
[56] Keyword Cloud Generator [online] [Cited 17 December, 2009] Access: http://www.tocloud.com/
[57] How High-Tech Dream Shattered In Scandal at Lernout & Hauspie [online] [Cited 14 December, 2009] Access: http://web.archive.org/web/20060420133237/http://www2.gol.com/users/coynerhm/how_high.htm
[58] Integrating the Probabilistic Model BM25/BM25F into Lucene [online] [Cited 18 January, 2009] Access: http://nlp.uned.es/~jperezi/Lucene-BM25/

Annexes

A. Recorded lectures at MIT
B. Recorded lectures in Collegerama
C. Collegerama as vodcast
D. Subtitling of Collegerama
E. Speech recognition
F. Searching

Accompanying material

DVD 1. Lectures CT3011

Lectures CT3011, 28 lectures. One folder per lecture; each folder contains the following material:
Item | Description | Files | Total size (MB)
Lectures (video) | Collegerama videos of all lectures | 28 | 3,140
Slides | PowerPoint presentations and abstracts | 622 | 76
SHoUT | SHoUT output and transcripts | 117 | 272
Total | | 767 | 3,488

Lectures CT3011, Lecture no. 15 Civiele Gezondheidstechniek. Folders for additional material produced from Lecture no. 15 Civiele Gezondheidstechniek:
Item | Description | Files | Total size (MB)
Lecture presentation | PowerPoint file (plus pdf) | 2 | 19
Collegerama_CT3011_11 | Collegerama files | 296 | 125
Vodcast | Vodcast try-outs and results | 42 | 3,081
Vodcast Camtasia | Screen captures Camtasia | 3 | 148
Vodcast_TU | Vodcasts by TUDelft | 2 | 229
Presenter | Presenter try-outs and results | 59 | 269
Flash | Try-out for Flash | 31 | 981
Subtitles | Subtitle files and try-outs | 24 | 122
Speech recognition | Initial try-outs and evaluation SHoUT | 12 | 13
Total | | 471 | 4,987

Annex A. Recorded lectures at MIT

1. Massachusetts Institute of Technology
   Introduction
   Departments
   Students
   Staff
   OpenCourseWare
   Translated courses at MIT
2. Recorded lectures at MIT
   Lectures
   Recorded lectures in MIT OCW
   Composition
   Camera setup
   Transcripts, captions and annotations
   Technical specifications of videos
3. External publishing of MIT recorded lectures
   YouTube
   iTunes
   VideoLectures.net
   Academic Earth

1. Massachusetts Institute of Technology

Introduction

Massachusetts Institute of Technology (MIT) is a private research university located in Cambridge, Massachusetts in the United States. It is one of the most prestigious technical universities in the world. Its reputation is based on its scientific output in the form of articles and reports and on the awards won by its staff. Seventy-three members of the MIT community have won the Nobel Prize, including seven current faculty members. According to its website, the mission of MIT is to advance knowledge and educate students in science, technology and other areas of scholarship that will best serve the nation and the world in the 21st century. (Source: http://web.mit.edu/registrar/stats/yrpts/index.html, http://web.mit.edu/facts/faculty.html, http://www.universityportal.net/2007/09/worlduniversity-ranking-of-engineering.html and http://web.mit.edu/aboutmit)

Figure 1.1: MIT logo

Departments

The education at MIT is organized into 6 "schools", which together include 30 departments, sections and programs (http://web.mit.edu/education).
Table 1.1: MIT Schools with departments, sections or programs
School | Department, section or program | ID
Engineering | Civil and Environmental Engineering | 1
Engineering | Mechanical Engineering | 2
Engineering | Materials Science and Engineering | 3
Engineering | Electrical Engineering and Computer Science | 6
Engineering | Chemical Engineering | 10
Engineering | Aeronautics and Astronautics | 16
Engineering | Biological Engineering | 20
Engineering | Nuclear Science and Engineering | 22
Engineering | Engineering Systems Division | ESD
Science | Chemistry | 5
Science | Biology | 7
Science | Physics | 8
Science | Brain and Cognitive Sciences | 9
Science | Earth, Atmospheric, and Planetary Sciences | 12
Science | Mathematics | 18
Management (Sloan School) | Business / Management | 15
Architecture and Planning | Architecture | 4
Architecture and Planning | Urban Studies and Planning | 11
Architecture and Planning | Media Arts and Sciences (Media Lab) | MAS
Humanities, Arts, and Social Sciences | Economics | 14
Humanities, Arts, and Social Sciences | Political Science | 17
Humanities, Arts, and Social Sciences | Anthropology | 21A
Humanities, Arts, and Social Sciences | Foreign Languages and Literatures | 21F
Humanities, Arts, and Social Sciences | History | 21H
Humanities, Arts, and Social Sciences | Literature | 21L
Humanities, Arts, and Social Sciences | Music and Theater Arts | 21M
Humanities, Arts, and Social Sciences | Writing and Humanistic Studies | 21W
Humanities, Arts, and Social Sciences | Comparative Media Studies | CMS
Humanities, Arts, and Social Sciences | Science, Technology, and Society | STS
Health Sciences and Technology (Whitaker College) | Health Sciences and Technology | HST

The departments (ID) are numbered in the approximate order of when the department was founded.

Students

MIT has around 10,000 students (http://web.mit.edu/facts/enrollment.html), of which some 25% are MSc students and some 35% are PhD students.

Table 1.2: MIT students per faculty and per grade (2009-2010)
Schools (Faculties) | Undergraduate | Graduate MSc | Graduate PhD | Special
Sophomores (pre-academic), others | 1,092 | - | - | -
Engineering | 1,851 | 1,070 | 1,636 | 101
Science | 827 | 13 | 1,047 | 11
Management | 174 | 879 | 114 | 12
Architecture and Planning | 69 | 402 | 179 | 3
Humanities, Arts, and Social Sciences | 140 | 33 | 277 | 7
Health Sciences and Technology | - | 23 | 337 | 2
Total | 4,153 | 3,220 | 3,590 | 136
(Source: http://web.mit.edu/facts/enrollment.html)

There are around 3,000 international students registered at MIT (30%), for the most part as graduate students (85%) (http://web.mit.edu/facts/international.html).
About 50% of these international students originate from Asia and about 25% from Europe. The total number of students at MIT is comparable to Twente and Delft. However, MIT has many more PhD students and a relatively larger international population (Twente 12%, Delft 15%, for BSc and MSc only).

Table 1.3: Number of students at Twente and Delft University
University | BSc | MSc | Other | PhD
Twente University (2008) | 5,409 | 2,099 | 737 | Not published (160 / year)
Delft University of Technology (2007) | 9,453 | 4,724 | 122 | 1,650 (229 / year)
Twente: http://www.utwente.nl/feitenencijfers/onderwijs/totaal/inschrijvingen.doc/
Delft: http://www.tudelft.nl/live/pagina.jsp?id=dcb20543-c4f7-4e6d-891a-703b9e8f6701&lang=nl

Students who apply for an education at MIT have to go through an evaluation program, which focuses on academic potential, strong personal qualifications and outstanding interests, activities and achievements (http://web.mit.edu/facts/admission.html). In 2008 alone, 13,396 candidates submitted their final applications for the freshman class and 1,589 (11.9%) were offered admission. The actual first-year enrollment was 1,051 (66% of admitted). Applicants for graduate degree programs are evaluated for previous performance and professional promise by the department in which they wish to register. In 2008, 17,271 candidates applied for a graduate education. Of the 3,680 candidates who received offers of admission, 2,300 or 63 percent registered in advanced degree programs at MIT.

Nine months' tuition and fees for 2008–2009 are $36,390. Summer term tuition in 2008 was $12,045 for students enrolled in courses. Additionally, undergraduate room and board is approximately $10,860, dependent on the student's housing and dining arrangements. Books and personal expenses are about $2,850 (http://web.mit.edu/facts/tuition.html).
About 62% of the undergraduates receive need-based financial aid and 87% of the graduate students are supported by MIT fellowships, research assistantships, or teaching assistantships.

Staff
MIT employs about 11,500 people, including around 3,500 researchers, 650 professors, 213 associate professors and 146 assistant professors (http://web.mit.edu/facts/faculty.html). The number of employees is much larger than at the Dutch technical universities, because of MIT's larger research programs and facilities. The University of Twente employs 2,804 people (2008) while Delft University of Technology employs 3,571 (2007).

OpenCourseWare
In 2000, MIT started the concept of publishing its course material on the internet for open access, called OpenCourseWare (OCW) (http://ocw.mit.edu). It published the first proof-of-concept site in 2002, containing 50 courses. By November 2007, MIT had completed the initial publication of virtually the entire curriculum, over 1,800 courses in 33 academic disciplines (http://ocw.mit.edu/OcwWeb/web/about/history/index.htm).

MIT publishes some courses in one or more translated versions, including Spanish, Portuguese, Simplified Chinese, Traditional Chinese, and Thai. Since 2008 MIT has added audio and video-taped lectures to its OCW website. These lectures were recorded between 1999 and 2008. They are also published on YouTube, iTunes and VideoLectures.net (Source: http://www.youtube.com/profile?user=MIT and itms://deimos3.apple.com/WebObjects/Core.woa/Browse/mit.edu and http://www.videolectures.net).

Figure 1.2: MIT OCW homepage

The OCW concept has received enormous attention worldwide, from students as well as from universities. In 2005 the OpenCourseWare Consortium was established to advance education and empower people worldwide through open courseware.
At present about 200 higher education institutions and associated organizations from around the world are members of this organization, including TU Delft, the Dutch Open University and HAN University of Applied Sciences (Hogeschool van Arnhem and Nijmegen). Because of the positive response to its OCW activities, MIT operates a rather large OCW office with close to 20 people (http://www.ocwconsortium.org).

Translated courses at MIT
MIT has formally partnered with four organizations that are translating OCW course materials into Spanish, Portuguese, Simplified Chinese, Traditional Chinese, and Thai. OCW materials have been translated into at least 10 languages, including French, German, Vietnamese, and Ukrainian.

Figure 1.3: MIT translated courses (Source: http://ocw.mit.edu/OcwWeb/web/courses/lang/index.htm)

Our example course 18.06 Linear Algebra is available in Chinese (simplified), Portuguese and Spanish.

Figure 1.4: Course 18.06 in Chinese

In the translated courses, the recorded lectures are not provided with subtitles. The previous method of linking to RealMedia streams is used. Subtitles are only available for 60% of the lectures published on YouTube. On YouTube the viewer has the option of automatically translating all the subtitles into 43 languages using the auto-translate function (enabled by Google Translate).

2. Recorded lectures at MIT

Lectures
Since 1999 MIT has recorded lectures for the most popular courses. The recorded lectures are presented on its internal network (http://web.mit.edu/[courseID], presently migrating to a uniform umbrella http://stellar.mit.edu/). These websites function as an e-learning system like BlackBoard or TeleTop.
Figure 2.1: E-learning for course 18.06 (past and present) (Source: http://web.mit.edu/18.06/www/index.html and http://stellar.mit.edu/S/course/18/sp09/18.06/)

Initially these videos were presented as RealMedia files and offered at 3 different bandwidths: 56k, 80k and 220k. The highest quality version contains a video with a resolution of 320x240 pixels at 15 frames per second and audio sampled at 8,000 Hz at 8.5 Kbps.

Since 2005 more lectures have been recorded. These lectures are published as RealMedia files (220k), MP4 files (iTunes) and FLV (YouTube), all at 320x240 pixels at 15 frames per second.

Since 2008, lectures have been recorded at a higher quality for the purpose of publishing them as high quality videos on YouTube. These have a standard resolution of 480x360 pixels at 30 frames per second.

Part of the recorded lectures are provided with transcripts. These are also used for subtitling and automatic subtitle translation on YouTube. In that case an icon lights up that says CC, which stands for closed captions.

A couple of published courses have Flash videos as further teaching material, sometimes with voice narration added. These videos are all animations and do not contain any pre-recorded material. They are also available in MIT-OCW under 'Lecture notes' or 'Assignments'.

Recorded lectures in MIT OCW
The recorded lectures have been included in the OpenCourseWare program since 2008. For publishing these, the OCW website has a special section.

Figure 2.2: Multimedia section on MIT OCW (Source: http://ocw.mit.edu/OcwWeb/web/courses/av/index.htm)

Figure 2.3: Video page of Lecture 33 of Course 18.06 on MIT-OCW (Source: http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-005/VideoLectures/detail/lecture33.htm)

In OCW, most of the recorded lectures are offered as RealMedia (220k with the above-mentioned specifications) and MP4 (converted from the 220k RealMedia recording).
For streaming video, a link is presented to http://videolectures.net, from where the lectures can also be downloaded. These recorded lectures are also presented as embedded movies from YouTube. In the YouTube player, the user can navigate to the related lectures of this course (via the standard YouTube button at the bottom right). For some recorded lectures, transcripts are also available and shown under the embedded YouTube movie.

Recorded lectures in MIT OCW might originate from a previous year, as suggested by the OCW course dating. The above-mentioned course 18.06 is dated spring 2005. The videos were recorded live in the fall of 1999. The attached reading is a book that was released in February 2009.

More recent recordings are only presented on the MIT OCW website as MP4 downloads and as embedded YouTube streaming video. This course is announced on VideoLectures.net (April 2009).

Figure 2.4: Video page of Lecture 1 of Course 5.60 on MIT-OCW (recent recorded lecture) (Source: http://ocw.mit.edu/OcwWeb/Chemistry/5-60Spring-2008/VideoLectures/detail/embed01.htm)

Composition
Every MIT video is recorded with a camera aimed at the front of the classroom. You generally see a walking professor explaining certain topics in front of a blackboard. The video camera follows the professor and zooms in and out on the blackboard when the professor is writing on it. Sometimes even part of the seating area of the classroom is visible and you can see sitting students and people walking in.

Figure 2.5: Screen shot for recorded lecture (Course 18.06 - Lecture 33) (Source: http://www.youtube.com/watch?v=sWh92ZnYfZE)

Most MIT professors use only the blackboard. PowerPoint slides, overhead projectors or projected illustrations are not often used. In some courses the professor uses an electronic blackboard, computer projected slides or overhead projectors.
The content of these slides might be included in the video by zooming to the projected screen, or the recorded video might show, at the relevant moment, a text screen referring to the lecture material. Most often these slides are available as a PDF file under the 'Lecture notes'.

Figure 2.6: MIT lecture with professor using slides, which are also included in the video (Source: http://www.youtube.com/watch?v=R90sohp6h44)

Figure 2.7: MIT lecture with professor using an electronic blackboard, with related slides/screen slides (not recorded in the video) (Source: http://www.youtube.com/watch?v=tynCH4dosA8)

Figure 2.8: Sometimes the video is blacked out for copyright reasons (Source: http://www.youtube.com/watch?v=UxdUvyBtfXY)

The copyright issue is partly the reason why the recorded lectures at MIT do not show slides, illustrations etc.

Camera setup
Initially, a lecture was recorded with two cameras, one camera for the overview and one camera for the close-ups.

Figure 2.9: Two different camera angles during a lecture (Source: http://www.youtube.com/watch?v=sWh92ZnYfZE)

More recent recordings were done with up to 4 cameras.

Figure 2.10: Two other camera angles during a lecture (Source: http://www.youtube.com/watch?v=2x3F08_8B80)

All these multi-camera recordings were apparently done under the supervision of a movie director, since some post-recording editing and camera angle selection would have to be done.

Most of the latest recordings were done with a single camera. Details of demonstrations and blackboard writings are visible in these recordings through intensive use of the zoom function of the camera:

Figure 2.11: Single camera recording (Source: http://www.youtube.com/watch?v=kLqduWF6GXE)

Transcripts, captions and annotations
Around 60% of the recorded lectures of MIT are provided with a transcript.
The transcripts are presented on the MIT-OCW website, on the page of the related lecture under the embedded YouTube movie. Most often transcripts are also available as a PDF file. On YouTube these transcripts are used for the YouTube Caption option, which shows subtitles in the bottom part of the movie. Captions or subtitles have been available on YouTube since August 2008.

Figure 2.12: Recorded lecture with captions on YouTube (Source: http://www.youtube.com/watch?v=sWh92ZnYfZE) (Note the new "Turn down the lights"-button in the upper right corner)

The captions of this older recording were most probably created afterwards by computers, considering the rather poor timing of the subtitles. More recent recordings have much better timing of the captions.

Figure 2.13: Recent (27 January 2009) published lecture with captions (Source: http://www.youtube.com/watch?v=ZwpwmGP5ITM)

When a YouTube movie with captions starts, the player shows the caption language for a few seconds in the upper left corner (English for MIT lectures). This text is also shown when the captions are switched on again while the movie is playing.

Movies on YouTube can have captions in different languages. The viewer can select from the available languages via the CC button. MIT lectures are uploaded to YouTube with English captions only.

Alternatively, the viewer can start the auto-translate option, which provides an online translation of the (selected) captions. The viewer can select from 43 languages, including Dutch, Chinese (simplified and traditional) and Indonesian. Apparently, this option uses Google Translate. The auto-translate option has been available on YouTube since November 2008, benefiting from the ongoing improvements and expansions of Google Translate.
(Source: http://www.youtube.com/blog?entry=oqBeXa7v_aE, http://translate.google.com/# and http://www.google.com/intl/en/help/faq_translation.html)

Figure 2.14: Auto-translating MIT lectures in YouTube (Source: http://www.youtube.com/watch?v=ZwpwmGP5ITM)

Figure 2.15: Online translation of captions (English to Dutch) in YouTube (Source: http://www.youtube.com/watch?v=ZwpwmGP5ITM)

Since June 2008, YouTube can also show annotations on its movies, provided by the uploader of the movie. MIT does not use this option (no annotations loaded). (NB Since February 2009 viewers can also make annotations on viewed movies http://www.youtube.com/blog?entry=cfPYFjnzJIk)

Technical specifications of videos
In general the older recorded MIT lectures have a resolution of 320x240 pixels, at a frame rate of 15 fps. VideoLectures.net uses a larger resolution for its wmv and flv files, resulting in larger files but without any quality improvement. These files are probably converted from the MIT MP4 files.
Table 2.1: Technical specifications of older recorded lecture movies (Course 18.06 - Lecture 33)
Type | Location | Size (MB) | Specifications
RealMedia | MIT-OCW + VideoLectures | 59.4 | Video: RealVideo 8 / 320x240 / 15 fps / 188 Kbps; Audio: RealAudio 4.0 / 8.5 Kbps
MP4 | MIT-OCW + iTunes + VideoLectures | 89.8 | Video: MPEG4 Video / 320x240 (AR 1:1) / 15 fps; Audio: AAC 44100Hz / stereo / 1411 Kbps
FLV | MIT-OCW + YouTube | 96.7 | Video: Flash Video 1 / 320x240 / 15 fps; Audio: MPEG Audio Layer 3 22050Hz / mono / 8 Kbps
FLV | VideoLectures | 156.4 | Video: Flash Video 4 / 352x288 / 15 fps; Audio: MPEG Audio Layer 3 44100Hz / mono / 64Kbps
WMV | VideoLectures | 366.3 | Video: Windows Media Video 9 / 352x288 / 15 fps / 650Kbps; Audio: Windows Media Audio 44100Hz / stereo / 96Kbps
Length: 41:52 (rm, flv on YouTube), 41:59 (MP4, wmv, flv on VideoLectures)

In Google Video the 18.06 course is used to show that recorded lectures can be played at double speed with the VLC media player (http://sites.google.com/site/variablespeedlectures/).

More recent recordings have the same technical specifications for the MP4 files, but are also available as HQ movies on YouTube, with a larger resolution and a higher frame rate.

Table 2.2: Technical specifications of recent recorded lecture movies (Course 5.60 - Lecture 1)
Type | Location | Size (MB) | Specifications
MP4 | MIT-OCW + iTunes | 101 | Video: MPEG4 Video / 320x240 (AR 1:1) / 15 fps; Audio: AAC 44100Hz / stereo / 1411Kbps
FLV | MIT-OCW + YouTube | 110 | Video: MPEG4 Video (H264) / 320x240 / 30 fps; Audio: AAC 22050Hz / stereo
FLV HQ | MIT-OCW + YouTube | 245 | Video: MPEG4 Video (H264) / 480x360 (AR 1:1) / 30 fps; Audio: AAC 44100Hz / stereo
Length: 46:46 min (MP4 and flv)

The quality of the MP4 video is better than the older recordings, despite its comparable technical specifications. This is most probably caused by the inferior RealMedia codecs used in the older recordings.
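The file sizes and running times in Table 2.1 allow a rough cross-check of the listed bitrates. The sketch below is an illustration only and not part of the original measurements: it estimates the average total bitrate of a file from its size and length. The function name `average_kbps` is hypothetical, and the calculation assumes that 1 MB equals 1,000,000 bytes, which the table does not specify.

```python
def average_kbps(size_mb: float, minutes: int, seconds: int) -> float:
    """Estimate the average total bitrate (video plus audio) in kilobits per second.

    Assumption: 1 MB = 1,000,000 bytes and 1 kilobit = 1,000 bits.
    """
    duration_s = minutes * 60 + seconds
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_mb * 1_000_000 * 8 / duration_s / 1_000


# Sizes and lengths as listed in Table 2.1 (Course 18.06 - Lecture 33).
files = {
    "RealMedia": (59.4, 41, 52),
    "MP4": (89.8, 41, 59),
}

for name, (size_mb, minutes, seconds) in files.items():
    print(f"{name}: about {average_kbps(size_mb, minutes, seconds):.0f} kbps")
```

Under these assumptions the RealMedia file comes out at roughly 189 kbps, close to the sum of the listed video and audio bitrates (188 + 8.5 Kbps). The MP4 file comes out at roughly 285 kbps in total, which suggests that the 1411 Kbps audio figure in the table may describe the decoded audio stream rather than the encoded one.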
On YouTube this video is available as High Quality (with HQ symbol), giving the viewer the option to switch to 'normal quality'. The HQ YouTube recording has a larger resolution and better audio than the MP4 file or the normal FLV file (480x360 / 44 kHz versus 320x240 / 22 kHz). This higher resolution significantly improves full screen display.

The YouTube videos have a higher frame rate than the MP4 file (30 versus 15 fps). This should give better visibility of movements (either moving objects or camera movements). Since January 2009 YouTube has been converting its video store from the original FLV format with Sorenson codec into the FLV format with H.264 codec (MP4).

The video quality of the recently recorded MIT lectures is comparable to the Collegerama lectures (wmv / 320x240 / 30 fps). The lectures on HQ YouTube have a better quality than used in Collegerama.

The captions of the YouTube movies are made either in a SubViewer (*.SUB) file or in a SubRip (*.SRT) file, these being the only formats supported by YouTube (http://help.youtube.com/support/youtube/bin/answer.py?answer=100077). These formats do not contain language information. The language of the caption file is added as part of the YouTube upload procedure. It is possible to upload separate files for different languages to the same movie. Apparently MIT lectures are uploaded to YouTube with English captions only. As part of the upload procedure the uploader selects a font size for the captions.

The auto-translate option in YouTube uses the Google Translation API: http://code.google.com/intl/nl/apis/ajaxlanguage/documentation/

3. External publishing of MIT recorded lectures

YouTube
YouTube (http://www.youtube.com) is the most popular website for video content. Nearly 20% of all global internet users visit YouTube with some 16 page views per visit (alexa.com, ranked #3 after Google and Yahoo).
Since March 2009 (http://www.youtube.com/blog?entry=uvxBVPuf4A8) YouTube has had a special section for Education (YouTube EDU: http://www.youtube.com/edu) to which about 150 universities and colleges from the USA have submitted some 25,000 videos (April 2009). Not all of these are recorded lectures; they also include short movies (6 - 12 min.).

Figure 3.1: YouTube EDU section (Source: http://www.youtube.com/members?s=ytedu_ms&gl=US)

Since January 2008 MIT has been publishing all its recorded lectures on YouTube. For this a user section has been created (http://www.youtube.com/profile?user=MIT). With over 25,000 subscribers, MIT is the most popular EDU member. Berkeley and Stanford have slightly fewer subscribers.

Figure 3.2: MIT channel on YouTube (Source: http://www.youtube.com/profile?user=MIT)

At present (April 2009), a total of 893 recorded MIT lectures are available on YouTube, from 49 courses. The number of recorded lectures per course varies from 4 to 51 (on average 18). The recorded lectures mostly have a length of around 50 minutes, but may vary between 40 and 120 minutes. Shorter videos are most often introduction videos. At present (April 2009) around 30-40 recorded lectures are published each month.

YouTube has a playlist for each course.

Figure 3.3: YouTube playlist for course 18.06 (Source: http://www.youtube.com/view_play_list?p=E7DDD91010BC51F8)

The playlist view gives links to all recorded lectures of the course, including an introduction video if available. The playlist also shows the ratings and views for each of these videos.

The most viewed MIT lecture on YouTube is the first lecture of the course 18.06 Linear Algebra (http://www.youtube.com/watch?v=gVMRuLH6FdQ). This lecture was viewed about 201,000 times between January 2008 and April 2009 (440 views per day). The succeeding lectures were viewed roughly 5,000 to 6,000 times each (12 views per day).
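The views-per-day figures quoted above follow from dividing a view count by the number of days in the observation period. A minimal sketch of that arithmetic is given below; the function name `views_per_day` and the exact period boundaries (1 January 2008 and 1 April 2009) are assumptions, since the text only gives the months.

```python
from datetime import date


def views_per_day(views: int, start: date, end: date) -> float:
    """Average number of views per day between two dates (end date excluded)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    return views / days


# Assumed period boundaries; the text only states "January 2008" and "April 2009".
START = date(2008, 1, 1)
END = date(2009, 4, 1)

first_lecture = views_per_day(201_000, START, END)
later_lecture = views_per_day(5_500, START, END)  # midpoint of 5,000 to 6,000 views

print(f"first lecture: {first_lecture:.0f} views per day")
print(f"later lectures: {later_lecture:.0f} views per day")
```

With these assumed dates the period spans 456 days, which gives about 441 and 12 views per day, in line with the rounded figures in the text.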
Most probably only a very limited number of these viewers have watched the whole lecture.

Figure 3.4: Views on YouTube of course 18.06 (period January 2008 - April 2009) (Source: http://www.youtube.com/view_play_list?p=E7DDD91010BC51F8)

YouTube also allows for rating and commenting on the movies. The most viewed lecture (see above) was rated 498 times and received 360 comments (during 16 months).

Figure 3.5: Lecture page for the most viewed MIT lecture on YouTube (Source: http://www.youtube.com/watch?v=gVMRuLH6FdQ)

This lecture is rated as 5 stars ('Awesome'). The comments are mainly positive ("free lectures"). Later lectures of this course have received far fewer ratings and comments. Lecture 33 has received a rating of 4 and 1 comment (http://www.youtube.com/watch?v=sWh92ZnYfZE).

YouTube also registers the website from which the viewer has reached the page. For the most viewed lecture around 10% of the views originate from the MIT-OCW site. The rest might come from YouTube itself, or might be unregistered.

Since February 2009, YouTube viewers can make annotations on viewed movies (http://www.youtube.com/blog?entry=cfPYFjnzJIk). This feature can be disabled by the uploader (owner) of the videos. Apparently, annotations are blocked by MIT.

Since February 2009, YouTube has been testing the possibility of downloading its movies via an additional download button (http://www.youtube.com/blog?entry=Mp1pWVLh3_Y). Owners of the movie (uploaders) may offer viewers this option, either for free or for a charge. MIT is not one of the testers for this option.

iTunes
iTunes is the media player of Apple. The program organizes and plays all kinds of digital media. It forms the direct link to the "iTunes Store", from which users can buy and download songs and other multimedia content. On January 6, 2009 Apple announced that over 6 billion songs had been downloaded since the service first launched on April 28, 2003.
iTunes U is a part of the iTunes Store featuring free lectures, language lessons, audio books and more. At present (April 2009) iTunes U holds over 100,000 educational audio and video files from top universities, museums and public media organizations from around the world. About 100 international universities and colleges have published content on iTunes U, including MIT, Yale, Stanford, UC Berkeley, Oxford, Cambridge, Freiburg, Lausanne, TU Aachen and Melbourne.

Figure 3.6: iTunes U in iTunes

MIT has a section on iTunes U, as one of its featured providers (alongside Yale, Oxford and the UK Open University, among others).

Figure 3.7: MIT section on iTunes U

In the MIT section the MIT-OCW content is clustered into 21 sections largely reflecting the MIT departments. At present (April 2009) MIT has some 1,500 files available, including video, audio, and transcripts. Moreover the MIT section includes some 200 other files.

Selecting an OCW cluster shows the courses of the selected department.

Figure 3.8: MIT-OCW Mathematics

Selecting a course gives the video recordings and transcripts of the selected course.

Figure 3.9: Playlist of Course 18.06

Selecting a video by clicking the 'Get movie' button starts downloading the video, which is stored in the user's library and from there available to be played in iTunes.

Figure 3.10: Course 18.06 lecture 1 played in iTunes

In iTunes, the transcripts of the recorded lectures can also be downloaded (if available). These downloadable files are the PDF files which are also available on the MIT-OCW page of the respective lecture.

iTunes does not show the number of downloads and also does not allow for ratings or comments by users.

VideoLectures.net
VideoLectures.net is a portal for recorded lectures (and interviews). It started in 2002 as a project at the Jozef Stefan Institute in Slovenia. VideoLectures.net has offices in Slovenia and Ukraine.
At present (April 2009), the portal contains nearly 7,000 videos from around 4,500 presenters all over the world.

Figure 3.11: Homepage of VideoLectures.net (Source: http://videolectures.net/)

VideoLectures.net has no provisions for transcripts or for closed captions.

Since October 2008, MIT OpenCourseWare has had its own portal site on VideoLectures.net, which contains all recorded lectures of MIT-OCW.

Figure 3.12: MIT-OCW portal at VideoLectures.net (Source: http://videolectures.net/mit_ocw/)

The MIT-OCW lectures are available as streams (flv and wmv) and as downloads (rm, MP4, flv and wmv).

Each course has an overview page containing links to the MIT-OCW course and a visual overview of all recorded lectures of that course. This overview includes the length of the video and the number of views. The first lecture of the 18.06 course has 681 views and the last lecture has 15 views. There are 3 comments on this course. The course has most probably been available since the beginning of March 2009. This means some 17 views per day for the first lecture, compared with 440 views per day for this lecture on YouTube.

Figure 3.13: Homepage at VideoLectures of the 18.06 Linear Algebra course (Source: http://videolectures.net/mit1806s05_linear_algebra/)

Each lecture has its own page, showing the available streams and downloads, as well as the rating and comments of the viewers on the lecture itself. For rating and commenting, the viewer should be logged in.

Figure 3.14: Video page of Lecture 33 of Course 18.06 on VideoLectures.net (Source: http://videolectures.net/mit1806s05_strang_lec33/)

This lecture is available as:
• streams: FLV (JW-player Flash), WMV (Windows Media Player)
• downloads: FLV, MP4, RealMedia, WMV

There are no comments or ratings submitted by viewers of this lecture (nor for other lectures of this course). Transcripts and subtitles of the MIT lectures are not available at VideoLectures.net.

Academic Earth
Academic Earth (http://academicearth.org) is a website for recorded lectures from selected universities. It collects videotaped lectures from the internet and presents them on its own website. The website received more than one million visits in the first three months since its January 18, 2009 beta launch, with more than 50% of users coming from outside of the US (http://www.slideshare.net/academicearth/academic-earth-one-million-visits-1364816).

At present (April 2009) the site shows 60 videotaped courses (including 1,500 lectures) and 900 guest lectures from 6 US universities: Berkeley, Harvard, MIT, Princeton, Stanford, and Yale. Academic Earth does not have any formal relation with these universities, but collects the videotaped lectures from the OpenCourseWare sites of these universities.

Figure 3.15: Homepage of Academic Earth

The courses and lectures are sorted by origin (university) and subject. Top ratings (visits) are presented per course, playlist, lecture and instructor. The website has a very colorful user interface filled with pictures, which makes it very attractive to visitors.

By selecting MIT under Universities, the user gets an overview of the 23 MIT courses included, listed by 'Subject' (Chemistry, Mathematics, etc.), as well as by 'Featured' (Featured, Most view, Top rated).

Figure 3.16: MIT page on Academic Earth (Source: http://academicearth.org/universities/mit/)

Figure 3.17: MIT courses on Mathematics at Academic Earth (Source: http://academicearth.org/universities/mit/subject:20)

By selecting a course the viewer gets all lectures of this course.

Figure 3.18: Homepage at Academic Earth of the 18.06 Linear Algebra course

The lecture number at Academic Earth might differ from the lecture number on the original OCW site.
Lecture 33 of Course 18.06 of MIT is presented as Lecture 34 at Academic Earth because the original course includes both a lecture number 24 and a lecture number 24B.

The video page of a lecture shows the video itself, as well as links for sharing on different social networks (Facebook, Digg, etc.), code for embedding the video on the user's webpage, a citation to the original source, the available downloads, and subscription on iTunes and RSS for video or audio.

Figure 3.19: Video page of Lecture 33 of Course 18.06 on Academic Earth

The available downloads include one video file (mp4, 240 MB, 320x240, 15 fps) and one audio file (mp3, 12.4 MB). The downloaded video has the same specifications as the mp4 file available at MIT OCW, but is nearly 3 times larger in file size (240 vs. 89.8 MB). For Lecture 33, a transcript is also available at Academic Earth.

The video page shows the mean rating and the number of viewers who have rated the lecture. The actual number of views is not shown.

Annex B. Recorded lectures in Collegerama

1. History
   Development of Collegerama
   The early years
   Collegerama as a service
   Collegerama live
   OpenCourseWare
   Collegerama for research congresses
   Collegerama for famous speakers
   Collegerama as streaming video server
   Collegerama at the University of Twente
2. Recording of a lecture
   Collegerama recording
   Presentation options
   Multimedia lectures with Collegerama recording
   Disturbed recording
3. Presentation of recorded lectures
   Course catalog
   Alternative display options
   Player options
   Navigation
4. Technical specifications
   Data storage
   Synchronization between slides and video
5. Evaluation
Annex B1. Collegerama lectures of CT3011 in OpenCourseWare

1. History

Development of Collegerama
The section Multimedia Services (MMS) of Delft University of Technology started in 2000 with the development of Collegerama (www.collegerama.nl) in a pilot project on streaming media (http://www.linkedin.com/in/paulcopier). The main goal was the recording of lectures which could be viewed by students within Blackboard (http://blackboard.tudelft.nl), the digital learning system used at TU Delft. These 'web lectures' were regarded as instruments to improve study results and to increase the efficiency of the education at TU Delft.

MMS selected the commercially available system Mediasite, by Sonic Foundry, as a basis for Collegerama. Collegerama is a brand name created by TU Delft, so that the university remains independent of the technical infrastructure behind its web lectures. Selecting a standard product avoids the high development cost of creating a new system. By using an existing solution, the university also has the added benefit of getting new updates and features within the Mediasite platform.

At present Sonic Foundry claims to be the global leader for enterprise webcasting (www.sonicfoundry.com). Mediasite is its flagship product (www.sonicfoundry.com/mediasite/).

Figure 1.1: Collegerama is a TU Delft brand name for web lectures, at present based on the Mediasite platform

The early years
An overview of the developments of Collegerama can be found on the Collegerama main catalog page (http://collegerama.tudelft.nl).

From September till December of 2004, 3 presentations were recorded with Mediasite as part of tests for the technical infrastructure of Collegerama. These web lectures were filmed with very poor audio recording equipment (no special microphone for the speaker) and a small video size (256x192 or 240x180). At that time, 240x180 was the standard video size for Mediasite recordings.
Prior to this, in April and May 2004, Mediasite was already used for recording 25 lectures (40-45 minutes) of the BSc course TN2012 Quantum mechanics. This was the last year in which professor Barend Thijsse, who was recognized as an outstanding teacher, taught this course. He gave the course together with his successor, professor Leo Kouwenhoven. They used a tablet PC as a blackboard and had speaker microphones attached to their shirts or jackets. The recorded lectures were used during the succeeding years as a reference until a drastic curriculum change in September 2008.

Figure 1.2: Collegerama recording of the 25 lectures of Barend Thijsse and Leo Kouwenhoven in 2004 (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=735a8c5902864988b01157c16f8e632e)

In April of 2005, Collegerama was used for the recording of 4 presentations of the Future Design competition. These presentations, with durations between 15 and 21 minutes, were used for promotional purposes. The presentations were recorded in the MMS studio as opposed to a classroom. The video size was still small (240x180), but the speakers' voices were now recorded with a small head microphone.

Figure 1.3: Collegerama recording of the Future Design competition in 2005 (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=04f4279ee6c14782aee3d2603143e207)

In January of 2006, Collegerama was used for recording the closing speech by the Rector Magnificus, Prof. Dr. Ir. J.T. Fokkema, at the 164th Dies Natalis of Delft University of Technology. This was the start of a yearly tradition of recording these speeches. This time the video was recorded at a resolution of 320x240, which is still the standard Collegerama video resolution today.
Figure 1.4: The Collegerama recording of Dies Natalis in 2006 started a yearly tradition for Collegerama recordings (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b3338ad192f043acae7193f83e2033e5)

Between September and December of 2006 Collegerama was used for recording the 30 lectures (40-45 minutes) of the BSc course TN2545 Signals and Systems by prof. Lucas van Vliet. Normally this course was given in Dutch. The recorded course was given in English to allow non-Dutch speaking students to follow it. The recorded lectures consist of videos showing the lecturer and synchronized screenshots of a Tablet PC, used as an interactive blackboard. These recorded lectures were actually used for several years, until a new lecturer took over the course in September 2009. They are currently available on Blackboard as reference material.

Figure 1.5: Collegerama recording of course TN2545 in fall 2006, using a Tablet PC as interactive blackboard (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)

Collegerama as a service
Starting in September of 2007 Collegerama became part of the regular facilities for education at TU Delft, under the responsibility of the University Corporate Office for Education and Student Affairs (O&S). This office is also responsible for the electronic learning system Blackboard. As a consequence, recording of lectures was financed by the Corporate Office and became free for the lecturers at the TU faculties. Before that time recordings were made at a rate of € 500 per recorded session of 45 minutes. Moreover the scheduling of recording units and operators is now organized by O&S and lecturers can apply there to have their lectures recorded. This service has resulted in a dramatic increase of recorded lectures.
In September and October 2009, around 60 to 75 lectures are recorded each week (30 to 40 sessions of 2 lectures of 45 minutes each). This amounts to 5% of all lecture hours given each week at TU Delft. Eight mobile recording units are available for Collegerama. These units can be used at all faculties of TU Delft.

At the faculty of Mechanical Engineering, 2 lecture rooms are equipped with Collegerama recording equipment. In September 2009 this faculty was faced with a huge student overflow. The 500 first-year students did not fit in the largest lecture room available, which had a capacity of 300 students. To overcome this problem, the faculty used two lecture rooms for the first-year lectures. In one lecture room the lecturer gives the lecture, which is recorded and streamed to the other lecture room. The recorded lectures are available afterwards via Blackboard. The faculty calls this service "lectures in a movie theater".

Figure 1.6: The Collegerama recording is streamed to the 'movie theater' next door at Mechanical Engineering (Source: Delta 27, 17 September 2009)

By September 2009, a total of around 5,000 lectures had been recorded at TU Delft. A Collegerama lecture requires approximately 200 MB of server storage capacity. The total storage requirement of the Collegerama server is therefore around 1 TB, which is about 10% of the storage requirement of the Blackboard system at TU Delft.

Collegerama live

Each mobile recording unit of Collegerama has its own storage unit. After a lecture has been recorded, the recording is uploaded to the central Collegerama server. It is also possible to stream the recording to the server immediately while recording, thus generating a live stream to the outside world. This live streaming process has a delay of 5 to 10 seconds between recording and broadcasting. In the Collegerama setup, a URL for a Collegerama lecture is automatically created 4 hours before the recording.
This URL is published before the lecture starts, so that every student can watch it from their own room or any other study location with Internet access.

This live streaming system was used for the course CT2011 Watermanagement in September and October 2009. The course had been moved within the curriculum from the third year to the second year, which caused the student attendance to double to about 500 students. This far exceeded the seating capacity of the largest classroom available at the faculty of Civil Engineering (it holds only 350 students). To reduce the number of students attending the lectures in person, the lectures were scheduled on Monday and Friday during the first two lecture hours. Moreover, it was announced that the lectures would be broadcast live. The lectures received rather wide media attention under the title 'lecture from your bed'. The system was a huge success. After the initial lecture, the number of students attending in person dropped to around 100, with a large number of viewers during lecture hours or several hours after the lecture. The movie theater lecture room stayed empty after the first lecture.

Figure 1.7: The Collegerama-live recordings in September 2009 for the course CT2011 were announced as 'lectures from your bed' (Source: Delta 27, 17 September 2009)

OpenCourseWare

In March 2007, TU Delft started a pilot project for OpenCourseWare (http://ocw.tudelft.nl). In this pilot project the course material of around 20 MSc courses from 6 different disciplines was published. Collegerama lecture recordings were part of this material. This initiative was very well received and the Collegerama recordings were said to be of extraordinary quality. Because of the national and international response, TU Delft decided in 2008 to continue its OpenCourseWare program on a larger scale.
Figure 1.8: Collegerama recordings form an important element in the OpenCourseWare program of TU Delft (Source: http://ocw.tudelft.nl)

In October 2009, TU Delft hosted the yearly conference of the OpenCourseWare Consortium, in which more than 200 universities worldwide are active (www.ocwconsortium.org). The Director of Education and Student Affairs of TU Delft is a member of the board of the OCW Consortium. TU Delft is working on the renewal of its OpenCourseWare website. One of the goals is to give the recorded lectures more prominent exposure on the OCW website and to give the site the look and feel of the original Blackboard courses.

Collegerama for research congresses

Collegerama is not only used for lectures. The first scientific conference fully recorded with Collegerama at TU Delft was held in June 2007.

Figure 1.9: The first scientific conference fully recorded in Collegerama was held in June 2007 (Source: http://drinkwater.tudelft.nl)

Since then, more congresses, inauguration speeches and PhD defenses have been recorded with Collegerama. An example is the yearly vacation course in Drinking water and Wastewater. Every year over 400 water professionals take part in this course to listen to presentations by national and international experts. The entire courses of January 2008 and January 2009 were recorded with Collegerama. These recordings are publicly available at http://drinkwater.tudelft.nl.

Collegerama for famous speakers

Collegerama offers the opportunity to record the speeches of famous speakers. Examples are the recordings of the Dutch astronaut André Kuipers (in September 2004), of the famous Italian designer Alberto Alessi (in December 2004), of the President of the Republic of Mozambique, Mr Armando Emilio Guebuza (in February 2008), and of the Dutch Crown Prince Willem-Alexander (in June 2009) (http://collegerama.tudelft.nl).
Figure 1.10: The Dutch Crown Prince as a speaker at the conference in Delft on Sustainable built environments in June 2009 (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=4af4ce446fe142189e76e920acfcdc15)

Collegerama as streaming video server

Collegerama is not only used for recorded lectures. It is also used as the streaming video server for all kinds of video recordings.

In 2006 the docusoap 'Delft Blauw' was broadcast on RTL5 TV. This TV show followed the daily life of 9 students at TU Delft who lived in a student house in the old city of Delft. It was made in cooperation with the university to show different aspects of its educational and research programs. A total of 13 episodes were made, which are still available on the TU Delft website in the form of Collegerama recordings.

Figure 1.11: The TV series Delft Blauw (recorded in 2006) can still be seen in Collegerama at the TU Delft website (Source: http://www.tudelft.nl/live/pagina.jsp?id=260550da-bf3a-48a1-9a44-94e367d7f041)

In September 2009 the Executive Board of TU Delft started using Collegerama for video messages to communicate with employees and students of TU Delft. These take the form of recorded monologues or interviews with an average duration of 5 to 10 minutes. At present, no slides are used for these recordings. The video messages and interviews are in Dutch and are provided with English subtitles (burned into the video recording). The Collegerama recording is embedded in the TU Delft website. The recordings have also been uploaded to YouTube within the TU Delft channel (http://www.youtube.com/user/tudelft). Apparently the university prefers embedding the Collegerama viewer over embedding the YouTube upload (avoiding YouTube branding).
Figure 1.12: The TU Delft board uses Collegerama for video messages (October 2009) (Source: http://www.tudelft.nl/live/pagina.jsp?id=e323c397-d00a-4721-9f15-67fb278e1b36 and http://www.tudelft.nl/live/pagina.jsp?id=4e31e711-82fe-4971-8d33-53998efeadc8)

Videos of more than 10 minutes are only available on the Collegerama platform. At the time of writing (October 2009), TU Delft does not have a YouTube Edu account, which would allow the uploading of longer videos.

Collegerama at the University of Twente

In 2007 the University of Twente started a pilot project on videotaped lectures. This pilot project drew on the experience of TU Delft with its Collegerama system. The University of Twente also uses the same technical infrastructure as Collegerama. Within the Twente pilot project, the lectures of 10 BSc courses have been recorded.

One of these was the course 214020 Algorithms, Data structures and Complexity. Between November 2007 and January 2008, 8 of its lecture sessions were recorded. Afterwards, the 7th lecture session turned out to be unavailable due to technical difficulties. The recorded sessions include two lecture hours (40 minutes each) and the intermediate coffee break (a 15-minute recording of a clock). Initially these recorded lectures were hosted on the TU Delft Collegerama server:

http://collegerama.tudelft.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb

The URLs of these lectures were published on Teletop, the digital learning system of the University of Twente.
At some point in the summer of 2009, the recordings were moved to the University of Twente servers (or domain):

http://videolecture.utwente.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb

Figure 1.13: Collegerama recording of the course 214020 at University of Twente in November 2007 (Source: http://videolecture.utwente.nl/mediasite/Viewer/?peid=bcb88779-b54c-4d38-a028-34b7f1d0dfdb)

After each recorded course, an evaluation form was used to register the opinion of the students. Based on the positive results of the pilot project it was decided to continue the project. Since September 2008, lectures at the University of Twente can be recorded with Collegerama, i.e. Mediasite (http://www.utwente.nl/videolecture/).

Figure 1.14: Videolectures as a service, at University Twente since September 2008 (Source: http://www.utwente.nl/videolecture/)

At the University of Twente, 2 lecture rooms are available with recording facilities for Collegerama/Mediasite (Horst C101 and Cubicus B209). There is also one mobile recording unit available (Spiegel, for rooms 1, 2, 4 and 5). This mobile set can also be used in other buildings and lecture rooms on request. The service for recording lectures is free of charge and is provided by the ICT Service Centre of the University of Twente.

In September 2009, the University of Twente started using Blackboard as its digital learning system, replacing Teletop. At present, TU Delft and the University of Twente use the same technical infrastructure for both the digital learning environment and the system for streaming recorded lectures.

2. Recording of a lecture

Collegerama recording

Collegerama offers two possibilities for recording lectures: a stationary setup that has been installed in a few classrooms at TU Delft, or a mobile station which can be used at any given location.
Both systems include a fixed webcam which can be operated remotely with a joystick. The operator, usually a student assistant, makes sure that the camera is always pointed at the lecturer while the lecturer is moving around the classroom.

Figure 2.1: Stationary recording unit for Collegerama

Figure 2.2: Mobile recording unit for Collegerama

The laptop that comes with the presenter unit is connected to a beamer, so that the PowerPoint slides can be viewed in the classroom and recorded by the system. The recording system takes screenshots of the computer screen that is visible on the beamer, based on computer activity. Every 1 to 4 seconds, the system checks for a change on the computer screen. If a different slide has been loaded or the position of the mouse has changed, a new screenshot is saved as a JPEG image file. The disadvantage of this is that many redundant images might be saved and used in the eventual Collegerama presentation, which is later published online.

After the lecture has been given, the data is sent to the presentation server. The server processes the different data sources and creates three different outputs:
• an audio/video stream (wmv file)
• pictures of the different PowerPoint slides or computer screenshots (jpeg files)
• different settings and additional information about the lecture (xml file)

The presentation server synchronizes all the different elements and stores the required information in the XML file. This information is later used to correctly display the video in combination with the screenshots. When the presentation has been processed, it is written to the Collegerama web server (http://collegerama.tudelft.nl) and is then available to students with Internet access all over the world.
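
The change-based capture described above can be illustrated with a small Python sketch. This is not the Mediasite implementation, which is not documented here; it only assumes that the screen is sampled at each polling tick and that a sample is saved when it differs from the previous one. The function name `select_screenshots` and the use of a hash for comparison are assumptions made for illustration; the file name pattern follows the `slide_{0:D4}_full.jpg` template found in the Collegerama XML file.

```python
import hashlib
from typing import Iterable, List, Tuple


def select_screenshots(samples: Iterable[Tuple[int, bytes]]) -> List[Tuple[int, str]]:
    """Keep one screenshot for every detected screen change.

    samples: (time_ms, raw_screen_bytes) pairs, one per polling tick
             (hypothetical input; the real system polls every 1 to 4 seconds).
    Returns (time_ms, file_name) for each sample that differs from the
    previously sampled screen.
    """
    saved: List[Tuple[int, str]] = []
    previous_digest = None
    for time_ms, screen in samples:
        # Comparing digests stands in for whatever comparison the recorder uses.
        digest = hashlib.sha256(screen).hexdigest()
        if digest != previous_digest:
            # Four-digit, zero-padded numbering, as in slide_{0:D4}_full.jpg.
            saved.append((time_ms, f"slide_{len(saved) + 1:04d}_full.jpg"))
            previous_digest = digest
    return saved


# One slide, a mouse movement on that slide, then the next slide.
samples = [
    (0, b"slide 1"),
    (2000, b"slide 1"),
    (4000, b"slide 1 + mouse moved"),
    (6000, b"slide 2"),
]
print(select_screenshots(samples))
```

In this example the mouse movement at 4 seconds produces a screenshot of its own, which is exactly how redundant images end up in a recording.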
Presentation options

During the presentation, the lecturer is provided with three different presenting options:
• blackboard
  The lecturer uses the blackboard or an overhead projector to give the lecture, while the video camera records the content.
• PowerPoint
  This works in combination with a prepared set of PowerPoint slides that will be displayed while the presentation is being given.
• screen capturing
  The contents of the computer screen will be displayed during the presentation, which allows the lecturer to use external software such as computer simulations or written text on a Tablet PC and record the results as separate screenshots.

Figure 2.3: Examples of the three presentation options

Each of these presentation options uses the same storage system, which is based on screen activity. Every 1 to 4 seconds, the system checks the screen that is visible on the beamer and stores it as an image if it has changed. Especially with screen capturing (for example when a Tablet PC is used as a blackboard), a large number of redundant images will be stored, since every mouse movement and every change on the screen while writing down notes causes a new screenshot to be saved.

Collegerama uses a uniform view for all three presenting options, as is shown in the examples given below.

Figure 2.4: Collegerama with blackboard (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=ca42dce5-bb51-4c39-93de-50528dd6b880)

Figure 2.5: Collegerama with PowerPoint (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=724886f7-cfd0-441d-ae85-1fae0cbb28a1)

Figure 2.6: Collegerama with Tablet PC (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b7d4c81eed134ff68781e84ba05002e9)

The three presenting options differ significantly in the number of slides (or screenshots).
Table 2.1: Number of slides/screenshots for the three Collegerama examples

Presenting option    Number of slides         Navigation pages (list - small - large)
Blackboard           0 (no slides picture)    0 - 0 - 0
PowerPoint           30                       2 - 3 - 5
Screen capturing     308                      12 - 15 - 29

The figures show that Collegerama is very suitable for the registration of lectures in which a PowerPoint presentation or a Tablet PC is used. In these cases the most detailed information is presented on the presentation block. For a lecture with a blackboard only, the Collegerama system is a little superfluous, but it still gives a proper registration of the lecture.

With respect to navigation, only lectures with PowerPoint seem to be suitable for a Collegerama recording. Blackboard lectures lack navigation by slides or screenshots, while Tablet PC lectures have too many screenshots for proper navigation. For the latter, the screenshots can be clustered in chapters as part of the post-processing of a Collegerama recording.

Multimedia lectures with Collegerama recording

A multimedia lecture can be recorded within Collegerama by either recording the projected movie or recording the screen (full-time screen capturing). The Collegerama operator can then choose between recording either the camera or the presentation PC (full screen recording). During the full screen recording, the slide in the Collegerama slide section is either the previous slide or a screenshot of the movie.

Figure 2.7: Screen recording in Collegerama for multimedia lectures (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=f9379fda-848c-4a6f-8210-a5fe2b91edb7)

Figure 2.8: Screen recording in Collegerama for multimedia lectures (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b8c63b74-be43-4864-8a6c-17cf6a8130d4)

Disturbed recording

Recording of slides can be disturbed by unwanted screen activities caused by improper mouse movements.
In this way, a recording with 575 screenshots was obtained from 20 slides.

Figure 2.9: Collegerama recording with screen disturbance (575 screenshots from 20 slides) (Source: http://collegerama.tudelft.nl/mediasite/Viewer/?peid=b8c63b74-be43-4864-8a6c-17cf6a8130d4)

3. Presentation of recorded lectures

Course catalog

All recorded lectures of a course are grouped in a 'Catalog' which can be accessed by its own URL. This catalog functions as a directory of the recorded lectures. Figure 3.1 shows an example of the catalog for course CT3011.

Figure 3.1: Course catalog in Collegerama (Source: http://collegerama.tudelft.nl/mediasite/Catalog/?cid=16b5f5fa-0745-4b8b-9f02-f79a03abf50a)

Each lecture can be started by clicking the lecture hyperlink. The catalog gives metadata of the lectures, such as the name of the lecture, the presenter, the recording (air) date and the duration. The recorded lectures can be sorted by lecture name, date and presenter. The URL of the catalog can be saved as an RSS feed as well as a browser link.

Most courses at TU Delft do not use the catalog. Most often the lecture links are included in the Blackboard site of the course, giving the lecturer more flexibility in presenting related information, such as handouts, downloads, links to other course items etc. Some lecturers give a short content list of a lecture in Blackboard, giving the student more information about the content of the course. Such a list can be used as a direct link into the lecture by including the start time (in milliseconds) in the URL:

http://collegerama.tudelft.nl/mediasite/Viewer/?peid=7548f752-101b-417e-a4e7-58aebc595376&playFrom=1218000

Figure 3.2: Recorded lectures for course CT3011 in Blackboard (Source: http://blackboard.tudelft.nl/webapps/blackboard/content/listContent.jsp?mode=reset&course_id=_24753_1&content_id=_1048479_1)

Alternative display options

Collegerama is displayed within the Mediasite player, which is based on Windows Media Player and uses JavaScript to store configuration information. The display settings of the player can be modified by changing the configuration files, which are JavaScript documents. The size of the slides and video screen and their position within the player are determined by the values in the file 'standalone-layout.js'. Additional settings are included in the file 'standalone-manifest.js'.

The position of the slides and video screen is controlled by the value LayoutOptions.DefaultPosition. Possible values are:
1. video in left upper corner
2. video in right upper corner
3. video in right lower corner
4. video in left lower corner

Figure 3.3: Collegerama display with video in right upper corner

The display size of the slides and the video is controlled by the values of LayoutOptions.VideoHeight, LayoutOptions.VideoWidth, LayoutOptions.SlideHeight and LayoutOptions.SlideWidth. These settings should comply with the display size, which is controlled by the values of LayoutOptions.PlayerHeight and LayoutOptions.PlayerWidth. The default settings for the sizes are 240, 320, 480 and 640 (in the order listed), with related display sizes of 584 and 983. These settings correspond to displaying the video at 100% of its size and the slides at 62% of their size.

Figure 3.4: Collegerama display with a smaller video in right upper corner (50%, 120x160)

The other display settings are included in the file 'standalone-manifest.js'. This file includes the time settings of the slides as well as the text in the viewer text block, such as title, presenter info and time data.

Player options

The main player of Collegerama is split into three parts:
• video screen
• display of slides or Tablet PC
• information bar

Figure 3.5: Screenshot of the Collegerama web application

Video screen

The video screen shows a video recording of the lecturer talking, while the camera follows the lecturer around. This is the only part of the lecture that shows continuous motion. Around the video player are several controls that allow you to customize your viewing experience. You have the option of pausing the movie and skipping back to the beginning. You can also adjust the volume and switch the video to full-screen.

The last interesting option that is offered is the playback speed. In December 2001, a student at Brigham Young University in Utah wrote a paper called 'Variable Speed Playback of Digitally Recorded Lectures', which analyzed the usefulness of varying the playback speed. A plugin was released that allowed the video to be played at twice the original speed. Afterwards the author interviewed 625 students who took a course that was offered via digitally recorded lectures; 256 students in this group (41%) ended up purchasing and using the plugin. Most of the buyers were very enthusiastic about the program and very content with using it.

Figure 3.6: Screenshot of the video screen

Display of slides or presentation aid

The slide selector of Collegerama is located on the right side of the screen. You have three separate options for selecting slides:
• a list of slide numbers, their corresponding time and the title of the slide
• an overview of the following 16 slides in full color
• a larger overview of the following 6 slides in full color

Figure 3.7: Screenshot of the three different slide selection screens

By clicking on a slide, the video stream will automatically fast forward to the beginning of that slide.
The timing information is retrieved from the XML file that contains all the configuration information.

What you can see when using Collegerama is that the focus has been placed on the video stream. When the viewer clicks on a slide, the video screen is updated but the current slide view is not. It is necessary to leave the slide selection screen to see the current slide in full-screen again. This is very distracting, because the most important content and information about the presentation is contained in the presentation slides.

Information bar

This part of the interface contains some additional information about the lecturer, the title, date and length of the lecture, and some optional information. After the lecture has been uploaded, the text on this screen can be updated. The problem is that lecturers at TU Delft generally do not have the opportunity to modify these text fields on the Collegerama server. That means that this field is usually left empty and is therefore of little use.

Figure 3.8: Screenshot of the information bar

Navigation

A characteristic of the navigation in Collegerama is that it is based on the video stream. The foundation of the lecture is the webcam recording, to which the slides have been linked based on time codes. This means that a movie is required to use the Collegerama system; a simple collection of slides, possibly with some audio narration, is not sufficient.

The only way to really navigate through the lecture is by using the slide navigation screens. You get to see an overview of the different slides and you can fast forward to them. This works very well when the lecturer makes use of a PowerPoint presentation, in which the lecturer generally talks for a couple of minutes to illustrate what is shown on each slide.

When the lecturer uses the Tablet PC option and writes down notes, the navigation becomes problematic. Collegerama checks for changes in the screen during the presentation every 1 to 4 seconds.
If the lecturer is writing down notes, a large number of slides will be generated, since the screen is constantly changing. When the lecture is put on the server, these types of presentation contain about 150 slides on average, most of them about 10 seconds long. This makes navigation through the lecture virtually impossible, because a slide no longer constitutes a change in the subject but simply a small change on the screen.

4. Technical specifications

Data storage

Every presentation is split up into several parts:
• video
• audio
• slides
• XML
• presentation viewer

If you compare the file sizes of these parts, a considerable difference can be observed. The video stream takes up the largest amount of space, while the slides are relatively small. If only the slides and the audio were kept, the complete lecture would be roughly one fifth of the size that it has with the video stream included.

Table 4.1: Storage and quality information of a Collegerama lecture (PowerPoint method)

Stream                 Data size   Length / amount   Recording quality
Video                  91.4 MB     45:09 minutes     Windows Media Video 9 codec, 320x240 pixels, 30 fps, 350 kbps bitrate
Audio                  20.6 MB     45:09 minutes     Windows Media Audio 9 codec, 48 kbps, 44 kHz, mono (A/V), CBR encode mode, 48024 bitrate, 44100 sample rate
Slides                 6.32 MB     29 slides         1024x768 pixels, 96 pixels/inch
XML                    21 KB       -                 -
Presentation viewer    1.75 MB     -                 -

Synchronization between slides and video

The different contents of the presentation need to be synchronized, so that when the lecture is viewed, all the streams (video, audio, slides) are displayed at the correct time. This is accomplished using an XML file. Apart from the timing information, it also contains all relevant information concerning the lecture.
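
The synchronization described above comes down to one lookup: for a given playback position, show the last slide whose start time has already passed. The Python sketch below illustrates this under stated assumptions. It assumes that each slide entry in the XML file carries a <Number> and a <Time> in milliseconds; the <Slides> wrapper element, the function names and all sample times except the first are made up for illustration, and the actual viewer may work differently.

```python
import bisect
import xml.etree.ElementTree as ET

# Hypothetical, simplified excerpt of the slide timing information.
SAMPLE = """<Slides>
  <Slide><Number>1</Number><Time>47</Time></Slide>
  <Slide><Number>2</Number><Time>95000</Time></Slide>
  <Slide><Number>3</Number><Time>1218000</Time></Slide>
</Slides>"""


def load_slide_times(xml_text):
    """Return a list of (start_time_ms, slide_number), sorted by start time."""
    root = ET.fromstring(xml_text)
    slides = [
        (int(slide.findtext("Time")), int(slide.findtext("Number")))
        for slide in root.iter("Slide")
    ]
    slides.sort()
    return slides


def slide_at(slides, position_ms):
    """Return the number of the slide visible at position_ms, or None
    if the position lies before the first slide."""
    start_times = [start for start, _ in slides]
    # Index of the last slide that started at or before position_ms.
    index = bisect.bisect_right(start_times, position_ms) - 1
    return slides[index][1] if index >= 0 else None


slides = load_slide_times(SAMPLE)
print(slide_at(slides, 100000))
```

Because the lookup is driven purely by the video position, the video timeline acts as the master clock; the slides only need a start time each.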
The markup file is divided into 9 sections:
• presentation
• folders
• profile
• slides
• chapters
• presenters
• polls
• external links
• viewer

Presentation element

This element contains all the information about the presentation. It defines the owner, the creation and modification dates, the title of the lecture and information about the time zone of the recording. It also includes a link to the location of the lecture on the server and the online directory in which the file is located.

<Id>
  <Value>f33ba7ff-0160-4259-bd94-7ee0d9c5a461</Value>
  <EntityType>Presentation</EntityType>
</Id>
<Owner>Kees</Owner>
<CreationDate>2008-11-05T09:56:29Z</CreationDate>
<LastModified>2008-11-05T09:56:29Z</LastModified>
<RootId>
  <Value>f33ba7ff-0160-4259-bd94-7ee0d9c5a461</Value>
  <EntityType>Presentation</EntityType>
</RootId>
<Title>CT3011/11</Title>
<FolderId>
  <Value>16b5f5fa-0745-4b8b-9f02-f79a03abf50a</Value>
  <EntityType>Folder</EntityType>
</FolderId>

Figure 4.1: XML snippet of the <presentation> element

Folders element

Every lecture that is recorded is usually stored in more than one folder, since most lectures belong to different bachelor or master programs. Each folder entry contains an owner ID, the creation and modification dates of the folder and an identifier that can be used to locate the folder on the web server.

<Folder>
  <Id>
    <Value>b2d9aa4e-c67a-4200-9297-3670289bfea7</Value>
    <EntityType>Folder</EntityType>
  </Id>
  <Owner>Jos-pc</Owner>
  <CreationDate>2009-02-03T13:29:28Z</CreationDate>
  <LastModified>2009-02-03T14:11:28Z</LastModified>
  <Name>TU Delft COLLEGES</Name>
  <ParentId>
    <Value>11cd7471-86a3-4d32-a5cd-300dca2a78bf</Value>
    <EntityType>Folder</EntityType>
  </ParentId>
  <Type>Folder</Type>
</Folder>

Figure 4.2: XML snippet of the <folders> element

Profile

The profile holds all the relevant information about the video and audio streams that have been stored.
For the video stream it holds the creation date, the frame rate, the resolution and the video codec used (in this case the Windows Media Video 9 codec). For the audio stream it contains the bit rate, sample rate, number of channels and the encoding mode.

Slides

This part of the configuration file holds the timing information for the presentation. Every slide has a unique identifier, an entity type (such as Presentation), a number, the filename and the timing index in milliseconds. With this information, the viewer application knows at which point in time the image in the slide viewer needs to change. The timeline of the video stream functions as the foundation for the whole Collegerama presentation system and all slide components rely on it to be correct.

<Slide>
  <PresentationId>
    <Value>f33ba7ff-0160-4259-bd94-7ee0d9c5a461</Value>
    <EntityType>Presentation</EntityType>
  </PresentationId>
  <Number>1</Number>
  <Time>47</Time>
  <SlideImageFileNameTemplate>slide_{0:D4}_full.jpg</SlideImageFileNameTemplate>
</Slide>

Figure 4.3: XML snippet of the <slide> element

Chapters

This element is used to store additional information about each slide, such as a title and additional text. This information is displayed in the slide listing of the Collegerama application when selecting another slide. At present, this element is not used very often, since the information has to be added manually after the presentation has been published.

Presenters

Since every lecture is given by one or more presenters, this information is also stored with the presentation. It contains some general contact information and provides a link to the photo of the lecturer. This photo is displayed during playback of the given lecture in the Collegerama web application.
<Presenter>
  <Id>
    <Value>0cdd3b98-9bd5-4275-8afc-38ada3eb03a7</Value>
    <EntityType>Presenter</EntityType>
  </Id>
  <Owner>MediasiteAdmin</Owner>
  <CreationDate>2008-11-05T09:46:56Z</CreationDate>
  <LastModified>2008-11-10T15:33:28Z</LastModified>
  <FirstName>J.C.</FirstName>
  <MiddleName>van</MiddleName>
  <LastName>Dijk</LastName>
  <BioUrl>http://tudelft.nl/live/pagina.jsp?id=e1c07f7e-f00d-4aa1-837a-27798eab23c6</BioUrl>
  <EmailAddress>J.C.vanDijk@tudelft.nl</EmailAddress>
  <ImageName>0cdd3b98-9bd5-4275-8afc-38ada3eb03a7.JPG</ImageName>
  <ImageUrl>http://collegerama.tudelft.nl/mediasite/MediasiteData/Presenters/0cdd3b98-9bd5-4275-8afc-38ada3eb03a7/0cdd3b98-9bd5-4275-8afc-38ada3eb03a7.JPG</ImageUrl>
  <AdditionalInfo />
</Presenter>

Figure 4.4: XML snippet of the <presenter> element

Polls

Here you can enter polls and questions about the lecture. TU Delft does not make use of this feature.

ExternalLinks

Here you can add external links that are relevant for the lecture. TU Delft does not make use of this feature.

Viewer

This contains information about the viewer, such as dimensions, resolution, a link to the title banner etc. You can also update the images used while loading, at slide start and slide end, and other pictures.

5. Evaluation

Collegerama provides a good and stable platform for the distribution of online video lectures. The biggest advantage is that you only need a video stream to upload your lecture; additional slides or screenshots are optional, since the system relies on the video stream as its basic timeline.

The disadvantage is that the navigation is weak. The creators built the system with the idea that the main focus should be on the video and not on the slides. This is questionable, since the main storyline and structure of most lectures is based on the keywords that are placed on the PowerPoint slides.
When navigating through the slides, the viewer loses sight of the current slide, which shows where the lecturer is in the story.

Another problem is that there is no possibility of searching within the lectures. The text slides are converted to pictures, which makes searching them difficult. The quality of these screenshots is also reduced, since the slides are no longer vector-based but converted to still images.

Overall, Collegerama is a good system, but it has a few flaws. Fixing these would make it easier to work with, and several aspects could be enhanced to improve its usability for students.

Annex B1. Collegerama lectures of CT3011 in OpenCourseWare

Annex C. Collegerama as single movie/audio file

1. Single movie or audio file
   Benefits of a single movie or audio file?
   Podcast, vodcast or what's in the name?
   Multimedia container files
   Developments in multimedia container formats
   Developments in multimedia systems for movies and home theaters
   Technical specification for video streams/files
   Technical specification for audio streams/files
2. Collegerama for YouTube
   What is YouTube?
   Video formats for YouTube
   Components of a Collegerama vodcast
   Collegerama as vodcast for YouTube
   Vodcast production
   One step recording for vodcast production
   Uploading to YouTube
   Downloading of vodcasts from YouTube
   Conclusions and recommendations for vodcasts on YouTube
3. Collegerama for iTunes
   What is iTunes?
   Video formats for iTunes
   iPod constraints for Collegerama vodcasts
   Components of a Collegerama vodcast for iTunes
   Vodcast production
   Vodcast production by TU Delft
   Uploading to and downloading from iTunes
   Conclusions and recommendations for vodcasts on iTunes
4. Evaluation
   iTunes versus YouTube
   Alternative download options
   Future developments of Collegerama

1. Single movie or audio file

Benefits of a single movie or audio file?

At present, lectures recorded in Collegerama can only be viewed as streaming video with an Internet connection to the Collegerama server (http://collegerama.tudelft.nl). This setup has several advantages, such as:
• no distribution channels required, avoiding their institutional and technical requirements
• single point of entry, with its benefits for updating (the content as well as the player)
• no storage required at the point of viewing/listening

Severe drawbacks of this setup are:
• no distribution via popular channels such as YouTube and iTunes (and therefore no worldwide exposure through them)
• no offline viewing/listening, and therefore repeated streaming for multiple views

In order to overcome these drawbacks, a single movie or audio file has to be created from a Collegerama recording, which can then be used in most distribution channels. A single video file also makes it easier to add options such as subtitling or voice narration. Each distribution channel has its own technical specifications.
To avoid the production of a wide range of files, only the following distribution channels are taken into consideration:
• movie files: YouTube and iTunes
• audio files: iTunes
These two platforms have been selected based on:
• their worldwide exposure
• the acceptance of their technical specifications by other external platforms
• the experiences of MIT (see separate document)
• the usability of these technical specifications in TU Delft's own BlackBoard, OCW and other web platforms

Figure 1.1: Combining the Collegerama components into a single video file enables distribution of recorded lectures

For creating a single movie and/or audio file, the following aspects should be determined:
• content (slides, audio, video, subtitles, and any combination of these)
• presentation of the content (layout, introduction tune/movie, branding)
• video quality (resolution, frame rate)
• format of video file (mov, wmv, flv, mp4, codec etc.)
• audio quality (stereo/mono, frequency range)
• format of audio file (mov, mp3, mp4, codec etc.)
The above-mentioned technical specifications (quality, codec) primarily determine the file size. The technical specification should balance quality/usability against quantity (download time and storage requirements).

Podcast, vodcast or what's in the name?
Single audio files are often referred to as "podcast" files. The term podcast originates from the iPod, as iPod-broadcasting. In the slipstream of this term, single movie files are often referred to as "vodcast" files. Originally these were downloaded files, since iPod and iTunes did not support streaming content. The meaning of these terms later shifted to "audio on demand" or "video on demand (VOD)", in combination with an RSS feed. This audio or video can also be streaming audio or video, without actual distribution of a real file.
Figure 1.2: The RSS logo (background, brown) is combined with the movie icon to indicate a vodcast distribution

A video file always contains a digital audio stream and a digital video stream, combined in a single file. The advantage of downloading the complete vodcast (or video podcast) is that it can later be played offline on a PC or some other portable multimedia device. Once the complete video has been downloaded, it can be watched multiple times without causing additional bandwidth usage on the server, or even connecting to the server (offline viewing or listening). Streaming allows for skipping parts of the video without downloading the whole content, but users may face pauses in playback due to slow transfer speeds. Downloaded files respond much faster to user interaction. Recorded lectures are often watched more than once, and often with user interaction such as skipping or rewinding passages.

Multimedia container files
Digital multimedia files consist of at least two digital streams:
• video stream
• audio stream
These two streams are synchronized by an Audio to Video Synchronization process (A/V sync). The video and audio streams, as well as the synchronization information, are stored in a single multimedia container file or stream. In these container files, the digital streams are stored in a compressed form in order to reduce the file or stream size. Several different codecs (compression-decompression systems) have been developed for the compression of these streams. Decompression is done in the digital media player during display.
Table 1.1 gives an overview of the most popular container formats. A container file format supports one or more codecs, as indicated in this table. Some container file formats also include subtitle files, menu systems and/or metadata files. For other container formats this information is provided by attached files.
Table 1.1: Most popular multimedia container file formats
File extension | Owner | Video formats | Audio formats
avi | Microsoft | Almost anything through VFW; H.264/AVC is problematic due to the limited B-frame support | Almost anything through ACM; Vorbis is problematic
rm | RealNetworks | RealVideo 8, 9, 10 | (HE)-AAC, Cook Codec, Vorbis, RealAudio Lossless
mpg | MPEG | MPEG-1, MPEG-2 | MPEG-1 Layers I, II, III (mp3), other formats only in private streams: LPCM
mov | Apple | Limited to what is available to the QuickTime codec manager | Limited to what is available to Sound Manager or CoreAudio
asf, wma, wmv | Microsoft | Almost anything through VFW or DMO; H.264/AVC is problematic | Almost anything through ACM or DMO; Vorbis is problematic
flv | Adobe | Sorenson, VP6, Screen Video, H.264/MPEG-4 AVC | MP3, Nellymoser, ADPCM, Linear PCM, AAC, Speex
mp4 | MPEG | MPEG-4 ASP, H.264/MPEG-4 AVC, H.263, VC-1, Dirac, others | MPEG-1 Layers I, II, III (MP3), MPEG-2/4 (HE)-AAC, AC-3, Vorbis (with private objectTypeIndication), Apple Lossless, others
mkv | public domain | virtually anything | virtually anything
(Source: http://en.wikipedia.org/wiki/Container_format_(digital) and http://en.wikipedia.org/wiki/Comparison_of_container_formats)

Developments in multimedia container formats
Multimedia container formats have been developed since their early years in 1980-1990, when PCs became sufficiently powerful to display videos and movie files. Over time, much better movie quality was achieved, driven by the following developments:
• better codecs resulting in smaller files, enabled by faster and larger processors
• larger file sizes, enabled by cheaper data storage
• larger stream sizes, enabled by faster Internet connections
These developments were incorporated into the container systems of each of the respective suppliers/owners. This development is reflected in the successive version numbering of the formats. Microsoft's extension wmv is internally indicated as Windows Media Video 9, in short labeled as wmv3.
Similar developments can be seen for the other suppliers. (Source: http://www.microsoft.com/windows/windowsmedia/forpros/codecs/video.aspx#WindowsMediaVideo9VCM) The only exception is the development of the mp4 extension. This is an open standard supported by many suppliers. To distinguish it from the older mpg extension (for MPEG-1 and MPEG-2), a new extension was introduced.

Developments in multimedia systems for movies and home theaters
In 1977, the Video Home System (better known as VHS) was introduced by JVC. Several rival formats competed to become the leading video format, with Sony's Betamax being its fiercest competitor. JVC's standard offered a longer playing time and had the advantage of a far less complex tape transport mechanism. Early VHS machines could also rewind and fast forward the tape considerably faster than a Betamax VCR. (Source: http://besser.tsoa.nyu.edu/impact/f96/Projects/jchyung/) By the 1990s, the VHS format had become the standard for distributing movies and videos throughout the consumer market. The problem with this format was that it contains an analogue signal.
In 1982, Sony and Philips developed a new standard for audio and data storage called Compact Disc (CD). In that same year, Sony Music Entertainment released the first music album on the new digital Compact Disc format. It then became the standard for the distribution of digital audio, while video was still being released in an analogue format. This trend continued for the following ten years, until in 1993 a new video format was developed based on the original CD technology, called Compact Disc Digital Video or Video CD (VCD). It contained an MPEG-1 video stream with a resolution of 352x240 for NTSC and 352x288 for PAL. The overall picture quality was intended to be comparable to VHS video (although poorly compressed VCD video can sometimes have a lower quality than VHS video).
An advantage of VCD was that its flaws are digital block artifacts rather than analog noise, so the picture does not deteriorate further with each use. (Source: http://www.philipsmuseumeindhoven.nl/phe/products/e_cd.htm and https://www.ip.philips.com/view_attachment/2450/sl00812.pdf)
In 1997, a new disc format called Digital Video Disc (DVD) was introduced, which offered about 7 times the storage capacity of a CD. This increase in space allowed movies to be distributed at a much higher quality and video resolution. These videos were stored using a new format called MPEG-2, at a resolution of 720x480 for NTSC and 720x576 for PAL. It also supported wide-screen video with an aspect ratio of 16:9. DVD became, and at the time of writing still is, the standard for all public movie and video productions.
In 2006, a new format called Blu-ray Disc (BD), designed by Sony, Philips and Panasonic, was released as the successor to DVD. However, unlike previous format changes (e.g. audio tape to compact disc, VHS videotape to DVD), there is no immediate indication that production of the standard DVD will gradually wind down, as DVDs still dominate with around 87% of video sales and approximately one billion DVD player sales worldwide.
Table 1.2 gives the technical details for the above-mentioned systems. The developments within the movie piracy scene are displayed in Table 1.3. The main differences from the legal systems are the use of new technology on older media, enabled by better processors and cheaper media, and the use of single-file systems, which make playback on the end user's PC easier. From these developments it can be seen that future digital movie systems will be based more and more on MPEG-4 within the Blu-ray specifications.

Table 1.2: Developments in digital video for movies and home theaters
Property | VCD | DVD | Blu-ray
Year of introduction | 1993 | 1997 | 2006
Base size | 700 MB | 4.7 GB | 20 GB
Max size | 700 MB (mode 1), 800 MB (mode 2/XA) | 4.7 GB (single-sided, single-layer), 8.54 GB (single-sided, dual-layer), 17.08 GB (double-sided, dual-layer) | 20 GB (single layer), 50 GB (dual layer)
Video encoding | MPEG-1 | MPEG-2 | MPEG-4 with H.264, VC-1, MPEG-2 (backward compatible)
Audio encoding | MPEG-1 Audio Layer II, up to 44.1 kHz, dual channel or stereo | DVD-Audio, up to 192 kHz, up to 6.1 surround | AAC, up to 192 kHz, up to 7.1 surround
Resolution | 352x240 (NTSC), 352x288 (PAL) | 720x480 (NTSC), 720x576 (PAL) | 1920x1080
Bitrate | 1,150 kbit/s | - | 29.4 Mbit/s
Aspect ratio | 4:3 | 4:3 or 16:9 | 16:9
FPS | 29.97 or 23.976 (NTSC), 25 (PAL) | 29.97 (NTSC), 25 (PAL) | -
Creator | Sony, Philips, Matsushita and JVC | Sony, Panasonic, Toshiba, Fox Studios, Warner Brothers, Philips | Blu-ray Disc Association (Sony, Panasonic, Pioneer, Philips, Thomson, LG Electronics, Hitachi, Sharp and Samsung)

Table 1.3: Developments in digital video within the movie piracy scene
Property | DivX | XviD | Matroska
Year of introduction | 1998 | 2001 | 2006
Purpose | Compression of MPEG-4 | Compression of MPEG-4 | Container for MPEG-4 files
File extension | *.avi | *.avi | *.mkv

Technical specification for video streams/files
The digital video stream contains pictures (or frames) played at a certain rate (frame rate). Each picture or frame is composed of pixels (resolution). (Source: http://www.equasys.de/videoformats.html)

Resolution or frame size
Since every frame is an orthogonal bitmap digital image, it consists of a raster of pixels. If it has a width of W pixels and a height of H pixels, the frame size is stated as WxH. The frame size should fit within the size of the digital display. Figure 1.3 gives an overview of common digital display modes.
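As a rough illustration of the WxH notation, the following minimal Python sketch computes the pixel count and the reduced width:height ratio of the pixel grid for the frame sizes listed in Table 1.2. The dictionary and function names are illustrative assumptions and do not come from any video library. Note that the ratio of the pixel grid is not always the display aspect ratio, since DVD video is stored with non-square pixels.

```python
from math import gcd

# Frame sizes (width, height) as listed in Table 1.2
FRAME_SIZES = {
    "VCD (NTSC)": (352, 240),
    "VCD (PAL)": (352, 288),
    "DVD (NTSC)": (720, 480),
    "DVD (PAL)": (720, 576),
    "Blu-ray": (1920, 1080),
}


def pixel_count(width, height):
    """Number of pixels in one frame of size WxH."""
    return width * height


def grid_ratio(width, height):
    """Width:height ratio of the pixel grid, reduced to lowest terms."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


for name, (w, h) in FRAME_SIZES.items():
    rw, rh = grid_ratio(w, h)
    print(f"{name}: {w}x{h} = {pixel_count(w, h):,} pixels, grid ratio {rw}:{rh}")
```

Comparing the pixel counts of two frame sizes in this way gives a first impression of how much more data an uncompressed frame contains at the higher resolution.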
Figure 1.3: Overview of common display resolutions (Source: http://en.wikipedia.org/wiki/Display_resolution)

The development in display resolution is aimed towards larger resolutions (from 320x240 to 1920x1080) and towards wider screens (ratio 4:3 to 16:9, i.e. 1.33 to 1.78). These example resolutions increase the number of pixels per frame from 76.8k to 2.07M, an increase by a factor of 27.

Frame rate
In a digital video, the digital images or frames are displayed in rapid succession at a constant rate. The rate at which these frames are displayed is measured in frames per second, or FPS. The frame rate may vary from 1 to 100, depending on the actual motion within the movie. (Source: http://spng.se/frame-rate-test/)
Table 1.4 gives the frame rates for the most popular video and television systems.

Table 1.4: Frame rates for television and movie
System | Frame rate (FPS) | Remarks
Initial games | 6 | Accepted by players in 3D games
Cinema film | 24 |
TV-PAL | 25 |
TV-NTSC | 29.97 |
Blu-ray | 24-p / 23.976-p | progressive
Blu-ray | 59.94-i / 50-i | interlaced
Monitor | 60 / 100 |

More modern systems can also handle a variable frame rate. For recording lectures, a very high frame rate is not required. A frame rate of around 15 frames per second seems more than sufficient.

Video compression
Video compression is achieved not only frame by frame (such as bmp to jpg compression), but also across successive frames. Video compression typically operates on square-shaped groups of neighboring pixels, often called macro blocks. These blocks of pixels are compared from one frame to the next, and the video compression codec (encode/decode scheme) sends only the differences within those blocks. This works extremely well if the video has no motion. A still frame of text, for example, can be repeated with very little transmitted data. In areas of video with more motion, more pixels change from one frame to the next.
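The idea of sending only the blocks that differ between successive frames can be sketched as follows. This is a deliberately simplified illustration, assuming frames are plain 2D lists of pixel values and a block size of 16x16 pixels; real codecs add motion estimation, transforms and entropy coding. The function name and the example frames are hypothetical.

```python
def changed_blocks(prev, curr, block=16):
    """Return (block_row, block_col) indices of blocks that differ between two frames."""
    height, width = len(curr), len(curr[0])
    changed = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            # A block counts as changed if any pixel row inside it differs
            if any(prev[y][left:left + block] != curr[y][left:left + block] for y in rows):
                changed.append((top // block, left // block))
    return changed


# Hypothetical 64x48 grayscale frames: a static slide and a copy with one changed pixel
slide = [[255] * 64 for _ in range(48)]
same = [row[:] for row in slide]
moved = [row[:] for row in slide]
moved[20][40] = 0

print(changed_blocks(slide, same))   # no blocks need to be sent
print(changed_blocks(slide, moved))  # only the block containing pixel (20, 40)
```

For a still slide, nothing has to be transmitted for most frames, which matches the observation that text slides compress very well.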
When more pixels change, the video compression scheme must send more data to keep up with the larger number of pixels that are changing. Very good video compression is flexible, making the actual frame rate of minor importance. (Source: http://www.videsignline.com/howto/showArticle.jhtml?articleID=185301351)
Video codecs used in multimedia files are often identified by a four-character code (FourCC). Table 1.5 provides a list of the codes that are most used around the world.

Table 1.5: Some popular codecs for video compression
Code | Owner | Alternative name | Description, products using the codec, etc.
RAW | - | | Full Frames (Uncompressed)
avc1 | Apple | H.264 | Apple's version of the MPEG4 part 10/H.264 standard apparently.
MP4V | Apple | | Apple QuickTime MPEG-4 native
H264 | Intel | | ITU H.264
FLV1 | | | FLV1 codec (supported by ffdshow)
TSCC | | | TechSmith Screen Capture Codec
WMV3 | Microsoft | Windows Media Video 9 | The codec implements the Simple and Main modes of the VC-1 codec standard. DMO-based codec.
WVC1 | Microsoft | Windows Media Video 9 Advanced Profile | VC-1 compliant format. Fully compliant Advanced Profile of the VC-1 codec standard. WVC1 is included in Windows Media Player 11.
DIVX | OpenDivX | DivX | This FOURCC code is used for versions 4.0 and later of the DivX codec.
XVID | | | XviD MPEG-4 codec
XVIX | | | Based on XviD MPEG-4 codec
(Source: http://www.videsignline.com/howto/showArticle.jhtml?articleID=185301351, http://www.fourcc.org/, http://abcavi.kibi.ru/fourcc.php)

Table 1.6 gives the compression efficiency for MPEG compression. More recent codecs such as VC-1 (in wmv3 or Windows Media Video 9) give even better compression than MPEG-4.

Table 1.6: Global compression efficiency for MPEG video codecs
Codec | Absolute compression | Relative compression
MPEG-1 | 7 - 15 | 1
MPEG-2 | 15 - 30 | 2
MPEG-4 | 50 - 100 | 6

Technical specification for audio streams/files
A digital audio stream contains one or more channels (mono/stereo/surround).
For each channel the analog signal is converted into a digital signal at a given sampling rate and bit resolution or bit depth. Generally speaking, the higher the sampling rate and bit depth, the higher the fidelity and the larger the amount of digital data.

Number of channels
An audio stream can be mono (1 channel), stereo (2 channels) or surround sound (3 to 7 channels).

Sampling rate
The sampling rate, sample rate, or sampling frequency defines the number of samples per second (or per other unit) taken from a continuous signal to make a discrete signal. For time-domain signals, it can be measured in samples per second or hertz (Hz).

Bit depth
In digital audio, bit depth describes the number of bits of information recorded for each sample. Bit depth directly corresponds to the resolution of each sample in a set of digital audio data. Common examples of bit depth include CD audio, which is recorded at 16 bits, and DVD-Audio, which can support up to 24-bit audio.

Bit rate
Bit rate refers to the amount of data, specifically bits, transmitted or received per second (bit/s or bps). The bit rate is often quantified in conjunction with an SI prefix such as kilo- (kbit/s or kbps), mega- (Mbit/s or Mbps), giga- (Gbit/s or Gbps) or tera- (Tbit/s or Tbps). 1 kbit/s has almost always meant 1,000 bit/s, not 1,024 bit/s. For uncompressed audio, the bit rate can be calculated in the following way: Bit rate = (bit depth) x (sampling rate) x (number of channels). One of the most common bit rates given is that for compressed audio files. For example, an MP3 file might be described as having a bit rate of 160 kbit/s or 160,000 bit/s. This indicates the amount of compressed data needed to store one second of music. The standard audio CD is said to have a data rate of 44.1 kHz/16, implying the audio data was sampled 44,100 times per second, with a bit depth of 16.
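The bit rate formula above can be sketched in a few lines of Python. The function names are illustrative assumptions; the CD parameters (16 bits, 44,100 samples per second, 2 channels for stereo) and the 160 kbit/s MP3 example are taken from the text.

```python
def pcm_bit_rate(bit_depth, sampling_rate_hz, channels):
    """Bit rate of uncompressed audio in bit/s: depth x sampling rate x channels."""
    return bit_depth * sampling_rate_hz * channels


def compression_ratio(uncompressed_bps, compressed_bps):
    """How many times smaller the compressed stream is than the uncompressed one."""
    return uncompressed_bps / compressed_bps


cd_bps = pcm_bit_rate(bit_depth=16, sampling_rate_hz=44_100, channels=2)
mp3_bps = 160_000  # the 160 kbit/s MP3 example, with 1 kbit/s = 1,000 bit/s

print(f"CD audio: {cd_bps:,} bit/s")
print(f"Ratio to a 160 kbit/s MP3: {compression_ratio(cd_bps, mp3_bps):.2f}")
```

The same function can be applied to any combination of bit depth, sampling rate and channel count to estimate the uncompressed data rate before a codec is chosen.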
CD tracks are usually stereo, using a left and right track, so the amount of audio data per second is double that of mono, where only a single track is used. The bit rate is then 44,100 samples/second x 16 bits/sample x 2 = 1,411,200 bit/s or 1.4 Mbit/s.

Audio compression
Audio compression algorithms are implemented in computer software as audio codecs. The best-known audio codec is MP3 (MPEG-1 Audio Layer 3), a patented digital audio encoding format using a form of lossy data compression. It is a common audio format for consumer audio storage, as well as a de facto standard of digital audio compression for the transfer and playback of music on digital audio players. Designed to be the successor of the MP3 format, AAC (Advanced Audio Coding) generally achieves better sound quality than MP3 at similar bit rates. AAC's best-known use is as the default audio format of Apple's iPhone, iPod, iTunes, and the MPEG-4 video standard (MP4). Table 1.7 gives an overview of some audio codecs.

Table 1.7: Some popular codecs for audio compression
Code | Creator | Year | Latest version | Remarks
MP3 | ISO/IEC MPEG Audio Committee | 1993 | ISO/IEC 11172-3, ISO/IEC 13818-3 | Patent rights are disputed
AAC | ISO/IEC MPEG Audio Committee | 1997 | ISO/IEC 14496-3 | iTunes DRM audio for MP4
wma | Microsoft | 1999 | 11 | Free for Windows licensees
Ogg | Xiph.Org Foundation | 2001 | 1.2 | Free to use
(Source: http://en.wikipedia.org/wiki/Comparison_of_audio_codecs)

2. Collegerama for YouTube

What is YouTube?
YouTube is a video sharing website on which users can upload and share videos. Three former PayPal employees created YouTube in February 2005. In November 2006, YouTube, LLC was bought by Google Inc. for $1.65 billion, and is now operated as a subsidiary of Google. It is the biggest distributor of streaming online video content.
(Source: http://news.bbc.co.uk/2/hi/business/6411017.stm) The company is based in San Bruno, California, and uses Adobe Flash Video technology to display a wide variety of user-generated video content, including movie clips, TV clips, and music videos, as well as amateur content such as video blogging and short original videos. Most of the content on YouTube has been uploaded by individuals, although media corporations including CBS, the BBC, UMG and other organizations offer some of their material via the site, as part of the YouTube partnership program. Unregistered users can watch the videos, while registered users are permitted to upload an unlimited number of videos. Videos that are considered to contain potentially offensive content are available only to registered users over the age of 18. The uploading of videos containing defamation, pornography, copyright violations, and material encouraging criminal conduct is prohibited by YouTube's terms of service. Accounts of registered users are called "channels". In the last few years YouTube became a medium for several Universities to publish their recorded lectures on. One of the first was MIT (Massachusetts Institute of Technology), who joined in October of 2005. Later other Universities like Purdue (2006), Stanford (2006), UC Berkeley (2007) and Harvard Business (2007) started publishing recorded lectures and course material via the popular Internet medium. Video formats for YouTube YouTube's video playback technology for web users is based on the Adobe Flash Player. This allows the site to display videos with quality comparable to more established video playback technologies (such as Windows Media Player, QuickTime, and RealPlayer) that generally require the user to download and install a web browser plug-in to view video content. Viewing Flash video also requires a plug-in, but market research from Adobe Systems has found that its Flash plug-in is installed on over 95% of personal computers. 
Videos uploaded to YouTube are limited to ten minutes in length and a file size of 2 GB. When YouTube was launched in 2005, it was possible for any user to upload videos longer than ten minutes, but YouTube's help section now states: "You can no longer upload videos longer than ten minutes regardless of what type of account you have. Users who had previously been allowed to upload longer content still retain this ability, so you may occasionally see videos that are longer than ten minutes." The ten minute limit was introduced in March 2006, after YouTube found that the majority of videos exceeding this length were unauthorized uploads of television shows and films.

Video formats
YouTube accepts videos uploaded in most formats, including .WMV, .AVI, .MKV, .MOV, MPEG, .MP4, DivX, .FLV, and .OGG. It also supports 3GP, allowing videos to be uploaded directly from a mobile phone.

Video quality
YouTube originally offered videos in only one format, but it now has three main formats, as well as a "mobile" format for viewing on mobile phones. The original format, now labeled "standard quality", displays videos at a resolution of 320x240 pixels using the Sorenson Spark codec, with mono MP3 audio. This was, at the time, the standard for streaming online videos. "High quality" videos, introduced in March 2008, are shown at up to 864x480 pixels with stereo AAC sound. This format offers a significant improvement over standard quality. In November 2008, 720p HD support was added. At the same time, the YouTube player was changed from an aspect ratio of 4:3 to widescreen 16:9. 720p videos are shown at a resolution of 1280x720 pixels and encoded with the H.264 video codec. They also feature stereo audio encoded with AAC.

Components of a Collegerama vodcast
A Collegerama lecture has screenshots of the PowerPoint slides and a video of the lecturer giving the lecture.
On the web interface, these have been split into separate parts. If Collegerama is going to be published as a vodcast, the different Collegerama elements must be converted into a single video (multimedia) file format. In the current video system of Collegerama, the following elements are kept in sync:
• video (with audio)
• PowerPoint slides
• closed captions (not currently used at the TU Delft)

Video
The video part of Collegerama usually shows the lecturer, but might occasionally be switched to a recording of the display screen for animations, movies etc. Collegerama publishes the video stream using the following quality settings:
Resolution: 320 x 240 (ratio 4:3)
Frame rate: 25 fps
Bit rate: 370 kb/s
Codec: wmv3
In short: Windows Media Video 9 / 320x240 / 25.00fps / 341kbps

Audio
Audio is a very important part of the vodcast. It contains all the explanations by the lecturer and is a main part of the lecture. A lecture can be followed with only an audio recording and no video, as podcasts of lectures show. A video stream without audio does not make any sense. Collegerama publishes the audio stream using the following quality settings:
Channels: 2 (Stereo)
Sampling rate: 22050 Hz (22 kHz)
Bit depth: 16 bits/sample
Bit rate: 20 kb/s
Codec: wma2
In short: Windows Media Audio 9.2 / 20 kbps / 22 kHz / stereo (1-pass CBR)

Slides
The slides of a presentation contain the most detailed information. They are important for the viewers, since they indicate where the lecturer is in the story. Fortunately the slides mostly contain keywords at a fairly large font size, which means that the quality and resolution do not have to be very high for them to be readable. Collegerama publishes PowerPoint slides using the following specifications:
Resolution: 1024 x 768 (ratio 4:3)
Bit depth: 24 bits/pixel (full colour)
Codec: jpg

Closed captions / Subtitles
There are different ways of publishing closed captions or subtitles on video. The most commonly used is a text file containing time stamps and the corresponding spoken sentences. Closed captions and subtitles for Collegerama lectures are described elsewhere. For the production of a vodcast, these subtitles are not relevant, since they will be attached to the vodcast based on the internal timestamps of the movie file.

Collegerama as vodcast for YouTube
A vodcast for YouTube should comply with YouTube's restrictions on resolution. A general strategy for this is to develop a vodcast at the best video quality supported by YouTube, with the following considerations and constraints:
• YouTube movies are limited to a file size of 2 GB and a display time (10 minutes for the general public, unlimited for channel managers like YouTube Edu)
• YouTube gives the viewer the option to display at a lower quality when bandwidth is a limiting factor
• producing a vodcast at the highest quality enables the production of "child products" for other platforms with a lower quality, which results in smaller file sizes or bandwidth requirements
• YouTube converts movies with non-normalized resolution by downsizing to the nearest standard heights of 360, 480, 720 or 1280 pixels
The best quality of a Collegerama vodcast for YouTube within these constraints can be achieved by:
• reducing the size of the slides from 1024x768 to 960x720 (downsizing to 94%, keeping the display ratio 4:3)
• keeping the video resolution of 320x240
• putting both elements next to each other, giving an overall size of 1280x720 (HD720, widescreen, display ratio 16:9)
• filling the remaining area with related info and/or navigation tools (or blank)

Figure 2.1: Layout of Collegerama elements within resolution constraints for YouTube movies (1280x720). Panels: Slide 960x720; Video 320x240; Related info 320x480.

A layout according to this setup is given in Figure 2.1. The video is located on the right hand side of the slides, to give a more balanced overall picture for left to right reading. The overall view might be mirrored to obtain a picture which resembles the original Collegerama view (video left).
Figure 2.2 gives this layout for the Collegerama lecture in the course CT3011 Water management.

Figure 2.2: Collegerama as vodcast for YouTube (1280x720)

Vodcast production
A high quality vodcast for YouTube can be produced by following these steps:
• convert the slides into a movie file
• convert the slide movie into an HD resolution
• combine the slide movie and the lecture movie
• convert to other file formats and codecs (if beneficial)

Convert the slides into a movie file
The most important step in the production of a vodcast out of a Collegerama recording is the conversion of the slides into a movie. This can be achieved with the help of screen capturing systems such as Camtasia. These systems record an assigned part of the display screen into a movie file. By playing a Collegerama lecture, the slides can be recorded as a movie with the right timing. Figure 2.3 gives an impression of such a recording. In the actual recording of the Collegerama slides, the full display mode of Collegerama was recorded instead of the slide segment in the overall view.

Figure 2.3: Converting the slides into a movie file by recording the Collegerama slide display

Table 2.1 gives the results of the conversion of the slides into a movie file.

Table 2.1: Properties of original slides and the single movie file obtained by screen recording with Camtasia
Property | Original | Result of conversion
Number of files | 29 picture files | 1 movie file
Resolution | 1024x768 | 1024x768
Frame rate | - | 15 fps
Codec | jpg | wmv3
Duration | - | 45:09 min. (2,709 s)
Total number of frames | 29 | 15 x 2,709 = 40,635
Total storage | 6.3 MB * | 39.8 MB
Increase of file size | 1 | (39.8 / 6.3 =) 6.32
Increase of frames | 1 | (40,635 / 29 =) 1,401
Efficiency of compression | 1 | (1,401 / 6.3 =) 222
* original PowerPoint presentation was 16.1 MB (text and higher quality pictures)

Table 2.1 shows the remarkable efficiency of the video compression. The 29 frames (single pictures) are converted into more than 40,000 frames, with the file size increasing only by a factor of 6.3. This shows that the wmv3 codec is extremely efficient in compressing such still picture movies.

Convert the slide movie into an HD resolution
The slide movie has the original slide resolution (1024x768). This should be downsized to the nearest HD resolution (in this case HD720). This conversion can be executed by various video editing systems, including Camtasia. Figure 2.4 and Table 2.2 give the results of this conversion/editing step, in which the text area is additionally incorporated at the lower right hand side.

Figure 2.4: Converting the slides into a movie file by recording the Collegerama slide display

Table 2.2: Properties of converting the single movie file to HD720, with additional text area (Camtasia)
Property | Original | Result of conversion
Number of files | 1 movie file + 1 picture file | 1 movie file
Resolution | 1024x768 + 320x480 | 1280x720
Frame rate | 15 fps | 15 fps
Codec | wmv3 | wmv3
Duration | 45:09 min. (2,709 s) | 45:09 min. (2,709 s)
Total number of frames | 15 x 2,709 = 40,635 | 15 x 2,709 = 40,635
Total storage | 39.8 MB + 0.1 MB | 39.8 MB
Increase of file size | 1 | (39.8 / 39.8 =) 1.0
Increase of resolution | 1 | (1280x720 / 1024x768 =) 1.17
Efficiency of compression | 1 | (1.17 / 1.0 =) 1.17

Table 2.2 shows that the 17% increase in resolution does not result in a larger file size, due to the compression efficiency of the wmv3 codec.

Combining the slide movie and the lecture movie
Finally the slide movie should be combined with the Collegerama lecture movie.
This editing can be executed by various video editing systems, including Camtasia. Figure 2.5 and Table 2.3 give the results of this editing step.

Figure 2.5: Combining the slide movie with the lecture movie

Table 2.3: Properties of the movie files before and after the incorporation of the lecture movie (Camtasia)

| Property | Original | Result of conversion |
|---|---|---|
| Number of files | 2 movie files | 1 movie file |
| Resolution | 1280x720 + 320x240 | 1280x720 |
| Frame rate | 15 fps + 25 fps | 15 fps |
| Codec | wmv3 | wmv3 |
| Duration | 45:09 min. (2,709 s) | 45:09 min. (2,709 s) |
| Total number of frames | 15 x 2,709 = 40,635 | 15 x 2,709 = 40,635 |
| Total storage | (39.8 + 117 =) 157 MB | 87.8 MB |
| Increase of file size | 1 | (87.8 / 157 =) 0.56 |
| Increase of resolution | 1 | 1 |
| Reduction in frame rate | 1 | 1 (slide movie) resp. 15/25 = 0.6 (lecture movie) |

Table 2.3 shows, remarkably, that the resulting file size is 75% of that of the original Collegerama movie part (87.8 MB versus 117 MB), although the resolution is 12 times larger. Comparing the file size of the HD720 movie without the Collegerama movie part to the result file shows an increase of (87.8 - 39.8 =) 48.0 MB. This size reflects the cost of the dynamic movie part compared with a static picture in the upper right hand side of the vodcast.

The reduction in file size can only partly be caused by the lower frame rate (15 versus 25 fps). It demonstrates the efficient use of the wmv3 codec within the Camtasia movie editing system. The reduction in frame rate does not visibly reduce the quality of the movie. Lecture movies might therefore be recorded at a lower frame rate than the one used in Collegerama recordings.

The file size of the vodcast is even smaller than the sum of the basic ingredients (1 movie, 29 slides and 1 information picture). The produced vodcast therefore requires less streaming bandwidth or download time for distributing this Collegerama lecture, without losing information or resolution.
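The ratios discussed for Table 2.3 can be recomputed from the tabulated file sizes and resolutions. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original study, and the input values are simply those listed in Table 2.3.

```python
# Values as listed in Table 2.3 (sizes in MB, resolutions in pixels, rates in fps).
SLIDE_MOVIE_MB = 39.8     # HD720 slide movie including the text area
LECTURE_MOVIE_MB = 117.0  # original Collegerama lecture movie (320x240, 25 fps)
VODCAST_MB = 87.8         # combined vodcast (1280x720, 15 fps)


def pixels(width, height):
    """Number of pixels per frame for a given resolution."""
    return width * height


# File size of the vodcast relative to the sum of both input movies.
size_vs_inputs = VODCAST_MB / (SLIDE_MOVIE_MB + LECTURE_MOVIE_MB)
# File size of the vodcast relative to the lecture movie alone.
size_vs_lecture = VODCAST_MB / LECTURE_MOVIE_MB
# Resolution of the vodcast relative to the lecture movie.
resolution_gain = pixels(1280, 720) / pixels(320, 240)
# Frame rate of the vodcast relative to the lecture movie.
frame_rate_ratio = 15 / 25
# Extra storage attributable to the moving video part of the vodcast.
video_part_cost_mb = VODCAST_MB - SLIDE_MOVIE_MB

print(f"size vs. both inputs:  {size_vs_inputs:.2f}")
print(f"size vs. lecture only: {size_vs_lecture:.2f}")
print(f"resolution gain:       {resolution_gain:.0f}x")
print(f"frame rate ratio:      {frame_rate_ratio:.1f}")
print(f"cost of video part:    {video_part_cost_mb:.1f} MB")
```

Keeping the inputs as named constants makes it easy to repeat the comparison for another recording by replacing the three file sizes.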
Conversion to other file formats and codecs (if beneficial)

The result file might be converted to other file formats, frame rates and codecs in order to investigate the compression efficiency of alternative codecs. The results of these conversions are shown in Table 2.4. All the file formats and codecs in this table are accepted by YouTube.

Table 2.4: Collegerama vodcast (HD720, 45:09 min.) in alternative specifications (Camtasia)

| File format | Video codec | Frame rate (fps) | File size (MB) | File size (ratio) |
|---|---|---|---|---|
| wmv | wmv3 | 15 | 87.8 | 1.0 |
| mp4 | H264 | 10 | 507 | 5.8 |
| mp4 | H264 | 24 | 828 | 9.4 |
| flv | On2 VP6 | 15 | 465 | 5.3 |
| f4v | H264 | 15 | 408 | 4.6 |

Table 2.4 shows that the wmv3 codec has a superior compression over the other codecs, including the H264 codec as used in the Blu-ray mp4 specification. Apparently the H264 encoding in mp4 is quite sensitive to frame rate, which is remarkable in view of the relatively still movie (the slides make up the largest part of the picture). The new Flash codec (H264 in f4v) is slightly more efficient than the older On2 codec, but both are more efficient than the H264 codec in mp4. The results show that the wmv3 codec is the best option for vodcast production for YouTube.

One step recording for vodcast production

The vodcast production described above is rather labor and time consuming. A more or less similar result might be obtained in a one step recording session, in which the overall Collegerama display is recorded by Camtasia. Figure 2.6 gives an impression of such a recording. Table 2.5 shows the results.

Figure 2.6: One step vodcast production for a Collegerama lecture (left: recording; right: result)

Table 2.5: Properties of original Collegerama components and the screen recording by Camtasia

| Property | Original | Result of screen recording |
|---|---|---|
| Number of files | 1 movie file, 29 slides, player | 1 movie file |
| Resolution | 320x240 + 640x480 + .. | 972x480 |
| Frame rate | 25 fps + 0.01 fps | 15 fps |
| Codec | wmv3 | wmv3 |
| Duration | 45:09 min. (2,709 s) | 45:09 min. (2,709 s) |
| Total number of frames | 25 x 2,709 = 67,725 | 15 x 2,709 = 40,635 |
| Total storage | (117 + 6.3 =) 123 MB | 65.1 MB |
| Increase of file size | 1 | (65.1 / 123 =) 0.53 |
| Increase of resolution | 1 | 1 |
| Reduction in frame rate | 1 | 15/25 = 0.6 |

The results of Table 2.5 are comparable to the results of Table 2.3. The file size of the vodcast is around (65.1/117 =) 56% of the original Collegerama movie, although the resolution is (972*480/(320*240) =) 6.1 times larger. The produced vodcast resembles the view of the Collegerama viewer, which might be more recognizable for the viewer. However, the Collegerama navigation buttons for the movie part are shown but are not functional.

The production method is rather efficient since only one recording/editing session is required. The resolution of the directly recorded vodcast does not match the standard sizes for HD movies. Moreover, the resolution of the slides is smaller than what is available at the Collegerama server. These drawbacks might be overcome by modifying the slide display size of the Collegerama player and recording the enlarged display. In this modification, the movie section of the Collegerama display might also be moved to the upper right hand side, in accordance with the vodcast of Figure 2.4.

Uploading to YouTube

The produced vodcasts can be uploaded to YouTube if the YouTube account allows for uploading movies of unlimited length, as is the case for YouTube Edu accounts. Since TU Delft does not have such an account, a normal user account has been used in order to investigate the uploading and storage facilities at the YouTube server. Such an account is limited to a maximum movie length of 10 minutes. Table 2.6 shows the results for uploading the (slightly modified) vodcast of Table 2.3.

Table 2.6: Properties of vodcasts uploaded to and streamed/downloaded from YouTube

| Property | Uploaded movie | Stream/download Non-HD | Stream/download HD |
|---|---|---|---|
| Container type | wmv | flv | mp4 |
| Duration | 10:28 min. | 10:28 min. | 10:28 min. |
| File size | 19.5 MB | 15.0 MB | 70.4 MB |
| Video | 1280x720, 15 fps, wmv3 | 640x360, 15 fps, H264 | 1280x720, 15 fps, H264 |
| Audio | mono, 44.1 kHz, wma2 | stereo, 22.1 kHz, mp4a (AAC-SBR) | stereo, 44.1 kHz, mp4a (AAC-SBR) |
| Displayed on YouTube (in web player) | - | 854x480 | 854x480 |
| Increase of file size | 1 | (15.0/19.5 =) 0.77 | (70.4/19.5 =) 3.61 |
| Increase of resolution | 1 | (640*360/(1280*720) =) 0.26 | 1.0 |
| Increase in compression efficiency | 1 | (0.26/0.77 =) 0.33 | (1.0/3.61 =) 0.28 |

(Source: http://www.youtube.com/watch?v=9na2hHJmmvE)

Table 2.6 shows that YouTube converts the uploaded movie into its own file formats and movie quality. YouTube stores and/or streams the uploaded vodcast in 2 file types: flv for "normal quality" and mp4 for "HD quality". Both quality streams are displayed in the same sized player. The HD quality becomes more relevant when displaying in full-screen mode. YouTube uses the H264 codec for both qualities. This codec is around 3 times less efficient than the uploaded wmv3 codec. Nevertheless, YouTube uses this less efficient codec as it is playable within the Flash player, which has the highest installation coverage on viewers' PCs.

Similar results can be obtained by uploading the vodcast of Table 2.5. These results are shown in Table 2.7.

Table 2.7: Properties of vodcasts uploaded to and streamed/downloaded from YouTube

| Property | Uploaded movie | Stream/download Non-HD | Stream/download HD |
|---|---|---|---|
| Container type | wmv | flv | flv |
| Duration | 10:26 min. | 10:26 min. | 10:26 min. |
| File size | (65.1*626/2710 =) 15.0 MB | 14.2 MB | 33.9 MB |
| Video | 972x480, 15 fps, wmv3 | 640x320, 15 fps, H264 | 864x432, 15 fps, H264 |
| Audio | mono, 44.1 kHz, wma2 | stereo, 22.1 kHz, mp4a (AAC-SBR) | stereo, 44.1 kHz, mp4a (AAC-SBR) |
| Displayed on YouTube (in web player) | - | 640x315 | 854x421 |
| Increase of file size | 1 | (14.2/15.0 =) 0.95 | (33.9/15.0 =) 2.26 |
| Increase of resolution | 1 | (640*320/(972*480) =) 0.44 | (864*432/(972*480) =) 0.8 |
| Increase in compression efficiency | 1 | (0.44/0.95 =) 0.46 | (0.8/2.26 =) 0.35 |

(Source: http://www.youtube.com/watch?v=otGN0NUYs5w)

Table 2.7 also shows that YouTube converts the uploaded movie into its own file format and movie quality. The uploaded resolution is downsized into 2 standard sizes, either with a width of 640 pixels (normal quality) or a width of 864 pixels (HD quality). The heights are in accordance with the original aspect ratio. In this case YouTube stores and/or streams the uploaded vodcast only in flv format, both for "normal quality" and for "HD quality". The mp4 container file is apparently used only for HD720 files and higher. The different quality streams are displayed in different sized players. YouTube uses the H264 codec for both qualities. This codec is about 2-3 times less efficient than the uploaded wmv3 codec, which is a slightly better result than for the previous upload.

Downloading of vodcasts from YouTube

For recorded lectures, a download of the Collegerama vodcast might be beneficial to students for use in places without Internet access, such as trains, beaches, parks etc. YouTube only provides streaming videos. Downloading these videos is not offered by YouTube itself, but YouTube movies can easily be downloaded with the help of third party tools and/or websites. Table 2.8 gives the result of the download via http://www.youtubedownload.nl. This website uses the YouTube URL as input and modifies it into a URL for direct playback with save options from the original YouTube server (http://v6.lscache1.c.youtube.com/).

Table 2.8: Properties of downloaded vodcast via youtubedownload.nl

| Property | flv | mp4 | 3gp |
|---|---|---|---|
| Container type | flv | mp4 | 3gp |
| Duration | 10:28 min. | 10:28 min. | 10:28 min. |
| File size | 23.3 MB | 25.8 MB | 4.3 MB |
| Video | 640x360, 15 fps, flv1 | 480x270, 15 fps, H264 | 176x144, 12 fps, mp4v |
| Audio | stereo, 22.1 kHz, mp3 | stereo, 44.1 kHz, mp4a (AAC-SBR) | mono, 22.2 kHz, AAC |
| Increase of file size* | (23.3/19.5 =) 1.19 | (25.8/19.5 =) 1.32 | (4.3/19.5 =) 0.22 |
| Increase of resolution* | (640*360/(1280*720) =) 0.26 | (480*270/(1280*720) =) 0.14 | (176*144/(1280*720) =) 0.03 |
| Increase in compression efficiency* | (0.26/1.19 =) 0.21 | (0.14/1.32 =) 0.10 | (0.03/0.22 =) 0.14 |

* compared to the original wmv upload
(Source: http://www.youtube.com/watch?v=9na2hHJmmvE and http://www.youtubedownload.nl)

The results of Table 2.8 show that this download service is focused on the "normal quality" videos of YouTube, in view of the smaller file sizes. Apparently YouTube files can also be presented in the 3gp file format, which is most suitable for mobile phones. The efficiency of the codecs used in these downloads is 5-10 times lower than that of the original wmv3 codec. The downloaded flv file is around (23.3/15.0 =) 1.6 times larger than the directly obtained non-HD movie with similar technical specifications. This download service offers simplicity of use and flexible file formats, but no efficient downloads.

Alternative YouTube download services are:
• Moyea FLV Downloader
• http://www.downloadyoutubevideos.com/
• http://www.viddownloader.com
• and many more

YouTube does not encourage the direct download of its movies, as this reduces the viewing rate and thereby the advertisement revenues. In February 2009, YouTube announced a test service allowing some partners to offer video downloads for free or for a fee paid through Google Checkout. It is very likely that this will also be part of the YouTube Edu channel.
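The "increase in compression efficiency" rows in Tables 2.6 to 2.8 all follow the same recipe: the resolution ratio divided by the file size ratio, both relative to the uploaded wmv movie. The sketch below restates that recipe in Python for the downloads of Table 2.8. The `Movie` class and the function name are illustrative assumptions, and because the tables round intermediate ratios, the computed values can differ slightly from the tabulated ones.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    name: str
    size_mb: float
    width: int
    height: int


def relative_efficiency(movie, reference):
    """Resolution ratio divided by file size ratio, relative to `reference`.

    A value below 1 means the movie needs more storage per pixel than
    the reference movie of the same duration.
    """
    size_ratio = movie.size_mb / reference.size_mb
    resolution_ratio = (movie.width * movie.height) / (
        reference.width * reference.height
    )
    return resolution_ratio / size_ratio


# Uploaded wmv movie (Table 2.6) and the three downloads (Table 2.8).
upload = Movie("wmv upload", 19.5, 1280, 720)
downloads = [
    Movie("flv", 23.3, 640, 360),
    Movie("mp4", 25.8, 480, 270),
    Movie("3gp", 4.3, 176, 144),
]

efficiencies = {m.name: relative_efficiency(m, upload) for m in downloads}
for name, value in efficiencies.items():
    print(f"{name}: {value:.2f} (about {1 / value:.0f}x less efficient)")
```

This metric ignores differences in frame rate and audio track, so it is only a rough indicator, in the same spirit as the ratios in the tables.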
Conclusions and recommendations for vodcasts on YouTube

The production of Collegerama vodcasts has resulted in the following conclusions:
• a high quality vodcast can be produced at HD720 specifications with a file size which is even smaller than the original Collegerama video recording
• the wmv3 codec is much more efficient for the recorded lectures than the H264 or flv1 codec

The upload of these vodcasts to YouTube has resulted in the following conclusions:
• the high quality vodcasts can be uploaded to and displayed from YouTube without loss of quality
• the video codecs used by YouTube are less efficient than the wmv3 codec, resulting in larger download files if downloaded from YouTube
• uploading Collegerama lectures to YouTube requires a YouTube Edu account, which allows for uploading movies over 10 minutes

3. Collegerama for iTunes

What is iTunes?

iTunes is an application that allows the user to manage audio and video on a personal computer, acting as a front-end for Apple's QuickTime media player. Officially, iTunes is required in order to manage the audio of an Apple iPod portable audio player, although alternative software does exist. Users can organize their music into playlists within one or more libraries, edit file information, record Compact Discs, copy files to a digital audio player, purchase music and videos through its built-in music store, download free podcasts, back up songs onto a CD or DVD, run a visualizer to display graphical effects in time to the music, and encode music into a number of different audio formats. There is also a large selection of free internet radio stations to listen to.

Version 4.9 of iTunes, released on June 28, 2005, added built-in support for podcasting. It allows users to subscribe to podcasts for free in the iTunes Music Store or by entering the RSS feed URL. Once subscribed, the podcast can be set to download automatically.
Users can choose to update podcasts weekly, daily, hourly, or manually. Users can select podcasts to listen to from the Podcast Directory, to which anyone can submit their podcast for placement. The front page of the directory displays high-profile podcasts from commercial broadcasters and independent podcasters. It also allows users to browse the podcasts by category or popularity, and to submit new podcasts to the directory.

Video content available from the store used to be encoded as 540 kbit/s Protected MPEG-4 video (H.264) with an approximately 128 kbit/s AAC audio track. Many videos and video podcasts currently require the latest version of QuickTime, QuickTime 7, which is incompatible with older versions of Mac OS (only v10.3.9 and later are supported). On September 12, 2006, the resolution of video content sold on the iTunes Store was increased from 320x240 (QVGA) to 640x480 (VGA). The higher resolution video content is encoded as 1.5 Mbit/s (minimum) Protected MPEG-4 video (H.264) with a minimum 128 kbit/s AAC audio track.

Video formats for iTunes

The main focus of iTunes is to distribute content to the Apple iPod and its successors. The iPod was not provided with a screen for movie display until October 2005. The iPod Nano got a movie display in September 2007. The screen sizes of the iPod family are shown in Table 3.1.

Table 3.1: Screen sizes of the iPod and its successors

| Type | Introduction date | Screen size | Aspect ratio |
|---|---|---|---|
| iPod video | October 2005 | 320 x 240 | 1.33 (4:3) |
| iPhone | June 2007 | 480 x 320 | 1.5 (3:2) |
| iPod Touch | September 2007 | 480 x 320 | 1.5 (3:2) |
| iPod Nano | September 2007 | 320 x 240 | 1.33 (4:3) |
| iPod Nano | September 2009 | 376 x 240 | 1.57 |
| Supported video (external screen) | - | 640 x 480 | 1.33 (4:3) |

The iPod family has developed into larger screen sizes and wider screens (higher aspect ratio).
If the iPhone aspect ratio is compared to the HD widescreen ratio used today, the iPhone is somewhere in between the traditional TV standard and HD widescreen. All iPods support a video display of maximum 640x480 by use of an external screen. It is expected that future iPhone models will at least support the HD 720p and 1080i output modes for external display with a 16:9 aspect ratio. As widescreen HD video has more or less become the standard, it looks like Apple will also move to larger video displays with HD specifications.

iPod constraints for Collegerama vodcasts

For the development of a Collegerama vodcast for iTunes (and iPods), the following aspects are of major concern:
• the rather low resolution of the screen
• the different aspect ratio

These constraints have consequences for the following design aspects:
• size of display
• size of the video component
• location of the video component (upper/lower/left/right corner)

Low resolution

The resolution of the iPod is the same as that of the Collegerama video component. This would allow for simply distributing this video component as a vodcast, leaving out the presentation slides. Such a setup is used at MIT and many other universities. However, the slides provide most of the visual information in a Collegerama recording and therefore form an important part of a Collegerama vodcast.

In an alternative setup, the vodcast might include the slide part of Collegerama with the audio of the video component. This is only an adequate alternative if the slides are readable at this low resolution. Figure 3.1 gives an example of a typical PowerPoint slide at iPod resolution.

Figure 3.1: A typical PowerPoint slide at iPod resolution (320x240)

Figure 3.1 shows that the smaller fonts in a presentation are no longer readable at iPod resolution, but the typical PowerPoint fonts can still be read quite well.
The iPod resolution is around (320/1024 =) 30% of the maximum slide size in Collegerama and (320/640 =) 50% of the slide size in an overall Collegerama display.

Different aspect ratio

The iPod aspect ratio is the same as that of both the slide and the movie components in Collegerama (4:3). Therefore, combining these two components side by side in a widescreen view, as in the previous YouTube vodcast, is not possible. Alternative solutions are:
• the slide component is not included (video only)
• the video component is not included (audio only)
• the video component is included at a rather small size (picture-in-picture)
• the video component is included at a rather small size (side-by-side) with unequal scaling

Figure 3.2 gives an impression of the latter three options.

Figure 3.2: Collegerama vodcasts including slides with different options for the video component, at iPod aspect ratio

From Figure 3.2, it is concluded that the most convenient option for including the movie component is the picture-in-picture layout. This is based on the following considerations:
• the slides should be shown at maximum size for proper readability (no side-by-side)
• the movie component can be reduced to a rather small size (thumbnail) while still remaining properly visible
• including the audio component without the video component leaves the viewer without a focus point (the movements of the lecturer give a better understanding of the lecture)

Size of display

An important aspect in the design of a vodcast for iTunes is the display resolution selected for the production and for the distribution.
The design strategy of creating the smallest file size looks most promising in this case, for the following reasons:
• vodcasts for iTunes have to be downloaded to and stored on the iPod of the viewer (download time and storage capacity are relevant factors here, which is not the case in a streaming video setup)
• small sized vodcasts will minimize the requests for other small sized output options like podcasts (audio only), which would require additional production and distribution efforts (time, costs, organization)
• a small sized design gives a larger differentiation from the YouTube HD quality design
• iTunes uses the H264 codec, which is not as efficient in video compression as the wmv3 codec used in the YouTube design, so a smaller display size is more relevant for a less efficient compression
• the smallest display design allows for viewing on the older iPods, which are still the majority of the iPods currently in use

For the above mentioned reasons, a vodcast for iTunes will be produced with a display size of 320x240 pixels.

Size of video component

The video in Collegerama shows the lecturer talking to the attendees. For such a talk, a very small size is sufficient for viewing, as the most important aspect of such a movie is its audio component (spoken words). This is shown in Figure 3.3, in which the original video resolution (320x240) is downsized in steps to 10% of its original size.

Figure 3.3: Collegerama video in original size (320x240), and reduced to 30, 20 and 10%

Figure 3.3 shows that downsizing the Collegerama video to 20% (64x48) still gives sufficient visibility of a speaking lecturer. However, in some recorded lectures the lecturer is writing text on the blackboard or is presenting experiments. Both situations require a larger display size for proper viewing. For these situations, a full switch from the slide view to the video component might be suggested.
However, this will require an extensive video editing procedure which might also require the input of the lecturer. This is not within the scope of a vodcast production out of a Collegerama recording, which should be possible within a fully automated production process.

The video component in a picture-in-picture design with the slides in the background will cover part of the slides, reducing their readability. This can be minimized by doing the following:
• selecting a very small video component (10-20%)
• making the video component (partly) transparent, still allowing for a background view (this setup might allow for a larger video size than a non-transparent movie, 20-30% instead of 10-20%)
• placing the video component in an area with the lowest disturbance of the slide view

Location of the video component

The video component should be located on the least disturbing part of the slide. Figure 3.4 gives an impression of these locations for a TU Delft PowerPoint slide at iPod resolution.

Figure 3.4: PowerPoint slide in TU design at iPod size, without and with inserted movie components (20%)

Figure 3.4 shows that the upper left corner and the lower right corner are unsuitable for movie insertion. The upper left corner hides the important slide title, while the lower right corner hides the slide number. The lower left corner hides the TU Delft logo and the upper right corner might hide part of the slide title. Both of these locations are regarded as acceptable. The lower left corner might have a small advantage since it resembles the general lecture room layout at TU Delft, in which the lecture desk is at the left front and the projection screen is located either in the upper center of the lecture room or to the upper right. This lecture room layout results in many Collegerama recordings showing the lecturer looking to his/her upper left.
With a movie component in the upper right corner, the lecturer often seems to look into the "sky". It should be noted here that not all lecturers use the standard TU Delft PowerPoint design. However, if lecturers are aware that their Collegerama recording is transformed into an iPod vodcast, they might adjust their slides to keep a certain corner of the slide empty. Therefore a uniform, predefined position of the movie component is important.

Components of a Collegerama vodcast for iTunes

Small file sizes can also be obtained by excluding parts of the Collegerama recording. To determine the relative importance of Collegerama components, vodcasts have been produced with different components, such as:
• slides with audio (no video component)
• slides with audio and subtitles (no video component)
• slides with audio and video
• slides with audio, video and subtitles

These vodcasts have been compared to other output forms of a lecture, such as:
• audio only (podcast)
• slides only (pdf)

The vodcasts and other output forms have been uploaded to the TU Delft E-learning system (BlackBoard) for evaluation by TU Delft employees. These output forms are produced at the video resolution of the original slides (1024x768) with Microsoft codecs (wma, wmv). The selected video resolution is larger than supported by iTunes and iPod. This video resolution has been selected to enable the evaluation at larger screen sizes. The vodcasts have been produced for the initial 10:28 minutes of the lecture to limit the production time.

Vodcast file size

Table 3.2 gives an overview of the file sizes for the different output forms. In this table the file sizes have been extrapolated to a full lecture of 45:00 minutes by assuming a linear relation with duration.
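The linear extrapolation mentioned above amounts to scaling each measured file size by the ratio of the two durations (45:00 over 10:28). A minimal Python sketch follows; the function names are illustrative, and the measured sizes are a subset of the 10:28 minute values reported in Table 3.2.

```python
def to_seconds(mmss):
    """Convert a 'mm:ss' duration string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def extrapolate(size_mb, measured="10:28", target="45:00"):
    """Scale a file size linearly from the measured to the target duration."""
    return size_mb * to_seconds(target) / to_seconds(measured)


# Measured file sizes (MB) for the first 10:28 minutes of the lecture.
measured_mb = {
    "slides + audio + subtitles": 10.7,
    "slides + audio + video": 14.3,
    "slides + audio + subtitles + video": 15.3,
    "audio only (48 kbps)": 3.7,
    "audio only (32 kbps)": 2.4,
}

estimated_mb = {name: round(extrapolate(size), 1) for name, size in measured_mb.items()}
for name, size in estimated_mb.items():
    print(f"{name}: {size} MB for 45:00 min.")
```

The linear assumption fits output forms whose bit rate is roughly constant over the lecture; it would not apply to a slides-only pdf, whose size depends on the number of slides rather than on duration.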
Table 3.2: File size for different Collegerama output files

| Type | File format | File size 10:28 min. (MB) | File size 45:00 min. (MB) |
|---|---|---|---|
| Slides + audio | wmv | 9.7 | 38.7 |
| Slides + audio + subtitles | wmv | 10.7 | 46.0 |
| Slides + audio + video | wmv | 14.3 | 61.5 |
| Slides + audio + subtitles + video | wmv | 15.3 | 65.8 |
| Audio only (podcast 48 kbps) | wma | 3.7 | 15.9 |
| Audio only (podcast low quality 32 kbps) | wma | 2.4 | 10.3 |
| Slides only | pdf/jpg | 0.1 | 2.9 |

Slides: size 1024x768 - frame rate 15 fps
Audio: mono - 44.1 kHz - 48 kb/s
Video: size 205x154 (20%) - frame rate 15 fps (picture-in-picture)

The results of Table 3.2 show that Collegerama vodcasts with slides are 40 to 65 MB. All of these vodcasts have the same resolution, the same frame rate, and the same audio component. The file size becomes larger when the vodcast contains parts with more movement. The wmv3 codec is most efficient in its compression of slideshows (picture movies). That is why the vodcast with slides and audio is only roughly double the size of the podcast (audio only) with similar audio specifications. Moreover, including a small sized video in the corner of the slide increases the file size much less than might be expected.

A podcast (audio only) of the lecture requires some 10-15 MB, with the smaller size for the lower audio quality.

The vodcasts with slides are smaller than the vodcast of Table 2.4 (87.8 MB). The latter is produced at a higher resolution of both the slides and the video component.

Vodcast information transfer efficiency

The different output forms give different results in 'understanding of the lecture'. Slides only (as pdf) give a proper impression of the content but will not allow for understanding the lecture. Adding the audio improves this understanding. Table 3.3 gives a score for the relative information transfer of the different output forms, in which a normal Collegerama view is set to 100%.
The ratio of file size over relative information transfer gives the information transfer efficiency per output form.

Table 3.3: Relative score for understanding the lecture for different output files, and the information transfer efficiency

| Type | File size (MB) | Relative information transfer (%) | Information transfer efficiency (MB / %) |
|---|---|---|---|
| Collegerama recording | Stream | 100 | - |
| Slides + audio | 38.7 | 70 - 90 | 0.5 |
| Slides + audio + subtitles | 46.0 | 75 - 95 | 0.5 |
| Slides + audio + video | 61.5 | 90 - 100 | 0.6 |
| Slides + audio + subtitles + video | 65.8 | 95 - 105 | 0.6 |
| Audio only (podcast 48 kbps) | 15.9 | 20 - 30 | 0.6 |
| Audio only (podcast low quality 32 kbps) | 10.3 | 20 - 30 | 0.4 |
| Slides only | 2.9 | 5 - 10 | 0.4 |

Table 3.3 shows that slides and audio are the most essential parts of a vodcast for adequate information transfer. Audio only (as in a podcast) is less suitable or not suitable at all for this type of lecture, in which a lot of information is shown on the slides in the form of pictures.

The video component in the corner of the slide area improves the information transfer without significantly reducing the transfer efficiency. It is therefore concluded that a vodcast for iTunes should include the video component (at a small size).

Lectures without slides will have a larger information transfer in a podcast (audio only) than observed in Table 3.3. Information transfer within the 80-90% range might be obtained for some lectures, as seen in MIT OpenCourseWare, in which the lecturer gives a talk without using slides or the blackboard.

A slide only file is quite useful despite its low information transfer. Most often it gives the title of the lecture on the initial slide, the structure of the lecture on the content slide and the highlights of the lecture in the titles of the slides themselves. The slide only files have the best performance in information efficiency (small ratio of size over information transfer).
Moreover, slide-only files are easy for the viewer to navigate. From these observations it is concluded that distributing the presentation slides (as a PDF file) is always an adequate distribution feature, even in combination with a vodcast.

Vodcast production

For the production of a vodcast for iTunes the following production strategies can be evaluated:
• tailored vodcast production process for iTunes/iPod
• vodcast production out of the YouTube vodcast

Tailored vodcast production for iTunes/iPod

The tailored vodcast for iTunes/iPod might be produced in a similar way as described in the chapter on a vodcast for YouTube. This vodcast can be produced in the following steps:
• convert the slides into a movie file (same as for YouTube)
• combine the slide movie and the lecture movie, the latter in reduced resolution
• convert the produced vodcast into iPod resolution (320x240) and iPod codec (mp4)

The result of the second step has been published in BlackBoard and uploaded to YouTube for general evaluation. BlackBoard enables display of wmv files in Windows Media Player at iPod resolution (and others). YouTube enables viewing at iPod resolution by embedding. The results of both display modes are shown in BlackBoard at: http://blackboard.tudelft.nl/webapps/blackboard/content/listContent.jsp?mode=reset&course_id=_13432_1&content_id=_1085686_1#_1092564_1

Figure 3.5: Collegerama vodcast at iPod resolution (Source: http://www.youtube.com/watch?v=5Qqe6XxbvS4)

For distribution on iTunes, the resolution should be downsized to 640x480 and might be downsized to 320x240 for the smallest download size. Table 3.4 gives the results for such conversion.

Table 3.4: Properties of vodcasts for iTunes/iPod

Property | Vodcast at slide resolution | Vodcast at iPod resolution (wmv) | Vodcast at iPod resolution (mp4)
Container type | wmv | wmv | mp4
Duration | 10:28 min. | 10:28 min. | 10:28 min.
File size | 14.3 MB | 9.1 MB | 31.3 MB
Video | 1024x768, 15 fps, wmv3 | 320x240, 15 fps, wmv3 | 320x240, 15 fps, H264
Audio | mono, 44.1 kHz, wma2 | mono, 22.1 kHz, wma2 | mono, 22.1 kHz, mp4a (AAC-SBR)
Increase of file size | 1 | (9.1/14.3 =) 0.64 | (31.3/14.3 =) 2.2
Increase of resolution | 1 | (320*240/(1024*768) =) 0.10 | (320*240/(1024*768) =) 0.10
Increase in compression efficiency | 1 | (0.10/0.64 =) 0.16 | (0.10/2.2 =) 0.05

Table 3.4 shows that downsizing to iPod resolution with the wmv3 codec reduces the file size, but far less than the reduction in resolution. At this smaller resolution the compression efficiency is lower. Transforming to an mp4 file with the H264 codec enlarges the file size by a factor of (31.3/9.1 =) 3.4. This again shows the inferior compression of the H264 codec compared to the wmv3 codec.

Vodcast production from the HD vodcast

An iTunes/iPod vodcast might alternatively be produced from the widescreen HD vodcast for YouTube by unequal resizing. This allows for production of different resolutions and aspect ratios for iPod and for iPhone. This can be done by automatic conversion. Figure 3.6 gives the different aspect ratios of these vodcasts. Table 3.5 gives the results of this conversion.

Figure 3.6: Change in aspect ratio by resizing from HD widescreen resolution to iPhone and iPod resolution

Table 3.5: Properties of vodcasts for iTunes/iPod

Property | Vodcast at HQ resolution | Vodcast at iPhone resolution | Vodcast at iPod resolution
Container type | wmv | m4v | m4v
Duration | 10:28 min. | 10:28 min. | 10:28 min.
File size | 19.5 MB | 19.0 MB | 15.5 MB
Video | 1280x720 (16:9), 15 fps, wmv3 | 480x320 (3:2), 10 fps, H264 | 320x240 (4:3), 10 fps, H264
Audio | mono, 44.1 kHz, wma2 | stereo, 44.1 kHz, mp4a (AAC-SBR) | mono, 44.1 kHz, mp4a (AAC-SBR)
Increase of file size | 1 | (19.0/19.5 =) 0.97 | (15.5/19.5 =) 0.79
Increase of resolution | 1 | (480*320/(1280*720) =) 0.17 | (320*240/(1280*720) =) 0.083
Increase in compression efficiency | 1 | (0.17/0.97 =) 0.18 | (0.083/0.79 =) 0.11

Table 3.5 shows that the reduction in resolution hardly reduces the file size. This is caused by the inferior compression efficiency of the H264 codec compared to the wmv3 codec. The loss in compression efficiency is larger at smaller resolution. This observation is in line with the conclusion drawn from Table 3.4. The file size of the iPod vodcast is smaller than the one produced for Table 3.2. The lower frame rate alone probably cannot fully explain this difference.

Vodcast production by TU Delft

The department of TU Delft that is responsible for the Collegerama facilities has produced two vodcasts out of the recordings of two lectures of the same course (CT3011 – Inleiding Watermanagement). Based on the results of this study, which were presented on BlackBoard for evaluation, they decided to produce a vodcast at iPod resolution in which the video part was included as a transparent movie section. Figure 3.7 gives an impression of this layout. The properties of the vodcasts are given in Table 3.6.

Figure 3.7: Vodcast for iPod resolution as produced by the Collegerama department

Table 3.6: Properties of vodcasts for iPod as produced by the Collegerama department

Property | CT3011 Lecture 3a | CT3011 Lecture 3b
Container type | mp4 | mp4
Duration | 45:56 min. | 39:18 min.
File size | 123 MB | 106 MB
Video | 320x240 (4:3), 15 fps, H264, video 105x78 (33%) | 320x240 (4:3), 15 fps, H264, video 105x78 (33%)
Audio | stereo, 48.0 kHz, mp4a (AAC-SBR) | stereo, 48.0 kHz, mp4a (AAC-SBR)

Table 3.6 shows that the average file size for lectures with a duration of 40-45 minutes is 105-125 MB.
This file size will require a relatively large bandwidth for fast downloading. These file sizes are even larger than the HD vodcast for YouTube as presented in Table 2.2 (90 MB for a 1280x720 vodcast). As presented in Table 2.3, this is largely caused by the inferior compression of the H264 codec compared to the wmv3 codec.

The vodcasts of Table 3.6 have a proper visibility for the slides. The visibility of the movie component is less than for a non-transparent display despite its larger size (33% versus 20%). The part of the slides that is covered by the movie component has about 50% readability. However, the overall view resembles a more modern iPod view. Whether a larger transparent movie section is better than a smaller non-transparent section seems to be a matter of taste.

The vodcasts again confirm the conclusion in the paragraph on Components in a vodcast. Audio is very important, the readability of the slides should be good and the video component is of minor importance for this type of lecture.

Uploading to and downloading from iTunes

Uploading to and downloading from iTunes could not be tested yet. The handling of TU Delft's request for an iTunes U account is still in progress.

Conclusions and recommendations for vodcasts on iTunes

The production of Collegerama vodcasts for iTunes has resulted in the following conclusions:
• an iPod vodcast can be produced out of a Collegerama recording with sufficient visibility and readability despite the low resolution (320x240)
• a picture-in-picture layout is preferred over a side-by-side layout because of the larger display size of the slides and therefore better readability
• lecturers might anticipate this picture-in-picture vodcast in their slide design
• the wmv3 codec is much more efficient for the recorded lectures than the H264 or flv1 codec

4. Evaluation iTunes versus YouTube

Comparing the iTunes world (vodcasts and iTunes functionality) with the YouTube world gives the following observations:
• YouTube allows for distributing HD vodcasts at Collegerama resolution; for iTunes, lower quality vodcasts should be produced
• YouTube gives much more functionality than iTunes (subscripts, automatic translation of subtitles, annotation, flexibility in viewing)
• iTunes is easier for downloading, but this drawback can be overcome for YouTube by using third party download tools and by the introduction of the announced download option for YouTube channels
• by using third party download tools for YouTube, users are able to download vodcasts at the requested resolution, file format etc. (mp4, flv, 3gp)

YouTube is the preferred channel for distributing Collegerama vodcasts. Additionally, distributing Collegerama vodcasts via iTunes is more focused on marketing. This channel might require tailored vodcasts because of the limitations in screen sizes for iPod and iPhone. It is expected that future iPods and iPhones (or their successors) will allow for displaying larger resolutions, probably at HD quality.

Alternative download options

Downloads for Collegerama vodcasts could also be offered as part of a BlackBoard course, or within the courses available in TU Delft OpenCourseWare.

Future developments of Collegerama

It might be assumed that future releases of the Collegerama server (Mediasite of Sonic Foundry) will include options for downloads produced online at a selected resolution.

Annex D. Subtitling of Collegerama

1. Subtitles on digital media
   Why subtitles?
   Subtitle types
   Developments
   From image to text
   Subtitle formats
   How to create subtitles
2. Subtitling for recorded lectures
   Selecting of an example lecture
   Creating subtitles for the example lecture
   Subtitling in YouTube
3. Translation of subtitles for recorded lectures
   YouTube
Annex D1. Transcript lecture CT3011 (unsorted)
Annex D2. Transcript lecture CT3011 (sorted)
Annex D3. Partial transcript lecture CT3011 (incl. time frames / sentence)
Annex D4. Partial transcript lecture CT3011 (incl. time frames / word)

1. Subtitles on digital media

Why subtitles?

There are several benefits to adding subtitles to Collegerama lectures:
• lecture is easier to follow
• lecture is available to foreign speaking students
• lectures can be made searchable

Lecture is easier to follow

If a lecture contains subtitles during playback, deaf people and people with a hearing problem will be able to understand what is being said.
These special subtitles for the hearing impaired are called "closed captions", sometimes also referred to as "subtitles for the hard of hearing". The term "closed" in closed captioning indicates that not all viewers see the captions, only those who choose to decode or activate them. This distinguishes them from "open captions" (sometimes called "burned-in" or "hardcoded" captions), which are visible to all viewers.

Most of the world does not distinguish captions from subtitles. In the United States and Canada, however, these terms do have different meanings: "subtitles" assume the viewer can hear but cannot understand the language or accent, or that the speech is not entirely clear, so they only transcribe dialogue and some on-screen text. "Captions" aim to describe all significant audio content (spoken dialogue and non-speech information such as the identity of speakers and, occasionally, their manner of speaking), along with music or sound effects, using words or symbols.

Lecture is available to foreign speaking students

Subtitles are generally used to display the spoken words in a video on the screen. For every different language, a new subtitle track has to be created. Most DVD movies that are released in Europe contain at least subtitle tracks for German, French and English. During production these subtitles are mostly created by hand by professional translators.

An alternative for generating different subtitle tracks is to use an automated computer system. An example of such a service that is publicly available is Google Translate. It is a beta service provided by Google Inc. to translate a section of text, or a webpage, into another language. At the time of writing the system supports 52 different languages from around the world. Like other automatic translation tools, it has its limitations. While it can help the reader to understand the general content of a foreign language text, it does not always deliver accurate translations.
Some languages produce better results than others.

Lectures can be made searchable

Every Collegerama lecture consists of a single video stream. Without some sort of indexing system, the only thing offered is a 45 minute long video with no possibility of skipping to the parts that are relevant to a certain topic. There are several methods for indexing:
• transcript of spoken text
• time stamped transcripts of spoken text (subtitles)
• tagging

Transcript of spoken text

A transcript is a written record (usually typewritten) of dictated or recorded speech. When this is available for a certain movie, a search engine can be used to look through its content to see if a certain search term is mentioned somewhere. The problem with having just the spoken text is that there is no way of knowing at which point in time the word was actually spoken.

Time stamped transcripts of spoken text

Subtitles or time stamped transcripts of a video serve as the foundation for making every part of the video searchable. It is possible to search for a certain keyword or term within the spoken text. Along with the search results, a reference link to a certain timestamp can be returned so that the user can fast forward the video to that part.

Tagging

In online computer systems terminology, a tag is a non-hierarchical keyword or term assigned to a piece of information (such as an internet bookmark, digital image, or computer file). This kind of metadata helps describe an item and allows it to be found again by browsing or searching. Tags are chosen informally and personally by the item's creator or by its viewer, depending on the system. For a movie, tags usually contain the title of the movie, certain topics that are discussed and possibly chapter titles of a book that is being covered. When a movie has been described by a certain number of tags, it is possible to create a tag cloud that shows the different topics that are covered.
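As an illustration of the indexing methods above, the following minimal Python sketch parses time stamped subtitle text, looks up a keyword and returns the matching timestamps, and counts word frequencies as a crude basis for tag weights. The SubRip-style sample text, the function names (parse_cues, find_term, tag_weights) and the minimum word length are assumptions chosen for this example; they are not part of Collegerama.

```python
import re
from collections import Counter

# Hypothetical sample in a SubRip-like layout: index, time range, text.
SAMPLE = """\
1
00:00:00,100 --> 00:00:03,300
Na een ruime inlooptijd

2
00:00:03,300 --> 00:00:06,800
kunnen we beginnen

3
00:00:06,800 --> 00:00:11,700
met het tweede deel van 30-11, Watermanagement.
"""

TIME_RANGE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert clock components to milliseconds."""
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_cues(text):
    """Return a list of (start_ms, end_ms, text) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip incomplete blocks
        match = TIME_RANGE.match(lines[1].strip())
        if not match:
            continue  # skip blocks without a valid time range
        parts = [int(group) for group in match.groups()]
        cues.append((to_ms(*parts[:4]), to_ms(*parts[4:]), " ".join(lines[2:])))
    return cues


def find_term(cues, term):
    """Return the start times (ms) of all cues that mention the term."""
    term = term.lower()
    return [start for start, _end, text in cues if term in text.lower()]


def tag_weights(cues, min_length=4):
    """Count words of at least min_length letters as a simple relevance weight."""
    words = re.findall(r"[^\W\d_]+", " ".join(text for _, _, text in cues).lower())
    return Counter(word for word in words if len(word) >= min_length)


cues = parse_cues(SAMPLE)
print(find_term(cues, "beginnen"))
print(tag_weights(cues).most_common(3))
```

The start times returned by find_term could be turned into links that jump to the corresponding position in the video. A real system would probably also filter stop words instead of relying on word length alone.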
By matching the terms with the number of times they are said in the video, a relevancy weight can be assigned to them. This results in a tag cloud where the size of the text reflects its relevance.

Figure 1.1: Example of a tag cloud

Subtitle types

There are three different types of subtitles available:
• hard
• pre-rendered
• soft

Table 1.1 gives an overview of the basic differences between these types.

Table 1.1: Three different subtitle types

Feature | Hard | Pre-rendered | Soft
Can be turned on/off | No | Yes | Yes
Editable | No | Difficult, but possible | Yes
Player requirements | None | Most players support DVD subs | Usually requires special software
Transitions and effects | Highest | Low | Depends on player, usually poor
Distribution | Inside video stream | Separate video stream | Small subtitle file or instructions stream
Additional overhead | None | High | Low
Example | VHS video tape / Karaoke CD movie | DVD movie | Blu-ray movie

Hard

In this form, the subtitle text is merged with the original video frames and no special software or hardware is required for playback. The best known form of these is karaoke, where complex animations such as a bouncing ball are used to follow the lyrics. The disadvantage is that these subtitles cannot be turned off unless the original video stream is also available.

Pre-rendered

These are separate video frames that are overlaid on the original video stream while playing and are used on DVDs. The general codec used for movie DVDs is called vobsub, recognizable by the .sub and .idx files on the DVD. They can be turned on or off and usually include separate streams for different languages. They are usually encoded as images with a minimal bit rate and number of colors. It is very hard to alter these subtitles, but it is possible to convert them to "soft subtitles" using software such as SubRip (using OCR technology).
Soft (also known as softsubs or closed subtitles)

Softsubs are separate instructions that usually contain a timestamp in combination with a piece of text that can be displayed during playback. They require player support and there are numerous different file formats available. They are relatively easy to create and update.

Figure 1.2: Example of a pre-rendered subtitle

Developments

Each new movie distribution technology uses a more advanced type of subtitle system. When the first video tapes came out, they were displayed on an analog system. In order to display the subtitles it was necessary to hardcode them onto the video stream. For every different language, a new video tape had to be produced. Once the video was created and the subtitles were burned into the stream, it was impossible for anyone to alter them in post production.

When the first karaoke CD movies came out, they also used a hard-coded system that burned the subtitles directly onto the video stream. This offered the advantage that transition effects such as word highlighting could be incorporated in these types of videos.

Several years later video distribution moved to a digital medium (DVD). This allowed for new possibilities, such as combining different video streams. Because of this new technology, a single DVD could be produced for all the different countries and languages, because audio streams and video streams of subtitles could be mixed together and chosen by the end user. The problem with this subtitle system is flexibility in updating the subtitles. Because the basic system is essentially the same as the VHS streams that burn the subtitles onto the video stream, they are almost impossible to alter once the subtitle stream has been produced.

In the past few years, a new digital medium called Blu-ray has been introduced and once again a new subtitle system was employed.
Because movie producers would like more flexibility in altering subtitle streams, it was decided that the streams should be created in a text-based form. They were no longer displayed by combining different video streams, but by letting a player render the subtitles as the movie is being played. The obvious disadvantage of this system is that a piece of software is required on the player side to accomplish this. However, this is easily overcome by setting standards for the creation of Blu-ray players so that they all incorporate this simple piece of rendering software.

From image to text

Most subtitles consist purely of text characters. Since text is also some of the easiest data to store and compress, it makes sense to store subtitles as simple text files or as a text stream within a video file. Although it is normal for all subtitles to start out this way, that does not mean that this is the way they are stored. As a matter of fact, subtitles on DVDs are not actually text. They are encoded as raster graphics. Much like the characters on older text-based computer interfaces, they are just a collection of dots on a grid. These images are put over the top of the video frame when displayed.

The important thing about any text-based subtitle format is that the subtitles can be edited easily. Since editing a text-based subtitle can generally be done with even a simple text editor like Notepad, they are the easiest to modify and by far the easiest to create yourself. Creating subtitles is not something most people have the inclination (or time) to do, but anyone who wants to do this will have to at least start with a text-based format.

Perhaps the biggest reason for the widespread development of text-based subtitles is their use in AVI files. While AVI files can't contain graphic subtitles, they can have text subtitles.
AVI was, and still is, the most common container for MPEG-4 ASP video encoded with codecs like DivX and XviD, and was also the format first added to DVD players for MPEG support.

In order to create text-based subtitles from an image-based format, a process called OCR or Optical Character Recognition is used. OCR software essentially attempts to 'read' the text represented by the images. The problem is that there can be big differences between two different images of the same character. Differences in fonts and spacing make it nearly impossible for even the most sophisticated OCR software to identify every character correctly.

Subtitle formats

There are countless different formats for displaying and storing subtitles. This is mainly caused by different movie producers, video player makers and other software development companies who are all trying to create the best technology that will become the standard format. Some of these are more popular and more widely used than others, but each format can be placed in one of four categories:
• image based subtitles
• text based subtitles
• HTML-based subtitles
• XML-based subtitles

Table 1.2 shows a select number of examples for each category, along with a few characteristics that are typical for that format.

Table 1.2: Different subtitle formats

Category | Name | Extension | Text styling | Metadata
Image based | VobSub | .sub + .idx | N/A | N/A
Image based | XSUB | embedded | N/A | N/A
Text based | SubRip | .srt | No | No
Text based | SubViewer | .sub | No | Yes
Text based | AQTitle | .aqt | No | No
Text based | JACOSub | .jss | Yes | No
Text based | MicroDVD | .sub | No | No
Text based | MPSub | .sub | No | Yes
Text based | Ogg Writ | embedded | Yes | Yes
Text based | Phoenix Subtitle | .pjs | No | No
Text based | PowerDivX | .psb | No | No
Text based | (Advanced) SubStation Alpha | .ssa or .ass | Yes | Yes
HTML-based | SAMI | .smi | Yes | Yes
HTML-based | RealText | .rt | Yes | No
XML-based | MPEG-4 Timed Text | .ttxt (or embedded) | Yes | No
XML-based | Structured Subtitle Format | .ssf | Yes | Yes
XML-based | Universal Subtitle Format | .usf | Yes | Yes

Image based

The most commonly used and best known image based subtitle format is vobsub.
It is the name of the format for bitmap subs after they have been extracted from a VOB (Video Object) file on a DVD. The application that extracted the subtitles was also called VobSub (now known as VSRip) and was developed by Gabest. VobSubs consist of an .idx file (the index of starting timestamps, colors, and other basic info) and a .sub file (which contains the bitmap pictures for the subtitles themselves). Unlike text based formats, VobSubs are usually somewhat larger in size because the images take up more disk space.

Index code and bitmap sample:

    timestamp: 00:18:40:752, filepos: 0000aa000
    [bitmap image: ct3011_001.PNG]

Figure 1.3: Sample code for vobsub (image based)

Text based

SubRipText (or SRT) is a subtitle format commonly used in combination with XviD or DivX movies. SubRip is an optical character recognition program for Windows which rips (extracts) subtitles and their timings from video files or DVDs, recording them as a text file. It is also the name of the subtitle format created by this software. The caption files are named with the extension .srt. This format is supported by most software video players and subtitle creation programs. An example of this is shown in Figure 1.4.

    1
    00:00:00,100 --> 00:00:03,300
    Na een ruime inlooptijd

    2
    00:00:03,300 --> 00:00:06,800
    kunnen we beginnen

    3
    00:00:06,800 --> 00:00:11,700
    met het tweede deel van 30-11, Watermanagement.

Figure 1.4: Sample code for SRT (text based)

HTML-based

An example of an HTML-based subtitle system is SAMI. It stands for Synchronized Accessible Media Interchange and is a rare HTML-like subtitle format that is based on start times that are given for each subtitle. The structured markup language is designed to simplify creating captions for media playback on a PC, i.e. not for broadcast purposes. SAMI documents are text, and can be written in any text editor, although there are special utilities available to create SAMI documents. They use .smi or .sami file extensions.
The common use of .smi for SAMI files creates a file extension collision with SMIL files. The advantage of this format is that each SAMI document may contain more than one language. It is also the only subtitle format supported by Microsoft Windows Media Player. An example of the SAMI format is shown in Figure 1.5.

    <SAMI>
    <HEAD>
    <STYLE TYPE="Text/css"><!--
    P {margin-left: 29pt; margin-right: 29pt; font-size: 14pt;
       text-align: center; font-family: tahoma, arial, sans-serif;
       font-weight: bold; color: white; background-color: black;}
    .SUBTTL {Name: 'Subtitles'; SAMIType: CC;}
    -->
    </STYLE>
    </HEAD>
    <BODY>
    <SYNC Start=0><P Class=SUBTTL><br>
    <SYNC Start=100><P>Na een ruime inlooptijd
    <SYNC Start=3300><P><br>
    <SYNC Start=3300><P>kunnen we beginnen
    <SYNC Start=6800><P><br>
    <SYNC Start=6800><P>met het tweede deel van 30-11, Watermanagement.
    </BODY>
    </SAMI>

Figure 1.5: Sample code for SAMI (HTML-based)

XML-based

One of the latest subtitle types is based on XML and is called MPEG-4 Part 17 or MPEG-4 Timed Text. It is a text based subtitle format for MPEG-4 which is used on the new Blu-ray media disc system. It is streamable, which was one of the main aims when creating the format. It is mainly intended for use in the .mp4 container, but can also be used in the .3gp container (as 3GPP Timed Text), which is technically almost identical to .mp4 but is used more in cell phones. 3GPP Timed Text is exactly the same as MPEG-4 Timed Text when used in the .mp4 container. QuickTime Pro and MP4Box can produce this kind of subtitle stream from various subtitle input formats. MP4Box uses the fourcc tx3g for MPEG-4 Timed Text because of its inherently higher compatibility.

MPEG-4 Timed Text is heavily based on XML semantics. Of interest is the fact that it seems a line must be defined for all times, meaning that when there are no subtitles to be displayed, a blank line must be inserted.
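The blank-line behaviour described above can be sketched as follows. This is only an illustration under assumptions: the cue representation (start and end in milliseconds plus text), the function names fill_gaps and format_clock, and the clock notation are chosen for this example, and whether a given player really requires such filler entries has not been verified here.

```python
def fill_gaps(cues, total_ms):
    """Insert empty cues so that every moment from 0 to total_ms is covered.

    cues: list of (start_ms, end_ms, text), sorted by start time and
    assumed not to overlap.
    """
    filled = []
    cursor = 0
    for start, end, text in cues:
        if start > cursor:
            # Nothing is spoken here: add a blank entry for the gap.
            filled.append((cursor, start, ""))
        filled.append((start, end, text))
        cursor = max(cursor, end)
    if cursor < total_ms:
        filled.append((cursor, total_ms, ""))
    return filled


def format_clock(ms):
    """Format milliseconds as hh:mm:ss.mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Hypothetical cues with silent gaps before, between and after them.
cues = [
    (1000, 7000, "Na een ruime inlooptijd"),
    (8000, 10000, "kunnen we beginnen"),
]
for start, end, text in fill_gaps(cues, 12000):
    print(format_clock(start), format_clock(end), repr(text))
```

Each resulting entry could then be written out as one timed element, with the empty strings serving as the blank lines.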
An example of MPEG-4 Timed Text subtitles is shown in Figure 1.6.

    <tt xml:lang="en" xmlns="http://www.w3.org/2006/10/ttaf1"
        xmlns:tts="http://www.w3.org/2006/10/ttaf1#style">
      <head>
        <layout />
      </head>
      <body>
        <div xml:id="captions">
          <p begin="00:00:01" end="00:00:07"><![CDATA[Na een ruime inlooptijd]]></p>
          <p begin="00:00:08" end="00:00:10"><![CDATA[kunnen we beginnen]]></p>
          <p begin="00:00:10.5" end="00:00:12.5"><![CDATA[met het tweede deel van 30-11, Watermanagement.]]></p>
        </div>
      </body>
    </tt>

Figure 1.6: Sample code for MPEG-4 timed text (XML-based)

How to create subtitles

Subtitles for translation and searching are only composed of spoken text. This is created from the audio track extracted from the video stream. The creation method is shown in Figure 1.7.

Figure 1.7: Conversion process for creating subtitles

There are several ways of creating subtitles:
• manual post processing
• speech recognition
• live

Manual post processing

Many different programs can be used to manually create subtitles for a movie, but they are all used in roughly the same way. You start by typing in the lines of text that are spoken in the movie. Once these are finished, the transcript needs to be matched to the time sequences of the movie. For every line of text, a digital timestamp is added so that the subtitle generator can later show the appropriate text at the right time.

Figure 1.8: Screenshot of the program SubCreator

The advantage of this method is that it is very easy and that editing the subtitles is simple. Everyone who can understand the language that is being spoken can write out the transcript of a given video stream. The problem is that such a process is very time consuming and therefore relatively expensive.

Speech recognition

At the moment, speech technology is still a long way from achieving totally automatic subtitling for any program.
There are still too many errors in generating text, and several challenges such as background noise, different accents and multiple simultaneous speakers make the process very difficult. However, speech technologies do have their place in the world of modern subtitling. Speech recognition systems are already used in live subtitling systems for sports, news and politics. Note that for searching within transcripts, the text does not have to be as good as for translation.

Live

Live subtitles have to be created within 2 or 3 seconds of the broadcast. There are people specializing in this sort of work, called Communication Access Real-Time Translation stenographers. They use a specialized keyboard that is specifically designed to support shorthand writing, called a stenotype or velotype typewriter.

Figure 1.9: Two examples of a velotype typewriter

Realtime stenographers are the most highly skilled in their profession. Stenography is a system of rendering words phonetically, and English, with its multitude of homophones (e.g. there, their, they're), is particularly unsuited to easy transcription. Stenographers must deliver their transcriptions accurately and immediately. They must therefore develop techniques for keying homophones differently, and be unswayed by the pressure of delivering an accurate product on immediate demand.

2. Subtitling for recorded lectures

Selecting of an example lecture

For this project, a Dutch lecture about Sanitary Engineering given by J.C. (Hans) van Dijk was selected for the purpose of testing certain subtitling techniques. This lecture is from the bachelor's course CT3011, Introduction Water Management. It has the following specifications:

Table 2.1: Sample lecture chosen for subtitling

Course | CT3011 – Introduction Water Management
Lecture | Lecture 7 – Sanitary Engineering (Civiele Gezondheidstechniek)
Lecturer | Prof. ir. J.C. (Hans) van Dijk
Duration | 45:09
Number of slides | 27
Collegerama link | http://collegerama.tudelft.nl/mediasite/Viewer/?peid=f33ba7ff-01604259-bd94-7ee0d9c5a461

For the creation of subtitles for this lecture, the most flexible type is soft subtitles, because they are manageable and easy to update. Another advantage is that the subtitles are offered in plain text, so they can easily be sent through translation engines or searched.

Creating subtitles for the example lecture

To manually create a soft subtitle track, there are plenty of programs which are easy to use and all work very similarly. The one used in this example is called SubCreator (as shown earlier). It allows the user to play, pause, fast forward and rewind the video stream and contains a textbox where the spoken text can be entered (see Figure 2.1).

Figure 2.1: Transcript annotation using SubCreator

After the entire transcript has been typed out, the video has to be replayed from the beginning. This is done in order to add the timecodes corresponding to the text, which are necessary for later playback. Without them, a subtitle player cannot know when to display a certain piece of text on the screen. Within SubCreator, the shortcut ctrl+a will add a timecode to the current line of text and automatically skip to the next line. Sometimes the sentences are too long and they have to be split up in order for them to fit on the screen during playback. Usually, commas or short pauses in the text are chosen as the points at which to split them.

Another common problem is the pace at which the text is spoken. Sometimes the lecturer speaks at such a high speed that it is hard to follow the subtitle text, because the text on the screen is updated so quickly. In these instances it is important to try to make the text on the screen as long as possible, so that it won't update too fast for viewers to follow, but will still fit well on the screen.
These two problems combined make it difficult to create good subtitles. There are two options that can solve this problem. The long sentence can be split up into smaller sentences, or a really long subtitle line can be created which will automatically be divided into two rows by the subtitle player during playback. The best solution depends on the pace at which the lecturer is speaking. If he is speaking really fast, then it is very chaotic to use one single line, because the text will update really fast. For this situation it is better to create two rows of text so that the subtitles become easier to follow. If the lecturer is speaking at a slower pace, it is best to simply cut up the text into single lines, usually at short pauses.

Figure 2.2: Adding timecodes to the transcript

Once the transcript has been completely worked out, the sentences have been properly split up (so that they are easy to follow and aren't too long) and the timecodes have been entered, it is time to convert them to a proper subtitle format. SubCreator offers several different formats:
• SSA format
• SRT format
• simple time format
• frame format (MicroDVD)

SRT is the most commonly used and therefore the one used in this example (as can be seen in Figure 2.3). SubCreator converts the text and saves it to a *.srt file on the hard drive so it can be used for playback along with the video file.

Figure 2.3: Converting the time-coded transcript to a generic subtitle format

Manually working out the whole transcript and properly timecoding every single sentence takes an average student approximately 3 hours for a video of 45 minutes. This means that it takes 4 times the length of the video to create a complete time-coded subtitle stream. The results can be viewed in Annexes D1, D2, D3 and D4.

Subtitling in YouTube

The online video distribution website YouTube allows for the possibility of adding subtitle tracks to your uploaded videos.
For every language, a separate subtitle track needs to be uploaded, so that viewers can switch between languages (as shown in Figure 2.4). YouTube accepts *.srt files for this.

Figure 2.4: Adding subtitles to a YouTube movie

3. Translation of subtitles for recorded lectures

Subtitles for translation and searching consist only of spoken text. This text is created from the audio track extracted from the video stream. The creation method is shown in Figure 3.1.

Figure 3.1: Creation process for subtitles

There are several ways of creating translated subtitles:
• manual subtitling
• real-time subtitling
• speech recognition

YouTube

If at least one subtitle track is available, YouTube provides a translation service that can automatically convert the subtitles to another language. This is done through the Google Translate service mentioned earlier. At the bottom right of the YouTube interface, a button with the CC logo (the official logo for Closed Captions) turns the subtitles on or off. It also opens a submenu from which the translation menu can be accessed (see Figure 3.2).

Figure 3.2: Turning subtitles on or off in YouTube

When the translation menu has been opened, the user can choose from the 52 languages available in the dropdown menu (see Figure 3.3). Once a language has been chosen, the subtitles are automatically sent to the Google Translate engine and YouTube displays the results.

Figure 3.3: Google Translate menu in YouTube

Automatic translation is very difficult, as the meaning of words depends on the context in which they are used. Researchers and developers are still working on this problem, and it may be some time before anyone can offer a quick and seamless translation experience. The translation that is offered today is far from perfect, and sometimes not even coherent.
However, it is still a useful way to grasp the central ideas of a text. Now that Google Translate supports so many languages, it is not hard to imagine that almost any web page could be read in the reader's own language, and that applications could use Google Translate's APIs to address users in their own language.

Figure 3.4: Translated subtitles from Dutch to English in YouTube

Google Translate's coverage has been expanded dramatically. It now supports translation between any of the following languages: English, Arabic, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, Finnish, French, German, Greek, Hindi, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish. Having grown from 26 to 56 language pairs, Google Translate has become the most comprehensive online translation tool available for free.

Most state-of-the-art commercial machine-translation systems in use today have been developed using a rule-based approach, and require a lot of work to define vocabularies and grammars. Google Translate takes a different approach: the computer is fed billions of words of text, both monolingual text in the target language and aligned text consisting of examples of human translations between the languages. Statistical learning techniques are then applied to build a translation model.

Table 3.1: List of languages supported by Google Translate
Afrikaans     German        Persian
Albanian      Greek         Polish
Arabic        Hebrew        Portuguese
Belarusian    Hindi         Romanian
Bulgarian     Hungarian     Russian
Catalan       Icelandic     Serbian
Chinese       Indonesian    Slovak
Croatian      Irish         Slovenian
Czech         Italian       Spanish
Danish        Japanese      Swahili
Dutch         Korean        Swedish
English       Latvian       Thai
Estonian      Lithuanian    Turkish
Filipino      Macedonian    Ukrainian
Finnish       Malay         Vietnamese
French        Maltese       Welsh
Galician      Norwegian     Yiddish

Annex D1.
Transcript lecture CT3011 (unsorted) Na een ruime inlooptijd kunnen we beginnen met het tweede deel van 30-11, Watermanagement. Het deel over gezondheidstechniek ga ik de komende zeven weken met jullie doornemen. En ik dacht, ik zal me eerst eens even aan jullie voorstellen, dus, mijn naam is Hans van Dijk, zoals jullie daar zien staan en ik dacht, laat ik daar maar twee dingen voor nemen, mijn hobby en mijn werk. Nou de hobby dat zien jullie, ik ben een marathonloper. Een mooie foto van de glorieuze binnenkomst in Rotterdam in april afgelopen periode. Marathonlopers dat zijn allemaal een beetje fanatieke lui he, echte doordouwers, die trainen iedere dag. Die weten hun leven zodanig te organiseren dat dat allemaal kan. Dus ik loop hier ook iedere dag tussen de middag een rondje naar Delfts hout, of langs de Schie, of een ander parcour hier. Als jullie me eens een keer in korte broek of trainingspak zien lopen dan klopt dat, dat ben ik. En dat doe ik inmiddels met een heel groepje mensen, bij ons op de afdeling, met studenten en promovendi. En een van die studenten is hier weergegeven, dat is Karin Teunissen. Die zat drie jaar geleden hier bij inleiding watermanagement. Was toen derde jaars, inmiddels is ze afgestudeerd en begonnen met een promotieonderzoek bij het duinwaterbedrijf in Scheveningen. En zij is ook een fanatieke hardloper geworden, en zo hebben wij in april 42 kilometer samen gelopen. Nou dat is een herinnering die ons beide in het geheugen gegrift zal blijven. Dan het werk. Ik heb, ik ben, ja een vraag, ben jij ook een hardloper? - Sorry? Ben jij ook een hardloper? - Eeh, nou ja ik heb wel een vraag, maar volgens mij is dit college al gegeven. Nee - Niet? Dan weet ik niet hoe ik dit al wist, maar... Nou ik zeg dit wel eens vaker, dus dat zou best kunnen. Waar ben je geweest? - Ja volgens mij vorig jaar, maar... Ja tuurlijk, vorig jaar hebben we ook 30-11 gegeven ja, dat klopt haha. Maar deze foto is echt van april hoor dus dat is toch vrij recent. 
Wat misschien zou kunnen zijn is, ik geef ook altijd een van de gastcolleges bij inleiding Civiele Techniek in het eerste jaar. En daar begin ik natuurlijk ook een beetje met, ja wie ben ik, dus dat zou best kunnen, dat je het daarvan herinnert. Nou dan weet jij nog dat ik hier 30 jaar geleden ben afgestudeerd. Ik heb toen ook Civiele Techniek gestudeerd, in '76 afgestudeerd. Daarna ben ik gaan werken bij een ingenieursbureau, bij DHV in Amersfoort, en dat kan ik jullie van harte aanraden als je straks afgestudeerd bent om bij een ingenieursbureau te gaan werken. Dat is een geweldige ervaring, je bent met allerlei projecten over de hele wereld bezig. In mijn geval dan drinkwater projecten. Dus het ontwerpen van zuiveringsinstallaties, bouwen van systemen, ook het doen van onderzoek. Eigenlijk kun je alle kanten op bij een ingenieursbureau en de Nederlandse ingenieursbureaus zijn redelijk succesvol, ook op de internationale markt tegenwoordig. Ja ik heb daar vele jaren gewerkt, totdat op een gegeven moment, inmiddels is dat alweer 17 jaar geleden, er een advertentie stond dat we een hoogleraar zochten hier in Delft. En toen dacht ik van, nou ja, laat ik maar eens een brief schrijven, je weet het nooit, niet geschoten is altijd mis. Dus ik heb een brief geschreven en ik dacht, ik zal het vast wel niet worden, maar ik werd het wel. Dus ook daar zit al meteen een eerste levensles in, probeer maar eens wat en het kan altijd meevallen. Ik ben in eerste instantie vervolgens voor een dag in de week hier deeltijdhoogleraar geworden in de drinkwatervoorziening, dat is mijn leerstoel. En ja, zo langzamerhand van het een komt het ander, je wordt voor steeds meer dingen gevraagd. Dus ik ben langzamerhand meer dingen hier in Delft gaan doen en die aanstelling bij DHV heb ik steeds verder afgebouwd, en vanaf 1999 ben ik volledig gestopt bij DHV en ben ik hier voltijd hoogleraar. 
En voltijd hoogleraar dat betekent ook, je hebt enerzijds taken op het gebied van onderwijs, anderzijds onderzoek, maar ook management, dus management, ja, dan moet je, ik ben hoofd van een afdeling enzo en dan zit je in het managementteam of in de opleidingscommissie. Moet je over algemene dingen meepraten en beslissen. Daar kun je natuurlijk een dagtaak van maken, dat heb ik altijd vermeden. Ik vind het toch altijd het leukste om met het vak bezig te zijn en daarmee kom ik op het tweede plaatje wat hier staat, want het allerleukste is eigenlijk afstudeerders begeleiden. Dat gaan jullie de komende jaren dat proces doormaken. Dat is voor ons altijd ontzettend leuk om te zien hoe studenten zich transformeren van min of meer anonieme figuren die in de collegezaal zitten en zitten te luisteren. Min of meer absorberen wat ik in een monoloog aan het overdragen ben. Hoewel ik overigens wel reacties van jullie zeer op prijs stel hoor en ik zal daar ook af en toe expliciet om vragen. Maar goed, de praktijk is toch dat in deze fase van de studie zitten jullie nog vooral te luisteren en dat wordt eigenlijk steeds leuker als je verder komt in het vierde en het vijfde jaar en het hoogtepunt is dan natuurlijk het afstuderen, waar je echt een onderwerp helemaal zelf bij de kop pakt. Ik zeg ook altijd tegen mijn afstudeerders, je moet van je afstudeerproject je visitekaartje maken, he Doris, en dat werkt ook echt zo. Op het moment dat je klaar bent met dat afstudeeronderwerp dan weet jij het meeste van dat onderwerp af. Meer dan wie dan ook in Nederland. Dat bewijzen we ook iedere keer weer door de afstudeercolloqui. Daar geven we veel kenbaarheid aan, daar komen altijd mensen vanuit de waterbedrijven van KIWA, van andere researchinstituten. Die doen daar mee in de discussies en onze afstudeerders die weten keer op keer alle vragen te beantwoorden. Misschien niet altijd 100% goed, maar toch wel 99% goed. Dat is altijd een genoegen om mee te maken. 
Ik zeg ook altijd dat ik trots ben op mijn afstudeerders, en dat is ook zo. Ik heb er inmiddels een stuk of 80 gehad en soms gaat het dan heel goed, zoals hier staat met Karin en Doris, Doris is hier trouwens in de zaal aanwezig, die dan het afgelopen jaar allebei zelfs met lof zijn afgestudeerd. Dat betekent dus dat je het heel goed gedaan hebt, hoge cijfers gehaald hebt, en ook het afstudeerproject heel goed gedaan hebt. Ja, dat is voor ons gewoon heerlijk om dat mee te maken. Om te zien hoe jonge mensen het vak ook leuk gaan vinden, zelf ook enthousiast worden, en hun stempel gaan zetten op ons vakgebied. En ik hoop dat enkele van jullie ook zo ver zullen komen. Goed, dat is wat mijzelf betreft. Dan wat dit vak betreft. We gaan dat doen aan de hand van het boek, dat staat al op blackboard aangegeven. Daar hebben we een Nederlandse en een Engelstalige versie van. Dat boek dat moeten jullie kopen bij de secretaresse van ons, Mieke op de vierde verdieping, voor 25 euro. In de winkel kost het 50 euro, maar wij hebben een speciale kortingsregeling. Jullie mogen zelf weten of je het Nederlandse of het Engelse boek koopt. De inhoud is vrijwel hetzelfde en in ieder geval voldoende voor dit vak. Als jullie een advies van mij willen hebben dan zou ik zeggen, als je goed Engels kunt lezen, koop het Engelse boek, dat is iets actueler, staat iets meer informatie in, maar het Nederlandse boek is voor dit vak zeker voldoende. Ja, zo'n boek heeft natuurlijk, behalve dat we er over gaan vragen bij het tentamen, daar zal ik bij mijn volgende dia op terugkomen, heeft zo'n boek natuurlijk ook nog een zekere functie als naslagwerk. Als je zo'n boek eenmaal hebt, dan heb je dat bij je, ook na je afstuderen neem je dat mee. Als je vervolgens ergens in een vreemd land een installatie moet ontwerpen, dan haal je dat boek weer eens uit de tas en dan weet je weer het een en ander. Die functie heeft zo'n boek ook. 
Daar staan vraagstukken ook in, in dat boek, en we hebben ook vraagstukken op blackboard staan. Dat zullen jullie misschien ook al gezien hebben, computer assignments. Dat is overigens niet verplicht, er is bij ons niets verplicht. Ja, jullie moeten uiteindelijk het tentamen doen, maar we bieden materiaal aan, dus maak er gebruik van zou ik zeggen maar we gaan dat niet controleren. Er staan daar vragen op blackboard, er zitten vragen in dat boek, de antwoorden staan er ook bij, of althans, als je die computer assignment gemaakt hebt dan krijg je na afloop te melden welke vragen goed waren en welke vragen fout waren. Dus dat is een ondersteuning voor jullie bij het kennismaken met de materie en het leren van de stof. En oude tentamens hebben we daar ook bij staan, dus dan kun je ook nog eens oefenen en kijken wat er ongeveer gevraagd wordt. En dan gaan we college geven de komende periode. Oh ja, dus over het boek, jullie hoeven niet het hele boek te kennen. Dat boek wordt zowel gebruikt bij 30-11, als bij het volgende college 34-20, wat een a keuzevak is voor de mensen die watermanagement gaan doen, en de hoofdstukken die voor 30-11 gevraagd worden op het tentamen staan hier aangegeven. En die presentatie komt ook weer op blackboard zoals jullie weten, inclusief deze video opname. Dan gaan we deze colleges geven, dus 7 keer de komende periode vanaf nu, en ik wil het dit jaar zo doen dat in het eerste uur vertel ik een beetje de grote lijn van het betreffende onderwerp. De belangrijkste punten, ik probeer daar wat kleuring aan te geven. Wat is nou belangrijk en wat minder. En het tweede uur heb ik steeds een van de promovendi, vandaag is dat Doris, die dan iets gaan vertellen over hun eigen onderwerp, hun eigen onderzoek, hun eigen project, wat een stukje actualiteit geeft, en kleuring, verdieping, van het betreffende onderwerp. 
En ik heb het zo georganiseerd dat dat steeds, als het goed is, goed op elkaar aansluit en jullie een goed beeld geven van de stof, zodat je straks het tentamen ook makkelijk kunt maken. Dat wil niet zeggen dat alle onderdelen van de verhalen van de promovendi tentamenstof zijn. Dat zullen we zo her en der ook wel aangeven. Ja, zo'n promotieonderzoek dat gaat natuurlijk veel dieper dan jullie nu in het derde jaar hoeven te weten, maar het gaat meer om de beeldvorming, de kleuring en het begrip van de materie. Dan hebben we een excursie gepland naar de Berenplaat, de grote zuiveringsinstallatie bij Rotterdam, bij Spijkenisse om precies te zijn, op 11 oktober. Ook dat is niet verplicht, alles is facultatief bij ons. Daar hebben zich tot nu toe een stuk of 60 mensen aangemeld. De inschrijving sluit op 1 oktober hebben we gezegd, omdat bij de waterbedrijven tegenwoordig ook strikte veiligheidsvereisten enzo zijn na de aanslagen in New York. Je moet daar precies opgeven wie er allemaal komen, met naam enzo en wij moeten daar voor instaan ook, dat er geen vervelende dingen gebeuren, en er moeten natuurlijk ook bussen gereserveerd worden en we krijgen daar lunch geserveerd. Dus de mensen die zich opgegeven hebben die krijgen nog een mailtje binnenkort, kort na 1 oktober, met een bevestiging, en degene die zich niet opgegeven hebben die gaan niet mee. En ik ga er ook van uit dat degenen die zich wel opgegeven hebben, dat die ook komen he, het is natuurlijk een beetje vervelend tegenover de organisatoren als we daar met veel minder mensen zouden aankomen dan we aangemeld hebben. We zullen proberen, ik heb wat vragen gekregen over dat er 's middags verplichte practica zouden zijn van constructieleer en statistiek geloof ik, dus we zullen proberen om tijdig weer terug te zijn. Dat zal zeker niet om half 2 zijn, dus ik denk dat we ongeveer om half 3 terug zullen zijn, en we vertrekken gewoon na het college op donderdag, dus om half 11. 
Ik weet niet of, even kijken of ik al ga beginnen, ja ik ga al beginnen dus, zijn er vragen over de organisatie en deze algemene inleiding? Okee. Nou dan ga ik kort even iets vertellen over gezondheidstechniek, dat zal jou ook bekend voor komen want dat heb ik ook bij het eerste jaar al verteld, en dan ga ik iets meer vertellen over de drinkwatervoorziening van Nederland en na de pauze gaat Doris dan iets vertellen over de drinkwatervoorziening in ontwikkelingslanden, want daar is zij vooral mee bezig. We hadden natuurlijk gezondheidstechniek. Nou dat zal ieder van jullie niet onbekend zijn, dat dat gaat over de stedelijke waterkringloop, dus de infrastructurele werken voor de voorziening van drinkwater, het winnen van grondwater, het winnen van oppervlaktewater, het zuiveren daarvan, het vervolgens transporteren met een heel transportleidingen en distributieleidingensysteem naar ons allen toe. Naar de huishoudens en de industrieen, de bedrijven, vervolgens het inzamelen van het afvalwater via de riolering. Het zuiveren van dat afvalwater en dat wordt dan vervolgens weer geloosd op het oppervlaktewater. Dus alle infrastructurele werken die over die kleine stedelijke waterkringloop gaan, dat is wat we gezondheidstechniek noemen, en ik zal hier vooral focussen op de drinkwatervoorziening, omdat we daar ook het meest duidelijke effect zien zoals hier in deze figuur weergegeven. Het verdwijnen van besmettelijke ziekten in Nederland, doordat die niet meer overgedragen worden via besmet drinkwater. In de rest van de wereld is dat natuurlijk nog een hele andere situatie, maar hier hebben we daar flink veel succes mee gehad in de 20e eeuw. We zien hier een plaatje dat weergeeft de daling van de sterfte aan buiktyfus in de 20e eeuw, en dat loopt parallel aan het percentage van de mensen wat niet aangesloten is op de drinkwatervoorziening, in diezelfde periode is in Nederland de drinkwatervoorziening aangelegd. 
Rond 1900, zelfs kort voor 1900, de grote steden en zo langzamerhand ook de kleinere steden en het platteland, en vanaf 1975 zeg maar, is in Nederland iedereen op de drinkwatervoorziening aangesloten en komen besmettelijke ziekten die door besmet drinkwater overgedragen worden ook niet meer voor. Dus het gaat bij ons om infrastructurele werken voor een goede waterkwaliteit, dus zaken als waterwinning, waterzuivering, watertransport, waterchemie en microbiologie ook, die waterkwaliteit. Microbiologie, enerzijds het afwezig zijn van organismen waar we ziek van kunnen worden maar anderzijds ook het gebruiken van micro organismen om de zuivering te optimaliseren. Micro organismen kunnen ook weer verontreinigingen afbreken, bekendste voorbeeld daarvan is de afvalwaterzuivering waar we met behulp van zuurstof en actief slib, dat is een mengsel van bacterien, de afvalstoffen in het afvalwater laten afbreken. Dus waterkwaliteit, waterchemie en microbiologie zijn in dit deel van de civiele techniek vrij belangrijk. We maken natuurlijk ook gebruik van de algemene kennis van civiele ingenieurs en met name dan van zaken als hydraulica, hydrologie, constructieleer, constructieve vormgeving, projectrealisatie, informatica, zijn natuurlijk allemaal dingen die je in projecten nodig hebt. Vaak ook in teamverband, bij zo'n ingenieursbureau bijvoorbeeld. De een is meer bezig met de automatisering, de ander is meer bezig met het constructieve deel, een derde is weer met de hydraulica bezig, en jullie kunnen afhankelijk van de specialisatie die je kiest daar een verschillende rol in spelen. Die gezondheidstechniek is natuurlijk van groot belang voor de volksgezondheid, dat spreekt voor zich. Het gaat over relatief grootschalige infrastructurele werken, we zien hier de zogenaamde Biesbosch bekkens. Dat is in de Brabantse Biesbosch. 
Bekkens die aangelegd zijn voor de drinkwatervoorziening, en het gaat om een goed georganiseerde sector met heldere taken. Er is zelfs een aparte wetgeving voor, de waterleidingwet, als het over de drinkwatervoorziening gaat, waarin gewoon staat precies waar alles aan moet voldoen, en dat de directeur van het waterleidingbedrijf daar persoonlijk voor aansprakelijk is. Die riskeert gevangenisstraf als die onvoldoende water of water distribueert waar je ziek van kan worden. Dus dat is allemaal goed georganiseerd. En we doen daar in Delft een hoop aan, dus die leerstoel van mij, de leerstoel drinkwatervoorziening, is de enige leerstoel in Nederland op het gebied van de drinkwatervoorziening. Dus dat is wel fijn, geeft ons een zekere exclusiviteit. Veel van onze studenten die zijn dus ook, ja die hebben toonaangevende posities in die vakwereld, die zijn directeur of staffunctionaris, of ontwerper bij de waterbedrijven, en ook veel van onze ingenieurs gaan naar de ingenieursbureau's toe. Nou, daar gebeurt een heleboel, af en toe hebben we zelfs ook gastcolleges van Willem Alexander die ook het watermanagement interessant vindt. Nou dan heb ik tenslotte nog drie dia's voordat ik wat meer ga vertellen over de opzet van de infrastructuur in Nederland, die nog even wat illustreren van dat werk van ons vakgebied. Dus dit plaatje dat heb jij ook al gezien he, dus jij kan mij nu ook vertellen waar dit dipje vandaan komt? - Volgens heeft dat iets met de pauze te maken. Ja precies, dus dit is het waterverbruik tijdens massa events, in dit geval de WK voetbal, en nu zie je dat we ons allemaal als kuddedieren gedragen. Dat vanaf het begin van de wedstrijd, dat is hier, het waterverbruik enorm naar beneden gaat. Niemand gebruikt meer water, iedereen zit voor de TV, zit te kijken. Totdat het rust is, dan rennen we allemaal naar de WC en naar de koffieautomaat, dan hebben we een enorme stijging in het waterverbruik. 
In de tweede helft gaat weer iedereen kijken, zien we weer een zeer lage piek in het waterverbruik met zelfs een minimum kort voor de tijd toen dat beslissende doelpunt, in dit geval door Dennis Bergkamp gemaakt werd, en aan het einde van de wedstrijd rent iedereen weer naar de WC toe. En datzelfde zie je dus ook bij het industriele verbruik he. Zelfs daar is het zo dat operators enzo, die zitten ook te kijken, en alles zit toch een beetje op halve kracht te draaien. Dat is een enorm reproduceerbaar fenomeen, deze curves, soort electrocardiogrammen van ons gedrag. Het gedrag van de bevolking. En dit dipje, dat noemen we inderdaad de Cruijff dip. Dat is het moment tijdens de pauze waarop Cruijff commentaar komt geven. Dan rent iedereen weer even terug van de WC om even te luisteren wat Cruijff te zeggen heeft en dan worden vaak ook de doelpunten herhaald, en dan kijken we allemaal weer eventjes naar de TV. We zijn natuurlijk vooral bezig met ontwerpen. Het gaat natuurlijk vaak om nieuwe infrastructurele werken. De bouw van een pompstation, het ontwerp van een zuiveringsinstallatie en transportleiding en ontwerpen daar hebben jullie natuurlijk al veel over gehad bij projectonderwijs, en het ontwerponderwijs. Dat is schematiseren. Een bepaald kader in je hoofd maken van hoe iets in elkaar zit. Dus een filter, hoe schematiseren we dat nou, en hoe stroomt het water door een installatie heen. De hydraulische lijn. Daar moeten we een bepaald schema van maken. Daar moeten we formules op kunnen toelaten. Dat moeten we kunnen berekenen. En daar moeten we vooral ook geen fouten bij maken, daar is dit plaatje voor bedoelt. Een van de koolfilterinstallaties bij de drinkwaterleiding van Rotterdam, bij Kralingen, langs de Drienernoordbrug, waar toendertijd een keer waterslag is opgetreden, met als gevolg implosie van dat koolfilter, en dat is natuurlijk heel vervelend. 
Vaak loop je daar dan ook tegenaan dat je met die wet van Murphy te maken hebt, dat alles wat fout kan gaan dat gaat ook een keer. Dus waterslag dat is het verschijnsel dat als bijvoorbeeld een pomp afslaat, dat er een onderdrukgolf kan ontstaan en die onderdruk die kan dus inderdaad tot implosie leiden. Nou dat kan je natuurlijk voorkomen door een ontluchting beluchtingsventiel aan te brengen. Dat is hier ook gedaan, bovenop dat koolfilter zat zo'n ventiel, maar helaas was het net op het moment dat die pomp hier uitviel, ten gevolge van een stroomstoring, was het ook winter en was het een hele strenge vorst en was dat ontluchtingsventiel bevroren, waardoor er geen lucht meer kon toetreden en er dus toch vacuum ontstond in dat vat, en ja, dit resultaat optrad. Dus ontwerpen is vooral ook bewust zijn van dingen die mis kunnen gaan, vandaar ook dat hydraulica ook vrij belangrijk is. Het is natuurlijk heel vervelend als het water ergens uit spuit of de verkeerde kant op gaat, dus je moet vooral ook steeds alert zijn op dingen die fout kunnen gaan en ontwerpen is vooral ook ervaring. Dingen gezien hebben, hoe doe je het in de praktijk nou? Vandaar ook dat we die excursie gepland hebben naar de Beerenplaat toe, dan kunnen jullie voor de eerste keer vast eens even kijken van, ja, hoe ziet zo'n installatie er nou uit, waar moet je nou allemaal rekening mee houden? Alright, nou, nog een paar plaatjes van een ander project, in Limburg in dit geval, waar een grote transportleiding is aangelegd bij een oppervlaktewaterproject in Panheel. Dat was in het kader van de zogenaamde verdrogingsdiscussie. Dat is een discussie die in Nederland een aantal jaren gevoerd is, onder andere door de winning van drinkwater gaan de grondwaterstanden omlaag en treed er verdroging van natuurgebieden op. Dus er is toen hier in Limburg gezegd, een jaar of 10 geleden van, nou we moeten de grondwaterwinning gaan verminderen en overgaan op de Maas. 
Die stroomt tenslotte door Limburg heen, dus dat is vrij makkelijk. Toen is er hier een spaarbekken aangelegd. Nou aangelegd, dat was een oud grindgat. Dus er was daar grind gewonnen, dus die put was er toch al. Die is gevuld met Maaswater. Dat Maaswater gaat vervolgens vanuit dat bekken, dat zien we hier, zakt dat vanzelf de grond in. Dat noemen we infiltratie, kunstmatige infiltratie, dat water zakt de grond in waarbij er alvast een heleboel kwaliteitsverbetering optreed. Allerlei stoffen die worden afgefiltreerd tussen het zand van de ondergrond, en de bacterien gaan dood door de lange verblijftijd. Dus je krijgt al een aanzienlijke verbetering van de waterkwaliteit. Dan wordt het water weer opgepompt met behulp van putten, die dan op een bepaalde afstand rond dat bekken zijn opgeplaatst. Dus dan win je eigenlijk een soort kunstmatig grondwater. Je maakt dan eigenlijk van het Maaswater, wat natuurlijk allerlei bacterien en virussen en andere verontreiningen bevat, maak je een soort kunstmatig grondwater. Dat wordt dan weer gewonnen en het wordt vervolgens nog gezuiverd in de zuiveringsinstallatie die we hier zien weergegeven. En dan ging het dus met die transportleiding door heel Limburg heen, naar de verbruikers toe. En tenslotte doen we natuurlijk ook onderzoek, vooral hier op de TU. Als je bij een ingenieursbureau werkt, nou dan heb je niet zoveel onderzoek nodig, dan gebruik je meestal vuistregels en ontwerpcriteria, maar het vakgebied ontwikkelt zich natuurlijk ook steeds verder, er zijn iedere keer weer nieuwe bedreigingen. Momenteel bijvoorbeeld nogal in het nieuws, het voorkomen van geneesmiddelen in de Rijn. De pil die is aantoonbaar in concentraties in de Rijn aanwezig, en komt dat nou ook in het drinkwater terecht en wat moeten we daaraan doen. Moet de zuivering weer uitgebreid worden? Dat soort vragen die leven. En dan zijn we dan met onderzoek bezig. Onderzoek dat gebeurt vaak ter plaatse bij ons. 
Dit is een plaatje van het veldpracticum in Luxemburg. Zal Huub Savernije misschien afgelopen maandag ook wat over verteld hebben, maar dat is ook heel relevant omdat het ene water het andere niet is. Water is een natuurlijke stof en de verontreinigingen en de stoffen waar het om gaat, ja dat is afhankelijk van de bron. De interactie, de lozing van stoffen die eventueel plaats gevonden hebben. Interactie met de bodem, bladeren en natuurlijke afvalstoffen die in het water terecht komen. Dus ieder water is weer anders en je moet het bij voorkeur ter plaatse doen. Het is niet zo goed mogelijk om te zeggen van, nou ja, ik doe in het laboratorium maar proeven. Nee, je hebt toch altijd weer de toets nodig van de praktijk. Gedraagt het water zich in de praktijk ook zoals we dat theoretisch denken. Sommige dingen gebeuren natuurlijk wel in het lab. Er is hier ook een laboratorium Stevin 3, het waterlaboratorium, waar allerlei opstellingen staan. Filters, bezinkinstallaties, andere proefopstellingen, en daar krijgen jullie later, zullen jullie daar zelf ook practicum doen als je in deze richting door gaat. En uiteindelijk kun je zelfs een promotieonderzoek doen, en in de aula de doktorsbul uitgereikt krijgen. Goed. Dan heb ik nog een kwartier als ik het goed heb. Ja, en die kan ik goed gebruiken voor een stukje om eens even een eerste verhaal vast te geven van wat is er nou bijzonder aan de drinkwatervoorziening in Nederland? Wat moeten jullie daar nou van weten. En ik maak daar gebruik van een presentatie die ik vorig jaar gegeven heb in Canada, voor de Canadeze waterbedrijven, en daar is het weer heel anders. Dus ik heb daar ook echt mijn best gedaan om een beetje duidelijk te maken van, wat is er nou bijzonder in Nederland? En wat zouden jullie in Canada daar nou aan kunnen hebben? En ik denk dat dat voor jullie ook een aardige introductie zou kunnen zijn in het vakgebied. En eigenlijk is dat trouwens al heel kernachtig weergegeven met dit plaatje. 
Dus dat is een plaatje van een kindje, Joey, die het water uit de kraan drinkt en eigenlijk, zoals dat plaatje hier weergeeft, vertrouwt he. Dus het water moet zo goed zijn, dat je er volledig op kunt vertrouwen. Dat je het zelfs je kinderen laat drinken en dat het boven elke verdenking verheven is. Dat is eigenlijk de kern van de filosofie van de drinkwatervoorziening in Nederland. Dat is natuurlijk ook bij andere landen in zekere zin wel het geval, maar toch veel minder. Ik weet niet of jullie in Amerika en Canada en dat soort landen geweest zijn. Daar is het eigenlijk meer zo dat men het drinkwater, dat heet daar ook tapwater, kraanwater, dat is meer iets dat gebruik je voor de wasmachine, en de WC, dat doen wij ook hoor, maar drinken doe je dat eigenlijk niet, in Canada en Amerika. Als je water wilt drinken, dan ga je een fles kopen bij de supermarkt. Of je zet nog een filter op je kraan, om het water na te zuiveren. En dat noemen we, het consumentenvertrouwen is in die landen dus veel minder dan in Nederland. En dat heeft voor een deel te maken met, ja, cultuur en traditie. In Europa zijn we gewend dat de overheid dingen goed regelt, en in Amerika zijn ze dat veel minder gewend. Daar overstroomt gewoon heel New Orleans, en dan gaan we het weer eens opnieuw opbouwen enzo. Dat doen wij in Nederland ook niet. En zo is dat met drinkwater ook zo. In Nederland is het zo dat we vinden dat we absolute zekerheden moeten hebben dat dat drinkwater wat uit de kraan komt, dat dat er A altijd is, leveringszekerheid, en B dat het altijd goed is, zodat onze kinderen het met een gerust hart kunnen drinken en wij zelf ook. Een plaatje met een aantal kernbegrippen vast, het waterverbruik, het feit dat we gebruik maken van grondwater en oppervlaktewater voor de drinkwatervoorziening. Grondwater is ook in Nederland vaak nog van een hele goede kwaliteit. 
Een beetje geillustreerd aan dit plaatje van de Veluwe, waar we regenwolken zien, en je kunt je wel voorstellen, als die regen daar op dat enorme zandoppervlak van de Veluwe stroomt. Ja, dat water wordt heel goed gefiltreerd en dat grondwater wat je daar wint, dat is van hele goede kwaliteit natuurlijk, dus grondwater is over het algemeen goed. Er zijn best ook wel zorgen over hoor, zoals hier en daar hebben we natuurlijk, ik geloof niet te weinig zelfs, vuilnisstortplaatsen, en die kunnen het grondwater ook weer verontreinigen. En boeren die gebruiken natuurlijk mest en bestrijdingsmiddelen en dat kan uiteindelijk ook in het grondwater terecht komen, maar gemiddeld gesproken is grondwater toch van een prima kwaliteit, en dus kunnen we ook volstaan met een eenvoudige zuivering. Beluchting en zandfiltratie, daar komen we nog op, dat is meestal wel voldoende. Oppervlaktewater daarentegen, dat is juist het andere eind van het spectrum zou je kunnen zeggen. We zitten in Nederland bij het afvoerputje van Europa. De Rijn en de Maas die zijn door Frankrijk en Duitsland en Belgie gestroomd. Al dat afvalwater is erop geloosd. Dus het oppervlaktewater bevat een volledige cocktail aan alle stoffen die je je kunt voorstellen. Dus oppervlaktewater moet zeer uitgebreid gezuiverd worden. Dat doen we ook in Nederland. Wordt in het buitenland wel eens aangeduid als double Dutch threatment. We hebben heel veel zuiveringsprocessen achter elkaar, om er maar zeker van te zijn dat dat water uiteindelijk toch goed is. En heel bijzonder in internationaal verband, we gebruiken geen chloor. Amerikanen die vinden het vanzelfsprekend om chloor te gebruiken, het drinkwater smaakt ook naar chloor daar, ruikt ook naar chloor daar. Dat vinden Amerikanen volkomen normaal. En in Nederland zeggen we, nee, dat willen we niet. 
In the first place there is a substantive reason for that, namely we know that if you apply chlorine, it will react with certain substances that occur naturally in water, organic compounds, and then you get what we call disinfection by-products. Chloroform is the best-known example. And those are undesirable substances. They are substances that can be toxic. Now, you could say, I can set a certain standard for that, and maybe I can just about meet that standard, but in the Netherlands that is already enough for us. We say no, those are undesirable substances, we simply don't want them. So we simply do not want to use chlorine. That is an essential starting point, which also has a lot of consequences, mind you, but which has been applied in the Netherlands for more than 30 years, and a lot has been done on it, a lot of research. So I think that is an important point to hold on to already for this first lecture. Chlorine simply leads to those toxic compounds and that is why you should not want it. In any case, that is what we decided in the Netherlands, that we do not want it, and so we do not do it. There is another, practical aspect, and that is that water with chlorine tastes of chlorine, and we don't like that in the Netherlands either. We feel that water coming out of the tap should taste good; it should not have that nasty chlorine taste. That is a swimming pool chlorine taste. We don't want that for drinking water. That is probably again one of those aspects related to culture and consumer confidence. We have a number of principles that we use in the Netherlands, and I have about 3 or 4 slides to run through them briefly. Well, I have already said enough about the focus on health. In the Netherlands we also have relatively large companies, which are a kind of mixture of public and private. They are NVs (public limited companies).
Evides, the water company here, is an NV, but the shares are held by the municipality of Rotterdam and the province, and other municipalities in the supply area. So it is really a kind of government, but then again not quite. Semi-government, and that also gives it something special. Last week a new professor was appointed at TBM, who has a whole story about that really being an ideal formula. That in that way you, let's say, the value of water, water is after all not simply a market commodity, not something you can regulate as easily as other things, like cars and so on, so water also has to do with, it belongs to all of us, we have to manage it carefully, and such a public responsibility within an organisation under private law, an NV, which therefore does work efficiently, yes, that does have a certain appeal. And typically Dutch, we love to 'polder' here, so we like to do things together. So the water sector in the Netherlands is very well organised. It has set up a joint research institute, KIWA, where the research for the water companies is carried out, and they have founded an interest group, the VEWIN. And they also have a professional association, the KVWN, of which, if you continue in this field, you will all become members. And yes, there is a whole little world in which people cooperate well and so on, and exchange information. That is also something special about the Netherlands. It is of course easier in the Netherlands than in America, right. In America you cannot so easily just collaborate between Los Angeles and New York. That all goes a bit more easily in the Netherlands. If we look at the set-up of the infrastructure, these are the essential characteristics. To begin with, protection of the source; everything has to start there of course. Don't put the cart before the horse and start with dirty water.
No, always start with the cleanest possible source, and then make sure that source stays clean. That is why everywhere, in the dunes and in the wooded areas, you see those signs saying groundwater abstraction, groundwater protection area. Yes, don't pollute if you can prevent it. Use groundwater where possible. So in the whole blue area, the north, the east and the south of the Netherlands, only groundwater is used for the drinking water supply. Groundwater is available there, and it is of good quality. It is microbiologically reliable, so you will never get ill from it anyway, so that is the preferred source, and so that is what we use. Now, in the west of the Netherlands that is not possible of course. You know that too, right? - Uh... Why not? - Uh, I don't remember. You've forgotten. But maybe someone else knows, because common sense gets you a long way too, right. So just think for a moment, it is not that difficult at all. - Seawater. Seawater, salt, exactly. So the groundwater here is salty. Desalination is very expensive, so that is not really practical. So yes, you cannot use groundwater here, so we use surface water instead. And we prefer to do that, here in the whole dune area, by turning that surface water into artificial groundwater through infiltration. So we pump surface water into the dunes, let it sink into the soil, and it becomes a kind of artificial groundwater. Do you want to ask a question? - Yes, because I see that on the Wadden Islands groundwater is used, - but in principle that is a dune area too. Yes. - But wouldn't there be salt in the water abstraction area there as well? Yes, then I need to say a bit more. In the whole dune area it is really the case that, as a result of centuries of rain falling on those dunes, a freshwater lens floats on the salt water. So if you abstract that water very carefully, you can extract fresh water.
That can quickly go wrong, mind you, so you really have to be careful with it, but it is just about possible. And that is also how dune water abstraction in the west of Zuid-Holland and Noord-Holland started, in the 19th century, by simply pumping up dune water first. At some point it went wrong there, they got salt water, and then they started infiltrating surface water. Now, if there is really no other way, and that happens to be the case at Rotterdam, and that is also why we are going to visit the Berenplaat, then you have to use surface water. Rotterdam has no groundwater, Rotterdam has no dunes either, so you have to use surface water there. Then you need a very extensive treatment, so that is not simple either. And that is what we are going to look at. So source protection, that is still number 1. We also regularly see the water companies placing notices in the newspaper saying this substance should be banned, restrictions should be placed on this. Simply making sure that what is good stays good. Now, groundwater, that picture. I think this is quite illustrative. If you make a mental map of this, of groundwater, it is really rainwater that has flowed through a gigantic sand filter. Well, that is good. Surface water, well, for example at Scheveningen here. We simply pump Meuse water into the natural dune valleys. After pre-treatment, mind you, because otherwise those dune valleys would clog up immediately, and we would pollute the dune environment, which we don't want of course. So the water is first pre-treated, then pumped into the dunes, then it sinks into the soil. Then we recover it with wells that have been placed here and there in the dunes. Then it goes to the post-treatment, which we see here, and then into the distribution network. For surface water we have multiple barriers. For now it is enough to see a flow diagram like this, and to see that there are a whole lot of steps in series.
We simply have many separate treatment processes. On the one hand, to be sure that if one works a little less well, the other will catch it. Safety, robustness, is very important. And on the other hand, to be able to stop different kinds of substances with different treatment systems. So when it comes to surface water, it is always about fairly extensive treatment schemes. We use modern technology for that too. So those are developments that have become possible over the past decades, let's say. Here we see the membrane filtration plant at Heemskerk. It is the most modern and largest treatment plant of this type in Europe. It was developed in the Netherlands. Here we see disinfection with UV light. So you could think of these as ordinary fluorescent tubes, really, but they emit UV light. And bacteria cannot withstand that; it kills them. So that is a good way to disinfect the water. Well, that was opened 2 years ago in the presence of the Prince, and that too is a Dutch development, to improve the treatment further. Now, the result of that is that we... At the tap, when we open it, water of high quality comes out, pure water, which contains no contaminants, and no chlorine either. Which is also soft. The water is also softened in the Netherlands. We will come back to that later. And ultimately the result, partly because of that, is that in the Netherlands we use no bottled water at all. Or at least, very little. And that in turn, if you look at it macro-economically, or even at the individual customer, is simply a very sensible thing, because bottled water is many times more expensive than drinking water. It is 500 times more expensive. It is also much worse for the environment.
The environmental impact of bottled water, calculations have been made of that at some point, with eco-points and so on: those bottles all have to be transported by road in lorries, and they have to be cleaned again and so forth. If you do those sums, the environmental impact of bottled water is 30 times that of drinking water. So if you do a quick back-of-the-envelope calculation of what a Dutch person spends on water, and an Italian, then the Italian spends 2 to 3 times as much on water as the Dutch person. And that is mainly because they use bottled water. The drinking water supply itself, its costs, are more or less comparable, because that is also a characteristic of this kind of large-scale infrastructure. Doing something well, multiple treatment steps, building safe systems, is not that much more expensive than doing it badly. Because most of the costs lie in the fact that you have to start by building an abstraction, you need a treatment plant, you need transport mains, distribution mains; a great many of those costs you have anyway. And doing it well is not much more expensive than doing it badly. I think I am there, oh yes, we also have other things, so we have the lowest leakage rate in the world, and very reliable systems, and nowadays in the Netherlands we pay attention to water saving of course. Water is after all a natural resource; you should not waste it. So water consumption in the Netherlands is not rising, it is relatively constant, and household water consumption is even falling, because nowadays we have water-saving toilets and showers and washing machines and so on, and those are all encouraged too, you get subsidies for them and so forth. We are all dealing with it responsibly. And then we have the last slide.
So the final result of that whole philosophy and the things that have been done about it over the past 30 years is that we say, we have the miracle from the tap. That was an advertising slogan of the water companies a number of years ago. Posters were made of it then, and commercials on radio and TV. The miracle from the tap, very good water, it is always there. We don't get ill from it, no contaminants. We don't need bottled water. We don't need filters on the tap. We don't waste water. So we have things well in order. Now, on the one hand that is of course a somewhat exaggerated picture. There certainly are things that can and must be done better, and we will come back to those, but in terms of philosophy, certainly compared with Canada and America for example, that is simply how it is. And on the other hand we also have to realise that it can be very different, and in many countries is very different. And the most extreme example of that is of course the developing countries, where the basic infrastructure is simply still completely lacking, and Doris will tell you about that after the break. We will take a short break, thank you.

Annex D2. Transcript lecture CT3011 (sorted)

After ample time to walk in, we can begin with the second part of 30-11, Water Management. I will go through the part on sanitary engineering with you over the coming seven weeks. And I thought I would first introduce myself to you, so, my name is Hans van Dijk, as you can see there, and I thought, let me take two things for that, my hobby and my work. Well, the hobby you can see: I am a marathon runner. A nice photo of the glorious finish in Rotterdam in April this past period. Marathon runners are all rather fanatical types, you know, real go-getters, who train every day. They manage to organise their lives in such a way that it is all possible.
So I also run a lap here every day at lunchtime, to the Delftse Hout, or along the Schie, or some other route around here. If you ever see me going about in shorts or a tracksuit, then that's right, that's me. And I now do that with a whole group of people from our department, with students and PhD candidates. And one of those students is shown here, Karin Teunissen. Three years ago she was sitting here in Introduction to Water Management. She was a third-year student then; she has since graduated and started PhD research at the dune water company in Scheveningen. And she has become a fanatical runner too, and so in April we ran 42 kilometres together. Well, that is a memory that will remain etched in both our minds. Then the work. I have, I am, yes, a question, are you a runner too? - Sorry? Are you a runner too? - Uh, well, I do have a question, but I think this lecture has already been given. No. - No? Then I don't know how I already knew this, but... Well, I do say this more often, so that could well be. Where were you? - Well, I think last year, but... Yes, of course, last year we gave 30-11 too, that's right, haha. But this photo really is from April, so it is fairly recent. What might be the case is that I also always give one of the guest lectures in Introduction to Civil Engineering in the first year. And there I also start a bit with, well, who am I, so it could well be that you remember it from there. Well, then you still know that I graduated here 30 years ago. I studied Civil Engineering too at the time, and graduated in '76. After that I went to work for an engineering consultancy, DHV in Amersfoort, and I can heartily recommend that to you, once you have graduated, to go and work for an engineering consultancy. It is a fantastic experience; you work on all kinds of projects all over the world. In my case drinking water projects.
So designing treatment plants, building systems, and doing research as well. You can really go in any direction at an engineering consultancy, and the Dutch consultancies are reasonably successful, also on the international market nowadays. Yes, I worked there for many years, until at some point, by now that is already 17 years ago, there was an advertisement saying we were looking for a professor here in Delft. And then I thought, well, let me just write a letter, you never know, nothing ventured, nothing gained. So I wrote a letter and I thought, I probably won't get it, but I did. So there is a first life lesson in that straight away: just try something and it may well turn out better than expected. Initially I then became a part-time professor of drinking water supply here for one day a week; that is my chair. And yes, gradually one thing leads to another, you are asked for more and more things. So I gradually started doing more things here in Delft and kept reducing my appointment at DHV, and since 1999 I have stopped at DHV completely and I am a full-time professor here. And being a full-time professor also means that you have tasks in education on the one hand, research on the other, but also management, so management, yes, then you have to, I am head of a department and so on and then you sit on the management team or on the programme committee. You have to join in discussing and deciding on general matters. You could of course make a full-time job of that; I have always avoided it. I still find it the most enjoyable to be working on the subject itself, and that brings me to the second picture shown here, because the most enjoyable thing of all is really supervising graduation students. You are going to go through that process in the coming years.
For us it is always tremendously enjoyable to see how students transform from more or less anonymous figures sitting in the lecture hall listening. More or less absorbing what I am conveying in a monologue. Although I do very much appreciate reactions from you, mind you, and I will explicitly ask for them from time to time. But anyway, in practice at this stage of the programme you are still mainly listening, and it really gets more and more fun as you progress into the fourth and fifth year, and the highlight is of course graduation, where you really take on a subject entirely yourself. I always say to my graduation students, you have to make your graduation project your calling card, right Doris, and it really works that way. By the time you have finished that graduation topic, you know the most about that topic. More than anyone else in the Netherlands. We prove that time and again at the graduation colloquia. We give those a lot of publicity; people always come from the water companies, from KIWA, from other research institutes. They take part in the discussions, and our graduates manage to answer all the questions time after time. Maybe not always 100% correctly, but still 99% correctly. That is always a pleasure to witness. I also always say that I am proud of my graduation students, and that is true. I have had about 80 by now and sometimes it goes very well, as shown here with Karin and Doris, Doris is here in the room by the way, who both even graduated cum laude this past year. That means you did very well, got high grades, and also did the graduation project very well. Yes, for us that is simply wonderful to experience. To see how young people come to enjoy the field, become enthusiastic themselves, and start to leave their mark on our discipline.
And I hope that some of you will get that far too. Right, so much for myself. Now as for this course. We are going to do it on the basis of the book, which is already listed on Blackboard. We have a Dutch and an English version of it. You have to buy that book from our secretary, Mieke on the fourth floor, for 25 euros. In the shop it costs 50 euros, but we have a special discount arrangement. You may decide for yourselves whether to buy the Dutch or the English book. The content is virtually the same and in any case sufficient for this course. If you want my advice, I would say, if you can read English well, buy the English book; it is slightly more up to date and contains slightly more information, but the Dutch book is certainly sufficient for this course. Yes, such a book of course, apart from the fact that we will ask questions about it in the exam, I will come back to that on my next slide, also has a certain function as a reference work. Once you have such a book, you have it with you; you take it along after graduating too. If you then have to design a plant somewhere in a foreign country, you take the book out of your bag again and then you know a thing or two again. A book like that has that function too. There are also exercises in the book, and we also have exercises on Blackboard. You may already have seen those, computer assignments. That is not compulsory by the way; nothing is compulsory with us. Yes, in the end you have to take the exam, but we offer material, so make use of it, I would say, but we are not going to check. There are questions on Blackboard, there are questions in the book, the answers are included too, or at least, once you have done the computer assignment you are told afterwards which questions were right and which were wrong.
So that is support for you in getting acquainted with the material and learning the subject matter. And we have old exams there too, so you can practise and see roughly what is asked. And then we will be giving lectures in the coming period. Oh yes, about the book: you do not need to know the whole book. The book is used both for 30-11 and for the follow-up course 34-20, which is an elective for people who go on to do water management, and the chapters that will be examined for 30-11 are indicated here. And this presentation will be put on Blackboard again, as you know, including this video recording. Then we will give these lectures, so 7 times in the coming period starting now, and this year I want to do it so that in the first hour I outline the main thread of the topic concerned. The most important points; I try to give it some colour. What is important and what is less so. And in the second hour I will each time have one of the PhD candidates, today that is Doris, who will then talk about their own topic, their own research, their own project, which brings in some topicality, and colour, depth, to the topic concerned. And I have organised it so that, if all goes well, these fit together well each time and give you a good picture of the material, so that you can also do the exam easily later. That does not mean that all parts of the PhD candidates' talks are exam material. We will indicate that here and there. Yes, PhD research of course goes much deeper than you need to know now in the third year, but it is more about forming a picture, the colour and the understanding of the material. Then we have planned an excursion to the Berenplaat, the large treatment plant near Rotterdam, near Spijkenisse to be precise, on 11 October. That is not compulsory either; everything is optional with us.
About 60 people have signed up for it so far. Registration closes on 1 October, we said, because the water companies nowadays also have strict security requirements and so on since the attacks in New York. You have to state precisely who is coming, by name and so on, and we have to vouch for it too, that nothing unpleasant happens, and of course buses have to be booked and we are served lunch there. So the people who have signed up will get an e-mail soon, shortly after 1 October, with a confirmation, and those who have not signed up will not be coming along. And I also assume that those who have signed up will actually come, right; it is of course a bit awkward towards the organisers if we were to arrive with far fewer people than we registered. We will try, I have had some questions about there being compulsory practicals in the afternoon, for structural engineering and statistics I believe, so we will try to be back in time. That will certainly not be at half past 1, so I think we will be back at about half past 2, and we will simply leave after the lecture on Thursday, so at half past 10. I don't know whether, let me see whether I am going to start already, yes I am going to start already, so, are there any questions about the organisation and this general introduction? Okay. Well, then I will briefly say something about sanitary engineering, which will sound familiar to you too because I already told that in the first year, and then I will say a bit more about the drinking water supply of the Netherlands, and after the break Doris will say something about drinking water supply in developing countries, because that is what she mainly works on. So we had sanitary engineering.
Well, none of you will be unfamiliar with the fact that this is about the urban water cycle, so the infrastructural works for the supply of drinking water: abstracting groundwater, abstracting surface water, treating it, then transporting it through a whole system of transport mains and distribution mains to all of us. To households and industries, businesses, then collecting the wastewater via the sewer system. Treating that wastewater, which is then discharged again into the surface water. So all the infrastructural works concerned with that small urban water cycle, that is what we call sanitary engineering, and here I will focus mainly on the drinking water supply, because that is also where we see the clearest effect, as shown in this figure. The disappearance of infectious diseases in the Netherlands, because they are no longer transmitted via contaminated drinking water. In the rest of the world that is of course still a very different situation, but here we have had a great deal of success with it in the 20th century. We see here a graph showing the decline in deaths from typhoid fever in the 20th century, and that runs parallel to the percentage of people not connected to the drinking water supply; in that same period the drinking water supply was constructed in the Netherlands. Around 1900, even shortly before 1900, the big cities, and gradually the smaller towns and the countryside too, and from about 1975, let's say, everyone in the Netherlands has been connected to the drinking water supply, and infectious diseases transmitted by contaminated drinking water no longer occur. So for us it is about infrastructural works for good water quality, so matters such as water abstraction, water treatment, water transport, water chemistry and microbiology too, that water quality.
Microbiology: on the one hand the absence of organisms that can make us ill, but on the other hand also the use of micro-organisms to optimise treatment. Micro-organisms can also break down contaminants; the best-known example is wastewater treatment, where with the help of oxygen and activated sludge, which is a mixture of bacteria, we have the waste substances in the wastewater broken down. So water quality, water chemistry and microbiology are fairly important in this part of civil engineering. Of course we also make use of the general knowledge of civil engineers, in particular things like hydraulics, hydrology, structural engineering, structural design, project realisation, informatics; these are all things you need in projects. Often in a team too, at an engineering consultancy for instance. One person is more involved with automation, another more with the structural part, a third with the hydraulics, and depending on the specialisation you choose you can play a different role in that. Sanitary engineering is of course of great importance for public health; that goes without saying. It concerns relatively large-scale infrastructural works; here we see the so-called Biesbosch reservoirs. That is in the Brabantse Biesbosch. Reservoirs built for the drinking water supply, and it concerns a well-organised sector with clear tasks. There is even separate legislation for it, the Waterleidingwet (Water Supply Act), when it comes to drinking water supply, which states precisely what everything has to comply with, and that the director of the water company is personally liable for it. He risks a prison sentence if he distributes insufficient water, or water that can make you ill. So that is all well organised.
And we do a lot on that in Delft, so my chair, the chair of drinking water supply, is the only chair in the Netherlands in the field of drinking water supply. So that is nice, it gives us a certain exclusivity. Many of our students therefore, well, they hold leading positions in the profession; they are directors or staff officers or designers at the water companies, and many of our engineers also go to the engineering consultancies. Well, a lot goes on there; occasionally we even have guest lectures by Willem-Alexander, who also finds water management interesting. Well, then finally I have three more slides, before I say more about the set-up of the infrastructure in the Netherlands, that illustrate a bit of the work in our field. So you have seen this picture before too, right, so you can now tell me where this little dip comes from? - I think it has something to do with the break. Yes exactly, so this is water consumption during mass events, in this case the football World Cup, and now you see that we all behave like herd animals. From the start of the match, which is here, water consumption drops enormously. Nobody uses water any more, everyone is in front of the TV, watching. Until half-time, then we all run to the toilet and the coffee machine, and we get an enormous rise in water consumption. In the second half everyone goes back to watching, and we again see very low water consumption, with even a minimum shortly before time when that decisive goal, in this case by Dennis Bergkamp, was scored, and at the end of the match everyone runs to the toilet again. And you see the same thing in industrial consumption, right. Even there operators and so on are watching too, and everything is running more or less at half power.
That is a hugely reproducible phenomenon, these curves, a kind of electrocardiogram of our behaviour. The behaviour of the population. And this little dip we indeed call the Cruijff dip. That is the moment during half-time when Cruijff comes on to give his commentary. Then everyone runs back from the toilet for a moment to hear what Cruijff has to say, and often the goals are replayed then as well, and we all watch the TV again for a moment. We are of course mainly engaged in design. It is often about new infrastructural works. The construction of a pumping station, the design of a treatment plant and a transport main, and you have already had a lot about design of course in project education and design education. It is schematising. Building a certain framework in your head of how something fits together. So a filter, how do we schematise that, and how does the water flow through a plant. The hydraulic grade line. We have to make a certain schematic of that. We have to be able to apply formulas to it. We have to be able to calculate it. And above all we must not make mistakes in doing so; that is what this picture is meant for. One of the carbon filter installations of the Rotterdam drinking water supply, at Kralingen, by the Brienenoord bridge, where at the time water hammer once occurred, resulting in the implosion of that carbon filter, and that is of course very unpleasant. Often you then also run into Murphy's law, that everything that can go wrong will go wrong at some point. So water hammer is the phenomenon that when, for example, a pump trips, a negative pressure wave can arise, and that underpressure can indeed lead to implosion. Well, you can of course prevent that by fitting an air release and air inlet valve.
Dat is hier ook gedaan, bovenop dat koolfilter zat zo'n ventiel, maar helaas was het net op het moment dat die pomp hier uitviel, ten gevolge van een stroomstoring, was het ook winter en was het een hele strenge vorst en was dat ontluchtingsventiel bevroren, waardoor er geen lucht meer kon toetreden en er dus toch vacuum ontstond in dat vat, en ja, dit resultaat optrad. Dus ontwerpen is vooral ook bewust zijn van dingen die mis kunnen gaan, vandaar ook dat hydraulica ook vrij belangrijk is. Het is natuurlijk heel vervelend als het water ergens uit spuit of de verkeerde kant op gaat, dus je moet vooral ook steeds alert zijn op dingen die fout kunnen gaan en ontwerpen is vooral ook ervaring. Dingen gezien hebben, hoe doe je het in de praktijk nou? Vandaar ook dat we die excursie gepland hebben naar de Beerenplaat toe, dan kunnen jullie voor de eerste keer vast eens even kijken van, ja, hoe ziet zo'n installatie er nou uit, waar moet je nou allemaal rekening mee houden? Alright, nou, nog een paar plaatjes van een ander project, in Limburg in dit geval, waar een grote transportleiding is aangelegd bij een oppervlaktewaterproject in Panheel. Dat was in het kader van de zogenaamde verdrogingsdiscussie. Dat is een discussie die in Nederland een aantal jaren gevoerd is, onder andere door de winning van drinkwater gaan de grondwaterstanden omlaag en treed er verdroging van natuurgebieden op. Dus er is toen hier in Limburg gezegd, een jaar of 10 geleden van, nou we moeten de grondwaterwinning gaan verminderen en overgaan op de Maas. Die stroomt tenslotte door Limburg heen, dus dat is vrij makkelijk. Toen is er hier een spaarbekken aangelegd. Nou aangelegd, dat was een oud grindgat. Dus er was daar grind gewonnen, dus die put was er toch al. Die is gevuld met Maaswater. Dat Maaswater gaat vervolgens vanuit dat bekken, dat zien we hier, zakt dat vanzelf de grond in. 
Dat noemen we infiltratie, kunstmatige infiltratie, dat water zakt de grond in waarbij er alvast een heleboel kwaliteitsverbetering optreed. Allerlei stoffen die worden afgefiltreerd tussen het zand van de ondergrond, en de bacterien gaan dood door de lange verblijftijd. Dus je krijgt al een aanzienlijke verbetering van de waterkwaliteit. Dan wordt het water weer opgepompt met behulp van putten, die dan op een bepaalde afstand rond dat bekken zijn opgeplaatst. Dus dan win je eigenlijk een soort kunstmatig grondwater. Je maakt dan eigenlijk van het Maaswater, wat natuurlijk allerlei bacterien en virussen en andere verontreiningen bevat, maak je een soort kunstmatig grondwater. Dat wordt dan weer gewonnen en het wordt vervolgens nog gezuiverd in de zuiveringsinstallatie die we hier zien weergegeven. En dan ging het dus met die transportleiding door heel Limburg heen, naar de verbruikers toe. En tenslotte doen we natuurlijk ook onderzoek, vooral hier op de TU. Als je bij een ingenieursbureau werkt, nou dan heb je niet zoveel onderzoek nodig, dan gebruik je meestal vuistregels en ontwerpcriteria, maar het vakgebied ontwikkelt zich natuurlijk ook steeds verder, er zijn iedere keer weer nieuwe bedreigingen. Momenteel bijvoorbeeld nogal in het nieuws, het voorkomen van geneesmiddelen in de Rijn. De pil die is aantoonbaar in concentraties in de Rijn aanwezig, en komt dat nou ook in het drinkwater terecht en wat moeten we daaraan doen. Moet de zuivering weer uitgebreid worden? Dat soort vragen die leven. En dan zijn we dan met onderzoek bezig. Onderzoek dat gebeurt vaak ter plaatse bij ons. Dit is een plaatje van het veldpracticum in Luxemburg. Zal Huub Savernije misschien afgelopen maandag ook wat over verteld hebben, maar dat is ook heel relevant omdat het ene water het andere niet is. Water is een natuurlijke stof en de verontreinigingen en de stoffen waar het om gaat, ja dat is afhankelijk van de bron. 
De interactie, de lozing van stoffen die eventueel plaats gevonden hebben. Interactie met de bodem, bladeren en natuurlijke afvalstoffen die in het water terecht komen. Dus ieder water is weer anders en je moet het bij voorkeur ter plaatse doen. Het is niet zo goed mogelijk om te zeggen van, nou ja, ik doe in het laboratorium maar proeven. Nee, je hebt toch altijd weer de toets nodig van de praktijk. Gedraagt het water zich in de praktijk ook zoals we dat theoretisch denken. Sommige dingen gebeuren natuurlijk wel in het lab. Er is hier ook een laboratorium Stevin 3, het waterlaboratorium, waar allerlei opstellingen staan. Filters, bezinkinstallaties, andere proefopstellingen, en daar krijgen jullie later, zullen jullie daar zelf ook practicum doen als je in deze richting door gaat. En uiteindelijk kun je zelfs een promotieonderzoek doen, en in de aula de doctorsbul uitgereikt krijgen. Goed. Dan heb ik nog een kwartier als ik het goed heb. Ja, en die kan ik goed gebruiken voor een stukje om eens even een eerste verhaal vast te geven van wat is er nou bijzonder aan de drinkwatervoorziening in Nederland? Wat moeten jullie daar nou van weten. En ik maak daar gebruik van een presentatie die ik vorig jaar gegeven heb in Canada, voor de Canadese waterbedrijven, en daar is het weer heel anders. Dus ik heb daar ook echt mijn best gedaan om een beetje duidelijk te maken van, wat is er nou bijzonder in Nederland? En wat zouden jullie in Canada daar nou aan kunnen hebben? En ik denk dat dat voor jullie ook een aardige introductie zou kunnen zijn in het vakgebied. En eigenlijk is dat trouwens al heel kernachtig weergegeven met dit plaatje. Dus dat is een plaatje van een kindje, Joey, die het water uit de kraan drinkt en eigenlijk, zoals dat plaatje hier weergeeft, vertrouwt he. Dus het water moet zo goed zijn, dat je er volledig op kunt vertrouwen. Dat je het zelfs je kinderen laat drinken en dat het boven elke verdenking verheven is. 
Dat is eigenlijk de kern van de filosofie van de drinkwatervoorziening in Nederland. Dat is natuurlijk ook bij andere landen in zekere zin wel het geval, maar toch veel minder. Ik weet niet of jullie in Amerika en Canada en dat soort landen geweest zijn. Daar is het eigenlijk meer zo dat men het drinkwater, dat heet daar ook tapwater, kraanwater, dat is meer iets dat gebruik je voor de wasmachine, en de WC, dat doen wij ook hoor, maar drinken doe je dat eigenlijk niet, in Canada en Amerika. Als je water wilt drinken, dan ga je een fles kopen bij de supermarkt. Of je zet nog een filter op je kraan, om het water na te zuiveren. En dat noemen we, het consumentenvertrouwen is in die landen dus veel minder dan in Nederland. En dat heeft voor een deel te maken met, ja, cultuur en traditie. In Europa zijn we gewend dat de overheid dingen goed regelt, en in Amerika zijn ze dat veel minder gewend. Daar overstroomt gewoon heel New Orleans, en dan gaan we het weer eens opnieuw opbouwen enzo. Dat doen wij in Nederland ook niet. En zo is dat met drinkwater ook zo. In Nederland is het zo dat we vinden dat we absolute zekerheden moeten hebben dat dat drinkwater wat uit de kraan komt, dat dat er A altijd is, leveringszekerheid, en B dat het altijd goed is, zodat onze kinderen het met een gerust hart kunnen drinken en wij zelf ook. Een plaatje met een aantal kernbegrippen vast, het waterverbruik, het feit dat we gebruik maken van grondwater en oppervlaktewater voor de drinkwatervoorziening. Grondwater is ook in Nederland vaak nog van een hele goede kwaliteit. Een beetje geillustreerd aan dit plaatje van de Veluwe, waar we regenwolken zien, en je kunt je wel voorstellen, als die regen daar op dat enorme zandoppervlak van de Veluwe stroomt. Ja, dat water wordt heel goed gefiltreerd en dat grondwater wat je daar wint, dat is van hele goede kwaliteit natuurlijk, dus grondwater is over het algemeen goed. 
Er zijn best ook wel zorgen over hoor, zoals hier en daar hebben we natuurlijk, ik geloof niet te weinig zelfs, vuilnisstortplaatsen, en die kunnen het grondwater ook weer verontreinigen. En boeren die gebruiken natuurlijk mest en bestrijdingsmiddelen en dat kan uiteindelijk ook in het grondwater terecht komen, maar gemiddeld gesproken is grondwater toch van een prima kwaliteit, en dus kunnen we ook volstaan met een eenvoudige zuivering. Beluchting en zandfiltratie, daar komen we nog op, dat is meestal wel voldoende. Oppervlaktewater daarentegen, dat is juist het andere eind van het spectrum zou je kunnen zeggen. We zitten in Nederland bij het afvoerputje van Europa. De Rijn en de Maas die zijn door Frankrijk en Duitsland en Belgie gestroomd. Al dat afvalwater is erop geloosd. Dus het oppervlaktewater bevat een volledige cocktail aan alle stoffen die je je kunt voorstellen. Dus oppervlaktewater moet zeer uitgebreid gezuiverd worden. Dat doen we ook in Nederland. Wordt in het buitenland wel eens aangeduid als double Dutch threatment. We hebben heel veel zuiveringsprocessen achter elkaar, om er maar zeker van te zijn dat dat water uiteindelijk toch goed is. En heel bijzonder in internationaal verband, we gebruiken geen chloor. Amerikanen die vinden het vanzelfsprekend om chloor te gebruiken, het drinkwater smaakt ook naar chloor daar, ruikt ook naar chloor daar. Dat vinden Amerikanen volkomen normaal. En in Nederland zeggen we, nee, dat willen we niet. In de eerste plaats is daar een inhoudelijke reden voor, namelijk we weten dat als je chloor toepast, dat gaat reageren met bepaalde stoffen die van nature in water voorkomen, organische verbindingen, en dan krijg je bepaalde desinfectie nevenproducten noemen we het. Chloroform is het meest bekende voorbeeld. En dat zijn dus ongewenste stoffen. Dat zijn stoffen die giftig kunnen zijn. 
Nou, daar kun je wel van zeggen van, ik kan daar een bepaalde norm voor stellen, en misschien kan ik net nog aan die norm voldoen, maar dat vinden we in Nederland al voldoende. We zeggen nee, dat zijn ongewenste stoffen, die willen we gewoon niet hebben. Dus we willen chloor gewoon niet gebruiken. Dat is een bepaald essentieel uitgangspunt, wat ook heel veel consequenties heeft hoor, maar wat in Nederland al meer dan 30 jaar gehanteerd wordt, en daar is ook veel aan gedaan, veel onderzoek aan gedaan. Dus dat is denk ik een belangrijk punt al voor dit eerste college om even vast te houden. Chloor leidt gewoon tot die giftige verbindingen en dat moet je daarom niet willen. In ieder geval hebben we dat in Nederland besloten, dat we dat niet willen, en dat doen we dus ook niet. Er is nog een praktisch ander aspect en dat is dat water met chloor naar chloor smaakt, en dat vinden we in Nederland ook niet fijn. We vinden toch dat water wat uit de kraan komt, dat moet lekker smaken, dat moet niet zo'n vieze chloorsmaak hebben. Dat is een zwembad chloorsmaak. Dat willen we voor drinkwater niet. Dat is waarschijnlijk ook weer een van die aspecten die met cultuur en consumentenvertrouwen samenhangen. We hebben een aantal principes die we in Nederland gebruiken, en ik heb daar een stuk of 3, 4, sheets voor om die kort even de revue te laten passeren. Nou die focus op het gezondheid daar heb ik al voldoende over gezegd. In Nederland is het ook zo dat we relatief grotere bedrijven hebben, die een soort mengsel zijn van publiek en privaat. Het zijn NV's. Evides, het waterbedrijf wat hier is, dat is een NV, maar de aandelen zijn in handen van de gemeente Rotterdam en de provincie, en andere gemeenten in het voorzieningsgebied. Dus het is eigenlijk een soort overheid, maar net weer niet. Semi-overheid, en dat geeft ook iets bijzonders. Er is vorige week ook een nieuwe hoogleraar benoemd bij TBM, die daar een heel verhaal over heeft, dat dat eigenlijk een ideale formule is. 
Dat je op die manier zeg maar de waarde van water, water is toch iets wat niet zomaar een marktgoed is, wat je niet zo makkelijk kunt reguleren, zoals andere, zoals auto's en andere dingen, dus water heeft ook iets te maken met, het is van ons allemaal, we moeten het zorgvuldig beheren, en zo'n publieke verantwoordelijkheid binnen een privaatrechtelijke organisatie, een NV, die dus wel efficient werkt, ja, dat is wel iets wat een zekere aantrekkelijke kant heeft. En typisch voor Nederland, we polderen hier heel graag, dus we doen graag dingen samen. Dus die watersector in Nederland is heel goed georganiseerd. Die heeft een gemeenschappelijk researchinstituut opgesteld, KIWA, waar het speurwerk voor de waterbedrijven wordt uitgevoerd, en die hebben een belangenorganisatie opgericht, de VEWIN. En die hebben ook een personenvereniging, de KVWN, waar, als jullie hierin doorgaan, zul je daar allemaal lid van worden. En ja, er is een heel wereldje waarin er goed samengewerkt wordt enzo, informatie uitgewisseld. Dat is ook wel iets bijzonders van Nederland. Is in Nederland ook makkelijker dan in Amerika natuurlijk he. In Amerika kun je niet zo makkelijk even samenwerken tussen Los Angeles en New York. Dat gaat in Nederland allemaal wat makkelijker. Als we naar de opzet van de infrastructuur kijken, dan zijn dit essentiele kenmerken. Om te beginnen de bescherming van de bron, daar moet alles natuurlijk mee beginnen. Niet het paard achter de wagen spannen en met vies water beginnen. Nee, begin altijd met een zo schoon mogelijke bron, en zorg dan ook dat die bron schoon blijft. Daarom zie je overal, in de duinen en in de bosgebieden, zie je van die bordjes staan met grondwaterwinning, grondwaterbeschermingsgebied. Ja, niet verontreinigen als je het kunt voorkomen. Gebruik grondwater als het mogelijk is. Dus in het hele blauwe gebied, het noorden, het oosten en het zuiden van Nederland, wordt alleen maar grondwater gebruikt voor de drinkwatervoorziening. 
Daar is grondwater beschikbaar, dat is van goede kwaliteit. Dat is microbiologisch betrouwbaar, dus je zal er sowieso nooit ziek van worden, dus dat is de voorkeursbron, die gebruiken we dan dus ook. Nou, in het westen van Nederland kan dat natuurlijk niet. Dat weet jij ook he? - Eeh... Waarom niet? - Eeh, dat weet ik niet meer. Dat ben je vergeten. Maar iemand anders weet het misschien wel, want gezond boeren verstand, kun je ook een hoop mee he. Dus gewoon even nadenken, het is helemaal niet zo moeilijk. - Zeewater. Zeewater, zout, precies he. Dus het grondwater hier is zout. Ontzouten is heel erg duur, dus dat is eigenlijk niet praktisch. Dus ja, hier kun je geen grondwater gebruiken, dus gebruiken we maar oppervlaktewater. En dan doen we dat bij voorkeur, hier in het hele duingebied, door van dat oppervlaktewater, kunstmatig grondwater te maken via die infiltratie. Dus we pompen oppervlaktewater de duinen in, laten dat de bodem in zakken, en dat wordt het een soort kunstmatig grondwater. Wil jij een vraag stellen? - Ja, want ik zie op de waddeneilanden wordt wel grondwater gebruikt, - maar dat is in principe ook een duingebied. Ja. - Maar daar zou toch ook zout in het waterwingebied voorkomen? Ja, dan moet ik even iets meer zeggen dan. In het hele duingebied geldt eigenlijk dat er door, als gevolg van eeuwenlang regen die op die duinen gevallen is, dat er een zoetwaterbel op het zoute water drijft. Dus als je heel voorzichtig dat water wint, kun je wel zoet water winnen. Dat kan snel fout gaan hoor, dus daar moet je echt wel mee oppassen, maar dat kan net wel. En zo is ook de duinwaterwinning in het westen van Zuid Holland en Noord Holland begonnen, in de 19e eeuw, door gewoon eerst duinwater op te pompen. Op een gegeven moment is het daar fout gegaan, kregen ze zout water, en toen zijn ze met dat infiltreren van oppervlaktewater begonnen. 
Nou, als het helemaal niet anders kan en dat is nou net bij Rotterdam het geval, en daarom gaan we ook bij die Beerenplaat kijken, daar moet je oppervlaktewater gebruiken. Rotterdam heeft geen grondwater, Rotterdam heeft ook geen duinen, dus je moet daar oppervlaktewater gebruiken. Dan moet je dus een hele uitgebreide zuivering hebben, dus dat is ook niet eenvoudig. En daar gaan we kijken. Dus bronbescherming, dat is toch nummer 1. We zien ook regelmatig dat de waterbedrijven berichten in de krant zetten van, deze stof moet verboden worden, hier moeten beperkingen aan gesteld worden. Gewoon zorgen dat wat goed is, goed blijft. Nou, grondwater dat plaatje. Ik denk dat dit toch wel erg illustratief is. Als je een mental map hiervan maakt, van grondwater, dat is eigenlijk regenwater wat door een gigantisch zandfilter gestroomd is. Nou, dat is goed. Oppervlaktewater, nou, bijvoorbeeld bij Scheveningen hier. Er is gewoon in de natuurlijke duinvalleitjes, pompen we Maaswater. Wel na voorzuivering overigens hoor, want anders verstoppen die duinvalleien meteen, en dan verontreinigen we het duinmilieu, dat willen we natuurlijk niet. Dus het water wordt eerst voorgezuiverd, dan de duinen in gepompt, dan zakt het de bodem in. Dan winnen we het weer terug met putjes die her en der in die duinen ook geplaatst zijn. Dan gaat het naar de nazuivering toe, die we hier zien staan, en dan vervolgens het distributienet in. Oppervlaktewater hebben we meervoudige barrieres. Op dit moment volstaat eigenlijk om zo'n stroomschema te zien, en te zien dat daar een heleboel stappen achter elkaar zitten. We hebben gewoon veel afzonderlijke zuiveringsprocessen. Enerzijds om er zeker van te zijn dat als de ene iets minder werkt, dat de andere het wel opvangt. Veiligheid, robuustheid, is heel belangrijk. En anderzijds, om verschillende soorten stoffen met verschillende zuiveringssystemen tegen te kunnen houden. 
Dus het gaat altijd om vrij uitgebreide zuiveringsschema's als het over oppervlaktewater gaat. Daar gebruiken we ook moderne technologie bij. Dus dat zijn dan weer ontwikkelingen die de afgelopen decennia zeg maar mogelijk geworden zijn. Hier zien we de membraanfiltratie-installatie bij Heemskerk. Dat is de modernste en grootste zuivering van dit type in Europa. Is in Nederland ontwikkeld. We zien hier de desinfectie met UV licht. Dus dat zijn eigenlijk gewoon TL buizen zou je je kunnen voorstellen, maar die stralen dan UV licht uit. En bacterien die kunnen daar niet tegen, die gaan daar dood van. Dus dat is een goede manier om desinfectie van dat water te bewerkstelligen. Nou, dat is 2 jaar geleden geopend in aanwezigheid van de Prins en ook dat is weer een Nederlandse ontwikkeling, om de zuivering weer beter te krijgen. Nou, het resultaat daarvan is dan dat we dus... Aan de kraan, als we die openzetten, dan komt er water van een hoge kwaliteit uit he, zuiver water, wat geen verontreinigingen bevat, en ook geen chloor. Wat ook zacht is. Het water is ook onthard in Nederland. Daar komen we nog later op terug. En uiteindelijk is het resultaat mede daardoor, dat we in Nederland ook helemaal geen flessenwater gebruiken. Althans, heel weinig. En dat is uiteindelijk weer, als je er macro-economisch naar kijkt, of zelfs naar de individuele klant, is dat gewoon een hele verstandige zaak, want flessenwater is vele malen duurder dan drinkwater. Het is 500 keer duurder. Het is ook veel slechter voor het milieu. Het milieubeslag van flessenwater, daar zijn eens een keer sommetjes van gemaakt, met ecopunten enzo, van die flessen moeten allemaal over de weg vervoerd worden met vrachtwagens, en die moeten weer schoongemaakt worden enzovoort. Als je die sommetjes maakt, dan is het milieubeslag van flessenwater 30 keer zo hoog als van drinkwater. 
Dus als je even op de achterkant van een sigarendoos een sommetje maakt, van wat nou de Nederlander voor water kwijt is, en de Italiaan, dan is de Italiaan 2 tot 3 keer zoveel kwijt voor water dan de Nederlander. En dat zit hem vooral in het feit dat men flessenwater gebruikt. De drinkwatervoorziening zelf, de kosten daarvan, zijn min of meer vergelijkbaar, want dat is ook een kenmerk van dit soort grootschalige infrastructuur. Om iets goed te doen, meervoudige zuiveringen, veilige systemen maken, dat is niet zo heel veel duurder dan om het slecht te doen. Want het merendeel van de kosten zit er toch in dat je moet beginnen met een winning te maken, je moet een zuivering hebben, je moet transportleidingen, distributieleidingen, heel veel van die kosten die heb je sowieso. En als je het goed doet, is niet veel duurder dan als je het slecht doet. Ik denk dat ik er ben, oh ja, we hebben ook nog andere dingen, dus we hebben het laagste lekpercentage van de wereld, en hele betrouwbare systemen en we letten tegenwoordig natuurlijk in Nederland op waterbesparing. Water is toch een natuurlijke grondstof, dat moet je niet verspillen. Dus het waterverbruik in Nederland stijgt niet, is relatief constant, en het huishoudelijk waterverbruik daalt zelfs, doordat we tegenwoordig waterbesparende toiletten en douches en wasmachines enzo hebben, en die worden ook allemaal gestimuleerd, krijg je subsidie op enzovoort. We zijn er allemaal verantwoord mee bezig. En dan hebben we de laatste dia. Dus het uiteindelijke resultaat van die hele filosofie en de dingen die daaraan gedaan zijn de afgelopen 30 jaar, is dus dat we zeggen van, we hebben het wonder uit de kraan. Dat was een reclamekreet van de waterbedrijven een aantal jaren geleden. Werd toen posters van gemaakt en reclame op radio en TV. Het wonder uit de kraan, heel goed water, het is er altijd. We worden er niet ziek van, geen verontreinigingen. We hebben geen flessenwater nodig. We hebben geen filters aan de kraan nodig. 
We verspillen het water niet. Dus we hebben de zaken goed voor elkaar. Nou, dat is enerzijds natuurlijk een beetje een gechargeerd beeld. Er zijn best wel dingen die nog beter kunnen en beter moeten, en daar komen we ook wel op terug, maar qua filosofie zeker in vergelijking met Canada en Amerika bijvoorbeeld, is dat gewoon zo. En aan de andere kant moeten we ons ook realiseren dat dat natuurlijk ook heel anders kan, en in heel veel landen ook heel anders gaat. En het meest extreme voorbeeld daarvan zijn natuurlijk de ontwikkelingslanden, waar gewoon de basisinfrastructuur nog volledig ontbreekt, en daar gaat na de pauze Doris over vertellen. We gaan even pauzeren, bedankt.

Annex D3. Partial transcript lecture CT3011 (incl. time frames / sentence)

1
00:00:00,100 --> 00:00:03,300
Na een ruime inlooptijd

2
00:00:03,300 --> 00:00:06,800
kunnen we beginnen

3
00:00:06,800 --> 00:00:11,700
met het tweede deel van 30-11, Watermanagement.

4
00:00:11,700 --> 00:00:14,300
Het deel over gezondheidstechniek ga ik

5
00:00:14,300 --> 00:00:18,600
de komende zeven weken met jullie doornemen.

6
00:00:18,600 --> 00:00:21,400
En ik dacht, ik zal me eerst eens even aan jullie voorstellen, dus,

7
00:00:21,400 --> 00:00:25,000
mijn naam is Hans van Dijk, zoals jullie daar zien staan

8
00:00:25,000 --> 00:00:26,900
en ik dacht, laat ik daar maar twee dingen voor nemen,

9
00:00:26,900 --> 00:00:29,000
mijn hobby en mijn werk.

10
00:00:29,000 --> 00:00:30,800
Nou de hobby dat zien jullie,

11
00:00:30,800 --> 00:00:32,600
ik ben een marathonloper.

12
00:00:32,600 --> 00:00:37,400
Een mooie foto van de glorieuze binnenkomst in Rotterdam

13
00:00:37,400 --> 00:00:40,300
in april afgelopen periode.

14
00:00:40,300 --> 00:00:43,100
Marathonlopers dat zijn allemaal een beetje fanatieke lui he,

15
00:00:43,100 --> 00:00:46,600
echte doordouwers, die trainen iedere dag.

16
00:00:46,600 --> 00:00:50,800
Die weten hun leven zodanig te organiseren dat dat allemaal kan.

17
00:00:50,800 --> 00:00:54,000
Dus ik loop hier ook iedere dag tussen de middag een rondje naar

18
00:00:54,000 --> 00:00:55,800
Delfts hout, of langs de Schie,

19
00:00:55,800 --> 00:00:59,400
of een ander parcour hier.

20
00:00:59,400 --> 00:01:03,100
Als jullie me eens een keer in korte broek of trainingspak zien lopen

......................
......................

770
00:44:26,200 --> 00:44:28,800
We hebben geen filters aan de kraan nodig.

771
00:44:28,800 --> 00:44:30,700
We verspillen het water niet.

772
00:44:30,700 --> 00:44:33,000
Dus we hebben de zaken goed voor elkaar.

773
00:44:33,000 --> 00:44:35,700
Nou, dat is enerzijds natuurlijk een beetje een gechargeerd beeld.

774
00:44:35,700 --> 00:44:38,900
Er zijn best wel dingen die nog beter kunnen en beter moeten,

775
00:44:38,900 --> 00:44:40,700
en daar komen we ook wel op terug,

776
00:44:40,700 --> 00:44:44,700
maar qua filosofie zeker in vergelijking met Canada en Amerika bijvoorbeeld,

777
00:44:44,700 --> 00:44:46,800
is dat gewoon zo.

778
00:44:46,800 --> 00:44:52,200
En aan de andere kant moeten we ons ook realiseren dat dat natuurlijk ook heel anders kan,

779
00:44:52,200 --> 00:44:54,900
en in heel veel landen ook heel anders gaat.

780
00:44:54,900 --> 00:44:58,100
En het meest extreme voorbeeld daarvan zijn natuurlijk de ontwikkelingslanden,

781
00:44:58,100 --> 00:45:01,600
waar gewoon de basisinfrastructuur nog volledig ontbreekt,

782
00:45:01,600 --> 00:45:05,100
en daar gaat na de pauze Doris over vertellen.

783
00:45:05,100 --> 00:45:11,100
We gaan even pauzeren, bedankt.

Annex D4. Partial transcript lecture CT3011 (incl.
time frames / word)

word              begin (ms)   end (ms)
en                110          240
aan               240          760
uh                760          1070
ja                1080         1250
en                1260         1410
daarin            1410         2260
lopen             2260         2540
we                2560         2830
daar              2880         3070
is                3070         3160
zeker             3170         3690
geen              3710         3940
en                3940         4220
uh                4220         4670
we                5840         6630
met               6730         6910
het               6910         7130
hele              7260         7480
heelal            7480         7820
van               7820         8300
uh                8300         9170
dertig            9200         9540
elf               9550         10010
en                10020        10210
later             10210        10600
naar              10600        10800
het               10810        10900
cement            10900        11460
't                11470        11700
cd                11700        12180
lover             12180        12710
gezondheidss      12760        13270
en                13270        13370
niet              13370        13670
werd              13670        13880
hij               13880        14180
de                14260        14370
zeven             14780        15170
weken             15170        15550
met               15550        15740
jullie            15740        16200
uh                16200        16460
doornemen         16460        17190
en                18600        18730
ik                18730        18830
dacht             18830        19040
ik                19040        19120
zelf              19120        19350
mee               19350        19520
zeker             19520        19950
je                19950        20050
je                20050        20180
voorstellen       20180        20960
is                20970        21390
mijn              21710        21880
naam              21880        22110
is                22110        22250
als               22250        22500
een               22500        22630
tank              22630        22940
zoals             22940        23280
jullie            23280        23470
daar              23470        23690
zien              23700        23880
staan             23880        24350
en                24750        24870
ik                24870        24950
dacht             24950        25160
laat              25160        25360
ik                25370        25440
daar              25440        25610
maar              25610        25740
twee              25740        26000
dingen            26000        26260
van               26260        26470
één               26470        26730
en                26730        26900
een               26900        26980
half              26980        27180
jaar              27190        27470
naar              27470        27630
werk              27630        28110
ja                28120        28400
nou               28970        29120
willen            29140        29370
niet              29380        29540
inzien            29540        30050
ja                30050        30450
ik                30470        30600
ben               30600        30820
marathonloper     30820        31920
een               32610        32730
mooie             32730        33080
foto              33080        33550
van               33550        33760
de                33760        33820
glorieuze         33820        34800
binnenkomst       35300        36050
in                36050        36350
uh                36350        36570
rotterdam         36580        37360
in                37370        37640
april             37650        38100
afgelopen         38110        38640
periode           38640        39360
en                40040        40210
marathonlopers    40210        41010
dat               41010        41140
zijn              41140        41390
allemaal          41390        41760
een               41760        41850
beetje            41850        42100
fanatieke         42100        42660
leidde            42660        43000
er                43000        43190
echter            43200        43480
door              43480        43720
de                43720        43800
ouders            43800        44270
die               44270        44610
twee              44990        45230
iedere            45230        45710
dag               45710        46290
[s]               46290        46610
die               46610        46920
je                46920        47010
er                47010        47280
beter             47280        47640
inleven           47640        48020
zelf              48020        48260
de                48260        48340
aandacht          48340        48590
te                48590        48680
organiseren       48680        49450
dat               49450        49620
het               49620        49750
allemaal          49750        50100
kan               50150        50470
dus               50920        51060
ik                51060        51180
loop              51180        51420
hier              51420        51620
ook               51620        51880
iedere            51900        52260
dag               52260        52490
tussen            52490        52750
de                52750        52830
meer              52830        52980
dan               52980        53280
ooit              53280        53520
je                53520        53670
naar              53670        53900
delft             53900        54250
houdt             54250        54670
of                54670        54790
langs             54790        55060
deze              55060        55290
rivier            55290        55770
of                55770        56050
andere            56400        56830
koerier           56830        57580
en                58230        58590
uh                58590        58930
als               59580        59750
jullie            59750        59970
d'r               59970        60310
is                60320        60580

Annex E. Speech recognition

1. Speech recognition for movies
   Types of speech recognition
   Speech recognition at University of Twente
   SHoUT
2. SHoUT for example lecture in CT3011
   SHoUT on example lecture
   Number of segments and words
   Total duration of words and silences
   SHoUT compared with human made subtitles
   SHoUT as subtitling system
   SHoUT and word frequency
   SHoUT for tag cloud search
3. SHoUT for example course CT3011 (all lectures)
   SHoUT on example course
   Lectures and lecturers
   SHoUT output analysis
   Quality of word recognition
   Word correctness per lecturer
4. Evaluation
   SHoUT for word indexing
   SHoUT for tag cloud production
   SHoUT for transcripts
   SHoUT for subtitles

Annex E1. SHoUT result from lecture CT3011
Annex E2. Transcript of lecture CT3011 from speech recognition (SHoUT)
Annex E3. Speech recognition (SHoUT) compared to human made subtitles

1.
Speech recognition for movies

Types of speech recognition
Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to text. The term "voice recognition" is sometimes used for speech recognition in which the system is trained on a particular speaker, as is the case for most desktop recognition software. Such systems include an element of speaker recognition, which attempts to identify the person speaking in order to better recognize what is being said. "Speech recognition" is the broader term and also covers systems that can recognize almost anybody's speech, such as a call centre system designed to recognize many voices.

Speech recognition at University of Twente
Speech recognition is one of the focus points of the chair Human Media Interaction at the University of Twente.

Figure 1.1: Logo of the chair Human Media Interaction at the University of Twente

Further reference is made to the websites of this chair:
• chair: http://hmi.ewi.utwente.nl
• multi media retrieval: http://hmi.ewi.utwente.nl/topic/Multimedia%20Retrieval

SHoUT
SHoUT is a software package developed at the University of Twente, at the chair Human Media Interaction, by PhD candidate Marijn Huijbregts as part of his PhD project titled "Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled". SHoUT is a Dutch acronym that translates as "Speech Recognition Research at the University of Twente". It is a speech recognition system based on commonly used machine learning techniques. It is used for research on Large Vocabulary Continuous Speech Recognition (LVCSR), but the speech/non-speech detector and the speaker diarization application can be used separately. It is written in C++ on a Linux platform. (Source: http://wwwhome.cs.utwente.nl/~huijbreg/shout/)

2. SHoUT for example lecture in CT3011

SHoUT on example lecture
For this project, a Dutch lecture given by J.C.
(Hans) van Dijk on Sanitary Engineering was selected to test the speech recognition results of SHoUT. This lecture had previously been subtitled by hand (by Erwin de Moel), which allows the quality of the speech recognition to be evaluated. The lecture is from the bachelor's course CT3011, Introduction Water Management. Further information on this lecture is given in Table 2.1.

Table 2.1: Sample lecture chosen for speech recognition
Course | CT3011 – Introduction Water Management
Lecture | Lecture 8 – Sanitary Engineering (Civiele Gezondheidstechniek) (#15 in Collegerama recorded lectures)
Lecturer | Prof. ir. J.C. (Hans) van Dijk
Duration | 45:09
Number of slides | 27
Recording date/time | 29 September 2007 / 8:45 AM – 9:30 AM
Collegerama link | http://collegerama.tudelft.nl/mediasite/Viewer/?peid=f33ba7ff-01604259-bd94-7ee0d9c5a461

The result of the speech recognition by SHoUT is given in Annex E1. This is an XML file with time stamps associated with each spoken word. Silences are treated as words (marked as [s]) with a certain duration. The XML file can be converted into a transcript by removing the time stamps and replacing [s] with "…". The result of this conversion is shown in Annex E2.

Number of segments and words

The number of segments and words retrieved by SHoUT is shown in Table 2.2.

Table 2.2: Analysis of speech recognition for lecture CT3011
Property | SHoUT | Human subtitling
Number of speech segments | 170 | 779
Number of words, incl. [silence]'s | 9,739 |
Number of real words (excl. [silence]'s) | 7,351 | 6,970
Number of text blocks * | 2,223 | 779
* Assuming [s] or [s][s] as sentence delimiter

The SHoUT output file contains 170 speech segments, labeled SPK01-001 to SPK01-170. Segments are defined by SHoUT based on the following procedures:
• energy detection
• speech activity detection
• smoothing (combining similar elements without long silence periods)

Nearly all speech segments begin and end with a silence (165 out of 170 segments).
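The conversion of the SHoUT output into a transcript (removing the time stamps and replacing [s] with "…") can be sketched in Python. This is a minimal illustration, not the tooling used for this project: the helper names `parse_words` and `to_transcript` are hypothetical. A regular expression is used instead of an XML parser because the output in Annex E1 contains constructs such as `<EOS-score="0.00000"/>` that a strict XML parser would reject. The sample input is the opening of segment SPK01-001 with the score elements left out.

```python
import re

# Matches the opening tag of one <word> element in the SHoUT output, e.g.
# <word wordID="en" beginTime="0.110" endTime="0.240">
WORD_TAG = re.compile(
    r'<word\s+wordID="([^"]*)"\s+beginTime="([\d.]+)"\s+endTime="([\d.]+)"'
)


def parse_words(xml_text):
    """Return a list of (token, begin_seconds, end_seconds) tuples."""
    return [(w, float(b), float(e)) for w, b, e in WORD_TAG.findall(xml_text)]


def to_transcript(words, silence="[s]", marker="…"):
    """Drop the time stamps and replace every silence token by the marker."""
    return " ".join(marker if w == silence else w for w, _, _ in words)


# Opening of segment SPK01-001 (Annex E1), score elements omitted.
sample = """
<word wordID="[s]" beginTime="0.000" endTime="0.110"></word>
<word wordID="en" beginTime="0.110" endTime="0.240"></word>
<word wordID="aan" beginTime="0.240" endTime="0.760"></word>
<word wordID="uh" beginTime="0.760" endTime="1.070"></word>
<word wordID="[s]" beginTime="1.070" endTime="1.080"></word>
<word wordID="ja" beginTime="1.080" endTime="1.250"></word>
"""

words = parse_words(sample)
real_words = [w for w in words if w[0] != "[s]"]

print(to_transcript(words))           # … en aan uh … ja
print(len(words), len(real_words))    # 6 4
```

Counting the tuples with and without the [s] token gives the kind of word totals reported in Table 2.2.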
Silences are shown as "[s]" in the SHoUT output file.

Segments cannot be considered real sentences: their number is much smaller than the number of subtitles (170 versus 779). Nor can a silence be regarded as a sentence delimiter. The output file contains 165 double [s] words and 2,058 single [s] words. This total is far larger than the number of sentences in the subtitle file.

The number of real words in the SHoUT output is 5% higher than in the subtitle file. Apparently, longer words are sometimes split into several words when their syllables or word parts are recognized as separate words.

Total duration of words and silences

Speech segments and words may be separated by an interval, in which case the starting time differs from the ending time of the previous segment or word. Table 2.3 gives an overview of the duration of the intervals between words and silences, as well as the duration of the silences and words themselves.

Table 2.3: Duration of intervals, silences and words from speech recognition for lecture CT3011
Property | Intervals | Silences | Words | Total
Number of elements | 220 | 2,388 | 7,351 | 779
Minimum duration (milliseconds) | 20 | 10 | 30 | -
Maximum duration (milliseconds) | 19,020 | 1,710 | 1,480 |
Median duration (milliseconds) | 30 | 50 | 240 |
Total time (milliseconds) | 135,410 | 409,150 | 2,162,130 | 2,707,690
Total time (minutes) | 2:15 | 6:49 | 36:02 | 45:07
Share of total time | 5% | 15% | 80% | 100%

The interval time varies from 20 to 19,020 milliseconds. The total interval time amounts to 5% of the lecture time. The silence time varies from 10 to 1,710 milliseconds. The total silence time amounts to 15% of the duration of the lecture.

The total time of the SHoUT elements (45:07 minutes) nearly equals the playing time of the Collegerama lecture (45:09 minutes). The difference might be caused by the start and end periods and/or by a difference in timing between the movie and the SHoUT speech recognition system.
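Statistics of the kind listed in Table 2.3 can be derived from the time stamps alone. The sketch below is one possible reading of the procedure, not the method used for this project: it assumes that an interval is the gap between the end of one element and the start of the next, and the function names are hypothetical. The sample elements are the end of segment SPK01-169 and the start of segment SPK01-170 from Annex E1.

```python
from statistics import median


def to_ms(seconds):
    """Convert seconds to whole milliseconds (avoids floating-point noise)."""
    return round(seconds * 1000)


def duration_stats(elements, silence="[s]"):
    """Summarize durations for (token, begin_s, end_s) tuples in time order."""
    groups = {"intervals": [], "silences": [], "words": []}
    prev_end = None
    for token, begin, end in elements:
        # Assumption: an interval is any gap between two consecutive elements.
        if prev_end is not None and to_ms(begin) > prev_end:
            groups["intervals"].append(to_ms(begin) - prev_end)
        key = "silences" if token == silence else "words"
        groups[key].append(to_ms(end) - to_ms(begin))
        prev_end = to_ms(end)
    total = sum(sum(values) for values in groups.values())
    return {
        name: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "median": median(values),
            "total_ms": sum(values),
            "share": sum(values) / total,
        }
        for name, values in groups.items()
        if values
    }


# End of SPK01-169 and start of SPK01-170, taken from Annex E1.
elements = [
    ("realiseren", 2688.560, 2689.390),
    ("[s]", 2689.390, 2689.570),
    ("[s]", 2689.590, 2689.740),
    ("dat", 2689.740, 2689.920),
    ("er", 2689.920, 2690.020),
]

stats = duration_stats(elements)
for name, summary in stats.items():
    print(name, summary)
```

For this small sample the only interval is the 20 ms gap between the two segments, which matches the minimum interval duration in Table 2.3.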
It is assumed that the time stamps of the speech recognition allow for adequate timing for use in subtitling and/or word searching.

SHoUT compared with human made subtitles

The output of the SHoUT speech recognition can be compared with the human made subtitles. For this purpose the transcript of Annex E2 has been converted into sentences, as is required for subtitles. The result of this conversion is shown in Annex E3, which shows the human made subtitles as well as the converted SHoUT results. Additionally, Annex E3 shows the speech segments from SHoUT. The results are summarized in Table 2.4.

The comparison has been made for the initial part of the lecture, with a total sample size of 471 words. This sample has been reduced by removing a conversation between the lecturer and a student, because the recording of this conversation is hampered by the absence of a microphone for students. The reduced sample amounts to nearly 6% of the total lecture.

Table 2.4: Recovery of words by SHoUT speech recognition compared to human made subtitles
Set | Collection | Human subtitling (# words or lines) | Speech recognition (SHoUT) (# words or lines) | SHoUT relative to human subtitling (%)
Words | Lecture | 6,970 | 7,351 | 105 %
Words | Sample | 471 | 443 | 94 %
Words | Sample, excluding conversation | 401 | 411 | 102 %
Words | Sample, excluding conversation (% of lecture) | 5.8 % | 5.6 % |
Recovered words | Sample, excluding conversation | 401 | 204 | 51 %
Lines | Sample, excluding conversation | 48 | 48 | 100 %
Lines | Lines with 100% word correctness | - | 7 | -

Table 2.4 shows that the total number of words recovered by the SHoUT speech recognition system is slightly higher than the actual number of words (105%). This is probably because SHoUT recognizes long words as several separate, smaller words. This word splitting might be explained by the low speaking rate in lectures.

The total number of words in the reduced sample is 102% of that in the human-made subtitles. This corresponds well to the lecture as a whole. The word correctness of SHoUT proves to be approximately 50%.
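A word correctness figure of this kind could also be approximated automatically. The sketch below is an illustration under stated assumptions, not the procedure behind Table 2.4: it aligns the two word sequences with `difflib.SequenceMatcher` from the Python standard library and counts the reference words that are matched in order. The function name `word_correctness` is hypothetical, and the two example sentences are invented and do not come from the lecture.

```python
from difflib import SequenceMatcher


def word_correctness(reference, hypothesis):
    """Fraction of reference words that are matched, in order, in the hypothesis."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    # Each matching block is a run of identical words present in both sequences.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref) if ref else 0.0


# Invented example: human-made reference versus recognizer output.
reference = "the water is treated before it is distributed"
hypothesis = "the water is three that before it distributed"

print(word_correctness(reference, hypothesis))  # 0.75 (6 of 8 words matched)
```

Because the score only counts matched reference words, extra words inserted by the recognizer do not lower it. This is one reason why word correctness and word error rate are related but not identical measures.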
The reduced sample size includes 48 subtitle lines. Only 7 lines have a 100% word correctness, which corresponds to approximately 15% of the subtitle lines. Given the rather low word correctness and the very low sentence correctness, the SHoUT results require substantial improvement by human intervention before they can be used for subtitling.

SHoUT as subtitling system

The output of the SHoUT speech recognition might be used for the creation of subtitles. The comparison above showed that SHoUT recognizes around 50% of the words correctly. For accurate subtitling, this is clearly not enough. This is clearly shown in the YouTube example: http://www.youtube.com/watch?v=otGN0NUYs5w

The subtitles from SHoUT are indicated as Interlingua-SHoUT. Figure 2.1 gives an impression of these subtitles.

Figure 2.1: Subtitles created from the SHoUT transcript

Moreover, the quality of the speech recognition is inadequate for using these subtitles for automated translation by Google Translate. This too can be seen in the YouTube example mentioned above.

SHoUT and word frequency

The human-made subtitles were previously analyzed for the frequency of words. This result can be compared to the results of the SHoUT speech recognition system. Table 2.5 gives the comparison for the top-20 words, Table 2.6 for the top-15 nouns.

Table 2.5: Top 20 most common words in transcript of lecture #15 of CT3011 (human made versus speech recognition SHoUT)
Nr | Word | Count human made | Count speech recognition | Word accuracy
1 | dat | 269 | 218 | 81%
2 | de | 231 | 276 | 119%
3 | en | 225 | 359 | 160%
4 | het | 220 | 155 | 70%
5 | is | 181 | 270 | 149%
6 | een | 174 | 158 | 91%
7 | van | 162 | 132 | 81%
8 | in | 151 | 192 | 127%
9 | ook | 134 | 109 | 81%
10 | we | 128 | 46 | 36%
11 | die | 113 | 99 | 88%
12 | je | 107 | 117 | 109%
13 | dus | 93 | 22 | 24%
14 | ik | 88 | 74 | 84%
15 | dan | 80 | 89 | 111%
16 | daar | 76 | 72 | 95%
17 | niet | 55 | 165 | 300%
18 | zijn | 54 | 75 | 139%
19 | op | 54 | 46 | 85%
20 | met | 53 | 59 | 111%

Table 2.6: Top 15 most used nouns in transcript of lecture #15 in CT3011
Nr | Word | Count manual | Count SHoUT | Word accuracy
1 | water | 39 | 33 | 85%
2 | Nederland | 36 | 35 | 97%
3 | grondwater | 21 | 20 | 95%
4 | jaar | 16 | 28 | 175%
5 | drinkwatervoorziening | 16 | 4 | 25%
6 | dingen | 16 | 17 | 106%
7 | oppervlaktewater | 15 | 7 | 47%
8 | boek | 15 | 5 | 33%
9 | keer | 13 | 16 | 123%
10 | drinkwater | 13 | 16 | 123%
11 | plaatje | 11 | 6 | 55%
12 | vragen | 10 | 6 | 60%
13 | chloor | 10 | 0 | 0%
14 | soort | 9 | 7 | 78%
15 | stoffen | 9 | 8 | 89%

Table 2.5 and Table 2.6 show that speech recognition by SHoUT has a word accuracy (the SHoUT count as a percentage of the human-made count) between 0 and 300%. Some words are never recognized (such as the word "chloor"), and some words are recognized far too often (such as the word "niet").

SHoUT for tag cloud search

The human made subtitles were previously analyzed for the frequency of words.

Table 2.7: Top 15 most used nouns in speech recognition of lecture #15 in CT3011
Nr | Word | Count speech recognition | In top-15 human made subtitles
1 | nederland | 35 | yes
2 | water | 33 | yes
3 | jaar | 28 | yes
4 | grondwater | 20 | yes
5 | mensen | 20 | no
6 | dingen | 17 | yes
7 | drinkwater | 16 | yes
8 | stoffen | 8 | yes
9 | jaren | 7 | no
10 | onderzoek | 7 | no
11 | oppervlaktewater | 7 | yes
12 | soort | 7 | yes
13 | kwaliteit | 6 | no
14 | plaatje | 6 | no
15 | wereld | 6 | no

The top-4 words from the SHoUT system are the same as those obtained from human-made subtitling. The top-10 is 70% identical (7 out of 10) and the top-15 is 60% identical (9 out of 15). These results show that speech recognition is reasonably suitable for creating tag clouds. This is demonstrated in Figure 2.2.

Figure 2.2: Tag cloud from SHoUT (left) versus tag cloud from human-made subtitles (right) with common Dutch word removal (Source: http://www.wordle.net)

3. SHoUT for example course CT3011 (all lectures)

SHoUT on example course

For this project, a BSc course in the Dutch language was selected to test the speech recognition results of SHoUT. The course is from the BSc program Civil Engineering. This course is part of the TU Delft OpenCourseWare (OCW - OpenER) program.
Further details of this course are given in Table 3.1.

Table 3.1: Sample course chosen for speech recognition
Course | CT3011 – Introduction Water Management
Academic year | 2007 – 2008 (Lecture #1 to #4 are recorded in 2008/2009)
Period | P1 (September – November)
Lecturers | Prof. dr.ir. N.C. (Nick) van de Giesen; Prof. ir. J.C. (Hans) van Dijk; plus 9 guest lecturers
Course credits | 4 ECTS
Number of recordings | 28 (14 double lecture sessions)
OCW links | Available as OCW course and also as the original Blackboard-course. For links see http://drinkwater.tudelft.nl banner OpenCourseWare
Collegerama catalog link | http://collegerama.tudelft.nl/mediasite/Catalog/?cid=16b5f5fa-0745-4b8b9f02-f79a03abf50a

Lectures and lecturers

The course CT3011 consists of 28 lectures with a nominal duration of 45 minutes, given in 14 double lecture sessions. The two responsible professors gave 18 lectures, while the remaining 10 lectures were given by 9 different guest lecturers. The lectures were given in the academic year 2007/2008, except for the first 4 lectures, which were recorded a year later.

Table 3.2 gives an overview of the lectures and the lecturers. This table also includes the gender and age of the lecturer, since these might be relevant for evaluating the word correctness of the speech recognition. All lectures were given in the Dutch language and all lecturers are native Dutch speakers.

Table 3.2: Lectures and lecturers in CT3011 (lectures in Dutch by native Dutch speakers)
Lecture Nr | Lecturer | Lecturer Nr | Gender (*) | Age (at time of recording) | Recorded length (hh:mm:ss)
1 | Nick van de Giesen | 1 | M | 46 | 0:36:03
2 | Nick van de Giesen | 1 | M | 46 | 0:44:20
3 | Nick van de Giesen | 1 | M | 46 | 0:43:46
4 | Nick van de Giesen | 1 | M | 46 | 0:43:05
5 | Nick van de Giesen | 1 | M | 45 | 0:45:56
6 | Nick van de Giesen | 1 | M | 45 | 0:39:19
7 | Nick van de Giesen | 1 | M | 45 | 0:39:19
8 | Nick van de Giesen | 1 | M | 45 | 0:48:19
9 | Nick van de Giesen | 1 | M | 45 | 0:44:49
10 | Peter-Jules van Overloop | 4 | M | 38 | 0:41:27
11 | Nick van de Giesen | 1 | M | 45 | 0:36:50
12 | Nick van de Giesen | 1 | M | 45 | 0:38:35
13 | Huub Savenije | 3 | M | 54 | 0:40:50
14 | Huub Savenije | 3 | M | 54 | 0:40:38
15 | Hans van Dijk | 2 | M | 53 | 0:45:09
16 | Doris van Halem | 5 | F | 26 | 0:32:25
17 | Hans van Dijk | 2 | M | 53 | 0:46:28
18 | Patrick Smeets | 6 | M | 36 | 0:44:22
19 | Hans van Dijk | 2 | M | 53 | 0:48:22
20 | Jasper Verberk | 7 | M | 35 | 0:39:56
21 | Hans van Dijk | 2 | M | 53 | 0:53:51
22 | Karin Teunissen | 8 | F | 23 | 0:34:46
23 | Hans van Dijk | 2 | M | 53 | 0:48:30
24 | Anke Grefte | 9 | F | 27 | 0:23:26
25 | Hans van Dijk | 2 | M | 53 | 0:39:49
26 | Mirjam Blokker | 10 | F | 33 | 0:22:31
27 | Hans van Dijk | 2 | M | 53 | 0:46:17
28 | Jan Vreeburg | 11 | M | 47 | 0:42:20
* M Male, F Female

SHoUT output analysis

The number of words retrieved by SHoUT from all recordings is shown in Table 3.3.
Table 3.3: Words from SHoUT for all recorded lectures of CT3011
Lecture Nr | Lecturer Nr | Number of words | Timing last word (sec) | Speech rate (words/sec)
1 | 1 | 5,716 | 2,161 | 2.65
2 | 1 | 7,149 | 2,658 | 2.69
3 | 1 | 6,930 | 2,625 | 2.64
4 | 1 | 5,966 | 2,584 | 2.31
5 | 1 | 7,792 | 2,752 | 2.83
6 | 1 | 5,803 | 2,309 | 2.51
7 | 1 | 6,492 | 2,358 | 2.75
8 | 1 | 7,776 | 2,897 | 2.68
9 | 1 | 6,768 | 2,688 | 2.52
10 | 4 | 6,371 | 2,486 | 2.56
11 | 1 | 5,857 | 2,207 | 2.65
12 | 1 | 6,596 | 2,619 | 2.52
13 | 3 | 6,152 | 2,445 | 2.52
14 | 3 | 5,923 | 2,436 | 2.43
15 | 2 | 7,351 | 2,706 | 2.72
16 | 5 | 4,520 | 1,892 | 2.39
17 | 2 | 8,139 | 2,831 | 2.87
18 | 6 | 7,320 | 2,665 | 2.75
19 | 2 | 8,693 | 3,056 | 2.84
20 | 7 | 7,370 | 2,390 | 3.08
21 | 2 | 9,392 | 3,229 | 2.91
22 | 8 | 5,652 | 2,027 | 2.79
23 | 2 | 8,642 | 3,029 | 2.85
24 | 9 | 3,740 | 1,358 | 2.75
25 | 2 | 6,946 | 2,387 | 2.91
26 | 10 | 3,581 | 1,350 | 2.65
27 | 2 | 8,188 | 2,776 | 2.95
28 | 11 | 8,132 | 2,532 | 3.21
Total | 19:22:00 | 188,957 | 69,453 | 2.71

Table 3.3 shows that the 28 recorded lectures have a total length of almost 19.5 hours. In total, nearly 190,000 words have been produced by SHoUT. The mean speech rate in these lectures is 2.7 words per second, with a variation of plus or minus 20%. This variation is mainly caused by the speaking rate of the different lecturers and to a lesser extent by the speaking pauses during the lectures. The latter are more or less absent.

Quality of word recognition

The quality of the speech recognition is determined by comparing a sample of the transcript generated by SHoUT with the actual spoken words. For every lecture, the sample consisted of 25 sentences taken from somewhere near the beginning of the lecture. This amounted to roughly 5%-6% of the number of words in a lecture. The comparison was visualized by color marking the correct words. Figure 3.1 shows this color marking for the highest and for the lowest word correctness. Table 3.4 gives the results of this comparison.

Figure 3.1: Quality check of speech recognition by marking the correct words, showing the highest word correctness (left, 73%) and the lowest word correctness (right, 23%)

Table 3.4: Words from SHoUT for all recorded lectures of CT3011
Lecture Nr | Lecturer Nr | Lecture size SHoUT (Nr of words) | Sample size (Nr of words) | Sample size (%) | Word correctness (%)
1 | 1 | 5,716 | 434 | 7.6 | 35
2 | 1 | 7,149 | 445 | 6.2 | 40
3 | 1 | 6,930 | 428 | 6.2 | 26
4 | 1 | 5,966 | 446 | 7.5 | 23
5 | 1 | 7,792 | 417 | 5.4 | 45
6 | 1 | 5,803 | 426 | 7.3 | 31
7 | 1 | 6,492 | 313 | 4.8 | 34
8 | 1 | 7,776 | 411 | 5.3 | 41
9 | 1 | 6,768 | 411 | 6.1 | 40
10 | 4 | 6,371 | 447 | 7.0 | 60
11 | 1 | 5,857 | 443 | 7.6 | 36
12 | 1 | 6,596 | 430 | 6.5 | 35
13 | 3 | 6,152 | 409 | 6.6 | 73
14 | 3 | 5,923 | 407 | 6.9 | 47
15 | 2 | 7,351 | 437 | 5.9 | 46
16 | 5 | 4,520 | 398 | 8.8 | 64
17 | 2 | 8,139 | 400 | 4.9 | 64
18 | 6 | 7,320 | 406 | 5.5 | 64
19 | 2 | 8,693 | 384 | 4.4 | 61
20 | 7 | 7,370 | 439 | 6.0 | 49
21 | 2 | 9,392 | 394 | 4.2 | 56
22 | 8 | 5,652 | 405 | 7.2 | 41
23 | 2 | 8,642 | 373 | 4.3 | 67
24 | 9 | 3,740 | 389 | 10.4 | 64
25 | 2 | 6,946 | 425 | 6.1 | 48
26 | 10 | 3,581 | 395 | 11.0 | 71
27 | 2 | 8,188 | 404 | 4.9 | 64
28 | 11 | 8,132 | 411 | 5.1 | 66
Total | | 188,957 | 11,527 | 6.1 |
Minimum | | | | 4.2 | 23
Maximum | | | | 11.0 | 73
Mean | | | | | 50
Std Deviation | | | | | 14.6

Table 3.4 shows that the average word correctness of SHoUT amounts to 50%, with a variation between 23 and 73%. These extremes correspond to word error rates of approximately 77% and 27%, respectively.

Word correctness per lecturer

The word correctness may vary between speakers. Therefore the word correctness has been clustered per lecturer. This is presented in Figure 3.2.

Figure 3.2: Word correctness clustered per lecturer shows significant differences amongst speakers

Figure 3.2 shows significant differences amongst speakers. The first speaker has an average word correctness of 35% (variation between 23 and 45%), while the second speaker has an average word correctness of 58% (variation between 45 and 67%). This difference cannot be related to technical differences between the recordings or to the difference in speech rate.
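The clustering per lecturer can be reproduced from the per-lecture values in Table 3.4. The sketch below is only an illustration of that grouping step, with a hypothetical function name `per_lecturer`. The (lecturer number, word correctness) pairs are copied from Table 3.4 in lecture order.

```python
from collections import defaultdict
from statistics import mean

# (lecturer number, word correctness in %) for lectures 1 to 28, from Table 3.4.
LECTURES = [
    (1, 35), (1, 40), (1, 26), (1, 23), (1, 45), (1, 31), (1, 34),
    (1, 41), (1, 40), (4, 60), (1, 36), (1, 35), (3, 73), (3, 47),
    (2, 46), (5, 64), (2, 64), (6, 64), (2, 61), (7, 49), (2, 56),
    (8, 41), (2, 67), (9, 64), (2, 48), (10, 71), (2, 64), (11, 66),
]


def per_lecturer(lectures):
    """Group word correctness by lecturer and summarize each group."""
    groups = defaultdict(list)
    for lecturer, correctness in lectures:
        groups[lecturer].append(correctness)
    return {
        lecturer: {
            "n": len(values),
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
        }
        for lecturer, values in sorted(groups.items())
    }


summary = per_lecturer(LECTURES)
for lecturer, s in summary.items():
    print(f"lecturer {lecturer:2d}: n={s['n']:2d} "
          f"mean={s['mean']:.0f}% range={s['min']}-{s['max']}%")
```

Grouped this way, lecturer 1 (11 lectures) and lecturer 2 (7 lectures) are the only speakers with enough lectures for a meaningful average; each guest lecturer contributes one or two lectures.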
The differences are most likely caused by differences in pronunciation and articulation between the two lecturers.

The guest lecturers have a word correctness within the same range as the second lecturer. No significant differences have been observed between male (lectures 10, 18, 20, 28) and female (lectures 16, 22, 24, 26) guest speakers.

4. Evaluation

Speech recognition can be used for word and text recovery from recorded lectures. Based on the results of the quality analyses of the recorded lectures, the following conclusions can be drawn on using SHoUT as a speech recognition system for lectures recorded with Collegerama.

SHoUT for word indexing

The word correctness of SHoUT amounts to 50%, with a variation between 25 and 75%. This recovery rate allows for word indexing in cases where no better sources for word indexing, such as correct subtitles, are available.

SHoUT for tag cloud production

The word correctness of SHoUT amounts to 50%, with a variation between 25 and 75%. Testing the produced tag cloud for the most frequently used nouns shows that tag clouds produced from SHoUT output are broadly similar to tag clouds produced from handmade subtitles. It should be noted that SHoUT misses some uncommon words completely, such as the word "chloor" in the test lecture. For academic lectures, this might be a serious shortcoming.

SHoUT for transcripts

The word correctness of SHoUT is too low for producing readable transcripts.

SHoUT for subtitles

The word correctness of SHoUT is too low for producing readable subtitles. The SHoUT output might, however, be used as a starting point for human-made subtitles, since it provides the timing of all words. In such human post-processing, the words should be clustered into subtitle sentences and the incorrect words should be corrected.

Annex E1.
SHoUT result from lecture CT3011 <?xml version="1.0" encoding="ISO-8859-1"?> <!-############################################################################################### ### ### Shout is the decoder of the 'SHoUT LVCS Recognition toolkit'. ### ### This toolkit is developed by: ### ### Marijn Huijbregts, HMI, University of Twente. ### ### http://wwwhome.cs.utwente.nl/~huijbreg/ ### ### marijn.huijbregts@utwente.nl ### ############################################################################################### ### --> <shout_metadata> <model_info> <AM>/home/parlevink/verschoort/projects/asr-test/shout/models/acoustic-model.16.try4.orgsil.am</AM> <DCT>/home/parlevink/verschoort/projects/asr-test/shout/models/9904.65K.release-003.dct.bin</DCT> <LM>/home/parlevink/verschoort/projects/asr-test/shout/models/mix-9904.3g.interpolate.v02.lowprobs_cgn-comp-fijkl.arpa.plus.bin</LM> </model_info> <decoding_settings> <LM_SCALE>30.0</LM_SCALE> <TRANSITION_PENALTY>0.0</TRANSITION_PENALTY> <SHORT_WORD_PENALTY>0.0</SHORT_WORD_PENALTY> <SHORT_WORD_LENGTH>3</SHORT_WORD_LENGTH> <SIL_PENALTY>0.0</SIL_PENALTY> <GLOBAL_BEAM>175.0</GLOBAL_BEAM> <NODE_BEAM>60.0</NODE_BEAM> <ENDWORD_BEAM>50.0</ENDWORD_BEAM> <LMLA>ON</LMLA> <LMLA_CACHESIZE>800</LMLA_CACHESIZE> <LMLA_CLEANUPINTERVAL>100</LMLA_CLEANUPINTERVAL> <MAX_TOKENS_PER_STATE>160</MAX_TOKENS_PER_STATE> <MAX_TOKENS_TOTAL>35000</MAX_TOKENS_TOTAL> </decoding_settings> <segments> <speech label="SPK01-001" begintime="0.00" endtime="17.67" > <real_time milliseconds="22112" frames="1767" RTF="1.2514"/> <EOS-score="0.00000"/> <score COMBINED="-59866.66016"/> <wordsequence> <word wordID="[s]" beginTime="0.000" endTime="0.110"> <score AM="-232.13780" LM="0.00000" COMBINED="-232.13780"/> </word> <word wordID="en" beginTime="0.110" endTime="0.240"> <score AM="-714.45599" LM="-42.98058" COMBINED="-757.43658"/> </word> <word wordID="aan" beginTime="0.240" endTime="0.760"> <score AM="-2258.56299" LM="-118.15379" COMBINED="-2376.71680"/> </word> 
<word wordID="uh" beginTime="0.760" endTime="1.070"> <score AM="-2902.88232" LM="-183.18549" COMBINED="-3086.06787"/> </word> <word wordID="[s]" beginTime="1.070" endTime="1.080"> <score AM="-2947.88574" LM="-183.18549" COMBINED="-3131.07129"/> </word> <word wordID="ja" beginTime="1.080" endTime="1.250"> <score AM="-3457.09912" LM="-242.73988" COMBINED="-3699.83911"/> </word> <word wordID="[s]" beginTime="1.250" endTime="1.260"> <score AM="-3498.23120" LM="-242.73988" COMBINED="-3740.97119"/> </word> <word wordID="en" beginTime="1.260" endTime="1.410"> <score AM="-4014.74512" LM="-295.96329" COMBINED="-4310.70850"/> </word> <word wordID="daarin" beginTime="1.410" endTime="2.260"> <score AM="-6442.80078" LM="-409.44135" COMBINED="-6852.24219"/> </word> <word wordID="lopen" beginTime="2.260" endTime="2.540"> <score AM="-7443.77246" LM="-503.81482" COMBINED="-7947.58740"/> </word> <word wordID="[s]" beginTime="2.540" endTime="2.560"> <score AM="-7505.68408" LM="-503.81482" COMBINED="-8009.49902"/> </word> <word wordID="we" beginTime="2.560" endTime="2.830"> <score AM="-8327.93164" LM="-533.22229" COMBINED="-8861.15430"/> </word> <word wordID="[s]" beginTime="2.830" endTime="2.880"> <score AM="-8552.04395" LM="-533.22229" COMBINED="-9085.26660"/> </word> <word wordID="daar" beginTime="2.880" endTime="3.070"> <score AM="-9426.74512" LM="-599.57513" COMBINED="-10026.32031"/> </word> <word wordID="is" beginTime="3.070" endTime="3.160"> <score AM="-9854.66602" LM="-661.00763" COMBINED="-10515.67383"/> </word> <word wordID="[s]" beginTime="3.160" endTime="3.170"> <score AM="-9895.22363" LM="-661.00763" COMBINED="-10556.23145"/> </word> <word wordID="zeker" beginTime="3.170" endTime="3.690"> <score AM="-11814.48828" LM="-751.24976" COMBINED="-12565.73828"/> Annex E. 
Speech recognition 213 </word> <word wordID="[s]" beginTime="3.690" endTime="3.710"> <score AM="-11905.47949" LM="-751.24976" COMBINED="-12656.72949"/> </word> <word wordID="geen" beginTime="3.710" endTime="3.940"> <score AM="-12843.25195" LM="-789.18256" COMBINED="-13632.43457"/> </word> <word wordID="en" beginTime="3.940" endTime="4.220"> <score AM="-13695.13770" LM="-862.70569" COMBINED="-14557.84375"/> </word> <word wordID="uh" beginTime="4.220" endTime="4.670"> <score AM="-14415.94141" LM="-916.09106" COMBINED="-15332.03223"/> </word> <word wordID="[s]" beginTime="4.670" endTime="5.840"> <score AM="-17269.53516" LM="-916.09106" COMBINED="-18185.62695"/> </word> <word wordID="we" beginTime="5.840" endTime="6.630"> <score AM="-18720.78516" LM="-989.69421" COMBINED="-19710.47852"/> </word> <word wordID="[s]" beginTime="6.630" endTime="6.730"> <score AM="-19061.42383" LM="-989.69421" COMBINED="-20051.11719"/> </word> <word wordID="met" beginTime="6.730" endTime="6.910"> <score AM="-19861.14648" LM="-1053.53931" COMBINED="-20914.68555"/> </word> <word wordID="het" beginTime="6.910" endTime="7.130"> <score AM="-20622.81445" LM="-1095.39929" COMBINED="-21718.21289"/> </word> <word wordID="[s]" beginTime="7.130" endTime="7.260"> <score AM="-21291.16211" LM="-1095.39929" COMBINED="-22386.56055"/> </word> <word wordID="hele" beginTime="7.260" endTime="7.480"> <score AM="-22231.99609" LM="-1169.62207" COMBINED="-23401.61719"/> </word> <word wordID="heelal" beginTime="7.480" endTime="7.820"> <score AM="-23341.09375" LM="-1271.07471" COMBINED="-24612.16797"/> </word> <word wordID="van" beginTime="7.820" endTime="8.300"> <score AM="-24820.38672" LM="-1324.94214" COMBINED="-26145.32812"/> </word> <word wordID="uh" beginTime="8.300" endTime="9.170"> <score AM="-26474.20117" LM="-1380.89258" COMBINED="-27855.09375"/> </word> <word wordID="[s]" beginTime="9.170" endTime="9.200"> <score AM="-26595.05664" LM="-1380.89258" COMBINED="-27975.94922"/> </word> <word wordID="dertig" 
beginTime="9.200" endTime="9.540"> <score AM="-27844.60742" LM="-1496.64246" COMBINED="-29341.25000"/> </word> <word wordID="[s]" beginTime="9.540" endTime="9.550"> <score AM="-27888.40625" LM="-1496.64246" COMBINED="-29385.04883"/> </word> <word wordID="elf" beginTime="9.550" endTime="10.010"> <score AM="-29522.81055" LM="-1581.08472" COMBINED="-31103.89453"/> </word> <word wordID="[s]" beginTime="10.010" endTime="10.020"> <score AM="-29572.72656" LM="-1581.08472" COMBINED="-31153.81055"/> </word> <word wordID="en" beginTime="10.020" endTime="10.210"> <score AM="-30031.23047" LM="-1634.81067" COMBINED="-31666.04102"/> </word> <word wordID="later" beginTime="10.210" endTime="10.600"> <score AM="-31528.07227" LM="-1732.08472" COMBINED="-33260.15625"/> </word> <word wordID="naar" beginTime="10.600" endTime="10.800"> <score AM="-32227.48438" LM="-1785.05762" COMBINED="-34012.54297"/> </word> <word wordID="[s]" beginTime="10.800" endTime="10.810"> <score AM="-32279.51172" LM="-1785.05762" COMBINED="-34064.57031"/> </word> <word wordID="het" beginTime="10.810" endTime="10.900"> <score AM="-32755.11133" LM="-1816.83484" COMBINED="-34571.94531"/> </word> <word wordID="cement" beginTime="10.900" endTime="11.460"> <score AM="-34582.38672" LM="-1954.31470" COMBINED="-36536.70312"/> </word> <word wordID="[s]" beginTime="11.460" endTime="11.470"> <score AM="-34614.26953" LM="-1954.31470" COMBINED="-36568.58594"/> </word> <word wordID="'t" beginTime="11.470" endTime="11.700"> <score AM="-35245.50391" LM="-1995.90515" COMBINED="-37241.41016"/> </word> <word wordID="cd" beginTime="11.700" endTime="12.180"> <score AM="-36775.66016" LM="-2143.99170" COMBINED="-38919.65234"/> </word> <word wordID="lover" beginTime="12.180" endTime="12.710"> <score AM="-38360.26172" LM="-2316.99219" COMBINED="-40677.25391"/> </word> <word wordID="[s]" beginTime="12.710" endTime="12.760"> <score AM="-38546.11328" LM="-2316.99219" COMBINED="-40863.10547"/> </word> <word wordID="gezondheidss" 
beginTime="12.760" endTime="13.270"> <score AM="-40555.60938" LM="-2493.16968" COMBINED="-43048.77734"/> </word> <word wordID="en" beginTime="13.270" endTime="13.370"> <score AM="-40998.75391" LM="-2500.35669" COMBINED="-43499.10938"/> 214 Annex E. Speech recognition </word> <word wordID="niet" beginTime="13.370" endTime="13.670"> <score AM="-42095.47656" LM="-2567.79614" COMBINED="-44663.27344"/> </word> <word wordID="werd" beginTime="13.670" endTime="13.880"> <score AM="-42794.28906" LM="-2670.35693" COMBINED="-45464.64453"/> </word> <word wordID="hij" beginTime="13.880" endTime="14.180"> <score AM="-44075.64062" LM="-2732.96899" COMBINED="-46808.60938"/> </word> <word wordID="[s]" beginTime="14.180" endTime="14.260"> <score AM="-44427.12891" LM="-2732.96899" COMBINED="-47160.09766"/> </word> <word wordID="de" beginTime="14.260" endTime="14.370"> <score AM="-44871.51562" LM="-2785.76685" COMBINED="-47657.28125"/> </word> <word wordID="komende" beginTime="14.370" endTime="14.780"> <score AM="-46386.73828" LM="-2853.83130" COMBINED="-49240.57031"/> </word> <word wordID="zeven" beginTime="14.780" endTime="15.170"> <score AM="-47839.63281" LM="-2934.08618" COMBINED="-50773.71875"/> </word> <word wordID="weken" beginTime="15.170" endTime="15.550"> <score AM="-49237.71875" LM="-2973.23242" COMBINED="-52210.94922"/> </word> <word wordID="met" beginTime="15.550" endTime="15.740"> <score AM="-49943.36328" LM="-3037.76416" COMBINED="-52981.12891"/> </word> <word wordID="jullie" beginTime="15.740" endTime="16.200"> <score AM="-51646.74219" LM="-3154.68311" COMBINED="-54801.42578"/> </word> <word wordID="uh" beginTime="16.200" endTime="16.460"> <score AM="-52159.67188" LM="-3223.82251" COMBINED="-55383.49609"/> </word> <word wordID="doornemen" beginTime="16.460" endTime="17.190"> <score AM="-55033.90234" LM="-3385.21851" COMBINED="-58419.12109"/> </word> <word wordID="[s]" beginTime="17.190" endTime="17.670"> <score AM="-56481.44141" LM="-3385.21851" 
COMBINED="-59866.66016"/> </word> </wordsequence> </speech> <speech label="SPK01-002" begintime="18.13" endtime="35.00" > <real_time milliseconds="17020" frames="1687" RTF="1.0089"/> <EOS-score="0.00000"/> <score COMBINED="-68474.89844"/> <wordsequence> <word wordID="[s]" beginTime="18.130" endTime="18.600"> <score AM="-1041.74548" LM="0.00000" COMBINED="-1041.74548"/> </word> <word wordID="en" beginTime="18.600" endTime="18.730"> <score AM="-1560.05615" LM="-42.98058" COMBINED="-1603.03674"/> </word> <word wordID="ik" beginTime="18.730" endTime="18.830"> <score AM="-1991.38696" LM="-88.98117" COMBINED="-2080.36816"/> </word> <word wordID="dacht" beginTime="18.830" endTime="19.040"> <score AM="-2784.23022" LM="-145.49277" COMBINED="-2929.72290"/> </word> …………………………………. <word wordID="ook" beginTime="2688.400" endTime="2688.560"> <score AM="-62879.57422" LM="-3356.99512" COMBINED="-66236.57031"/> </word> <word wordID="realiseren" beginTime="2688.560" endTime="2689.390"> <score AM="-65831.93750" LM="-3444.05444" COMBINED="-69275.99219"/> </word> <word wordID="[s]" beginTime="2689.390" endTime="2689.570"> <score AM="-66419.42969" LM="-3444.05444" COMBINED="-69863.48438"/> </word> </wordsequence> </speech> <speech label="SPK01-170" begintime="2689.59" endtime="2706.69" > <real_time milliseconds="19453" frames="1710" RTF="1.1376"/> <EOS-score="0.00000"/> <score COMBINED="-68375.91406"/> <wordsequence> <word wordID="[s]" beginTime="2689.590" endTime="2689.740"> <score AM="-534.19904" LM="0.00000" COMBINED="-534.19904"/> </word> <word wordID="dat" beginTime="2689.740" endTime="2689.920"> <score AM="-1349.89319" LM="-44.63421" COMBINED="-1394.52734"/> </word> <word wordID="er" beginTime="2689.920" endTime="2690.020"> <score AM="-1821.88281" LM="-98.82831" COMBINED="-1920.71106"/> </word> <word wordID="natuurlijk" beginTime="2690.020" endTime="2690.440"> <score AM="-3515.03809" LM="-181.26859" COMBINED="-3696.30664"/> </word> <word wordID="ook" beginTime="2690.440" 
endTime="2690.620"> <score AM="-4123.34424" LM="-209.74142" COMBINED="-4333.08545"/> </word> <word wordID="[s]" beginTime="2690.620" endTime="2690.630"> <score AM="-4170.82178" LM="-209.74142" COMBINED="-4380.56299"/> </word> Annex E. Speech recognition 215 <word wordID="heel" beginTime="2690.630" endTime="2690.920"> <score AM="-5322.86133" LM="-260.55157" COMBINED="-5583.41309"/> </word> <word wordID="[s]" beginTime="2690.920" endTime="2690.930"> <score AM="-5364.94189" LM="-260.55157" COMBINED="-5625.49365"/> </word> <word wordID="anders" beginTime="2690.930" endTime="2691.370"> <score AM="-7285.50293" LM="-308.50604" COMBINED="-7594.00879"/> </word> <word wordID="[s]" beginTime="2691.370" endTime="2691.530"> <score AM="-8042.13965" LM="-308.50604" COMBINED="-8350.64551"/> </word> <word wordID="kan" beginTime="2691.530" endTime="2692.050"> <score AM="-10443.12793" LM="-372.76978" COMBINED="-10815.89746"/> </word> <word wordID="[s]" beginTime="2692.050" endTime="2692.140"> <score AM="-10779.51074" LM="-372.76978" COMBINED="-11152.28027"/> </word> <word wordID="en" beginTime="2692.140" endTime="2692.280"> <score AM="-11268.68262" LM="-425.27689" COMBINED="-11693.95996"/> </word> <word wordID="in" beginTime="2692.280" endTime="2692.430"> <score AM="-11783.96582" LM="-486.46951" COMBINED="-12270.43555"/> </word> <word wordID="een" beginTime="2692.430" endTime="2692.560"> <score AM="-12219.71777" LM="-525.48224" COMBINED="-12745.20020"/> </word> <word wordID="[s]" beginTime="2692.560" endTime="2692.690"> <score AM="-12731.64941" LM="-525.48224" COMBINED="-13257.13184"/> </word> <word wordID="heel" beginTime="2692.690" endTime="2692.920"> <score AM="-13424.57227" LM="-595.17139" COMBINED="-14019.74316"/> </word> <word wordID="veel" beginTime="2692.920" endTime="2693.150"> <score AM="-14489.91504" LM="-651.30042" COMBINED="-15141.21582"/> </word> <word wordID="[s]" beginTime="2693.150" endTime="2693.160"> <score AM="-14535.47363" LM="-651.30042" 
COMBINED="-15186.77441"/> </word> <word wordID="landen" beginTime="2693.160" endTime="2693.470"> <score AM="-15868.35742" LM="-731.72437" COMBINED="-16600.08203"/> </word> <word wordID="ook" beginTime="2693.470" endTime="2693.640"> <score AM="-16490.08789" LM="-794.62903" COMBINED="-17284.71680"/> </word> <word wordID="heel" beginTime="2693.640" endTime="2693.850"> <score AM="-17516.69141" LM="-864.57343" COMBINED="-18381.26562"/> </word> <word wordID="[s]" beginTime="2693.850" endTime="2693.860"> <score AM="-17571.29492" LM="-864.57343" COMBINED="-18435.86914"/> </word> <word wordID="anders" beginTime="2693.860" endTime="2694.160"> <score AM="-18876.98633" LM="-912.52789" COMBINED="-19789.51367"/> </word> <word wordID="[s]" beginTime="2694.160" endTime="2694.170"> <score AM="-18926.98828" LM="-912.52789" COMBINED="-19839.51562"/> </word> <word wordID="gaat" beginTime="2694.170" endTime="2694.670"> <score AM="-20875.73633" LM="-981.36896" COMBINED="-21857.10547"/> </word> <word wordID="[s]" beginTime="2694.670" endTime="2695.030"> <score AM="-22006.42578" LM="-981.36896" COMBINED="-22987.79492"/> </word> <word wordID="het" beginTime="2695.030" endTime="2695.220"> <score AM="-22716.94141" LM="-998.54010" COMBINED="-23715.48242"/> </word> <word wordID="meest" beginTime="2695.220" endTime="2695.470"> <score AM="-23682.84570" LM="-1089.76001" COMBINED="-24772.60547"/> </word> <word wordID="extreme" beginTime="2695.470" endTime="2695.890"> <score AM="-25189.66016" LM="-1165.45996" COMBINED="-26355.11914"/> </word> <word wordID="voorbeeld" beginTime="2695.890" endTime="2696.260"> <score AM="-26716.26367" LM="-1201.65186" COMBINED="-27917.91602"/> </word> <word wordID="daarvan" beginTime="2696.260" endTime="2696.590"> <score AM="-27988.59961" LM="-1258.88196" COMBINED="-29247.48242"/> </word> <word wordID="zijn" beginTime="2696.590" endTime="2696.780"> <score AM="-28713.11133" LM="-1301.98157" COMBINED="-30015.09375"/> </word> <word wordID="tien" beginTime="2696.780" 
endTime="2696.960"> <score AM="-29364.91016" LM="-1415.34509" COMBINED="-30780.25586"/> </word> <word wordID="ontwikkelingslanden" beginTime="2696.960" endTime="2697.850"> <score AM="-33020.26172" LM="-1557.82581" COMBINED="-34578.08594"/> </word> <word wordID="[s]" beginTime="2697.850" endTime="2698.380"> <score AM="-34633.25000" LM="-1557.82581" COMBINED="-36191.07422"/> </word> <word wordID="waaronder" beginTime="2698.380" endTime="2699.060"> <score AM="-37268.43750" LM="-1628.67432" COMBINED="-38897.11328"/> </word> <word wordID="basis" beginTime="2699.060" endTime="2699.450"> <score AM="-38777.08203" LM="-1773.71997" COMBINED="-40550.80078"/> </word> <word wordID="in" beginTime="2699.450" endTime="2699.600"> <score AM="-39287.77344" LM="-1822.52649" COMBINED="-41110.30078"/> </word> <word wordID="first" beginTime="2699.600" endTime="2699.850"> <score AM="-40235.83203" LM="-1972.55884" COMBINED="-42208.39062"/> </word> <word wordID="die" beginTime="2699.850" endTime="2700.040"> <score AM="-41120.54297" LM="-2045.21863" COMBINED="-43165.76172"/> </word> <word wordID="nog" beginTime="2700.040" endTime="2700.230"> <score AM="-41832.09766" LM="-2114.40869" COMBINED="-43946.50781"/> </word> <word wordID="volledig" beginTime="2700.230" endTime="2700.810"> <score AM="-43819.07422" LM="-2224.03735" COMBINED="-46043.11328"/> </word> <word wordID="[s]" beginTime="2700.810" endTime="2700.820"> <score AM="-43871.33594" LM="-2224.03735" COMBINED="-46095.37500"/> </word> <word wordID="ontbreekt" beginTime="2700.820" endTime="2701.450"> <score AM="-46094.56641" LM="-2319.25000" COMBINED="-48413.81641"/> </word> <word wordID="[s]" beginTime="2701.450" endTime="2701.750"> <score AM="-46920.07422" LM="-2319.25000" COMBINED="-49239.32422"/> </word> <word wordID="en" beginTime="2701.750" endTime="2701.850"> <score AM="-47307.50781" LM="-2377.24951" COMBINED="-49684.75781"/> </word> <word wordID="daar" beginTime="2701.850" endTime="2702.000"> <score
AM="-47861.07812" LM="-2434.52393" COMBINED="-50295.60156"/> </word> <word wordID="[s]" beginTime="2702.000" endTime="2702.010"> <score AM="-47905.36328" LM="-2434.52393" COMBINED="-50339.88672"/> </word> <word wordID="gaat" beginTime="2702.010" endTime="2702.170"> <score AM="-48427.79688" LM="-2483.29663" COMBINED="-50911.09375"/> </word> <word wordID="na" beginTime="2702.170" endTime="2702.340"> <score AM="-49053.28516" LM="-2578.76147" COMBINED="-51632.04688"/> </word> <word wordID="de" beginTime="2702.340" endTime="2702.460"> <score AM="-49529.62500" LM="-2599.29126" COMBINED="-52128.91797"/> </word> <word wordID="[s]" beginTime="2702.460" endTime="2702.470"> <score AM="-49568.12109" LM="-2599.29126" COMBINED="-52167.41406"/> </word> <word wordID="pauze" beginTime="2702.470" endTime="2702.800"> <score AM="-50858.21875" LM="-2661.28027" COMBINED="-53519.50000"/> </word> <word wordID="door" beginTime="2702.800" endTime="2703.060"> <score AM="-51838.53516" LM="-2743.34717" COMBINED="-54581.88281"/> </word> <word wordID="is" beginTime="2703.060" endTime="2703.360"> <score AM="-52878.67578" LM="-2825.90088" COMBINED="-55704.57812"/> </word> <word wordID="[s]" beginTime="2703.360" endTime="2703.370"> <score AM="-52931.14453" LM="-2825.90088" COMBINED="-55757.04688"/> </word> <word wordID="er" beginTime="2703.370" endTime="2703.550"> <score AM="-53466.64062" LM="-2855.25171" COMBINED="-56321.89062"/> </word> <word wordID="[s]" beginTime="2703.550" endTime="2703.630"> <score AM="-53726.53125" LM="-2855.25171" COMBINED="-56581.78125"/> </word> <word wordID="welgeteld" beginTime="2703.630" endTime="2704.230"> <score AM="-56197.35938" LM="-2989.80566" COMBINED="-59187.16406"/> </word> <word wordID="[s]" beginTime="2704.230" endTime="2705.140"> <score AM="-59231.87109" LM="-2989.80566" COMBINED="-62221.67578"/> </word> <word wordID="aan" beginTime="2705.140" endTime="2705.430"> <score AM="-60231.19141" LM="-3078.66821" COMBINED="-63309.85938"/> </word> <word wordID="uh" 
beginTime="2705.430" endTime="2705.570"> <score AM="-60526.45312" LM="-3139.66064" COMBINED="-63666.11328"/> </word> <word wordID="[s]" beginTime="2705.570" endTime="2705.610"> <score AM="-60670.96094" LM="-3139.66064" COMBINED="-63810.62109"/> </word> <word wordID="even" beginTime="2705.610" endTime="2705.800"> <score AM="-61350.90234" LM="-3239.85645" COMBINED="-64590.75781"/> </word> <word wordID="pauzeren" beginTime="2705.800" endTime="2706.330"> <score AM="-63343.58203" LM="-3358.18481" COMBINED="-66701.76562"/> </word> <word wordID="met" beginTime="2706.330" endTime="2706.690"> <score AM="-64959.30859" LM="-3416.60718" COMBINED="-68375.91406"/> </word> </wordsequence> </speech> </segments> <statistics> <real_time milliseconds="3150020" frames="261370" RTF="1.2052"/> </statistics> </shout_metadata>

Annex E2. Transcript of lecture CT3011 from speech recognition (SHoUT)

en aan uh ja en daarin lopen we daar is zeker geen en uh we met het hele heelal van uh dertig elf en later naar het cement 't cd lover gezondheidss en niet werd hij de zeven weken met jullie uh doornemen en ik dacht ik zelf mee zeker je je voorstellen is mijn naam is als een tank zoals jullie daar zien staan en ik dacht laat ik daar maar twee dingen van één en een half jaar naar werk ja nou willen niet inzien ja ik ben marathonloper een mooie foto van de glorieuze binnenkomst in uh rotterdam in april afgelopen periode en marathonlopers dat zijn allemaal een beetje fanatieke leidde er echter door de ouders die twee iedere dag die je er beter inleven zelf de aandacht te organiseren dat het allemaal kan dus ik loop hier ook iedere dag tussen de meer dan ooit je naar delft houdt of langs deze rivier of andere koerier en uh als jullie d'r is een keer in korte broek als d'r is vaccineren lopen dan klopt dat we daar niet en uh dat door tieners met een heel groepje er mensen zijn bij ons op de afdeling met studenten en en één van die je ergens tien cent en is zeer tevreden
met daarin trainen zijn diezelfde drie jaar geleden leerde hij jij inleidingen en laat de meeste mensen was een derdejaars inmiddels is afgestudeerd dat heeft hij er twee ik ben ja ja ja ja ja dat was nou als ik zeg niet als derde is dat zou mensen kunnen daar waar deze leest hè ja tuurlijk vorig jaar lang werden afgegeven in alle klassen het is een stuk zeg maar dat is wat er is echter laten we deze wist dat hij zijn dus uh wat te zien zou kunnen zijn in eerste reden is ook altijd ze er één van de gast gelezen medelijden is niet echt niet het eerste jaar en daar is natuurlijk ook een beetje net diary ben ik er dus dat zou best kunnen missen je daar je daarvan in het nou dan weet je nou dat weet jij hier dertig jaar geleden de laatste set neer te zetten dat is een techniek verste dertien zes en zeventig afgestudeerd en daarna de reeks werken aan werk erbij en is hier wel meer dingen zei in amersfoort en dat kan ik hier niet van harte aan uh raden als je straks afgestudeerd ben om daarin niet hier is d'r oren te gaan werken we zeggen wel ervaring je bent met allerlei projecten over de hele wereld bezig in mijn geval dan drinkwater projecten is het ontwerp van zijn installaties en waarom van de systemen ook het doen van onderzoek eigenlijk kun je alle kanten op de één en hier is wel in een nederlands indië zullen zijn redelijk succesvol ook op de internationale markten tegenwoordig ja ik heb daar uh vele jaren gewerkt en op dat andere evenementen inmiddels is dat alweer zeventien jaar geleden werd dat tensen stond dat ze later herenhuis op vier en delft vier en toen dacht ik van nou ja maar ik maak een brief schrijven je weet het nooit uh en niet wist dat is altijd mis dus ik heb een brief geschreven en ik dacht ik standvastig niet worden maar ik weet 't wel is ja dan meteen een eerste levenslessen in colombia maar is wat en het kan altijd meevallen naar ik ben is dat die vervolgens voor één dag in een beetje in deeltijd hoogleraar geworden in de drinkwater zien het 
is mijn leerstoel en en ja zo langzamerhand verandering komt dat andreae wordt voor steeds meer dingen gevraagd is in de laatste maand meer die nier in delft gaan doen en met de aanstelling daarentegen werd ik steeds verder af behoud en vanaf negentien negen en negentig en ik geleden gestopt met trainen ben ik hier voor altijd een leraar en vonden het ook weer aan de betekent ook dat je hebt enerzijds raken want niets aan onderwijs anderzijds onderzoek maar ook een incident en is management jaagt de moeder en ik ben een hoofd afdeling en zo en dat is in het managementteam of opleiding commissie moet je over algemene dingen meepraten en meebeslissen ja daar is in een dagtaak van maken en dat ze er heb ik altijd een eer vinden dat het leukst om met 't vaak bezig te zijn en daarin kan ook op de tweede plaatje wat hier staat dat alleen steeds eigenlijk als het is de leider en dat ga je niet komen die ja en dat proces doormaken met stront want als er een leuk om te zien hoe studenten zich transformeren van janine neer anonieme figuren die in de collegezaal zitten erin zitten te luisteren en niet meer als almere wat niet in een monoloog dan overdag werden hoewel ik overigens wel reacties realiseerde hij stellen en ik zal er ook altijd doen er expliciet om vragen aan 't maar goed de praktijk is toch dat ze in deze staat van de studie ziet hierin ofwel te luisteren en delen kijkt steeds leuker als je verder komt in een vier en vijftien jaar en 't hoogtepunt is dan natuurlijk het zetten achter je laat echter een onderwerp helemaal zelf bij de kop wat ik zeg ook altijd tegen mijn afstuderen is je moet van je afstudeerde leer je visitekaartje maken en door essent en en dat is dan krijgt ze al op momenten dat je klaar en het afstuderen en ontwerpt alleen via het meeste van dat onderwerp vrijaf meer dan er niet aan ook in nederland dan de ijssel ook niet meer door de achtste keer komen ook weer aan de aangever veel ken daar eind aan daar komen de mensen vanuit de waterbedrijven 
van klimaatverandering steeds instituten dienen aan hele discussies en onze acht jaar in ere zie weet ik keer op keer als er aan de man te worden misschien niet altijd honderd procent gelukt maar toch al één en negentig procent werd goed dan zou 't genoeg landen eten maken ik zeg alleen dat wat men ook aan af te dingen zit en dat is ook zo en uh ik heb kunnen stickers tachtig gehad en soms gaat dat dan heel goed zoals je staat met de rk in en doris doris is niet alles in de zaal aanwezig en dat niet dan uh het afgelopen jaar allebei zelfs net langs zijn afgestudeerd te betreden zit je ja heel goed gedaan ook stijgen schraal te zijn dan ook de achtste keer projecten heel goed gedaan dan het jaar dat is voor ons gewoon eerlijk om dat mee te maken om me te zien hoe jonge mensen het vaak al blijft gaan naar zinnen zelf ook enthousiaster worden en uh ja stempel gaan zitten om voor uh op ons vakgebied maar ik had dat enkele van je nieren op z'n grenzen maar kan me de 't dan is alles en hijzelf en het recht dan dat dit vaker bedreigd gaan dat ze er doen aan de hand van naar het boekwerk naast alle wordt aangegeven hebben en de nederlandse en engelstalige versie hier staan dan ook dat moeten jullie kant en daarin is dit het beste van ons niet veel te zien de verdieping voor vijfentwintig euro in de winkel kosten vijftig euro maar naar de speciale korting sturen regeling en jelena dezelfde is als je nederlands of engels een boekwerk telt de inhoud is jarenlang was ze als werk en in ieder geval voor dit uh vaak als jullie je en als die staan heel erg en dan zou ik zeggen als je goed engels kunnen lezen koken zijn als een boek dat is niet actuele staat iets meer informatie in maar het nederlandse boek is voor dit vak zeker ja is dan ook heeft natuurlijk alles wat er uh dat we daar over gaat vragen bij het 't en daar een en dezelfde en die je ook terug te komen deze omroep een drietal nog onzekere vinci als naslagwerk en jullie als je zo'n boek eenmaal hebt dan heb je daarbij 
neer dat ook na je afstuderen nee niet nee hè als je vervolgens ergens in een vreemd land en is te laat zien dat er van haider roepen is uit het duits en dan weet je het één en ander het is niet te zien is zo'n boek ook daar staan ze naast de kantine ook iedereen hebben ook vraagstukken landelijk noord staan zullen jullie misschien ook wel gezien hebben computers daar ineens zegt dat is uh over uh is niet verplicht is merels niet stuk ligt en ja je moet eigenlijk een tentamen doen maar uh de dealer en materiaal aan 't is uh de maakt er gebruik van zelf zeggen maar een ander niet controleren bestaan daar graag aan ontbreekt wordt zitten vragen in de broek de antwoorden staan welke bijen of althans als je die computers zijn na te maken te krijgen na de ramp met een andere vraag maar aan welke vragen fout waren deze leer is een ondersteuning voor jullie wij het kennismaken met de materie en het leren van de stof en oude tentamens hebben en ook bij staan deze keer ook nog is uh hoe venijn kijken wat er ongeveer gevraagd wordt en elke lezer eigenlijk alleen in ere houden en sir maar ja deze over een half uur niet helemaal te kennen dat boek hoort zowel daarbij dertig elf als mij het commissie vier en dertig twintig jaar na keizer gemaakt is voor de mensen die laten mensen met gaande is en uh de hoofdstukken die voor dertig elf gedragen te worden op tentamen staan ieren aangegeven en uh en die presentatie komt ook willen we bereiken wordt zijn zoals jullie weten niet precies weet deze video-opnamen dan gaan we deze colleges schneider is zeven keer kan de periode vanaf en niet naar nederland dit jaar zou doen dat in die eerste uren vertel ik een beetje de grote lijnen van het onderwerp is de belangrijkste ik proberen alle kleren aan te geven wat is nou belangrijker en wat minder en het leren heb ik steeds één van de trommels er niet en daar is dat uh door s. 
die daar niets aan vertellen over hun eigen onderwerp zijn eigen onderzoek naar eigen project dat een stukje actualiteit en geest en uh werken lering verdieping van het onderwerp en ik heb 't zo organiseert dat dat steeds als het goed niet goed op elkaar aansluiten en uh ja jullie je goede beeld geven van de stof dat je straks een tamelijk makkelijk kunt maken wil niet zeggen dat alle onderdelen van de verhalen van 't allemaal genie tentamens of uh zijn dat zullen ze er ook wel aangeven ja ja zo'n promotieonderzoek dat gaat natuurlijk veel dieper dan je in een jaar hoeven te weten maar gaat niet om de de beeldvorming die nu in het van de matinee de hele reeks is in de eind jaren dieren te laten grote zuivering installatie bij rotterdam wijst naar de mis en om precies te zijn op de elf oktober ook dat is niet het verplicht alles is dat niet bij ons hebben zich tot nu toe steken zestig mensen aangemeld uh daarin zich eveneens sluit op één oktober hebben gezegd omdat beide laten bedrijven tegenwoordig nog strikter de veiligheid van de russen eiste nu zou zijn naam en de aanslagen in new_york medici je je moet daar uh precies op gegeven hier allemaal komen met name en zo hè maar moeten daarvoor in staan maar ook dat er geen dingen gebeuren en er moet een tienkamp is gereserveerd worden hier en daar is gesitueerd is de mensen die zich al gegeven hebben niet in eigen land een heel sieren binnenkort kort na één oktober met een bevestiging en degene die zich niet opgegeven hebben niet aan de niet nee je en ik ga ervan uit dat degene die zich wel opgegeven hebben het hier ook komen er dus niet een beetje tegenover de organisatoren als we daar met dat minder mensen zouden aankomen dan aangemeld te hebben hebben zelf te redeneren kijken wat daar in zekere zin al over dat is middags uh de verplichte tactieken uit zouden zijn van constructie leerde statistiek geloof ik een ze zullen proberen om 't hele neer te zetten zijn dat zal zeker niet om half twee zijn is ik denk dat we 
zo ongeveer om half drie terug en ze zijn naar en we kijken gewoon naar college op donderdag en dus allemaal zelf nee dat is kijk als ik al gaat ja kralj dus zijn de vragen over de organisatie en deze algemene inleiding elke nou dan ga ik altijd iets vertellen over gezondheid ze echt niet dat ze al jarenlang vertel dat het ook bij het eerste jaar al verteld en dat lijkt niets meer vertellen over de dingen laten zien van nederland en nadat de oud-senator is dan iets vertellen over de dingen laten zien in ontwikkelingslanden want daar is hij dan ben je bezig uh nou we hadden natuurlijk gezondheidss techniek en nou dat zal ieder van jullie niet uh zijn dat dat gaat over de stedelijke later uh ik kringloop is de infrastructurele werken voor de voorziening van drinkwater te winnen van grondwater twintig al oppervlaktewater en stuiteren daarvan het gevolg te transporteren met een heel transport leidingen en distributie leiding systeem naar ons al een toename is een oude zijn d'r industriële bedrijven vervolgens het inzamelen aansturen van het afgeladen vier irian het zuiveren van dat afvalwater en dat wordt dan vervolgens weer geloosd op het oppervlaktewater is alle infrastructurele werken die over die kleine stedelijke waterkering ook aan dat is wat de gezondheid zegt niet te noemen en ik zal hier voor alle strokes er op de de en drinkwatervoorziening omdat we daarin ook een is een duidelijk effect zien zoals hier in deze steriele weergegeven het verdwijnen van besmettelijke ziekten en in nederland kunnen die niet meer overgedragen worden via besmet werden in krachtig in het voordeel is dat dit nog heel andere situatie maar hier hebben we daar flink veel succes meer gehad in de twintigste eeuw zien hier een plaatje dat weer geeft de daling van de sterfte aan en lijkt niet steeds in de twintigste eeuw en dat loopt parallel aan het percentage van de mensen nog niet aangesloten is de drinkwatervoorziening in diezelfde periode is in nederland
drinken laten zien aangelegd rond negentien honderd zelfs kort voor negentien honderd drie grote steden en zo langzamerhand ook in kleinere steden en met het platteland en in z'n al vanaf negentien vijfenzeventig zeg maar niet in nederland iedereen 't drinken aangezien aangesloten ik al een besmettelijke ziekte die ieder het is net drinkwater over een aantal worden ook niet meer voor en deze gaat ze bij ons er om daar uh infrastructurele werken en voor een goeie laten kwaliteit dus uh zaken en als de waterlinie water zuivering water transport laten genie en niet meeregeren ook en niet waterkwaliteit microbioloog die één is eind het afwezig zijn van de organismen waren ziet veranderen kunnen worden maar anderzijds ook het gebruik ervan en niet de organismen onder zijn d'r in 't uh optimaliseren en micro-organismen kunnen ook weer verontreiniging afbreken ik eis een voorbeeld daarvan is afgeladen zijn ring waarin met behulp van zuurstof en actief leren tennissen mengsel van bacteriën 't afvalstoffen in het afvalwater laten afbreken dus waterkwaliteit laten geen ier en microbioloog die zijn in dit deel van de civiele techniek je vrij belangrijk wij maken natuurlijk ook gebruik van de algemene kennis van de civiele zie je er zijn met name dan van zaken als iedereen audina niet oorlogen die en die constructie een leren constructieve vormgeving projecten realisatie en informatica aan vind ik allemaal dingen die je in projecten nodig hebt vaak al tien tien verband hebben ze niet hier is wereldbeeld om ons nee niet meer bezig met automatisering de ander is meer bezig met het constructieve deel is weer met die naliet aanwezig en jullie kunnen afhankelijk van de specialisatie niet kiest daar een rol in spelen en in iets anders en niet is natuurlijk van groot belang voor de volksgezondheid is trekt gezichten het gaat over de relatief grootschalig en uh uh infrastructurele werken we zien hier de zogenaamde die scorsese de mens is in de brabantse biesbosch mckenzie aangelegd 
zijn voor de drinkwatervoorziening en het gaat om een goed georganiseerde sector met z'n allen aan en zelfs een aparte wetgeving voor de waterleiding werd was dat willink laten zien gaat men gewoon staat precies waar alles aan moet voldoen en dat de directeur van het waterleidingbedrijf daar persoonlijk voor aansprakelijk is hier is je gevangenisstraf als die water of water distribueert waar je ziek van kan worden zeggen ze allemaal goed georganiseerd ja ja ja en houdt aan en is die studie is de rol van mijn leerstoel drinkwater zien is de enige leerstoel in nederland op het gebied van drinkwater zien dus dat is nog klein geeft een zekere exclusief en uh expliciet die tijd veel van onze studenten die zijn deze ook uh uh ja die hebben positie zien niet waar de wereld die zijn directeur of staf functionarissen of ontwerper daarin uh beiden laten bedreigen en ook veel van onze indiërs gaan naar de hier is een loser door en nou dat gebeurt een heleboel ander toe hebben zelfs ook als college willem-alexander die ook het watermanagement interessant vindt nou daar heb ik 't is altijd nog drie dia's worden niet aan naar nog meer vertellen over de opzet van de eerste keer in nederland die je er nog even laten registreren van dat werk van anderen ons vakgebied het is tien plaatje dat je er ook alle zien er dit jaar kan maar niet altijd wel uit dit diertje vandaan komt rent ja precies is dit is 't water verdrijft daar is niets aan en niet in zijn geval de media voeden boog en is hier dat ons alles kunnen dieren gedragen en dat was het van de wedstrijd is hier 't water gebruiken enorm naar mijn idee gaat niemand gelijk meer laten iedereen ziet voordelen in voor de tv zitten te kijken tot al te drieste is dan helemaal naar de wc met koffie uitermate en een enorme stijging in 't water gebruiken om de tweede en als garnering kijken zien we weer een zeer lage die in het water gebruikt mensen als er niet in een kort voor tijd toen dat doelpunt in dit geval de dennis bergkamp 
gemaakt werd en aan de einde van de mensenrechten en iedereen die naar de wc toe en dat ze al de zie je deze ook bij die niet hier verder uit en zelfs daar is het zo dat operators instellen die ziet er ook te kijken en alles in een beetje op als alle verkrachtte de ja ja ja dat is een enorme rij terroriseren waar zij dan in dit werk deze de rechter is het hele strook radio dan dat van uh ons gedrag 't gedacht van de bevolking en in de jaren dat normen inderdaad ik eis die je niet verder is het nou mensen daar is de oudste waarom juist commentaar komt geven daarin iedereen heeft het recht van de wc en heeft een lijst een jaar is het zeggen heeft en maar de zaken op de doelpunten herhaald en enkele andere geesten is uh nou dat is één ja natuurlijk vooral deze met ontwerpt en en 't gaat natuurlijk dus laat komen en nieuwe infrastructurele werken de bouw van een ongezonde te ontwerpen van een zuivering sinds te laat zien transport leiding en antwerpen en in dit als er over het project onderwijs en ontwerper andere eisen dat is geen visie en een bepaald aandeel in je hoofd maker van doen iets in elkaar zit is er veel te ver in het dat nauwe en en hoe stroomt het water door een installatie niet al is de lijn daar moeten we een bepaald schema van maken en moeten we vermoeden zou kunnen toelaten dat moeten we kunnen berekenen en daar moeten de vrouwen ook geen fouten bij maken en daar is dit plaatje voor uh bedoelt in één van de consulten installaties nee en drinkwater evenals een aandenken aan in en langs de drie nooit terug maar toen ik daar tien keer laten slachten is opgetreden met als gevolg in rosie van de consultant en uh ja natuurlijk heel vaak dat je dan ook helemaal niet dat je met die wet van murphy te maken met zijn dat alles wat fout kan gaan dat gaat ook een keer deze laatste slag met het verschijnsel dat als bijvoorbeeld een pond af slaat dat er onder druk of kan ontstaan en die onder druk niet aan en uh ja deze inderdaad tot in de uzi een leiden naar 't kan 
je team voorkomen door een uh mmm en onderricht in de lucht in zijn nieuwe aan te dringen 't is hier ook gedaan bovenop dat al veel te zeldzame ventiel maar helaas was er net dat er dan mensen die komt hieruit viel de stroomstoring was het ook een intern was het een hele strenge vorst en was dat de dienst in tiel de verloren waardoor er geen lucht meer kon toetreden en dat is toch vaker mond stond in dat land en ja de liefde en niet wil goedpraten optrad is ontwerper is vooral ook de messiah van die niet niet kunnen gaan vandaar dat italie kan ook vrij belangrijk is is niet heel tevreden dat het water en ze uitsluitsel over de verkeerde kant op gaat en je moet vooral ook steeds aan het einde op de dingen die fout kunnen gaan en ontwerpen is vooral ook echt waarin uh dingen gezien hebben hoe iemand in de praktijk wel vandaar dat we die excursie gepland hebben naar de wereld laten doen in juli voor de eerste keer vast is even kijken van uh ja precies een installatie naar haar gedaan moet je allemaal rekening mee houden paula iets en dan nog alleen maar maakt zich al een ander jaar in limburg in dit geval met waren grote transporten leiding is aangelegd bij en oppervlaktewater project nu al heel dat was in het kader van uh dezelfde namen de centen wel eens discussie is deze discussie die in nederland een aantal jaren voor het eerst aan de orde die niet onder andere door de winning van drinkwater aan de grond laten staan allemaal aangetreden verdroging van natuurgebieden op 't eerste hier in limburg zeg tien jaar tien geleden is er nou moeten de grondwater ingaan verminderen en overgaan op de laatste is dan tenslotte door naar de trainer zijn is dat is vrij makkelijk is te lezen hiernaast waarbij aangelegd aangelegd 't was een hand in hand is te laat stadium gewonnen is niet het was toch al niet zoveel met naast je later en dat laat ze laten gaat vervolgens uh faneyte dat werk en dat zien hier zakt dat ze er vanzelf de grond in een door infiltratie kunstmatig in 
getraceerd water de grond in waarbij je er vast een heleboel kwaliteitsverbetering optreedt allerlei stoffen die worden afgespeeld in het westen met zand ondergronds en dat die jullie gaan dood door de lange verblijf tijd je krijgt dan een aanzienlijke verbetering van de late kwaliteit is dan wordt het water weer op om met behulp van peter niet aan op een bepaalde afstand rook lont albert einstein had geplaatst dus dan dan niet eigenlijk een soort kunstmatige ontstaat en je maakt dan eigenlijk van nagelaten want natuurlijk en allerlei bacteriën en virussen en andere verontreiniging bevat maakt een soort kunstmatige grondwater dat wordt dan weer gewonnen en wordt vervolgd nog gezuiverd in zijn installatie die je werk zien weergegeven en dan ging 't is net niet worden leiding heel limburg erin naar de gebruikers te doen en en tenslotte door natuurlijk ook een onderzoek van oranje op de thee in en als je mij niet hier willen werken nou net niet zo veel onderzoek nodig dan gebruik je meestal vijftien regels en het ontwerp twee criteria maar 't vaak niet ontwikkelt zich natuurlijk ook steeds verder naar zijn niet intimideren bedreigingen momenteel onmogelijk niels het voorkomen van geneesmiddelen en in de rij in de wereld niet aantoonbaar in concentraties in een jaar in aanwezig en komt dat nou ook in het drinkwater terecht en dan moeten we daar doen moeten ze uit de meer uitgebreide worden dat soort vragen die leven en zijn er aan het onderzoek bezig onderzoek gebeurt maakt er de laatste bij ons in een plaatje van 't zelfde kern in luxemburg zal ik sterven naar je misschien afgelopen maanden wat ook verteld hebben we nu maar het is ook heel relevant omdat het één en laat het andere niet is later is er natuurlijk een stof en de verontreiniging en de stoffen waar het om gaat ja dat is afhankelijk van de bron is de interactie de lozingen van stoffen die evenveel plaats gevonden hebben interactie met broeder een beladen natuurlijk afvalstoffen in het water terechtkomen die 
je later is er anders en je moet het bij voorkeur ter plaatse doen het is niet zo goed mogelijk om te zeggen van nou ja ik doe niet landelijk door in maart oefenen nee je hebt altijd weer de toets nodig van de praktijk gedrag te laten zich in de praktijk uh ook zoals de theoretici denken sommige dingen gebeurt dit wel in dat laatste is hier ook een land dat alleen steeds in drie laten laboratorium waar allerlei opstelling staan filter is er dus niet installaties anderen ertoe voorstellingen en dan krijg je je laat ze serieuzer artikelen moet doen als je in deze richting uh doorgaat uiteindelijk kerk waar is die als een promotieonderzoek kunnen doen en in de aula aan de de dokters wil uitgereikt werk krijgen grootste dan heb ik nog een kwartier ik het goed zegt ja en die kan ik goed gebruiken versterkt ja om uh is negen en eerste gehaald was te geven van wat is er nou die zonder nadenken aangezien in nederland wat moeten jullie daar nou van eten en en ik maakte aangelegenheid van een presentatie die vorig jaar gegeven en tien jaar daarvoor de canadezen laten bedreigen en daar is 't is heel anders is ik heb daar ook echt m'n best gedaan om een beetje duidelijk te maken van wat is er nou bijzonder in nederland en ze al wat zou hier niet aan de aan de aan de trainer en ik denk dat we jullie ook een aardige introductie zou kunnen zijn in dit vakgebied en eigenlijk is dat trouwens al heel kernachtig weergegeven met dit laatste is een plaatje van hun hun niet zinnen en joey dieren en ja het water uit te keren aan de ind en eigenlijk zoals maakt hier een uh weergeeft vertel zijn is het water moet zo
zijn dat je de volledig op kunt vertrouwen dat het niet zelf je kinderen later in de jaren en dat 't beloofde elke verdenking verheven is dat is eigenlijk niet en dan zich niet aan de drinken aangezien in nederland dat is natuurlijk ook bij andere landen in zekere zin wel te groot maar toch veel minder niet alleen iets te vieren in amerika en daarna en dat soort landen geweest zijn daar is 't ja eigenlijk niet zo dat men er drinkwater dat ik daar ook tenslotte ja later dat is niet iets dat gebruik je om de voor de was te zien en en de missie niet te doen maar je ook over maar drink intieme eigenlijk niet in canada en india je laten wilde in kennen eigen renstal te blijven zitten maar al die is er nog een filter op die kamer omdat water nadat ze ijveren en dat noemen we het consumenten te allen is in die landen dus veel minder dan in nederland en het heeft voor een deel te maken met ja natuur en traditie en in europa zijn dat alle trainingen goed regelt en in een erika christensen om minder daar overstroomd gelang heel dorien zijn aan 't is een nieuwe opbouwen zouden doen maar in nederland ook niet en zelfs dat mensen in canada ook zijn ook in nederland is 't zelf vinden dat de absolute zekerheid moeten hebben dat ding dat dat uit de kraan komt dat er drie aan altijd is het leven niet zeker eind en trainer altijd goed is zo dat onze kinderen met een gerust hart kunnen drinken en hijzelf ook en jawel een plaats in het uh een aantal jaren 't was 't water gebruikt het dat we gebruik maken van grondwater en oppervlaktewater voor drinkwater zien en grondwater meer is ja ook in nederland vaak nog van een hele goeie kwaliteit uh de beetje geïllustreerd aan dit laatste is dan uh de zenuwen waren we regelen de wolken zien en je kunt je wel voorstellen als thierry als te treden daar op dat enorme zand oppervlak van de vrije uren stroomt via daargelaten wordt heel goed gevuld leert en dat grondwater dat je daarom niet uh hoorde je dat je daar niet het is
een hele goeie kwaliteit natuurlijk is het grondwater is niet algemeen goed zijn best ook wel zorgen over hoort er hier en daar hebben natuurlijk ik had me niet te weinig zelfs uh ze alles ter plaatse en die in het grondwater terecht onder enige en en boeren die gebruikelijk in mest en bestrijdingsmiddelen en het uiteinde een ook in het grondwater terecht komen maar hij gemiddeld gesproken is grondwater tal van prima kwaliteit en wiskunde ook volstaan met een eenvoudige cijfer in de lucht in en zand op de raadslieden komen dan op 't meestal half uur en dan maar te laten daarentegen dat is juist het andere eind van 't dan zou je kunnen zeggen en hij zit in in nederland leidt af van wat je van roken aan één en daarnaast die zijn door de frankrijk en duitsland en belgië althans voor later is ook geloosd is de oppervlakte laten bevalt dan volledige cocktail aan alles dat als drie je kunt voorstellen is al te lang te water moet zeer uitgebreid gezuiverd worden en dat doen we ook in nederland wordt in het buitenland was aangeduid als double darts drie mensen hebben heel veel suiker is zes achter elkaar om dan maar zeker van te zijn dat dat water uiteindelijk toch goed is en heel bijzonder in internationaal verband gebruiken geen gehoor amerikanen die vinden het om gehoor te gebruiken en drinkwater smaakt ook naar de order uit ook naar voren daar vinden amerikanen aan volkomen normaal en in nederlandse en een eventuele niet in eerste plaats is daar een inhoudelijke reden voor de namen we weten dat als je door toepast dat gaat reageren met bepaalde stoffen die van nature in water voor de oma van graniet in dingen en daar kijk je bepaalde resistentie negen producten noemen het loon voor een is het meest voorbeeld en dat zijn deze ongewenste stoffen zijn stoffen die giftig kunnen zijn en nou daar kan je al gaan zeggen van ik kan daar nog andere voorstellen maar misschien kan ik niet aan iedereen voldoen maar het in een in nederland onder de russen ned zijn allerlei stoffen 
die willen gewoon niet en deze willen worden lang niet gebruiken dat is een bepaald ja essentieel het gaan spelen niet wat ook heel veel consequenties je meester maar wat in nederland al meer dan dertig jaar gehanteerd wordt en daar is er veel aan gedaan stellen onderzoeken aangedaan dus dat is denk ik één belangrijk punt al voor dit eerste college om even vasthouden aller blijft gewoon tot die giftige verbindingen en het moet je daarom niet willen hiervan hebben dat in nederland besloten dat we dat niet willen kunnen doen is het niet te vrezen want praktisch alles kijkt en is dat later met de norm nou alles smaakt en dat vinden in nederland ook niet aan toch wat later werd ik er aan komt dat moet lekker smaakt en dat moet niet zo fysiek loodzwaar te hebben dat is 't is en wat ze alles maakt dat willen we voor drinkwater niet is waarschijnlijk andré de eerste keer niet met cultuur en consumentenvertrouwen samen we hebben een aantal alle risico is iedereen nederland en gebruiken en ik heb daar een stuk of drie vier ziet voor om die kort tegen de zin te laten presteren dan nietrokers op 't is onduidelijk al over gezegd in nederland is ook zo dat we een relatief uh de grotere bedrijven hebben en die je je in zo'n mengsel zijn van publiek en privaat het zijn geen genetisch en niet iedereen is het water drijft wat hier is dat in het geen maar de aandelen zijn in handen van de gemeente rotterdam en de provincie uh en andere gemeenten niet gezien is dat niet is is eigenlijk een soort altijd maar net die niet eens in je overheid en het geeft ook niet bijzonder is het vorige week dan minimaal leraar benoemde tvm ja heel verhalen over eens dat dat eigenlijk een ideale formule is dat je niet meer zeg maar de waarde van water en later is toch iets niet zomaar in een markt goed is zoals die niet zo makkelijk kunt reguleren zoals andere uh zoals auto's en andere dingen is 't water heeft ook iets te maken met iets van ons allemaal moeten we niet zoveel meer ja en zo'n publieke 
verantwoordelijkheid dienen en niet gaat rechterlijke organisatie nvpi deze adviseert werkt ja dat is alleen iets want een zekere aantrekkelijke kanten heeft en typisch voor nederland een rol in je heel graag deze doet daar niets aan maar is niet laten sekte in nederland heel goed georganiseerd die heeft een gemeenschappelijke research institute opgesteld die de aan en waren en het is werk voor de waterbedrijven wordt uitgevoerd en die hebben een belangenorganisatie opgericht de vrede in en die hebben ook een besloten training de knvb en waren als je hier in de laatste in een ander lid van worden en ja daar is een heel wereldje waarin het gaat het goed samengewerkt wordt en zo informatie uitgewisseld dan zou d'r iets bijzonders dan nederland is in nederland ook makkelijker lenen niet natuurlijk en je niet ik kan je niet gemakkelijk te samenwerken te zullen zijn te lezen en milieu ik en dat gaat in nederland allemaal wat makkelijker wat een infrastructuur werken eten en dan zijn dit en seksuele kenmerken om te de bescherming van de drol dan moet als het in mei en niet het paard achter de wagen spannen en 't huys te laten nee ik altijd met een zo snel mogelijk een rol en zorgen dat die brons dan blijft de naam zie je overal ja ik ben niet boos is hiervan wordt zie staan met grondwater benin grondwater bescherming geniet niet verontreinigd en als je het kunt voorkomen dat de de gebruikt grondwater als mogelijk is te zien het helemaal niet 't noorden het oosten en het zuiden van nederland wordt alleen maar grondwater gebruikt voor de drinkwater zien daar is grondwater beschikbaar is van goede kwaliteit dat is niet de biologische betalen maar je ziet zelden zoiets wel eens iets andere orde is daar is de voor deze ronde die gebruiken we dan dus ook nou in het westen van nederland kan dat natuurlijk niet 't weet jij ook echt waarom niet nee geen thema in 't als leider is niet altijd zonder gestand te bereiken ja maar ook mee en is een echte nadenken is dan een die ze 
moeilijk zeker laten ze oud precies en is het grondwater hier is de oudste ontstaan is heel erg duur en is dat ik niet eigenlijk niet de actrice dus ja je kan je geen grondwater gebruiken deze gebruiken maar oppervlaktewater en een deel dat daarvoor werken hier in het hele duingebied door van oppervlaktewater kunstmatige land laten maken is hier niet goed raad cynischer rond op de vlakte later dit jaar één gelaten dat de bodem isaac en dan wordt het een soort kunstmatige ontstaan willen ja ja ja ja in alle hoewel ja maar wat is er een ja dan moet ik even niets meer zeggen dan eerst een hele tijd niet geldt eigenlijk dat de door haar als gevolg van nieuwe lang niet iedereen die op die dagen de gevallen is dat een zoet water wel op het zoute water drijft is als je heel voorzichtig te laten niet meer als een te laten winnen dat kan snel fouten aan de orde deze moet je echt wel mee op pad zijn maar het kan net al zei orie zouden daarin waterlinie in het westen van 't zuid holland en en noord-holland begonnen in de eeuw door gewoon eerst daar later op de grond te alleen met die ze daar fout gegaan is is oudewater en toen deze met 't niet verteren van grote aantallen nou als je als klant en dat is nou net de hele dag mensen ergens al een aantal ook niet meer laten kijken en daar wordt dan moet je ook laten laten gebruiken dan heeft geen grondwater rotterdamse heeft ook geen eigen die zien we daar oppervlaktewater bereiken dan moet je niet heel uitgebreide zuiveringen hebben dus is ook niet eenvoudig en daar gaat het daar te deze man dat is en in dat is toch nummer één zeer regelmatig dat de later richting in de krant zetten van deze stof moet verboden worden hier moeten bedenkingen aangesteld worden alle zorgen en dat wat goed is goed blijft nou grondwater laat je denkt dat dit toch al en is dat niet is en als je een mens om met vier dan maakt van grondwater dat eigenlijk negen later door een gigantische san siro gesteld is er nou dat is goed en over te laten nou wel bij 
scheveningen hier 't is gewoon in de natuurlijke dijkstal eitjes wonderen en maal slaat en dan naar voor zuigelingen over voor wat anders te stoppen die daar gelijk in de training en dan vond hij niet horen daar milieu heeft in het ik niet eens laten wordt eerst voor te zijn met de ander daar in ieder komt daar zakte de bodem in dan in 't weer terecht met met weetjes die her en der führer niet duidelijk geplaatste zijn dan gaat het naar de naar zijn ze niet toe nieuwenhuys te zien staan en dan vervolgens de distributie net in de gemeenteraad en de meervoudige waar je ns dat ik mensen volstaat eigenlijk onderschat is trouwens geen maar te zien en te zien dat daar een heleboel stappen achter mekaar zitten hebben gewoon veel afzonderlijke zaken links processen in deze eisen omdat er onder zeker van te zijn dan zij iets minder werk dat andere ontvangt de veiligheid door die strijd is heel belangrijk en anderzijds om soorten stoffen met zijn drie systemen tegen te kunnen houden is het woord altijd ja 't gaat altijd om een vrij uitgebreide technische thema's als het over wat water tegelijk ook moderne technologie nee dus dat zijn dan weer uh ontwikkelingen die de afgelopen decennia zeg maar mogelijk geworden zijn je zien we de men de hand op de laatste installatie bij heemskerk deze modernste en de grootste zuivering van en niet iedereen in europa zien nederlandse ontwikkeld hier de meest sexy merk ik nu veel licht is 't zijn eigenlijk gewoon een hele onderwijs is daar je kunnen voorstellen maar die stralen dan is de realiteit en en bacteriën die kunnen daar niet tegen deze anekdote van de stad is een goeie manier om deze sectie verlaten te bewerkstelligen ja dat is twee jaar geleden in aanwezigheid van de prins en ook dat is weer een nederlandse ontwikkeling om dezelfde dingen weten te krijgen nou 't staat daar al eens aan dat wat is er aan de gay aan als de ja op te zetten dan komt er later van een grote kijkt uit naar de zaak gelaten en wat geen verontreiniging en dat 
valt en ook geen gehoor wat ook zacht is het waard is ook contacten in nederland komen later op terug en uiteindelijk is 't wild dat daarmee het daardoor dat we in nederland dan helemaal geen vlees te laten gebruiken althans heel weinig en dat is uiteindelijk drie jaar en in een macro-economisch naar kijkt of zelfs naar de individuele klant is dat gewoon en dat is dan de zaak want mensen laten is helemaal natuurlijk aan drinkwater en het is eigenlijk in dure is dat veel slechter voor milieu in het milieu en de slacht van flessen later zijn is een keer zonnetje staan gemaakt met enkel punt nrc over andere niet en ze moeten allemaal terecht gevoerd worden met acht waren ze er niet moeten niet schromen maakt worden enzovoort en als die stomme idee maakt dan is het milieu 220 Annex E. Speech recognition in beslag van flessen water dertig keer zo hoog als zanger drinkwater zes één en op de achterkant van zich radeloos is zo'n beetje maakt vooral naar de nederlander verlaat de kleintjes zijn italianen denis die italiaan twee tot drie keer zoveel tijd voor en later dan de nederlander en dat zit 'm vooral in het feit dat mensen als ze later uit te drinkwatervoorziening zelf kosten daarvan zijn niet meer gelijk naar want en dat is ook een kenmerk van dit soort grootschalige ingestudeerd om iets goed te doen meer gouden gezaaid geringer veilige systemen maken dus niet zo heel veel duurder dan alles te doen het van de korsten zitten al in dat je moet met de niet te maken je moet een zuigeling en wie moet dat worden leidingen distributie leidingen heel veel kosten hier kiezen wie ze ook niet goed is niet veel duurder dan is het slecht doet ik denk dat ik er ben al jaren doen we hebben ook nog andere dingen dus we hebben 't laatste leek presentatie van de wereld en hele betrouwbare systemen en we letten tegenwoordig natuurlijk in nederland op water besparing water is toch de natuurlijke grondstoffen moet je niet willen is het water gebruikt in nederland stijgt niet is relatief 
constante het huishoudelijk water gebruikt daalt zelf met een tegenwoordig maar 't is maar net van het en douches en uh uh was misschien iets zou hebben en die worden ook andere stimuleert crisisteam op enzovoort hè dus we zijn allemaal verantwoord mee mee zegt en dan hebben we de laatste tien jaar deze uit uiteindelijk deelstaat van die hele filosofieën en indien india gedaan zijn de afgelopen dertig jaar is is dat we zijn ervan hebben 't wonderen uit de kraan waarschijnlijk aan de ene keer iets van de latere reizen naar drie jaar geleden toen posters van gemaakt en reclame op radio en tv het zonder uit te keren aan heel goed laat en het is er altijd worden niet ziek van de gevel treinen gingen we hebben geen flessen water nodig we hebben geen filters aandenken aan de nodige verspelen 't water niet deze hebben is daar goed voor mekaar nou nee niet zeggen dat je een beetje als jij zie je de wereld zijn best wel dingen die nog beter kunnen en beter moeten bijkomen wat terug maar qua stilistisch genie zeker in vergelijking met haar naar amerika bijvoorbeeld is dat gewoon zo aan de andere kant moeten we ons ook realiseren dat er natuurlijk ook heel anders kan en in een heel veel landen ook heel anders gaat het meest extreme voorbeeld daarvan zijn tien ontwikkelingslanden waaronder basis in first die nog volledig ontbreekt en daar gaat na de pauze door is er welgeteld aan uh even pauzeren met Annex E. Speech recognition 221 Annex E3. Speech recognition (SHoUT) compared to human made subtitles Human transcription (Erwin) Na een ruime inlooptijd kunnen we beginnen met het tweede deel van 30-11, Watermanagement. Het deel over gezondheidstechniek ga ik de komende zeven weken met jullie doornemen. En ik dacht, ik zal me eerst eens even aan jullie voorstellen dus mijn naam is Hans van Dijk, zoals jullie daar zien staan en ik dacht, laat ik daar maar twee dingen voor nemen, mijn hobby en mijn werk. Nou de hobby dat zien jullie, ik ben een marathonloper. 
Een mooie foto van de glorieuze binnenkomst in Rotterdam in april afgelopen periode. Marathonlopers dat zijn allemaal een beetje fanatieke lui he, echte doordouwers, die trainen iedere dag. Die weten hun leven zodanig te organiseren dat dat allemaal kan. Dus ik loop hier ook iedere dag tussen de middag een rondje naar Delfts hout, of langs de Schie, of een ander parcour hier. Als jullie me eens een keer in korte broek of trainingspak zien lopen dan klopt dat, dat ben ik. En dat doe ik inmiddels met een heel groepje mensen, bij ons op de afdeling, met studenten en promovendi. En een van die studenten is hier weergegeven, dat is Karin Teunissen. Die zat drie jaar geleden hier bij inleiding watermanagement. Was toen derde jaars, inmiddels is ze afgestudeerd en begonnen met een promotieonderzoek bij het duinwaterbedrijf in Scheveningen. En zij is ook een fanatieke hardloper geworden, en zo hebben wij in april 42 kilometer samen gelopen. Nou dat is een herinnering die ons beide in het geheugen gegrift zal blijven. Dan het werk. Ik heb, ik ben, ja een vraag, ben jij ook een hardloper? - Sorry? Ben jij ook een hardloper? - Eeh, nou ja ik heb wel een vraag, maar volgens mij is dit college al gegeven. Nee - Niet? Dan weet ik niet hoe ik dit al wist, maar... Nou ik zeg dit wel eens vaker, dus dat zou best kunnen. Waar ben je geweest? - Ja volgens mij vorig jaar, maar... Ja tuurlijk, vorig jaar hebben we ook 30-11 gegeven ja, dat klopt haha. Maar deze foto is echt van april hoor dus dat is toch vrij recent. Wat misschien zou kunnen zijn is, ik geef ook altijd een van de gastcolleges bij inleiding Civiele Techniek in het eerste jaar. En daar begin ik natuurlijk ook een beetje met, ja wie ben ik, dus dat zou best kunnen, dat je het daarvan herinnert. Nou dan weet jij nog dat ik hier 30 jaar geleden ben afgestudeerd. Ik heb toen ook Civiele Techniek gestudeerd, in '76 afgestudeerd. 
Daarna ben ik gaan werken bij een ingenieursbureau, bij DHV in Amersfoort, en dat kan ik jullie van harte aanraden als je straks afgestudeerd bent om bij een ingenieursbureau te gaan werken. Dat is een geweldige ervaring, je bent met allerlei projecten over de hele wereld bezig. In mijn geval dan drinkwater projecten. Machine speech recognition (Shout) ... en aan uh ... ja ... en daarin lopen ... we ... daar is ... zeker ... geen en uh ... we ... met het ... hele heelal van uh ... dertig ... elf ... en later naar ... het cement ... 't cd lover ... gezondheidss en niet werd hij ... de komende zeven weken met jullie uh doornemen ... ... en ik dacht ik zelf mee zeker je je voorstellen ... is ... mijn naam is als een tank zoals jullie daar ... zien staan ... en ik dacht laat ... ik daar maar twee dingen van één en een half ... jaar naar werk ... ja ... nou ... willen ... niet inzien ja ... ik ben marathonloper ... een mooie foto van de glorieuze ... ... binnenkomst in uh ... rotterdam ... in ... april ... afgelopen periode ... en marathonlopers dat zijn allemaal een beetje fanatieke leidde er ... echter door de ouders die ... twee iedere dag ... die je er beter inleven zelf de aandacht te organiseren dat het allemaal ... kan ... ... dus ik loop hier ook ... iedere dag tussen de meer dan ooit je naar delft houdt of langs deze rivier of ... andere koerier ... en uh ... als jullie d'r ... is een keer in korte broek als d'r is ... vaccineren lopen dan klopt dat we daar niet ... en uh dat door tieners met een heel groepje er mensen zijn ... ... bij ons op de afdeling met studenten en promovendi ... en één van die je ergens tien cent en ... is zeer ... tevreden met ... daarin trainen zijn ... diezelfde drie jaar ... geleden leerde hij jij ... inleidingen en ... laat de meeste mensen ... was ... een derdejaars inmiddels is afgestudeerd ... ... dat ... heeft ... hij ... er twee ... ik ben ... ja ... ja ... ja ... ja ja ... dat was ... ... nou ... als ... ik zeg niet als ... 
derde is ... dat zou mensen kunnen ... daar waar deze leest hè ... ... ja ... tuurlijk vorig jaar ... lang werden ... afgegeven in alle klassen ... ... het ... is ... een ... stuk ... ... zeg ... maar dat is ... wat er is echter laten ... we deze wist dat ... hij ... zijn dus uh ... ... wat te zien zou kunnen zijn ... in eerste reden ... is ook altijd ze er één van de gast gelezen medelijden is ... niet echt niet het eerste jaar ... en daar ... is ... natuurlijk ook een beetje net diary ben ik ... er dus dat zou best ... kunnen missen je daar je daarvan ... in het ... ... nou dan weet je ... nou ... dat ... weet jij hier ... dertig jaar ... geleden de laatste set neer te ... zetten dat is een techniek verste dertien zes en zeventig ... afgestudeerd... en ... daarna de reeks werken aan werk erbij ... en is hier ... wel ... meer ... dingen ... zei in amersfoort ... en dat kan ik hier niet van harte aan uh ... raden als je straks afgestudeerd ben om daarin niet hier is ... d'r oren te gaan werken ... ... we zeggen wel ... ervaring je bent met ... allerlei projecten over de hele wereld bezig in mijn geval dan drinkwater ... projecten <en verder>

A row in the table corresponds with a <speech>-element in the Shout output.
A word (delimited with blanks) in the Shout output corresponds with a <word>-element.

Recovery of words by SHoUT

Line nr.                          Words in human   Words in SHoUT      Words recovered by SHoUT
                                  subtitles        Number   Relative   Number   Relative
1                                 4                8        200%       0        0%
2                                 3                7        233%       1        33%
3                                 8                13       163%       4        50%
4                                 6                8        133%       1        17%
5                                 7                8        114%       7        100%
6                                 13               11       85%        5        38%
7                                 11               11       100%       8        73%
8                                 11               14       127%       9        82%
9                                 5                3        60%        1        20%
10                                6                6        100%       1        17%
11                                4                3        75%        3        75%
12                                6                6        100%       6        100%
13                                3                4        133%       3        100%
14                                4                4        100%       4        100%
15                                9                10       111%       7        78%
16                                6                8        133%       3        50%
17                                11               14       127%       6        55%
18                                13               14       108%       10       77%
19                                6                6        100%       4        67%
20                                5                5        100%       1        20%
21                                13               14       108%       8        62%
22                                6                6        100%       3        50%
23                                10               12       120%       7        70%
24                                9                9        100%       9        100%
25                                12               16       133%       4        33%
26                                9                13       144%       3        33%
27                                7                6        86%        5        71%
28                                10               0        0%         0        0%
29                                14               0        0%         0        0%
30                                4                0        0%         0        0%
31                                14               0        0%         0        0%
32                                3                5        167%       1        33%
33                                7                9        129%       2        29%
34                                5                0        0%         0        0%
35                                1                0        0%         0        0%
36                                5                0        0%         0        0%
37                                16               0        0%         0        0%
38                                1                0        0%         0        0%
39                                11               0        0%         0        0%
40                                12               12       100%       6        50%
41                                4                5        125%       1        25%
42                                6                0        0%         0        0%
43                                14               10       71%        4        29%
44                                0                4        -          -        -
45                                0                1        -          -        -
46                                14               16       114%       2        14%
47                                6                6        100%       4        67%
48                                8                13       163%       5        63%
49                                8                8        100%       3        38%
50                                13               11       85%        8        62%
51                                10               13       130%       7        70%
52                                13               17       131%       8        62%
53                                12               12       100%       5        42%
54                                12               16       133%       4        33%
55                                13               16       123%       11       85%
56                                7                10       143%       4        57%
57                                5                4        80%        1        20%
58                                10               10       100%       10       100%
59                                6                6        100%       6        100%
Total sample                      471              443      94%        215      46%
Subset (excluding conversation)   401              411      102%       204      51%

Annex F. Searching

1. Information retrieval in multimedia content
   Subfields of information retrieval
   Indexing by means of speech recognition
2. Metadata storage
   Data levels
   Density of data levels
3. Tag clouds
   Types
   Computation of the tag size
   Foundation for search
   Creating a tag cloud
   Tag clouds based on data level
4. Assessment of tag clouds by lecturer
   Assessment approach
   Original tag clouds
   Modified tag clouds
5. Searching in recorded lectures
6. Evaluation of searching in recorded lectures
   Comparing subtitles and ASR output in search
   Searching for different text-types
   Duration per text type
   Multiple-keyword search
   Keyword search for all lectures
   Multiple keyword search for all lectures
   Precision and recall measurement
   Ranked search results
7. Evaluation
1. Information retrieval in multimedia content

Subfields of information retrieval

Information Retrieval (IR) is the discipline of finding information in collections. It can be divided into several subfields:
• image retrieval
• video retrieval
• text retrieval

Research on automatically solving the representation mismatch between query and collection is typically done in image and video retrieval. In text retrieval, both the query and the collection are generally text-based, so there is no representation mismatch.

Image retrieval

In Content Based Image Retrieval (CBIR), images are retrieved from a collection based on an index that is generated by automatically analyzing the content of the images. The images are mostly retrieved by keyword or key-phrase queries, or by query by example. In the query-by-example task, the images retrieved are those whose content is similar to an example image that is used as the query. Although the query image and the images in the collection are of the same modality, they cannot be compared directly; the representation of both query and collection needs to be altered. To compare the images, a mathematical model, or signature, is created for each image. This signature contains low-level information about the picture such as shape, texture or color.

Video retrieval

Where image retrieval focuses on standalone images, the goal of content-based video retrieval is to support searching in video collections. For this purpose, various methods of abstracting information from the video recordings are employed. Because video consists of a sequence of still pictures played in rapid succession, many image retrieval techniques can be reused in video retrieval. Other techniques are used as well, such as detecting scene changes or recognizing text that has been edited into the video (like people's names).
Because most videos contain people speaking, it is also possible to use speech as a source of information.

Spoken document retrieval

Speech, in most multimedia archives, is a rich source of information for solving the representation mismatch. Sometimes it is even the only reliable source of information. Radio shows or telephone recordings do not contain any video. They might contain some music or sound effects, but in those cases most of the information is in the speech. Spoken Document Retrieval (SDR) is a subfield of information retrieval that focuses solely on the use of speech for retrieving information from audio or video archives. In the most widely studied form of SDR, the representation mismatch is solved by automatically translating the speech into written text with Automatic Speech Recognition (ASR) technology. The output of this process, the speech transcriptions, can be used in a retrieval system (see Figure 1.1). The transcriptions contain the exact time at which each word is pronounced, so that it is possible to play back all retrieved words. This is similar to the index of a book, which stores the page numbers on which each word appears. Both such an index and speech transcriptions are often referred to as metadata. Metadata is data about data. In the case of speech transcriptions, the words and the timing information provide information about the actual data, the audio recordings.

Figure 1.1: Solving the representation mismatch between content and query in an SDR system. The speech from multimedia documents is translated into written speech transcriptions by the ASR component. As the query is already formulated in written text, it does not need to be translated and can be used directly by the retrieval component to find relevant video fragments.
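As a minimal sketch of how time-stamped speech transcriptions can act as a searchable index, the Python example below maps each word to the times at which it was spoken, much like a book index maps words to page numbers. The transcript format (a list of word and start-time pairs), the function names and the sample words are assumptions made for illustration; this is not the SHoUT output format.

```python
from collections import defaultdict


def build_index(transcript):
    """Map each lower-cased word to the start times (in seconds) at which it occurs."""
    index = defaultdict(list)
    for word, start in transcript:
        index[word.lower()].append(start)
    return dict(index)


def search(index, query):
    """Return the start times for a single-keyword query, or an empty list."""
    return index.get(query.lower(), [])


# Hypothetical sample data: (word, start time in seconds).
sample_transcript = [
    ("grondwater", 12.4),
    ("is", 12.9),
    ("van", 13.1),
    ("goede", 13.3),
    ("kwaliteit", 13.7),
    ("Grondwater", 95.0),
]

index = build_index(sample_transcript)
print(search(index, "grondwater"))  # [12.4, 95.0]
```

Each returned time could be used as a playback position in the video. A misrecognized word simply never enters the index, which is one way ASR errors carry over into retrieval.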
If the speech transcriptions always contained exactly what was said, the text retrieval system would perform as well as it does when searching in written text. In practice, ASR systems are not perfect, and any word that is recognized incorrectly can introduce errors in the retrieval component. This was illustrated by the cross-recognizer retrieval task at the seventh Text Retrieval Conference (TREC-7) in 1998, organized by the National Institute of Standards and Technology (NIST). Participants in this benchmark evaluation used speech transcriptions of varying quality to perform text retrieval. The results showed that although the speech transcriptions did not have to be perfect to obtain good retrieval performance, there was a significant correlation between the quality of the transcriptions and the performance of the retrieval system. This illustrates that the success of an SDR system depends heavily on the performance of the ASR component. (Source: http://wwwhome.cs.utwente.nl/~huijbreg/publications/thesis_Marijn_Huijbregts.pdf)

Indexing by means of speech recognition

The amount of metadata attached to multimedia collections that can be used for searching depends very much on the resources available within the organizations that create or own the collections. Large national audiovisual institutions such as Sound&Vision in the Netherlands put a lot of effort into archiving their assets, and they label collection items with at least titles, dates and short content descriptions. When creating a more detailed archive of textual data, the speech in audio is an important information source that, once transformed into text and/or enriched with linguistic annotation, can enable the conceptual querying of video content.
The basic idea is to use automatic speech recognition technology to generate such a linguistic annotation of textual representation and to use this as (a source for) automatically created metadata that can be used for searching by applying standard text-based information retrieval techniques. (Source: Multimedia Retrieval by Henk Blanken…)

SHoUT is a software package developed at the University of Twente, at the chair Human Media Interaction, by PhD candidate Marijn Huijbregts as part of his PhD project titled "Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled". SHoUT is a Dutch acronym for "Speech Recognition Research at the University of Twente". It is a speech recognition system based on commonly used machine learning techniques. It is used for research on Large Vocabulary Continuous Speech Recognition (LVCSR), but the speech/non-speech detector and the speaker diarization application can also be used separately. It is written in C++ on a Linux platform. (Source: http://wwwhome.cs.utwente.nl/~huijbreg/shout/)

Figure 1.2: Logo of the chair Human Media Interaction at the University of Twente (Source: http://hmi.ewi.utwente.nl)

2. Metadata storage

Data levels

To make the lecture searchable, the data first needs to be properly structured. This can be done by creating a database and storing the metadata in several tables. The accompanying entity-relationship diagram can be seen in Figure 2.1.

Figure 2.1: Entity-Relation-Diagram of recorded lecture database

In the available lecture metadata, several data levels can be distinguished. Each of these levels has a certain relevance factor, which is different for each lecture. The different levels and their respective sources are shown in Table 2.1. An illustration of where this data has been derived from can be seen in Figure 2.2.
Table 2.1: List of Text_types and their respective source

Name                     Source
Lecture title            Lecturer after post-processing
Lecture chapter          Lecturer after post-processing
Slide title              PowerPoint slides
Slide content            PowerPoint slides
Slide notes              PowerPoint slides
Transcript (lecture)     Video of lecture
Transcript (slide)       Video of lecture
Transcript (sentence)    Video of lecture
Transcript (word)        Video of lecture

Figure 2.2: Sources for each text_type in the database

This list is ordered by expected relevance. A slide title carries far more weight as a keyword than an arbitrary word in the transcript. Therefore, if someone runs a search on this data, a higher relevance should be assigned to results that come from the slide content than to results that come from the transcript. This has been made visible in tables 3, 4, 5 and 6, where an example of all the data levels is shown for lecture CT3011.

The advantage of this database structure is that it gives users complete freedom to store whatever type of metadata they want. The only required elements are a link to a video lecture, a timeframe to which the metadata pertains, and a Text_type that gives the category of the metadata stored.

Density of data levels

With all the data inserted in the database, a word count can be made on each data level. It shows the density of words for each text type, with the number of records, words and characters sorted by category.
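The structure and the per-level count described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than the SQL Server database used in this project, and the table name, column names, lecture identifier and sample rows are hypothetical, not the schema from Figure 2.1. It keeps only the three required elements (lecture, timeframe, text type) plus the stored text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE metadata (
        lecture_id TEXT NOT NULL,   -- link to the video lecture
        start_sec  REAL NOT NULL,   -- start of the timeframe
        end_sec    REAL NOT NULL,   -- end of the timeframe
        text_type  TEXT NOT NULL,   -- category, e.g. 'Slide title'
        content    TEXT NOT NULL    -- the metadata text itself
    )
    """
)

# Hypothetical sample rows.
rows = [
    ("CT3011-15", 0.0, 2709.0, "Lecture title", "Introduction Water Management"),
    ("CT3011-15", 60.0, 180.0, "Slide title", "Drinking water"),
    ("CT3011-15", 60.0, 65.0, "Transcript (sentence)", "dat is een hele goede kwaliteit"),
]
conn.executemany("INSERT INTO metadata VALUES (?, ?, ?, ?, ?)", rows)


def density(connection):
    """Return {text_type: (records, words, characters)} for all stored metadata."""
    result = {}
    query = "SELECT text_type, content FROM metadata"
    for text_type, content in connection.execute(query):
        records, words, chars = result.get(text_type, (0, 0, 0))
        result[text_type] = (
            records + 1,
            words + len(content.split()),
            chars + len(content),
        )
    return result


stats = density(conn)
for text_type, (records, words, chars) in sorted(stats.items()):
    print(text_type, records, words, chars)
```

Words are counted here by splitting on whitespace and characters include spaces; the counts in the tables of this annex may have been produced with different rules.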
Table 2.2: List of Text_types and the amount of records and words in the database for course CT3011

Name                     Nr of records   Nr of words   Nr of characters
Lecture title            28              129           917
Lecture chapter          116             300           2,526
Slide title              1,183           3,900         28,741
Slide content            1,042           15,943        129,195
Slide notes              10              804           5,102
Transcript (lecture)     1               6,970         41,407
Transcript (slide)       28              6,970         41,383
Transcript (sentence)    779             6,970         40,623
Transcript (word)        118,926         188,926       808,482

The rows that are grayed out in the original table (Slide notes and the lecture-, slide- and sentence-level transcripts) show only the data for lecture 15 by Hans van Dijk, since this is the first sample lecture that was used for manual human-made subtitles.

In Table 2.3, the numbers in rows "Slide notes", "Transcript (lecture)" and "Transcript (sentence)" have been multiplied by 28 (the total number of lectures in course CT3011). The row "Transcript (slide)" has been adjusted to the total number of slides in course CT3011. This has been done to give a more complete picture of the density of each category.

Table 2.3: List of Text_types and the amount of records and words in the database for course CT3011

Name                     Nr of records   Nr of words   Nr of characters
Lecture title            28              129           917
Lecture chapter          116             300           2,526
Slide title              1,183           3,900         28,741
Slide content            1,042           15,943        129,195
Slide notes              280             22,512        142,856
Transcript (lecture)     28              179,480*      768,058*
Transcript (slide)       1,183           179,480*      768,058*
Transcript (sentence)    21,812          179,480*      768,058*
Transcript (word)        118,926         188,926       808,482

* 95% of the total number of words generated by SHoUT, based on the comparison between the human-made subtitles and the SHoUT subtitles

3. Tag clouds

A tag cloud or word cloud (or weighted list in visual design) is a visual depiction of user-generated tags, or simply the word content of a piece of text. It is mainly used to describe the content of web sites.
Tags are usually single words and are typically listed alphabetically. The importance of a tag is shown with font size or color, so a tag can be found both alphabetically and by popularity. The tags can become hyperlinks that lead to a collection of items that are associated with a tag.

Figure 3.1: Example of a tag cloud with terms related to Web 2.0 (Source: http://en.wikipedia.org/wiki/Tag_cloud)

Types

There are three main types of tag cloud applications in social software, distinguished by their meaning rather than appearance.

In the first type, size represents the number of times that tag has been applied to a single item. This is useful as a means of displaying metadata about an item that has been democratically "voted" on and where precise results are not desired. Examples of such use include http://last.fm (to indicate genres attributed to bands) and http://www.librarything.nl (to indicate tags attributed to a book).

In the second, more commonly used type, size represents the number of items to which a tag has been applied, as a presentation of each tag's popularity. Examples of this type of tag cloud are used on the image-hosting service Flickr, blog aggregator Technorati and on Google search results with DeeperWeb.

In the third type, tags are used as a categorization method for content items. Tags are represented in a cloud where larger tags represent the quantity of content items in that category.

More generally, the same visual technique can be used to display non-tag data, as in a word cloud or a data cloud.

Computation of the tag size

$$ s_i = \left\lceil \frac{f_{\max} \cdot (t_i - t_{\min})}{t_{\max} - t_{\min}} \right\rceil \quad \text{for } t_i > t_{\min}; \quad \text{else } s_i = 1 $$

s_i: display font size
f_max: max. font size
t_i: count
t_min: min. count
t_max: max. count

Foundation for search

For searching within a video lecture, the subtitle files can be used as a basis. This transcript includes all the words that were spoken by the lecturer, together with the timeframe in which each word was spoken.
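The tag size computation given earlier in this section can be sketched in Python. The sketch assumes the common linear scaling with a ceiling, using the symbols defined there (count, minimum count, maximum count and maximum font size). The function name and the guard for the case where all counts are equal are additions for illustration only.

```python
import math


def tag_font_size(count, min_count, max_count, max_font_size):
    """Map a word count to a display font size between 1 and max_font_size.

    Assumed rule: linear scaling between min_count and max_count, rounded up;
    words at the minimum count get size 1.
    """
    if count <= min_count or max_count == min_count:
        # Also covers the degenerate case where every word has the same count.
        return 1
    span = max_count - min_count
    return math.ceil(max_font_size * (count - min_count) / span)


if __name__ == "__main__":
    # Counts chosen for illustration: minimum 53, maximum 269, ten font sizes.
    for count in (53, 161, 269):
        print(count, tag_font_size(count, 53, 269, 10))
```

Because the scaling is linear, a few very frequent words push all other words toward the smallest sizes, which is one reason why removing common words changes the appearance of a cloud so strongly.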
Below is a list of interesting statistics about the subtitles of lecture CT3011, Introduction Water Management:

Table 3.1: Analysis of transcript for lecture CT3011

Length                         45:09
Number of words                6,970
Number of subtitle sentences   779
Average words per minute       154.3
Average words per second       2.57

By inserting all the words into a SQL Server database table, it became possible to do a further analysis on them, through the use of the following query:

    SELECT word, COUNT(word) AS word_count
    FROM CT3011_transcript
    GROUP BY word
    ORDER BY word_count DESC;

The result was a list of words ordered from most to least frequently used, of which the top 20 are shown in Table 3.2.

Table 3.2: Top 20 most common words in transcript of lecture CT3011

Nr   Word   Count
1    dat    269
2    de     231
3    en     225
4    het    220
5    is     181
6    een    174
7    van    162
8    in     151
9    ook    134
10   we     128
11   die    113
12   je     107
13   dus    93
14   ik     88
15   dan    80
16   daar   76
17   niet   55
18   zijn   54
19   op     54
20   met    53

If you take a closer look at these words, you will immediately notice that most of them aren't very useful for searching. The most frequently used words are short, meaningless words like "and", "a" and "that", whereas people are generally interested in keywords that say something meaningful about the subject.

Creating a tag cloud

Once this list of most frequently used words is available, it's possible to generate a tag cloud based on these words. A good example of a free tag cloud generator is Wordle (Source: http://www.wordle.net). It accepts input in two ways: either by pasting a block of text into a single text field, or by entering a list of words, each combined with a relevance weight (such as the word count). These input options are shown in Figure 3.2 and Figure 3.3.

The website will then generate an image using Java and create a tag cloud based on the word count of each individual word. The larger the frequency, the larger the word becomes in the generated image.
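The same frequency list can be produced without a database. The following Python sketch counts words in a block of transcript text and returns the most common ones. The tokenisation rule (lower-casing, splitting on anything that is not a letter, keeping inner apostrophes and hyphens) is an assumption and not necessarily how the words were split before they were inserted into the SQL Server table.

```python
import re
from collections import Counter

# A word is a run of letters, optionally joined by apostrophes or hyphens.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def word_frequencies(transcript):
    """Count how often each word occurs, ignoring case and punctuation."""
    return Counter(WORD_PATTERN.findall(transcript.lower()))


def top_words(transcript, limit=20):
    """Return the `limit` most frequent words as (word, count) pairs."""
    return word_frequencies(transcript).most_common(limit)


if __name__ == "__main__":
    text = "Dat is het water. En dat water is schoon, dat is zo."
    for word, count in top_words(text, limit=3):
        print(word, count)
```

The resulting (word, count) pairs have the same shape as the weighted word list that a tag cloud generator accepts as input.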
Figure 3.4 shows an example of such a tag cloud of our transcript of lecture CT3011.

Figure 3.2: Input of a text block in Wordle

Figure 3.3: Input of weighted words in Wordle

Figure 3.4: Tag cloud of entire transcript CT3011 by Wordle

As predicted, the results are not very good, since the words that are deemed "most important" based on their word count are all small, meaningless connector words. Fortunately, Wordle offers the option of removing the common words of a language. With this feature turned on, the results become a lot more useful, as shown in Figure 3.5.

Figure 3.5: Tag cloud of entire transcript CT3011 by Wordle with common Dutch word removal

There are two obvious problems with this generated tag cloud. The number of words is too large, which makes the cloud hard to comprehend, and the words selected by the generator aren't equally relevant to the subject of the lecture. There are also still a number of meaningless words included in the tag cloud, such as "natuurlijk", "nou" and "jullie".

The problem of a cloud that is too dense to comprehend is easily solved. Wordle allows a limit to be set on the maximum number of words returned. This way, there is a lot less clutter in the tag cloud and the keywords stand out much more clearly, as is shown in Figure 3.6.

Figure 3.6: Tag cloud of entire transcript CT3011 by Wordle with common Dutch word removal and a maximum number of 25 words

After generating several of these tag clouds with the different settings offered by Wordle, it seems clear that a selection of words has to be made in order to increase the relevance of the tag cloud. Using the word count to assign a relevance factor seems to work, since a lot of keywords about lecture CT3011 are returned, but something needs to be done about the meaningless words that are shown.
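The two settings discussed above, removing common words and limiting the number of words, can be imitated with a small Python sketch. Wordle's own Dutch word list is not known here, so the stop-word set below is an assumption: it simply reuses the 20 most common words from Table 3.2. The function name and its parameters are likewise illustrative.

```python
from collections import Counter

# Stand-in for a Dutch common-word list: the top 20 words from Table 3.2.
DUTCH_STOP_WORDS = frozenset({
    "dat", "de", "en", "het", "is", "een", "van", "in", "ook", "we",
    "die", "je", "dus", "ik", "dan", "daar", "niet", "zijn", "op", "met",
})


def filtered_top_words(words, stop_words=DUTCH_STOP_WORDS, max_words=25):
    """Drop common words, count the rest and keep at most `max_words` entries."""
    lowered = (word.lower() for word in words)
    counts = Counter(word for word in lowered if word not in stop_words)
    return counts.most_common(max_words)


if __name__ == "__main__":
    sample = "het water is het grondwater en het water".split()
    print(filtered_top_words(sample, max_words=2))
```

A frequency-based stop list of this kind still lets through words such as "natuurlijk" and "jullie", which is the limitation noted above.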
A simple solution for this is to create a list of all the words in a transcript, sorted by frequency, and to select only the nouns. This selection removes many of the smaller words and yields a list of genuine keywords, or at least words that are reasonably relevant to the text. The most used nouns in the transcript of lecture CT3011 are shown in Table 3.3.

Table 3.3: Top 15 most used nouns in transcript of lecture CT3011

Nr   Word                    Count
1    water                   39
2    Nederland               36
3    grondwater              21
4    jaar                    16
5    drinkwatervoorziening   16
6    dingen                  16
7    oppervlaktewater        15
8    boek                    15
9    keer                    13
10   drinkwater              13
11   plaatje                 11
12   vragen                  10
13   chloor                  10
14   soort                   9
15   stoffen                 9

By running this new list of keywords through the same tag cloud generator from Wordle, a new picture is generated (see Figure 3.7). The words are clearly more relevant and there are far fewer of them, which makes the overall image much easier to comprehend. It is no longer a chaotic mess filled with meaningless words.

Figure 3.7: Tag cloud of nouns in transcript CT3011 by Wordle with common Dutch word removal

Tag clouds based on data level

Now that several data levels have been distinguished, it's possible to create different tag clouds based on the corresponding information sources. That way, the relevance and effectiveness of the data can be compared by looking at the output generated by the Wordle tag cloud generator.

Table 3.4: List of slide titles and their respective timeframe of lecture CT3011

Nr   Timeframe        Length
1    0:00 – 7:28      7:28
2    7:28 – 9:48      2:20
3    9:48 – 10:20     0:32
4    10:20 – 13:12    2:52
5    13:12 – 13:47    0:35
6    13:47 – 13:50    0:03
7    13:50 – 14:39    0:49
8    14:39 – 15:46    1:07
9    15:46 – 16:39    0:53
10   16:39 – 17:20    0:41
11   17:20 – 18:06    0:46
12   18:06 – 18:50    0:44
13   18:50 – 20:39    1:49
14   20:39 – 23:11    2:32
15   23:11 – 24:07    0:56
16   24:07 – 25:09    1:02
17   25:09 – 27:08    1:59
18   27:08 – 29:47    2:39
19   29:47 – 33:35    3:48
20   33:35 – 35:49    2:14
21   35:49 – 38:48    2:59
22   38:48 – 39:08    0:20
23   39:08 – 39:27    0:19
24   39:27 – 40:02    0:35
25   40:02 – 40:36    0:34
26   40:36 – 41:28    0:52
27   41:28 – 43:23    1:55
28   43:23 – 43:59    0:36
29   43:59 – 45:09    1:10

Title (slides 1 to 29, in order): Civiele Gezondheidstechniek Overzicht 3011, deel Gezondheidstechniek Drinkwater- principes en praktijk Colleges Wat is Gezondheidstechniek? Schoon water voor een gezond leven.. Wat is Gezondheidstechniek? Schoon water voor een gezond leven.. Wat is Gezondheidstechniek? Wat is Gezondheidstechniek? Wat is Gezondheidstechniek? Drinking water and Delft Wat doet een waterleidingingenieur? Studies Wat doet een waterleidingingenieur? Ontwerpen Wat doet een waterleidingingenieur? Wat doet een waterleidingingenieur? Wat doet een waterleidingingenieur? Onderzoek Dutch drinking water: principles and practices Drinking water in the Netherlands Principles and practices:1 Principles and practices: 2 Source protection Groundwater Artificial recharge Multiple barriers… Modern technology… Principles and practices: 3 Principles and practices: 4 The miracle from the tap

Table 3.4 shows every slide title, the timeframe in which the slide is shown during the lecture, and the length of time the slide is shown in minutes and seconds. The average time that a slide is shown for lecture CT3011 is 1 minute and 33 seconds.

When the same tag cloud generator is used to generate a tag cloud based on all the slide titles, it is interesting to see the differences between the two (see Figure 3.7 and Figure 3.8). The most obvious difference is the most frequently used word. Lecture CT3011 is titled "Civiele gezondheidstechniek", yet in the first tag cloud, based on the entire transcript, neither of these words appears. In the tag cloud that only incorporates the slide titles, interestingly enough, the most frequently used word is "Gezondheidstechniek".
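The average slide duration quoted above follows directly from the timeframes in Table 3.4: 29 slides cover 45:09, or 2,709 seconds, which gives roughly 93 seconds per slide. A minimal Python sketch of that calculation is shown below. The function names are illustrative, and the boundary list is copied from the start and end times in the table.

```python
def parse_mmss(value):
    """Convert a 'minutes:seconds' string such as '7:28' to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + int(seconds)


def slide_lengths(boundaries):
    """Return the length in seconds of each slide, given consecutive boundaries."""
    times = [parse_mmss(value) for value in boundaries]
    return [end - start for start, end in zip(times, times[1:])]


# Slide change times of lecture CT3011, taken from Table 3.4.
CT3011_BOUNDARIES = [
    "0:00", "7:28", "9:48", "10:20", "13:12", "13:47", "13:50", "14:39",
    "15:46", "16:39", "17:20", "18:06", "18:50", "20:39", "23:11", "24:07",
    "25:09", "27:08", "29:47", "33:35", "35:49", "38:48", "39:08", "39:27",
    "40:02", "40:36", "41:28", "43:23", "43:59", "45:09",
]

if __name__ == "__main__":
    lengths = slide_lengths(CT3011_BOUNDARIES)
    average = sum(lengths) / len(lengths)
    minutes, seconds = divmod(round(average), 60)
    print(f"{len(lengths)} slides, average {minutes}:{seconds:02d}")
```

The individual lengths also show how uneven the distribution is: the first slide stays on screen for more than seven minutes, while one slide is shown for only three seconds.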
Figure 3.8: Tag cloud of slide titles of lecture CT3011 with common Dutch word removal

Table 3.5 shows all the slide content, ordered by slide. Most of the data presented here clearly consists of keywords for the lecture. For searching, the data that comes out of these slides is therefore generally very useful.

Table 3.5: List of slide content of lecture CT3011

Nr 1 2 3 4 5 Content Prof. Hans van Dijk Boek Drinkwater-principes en praktijk verkrijgbaar bij Mieke Hubert, kamer 4.55 Vraagstukken in boek Computer assignments op blackboard Oude tentamens op blackboard 7 colleges conform schema Gezondheidstechniek 3011 Drinkwaterbedrijven 3011 Planning en ontwerp 3420 Financiën 3420 Waterverbruik 3011 Waterkwaliteit 3011 Grondwater 3420 Oppervlaktewater 3420 Distributie 3011 27 sept. Inleiding gezondheidstechniek 1 okt. Waterkwaliteit 1: eisen/micro 4 okt. Waterkwaliteit 2: natuur/chemie 8 okt. Drinkwaterbedrijven 1: grondwater 11 okt. Drinkwaterbedrijven 2: oppervlaktewater 15 okt. Waterverbruik 18 okt. Distributie Excursie naar de Berenplaat op 11 oktober na college grondwater drinkwater oppervlaktewater 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 riolering afvalwater aantal per 100.000 inwoners % niet- aangesloten jaar 70 60 50 40 30 20 10 0 1919 1945 1900 1925 1950 1975 70 60 50 40 30 20 10 0 grondwater drinkwater oppervlaktewater riolering afvalwater aantal per 100.000 inwoners % niet- aangesloten jaar 70 60 50 40 30 20 10 0 1919 1945 1900 1925 1950 1975 70 60 50 40 30 20 10 0 Goede waterkwaliteit ten dienste van mens en milieu kennis van: waterwinning waterzuivering watertransport waterchemie microbiologie Gezondheidstechnisch ingenieur maakt gebruik van kennis van: hydraulica hydrologie constructieve vormgeving informatica projectrealisatie van groot belang voor de volksgezondheid grootschalige gespecialiseerde infrastructuur goed georganiseerde sector met heldere taken Prof. ir.
Hans van Dijk 27 september 2009 Total volume:1.2 x 109 m3/jaar Sources Groundwater: 2/3 Surface water: 1/3 Treatment Groundwater: aeration and sand filtration Surface water: very extensive treatment Distribution no chlorine! Focus on public health… Large publicly owned private companies… With joined efforts for research and communication Source protection Safe groundwater when available… Or artificial groundwater… Or surface water with multiple barriers for micro-organisms, pollutants and nutrients… 22 238 Annex F. Searching 23 24 25 26 27 28 29 High quality water without chlorine… And with a low hardness… So the customers drink water from the tap No leakage… Reliable systems… Stimulate water saving… High quality water supply No waterborne diseases No chlorine No pesticides No hard water No corrosion and metals No leakage No need for home filters No need for bottled water No wasting of water Figure 3.9: Tag cloud of slide content of lecture CT3011 with common Dutch word removal Figure 3.10: Tag cloud of slide titles and content of lecture CT3011 with common Dutch word removal In Table 3.6, the notes that Hans van Dijk added to his PowerPoint slides are presented. The content of this data obviously differs a lot per lecturer. In this specific example, most of the notes are parts of the story that the lecturer wants to tell, which is why the text looks a lot like a transcript. Table 3.6: List of slide notes of lecture CT3011 Nr 5 6 Notes De infrastructurele werken van drinkwater en afvalwater of wel de kleine waterkringloop of stedelijke waterketen zoals ons segment in het DC heet. Dus het onttrekken van water aan de grote hydrologische kringloop, het zuiveren en distribueren van drinkwater, inzamelen van afvalwater en tenslotte het zuiveren van afvalwater voordat we het weer aan de grote kringloop teruggeven, zoals we dat eufemistisch uitdrukken, ofwel voordat we het afvalwater weer lozen op het oppervlaktewater. 
We hebben daarbij een heel helder doel, namelijk het bevorderen van de volksgezondheid en Annex F. Searching 239 8 9 11 12 13 14 15 20 240 dat we daarbij zeer succesvol zijn blijkt uit de grafiek die laat zien dat buiktyphus in de vorige eeuw uit Nederland werd uitgebannen door de aanleg van de drinkwatervoorziening. Dat zuiver drinkwater van groot belang is voor de volksgezondheid weten we dus al een eeuw, maar we moeten er steeds alert op blijven, ook met nieuwe bedreigingen zoals SARS. Als toepassingsgerichte ingenieurs maken we bij het ontwerpen van drink- en afvalwaterinstallaties natuurlijk gebruik maken van een groot aantal civiele domeinen, zoals… Enkele bijzondere kenmerken van de gezondheidstechniek zijn dat het een heel gespecialiseerd en goed georganiseerd wereldje is met heldere taken en veel aandacht voor kwaliteit. Van oudsher spelen civiel ingenieurs een vooraanstaande rol in de sector, hoewel ook wij geconfronteerd worden met concurrentie van de bedrijfskundigen, economen en juristen. Vroeger behoorde een waterleidingdirecteur toch wel civiel ingenieur te zijn, nu zijn er van de 15 directeuren nog 4 civiel. Wat doet een gezondheidstechnisch ingenieur? Ja, net als iedere andere civiel verricht hij studies, soms op momenten dat de rest van Nederland voor de TV zit, zoals het linker plaatje laat zien. Bij de kwart-finale van de WK van 1998 Nederland – Argentinie bleek dat het waterverbruik een perfecte indicator is van het wedstrijdverloop. Zodra de wedstrijd begint, daalt het waterverbruik snel met minima tijdens de doelpunten en vlak voor de rust en vlak voor het einde . Tijdens de rust en na de wedstrijd gaat iedereen meteen naar de WC en zien we een enorme piek in het waterverbruik. Interessant is nog de zogenaamde Cruijf-dip als JC commentaar gaat geven in de rust; we zien dat circa 10 % van de mensen dan even terugkomt van de WC. Ja, ontwerpen doen we als alle civielen natuurlijk het liefste. 
Ook bij ons gaat het dan om schematiseren en berekenen, waarbij natuurlijk geen fouten gemaakt moeten worden want zo'n implosie t.g.v waterslag als bij het koolfilter op de middelste foto valt natuurlijk wel op en staat niet zo best op je conduitestaat (niet door Delfts civiel ingenieur ontworpen) U vindt ons dikwijls op de mooiste plekjes van Nederland, zoals bos, heide en de duinen waar we dat "Zuiver water uit een schoon milieu" denken te vinden, maar ook aangename excursies kunnen organiseren, zoals het hooglerarenuitje van enkele jaren geleden naar Scheveningen. Groot project in Limburg, WPH. Een voorbeeld van een recent project waar ik in mijn DHV -tijd nog medeverantwoordelijk voor geweest ben is WPH, een groot project dat als doel had om de grondwaterwinning in Limburg te verminderen (i.v.m. verdroging) en deels te vervangen door oppervlaktewater. Nabij Heel ligt een voormalige grindwinplas (de Lange Vlieter) die we ingericht hebben als bekken voor de drinkwatervoorziening. Vanuit het Lateraalkanaal wordt water in het bekken gepompt, waarna het op natuurlijk wijze infiltreert in de bodem en via winputten op enige afstand en na minimaal 60 dagen van het bekken weer wordt opgepompt. Het grote voordeel van deze bodempassage is dat bacteriologisch betrouwbaar water gewonnen wordt. Het water wordt vervolgens gezuiverd m.b.v. zand en koolfilters en via in totaal 100 km transportleiding getransporteerd naar de bestaande (grondwater) pompstations in Middenen Noord Limburg, waar het vervolgens wordt opgemengd met het lokaal gewonnen grondwater en gedistribueerd. Capaciteit 20 miljoen m3/a, Investeringen 300 miljoen, in guldens weliswaar. Onderzoek doen we niet altijd op van die idyllische plekjes zoals tijdens het veldpracticum in Luxemburg op de linker foto, maar toch wel vaak op locatie omdat de ene waterkwaliteit nu eenmaal de andere niet is en het lastig is om te slepen met water naar het laboratorium. 
In het lab zelf doen we meer fundamenteel onderzoek, zoals de foto in het midden een onderzoek naar de hydraulische verdeling van water en lucht bij het terugspoelen van membranen. Het Lab van Gezondheidstechniek verhuis momenteel naar Stevin II , waar we samen met Vloeistofmechanica het nieuwe Waterlab gaan vormen. We verwachten veel van de samenwerking met VLM, daar het bij ons ook draait om de combinatie van VLM en waterkwaliteit. En als dat onderzoek goed gaat eindigt het met een promotie en kijken we trots en tevreden terug… Wat betreft de bescherming van de bron, hierbij is naast onderzoek ook de nodige publiciteit en lobby nodig om de vervuilers aan te pakken en de bronnen te saneren; de WLB nagelen dan ook graag vervuilers aan de schandpaal. Endocrine disruptoren, pil

Figure 3.11: Tag cloud of slide notes of lecture CT3011 with common Dutch word removal

Table 3.7 shows the transcripts that correspond to the first 3 slides of the lecture. This data is obviously the most extensive, since every spoken word is included. The large number of words makes the data less relevant when searching.

Table 3.7: Transcript for the first 3 slides of lecture CT3011

Nr 1 2 Transcript Na een ruime inlooptijd kunnen we beginnen met het tweede deel van 30-11, Watermanagement. Het deel over gezondheidstechniek ga ik de komende zeven weken met jullie doornemen. En ik dacht, ik zal me eerst eens even aan jullie voorstellen, dus, mijn naam is Hans van Dijk, zoals jullie daar zien staan en ik dacht, laat ik daar maar twee dingen voor nemen, mijn hobby en mijn werk. Nou de hobby dat zien jullie, ik ben een marathonloper. Een mooie foto van de glorieuze binnenkomst in Rotterdam in april afgelopen periode. Marathonlopers dat zijn allemaal een beetje fanatieke lui he, echte doordouwers, die trainen iedere dag. Die weten hun leven zodanig te organiseren dat dat allemaal kan.
Dus ik loop hier ook iedere dag tussen de middag een rondje naar Delfts hout, of langs de Schie, of een ander parcour hier. Als jullie me eens een keer in korte broek of trainingspak zien lopen dan klopt dat, dat ben ik. En dat doe ik inmiddels met een heel groepje mensen, bij ons op de afdeling, met studenten en promovendi. En een van die studenten is hier weergegeven, dat is Karin Teunissen. Die zat drie jaar geleden hier bij inleiding watermanagement. Was toen derde jaars, inmiddels is ze afgestudeerd en begonnen met een promotieonderzoek bij het duinwaterbedrijf in Scheveningen. En zij is ook een fanatieke hardloper geworden, en zo hebben wij in april 42 kilometer samen gelopen. Nou dat is een herinnering die ons beide in het geheugen gegrift zal blijven. Dan het werk. Ik heb, ik ben, ja een vraag, ben jij ook een hardloper? - Sorry? Ben jij ook een hardloper? - Eeh, nou ja ik heb wel een vraag, maar volgens mij is dit college al gegeven. Nee - Niet? Dan weet ik niet hoe ik dit al wist, maar... Nou ik zeg dit wel eens vaker, dus dat zou best kunnen. Waar ben je geweest? - Ja volgens mij vorig jaar, maar... Ja tuurlijk, vorig jaar hebben we ook 30-11 gegeven ja, dat klopt haha. Maar deze foto is echt van april hoor dus dat is toch vrij recent. Wat misschien zou kunnen zijn is, ik geef ook altijd een van de gastcolleges bij inleiding Civiele Techniek in het eerste jaar. En daar begin ik natuurlijk ook een beetje met, ja wie ben ik, dus dat zou best kunnen, dat je het daarvan herinnert. Nou dan weet jij nog dat ik hier 30 jaar geleden ben afgestudeerd. Ik heb toen ook Civiele Techniek gestudeerd, in '76 afgestudeerd. Daarna ben ik gaan werken bij een ingenieursbureau, bij DHV in Amersfoort, en dat kan ik jullie van harte aanraden als je straks afgestudeerd bent om bij een ingenieursbureau te gaan werken. Dat is een geweldige ervaring, je bent met allerlei projecten over de hele wereld bezig. In mijn geval dan drinkwater projecten. 
Dus het ontwerpen van zuiveringsinstallaties, bouwen van systemen, ook het doen van onderzoek. Eigenlijk kun je alle kanten op bij een ingenieursbureau en de Nederlandse ingenieursbureaus zijn redelijk succesvol, ook op de internationale markt tegenwoordig. Ja ik heb daar vele jaren gewerkt, totdat op een gegeven moment, inmiddels is dat alweer 17 jaar geleden, er een advertentie stond dat we een hoogleraar zochten hier in Delft. En toen dacht ik van, nou ja, laat ik maar eens een brief schrijven, je weet het nooit, niet geschoten is altijd mis. Dus ik heb een brief geschreven en ik dacht, ik zal het vast wel niet worden, maar ik werd het wel. Dus ook daar zit al meteen een eerste levensles in, probeer maar eens wat en het kan altijd meevallen. Ik ben in eerste instantie vervolgens voor een dag in de week hier deeltijdhoogleraar geworden in de drinkwatervoorziening, dat is mijn leerstoel. En ja, zo langzamerhand van het een komt het ander, je wordt voor steeds meer dingen gevraagd. Dus ik ben langzamerhand meer dingen hier in Delft gaan doen en die aanstelling bij DHV heb ik steeds verder afgebouwd, en vanaf 1999 ben ik volledig gestopt bij DHV en ben ik hier voltijd hoogleraar. En voltijd hoogleraar dat betekent ook, je hebt enerzijds taken op het gebied van onderwijs, anderzijds onderzoek, maar ook management, dus management, ja, dan moet je, ik ben hoofd van een afdeling enzo en dan zit je in het managementteam of in de opleidingscommissie. Moet je over algemene dingen meepraten en beslissen. Daar kun je natuurlijk een dagtaak van maken, dat heb ik altijd vermeden. Ik vind het toch altijd het leukste om met het vak bezig te zijn en daarmee kom ik op het tweede plaatje wat hier staat, want het allerleukste is eigenlijk afstudeerders begeleiden. Dat gaan jullie de komende jaren dat proces doormaken. 
Dat is voor ons altijd ontzettend leuk om te zien hoe studenten zich transformeren van min of meer anonieme figuren die in de collegezaal zitten en zitten te luisteren. Min of meer absorberen wat ik in een monoloog aan het overdragen ben. Hoewel ik overigens wel reacties van jullie zeer op prijs stel hoor en ik zal daar ook af en toe expliciet om vragen. Maar goed, de praktijk is toch dat in deze fase van de studie zitten jullie nog vooral te luisteren en dat wordt eigenlijk steeds leuker als je verder komt in het vierde en het vijfde jaar en het hoogtepunt is dan natuurlijk het afstuderen, waar je echt een onderwerp helemaal zelf bij de kop pakt. Ik zeg ook altijd tegen mijn afstudeerders, je moet van je afstudeerproject je visitekaartje maken, he Doris, en dat werkt ook echt zo. Op het moment dat je klaar bent met dat afstudeeronderwerp dan weet jij het meeste van dat onderwerp af. Meer dan wie dan ook in Nederland. Dat bewijzen we ook iedere keer weer door de afstudeercolloqui. Daar geven we veel kenbaarheid aan, daar komen altijd mensen vanuit de waterbedrijven van KIWA, van andere researchinstituten. Die doen daar mee in de discussies en onze afstudeerders die weten keer op keer alle vragen te beantwoorden. Misschien niet altijd 100% goed, maar toch wel 99% goed. Dat is altijd een genoegen om mee te maken. Ik zeg ook altijd dat ik trots ben op mijn afstudeerders, en dat is ook zo. Ik heb er inmiddels een stuk of 80 gehad en soms gaat het dan heel goed, zoals hier staat met Karin en Doris, Doris is hier trouwens in de zaal aanwezig, die dan het afgelopen jaar allebei zelfs met lof zijn afgestudeerd. Dat betekent dus dat je het heel goed gedaan hebt, hoge cijfers gehaald hebt, en ook het afstudeerproject heel goed gedaan hebt. Ja, dat is voor ons gewoon heerlijk om dat mee te maken. Om te zien hoe jonge mensen het vak ook leuk gaan vinden, zelf ook enthousiast worden, en hun stempel gaan zetten op ons vakgebied. 
En ik hoop dat enkele van jullie ook zo ver zullen komen. Goed, dat is wat mijzelf betreft. Dan wat dit vak betreft. We gaan dat doen aan de hand van het boek, dat staat al op blackboard aangegeven. Daar hebben we een Nederlandse en een Engelstalige versie van. Dat boek dat moeten jullie kopen bij de secretaresse van ons, Mieke op de vierde verdieping, voor 25 euro. In de winkel kost het 50 euro, maar wij hebben een speciale kortingsregeling. Jullie mogen zelf weten of je het Nederlandse of het Engelse boek koopt. De inhoud is vrijwel hetzelfde en in ieder geval voldoende voor dit vak. Als jullie een advies van mij willen hebben dan zou ik zeggen, als je goed Engels kunt lezen, koop het Engelse boek, dat is iets actueler, staat iets meer informatie in, maar het Nederlandse boek is voor dit vak zeker voldoende. Ja, zo'n boek heeft natuurlijk, behalve dat we er over gaan vragen bij het tentamen, daar zal ik bij mijn volgende dia op terugkomen, heeft zo'n boek natuurlijk ook nog een zekere functie als naslagwerk. Als je zo'n boek eenmaal hebt, dan heb je dat bij je, ook na je afstuderen neem je dat mee. Als je vervolgens ergens in een vreemd land een installatie moet ontwerpen, dan haal je dat boek weer eens uit de tas en dan weet je weer het een en ander. Die functie heeft zo'n boek ook. Daar staan vraagstukken ook in, in dat boek, en we hebben ook vraagstukken op blackboard staan. Dat zullen jullie misschien ook al gezien hebben, computer assignments. Dat is overigens niet verplicht, er is bij ons niets verplicht. Ja, jullie moeten uiteindelijk het tentamen doen, maar we bieden materiaal aan, dus maak er gebruik van zou ik zeggen maar we gaan dat niet controleren. Er staan daar vragen op blackboard, er zitten vragen in dat boek, de antwoorden staan er ook bij, of althans, als je die computer assignment gemaakt hebt dan krijg je na afloop te melden welke vragen goed waren en welke vragen fout waren. 
Dus dat is een ondersteuning voor jullie bij het kennismaken met de materie en het leren van de stof. En oude tentamens hebben we daar ook bij staan, dus dan kun je ook nog eens oefenen en kijken wat er ongeveer gevraagd wordt. En dan gaan we college geven de komende periode. Annex F. Searching 241 3 4 242 Oh ja, dus over het boek, jullie hoeven niet het hele boek te kennen. Dat boek wordt zowel gebruikt bij 30-11, als bij het volgende college 34-20, wat een a keuzevak is voor de mensen die watermanagement gaan doen, en de hoofdstukken die voor 30-11 gevraagd worden op het tentamen staan hier aangegeven. En die presentatie komt ook weer op blackboard zoals jullie weten, inclusief deze video opname. Dan gaan we deze colleges geven, dus 7 keer de komende periode vanaf nu, en ik wil het dit jaar zo doen dat in het eerste uur vertel ik een beetje de grote lijn van het betreffende onderwerp. De belangrijkste punten, ik probeer daar wat kleuring aan te geven. Wat is nou belangrijk en wat minder. En het tweede uur heb ik steeds een van de promovendi, vandaag is dat Doris, die dan iets gaan vertellen over hun eigen onderwerp, hun eigen onderzoek, hun eigen project, wat een stukje actualiteit geeft, en kleuring, verdieping, van het betreffende onderwerp. En ik heb het zo georganiseerd dat dat steeds, als het goed is, goed op elkaar aansluit en jullie een goed beeld geven van de stof, zodat je straks het tentamen ook makkelijk kunt maken. Dat wil niet zeggen dat alle onderdelen van de verhalen van de promovendi tentamenstof zijn. Dat zullen we zo her en der ook wel aangeven. Ja, zo'n promotieonderzoek dat gaat natuurlijk veel dieper dan jullie nu in het derde jaar hoeven te weten, maar het gaat meer om de beeldvorming, de kleuring en het begrip van de materie. Dan hebben we een excursie gepland naar de Berenplaat, de grote zuiveringsinstallatie bij Rotterdam, bij Spijkenisse om precies te zijn, op 11 oktober. Ook dat is niet verplicht, alles is facultatief bij ons. 
Daar hebben zich tot nu toe een stuk of 60 mensen aangemeld. De inschrijving sluit op 1 oktober hebben we gezegd, omdat bij de waterbedrijven tegenwoordig ook strikte veiligheidsvereisten enzo zijn na de aanslagen in New York. Je moet daar precies opgeven wie er allemaal komen, met naam enzo en wij moeten daar voor instaan ook, dat er geen vervelende dingen gebeuren, en er moeten natuurlijk ook bussen gereserveerd worden en we krijgen daar lunch geserveerd. Dus de mensen die zich opgegeven hebben die krijgen nog een mailtje binnenkort, kort na 1 oktober, met een bevestiging, en degene die zich niet opgegeven hebben die gaan niet mee. En ik ga er ook van uit dat degenen die zich wel opgegeven hebben, dat die ook komen he, het is natuurlijk een beetje vervelend tegenover de organisatoren als we daar met veel minder mensen zouden aankomen dan we aangemeld hebben. We zullen proberen, ik heb wat vragen gekregen over dat er 's middags verplichte practica zouden zijn van constructieleer en statistiek geloof ik, dus we zullen proberen om tijdig weer terug te zijn. Dat zal zeker niet om half 2 zijn, dus ik denk dat we ongeveer om half 3 terug zullen zijn, en we vertrekken gewoon na het college op donderdag, dus om half 11. Ik weet niet of, even kijken of ik al ga beginnen,

4. Assessment of tag clouds by lecturer

Assessment approach

The tag clouds produced in this research project (reported in Annex E and Annex F) have been evaluated by the lecturer of this course. These tag clouds have been produced in black and white with the same font face, so that font size is the only distinguishing element. This assessment was done in 2 steps:
• quality assessment of the original tag clouds
• quality assessment of the modified tag clouds (uniform, max 15 words)

Original tag clouds

[Figures: original tag clouds 1 to 10]

The lecturer of the course was asked to assess the quality of the tag clouds using his own criteria.
His main criteria for this assessment were:
• a limited number of words to increase readability
• showing the proper words

The results of the first assessment are given in Table 4.1.

Table 4.1: Tag cloud assessment of original tag clouds
ID Description (source) Cleaned (*) Nr of words 1 Slide titles 1 35 2 3 Slide content Slide titles and slide content Slide notes Human subtitles A Human subtitles B Human subtitles C 1 1 100 100 1 1 1 100 100 100 25 2 4 5 6 7 8 Human subtitles, nouns only A 9 Human subtitles, nouns only B 10 SHoUT output, nouns only Assessment results General appearance OK Too many little words Not OK Not OK Rank 3 4 15 Not OK Not OK Not OK OK Too many irrelevant words OK 2 15 OK 1 2 15 OK Word "Chloor" is missing 1 2
(*) 1 = after removing common Dutch words; 2 = nouns only

The following conclusions have been made from these results:
• tag clouds are only useful at a maximum of 15 words
• new tag clouds have to be produced for further assessment

Modified tag clouds

Based on the results of the first assessment, a number of new tag clouds have been produced for re-assessment. All these modified tag clouds include 15 words.

[Figure: modified tag clouds 1 to 10]

In this second assessment the lecturer was also requested to appoint words which he considered superfluous in the tag cloud.

ID 1 Removed words the leven doet and 2 blackboard dijk prof no Remaining words practices gezondheidstechniek waterleidingingenieur drinking principles principes gezond drinkwater water technology schoon gezondheidstechniek niet-aangesloten drinkwater afvalwater oppervlaktewater grondwater watermanagement sectie per afdeling 3 afdeling doet the and no 4 wel groot zoals waar zien heel natuurlijk gaat tijdens we ook die ik dat dan een in dus de je en het is van gaan weer moeten nou gaat maken zien heel goed wel jullie plaatje keer jaar soort dingen 5 6/7 8 9 soort keer dingen plaatje leakage water m3/h demand waterleidingingenieur groundwater practices watermanagement drinking water gezondheidstechniek demand m3/h principles project onderzoek water afvalwater civiel nederland nederland grondwater natuurlijk water oppervlaktewater grondwater water chloor drinkwater nederland stoffen drinkwatervoorziening vragen boek grondwater nederland boek drinkwater jaar 10 drinkwatervoorziening water stoffen oppervlaktewater vragen chloor nederland grondwater oppervlaktewater drinkwater stoffen wereld onderzoek kwaliteit water plaatje soort jaren jaar dingen mensen

The results of the second assessment are given in Table 4.2. In this table the number of deleted words has been ranked (lowest = 1, etc.) and added to the appearance ranking, giving a total rank score.
Table 4.2: Tag cloud assessment of modified tag clouds (all 15 words)
ID Description (source) Cleaned (*) 1 Slide titles 1 Words deleted 4 2 Slide content 1 7 3 Slide titles and slide 1 content 4 Slide notes 1 5 Human subtitles A 6/7 Human subtitles B 1 8 Human subtitles, 2 nouns only A 9 Human subtitles, 2 nouns only B 10 SHoUT output, 2 nouns only Assessment results 5 Total rank 6 7 13 8 10 9 15 11 5 6 9 4 1 13 18 12 3 5 3 5 6 2 7 5 General appearance Many same sized (small) words Too many same sized (small) words Too many same sized (small) words Word "Chloor" is missing Rank
(*) 1 = after removing common Dutch words; 2 = nouns only

Table 4.2 shows that the two tag clouds from nouns in the subtitles have the best overall ranking. These two tag clouds contain the same words, but differ in font and layout of the words. The lecturer preferred the more readable font (Coolvetica) over the less readable font (Vigo). The lowest number of "deleted words" was obtained from the slide titles. However, the tag cloud produced from them has a very low variance in font size, so it did not draw attention to particular words. The variance in word count in the subtitles is much larger, giving a more pronounced picture. The tag cloud from SHoUT output has a lower ranking because it misses an important word and has more "deleted words". The other tag clouds were significantly less appreciated.

The following conclusions have been made from these results:
• tag clouds should contain fewer than 15 words
• tag clouds should be obtained from "nouns only"
• tag clouds from subtitles (or speech recognition) are preferred over tag clouds from slide titles (or slide content / slide notes) because of their larger variance in font size
• tag clouds need a highly readable font
• tag clouds might be improved by removing bad words chosen by the lecturer

The use of colored tag clouds has not been evaluated, since this might largely depend on the personal preference of a lecturer.

5. Searching in recorded lectures

Now that a database is available with all the relevant data for the recorded lectures of course CT3011, a search engine can be built to query this data. This has been done under the name "Collegerama Lecture Search". It offers a layered search engine that allows the user to choose the sources of data in which he/she wants to search. The interface of this search engine is shown in Figure 5.1.

Figure 5.1: Collegerama lecture search

When the user only selects the lecture titles and chapters without any additional query text, the system generates a table of contents of all the lectures. This list is based on the information provided by the lecturer during post-processing, so this information should be 100% accurate and relevant for each lecture. A generated list is shown in Figure 5.2.

Figure 5.2: Table of contents generated by Collegerama lecture search

When the search engine returns a result set, each row is color coded based on the source of the information that is being displayed. This gives the user an idea of the granularity of the search results; the user can then add or remove sources to expand or limit the number of returned rows. An example of this is shown in Figure 5.3.

Figure 5.3: Layered result set returned by Collegerama Lecture Search

6. Evaluation of searching in recorded lectures

A relevant evaluation method for the Collegerama lecture search is the "known item search".
With this method, the search engine is tested on the retrieval of known items or selected keywords. For this research project, known item testing is done for:
• comparing the retrieval rate for ASR output versus full subtitling
• comparing the retrieval rate for all text types in Collegerama lecture search

Comparing subtitles and ASR output in search

Comparing the retrieval rate for ASR output versus full subtitling is done on two subsets of known items or keywords:
• most-used words
• most important words

Most used words

The retrieval of the 20 most used words from subtitle data versus ASR data is presented in Table 6.1. The data has been taken from lecture #15. The human-made subtitles contain 6,970 words and the ASR output contains 7,351 words.

Table 6.1: Retrieval of 20 most used words from subtitles versus ASR

Known item   Human-made subtitles (ref)   ASR        Word accuracy /
(word)       (rank)     (number)          (number)   Retrieval rate (%)
dus          13         93                22         24%
we           10         128               46         36%
het          4          220               155        70%
dat          1          269               218        81%
van          7          162               132        81%
ook          9          134               109        81%
ik           14         88                74         84%
op           19         54                46         85%
die          11         113               99         88%
een          6          174               158        91%
daar         16         76                72         95%
je           12         107               117        109%
dan          15         80                89         111%
met          20         53                59         111%
de           2          231               276        119%
in           8          151               192        127%
zijn         18         54                75         139%
is           5          181               270        149%
en           3          225               359        160%
niet         17         55                165        300%
Total                   2,648             2,733      103%

Table 6.1 shows that the 20 most common words amount to 38% (2,648/6,970) of all words in the subtitles and 37% (2,733/7,351) of all words in the ASR output. The larger number of words in the ASR output can be explained by the tendency of SHoUT to decode long words into smaller components. The words "dus" and "we" have a WA-value (word accuracy) of below 50%, or a WER-value (word error rate) of above 50%. When only ASR data is available, the word "dus" will only be retrieved 24% of the time and the word "we" 36% of the time. It is assumed that human-made subtitles have a WA-value of 100%.

The words "en" and "niet" have a WA-value of above 150% and a WER-value above 150%. This means that the word "en" is retrieved 1.6 times and the word "niet" 3 times as often as it actually occurs. This high retrieval is again caused by the tendency of SHoUT to split up longer words.

Important words

The retrieval of the 15 most used nouns from subtitle data versus ASR data is presented in Table 6.2. The data has been taken from the same lecture. In determining the retrieval of the word "water", compound words such as "drinkwater", "drinkwatervoorziening", "grondwater" and "oppervlaktewater" have not been included (the same applies to "drinkwater" in "drinkwatervoorziening"). This table also shows the 5 words that were marked by the lecturer as less relevant in the assessment of tag clouds (see chapter 4), leaving the ten "most important words" or "ok words".

Table 6.2: Retrieval of 15 most used nouns from subtitles versus ASR

Known item              Lecturer   Human-made subtitles (ref)   ASR        Word accuracy /
(word)                  check      (rank)     (number)          (number)   Retrieval rate (%)
chloor                  ok         13         10                0          0%
drinkwatervoorziening   ok         5          16                4          25%
boek                    ok         8          15                5          33%
oppervlaktewater        ok         7          15                7          47%
plaatje                            11         11                6          55%
vragen                  ok         12         10                6          60%
soort                              14         9                 7          78%
water                   ok         1          39                33         85%
stoffen                 ok         15         9                 8          89%
grondwater              ok         3          21                20         95%
Nederland               ok         2          36                35         97%
dingen                             6          16                17         106%
keer                               9          13                16         123%
drinkwater              ok         10         13                16         123%
jaar                               4          16                28         175%
Total                                         249               208        84%
Total ok words                                184               134        73%

The most remarkable result in the retrieval rate is the word "chloor", which has been indicated by the lecturer as one of the ten most important words. SHoUT has not recognized this word, as it is an uncommon word in the Dutch language. This item is therefore not retrieved from the lecture if no correct subtitles are available.
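As a minimal sketch of how the retrieval rates in Table 6.1 and Table 6.2 appear to be derived, the rate can be read as the ASR count divided by the subtitle (reference) count. The function name `retrieval_rate` and the selection of five words are illustrative assumptions, not part of Collegerama lecture search; the counts themselves are copied from Table 6.2.

```python
# Counts copied from Table 6.2: word -> (occurrences in subtitles, occurrences in ASR output).
# Only a handful of rows are included here for illustration.
counts = {
    "chloor": (10, 0),
    "drinkwatervoorziening": (16, 4),
    "boek": (15, 5),
    "water": (39, 33),
    "jaar": (16, 28),
}


def retrieval_rate(reference: int, asr: int) -> float:
    """Retrieval rate in percent: ASR hits relative to the subtitle (reference) count."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return 100.0 * asr / reference


# Round to whole percentages, as in the tables.
rates = {word: round(retrieval_rate(ref, asr)) for word, (ref, asr) in counts.items()}

for word, rate in rates.items():
    print(f"{word:<24}{rate:>4}%")
```

A rate above 100% (as for "jaar") simply means that the ASR output contains the word more often than the reference subtitles do, so the value is a count ratio rather than a true accuracy.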
A retrieval rate of above 100%, as for "jaar", "keer" and "drinkwater", shows that when searching for compound words in the ASR output, it is better to search for word components instead of full words. This is illustrated by the low retrieval rate for the word "drinkwatervoorziening". A retrieval rate of above 50% is expected from the ASR output, as this is the accepted or expected quality level for ASR engines. The word "boek" has a lower retrieval rate in the ASR output, which shows that this word is difficult for SHoUT to decode. This word has also been indicated by the lecturer as one of the ten most important words.

Searching for different text types

In order to determine the most important text type during searching, the ten most important known items of lecture #15 have been searched. The results of this known item search in Collegerama lecture search are shown in Table 6.3.

Table 6.3: Retrieval of 10 most important items/words from different text types

Known item (word)       Subtitles  ASR   Slide titles  Slide text  Slide t+t  Slide notes  Lecture title  Lecture chapter
water                   39         33    5             3           8          0            0              0
Nederland               36         35    0             0           0          3            1              0
grondwater              21         20    0             5           5          1            0              0
drinkwatervoorziening   16         4     0             0           0          2            1              0
boek                    15         5     0             1           1          0            0              0
oppervlaktewater        15         7     0             5           5          2            0              0
drinkwater              13         16    1             5           6          3            0              0
chloor                  10         0     0             0           0          0            0              0
vragen                  10         6     0             0           0          0            0              0
stoffen                 9          8     0             0           0          0            0              0
total                   184        134   6             19          25         11           2              0
                        100%       73%   3%            10%         14%        6%           1%             0%
non-retrieved words     0          1     8             5           5          5            8              10
                        0%         10%   80%           50%         50%        50%          80%            100%

The results of Table 6.3 show that for searching in lectures, the lecture titles and lecture chapter titles are of no relevance. These text types give a 0%-1% retrieval rate for the most important words, and 80%-100% of these words give no results at all. These text types are particularly suitable for navigation but apparently not for searching. To a lesser extent, the same holds true for slide content.
These types give a retrieval rate of 3%-14% for the most important words and 50%-80% non-retrieved words. The retrieval rate of ASR for the most important words is 73% and only 10% are non-retrieved words. These results show that ASR gives a drastic increase in the retrieval rate over slide content. The retrieval rate for the most important words is significantly higher than the overall word correctness of ASR for this lecture (73% versus 46%). Having subtitles will further increase the retrieval rate to an assumed 100% value, as human-made subtitles in real timesuppressed environments have a tested word correctness of 96%-100%.

Duration per text type

The retrieval rate indicates how many of the items are found in a search, but not how long it takes to actually find an item. Searching for an item in (non time-tagged) transcripts may indicate the lecture in which the item is used, but the user has to watch/listen to the whole lecture to actually reach the searched result. Assuming a constant speaking rate might give a best guess to jump to the equivalent time frame, but in most cases this is not suitable for the user. The time correctness of a search is related to the length or duration (end time minus start time) of the related video fragment. The durations per text type in Collegerama lecture search are shown in Table 6.4.

Table 6.4: Duration of text types in Collegerama lecture search for Course CT3011

Text type                                                    Description           Minimum (sec)  Maximum (sec)  Mean (sec)
Lecture title, Transcript (lecture)                          Lecture recording     1,351          3,231          2,451
Lecture chapter                                              Chapters by lecturer  15             2,197          592
Slide title, Slide content, Slide notes, Transcript (slide)  Slide data            2              611            55
Transcript (sentence)                                        Subtitles             0.6            6.0            3.4
Transcript (word)                                            ASR output            0.0            3.4            0.3

Table 6.4 shows that the duration for slides may vary between 2 seconds and 7:28 minutes, with a mean value of 58 seconds.
This means that on average the user has to wait for nearly 1 minute to encounter the searched item. This duration might be acceptable for recorded lectures, as most spoken text is surrounded by relevant text. In general all spoken text belongs to that particular slide, as the lecturer more or less explains the slide content. More detailed searching for a specific sentence can be achieved by searching in subtitles or time-tagged words (such as the ASR output of SHoUT). With time-tagged words, it is possible to show a kind of karaoke-type subtitling, with sentences and coloring of the spoken word. An example of this can be seen at the website for Radio Oranje, in which old transcripts have been time-tagged by ASR (SHoUT).

Multiple-keyword search

Students might use a search engine for recorded lectures while preparing for their exam. They might be looking for a passage that was once mentioned in an earlier lecture or a specific exam question that was discussed. These searches probably contain more than one keyword, for example the keywords “stoffen” and “grondwater”. A search on individual subtitles will not give a positive result, as these keywords are never used in one particular sentence and so are never retrieved as one record in the database. The same holds true for searching on individual words from ASR. A solution to this problem is to store all spoken text belonging to a slide, called a slide transcript. Its time code contains the start and end time of the slide. The same is done for an entire lecture. This allows searching for combined keywords. The student can use the slide or lecture time frame as the starting point for further viewing. Spoken text per slide is included in the database but not implemented in the prototype of the web interface. This feature has been evaluated directly on the database. This approach results in the same data being stored in multiple records.
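The slide-transcript idea described above can be sketched as follows: time-tagged words are assigned to the slide whose time frame contains them, and a slide is returned only if every keyword was spoken during it. This is an illustrative sketch, not the Collegerama implementation; the function names `slide_index` and `slides_with_all` and the toy data are hypothetical.

```python
from bisect import bisect_right


def slide_index(slide_starts, t):
    """Return the 0-based index of the slide whose time frame contains time t (seconds)."""
    return bisect_right(slide_starts, t) - 1


def slides_with_all(words, slide_starts, keywords):
    """Return sorted indices of slides during which every keyword was spoken.

    words: iterable of (start_time_in_seconds, word) pairs, e.g. time-tagged ASR output.
    slide_starts: sorted list of slide start times in seconds.
    """
    wanted = {k.lower() for k in keywords}
    seen = {}  # slide index -> set of wanted keywords spoken during that slide
    for t, w in words:
        w = w.lower()
        if w in wanted:
            seen.setdefault(slide_index(slide_starts, t), set()).add(w)
    return sorted(i for i, found in seen.items() if found == wanted)


# Hypothetical toy data (not taken from lecture #15): three slides, four spoken words.
slide_starts = [0, 60, 150]
words = [(10, "water"), (70, "stoffen"), (95, "grondwater"), (160, "stoffen")]

hits = slides_with_all(words, slide_starts, ["stoffen", "grondwater"])
print(hits)
```

Grouping by slide rather than by sentence or single word is what makes the combined query answerable: no single word record can contain both keywords, but a slide-level grouping can.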
Transcripts per lecture could also be searched by a search engine using the transcript per word (ASR output). The approach used gives additional flexibility in the layout of transcripts, which enables more sophisticated output options. A lecture transcript can be printed in a more convenient way if additional line breaks are included. This option is not available if lecture transcripts are automatically derived from word transcripts.

If a multiple-keyword search is done on the ASR data for the words “stoffen” and “grondwater” in lecture #15, 8 results are returned. When clustering this result set by slide, only 2 slides out of a total of 29 slides contain both keywords. The slide time frame 24:07-25:09 gives 1 paired result and the slide time frame 29:47-33:35 gives 4 paired results. The total viewing time for the combined results is reduced from the lecture duration of 45:09 minutes to only 1:02 + 3:48 = 4:50 minutes.

Keyword search for all lectures

The 4 most important words of lecture #15, the ones with the best ASR accuracy, can be used for evaluation of the search engine on all lectures of the course. It is assumed that these 4 keywords (“stoffen”, “grondwater”, “Nederland” and “dingen”) will also give a high accuracy for the other lectures, despite the fact that most of these lectures were given by other lecturers. The results of this evaluation test are shown in Table 6.5.

Table 6.5: Occurrence of 4 important words in all lectures

Lectures 1 to 28. Occurrences in the lectures containing the keyword, in lecture order:
Nederland    9 13 5 4 15 4 10 6 7 2 11 8 10 12 35 16 17 6 6 9 30 23 10 4 4 7
dingen       7 17 10 9 8 10 9 5 13 9 5 6 8 3 17 1 8 6 9 4 6 4 8 1 4 5 5 2
grondwater   2 3 18 8 2 5 2 11 20 2 12 26 2 75 3 21 10 1
stoffen      1 8 11 11 3 4 15 1

                     Nederland  dingen  grondwater  stoffen
Total occurrence     283        199     223         54
Portion in #15       12%        9%      9%          15%
Number of lectures   26         28      18          8

Table 6.5 shows that lecture #15 is the most important lecture for the keyword “Nederland”, with the highest occurrence (35 times). However, this keyword is found in all but 2 lectures, with also a high occurrence in lecture #21 (30 times) and #23 (23 times). The keyword “dingen” is found in all lectures, with equal occurrence for lecture #15 and lecture #2. The keyword “grondwater” is found in 18 lectures. Lecture #21 (“Grondwaterzuivering”) seems to be the most important lecture for this item, with the highest occurrence. The keyword “stoffen” is found in only 8 lectures, with lecture #23 (“Oppervlaktewaterzuivering”) as the most relevant lecture for this item.

Multiple keyword search for all lectures

The two keywords occurring in the lowest number of lectures (“stoffen” and “grondwater”) have been used in a multiple keyword search. Table 6.6 gives the results of this search.

Table 6.6: Occurrence of combinations of 2 important words in all lectures

Lectures 1 to 28. Occurrences in the lectures containing the keyword(s), in lecture order:
stoffen                            1 8 11 11 3 4 15 1
grondwater                         2 3 18 8 2 5 2 11 20 2 12 26 2 75 3 21 10 1
stoffen + grondwater in lecture    8 11 11 2 4 15 1
stoffen + grondwater in slide      1+4  1+1+1  1+1+2+1+2  0  1+1+1  1+2+1  1

                     stoffen  grondwater  stoffen + grondwater  stoffen + grondwater
                                          in lecture            in slide
Total occurrence     54       223         52                    23
Number of lectures   8        18          7                     6
Number of slides     -        -           270                   17
Total duration                            5:22:05               20:57

Table 6.6 shows that both keywords are present in 7 lectures.
Without a multiple-keyword search per slide, this requires a total viewing time of 5:22:05 hours in order to see all results. If the search is done on slide level, only 6 lectures are retrieved, with a total of 17 slides in which the combination of keywords is found. This reduces the viewing time to only 20:57 minutes. Searching on slide level thus reduces the total viewing time to 6.5%, a reduction of 93.5%.

Precision and recall measurement

For the Collegerama lecture search engine, “Precision and Recall” measurements can be carried out on the data of lecture #15, in which the human-made subtitles can be considered as known precise objects. The slides of the lecture can be used as the objects for these tests. A slide is regarded as a complete subset of a lecture in which the related subject is explained. In this way, a slide can be considered as an object or a document. The test was done on 3 of the 10 “important words” of lecture #15: “stoffen”, “grondwater” and “chloor”. The results of these tests are shown in Table 6.7, Table 6.8 and Table 6.9.

Table 6.7: Keyword “stoffen” per slide in different text types

Slide  Time frame          Subtitles  ASR  Slide titles  Slide content  Slide notes  Chapter title  Lecture title
1      0:00 – 7:28
2      7:28 – 9:48
3      9:48 – 10:20
4      10:20 – 13:12
5      13:12 – 13:47
6      13:47 – 13:50
7      13:50 – 14:39
8      14:39 – 15:46
9      15:46 – 16:39
10     16:39 – 17:20
11     17:20 – 18:06
12     18:06 – 18:50
13     18:50 – 20:39
14     20:39 – 23:11
15     23:11 – 24:07
16     24:07 – 25:09       1          1
17     25:09 – 27:08       2          2
18     27:08 – 29:47
19     29:47 – 33:35       5          4
20     33:35 – 35:49
21     35:49 – 38:48
22     38:48 – 39:08
23     39:08 – 39:27
24     39:27 – 40:02
25     40:02 – 40:36       1          1
26     40:36 – 41:28
27     41:28 – 43:23
28     43:23 – 43:59
29     43:59 – 45:09
Occurrence                 9          8    0             0              0            0              0
Relevant slides            4          4    4             4              4            4              4
Slides retrieved           4          4    0             0              0            0              0
Relevant slides retrieved  4          4    0             0              0            0              0
Recall                     100%       100% 0%            0%             0%           0%             0%
Precision                  100%       100% -             -              -            -              -

Table 6.8: Keyword “grondwater” per slide in different text types

Slide  Time frame          Subtitles  ASR  Slide titles  Slide content  Slide notes  Chapter title  Lecture title
1      0:00 – 7:28
2      7:28 – 9:48
3      9:48 – 10:20
4      10:20 – 13:12                                     1
5      13:12 – 13:47
6      13:47 – 13:50
7      13:50 – 14:39       1          1
8      14:39 – 15:46
9      15:46 – 16:39
10     16:39 – 17:20
11     17:20 – 18:06
12     18:06 – 18:50
13     18:50 – 20:39
14     20:39 – 23:11                                                    1
15     23:11 – 24:07                  1
16     24:07 – 25:09       2          1
17     25:09 – 27:08
18     27:08 – 29:47
19     29:47 – 33:35       7          7
20     33:35 – 35:49
21     35:49 – 38:48       9          8
22     38:48 – 39:08
23     39:08 – 39:27       2          2
24     39:27 – 40:02
25     40:02 – 40:36
26     40:36 – 41:28
27     41:28 – 43:23
28     43:23 – 43:59
29     43:59 – 45:09
Occurrence                 21         20   0             1              1            0              0
Relevant slides            5          5    5             5              5            5              5
Slides retrieved           5          6    0             1              1            0              0
Relevant slides retrieved  5          5    0             0              0            0              0
Recall                     100%       100% 0%            0%             0%           0%             0%
Precision                  100%       83%  -             0%             0%           -              -

Table 6.9: Keyword “chloor” per slide in different text types

Slide  Time frame          Subtitles  ASR  Slide titles  Slide content  Slide notes  Chapter title  Lecture title
1      0:00 – 7:28
2      7:28 – 9:48
3      9:48 – 10:20
4      10:20 – 13:12
5      13:12 – 13:47
6      13:47 – 13:50
7      13:50 – 14:39
8      14:39 – 15:46
9      15:46 – 16:39
10     16:39 – 17:20
11     17:20 – 18:06
12     18:06 – 18:50
13     18:50 – 20:39
14     20:39 – 23:11
15     23:11 – 24:07
16     24:07 – 25:09
17     25:09 – 27:08
18     27:08 – 29:47
19     29:47 – 33:35       8
20     33:35 – 35:49
21     35:49 – 38:48
22     38:48 – 39:08
23     39:08 – 39:27
24     39:27 – 40:02
25     40:02 – 40:36
26     40:36 – 41:28
27     41:28 – 43:23       1
28     43:23 – 43:59
29     43:59 – 45:09
Occurrence                 9          0    0             0              0            0              0
Relevant slides            2          2    2             2              2            2              2
Slides retrieved           2          0    0             0              0            0              0
Relevant slides retrieved  2          0    0             0              0            0              0
Recall                     100%       0%   0%            0%             0%           0%             0%
Precision                  100%       -    -             -              -            -              -

Ranked search results

The search results have to be ordered according to a certain norm. In this research project, two of these options have been evaluated:
• time-based
• rank-based

Time-based

In this ordering method, all the results are sorted in chronological order. This makes sense for recorded lectures, assuming that key items are explained sequentially: later in the course, the key items are explained in further detail. In SQL Server, this can be accomplished by ordering the query results on Lecture_nr and Start_time. The query that can be used for this is shown below:

    SELECT *
    FROM Content
    INNER JOIN Lectures ON Content.Lecture_id = Lectures.Lecture_id
    WHERE CONTAINS (Text, 'stoffen') AND Content.Lecture_id = '15'
    ORDER BY Lectures.Lecture_nr, Start_time, Content.Text_type

Rank-based

SQL Server has a function that ranks search results based on several factors:
• text length
• number of occurrences of search words/phrases
• proximity of search words/phrases in proximity search
• user-defined weights

The query that can be used for this is shown below:

    SELECT *
    FROM Content AS FT_TBL
    INNER JOIN CONTAINSTABLE(Content, Text, 'stoffen') AS KEY_TBL
        ON FT_TBL.Content_id = KEY_TBL.[KEY]
    WHERE FT_TBL.Lecture_id = '15'
    ORDER BY KEY_TBL.RANK DESC;

Table 6.10: Ranked search results using CONTAINSTABLE for the word "stoffen"

Text type  Key    Rank  Start time  Text
9          6223   192   1460970     stoffen
9          6514   192   1566710     stoffen
9          6532   192   1573320     stoffen
9          7516   192   1927800     stoffen
9          7551   192   1941580     stoffen
9          7553   192   1942270     stoffen
9          7593   192   1955460     stoffen
9          8875   192   2428180     stoffen
8          10115  192   1460700     Allerlei stoffen die worden afgefiltreerd tussen het z…
8          10144  192   1563000     Water is een natuurlijke stof en de verontreiniging…
8          10235  192   1926200     dat gaat reageren met bepaalde stoffen die van nature…
8          10239  192   1940200     En dat zijn dus ongewenste stoffen.
8          10240  192   1941900     Dat zijn stoffen die giftig kunnen zijn.
7          89     137   1787000     Een plaatje met een aantal kernbegrippen vast, het…
6          10456  123   0           Na een ruime inlooptijd kunnen we beginnen met het…
8          10244  96    1954000     We zeggen nee, dat zijn ongewenste stoffen, die willen…
8          10376  96    2425500     En anderzijds, om verschillende soorten stoffen met…
8          10146  96    1571300     De interactie, de lozing van stoffen die eventueel plaats…
8          10221  96    1879800     Dus het oppervlaktewater bevat een volledige cocktail…
7          87     76    1509000     En tenslotte doen we natuurlijk ook onderzoek, vooral…
7          86     48    1447000     dat zien we hier, zakt dat vanzelf de grond in. Dat…
7          95     48    2402000     Oppervlaktewater hebben we meervoudige barrieres. Op…

Table 6.11: Ranked search results using CONTAINSTABLE for the word "grondwater"

Text type  Key    Rank  Start time  Text
6          10456  240   0           Na een ruime inlooptijd kunnen we beginnen met het…
7          91     205   2149000     Als we naar de opzet van de infrastructuur kijken, dan z…
9          4559   160   847370      grondwater
9          6130   160   1425430     grondwater
9          6309   160   1492940     grondwater
9          7152   160   1797370     grondwater
9          7159   160   1802540     grondwater
9          7222   160   1825000     grondwater
9          7243   160   1831790     grondwater
9          7277   160   1843730     grondwater
9          7297   160   1850630     grondwater
9          7305   160   1853580     grondwater
9          8197   160   2172810     grondwater
9          8199   160   2174280     grondwater
9          8214   160   2182320     grondwater
9          8235   160   2190940     grondwater
9          8243   160   2193580     grondwater
9          8332   160   2227140     grondwater
9          8357   160   2235120     grondwater
9          8585   160   2316190     grondwater
9          8658   160   2349830     grondwater
9          8682   160   2357870     grondwater
8          9932   160   846900      het winnen van grondwater,
7          89     160   1787000     Een plaatje met een aantal kernbegrippen vast, het…
8          10206  160   1822600     Ja, dat water wordt heel goed gefiltreerd en dat grond…
8          10208  160   1831000     dus grondwater is over het algemeen goed.
8          10213  160   1849400     en dat kan uiteindelijk ook in het grondwater terecht...
8          10214  160   1851900     maar gemiddeld gesproken is grondwater toch van een…
8          10303  160   2181800     Gebruik grondwater als het mogelijk is.
8          10328  160   2254000     Ja, want ik zie op de waddeneilanden wordt wel grond…
8          10346  160   2315200     Rotterdam heeft geen grondwater, Rotterdam heeft ook…
8          10355  160   2349500     Nou, grondwater dat plaatje.
8          10357  160   2354700     Als je een mental map hiervan maakt, van grondwater…
8          10324  160   2243800     kunstmatig grondwater te maken via die infiltratie.
7          93     106   2348000     Nou, grondwater dat plaatje. Ik denk dat dit toch wel…
4          1731   80    620000      •27 sept.Inleidinggezondheidstechniek •1 okt.Waterkw…
8          10120  80    1479900     Dus dan win je eigenlijk een soort kunstmatig grondwater.
8          10122  80    1489400     en andere verontreiningen bevat, maak je een soort…
8          10202  80    1795900     het feit dat we gebruik maken van grondwater en opper…
8          10203  80    1802700     Grondwater is ook in Nederland vaak nog van een hele…
8          10326  80    2250700     en dat wordt het een soort kunstmatig grondwater.
8          10305  80    2190000     wordt alleen maar grondwater gebruikt voor de drinkwa…
8          10306  80    2193100     Daar is grondwater beschikbaar, dat is van goede kwal...
8          10319  80    2225000     Zeewater, zout, precies he. Dus het grondwater hier is...
8          10321  80    2234000     Dus ja, hier kun je geen grondwater gebruiken, dus…
8          10211  80    1841700     vuilnisstortplaatsen, en die kunnen het grondwater ook …
7          86     80    1447000     dat zien we hier, zakt dat vanzelf de grond in. Dat noe…
5          69     80    1239000     Een voorbeeld van een recent project waar ik in mijn…
7          77     40    830000      We hadden natuurlijk gezondheidstechniek. Nou dat zal…

Table 6.12: Ranked search results using CONTAINSTABLE for the word "chloor"

Text type  Key    Rank  Start time  Text
7          89     288   1787000     Een plaatje met een aantal kernbegrippen vast, het…
8          10228  224   1907200     Amerikanen die vinden het vanzelfsprekend om chloor…
8          10229  224   1910300     het drinkwater smaakt ook naar chloor daar,
8          10230  224   1912000     ruikt ook naar chloor daar.
8          10234  224   1923000     namelijk we weten dat als je chloor toepast,
8          10245  224   1957100     Dus we willen chloor gewoon niet gebruiken.
8          10253  224   1989300     Er is nog een praktisch ander aspect en dat is dat water…
8          10394  224   2501500     en ook geen chloor.
6          10456  160   0           Na een ruime inlooptijd kunnen we beginnen met het…
8          10250  112   1978900     Chloor leidt gewoon tot die giftige verbindingen en dat…
8          10227  112   1900900     En heel bijzonder in internationaal verband, we…
7          97     44    2488000     Nou, het resultaat daarvan is dan dat we dus... Aan de…

Evaluation

Table 6.10, Table 6.11 and Table 6.12 show that the ASR results (text_type = 7) are ranked higher than the other types, because each word has its own record. This means that the document length is effectively the smallest size possible. According to the relevance ranking system Okapi BM25, these will be evaluated as being of very high relevance. (Source: http://nlp.uned.es/~jperezi/Lucene-BM25/) Similar results can be expected for subtitles and slide titles in comparison with slide notes and slide transcripts. These effects might be corrected by using user-defined weights for different text types. However, this has not been tested in this research project. The current search engine uses time-based ordering.

7. Evaluation

Since there is so much additional metadata available for online recorded lectures, a Collegerama data system is an absolutely necessary addition. It gives several new options for searching:
• creating tables of contents
• generating tag clouds that give an insight into the subject of a lecture
• a layered search engine based on different data sources
• allowing teachers to add additional lecture and chapter information after the recording has been processed and stored

The database should always contain all slide content, slide titles and slide notes (when available).
Data collected from the lecturer (lecture title, chapter title etc.) during post-processing is also extremely useful, since it very accurately reflects the subjects and topics covered in the lecture. The chapter titles and the slide titles are essential for proper navigation (table of contents). For a better understanding of a lecture, the subtitles are also regarded as a beneficial element in recorded lectures. Subtitles:
• improve the viewing of the lecture (simultaneous listening to and reading of the spoken text)
• create the option for translated subtitles in other languages by using machine translation
• enlarge the reach of a search engine, giving larger result sets for the viewer to select from