D4.1.1.3 Crossmedia C+
WP4 CROSSMEDIA SOLUTIONS
D4.1.1.3 CROSSMEDIA METADATA AND SEARCH NEEDS

Deliverable number: D4.1.1.3 Crossmedia metadata and search needs
Authors: Marjo Markkula, Pirkko Oittinen, Sanna Olkkonen, Eero Sormunen, Stina Westman
Confidentiality: Consortium
Date and status: Jan 12th 2011, Version 1.0

This work was supported by TEKES as part of the next Media programme of TIVIT (Finnish Strategic Centre for Science, Technology and Innovation in the field of ICT)

Phase 1 (1.2-31.12.2010)
Next Media - a Tivit Programme

Version history:

Version; Date; State (draft/update/final); Author(s) or Editor/Contributors; Remarks
0.1; 31.12.2010; draft; SO, SW; First draft of Aalto part
0.2; 10.1.2011; update; MM, ES, SO, SW; Update with TaY parts added
0.3; 12.1.2011; update; MM, ES, SW; Revision by research parties
1.0; 16.3.2011; final; SW;

Participants:

Participant role; Participants; Organisation
Case company representatives; Harri Juutilainen, Hanna Nurminen, Pauli Tölli; Suomen Tietotoimisto Lehtikuva
Case company representatives; Jouni Frilander, Pekka Kauranen, Tuula Peltonen; Yleisradio
Researchers; Pirkko Oittinen, Sanna Olkkonen, Stina Westman; Aalto/Mediatekniikka
Researchers; Marjo Markkula, Eero Sormunen; Tampereen Yliopisto - INFIM

next Media
www.nextmedia.fi
www.tivit.fi

Executive Summary

This deliverable discusses content annotation in crossmedia production processes which handle image and video content. Two studies were conducted at participating case companies. The first one focuses on metadata production processes in two companies (news agency, television broadcaster). The other investigates crossmedia information needs and selection criteria in television program making.

Metadata workflows were analyzed based on interviews and observations at the two case companies.
Metadata was found to serve multiple goals, including aiding content discovery, filling information needs during production, facilitating journalistic processes, promoting content and enabling monetary compensation. The metadata fields currently used were classified according to the content facet they referred to: low-level content (e.g. color, motion), high-level content (semantic descriptions), structure (e.g. segmentation), lifecycle (production information), identification/location (e.g. labels), or management metadata (e.g. rights). The processes by which metadata was created, checked and accessed (e.g. for searches) were modeled as the metadata flow, alongside the essence flow in production. This analysis showed that planning and journalistic work processes were not fully interlinked phases in the metadata lifecycle, and that systems could be further integrated to streamline metadata production and management.

Crossmedia user needs were investigated by analyzing search topics and video selection criteria connected to journalistic tasks in television production, as described by program team members. A model was developed and used to classify users’ search and selection criteria:

1. Content criteria, including concrete visual elements (objects, action, events, places, time), themes and impressions, duration, audio attributes, shooting technique (e.g., shooting distance, angle) and technical qualities (e.g., color).
2. Document context criteria: document origin (e.g., name), availability (e.g., copyright), usage (e.g., used too much) and metadata (e.g., well documented), corresponding to life-cycle, identification, location and management metadata.
3. Use context criteria: the searcher and his or her situation (e.g., enough material already, various kinds of video), which are difficult to consider as metadata.

User criteria in video searching and selection focused on video content. Most addressed high-level content but some low-level criteria were also identified.
In search topics the focus was on specific entities (named persons, objects, events, places), generic entities (types of people, objects, events, action, places), abstract themes and impressions, and linear time. When reading textual metadata, contextual attributes became important. While watching retrieved content, audiovisual features such as visual appearance, affective qualities, shooting techniques and technical quality were evaluated.

High-level content annotation currently requires a lot of resources in metadata production. The metadata production processes are challenging in terms of systems integration and information flows throughout content lifecycle. As conclusions for the work presented here, potential tracks to improve the cost-effectiveness of metadata production were suggested. These serve as future topics for research and company development in the coming years of the project.

Table of Contents

Executive Summary
1 Introduction
  1.1 Partners
  1.2 Methods
  1.3 Deliverable structure
2 Metadata workflows
  2.1 Essence and metadata flow in a content centric process model
  2.2 Image agency
    2.2.1 Metadata structure
    2.2.2 Metadata lifecycle
  2.3 Broadcasting company
    2.3.1 Metadata structure
    2.3.2 Metadata lifecycle
  2.4 Comparisons of metadata models and lifecycles
  2.5 Process requirements for content annotation
3 Crossmedia search and selection criteria
  3.1 Video searching in television program production process
  3.2 Model of video attributes
  3.3 Crossmedia search criteria
    3.3.1 Search topic categories
    3.3.2 Criteria in search topics
    3.3.3 Selection criteria and evolution of criteria during search process
  3.4 Crossmedia search needs and selection criteria
4 Conclusions

Table of Tables

Table 1 Metadata structure at STT-Lehtikuva
Table 2 Metadata structure at Yleisradio
Table 3 The distribution of different types of search topics in timeline interviews
Table 4 The distribution of user criteria in search topics using a) criteria occurrence, b) search topic, and c) work task as the unit of analysis.
Table 5 The distribution of user criteria at three different phases of search process: (1) search topics, (2) selections based on textual metadata and (3) selections based on video image and audio

Table of Figures

Figure 1 Content-centric essence and metadata flow model (Mauthe & Thomas 2007)
Figure 2 Essence and metadata lifecycle at STT-Lehtikuva
Figure 3 Essence and metadata lifecycle at Yleisradio
Figure 4 The stages of the work process in making a television program according to Markkula & Sormunen (2006)

1 Introduction

Many categories of metadata vital in searching are based on the intellectual analysis of media essence, increasing the cost of metadata production. Further, the content and context of crossmedia entities (e.g. still images, video clips) provide a vast number of potential aspects which could be represented in metadata.
Media companies face four major challenges in developing high-performance but cost-effective content management for crossmedia assets:

1. How to identify the minimum set of metadata elements that are essential in searching, selecting and using crossmedia entities by different user groups for different purposes?
2. How to integrate metadata production into the lifecycle of media production and optimize it?
3. How to reallocate intellectual indexing resources to key metadata appropriate in state-of-the-art crossmedia archives?
4. How to find and apply the most productive automatic methods for media content analysis in metadata production?

The study reported in this deliverable addresses these challenges along two research lines. The first one focuses on metadata production processes in two media companies operating in different fields of media production. The other aims at revealing crossmedia information needs and selection criteria in television program making in one of the partner companies.

1.1 Partners

Yleisradio (YLE) is the Finnish public service broadcasting company. YLE has four national television channels as well as six radio channels and manages a vast collection of media content. YLE archives contain books, articles and web content, still images, audio content (music, radio programs and sound effects) and about 280 000 hours of television material stored in different formats (Yleisradio 2010, Vänttinen 2010).

STT (Suomen Tietotoimisto, the Finnish News Agency) is an independent news provider with many Finnish and a number of international media companies as customers (STT 2010). STT participated earlier in a study (TIVIT Flexible Services) concerning metadata in crossmedia editorial processes (Gylfe 2009). STT acquired Lehtikuva, the largest Finnish image agency, in January 2010. The decision was made to focus on the Lehtikuva operations for the metadata study described in this deliverable.
This would enable comparison between the two units of the newly formed crossmedia company. Lehtikuva has several staff photographers and it transmits images from several international image agencies to its news image clients. Lehtikuva’s online image shop is available to all registered clients. Lehtikuva has over 8 million images in its image archives. Video content was added to its products in 2005.

1.2 Methods

The data for the metadata and workflow study were collected at YLE’s editorial offices and archives in Pasila, Helsinki and Tohloppi, Tampere. The majority of the material was collected in the current affairs editorial office, but persons in the news production unit and the television archive were also included in the interviews and observations. Furthermore, data collection was carried out at STT-Lehtikuva’s premises in Helsinki. There, all the data was collected in the Lehtikuva unit.

The data for the user needs study were collected at YLE’s editorial offices and archives in Pasila, Helsinki and Tohloppi, Tampere. The material was collected at documentary, factual, current affairs, news production and children’s units and in the television archive.

For the study on metadata workflows, twelve people (journalists, image editors, broadcasting assistants, archivists, producers and IT specialists) were observed and interviewed in the research partners’ organizations. Documents on guidelines and instructions related to their metadata processes were also collected and analyzed. Based on the data gathered, the types of metadata assigned to content during the production process were analyzed. The metadata were classified according to the type of annotation they provide. The workflows related to content essence and metadata were modeled in a parallel fashion.
The models depict the process stages and systems with which different types of metadata are added, checked or accessed.

In investigating crossmedia search needs, a task-based approach was adopted. Data on user criteria were gathered by timeline interviews of ten television professionals working in program production. User needs expressed in search and selection criteria were investigated at three different points of the search process: (1) criteria in search topic definitions, (2) selection criteria applied on textual surrogates of video documents (online and paper prints) at the beginning of the search process and (3) selection criteria applied in watching actual video documents at the latter part of the process.

1.3 Deliverable structure

The rest of the deliverable is structured according to the two studies. Section 2 corresponds to the metadata workflow study and section 3 to the crossmedia user needs study.

Section 2 reports on the essence and metadata workflows in the two partner companies. Section 2.1 introduces the content-centric workflow models in media production environments. Sections 2.2 and 2.3 present the metadata structure and the essence/metadata lifecycle at the image agency and the broadcasting company, respectively. Section 2.4 compares the metadata models and lifecycles, and section 2.5 discusses requirements for content annotation based on the study.

In section 3, users’ needs for video annotation are presented. Section 3.1 introduces the program production process and video searching in television environments. Section 3.2 presents a model of user needs. In Section 3.3, we report the results of the study. First, we consider users’ search topics in categories (3.3.1). Second, criteria in search topics are presented (3.3.2) and third, criteria applied in selections and their evolution through the search process are presented (3.3.3). Section 3.4 sums up the results.

The conclusions of the two studies are presented in Section 4.

2 Metadata workflows

2.1 Essence and metadata flow in a content centric process model

The classical definition of metadata is ‘data about data’. According to a more descriptive definition, metadata is “structured information that describes, explains, locates or otherwise makes it easier to retrieve, use or manage an information resource” (Hodge 2001). In this project metadata was studied from the viewpoint of content management processes in media companies.

According to the Society of Motion Picture and Television Engineers, content is the sum of essence and metadata: content = essence + metadata. This definition states that essence, i.e. the essential data delivered to the viewer (video, audio, images, graphics or text), is not content without the metadata describing it. The essence is unusable if it is just a part of an unorganized collection of bits. It has to be described by metadata so that it can be found and used, which makes descriptive metadata key in content management processes.

Workflow processes in media production have traditionally been described with sequential process models where the process steps follow one another. Nowadays, however, the workflow of the content creation process is hardly ever organized in a strictly sequential manner. Digital asset management systems allow isolated systems to be connected, so there is no need to follow a strictly sequential process where one phase is completed before the next one begins. Content is accessed and metadata added throughout the various phases of production, and the changes made are shared with all the parties working on the project. Such a content-centric workflow is visualized in the model presented by Mauthe & Thomas (2007).

Figure 1 Content-centric essence and metadata flow model (Mauthe & Thomas 2007)

In this type of workflow the content is at the core of the process. All the information is stored in a content management system and updated as the process goes forward. The progress of the work can be observed by all the parties taking part in the production process. The process allows parallel work on content items. This kind of process is also said to enable gathering a richer set of metadata, faster access to content and easier reuse of content (Mauthe & Thomas 2007).

This model matches the existing requirements for fast production cycles for media content. The model is also useful in the context of this research as it separates the flow of essence and metadata.

In this section, we report on (1) the current metadata models used at two media companies and (2) how these metadata are produced in parallel with the content essence in the workflow. The process models for the metadata and essence flows in the partner companies are presented analogously to the model of Mauthe and Thomas (Figure 1). The metadata models reflect the types of content annotation needed and currently implemented in the production processes and systems. The flow models may be used to identify when in the process, with which system and by whom metadata is added, checked and accessed. Both may be utilized to identify potential bottlenecks in the metadata workflow. They also serve as comparison points to the user needs discussed in section 3.

2.2 Image agency

2.2.1 Metadata structure

The metadata structure at STT-Lehtikuva is based on an IPTC standard, but numerous other metadata items have been added to the metadata model as well. When the image archive was developed, it was clear from the beginning that the metadata model would change over time.
An important decision was made to store the images and the metadata separately, i.e. the metadata in a separate database, where it can be modified with Oracle commands. This decision has proven to be a successful strategy for working with metadata, and similar solutions are used by other large-scale image agencies nowadays.

In total STT-Lehtikuva’s metadata model includes 42 metadata items. 17 of these fields are standard IPTC metadata fields. Table 1 contains STT-Lehtikuva’s metadata items classified according to the metadata typification presented by Pereira et al. (2008). We have omitted Pereira’s User interaction and User context metadata types as those were outside the scope of our study.

The low-level metadata fields included in STT-Lehtikuva’s metadata model are calculated automatically from the pixel-level data in the image. The size and shape (portrait, landscape, square or panorama) can be used for searching in the image web shop.

The fields in the high-level content description metadata type are numerous. Categories are standardized IPTC categories and can also be used in searches. The caption describes the content of the image and is given both in Finnish and English. ISO country code is the standardized country code. However, not all image agencies use it in the same way. The file server program in the digital asset management system checks the country codes when ingesting images and makes corrections automatically. The Additional words field contains keywords which supplement the caption field. They are given both in English and Finnish and are not visible in the image web shop. Title names the object in the image, for example the person in the image. The search for named persons in the image archive and image web shop application is done based on the value in this field.
The Subject code field is not in use yet but should eventually contain a value from the 1300 item vocabulary in the IPTC Photo metadata standard. Headline is a short headline for the image given in Finnish and English, such as “Finnish presidential elections 2006”. Visible keywords are keywords which are set visible in the image web shop.

Table 1 Metadata structure at STT-Lehtikuva

Metadata type: Content metadata: low-level. Low-level features, which are typically automatically extracted from the content, such as color, texture, shape and motion for video data etc.
Metadata fields: X-pixels, Y-pixels, Shape, Rgb size

Metadata type: Content metadata: high-level. High-level features, which are typically created by a human, such as annotations, keywords, reviews, ratings, and links to related material.
Metadata fields: Category (IPTC 2:015), Caption (IPTC 2:120), ISO Country code (IPTC 2:100), City (IPTC 2:090), Country (IPTC 2:101), Additional words, Title (IPTC 2:005), Keywords (IPTC 2:025), Subject Code (IPTC 2:012), Headline (IPTC 2:105), Visible Keywords

Metadata type: Content metadata: structure. All types of structure, organization or arrangement that may be present in one or more multimedia assets, such as spatial or temporal segmentation, audio and video streams etc.
Metadata fields: Product, Image Type, Basket, Status

Metadata type: Content metadata: life-cycle. Information gathered along the content life-cycle about the process used in the value chain, regarding acquisition, scripting, recording, editing, mixing, archiving, producing and coding.
Metadata fields: Creator (IPTC 2:080), Original Type, Date Created (IPTC 2:055), Caption Writer (IPTC 2:122), History, Source (IPTC 2:115), Priority (IPTC 2:010), Service date

Metadata type: Content identification and location metadata. Information to identify and locate the content, such as identification labels and links.
Metadata fields: ID, Assignment ID, Unique ID

Metadata type: Content management metadata. Information useful for the efficient management of the data in terms of rights such as expression of rights, protection metadata and governance.
Metadata fields: LK Source, Instructions (IPTC 2:040), Project, Royalty Type, Pix Code, Copyright Notice (IPTC 2:116), Publishing Rights, Global Sales, Model Released, Property Released

The structural metadata at Lehtikuva is related to organizing the images. The Product field divides the images into different image types, such as “posing” or “paparazzi” images. The field is used in the web shop. The Image type field divides the images into editorial stock, commercial stock, news, video and royalty free images, and it defines in what kind of search results the image will be visible in the web shop. The Basket field divides the images into different entities in the image repository and also defines the visibility to clients. Status, i.e. sellable, internal or deleted, also affects visibility in the web shop.

The lifecycle-related metadata fields contain information on the creator of the image, i.e. the name of the photographer or organization. Original type specifies the source from which the image has been transferred to STT-Lehtikuva’s image repository. Caption writer contains the initials of the person who has added the image to the image database and added the metadata. History concerns earlier uses of the image. Source contains information about the original owner of the image or the publishing rights. Service date is the date when the image has been included in some service offered by STT-Lehtikuva, and is used to control image visibility. For example, news image clients see images in the web shop for a period of one week.

The content identification metadata contains three fields. ID is an internal identification number. Assignment ID is related to the physical location of negatives in the archive. Unique ID is specific to a certain client.

Content management metadata includes several fields.
They are related to making royalty payments, instructions concerning the use of images, codes given to images by international agencies which need to be referenced in payments, copyrights and publishing rights. They also include information about the rights concerning the model or items displayed in the image.

It is clear that the metadata model reflects the functions relevant to the area of operation of STT-Lehtikuva. High-level content metadata is important in supporting searching, as evidenced by the large number of high-level content metadata fields. Categorization is also done to aid searching. Content management metadata is vital for commercial purposes and is represented in the metadata structure by various fields. The rights and payment transactions also need to be supported by the system.

Compared to the metadata fields utilized at STT (Gylfe 2009), the image metadata model is simpler, with roughly half the number of fields available at STT. It also seems to be more consistently utilized during production across all the content annotated at Lehtikuva. The focus is more heavily on the description of content essence (both low- and high-level), due to the modality of the content. Still, content management metadata remains important for business reasons. The emphasis on lifecycle metadata present in the textual news content at STT is not as evident at Lehtikuva.

2.2.2 Metadata lifecycle

The metadata lifecycle at STT-Lehtikuva is closely related to the essence lifecycle. The essence and metadata lifecycles at the image agency are presented in Figure 2. The metadata types according to Pereira are referenced in the metadata lifecycle model in order to show the lifecycle of different types of metadata.

The metadata lifecycle begins in the planning phase.
Information on photo shoots as well as image requests by clients is entered into the planning system daily. The planning system is not connected to the other components of the digital asset management system but is a separate entity, making it difficult to feed metadata into the production systems. Information exists in the planning phase which could be useful in the following phases. For example, in the case of a photo shoot, the place and the person to be photographed are needed as metadata in the later phases.

In the photo shoot the essence is created and the camera automatically saves EXIF data (e.g. date and time), which is transferred to the production system. When the photographer uploads images from the camera to the hard drives of the STT-Lehtikuva server, in-house software requires the photographer to fill in a minimum amount of metadata, i.e. the name of the photographer and the shooting place. The system also gives a unique ID to the photos. It is also possible to fill in other data, such as captions or title, but at the minimum all photographs contain the date, the photographer’s name and the ID when they are ingested into the system. If the photographers have time to add other metadata they do so; if not, the image journalists do the rest of the annotation.

Figure 2 Essence and metadata lifecycle at STT-Lehtikuva

Essence is also added into the process from international image agencies. When images arrive from international sources by satellite feed, some metadata processing is done automatically for all the images, such as checking the ISO country code. In the selection phase the image journalist goes through the images and accompanying metadata.
More metadata processing can be done automatically for image groups, for example copying the content of a certain field to another in STT-Lehtikuva’s metadata model. Even after the automated processing the image journalists normally check the metadata and further annotate the images, adding and translating keywords and captions.

The essence with metadata is delivered to the clients with a send code. The same send code field includes the code for sending the images to the archive, thus archiving is done simultaneously with sending the images to the clients. In addition to this push-type delivery, STT-Lehtikuva’s clients may search the image web shop independently and order images via the system, or contact the sales personnel, who in turn perform the searches as intermediaries.

It has to be noted that there is also a fully automated essence lifecycle in the production process. Images from the satellite feed, i.e. from the most important international partners, are archived automatically for a certain time period, and if they have not been viewed or purchased by anyone during that period they are deleted.

When looking at the metadata lifecycle, it is clear that existing metadata is transferred between the systems and process phases quite well, except for the planning phase. Systems are linked and, as stated earlier, there are ways to automate some metadata processing, meaning editing, correcting and translating the content of some metadata fields. The essence and metadata are stored separately at STT-Lehtikuva, a solution which seems to work well with image files. The essence is stored on essence servers and the metadata in the Oracle database.

2.3 Broadcasting company

2.3.1 Metadata structure

The metadata model at Yleisradio (Table 2) contains more than a hundred fields, some of the items containing several subfields.
The model does not follow any standard but is tailored to the needs of YLE. The number of metadata fields is vast, but only a small part of the items need to be filled out via manual annotation. The annotators are advised to fill in metadata in three metadata groups which are organized on different tabs in the user interface: the program tab, the production staff and performers tab, and the rights information tab. The metadata items presented in black in the table are the ones which should be taken into consideration by the annotators or which may be filled automatically. The gray items are currently left blank.

The metadata model contains a few low-level fields such as the color system or aspect ratio. The high-level items are numerous and also contain time-dependent items. It must be noted, however, that many of the static high-level items, such as the broadcasting name, production name or version name, are copied from the planning system Plasma and cannot be modified in the media asset management system Metro by the content annotators.

Most annotation effort and time is spent on the time-dependent high-level items. The subject of each insert is entered in the subject strata. The key words strata field contains keywords for the insert and is filled in by the archivists. The image strata field needs to be filled with a description of the image, i.e. what is actually pictured in the video. Video and audio strata contain metadata on the video and audio, which are transferred from Avid iPlay. At the moment these strata are not in use. The subtitle strata field contains the subtitle text attached to the insert. Annotating the subjects and images of each insert in a program is a time-consuming task and accounts for most of the annotation work by the broadcast assistants.
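
As an illustration of the time-dependent strata described above, the following minimal Python sketch treats each stratum as a named layer of time-coded annotations attached to one program. It is a sketch under stated assumptions, not a description of Metro: the class names, the use of seconds as time codes and all example values are hypothetical and do not come from YLE's systems.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    """One time-coded annotation inside a stratum (times in seconds, hypothetical unit)."""
    start: float
    end: float
    text: str


@dataclass
class Program:
    """A program with named strata, each holding time-coded segments (hypothetical model)."""
    name: str
    strata: Dict[str, List[Segment]] = field(default_factory=dict)

    def annotate(self, stratum: str, start: float, end: float, text: str) -> None:
        """Add one annotation to the named stratum, creating the stratum if needed."""
        if end <= start:
            raise ValueError("segment must end after it starts")
        self.strata.setdefault(stratum, []).append(Segment(start, end, text))

    def at(self, time: float) -> Dict[str, List[str]]:
        """Return the annotations from every stratum that cover the given time."""
        hits: Dict[str, List[str]] = {}
        for stratum, segments in self.strata.items():
            texts = [s.text for s in segments if s.start <= time < s.end]
            if texts:
                hits[stratum] = texts
        return hits


# Invented example values: one insert with subject, image and keyword strata.
program = Program("Example current affairs program")
program.annotate("subject", 0, 120, "municipal elections")
program.annotate("image", 0, 45, "studio, presenter at desk")
program.annotate("image", 45, 120, "voters queuing outside a polling station")
program.annotate("keywords", 0, 120, "elections; voting")

print(program.at(60))
```

Keeping the strata as separate layers means that a search on what is pictured (image strata) and a search on what is discussed (subject strata) can be answered independently for the same point in the program, which mirrors the distinction between the two annotation tasks described above.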
The structural metadata contains fields related to the program structure, to the temporal segmentation of the program or to other structural information related to the content. Here again, most of the static items are copied from Plasma, and some of the dynamic items, such as the default strata, i.e. the temporal segmentation of a program, are produced automatically. Media is the type of the publication master file, i.e. internet, music, radio, special effects, TV or still image. Series number is copied from Plasma and contains the number of the part in the program series. Type indicates the medium on which the program has been archived or whether the program was a direct broadcast. Main class, sub class and combined class describe the genre of the program. Reference strata is meant to contain the IDs of the archived clips which have been attached to the program, but the strata was not activated in the current affairs editorial office.

Table 2 Metadata structure at Yleisradio

| Metadata type | Metadata fields |
|---|---|
| Content metadata (low-level): Low-level features, which are typically automatically extracted from the content, such as color, texture, shape and motion for video data etc. | Aspect ratio, Color system, Color |
| Content metadata (high-level): High-level features, which are typically created by a human, such as annotations, keywords, reviews, ratings, and links to related material | Broadcasting name (Finnish), Language, Broadcasting name (Swedish), Production name, Version name, Name, Subject, Press release, Keywords, Production staff, Performers, Subject strata, Key words strata, Image strata, Interviewee strata, Production staff strata, Video strata, Audio strata, Subtitle strata |
| | Additional information (tape), Description of attached files, Additional information (sound), Additional information (sound file), Subtitle broadcasting name, Original name, Subtitle information, File name (subtitle), Language (subtitle), Additional information (subtitle), Attached files |
| Content metadata: structure: All types of structure, organization or arrangement that may be present in one or more multimedia assets, such as spatial or temporal segmentation, audio and video streams etc. | Media, Series number, Version, Type, Main class, Sub Class, Combined Class, Default strata, Reference strata |
| | Master starting time, Master ending time, Duration, Subtitle starting time, Subtitle ending time, Duration (subtitle), Compression (sound), Sample frequency (sound), Bit depth (sound), Channel order (sound), Loudness, Sound files, Format (sound), Compression (sound file), Sample frequency (sound file), Bit depth (sound file), Channel order (sound file), Loudness (sound file), Series number (subtitle), Essence packages, Essences in package, Properties of ASF, Timecode leaps |
| Content metadata: life-cycle: Information gathered along the content life-cycle about the process used in the value chain, regarding acquisition, scripting, recording, editing, mixing, archiving, producing and coding. | Origin, Broadcasting data, Completion time, Archiving date (film), Archiving date (insert), Ready for archiving, Annotator, Saving date, Archived, Archivist, DOKPVM, DOK source, Generator, Generating date, Name of metadata modifier, Modifying date, Editing state, Create new program to Plasma, Program is created to Plasma |
| | Broadcasting information (Plasma), Broadcasting information (historical), Production phase, Avid import, Translators/subtitle editors, Completion date (subtitle), Channel (subtitle), First broadcasting date (subtitle), Ingest user, Ingest date, Exporting, Exporting date |
| Content identification and location metadata: Information to identify and locate the content, such as identification labels and links. | Program ID, Production number, TVAR-ATKN, Production number (old), Plasma ID, Media ID, Source ID |
| | Archived tape, Sound pair, Production number (subtitle), Tape ID (subtitle), Locations (essence), Location details (essence) |
| Content management metadata: Information useful for the efficient management of the data in terms of rights such as expression of rights, protection metadata and governance. | Additional information, Producing cost center, Insert rights, Compensation on rebroadcasting, Limitations on use and broadcasting, Usage rights, Image sources strata |

Lifecycle fields include status information necessary for the production process, such as the “Ready for archiving” field, which indicates to the archivists that the annotation is ready and the program can be archived. Lifecycle items contain information on the people who have been responsible for different actions in the production process, such as the annotator, archivist and metadata modifier. Dates describing the production process are included as well, in the completion time, archiving date and modifying date.
The information in the archiving system can be used to create a new program object in the planning system (Create new program to Plasma) and to get a confirmation that the program has been created (Program is created to Plasma). Content identification and location metadata are imported to the system from Plasma in most cases.

The content management metadata are fields which can be modified by the annotators and, importantly, should be updated during the production process. These fields are also actively checked when searching for archived material. Insert rights describe the rights to use the content item in other productions. Compensation might be required in the case of rebroadcasting, which is indicated in the compensation field. Limitations on use and broadcasting are also indicated, and more detailed information can be given in the Usage rights field.

2.3.2 Metadata lifecycle

The essence and metadata lifecycle at the broadcasting company is visualised in Figure 3. The planning phase is a critical phase in the production. Much of the metadata is created in the planning phase and it is transferred through the whole essence and metadata lifecycle. Also, archived material is searched for and utilized already in the planning phase.

New essence is created in the capture phase, or received from external sources. Video material is captured by the cameraman and in some cases by the journalist. The raw clips are ingested into the Interplay system by the journalist with specific software. Not all of the clips are ingested, however, as journalists tend to save a lot of raw material in their own repositories, e.g. external hard drives with video files. In the capture and ingest phase, high-level metadata in particular is created, as clips are named according to given instructions and journalists’ preferences. This metadata is only used for production purposes and does not reach the archiving phase.
In addition to the planning of future programs in Plasma, there is another planning cycle within the journalistic production process, where the detailed content of a program is planned. An editorial system, iNews, is used for this purpose. A lot of metadata is created in this phase and not used later, as reflected in the metadata line inside the iNews system.

In the pre-selection phase the journalist pre-selects suitable material and possibly edits it. Archived clips from the content repository are also used. Essence is edited and composed into a ready program in the editing phase. In all of these phases metadata is produced. In the script writing phase the journalist produces a lot of information related to the program content. The journalist uses this information for the journalistic process and it does not necessarily get forwarded as descriptive information into the metadata lifecycle. In the editing phase the editor names the ready sequences according to certain naming rules.

Figure 3 Essence and metadata lifecycle at Yleisradio

The program is played out with the Pebble Beach broadcasting automation. Live broadcasts are archived based on capture from the broadcasting stream. After the delivery of the program the final content annotation is done by the broadcast assistant. Annotation in this phase deals with high-level content metadata, life-cycle metadata and content management metadata. The structure might also be changed if the segment limits need to be changed for annotation purposes. Annotation is a time-consuming phase, as for time-dependent content the annotation is carried out for each segment.

When annotation is ready the program is archived.
The essence lifecycle does not end in the archiving phase. The essence is stored in the archive and might be reused for the production of another program. The archivists check the metadata in the archiving phase, and sometimes ask for corrections or add some items, such as keywords to important segments.

However, not all content is archived. Journalists and broadcast assistants are responsible for deleting essence gathered during the production from Avid’s servers. The material includes raw clips or archived material which has been transferred from the media asset management system Metro to Avid’s servers for the editing phase. This material will be discarded, but the raw clips might be archived by the journalists in separate content repositories.

The metadata lifecycle is in general well managed. The systems are integrated and metadata exchange between different modules is implemented. However, there are some parts where double entry of metadata happens, especially between the planning of an individual program and its annotation phase. As an example, the production staff and interviewees for different segments are noted in iNews, the editorial system. Entering the information before the delivery is imperative because the names of the interviewees and production staff are shown in the program. However, in the annotation phase the same information needs to be re-entered into the media asset management system Metro. Also, the program needs to be segmented in the annotation phase, although the program parts have already been listed with time codes in the editorial system. This data could be used as an annotation aid if the systems were integrated. In news production the editorial system is connected to the media asset management system and the segmentation is done automatically based on the information retrieved from the editorial system.
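To illustrate how time-coded program parts from an editorial system could seed the segmentation, the sketch below derives contiguous segments from a rundown. It is only an illustration under stated assumptions: the rundown format (rows of start time code and title), the function names and the example rows are hypothetical and do not describe the iNews or Metro interfaces.

```python
from dataclasses import dataclass
from typing import List, Tuple


def timecode_to_seconds(tc: str) -> int:
    """Convert an 'HH:MM:SS' time code to seconds."""
    hours, minutes, seconds = (int(part) for part in tc.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Segment:
    title: str
    start_s: int
    end_s: int


def segments_from_rundown(rundown: List[Tuple[str, str]],
                          program_end: str) -> List[Segment]:
    """Build contiguous segments from (start time code, title) rows.

    Each segment ends where the next one starts; the last segment ends at
    program_end. Rows are sorted by start time before processing.
    """
    rows = sorted((timecode_to_seconds(tc), title) for tc, title in rundown)
    end_s = timecode_to_seconds(program_end)
    segments = []
    for i, (start_s, title) in enumerate(rows):
        next_start = rows[i + 1][0] if i + 1 < len(rows) else end_s
        if next_start <= start_s:
            raise ValueError(f"segment {title!r} has no positive duration")
        segments.append(Segment(title, start_s, next_start))
    return segments


# Hypothetical rundown rows, invented for this example.
rundown = [
    ("00:00:00", "Opening"),
    ("00:01:30", "Insert 1"),
    ("00:12:00", "Studio interview"),
]
segments = segments_from_rundown(rundown, program_end="00:28:00")
```

With segments proposed in this way, the annotator would only need to adjust segment limits where the broadcast deviated from the plan, instead of segmenting the program from scratch.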
Finally, when archived material has been used in a program, it would help annotation if the ID, program name and broadcasting date of the old material could be transferred automatically to the strata where the archived clip has been used. At the moment broadcasting assistants need to find out and enter the origin of the clips manually.

2.4 Comparisons of metadata models and lifecycles

Compared to the original model by Mauthe and Thomas (2007), the phases in the essence and metadata lifecycle are described here on a higher abstraction level. On the other hand, some phases are emphasized more than in the model.

Planning was seen as an important phase in the research partners’ processes. However, in neither of the cases was planning fully integrated into the metadata lifecycle. In the image agency planning was done in a separate system and no metadata was entered into the content management system. In the current affairs editorial office of the broadcasting company the program planning phase was carried out with the editorial system, which was not integrated with the other components of the digital asset management system. This causes re-entry of metadata into different systems. In both companies the planning phase relied on separate systems, making it inherently difficult to integrate the phase into the metadata workflow.

The original process model does not include ingest as a separate production phase. However, in both case companies ingesting was named as a separate phase in both the content and metadata lifecycles. This was due to the specific content and metadata-related operations and rules that existed to govern this phase. In the partner enterprises specific actions were taken in the ingesting phase and software components were dedicated to handling, for example, metadata creation.
In the image agency metadata is processed automatically when content is ingested. In the broadcasting company there are, for example, instructions on naming the clips when ingesting the content.

Actions related to the production of the content in the models representing the partner companies’ processes are not separated into production and post-production steps. However, in the program planning phase, as well as in the elaboration phase of the process model, the media asset management system is used for retrieval of existing content to help the planning of future productions. The content lifecycle in the image agency is different in the sense that when content is searched for, it is mostly sent to clients, or collections of images are made upon clients’ requests. Existing content is not used much in the planning beyond recognizing gaps in the archives.

The essence and metadata lifecycles differ between the image agency and the broadcasting company when it comes to annotation. In the process model annotation is included in the post-production steps. In the image agency the annotation is done before the delivery of the content. It is important that the metadata in the images is sufficient and correct when delivered to clients. Metadata is actually a part of the product the image agency is selling. In the case of the broadcasting company the actual annotation is done after the delivery of the content, i.e. broadcasting the program. Annotation is done in great detail for the time-dependent media and is time-consuming. The annotation serves above all the reuse of content.

Archiving is done simultaneously with delivery to clients in the image agency, but some images are also chosen specifically to be archived.
As mentioned earlier, there is also an automated process for archiving content from certain sources and for discarding unused content. In the broadcasting company archiving is done by specialized staff. Metadata is checked in the archiving phase, thus forming a step in the metadata lifecycle. Archiving is not mentioned in the original process model but was considered here to be a step which should be separated in the essence and metadata lifecycle.

Discarding is present in both of the partner companies’ processes. Broadcast assistants and journalists are responsible for discarding essence from the servers which are used in the content production.

2.5 Process requirements for content annotation

The goals, principles and requirements for annotation were studied in the partner companies. To this end, interviews with the actors involved (broadcast assistants, journalists, archivists) and observations of the annotation process were conducted. According to the results, in the partner companies’ workflows, metadata should:

- aid content discovery through description and categorization for searches
- fill information needs arising during production (related to content or metadata)
- facilitate the journalistic processes (e.g. editing)
- promote content to clients and support their production processes
- enable monetary compensation

Most of the issues related to the goals of content annotation in both companies concerned searching and finding the content. The most important part of the metadata lifecycle could be said to begin after the content has been archived, i.e. when metadata enables accessing the content. It seemed that finding the right content was challenging in both of the research partners’ processes. The selection of a suitable number of the most relevant keywords or search terms was challenging at both the annotation and search ends.
The results also revealed other interesting goals for content annotation beyond enabling access to content. Editing the manuscript, serving production and information needs, as well as monetary compensation purposes were mentioned as goals of content annotation in the broadcasting company. Metadata and annotated information were used extensively by the journalists in the broadcasting company for editing the manuscript. Annotating the content with adequate detail was important for the production process. For example, naming the raw material clips with reasonable detail is important for the editing phase.

In the image agency metadata was used for promoting the content. Image categories were created and displayed in the web shop with the aim of introducing potential customers to the types of content available. Themed groups of images could also be used in retrieval to delimit searches, which currently produce larger result sets than searchers hope for. In the image agency the metadata lifecycle extends from annotating the content to supporting the customer’s processes using that content. Metadata was thus added specifically to support the customer, such as information on the story for which the images were chosen.

The generation of metadata needs to keep up with the accelerating pace of content production. It is, however, dependent on the transfer of knowledge in the production process. For example, in the broadcasting company the broadcasting assistants rely in their annotations on the information given by the journalists either orally, by email or, for example, by typing lists of the archived clips used. More information from the content creators to the annotators was a clearly stated requirement in the results.

The roles in content annotation are slowly changing. In the image agency it was a requirement that everybody participating in the production process handle the metadata process.
It was mentioned that sufficient metadata is enough; metadata does not need to be complete. This is related to the fact that content should be in a sellable state as fast as possible. In the broadcasting company’s internal searches the availability of audiovisual content was thought to decrease the need for extremely detailed annotation, because the content can be viewed by the searcher immediately. For this goal content annotators need more training in order to learn new ways of annotation to match the new crossmedia retrieval paradigms implemented in Metro.

An important requirement for content annotation in a complex production system is that systems should be interoperable. Double entry of the same metadata is still present in many stages of the workflows, straining the staff and taking up valuable time. In the image agency the issues were related mostly to the planning phase. In the broadcasting company problems in the integration of the production tools and the digital asset management system were discussed.

The metadata models used by the companies have evolved over time. This is an important issue to consider when building the systems for annotation and the content and metadata archives. They also need to support language versions.

More automatic generation of metadata was mentioned as a content annotation approach. The automatic generation most often meant automatic copying or transferring of metadata which existed in another system, whether static or time-dependent, but in the case of the image agency methods for automatically annotating images based on image features were also considered interesting.

Requirements for content annotation also included tools for helping the annotation, such as spell checking for the fields where content description is added.
User interface requirements were present as well, such as the intuitiveness of the user interface and similarity of annotation interfaces in the case where multiple systems are used for annotation.

3 Crossmedia search and selection criteria

3.1 Video searching in television program production process

The research partner YLE is in a changeover from many separate traditional archives to a media archive integrating digital video, audio, image and text in one system. The implementation of a digital media archive has considerable effects on the documentation and searching methods and practices of different media, and may change the work tasks and collaboration related to those processes.

Traditional media archives are based on a text database of annotations of video documents and a mechanically operated repository of video tapes. The user first queries the database containing the textual surrogates, then orders the video tape of interest from the tape repository and browses the tape by replaying it on a special device. The search and selection is often performed collaboratively; for example, a broadcast assistant searches the database and a journalist watches the video tapes and selects the relevant clips. The process has been observed to be time-consuming (Kauranen 2009, Markkula & Sormunen 2006, Tan & Müller 2003).

Figure 4 illustrates the work process of television program making in a traditional broadcast archive. The work process consists of six stages: ideating, planning, shooting, pre-selecting video, script writing and editing. The program idea and the script are processed and developed during the whole work process. Video needs may occur, and videos may be searched for, at every stage of the process. The volume of acquired video material (or paper prints of video descriptions) first grows. The material is pruned during the whole process, but much of the pruning takes place in the later stages of the work process when the actual videos are assessed.
In all, the whole search process included much piling and pruning of both textual listings of video documentations and video tapes (Kauranen 2009, Markkula & Sormunen 2006, Tan & Müller 2003).

[Figure 4 labels: Work pile of video (archive/shot); Idea, Outline, Final Script; Stages of the work process: Ideation, Planning, Shooting, Preselection, Script writing, Editing; axis: Time]

Figure 4 The stages of the work process in making a television program according to Markkula & Sormunen (2006)

The late possibility to browse audiovisual contents caused uncertainty in the search process: a lot of candidate tapes were ordered because the selection was impossible to make on the basis of text surrogates. In the integrated crossmedia archive, the user has instant access to contents, which presents new opportunities and challenges for both automated and manual video indexing, searching, browsing and visualization methods.

In large and heterogeneous video archives, textual metadata has a major role in video retrieval. This includes cataloguing type of metadata (identification, origin, location, authorship, lifecycle, etc.) and content description type of metadata (annotations, index terms, classification codes, etc.). Content descriptions are based on the intellectual and manual effort of analyzing entire video documents. Large-scale automation is not possible with the state-of-the-art video IR technology, even though current IR methods (e.g., automatic segmentation of video documents) can be applied to improve the efficiency and quality of metadata production.

Since manual metadata production is time-consuming and expensive, it is important to study what possibilities improved access to the essence of video documents offers in redesigning metadata production.
It is quite obvious that more efficient tools for browsing video and audio contents decrease the need for textual content descriptions. However, before making any major changes in the archival practices, we need to know better what criteria are important for users in formulating queries and assessing retrieved items at the browsing stage.

In this section, we report (1) what kind of search and selection criteria TV professionals apply to archive video material and (2) how these criteria evolve during the search process. The user criteria identified in the study may suggest potential access points for videos to be considered when redesigning metadata production. The user criteria are studied in the context of real work tasks in a traditional TV broadcast archive and identified at three different points of the search process. We base our analysis on the task-based approach (see Ingwersen & Järvelin 2000) because the journalistic work primarily sets the requirements for the material searched and is probably the most solid component of the changing environment.

3.2 Model of video attributes

To be able to map users’ search and selection criteria onto the metadata schema applied in a crossmedia archive, we need to use an appropriate classification template. Many classification models of image attributes and considerations for image indexing have been suggested in the literature. Nearly all researchers in the field have applied Shatford’s (1986) faceted classification of image subject attributes (e.g., Armitage & Enser 1997, Enser 1993, Herzum 1993, Markkula & Sormunen 2000, Sandom and Enser 2003, Westman and Oittinen 2006). Shatford (1986) suggests, based on the works by Panofsky (1962) and Markey (1983), four subject facets in image analysis: Objects (Who), Activities and events (What), Place (Where) and Time (When).
Within these facets, an image may be interpreted to represent both concrete and objective entities (ofness, e.g. objects, places, actions) and abstract and subjective entities (aboutness, e.g. themes, feelings, concepts manifested or symbolized by objects). Moreover, Shatford states that an image is simultaneously specific and generic. For example, the image of the Brooklyn Bridge is at the same time an image of a specific bridge, i.e. Brooklyn Bridge, and an image of a bridge in general. Shatford’s model is only a starting point in our analysis, since it was originally developed for still images and focuses on subject indexing, excluding contextual metadata.

The basic idea is that users’ search and selection criteria are values of video document attributes and may be assigned to the facets presented in the model. The analysis template was developed in the course of the data analysis and new attribute classes were added as they emerged from the data. The resulting analysis template with examples is presented in Appendix 1. Searchers’ criteria were clustered into three main classes: (1) video and audio content related criteria, (2) document related criteria, which are usually quite objective data (e.g., tape location, copyright, format), and (3) context of use criteria, relating to the searcher and his situation rather than to the video document (e.g., enough material already, various kinds of video) and thus quite impossible to consider in documentation.

Very general expressions of selection criteria, such as “whole content” or “the topic”, were excluded from the analysis. All objects, events and places specified by proper names were classified into the named object, event and place facets regardless of how “unique” the places were. This follows the practice of Armitage and Enser (1997). New content facets were created for audio, shooting technique (e.g., close-ups, black and white videos and videos with aerial view) and technical qualities (e.g., graphics on the image).
About-level concepts were grouped into two: (1) themes (e.g., food chain, vacation, attitudes of Finns to Africa and Africans in the past) and (2) impressions, including emotions awoken by videos (e.g., funny, memorable) or expressed by people in the video document (e.g., hungry). The reason for this distinction is that the indexing of impressions, feelings etc. in images is often regarded as quite subjective and inconsistent. For indexing themes, thesauri and subject heading lists are usually applied if available.

All linear time expressions were classified into the linear time facet, whether they were decades, specific dates or months given by a journalist for a broadcast assistant to locate the video material, for example, “it was last spring when there was about that in news…”. Expressions such as last… and when… were also taken to indicate the linear time facet, e.g., “last drought in Sudan”, if it was possible to find dates or years for the event.

To give an example of the facet analysis, the search topic “Savoy, Seurahuone, Adlon 1930s, restaurant milieu of that time” was analyzed as comprising three facets: 1) named place facet (Savoy or Seurahuone or Adlon), 2) generic place facet/indoors (restaurant milieu) and 3) linear time facet (1930s).

3.3 Crossmedia search criteria

3.3.1 Search topic categories

The 24 timeline interviews generated 79 search topics in all. First, search topics were categorized by their general focus. Four different types of search topics were identified.

1. Search topics in which video image content was defined, e.g. “Armi Ratia (a Finnish beauty queen) in 1960’s”, “rain forest from the air”, “lab picture of plant improvement, plant cutting etc.”.
2. Search topics in which video audio content was defined, e.g., “X and Y (Parliament representatives) commenting how embarrassed they are to get a rise in their salary”.
3. Search topics in which the video topic was defined but image or audio content was not specified, e.g., “latest drought in Sudan”, “Aimo Tukiainen, all material”. Some of these were very open, such as video clips representative of a particular time period, e.g., “wartime 1939-1942”.
4. Search topics focusing on known video clips in programs or on specific program series, e.g., a task in which a jubilee program of a magazine was prepared and memorable moments of various insert series in the past years’ broadcasts were looked for.

Table 3 presents the distribution of different search topic types in our data. In two-thirds of the topics, visual content was defined. Topical searches were the second most common search topic type (20%). 10% of search topics concerned known items. Sound content was in focus in two search topics.

Table 3 The distribution of different types of search topics in timeline interviews

| Search topic type | # | % |
|---|---|---|
| Image content defined | 53 | 67 |
| Sound content defined | 2 | 3 |
| Topical | 16 | 20 |
| Known item and program series | 8 | 10 |
| Total | 79 | 100 |

3.3.2 Criteria in search topics

Next, a detailed facet analysis was performed for the search topics described by the interviewees. In total, 195 facets were identified in the descriptions of 79 search topics. Thus, the average number of facets in search topics was 2.5. The analysis template in Appendix 1 was used in the analysis. First, we calculated the occurrences of different facet types over the whole data. Second, we compiled the same data at search topic level. And third, a task-level analysis was conducted. Results were presented using different units of analysis in order to check how potential outliers in the data affect the observed results. A single journalist conducting a large number of exceptional searches is an example of an outlier.
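The three units of analysis can be sketched as three ways of counting the same coded data. The example below is a minimal illustration, not the procedure actually used in the study: the record layout, the identifiers and all rows are hypothetical, except that topic "t1" encodes the facet analysis example given above for “Savoy, Seurahuone, Adlon 1930s, restaurant milieu of that time”.

```python
from collections import Counter

# (work task id, search topic id, facet type); toy rows for illustration only.
# Topic "t1" follows the report's own facet analysis example; the other rows
# and all identifiers are invented.
incidences = [
    ("task1", "t1", "named place"),
    ("task1", "t1", "types of places"),
    ("task1", "t1", "linear time"),
    ("task1", "t2", "linear time"),
    ("task2", "t3", "types of people & objects"),
    ("task2", "t3", "linear time"),
]


def facet_shares(rows):
    """Return {facet: (% of incidences, % of topics, % of tasks)}.

    A facet used twice in one topic counts twice as an incidence but
    only once at the topic level and once at the task level.
    """
    n_rows = len(rows)
    topics = {topic for _, topic, _ in rows}
    tasks = {task for task, _, _ in rows}
    by_incidence = Counter(facet for _, _, facet in rows)
    shares = {}
    for facet, count in by_incidence.items():
        topics_with = {topic for _, topic, f in rows if f == facet}
        tasks_with = {task for task, _, f in rows if f == facet}
        shares[facet] = (
            round(100 * count / n_rows, 1),
            round(100 * len(topics_with) / len(topics), 1),
            round(100 * len(tasks_with) / len(tasks), 1),
        )
    return shares


shares = facet_shares(incidences)

# Average number of facets per search topic, from the totals reported above.
average_facets = round(195 / 79, 1)
```

Comparing the three percentages for one facet shows whether its frequency is spread across tasks or concentrated in a few of them, which is the outlier check described above.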
The results show (see Table 4) that the most frequent criteria in the search topics were generic types of people and objects (e.g., Eskimos, spiders, refrigerators), which form a fifth of all criteria incidences, and linear time expressions (e.g., 1950’s), with a share of 16%. Named persons and objects, types of places (rain forest) and impressions (looking hungry, pleading) were also common. In all, generic types of criteria occurred slightly more often (38.5%) than specific criteria (31.8%), and the abstract about-level criteria of themes and impressions were clearly fewer (13.3%) than concrete of-level criteria. Criteria relating to document context (e.g., program name, tape location) formed 9.2% of all criteria.

The criteria frequencies at search topic level (N=79) show that types of people and objects appeared in 44% of the search topics and linear time expressions were used in 39% of search topics. These are again the most common criteria types. Generic types of facets were slightly more common than specific types of facets. About-level themes and impressions, which formed 13% of all criteria incidences, were nonetheless applied in about a third of the search topics. Document context criteria, which were document name, program series name, genre and tape location, were also common and were included in over a fifth of search topics. Other context criteria (see Appendix 1) were missing from search topic definitions. Criteria relating to shooting technique, technical qualities and audio were few in search topic definitions, as were the criteria relating to use context.

Table 4 The distribution of user criteria in search topics using a) criteria occurrence, b) search topic, and c) work task as the unit of analysis.
Facet type                                     % of all      In % of        In % of
                                               incidences    search topics  work tasks
                                               (N=195)       (N=79)         (N=14)
CONTENT CRITERIA                                 88.2          98.7          100
  Specific facets                                31.8          54.4           78.6
    Named person, object                          9.7          22.8           64.3
    Named event                                   1.0           2.5           14.3
    Named place                                   5.1          12.7           42.9
    Linear time                                  15.9          39.2           50
  Generic facets                                 38.5          57.0           71.4
    Types of people & objects                    20.0          44.3           64.3
    Action, types of events                       8.7          21.5           42.9
    Types of places                               9.2          22.8           57.1
    Cyclic time                                   0.5           1.3            7.1
  Aboutness                                      13.3          31.6           57.1
    Abstract themes                               4.6          11.4           35.7
    Impressions                                   8.7          20.3           35.7
  Shooting technique                              2.1           5.1           14.3
  Technical qualities (=colour)                   1.0           2.5           14.3
  Duration                                        -             -              -
  Audio specific                                  1.5           3.8           21.4
DOCUMENT CONTEXT CRITERIA                         9.2          22.8           35.7
  Origin (here program or series name, genre)     5.1          12.7           28.6
  Availability (here tape location)               4.1          10.1            7.1
  Documentation                                   -             -              -
  Earlier usage                                   -             -              -
USE CONTEXT CRITERIA                              2.6           6.3           14.3
TOTAL                                           100             -              -

In data like ours, which consists of a limited number of work tasks (14 in total), individual work tasks may have a strong effect on the total frequencies of some facet types. Some work tasks included many search topics, and it is possible that a certain facet is applied through all search topics emerging during that work task. The last column in the table shows in what share of work tasks each facet type was applied. This analysis reveals that the criterion type availability, which occurs in 10% of search topics, was actually applied in only one work task. In this task, it was crucial to have all the video material quickly available in the near tape stock, and the facet tape location was applied in all eight search topics of that work task.

3.3.3 Selection criteria and evolution of criteria during search process

Table 5 presents the distribution of searching and selection criteria.
It illustrates how the focus of user criteria changes during the search process, starting from (1) search topic definitions, continuing to (2) the first selection phase of studying textual metadata and finally to (3) the second selection phase based on video image and audio.

Most of the subjects began the relevance assessments by reading the context of the search word, which was highlighted on the computer screen. They explained that they tried to find out how the topic was dealt with in the document, whether there was any illustration available or whether the topic was, for example, only mentioned in the video document. The subjects said that many of the first phase selections were based on a ‘feeling’ got from the textual metadata and that often there was a “lot of optimism involved” in it.

Content criteria were the criteria most frequently applied by users when defining search topics, searching and selecting video material. Taking a closer look at the criterion types in the content class, search topics seem to focus on concrete attributes such as persons, objects, places and action. Named objects, events, places and linear time were used often in search topic definitions and supposedly also in searching, because these criteria are seldom applied later in the search process. This is to be expected, since these attributes are usually well documented in metadata. Linear time is the most common criterion of this group and is also applied in the first selection phase based on metadata. Interestingly, types of people, objects and action, which are very frequent criteria in search topic definitions, seem to be applied at both selection phases. This suggests that these criteria are not always fully met in searching, so textual metadata and videos are also studied to meet them.
On the other hand, some of the criteria in this group, such as ‘lot of people’ and ‘action’, are probably used as secondary criteria and therefore left to the end of the search process, when more essential criteria are already met.

Remarkably, cyclic time is almost missing from the criteria expressed by interviewees in the tasks described. Elsewhere, however, cyclic time as a criterion gets attention. For example, a broadcast assistant comments during the timeline interview: “Then, one thing I have noticed, is that summer, winter, autumn, spring is in very few... You look for some landscape and it is summer and you go with the tapes and all are shot in winter time. But it is never said in the documentation. It would be good to remember to put the season there”.

Impressions were applied frequently in video selection and formed the most common content criteria type applied in the second selection phase. In all, they were used in more than half of the work tasks. Selections concerning abstract themes, in contrast, were made by reading textual metadata.

Shooting technique (e.g., close-ups) and technical qualities were applied mainly in the selections based on video image. Shooting technique involved criteria such as close-ups, which were mainly wanted, shooting angles and aerial views. Technical qualities concerned color-related criteria, technical quality and image manipulations. Texts placed on the video image (e.g., a journalist’s or interviewee’s name) and image manipulations were common reasons to discard video documents when watching them. According to the interviewees, this information was quite often lacking from the textual documentation.

According to the interviews, the duration of video clips was a very common criterion and, therefore, probably an implicit criterion which was not always mentioned in the interviews. Very short clips were abandoned. The data shows that duration as a criterion was applied at both selection phases.
According to the interviewees, the duration of the video clip on the desired topic was not always available in the metadata, or the subjects found it difficult to interpret.

Table 5 The distribution of user criteria at three different phases of search process: (1) search topics, (2) selections based on textual metadata and (3) selections based on video image and audio

Columns (% of criteria incidences): Criteria type; Criteria in search topics (N=195); Selection criteria based on textual metadata (N=70); Selection criteria based on video image and audio (N=60)

CONTENT CRITERIA Specific / Named Named person, objects Named event Named place 88.2 31.8 82.5 3.2 61.9 11.9 9.7 1.0 5.1 1.2 1.2 1.2 3.2 - Linear time 15.9 Generic / Type of Type of people, objects Action, type of event Type of place Cyclic time About 38.5 8.3 23.8 20.0 8.7 9.2 0.5 13.3 Abstract theme Impression Shooting technique Technical qualities Duration Audio specific 17.5 10.7 6.0 7.1 10.7 4.6 8.7 7.9 7.9 1.6 19 3.6 7.1 19 2.1 1.0 1.5 2.4 6.0 6.0 14.3 14.3 7.9 6.3 DOC. CONTEXT CRITERIA Availability Origin Documentation Earlier usage 9.2 34.5 1.6 CONTEXT OF USE CRITERIA TOTAL 2.6 100 5.1 4.1 15.5 14.3 3.0 2.4 3.6 100 1.6 15.9 100

Audio content was a criterion in five tasks (statements in two, language spoken in two and music in one task). The availability of an audio track was a criterion in three tasks and the quality of audio in one. Audio content was the main focus in two search topics. Searching audio content was often regarded as difficult. Broadcast assistants and archivists sometimes documented statements of politicians etc. which were believed to be needed later. Statements were usually searched for by using person names, events and linear time as search criteria, so searching was based on memory.
Document context criteria were applied in 11 tasks of 14 and accounted for 34.5% of all criteria applied at the first selection phase. These attributes, which are related to document availability (copyright, format, location), metadata (well documented, “good image” documented), origin (name, genre, production company) and previous usage (“used too much”), are usually available in textual metadata. The most common criterion in this group was copyright, which was employed in eight tasks of 14 in the data. In fact, copyright is used as a criterion in most tasks. However, according to the interviews, many journalists find it difficult to understand the copyrights from the metadata and leave this criterion for the broadcast assistant to apply. Thus, copyright as a criterion is missing from some tasks described by the journalists.

Context of use criteria were employed in half of the work tasks in our data. These criteria are applied mostly in the second selection phase and often in the cutting phase of the program. The most common criterion, applied in four tasks, was a preference for various kinds of video clips and dropping similar video documents. Other criteria in this group were “have enough material already” and “compatibility of video clips with other video material”.

3.4 Crossmedia search needs and selection criteria

The results suggest that user criteria in video searching and selection focus on video content. Content criteria are applied frequently in search topic definitions as well as in both selection phases. The results show a slight dominance of general types of needs (types of objects, places, action) over specific needs (named persons and objects, events, places and linear time). Most search topics (67%) focus on concrete visual content instead of topical or document context attributes.
The facet analysis confirms this observation: concrete named and generic types of persons, objects, places and action, together with linear time, dominate users’ search topic definitions. Impressions are also common criteria, appearing in a fifth of the search topics. A few audio-related criteria were also found.

In video selection, some new document attributes become important and some fade away. Document context criteria formed a considerable group of selection criteria and were applied almost solely at the first selection phase by studying textual metadata, in which these attributes are available. Copyright, genre and program name are the most common criteria in this group. In contrast, subjective criteria such as impressions and criteria in the class ‘context of use’ are applied clearly more often in selections based on video image and audio.

In all, impressions, e.g., facial expressions of people or overall feelings conveyed by video documents, form an important group of selection criteria. According to the interviews, criteria connected to impressions are also used in searching, but the subjects often found these searches troublesome. Because of the subjective nature of these criteria, it is difficult both to document impressions textually and to find the right search keys when searching for them.

Other types of criteria which were applied almost solely when watching video documents are criteria relating to shooting technique and technical qualities. These criteria are very visual, concerning shooting angles, distances, color, technical quality and texts or graphics placed on the video image.

The user criteria were studied in the context of video seeking in a traditional broadcast archive. However, the time-line interviewing method (see Schamber 2000) and the task-based approach adopted allow the identification of the underlying criteria originating from the requirements of journalistic program production.
The journalistic work tasks in making programs are probably the most stable component of the changing environment. Therefore, the results are considerably independent of the search system in use.

4 Conclusions

The results on metadata workflows reflect the different operating environments of the two companies. The image agency serves both internal and external clients with the content repository, whereas the broadcasting company mostly uses its content repository for internal production-related content needs. The complexity of the production process could also be said to be higher in the broadcasting company.

A comparison of the metadata models of the two companies highlights the differences in the essence described. At the television company, the emphasis on time-dependent metadata increases the number of metadata fields in the model and guides the annotation process of the content according to the strata. The image agency’s model, on the other hand, emphasizes caption-like descriptions of the image together with fields adhering to industry standards, a requirement in the minute-by-minute content delivery from international sources to many national clients.

The issues related to content annotation at the two companies were very similar. Metadata is seen as an auxiliary in the key processes of producing and selling the content. All in all, metadata was seen to serve multiple roles with potentially conflicting goals. The metadata models are still under development, and the work practices evolve correspondingly. Issues exist with overlap in metadata workflows and with training staff in different roles in the process.

The analysis of search topic descriptions and selection criteria applied both in reading textual metadata and in watching audiovisual media contents revealed the importance and role of particular metadata categories.
In search topic descriptions, users focused on specific entities (named persons or objects, events and places), generic entities (types of people or objects, events, action and places), abstract themes (leaving content open), and linear time. When reading textual metadata, users paid some attention to specific and generic contents, but the documents’ contextual attributes became more important. In the watching phase, users checked audiovisual features: what the material looks like (generic objects), what emotions it might arouse (impressions), what shooting techniques were applied, whether the technical quality is acceptable, and so on.

The findings suggest that the most frequently mentioned search criteria (except linear time) point to annotated metadata fields (Subject strata, “aihe”; Key word strata, “sisältö”). Assigning annotations to these fields currently requires a lot of resources in metadata production. Unfortunately, the state-of-the-art technology for automatic video and audio analysis cannot solve the problem of identifying named or generic entities in unrestricted video materials.

Based on the two studies reported here, several obvious tracks for redesigning metadata production should be investigated to find ways to improve cost-effectiveness:

1. Redefining the annotation process. For efficient metadata production, content annotation should begin early in the production process, and the first high-level content descriptions should originate from the producer of the essence (e.g., photographer). The systems utilized need to support the flow of metadata throughout the production process. This would ensure that the knowledge gained in planning and shooting is retained with the content, and would avoid double entry of descriptive information into separate systems.
2. Goals and quality of annotation. Metadata was seen to serve multiple internal and external functions.
If annotation is to be as fast as possible to enable rapid access to content (in content sales) and to minimize the time spent in annotating (internal workload), the quality criteria for the annotation need to be specified for different types of goals and at different stages of the process. This will also lead to changes in the metadata workflow.
3. Utilizing auxiliary information. The planning function was recognized as important in both companies’ production processes. However, the information resulting from the planning work could be utilized more efficiently to aid not only the resulting essence production but also content annotation. Furthermore, in the journalistic process, texts such as program scripts, introductions and news reports are produced but not exploited in constructing the archive. The linkage between these texts and the final audiovisual content might be indirect, but it could still offer new options for building additional text indexes for the archive. To this end, the collection and integration of the planning and editing information with the content through the systems needs to be implemented.
4. Annotating for queries. In the past, audiovisual contents have been annotated by text to help the user to preselect the video without seeing the contents, for example to imagine what objects in the moving image look like or what impressions they might evoke. In a crossmedia archive, this is no longer necessary since users can check the audiovisual content easily.
5. Focused areas for video analysis.
The study revealed several potential uses for video analysis in focused identification problems, for example, identification of faces (close-ups), number of people (or faces), texts and graphic elements attached to the image (making it useless), and context of shooting (indoor/outdoor, aerial view, summer/winter, day/night). With these types of video analysis, users could be served a preselected list of available restrictions that could be applied in querying.
6. Additional text indexes by speech recognition. Speech dominates in many TV program genres. The reliability of automatic speech recognition systems is high enough for building additional indexes for free-text searching. Combined with fuzzy keyword matching techniques, the approach could improve searching of non-annotated programs and work well in program genres made in a standard studio environment.

Current approaches to distinct phases and systems in the essence and metadata workflows need to be rethought from the perspective of annotation. The metadata workflow is not fully integrated into all phases (e.g., planning) or systems (e.g., separate editorial systems), causing heavier workloads than necessary. Users need various types of textual metadata in searching and selecting materials in crossmedia archives. The present annotation practices should be carefully analyzed and redesigned since 1) the current workflow does not fully utilize available information, 2) different annotation approaches may be required for different goals and phases in the production (e.g., outside sales, internal searches) and 3) users’ instant access to audiovisual contents makes some sub-goals of the traditional approach outdated. There are some promising possibilities to develop automatic methods for metadata production based on video and audio analysis, which should be tested.
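Recommendation 6 above, free-text searching over speech recognition output combined with fuzzy keyword matching, can be sketched as follows. This is a minimal illustration under stated assumptions: the transcripts and program identifiers are invented, no real speech recognition system is called, and Python's standard difflib similarity is used as one possible fuzzy matching technique, not as the method of any system discussed in this report.

```python
import difflib
import re
from collections import defaultdict

# Hypothetical speech recognition output per program. Real transcripts would
# come from an ASR system and contain recognition errors such as "parlament".
transcripts = {
    "prog-001": "the parlament discussed the salary rise today",
    "prog-002": "severe drought continues in sudan this spring",
    "prog-003": "rain forest seen from the air",
}


def build_index(transcripts):
    """Map each transcript word to the set of programs in which it occurs."""
    index = defaultdict(set)
    for program_id, text in transcripts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(program_id)
    return index


def fuzzy_search(index, term, cutoff=0.8):
    """Return programs whose transcript has a word similar to the query term."""
    # get_close_matches keeps index words whose similarity ratio >= cutoff
    candidates = difflib.get_close_matches(
        term.lower(), list(index), n=5, cutoff=cutoff
    )
    hits = set()
    for word in candidates:
        hits |= index[word]
    return sorted(hits)


index = build_index(transcripts)
print(fuzzy_search(index, "parliament"))  # matches the misrecognized word
print(fuzzy_search(index, "drought"))     # exact matches still work
```

The cutoff value is an assumption that trades recall against false matches; a lower cutoff tolerates more recognition errors but returns more unrelated programs.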
Appendixes

Appendix 1. The model of video search and selection criteria with examples

THE MODEL OF VIDEO SEARCH AND SELECTION CRITERIA

1. Video content criteria
  1.1. Named or specific
    1.1.1. Named persons and objects
    1.1.2. Named events (GATT meeting)
    1.1.3. Named places (Australia)
    1.1.4. Linear time (1942, 1960’s, March last year)
  1.2. Types of or generic
    1.2.1. Types of objects (nurses, snake, refrigerators)
    1.2.2. Types of action (playing, baking) and events (Christmas)
    1.2.3. Types of places (savannah)
    1.2.4. Cyclic time (not winter)
  1.3. About: abstract themes (food chain)
  1.4. About: impressions (funny, hungry)
  1.5. Shooting technique (close-ups, aerial view)
  1.6. Technical qualities (graphics on the image, tint, black and white, technically good)
  1.7. Duration
  1.8. Audio specific criteria
    1.8.1. Audio content
    1.8.2. Audio technical (audio track available, technically good / bad audio)
2. Document context criteria
  2.1. Origin (name, author, production country, genre, authenticity)
  2.2. Availability (tape location, format, copyright, video document the only available)
  2.3. Documentation (level of documentation, “good image” documented)
  2.4. Earlier usage (used too much)
3. Context of use criteria
  3.1. Various kinds of video (various kinds, not similar)
  3.2. Need satisfied (enough material already, all covered)
  3.3. Coherence with other material (coheres with other clips, coheres with the program)

References

Armitage L-H, Enser P-G (1997) Analysis of user need in image archives. Journal of Information Science, 23:287-299.
Enser P-G (1993) Query analysis in a visual information retrieval context. Journal of Document and Text Management, 1:25-52.
Gylfe, C. (2009) Metadata in Cross Media Editorial Processes. Master’s Thesis. Helsinki University of Technology, Department of Media Technology. Espoo, Finland. 84 p.
Herzum M. (2003) Requests for information from a film archive: a case study of multimedia retrieval. Journal of Documentation 59 (2) 168-186.
Hodge, G. (2001) Metadata made simpler. NISO Press, Bethesda, MD. 15 p.
Ingwersen, P. & Järvelin, K. (2005) The turn. Integration of information seeking retrieval in context. Dordrecht: Springer.
Kauranen P. (2008) Perinteisten TV-arkistojen hakukäytännöt ja ohjelmatyöntekijöiden käsitykset integroidun videoarkiston käyttöönoton vaikutuksista. University of Tampere, Department of Information studies. Master’s thesis, 98 p. Available at: http://tutkielmat.uta.fi/pdf/gradu02354.pdf
Markey K (1986) Subject access to visual resources collections: a model for computer construction of thematic catalogues. Greenwood Press, Westport, Connecticut.
Markkula & Sormunen (2006) Video needs at the different stages of television program making process. In: Proceedings of the 1st International Conference on Information Interaction in Context (IIiX), pp. 111-118. [Available at: http://portal.acm.org/citation.cfm?id=1164844].
Markkula, M. & Sormunen, E. (2000) End-User Searching. Challenges Indexing Practices in the Digital Newspaper Photo Archive. Information Retrieval 1(4): 259-285.
Mauthe, A. & Thomas, P. (2007) Professional Content Management Systems. Chichester, West Sussex, John Wiley & Sons Ltd. 314 p.
Panofsky E (1970) Meaning in the visual arts. Penguin, London.
Pereira, F., Vetro, A. & Sikora, T. (2008) Multimedia Retrieval and Delivery: Essential Metadata Challenges and Standards. Proceedings of the IEEE, Vol. 96, No. 4, p. 721-743.
Sandom C. & Enser PGB. (2003) Archival moving imagery in the Digital Environment. In: Anderson, J., Dunning, A. & Fraser, M (eds.) Digital resources for the humanities 2001-2002. London: Office for Humanities Communication, Kings College.
Schamber L. (2000) Time-line interviews and inductive content analysis: their effectiveness for exploring cognitive behaviors. JASIS 51(8): 734-744.
Shatford S (1986) Analyzing the subject of a picture: a theoretical approach. Cataloguing and Classification Quarterly, 6:39-62.
STT-Suomen tietotoimisto. n.d. Mikä on STT? (online) [Referenced on 10.5.2010], available in WWW-format: <URL:http://www.stt.fi/fi/>
Tan E. & Müller H. (2003) Integration of Specialist Tasks in the Digital Image Archive. In Cognition in a Digital World, ed. Herre van Oostendorp, Lawrence Erlbaum Associates, Inc., Publishers, New Jersey.
Vänttinen, K. (2010) Case YLE D-keskus (online). [Referenced on 14.5.2010], available in WWW-format: <URL: http://www.digiwiki.fi/fi/images/1/14/2010-03-05_Vänttinen.pdf>
Westman & Oittinen (2006) Image retrieval by end-users and intermediaries in a journalistic work context. In: Proceedings of the 1st International Conference on Information Interaction in Context (IIiX), Copenhagen, Denmark, pp. 102-110.
Yleisradio Oy. Yle Info. n.d. (online) [Referenced on 10.5.2010], available in WWW-format: <URL: http://www.yle.fi/fbc/thisisyle.shtml>