D4.1.1.3 Crossmedia C+


WP4 CROSSMEDIA SOLUTIONS D4.1.1.3
CROSSMEDIA METADATA AND SEARCH NEEDS
Deliverable number D4.1.1.3
Crossmedia metadata and search needs
Authors:
Marjo Markkula, Pirkko Oittinen, Sanna Olkkonen, Eero Sormunen,
Stina Westman
Confidentiality:
Consortium
Date and status:
Jan 12th 2011, Version 1.0
This work was supported by TEKES as part of the next Media programme of TIVIT
(Finnish Strategic Centre for Science, Technology and Innovation in the field of ICT)
Phase 1 (1.2-31.12.2010)
Next Media - a Tivit Programme
Version history:
Version | Date | State (draft/update/final) | Author(s) or Editor/Contributors | Remarks
0.1 | 31.12.2010 | draft | SO, SW | First draft of Aalto part
0.2 | 10.1.2011 | update | MM, ES, SO, SW | Update with TaY parts added
0.3 | 12.1.2011 | update | MM, ES, SW | Revision by research parties
1.0 | 16.3.2011 | final | SW |
Participant role | Participants | Organisation
Case company representatives | Harri Juutilainen, Hanna Nurminen, Pauli Tölli | Suomen Tietotoimisto Lehtikuva
Case company representatives | Jouni Frilander, Pekka Kauranen, Tuula Peltonen | Yleisradio
Researchers | Pirkko Oittinen, Sanna Olkkonen, Stina Westman | Aalto/Mediatekniikka
Researchers | Marjo Markkula, Eero Sormunen | Tampereen Yliopisto - INFIM
next Media
www.nextmedia.fi
www.tivit.fi
WP4
CROSSMEDIA
SOLUTIONS
METADATA AND SEARCH NEEDS
D4.1.1.3
CROSSMEDIA
1 (31)
Phase 1 (1.2-31.12.2010)
Executive Summary
This deliverable discusses content annotation in crossmedia production processes which handle image
and video content. Two studies were conducted at participating case companies. The first one focuses
on metadata production processes in two companies (news agency, television broadcaster). The other
investigates crossmedia information needs and selection criteria in television program making.
Metadata workflows were analyzed based on interviews and observations at the two case companies.
Metadata was found to serve multiple goals including aiding content discovery, filling information
needs during production, facilitating journalistic processes, promoting content and enabling monetary
compensation. The metadata fields currently used were classified according to which content facet
they referred to: low-level content (e.g. color, motion), high-level content (semantic descriptions),
structure (e.g. segmentation), lifecycle (production information), identification/location (e.g. labels), or
management metadata (e.g. rights). The processes with which metadata was created, checked and
accessed (e.g. for searches) were modeled as the metadata flow, alongside the essence flow in
production. This analysis showed that planning and journalistic work processes were not fully
interlinked phases in the metadata lifecycle, and that systems could be further integrated to streamline
metadata production and management.
Crossmedia user needs were investigated by analyzing search topics and video selection criteria
connected to journalistic tasks in television production, as described by program team members. A
model was developed and used to classify users’ search and selection criteria:
1. Content criteria including concrete visual elements (objects, action, events, places, time), themes
and impressions, duration, audio attributes, shooting technique (e.g., shooting distance, angle) and
technical qualities (e.g., color).
2. Document context criteria: document origin (e.g., name), availability (e.g., copyright), usage (e.g.,
used too much) and metadata (e.g., well documented), corresponding to life-cycle, identification,
location and management metadata.
3. Use context criteria: the searcher and his or her situation (e.g., enough material already, various kinds of
video), which are difficult to represent as metadata.
User criteria in video searching and selection focused on video content. Most addressed high-level
content but some low-level criteria were also identified. In search topics the focus was on specific
entities (named persons, objects, events, places), generic entities (types of people, objects, events,
action, places), abstract themes and impressions, and linear time. When reading textual metadata,
contextual attributes became important. While watching retrieved content, audiovisual features such as
visual appearance, affective qualities, shooting techniques and technical quality were evaluated.
High-level content annotation currently requires a lot of resources in metadata production. The
metadata production processes are challenging in terms of systems integration and information flows
throughout the content lifecycle. As conclusions of this work, potential tracks for improving the
cost-effectiveness of metadata production are suggested. These serve as future topics for research
and company development in the coming years of the project.
Table of Contents
Executive Summary
1 Introduction
1.1 Partners
1.2 Methods
1.3 Deliverable structure
2 Metadata workflows
2.1 Essence and metadata flow in a content centric process model
2.2 Image agency
2.2.1 Metadata structure
2.2.2 Metadata lifecycle
2.3 Broadcasting company
2.3.1 Metadata structure
2.3.2 Metadata lifecycle
2.4 Comparisons of metadata models and lifecycles
2.5 Process requirements for content annotation
3 Crossmedia search and selection criteria
3.1 Video searching in television program production process
3.2 Model of video attributes
3.3 Crossmedia search criteria
3.3.1 Search topic categories
3.3.2 Criteria in search topics
3.3.3 Selection criteria and evolution of criteria during search process
3.4 Crossmedia search needs and selection criteria
4 Conclusions
Table of Tables
Table 1 Metadata structure at STT-Lehtikuva
Table 2 Metadata structure at Yleisradio
Table 3 The distribution of different types of search topics in timeline interviews
Table 4 The distribution of user criteria in search topics using a) criteria occurrence, b) search topic, and c) work task as the unit of analysis
Table 5 The distribution of user criteria at three different phases of search process: (1) search topics, (2) selections based on textual metadata and (3) selections based on video image and audio
Table of Figures
Figure 1 Content-centric essence and metadata flow model (Mauthe & Thomas 2007)
Figure 2 Essence and metadata lifecycle at STT-Lehtikuva
Figure 3 Essence and metadata lifecycle at Yleisradio
Figure 4 The stages of the work process in making a television program according to Markkula & Sormunen (2006)
1 Introduction
Many categories of metadata vital in searching are based on intellectual analysis of the media essence,
which increases the cost of metadata production. Further, the content and context of crossmedia entities (e.g.
still images, video clips) offer a vast number of potential aspects that could be represented in
metadata. Media companies face four major challenges in developing high-performance but cost-effective content management for crossmedia assets:
1. How to identify the minimum set of metadata elements that are essential in searching, selecting
and using crossmedia entities by different user groups for different purposes?
2. How to integrate metadata production into the lifecycle of media production and optimize it?
3. How to reallocate intellectual indexing resources to the key metadata appropriate for state-of-the-art
crossmedia archives?
4. How to find and apply the most productive automatic methods for media content analysis in
metadata production?
The study reported in this deliverable addresses these challenges along two research lines. The first
focuses on metadata production processes in two media companies operating in different fields of
media production. The second aims to reveal crossmedia information needs and selection criteria in
television program making at one of the partner companies.
1.1 Partners
Yleisradio (YLE) is the Finnish public service broadcasting company. YLE has four national
television channels as well as six radio channels and manages a vast collection of media content. YLE
archives contain books, articles and web content, still images, audio content (music, radio programs
and sound effects) and about 280 000 hours of television material stored in different formats
(Yleisradio 2010, Vänttinen 2010). STT (Suomen Tietotoimisto, the Finnish News Agency) is an
independent news provider with many Finnish and a number of international media companies as
customers (STT 2010). STT previously participated in a study (TIVIT Flexible Services) concerning
metadata in crossmedia editorial processes (Gylfe 2009). In January 2010, STT acquired Lehtikuva, the
largest Finnish image agency. The metadata study described in this deliverable focuses on the Lehtikuva
operations, as this enables comparison between the two units of the newly formed crossmedia company.
Lehtikuva employs several photographers of its own and transmits images from several international
image agencies to its news image clients. Lehtikuva's online image shop is available to all registered
clients. Lehtikuva has over 8 million images in its image archives, and video content was added to its
products in 2005.
1.2 Methods
The data for the metadata and workflow study were collected at YLE’s editorial offices and archives
in Pasila, Helsinki and Tohloppi, Tampere. The majority of the material was collected in the current
affairs editorial office, but persons in the news production unit and the television archive were also
included in the interviews and observations. Furthermore, data were collected at STT-Lehtikuva's
premises in Helsinki, all of them in the Lehtikuva unit.
The data for the user needs study were collected at YLE’s editorial offices and archives in Pasila,
Helsinki and Tohloppi, Tampere. The material was collected at documentary, factual, current affairs,
news production and children’s units and in the television archive.
For the study on metadata workflows, twelve people (journalists, image editors, broadcasting
assistants, archivists, producers and IT specialists) were observed and interviewed in the research
partners’ organizations. Documents on guidelines and instructions related to their metadata processes
were also collected and analyzed. Based on the data gathered, the types of metadata assigned to
content during the production process were analyzed. The metadata were classified according to
the type of annotation they provide. The workflows related to content essence and metadata were
modeled in parallel. The models depict the process stages and systems with which different
types of metadata are added, checked or accessed.
In investigating crossmedia search needs, a task-based approach was adopted. Data on user criteria
were gathered through timeline interviews with ten television professionals working in program production.
User needs expressed in search and selection criteria were investigated at three different points of the
search process: (1) criteria in search topic definitions, (2) selection criteria applied to textual
surrogates of video documents (online and paper prints) at the beginning of the search process and (3)
selection criteria applied when watching actual video documents in the latter part of the process.
1.3 Deliverable structure
The rest of the deliverable is structured according to the two studies: Section 2 corresponds to the
metadata workflow study and Section 3 to the crossmedia user needs study.
Section 2 reports on the essence and metadata workflows in the two partner companies. Section 2.1
introduces content-centric workflow models in media production environments. Sections 2.2 and 2.3
present the metadata field classification and the essence/metadata workflow models for the image
agency and the broadcasting company, respectively. Section 2.4 compares the metadata models and
lifecycles of the two companies, and Section 2.5 discusses requirements for content annotation based
on the study.
In Section 3, users' needs for video annotation are presented. Section 3.1 introduces the program
production process and video searching in television environments. Section 3.2 presents a model of
video attributes used to classify user criteria. In Section 3.3, we report the results of the study: first,
users' search topics by category (3.3.1); second, the criteria in search topics (3.3.2); and third, the
criteria applied in selections and their evolution through the search process (3.3.3). Section 3.4 sums up
the results.
The conclusions of the two studies are presented in Section 4.
2 Metadata workflows
2.1 Essence and metadata flow in a content centric process model
The classical definition of metadata is ‘data about data’. According to a more descriptive definition
metadata is “structured information that describes, explains, locates or otherwise makes it easier to
retrieve, use or manage an information resource” (Hodge 2001). In this project metadata was studied
from the viewpoint of content management processes in media companies. According to the Society of
Motion Picture and Television Engineers (SMPTE), content is the sum of essence and metadata: content =
essence + metadata. This definition states that essence, i.e. the essential data delivered to the viewer
(video, audio, images, graphics or text), is not content without the metadata describing it. Essence
is unusable if it is merely part of an unorganized collection of bits; it has to be described by metadata so
that it can be found and used, which makes descriptive metadata key in content management processes.
Workflow processes in media production have traditionally been described with sequential process
models where the process steps follow one another. Today, however, the content creation workflow
is rarely organized in a strictly sequential manner. Digital asset management
systems connect previously isolated systems, so there is no need to follow a strictly sequential
process where one phase is completed before the next one begins. Content is accessed and metadata
added throughout the production, and the changes made are shared with all the
parties working on the project. Such a content-centric workflow is visualized in the model presented
by Mauthe & Thomas (2007).
Figure 1 Content-centric essence and metadata flow model (Mauthe & Thomas 2007)
In this type of workflow the content is at the core of the process. All the information is stored in a
content management system and updated as the process goes forward. The progress of the work can be
observed by all the parties taking part in the production process, and work on content items can
proceed in parallel. This kind of process is also said to enable gathering a richer set of
metadata, faster access to content and easier reuse of content (Mauthe & Thomas 2007). This model
matches the existing requirements for fast production cycles for media content. The model is also
useful in the context of this research as it separates the flow of essence and metadata.
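The content-centric idea can be illustrated with a minimal Python sketch, assuming a single in-memory store that every production phase reads and writes. The class and method names (`ContentStore`, `annotate`) are hypothetical and do not describe any partner system; the point is only that essence is referenced separately while metadata accumulates on one shared item whose changes all parties can see.

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    essence_ref: str  # pointer to the essence, which flows and is stored separately
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # (phase, key, value) entries visible to all parties


class ContentStore:
    """Hypothetical content-centric store: all phases work on the same item."""

    def __init__(self):
        self._items = {}

    def ingest(self, item_id, essence_ref):
        self._items[item_id] = ContentItem(essence_ref)

    def annotate(self, item_id, phase, key, value):
        # Any phase may add metadata at any time; no phase waits for another to finish.
        item = self._items[item_id]
        item.metadata[key] = value
        item.history.append((phase, key, value))

    def view(self, item_id):
        return self._items[item_id]


store = ContentStore()
store.ingest("clip-1", "essence://clip-1")
store.annotate("clip-1", "planning", "subject", "Municipal elections")
store.annotate("clip-1", "editing", "keywords", "voting, ballot box")
print(store.view("clip-1").metadata)
```

In a sequential model the editing phase could only start once planning was closed; here both simply write to the same item, and the history records who added what.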
In this section, we report on (1) the current metadata models used at two media companies and (2)
how these metadata are produced in parallel with the content essence in the workflow. The process
models for the metadata and essence flows in the partner companies are presented analogously to
Mauthe’s and Thomas’ model (Figure 1).
The metadata models reflect the types of content annotation needed and currently implemented in the
production processes and systems. The flow models may be used to identify when in the process,
with which system and by whom metadata is added, checked and accessed. Both may be utilized to
identify potential bottlenecks in the metadata workflow. They also serve as points of comparison for the
user needs discussed in Section 3.
2.2 Image agency
2.2.1 Metadata structure
The metadata structure at STT-Lehtikuva is based on an IPTC standard but numerous other metadata
items have been added to the metadata model as well. When the image archive was developed, it was
clear from the beginning that the metadata model would change over time. An important decision was
therefore made to store the images and the metadata separately, keeping the metadata in a separate
Oracle database where it can be modified with database commands. This has proven to be a successful
strategy for working with metadata, and similar solutions are now used by other large-scale image agencies.
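The benefit of keeping metadata apart from the image files can be sketched in Python with the standard `sqlite3` module. SQLite is only a stand-in for the Oracle database mentioned in the text, and the table and column names are hypothetical: the sketch shows that a new field can be added to the metadata model with one schema change, while the essence files on the file server are never touched.

```python
import sqlite3

# SQLite stands in for the Oracle database; essence is referenced by path only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_metadata (id TEXT PRIMARY KEY, essence_path TEXT, headline TEXT)"
)
conn.execute(
    "INSERT INTO image_metadata VALUES (?, ?, ?)",
    ("LK0001", "/essence/LK0001.jpg", "Finnish presidential elections 2006"),
)

# The metadata model changes over time: add a field without rewriting any image file.
conn.execute("ALTER TABLE image_metadata ADD COLUMN visible_keywords TEXT")
conn.execute(
    "UPDATE image_metadata SET visible_keywords = ? WHERE id = ?",
    ("elections, president", "LK0001"),
)
conn.commit()

row = conn.execute(
    "SELECT id, headline, visible_keywords FROM image_metadata"
).fetchone()
print(row)
```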
In total, STT-Lehtikuva's metadata model includes 42 metadata items, 17 of which are standard
IPTC metadata fields. Table 1 lists STT-Lehtikuva's metadata items classified according to the
metadata typology presented by Pereira et al. (2008). We have omitted Pereira's User interaction
and User context metadata types, as these were outside the scope of our study.
The low-level metadata fields included in STT-Lehtikuva's metadata model are calculated
automatically from the pixel-level data in the image. The size and shape (portrait, landscape, square or
panorama) can be used for searching in the image web shop. The high-level content description fields
are numerous. Categories are standardized IPTC categories and can also be used in searches. The
caption describes the content of the image and is given in both Finnish and English. ISO-country code
is the standardized country code; however, not all image agencies use it in the same way, so the file
server program in the digital asset management system checks the country codes when ingesting
images and corrects them automatically. The Additional words field contains keywords that supplement
the caption field; they are given in both English and Finnish and are not visible in the image web shop.
Title names the object in the image, for example the person pictured, and searches for named persons
in the image archive and the image web shop application are based on the value in this field. The
Subject code field is not yet in use but should eventually contain a value from the 1300-item
vocabulary in the IPTC Photo Metadata standard. Headline is a short headline for the image given in
Finnish and English, such as "Finnish presidential elections 2006". Visible keywords are keywords
that are set visible in the image web shop.
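The Shape field can be derived from the pixel dimensions alone, as the following Python sketch illustrates. The thresholds (`panorama_ratio`, `square_tol`) are assumptions: the deliverable does not say how STT-Lehtikuva separates square from landscape or landscape from panorama.

```python
def image_shape(x_pixels: int, y_pixels: int,
                panorama_ratio: float = 2.0, square_tol: float = 0.05) -> str:
    """Classify an image as portrait, landscape, square or panorama.

    Hypothetical thresholds: near-equal sides count as square, and an aspect
    ratio of panorama_ratio or more (in either direction) counts as panorama.
    """
    ratio = x_pixels / y_pixels
    if abs(ratio - 1.0) <= square_tol:
        return "square"
    if ratio >= panorama_ratio or ratio <= 1 / panorama_ratio:
        return "panorama"
    return "landscape" if ratio > 1 else "portrait"


print(image_shape(3000, 2000))
```

Because the value is computed from X-pixels and Y-pixels, it costs nothing in manual annotation yet is still usable as a search filter in the web shop.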
Table 1 Metadata structure at STT-Lehtikuva

Metadata type | Metadata fields
Content metadata: low-level. Low-level features, which are typically automatically extracted from the content, such as color, texture, shape and motion for video data. | X-pixels, Y-pixels, Shape, Rgb size
Content metadata: high-level. High-level features, which are typically created by a human, such as annotations, keywords, reviews, ratings, and links to related material. | Category (IPTC 2:015), Caption (IPTC 2:120), ISO Country code (IPTC 2:100), City (IPTC 2:090), Country (IPTC 2:101), Additional words, Title (IPTC 2:005), Keywords (IPTC 2:025), Subject Code (IPTC 2:012), Headline (IPTC 2:105), Visible Keywords
Content metadata: structure. All types of structure, organization or arrangement that may be present in one or more multimedia assets, such as spatial or temporal segmentation, audio and video streams. | Product, Image Type, Basket, Status
Content metadata: life-cycle. Information gathered along the content life-cycle about the process used in the value chain, regarding acquisition, scripting, recording, editing, mixing, archiving, producing and coding. | Creator (IPTC 2:080), Original Type, Date Created (IPTC 2:055), Caption Writer (IPTC 2:122), History, Source (IPTC 2:115), Priority (IPTC 2:010), Service date
Content identification and location metadata. Information to identify and locate the content, such as identification labels and links. | ID, Assignment ID, Unique ID
Content management metadata. Information useful for the efficient management of the data in terms of rights, such as expression of rights, protection metadata and governance. | LK Source, Instructions (IPTC 2:040), Project, Royalty Type, Pix Code, Copyright Notice (IPTC 2:116), Publishing Rights, Global Sales, Model Released, Property Released
The structural metadata at Lehtikuva is related to organizing the images. The Product field divides the
images into different image types, such as "posing" or "paparazzi" images, and is used in the
web shop. The Image type field divides the images into editorial stock, commercial stock, news, video and
royalty-free images, and defines in what kind of search results the image will be visible in the web
shop. The Basket field divides the images into different groups in the image repository and also defines
their visibility to clients. Status (sellable, internal or deleted) also affects visibility in the web shop.
Lifecycle metadata fields contain information on the creator of the image, i.e. the name of the
photographer or organization. Original type specifies the source from which the image was
transferred to STT-Lehtikuva's image repository. Caption writer contains the initials of the person
who added the image to the image database and entered its metadata. History concerns earlier uses
of the image. Source contains information about the original owner of the image or the publishing
rights. Service date is the date when the image was included in a service offered by
STT-Lehtikuva, and is used to control image visibility. For example, news image clients see images in the
web shop for a period of one week.
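How Status and Service date might jointly control visibility for a news image client is sketched below in Python. The function name is hypothetical, and two rules are assumptions not stated in the text: that only "sellable" images are shown, and that the one-week window starts on the Service date and excludes the seventh day. Basket and Image type rules are left out because their exact effect is not described.

```python
from datetime import date, timedelta

# "News image clients see images in the web shop for a period of one week."
NEWS_WINDOW = timedelta(weeks=1)


def visible_to_news_client(status: str, service_date: date, today: date) -> bool:
    """Hypothetical visibility rule for a news image client.

    Assumes only 'sellable' images are shown (not 'internal' or 'deleted') and
    that the one-week window starts on the Service date and excludes day 7.
    """
    if status != "sellable":
        return False
    age = today - service_date
    return timedelta(0) <= age < NEWS_WINDOW


print(visible_to_news_client("sellable", date(2010, 5, 3), date(2010, 5, 5)))
```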
The content identification metadata contains three fields. ID is an internal identification number.
Assignment ID is related to the physical location of negatives in the archive. Unique ID is specific to a
certain client.
Content management metadata includes several fields. They are related to making royalty payments,
instructions concerning the use of images, codes given to images by international agencies which need
to be referenced in payments, copyrights and publishing rights. They also include information about
the rights concerning the model or items displayed in the image.
The metadata model clearly reflects the functions relevant to STT-Lehtikuva's area of operation.
High-level content metadata is important in supporting searching, as evidenced by the large
number of high-level content metadata fields. Categorization is also done to aid searching. Content
management metadata is vital for commercial purposes and is represented in the metadata structure by
various fields; the system also needs to support rights and payment transactions.
Compared to the metadata fields used at STT (Gylfe 2009), the image metadata model is simpler,
with roughly half as many fields. It also seems to be used more consistently during production
across all the content annotated at Lehtikuva. Owing to the modality of the content, the focus is more
heavily on describing the content essence (both low- and high-level). Still, content management
metadata remains important for business reasons. The emphasis on lifecycle metadata found in the
textual news content at STT is less evident at Lehtikuva.
2.2.2 Metadata lifecycle
The metadata lifecycle at STT-Lehtikuva is closely related to the essence lifecycle. The essence and
metadata lifecycles at the image agency are presented in Figure 2, where the metadata types according
to Pereira are referenced to show the lifecycle of each type of metadata.
The metadata lifecycle begins in the planning phase. Information on photo shoots as well as image
requests by clients are entered into the planning system daily. The planning system is not connected to the
other components of the digital asset management system but is a separate entity, which makes it difficult to
feed metadata into the production systems. Yet the planning phase contains information that would be
useful in later phases: for a photo shoot, for example, the place and the person to be photographed are
needed as metadata later on.
In the photo shoot, the essence is created and the camera automatically saves EXIF data (e.g. date and
time), which is transferred to the production system. When photographers upload images from the
camera to the STT-Lehtikuva server, in-house software requires them to fill in a minimum amount of
metadata, i.e. the name of the photographer and the shooting place. The system also gives the photos a
unique ID. Other data, such as captions or a title, can also be filled in, but at minimum every photograph
has the date, the photographer's name and the ID when it is ingested into the system. If the
photographers have time to add other metadata they do so; if not, the image journalists do the rest of the
annotation.
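The minimum-metadata rule enforced at upload can be sketched in Python as follows. The function and field names are hypothetical, the EXIF data is assumed to have been parsed into a dictionary beforehand (`DateTimeOriginal` is the standard EXIF tag for capture time), and the ID format is invented for illustration.

```python
import itertools

REQUIRED_FROM_PHOTOGRAPHER = ("photographer", "shooting_place")
_next_id = itertools.count(1)  # hypothetical ID scheme; the real one is not described


def ingest_image(exif: dict, entered: dict) -> dict:
    """Refuse the upload until the photographer has entered the minimum metadata."""
    missing = [f for f in REQUIRED_FROM_PHOTOGRAPHER if not (entered.get(f) or "").strip()]
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    record = dict(entered)  # optional fields such as caption or title pass through
    record["date_created"] = exif["DateTimeOriginal"]  # taken from the camera's EXIF data
    record["id"] = f"LK{next(_next_id):08d}"  # unique ID assigned by the system
    return record


rec = ingest_image(
    {"DateTimeOriginal": "2010:05:03 12:00:00"},
    {"photographer": "A. Photographer", "shooting_place": "Helsinki"},
)
print(rec["id"], rec["date_created"])
```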
Figure 2 Essence and metadata lifecycle at STT-Lehtikuva
Essence also enters the process from international image agencies. When images arrive from
international sources by satellite feed, some metadata processing, such as checking the ISO-country
code, is done automatically for all the images.
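The automatic country-code check at ingest could look roughly like the Python sketch below. It assumes that the ISO-country code field (IPTC 2:100) should hold ISO 3166-1 alpha-3 codes; the lookup tables are abbreviated and hypothetical, since the deliverable does not describe which variants the file server program actually corrects. Codes that cannot be corrected are left unchanged and flagged for the image journalist.

```python
# Abbreviated stand-in for the full ISO 3166-1 alpha-3 list.
VALID_ISO3 = {"FIN", "SWE", "NOR", "GBR", "USA"}

# Hypothetical non-standard variants from other agencies, mapped to ISO codes.
KNOWN_FIXES = {"FI": "FIN", "SF": "FIN", "UK": "GBR", "GB": "GBR", "US": "USA"}


def check_country_code(raw: str):
    """Return a corrected code, or None if the image needs a manual check."""
    code = raw.strip().upper()
    code = KNOWN_FIXES.get(code, code)
    return code if code in VALID_ISO3 else None


feed = [
    {"id": "A1", "country_code": "fi"},
    {"id": "A2", "country_code": "SWE"},
    {"id": "A3", "country_code": "XX"},
]
needs_review = []
for image in feed:
    fixed = check_country_code(image["country_code"])
    if fixed is None:
        needs_review.append(image["id"])  # left unchanged for the image journalist
    else:
        image["country_code"] = fixed
print(needs_review)
```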
In the selection phase the image journalist goes through the images and accompanying metadata. More
metadata processing can be done automatically to image groups, for example copying the content of a
certain field to another in STT-Lehtikuva’s metadata model. Even after the automated processing the
image journalists normally check the metadata and further annotate the images, adding and translating
keywords and captions.
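A batch operation of this kind might look like the following Python sketch. Which fields STT-Lehtikuva actually copies is not stated, so the field names in the example are hypothetical; the sketch also assumes that existing values should not be overwritten by default, so that earlier manual annotation is preserved.

```python
def copy_field(images, source, target, overwrite=False):
    """Copy the value of one metadata field to another across an image group.

    Returns the number of images changed. Existing target values are kept
    unless overwrite=True.
    """
    changed = 0
    for meta in images:
        value = meta.get(source)
        if value and (overwrite or not meta.get(target)):
            meta[target] = value
            changed += 1
    return changed


group = [
    {"keywords": "ice hockey, final"},
    {"keywords": "ice hockey", "visible_keywords": "sport"},
    {},
]
n = copy_field(group, "keywords", "visible_keywords")
print(n)
```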
The essence with its metadata is delivered to the clients with a send code. The same send code field
includes the code for sending the images to the archive, so archiving is done simultaneously with
sending the images to the clients. In addition to this push-type delivery, STT-Lehtikuva's clients may
search the image web shop independently and order images via the system, or contact the sales
personnel, who in turn perform the searches as intermediaries.
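Why a single send code makes archiving a side effect of delivery can be seen in the Python sketch below. The send code format (destinations separated by `+`) and the client codes `HS` and `AL` are hypothetical; the text only states that the archive code sits in the same field as the client codes.

```python
def deliver(image_id, send_code, outboxes):
    """Deliver one image to every destination named in its send code.

    Assumes a hypothetical '+'-separated send code such as 'HS+AL+ARCHIVE'.
    Because the archive is just one more destination, archiving happens in
    the same step as delivery to clients.
    """
    destinations = [d.strip() for d in send_code.split("+") if d.strip()]
    for dest in destinations:
        outboxes.setdefault(dest, []).append(image_id)
    return destinations


outboxes = {}
deliver("LK0001", "HS+AL+ARCHIVE", outboxes)
print(outboxes)
```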
Note that there is also a fully automated essence lifecycle in the production process.
Images from the satellite feed, i.e. from the most important international partners, are archived
automatically for a certain time period, and if nobody views or purchases them during that period,
they are deleted.
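The retention rule for automatically archived feed images can be expressed as a small Python sketch. The retention period is not given in the text, so it is a parameter; the record layout is hypothetical, and the view and purchase counters are assumed to count only activity within the retention period.

```python
from datetime import datetime, timedelta


def purge_unused(archive, now, retention):
    """Return the auto-archived feed images that survive the retention rule.

    An image is kept if it has been viewed or purchased, or if its retention
    period has not yet expired; all other images are dropped.
    """
    kept = {}
    for image_id, rec in archive.items():
        expired = now - rec["archived_at"] > retention
        used = rec["views"] > 0 or rec["purchases"] > 0
        if used or not expired:
            kept[image_id] = rec
    return kept


archive = {
    "F1": {"archived_at": datetime(2010, 5, 1), "views": 0, "purchases": 0},
    "F2": {"archived_at": datetime(2010, 5, 1), "views": 3, "purchases": 0},
    "F3": {"archived_at": datetime(2010, 5, 20), "views": 0, "purchases": 0},
}
kept = purge_unused(archive, now=datetime(2010, 5, 31), retention=timedelta(days=14))
print(sorted(kept))
```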
The metadata lifecycle shows that existing metadata is transferred between the systems and process
phases quite well, except for the planning phase. The systems are linked and, as stated earlier, some
metadata processing (editing, correcting and translating the content of certain metadata fields) can be
automated. The essence and metadata are stored separately at STT-Lehtikuva, a solution which seems
to work well with image files: the essence is stored on essence servers and the metadata in the Oracle
database.
2.3 Broadcasting company
2.3.1 Metadata structure
The metadata model at Yleisradio (Table 2) contains more than a hundred fields, some of which
contain several subfields. The model does not follow any standard but is tailored to the needs of
YLE. The number of metadata fields is vast, but only a small part of them needs to be filled in
via manual annotation. Annotators are advised to fill in metadata in three groups, which are
organized on different tabs in the user interface: the program tab, the production staff and performers
tab, and the rights information tab. The metadata items presented in black in the table are those which
annotators should take into consideration or which may be filled automatically; the gray items were
left blank at the time of the study.
The metadata model contains a few low-level fields, such as the color system or aspect ratio. The high-level items are numerous and also include time-dependent items. It must be noted, however, that many of the static high-level items, such as the broadcasting name, production name or version name, are copied from the planning system Plasma and cannot be modified by the content annotators in the media asset management system Metro. Most annotation effort and time is spent on the time-dependent high-level items. The subject of each insert is entered in the subject strata. The key words strata field contains keywords for the insert and is filled in by the archivists. The image strata field needs to be filled with a description of the image, i.e. what is actually pictured in the video. Video and audio strata contain metadata on the video and audio, which are transferred from Avid iPlay. At the moment these strata are not in use. The subtitle strata field contains the subtitle text attached to the insert. Annotating the subjects and images of each insert in a program is a time-consuming task and accounts for most of the annotation work done by the broadcast assistants.
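The split between static fields copied from the planning system and time-dependent strata can be made concrete with a small data model. The following is a minimal Python sketch under stated assumptions, not Metro's actual schema: the class names, the stratum keys ("subject", "image") and the use of seconds for segment boundaries are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class StratumEntry:
    """One time-dependent annotation, e.g. a row in the subject or image strata."""
    start_s: float  # segment start in seconds from program start (assumed unit)
    end_s: float    # segment end in seconds (exclusive)
    text: str


@dataclass
class ProgramMetadata:
    # Static high-level fields copied from the planning system (Plasma in the text).
    # A read-only mapping mirrors the rule that annotators cannot modify them.
    planning_fields: MappingProxyType
    # Time-dependent strata keyed by a hypothetical stratum name ("subject", "image", ...).
    strata: dict = field(default_factory=dict)

    def annotate(self, stratum: str, start_s: float, end_s: float, text: str) -> None:
        """Add one segment-level annotation to the named stratum."""
        if end_s <= start_s:
            raise ValueError("a stratum entry must have a positive duration")
        self.strata.setdefault(stratum, []).append(StratumEntry(start_s, end_s, text))

    def entries_at(self, t: float) -> dict:
        """Return, per stratum, the annotation texts whose segment covers time t."""
        return {
            name: [e.text for e in entries if e.start_s <= t < e.end_s]
            for name, entries in self.strata.items()
        }


program = ProgramMetadata(MappingProxyType({"broadcasting_name": "Example program"}))
program.annotate("subject", 0, 300, "Drought in Sudan")
program.annotate("image", 0, 120, "Aerial view of dry fields")
program.annotate("image", 120, 300, "Interview in a refugee camp")
print(program.entries_at(150))
# {'subject': ['Drought in Sudan'], 'image': ['Interview in a refugee camp']}
```

Keeping the planning fields in a read-only mapping is one way to encode the "copied, not edited" rule, while the per-segment lists show why the time-dependent items take most of the annotators' time: every segment needs its own entries.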
The structural metadata contains fields related to the program structure, to the temporal segmentation of the program or to other structural information related to the content. Here again, most of the static items are copied from Plasma, and some of the dynamic items, such as the default strata (i.e. the temporal segmentation of a program), are produced automatically. Media is the type of the publication master file, i.e. internet, music, radio, special effects, TV or still image. Series number is copied from Plasma and contains the number of the part in the program series. Type indicates the medium on which the program has been archived, or whether the program was a live broadcast. Main class, sub class and combined class describe the genre of the program. Reference strata is meant to contain the IDs of the archived clips which have been attached to the program, but this strata was not activated in the current affairs editorial office.
Table 2 Metadata structure at Yleisradio

- Content metadata (low-level): Low-level features, which are typically automatically extracted from the content, such as color, texture, shape and motion for video data etc.
  Fields: Aspect ratio, Color system, Color
- Content metadata (high-level): High-level features, which are typically created by a human, such as annotations, keywords, reviews, ratings, and links to related material.
  Fields: Broadcasting name (Finnish), Language, Broadcasting name (Swedish), Production name, Version name, Name, Subject, Press release, Keywords, Production staff, Performers, Subject strata, Key words strata, Image strata, Interviewee strata, Production staff strata, Video strata, Audio strata, Subtitle strata
  Further fields: Additional information (tape), Description of attached files, Additional information (sound), Additional information (sound file), Subtitle broadcasting name, Original name, Subtitle information, File name (subtitle), Language (subtitle), Additional information (subtitle), Attached files
- Content metadata (structure): All types of structure, organization or arrangement that may be present in one or more multimedia assets, such as spatial or temporal segmentation, audio and video streams etc.
  Fields: Media, Series number, Version, Type, Main class, Sub Class, Combined Class, Default strata, Reference strata
  Further fields: Master starting time, Master ending time, Duration, Subtitle starting time, Subtitle ending time, Duration (subtitle), Compression (sound), Sample frequency (sound), Bit depth (sound), Channel order (sound), Loudness, Sound files, Format (sound), Compression (sound file), Sample frequency (sound file), Bit depth (sound file), Channel order (sound file), Loudness (sound file), Series number (subtitle), Essence packages, Essences in package, Properties of ASF, Timecode leaps
- Content metadata (life-cycle): Information gathered along the content life-cycle about the process used in the value chain, regarding acquisition, scripting, recording, editing, mixing, archiving, producing and coding.
  Fields: Origin, Broadcasting data, Completion time, Archiving date (film), Archiving date (insert), Ready for archiving, Annotator, Saving date, Archived, Archivist, DOKPVM, DOK source, Generator, Generating date, Name of metadata modifier, Modifying date, Editing state, Create new program to Plasma, Program is created to Plasma
  Further fields: Broadcasting information (Plasma), Broadcasting information (historical), Production phase, Avid import, Translators/subtitle editors, Completion date (subtitle), Channel (subtitle), First broadcasting date (subtitle), Ingest user, Ingest date, Exporting, Exporting date
- Content identification and location metadata: Information to identify and locate the content, such as identification labels and links.
  Fields: Program ID, Production number, TVAR-ATKN, Production number (old), Plasma ID, Media ID, Source ID
  Further fields: Archived tape, Sound pair, Production number (subtitle), Tape ID (subtitle), Locations (essence), Location details (essence)
- Content management metadata: Information useful for the efficient management of the data in terms of rights, such as expression of rights, protection metadata and governance.
  Fields: Additional information, Producing cost center, Insert rights, Compensation on rebroadcasting, Limitations on use and broadcasting, Usage rights, Image sources strata
Lifecycle fields include status information necessary for the production process, such as the “Ready for archiving” field, which indicates to the archivists that the annotation is complete and the program can be archived. Lifecycle items also record the people who have been responsible for different actions in the production process, such as the annotator, archivist and metadata modifier. Dates describing the production process are included as well, in the completion time, archiving date and modifying date fields. The information in the archiving system can be used to create a new program object in the planning system (Create new program to Plasma) and to get a confirmation that the program has been created (Program is created to Plasma).
Content identification and location metadata are imported to the system from Plasma in most cases.
The content management metadata are fields which can be modified by the annotators and,
importantly, should be updated during the production process. These fields are also actively checked
when searching for archived material. Insert rights describe the rights to use the content item in other
productions. Compensation might be required in the case of rebroadcasting, which is indicated in the
compensation field. Limitations on use and broadcasting are also indicated and more detailed
information can be given in the Usage rights field.
2.3.2 Metadata lifecycle
The essence and metadata lifecycle at the broadcasting company is visualised in Figure 3.
The planning phase is a critical phase in the production. Much of the metadata is created in the planning phase and is transferred through the whole essence and metadata lifecycle. Archived material is also searched for and used as early as the planning phase.
New essence is created in the capture phase or received from external sources. Video material is captured by the cameraman and in some cases by the journalist. The raw clips are ingested into the Interplay system by the journalist with specific software. Not all of the clips are ingested, however, as journalists tend to save a lot of raw material to their own repositories, e.g. external hard drives with video files. In the capture and ingest phase, high-level metadata in particular is created, as clips are named according to given instructions and the journalists’ preferences. This metadata is used only for production purposes and does not reach the archiving phase. In addition to the planning of future programs in Plasma, there is another planning cycle within the journalistic production process, where the detailed content of a program is planned. An editorial system, iNews, is used for this purpose. A lot of metadata is created in this phase and not used later, as reflected in the metadata line inside the iNews system.
In the pre-selection phase the journalist pre-selects suitable material and possibly edits it. Archived clips from the content repository are also used. Essence is edited and composed into a finished program in the editing phase. Metadata is produced in all of these phases. In the script writing phase the journalist produces a lot of information related to the program content. The journalist uses this information for the journalistic process, and it is not necessarily forwarded as descriptive information into the metadata lifecycle. In the editing phase the editor names the finished sequences according to certain naming rules.
Figure 3 Essence and metadata lifecycle at Yleisradio
The program is played out with the Pebble Beach broadcasting automation system. Live broadcasts are archived from a capture of the broadcast stream. After the delivery of the program, the final content annotation is done by the broadcast assistant. Annotation in this phase deals with high-level content metadata, lifecycle metadata and content management metadata. The structure might also be changed if the segment limits need to be adjusted for annotation purposes. Annotation is a time-consuming phase, as time-dependent content is annotated segment by segment.
When annotation is complete, the program is archived. The essence lifecycle does not end in the archiving phase: the essence is stored in the archive and might be reused for the production of another program. The archivists check the metadata in the archiving phase, and sometimes ask for corrections or add some items, such as keywords for important segments. However, not all content is archived. Journalists and broadcast assistants are responsible for deleting essence gathered during production from Avid’s servers. This material includes raw clips or archived material which has been transferred from the media asset management system Metro to Avid’s servers for the editing phase. It will be discarded, but the raw clips might be archived by the journalists in separate content repositories.
The metadata lifecycle is in general well managed. The systems are integrated and metadata exchange between different modules is implemented. However, there are some parts where double entry of metadata happens, especially between the planning of an individual program and its annotation phase. As an example, the production staff and interviewees for different segments are entered into iNews, the editorial system. Entering this information before delivery is imperative, because the names of the interviewees and production staff are shown in the program. However, in the annotation phase the same information needs to be rewritten into the media asset management system Metro. Also, the program needs to be segmented in the annotation phase, even though the program parts have already been listed with time codes in the editorial system. This data could be used as an annotation aid if the systems were integrated. In news production the editorial system is connected to the media asset management system, and the segmentation is done automatically based on the information retrieved from the editorial system. Finally, when archived material has been used in a program, it would help annotation if the ID, program name and broadcasting date of the old material could be transferred automatically to the strata where the archived clip has been used. At the moment broadcast assistants need to find out and enter the origin of the clips manually.
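The integration suggested above, reusing the time-coded program parts already listed in the editorial system, can be sketched as a function that turns a rundown into draft strata for the broadcast assistant to check. This is a hypothetical Python sketch: the `RundownItem` fields, the HH:MM:SS timecode format (frames ignored) and the stratum names are assumptions, not the iNews or Metro data formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def tc_to_seconds(tc: str) -> int:
    """Convert an 'HH:MM:SS' timecode to seconds (frames are ignored in this sketch)."""
    hours, minutes, seconds = (int(part) for part in tc.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class RundownItem:
    """Hypothetical shape of one program part as listed in the editorial system."""
    title: str
    start_tc: str
    interviewees: List[str] = field(default_factory=list)
    # Archived clips used in this part: (clip ID, original program name, broadcast date).
    archive_clips: List[Tuple[str, str, str]] = field(default_factory=list)


Segment = Tuple[int, int, str]  # (start in seconds, end in seconds, text)


def prefill_strata(items: List[RundownItem], program_end_tc: str) -> Dict[str, List[Segment]]:
    """Derive draft segmentation, interviewee and clip-origin strata from a rundown.

    Each part is assumed to last until the next part starts; the last part ends
    at program_end_tc.
    """
    starts = [tc_to_seconds(item.start_tc) for item in items]
    ends = starts[1:] + [tc_to_seconds(program_end_tc)]
    draft: Dict[str, List[Segment]] = {"default": [], "interviewee": [], "origin": []}
    for item, start, end in zip(items, starts, ends):
        draft["default"].append((start, end, item.title))
        for name in item.interviewees:
            draft["interviewee"].append((start, end, name))
        for clip_id, program_name, date in item.archive_clips:
            draft["origin"].append((start, end, f"{clip_id} / {program_name} / {date}"))
    return draft


rundown = [
    RundownItem("Introduction", "00:00:00"),
    RundownItem("Interview", "00:02:30", ["Interviewee X"],
                [("CLIP-001", "Earlier program", "1998-05-01")]),
]
print(prefill_strata(rundown, "00:10:00")["origin"])
# [(150, 600, 'CLIP-001 / Earlier program / 1998-05-01')]
```

The output is only a draft: the annotator would still review it and adjust segment limits where needed, but the names and clip origins would no longer have to be retyped.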
2.4 Comparisons of metadata models and lifecycles
Compared to the original model by Mauthe and Thomas (2007), the phases in the essence and metadata lifecycle are described here at a higher level of abstraction. On the other hand, some phases are emphasized more than in the model. Planning was seen as an important phase in the research partners’ processes. However, in neither case was planning fully integrated into the metadata lifecycle. In the image agency planning was done in a separate system, and no metadata was entered into the content management system. In the current affairs editorial office of the broadcasting company, the program planning phase was carried out with the editorial system, which was not integrated with the other components of the digital asset management system. This causes re-entry of metadata into different systems. In both companies the planning phase relied on separate systems, making it inherently difficult to integrate the phase into the metadata workflow.
The original process model does not include ingest as a separate production phase. However, in both case companies ingesting was named as a separate phase in both the content and metadata lifecycles. This was due to the specific content- and metadata-related operations and rules that existed to govern this phase. In the partner enterprises specific actions were taken in the ingesting phase, and software components were dedicated to handling, for example, metadata creation. In the image agency metadata
is processed automatically when content is ingested. In the broadcasting company there are, for example, instructions on naming the clips when ingesting the content.
In the models representing the partner companies’ processes, actions related to the production of the content are not separated into production and post-production steps. However, in the program planning phase, as well as in the elaboration phase of the process model, the media asset management system is used for retrieval of existing content to help the planning of future productions. The content lifecycle in the image agency differs in that when content is searched for, it is mostly sent to clients, or collections of images are made upon clients’ requests. Existing content is not used much in planning beyond recognizing gaps in the archives.
The essence and metadata lifecycles of the image agency and the broadcasting company differ with regard to annotation. In the process model, annotation is included in the post-production steps. In the image agency the annotation is done before the delivery of the content. It is important that the metadata in the images is sufficient and correct when delivered to clients; metadata is actually part of the product the image agency is selling. In the broadcasting company the actual annotation is done after the delivery of the content, i.e. after the program has been broadcast. Annotation is done in great detail for the time-dependent media and is time-consuming. The annotation serves above all the reuse of content.
In the image agency archiving is done simultaneously with delivery to clients, but some images are also chosen specifically to be archived. As mentioned earlier, there is also an automated process for archiving content from certain sources and for discarding unused content. In the broadcasting company archiving is done by specialized staff. Metadata is checked in the archiving phase, which thus forms a step in the metadata lifecycle. Archiving is not mentioned in the original process model but was considered here to be a step which should be separated in the essence and metadata lifecycle. Discarding is present in both of the partner companies’ processes. Broadcast assistants and journalists are responsible for discarding essence from the servers used in content production.
2.5 Process requirements for content annotation
The goals, principles and requirements for annotation were studied in the partner companies. To this end, interviews with the actors involved (broadcast assistants, journalists, archivists) and observations of the annotation process were conducted.
According to the results, in the partner companies’ workflows, metadata should:
- aid content discovery through description and categorization for searches
- fill information needs arising during production (related to content or metadata)
- facilitate the journalistic processes (e.g. editing)
- promote content to clients and support their production processes
- enable monetary compensation
Most of the issues related to the goals of content annotation in both companies concerned searching
and finding the content. The most important part of the metadata lifecycle could be said to begin after
the content has been archived, i.e. when metadata enables accessing the content. It seemed that finding
the right content was challenging in both of the research partners’ processes. Selecting a suitable number of the most relevant keywords or search terms was challenging at both the annotation and the search ends.
The results also revealed other interesting goals for content annotation beyond enabling access to content. Editing the manuscript, serving production and information needs, and monetary compensation were mentioned as goals of content annotation in the broadcasting company. Metadata and annotated information were used extensively by the journalists in the broadcasting company for editing the manuscript. Annotating the content in adequate detail was important for the production process. For example, naming the raw material clips in reasonable detail is important for the editing phase.
In the image agency, metadata was used for promoting the content. Image categories were created and displayed in the web shop with the aim of introducing potential customers to the types of content available. Themed groups of images could also be used in retrieval to narrow down searches which currently produce larger result sets than searchers hope for. In the image agency the metadata lifecycle extends from annotating the content to supporting the customers’ processes that use the content. Metadata was thus added specifically to support the customer, such as information on the story for which the images were chosen.
The generation of metadata needs to keep up with the accelerating pace of content production. It is, however, dependent on the transfer of knowledge in the production process. For example, in the broadcasting company the broadcast assistants rely in their annotations on information given by the journalists orally, by email or, for example, as typed lists of the archived clips used. More information from the content creators to the annotators was a clearly stated requirement in the results. The roles in content annotation are slowly changing. In the image agency it was a requirement that everybody participating in the production process handles metadata.
It was mentioned that metadata needs to be sufficient rather than complete. This is related to the fact that content should be in a sellable state as fast as possible. In the broadcasting company’s internal searches, the availability of audiovisual content was thought to decrease the need for extremely detailed annotation, because the content can be viewed by the searcher immediately. To this end, content annotators need more training in order to learn new ways of annotation that match the new crossmedia retrieval paradigms implemented in Metro.
An important requirement for content annotation in a complex production system is that the systems should be interoperable. Double entry of the same metadata is still present in many stages of the workflows, straining the staff and taking up valuable time. In the image agency the issues were related mostly to the planning phase. In the broadcasting company, problems in the integration of the production tools and the digital asset management system were discussed. The metadata models used by the companies have evolved over time. This is an important issue to consider when building the systems for annotation and for content and metadata archives. These systems also need to support language versions.
Increased automatic generation of metadata was mentioned as an approach to content annotation. Automatic generation most often meant automatically copying or transferring metadata which existed in
another system, whether static or time-dependent, but in the case of the image agency, methods for automatically annotating images based on image features were considered interesting.
Requirements for content annotation also included tools for helping the annotation, such as spell
checking for the fields where content description is added. User interface requirements were present as
well, such as the intuitiveness of the user interface and similarity of annotation interfaces in the case
where multiple systems are used for annotation.
3 Crossmedia search and selection criteria
3.1 Video searching in television program production process
The research partner YLE is in the middle of a changeover from many separate traditional archives to a media archive integrating digital video, audio, image and text in one system. The implementation of a digital media archive has considerable effects on the documentation and searching methods and practices for different media, and it may change the work tasks and collaboration related to those processes.
Traditional media archives are based on a text database of annotations of video documents and a mechanically operated repository of video tapes. The user first queries the database containing the textual surrogates, then orders the video tape of interest from the tape repository and browses the tape by replaying it on a special device. The search and selection is often performed collaboratively: for example, a broadcast assistant searches the database and a journalist watches the video tapes and selects the relevant clips. The process has been observed to be time-consuming (Kauranen 2009, Markkula & Sormunen 2006, Tan & Müller 2003).
Figure 4 illustrates the work process of television program making in a traditional broadcast archive. The work process consists of six stages: ideating, planning, shooting, pre-selecting video, script writing and editing. The program idea and the script are processed and developed during the whole work process. Video needs may arise, and videos may be searched for, at every stage of the process. The volume of acquired video material (or paper prints of video descriptions) grows at first. The material is pruned throughout the process, but much of the pruning takes place in the later stages of the work process, when the actual videos are assessed. In all, the search process involved much piling and pruning of both textual listings of video documentation and video tapes (Kauranen 2009, Markkula & Sormunen 2006, Tan & Müller 2003).
[Figure: the script develops from idea to outline to final script while a work pile of video (archive/shot) accumulates across the stages of the work process (ideation, planning, shooting, preselection, script writing, editing); horizontal axis: time]
Figure 4 The stages of the work process in making a television program according to Markkula & Sormunen (2006)
Because audiovisual content could be browsed only late in the process, the search process was uncertain: many candidate tapes were ordered because selection was impossible on the basis of text surrogates. In the integrated crossmedia archive, the user has instant access to the content, which presents new opportunities and challenges for both automated and manual video indexing, searching, browsing and visualization methods.
In large and heterogeneous video archives, textual metadata has a major role in video retrieval. This includes cataloguing-type metadata (identification, origin, location, authorship, lifecycle, etc.) and content-description-type metadata (annotations, index terms, classification codes, etc.). Content descriptions are based on the intellectual, manual effort of analyzing entire video documents. Large-scale automation is not possible with state-of-the-art video IR technology, even though current IR methods (e.g. automatic segmentation of video documents) can be applied to improve the efficiency and quality of metadata production.
Since manual metadata production is time-consuming and expensive, it is important to study what possibilities improved access to the essence of video documents offers for redesigning metadata production. It is quite obvious that more efficient tools for browsing video and audio content decrease the need for textual content descriptions. However, before making any major changes in archival practices, we need to know better which criteria are important for users in formulating queries and assessing retrieved items at the browsing stage.
In this section, we report (1) what kinds of search and selection criteria TV professionals apply to archive video material and (2) how these criteria evolve during the search process. The user criteria identified in the study may suggest potential access points for videos to be considered when redesigning metadata production. The user criteria are studied in the context of real work tasks in a traditional TV broadcast archive and identified at three different points of the search process. We base our analysis on the task-based approach (see Ingwersen & Järvelin 2000) because the journalistic work
primarily sets the requirements for the material searched, and it is probably the most stable component of the changing environment.
3.2 Model of video attributes
To be able to map users’ search and selection criteria onto the metadata schema applied in a crossmedia archive, we need an appropriate classification template. Many classification models of image attributes and considerations for image indexing have been suggested in the literature. Nearly all researchers in the field have applied Shatford’s (1986) faceted classification of image subject attributes (e.g. Armitage & Enser 1997, Enser 1993, Herzum 1993, Markkula & Sormunen 2000, Sandom & Enser 2003, Westman & Oittinen 2006).
Shatford (1986) suggests, based on the works by Panofsky (1962) and Markey (1983), four subject
facets in image analysis: Objects (Who), Activities and events (What), Place (Where) and Time
(When). Within these facets, an image may be interpreted to represent both concrete and objective
entities (ofness, e.g. objects, places, actions) and abstract and subjective entities (aboutness, e.g.
themes, feelings, concepts manifested or symbolized by objects). Moreover, Shatford states that an
image is simultaneously specific and generic. For example, an image of the Brooklyn Bridge is at the same time an image of a specific bridge, i.e. the Brooklyn Bridge, and an image of a bridge in general.
Shatford’s model is only a starting point for our analysis, since it was originally developed for still images and focuses on subject indexing, excluding contextual metadata. The basic idea is that users’ search and selection criteria are values of video document attributes and may be assigned to the facets presented in the model. The analysis template was developed in the course of the data analysis, and new attribute classes were added as they emerged from the data. The resulting analysis template with examples is presented in Appendix 1.
Searchers’ criteria were clustered into three main classes: (1) video and audio content related criteria, (2) document related criteria, which are usually quite objective data (e.g. tape location, copyright, format), and (3) context of use criteria, which relate to the searcher and the search situation rather than to the video document (e.g. enough material already, various kinds of video) and are thus practically impossible to consider in documentation. Very general expressions of selection criteria, such as “whole content” or “the topic”, were excluded from the analysis.
All objects, events and places specified by proper names were classified into the named object, event and place facets, regardless of how “unique” the places were. This follows the practice of Armitage and Enser (1997). New content facets were created for audio, shooting technique (e.g. close-ups, black and white videos and videos with an aerial view) and technical qualities (e.g. graphics on the image). About-level concepts were grouped into two: (1) themes (e.g. food chain, vacation, attitudes of Finns to Africa and Africans in the past) and (2) impressions, including emotions awoken by videos (e.g. funny, memorable) or expressed by people in the video document (e.g. hungry). The reason for this distinction is that indexing of impressions, feelings etc. in images is often regarded as quite subjective and inconsistent. For indexing themes, thesauri and subject heading lists are usually applied if available.
All linear time expressions were classified into the linear time facet, whether they were decades, specific dates or months given by a journalist for a broadcast assistant to locate the video material, for example, “it was last spring when there was about that in news…”. Expressions such as “last…” and “when…” were also assigned to the linear time facet, e.g. “last drought in Sudan”, if it was possible to find dates or years for the event.
To give an example of the facet analysis, the search topic “Savoy, Seurahuone, Adlon 1930s, restaurant milieu of that time” was analyzed as comprising three facets: 1) named place facet (Savoy or Seurahuone or Adlon), 2) generic place facet/indoors (restaurant milieu) and 3) linear time facet (1930s).
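The facet analysis of this example topic can be written down as a small data structure. The following Python sketch is illustrative only: the `Facet` class and the three-way `Level` split (specific, generic, about) are a simplification of the analysis template in Appendix 1, and combining facets with AND while keeping the alternatives within a facet as OR is an assumed query convention, not a description of any archive's search engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Level(Enum):
    SPECIFIC = "specific"  # named people, objects, events, places; linear time
    GENERIC = "generic"    # types of people, objects, events and places; cyclic time
    ABOUT = "about"        # abstract themes and impressions


@dataclass(frozen=True)
class Facet:
    facet_type: str          # label from the analysis template, e.g. "named place"
    level: Level
    values: Tuple[str, ...]  # alternatives within one facet (combined with OR)


# The example topic "Savoy, Seurahuone, Adlon 1930s, restaurant milieu of that time".
topic: List[Facet] = [
    Facet("named place", Level.SPECIFIC, ("Savoy", "Seurahuone", "Adlon")),
    Facet("generic place/indoors", Level.GENERIC, ("restaurant milieu",)),
    Facet("linear time", Level.SPECIFIC, ("1930s",)),
]


def facets_by_level(facets: List[Facet]) -> Dict[Level, int]:
    """Count facets per level (specific, generic, about)."""
    counts = {level: 0 for level in Level}
    for facet in facets:
        counts[facet.level] += 1
    return counts


def to_boolean_query(facets: List[Facet]) -> str:
    """Assumed convention: OR within a facet, AND between facets."""
    groups = ["(" + " OR ".join(f'"{v}"' for v in facet.values) + ")" for facet in facets]
    return " AND ".join(groups)


print(to_boolean_query(topic))
# ("Savoy" OR "Seurahuone" OR "Adlon") AND ("restaurant milieu") AND ("1930s")
```

Encoding each criterion as a typed facet is what makes the tallies reported later possible: the example topic contributes three facet incidences, two at the specific level and one at the generic level.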
3.3 Crossmedia search criteria
3.3.1 Search topic categories
The 24 timeline interviews generated 79 search topics in all. First, the search topics were categorized by their general focus, and four types of search topics were identified:
1. Search topics in which the video image content was defined, e.g. “Armi Ratia (a Finnish beauty queen) in 1960’s”, “rain forest from the air”, “lab picture of plant improvement, plant cutting etc.”.
2. Search topics in which the audio content of the video was defined, e.g. “X and Y (Parliament representatives) commenting how embarrassed they are to get a rise in their salary”.
3. Search topics in which the video topic was defined but no image or audio content was specified, e.g. “latest drought in Sudan”, “Aimo Tukiainen, all material”. Some of these were very open, such as video clips representative of a particular time period, e.g. “wartime 1939-1942”.
4. Search topics focusing on known video clips in programs or on specific program series, e.g. a task in which a jubilee program of a magazine was prepared and memorable moments of various insert series in past years’ broadcasts were looked for.
Table 3 presents the distribution of the different search topic types in our data. Visual content was defined in two-thirds of the topics. Topical searches were the second most common search topic type (20%). Known items and program series were the focus of 10% of the search topics, and sound content was the focus of two search topics.
Table 3 The distribution of different types of search topics in timeline interviews
Search topic type                   #     %
Image content defined              53    67
Sound content defined               2     3
Topical                            16    20
Known item and program series       8    10
Total                              79   100
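The percentage column of Table 3 follows directly from the counts. The short Python check below recomputes it, assuming the table rounds each share to a whole percent.

```python
# Search topic counts from Table 3.
counts = {
    "Image content defined": 53,
    "Sound content defined": 2,
    "Topical": 16,
    "Known item and program series": 8,
}


def percentages(counts: dict) -> dict:
    """Share of each category, rounded to a whole percent (assumed table convention)."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


print(sum(counts.values()))  # 79
print(percentages(counts))
# {'Image content defined': 67, 'Sound content defined': 3, 'Topical': 20, 'Known item and program series': 10}
```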
3.3.2 Criteria in search topics
Next, a detailed facet analysis was performed for the search topics described by the interviewees. In total, 195 facets were identified in the descriptions of the 79 search topics, i.e. an average of 2.5 facets per search topic. The analysis template in Appendix 1 was used in the analysis. First, we calculated the occurrences of different facet types over the whole data. Second, we compiled the same data at the search topic level. Third, a task-level analysis was conducted. Results were presented using different units of analysis to check how potential outliers in the data affect the observed results; a single journalist conducting a large number of exceptional searches is an example of such an outlier.
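The three units of analysis can be computed from the same nested data: facet incidences are counted as they occur, while the topic-level and task-level figures count each facet type at most once per topic or per task. The Python sketch below is a minimal illustration with hypothetical toy data (not the study's data); the nested-list representation and the function name `criteria_distribution` are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple

Task = List[List[str]]  # a work task = list of search topics; a topic = list of facet labels


def criteria_distribution(tasks: List[Task]) -> Dict[str, Tuple[float, float, float]]:
    """Return, per facet type, its share of all facet incidences and the
    percentages of search topics and of work tasks in which it appears."""
    topics = [topic for task in tasks for topic in task]
    incidences = Counter(label for topic in topics for label in topic)
    total_incidences = sum(incidences.values())

    topic_hits: Counter = Counter()
    for topic in topics:
        topic_hits.update(set(topic))  # count a facet type at most once per topic

    task_hits: Counter = Counter()
    for task in tasks:
        task_hits.update({label for topic in task for label in topic})  # once per task

    return {
        label: (
            100 * incidences[label] / total_incidences,
            100 * topic_hits[label] / len(topics),
            100 * task_hits[label] / len(tasks),
        )
        for label in incidences
    }


# Hypothetical toy data: the first task repeatedly asks for tape locations (a potential outlier).
tasks = [
    [["tape location"], ["tape location"], ["tape location"]],
    [["linear time", "types of people"]],
    [["types of people"]],
]
for label, (inc, top, tsk) in criteria_distribution(tasks).items():
    print(f"{label:16} {inc:5.1f} {top:5.1f} {tsk:5.1f}")
```

In this toy data the tape-location facet accounts for half of all incidences but appears in only one of three tasks, which is the kind of distortion the task-level column guards against. The average of 2.5 facets per topic reported above is likewise 195 / 79 ≈ 2.47, rounded to one decimal.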
The results (see Table 4) show that the most frequent criteria in the search topics were generic types of people and objects (e.g. Eskimos, spiders, refrigerators), which formed a fifth of all criteria incidences, and linear time expressions (e.g. 1950’s), with a share of 16%. Named persons and objects, types of places (rain forest) and impressions (looking hungry, pleading) were also common. In all, generic types of criteria occurred slightly more often (38.5%) than specific criteria (31.8%), and the abstract about-level criteria of themes and impressions were clearly less frequent (13.3%) than concrete of-level criteria. Criteria relating to document context (e.g. program name, tape location) formed 9.2% of all criteria.
The criteria frequencies at the search topic level (N=79) show that types of people and objects appeared in 44% of the search topics and linear time expressions in 39%. These are again the most common criteria types. Generic types of facets were slightly more common than specific types. About-level themes and impressions, which formed only 13% of all criteria incidences, were nonetheless applied in about a third of the search topics. Document context criteria (document name, program series name, genre and tape location) were also common, appearing in over a fifth of search topics. Other context criteria (see Appendix 1) were missing from search topic definitions. Criteria relating to shooting technique, technical qualities and audio were rare in search topic definitions, as were criteria relating to use context.
Table 4 The distribution of user criteria in search topics using a) criteria occurrence, b) search topic, and c) work task as the unit of analysis.
Facet type                                    | % of all incidences | In % of search topics | In % of work tasks
                                              | (N=195)             | (N=79)                | (N=14)
CONTENT CRITERIA                              | 88.2                | 98.7                  | 100
  Specific facets                             | 31.8                | 54.4                  | 78.6
    Named person, object                      | 9.7                 | 22.8                  | 64.3
    Named event                               | 1.0                 | 2.5                   | 14.3
    Named place                               | 5.1                 | 12.7                  | 42.9
    Linear time                               | 15.9                | 39.2                  | 50
  Generic facets                              | 38.5                | 57.0                  | 71.4
    Types of people & objects                 | 20.0                | 44.3                  | 64.3
    Action, types of events                   | 8.7                 | 21.5                  | 42.9
    Types of places                           | 9.2                 | 22.8                  | 57.1
    Cyclic time                               | 0.5                 | 1.3                   | 7.1
  Aboutness                                   | 13.3                | 31.6                  | 57.1
    Abstract themes                           | 4.6                 | 11.4                  | 35.7
    Impressions                               | 8.7                 | 20.3                  | 35.7
  Shooting technique                          | 2.1                 | 5.1                   | 14.3
  Technical qualities (= colour)              | 1.0                 | 2.5                   | 14.3
  Duration                                    | -                   | -                     | -
  Audio specific                              | 1.5                 | 3.8                   | 21.4
DOCUMENT CONTEXT CRITERIA                     | 9.2                 | 22.8                  | 35.7
  Origin (here program or series name, genre) | 5.1                 | 12.7                  | 28.6
  Availability (here tape location)           | 4.1                 | 10.1                  | 7.1
  Documentation                               | -                   | -                     | -
  Earlier usage                               | -                   | -                     | -
USE CONTEXT CRITERIA                          | 2.6                 | 6.3                   | 14.3
TOTAL                                         | 100                 | -                     | -
In data like ours, which consist of a limited number of work tasks (14 in total), individual work tasks may have a strong effect on the total frequencies of some facet types. Some work tasks included many search topics, and a certain facet may be applied in all search topics emerging during such a task. The last column of the table shows in how many work tasks each facet type was applied. This analysis reveals that the criterion type availability, which occurs in 10% of search topics, was actually applied in only one work task. In this task, it was crucial to have all the video material quickly available in the nearby tape stock, and the facet tape location was applied in all eight search topics of that work task.
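The availability figures in Table 4 can be reproduced by hand. Assuming the availability incidences are exactly the eight tape-location facets of that single task (the 4.1% share of 195 incidences implies eight), the same eight codings yield three quite different shares:

```latex
\[
\frac{8}{195} \approx 4.1\,\% \text{ of incidences}, \qquad
\frac{8}{79} \approx 10.1\,\% \text{ of search topics}, \qquad
\frac{1}{14} \approx 7.1\,\% \text{ of work tasks}.
\]
```

Only the task-level share reveals that the 10% topic-level share stems from a single task.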
3.3.3 Selection criteria and evolution of criteria during search process
Table 5 presents the distribution of search and selection criteria. It illustrates how the focus of user criteria changes during the search process: from (1) search topic definitions, to (2) the first selection phase of studying textual metadata, and finally to (3) the second selection phase based on video image and audio.
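Because the three phases yield different numbers of criteria incidences (N=195, 70 and 60 in Table 5), criteria are compared as shares of each phase's own total rather than as raw counts. A minimal sketch of that normalization, assuming each coded criterion is stored as a hypothetical (phase, criterion class) pair; the counts are toy values, not the study's data:

```python
from collections import Counter

PHASES = ("search topics", "textual metadata", "video and audio")


def phase_profiles(incidences):
    """Percentage distribution of criterion classes within each phase.

    `incidences` is an iterable of (phase, criterion_class) pairs, one per
    coded criterion; the coding scheme here is hypothetical.
    """
    per_phase = {phase: Counter() for phase in PHASES}
    for phase, criterion in incidences:
        per_phase[phase][criterion] += 1  # unknown phase names raise KeyError
    profiles = {}
    for phase, counts in per_phase.items():
        total = sum(counts.values())
        if total:  # skip phases with no coded criteria
            profiles[phase] = {c: round(100 * n / total, 1) for c, n in counts.items()}
    return profiles


def peak_phase(profiles, criterion):
    """Phase in which a criterion class takes its largest share."""
    return max(profiles, key=lambda phase: profiles[phase].get(criterion, 0.0))


# Toy counts, not the study's data.
incidences = (
    [("search topics", "content")] * 9 + [("search topics", "document context")] * 1
    + [("textual metadata", "content")] * 6 + [("textual metadata", "document context")] * 4
    + [("video and audio", "content")] * 4 + [("video and audio", "context of use")] * 1
)
profiles = phase_profiles(incidences)
print(profiles["textual metadata"])
print(peak_phase(profiles, "document context"))
```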
Most of the subjects began their relevance assessments by reading the context of the search word, which was highlighted on the computer screen. They explained that they tried to find out how the topic was dealt with in the document: whether any illustration was available or whether the topic was, for example, only mentioned in the video document. The subjects said that many of the first-phase selections were based on the ‘feeling’ they got from the textual metadata and that a “lot of optimism” was often involved.
Content criteria were the criteria most frequently applied by users when defining search topics and when searching and selecting video material. Taking a closer look at the criterion types in the content class, search topics seem to focus on concrete attributes such as persons, objects, places and actions. Named objects, events, places and linear time were often used in search topic definitions, and presumably also in searching, because these criteria are seldom applied later in the search process. This is to be expected, since these attributes are usually well documented in metadata. Linear time is the most common criterion of this group and was also applied in the first selection phase, based on metadata.
Interestingly, types of people, objects and action, which are very frequent criteria in search topic definitions, seem to be applied in both selection phases. This suggests that these criteria are not always fully met by searching, so textual metadata and the videos themselves are also examined to meet them. On the other hand, some of the criteria in this group, such as ‘lot of people’ and ‘action’, are probably used as secondary criteria and are therefore left to the end of the search process, when more essential criteria have already been met.
Remarkably, cyclic time is almost absent from the criteria expressed by interviewees in the tasks described. Elsewhere, however, cyclic time as a criterion does receive attention. For example, a broadcast assistant commented during the timeline interview: “Then, one thing I have noticed, is that summer, winter, autumn, spring is in very few... You look for some landscape and it is summer and you go with the tapes and all are shot in winter time. But it is never said in the documentation. It would be good to remember to put the season there”.
Impressions were applied frequently in video selection and formed the most common content criterion type in the second selection phase. In all, they were used in more than half of the work tasks. Criteria concerning abstract themes, in contrast, were applied when reading textual metadata.
Shooting technique and technical qualities were applied mainly in selections based on the video image. Shooting technique involved criteria such as close-ups (which were mostly preferred), shooting angles and aerial views. Technical qualities concerned color-related criteria, technical quality and image manipulations. Text placed on the video image (e.g., the journalist’s or interviewee’s name) and image manipulations were common reasons to discard video documents when watching them. According to the interviewees, this information was quite often lacking from the textual documentation.
According to the interviews, the duration of video clips was a very common criterion and therefore probably an implicit one, not always mentioned in the interviews. Very short clips were rejected. The data show that duration as a criterion was applied in both selection phases. According to the interviewees, the duration of the video clip on the desired topic was not always available in the metadata, or the subjects found it difficult to interpret.
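Where duration is missing from the metadata or hard to interpret, it could in principle be derived from a clip's in and out points and displayed in one uniform format. The sketch below assumes HH:MM:SS:FF timecodes at 25 frames per second without drop-frame counting, and uses a hypothetical 5-second threshold for flagging very short clips; neither the timecode format nor the threshold comes from the study.

```python
FPS = 25           # assumption: non-drop-frame timecode at 25 frames per second
MIN_SECONDS = 5.0  # hypothetical threshold below which a clip is flagged as very short


def to_frames(timecode: str, fps: int = FPS) -> int:
    """Convert an 'HH:MM:SS:FF' timecode to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is out of range at {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def clip_seconds(tc_in: str, tc_out: str, fps: int = FPS) -> float:
    """Clip duration in seconds, treating the out point as exclusive."""
    frames = to_frames(tc_out, fps) - to_frames(tc_in, fps)
    if frames < 0:
        raise ValueError("out point precedes in point")
    return frames / fps


def describe(tc_in: str, tc_out: str) -> str:
    """Human-readable duration label for display next to the clip's metadata."""
    seconds = clip_seconds(tc_in, tc_out)
    minutes, rest = divmod(seconds, 60)
    label = f"{int(minutes)} min {rest:.1f} s"
    return label + (" (very short)" if seconds < MIN_SECONDS else "")


print(describe("10:02:15:00", "10:03:40:12"))
print(describe("10:00:00:00", "10:00:03:10"))
```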
Table 5 The distribution of user criteria at three different phases of search process: (1) search
topics, (2) selections based on textual metadata and (3) selections based on video image and
audio
Criteria type
CONTENT CRITERIA
Specific / Named
Named person, objects
Named event
Named place
% of criteria incidences: Criteria in search topics (N=195) | Selection criteria based on textual metadata (N=70) | Selection criteria based on video image and audio (N=60)
88.2
31.8
82.5
3.2
61.9
11.9
9.7
1.0
5.1
1.2
1.2
1.2
3.2
-
Linear time
15.9
Generic / Type of
Type of people, objects
Action, type of event
Type of place
Cyclic time
About
38.5
8.3
23.8
20.0
8.7
9.2
0.5
13.3
Abstract theme
Impression
Shooting technique
Technical qualities
Duration
Audio specific
17.5
10.7
6.0
7.1
10.7
4.6
8.7
7.9
7.9
1.6
19
3.6
7.1
19
2.1
1.0
1.5
2.4
6.0
6.0
14.3
14.3
7.9
6.3
DOC. CONTEXT CRITERIA
Availability
Origin
Documentation
Earlier usage
9.2
34.5
1.6
CONTEXT OF USE CRITERIA
TOTAL
2.6
100
5.1
4.1
15.5
14.3
3.0
2.4
3.6
100
1.6
15.9
100
Audio content was a criterion in five tasks (statements in two, spoken language in two and music in one). The availability of an audio track was a criterion in three tasks and audio quality in one. Audio content was the main focus in two search topics. Searching for audio content was often regarded as difficult. Broadcast assistants and archivists sometimes documented statements by politicians and others that they believed would be needed later. Statements were usually searched for using person names, events and linear time as search criteria, so searching relied on memory.
Document context criteria were applied in 11 of the 14 tasks and accounted for 34.5% of all criteria applied in the first selection phase. These attributes, which relate to document availability (copyright, format, location), metadata (well documented, “good image” documented), origin (name, genre, production company) and previous usage (“used too much”), are usually available in the textual metadata. The most common criterion in this group was copyright, which was employed in eight of the 14 tasks. In fact, copyright is a criterion in most tasks. However, according to the interviews, many journalists find it difficult to interpret copyright information in the metadata and leave this criterion for the broadcast assistant to apply. Thus, copyright as a criterion is missing from some of the tasks described by the journalists.
Context of use criteria were employed in half of the work tasks in our data. These criteria are applied mostly in the second selection phase and often in the editing phase of the program. The most common criterion, applied in four tasks, was a preference for varied video clips and dropping similar video documents. Other criteria in this group were “have enough material already” and “compatibility of video clips with other video material”.
3.4 Crossmedia search needs and selection criteria
The results suggest that user criteria in video searching and selection focus on video content. Content criteria are applied frequently in search topic definitions as well as in both selection phases. The results show a slight dominance of generic needs (types of objects, places and action) over specific needs (named persons and objects, events, places and linear time).
Most search topics (67%) focus on concrete visual content rather than on topical or document context attributes. The facet analysis confirms this observation: concrete named and generic persons, objects, places, action and linear time dominate users’ search topic definitions. Impressions are also common criteria, appearing in a fifth of the search topics. A few audio-related criteria were also found.
In video selection, some new document attributes become important and some fade away. Document context criteria formed a considerable group of selection criteria and were applied almost solely in the first selection phase, when studying the textual metadata in which these attributes are available. Copyright, genre and program name are the most common criteria in this group.
By contrast, subjective criteria such as impressions, and criteria in the class ‘context of use’, are applied clearly more often in selections based on video image and audio. In all, impressions, e.g., facial expressions of people or overall feelings conveyed by video documents, form an important group of selection criteria. According to the interviews, criteria connected to impressions are also used in searching, but the subjects often found these searches troublesome. Because of the subjective nature of these criteria, documenting impressions in text and finding the right search keys when searching for them is difficult.
Other types of criteria applied almost solely when watching video documents relate to shooting technique and technical qualities. These are highly visual criteria concerning shooting angles, distances, color, technical quality and text or graphics placed on the video image.
The user criteria were studied in the context of video seeking in a traditional broadcast archive. However, the timeline interviewing method (see Schamber 2000) and the task-based approach adopted allow the identification of the underlying criteria originating from the requirements of journalistic program production. The journalistic work tasks in program making are probably the most stable component of the changing environment. Therefore, the results are largely independent of the search system in use.
4 Conclusions
The results on metadata workflows reflect the different operating environments of the two companies. The image agency serves both internal and external clients with its content repository, whereas the broadcasting company mostly uses its content repository for internal, production-related content needs. The production process could also be said to be more complex at the broadcasting company.
A comparison of the metadata models of the two companies highlights the differences in the essence described. At the television company, the emphasis on time-dependent metadata increases the number of metadata fields in the model and guides the annotation of the content according to the strata. The image agency’s model, on the other hand, emphasizes caption-like descriptions of the image together with fields adhering to industry standards, a requirement in the minute-by-minute content delivery
from international sources to many national clients. The issues related to content annotation at the two companies were very similar. Metadata is seen as auxiliary to the key processes of producing and selling the content. All in all, metadata was seen to serve multiple roles with potentially conflicting goals. The metadata models are still under development, and work practices are evolving correspondingly. Issues remain with overlapping metadata workflows and with training staff in different roles in the process.
The analysis of search topic descriptions and of selection criteria applied both when reading textual metadata and when watching audiovisual content revealed the importance and role of particular metadata categories. In search topic descriptions, users focused on specific entities (named persons or objects, events and places), generic entities (types of people or objects, events, action and places), abstract themes (leaving content open) and linear time. When reading textual metadata, users paid some attention to specific and generic content, but the documents’ contextual attributes became more important. In the watching phase, users checked audiovisual features: what the content looks like (generic objects), what emotions it might arouse (impressions), what shooting techniques were applied, whether the technical quality is acceptable, and so on. The findings suggest that the most frequently mentioned search criteria, with the exception of linear time, point to annotated metadata fields (Subject strata, “aihe” [subject]; Key word strata, “sisältö” [content]). Assigning annotations to these fields currently requires considerable resources in metadata production. Unfortunately, state-of-the-art technology for automatic video and audio analysis cannot solve the problem of identifying named or generic entities in unrestricted video material.
Based on the two studies reported here, several obvious avenues for redesigning metadata production should be investigated to find ways to improve cost-effectiveness:
1. Redefining the annotation process. For efficient metadata production, content annotation should begin early in the production process, and the first high-level content descriptions should originate from the producer of the essence (e.g., the photographer). The systems used need to support the flow of metadata throughout the production process. This would ensure that the knowledge gained in planning and shooting is retained with the content and would avoid double entry of descriptive information into separate systems.
2. Goals and quality of annotation. Metadata was seen to serve multiple internal and external functions. If annotation is to be as fast as possible, both to enable rapid access to content (in content sales) and to minimize the time spent annotating (internal workload), the quality criteria for annotation need to be specified for different types of goals and for different stages of the process. This will also lead to changes in the metadata workflow.
3. Utilizing auxiliary information. The planning function was recognized as important in both companies’ production processes. However, the information resulting from the planning work could be utilized more efficiently to aid not only the resulting essence production but also content annotation. Furthermore, the journalistic process produces texts such as program scripts, introductions and news reports that are not exploited in constructing the archive. The linkage between these texts and the final audiovisual content may be indirect, but it could still offer new options for building additional text indexes for the archive. To this end, the collection of planning and editing information and its integration with the content need to be implemented in the systems.
4. Annotating for queries. In the past, audiovisual content has been annotated with text to help users preselect video without seeing it, for example to imagine what the objects in the moving image look like and what impressions they might evoke. In a crossmedia archive, this is no longer necessary, since users can easily check the audiovisual content itself.
5. Focused areas for video analysis. The study revealed several potential applications of video analysis to focused identification problems, for example identification of faces (close-ups), the number of people (or faces), text and graphic elements attached to the image (which can make it unusable) and the context of shooting (indoor/outdoor, aerial view, summer/winter, day/night). With these types of video analysis, users could be offered a preselected list of available restrictions to apply in querying.
6. Additional text indexes by speech recognition. Speech dominates in many TV program genres. The reliability of automatic speech recognition systems is high enough for building additional indexes for free-text searching. Combined with fuzzy keyword matching techniques, the approach could improve searching of non-annotated programs and work well in program genres made in a standard studio environment.
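Item 5 above describes what is essentially faceted filtering: detector outputs are stored as clip attributes, the interface lists the values present in the current result set together with counts, and choosing a value narrows the set. A minimal sketch, in which the attribute names and values stand in for the output of hypothetical face, caption and scene detectors:

```python
from collections import Counter

# Hypothetical per-clip output of face, caption/graphics and scene detectors.
clips = [
    {"id": "a", "setting": "outdoor", "season": "winter", "faces": 0, "text_overlay": False},
    {"id": "b", "setting": "outdoor", "season": "summer", "faces": 2, "text_overlay": True},
    {"id": "c", "setting": "indoor", "season": None, "faces": 1, "text_overlay": False},
    {"id": "d", "setting": "outdoor", "season": "summer", "faces": 0, "text_overlay": False},
]


def available_restrictions(results: list[dict], attribute: str) -> Counter:
    """Values of one attribute present in the current results, with clip counts.

    Clips for which the detector gave no value (None) are left out of the list.
    """
    return Counter(clip[attribute] for clip in results if clip[attribute] is not None)


def restrict(results: list[dict], **conditions) -> list[dict]:
    """Keep only clips whose attributes equal every requested value."""
    return [clip for clip in results if all(clip[key] == value for key, value in conditions.items())]


print(available_restrictions(clips, "season"))
usable = restrict(clips, setting="outdoor", season="summer", text_overlay=False)
print([clip["id"] for clip in usable])
```

Leaving clips without a detected value out of the restriction list, rather than showing them as a separate value, is a design choice; an archive might prefer to expose "unknown" explicitly.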
Current approaches to distinct phases and systems in the essence and metadata workflows need to be rethought from the perspective of annotation. The metadata workflow is not fully integrated into all phases (e.g., planning) or systems (e.g., separate editorial systems), causing heavier workloads than necessary. Users need various types of textual metadata when searching and selecting materials in crossmedia archives. The present annotation practices should be carefully analyzed and redesigned, since 1) the current workflow does not fully utilize available information, 2) different annotation approaches may be required for different goals and phases in production (e.g., outside sales, internal searches), and 3) users’ instant access to audiovisual content makes some sub-goals of the traditional approach outdated. There are some promising possibilities for developing automatic methods of metadata production based on video and audio analysis, which should be tested.
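For item 6 in the list above, fuzzy keyword matching can partly compensate for recognition errors in automatically transcribed speech. The sketch builds an inverted index from transcript words and expands each query word to similar index terms before lookup. It uses Python's standard `difflib.get_close_matches` as a stand-in for whatever matching technique a production system would use; the transcripts, program identifiers and the 0.8 similarity cutoff are hypothetical.

```python
import difflib
import re
from collections import defaultdict


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: lower-cased word -> programs whose transcript contains it."""
    index = defaultdict(set)
    for program, text in transcripts.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(program)
    return dict(index)


def fuzzy_search(index: dict[str, set[str]], query: str, cutoff: float = 0.8) -> set[str]:
    """Programs matching every query word, allowing close spelling variants."""
    hits = None  # becomes a set of programs after the first query word
    for word in query.lower().split():
        # Expand the query word to similar index terms, e.g. misrecognized forms.
        variants = difflib.get_close_matches(word, list(index), n=5, cutoff=cutoff)
        found = set().union(*(index[v] for v in variants))
        hits = found if hits is None else hits & found
    return hits or set()


# Hypothetical ASR output; "forrestry" imitates a recognition error.
transcripts = {
    "prog-1": "The minister commented on the new forestry law",
    "prog-2": "Forrestry workers protested outside parliament",
    "prog-3": "Weather forecast for the weekend",
}
index = build_index(transcripts)
print(sorted(index["forestry"]))
print(sorted(fuzzy_search(index, "forestry")))
```

An exact lookup of "forestry" finds only prog-1, while the fuzzy expansion also reaches prog-2. A lower cutoff increases recall at the cost of more false matches.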
Appendixes
Appendix 1. The model of video search and selection criteria with examples
THE MODEL OF VIDEO SEARCH AND SELECTION CRITERIA
1. Video content criteria
1.1. Named or specific
1.1.1. Named persons and objects
1.1.2. Named events (GATT meeting)
1.1.3. Named places (Australia)
1.1.4. Linear time (1942, 1960’s, March last year)
1.2. Types of or generic
1.2.1. Types of objects (nurses, snake, refrigerators)
1.2.2. Types of action (playing, baking) and events (Christmas)
1.2.3. Types of places (savannah)
1.2.4. Cyclic time (not winter)
1.3. About: abstract themes (food chain)
1.4. About: impressions (funny, hungry)
1.5. Shooting technique (close-ups, aerial view)
1.6. Technical qualities (graphics on the image, tint, black and white, technically good)
1.7. Duration
1.8. Audio specific criteria
1.8.1. Audio content
1.8.2. Audio technical (audio track available, technically good / bad audio)
2. Document context criteria
2.1. Origin (name, author, production country, genre, authenticity)
2.2. Availability (tape location, format, copyright, video document the only available)
2.3. Documentation (level of documentation, “good image” documented)
2.4. Earlier usage (used too much)
3. Context of use criteria
3.1. Various kinds of video (various kinds, not similar)
3.2. Need satisfied (enough material already, all covered)
3.3. Coherence with other material (coheres with other clips, coheres with the program)
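For coding or annotation tools, the model can be held as a flat mapping from section codes to labels, which makes it easy to roll leaf-level codings up to the model's main classes and subclasses. A sketch of one such encoding; the mapping layout, shortened labels and helper names are illustrative assumptions, not part of the model itself:

```python
from collections import Counter

# The model above, keyed by its section codes (labels shortened).
MODEL = {
    "1": "Video content criteria",
    "1.1": "Named or specific",
    "1.1.1": "Named persons and objects",
    "1.1.2": "Named events",
    "1.1.3": "Named places",
    "1.1.4": "Linear time",
    "1.2": "Types of or generic",
    "1.2.1": "Types of objects",
    "1.2.2": "Types of action and events",
    "1.2.3": "Types of places",
    "1.2.4": "Cyclic time",
    "1.3": "About: abstract themes",
    "1.4": "About: impressions",
    "1.5": "Shooting technique",
    "1.6": "Technical qualities",
    "1.7": "Duration",
    "1.8": "Audio specific criteria",
    "1.8.1": "Audio content",
    "1.8.2": "Audio technical",
    "2": "Document context criteria",
    "2.1": "Origin",
    "2.2": "Availability",
    "2.3": "Documentation",
    "2.4": "Earlier usage",
    "3": "Context of use criteria",
    "3.1": "Various kinds of video",
    "3.2": "Need satisfied",
    "3.3": "Coherence with other material",
}


def leaves() -> list[str]:
    """Codes with no sub-codes, i.e. the categories a coder actually assigns."""
    return [code for code in MODEL if not any(other.startswith(code + ".") for other in MODEL)]


def rollup(counts: dict[str, int], depth: int) -> Counter:
    """Aggregate counts of coded criteria to the given depth of the model (1 = main class)."""
    totals: Counter = Counter()
    for code, n in counts.items():
        if code not in MODEL:
            raise KeyError(f"unknown criterion code: {code}")
        totals[".".join(code.split(".")[:depth])] += n
    return totals


coded = {"1.1.1": 2, "1.1.4": 3, "1.2.1": 4, "2.2": 1}  # toy codings
print({MODEL[c]: n for c, n in rollup(coded, 1).items()})
```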
References
Armitage L-H, Enser P-G (1997) Analysis of user need in image archives. Journal of Information
Science, 23:287-299.
Enser P-G (1993) Query analysis in a visual information retrieval context. Journal of Document and
Text Management, 1:25-52.
Gylfe, C. (2009) Metadata in Cross Media Editorial Processes. Master’s Thesis. Helsinki University of
Technology, Department of Media Technology. Espoo, Finland. 84 p.
Herzum M. (2003) Requests for information from a film archive: a case study of multimedia retrieval.
Journal of Documentation 59 (2) 168-186.
Hodge, G. (2001) Metadata made simpler. NISO Press, Bethesda, MD. 15 p.
Ingwersen, P. & Järvelin, K. (2005). The turn. Integration of information seeking and retrieval in context.
Dordrecht: Springer.
Kauranen P. (2008) Perinteisten TV-arkistojen hakukäytännöt ja ohjelmatyöntekijöiden käsitykset
integroidun videoarkiston käyttöönoton vaikutuksista [Search practices of traditional TV archives and program staff's perceptions of the effects of introducing an integrated video archive]. University of Tampere. Department of
Information Studies. Master’s thesis, 98 p. Available at: http://tutkielmat.uta.fi/pdf/gradu02354.pdf
Markey K (1986) Subject access to visual resources collections: a model for computer construction of
thematic catalogues. Greenwood Press, Westport, Connecticut, 1986.
Markkula & Sormunen (2006) Video needs at the different stages of television program making
process. In: Proceedings of the 1st International Conference on Information Interaction in Context
(IIiX), pp.111-118. [Available at: http://portal.acm.org/citation.cfm?id=1164844].
Markkula, M. & Sormunen, E. (2000). End-user searching challenges indexing practices in the
digital newspaper photo archive. Information Retrieval 1(4): 259-285.
Mauthe, A. & Thomas, P. (2007) Professional Content Management Systems. Chichester, West
Sussex, John Wiley & Sons Ltd. 314 p.
Panofsky E (1970) Meaning in the visual arts. Penguin, London.
Pereira, F., Vetro, A. & Sikora, T., (2008) Multimedia Retrieval and Delivery: Essential Metadata
Challenges and Standards. Proceedings of the IEEE, Vol. 96, No. 4, p. 721-743.
Sandom C. & Enser PGB. (2003) Archival moving imagery in the Digital Environment. In: Anderson,
J., Dunning, A. & Fraser, M (eds.) Digital resources for the humanities 2001-2002. London: Office for
Humanities Communication, Kings College, 2003.
Schamber L. (2000) Time-line interviews and inductive content analysis: their effectiveness for
exploring cognitive behaviors. JASIS 51(8): 734-744 (2000)
Shatford S (1986) Analyzing the subject of a picture: a theoretical approach. Cataloguing and
Classification Quarterly, 6:39-62.
STT-Suomen tietotoimisto. n.d. Mikä on STT? [What is STT?] (online) [Referenced on 10.5.2010], available in
WWW-format: <URL:http://www.stt.fi/fi/>
Tan E. & Müller H. (2003) Integration of Specialist Tasks in the Digital Image Archive. In Cognition
in a Digital World, ed. Herre van Oostendorp, Lawrence Erlbaum Associates, Inc., Publishers, New
Jersey.
Vänttinen, K. (2010) Case YLE D-keskus (online). [Referenced on 14.5.2010], available in WWW-format: <URL: http://www.digiwiki.fi/fi/images/1/14/2010-03-05_Vänttinen.pdf>
Westman & Oittinen (2006). Image retrieval by end-users and intermediaries in a journalistic work
context. In: Proceedings of the 1st International Conference on Information Interaction in Context
(IIiX) , Copenhagen, Denmark, pp. 102-110.
Yleisradio Oy. Yle Info. n.d. (online) [Referenced on 10.5.2010], available in WWW-format: <URL:
http://www.yle.fi/fbc/thisisyle.shtml>
Similar documents

ACCESS GRANTED - Rule of Law Institute of Australia

aria music server

FotoStation 7.0

Print this article - Dublin Core® Metadata Initiative

aria music server - Ochoa y Díaz Llanos

Streams, Structures, Spaces, Scenarios, and Societies

SDIGER project

THOMSON REUTERS PRESENTATION TEMPLATE