An Immersive View Approach by Secure Interactive Multimedia

An Immersive View Approach by Secure Interactive
Multimedia Proof-of-concept Implementation
Pablo Antón · Antonio Maña · Antonio
Muñoz · Hristo Koshutanski
Abstract Live media streaming is a field that recently has had a great impact
in the scientific community, especially in the case of interactive media streaming.
In this paper we propose a reference architecture conceptualizing an immersive
view effect by considering heterogeneous tracking devices and enriched movement
control over heterogeneous stream image sources. A proof-of-concept prototype
implementation of the reference architecture is presented, called Live Interactive
FramE (LIFE), illustrating the potential and value of the immersive view concept.
Our work is part of the DESEOS research project that aims at applying an Ambient Assisted Living paradigm to a targeted scenario of hospitalised children. The
main goal of LIFE is reducing stress and isolation during hospitalisation by enabling an immersive view of school activities via live media streaming. Functional
and security aspects of LIFE are presented along with details of implementation
and performance evaluation. Our experiments show that LIFE enables a
practical secure media streaming solution with optimal video quality settings.
Keywords Virtual Window · Immersive View · Multimedia Security · Ambient
Assisted Living
1 Introduction
The ever-increasing bandwidth of the Internet and the adoption of new technologies (e.g., NGN [1], SIP [2], RTP [3], etc.) related to the transmission of multimedia signals over IP networks have fostered the emergence of many interesting
video-based applications. However, while the use of the Internet for this purpose obviously opens many possibilities for interactivity, few systems have really exploited
this feature, and the interactive capabilities have remained very limited and applied only to the selection of a few preferences by the user. Likewise, interactive
Internet-based TV has been advertised for years without going much further than
the provision of some user feedback.
In the field of communications, and in multimedia in particular, the concept
of control refers to the control of the low-level communication mechanism and
protocol, thus dealing with issues like quality of service, error management and
reporting, communication establishment and configuration, client management,
etc. However, in this paper we are interested in control in the sense of streaming
media control from the user perspective, thus focusing on aspects related to source
control (pan, tilt, zoom, camera movement, etc.), transport control (play, pause, etc.), and so on.
E.T.S.I. Informatica, Campus de Teatinos, Malaga 29071, Spain
Tel.: +34-952133303
E-mail: panton@lcc.uma.es (P. Antón), amg@lcc.uma.es (A. Maña), amunoz@lcc.uma.es (A. Muñoz), hristo@lcc.uma.es (H. Koshutanski)
In some fields, such as video-surveillance or teleconferencing, some capabilities for interactive control of the image have been offered, but these capabilities
have been developed in an ad-hoc and proprietary manner and have not been
standardized, thus representing only a very limited advance.
In this paper, we present an architecture that supports advanced image control
for the transmission of multimedia streaming. The architecture uses independent
streaming and control channels to enable advanced control capabilities. We have
implemented the architecture in a project, called DESEOS [4], in the domain
of Ambient-Assisted Living (AAL) [5]. This project has the goal of developing
systems to help children who spend long periods in hospitals to keep in touch with
their family and school environments. One of those systems is the Live Interactive
FramE (LIFE); a system that simulates a virtual window to a remote location
with the goal of providing an immersive effect. The goal of LIFE is to increase the
feeling of being present in the classroom, thus reducing stress and increasing the
school performance of the children being cared for. LIFE requires a special type
of interactive media streaming service in order to simulate the effect of looking
through a window by changing the part of the classroom displayed depending on
the position and viewing angle of the hospitalised child.
In the rest of the paper, Section 2 describes the general reference architecture
for interactive multimedia streaming. Section 3 overviews the LIFE application
scenario, its requirements and implementation details, together with performance
evaluation results. Section 4 overviews related approaches and positions them with
respect to LIFE. Section 5 concludes the paper, and Section 6 outlines future work.
2 Our Proposal
Our approach aims at generalizing the concepts of a virtual window and remote
media space control by providing support for (i) heterogeneous mechanisms that
represent (track or interpret) the user’s position or movement intention in order to
control a remote media source; and (ii) enriched movement control over heterogeneous
stream image sources, for example real-time streaming or 3D interactive environments.
The reason for the heterogeneous device support is to be able to offer a range of
solutions, from more economical ones (e.g., wearable glasses) to more flexible but
more expensive ones (e.g., Kinect¹). The reason for supporting movement control over
heterogeneous image sources is to enable adoption of our approach in a wider set
of application domains.
¹ http://www.xbox.com/es-es/kinect
Fig. 1 Immersive View Reference Architecture
We propose an architecture designed to take into account a variety of devices to
enable tracking of the user’s position, and a variety of image sources to enable an
immersive view. Figure 1 shows the conceptual view of the architecture. The architecture
enables a certain level of interactivity on the video streaming (regardless of the
image source). The observer is an active participant in the communication, in such a
way that the system deduces the remote perspective desired by the observer. The
architecture is based on a bidirectional client-server model, and is composed of
the following modules:
• Control position manager. The Control Position Manager Client (CPMC) component
is in charge of deducing the perspective that the observer wants to visualize (e.g.,
the Wii-based head tracking described in Section 4.1). A relevant aspect of the CPMC
is its design, which is devised to work with heterogeneous devices (e.g., Wiimote,
Kinect, joystick, smartphone, etc.). The CPMC translates the desired perspective into
data that are the basis for computing the new coordinates in the desired perspective.
The Control Position Manager Server (CPMS) component receives the desired perspective
coordinates from the CPMC and maps them onto the remote space to provide the new
perspective (e.g., by turning the camera to a specific target position). The CPMS is
designed to work with different image sources (e.g., IP camera, TrackerPod,
pre-recorded video, etc.).
• Communication Module. This component deals with the creation and configuration of the streaming and control channels. In the configuration process,
several security aspects and vision ranges can be handled. As shown in
Figure 1, two instances of the Communication Module are connected through
two different channels:
  • Interactive and control channel. Data transmitted through this channel are
  used for configuration and communication of coordinates with the remote
  space.
  • Multimedia streaming channel. This channel is used for the streaming transmission. A catalogue of different protocols can be implemented in this channel (e.g., SRTP, RTMP, VRTP, etc.).
• Streaming Adaptor Module (SAM). An optional component that is in charge of
changing some streaming conditions on the client side according to context conditions (e.g., resizing the frame for a natural vision in low bandwidth
connections).
• Security and Trust Manager. An optional component that is in charge of (i) enabling secure streaming between the two communication sides on both channels, including trusted channel establishment, (ii) enabling security on the level
of device authentication and image source secure identification, and (iii) enabling controlled access to the CPMS by the observer-side CPMC component.
The proposed model can be seen as a reference architecture targeting immersive
view effect over an image source. The remote media space control is achieved by
tracking/interpreting location and position aspects of the observer instead of the
observed object. The actual realization and implementation of the architecture
will depend on the application domain and related scenarios, as we will see in the
rest of the paper.
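The separation between device-specific tracking (CPMC) and source-specific control (CPMS) can be illustrated with a short Python sketch. It is not taken from the LIFE code base: the `Perspective` value type, the normalised [-1, 1] coordinate convention, the `JoystickTracker` and `PtzCamera` adapters and their angle ranges are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Perspective:
    """Desired perspective; each axis normalised to [-1, 1] (assumed convention)."""
    pan: float
    tilt: float
    zoom: float


class Tracker(Protocol):
    """Any device able to report the perspective desired by the observer."""
    def read(self) -> Perspective: ...


class ImageSource(Protocol):
    """Any image source able to move to a requested perspective."""
    def apply(self, p: Perspective) -> str: ...


class JoystickTracker:
    """Hypothetical device adapter: raw axis values in [-100, 100]."""
    def __init__(self, x: int, y: int, throttle: int) -> None:
        self.x, self.y, self.throttle = x, y, throttle

    def read(self) -> Perspective:
        return Perspective(self.x / 100, self.y / 100, self.throttle / 100)


class PtzCamera:
    """Hypothetical source adapter: +/-90 degrees pan, +/-45 degrees tilt, 1x-4x zoom."""
    def apply(self, p: Perspective) -> str:
        zoom = 1 + max(p.zoom, 0) * 3
        return f"pan={p.pan * 90:.1f} tilt={p.tilt * 45:.1f} zoom={zoom:.2f}x"


class CPMC:
    """Client side: deduces the desired perspective from whatever tracker is plugged in."""
    def __init__(self, tracker: Tracker) -> None:
        self.tracker = tracker

    def desired_perspective(self) -> Perspective:
        return self.tracker.read()


class CPMS:
    """Server side: maps the received perspective onto the concrete image source."""
    def __init__(self, source: ImageSource) -> None:
        self.source = source

    def handle(self, p: Perspective) -> str:
        return self.source.apply(p)
```

Under these assumptions, replacing the joystick by a head tracker, or the PTZ camera by a pre-recorded video player, only requires a new adapter; the CPMC and CPMS classes stay unchanged.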
3 LIFE Implementation
A proof-of-concept implementation, called LIFE, has been developed to provide a set of means to mitigate problems caused by the long-term hospitalisation
of children [6, 7].
3.1 Application Scenario
We consider a scenario which involves two spaces (physical environments): the Hospital,
where children are hospitalised, and the School, where the hospitalised children’s classmates
and teacher stay. Figure 2 illustrates the set of devices considered in each of these
spaces, described as follows:
• School: A video capture device (IP camera or webcam), a microphone (if not
integrated in the camera), and a PTZ (Pan, Tilt, Zoom) device (TrackerPod²).
• Hospital: An information display (monitor or TV screen), and a head tracking
device such as the Bluetooth infrared camera of the Nintendo Wii remote control
or a Kinect [8, 9].
Fig. 2 LIFE Application Scenario
3.2 Scenario Requirements
The scenario described above entails a minimum set of functional requirements of
the LIFE application such as:
• Fluent video streaming: it is essential to achieve a fluent video streaming that
produces a natural reality feeling for the user.
• Accurate and instantaneous user tracking: it is important to achieve smooth control
on the remote media space by computing an accurate and instantaneous
user position.
• Zoom capabilities: it is essential to provide zoom capabilities on the remote
image source to achieve real feeling of the immersive view effect.
² http://www.trackercam.com/TCamWeb/productdes.htm
Given that the target users of the LIFE application are hospitalised children, security and
privacy aspects are important to address. We have identified several specific security
requirements for the scenario, which are defined and supported by the DESEOS Security
Architecture [10], particularly requirements for authentication and authorization. In the
following we summarize the most relevant security requirements.
• Confidentiality and privacy of multimedia data, such as audio and video, when
transmitted across local networks and the Internet.
• Authentication of network entities such as school information servers, and
authentication of pupils when given access to the school multimedia data.
• Accountability of pupils when accessing media data of the school information
system. This requirement is closely related to the authentication and confidentiality requirements.
• Access control to school-related multimedia resources accessed by pupils, by means
of a certificate-based controlled access mechanism that ensures decentralized
trust establishment.
3.3 Software Architecture
Figure 3 shows the software architecture of LIFE, and how the LIFE application
realizes the client-server model of the reference architecture. The software architecture takes advantage of DESEOS Core [10] to connect the realms of hospital
and school. Additionally, the architecture implements several services that are not part
of the conceptual model to carry out the immersive view effect, such as the Head
Tracking, Video Streaming and Tracker Services.
DESEOS Core is in charge of establishing appropriate communication channels with suitable security protocols. This provides a flexible approach that allows
different protocol solutions to be adopted in the future to enable secure communication without affecting the LIFE architecture. Each realm involves the use of several devices, which can be replaced by others with similar functionality. To achieve that
particular feature, both DESEOS Core and the LIFE application make use of the OSGi
framework [11], since this provides an easier way to work with services. This is the
feature that allows LIFE to be connected with heterogeneous devices in a
secure way.
On the server side, the LIFE Server App makes use of two different services, the
Video Streaming Service and the Tracker Service. The former is used to provide a
video stream which is transmitted to client applications using the SRTP
protocol. We note that there are several implementations of the Video Streaming
Service component, depending on the actual video capture device (webcam or IP
camera) and the multimedia libraries used (Xuggler³, vlcj⁴, dsj⁵, etc.). The latter
is in charge of moving a camera device to a specific position.
The PTZ devices have rotation properties such as turn right/left (Pan) and
turn up/down (Tilt). A special property is the zoom capability, provided by some
PTZ devices if a direct connection to the camera device exists. In our case, the
zoom is done in software. Currently, the Tracker Service is implemented by means of the TrackerCam component, which has an associated HTTP server in charge of accepting,
processing and performing the movements of the physical device.
On the client side, the LIFE Client App makes use of the Head Tracking Service
to get head position data relative to some anchor point. Two different implementations
have been developed: one based on the WiiUseJ⁶ library, which is used to access the
Wiimote (remote control of the Wii system), and another based on the OpenKinect⁷
library, which gives access to the Kinect (3D sensor device of the Xbox 360).
³ http://www.xuggle.com
⁴ http://caprica.github.com/vlcj
⁵ http://www.humatic.de/htools/dsj.htm
⁶ http://code.google.com/p/wiiusej
Fig. 3 LIFE Software Architecture
The Wiimote approach is implemented by using an infrared Wiimote sensor,
which locates the user position by collecting data from the user’s infrared glasses. The WiiUseJ library functionality is used to process input information from the Wiimote
sensor. The other implementation is based on the Kinect functionality, which is more
complex since the OpenKinect library does not yet provide accurate methods to get
a user’s head position. We therefore make use of OpenCV to analyse and recognise
Kinect data. This analysis is essentially a face recognition step that identifies the
biggest face in the frame as a reference point. The x and y head position coordinates are
computed from the position of the face in the
frame, while the z coordinate is obtained by using the depth sensor of the Kinect
device.
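The Kinect-based computation described above can be sketched as follows. This is an illustration under assumptions rather than the LIFE code: face rectangles are assumed to arrive as `(x, y, width, height)` tuples in pixels (the form a typical face detector returns), the depth map is assumed to be a row-major grid of distances in millimetres aligned with the colour frame, and the function name `head_position` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Face = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def head_position(faces: Sequence[Face],
                  depth_mm: Sequence[Sequence[int]]) -> Optional[Tuple[float, float, int]]:
    """Return (x, y, z) of the head, or None when no face was detected.

    x and y are the centre of the biggest face in the frame; z is the depth
    reading at that pixel (assumes depth and colour frames are aligned).
    """
    if not faces:
        return None
    # The biggest face (by area) is taken as the reference point.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    cx, cy = x + w / 2, y + h / 2
    z = depth_mm[int(cy)][int(cx)]
    return cx, cy, z
```

Picking the largest rectangle is a simple way to ignore people in the background, who appear smaller in the frame than the pupil sitting in front of the sensor.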
3.4 Immersive View Realization
Almost all of the processing needed to achieve the immersive view is carried out on the client
side. However, the server is in charge of opening a control socket on a predefined
port to wait for a client connection request. When the connection is established,
some control parameters are set and exchanged, the Control Channel is started and
the Multimedia Streaming Channel for streaming communications is initialized.
⁷ http://openkinect.org/wiki/MainPage
Fig. 4 LIFE Resized Frame Effect
Fig. 5 LIFE Tilt-Pan Calculation
On the client side, the video stream is received, decoded and sent to the LIFE
Client App component. This component renders input frames depending on the current
user’s head position to simulate the immersive view. Figure 4 shows the resized
frame effect compared to the original frame received. The LIFE Client App computes
the values of height and width according to the user position obtained from
the Head Tracking Service. Figure 5 shows the tilt-pan calculation in relation to
the user’s previous position and the user’s current position. The equations in the
figure show the functions used to calculate pan, tilt and zpan values.
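Since the equations of Figure 5 are not reproduced in the text, the following Python sketch only illustrates the general idea of a head-dependent resized frame. The mapping used here (a visible area that grows as the viewer approaches and shifts opposite to the head movement) and all names and parameters (`crop_window`, `ref_z`, `base_scale`) are assumptions, not the functions used in LIFE.

```python
from typing import Tuple


def crop_window(frame_w: int, frame_h: int,
                head_x: float, head_y: float, head_z: float,
                ref_z: float = 1.0, base_scale: float = 0.6) -> Tuple[int, int, int, int]:
    """Return (x0, y0, width, height) of the resized frame inside the received frame.

    head_x and head_y are head offsets from the anchor point, normalised to
    [-1, 1]; head_z is the distance to the screen in the same unit as ref_z.
    """
    # Assumption: the closer the viewer is to the "window", the wider the visible area.
    scale = min(1.0, base_scale * ref_z / max(head_z, 1e-6))
    w, h = round(frame_w * scale), round(frame_h * scale)
    # Assumption: the visible area shifts opposite to the head movement, as it
    # would when looking through a real window.
    hx = max(-1.0, min(1.0, head_x))
    hy = max(-1.0, min(1.0, head_y))
    x0 = round((frame_w - w) / 2 * (1 - hx))
    y0 = round((frame_h - h) / 2 * (1 - hy))
    return x0, y0, w, h
```

With a 1280x720 input and the head at the anchor point, this sketch selects a centred 768x432 region; moving the head fully to one side slides the region to the opposite edge of the received frame.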
After a testing procedure, we have concluded that in order to have an accurate
system it is necessary to get the user head position data as input at least as often
as the video streaming frequency (fps). This requirement is necessary to guarantee
smooth movements of the immersive view. The computational load of the Head
Tracking Service (e.g., in the Kinect solution, which uses JavaCV) led us to compute
intermediate head position coordinates by means of interpolation, to reach a smoother movement
and therefore a more realistic user immersion. The number of interpolated positions
can vary depending on a head position coordinate rate parameter, in such a way
that the optimal settings for the immersive view require a good calibration of the
head position coordinate rate parameter.
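A minimal sketch of such an interpolation is given below. Linear interpolation is an assumption, since the text does not state which interpolation scheme LIFE uses, and the helper names `steps_per_sample` and `interpolate_positions` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def steps_per_sample(video_fps: float, tracker_hz: float) -> int:
    """Number of rendered positions needed per tracker sample (at least one)."""
    return max(1, math.ceil(video_fps / tracker_hz))


def interpolate_positions(prev: Sequence[float], curr: Sequence[float],
                          steps: int) -> List[Tuple[float, ...]]:
    """Return `steps` positions moving linearly from prev (exclusive) to curr (inclusive)."""
    return [
        tuple(p + (c - p) * i / steps for p, c in zip(prev, curr))
        for i in range(1, steps + 1)
    ]
```

For example, with a 30 fps stream and a tracker that delivers 10 positions per second, three positions are rendered per sample, so each frame gets its own head position.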
Additionally, our implementation can be adapted dynamically to some context
conditions to reach an optimal efficiency. For instance, in cases of limited
bandwidth, video streaming parameters can be configured according to these
conditions. Thus, the frame rate parameter can be set as low as possible. In that case,
smoothness is achieved by the internal mechanism, which renders the same
frame several times and changes only the head position coordinate.
An error correction mechanism has been implemented to discard erroneous position data returned by devices (senseless data). Essentially, the mechanism behaves
as follows: if two consecutive coordinates differ by more than a predefined reference control
value, the last coordinate is discarded; but if this happens more than N times in a row (the
value of N can be configured), we accept the last coordinate as a valid one
and the error correction mechanism is restarted. In this way, instantaneous head
movements are discarded unless the head position remains for a longer time in one
zone range.
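The behaviour described above can be sketched in a few lines of Python. The class name `JumpFilter`, the use of Euclidean distance as the difference measure and the parameter names are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


class JumpFilter:
    """Discards implausible jumps unless they persist for more than N samples."""

    def __init__(self, max_jump: float, max_rejects: int) -> None:
        self.max_jump = max_jump        # reference control value
        self.max_rejects = max_rejects  # N in the text
        self.last: Optional[Sequence[float]] = None
        self.rejects = 0

    def update(self, pos: Sequence[float]) -> Sequence[float]:
        """Feed a new coordinate and return the coordinate considered valid."""
        if self.last is None or math.dist(pos, self.last) <= self.max_jump:
            self.last, self.rejects = pos, 0
        else:
            self.rejects += 1
            if self.rejects > self.max_rejects:
                # The jump persisted: accept it and restart the mechanism.
                self.last, self.rejects = pos, 0
        return self.last
```

With `max_jump=1.0` and `max_rejects=2`, a single outlier is ignored, while a head that really moved to a new zone is followed from the third consecutive out-of-range sample onwards.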
When the resized frame has to be moved beyond the original video resolution
(frame boundaries in Figure 4), the LIFE Client App sends a move command through
the Control Channel to position the video capture device on a new region. We
have defined a set of predefined areas (spaces) to which a PTZ-type device can
move. The granularity of these areas depends on the accuracy of the movements
of the current PTZ device, in order to avoid unnecessary noise (shake) movements.
When the resized frame has to move to a new position beyond the original frame,
the frame controller computes the next area to which the current PTZ device should
move and uses the control channel to set the video capturing device to cover the
new position of the resized frame.
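As an illustration of this step, the sketch below assumes that the predefined areas form a regular grid addressed by `(column, row)`; the grid layout and the function names `boundary_crossed` and `next_area` are hypothetical, and the actual move command sent over the Control Channel is not shown.

```python
from typing import Tuple


def boundary_crossed(x0: int, y0: int, w: int, h: int,
                     frame_w: int, frame_h: int) -> Tuple[int, int]:
    """Return (dx, dy) in {-1, 0, 1}: which frame boundary the resized frame exceeds."""
    dx = -1 if x0 < 0 else (1 if x0 + w > frame_w else 0)
    dy = -1 if y0 < 0 else (1 if y0 + h > frame_h else 0)
    return dx, dy


def next_area(area: Tuple[int, int], dx: int, dy: int,
              cols: int, rows: int) -> Tuple[int, int]:
    """Return the neighbouring predefined area, clamped to the grid limits."""
    col = min(max(area[0] + dx, 0), cols - 1)
    row = min(max(area[1] + dy, 0), rows - 1)
    return col, row
```

Moving only between a small number of fixed areas, rather than following every pixel of overflow, is what keeps the PTZ device from shaking when the head position oscillates near a frame boundary.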
3.5 Communications Security
An important aspect for the practical adoption of LIFE is the security of
its communications. We adopted X.509 [12] certificates to encode the necessary
information about pupils and schools. We define a set of subject-specific attributes
that allows us to express the information about an entity in the LIFE scenario that is necessary
for authentication, access control and communication establishment.
Figure 6 shows the security communications of the LIFE application scenario. When
a secure and trusted channel with the LIFE server is needed, the Control Channel
is established. The communication on this channel (between LIFE client and LIFE server)
entails the use of pupil and school certificates with the TLS protocol⁸. An access
control process has been implemented (after the TLS handshake), in which the
LIFE server verifies whether the pupil data in the certificate state the
correct school, year of study and class id. We evaluate the year of study and class
id to ensure that the current pupil will have access to the correct multimedia
stream. The protocol considers the case of parallel connections of more than
one pupil to the same school. When more than one pupil of the
same year and class requests access to the streaming channel, the LIFE server
configures the multimedia streaming to passive mode, which consists of positioning
the camera in the initial calibration state and disabling the control functionality
on the Tracker Service. In that way, none of the pupils has control over the camera
and the immersive view on the LIFE client application uses only the resized frame
mechanism (software-simulated immersion).
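The server-side decision described above can be sketched as follows. The attribute names of `PupilCert`, the `StreamAccess` class and the string results are hypothetical; the sketch assumes the certificate has already been validated during the TLS handshake and only shows the attribute check and the switch to passive mode.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class PupilCert:
    """Hypothetical view of the attributes decoded from a pupil certificate."""
    pupil_id: str
    school: str
    year: int
    class_id: str


class StreamAccess:
    """Decides per (year, class) stream whether access is interactive or passive."""

    def __init__(self, school: str, classes: Set[Tuple[int, str]]) -> None:
        self.school = school
        self.classes = classes
        self.viewers: Dict[Tuple[int, str], Set[str]] = defaultdict(set)

    def admit(self, cert: PupilCert) -> str:
        key = (cert.year, cert.class_id)
        if cert.school != self.school or key not in self.classes:
            return "denied"
        self.viewers[key].add(cert.pupil_id)
        # A single viewer may steer the camera; with several viewers the camera
        # stays in its calibration position and only client-side immersion is used.
        return "interactive" if len(self.viewers[key]) == 1 else "passive"
```

In this sketch the returned mode applies to the whole stream, so a server would also notify the first pupil when a second one joins and the stream turns passive.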
A school authorisation process has been implemented on the side of the LIFE client,
which ensures that the school certificate is not only valid and trusted
(a valid DESEOS school entity) but also authorised, i.e., that the school (by name,
locality and country) matches the school in the pupil certificate. In that way,
confidentiality and privacy of the pupil’s communications with the correct school are
enforced.
⁸ http://tools.ietf.org/html/rfc5246
Fig. 6 LIFE Communications Security
Once a secure and trusted channel is established the next step is the initialisation of the streaming, which is performed over the secure channel established (over
TLS). It has two steps: Tracker Service initialisation and Video Streaming Service
initialisation. The first step determines if a Tracker Service is enabled, while the
second step determines the Video Streaming Service properties of current camera
settings and some security parameters for the media streaming channel. When
both of them are initialised, the LIFE server opens a control channel and a media
streaming channel, and then returns the media streaming properties object to the
LIFE client over the control channel. In turn, the LIFE client application opens a
media streaming channel with the indicated settings and the streaming protocol.
We remark that the control channel, running over TLS, is used only for immersive
view control commands (not for media streaming) so that the induced overhead
of TLS does not affect the immersive view effect.
We have adopted the usage of SRTP for protecting media streaming channel
with confidentiality, authenticity and integrity. The LIFE control channel remains
over the TLS channel already established. The LIFE control channel is used for
exchanging different commands, for instance with the Tracker Service upon movements by the pupil, while the streaming channel is returning the media of the
camera.
As mentioned in Section 4, SRTP has been specifically designed
to provide core security properties with a strong level of security by using well
known security building blocks and algorithms. At the same time, it provides
efficient security processes with a minimal additional cost of data transmission,
which is an important aspect for live media streaming. There is a master key and
a master salt cryptographic element as part of the SRTP configuration, which the
LIFE server sends to the LIFE client along with the media streaming properties
during initialisation over the secure control channel.
The LIFE server generates a new master key and master salt for every
LIFE client application authorised to access the media streaming of the school. This
means that if more than one pupil accesses the streaming of the school, they will be
using different master keys and salts, and will respectively derive different session
keys protecting the streaming data. The LIFE server keeps each master key only
while the session is open (e.g., the current configuration is 8 hours per session).
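A minimal sketch of such per-client keying material is shown below. It is not the LIFE implementation: the `KeyStore` class and its methods are hypothetical, the 128-bit key and 112-bit salt sizes are the defaults of the SRTP specification rather than values stated in this paper, and the 8-hour lifetime mirrors the configuration mentioned above.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MASTER_KEY_BYTES = 16           # 128-bit master key (SRTP default size)
MASTER_SALT_BYTES = 14          # 112-bit master salt (SRTP default size)
SESSION_LIFETIME_S = 8 * 3600   # assumed session lifetime of 8 hours


@dataclass
class SrtpKeying:
    master_key: bytes
    master_salt: bytes
    expires_at: float


class KeyStore:
    """Keeps one fresh master key and salt per authorised client session."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._by_client: Dict[str, SrtpKeying] = {}

    def issue(self, client_id: str) -> SrtpKeying:
        keying = SrtpKeying(
            secrets.token_bytes(MASTER_KEY_BYTES),
            secrets.token_bytes(MASTER_SALT_BYTES),
            self._clock() + SESSION_LIFETIME_S,
        )
        self._by_client[client_id] = keying
        return keying

    def get(self, client_id: str) -> Optional[SrtpKeying]:
        keying = self._by_client.get(client_id)
        if keying is not None and self._clock() >= keying.expires_at:
            del self._by_client[client_id]  # forget the key once the session ends
            return None
        return keying
```

The issued values would be sent to the client over the TLS-protected control channel, never over the media channel they are meant to protect.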
3.6 Streaming Performance Evaluation
LIFE has been evaluated on the hardware specified in Table 1,
together with a GrandStream GXV 3601 IP camera. All software used in LIFE is open
source, allowing us to insert measurement points in the code to obtain the data necessary
for our evaluation. The main software package used is VLC 1.2.0, which makes use
of FFmpeg (libavcodec 53.2.0, libavformat 53.0.3), the x264 codec and the live555 Streaming
Media library.
           LIFE Client                        LIFE Server
CPU        Pentium Dual-Core E5300 2.6 GHz    Intel Core Duo T2500 2 GHz
Memory     4 GB DDR2 800 MHz                  1 GB DDR2 533 MHz
Graphics   NVidia GeForce 9400 GT             NVidia GeForce Go 7400 128 MB
OS         Ubuntu 10.04 LTS                   Windows XP
Table 1 Hardware Specification
In order to show the real frame processing time, we have excluded network
delays from the measurements. Therefore, all evaluations have been done in a
LAN ensuring enough bandwidth for media streaming. Another consideration to
take into account is that the encryption and decryption operations consume the
same computational time (they are the same operation) due to the underlying SRTP
cryptographic mechanisms. Therefore, we measured the security overhead time on
the LIFE server and multiplied it by two to get the total security processing time.
In order to see how the security process affects video streaming, we have extended the measurements to different fps (frames per second) configurations. Figure 7 shows the performance details of LIFE streaming for both secure
and non-secure versions for two video resolutions. Each graph represents
the video processing time in ms on the y axis over different fps on the x axis. The non-secure video streaming (red line) is compared to the secure one (blue line) to show
how the inclusion of the security modules affects the efficiency in terms of time consumption. More details of the security performance for different video resolutions can be
found in Table 2.
Fig. 7 LIFE Streaming Performance Evaluation for 640x480 and 1280x720 Resolution
An analysis of the main result, in the first case (640x480), shows that the
difference between the secure and non-secure streaming is almost negligible even
in the case of 30 fps (21 ms). However, we are interested in inspecting the security
overhead for higher video resolutions. We show in Figure 7 the media streaming
performance for the highest resolution. In the case of 1280x720 the upper bound
of feasible secure streaming is 20 fps, taking into account that network delay is
not included in the evaluations. Respectively, in the case of 1280x960 the upper
bound is reduced to 15 fps.
Res\fps     5             10            15            20             25             30
640x480     43.6 / 40     87.2 / 80     130.8 / 120   174.4 / 160    218.1 / 200    261.7 / 240
800x480     88.9 / 60     177.9 / 120   266.8 / 180   355.8 / 240    444.8 / 300    533.7 / 360
800x592     103.9 / 75    207.9 / 150   311.8 / 225   415.8 / 300    519.8 / 375    623.7 / 450
1024x768    208.6 / 120   417.3 / 240   626.0 / 360   834.7 / 480    1043.4 / 600   1252.1 / 720
1280x720    245.8 / 130   491.6 / 260   737.5 / 390   983.3 / 520    1229.2 / 650   1475 / 780
1280x960    280.8 / 165   561.6 / 330   842.5 / 495   1123.3 / 660   –              –
Table 2 LIFE Streaming Performance per Frame Resolution and Frames per Second
(secure/non-secure) in ms
A reference point for the above conclusions is the one-second threshold of
processing time required for normal video behaviour. We have considered that
all measurements above that threshold are non-viable cases, since processing the
frames of one second of video takes more than a second (making it impossible to show all frames per second).
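The upper bounds quoted above follow directly from applying the one-second threshold to the secure processing times of Table 2, as the short Python check below illustrates. The dictionary holds values transcribed from Table 2; the function name `max_feasible_fps` is hypothetical.

```python
from typing import Dict, Optional

# Secure processing times in ms per frame rate, transcribed from Table 2.
SECURE_MS: Dict[str, Dict[int, float]] = {
    "640x480": {5: 43.6, 10: 87.2, 15: 130.8, 20: 174.4, 25: 218.1, 30: 261.7},
    "1280x720": {5: 245.8, 10: 491.6, 15: 737.5, 20: 983.3, 25: 1229.2, 30: 1475.0},
    "1280x960": {5: 280.8, 10: 561.6, 15: 842.5, 20: 1123.3},
}


def max_feasible_fps(times_ms: Dict[int, float],
                     threshold_ms: float = 1000.0) -> Optional[int]:
    """Highest measured frame rate whose processing time stays within the threshold."""
    feasible = [fps for fps, ms in times_ms.items() if ms <= threshold_ms]
    return max(feasible) if feasible else None
```

Applied to the table, this yields 20 fps for 1280x720 and 15 fps for 1280x960, while 640x480 remains feasible at all measured frame rates; the security overhead at 640x480 and 30 fps is 261.7 - 240 = 21.7 ms.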
3.7 Network Settings Impact
We have considered the end-point processing time including security and its impact
on LIFE media streaming performance. However, there are also some network-related aspects such as bandwidth, packet loss and packet delay variation that
affect the LIFE streaming performance. Generally, any live media streaming
is sensitive to these aspects. On the other hand, given the way
SRTP works, the protection of media streaming data by SRTP is as sensitive to
network settings as the RTP protocol is without security. In that context, any
network measures against packet loss or packet delay variation that apply to
RTP media streaming could also be applied to SRTP streaming, since only the RTP
payload is secured and the SRTP packet structure remains as processable as that of the
RTP packet (refer to [13] for details).
Bandwidth is the aspect that most affects the live streaming performance. In
case of expected low bandwidth on the side of LIFE clients (hospitals), the calibration module of the LIFE application will allow the administrator entity to lower
either the frames per second or the video resolution, or both of them, in order to
optimize the streaming data to the bandwidth. In the concluding remarks below
we provide some optimal settings of LIFE application taking into account some
possible bandwidth restrictions.
4 Related Work
There are several different areas of related work: virtual window approaches, real-time protocols for streaming media, security of media streaming solutions, and
remote media space monitoring.
4.1 Virtual windows
A number of works couple the movements of the observer’s
head to shifts in the image [14, 15], but none of these systems took into account
the idea of the fixation point as the link between the observer movements and
the image shift. Overbeeke and Stratmann [16] proposed the first method for
three-dimensional image presentation based on a fixation point as the link. Based on
this method a Virtual Window system was proposed [17]. This system uses head
movements in the viewer location to control camera shifts in a remote location.
As a result, viewers have the impression of being in front of a window allowing
exploration of remote scenes rather than a flat screen showing moving pictures.
One of the most relevant goals of this approach is to present an immersive sensation
to the viewer, but due to some drawbacks this goal was not reached. Among these, we
highlight the fact that the techniques used were not sufficiently efficient to achieve
a fluent video streaming and an accurate head tracking.
The Virtual Window effect is achieved by means of video capture device movements, without considering a frame resize approach due to the limited video resolution.
In our settings, we improved the Virtual Window effect by exploiting the resized
frame approach over high definition video quality.
The proposal by Chung Lee [18] focuses on head tracking. It uses the infrared
camera (Wiimote) and a head-mounted sensor bar (two IR LEDs) to accurately
track the location of the user’s head, modifying the screen view according to the anchor
point of the user. However, this work is limited to the Wii remote device. Rational
Craft has commercialized a product based on the Chung Lee approach that plays
locally recorded videos⁹. These two approaches are based on the use of 3D models
and recorded videos, respectively, instead of real video streaming as required in
our use case scenario, which introduces an additional layer of complexity.
4.2 Real-time media streaming solutions
Regarding the design and implementation of real-time protocols for streaming,
several approaches have been proposed. RTP (Real-time Transport Protocol) [3] is an
RFC standard published in 1996 and one of the first solutions for real-time communications.
At the same time, major companies developed their own proprietary solutions,
such as Microsoft NetShow or MMS, the former Macromedia RTMP, etc. [19].
RTSP (Real Time Streaming Protocol) [20], which appeared in 1998, complements RTP
with specific interfaces for stream control (playing, stopping, etc.).
Many related protocols have been developed for particular cases, such as SRTP [13] for security purposes, SIP [2] for session and distribution goals, or WebRTC¹⁰ for
browser-to-browser streaming.
Nowadays, there are several commercial approaches for real-time P2P streaming, such as Octoshape 11 and PPLive 12. A comprehensive survey of P2P media
streaming systems can be found in [21]. Octoshape has been used to broadcast
live streams and helped CNN serve a peak of more than a million simultaneous
viewers. It provides several delivery technologies such as loss-resilient transport,
adaptive bit rate, adaptive path optimization and adaptive proximity delivery.
The Octoshape solution splits the original stream into a number K of smaller
equal-sized data streams, but a number N > K of unique data streams are actually
constructed. In such a way, a peer receiving any K of the N available data streams
is able to play the original stream.
9 http://www.rationalcraft.com/Winscape
10 http://www.webrtc.org
11 http://www.octoshape.com
12 http://www.pplive.com
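The any-K-of-N property described above for Octoshape can be illustrated with the simplest possible erasure code. The sketch below is not Octoshape's actual coding scheme, which is not described here; it assumes N = K + 1 and uses a single XOR parity stream, and the function names encode and decode are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-sized streams and append one XOR parity stream (N = k + 1)."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")  # zero padding so all streams have equal size
    parts = [padded[i * size:(i + 1) * size] for i in range(k)]
    return parts + [reduce(xor_bytes, parts)]


def decode(received: dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the data from any k of the k + 1 streams (index k is the parity stream)."""
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1 or (missing and k not in received):
        raise ValueError("need at least k of the k + 1 streams")
    parts = dict(received)
    if missing:
        # XOR of the parity with the k - 1 data streams present yields the missing one.
        others = [parts[i] for i in range(k + 1) if i != missing[0]]
        parts[missing[0]] = reduce(xor_bytes, others)
    return b"".join(parts[i] for i in range(k))[:length]
```

With K = 4, a peer that loses any one of the five streams can still rebuild the original data. Tolerating more than one lost stream (N > K + 1) would require a stronger code, for example Reed-Solomon.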
PPLive, one of the most popular P2P streaming applications in China, consists of
several parts: (i) Video streaming server: provides the source of video content; (ii)
Peers; (iii) Directory server: automatically registers user information to and cancels
user information from PPLive clients; and (iv) Tracker server: records information
about all users watching the same content. When a PPLive client requests some
content, the tracker server checks whether other peers own the content
and sends this information to the client. PPLive uses two major communication
protocols: a Registration and Peer Discovery protocol, and a P2P Chunk Distribution
protocol.
Architecturally, P2P streaming solutions have different goals compared to our
client-server model of immersive view. They can, however, provide
ground for extending the reference architecture to distribute media
streaming data over a P2P topology, in case multiple pupils wish to connect to a
remote space of the school. As we have discussed in Section 3.5, a specific solution
providing immersive view to multiple peers (pupils) is to enable passive remote
space control with only software-simulated client-side immersive view.
4.3 Security of real-time media streaming
The LIFE application integrates the Secure Real-time Transport Protocol (SRTP)
for protecting live media data in streaming. The SRTP, a profile of the RTP,
aims at providing confidentiality, message authentication, and replay protection to
the RTP-based media streaming. Its main goal is to enable strong cryptographic
operations and, at the same time, high throughput with low packet extension
(minimal additional cost of data transmission). With the default encryption
settings of SRTP, the RTP payload and the SRTP payload have exactly the same size.
Essentially, the SRTP implementation is a “bump in the stack” between the
RTP application and the transport layer. On the sender side, the SRTP intercepts RTP packets coming down
the stack, performs secure operations on each packet, and forwards an equivalent
SRTP packet to the transport layer. On the receiver side, the SRTP intercepts SRTP
packets, performs secure operations on each SRTP packet, and passes an equivalent
RTP packet up the stack (to the RTP application).
The underlying cryptographic blocks are an additive stream cipher for encryption and a keyed-hash function for message authentication and integrity. The
default master key length is 128 bits, and the default master salt length is 112 bits. The encryption and decryption of the RTP payload use the same computational
operations and, consequently, have the same computing cost.
Encryption: AES-CM with a 128-bit session key and a 112-bit session salt.
Authentication: HMAC-SHA1 with a 128-bit session key and a 112-bit session salt.
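The two properties stated above (encryption that preserves payload size and is symmetric with decryption, plus a keyed-hash tag) can be sketched with the Python standard library alone. This is an illustration, not the SRTP implementation used in LIFE: the keystream function is a stand-in built from SHA-256 and is not AES-CM, replay protection and key derivation are omitted, and the 80-bit tag length is the RFC 3711 default rather than a value given in the text. All function names are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 10  # 80-bit authentication tag (RFC 3711 default; an assumption here)


def keystream(key: bytes, salt: bytes, index: int, length: int) -> bytes:
    """Stand-in keystream (SHA-256 in counter mode). NOT AES-CM; illustration only."""
    out = bytearray()
    block = 0
    while len(out) < length:
        seed = key + salt + index.to_bytes(6, "big") + block.to_bytes(4, "big")
        out += hashlib.sha256(seed).digest()
        block += 1
    return bytes(out[:length])


def xor_payload(key: bytes, salt: bytes, index: int, payload: bytes) -> bytes:
    """Additive stream cipher: the same call both encrypts and decrypts."""
    ks = keystream(key, salt, index, len(payload))
    return bytes(p ^ k for p, k in zip(payload, ks))


def make_tag(auth_key: bytes, data: bytes) -> bytes:
    """HMAC-SHA1 over the packet, truncated to TAG_LEN bytes."""
    return hmac.new(auth_key, data, hashlib.sha1).digest()[:TAG_LEN]


def protect(key, salt, auth_key, index, header: bytes, payload: bytes) -> bytes:
    """Sender side: encrypt the payload, keep the header in clear, append the tag."""
    body = header + xor_payload(key, salt, index, payload)
    return body + make_tag(auth_key, body)


def unprotect(key, salt, auth_key, index, header_len: int, packet: bytes) -> bytes:
    """Receiver side: verify the tag first, then decrypt the payload."""
    body, received = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(received, make_tag(auth_key, body)):
        raise ValueError("authentication failed")
    return xor_payload(key, salt, index, body[header_len:])
```

In this sketch the protected packet is longer than the plain one only by the tag, and a packet modified in transit is rejected before any decryption takes place.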
Widener et al. [22] propose an approach of differential data protection where
different portions of the media stream can have different protection policies
enforced during streaming. The authors identify three types of policies on video streaming: a general protection policy regardless of any image streams, a policy governing
access to image streams, and a policy governing access to filters for a particular
stream. In order to request an image stream, an entity needs a credential with
the corresponding rights assigned. A credential contains a collection of general access rights
and a collection of rights specific to an object. Access rights define peer accessibility
to portions of streaming data and specify what filters have to be used to process
media data after acquiring the stream.
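The credential structure described above can be pictured with a small data model. This is a loose sketch and not the design of Widener et al.: the class Credential, the right name "view", the filter names and the function request_stream are all hypothetical, and a frame is reduced to a list of numbers.

```python
from dataclasses import dataclass, field


@dataclass
class Credential:
    holder: str
    # General access rights, independent of any particular stream.
    general_rights: set[str] = field(default_factory=set)
    # Object-specific rights: stream id -> filters that must be applied to its data.
    stream_filters: dict[str, list[str]] = field(default_factory=dict)


# Hypothetical filters; a "frame" is just a list of pixel values in this sketch.
FILTERS = {
    "downscale": lambda frame: frame[::2],
    "blank": lambda frame: [0] * len(frame),
}


def request_stream(cred: Credential, stream_id: str, frame: list[int]) -> list[int]:
    """Grant access only with matching rights, then apply the mandated filters in order."""
    if "view" not in cred.general_rights or stream_id not in cred.stream_filters:
        raise PermissionError(f"{cred.holder} may not access {stream_id}")
    for name in cred.stream_filters[stream_id]:
        frame = FILTERS[name](frame)
    return frame
```

A holder with the general right "view" and the filter list ["downscale"] for one stream receives only a reduced version of that stream, and is refused any other stream.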
Liao et al. [23] follow the approach of [22], applied to the domain of secure media
streaming over peer-to-peer networks. When a peer first joins the streaming
system, it contacts a pre-defined authentication server to obtain a credential
with a structure and usage similar to those in [22].
The works of Widener et al. and Liao et al. focus on streaming protection
by means of certificates and policies, with local or middleware monitoring and
enforcement of the expressed rights. In their approaches, the stream data is left
unprotected during network transmission and is easy for other
(malicious) peers to intercept or modify. Moreover, recent studies on security and privacy issues in
peer-to-peer streaming systems [24] show that commercial streaming solutions do
not encrypt data during transmission, which makes the
overall system lose not only confidentiality but also authenticity and privacy.
Our approach provides a complementary solution to [22, 23] in that we
adopt the usage of certificates to enable an access control process between two
entities establishing a requested media stream, and the exchange of cryptographic keys
used to secure streaming data integrity and confidentiality during transmission.
Thus, any malicious peer intercepting (joining) the stream will not gain access
to the streaming data.
4.4 Remote media space monitoring
The third area of research related to our approach is the control of remote
media space for video monitoring. Most of the improvements in this area have
focused on the inclusion of dedicated hardware in the camera device to compute
the monitoring itself. The most evident case is the widespread use of PTZ
cameras dedicated to video surveillance [25, 26]. Much effort has been spent on
improving the monitoring of objects (persons, animals, etc.) in a limited area
covered by a certain number of cameras.
Instead of shifting the camera to follow the movements of a recorded object, our
approach takes a different perspective by deducing camera movements from
the user's motion (e.g., the user's head location/position).
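As a rough illustration of deriving the view from the user's position, the following sketch maps a normalised head offset to the position of the resized frame inside the full camera frame. It is not the LIFE implementation: the function crop_window, the [-1, 1] normalisation and the choice to move the window opposite to the head (as when looking through a real window) are assumptions made for the example.

```python
def crop_window(head_x: float, head_y: float,
                frame_w: int, frame_h: int,
                win_w: int, win_h: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the resized window inside the full frame.

    head_x and head_y are head offsets normalised to [-1, 1], where 0 means the
    user is centred in front of the screen (a hypothetical convention).
    """
    # Clamp the offsets so the window never leaves the captured frame.
    head_x = max(-1.0, min(1.0, head_x))
    head_y = max(-1.0, min(1.0, head_y))
    # Free space available for moving the window along each axis.
    slack_x = frame_w - win_w
    slack_y = frame_h - win_h
    # The window moves opposite to the head: moving right reveals more of the left.
    left = round(slack_x * (1 - head_x) / 2)
    top = round(slack_y * (1 - head_y) / 2)
    return left, top, left + win_w, top + win_h
```

Once an offset reaches the clamped limit, the window cannot move any further within the captured frame, and a physical change of the camera angle would be needed to extend the view.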
5 Conclusions
We have presented a reference architecture conceptualizing the immersive view
effect by considering various heterogeneous devices for observer’s position tracking
and an enriched movement control over remote multimedia streaming sources. We
have also presented a proof-of-concept implementation, called LIFE, which enables
an immersive view effect of school activities during children's hospitalisation. The
goal of LIFE is to contribute to a vision of future AAL applications for “Alleviating
Children Stress During Hospitalisation” [4]. Functional and security aspects of
LIFE have also been presented, along with details of how these
aspects have been implemented.
Given the targeted group of users (children aged 8 to 16),
an important concluding aspect of the LIFE presentation is to identify a set
of settings optimised to achieve a balance between performance and quality of
the immersive view effect. There are two main parameters to take into account:
the total time needed to process secure video streaming (at a given number of frames per second), and
the video resolution, which enables us to maximise the immersive view effect. The work
in [27] reports several comparative details on expected watchability percentage
over reduced frame rates of video processing. Depending on several parameters,
the authors come to the conclusion that 10 fps results in 80% expected watchability,
while 15 fps or higher results in 90% or higher. They also report several comparisons
between 5 to 10 fps and 10 to 15 fps, with the interesting conclusion that, under some
conditions, 80% watchability can be achieved even at 5 fps.
Given the expected pupils’ age, we concluded that 10 fps is the lower bound to
avoid any emotional discomfort of pupils using LIFE technology for long periods.
According to Table 2, secure media streaming is feasible for all video resolutions
at 10 and 15 fps, while above 15 fps the maximum feasible resolution decreases.
At 30 fps, secure media streaming is feasible up to 800x592 (given the granularity
of resolutions tested).
The video resolution settings allow us to increase the quality of video streaming
and maximise the immersive effect of resized frames on the pupil’s movements. The
higher the resolution, the more space is available to move the resized window
without using the trackerpod to change the camera angle. The drawback of this
setting is the need to transport a much larger amount of data, which implies
higher bandwidth and probably more network delay.
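The growth in data volume with resolution and frame rate can be made concrete with a small calculation. The sketch below gives only an uncompressed upper bound under the assumption of 24-bit colour (3 bytes per pixel); the actual LIFE stream is presumably compressed, so real figures will be far lower, and the function name raw_rate_mbit is hypothetical.

```python
def raw_rate_mbit(width: int, height: int, fps: int, bytes_per_pixel: int = 3) -> float:
    """Uncompressed video data rate in Mbit/s (assumes 24-bit colour by default)."""
    return width * height * bytes_per_pixel * 8 * fps / 1e6


# Resolutions and frame rates mentioned in the text.
for width, height, fps in [(1280, 720, 10), (1280, 960, 10), (800, 592, 30)]:
    print(f"{width}x{height} @ {fps} fps: {raw_rate_mbit(width, height, fps):.1f} Mbit/s")
```

Under this assumption, moving from 1280x720 to 1280x960 at the same frame rate raises the raw data rate by one third, because the data rate scales linearly with the number of pixels.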
We conclude that the optimal settings for LIFE video streaming are 1280x720
pixels at 10 fps. The results show that, even without a high-bandwidth connection,
the LIFE application is still able to provide a good quality of immersive view experience
for the defined optimal settings, taking into account potential network delay. We
argue that the higher resolution of 1280x960 only gives more vertical space for the
resized frame, and the vertical axis is the less important one (in terms of use) for pupils.
6 Future Work
Future work includes the development of a calibration module for LIFE able to
adjust the video resolution and fps parameters to optimal values for a given user,
taking into account the quality of the network connection and the perception of the
user. From a security point of view, the reference architecture can be enriched by
supporting the DESEOS security pattern framework [10] in order to flexibly address
security solutions that may be adopted in different application domains. Another
line of future work is the integration of the LIFE application as an AAL
service, which can simplify several communication issues by relying on the facilities of the underlying platform. Several projects currently offer a platform
for AAL services, such as [28, 29].
Finally, future work will focus on refining and formalizing the command
format and communication protocol of LIFE to support enriched control of remote
media spaces with heterogeneous image sources.
Acknowledgements
This work is supported by the project DESEOS (TIC-4257), Dispositivos Electrónicos
Seguros para la Educación, Ocio y Socialización (meaning “secure electronic devices for education, entertainment and socialization”), funded by the government
of Andalucía.
References
1. Douglas C. Dowden, Richard D. Gitlin, and Robert L. Martin. Next-generation networks.
Bell Labs Technical Journal, 3(4):3–14, August 2002.
2. J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. SIP: Session Initiation Protocol. Technical report, RFC 3261, 2002.
3. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. RFC 3550, July 2003.
4. Pablo Antón, Antonio Maña, Antonio Muñoz, and Hristo Koshutanski. Live Interactive
Frame Technology Alleviating Children Stress and Isolation during Hospitalization. 3rd
International Workshop on Ambient Assisted Living (IWAAL 2011), pp.92–100, Málaga,
Spain 2011.
5. Ricardo Costa, Davide Carneiro, Paulo Novais, Luís Lima, José Machado, Alberto Marques, and José Neves. Ambient Assisted Living. 3rd Symposium of Ubiquitous Computing
and Ambient Intelligence, vol. 51 of Advances in Soft Computing, pages 86–94, 2009.
6. Maryanne Lockin. The redefinition of failure to thrive from a case study perspective.
Pediatric nursing, 31(6):474–479, 2005.
7. A Muñoz Hoyos. Influencia de la institucionalización sobre el crecimiento, desarrollo y
comportamiento en el niño. Part of the course: Principales problemas sociales en la infancia. Educación y cuidados. Escuela Universitaria de Ciencias de la Salud. Granada,
1996.
8. Jim Giles. Inside the race to hack the Kinect. The New Scientist, 208(2789):22–23, 2010.
9. Mark Ingebretsen. In the News. IEEE Intelligent Systems, 25(4):4–8, July 2010.
10. Pablo Antón, Antonio Muñoz, Antonio Maña, and Hristo Koshutanski. Security-enhanced
ambient assisted living supporting school activities during hospitalisation. Journal of
Ambient Intelligence and Humanized Computing, 3(3):177–192, December 2012.
11. D. Marples and P. Kriens. The Open Services Gateway Initiative: an introductory
overview. IEEE Communications Magazine, 39(12):110–114, 2001.
12. DR Kuhn, WT Polk, VC Hu, and SJ Chang. Introduction to public key technology and
the federal PKI infrastructure. Technical Report February, 2001.
13. M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. The Secure Real-time
Transport Protocol (SRTP). RFC 3711, Network
Working Group, 2004.
14. B. Amos and M. Wang. Stereo television viewing for remote handling in hostile environments. Conf. Remote Syst. Technol., Proc.; (United States), 26, January 1978.
15. M. M. Clarke. Remote Systems: Some Human Factors Issues in Teleoperator and Robot
Development: An Interactive Session. Proceedings of the Human Factors and Ergonomics
Society Annual Meeting, 26(9):763–765, October 1982.
16. C. J. Overbeeke and M.H. Stratmann. Space through movement. A method for three-dimensional image presentation. PhD thesis, Technische Universiteit Delft, 1988.
17. William W Gaver, Gerda Smets, and Kees Overbeeke. A Virtual Window on Media Space.
In CHI, pages 257–264, 1995.
18. Johnny Chung Lee. Hacking the Nintendo Wii Remote. IEEE Pervasive Computing,
7(3):39–45, July 2008.
19. Wei Chen, Chien-chou Shih, and Lain-jinn Hwang. The Development and Applications
of the Remote Real-Time Video Surveillance System. Tamkang Journal of Science and
Engineering, 13(2):215–225, 2010.
20. H. Schulzrinne, A. Rao, and R. Lanphier. Real Time
Streaming Protocol (RTSP). RFC 2326, 1998.
21. Gu Yingjie, Francesca Piccolo, Shihui Duan, Yunfei Zhang, and Ning Zong. Survey of P2P Streaming Applications. Technical report, February 2013.
http://tools.ietf.org/html/draft-ietf-ppsp-survey-04.
22. P Widener, K Schwan, and F E Bustamante. Differential data protection for dynamic distributed applications. In Computer Security Applications Conference, 2003. Proceedings.
19th Annual, pages 396–405, 2003.
23. Rongtao Liao, Shengsheng Yu, and Jing Yu. SecureCast: A Secure Media Streaming
Scheme over Peer-to-Peer Networks. In Workshop on Intelligent Information Technology
Application (IITA 2007), pages 95–98. IEEE, December 2007.
24. Gabriela Gheorghe, Renato Lo Cigno, and Alberto Montresor. Security and privacy issues
in P2P streaming systems: A survey. Peer-to-Peer Networking and Applications, 4(2):75–
91, 2011.
25. Thang Dinh, Qian Yu, and Gerard Medioni. Real time tracking using an active pan-tilt-zoom network camera. In 2009 IEEE/RSJ International Conference on Intelligent Robots
and Systems, pages 3786–3793. IEEE, October 2009.
26. Yiming Li, Bir Bhanu, and Wei Lin. Auction protocol for camera active control. In 2010
IEEE International Conference on Image Processing, pages 4325–4328. IEEE, 2010.
27. R.T. Apteker, J.A. Fisher, V.S. Kisimov, and Hanoch Neishlos. Video acceptability and
frame rate. IEEE Multimedia, 2(3):32–40, 1995.
28. Mohammad-reza Tazari, Francesco Furfari, Juan-Pablo Lázaro Ramos, and Erina Ferro.
The PERSONA Service Platform for AAL Spaces. Handbook of Ambient Intelligence and
Smart Environments, pages 1171–1199. Springer US, Boston, MA, 2010.
29. Sten Hanke, Christopher Mayer, Oliver Hoeftberger, Henriette Boos, Reiner Wichert,
Mohammed-R. Tazari, Peter Wolf, and Francesco Furfari. universAAL An Open and
Consolidated AAL Platform. Ambient Assisted Living, pages 127–140. Springer Berlin
Heidelberg, 2011.