Audio Protection with Removable Watermarking
DEPARTMENT OF ELECTRICAL AND INFORMATION ENGINEERING
DEGREE PROGRAMME IN INFORMATION ENGINEERING

AUDIO PROTECTION WITH REMOVABLE WATERMARKING FOR MOBILE DISTRIBUTION

Author ___________________________________ Marko Brockman
Supervisor ___________________________________ Tapio Seppänen
Accepted _______/_______2009
Grade ___________________________________

Brockman M. (2009) Audio Protection with Removable Watermarking for Mobile Distribution. University of Oulu, Department of Electrical and Information Engineering, Oulu, Finland. Master’s thesis, 77 p.

ABSTRACT

The failure of encryption-based digital rights management systems and the growing popularity of online music stores have led to an increasing need for new technologies that can protect audio copyrights while the content is stored in an unprotected format. Digital watermarking can be used for creating solutions based on embedding inaudible identifiers, known as digital fingerprints, into digital audio. These fingerprints can be used for detecting the origin of the content if it is distributed illegally.

This work designed and implemented an audio protection system utilizing removable watermarking and fingerprinting technologies. An audible watermark was inserted into free preview samples, which were made available on the server. The user could download and listen to the samples and request a license from the server. The license enabled the client application to transform the audible watermark in the preview song into a unique inaudible fingerprint containing user identity information. The fingerprint watermark was tested against an extensive set of signal processing attacks, which made inaudible changes to the audio. The inaudibility of the watermark was also tested with a dozen test users. The fingerprint watermark was robust against all attacks, and average listeners could not tell the difference between watermarked and original versions in the listening tests.
The results show that digital watermarking and fingerprinting technologies can be used for creating a robust and imperceptible digital rights management system. The use of removable watermarking provides increased usability, especially in mobile audio distribution.

Keywords: Digital rights management, DRM, frequency hopping, robust watermarking, audio synchronization

Brockman M. (2009) Audion suojaaminen poistettavalla vesileimauksella mobiilijakelussa. University of Oulu, Department of Electrical and Information Engineering. Master’s thesis, 77 p.

TIIVISTELMÄ

The failure of encryption-based digital rights management techniques has created a growing need for new technologies that could protect the copyrights of digital music in an unencrypted format. Digital watermarking can be used to create solutions based on embedding inaudible identifiers, digital fingerprints, into digital music. These fingerprints can be used to identify the original owner of a music file if the content is distributed illegally.

In this work, an audio protection system utilizing removable watermarking and fingerprinting technology was designed and implemented. An audible watermark was first embedded into preview songs, which were made freely available on the server and which users could download to their own mobile devices. Users could listen to the preview versions and download a license from the server, with which the terminal device could transform the audible watermark in the song into a user-specific digital fingerprint. The robustness of the fingerprint watermark was tested against an extensive set of signal processing attacks that made inaudible changes to the songs. The inaudibility of the watermark was also tested with ten test users. The fingerprint watermark withstood all attacks, and an average listener could not distinguish the watermarked song from the original in the listening tests.

The results show that digital watermarking and fingerprinting technology can be used to create a robust and inaudible digital rights management system. The use of removable watermarking improves the usability of the system, especially in mobile music distribution.

Keywords: Digital rights management, DRM, frequency hopping, robust watermarking, audio synchronization

TABLE OF CONTENTS

ABSTRACT
TIIVISTELMÄ
TABLE OF CONTENTS
PREFACE
ABBREVIATIONS
1. INTRODUCTION
2. DIGITAL RIGHTS MANAGEMENT
  2.1. Background
  2.2. Rights models
  2.3. Rights Expression Languages
  2.4. DRM reference architecture
    2.4.1. The content server
    2.4.2. The license server
    2.4.3. The client
  2.5. Mobile DRM
    2.5.1. OMA DRM Version 1.0
    2.5.2. OMA DRM Version 2.0
  2.6. DRM in digital audio
    2.6.1. Audio CDs
    2.6.2. Online music stores
3. DIGITAL WATERMARKING FOR AUDIO
  3.1. Background
  3.2. Watermark characteristics
  3.3. Watermarking methods
    3.3.1. General watermarking scheme
    3.3.2. Watermarking domains
    3.3.3. Direct sequence spread spectrum method
    3.3.4. Frequency hopping method
  3.4. Digital watermarks in DRM
    3.4.1. Watermarking and encryption
    3.4.2. Digital fingerprinting
    3.4.3. Tamper detection
    3.4.4. Attacking digital watermarks
  3.5. Removable watermarking
  3.6. Commercial solutions
    3.6.1. MarkAny
    3.6.2. Verance
    3.6.3. Philips
    3.6.4. Alpha Tec
4. ALGORITHM FOR REMOVABLE WATERMARKING AND FINGERPRINTING
  4.1. Embedding
  4.2. Noise transform
  4.3. Reading the fingerprint
  4.4. Performance evaluation
    4.4.1. Robustness
    4.4.2. Imperceptibility
  4.5. Discussion
5. DESIGN OF ROBUST AUDIO PROTECTION SYSTEM
  5.1. General system description
  5.2. Use cases
    5.2.1. Downloading songs for preview
    5.2.2. Purchasing a license for a song
    5.2.3. Requirements specification
  5.3. System architecture
    5.3.1. Music store
    5.3.2. Client application
    5.3.3. Communications protocol
  5.4. Software design
    5.4.1. Client application
    5.4.2. Server application
    5.4.3. Sequence diagrams
6. SYSTEM IMPLEMENTATION AND TESTING
  6.1. Software platforms
  6.2. Limitations
  6.3. Functional tests
    6.3.1. Downloading a list of preview files
    6.3.2. Downloading a preview file
    6.3.3. Music file playback
    6.3.4. Requesting a license for a preview file
    6.3.5. Generating unique licenses
    6.3.6. Noise transform
    6.3.7. Maintaining the network connection
  6.4. Technical tests
    6.4.1. Preview file download time
    6.4.2. License file download time
    6.4.3. Noise transform processing time
    6.4.4. Multiple users support
    6.4.5. Server stability
    6.4.6. Client stability
  6.5. User tests
  6.6. Discussion
7. DISCUSSION
8. SUMMARY
9. REFERENCES
10. APPENDICES

PREFACE

This thesis was completed as part of the Zirion project at MediaTeam Oulu, a research group in the Information Processing Laboratory of the University of Oulu, Finland. The project focused on creating new value-adding services and content distribution channel prototypes that function in a real environment. The work started in January 2008 with algorithm development, followed by system design and implementation in the summer. This thesis document was written during the fall.

I would like to acknowledge Professor Tapio Seppänen for giving essential advice during the writing of this thesis, and digital watermarking researchers Mikko Löytynoja, Anja Keskinarkaus and Anu Pramila for a fun and encouraging work atmosphere during my years of DRM and watermarking research in MediaTeam in 2004-2008. I would also like to thank Professor Mika Ylianttila for improvement suggestions during the reviewing process and Dr. Pertti Väyrynen for proofreading the text. Special thanks to my family for constant support during my studies, and to my wife Tiia for all the love and sandwiches.
Oulu, December 5, 2008

Marko Brockman

ABBREVIATIONS

3G        3rd Generation mobile phone standard
AACS      Advanced Access Content System
AD/DA     Analog to Digital / Digital to Analog converter
CD        Compact Disc
CEK       Content Encryption Key
DFT       Discrete Fourier Transform
DRM       Digital Rights Management
DVD       Digital Versatile Disc
FFT       Fast Fourier Transform
FP1       Feature Pack 1
HAS       Human Auditory System
HD DVD    High-Definition Digital Versatile Disc
HTTP      Hypertext Transfer Protocol
ICICS     International Conference on Information and Communications Security
ID        Identification
IFFT      Inverse Fast Fourier Transform
IP        Internet Protocol
IT        Information Technology
ITU       International Telecommunication Union
ITU-R     ITU / Radiocommunication Sector
JND       Just Noticeable Difference
JPEG      Joint Photographic Experts Group
LAME      LAME Ain’t an MP3 Encoder
LCG       Linear Congruential Generator
MOS       Mean Opinion Score
MP3       MPEG-1 Audio Layer 3
MPEG      Moving Picture Experts Group
ODRL      Open Digital Rights Language
OMA       Open Mobile Alliance
REL       Rights Expression Language
S60       Symbian S60 platform
SDK       Software Development Kit
SDMI      Secure Digital Music Initiative
TCP       Transmission Control Protocol
UML       Unified Modeling Language
URL       Uniform Resource Locator
USB       Universal Serial Bus
VCMS/A    Verance Copy Management System for Audio content
WAV       Waveform Audio Format
XML       eXtensible Markup Language
XrML      eXtensible Rights Markup Language

1. INTRODUCTION

The future of music sales is online and mobile. According to an eMarketer report, sales of music on physical media are going to drop by almost two thirds in just five years, from 2006 to 2011. Online and mobile sales are predicted to be the major sales channel, with a share of 56.5 percent of total music sales worldwide. [1] One of the enablers for online and mobile music has been digital rights management. It provides the means for protecting content ownership and copyrights by restricting unauthorized distribution and usage.
However, traditional DRM solutions have proved controversial. Different techniques were tried for preventing the copying of audio CDs, but they caused compatibility problems with so many players that DRM is no longer used in audio CD distribution. In mobile music, there are separate groups of music player manufacturers and online music retailers using different DRM techniques, which are not interoperable. This is not an ideal situation from the consumer perspective, because DRM-protected music purchased from an online music store may be playable only on the digital audio players of one manufacturer.

The dominant digital music format is currently MPEG-1 Audio Layer 3, more commonly known as MP3. It is also the de facto standard encoding of music played on digital audio players. The problem with MP3 regarding mobile music distribution is that it does not support copy protection. This has caused online music retailers to use other, DRM-enabled proprietary audio formats. The aim is to make it difficult to use the music files in ways not specified and allowed by the record companies. Most of the current encryption-based solutions can be circumvented by burning the music to a CD and then ripping it back into an unprotected format such as MP3.

Digital watermarking can be used for creating a solution for the rights management problem of digital audio. The nature of watermarking allows the audio to be unencrypted, because the content protection is embedded into the audio signal itself. The use of an unprotected file format enables the music to be played on any digital audio player, and the music can also easily be burned to CD. This eliminates many of the attacks used on other DRM systems and allows better consumer satisfaction because of wider usability. The problem is, however, that digital watermarks can be vulnerable to signal processing attacks.
The watermarked signal can be modified so that the modification is inaudible to a human listener, but the watermark signal may be destroyed in the process. This is a major challenge for all watermarking applications, and one of the emphasis areas of this thesis is the robustness of the watermarking algorithm.

The goal of this thesis was to design and implement a robust audio protection system using removable watermarking and fingerprinting techniques. The emphasis of the design process was on the watermarking algorithms, which were developed and tested in the Matlab environment. The algorithms consisted of an improved version of the audio protection scheme presented in [2], which is also based on research done in the Zirion project in the MediaTeam Oulu research group. The most significant improvement presented in this thesis is a method where the audible watermark is transformed into an imperceptible digital fingerprint when the user purchases a license for the content and the audible noise signal is removed from the audio file. An algorithm for extracting the fingerprint from the audio was also developed. The new algorithm is also improved in terms of modularity and robustness.

A music store server and a client application were developed for performing the user tests. The server was implemented as a Java application on a Linux machine, and the client application on the Symbian S60 3rd Edition platform. The watermarking algorithms were implemented on the S60 platform to be used by the client application. The perceptual quality of the watermark was analyzed with a listening test by a dozen test users, and the robustness was tested against an extensive set of attacks on watermarked audio clips.

This thesis is divided into five main chapters. Chapter 2 explains the basic principles, components and concepts of digital rights management with a focus on digital audio.
Chapter 3 introduces digital audio watermarking and presents its major characteristics, methods and applications. Chapter 4 presents the watermarking algorithms which form the core part of the implemented software, and chapter 5 covers the design process of the audio protection system, which is the major contribution of this thesis. Chapter 6 briefly presents the implementation details and the results of user testing, and finally, chapter 7 includes the final discussion of the contribution of this thesis and analyzes the possibilities for future work.

2. DIGITAL RIGHTS MANAGEMENT

The term Digital Rights Management (DRM) refers to controlling and managing rights of digital intellectual property. Usually, the term is used when referring to managing rights of digital media content, such as images, video and audio, but in a broad view, it also includes software copy protection scenarios. This chapter introduces the basic concepts of DRM systems, with a special focus on the usage of DRM in managing copyrighted digital audio.

2.1. Background

Before the digital age, when copying information was not as easy as it is today, there was rarely a need for any special rights management systems. Instead, rights were tightly bound to the media format itself. If you bought a book, you were not allowed to create a copy of it, and even if you wanted to, it would not have been an easy task. Therefore, the book as a medium restricted its usage in the way the publisher wanted. The introduction of Compact Cassettes made copying music somewhat easier, but the quality of the new copy was never as good as the original. [3]

The birth of DRM has primarily been a consequence of digital formats and the Internet. The ability to distribute content easily and affordably between computers has allowed piracy to grow fast, and DRM has been created as a countermeasure by the content producing industry.
The first DRM generation focused primarily on security and encryption as a means of countering unauthorized copying. The content was encrypted, and only paying customers could unlock and use it. The next generation of DRM systems introduced a whole new range of capabilities, such as description, identification, trading, protecting, monitoring and tracking of rights usages. In other words, DRM started to include everything a person can do with media content. [4]

The environment of digital rights management is mainly characterized by three aspects: law, technology and business models. The rights which need to be managed exist because of law. For every original work which is made accessible through some medium of expression, copyright law assigns copyrights to the author. The author has the right to reproduce, modify, distribute, perform, or display the work publicly. These rights are essential to DRM. The other body of law important to DRM is contract law, because all licenses between the content provider and the consumer are basically contracts, which usually grant access to intellectual property for some monetary compensation. DRM would not exist without laws, and it is also the current response of the law to the new content usage scenarios that technology has made possible. From the legal point of view, DRM determines which usages are authorized and which are not. [3]

Technology is the second major enabler for DRM. Without the ability to represent and enforce rights models digitally, the whole concept would remain theoretical. However, technical implementations have their challenges, especially in the field of security. Studies have shown that every DRM system can be broken, and therefore the goal usually is to create security that is robust enough [5]. The DRM reference architecture, which is the technical base of most DRM solutions, is presented in section 2.4.

Business models are the driving force of DRM development.
DRM enables new ways to distribute content and opens up new business opportunities. This was discovered after the first DRM systems failed in making the online world work exactly like the offline world in terms of content distribution and rights management. The Internet was seen merely as a new medium, rather than a whole new world of business model possibilities. DRM works where it supports the business models and not the other way around. Examples of the new business models are paid downloads, pay-per-view and superdistribution [3]. Economic factors such as the market situation play a role when the content providers determine which business models to use for content distribution. These factors also have an effect on consumers and their willingness to start using new content delivery services.

The functionality of DRM can be split into two categories, as illustrated in Figure 1. DRM is about both managing digital rights and digital management of rights. The former applies in the sense that rights holders must identify their content, collect metadata about it and assert the rights they have to the content. In addition, the rights holders must develop business models for content distribution in order to benefit from their rights to the content. In other words, DRM enables management of digital rights for the rights holders. The other category, digital management of rights, concerns digitally enforcing the distribution and usage rules set by the rights holders. Most of the DRM functions fall into the enforcement category. [4]

Figure 1. The two parts of DRM.

2.2. Rights models

Rights have traditionally been divided into three categories: legal, transactional and implicit rights. Legal rights are the ones you get when you produce some kind of original work that falls under the copyright law categories. Legal rights may also be obtained through a legal procedure, such as applying for a patent. The second type of rights is transactional rights.
These are rights that you receive or give up because of some transaction, such as buying or selling. The third type is implicit rights. They are rights which are bound to the medium that the information is in. For example, a book allows the owner to read the book as many times as he wants, and also to sell or lend the book to someone else. [3]

The evolution of DRM has led to the introduction of rights models, which specify more accurately the types of rights the digital rights management system can use. The rights models are important when the business models related to digital media are designed. In the new approach, the rights are also divided into three fundamental categories: render, transport and derivative rights. Render rights are the rights to present the digital content on some output medium. For example, printing or viewing on a screen are render rights. The second type of rights is transport rights, which concern copying, moving or loaning the content. The third type, derivative rights, has to do with manipulating the original content in a way that produces a new work. The new content can be extracted or edited from the original, or the original content can be embedded into some new context, where a new work is created. [3]

2.3. Rights Expression Languages

Although rights can be expressed with simpler formats, several complex Rights Expression Languages (REL) have been developed for expressing the rights specifications. A REL is a formal language: unlike copyright law, it is not open to interpretation, but defines the rights as precisely as a programming language does. The most popular languages are Open Digital Rights Language (ODRL) and eXtensible Rights Markup Language (XrML). An overview of a Rights Expression Language is presented in Figure 2. [6]

Figure 2. Overview of a Rights Expression Language.

The ODRL is an XML-based language, which aims at developing an open standard for rights expressions.
It is managed by the ODRL Initiative, which supports open-source DRM projects implementing ODRL specifications [3]. The Open Mobile Alliance (OMA) adopted ODRL in its REL specification already in OMA DRM specification version 1.0 and continued to support it in versions 2.0 and 2.1. [7][8][9][10]

The XrML is also based on XML, and it has been selected as the basis for MPEG-21 REL. XrML is developed by ContentGuard, which has a range of patents covering its usage. License fees are applied if XrML is used in a context covered by the patents. [11]

2.4. DRM reference architecture

A system that enforces rights models is called a DRM system. Although the DRM system architecture depends heavily on the specific usage scenario, there are some common components, which are found in most systems. This common theme is called the DRM reference architecture. It consists of three major components: the content server, the license server and the client. The DRM reference architecture is illustrated in Figure 3.

Figure 3. The DRM reference architecture.

2.4.1. The content server

The content server includes a content database for all content files, and the functionality to prepare content for DRM-controlled distribution. In addition to the content itself, the database stores metadata about the content, such as title, author, format and price. For end users, the content server provides access to the DRM-enabled content downloads.

The content files are usually manipulated in some way in order to prepare them for controlled distribution when they are imported into the content repository. This is done by the content packager component of the content server. All files which are brought into the system by the content providers are first processed by the content packager and then placed into the content database for storage. Another important task of the content packager is the specification of the rights the content provider wants to grant to the user.
Separate rights can be specified for previewing purposes, and several purchasing options can be offered to the user. The content packager can be, for example, a web interface running on top of the server, providing database access for the content providers. An essential feature of the content packager is batch processing. As content providers generally add plenty of content in a single session, it must be possible to input multiple files with customizable rights models into the system.

2.4.2. The license server

Although the licensing system can be implemented in many ways, the license server in a typical DRM system creates licenses for each user from content rights, user identities and content encryption keys. The rights and possible encryption keys are provided by the content server, and the client provides information about the user identity. As the communications path between the license server and the client is usually insecure, the data transmissions must be protected with public-key cryptography. In small-scale systems, the content and license servers can be combined and run in a single process.

In addition to generating and transmitting licenses to the client, the license server is responsible for the financial transaction of the licensing process. The license server uses the identity of the user to fetch the necessary details concerning the transaction, such as credit card or account details. The identity of the user can be created from a username, social security number, or any other piece of information which accurately identifies the user.

2.4.3. The client

The DRM client-side application can reside on a variety of platforms depending on the usage scenario. The primary functionality of the client is contained in a DRM controller, which can either be an independent piece of software or be integrated into the content rendering application itself. In some solutions, the DRM controller is an external piece of dedicated hardware.
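As a rough illustration of the license generation described in section 2.4.2, the following minimal Python sketch combines content rights, a user identity and a content encryption key into one license record. The field names, the `issue_license` helper and the hashing of the identity are assumptions made for illustration only and are not taken from any DRM specification. The public-key protection of the transmission that the text calls for is deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class License:
    """Hypothetical license record; all field names are illustrative."""
    content_id: str        # which content item the license covers
    user_id: str           # identifier derived from the user's identity
    rights: dict           # rights granted by the content provider
    content_key_hex: str   # content encryption key, hex encoded


def derive_user_id(identity: str) -> str:
    # Assumption: the identity (e.g. a username) is hashed so that the
    # raw value does not have to be stored inside the license.
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()[:16]


def issue_license(content_id: str, identity: str,
                  rights: dict, content_key: bytes) -> License:
    # The rights and key would come from the content server,
    # the identity from the client.
    return License(content_id, derive_user_id(identity),
                   rights, content_key.hex())


def serialize(lic: License) -> str:
    # A real system would encrypt this payload before sending it.
    return json.dumps(asdict(lic), sort_keys=True)


if __name__ == "__main__":
    lic = issue_license("song-1", "alice", {"play": {"max_count": 5}},
                        b"\x01\x02")
    print(serialize(lic))
```

The same content and rights yield a different license for each user, because the user identifier is part of the record.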
The main functions of the controller are to gather identity information from the user, obtain licenses from the license server, authorize the rendering application to access the content and perform the possible content decryption. Additionally, the controller delivers the user’s commands to the license server for requesting licenses and checking the payment options. The DRM controller must support public-key cryptography for secure data transmission between the client and the license server.

The usage authorization scenarios depend on the rights models used for the content. The basic model authorizes the user to access the content as many times as desired for a single fee. Other models may grant or restrict access to the content for a limited time, depending on the selected payment option. Another possibility is to restrict the number of renderings with a counter-based solution. Securing the usage counter in the client device remains an implementation problem, especially in cases where the user is not required to be online when accessing the content. Trusted computing and hash-based solutions have been proposed for secure storage of the usage counter. [12]

There are two types of content rendering applications in the client: those with built-in DRM support, which can handle the content usage restrictions and license processing by themselves, and non-DRM applications, whose access to the content must be restricted by the DRM system. The main advantage of the applications with built-in DRM is security. The programs allow only specific functions which the content provider or the rights holder has allowed. However, the disadvantages are significant. The vendors must distribute the application to all content users, and in addition, the users must learn to use the new application. This can be an enormous burden for the users. A more common approach is to let the users use their existing rendering applications, and modify their behavior with plug-ins.
For example, the plug-in framework of Adobe Acrobat makes it possible to disable commands such as Print and Save As. The advantage of using plug-ins for DRM purposes is that the users usually have the base application installed, and installing an additional plug-in is clearly a smaller burden than installing a whole new application. However, making the plug-in frameworks as secure as individual applications is problematic.

2.5. Mobile DRM

The most important player in the Mobile DRM industry is the Open Mobile Alliance (OMA), which is a standards body developing open standards for the mobile phone industry. Its members include mobile phone manufacturers, mobile operators, application and content providers and other IT companies.

2.5.1. OMA DRM Version 1.0

OMA DRM 1.0 was the first industry standard method for protecting mobile content. It was approved in 2004, and it is currently supported in most of the mobile phones on the market. The goal of OMA DRM 1.0 is to follow common DRM practices while conforming to the special requirements and characteristics of the mobile domain, and to provide basic functionality with some level of security.

Version 1.0 provides three methods for content protection and delivery. The methods, forward-lock, combined delivery and separate delivery, are illustrated in Figure 4.

Figure 4. Content protection methods in OMA DRM 1.0.

The simplest and most widely supported method is forward-lock, where the content is wrapped in a DRM message and delivered to the mobile device. After that the user cannot send the message or the content to any other device. The send option is removed from all applications where it would normally be available, and the file cannot be copied over a USB or Bluetooth connection either. Because a forward-lock message does not contain any rights specifications, a set of default rights is applied to the media object.
The combined delivery method is similar to forward-lock except that the DRM message also contains the rights specifications. The rights object defines what the user can do with the content. For example, it can allow temporary access to the content or allow the content to be rendered a certain number of times. As with the forward-lock method, the content and the rights are wrapped in a DRM message.

The separate delivery method provides more security by separating the content and the rights. The content is encrypted with symmetric encryption, which makes the content object useless for parties without the Content Encryption Key (CEK). This allows distribution of the content via insecure transport methods. The rights and the encryption key are wrapped in a license object, which must be delivered through a secure transport channel. Unlike in forward-lock and combined delivery, the content object can be delivered freely without compromising the business model behind the rights. Superdistribution, which means distributing the content objects directly between users, is part of the separate delivery method, and it should even be encouraged by the content provider because it may bring additional customers to the system. The users who have received new content can unlock it by acquiring the corresponding license.

2.5.2. OMA DRM Version 2.0

In the first DRM revision OMA focused on the fundamental building blocks of a DRM system, but the security level was not high enough for creating a robust system. The new OMA DRM 2.0 addressed these security issues with new features based on the separate delivery method. The new security model relies heavily on the DRM agent of the user device. The content itself is packaged in a similar secure container encrypted with a symmetric content encryption key, but in addition the model utilizes PKI (Public Key Infrastructure) certificates for increased security.
Every device with OMA DRM 2.0 support has an individual PKI certificate with a public and a private key. Every rights object is then encrypted with the public key of the receiver before it is sent over the network. The rights object contains the symmetric key that is used to decrypt the actual content files. The devices must be registered with the rights issuer before they can receive rights objects. During the registration the client certificate is validated against a blacklist of known hacked devices. This method makes it possible to deny rights objects to non-trusted devices.

2.6. DRM in digital audio

Although digital audio formats have been around for several decades, the record companies did not start using DRM technologies with digital audio until 2002, when BMG introduced a copy protection DRM system to be used with audio CDs. The system failed badly as users reported that the CDs would not play on PCs or car CD players [13]. The introduction was mainly due to the popularity of the peer-to-peer file sharing program Napster between 1999 and 2001, which forced the record industry to start taking the threat of Internet piracy seriously. After that the use of DRM spread to most major record labels, but the current trend seems to be towards finding other solutions to the piracy issues of digital music. This section discusses the current state of DRM technologies in audio CDs and online music stores.

2.6.1. Audio CDs

DRM technologies were previously used in digital audio CDs, but major publishers have since abandoned the technology and CDs with DRM are no longer published. The last publisher to give up DRM on audio CDs was EMI in January 2007. [14]

The goal of DRM in audio CDs was to prevent unauthorized copying. This was attempted by shuffling the audio content in such a way that ripping the audio into non-DRM formats such as WAV or MP3 would not succeed.
The CDs contained a dedicated piece of DRM software to achieve this, and often some bonus data content intended for computer use was also included on the CD.

The DRM software caused many problems among legitimate users. The reason was that the discs with installed DRM software were not standard-compliant Compact Discs but CD-ROM media discs. This rendered the CDs unplayable on some CD players and computers. Some DRM software included security vulnerabilities that exposed the users’ computers to exploitation. The most famous incident was in 2005, when the Sony BMG DRM software was discovered to automatically install a rootkit on any PC where the audio CD was inserted. Sony BMG admitted the mistake and agreed to recall CDs with the security problem from stores and publish uninstallers for computers where the rootkit had already been installed. [15]

2.6.2. Online music stores

Digital Rights Management has become a common component of online music stores, where it aims at restricting the usage of purchased music. Most of the dominant music stores on the market apply some kind of DRM, but some have decided to offer music both with and without DRM, and others have abandoned DRM completely and sell only unprotected music files. Currently, there is a clear trend towards selling music without DRM, because music stores have noticed that consumers are more willing to pay for wider usability than for music with usage restrictions imposed through DRM.

The dominant player in the online music store market is the iTunes Store. In June 2008, iTunes had a 70 percent market share in online music sales [16]. It uses Apple’s FairPlay DRM technology, which is integrated into the iTunes application used for shopping and managing songs purchased from the iTunes Store. The songs purchased from iTunes can only be played on a computer with iTunes installed or on Apple’s portable media player, the iPod. Other MP3 devices or applications do not support audio files with FairPlay DRM.
Currently, FairPlay allows users to access their music files from five computers and create a maximum of seven CD copies of any playlist containing tracks purchased from iTunes. FairPlay DRM technology can be circumvented by burning the music to a CD and then ripping it back into any non-DRM format.

In April 2007, Apple and the record label EMI announced an option for customers to purchase DRM-free music from iTunes. This came after the CEO of Apple, Steve Jobs, published an article in February 2007 where he disputed the benefits of DRM and called on record labels to allow Apple to sell music without DRM. Jobs explained that since only 3 percent of the music on an average iPod was protected with DRM, its significance for the music industry would be negligible. However, the songs purchased from iTunes without FairPlay include the purchaser’s name and other identifying information. [17] [18]

Another widely used DRM platform in online music stores is the Windows Media DRM system by Microsoft. It was part of the PlaysForSure certification, which has recently been rebranded as Certified for Windows Vista. PlaysForSure was created to challenge FairPlay and to create the de facto standard for music stores other than iTunes. It has largely achieved this, but it has remained in a challenger position because of the success of Apple’s iPod devices, which are incompatible with DRM music purchased from PlaysForSure stores [19]. However, Nokia chose PlaysForSure DRM for its upcoming Comes With Music service, which is planned to launch during 2008. It allows users to download an unlimited amount of music with their mobile phones for a period of one year after purchasing the device [20]. Other large online music stores using PlaysForSure DRM are Napster and Wal-Mart Music Downloads.

Some online music stores, such as eMusic and Amazon, have decided to sell all their music without DRM software restrictions.
This allows music purchased from their stores to be played on any digital audio player, which is a clear advantage over FairPlay or PlaysForSure. Some stores claim that DRM is not beneficial for sales, and encourage publishers and independent music labels to allow distributing their music without DRM restrictions. One example is the German online music store Musicload, which announced in March 2007 that three out of four customer support calls were due to problems caused by malfunctioning DRM systems. [21]

3. DIGITAL WATERMARKING FOR AUDIO

Digital watermarking is a process where information is embedded into a digital host signal, which can be video, audio, or an image. The watermark can be visible or invisible depending on the application. The term watermark derives from traditional paper watermarking, where a visible mark was inserted on paper for authentication purposes. This chapter presents the main characteristics, methods and applications of digital watermarking with a special focus on audio watermarking. The use of watermarking for DRM purposes is also discussed.

3.1. Background

The history of information hiding, or steganography, can be traced back 4,000 years to ancient Egypt, where information was hidden in small adjustments of characters. Later, the Greek storyteller Herodotus explained in his Histories that wax was often used to cover a message on a wooden panel to send secret messages [22]. Since then, numerous methods have been developed for hiding information, ranging from changing the heights of letter-strokes in a cover text to microdots and invisible ink. The World Wars in particular spurred research into reliable and secure ways of delivering secret messages.

The advantage of steganography over cryptography is that because the message is embedded in a cover signal, a casual observer may not even notice there is a hidden message in it.
A plain cryptographically encoded message will always attract attention, because it is clear that it contains some kind of valuable information worth encrypting. While cryptography is about protecting the content of messages, steganography is about concealing their very existence. Therefore, cryptography and steganography are usually used in combination to ensure the security of the message.

Traditional watermarking relates to the invention of papermaking in China, but it did not come into broad use until the 18th century in America and Europe, where it was used as an authentication method for books and money and also for recording manufacturing dates [22]. Nowadays, paper watermarks are used mainly for proving originality and complicating illegal reproduction of important documents and banknotes.

The introduction of digital media formats opened many new possibilities for data hiding. The quality of digital signals is higher than that of their analog counterparts, and copying can be done without losing signal fidelity. Digital video, audio or images can also be easily transmitted over information networks. These advantages made it possible to hide information in a digital signal in a way that is statistically and perceptually undetectable. In some cases, the hidden information can be recovered even if the digital information is compressed, edited or converted from digital to analog format and back. [23]

The first digital watermarking publications date back to the 1980s, but a notable increase in research projects occurred in the late 1990s, when the number of publications related to digital watermarking increased to over 100 a year. The increase in publications led to the first academic conference on information hiding, which was organized in 1996. This was due to the concern of the publishing industries over copyright issues because of the easy copying of digital material the new technology had enabled.
[24]

The primary focus of digital watermarking research has always been watermarking of digital images. The first papers on digital audio watermarking were published in 1999, and several embedding and extraction methods have been developed since then. The main feature of all algorithms developed for audio watermarking is that they take advantage of the human auditory system (HAS) to improve the imperceptibility of the watermarks. Compared to the human visual system, the HAS is more sensitive to dynamic changes. This is a major challenge for audio watermarking, because inaudibility is often a requirement for audio watermarking applications. [25]

The applications of digital watermarking generally concentrate on the protection of ownership rights of digital video, audio or images. Typical applications include digital signatures, fingerprinting, broadcast monitoring, authentication, copy control and secret communication. Digital signatures and fingerprints can be used for identifying the owner and the consumer of the content. Broadcast monitoring relates to tracking the appearance of distributed material in television or radio broadcasts. Fragile watermarks can also be used for content authentication to make sure the content has not been altered from the original version. This type of watermark is designed so that it is destroyed if the content is modified. Another application is copy protection, where the embedded information contains the rules for content usage and distribution. Secret communication applications resemble the classical steganography scenarios except that the communication channel is a watermark embedded in a digital signal. [22]

3.2. Watermark characteristics

Digital watermarks have three important characteristics that are determined by the type of application: capacity, robustness and imperceptibility.
Capacity is the amount of data that can be embedded in the watermark, robustness is the ability of the watermark to resist modifications to the host signal, and imperceptibility means that the watermark cannot be detected in the host signal with human senses. These characteristics are partially mutually exclusive, which means that some of them can be emphasized only at the expense of the others. Trade-offs must be accepted for optimal performance. For example, a robust watermark cannot achieve both high capacity and imperceptibility. Figure 5 illustrates this compromise.

Figure 5. Trade-offs in digital watermarking.

Robustness is generally the most important watermark characteristic in copy protection scenarios. The watermark should be designed so that it is not possible to remove the watermark without a proper secret key. Robustness also means the ability of the watermark to resist modifications in the host signal. First of all, resistance to lossy compression such as JPEG or MPEG compression is a requirement for most watermark applications. In the case of value-adding watermarking, usually only robustness against unintentional attacks such as geometrical distortions, lossy compression and AD/DA transforms is required. This is because the user would not gain any benefit from deliberately destroying such a watermark. [26]

Imperceptibility is also an important factor in most watermarking applications. It is affected by the embedding method and especially the embedding strength. Various methods can be applied for finding the optimal embedding strength, which is at a threshold where the watermark is as robust as possible but still unnoticeable. For example, the Just Noticeable Difference (JND) method can be applied for finding the threshold, and instead of being embedded at constant strength, the watermark signal can be dynamically scaled so that it is just below the JND level.
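The dynamic scaling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-sample threshold list `jnd_threshold` is assumed to come from some perceptual model that is not shown here, the watermark is assumed to be a sequence of +1/-1 chips, and the function name `scale_below_jnd` and the `margin` parameter are hypothetical.

```python
def scale_below_jnd(chips, jnd_threshold, margin=0.9):
    """Scale +1/-1 watermark chips so each stays just below its local JND level.

    chips         -- watermark chips, each +1 or -1 (assumption)
    jnd_threshold -- largest imperceptible amplitude per sample, taken as
                     given by a perceptual model (hypothetical input)
    margin        -- fraction of the threshold to use, 0 < margin < 1
    """
    if len(chips) != len(jnd_threshold):
        raise ValueError("chips and jnd_threshold must have the same length")
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must be strictly between 0 and 1")
    # Loud passages (high threshold) receive a stronger watermark,
    # quiet passages (low threshold) a weaker one.
    return [margin * t * c for c, t in zip(chips, jnd_threshold)]


thresholds = [0.2, 0.4, 0.1]
scaled = scale_below_jnd([1, -1, 1], thresholds)
print(scaled)
```

Compared with a constant embedding strength, every scaled sample here has a magnitude of `margin` times its local threshold, so the watermark uses most of the available headroom without reaching the assumed JND level.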
The JND function is very complex in reality, but depending on the signal type it can be modeled with different techniques. [27]

The capacity of the watermark can usually be increased directly at the cost of robustness. For example, if there are multiple channels of information in the watermark, the same information can be embedded in all of them for increased robustness, or all channels can be used for different information for increased capacity. As with robustness and imperceptibility, the capacity requirement depends on the watermark application. A simple allow-or-not copy protection can be achieved with just one bit, whereas complex DRM applications may require hundreds of bits for accurate rights model descriptions, user fingerprints and digital signatures.

Other digital watermarking characteristics include algorithm complexity and performance. Generally, the more complicated the algorithm is, the harder it is to break the system. On the other hand, large and complex systems are harder to manage and they are more prone to programming errors. Complex algorithms also lead to increased processing time when embedding or extracting the watermark. This can be an issue in DRM systems where performance is vital for the system to function properly. For example, watermark extraction must not cause delays when playing a video or an audio file.

3.3. Watermarking methods

This section presents the general watermarking scheme and introduces the most common domains where digital watermarks are embedded. Each watermarking method consists of an embedding and an extraction algorithm. The embedding algorithm inserts the watermark into the content data and the extraction algorithm reads the watermark information from the data. However, in some applications just verifying the existence of the watermark is required.

3.3.1. General watermarking scheme

The general watermarking scheme introduces the basic functional principles of a digital watermarking system.
The scheme consists of a watermark embedding process and a watermark extraction process. The most important component of both processes is the algorithm, which handles the system inputs and produces the output of the process. The algorithm details are discussed later in section 3.3, where the different watermarking methods are introduced.

The basic framework of the watermark embedding process consists of the embedding algorithm, the original data contents and the watermark information as input, and the produced watermarked data contents as output. An optional watermarking key can be utilized in the embedding process depending on the algorithm details. The purpose of the key is to increase the security of the system so that the security does not depend on the algorithm being secret. The watermark information can be in any digital format the algorithm understands, but usually a bit array is used because of the small capacity of the watermark. The input data can be a video, audio or image signal, and the output data is generally in the same format as the input, but with the watermark information inserted in the data contents. The general embedding process is illustrated in Figure 6.

Figure 6. A general embedding scheme of digital watermarking.

The watermark extraction framework contains the extraction algorithm, the watermarked data contents as input and the extracted watermark information as output. The output of the extraction process can be a bit array similar to the one used in the embedding phase, or simply an indication of whether the watermark was found or not. The general watermark extraction scheme is presented in Figure 7.

Figure 7. A general extraction scheme of digital watermarking.

If the optional watermarking key was used in the embedding phase, it is usually required in the extraction phase as well. Furthermore, if the original data contents are used in the extraction process, the term informed, non-blind or private watermarking is used.
This means that the extraction process takes advantage of the original content while extracting. Depending on the implementation of the algorithm, informed extraction techniques can greatly facilitate the watermark extraction. The simplest informed extraction method is to subtract the original data from the watermarked data, so that the remaining data contains only the watermark, provided that the watermarked data has not been modified after the embedding process.

If the original data is not required in the extraction process, the term blind or public watermarking is used. This is easier to manage because the presence of the original data can be a difficult requirement in some extraction schemes. The blind watermarking scenario is more challenging in terms of algorithm development, but it also allows a wider range of possibilities for application development. The term semi-private watermarking is also sometimes used. Unlike private watermarking, it does not use the original cover signal in the detection process, but instead a different published watermarked signal. [28]

3.3.2. Watermarking domains

Watermarks can be embedded in audio in the time domain or in some transform domain, such as the Fourier domain. The selection of the domain affects the properties of the watermark concerning imperceptibility and robustness. Frequency domain watermarks are generally considered more inaudible, but they are especially vulnerable to frequency modifications such as pitch shifting or dynamic compression. Time domain watermarking techniques generally use spread spectrum based watermarking. Other domains used for audio watermarking are the wavelet domain and the cepstrum domain, which is basically the Fourier transform of the decibel spectrum of the signal. [29]

Watermarking in the Fourier domain is based on the Fourier transform of the signal.
It is one of the most important tools of modern digital signal processing, although the Fourier series was introduced as early as the early 19th century by Joseph Fourier. By using the Fourier transform, the signal can be presented in the frequency domain, where its frequency components are easily modifiable. The discrete Fourier transform (DFT) X(k) of a discrete signal x(n) of length N is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i k n / N}, \tag{1}$$

where k = 0, …, N-1. In actual implementations, the DFT is never calculated with formula (1), but rather using an efficient Fast Fourier Transform (FFT) algorithm. Frequency domain watermarking takes advantage of the insensitivity of the human auditory system to phase variations. Frequency domain methods can also benefit from techniques similar to those used in audio compression, such as psychoacoustic models. [30]

Embedding a watermark in the Fourier domain basically means modifying the frequency coefficients of the Fourier transformed signal. The embedding can be done in individual coefficients, or spread spectrum or frequency hopping techniques can be utilized. The frequency hopping method is presented in section 3.3.4. One possibility is to modify the coefficient magnitudes by a specified number of decibels (dB). The magnitudes of the DFT coefficients are described as

$$|X(k)| = \sqrt{\left(X_{Re}(k)\right)^2 + \left(X_{Im}(k)\right)^2}, \tag{2}$$

where X_Re is the real part and X_Im is the imaginary part of the DFT. The coefficient magnitudes can be converted into the decibel domain with the formula

$$X_{dB}(k) = 10 \times \log_{10}\left(|X(k)|\right), \tag{3}$$

where X_dB(k) contains the coefficient magnitudes in the decibel domain. The advantage of using the decibel domain is that handling large differences of watermark intensities is simpler. This technique is utilized in [2] and [29].

3.3.3. Direct sequence spread spectrum method

Spread spectrum watermarking means that the power of the watermark information is deliberately spread wider in the frequency domain in order to hide the signal more efficiently in the cover signal. Currently, spread spectrum methods are the most popular watermarking methods in the literature. Two types of spread spectrum methods are generally used in digital watermarking: frequency hopping and direct sequence spread spectrum methods. The frequency hopping method is based on fast switching of the carrier frequency according to a pseudorandom sequence, which must be known both in the embedding and extraction phases. The direct sequence method spreads the watermark signal into a wider band signal, also created from a pseudorandom sequence. [31]

In direct sequence spread spectrum watermarking, the watermark signal constructed from pseudorandom sequences can be added to the cover signal by simply adding or subtracting the samples. As the pseudorandom sequence is generally much shorter than the host signal, the sequence is repeated for every block of the host signal. One possible method is to add the pseudorandom signal to the block if the bit to be embedded is one, and subtract it if the bit is zero. This method is illustrated in Figure 8. This kind of approach keeps the computational complexity of the embedding algorithm very low, which facilitates real-time usage.

Figure 8. An example of embedding a direct sequence spread spectrum watermark.

The embedded information can be extracted from a direct sequence spread spectrum watermark by calculating the cross-correlation between the original pseudorandom sequence and the watermarked signal block by block. If the pseudorandom sequence has been embedded with a large enough scaling factor, the cross-correlation will show a spike at the middle position of the sequence. The spike will be positive if the embedded bit was one, and negative if the bit was zero.
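The add-or-subtract embedding and the correlation-based extraction described above can be sketched as follows. This is a minimal illustration and not the implementation used in this work: the function names, the block length of 1024 samples, the scaling factor `alpha` and the seeded `random.Random` generator standing in for the pseudorandom sequence are all assumptions. The blocks are assumed to be perfectly aligned, so only the zero-lag correlation value is computed, which corresponds to the middle position of a full cross-correlation.

```python
import random


def pn_sequence(length, key):
    """Pseudorandom +1/-1 chip sequence; the integer key acts as the shared secret."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(host, bits, pn, alpha):
    """Add alpha*pn to a block for bit one, subtract it for bit zero."""
    block = len(pn)
    if len(host) < block * len(bits):
        raise ValueError("host signal is too short for the payload")
    marked = list(host)
    for b, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for i in range(block):
            marked[b * block + i] += sign * alpha * pn[i]
    return marked


def extract(signal, n_bits, pn):
    """Correlate each aligned block with pn; the sign of the result gives the bit."""
    block = len(pn)
    bits = []
    for b in range(n_bits):
        segment = signal[b * block:(b + 1) * block]
        correlation = sum(s * p for s, p in zip(segment, pn))
        bits.append(1 if correlation > 0 else 0)
    return bits


# Demo with Gaussian noise standing in for the host audio (assumption).
noise = random.Random(7)
host = [noise.gauss(0.0, 1.0) for _ in range(4 * 1024)]
pn = pn_sequence(1024, key=1234)
payload = [1, 0, 0, 1]
marked = embed(host, payload, pn, alpha=0.5)
print(extract(marked, len(payload), pn))
```

In this sketch the watermark contributes roughly `alpha` times the block length to the correlation, while the host contributes only a noise-like term that grows with the square root of the block length. This is why longer sequences or larger scaling factors make the spike easier to detect, at the cost of capacity or audibility respectively.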
Figure 9 illustrates the extraction process of direct sequence spread spectrum watermarking.

Figure 9. An example of extracting a direct sequence spread spectrum watermark.

In addition to fast computation, the advantage of the direct sequence method is fairly good robustness against different signal processing attacks. The downside is, however, that the watermark signal becomes audible relatively easily if the power of the watermark signal is increased too much. To achieve maximum inaudibility and robustness for the spread spectrum watermark, several methods have been suggested for analyzing the cover audio. Such methods include using psychoacoustic models for achieving perceptual transparency after embedding and a whitening procedure to improve the correlation in the extraction phase in [33], and temporal masking and m-sequences to increase correlation strength in [32].

Another important use of direct sequence spread spectrum methods in audio watermarking is synchronization. It is a procedure for determining the exact location of the watermark in the extraction process. Finding the location is important, because generally the watermark is embedded block by block in the audio, starting from some specified position. The synchronization can be performed either by inserting the synchronization signal once at the beginning of the block sequence or at the beginning of each block. The former method is faster, but the latter provides more robustness due to the individual synchronization of each block.

The synchronization signal is usually a pseudorandom spread spectrum signal similar to the one used in the direct sequence methods, except that the synchronization signal can be much longer. In the extraction process, the synchronization point is found by calculating the cross-correlation of the original synchronization signal and the watermarked signal.
The spike in the cross-correlation result determines the synchronization offset point, to which the signal must be shifted before starting the extraction of the actual watermark. Direct sequence spread spectrum watermarks have the natural feature of synchronization, but a separate synchronization signal can be used for increasing the robustness of the watermark. This is because the pseudorandom sequence for the watermark data is usually much shorter than the synchronization sequence. Separate synchronization signals must be used if the watermark is embedded with the frequency hopping method.

3.3.4. Frequency hopping method

The frequency hopping method is very different in nature from the direct sequence method. Instead of being a wide band signal, the frequency hopping watermark is present in very narrow bands at any given time. The frequency of the signal changes rapidly over time according to a pre-defined pseudorandom sequence. The frequency hopping band defines the limits for the hopping sequence. The pseudorandom sequence defining the frequency hopping sequence can be used as the watermark key for securing the exact location of the watermark signal in the frequency coefficients.

An example of the frequency hopping method is presented in [29]. It divides the host audio into blocks of 1024 FFT coefficients and selects two coefficients according to the pseudorandom frequency hopping sequence. The method changes the values of these coefficients relative to the subband mean, which is calculated from the coefficients around the two coefficients. If bit one is embedded, the magnitude of the lower coefficient is set K decibels higher and that of the higher coefficient K decibels lower. If bit zero is embedded, the procedure is the opposite. This method is illustrated in Figure 10. The watermark strength is directly determined by the chosen value of K.
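A simplified sketch of this kind of embedding is given below, using the DFT of formula (1) and the decibel conversion of formula (3). It is not the algorithm of [29]: the masking threshold and the attack characterization are left out, "lower" and "higher" are interpreted as the lower and higher frequency index, the subband mean is taken over the decibel magnitudes of a few neighbouring bins, and all names and parameters (`hop_pair`, `gap`, `half_width`, the block length of 128) are assumptions made for illustration. A naive DFT is used to keep the example self-contained; a real implementation would use an FFT.

```python
import cmath
import math
import random


def dft(x):
    """Naive DFT following formula (1)."""
    n_total = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_total)
                for n in range(n_total)) for k in range(n_total)]


def idft_real(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n_total = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_total)
                for k in range(n_total)).real / n_total for n in range(n_total)]


def to_db(coefficient):
    """Magnitude in decibels as in formula (3); a floor avoids log10(0)."""
    return 10.0 * math.log10(max(abs(coefficient), 1e-12))


def hop_pair(rng, band_lo, band_hi, gap=2):
    """Pick the two coefficient indices for one block inside the hopping band."""
    k_lo = rng.randrange(band_lo, band_hi - gap)
    return k_lo, k_lo + gap


def set_magnitude_db(spectrum, k, level_db):
    """Set |X(k)| to level_db, keep the phase, and mirror to bin N-k."""
    value = cmath.rect(10.0 ** (level_db / 10.0), cmath.phase(spectrum[k]))
    spectrum[k] = value
    spectrum[len(spectrum) - k] = value.conjugate()  # keeps the signal real


def embed_bit(block, bit, k_lo, k_hi, k_db, half_width=4):
    """Move the two selected bins K dB above and below the subband mean."""
    spectrum = dft(block)
    neighbours = [k for k in range(k_lo - half_width, k_hi + half_width + 1)
                  if k not in (k_lo, k_hi)]
    mean_db = sum(to_db(spectrum[k]) for k in neighbours) / len(neighbours)
    delta = k_db if bit else -k_db
    set_magnitude_db(spectrum, k_lo, mean_db + delta)
    set_magnitude_db(spectrum, k_hi, mean_db - delta)
    return idft_real(spectrum)


def extract_bit(block, k_lo, k_hi):
    """Bit one if the lower bin is stronger than the higher bin."""
    spectrum = dft(block)
    return 1 if to_db(spectrum[k_lo]) > to_db(spectrum[k_hi]) else 0


# Demo: the same key regenerates the hopping sequence at both ends.
BLOCK, KEY, PAYLOAD = 128, 99, [1, 0, 1, 1, 0]
noise = random.Random(3)
hops = random.Random(KEY)
blocks = [[noise.gauss(0.0, 1.0) for _ in range(BLOCK)] for _ in PAYLOAD]
pairs = [hop_pair(hops, 8, 56) for _ in PAYLOAD]
marked = [embed_bit(b, bit, lo, hi, k_db=3.0)
          for b, bit, (lo, hi) in zip(blocks, PAYLOAD, pairs)]
print([extract_bit(b, lo, hi) for b, (lo, hi) in zip(marked, pairs)])
```

Without further processing the two bins end up 2K decibels apart, so the extractor only needs to compare them; it does not need the original audio, which makes this a blind scheme. The hopping band limits (8 and 56 here) are chosen so that the neighbourhood never touches the DC or Nyquist bins.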
For the watermark to remain below the JND level, K therefore cannot be higher than the distance from the subband mean value to the frequency masking threshold. The presented method also includes attack characterization, which analyzes the host signal blocks with different signal processing methods in order to achieve maximum robustness against MPEG compression.

Figure 10. A frequency hopping method for embedding digital watermarks.

3.4. Digital watermarks in DRM

Digital watermarks can be used for protecting and managing the rights of digital audio content. The different nature of watermarking compared with traditional DRM solutions introduces new possibilities for creating successful DRM solutions. The following sections present the most important features and applications of watermarks in DRM.

3.4.1. Watermarking and encryption

The general approach of DRM in audio is to use some proprietary format where the content and the rights management metadata are encrypted. Before playback, the file must be decrypted with a DRM controller, as described in section 2.4.3. After decryption, the content is separated from the metadata, which causes a security risk. Audio capturing methods exploit this weakness, known as the analog hole, which opens at the latest when playback starts and the audio is converted to analog format. This enables capturing the audio without the DRM metadata. With watermarking, the metadata is always part of the content, and there is no analog hole problem. Even if the audio were captured at any point, the rights management metadata would still be present in the audio content. Figure 11 illustrates this difference between encryption and watermarking. [3]

A more efficient solution is achieved when encryption and watermarking are used concurrently. Encryption can be used with watermarking for creating a secure end-to-end communications channel. Generally, there are two basic ways of combining watermarking and encryption.
The first is to encrypt the watermark information, but leave the content file unencrypted. This protects the watermark so that even if an outsider were able to extract the metadata from the content, the information would still be encrypted. The other solution is to encrypt both the watermark and the content file. This provides the maximum security, but is also more complex in terms of calculation time when the metadata needs to be read prior to content playback. This approach also removes the benefit that the audio would be playable on all media players. Figure 12 presents an example DRM scenario where the audio file is encrypted after the watermark has been inserted. [3]

Figure 11. Comparison of encryption and watermarking in DRM.

Figure 12. Encryption and watermarking can be used together in order to create a more efficient DRM solution.

The goals of encryption and watermarking are very different and the choice of which technology to use must be considered with the specific application in mind. Encryption provides access control so that only authorized users with proper encryption keys can have access to the content. However, the protection provided by encryption is only in use when the content is encrypted. After decryption the content is as vulnerable as without encryption. This access control problem has proven to be problematic for the music industry, but it can be complemented with digital watermarking in order to create a more robust DRM solution. [34]

3.4.2. Digital fingerprinting

Digital fingerprinting is a technology for digital rights management where unique identifiers, known as digital fingerprints, are embedded in content before distribution. The technique is mostly used with digital audio and video, but in principle it can be applied to digital images as well. Conventional digital watermarking methods can be used in the fingerprint embedding process. These methods are discussed in more detail in section 3.3.
The purpose of digital fingerprints is to trace the owner of a particular copy of multimedia content in case the content is leaked to unintended domains. To achieve this, the unique identifier in the content must be linked to the owner in some way, for example with the user ID database used by the content vendor. Then, if a copy of the content is discovered on the Internet, the user who made it available can be traced by reading the fingerprint in the content.

Figure 13 illustrates the process of embedding a digital fingerprint in an audio file. The user, Alice, has just purchased a music file from an online music vendor and the vendor’s server adds Alice’s customer ID as a fingerprint to the audio. The resulting unique fingerprinted copy of the audio is then distributed to Alice.

Figure 13. A customer-specific identifier is embedded in the audio content prior to distribution in order to create a unique digital fingerprint.

Digital fingerprints should be imperceptible and they should not be easily removed even if the content is tampered with. The imperceptibility is vital because the users would not have any motivation to select a service where the content quality is not as good as in other services. The fingerprint robustness is also important, because culprits can try to remove the fingerprint with different attacks, and even non-hostile users may perform file type conversions or other operations which modify the content slightly. Therefore, the fingerprint should be as robust as possible while still remaining imperceptible. The capacity is usually not an important factor, since the number of unique identifiers increases rapidly with an increased bit depth.

3.4.3. Tamper detection

Another important application scenario for digital watermarking in DRM use is tamper detection. It provides means for determining whether the content has been modified from the original version.
This is achieved by embedding a fragile watermark in the original content, which is then extracted in the content verification phase. The fragile watermark is designed so that it is destroyed if the content is modified in any way, so if the watermark can be extracted successfully then it is certain that the content has not been modified from the original version. The main challenge in this application is to prevent unauthorized insertion of the authentication watermark into tampered or unauthorized multimedia signals. It can also be desirable to detect specific changes to the content, such as lossy compression, which can be distinguished from actual content tampering. Most tamper detection applications use blind watermarking, because of the unavailability of the original signal. [34]

3.4.4. Attacking digital watermarks

Attacks on digital watermarks can be defined as intentional or unintentional modifications to the watermarked signal. That is, every small change to the signal can be considered an attack on the watermark. The watermarking method used affects the ability of the watermark to resist some attacks better than others. The required robustness level also depends on the watermarking application scenario.

The attacks can be generally divided into two categories: friendly and hostile attacks. Friendly attacks are usually unintentional: the user does not have any knowledge of the watermark or its embedding procedure. Hostile attacks are always intentional and they aim at destroying the watermark. An example of a friendly attack could be a radio station performing an audio preparation process for the audio material. The audio can be normalized to the correct volume level, equalized for better perceived quality and possibly also run through noise removal, which removes unwanted parts of the audio. A more common friendly attack on watermarked audio is MP3 compression, where the severity of the attack depends heavily on the compression rate used.
The growing number of attacks has led to the development of special applications for testing the robustness of embedded watermarks. A properly defined benchmark can also function as a performance comparison tool for different watermarking algorithms. The StirMark application has been created with the benchmark aspect in mind. It aims at providing a trusted third party watermark evaluation tool for image, audio and video watermarking. [35]

There are various types of attacks on digital audio watermarks. Based on the basic characteristics of the attacks, they can be classified into a few basic groups. Dynamics attacks change the dynamic profile of the audio. They can be linear or nonlinear, the latter modifying the spectral components depending on the frequency. Filter attacks cut off or increase a specific band from the spectrum. The simplest filter attacks are low-pass and high-pass filters, but more complex ones can also be used. Ambience attacks create effects similar to those naturally present in a closed room, most commonly reverb and delay. Conversion attacks are caused by changing the audio format. Sampling rate, bit depth or the number of channels are the usual properties affected by the audio format selection. Lossy compression attacks use some specific algorithm to remove information from the audio, which causes loss of quality. MP3 compression falls into this category. Noise attacks add some type of noise to the audio signal. Modulation attacks include effects such as chorus and flanging. Time stretch and pitch shift attacks change the audio length while keeping the pitch constant, or change the pitch while keeping the audio length constant. The pitch shift in particular is one of the most sophisticated attacks and it can cause problems for watermarks embedded on specific narrow band frequencies. [35]

The attacker does not have to be limited to a single attack, but instead he can perform multiple attacks on the same watermarked audio track.
This presents a new challenge to the watermarking algorithms because they have to resist many interference signals at the same time. However, the attacker must keep in mind that the perceptual quality of the audio suffers more from a group attack and therefore the attack strength must be lower than when performing just a single attack. The goal of the attacker is to modify the watermarked audio signal just enough for the watermark to be destroyed, because he wants to keep the audio quality as high as possible. On the other hand, the watermark is robust enough if it survives all types of attacks on the signal up to the point where the cover signal is degraded enough to make the listening experience noticeably worse.

An example of a successful attack scenario is presented in Figure 14. The attack modifies the watermarked audio in a way that the extraction algorithm is unable to extract the watermark information correctly, but the audio content is still of good quality so the listening experience remains unaffected.

Figure 14. A successful attack destroys the metadata of the watermark, but does not affect the listening experience too much.

Every watermarking method is strong against some attacks and weak against others depending on the embedding details. For example, if the watermark is embedded in the high frequencies, then a low-pass filter can be a tough attack. If the frequency hopping method is used, then a dynamics attack and the pitch shifter are the most dangerous adversaries. Figure 15 presents a spectral frequency display of two audio signals. The first is the original unmodified version and the second is a version where the pitch of the audio has been shifted by 5 percent. It can be seen that the frequency components are compressed in the frequency scale, and the pitch shift is stronger in the high frequencies and weaker in the low frequencies.

Figure 15.
The spectral frequency display of an audio clip before and after a pitch shift attack shows that the change is especially notable in the high frequencies.

In the case of digital fingerprinting, there is one special type of attack to be considered, namely the multi-user collusion attack. It is one of the most sophisticated attacks against digital fingerprints, because instead of being performed just by one malicious user, the collusion attack is a group attack. First, all users participating in the attack get the fingerprinted content legitimately, and then they use averaging methods to attenuate all fingerprints. If the fingerprint embedding and identification scheme does not take this attack into account, the collusion attack can relatively easily destroy all fingerprints from the content. [34]

Figure 16. A successful attack destroys the metadata of the watermark but does not affect the listening experience too much.

Figure 16 presents a scenario where two users, Alice and Bob, use their fingerprinted copies to create a colluded copy, where the fingerprint is destroyed. In reality, there may be dozens of colluding users, but the principle is the same. The attack can be performed by directly averaging all the samples of the synchronized fingerprinted files of all participating users. This is an example of a linear attack, which is a simple and effective way of attenuating digital fingerprints. Because of the good perceptual quality of the fingerprinted audio clips, nonlinear attacks can be used as well. An effective nonlinear attack is to analyze the minimum, maximum and median of the corresponding sample values of all fingerprinted versions and to use some function of them for determining the final output value for the colluded copy. [34]

However, a good collusion-resistant fingerprint can survive a collusion attack and identify all or part of the participants in the attack.
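The linear averaging attack, and the way a detector that has the original signal can still name the colluders, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and not the scheme of [34]: the fingerprints are mutually orthogonal rows of a Sylvester-Hadamard matrix added to the host samples with strength `alpha`, and the function names, user names and threshold rule are hypothetical.

```python
def hadamard_row(i, length):
    """Row i of a Sylvester-Hadamard matrix as a +/-1 list (length is a power of two).
    Distinct rows are exactly orthogonal."""
    return [1 - 2 * (bin(i & j).count("1") % 2) for j in range(length)]

def fingerprint_copy(host, fp, alpha):
    """Additively embed fingerprint `fp` into the host samples."""
    return [h + alpha * f for h, f in zip(host, fp)]

def collude_by_averaging(copies):
    """Linear collusion attack: sample-wise average of synchronized copies."""
    n = len(copies)
    return [sum(samples) / n for samples in zip(*copies)]

def detect_colluders(suspect, host, fingerprints, alpha, max_colluders):
    """Non-blind detection: subtract the host, correlate with every fingerprint.
    Averaging N copies leaves alpha/N of each colluder's fingerprint, so half of
    alpha/max_colluders is used as the decision threshold (an assumed rule)."""
    residual = [s - h for s, h in zip(suspect, host)]
    threshold = alpha / (2 * max_colluders)
    found = []
    for user, fp in fingerprints.items():
        score = sum(r * f for r, f in zip(residual, fp)) / len(fp)
        if score > threshold:
            found.append(user)
    return found

# Usage with arbitrary demo data.
LENGTH, ALPHA = 64, 0.01
host = [((37 * n) % 101) / 101.0 - 0.5 for n in range(LENGTH)]
users = ["alice", "bob", "carol", "dave"]
fingerprints = {u: hadamard_row(i + 1, LENGTH) for i, u in enumerate(users)}
copies = {u: fingerprint_copy(host, fp, ALPHA) for u, fp in fingerprints.items()}
colluded = collude_by_averaging([copies["alice"], copies["bob"]])
print(detect_colluders(colluded, host, fingerprints, ALPHA, max_colluders=4))
```

The average halves the strength of each colluder's fingerprint but does not cancel it, which is why the correlation still separates participants from innocent users as long as the number of colluders stays within the assumed bound.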
One possibility is to use orthogonal watermarks as fingerprints, which eases the fingerprint distinction process. The other solution is to use code modulation by creating user fingerprints from linear combinations of orthogonal basis signals. This method allows introducing correlation between fingerprints, and more fingerprints can be used than normally would be possible for a given dimensionality. [34][36]

3.5. Removable watermarking

Removable watermarking is a special technique where the watermark is embedded in a way that it can be removed from the host signal after embedding. It is also sometimes called reversible, invertible or erasable watermarking. Embedding a reversible watermark is different from conventional watermarking because the embedding algorithm must store the recovery information for the watermark. This information is used in the watermark removal process for accurate recovery of the original signal.

Currently, research has mainly concentrated on removable watermarking of digital images, but most of the methods can be applied to audio signals as well. The methods for reversible watermarking for images and video have been divided into three categories: data compression, difference expansion and histogram bin exchanging. Data compression methods embed the recovery information required for removing the watermark into the watermark itself. In difference expansion, the watermark data is embedded in expanded values of some small numbers which represent the features of the original data. The third method category relies on shifting the histogram bins according to the watermark information. [37]

The major challenges for removable watermarking are embedding capacity and robustness [37]. Capacity is often limited because the amount of required recovery information usually increases with the increased capacity.
It is not necessarily a requirement for all applications, but it certainly limits the usage of removable watermarking to a certain set of applications.

3.6. Commercial solutions

Although digital watermarking is a relatively new research topic, it has received steadily growing interest from the media industry, which is constantly finding more applications for it in various scenarios. Most of the commercial development has concentrated on digital image watermarking, but a few digital audio watermarking solutions have also been developed.

The usage of digital watermarking techniques in the media industry is wide-ranging. It is used by broadcasters to track and measure TV programming and advertising, and by movie studios and music labels to deter content piracy. Media and entertainment companies use watermarking to identify and manage media assets, photographers and image aggregators to manage image copyrights, and satellite image providers to verify ownership of their images. Governments also authenticate IDs and prevent document counterfeits with the help of digital watermarking applications. [38]

3.6.1. MarkAny

MarkAny is a Korean company developing digital security, authentication and copyright protection applications. It provides image, video and audio watermarking products and also a CastLog broadcast monitoring system for video and audio content. MarkAny also has a product aiming at protecting user-generated content with digital watermarks from unauthorized copying. MarkAny watermarking technology has been implemented for applications in mobile commerce, document security and forensic tracking usage.

The audio watermarking solution of MarkAny, MAO 2.0, is a product concentrating on copyright protection. It embeds copyright information in audio content, and it features imperceptible watermark embedding and robustness against audio compression and signal processing.
It is essentially a software library which can be integrated into other applications. The current version supports only WAV audio format, but it can be applied to other service providers, extending the support to MP3 or other file formats. MAO 2.0 is certified by Secure Digital Music Initiative (SDMI). MAO 2.0 is mainly targeted for music producers, audio software developers and music content industry in general. [39]

3.6.2. Verance

Verance provides cross-platform copy protection solutions for video and music content. They are widely used in business and consumer applications, most notably in Blu-ray, HD DVD, DVD-Audio and SD-Audio formats. Verance provides a set of embedders and verifiers for inserting the watermark into video and audio content. Detectors are also provided for various platforms for reading the watermark information. [40]

Verance Copy Management System for Audio content (VCMS/A) is a copy protection solution for DVD-Audio, SD-Audio and SDMI portable device consumer product formats. It claims to be the only solution providing persistent and self-identifying copyright management across all music distribution formats. Some product manufacturers have inserted detectors in their DVD-Audio, SD-Audio and SDMI portable devices which enable detection of VCMS/A watermarks in audio content. The detection module then interprets the watermarked information and delivers it to the playback and record control software. [40]

The Verance audio watermarking technology has been included in the Advanced Access Content System (AACS) standard for content distribution and digital rights management. It is widely used in HD DVD and Blu-ray discs and players. The movie studios can insert a watermark into a movie audio track, which can then be detected in video players if someone manages to copy the movie, for example with illegal camcording. The watermark is embedded by modifying the audio waveform in a regular pattern to convey the information.
Another use of the Verance audio watermarking solution with AACS is in home entertainment, where the creation of illegal copies of purchased or rented discs can be disabled. [41]

3.6.3. Philips

Philips provides content identification solutions for digital video and audio through several digital watermarking products. Its video watermarking products include RepliTrack for forensic tracking purposes, CompoTrack product family for creating flexible watermarking solutions, CineFence for protection against illegal movie recording in digital cinema environment and VTrack for identifying the source of pirated PayTV content.

The Philips audio watermarking software is part of the CompoTrack family, called CompoTrack WAV. It is a product for Microsoft Windows with a DLL API interface and a support for the WAV file format. CompoTrack WAV also includes a detector for WAV audio files or WAV audio streams.

Several other companies base their watermarking solutions on Philips CompoTrack products. Media Science International has developed MSI Copy Control software for providing services for record labels to protect their digital content from piracy using content management systems and watermarking technologies. It focuses on protecting the audio copyright right from the recording studios and also through promotion and distribution, and promises full compatibility and integration with Rimage, a producer of Blu-ray and DVD-R discs. Other companies using CompoTrack WAV for creating watermark solutions include Fortium and Ezee studios.

3.6.4. Alpha Tec

Alpha Tec is a company specialized in digital image and video processing and multimedia applications. Alpha Tec has a digital audio watermarking product called AudioMark, which concentrates on copyright protection scenarios. The AudioMark software package is designed for embedding inaudible watermarks in digital audio and detecting them from suspected audio files.
It supports RAW and WAV audio file formats and batch processing, and uses blind extraction when detecting the watermark from an audio signal. AudioMark claims to be resistant to MPEG audio compression, filtering, resampling and requantization signal processing operations. One of the goals of the company in designing AudioMark is user-friendliness.

4. ALGORITHM FOR REMOVABLE WATERMARKING AND FINGERPRINTING

This chapter introduces the watermarking algorithm used in the audio protection system which is discussed in further detail in chapter 5. The algorithm designed in this work is an improved version of the algorithm presented in [2]. The paper introduced a removable watermarking algorithm for digital audio, where an audible noise signal is inserted into an audio file, which is then made available freely as a teaser of the original content. The user can then remove the noise and restore the original audio quality for a fee.

The goal of the new algorithm presented in this thesis is to provide tools for an online music store to publish all their music as free preview versions on the Internet, after audible watermarks have been inserted into the audio files. The purpose of the watermark is to disturb the listening experience enough to make extended listening uncomfortable, but still allow a nice preview of the song in question. The users can download and listen to the preview versions freely, but if they want to have the high quality version without the disturbing watermark, they have to purchase a license for the song.

When the user purchases a license, the online store creates and sends it to the user’s device, which then removes the audible noise and transforms it into an inaudible digital fingerprint. The fingerprint in the purchased song contains the user’s music store ID, which can be used for tracing the original owner of the copy if the song is being distributed on the Internet.
Therefore, it is very important that the fingerprint is robust enough to resist basic signal processing attacks. The robustness of the fingerprint was one of the priorities of this thesis, and the algorithm was tested against a large set of signal processing attacks. The test scenario and results are discussed further in section 4.4.

The algorithm is divided into three phases: embedding, noise transform and fingerprint detection. The embedding phase is very similar to the previous algorithm version of [2], but some modifications have been made. These include adding the Linear Congruential Generator (LCG) for generating the pseudo-random sequence and the synchronization signal embedding process. The noise transform is based on the watermark removal algorithm, but the new version includes synchronization and the ability to leave part of the watermark as a digital fingerprint. Also, both the embedding and the noise transform algorithms include a new feature for supporting different bandwidths of the watermark. This feature was used for improving the robustness and the imperceptibility of the new version. The fingerprint detection part is completely new and it was not included in the previous version. These phases are presented in further detail in the following sections.

4.1. Embedding

In the embedding phase, a removable watermark is inserted into the original audio in order to produce the distributable preview version. The algorithm combines several digital watermarking techniques, such as frequency hopping and direct sequence spread spectrum watermarking.

Inputs of the process are the uncompressed original audio file and the pseudorandom key for improving the security of the watermark. At first, the original file is divided into blocks of 1024 samples and each block is processed separately from here on.
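The block division and the key-driven pseudo-random sequence can be sketched as follows. This is only an illustration under stated assumptions: the thesis does not give the LCG constants, so the widely used multiplier 1664525 and increment 1013904223 with modulus 2^32 are taken as placeholders, and the hopping band limits, the handling of a trailing partial block and all function names are hypothetical.

```python
BLOCK_SIZE = 1024  # block length stated in the text

class LCG:
    """Linear Congruential Generator: state = (a * state + c) mod m.
    The constants are common textbook values, not the ones used in the thesis."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2 ** 32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self, low, high):
        """Next pseudo-random integer in [low, high), taken from the high bits."""
        self.state = (self.a * self.state + self.c) % self.m
        return low + (self.state >> 16) % (high - low)

def split_into_blocks(samples, block_size=BLOCK_SIZE):
    """Split samples into full blocks; a trailing partial block is dropped here
    (an assumption, the source does not say how the remainder is handled)."""
    n_blocks = len(samples) // block_size
    return [samples[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

def hopping_sequence(key, n_blocks, band=(20, 200)):
    """One FFT coefficient index per block, reproducible from the key alone."""
    rng = LCG(key)
    return [rng.next_int(band[0], band[1]) for _ in range(n_blocks)]

# Usage: the same key always regenerates the same coefficient positions.
samples = [0.0] * 5000
blocks = split_into_blocks(samples)
positions = hopping_sequence(key=1234, n_blocks=len(blocks))
print(len(blocks), positions)
```

Because the sequence is fully determined by the key, the party that later removes or reads the watermark only needs the key, not a stored list of positions.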
The FFT coefficient magnitudes of each block are calculated, and the pseudorandom key defines the coefficients which are modified according to a random K value from a specified decibel range [min_k, max_k]. The scaling factors k1 and k2 are then calculated by modifying the FFT magnitude array in decibels and comparing it to the original complex FFT array. The values in the complex FFT array are then scaled according to the scaling values in order to produce the complex FFT array with the added noise. After the IFFT, all blocks of 1024 samples are combined together and the distributable audio file is created. The final step is to add a spread spectrum synchronization signal to the beginning of the block sequence.

Outputs of the process are the distributable audio file and the watermarking key, which contains the recovery information needed for removing the watermark. The recovery information consists of the used pseudo-random key, the array of K values and the synchronization signal. The embedding process is illustrated in Figure 17.

Figure 17. The process of embedding the initial audible watermark.

4.2. Noise transform

The noise transform process takes place when the user has acquired the license for the audio file, and the noise can be removed from the preview version. The required inputs are the free distributable version of the audio, the spread spectrum synchronization signal and the watermarking key. The output of the process is the fingerprinted audio file. An overview of the algorithm is presented in Figure 18.

Figure 18. An overview of the noise transform process of the implemented system.

The process can be divided into three steps: synchronization, block processing and combining the resulting audio. Synchronization determines the starting point of the watermarking sequence. The synchronization method utilizes direct sequence spread spectrum watermarking techniques, which are described in more detail in section 3.3.3.
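The synchronization step can be illustrated with a minimal cross-correlation sketch. It assumes a +/-1 pseudo-random synchronization sequence added to the first samples of the audio and a brute-force search over candidate offsets; the sequence length, amplitude, search range and function names are illustrative choices, and Python's built-in generator is used in place of the keyed generator of the actual system.

```python
import math
import random

def make_sync_sequence(seed, length=1024):
    """Pseudo-random +/-1 chip sequence derived from a seed (hypothetical key)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def add_sync(audio, sync, amplitude):
    """Add the scaled synchronization sequence to the beginning of the audio."""
    out = list(audio)
    for i, chip in enumerate(sync):
        out[i] += amplitude * chip
    return out

def find_offset(received, sync, max_offset):
    """Return the shift whose correlation with the sync sequence is largest."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(max_offset + 1):
        window = received[offset:offset + len(sync)]
        if len(window) < len(sync):
            break
        score = sum(w * c for w, c in zip(window, sync))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset

# Usage: a 440 Hz tone as demo host, with 137 samples prepended to mimic
# the extra samples an encoder may add to the beginning of the audio.
host = [0.3 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4096)]
sync = make_sync_sequence(seed=2009)
marked = add_sync(host, sync, amplitude=0.1)
received = [0.0] * 137 + marked
print(find_offset(received, sync, max_offset=500))
```

The correlation peak at the true offset grows with the sequence length, while the contribution of the host audio grows only with its square root, which is why a long synchronization sequence can be located even at a modest amplitude.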
Synchronization is important because different lossy compression encoders, such as LAME for MP3 encoding, may add some additional samples to the beginning of the audio in the encoding phase. The synchronization signal is removed from the audio after the starting point has been located in order to achieve higher audio quality.

After the synchronization step, the audio is divided into blocks, which are processed in a similar way as in the embedding phase. FFT coefficient magnitudes of the individual blocks are modified with the K array values which are part of the watermarking key. The pseudo-random key is used for deriving the frequency hopping sequence which determines the exact FFT coefficients to be modified. The modification is done by first determining the scaling values k1 and k2 by modifying the FFT magnitude array values, and then scaling the actual complex FFT array for creating the fingerprinted FFT values. Then, after the IFFT, all blocks are combined together and the fingerprinted audio file is created.

The actual transform from noise into a fingerprint is done when the FFT coefficients are modified with the K array values. It is possible because the K array values are not exactly the same values which were stored by the server in the embedding phase, but instead they are modified slightly by the server so that they contain the digital fingerprint of the user. The ID of the user in the music store can be used as the fingerprint data. This means that a unique K array must be generated by the server every time a new customer purchases a license for a song, because the fingerprint data is different.

One advantage of this kind of approach is that the song is never in an unprotected state, because it transforms directly from the free preview version into the fingerprinted version without any additional steps in between. It is also convenient for the user because he does not have to download the song again after purchasing.
Instead, he only needs to acquire the license and wait for the local noise transform process to be completed.

4.3. Reading the fingerprint

The last part of the algorithm is the one that the rightsholders would rather not use, but unfortunately, it is still probably necessary. When a rightsholder discovers that songs are being distributed illegally, he can take one of the songs and check if there is a fingerprint. The purpose of the fingerprint reading algorithm is to use the whole song and extract the digital fingerprint as reliably as possible. It uses the original audio file in the process, so the extraction method is non-blind. An overview of the fingerprint reading process is presented in Figure 19.

The first step of the algorithm is to synchronize the fingerprinted file to the original audio file. The synchronization is performed in a similar way as with a separate synchronization signal. Then, as in the previous algorithms, the audio is divided into blocks and the FFT coefficient magnitudes are analyzed. This time also the original signal is divided into blocks and their FFT magnitudes are calculated. The pseudo-random frequency hopping sequence is generated from the pseudorandom key and the corresponding coefficients of the original and fingerprinted signals are compared. This gives the bit value and the bit intensity of the current block. After all blocks have been compared, the encoded bit array is created by integrating over all bit values and their intensities. The final step is to decode the error correction, which results in the fingerprint data.

Figure 19. The algorithm for reading the fingerprint.

4.4. Performance evaluation

Because the protection of the presented DRM algorithm relies heavily on the robustness and imperceptibility of the fingerprint watermark, a proper testing process had to be performed in order to evaluate the performance of the algorithm. Robustness and imperceptibility were evaluated separately in two test cases.
The robustness test applied a series of signal processing attacks on fingerprinted songs. The attacks aimed at destroying the fingerprint watermark without otherwise destroying the audio quality. The imperceptibility was tested with a listening test by a dozen test users. The test was implemented with a web-based Audio Quality Evaluation Tool, which allowed the users to listen to watermarked and non-watermarked audio clips and evaluate their perceptual quality.

4.4.1. Robustness

The robustness of the algorithm was tested with an extensive set of attacks against the fingerprint watermark. A set of 15 different attacks was compiled from [35], which presents a wide range of selected attacks from various attack classes used in the StirMark benchmark. These attacks were applied to 16 different fingerprinted audio samples, which represent different musical styles. The attacks were configured to cause nearly imperceptible changes to the audio in order to provide a realistic attack scenario, because the listening experience must not deteriorate, as explained in section 3.4.4. The attacks and their descriptions are presented in Table 1. The specific attack properties are listed in more detail in Appendix 1.

Table 1. The applied attacks and their descriptions

No. | Attack | Description
1 | MP3 compression (two attacks with 128 and 192 kbps bit rate) | The de facto standard encoding for music on digital audio players. Basically, a lossy compression algorithm utilizing psychoacoustic models for effective perceptual coding.
2 | Chorus | Multiple delayed versions of the audio are added into itself. The delay time, modulation strength and voice number parameters can be modified.
3 | Compressor | The loudest signal peaks are limited, which allows stronger overall signal strength.
4 | Delay | A delayed copy of the audio is added to the original copy.
5 | Flanger | A slightly delayed copy of the audio is added to it, with the length of the delay changing constantly.
6 | Invert | Inaudible attack. The audio sample values are inverted.
7 | Low pass filter | A filter that removes all frequencies higher than the chosen parameter.
8 | Pitch (two attacks with -1% and +1% pitch) | The frequency of the audio is changed without changing the speed of the audio signal.
9 | Random noise | The audio sample values are modified with a random value. The maximum change from the original is specified with a separate parameter.
10 | Resampling | The sampling frequency of the audio is changed.
11 | Reverb | Similar to delay but with shorter delay time and reflections.
12 | Stretch (two attacks with -2% and +2% stretch) | The audio duration is changed without changing the audio frequency or pitch.

The test preparations required creating the fingerprinted versions of all songs in the test. Three different versions were used: one without error correction, one with Hamming and one with turbo encoding. First, the original song versions were manipulated with the embedding algorithm presented in section 4.1. This created the distributable preview versions, which were then transformed into the fingerprinted versions with the noise transform algorithm described in section 4.2. All 15 attacks were applied to each of the three versions of every fingerprinted song with Adobe Audition 2.0 audio processing software. This resulted in a total of 720 fingerprinted audio clips (15 attacks, 16 songs, 3 versions), which were modified by the attacks. The final step of the testing process was to check if the fingerprint reading algorithm was still able to read the fingerprint watermark from the audio clips regardless of the attacks. The test used the fingerprint reading algorithm presented in section 4.3 for detecting the watermark.

It should be noted that the length of the audio clips was between 11.5 seconds and 19 seconds with the average length being 13.5 seconds. The clips were selected to be short because the noise transform algorithm had to be processed on a mobile phone, and increasing the audio length would increase the processing time.
The short length of the audio samples greatly affects the effectiveness of the fingerprint detection algorithm, because during embedding the fingerprint is iterated over the whole audio. In the detection phase, the detection results are also iterated, and a longer audio file provides a more reliable detection result. If complete songs were used, the audio length would be 10-15 times longer than in these tests. Therefore, this test shows how the algorithm performs with much shorter audio files than would normally be used in a system like this. The fingerprint detection results in a test with full songs would be expected to be better than the results of this test.

The results of the robustness test are presented in Table 2. The table presents the percentage of the 16 test samples from which the fingerprint was detected. A result of 100% means that the fingerprint was detected in all 16 audio clips that were modified with the corresponding attack. A result of 81% means that the fingerprint was detected in 13 of 16 audio clips.

Table 2. The fingerprint detection percentages after each signal processing attack

Attack | Not encoded | Hamming | Turbo
MP3 128 kbps | 100% | 100% | 100%
MP3 192 kbps | 100% | 100% | 100%
Chorus | 100% | 100% | 100%
Compressor | 81% | 100% | 100%
Delay | 100% | 100% | 100%
Flanger | 100% | 100% | 100%
Invert | 100% | 100% | 100%
Low pass filter | 100% | 100% | 100%
Pitch -1% | 25% | 25% | 31%
Pitch +1% | 38% | 31% | 19%
Random noise | 100% | 100% | 100%
Resampling | 100% | 100% | 100%
Reverb | 100% | 100% | 100%
Stretch -2% | 94% | 100% | 100%
Stretch +2% | 88% | 69% | 94%

4.4.2. Imperceptibility

The imperceptibility of the algorithm was tested with several users who listened to watermarked and non-watermarked audio clips. The test environment was a web-based Audio Quality Evaluation Tool, which has been used in previous studies for evaluating the perceptual quality of digital audio watermarks. The total number of users who participated in the listening test was 10.
Seven of the respondents were aged 21-30 years, and there was one respondent in each of the age categories 11-20, 31-40 and 41-50 years. Six test users described themselves as average music listeners, two worked with music professionally and two played a musical instrument.

From the audio clips used in the robustness test, 10 clips were selected for the imperceptibility test. The clips were selected to represent different music styles. Fingerprinted versions were created in the same way as in the robustness test. This time only the version without error correction was used, since message encoding does not affect the perceptual quality of the audio signal. The original and the fingerprinted audio clips were then uploaded to the evaluation tool. Every file was uploaded twice in order to increase the statistical reliability of the results. This resulted in 40 audio clips, of which 20 were watermarked and 20 were not.

The users first listened to every clip one at a time and evaluated whether it was watermarked or not. Then they evaluated the perceptual quality of the clips with a grade from 1 to 5. The grades were accompanied by textual descriptions, which are presented in Table 3. The scale is one of the suggested subjective audio quality measurement methods in the ITU-T P.800 recommendation [42].

Table 3. The grades and their descriptions used in the user tests

Grade | Text
5 | Imperceptible
4 | Perceptible, but not annoying
3 | Slightly annoying
2 | Annoying
1 | Very annoying

The answer percentages for each audio clip, for both the non-watermarked and the watermarked versions, are presented in Table 4. The answers are categorized into three listener groups. The fourth result column includes the results of all three listener groups.
The percentages in the non-watermarked columns are averaged over the two identical non-watermarked audio clips, and the percentages in the watermarked columns are averaged over the two identical watermarked audio clips. For the 20 non-watermarked audio clips the average number of correct answers was 20*0.63 = 12.6, and for the 20 watermarked audio clips the average number of correct answers was 20*(1-0.57) = 8.6. This results in a total of 21.2 correct answers out of 40 per user, which is 53% of the total amount. On average, the users answered that 60% of the songs were not watermarked and 40% were watermarked, although the correct ratio was 50% to 50%.

Table 4. The percentages of answers deciding that the audio clip was not watermarked (NW = non-watermarked clips, WM = watermarked clips)

Audio clip | NW Average | NW Musician | NW Pro | NW All | WM Average | WM Musician | WM Pro | WM All
bigyellow | 25% | 50% | 75% | 40% | 42% | 75% | 25% | 45%
exitmusic | 50% | 50% | 75% | 55% | 58% | 50% | 25% | 50%
bryanadams | 75% | 75% | 75% | 75% | 67% | 50% | 75% | 65%
cocker | 58% | 75% | 75% | 65% | 75% | 75% | 75% | 75%
duel | 75% | 25% | 75% | 70% | 58% | 50% | 75% | 60%
finlandia | 58% | 25% | 75% | 55% | 58% | 75% | 75% | 65%
metallica | 75% | 50% | 100% | 75% | 67% | 75% | 50% | 65%
queen | 67% | 25% | 100% | 75% | 50% | 50% | 50% | 50%
sipe | 58% | 100% | 75% | 70% | 42% | 25% | 75% | 45%
sting | 58% | 50% | 25% | 50% | 50% | 75% | 25% | 50%
AVERAGE | 60% | 53% | 75% | 63% | 57% | 60% | 55% | 57%

The results of the perceptual quality evaluation of the audio clips, expressed as mean opinion score (MOS) values, are presented in Table 5. These values were accompanied by the textual descriptions presented in Table 3.

Table 5. The average (MOS) grades per audio clip categorized by listener type (NW = non-watermarked clips, WM = watermarked clips)

Audio clip | NW Average | NW Musician | NW Pro | NW All | WM Average | WM Musician | WM Pro | WM All
bigyellow | 4.50 | 4.75 | 5.00 | 4.65 | 4.59 | 5.00 | 5.00 | 4.75
exitmusic | 4.09 | 4.75 | 5.00 | 4.40 | 3.67 | 5.00 | 4.50 | 4.10
bryanadams | 4.92 | 5.00 | 5.00 | 4.95 | 4.92 | 5.00 | 5.00 | 4.95
cocker | 4.67 | 5.00 | 5.00 | 4.80 | 4.50 | 5.00 | 5.00 | 4.70
duel | 4.67 | 5.00 | 5.00 | 4.80 | 4.59 | 4.75 | 5.00 | 4.70
finlandia | 4.84 | 5.00 | 5.00 | 4.90 | 4.33 | 5.00 | 5.00 | 4.60
metallica | 4.75 | 5.00 | 5.00 | 4.85 | 4.75 | 5.00 | 5.00 | 4.85
queen | 4.67 | 4.75 | 5.00 | 4.75 | 4.50 | 5.00 | 5.00 | 4.70
sipe | 4.50 | 5.00 | 5.00 | 4.70 | 4.59 | 5.00 | 5.00 | 4.75
sting | 4.58 | 5.00 | 5.00 | 4.75 | 4.59 | 4.75 | 5.00 | 4.70
AVERAGE | 4.62 | 4.93 | 5.00 | 4.76 | 4.50 | 4.95 | 4.95 | 4.68

4.5. Discussion

Robustness and imperceptibility of the fingerprint watermark were among the main focus areas of this thesis. The watermark was tested against a large set of different attacks to ensure that it could not be easily removed. The attack set included attacks from every attack group presented in [35].

The results indicate that the implemented algorithm is most vulnerable to the pitch attack. This was expected, because the watermark is embedded in the frequency domain. The FFT coefficient values change radically in the pitch attack because it modifies the sound frequencies directly. Figure 15 in section 3.4.4 demonstrates the effect of this attack. However, modifying the pitch is probably the most audible of the attacks. If two versions of a song modified with the pitch -1% and pitch +1% attacks were listened to carefully and compared, the listener would probably notice a clear difference between them. The difference between pitch -1% or +1% and the original version is much harder to notice, but it is still the most audible attack of the set, except for the attack of resampling to 8,000 Hz, which was not intended to be an inaudible attack.

The use of forward error correction seems to improve the results a little.
The reason for the lower success rates in some situations is probably the use of short audio clips. Error correction increases the message length by 75% (Hamming) or by 256% (turbo), so the number of times the message can be repeated during the audio is much lower than if error correction were not used. Using longer and more realistic audio clip lengths would improve the results significantly in all situations. The clips were kept short in this study because of performance issues, as the duration of the noise transform process increases linearly with the audio clip length. Another reason for using short clips was the use of copyrighted works as test samples. The fair use principles recommend using audio clips shorter than 30 seconds if the works are copyrighted.

The overall results show that the implemented algorithm is fairly robust against the most common signal processing attacks. The lowest success rates were against the pitch attacks, around 25% (between 19% and 38% in Table 2), which means that the fingerprint was still detected in roughly every fourth audio clip. This is a sufficient result in most cases, because Internet piracy rarely distributes individual songs; instead, whole albums or even whole discographies are distributed. This increases the probability that the fingerprint can be detected in at least some of the songs, which is the goal of this DRM system.

The imperceptibility of the fingerprint watermark was tested by ten test users, who listened to watermarked and original versions of the audio clips. In the first test, they evaluated which clips were watermarked and which were not, and in the second test, they evaluated the imperceptibility of the audio clips with a numerical grade from 1 to 5. The results of the experiment show that the test users marked 63% of the original non-watermarked audio clips as non-watermarked and 57% of the watermarked clips as non-watermarked.
The 6% difference in the values comes down to 24 answers out of the total of 400 answers collected in the entire experiment. This is 2.4 answers per user, 0.24 answers per song and 0.06 answers per clip. This means that in every fourth song an average user would favor the non-watermarked version in one of the four audio clips of the song. In other words, the user would favor the non-watermarked version in every 17th audio clip. This result would suggest that the watermark was not entirely inaudible. However, the margin of error and the confidence interval must also be taken into account. The standard error can be calculated with the formula

$$e = \sqrt{\frac{p\,(1 - p)}{n}}, \tag{4}$$

where p is the proportion of correct answers and n is the total number of answers in the test. With p = 0.53 and n = 400, this gives a standard error of 2.50%. For normally distributed samples, the 95% confidence interval is obtained by multiplying the standard error by 1.960. The resulting 95% confidence interval is therefore ±4.89%. This means that if a larger sample set were used, the number of correct answers would with 95% probability be between 48.11% and 57.89% of the total number of listened audio clips. By evaluating the Gauss error function, we can deduce that if the confidence level were loosened to 76.99%, the resulting bounds would be 50.00% and 56.00%. This statistical analysis of the first listening test suggests that listeners would with 76.99% probability slightly favor the non-watermarked version in terms of audio quality.

The difference in results between the listener groups is quite notable. The users identifying themselves as professionals working with audio got the best results, which is not unexpected. However, the two musician test users got the least accurate results. If the results of the two professional test users are removed, the percentages of 'not watermarked' answers are 58.25% for non-watermarked and 57.75% for watermarked audio clips.
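The standard error and the confidence bounds quoted in the statistical analysis above can be reproduced with a few lines of Python. This is a minimal sketch under the normal approximation used in the text; the function names are illustrative and not part of the implemented system. The looser 76.99% level is computed here from the rounded values 3.00% (half-width) and 2.50% (standard error), which is an assumption about how that figure was obtained.

```python
import math

def standard_error(p, n):
    """Standard error of a proportion p estimated from n answers (formula 4)."""
    return math.sqrt(p * (1.0 - p) / n)

def two_sided_coverage(half_width, se):
    """Probability that a normal variable lies within +/- half_width of its mean."""
    z = half_width / se
    return math.erf(z / math.sqrt(2.0))

p = 0.53  # proportion of correct answers in the first listening test
n = 400   # total number of answers (10 users x 40 clips)

se = standard_error(p, n)
half_width_95 = 1.960 * se
lower, upper = p - half_width_95, p + half_width_95

# Assumption: the looser interval 50.00%..56.00% is taken as 53% +/- 3.00%
# and evaluated with the rounded standard error of 2.50%.
coverage = two_sided_coverage(0.03, 0.025)

print(f"standard error:       {100 * se:.2f} %")
print(f"95% interval:         {100 * lower:.2f} % .. {100 * upper:.2f} %")
print(f"coverage of +/-3.00%: {100 * coverage:.2f} %")
```

With the unrounded standard error the coverage of the ±3.00% interval comes out slightly higher, so the last figure should be read as an approximation.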
The nearly identical percentages obtained without the professionals (58.25% and 57.75%) suggest that an average user could not tell the difference between the watermarked and the non-watermarked version. The analysis of the results without the professionals is justified because these two persons have been working closely with digital audio watermarking, which gives them an advantage in watermark detection.

In the second test, the listeners evaluated the perceptual quality of the audio clips by giving them a grade from 1 to 5, where 1 was very annoying and 5 was imperceptible. The users identifying themselves as average music listeners were clearly the most critical in their evaluation, with an average grade of 4.619 for non-watermarked and 4.503 for watermarked audio clips. The trend is the same as in the first test, where the users also preferred the non-watermarked version. The average grade of the musician test group also follows the same pattern as in the first test, since they gave an average score of 4.95 for the watermarked clips and 4.925 for the non-watermarked clips. The professionals were surprisingly the least critical, and they gave the best scores for both the non-watermarked and the watermarked audio clips.

The lower scores of the average users could be the result of a misinterpretation of the natural distortions of the music. This is supported by the lowest scores of 4.09 and 3.67 given to the exitmusic audio clip, which is heavily compressed with a dynamic range compressor in order to achieve higher loudness levels. The resulting audio has less dynamics than a normal uncompressed song, and even the non-watermarked version sounds distorted, although it does not contain any watermark. This may have affected the listeners, especially in the average group, who gave a score of 4.09 for the non-watermarked clip. However, the lowest watermarked clip score of 4.1 for exitmusic from all listeners is justified, because the compression affects the song in a way that makes originally quiet parts sound loud.
The song has quiet singing and a guitar playing in the background, but as the levels are maxed out, the algorithm embeds the watermark with a stronger scaling than if the audio levels were lower. This causes the watermark to be more audible in exitmusic than in the other clips. The average users seem to have been the most sharp-eared in detecting this.

Except for the exitmusic audio clip, the differences between the non-watermarked and watermarked MOS grades are small. If the exitmusic grades are removed from the test, the average grade from all users is 4.79 for non-watermarked and 4.74 for watermarked clips. The difference in grades is only 1%, which is less than the standard error. This implies that the watermark would be inaudible if the problem with dynamically compressed audio signals were fixed.

The conclusion from the results is that the fingerprint watermark is very close to being inaudible. In the first test, only the professional users could tell the difference between watermarked and non-watermarked audio clips. The second test showed that the watermark algorithm should be improved to achieve more inaudible results in quiet audio clips that have been dynamically compressed. Otherwise, the results show that the users were not able to distinguish the watermark from the audio.

5. DESIGN OF ROBUST AUDIO PROTECTION SYSTEM

This chapter presents the design of a robust audio protection system utilizing digital watermarking techniques. The purpose of the system was to implement and test a new business model for mobile audio distribution. The new business model is described in section 5.1. The use cases which are used for deriving the requirement specification are presented in section 5.2. A detailed system architecture description is given in section 5.3, and the software design process is described in section 5.4.

5.1. General system description

The implemented system is a mobile audio distribution system, which consists of an online music store server and a mobile client application. The purpose is to allow the music vendor to distribute all their music as freely distributable preview versions, which are available for download to all users. These preview versions contain an audible noise signal, which still allows the users to listen to the whole song and to judge what the high-quality version would sound like. The users can then purchase a license for the song, which removes the noise and restores the original high quality of the song.

The new business model tries to solve the problem of people using short preview samples as ringtones in mobile phones, which causes revenue losses for the mobile industry. The preview versions with the added noise are less attractive as ringtones because of the disturbing noise, so ringtone sales are not affected by the use of preview samples. The use of full songs as previews also gives the users a better impression of the whole song, and it speeds up the song purchasing process, because users do not need to download the song again. This can also show up as an increase in music sales, especially in mobile markets where the bandwidth is not yet as high as on desktop computers.

The digital watermarking algorithms presented in chapter 4 form an important part of the system, because they perform the required watermark and fingerprint processing functions. The music store server and the client application are basically interfaces to the actual algorithms, connected via a client-server architecture. The embedding algorithm is used by the online music store when songs are imported to the system for the first time, and the noise transform algorithm is used by the mobile client application when the user purchases a license for a song.
The fingerprint detection algorithm is a special case, because it is used separately from the other two software components of the system. The idea is to read the fingerprint only from those audio files which show up on unauthorized music distribution channels, such as peer-to-peer networks.

5.2. Use cases

The use case analysis concentrates on finding the requirements of the software by means of separate use cases, which represent the different behavioral aspects of the system. Each use case contains a set of inputs to which the system must respond accordingly. This leads to the specification of functional requirements for the system. The analysis defines only use cases initiated from the client side, because the music store administration tasks, such as importing new audio files and detecting fingerprints, were not implemented outside the Matlab environment.

5.2.1. Downloading songs for preview

The first use case consists of requesting a content index from the server, displaying it to the user and downloading one of the preview music files in the index. The use case is initiated by the user of the client application. It sets requirements for the message sequences between the server and the client, and the performance of the network connection is also an important factor during the song download process.

Actors: Music store customer

Preconditions: The client application has been installed on the mobile phone and it is running. A network connection is available on the phone over Wi-Fi or a 3G network. The music store server is running on a computer with network access and it has preview music files available for download. The IP address and port of the server have been set in the settings of the client application.

Description: The user selects a menu command which starts the preview file download process. The user is asked for a network access point. The user selects either the preferred Wi-Fi connection or a 3G network connection.
After a few seconds, the server responds by sending a list of the preview files available for download. The list is displayed to the user as a selection list box. The user selects one song from the list and the server starts sending the file. A progress bar is displayed to the user, indicating the status of the download. After the file has been downloaded, the new song is written to the phone memory and it is possible to listen to the preview with the client application.

Exceptions:
1) The phone fails to initiate a network connection.
2) The network connection is interrupted before the song has been downloaded.
3) There is not enough memory for saving the downloaded file on the phone.
4) The mobile phone runs out of battery.

Post-conditions: The user has downloaded a preview music file from the music server. The song is saved on the phone and it is displayed to the user in the client application user interface. The file can be played with the client application.

5.2.2. Purchasing a license for a song

The second use case is about requesting, transmitting and applying a license file to a previously downloaded preview music file. It is also initiated by the client user. The primary task in the use case is applying the license, which requires a lot of processing power and memory from the device. The network connection is used only for requesting and transmitting the license data.

Actors: Music store customer

Preconditions: The client application has been installed on the mobile phone and it is running. A network connection is available on the phone over Wi-Fi or a 3G network. The music store server is running on a computer with network access and has preview music files available for download. The server also has the watermarking keys for the corresponding files stored. The IP address and port of the server and the user ID have been set in the settings of the client application.
Description: The user selects a song from a list box which displays all previously downloaded preview music files. Then the user selects a purchase menu command, which starts the license retrieval process. The user is asked for a network access point and selects either a Wi-Fi or a 3G network connection. The client application sends a license request containing the ID of the selected song and the user ID to the server. The server generates a unique license containing the user ID embedded in the watermarking key and sends the key back to the client application. After a few seconds, the client begins the noise transform process described in section 4.2. The process removes the noise watermark in the preview version and transforms it into a fingerprinted version where the user ID has been embedded with an inaudible watermark. After the noise transform process, the fingerprinted version is written to the phone memory. The new file is displayed in the user interface.

Exceptions:
1) The phone fails to initiate a network connection.
2) The network connection is interrupted before the license has been received.
3) There is not enough memory for saving the transformed file on the phone.
4) The mobile phone runs out of battery.

Post-conditions: The user has purchased a license for a previously downloaded preview music file. The application has created a high quality fingerprinted version of the song, which is displayed to the user in the user interface of the client application. The file can be played with the client application.

5.2.3. Requirements specification

The requirements for the designed system can be derived from the use cases. The requirements can be divided into technical and functional requirements, where the former concern the technical constraints of the system and the latter the actual functionality the system must provide. The technical requirements are listed in Table 6 and the functional requirements in Table 7.

Table 6. The technical requirements of the system

No. | Requirement | Use case
1 | The preview files must be downloadable over a Wi-Fi or 3G connection in less than a minute. | 1
2 | The license files must be downloaded in less than five seconds. | 2
3 | The noise transform must not take over 30 seconds. | 2
4 | The server must support multiple users simultaneously. | 1, 2
5 | The server must not crash if the connection is terminated unexpectedly. | 1, 2
6 | The client must not crash if the connection is terminated unexpectedly. | 1, 2

Table 7. The functional requirements of the system

No. | Requirement | Use case
1 | The client application must be able to download a list of available preview files from the server and display the list to the user. | 1
2 | The user must be able to select a song from the list of preview files and the song must be downloaded and saved on the client device. | 1
3 | The user must be able to play preview and fingerprinted songs on the device. | 1, 2
4 | The user must be able to select a preview music file and request a license from the server. | 2
5 | When requested, the server must generate a unique license based on the ID of the user and send the license to the client. | 2
6 | The client must perform a noise transform on the device, removing the audible noise from a preview music file and transforming it into a fingerprinted song. | 2
7 | The network connection between the server and the client must be kept open for as long as the client program is running after the client has connected to the server. | 1, 2

5.3. System architecture

The general structure of the system is based on a client-server architecture. The server functions as a music store service provider and the client application is used for accessing the services provided by the store with a mobile device. These two major components of the system are further divided into several subcomponents which represent more accurate abstractions of the component functionalities. A functional overview of the architecture is presented in Figure 20.
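Technical requirements 4 and 5 in Table 6 (multiple simultaneous users, no crash when a connection is dropped) are commonly met with a thread-per-connection server, which is also the approach described for the music store in section 5.3.1. The following Python sketch only illustrates the idea; it is not the code of the implemented server. The names `StoreHandler`, `StoreServer`, `build_reply` and `start_server`, the `PREVIEW_FILES` list and its file names are hypothetical, and only the content index exchange (request `00#`, reply `10#N#...#`) from section 5.3.3 is handled.

```python
import socketserver
import threading

# Hypothetical content index; a real store would read this from its audio database.
PREVIEW_FILES = ["song1.wav", "song2.wav"]

def build_reply(message, files=PREVIEW_FILES):
    """Return the reply for one client message, or None if it is not handled here."""
    if message == "00#":  # content index request
        names = "".join(name + "#" for name in files)
        return f"10#{len(files)}#{names}\n"
    return None

class StoreHandler(socketserver.StreamRequestHandler):
    """Serves one client connection; each instance runs in its own thread."""

    def handle(self):
        try:
            for raw in self.rfile:  # every message ends with '\n'
                message = raw.decode("utf-8", errors="replace").strip()
                reply = build_reply(message)
                if reply is not None:
                    self.wfile.write(reply.encode("utf-8"))
        except OSError:
            # The client disappeared unexpectedly: end this thread quietly,
            # the server and the other connections keep running.
            pass

class StoreServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # handler threads do not block interpreter exit

def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread and return it (port 0 = any free port)."""
    server = StoreServer((host, port), StoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A caller would start the server with `server = start_server()`, read the chosen port from `server.server_address`, and stop it with `server.shutdown()` followed by `server.server_close()`.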
Figure 20. An overview of the system architecture with a sequence of events demonstrating the functionality of the system.

5.3.1. Music store

The music store is a server component providing services to client users. The store consists of an audio database, a watermark database and a license server. An interface for accessing the administration functions is also provided.

The audio database is where the original versions and the distributable preview versions of the songs are stored. The original uncompressed versions should be stored as protected files, because they need to be accessed only by store administrators. The watermark database contains all watermarking keys, which are linked to the respective preview music files in the audio database. The keys must also be protected from unauthorized access, because they contain the information for removing the noise from the preview versions available for free download. The license server is used for generating licenses for songs requested by the client. It uses information from the watermark database and from the client to create unique licenses, which are then sent to the client.

The music store has two separate user groups: administrator users and client users. The administrator users have direct access to the audio and watermark databases and they can import new songs to the system. They can also access the original uncompressed songs in the audio database. The client users access the music server via TCP/IP services using the client application. They can download preview versions of the songs for free, or request licenses to be generated for previously downloaded preview versions.

The song importing process is managed by the administrator user. During the process the preview versions of the songs are created from the original versions using the embedding algorithm described in section 4.1. A watermarking key is also created for every imported song.
The key contains the information for removing the watermark inserted into the preview version, and it must be stored securely in the watermark database.

The song preview download process is straightforward. First, the client sends a request to the server for a list of all available songs, which is then displayed to the user. The user selects one song from the list, the song is requested from the server and the download starts. The files could also be made available on an HTTP server for downloading with web browsers.

The third major task of the music store is generating licenses. The licenses are generated by a license server, which must have access to the watermark database containing the watermarking keys. The license consists of a song-specific pseudo-random key, a scaling value of the synchronization signal and an array of values indicating transitions in the frequency domain of the song. All these values are part of the watermarking key, but the plain frequency domain transitions must not be used in the license, because the original array contains the information for completely removing the audible watermark in the preview version. This is not the intention of the license. Instead, the watermark should be transformed into an inaudible fingerprint. This is achieved by slightly modifying the frequency domain transition array according to the ID of the user requesting the license. This method generates a unique license for each user and each song, which increases the security of the system.

All communication with the client is done via a TCP socket interface. The server initializes a server socket which spawns a new server thread for every incoming socket connection. This approach enables multiple simultaneous users.

5.3.2. Client application

The client application is used for accessing the services provided by the music store server.
Its purpose is to function as an easy-to-use front-end for the functions required by the business model described in section 5.1.

In addition to the previously mentioned functions related to communication with the music store, that is, song preview downloads and license requests, the client application includes several other features. It includes a file browser for accessing the downloaded preview files and fingerprinted music files, with basic file operations such as selecting and deleting. The music files can be played by selecting a file and clicking the play button. This activates the embedded music player in the application, which offers functions for pausing, resuming and stopping the music playback. The volume can also be controlled with the dedicated volume control buttons of the device.

The main functionality concerning the business model is located in the DRM agent component. It contains the algorithm for the noise transform process, which removes the audible noise from the preview file and transforms it into an inaudible user-specific fingerprint. Information from the received license is used in the process as described in section 4.2.

5.3.3. Communications protocol

The client and the server communicate with a simple and efficient text-based communications protocol. The purpose of the protocol is to deliver messages from client to server and vice versa. There are three different message sequences, which are all initiated by the client. The first is a request for the content index, which contains an array of all the downloadable preview files on the server. The server responds by sending the content index. The second is a request for downloading a specific preview file. The server responds by sending the length of the file and then the actual file. The third sequence is a request for a license for a particular preview file. The client must identify itself by sending the user ID to the server in this phase.
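The three message sequences can be sketched in Python as small helpers that build the client requests and parse the server replies, using the message formats listed in Tables 8 and 9. This is an illustration only, not the client code of this work: all function names are hypothetical, the key K, the scale S and the array elements are kept as plain strings because their exact types are not specified here, and the sketch assumes that file names and user IDs do not contain the '#' character and that the license request contains no whitespace between its fields.

```python
END = "\n"  # message end character of the protocol

def content_index_request():
    return "00#" + END

def content_file_request(filename):
    return f"01#<content>{filename}</content>#" + END

def license_request(filename, user_id):
    return f"02#<content>{filename}</content>#<userid>{user_id}</userid>#" + END

def _fields(message, code):
    """Split a '#'-terminated message into its fields and check the type code."""
    body = message.rstrip(END)
    if not body.endswith("#"):
        raise ValueError("message must end with '#'")
    parts = body[:-1].split("#")
    if parts[0] != code:
        raise ValueError(f"expected message type {code}, got {parts[0]}")
    return parts[1:]

def parse_content_index(message):
    """Parse '10#N#filename1#filename2#...#' into a list of file names."""
    fields = _fields(message, "10")
    count, names = int(fields[0]), fields[1:]
    if len(names) != count:
        raise ValueError("file count does not match the number of names")
    return names

def parse_content_file_header(message):
    """Parse '11#N#' and return N, the number of content bytes that follow."""
    (length,) = _fields(message, "11")
    return int(length)

def parse_license(message):
    """Parse '12#K#S#N#A#' where A is a space-separated array of N values."""
    key, scale, count, array = _fields(message, "12")
    transitions = array.split()
    if len(transitions) != int(count):
        raise ValueError("array length does not match N")
    return {"key": key, "sync_scale": scale, "transitions": transitions}
```

After `parse_content_file_header` has returned N, a client would read exactly N further bytes from the socket as the file content, since the end character is sent before the file data.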
The server responds to the license request by creating the license and sending it to the client.

Table 8 lists all the messages from client to server and Table 9 all the messages from server to client. All messages end with a specific message end character '\n'. In the content file message from server to client, the end character is sent before the content file, so that the client knows how to process the following N incoming bytes.

Table 8. Messages from client to server

No. | Event | Format
1 | Request a content index from the server | 00#
2 | Request a content file ('filename') from the server | 01#<content>filename</content>#
3 | Request a license for a specific content ('filename') from the server for the user ('id') | 02#<content>filename</content>#<userid>id</userid>#

Table 9. Messages from server to client

No. | Event | Format
1 | Content index (N = number of files) | 10#N#filename1#filename2#...#
2 | Content file (N = length of the file in bytes) | 11#N# followed by the content data
3 | Content license (K = pseudo-random key, S = synchronization signal scale, N = length of the array A, A = array of transitions in the frequency domain, separated by a space character) | 12#K#S#N#A#

5.4. Software design

The requirements specification derived from the system use cases is the basis of the software design process. The goal of the software design is to design a software system that provides all the functions defined in the requirements. The design approach used in this thesis was to utilize Unified Modeling Language (UML) techniques for presenting the design. Static software structures were presented with UML class diagrams and the core behavior was described with UML state diagrams. In addition, a UML sequence diagram was used to give an overview of the communication sequence between the server and the client.

5.4.1. Client application

The client application architecture is based on a typical Symbian S60 3rd Edition application framework.
The large number of classes relative to the actual functionality is due to the nature of the S60 architecture. The basic S60 framework consists of Application, Document, AppUi and View classes. The entry point function creates the Application class, which creates the Document class, which in turn creates the AppUi class. The AppUi acts as the main controller class in the framework, and it is responsible for creating any view or container classes. Most S60 applications follow the traditional software architectural pattern of separating the model, the view and the controller. The model represents the data or the state of the application. It is often also called the application engine. The view contains all the visual elements the application displays to the user, such as menus, text or images, and the controller is responsible for reading all user input events and processing them accordingly.

Figure 21. The class diagram of the client application.

The class diagram of the client application is presented in Figure 21. The basic S60 framework classes are located at the top of the diagram, forming the model-view-controller architecture. CRemoAppUi is the main controller class, and it handles all user inputs such as menu events or button presses. CRemoAppView is responsible for drawing the user interface, which has a file browser as the main component. The contents of the file browser are read in the CRemoEngine class, which implements most of the functionality of the application.

Figure 22. The state diagram of the client application.

Application settings are contained in the CSettings class. The stored settings are the server IP address and port, the user ID and the volume level. The class stores the settings in a CDictionaryStore, which is basically an ini file with read and write stream access. The settings can be modified with the CSettingsDialog class, which opens an editing dialog for modifying the values.
These classes are owned by CRemoEngine but operated by CRemoAppUi when the user selects the settings menu command.

The network connection is created and operated by the CSocketsEngine class. It uses several classes to assist in the process, such as CNetConnection, which creates and maintains the actual connection to the network. CMessage is used for parsing the message received from the server and delivering it to CRemoEngine. CSocketsReader and CSocketsWriter are used in the corresponding socket operations, and CTimeOutTimer notifies the socket engine in case of a connection timeout.

Additional classes used by the engine class are CSecondTimer, CFileHandler, CAudioPlayer, CAlgorithm and CWatermarkFunctionLibrary. The timer class is used for calling delayed operations such as reconnecting to the server or starting the noise transform process. The transform process must be delayed slightly because the user interface must be updated before the transform starts, and redrawing the screen takes a moment. CFileHandler encapsulates some functions for accessing files, CAudioPlayer contains the functions for audio playback and CAlgorithm implements the actual watermarking algorithms required for the noise transform process. CWatermarkFunctionLibrary contains helper functions for handling complex arrays in the S60 environment.

A simplified state diagram of the client is presented in Figure 22. It presents the functions of the application, the server connection logic and the communication sequences between the server and the client. The only simplifications are the omission of the settings menu and the file browser states. These are separate from the main functionality, so the validity of the state diagram is not affected. The client starts unconnected, with the file browser displaying the files in the sounds folder of the S60 directory structure.
The possible functions for the user at this point are playing an audio file or requesting a content list or a license. If the user selects an audio file and clicks play, the audio playback process is started. The process terminates when the end of the audio clip is reached or the stop button is pressed. The playback can also be paused and resumed.

The other two functions available to the user are requesting a content list from the server and requesting a license for a previously downloaded preview music file. Both of these processes require a network connection to the server, so the connection process is started. After the user has been asked for the network access point, an Internet connection is first established and then a direct TCP/IP connection to the music store server is created. Then, depending on which function was originally selected, the corresponding message is sent to the server.

In addition to the initial idle state, there is another idle state where the only difference is that the server connection has been established. Both states have the same functions available, but the connected idle state is the only state where the client can receive messages from the server. After a new message has been received, it is parsed and the output is presented to the user. Downloading large files from the server is a special case: the client application enters a content receiving state in which it does not parse the incoming data but writes everything into a buffer. After the number of bytes specified in the header message has been received, the client writes the data into a file and returns to the idle state.

5.4.2. Server application

The server is based on a basic Java socket server architecture. This contains the main method in a separate RemoServer class, which creates a ServerSocket object and assigns a specified TCP port to it.
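A thread-per-connection server of this kind can be sketched as follows. Only the division of roles (a main accept loop and a separate thread for each client) and the message formats of section 5.3.3 come from the text. The class name ServerSketch, its methods and the restriction to the content index request are simplifications made for illustration, not the thesis code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch of a basic Java socket server with one thread per client.
class ServerSketch {

    /** Builds the content index reply "10#N#file1#file2#...#" followed by '\n'. */
    static String contentIndexReply(List<String> files) {
        StringBuilder sb = new StringBuilder("10#").append(files.size()).append('#');
        for (String f : files) {
            sb.append(f).append('#');
        }
        return sb.append('\n').toString();
    }

    /** Per-connection worker: reads '\n'-terminated commands until the client disconnects. */
    static class ConnectionThread extends Thread {
        private final Socket socket;
        private final List<String> files;

        ConnectionThread(Socket socket, List<String> files) {
            this.socket = socket;
            this.files = files;
        }

        @Override
        public void run() {
            try (Socket s = socket;
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                 OutputStream out = s.getOutputStream()) {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("00#")) {
                        out.write(contentIndexReply(files).getBytes(StandardCharsets.US_ASCII));
                        out.flush();
                    }
                    // Content (01#) and license (02#) requests are omitted from this sketch.
                }
            } catch (IOException e) {
                // An abrupt disconnect ends only this thread; the accept loop keeps running.
                System.err.println("Connection closed: " + e);
            }
        }
    }

    /** Accept loop: starts one new thread for every incoming connection. */
    static void serve(ServerSocket server, List<String> files) throws IOException {
        while (true) {
            new ConnectionThread(server.accept(), files).start();
        }
    }
}
```

Because each connection runs in its own thread, an I/O error on one socket ends only that thread, and the accept loop remains free to take new connections.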
A separate RemoServerThread class contains the actual server functionality, such as reading input commands and sending data to the client. The class diagram of the music store server is presented in Figure 23.

Figure 23. The class diagram of the server software.

The state diagram presented in Figure 24 illustrates the functionality of the RemoServer class. The main method of the server keeps listening to the port, and it creates a new RemoServerThread every time a new incoming connection is detected. RemoServer then passes the corresponding Socket object to the new thread so that it has read and write access to the socket.

Figure 24. The state diagram of the RemoServer class.

The functionality of the RemoServerThread class is presented as a state diagram in Figure 25. After a new RemoServerThread object is created, it goes into an idle state where the thread listens for input commands from the socket. After a command has been received, the thread parses it according to the communication protocol rules presented in section 5.3.3. If the parsed command is a content index request, the server reads the content index from a file and creates the response message in the appropriate format. The message is then sent to the client. If the input command is a content request, the server reads the requested content file and creates and sends the message to the client. In the case of a license request, the server must first read the original license data from the file system. This data contains the pseudo-random key, the synchronization signal scale and an array of transitions in the frequency domain. The final license data is then created by modifying the license data according to the user ID the client has sent to the server. After this operation, the license message is created and sent normally. After sending any message to the client, the server returns to the idle state. The server thread is terminated if the connection to the client is closed.

Figure 25. The state diagram of the RemoServerThread class.

Figure 26. The sequence diagram illustrating the message interchange between the client and the server.

5.4.3. Sequence diagrams

The network messages used between the server and the client are specified in section 5.3.3. All the possible sequences of messages can also be derived from the state diagram of the client software, but Figure 26 illustrates a basic message sequence. An important observation from the sequence diagram is that all time-consuming activities during the message exchange are performed on the client side. The only phase that takes longer than a fraction of a second on the server is sending the content files, because this operation is divided into multiple message packets. It follows that the server can easily manage multiple connections at the same time, and performance should not become an issue.

6. SYSTEM IMPLEMENTATION AND TESTING

The system implementation is based on the software design process presented in chapter 5. The watermarking algorithm presented in chapter 4 is also an essential part of the implementation. This chapter discusses the main features of the implemented software system with an emphasis on the system and user testing. The system functionality was tested against the requirements specified in section 5.2.3. The user tests were based on a web-based Audio Quality Evaluation Tool.

6.1. Software platforms

The server was implemented on the Java SE 1.6 platform on a Linux server with kernel version 2.6.18. The implementation is characterized by the architecture of the basic Java socket server and the rather small number of different messages it must be able to handle. This resulted in a moderately simple implementation with fewer than 400 lines of Java code. The required audio and watermark databases were implemented using a dedicated directory in the filesystem where the preview audio files were stored.
The content index and the license database were implemented with a text file, which could be easily read by the server application.

The client was implemented on the Symbian S60 3rd Edition platform. The SDK version used was S60 3rd Edition Feature Pack 1 (FP1), and the device used for debugging and user testing was a Nokia N95, which uses the same S60 3rd Edition FP1 operating system version. The use of the S60 platform sets many requirements for the developer and the development process. The platform is not as polished as the Java platform in terms of documentation quality and the quality of the platform libraries delivered with the SDK.

The application was implemented by reusing several components from S60 applications previously created by the author. Such components include parts of the algorithm, the logging system, the socket engine, the timer, the file handler and the watermark function library. The use of these components facilitated rapid software development, but the S60 platform unfortunately sometimes behaves illogically and presents strange problems at random, for example, during compilation. This makes the development process slower than it would be on some other platform.

The signal processing algorithms required by the system were originally designed and developed in a Matlab environment. This is a common approach that provides a fast and efficient way of testing and debugging new algorithms. The algorithm used for the noise transform process was then ported to the S60 platform to be used in the client application.

6.2. Limitations

The implemented system has some limitations, or simplifications, concerning some parts of the functionality. Most of the limitations concern the server part of the software. The two most notable limitations are the lack of an actual financial transaction during the license purchasing process and the lack of a proper interface for importing songs to the music store server.
Before the license is generated on the server and sent to the client, the financial transaction should occur. However, in the current implementation it remains a concept only and is not implemented. This is because the transaction would be too complex to implement and because it would not affect the general usability or the test results.

The other major simplification is the song importing process. The importing algorithm was not ported to the software platform of the server; instead, it was run in Matlab. The algorithm produces the watermarking key and the preview version of the song, which can then be manually uploaded to the server. This is impractical for hundreds of songs or more, but in this test only a couple of dozen songs were used, so the simplification was justified. In addition, the server has no graphical user interface; instead, it is used via TCP/IP access by the client and with direct file access to the databases by the administrator. This approach also accelerated the development process.

Another limitation is that the communications channel between the server and the client is not secure. It uses a regular unencrypted TCP/IP socket connection, which would be an enormous security issue in a real system. Public-key encryption could be implemented, but it was left out because it would not affect the test results in any way.

6.3. Functional tests

The system functionality was tested against the requirements specified in section 5.2.3. The system should provide all behavior listed in Table 7 and also follow the technical conditions specified in Table 6. The results of the functional tests are presented in this section.

6.3.1. Downloading a list of preview files

Requirement: The client application must be able to download a list of available preview files from the server and display the list to the user.
Execution: The user selects the Download songs command from the menu, and selects a valid access point to be used for creating the Internet connection. A connection to the server is established and the list of the available preview files is transmitted from the server to the client. The list is displayed to the user.

Result: The test was successful.

6.3.2. Downloading a preview file

Requirement: The user must be able to select a song from the list of preview files and the song must be downloaded and saved on the client device.

Execution: The user selects the Download songs command from the menu, and selects a valid access point to be used for creating the Internet connection. A connection to the server is established and the list of the available preview files is transmitted from the server to the client. The list is displayed to the user. The user selects a song from the list and presses the download button. The file transmission begins and a progress bar indicating the download status is displayed to the user. After the download is completed, the file is saved on the device and it appears in the file browser.

Result: The test was successful.

6.3.3. Music file playback

Requirement: The user must be able to play preview and fingerprinted songs on the device.

Execution: The user selects a preview music file in the file browser and presses the OK button. Music playback begins, and the device softkeys are changed to Pause and Stop. After the playback is complete, the music playback ends and the softkeys are changed back to Options and Exit. The user then selects a fingerprinted song and presses the OK button. The music playback begins.

Result: The test was successful.

6.3.4. Requesting a license for a preview file

Requirement: The user must be able to select a preview music file and request a license from the server.

Execution: The user selects a preview file in the file browser and selects the Purchase menu option.
After selecting a valid access point, the server connection is established and the license is immediately received. A text indicates that the license has been received and the noise transform process has started.

Result: The test was successful.

6.3.5. Generating unique licenses

Requirement: When requested, the server must generate a unique license based on the ID of the user and send the license to the client.

Execution: The user selects a preview file in the file browser. Then the user selects the Purchase menu option. After selecting a valid access point, the server connection is established and the text output of the server indicates that the client has requested a license and has sent the filename of the song and the user ID to be used in the license creation process. The server receives the data and reads the watermarking key from the database. Then it modifies the watermarking key according to the user ID. A unique license is created for the user, which is then sent to the client.

Result: The test was successful.

6.3.6. Noise transform

Requirement: The client must perform a noise transform on the device, removing the audible noise from a preview music file and transforming it into a fingerprinted song.

Execution: The user selects a preview file in the file browser. Then the user selects the Purchase menu option. After selecting a valid access point, the server connection is established and the license is immediately received. The noise transform process starts. After 13 seconds, the process is completed and a new fingerprinted file is created. The new file is displayed in the file browser.

Result: The test was successful.

6.3.7. Maintaining the network connection

Requirement: After the client has connected to the server, the network connection between the server and the client must be kept open for as long as the client program is running.
Execution: The user selects the Download songs command from the menu, and selects a valid access point to be used for creating the Internet connection. A connection to the server is established and the list of the available preview files is transmitted from the server to the client. The list is displayed to the user, but the user selects Cancel and the file browser is displayed again. The server connection is still open.

Result: The test was successful.

6.4. Technical tests

The technical properties of the system were tested against the technical requirements specified in section 5.2.3. The tests were performed on a Nokia N95 phone over a Wi-Fi connection. The server was running on a dedicated server computer running the Linux operating system.

6.4.1. Preview file download time

Requirement: The preview files must be downloadable over a Wi-Fi or 3G connection in less than a minute.

Execution: The user selects the Download songs command from the menu and selects a valid access point. The server connection is established and the list of the available preview files is displayed to the user. Then the user selects a song from the list and presses the download button. The file transmission begins and a progress bar indicating the download status is displayed to the user. The download time is recorded and the process is repeated for every file. The results are shown in Table 10.

Result: The test was successful.

Table 10. Technical performance details of the system

Audio clip      Download time   License download time   Noise transform time
aerosmith       8s              < 1s                    12s
bigyellow       10s             < 1s                    15s
bryanadams      11s             < 1s                    13s
celine          10s             < 1s                    12s
cocker          16s             < 1s                    13s
dafunk          15s             < 1s                    15s
duel            18s             < 1s                    17s
exitmusic       19s             < 1s                    19s
finlandia       16s             < 1s                    16s
Madonna         11s             < 1s                    14s
metallica       11s             < 1s                    12s
ordinaryworld   16s             < 1s                    19s
queen           10s             < 1s                    13s
rushing         22s             < 1s                    15s
sipe            14s             < 1s                    17s
sting           10s             < 1s                    13s
AVERAGE         13.6s           < 1s                    14.7s

6.4.2. License file download time

Requirement: The license files must be downloaded in less than five seconds.

Execution: The user selects a preview file in the file browser and selects the Purchase option from the menu. After selecting a valid access point, the server connection is established and the license is received. The download time is recorded and the process is repeated for every file. The results are shown in Table 10.

Result: The test was successful.

6.4.3. Noise transform processing time

Requirement: The noise transform must not take over 30 seconds.

Execution: The user selects a preview file in the file browser and selects the Purchase option from the menu. After selecting a valid access point, the server connection is established and the license is received. Then the noise transform process begins and its duration is recorded. The results are listed in Table 10.

Result: The test was successful.

6.4.4. Multiple users support

Requirement: The server must support multiple users simultaneously.

Execution: The user selects the Download songs menu command and selects a valid access point. After the server connection is established, the list of available preview files is shown to the user. At this point, another user connects to the server and requests the list of preview files. A new thread is created on the server. Both users can select and download preview songs at the same time.

Result: The test was successful.

6.4.5. Server stability

Requirement: The server must not crash if the connection is terminated unexpectedly.

Execution: The user selects the Download songs menu command and selects a valid access point. After the server connection is established, the list of available preview files is shown to the user. Then, the client device is shut down by pressing the power button. This terminates the connection between the server and the client. The server console displays a java.net.SocketException message and the running thread is terminated.
The server continues to accept new connections normally.

Result: The test was successful.

6.4.6. Client stability

Requirement: The client must not crash if the connection is terminated unexpectedly.

Execution: The user selects the Download songs menu command and selects a valid access point. After the server connection is established, the list of available preview files is shown to the user. Then the server process is killed on the server machine. The connection between the server and the client is terminated. After the connection timeout, the client notices that the connection has been lost. The client can use all commands normally, and a new connection to the server is attempted if a command requires one.

Result: The test was successful.

6.5. User tests

In addition to the algorithm imperceptibility tests, the test users answered a questionnaire on the business model behind the system. The users were first introduced to the client application running on a Nokia N95 mobile phone, and then they used the application to download free preview audio files from the server. After listening to the audio files, they purchased a license for at least one preview file, which was then transformed into a fingerprinted audio file. They listened to the fingerprinted version and compared it to the preview version. After the test, they answered the questionnaire. The answers are presented in Table 11.

Table 11. User questionnaire results

Question                                                              Yes     No
Have you bought music with a mobile device?                           0%      100%
Do you think a full song preview with the noise is better than a     100%    0%
normal preview with a short high quality sample?
Do you think a full song preview with the noise is better than a     60%     40%
full song preview with otherwise decreased audio quality (low bit
rate)?
Do you think it is beneficial that you don't need to download the    100%    0%
song again after purchasing?
Would you consider using the system if it was commercially           20%     80%
available?

The questionnaire was presented with the same audio quality evaluation toolkit as the imperceptibility tests. Before listening to the watermarked and non-watermarked samples, the users answered the questions about their experience of using the system. The respondents are therefore the same as in the imperceptibility test, which was explained in section 4.4.2.

6.6. Discussion

The users who participated in the questionnaire had no previous experience of purchasing music from online music stores with a mobile device. The probable reason is that mobile music sales are only taking their first steps, with iTunes support on the iPhone and with the Nokia Music Store and the Comes With Music service coming to the latest smartphones. The trend is clear, however, as the eMarketer report shows [1].

The users were quite reluctant to use a similar system if it were commercially available. The users may have thought that the commercial system would be as crude as the implemented demo version. The iPhone solution, where music can be easily purchased from iTunes, is currently setting the standard for the implementation quality required by customers. Another reason may be that people do not appreciate the idea of having music available on their mobile phones. There are still many people who do not have smartphones with digital music player capabilities, and some are still very content with phones that have only call and text messaging capabilities.

The idea of using the full song for previewing purposes was appreciated. Every test user preferred the full song to the generally used short preview clips. They also appreciated that they did not have to download the song again after they had listened to the preview version and decided to unlock the high quality version. The watermark noise used to disturb the listening experience in the preview version did not get full marks from every test user.
Some would have preferred a low bit rate version or otherwise decreased audio quality to the watermark noise in the preview version. The watermark noise inserted by the embedding algorithm can be modified to sound different, but this thesis concentrated more on the imperceptibility and robustness of the fingerprint watermark. Future research could put more emphasis on the sound quality of the initial watermark in the preview songs.

7. DISCUSSION

The failure of encryption-based DRM systems in digital audio distribution has led to an increase in online music stores that sell their music in some unprotected audio format, such as MP3. At the same time, the online and mobile music markets continue to grow as more people are getting accustomed to using digital audio players and smartphones. This has led to an increasing need for new DRM technologies that can protect the content while it is stored externally unprotected. Digital watermarking can be used for creating solutions based on embedding inaudible identifiers known as digital fingerprints in audio. These fingerprints can then be used for detecting the origin of the content in the case of Internet piracy.

This work designed and implemented an audio protection system utilizing removable watermarking and fingerprinting technologies. The fingerprint watermark was robust against signal processing attacks and was shown to be very close to imperceptible in the listening tests. A more detailed analysis of the robustness and imperceptibility tests is presented in section 4.5. Test users also answered a questionnaire on purchasing music with a mobile device. These results are discussed in section 6.6.

Digital audio watermarking has been widely researched and several methods have been developed for embedding the watermark. This thesis is also a continuation in the series of digital watermarking publications at the MediaTeam Oulu research group.
The digital watermarking research at MediaTeam currently has two main topics: image and audio watermarking. The algorithms presented in this thesis are part of the long-term research on audio watermarking algorithms, and they combine pseudo-random frequency hopping sequences and spread spectrum synchronization with removable watermarking and fingerprinting techniques.

Related work in the field has usually concentrated on algorithm details and testing the algorithm performance. This work also presents a fully functional system for mobile audio distribution utilizing watermarking and fingerprinting techniques. This allowed test users to try out the system and give their opinions. Related work on removable watermarking has mainly concentrated on digital image watermarking, while this work combines audio watermarking with removable watermarking techniques.

The industry has been very interested in the algorithms developed in this study, and a local company has bought the rights to apply for a patent for the invention of transforming the audible noise into an inaudible fingerprint. This demonstrates the significance of this work to the industry. In addition, because the results of this work show a clear and significant improvement over the previous version of the algorithm, a new paper based on this thesis will be submitted to a conference.

The interested response of the industry opens up many directions for future research. The algorithms could be improved in many ways. From the user's point of view, the sound quality of the watermark in the music file previews should be optimized to the level where it is not too disturbing but still makes long-term listening uncomfortable. Also, the fingerprint watermark is still not perfect in terms of robustness and inaudibility. The robustness against the pitch attack could be improved by embedding the signal over a wider frequency band.
That way, the modifications in the audio frequency would not destroy the watermark so easily. Robustness should also be evaluated with full-length songs. This could reveal new ways to optimize the embedding algorithm.

The perceptual quality of the fingerprint watermark could be improved by using an optimal frequency band for every audio file. The use of a more effective frequency band could allow lowering the embedding strength of the fingerprint, which would directly improve the perceptual quality.

8. SUMMARY

This thesis designed, implemented and evaluated an audio protection system for a mobile distribution environment. The technological focus was on digital rights management and digital audio watermarking techniques.

The audio protection system consists of a server and a client component. The server provides free preview music files that can be downloaded with the client application. The preview files contain an audible watermark noise signal. The idea is to allow music vendors to distribute complete songs as previews with the watermark, which also gives customers better previews of the music than the traditional 30-second samples. The noise signal makes long-term listening of the preview samples unpleasant, but still allows the users to have a proper preview of the whole song.

The client application contains a watermarking algorithm for transforming the noise signal into an inaudible fingerprint, effectively creating a high quality version of the song. This process requires a license, which is generated by the server and sent to the client. The license contains the watermarking key required for the noise transform process. The purpose of the digital fingerprint inserted into the high quality version in the noise transform process is to identify the user in case the song is leaked to unauthorized domains such as piracy torrents.
The advantage of having the identity of the song owner in a digital watermark is that it cannot be removed with traditional DRM circumvention techniques such as burning the music to CD and then ripping it back into some unprotected format. The watermark remains in the audio even if the audio is converted to analog format.

The implementation was tested with three separate test cases. The robustness of the fingerprint watermark was tested against an extensive set of attacks, which made inaudible changes to the audio and tried to destroy the fingerprint. The results showed that the algorithm was robust even when short audio clips were used. The audio files used in real-life cases would be 10-15 times longer, which should improve the results notably.

The inaudibility of the fingerprint was tested with a listening test by 10 test users. The results indicated that an average user could not tell the difference between watermarked and non-watermarked audio clips. The algorithm should still be improved in terms of audio quality with clips that are dynamically compressed.

The last test presented the test users with a questionnaire about the audio distribution business case implemented in the system. They liked the idea that the full song was available in the free preview version, although the watermark sound quality could be improved.

The algorithms developed in this work have received an interested response from the industry, and a local company has bought the rights to apply for a patent for the idea of transforming the audible noise watermark into an inaudible fingerprint. Also, the good results in the robustness and imperceptibility tests could enable publishing a conference paper on the topic of this thesis.

The main directions for future work include optimizing the perceptual quality of the audible watermark in the preview versions to a level that strikes a balance between giving a good preview and encouraging the customer to purchase the high quality version.
Also, the robustness and imperceptibility could be improved with additional research on the watermark embedding details.

9. REFERENCES

[1] Verna, P. (2007). Recorded Music: Digital Falls Short. (read 11.8.2008) eMarketer report. URL: http://www.emarketer.com/Reports/All/Emarketer_2000472.aspx
[2] Löytynoja, M., Cvejic, N. and Seppänen, T. (2007). Audio protection with removable watermarking. Proc. Sixth International Conference on Information, Communications and Signal Processing (ICICS 2007), December 10-13, Singapore, 1-4.
[3] Rosenblatt, W., Mooney, S. and Trippe, W. (2001). Digital Rights Management: Business and Technology, John Wiley & Sons, Inc, Chichester, 312 p. ISBN: 978-0-7645-4889-5
[4] Rump, N. (2003). Digital Rights Management: Technological Aspects. In Digital Rights Management – Technological, Economical, Legal and Political Aspects. LNCS 2770, Springer, Berlin, 3-15. ISBN: 3-540-404651
[5] Hauser, T. and Wenz, C. (2003). DRM Under Attack: Weaknesses in Existing Systems. In Digital Rights Management – Technological, Economical, Legal and Political Aspects. LNCS 2770, Springer, Berlin, 206-223.
[6] Iannella, R. (2001). Digital Rights Management (DRM) Architectures. D-Lib Magazine 7(6).
[7] Open Digital Rights Language (ODRL) (read 23.7.2008) URL: http://odrl.net
[8] Rights Expression Language. Approved Version 1.0 – 15 June 2004. Open Mobile Alliance. OMA-Download-DRMREL-V1_0-20040615-A
[9] DRM Rights Expression Language. Approved Version 2.0.1 – 26 Feb 2008. Open Mobile Alliance. OMA-TS-DRMREL-V2_0_1-20080226-A
[10] DRM Rights Expression Language. Candidate Version 2.1 – 24 Jul 2007. Open Mobile Alliance. OMA-TS-DRMREL-V2_1-20070724-C
[11] eXtensible Rights Markup Language (XrML) (read 23.7.2008) URL: http://www.xrml.org
[12] Löytynoja, M. and Seppänen, T. (2005). Hash-based Counter Scheme for Digital Rights Management. Proc. 2005 IEEE International Conference on Multimedia & Expo, Amsterdam, The Netherlands, 121-124.
[13] Taylor, S. (2007). Industry sees sunnier side of digital copying. (read 17.11.2008) Global Technology Forum, Best practice, 21 Aug 2007. URL: http://globaltechforum.eiu.com/index.asp?layout=printer_friendly&doc_id=11248
[14] EMI abandons CD DRM. (read 6.8.2008) Boing Boing. URL: http://www.boingboing.net/2007/01/08/emi-abandons-cd-drm.html
[15] Halderman, J. A. and Felten, E. W. (2006). Lessons from the Sony CD DRM episode. Proceedings of the 15th conference on USENIX Security Symposium - Volume 15. Vancouver, B.C., Canada, USENIX Association.
[16] Digital developments could be tipping point for MP3. (read 11.8.2008) URL: http://www.reuters.com/article/musicNews/idUSN0132743320071203
[17] Apple iTunes Store Support – Authorization FAQ (read 11.8.2008) URL: http://www.apple.com/support/itunes/store/authorization/
[18] Jobs, S. (2007). Thoughts on Music. (read 11.8.2008) URL: http://www.apple.com/hotnews/thoughtsonmusic/
[19] Anderson, N. (2007). PlayForSure becomes “Certified for Windows Vista” (read 12.8.2008) URL: http://arstechnica.com/news.ars/post/20071212playforsure-becomes-certified-for-windows-vista.html
[20] Nokia outlines its vision of Internet evolution and commitment to environmental sustainability. (December 2007) Nokia Press Release. URL: http://www.nokia.com/A4136001?newsid=1172937
[21] Fisher, K. (2007). Musicload: 75% of customer service problems caused by DRM (read 13.8.2008) URL: http://arstechnica.com/news.ars/post/20070318-75-percent-customer-problems-caused-by-drm.html
[22] Juergen, S. (2005). Digital Watermarking for Digital Media, Information Resources Press, Arlington, VA. ISBN: 159140519X
[23] Swanson, M. D., Kobayashi, M. and Tewfik, A. H. (1998). Multimedia data-embedding and watermarking technologies. Proceedings of the IEEE 86(6): 1064-1087.
[24] Petitcolas, F. A. P., Anderson, R. J. and Kuhn, M. G. (1999). Information hiding - a survey. Proceedings of the IEEE 87(7): 1062-1078.
[25] Cvejic, N. and Seppänen, T. (eds.) (2008). Digital Audio Watermarking Techniques and Technologies: Applications and Benchmarks, Information Science Reference, Hershey, PA, USA, 1-10.
[26] Pramila, A. (2007). Watermark synchronization in camera phones and scanning devices. Master’s Thesis, University of Oulu, Department of Electrical and Information Engineering, Oulu, Finland.
[27] Mäkelä, K. (2000). Digital watermarking and steganography. Diploma Thesis, Department of Electrical Engineering, University of Oulu, Oulu, Finland.
[28] Petitcolas, F. A. P. (2003). Digital Watermarking. In Digital Rights Management – Technological, Economical, Legal and Political Aspects. LNCS 2770, Springer, Berlin, 81-92.
[29] Cvejic, N. and Seppänen, T. (2004). Spread spectrum audio watermarking using frequency hopping and attack characterization. Signal Processing 84(1): 207-213.
[30] Wen-Nung, L. and Li-Chun, C. (2006). Robust and high-quality time-domain audio watermarking based on low-frequency amplitude modification. IEEE Transactions on Multimedia 8(1): 46-59.
[31] Löytynoja, M. (2008). Digital Rights Management of Audio Distribution in Mobile Networks. Dissertation, Acta Univ Oul C 311, Department of Electrical and Information Engineering, University of Oulu, Finland.
[32] Cvejic, N., Keskinarkaus, A. and Seppänen, T. (2001). Audio watermarking using m-sequences and temporal masking. Proc. 7th IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New York, NY, 227-230.
[33] Jong Won, S. and Jin Woo, H. (2001). Audio watermarking for copyright protection of digital audio data. Electronics Letters 37(1): 60-61.
[34] Liu, K. J. R., Trappe, W., Wang, Z. J., Wu, M. and Zhao, H. (2005). Multimedia Fingerprinting Forensics for Traitor Tracing, EURASIP Book Series on Signal Processing and Communications, Hindawi Publishing Corporation, New York, 272 p. ISBN: 978-9775945181
[35] Steinebach, M., Petitcolas, F. A. P., Raynal, F., Dittmann, J., Fontaine, C., Seibel, S., Fates, N. and Ferri, L. C. (2001). StirMark Benchmark: Audio Watermarking Attacks. Proceedings of the International Conference on Information Technology: Coding and Computing, April 2-4, Las Vegas, 49-54. ISBN: 0-7695-1062-0
[36] Wu, M., Trappe, W., Wang, Z. J. and Liu, K. J. R. (2004). Collusion-resistant fingerprinting for multimedia. Signal Processing Magazine, IEEE 21(2): 15-27.
[37] Feng, J.-B., Lin, I.-C., Tsai, C.-S. and Chu, Y.-P. (2006). Reversible watermarking: Current status and key issues. International Journal of Network Security 2(3): 161-171.
[38] Digital Watermarking Alliance brochure. (read 5.11.2008) URL: http://www.digitalwatermarkingalliance.org/about.asp
[39] MarkAny – Inaudible and Robust Audio Watermark Technology. (read 5.11.2008) URL: http://www.markany.com/en/sub_index.asp?fn=product&spname=product_04_04
[40] Verance Music Solutions (read 5.11.2008) URL: http://www.verance.com/solutions/music.php
[41] Verance Announces Availability of Audio Watermark Technology for High-Definition Entertainment Formats. (read 5.11.2008). Verance press release. July 2, 2007
[42] ITU-T Recommendation P.800 (1996). Methods for Subjective Determination of Transmission Quality.

10. APPENDICES

Appendix 1 The properties of the applied attacks against the fingerprint watermark

Appendix 1
The properties of the applied attacks against the fingerprint watermark

No.  Attack            Properties
1    MP3 compression   Encoder: LAME version 3.97 MMX; Parameters: --quiet -h -b 128 (attack #1); Parameters: --quiet -h -b 192 (attack #2); Sampling frequency: 44100 Hz
2    Chorus            Voices: 5; Delay time: 30 ms; Delay rate: 1.2 Hz; Feedback: 10%; Spread: 60 ms; Modulation depth: 5 dB; Modulation rate: 2.0 Hz; Output level: Dry 100%, Wet 5%
3    Compressor        Attack time: 1 ms; Release time: 500 ms; Output gain: 0 dB; Threshold: -50 dB; Ratio: 1:1.0
4    Delay             Delay time: 400 ms; Mix: 5%
5    Flanger           Initial delay: 1 ms; Final delay: 2 ms; Stereo phasing: 45 degrees; Feedback: 10%; Modulation rate: 0.40 Hz; Mix: 5%
6    Invert            (no parameters)
7    Low pass filter   Cut-off frequency: 15 kHz
8    Pitch             Splicing frequency: 40 Hz; Overlapping: 33%; Ratio: 101% (attack #1); Ratio: 99% (attack #2)
9    Random noise      Maximum noise amount: 0.91%
10   Resampling        Resampling to 8000 Hz; Bit depth: 8
11   Reverb            Decay time: 700 ms; Pre-delay time: 10 ms; Diffusion: 1818 ms; Perception: 50; Output level: Dry 100%, Wet 20%
12   Stretch           Splicing frequency: 40 Hz; Overlapping: 33%; Ratio: 102% (attack #1); Ratio: 98% (attack #2)
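
Some of the simpler attacks in the table can be approximated in a few lines, which may help when reproducing a comparable robustness test. The following is a minimal sketch and not the audio editor processing used in this work. It assumes mono audio as floating-point samples in [-1, 1], interprets "Maximum noise amount: 0.91%" as 0.91% of full scale, interprets "Mix: 5%" as a delayed copy added at 5% gain, and covers only the bit-depth part of the resampling attack (the conversion to 8000 Hz is omitted). The function names are hypothetical.

```python
import numpy as np

def invert(x: np.ndarray) -> np.ndarray:
    """Attack 6: flip the polarity of every sample."""
    return -x

def add_random_noise(x: np.ndarray, max_amount: float = 0.0091, seed: int = 0) -> np.ndarray:
    """Attack 9: add uniform noise bounded by `max_amount` of full scale (assumption)."""
    rng = np.random.default_rng(seed)
    return x + rng.uniform(-max_amount, max_amount, size=x.shape)

def delay_mix(x: np.ndarray, sample_rate: int = 44100,
              delay_ms: float = 400.0, mix: float = 0.05) -> np.ndarray:
    """Attack 4: add a single delayed copy of the signal at `mix` gain (assumption)."""
    d = int(round(sample_rate * delay_ms / 1000.0))
    delayed = np.zeros_like(x)
    if d < len(x):
        delayed[d:] = x[:len(x) - d]
    return x + mix * delayed

def requantize(x: np.ndarray, bits: int = 8) -> np.ndarray:
    """Bit-depth part of attack 10: round samples to a signed `bits`-bit grid."""
    levels = 2 ** (bits - 1)
    return np.clip(np.round(x * levels), -levels, levels - 1) / levels
```

Each function takes and returns an array of the same length, so the attacks can be chained or applied one at a time before running the fingerprint detector on the result.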