August 2012
ISSN 1932-8214

AT&T makes its speech technology available to developers
Network-based speech recognition accessed through an API

AT&T Research (formerly Bell Labs) has been involved in speech technology research for many decades; for example, it developed a continuous digit recognizer in the 1950s. (See the interview with Mazin Gilbert, AVP of the Intelligent Systems Organization, AT&T Research, SSN, May 2012, p. 15.) The company’s speech recognition technology has found a home in some deployed applications, including the Vlingo voice assistant that is part of the new Samsung Galaxy S III, where the assistant is called S-Voice (SSN, July 2012, p. 1). (Vlingo is now part of Nuance, so the technology used in the Samsung phone lines may eventually evolve to Nuance technology.) Among other applications, AT&T speech technology has been used within AT&T for IVR customers for over 20 years.

The AT&T Watson speech technology has now been made available to developers as a network-based service accessed through an Application Programming Interface (API) that AT&T recently released. Gilbert summarized in a note to Speech Strategy News: “By exposing the speech APIs, we are lowering the barrier to entry for developers to empower their applications with speech. The responses we have received so far have been overwhelming. Our plans don't stop here. We will continue to expose additional APIs and innovations to enable developers to create more advanced and personalized mobile applications ranging from virtual assistants to interactive gaming. Stay tuned!”

Continued on page 17

Nuance to release Dragon NaturallySpeaking 12 desktop dictation software
Improved accuracy and more ease-of-use features

On July 26, Nuance Communications announced the latest version of its speech-to-text dictation software for Windows personal computers, Dragon NaturallySpeaking 12.
While Nuance says there are more than 100 new features and enhancements, perhaps the most important are a 20% average drop in the out-of-the-box error rate of the core speech recognition technology and faster response, according to Erica Hill, a product marketing manager at Nuance. The faster transcription is achieved in part by taking advantage of multi-core processors and more memory if available, Hill said. Dragon is the core technology in Dragon Dictation, the Dragon Go! mobile personal assistant, Dragon TV, Dragon Drive! (the automobile version), and Dragon ID (biometric speaker authentication).

Continued on page 18

Google Voice Search in Android 4.1 Jelly Bean
More direct answers with Google Now and “all business”

Google made announcements on June 27 on improved voice search in the next version of Android, which allows queries in natural language (SSN, July 2012, p. 7). With Google’s Knowledge Graph displaying search results that attempt to provide a more direct answer to your inquiry next to the classical search results (SSN, June 2012, p. 1), the voice search can also respond by voice. Google also announced Google Now, which goes beyond user-initiated searches in that it attempts to anticipate a search using location and time of day (e.g., weather), and displays multiple possible panes of such information. It can automatically notice you have an appointment upcoming, calculate travel time, post a map in the pane, and even warn you when you should leave. In the past, Google has displayed what it calls “one boxes,” which show relevant information in one pane next to the web sites returned by the search, such as a map showing locations of restaurants in response to a search for “Greek restaurants.” Google Now also formats the results in a way more suited to a mobile device. Google also has a translation app within Google Now.

Continued on page 19

Interviews with Nik Stanbridge, VoiceVault, p. 13; Chih-Chung Kuo, ITRI, p.
15

Table of Contents

AT&T makes its speech technology available to developers ... 1
  Network-based speech recognition accessed through an API ... 1
Nuance to release Dragon NaturallySpeaking 12 desktop dictation software ... 1
  Improved accuracy and more ease-of-use features ... 1
Google Voice Search in Android 4.1 Jelly Bean ... 1
  More direct answers with Google Now and all business ... 1
Editor’s Notes ... 5
  Adding value rather than adding apps ... 5
  Bill Meisel, Publisher & Editor ... 5
Commentary: Follow-up on software patents ... 6
  Bill Meisel, Publisher & Editor ... 6
ICSI and Microsoft to collaborate on conversational human-machine interaction ... 7
  International Computer Science Institute will explore speech and other modalities ... 7
US Department of Justice licenses Nexidia audio discovery ... 7
  Criminal Division installs Nexidia audio technology to streamline investigations ... 7
iSpeech Cloud mobile speech platform claims 13,000 developers ... 7
  Improvements in speech recognition and text-to-speech and a home automation solution announced ... 7
Empirix launches automated contact center regression testing as a service ... 8
  Checks how existing services are impacted when changes are introduced ... 8
Voxeo launches "Zombie IVR" campaign ... 9
  “Talking-Dead” self-service platforms continue to suck the life out of customer satisfaction ... 9
Calabrio integrates speech analytics and workforce optimization ... 10
  More efficient review of voice transactions ... 10
StrikeForce and TradeHarbor partner to offer three-factor voice verification ... 10
  Out-of-band authentication adds security for mobile transactions ... 10
Active Endpoints allows use of enterprise software from a mobile phone ... 11
  Helps iPhone and Android smartphone users to visualize, create, and modify their own wizards ... 11
Polish telecom deploys Nuance VocalPassword for use by its employees ... 11
  Subsidiary of Deutsche Telekom uses voice verification for automatic resetting of passwords ... 11
Nuance Dragon Drive! messaging in 2012 BMW 7 and BMW 3 Series ... 12
  Speak emails and text messages with cloud-based transcription ... 12
AVIOS Speech Conference in Israel draws both academia and industry ... 12
  Speakers from around the world addressed both academic and business issues ... 12
Trapit uses natural language text processing to deliver content ... 12
  Highly personalized content based on user-specified interest and user-specific adaptation ... 12
Interview with Nik Stanbridge, VoiceVault ... 13
  Biometric identity verification with text-dependent voiceprints ... 13
Interview with Chih-Chung Kuo, Industrial Technology Research Institute ... 15
  ITRI’s speech research includes speech recognition, speaker recognition, and speech synthesis ... 15

News briefs ... 19
VoiceVault releases new generation of its speaker verification ... 19
Samsung reduces search capability in its Galaxy S III smartphone, apparently in response to Apple patent suit ... 20
Nuance hints that its personal assistant aimed at corporations will be called “Nina” ... 20
Voxeo Labs announces strategic partnership with Deutsche Telekom ... 20
Pronexus adds web sites to support IVR developers ... 20
Spoken Communications partners with Varolii to combine customer interaction applications with Spoken’s inbound capabilities ... 20
Chinese search giant Baidu opens tech lab in Singapore, with a partial focus on speech applications ... 20
SRI reveals voice assistant for Spanish banking group BBVA ... 20
Northwest Multiple Listing Service selects Interactive Intelligence’s IP communications software suite ... 21
CallMiner analytics solution adds new personalization features ... 21
Hungarian telecom operator introduces voice identification in customer service operations using Nuance voice biometric solution ... 21
Sandata Technologies Electronic Visit Verification technology to improve visibility and oversight of home care delivery for Louisiana Department of Health and Hospitals ... 21
Easy Voice Biometrics allows finding closest match to individuals when comparing voice files ... 21
Google adds voice search to Google+ Local on iOS ... 22
VoiZapp app uses Android speech recognition to post to a Facebook news feed ... 22
Mossberg review of Android Jelly Bean criticizes the voice assistant ... 22
Wolfram|Alpha provides answers to Samsung’s S Voice as well as Apple’s Siri ... 22
BMW to incorporate Nuance voice command in its dashboards ... 22
Horizon Private Cloud provides outsourced services to Voice Automated, a Nuance Dragon reseller ... 22
Motorola’s new ATRIX HD phone for AT&T automatically goes into car mode when docked ... 23
Dictation in new Mac OS is done in the network; some personal data is used ... 23
Mercedes-Benz adds connection to Apple Siri to its COMAND navigation system ... 23
It will be legal to text while driving in California if you use speech recognition in the new year ... 23
SoundGecko web application and mobile phone app is a text-to-speech service ... 23
Veveo provides predictive search on Android phones, including personal info on phone ... 23
Nuance Dragon Dictation and Dragon Search apps now available in Vietnam ... 24
Samsung TV has voice recognition ... 24
HondaLink allows communicating with your car using your smartphone ... 24
TalkTalk chooses Nexidia Advanced Interaction Analytics for its phone services ... 24
United Hospital System selects M*Modal clinical documentation system with speech recognition ... 24
Providence Health & Services deploys Nuance Dragon Medical 360 Network Edition ... 25
Terra Nova provides Health Sciences North with transcription and speech recognition editing services ... 25
4medica’s cloud-based Electronic Health Record adds medical speech recognition from Nuance ... 25
me2me releases a new version of its digital dictation app for iOS and BlackBerry devices, targeted at the healthcare market through M*Modal partnership ... 25
Leon Medical Centers selects IDS for speech recognition, mobile dictation, and workflow solutions ... 26
Google search box in Chrome web browser displays calculator when calculation is entered, allows voicing equation ... 26
SpeakGlobal adds text-to-speech to its English language learning site for Japanese learners ... 26
Raytheon BBN awarded DoD contract to develop a foreign-document translation system ... 26
Carnegie Speech provides English language training with speech recognition for training institute in Dubai ... 26
Goya Foods chooses Wavelink for voice-enabled warehouse picking solution ... 27
W3C Multimodal Interaction Working Group publishes “Registration & Discovery of Multimodal Modality Components in Multimodal Systems: Use Cases and Requirements” ... 27
Shanghai Zhi Zhen Internet Technology sues Apple in China over Siri ... 27
International Research Consortium (U-STAR) launches translation app ... 27
Voxbone provides phone network for Lexifone speech-recognition-based real-time translation service ... 28
Microsoft touch keyboard in Windows 8 corrects some touch mistakes ... 28
Siri knows which is the best phone now ... 29
National Federation of the Blind sues over US State Department’s purchase of Amazon Kindles, citing limitations of the text-to-speech feature ... 29
Accessible Media service adds text-to-speech ... 29
Microsoft improves accessibility TTS function in Windows 8 ... 29
Proloquo2Go assistive software offers children with speaking disabilities artificial speech ... 30
Google researches computing methods using simulated neural networks ... 30
James and Janet Baker still pursuing compensation for their Dragon speech recognition technology ... 30
Robots don’t just beep to warn you of movement, they now talk ... 30
Analyst compares Siri speech recognition search to Google text search ... 30
Loading the dishwasher is still a job! ... 31
Taiwan's National Cheng Kung University files a patent lawsuit against Apple over Siri features ... 31

Statistics and Surveys ... 31
Smartphones in use worldwide to exceed 2.4 billion in 2016 ... 31
Smartphone shipments to grow 38.8% this year to 686 million units ... 31
Approximately three quarters of the world’s population now has access to a mobile phone ... 31
325 million Android phones expected to be sold worldwide in 2012 ... 31
Samsung Galaxy S3 hits 10 million units in sales within two months ... 32
Android has 77% share of China’s smartphone market ... 32
Biometric security to become a “must have” on all smart mobile devices, market research firm claims ... 32
Apple iPhone maintains consumer interest over Android ... 32
If you are under 34, you most likely use your mobile phone as your primary phone ... 32
Voice search from Google on top 10 list of downloaded apps ... 32
The mobile ad market could reach $18.3 billion by 2015 ... 32
Consumers show mixed interest in mobile coupons ... 32
Global mobile app store revenue to exceed $34 billion in 2016 ... 33
Hispanic community increasingly using mobile devices as a primary means of Internet access ... 33
Nearly six out of 10 parents of children aged 8-12 have provided their children with cell phones ... 33
Vocalabs finds that making it hard for a customer to reach an agent serves no purpose ... 33
Contact center campaign survey concludes that the phone remains the most popular communications channel ... 33
A variety of issues flagged in a survey of contact center professionals ... 33
600 million smartphones projected to support gesture recognition in 2017 ... 33

Financial Notes ... 34
Nuance reports Vlingo financials ... 34
M*Modal to be acquired for approximately $1.1 Billion by One Equity Partners ... 34
Agero expands cloud-based content delivery to vehicles with investment in M-Way Solutions of Germany ... 34
Samsung delivers higher profits due to smartphone sales surge ... 35
West Corporation reports increased revenue and profits for its second quarter ... 35
Spoken Communications acquires HyperQuality, provider of quality assurance and business intelligence for contact centers ... 35
Interactive Intelligence announces preliminary Q2 results ... 35
Apple acquires fingerprint scanner firm AuthenTec ... 35

People ... 36
Thomas B. Sabol named Chief Financial Officer of Comverse, Inc. ... 36
Bill Robinson named Executive Vice President of Worldwide Sales at inContact ... 36
Eliza names Lee Horner Senior Vice President of Sales ... 36
Lyle Ball named Chief Operating Officer at translation company MultiLing ... 36
Cyara Solutions names Laurence Webb general manager of sales for Australia and New Zealand ... 36

For Further Information on Products Mentioned in this Issue ...
37
Meisel-on-Mobile (www.meisel-on-mobile.com) ... 43

Editor’s Notes
Adding value rather than adding apps
Bill Meisel, Publisher & Editor

Modularity is an important principle. Today’s complex software couldn’t be written without subroutines or layers of software such as the operating system, device drivers, etc. Adam Smith, in An Inquiry into the Nature and Causes of the Wealth of Nations, described how the “division of labor” creates efficiencies by breaking up a complex task into separate steps (each requiring less skill and training than the whole), using a pin factory as an example. The Web can be considered a set of modules (web sites) with different information and services.

Today’s mobile apps might be thought of as applying the “principle of modularity” to the user interface of a mobile device. If a developer can think of something you might want to do on a mobile device, today they can create a module for doing just that, and it can be downloaded to your device easily and extend the function of that device. You can assemble pieces to match your needs.

But modularity works best when it contributes to the whole. The subroutines in a well-designed application contribute to overall effective program behavior, delivering a consistent and unified experience. When that integration fails, the software will fail as a product. A factory production line has to create a total product with features that customers will buy; the head of a pin is not of much use without the rest of the pin. Without search engines to unify all the variety on the Web, the Web wouldn’t be the asset it is today.

Integration can be an issue for mobile apps. One can tolerate learning, navigating to, and launching an app that is used often. A frequently used application can be placed in a prominent position, and frequent use means you won’t have to think much about how to use it. But, as the number of apps grows, usability drops.
The value of the hundreds of thousands of apps available is that you can assemble a set of capabilities tuned to your every interest and whim. But once assembled, are they a whole? Or to use most of them, do you find yourself looking for a particular app and then having to remember how to use it? Is your “user interface” over-burdened, a series of disconnected modules? Has modularity been overdone?

In a mobile device, or even a PC/laptop, this integration will become increasingly critical over time. The current methods, such as pages full of application icons on a mobile device or a list of Web pages delivered as the result of a search, are already becoming over-burdened. Even the ubiquitous pull-down menus on PC applications are getting hard to use as features and sub-menus proliferate. If every company develops an interactive mobile app (voice-enabled or otherwise) that is used only occasionally by a consumer, the number of mobile apps will become like the number of web sites, requiring a unifying force. Adding features to add value has its limits.

Most big breakthroughs have been means of allowing modularity to have its impact while integrating that modularity into a whole. The Graphical User Interface was such an innovation. Personal assistants on mobile devices are another integration innovation, although perhaps one still in its infancy.

Perhaps a search feature that includes apps (initiated by either voice or typing and most likely including some natural language handling capability) will be the unifying force. This feature is available on some phones through personal assistants or search for at least the apps delivered with the phone. If apps are included in a search function or a request to a personal assistant, there are two aspects that seem necessary for a successful integration of separate modules.

First, a new app/module should automatically become accessible to that integration engine.
For full integration, this might require an industry-standard way for an app to report its name, what it does, and what requests it might be able to address. Web sites are collections of text in an industry-standard format (HTML) that can be searched or tagged, allowing search engines to do what they do. It would be ideal if apps had a consistent way to describe what they do.

Second, a request may include parameters that an app uses to perform its function, such as an address for a navigation application or a restaurant name for a review or reservation. It would be frustrating and inefficient if the user had to repeat the information once the app is launched after including it in the original request (e.g., “Italian restaurant in Beverly Hills”). Thus, the app, when it is registered with the integration function, should report the parameters it uses, such as “business name” and “location” (ideally with supplementary information that can be used in natural language processing, e.g., “restaurant,” “diner,” “café,” to identify those parameters). An industry standard should include parameter reporting, at least as an option.

The commercial success of such implementations is likely and would drive acceptance of a standard way for applications to describe themselves. Since a standard takes time, the most likely first efforts will be consortiums, informal agreements, or a format driven by a successful integration platform such as personal assistant applications or search engines. The de facto standards can be driven by firms with the power to do so, with Apple, Google, and Microsoft, perhaps even Nuance, being prime suspects. Those firms could simply have a reporting mechanism for a new app that accepted a particular format. Other integrators could use that information if an app delivered it, making the de facto standard available to all. Either an informal or formal standard would help the user and the software industry.
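As a rough illustration of the two requirements above (an app reporting what it does, and reporting the parameters it accepts along with cue words), here is a minimal Python sketch. Everything in it is hypothetical: the `AppDescriptor` format, the `IntegrationEngine` class, the app names, and the simple word-overlap matching are assumptions made for illustration, not an existing standard or any vendor's mechanism.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Parameter:
    """One parameter an app accepts, plus cue words that hint it is present."""
    name: str                                   # e.g. "business name" or "location"
    cue_words: Set[str] = field(default_factory=set)


@dataclass
class AppDescriptor:
    """Hypothetical self-description an app reports when it registers."""
    name: str
    description: str
    parameters: List[Parameter]


class IntegrationEngine:
    """Toy stand-in for a personal assistant or app-aware search function."""

    def __init__(self) -> None:
        self._apps: List[AppDescriptor] = []

    def register(self, app: AppDescriptor) -> None:
        # A newly installed app becomes reachable as soon as it registers.
        self._apps.append(app)

    def route(self, request: str) -> List[Tuple[str, List[str]]]:
        """Return (app name, matched parameter names), best match first."""
        words = set(request.lower().split())
        matches = []
        for app in self._apps:
            hits = [p.name for p in app.parameters if p.cue_words & words]
            if hits:
                matches.append((app.name, hits))
        matches.sort(key=lambda m: len(m[1]), reverse=True)
        return matches


engine = IntegrationEngine()
engine.register(AppDescriptor(
    "TableFinder", "restaurant reviews and reservations",
    [Parameter("business name", {"restaurant", "diner", "café"}),
     Parameter("location", {"in", "near"})],
))
engine.register(AppDescriptor(
    "RoadGuide", "turn-by-turn navigation",
    [Parameter("location", {"to", "near"})],
))

print(engine.route("Italian restaurant in Beverly Hills"))
# [('TableFinder', ['business name', 'location'])]
```

A real integration engine would need far more than word overlap, for instance natural language parsing to extract the actual parameter values ("Beverly Hills") so they can be handed to the app at launch. The point of the sketch is only that a shared descriptor format lets the engine discover new apps without custom work for each one.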
Commentary: Follow-up on software patents
Bill Meisel, Publisher & Editor

In last month’s editorial, I expressed my concern over the impact of the current patent system on innovation, using Apple patents cited in a suit against HTC (really against the Android operating system) as an example. I suggested that patents, particularly when covering an element of a user interface, were difficult to evaluate. One short-term remedy I suggested was that judges refuse to issue injunctions based on an element of a product design that wasn’t a core aspect of the product. If a patent violation was upheld at the end of the trial, the court could then assess financial damages based on the importance of the feature, but allow the product to remain on the market. Such actions help consumers by maintaining competitive products and consistent user-interface features across products.

Richard Posner, a well-known jurist who sits on the 7th U.S. Circuit Court of Appeals in Chicago, teaches at the University of Chicago, and has written books on intellectual property and the impact of law on economics, expressed similar views. Posner presided over Apple’s lawsuit against Motorola Mobility, soon to be part of Google. He canceled a trial between the two and rejected Apple’s request for an injunction barring the sale of Motorola products claimed to be using Apple’s patented technology. In his ruling, Posner said an injunction barring the sale of Motorola phones would harm consumers. He further rejected the idea of trying to ban an entire phone based on patents that cover individual features like the smooth operation of streaming video. Apple’s patent, Posner wrote, “is not a claim to a monopoly of streaming video!”

Posner told Reuters in an interview in July that some industries, like pharmaceuticals, had a better claim to intellectual property protection because of the enormous investment it takes to create a successful drug.
Advances in software and other industries cost much less, he said, and the companies benefit tremendously from being first in the market, a benefit they would still get if there were no software patents. “It's not clear that we really need patents in most industries,” he said. Posner’s views were also reported in the Wall Street Journal in July, and hopefully will have some impact on other jurists.

Posner also noted that devices like smartphones have thousands of component features, and they can all receive legal protection individually. He commented, “You just have this proliferation of patents. It's a problem.”

In a blog posting last October, Are Patents on the Mobile User Experience in the Public Interest?, I cited a patent suit Apple filed against HTC that included a patent on the slide-to-unlock feature on mobile phones, one that could be considered an image of a slide switch on the screen that works like a slide switch. In another example of a careful ruling, a judge in the UK in July, in the Apple suit against HTC, called that patent and two other user-interface patents (one on multi-touch and the other on a multilingual keyboard) invalid. The judge said the slide-to-unlock was an obvious development, citing the presence of a similar feature on a 2004 Swedish phone.

In July, Samsung issued a software update for its flagship Galaxy S III smartphone that was characterized as a security update. The update, however, removed the feature in the Google search bar that searched local content on the phone as well as the Web (p. 20). Apple had successfully obtained a ruling that this feature infringed an Apple patent. This is an example of a patent war denying Samsung buyers a useful feature.
ICSI and Microsoft to collaborate on conversational human-machine interaction
International Computer Science Institute will explore speech and other modalities

Researchers at the International Computer Science Institute (ICSI) in Berkeley, California, will work with Microsoft to advance the state of the art in human-computer interaction relying on speech and other modalities, the organizations announced. The collaboration takes advantage of ICSI’s history in speech processing research (SSN, July 2012, p. 22) and Microsoft’s experience in deploying natural speech interfaces in its services and applications.

Roberto Pieraccini, director of ICSI, said that this work is “particularly important now, as the popularity of devices that understand and produce speech grows more quickly than ever before.” Senior ICSI and Microsoft researchers, as well as postdoctoral researchers and students at ICSI, will conduct the research. Elizabeth Shriberg and Andreas Stolcke, Principal Scientists with the Conversational Systems Laboratory at Microsoft and ICSI External Fellows, will lead the effort.

The Conversational Systems Lab (CSL) is an applied research group within Microsoft’s Online Services Division based at the Microsoft Silicon Valley campus in Sunnyvale, California. CSL is exploring novel ways to interact naturally with computer systems and services using speech, natural language text, and gesture. Its aim is to enable conversational understanding of users’ inputs and intentions across a range of devices, from mobile phones to Xbox consoles in the living room.

In one of the first projects under this collaboration, researchers will use information conveyed by speech prosody (the melody and rhythm of speech) to improve automatic speech understanding. Shriberg noted that patterns of timing and intonation in spoken language encode information far beyond that conveyed by words alone.
“This information is important for achieving natural and efficient conversational interactions with machines,” she said. “We expect to accelerate progress on human-computer dialog systems that better understand and use cues in human-human spoken communication that we often take for granted.”

US Department of Justice licenses Nexidia audio discovery
Criminal Division installs Nexidia audio technology to streamline investigations

Nexidia announced that the Criminal Division of the United States Department of Justice (DOJ) has licensed Nexidia’s Audio Discovery software, which can find content in audio files containing specific phrases (SSN, July 2012, p. 11). The DOJ will use Nexidia for reviewing audio content produced in its investigations. Other government agencies already using Nexidia solutions include the United States Securities and Exchange Commission (SEC), the Commodity Futures Trading Commission (CFTC), the Federal Energy Regulatory Commission (FERC), and the Federal Trade Commission (FTC), as well as OfCom in the UK (the independent regulator and competition authority for UK communications industries).

Jeff Schlueter, Vice President & General Manager of the Legal Market business unit for Nexidia, said, “The DOJ decision to license our software is further proof that it is the de-facto standard for reviewing audio in a timely and cost effective manner.”

iSpeech Cloud mobile speech platform claims 13,000 developers
Improvements in speech recognition and text-to-speech and a home automation solution announced

iSpeech provides internally developed, cloud-based speech recognition and text-to-speech as well as mobile apps, including DriveSafe.ly and iSpeech Translator (SSN, September 2011, p. 11). In July, the company announced that the iSpeech development platform has been used over 1.6 billion times in mobile apps made by over 13,000 developers, which the company claims makes iSpeech the largest mobile speech development platform in the world.
The technology can be used without cost in a standard version that credits iSpeech. iSpeech also released updates to its Web API, iPhone, Android, and BlackBerry Software Development Kits (SDKs) that provide faster performance and optimized speech recognition for Siri-like personal assistant applications and other popular speech recognition use cases, the company said.

Heath Ahrens, Founder and CEO of iSpeech, said the free version is used through the SDK, which uses an Application Programming Interface (API) to handle most of the interaction with the cloud-based speech technology. The speech recognition and text-to-speech software resides on servers owned by iSpeech. The SDK assures that the requirements for the free version are followed, including displaying the source of the speech technology within an app using it. The SDK also allows iSpeech to know the app, the type of operating system, and similar information, giving it some visibility into where and how the technology is being used.

If a company wishes to use the API directly, without the constraints imposed by the SDK, it can pay a fee based on usage and/or number of downloads, Ahrens said. The company also provides professional services to, for example, create a specialized statistical language model for a company with an application that doesn’t fit the current contexts available. The company can also create custom TTS voices for customers. There are also specialized contexts already developed for common use cases such as virtual assistants, translation apps, navigation, e-learning, and dictation. The company is also considering licensing the core technology as software, with no formal plans as yet.

A calorie counter app from about.com apparently uses iSpeech speech recognition with a custom language model. The app allows saying what you are eating and getting a calorie count.
About.com says it has 250,000 foods in its database, presumably fodder for a statistical language model. iSpeech speech recognition and text-to-speech are available in over 25 language and accent combinations. The iSpeech platform, launched less than 10 months ago, is used by apps in lifestyle, food, travel, retail, finance, gaming, messaging, dictation, translation, and social services. Companies listed by iSpeech in an announcement were Hearst (the media and information company), Telenav (navigation services), SpeaktoIt (a personal assistant mobile app, SSN, November 2011, p. 28), and Vocre (speech translation). Ahrens said that the company has the server capacity and technology to provide low-latency response (“as fast as anybody”) and reliable availability (“100% uptime so far”). The company is cash-flow positive, he said, in part thanks to its successful DriveSafe.ly app. The company also announced iSpeech Home, a platform for developers to use for connected devices in the home and home networks. iSpeech Home is intended to allow consumers to control their televisions, home entertainment systems, lighting, heating, ventilation, irrigation, security systems, refrigerators, washers and dryers and other household appliances through voice and natural language commands. The system combines embedded speech recognition for quick local action with network-based speech recognition for more complex queries. It also uses iSpeech text-to-speech for voice feedback. The company recently hired Qiru Zhou as Chief R&D Scientist. Zhou, an expert on speech and language processing, was with Bell Labs (now part of Alcatel-Lucent after the split-up of AT&T) as a member of its technical staff from 1992 to 2011. He contributed to and led various major Bell Labs research projects on robust, real-time speech recognition, large vocabulary speech recognition, natural language call routing, and spoken language human-machine dialogue interface and architecture.
Empirix launches automated contact center regression testing as a service
Checks how existing services are impacted when changes are introduced

Empirix provides testing, monitoring, and analytics solutions for service providers, mobile operators, and contact centers, including the simulation of calls using speech recognition to react to prompts. The company announced the availability of Empirix Regression Testing as a Service (Empirix RTaaS), a new quality assurance solution for ensuring that existing services are not negatively impacted when changes are introduced into complex contact center environments. It combines Empirix Hammer Test technology with customizable services for auditing contact center operations, assessing customer experience, and designing test plans. Empirix RTaaS measures the impact of changes on switching, routing, Interactive Voice Response (IVR), and agent desktop solutions prior to their deployment. Businesses are continuously updating their contact center systems in response to events such as product launches, cost-cutting programs, or mergers and acquisitions. As a result, a comprehensive understanding of all contact center systems can be difficult to obtain, especially for companies that have lost legacy expertise over time. Empirix RTaaS provides organizations with detailed knowledge of these systems to identify unused or underutilized resources, as well as the thousands of routes that calls travel throughout the contact center. Armed with this information, businesses can leverage the Empirix RTaaS solution to automate all their test functions, including test script creation, execution, monitoring, reporting, and documentation. Companies can then perform repeated regression testing on an ongoing basis as changes are introduced.
Tim Moynihan, vice president of marketing, Empirix, said, “As organizations continually update their contact center systems, they must not only ensure that new features function properly prior to their deployment, but also that existing capabilities function at expected levels.” In recent service engagements, Empirix said companies saved 60% to 70% when they automated processes that were previously handled manually. They were able to reduce the time needed to test new solutions and gain actionable intelligence for correcting any issues detected.

Voxeo launches "Zombie IVR" campaign
“Talking-Dead” self-service platforms continue to suck the life out of customer satisfaction

A frustration of those of us involved in speech technology for many years has been the reaction of new acquaintances when we tell them of our involvement with speech recognition; it is usually something like, “Oh, you’re responsible for those awful customer service systems that don’t let me get to an agent.” Thank you, Apple, for making speech recognition fun. The frustration goes beyond the reaction of acquaintances. Most of us believe that the speech technology isn’t the limiting factor driving dissatisfaction. Instead, it is often the attitudes of call center managers, who tend to maintain the same structure of interaction that they had with touch-tone menus, without realizing that the decision tree forced by touch-tone technology might not be natural to the caller. If anything, many call centers have retreated from the use of more advanced speech technology and good design in the name of saving money during a recession (and perhaps because of the lower cost of agents outsourced to developing countries). Part of the problem is older equipment that has minimal flexibility.
Voxeo is fighting back with a “Zombie IVR” campaign, emphasizing frustration levels with the thousands of out-of-date end-of-life systems that trap callers in “IVR hell.” To fight off these “talking dead” IVR systems, Voxeo is launching a campaign to showcase how fast, easy and cost-effective it can be to migrate away from outdated IVR systems to a flexible, standards-based solution with the ability to adapt to customers’ heightened expectations and changing preferences, including the demand for mobile and social media interactions. Voxeo’s architecture is based on open standards such as VoiceXML 2.0 and 2.1, CCXML, SIP, MRCP and SSML. Voxeo offers deployment flexibility, delivering both hosted cloud and on-premise options with the ability to easily move from one to the other or leverage a hybrid combination of the models. “IVR systems today need to keep up with changing customer preferences and growing expectations,” said Kim Martin, director of marketing at Voxeo. “It’s not just about upgrading hardware and software, but about upgrading the total customer experience with the ability to provide multi-channel interactions, personalization, location intelligence and more. Companies that find themselves locked into old technology are now realizing how important it is to build in a completely standards-based environment like Voxeo, that is unlocked at every layer and provides the ability to integrate cross-channel, actionable analytics to easily tune and refine applications to meet customer expectations.
It’s ultimately about empowering companies with the right functionality and tools, even down to the flexibility of leveraging cloud hosting, so they can better focus on their customer experience and not the underlying infrastructure of their IVR.” Zombie IVR systems, Voxeo says, have off-putting features such as greeting customers with unhelpful, one-size-fits-all menus that insist their options “have recently changed” when they actually haven't been updated in years. For example, such systems might require customers to enter their account numbers, only to be asked again when the customers give up and transfer to an agent for help. These Zombie IVRs are unable to understand or adapt to the customer's needs and merely drone on like the “talking dead.” In summary, most of these legacy systems were an attempt to reduce calls to agents at the expense of customer satisfaction. While the total cost of ownership of aging legacy Zombie IVR systems continues to rise, Voxeo says its VoiceObjects has been proven to save customers up to 80% in maintenance and lifecycle management costs. To speed up the migration process and keep costs down, Voxeo and its partners have a variety of tools that ease conversion from common platforms; the company says it has been able to automate the conversion of up to 95% of old code to Voxeo VoiceObjects.

Calabrio integrates speech analytics and workforce optimization
More efficient review of voice transactions

Calabrio, Inc. provides contact center workforce optimization and analytics software. In July, the company announced a speech analytics application integrated within a workforce optimization framework. The company’s Calabrio ONE workforce optimization software includes call recording, quality assurance, workforce management, performance-based dashboards, reporting, and now speech analytics.
Calabrio ONE is built on a Web 2.0-based architecture that allows the contact center to integrate new applications more easily, as well as personalize and optimize the desktop toolset for each user: agents, supervisors, managers, knowledge workers, and executives. Calabrio Speech Analytics turns recorded phone transactions into meaningful data. It automates search of voice transactions, so quality and compliance teams spend substantially less time on review. Tom Goodmanson, president and CEO of Calabrio, said, “Calabrio’s goal is to drive Speech Analytics into organizations in a powerful yet flexible way and bring structure to the most unstructured data, which is voice.” The latest Calabrio ONE suite also includes several enhancements to the Calabrio Workforce Management and Calabrio Quality Management applications, including more language options and serviceability enhancements:
§ A dynamic dashboard capability, which includes the ability to drill down on the detail within one analytics widget to change the scope of all related widgets within the dashboard, and ultimately drill into root cause data for further analysis and action;
§ A real-time recording monitoring application, which monitors recording states and alerts in the event of an outage;
§ User level localization for English, French, Spanish, and Portuguese.
Calabrio ONE is available immediately through Calabrio and its partner network.

StrikeForce and TradeHarbor partner to offer three-factor voice verification
Out-of-band authentication adds security for mobile transactions

StrikeForce Technologies and TradeHarbor are partnering to offer a “three-factor” voice verification solution for mobile devices. TradeHarbor provides voice verification software (SSN, December 2011, p. 15), and StrikeForce provides multi-factor out-of-band authentication, which can include biometric authentication methods.
The new multi-factor solution combines three critical factors (who you are, what you have, and what you know) over a mobile device. Each verification interaction produces a legally binding voice signature combined with out-of-band authentication, which includes an audit trail to mitigate repudiation by the person being authenticated. Malware on a PC or mobile device can hijack a web-based interaction and use the consumer’s session to do things such as create a wire transfer without the consumer realizing it. Out-of-band authentication adds authentication through a separate channel to avoid such attacks, hence the name. The telephone network is an ideal out-of-band channel for authentication. StrikeForce’s ProtectID out-of-band authentication technology offers eight different out-of-band methods, including phone, voice, instant messaging, hard tokens, and desktop/mobile tokens. ProtectID can be installed and managed on premise or with StrikeForce’s hosted service offering. TradeHarbor’s Voice Signature Service deploys voice authentication technology in a scalable Web Service. It provides the ability to obtain legally binding document signatures over the telephone and in mobile and Web transactions.

Active Endpoints allows use of enterprise software from a mobile phone
Helps iPhone and Android smartphone users to visualize, create, and modify their own wizards

Customer Relationship Management (CRM) software such as Salesforce CRM from Salesforce.com is used by professionals to keep track of prospects, appointments, and other aspects of selling a company’s products or services. It’s an example of enterprise software applications that companies use to organize and report activities. Active Endpoints, Inc. announced Cloud Extend Mobile in July, a product aimed at letting mobile workers access such enterprise applications, debuting first on Cloud Extend for Salesforce.
The company already has a product, Cloud Extend, that can work through Web browsers. The company’s software uses features of Salesforce CRM for the back-end integration. The mobile phone software supports speech-to-text and touchscreen input with end-user customization. Cloud Extend Mobile allows iPhone and Android smartphone users to visualize, create, and modify their own wizards, without IT skills or training. The dual input method allows free-form dictation of meeting notes, but also supports data entry that could be accomplished more quickly by tapping on-screen options, such as icons, pick lists, and check boxes. Users can get a quick start using a library of prebuilt tools. One is a free “Meeting Follow-Up Wizard.” Users speak or tap info on the handset, and their company’s enterprise app is automatically updated. The wizards can be set up to ask for information mapped to software databases for a particular task. Mark Taber, CEO, Active Endpoints, said, “Cloud-based enterprise apps pump the information lifeblood for companies around the world; however, it’s nearly impossible for business users to utilize those apps on the go, on a three- or four-inch screen. Cloud Extend Mobile is going to spark a paradigm shift for business smartphone users, letting them take full advantage of the incredible computing power that’s available in iPhone and Android devices…This is the future for smartphones in business.”

Polish telecom deploys Nuance VocalPassword for use by its employees
Subsidiary of Deutsche Telekom uses voice verification for automatic resetting of passwords

Nuance Communications announced that Polska Telefonia Cyfrowa, a subsidiary of Deutsche Telekom, has deployed Nuance VocalPassword for use by its more than 4,500 employees. Nuance VocalPassword enables employees to automatically reset their network and desktop access passwords simply by speaking.
The system uses Nuance voice biometrics (speaker verification) to confirm their identity and Nuance speech recognition to implement the password reset. Nuance indicated that its voice biometric solution has processed more than 20 million voiceprints. The company said that organizations are using the technology in financial services, customer care, government, and consumer devices, among others. Robert Weideman, executive vice president and general manager, enterprise, of Nuance Communications, noted, “Given that human voices are as individually unique as fingerprints and retinas, they are an ideal way for companies to authenticate employees and customers.” Maciej Zawada, platforms and systems development bureau director of Polska Telefonia Cyfrowa, said, “Nuance VocalPassword has positively impacted our employees, giving them the ability to easily and efficiently reset their passwords 24 hours a day, seven days a week. As a result, not only have we been able to eliminate the need to ask them a series of detailed questions to verify that they are indeed who they say they are; more importantly, we have been able to reduce the time it takes to verify an employee’s identification to just 20 seconds, freeing up our IT staff to handle more pressing issues. Given our positive experience with VocalPassword, we are exploring how we can now roll this service out to our customer service contact center.” The Nuance Voice Biometrics portfolio includes VocalPassword; FreeSpeech, which automatically identifies speakers passively during a live conversation with a customer service agent; DragonID, which provides authentication and identification capabilities embedded into hardware devices, such as mobile phones; and Loquendo Public Security Solutions for government agencies, such as law enforcement, military, and intelligence services.

Nuance Dragon Drive! messaging in 2012 BMW 7 and BMW 3 Series
Speak emails and text messages with cloud-based transcription

In May, Nuance Communications introduced Dragon Drive!, its cloud-based natural-language voice platform designed specifically for the connected car. Dragon Drive! Messaging (DDM), a mobile assistant that lets users speak, listen, and respond to text messages and emails, was the first service offered by the Dragon Drive! platform. The speech recognition uses the same core technology as Nuance’s Dragon Dictation app. Nuance provides a hybrid automotive platform, with speech technology local to the vehicle as well as in the cloud. In July, Nuance announced that BMW was the first manufacturer to integrate DDM. The solution gives drivers the ability to dictate emails and text messages to their contacts simply by speaking, and is fully integrated as part of the BMW ConnectedDrive Navigation system Professional in the new 2012 BMW 7 Series, BMW 3 Series Touring, and BMW 3 Series ActiveHybrid vehicles, with additional models to follow. DDM lets drivers speak, listen, edit, and respond to text messages and emails while keeping their hands on the wheel and eyes on the road. Drivers can speak simple commands to format e-mails by adding new lines, paragraphs, and punctuation. Arnd Weil, vice president and general manager, automotive, Nuance, said, “People want to connect with family, friends, and colleagues while they’re on the road, but without the dangerous distractions posed by manually engaging handheld devices.” In addition to the dictation functionality, the new BMW Navigation system Professional also features local voice command and control with Nuance technology.
Drivers can speak one-shot commands for phone calls and navigation, such as “Call John Miller on Mobile” or “Navigate to 100 Boylston Street in Boston, Massachusetts.” DDM will be available in BMW vehicles starting in July 2012 in six languages: US and UK English, French, Italian, German, and Spanish. BMW buyers can test DDM free for 60 days. Once the trial period expires, DDM will be available as a Nuance service with an annual renewal option.

AVIOS Speech Conference in Israel draws both academia and industry
Speakers from around the world addressed both academic and business issues

The Applied Voice Input Output Society (AVIOS), the speech industry’s non-profit organization, organizes a number of conferences and local chapter meetings to serve the needs of its members and the general speech community, including the Mobile Voice Conference (the fourth to be held April 15-16, 2013 in San Francisco). The 2012 Afeka-AVIOS Speech Processing Conference, held June 19-20 in Tel-Aviv, was organized by the Afeka Center for Language Processing (ACLP) and AVIOS Israel, the AVIOS local chapter in Israel. The 2012 conference had representatives from both the academic and industrial speech communities. International speakers included Prof. Lawrence Rabiner, Rutgers University; Dr. James Larson, VP, Larson Technical Services; Prof. Sadaoki Furui, Tokyo Institute of Technology; and Peter Mahoney, Chief Marketing Officer, Nuance. Dr. Nava Shaked, Chairman of AVIOS Israel, had a major role in organizing the conference. She said that it was very successful, with strong content and enthusiastic networking. Dr. K. W.
“Bill” Scholz, AVIOS President, commented, “We have been encouraging the growth of AVIOS local chapters across North America, Europe, the Middle East, and Australia to reach across many geographies in an effort to raise awareness of speech technology as a tool that helps the general public.”

Trapit uses natural language text processing to deliver content
Highly personalized content based on user-specified interest and user-specific adaptation

Trapit, founded in 2009, is backed in part by SRI International, a source of speech and natural language research that has been the basis of other companies, most notably Siri before its purchase by Apple. Trapit, which has a Web version of its content selection service, announced the launch of its first iPad app in July. The free app is described in a press release as “built from the same AI technology that powers Siri.” Unlike Siri, which uses speech input, Trapit is text-based. The app delivers articles, videos, features, and blogs on user-defined topics. Trapit isn’t limited to pre-set categories or broad topics; Trapit explores the entire Web and delivers “high-quality” content on specific interests and hobbies defined by the individual user. Gary Griffiths, CEO and co-founder at Trapit, said, “Our iPad app represents an entirely new approach to content discovery and consumption on the iPad. Most people are tired of seeing the same articles they already saw on social networks; they’re looking for fresh, high-fidelity content from new sources, on topics they actually care about.” Trapit’s iPad app utilizes the same underlying platform as the Web app, using natural language processing, semantic analysis, and user feedback to help select articles based on user-specified interests.
Users create focuses on any topic, from general topics like “US politics” to more niche topics like “cooking with avocados.” Trapit learns more about the topic and individual preferences through both implicit engagement and explicit feedback. Trapit says it focuses on giving each user an experience that is 100% unique to them, so that two users with “traps” on the same topic, like “80’s Rock Bands,” won’t be delivered exactly the same content. Trapit uses over 100,000 “carefully vetted” sources, the company said. The app also features a “save to reading list” function and one-click sharing to Twitter, Facebook, or email.

Interview with Nik Stanbridge, VoiceVault
Biometric identity verification with text-dependent voiceprints

Nik Stanbridge, VP of Product Marketing, VoiceVault, was interviewed by Bill Meisel in late July. Nik is responsible for all aspects of Product Management, Marketing and social media integration at VoiceVault. He is an experienced Product Manager with over 20 years of experience in global B2B and B2C market sectors. Prior to his current position, Nik held a variety of Product Management roles in technology companies, covering PDF aggregation software for regulatory submissions in the pharmaceutical industry; software for PDF document creation, manipulation and conversion; and industrial inkjet print heads and ink systems.

Please describe how your basic voice verification technology works. Was it developed at VoiceVault?

VoiceVault technology is 100% in-house developed and proprietary to VoiceVault. It is used to verify someone’s claimed identity. That is, we can verify that someone is who they claim to be; it's about identity verification rather than identification. Our technology has undergone extensive user experience testing that has enabled us to define the optimal number of words to use in the enrollment and verification processes.
We have also applied many years of research effort to making the voice biometric accuracy for that user experience appropriate for high security applications. To enroll a user, we prompt for speech that contains specific words that are then used to create that user's voiceprint, which is stored against their claimed identity (an account number, for example). To verify that a person is who they claim to be, they have to speak some or all of the words that they used when they registered their voice. This speech is then compared to the voiceprint associated with their claimed identity to assess the probability that both came from the same person. Our technology is able to work with very small amounts of speech: enrollment typically requires 10 seconds of speech and verification less than 5 seconds. When used with our adaptation technology, which enables a voiceprint to be updated with verification speech from one or more unsuccessful verification attempts, we are able to deliver very high levels of accuracy [see below] in a wide range of speaker environments and channels. This provides a flexible deployment approach that in turn delivers an excellent user experience with high levels of security.

What market segments do you see as early adopters of voice verification, and what is their motivation?

Our strategic focus is to be the supplier of choice for Financial Services and Healthcare enterprises for voice biometric solutions. Over the next 2-3 years we will build on the client base we already have to be the foremost supplier of smart device and telephony-based voice biometric solutions as measured by the number of Fortune 500 companies we will have as clients. While we continue to have success in deploying voice e-signature solutions to the Healthcare market, we are also seeing a significant uptake of identity verification solutions in the Financial Services market.
Current indications are that the Financial Services vertical is poised for rapid growth of text-dependent solutions on smart device apps, where text-dependent voice biometrics is ideally suited. Password Reset and One-time Password continue to be the ideal applications for organizations taking their first steps in learning and understanding voice biometrics. Authentication and transaction authorization will be the biggest growth area. Traditional call center applications will continue to be important, including their use of voice e-signatures and caller authentication. We expect smart device solutions to be a significant growth area and we believe that our short-utterance text-dependent solutions are very well placed to benefit from this rapidly growing market.

VoiceVault has had some successes internationally. Please outline these and the reason for international growth.

Our current voiceprint distribution is 80% US and 20% non-US. By vertical this represents 50% Healthcare, 40% Financial Services, and 10% other. International growth is important to us and we recognize that while we have a strategic US focus, many large institutions have technology and innovation centers outside the US. Initiatives in our target markets can come from anywhere, and we look carefully at each one. As the number of large-scale voice biometric solution deployments is still growing, it’s important for us to promote and encourage adoption so that we can be seen as leading the way in voice biometrics, and this involves thinking and operating internationally.

What do you see as distinguishing characteristics of voice verification technology from different vendors?
Our key differentiators are:
§ Our technology can be optimized to deliver a false accept rate of 0.01% with a false reject rate of less than 5% for high security applications;
§ It can be optimized to deliver a false reject rate of 0.05% with a false accept rate of less than 1% for cost-reduction applications;
§ Extensive and on-going user experience testing has resulted in a highly engaging but non-intrusive user experience design for authentication and authorization solutions;
§ Accuracy levels can be achieved with 10 seconds of enrollment audio and less than 5 seconds of verification audio;
§ It is a software-based solution designed for rapid and simple deployment. It can either be vendor-hosted or on-premise, with no specialized server requirements; commodity hardware and virtualization are all supported;
§ Web services APIs are extremely easy and quick to integrate with; partners and clients have developed proof-of-concept applications in a matter of hours;
§ The same deployment can be used for all channels and all applications; all you need is the ability to record speech and a network connection to submit it;
§ VoiceVault is exclusively Voice Biometrics. We are 100% focused on being the best provider of voice biometric authentication and authorization solutions.
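The two operating points quoted above (a very low false accept rate for high security, a very low false reject rate for cost reduction) reflect a general property of biometric verification: moving the decision threshold trades one error type against the other. The sketch below illustrates that trade-off only. It is not VoiceVault's method; the scores are made-up numbers, and the function name `error_rates` and the convention that a higher score means a better match are assumptions for the example.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float],
                impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at one threshold.

    Assumption: an attempt is accepted when its score is >= threshold.
    """
    # Impostor attempts that score high enough to be wrongly accepted.
    false_accepts = sum(1 for s in impostor if s >= threshold)
    # Genuine attempts that score too low and are wrongly rejected.
    false_rejects = sum(1 for s in genuine if s < threshold)
    return false_accepts / len(impostor), false_rejects / len(genuine)


# Hypothetical match scores in [0, 1]; a real evaluation would use
# many thousands of trials rather than eight of each kind.
genuine_scores = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88, 0.72, 0.59]
impostor_scores = [0.12, 0.35, 0.48, 0.61, 0.22, 0.30, 0.55, 0.70]

for threshold in (0.5, 0.65, 0.8):
    far, frr = error_rates(genuine_scores, impostor_scores, threshold)
    print(f"threshold={threshold:.2f}  FAR={far:.3f}  FRR={frr:.3f}")
```

With these toy scores, raising the threshold drives the false accept rate toward zero while the false reject rate climbs, which is the behavior a high security deployment accepts and a cost-reduction deployment avoids.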
Looking at voice biometric vendors in general, the key characteristics to look at and consider are:
§ The type and amount of speech required for the enrollment of a person's voice and for subsequent verification attempts;
§ The level of accuracy that can be achieved using this amount of speech (the false accept rate / false reject rate);
§ The suitability of the enrollment / verification processes and user experience for a given use case;
§ The suitability of the obtainable accuracy level to the business case;
§ The scenarios and use cases that this amount of speech / accuracy can be used in (text independent conversational speech, for example, isn't suitable for a smart device app);
§ How easy is it to develop an application and how straightforward is the API integration?

Any final comments?

There is no “one size fits all” in voice biometric deployments. Every voice biometric solution is designed to meet a specific use case, so understanding the use case / business case in which the solution will be integrated and deployed is key to the successful use of the technology. Taking time to understand what the technology is going to be used for and what the deployment success criteria are is essential.

Interview with Chih-Chung Kuo, Industrial Technology Research Institute
ITRI’s speech research includes speech recognition, speaker recognition, and speech synthesis

Chih-Chung Kuo, Technical Director, Division for Computational Intelligence, Information and Communications Research Laboratories (ICL), Industrial Technology Research Institute (ITRI) was interviewed by Bill Meisel in mid-July. ITRI is a nonprofit R&D organization in Taiwan engaging in applied research and technical services. The organization has offices in Taiwan, San Jose (California), Tokyo, Germany, and Russia. Dr. Kuo is a senior researcher at ICL in ITRI.
As the Technical Director of the Division for Computational Intelligence & HCI Technology, he leads a team to develop state-of-the-art technologies and to play the leading role in providing solutions for Taiwan industry in the field of speech and intelligent user interfaces. Dr. Kuo holds a Ph.D. in EE from National Tsing Hua University, Taiwan.

Please briefly state the overall goal of ITRI.

Founded in 1973, Industrial Technology Research Institute (ITRI) is Taiwan’s largest and one of the world’s leading high-tech R&D institutions. Well-positioned to be a pioneer of industry with brand new ideas and innovation, ITRI aims to promote the advancement of Taiwan’s diverse industries by:
§ Expediting the development of new industrial technologies;
§ Aiding in the process of upgrading industrial technologies; and
§ Shaping the future of industrial technologies for greater efficiency and sustainability.
Being a multidisciplinary research center, ITRI focuses on six technical fields: Information and Communications; Green Energy and Environment; Medical Devices and Biomedical; Electronics and Optoelectronics; Material, Chemical and Nanotechnology; and Mechanical and related systems. ITRI has aggressively researched and developed countless next-generation technologies including green energy, mobile digital life, cloud computing, flexible displays, 3-D ICs, RFID, light electric vehicles and tele-care technologies. For five consecutive years ITRI has received prestigious international awards for outstanding technology innovation, such as the Wall Street Journal Technology Innovation Award, R&D 100 Awards, the iF Design Award, and the Red Dot Design Award, to name a few.
ITRI makes a concerted effort to collaborate with international partners to enhance and facilitate technology innovation and commercialization, aiming to transform Taiwan’s research capability from a “follower” to a “frontrunner,” so as to provide leading edge opportunities for domestic industries. For more details please refer to www.itri.org.tw/eng.

What is the main focus of ITRI’s activities in information and communication research in particular?
ITRI’s activities in information and communications research are conducted by Information and Communication Research Laboratories (ICL), one of ITRI’s six core laboratories. Dedicated to the vision of enabling a Green, Intelligent, and Healthy Society, ICL is executing its industry-enabling strategies by developing software-centric, service-oriented circuits, information, and communications technologies, in addition to stressing system integration, in the following focused areas: smart endpoints, mobile enabled cloud services, intelligent vehicles and transportation systems, green energy and health care. (See figure.)

Figure: Main Focus of ITRI’s Information & Communication Research

Please describe what ITRI is doing in speech and natural language research.
ITRI’s speech research includes speech recognition, speaker recognition, and speech synthesis. Speech recognition technologies range from small-footprint voice command recognition optimized for IC and embedded systems to large vocabulary continuous speech recognition run on servers. Nuvoton N572F064 and Grain Media ET11A5 are two speech ICs with speech recognition technology transferred from ITRI. ITRI’s natural language research involves two aspects: one covers spoken dialog systems, including natural language understanding and generation as well as dialog management; the other concerns extracting information from unstructured text content retrieved mostly from the web.
ITRI’s speech and natural language research focuses on Mandarin Chinese for accents of both Taiwan and China. Since English is frequently used in modern Taiwan society, mixed-Chinese-English processing is also an emphasis of our research. For example, an ITRI polyglot TTS system with a unified model of Mandarin and English can produce fluent synthetic speech for mixed-Chinese-English sentences, which should be the leader in this kind of TTS system. An image-based avatar technology integrated with our TTS engine can produce a text-driven “talking head.” The synthesis of both image and voice of a true person makes the avatar look just like real video captured from the person. We believe that this is quite a unique technology and system. Please visit our demo site at www.ecsr.itri.org.tw/ttsdemo/vttsdemo.php, where you may enter any Chinese text and watch the synthetic video. For more details and a demonstration, please see (in Chinese only) http://atc.ccl.itri.org.tw/speech or visit a Chinese language learning web site (in English) for foreign learners, which has integrated almost all of our technologies, at www.cola.itri.org.tw/index.php.

Are ITRI technologies available for external license?
The Taiwan government funds the R&D activities of ITRI, which in turn transfers R&D results to local enterprises. In addition to technology transfer, ITRI offers a range of technical services to assist industries to enhance their competitiveness, including products and process development, pilot production, test/certification, and IP licensing. Take the year 2010 as an example: 423 technologies were transferred to 491 companies, 690 investment deals were reached and a number of start-ups were established. Technology transfer is conducted based on the principle of fairness, openness, and efficiency, with the priority given to domestic enterprises.
Companies in jurisdictions outside Taiwan need special approval from the Ministry of Economic Affairs. See www.itri.org.tw/eng/econtent/business/business03.

AT&T speech technology (cont.)
Continued from page 1

Figure: AT&T Watson takes input, analyzes it, performs one or more services, and returns a result in real time. Input can be audio files, speech, gestures, face recognition, and text. (Source: AT&T)

There is a registration charge of $99 for developers, Gilbert indicated, which will allow developers to use all AT&T APIs, including speech, as they become available, without a per-transaction charge through 2012. Gilbert said that AT&T is working on pricing beyond 2012, and current projections have pricing at about one cent for most “small transcriptions.” He said AT&T will review pricing as 2013 approaches, but he does not anticipate pricing “going anywhere but down.” (More detailed pricing information is available online. AT&T also sent out an eblast with a discount code that allows getting the API with the $99 fee waived through August.)

AT&T Watson is a network-based engine that integrates a variety of speech capabilities, including speaker-independent speech recognition, AT&T Natural Voices text-to-speech, speaker verification, natural language understanding, LLAMA-based machine learning, search, translation, and dialog management. AT&T says that the Watson speech engine continuously improves accuracy by learning different accents and speech patterns. Watson can combine speech with other modalities, such as a touch-screen tap (“show me the closest coffee shop to here”) or other gesture (see figure). AT&T said in advertising material that it has accumulated more than 600 patents on the AT&T Watson technology. Watson uses a plugin architecture where each subtask is contained in its own plugin.
Depending on the task to be performed, Watson selects the right plugins at run time, assembles them into a working engine, and coordinates the information exchange between the plugins. It also handles communication with the end device.

However, only speech recognition (speech-to-text transcription with Statistical Language Models that are tuned for specific “contexts”) is available initially with the current API. The API allows sending audio and receiving back text. AT&T indicated that native and HTML5-based Software Development Kits (SDKs) would be available “soon.”

The contexts make the speech recognition more accurate and also support specialized vocabularies, including:
§ Generic speech-to-text (general dictation, automatically detects English or Spanish, and returns the appropriate text transcription);
§ Web Search speech-to-text;
§ Local Business Search speech-to-text;
§ Voicemail-to-text;
§ SMS (text message) speech-to-text;
§ Question Transcription (converts questions to text); and
§ TV Speech to Text (AT&T’s U-verse video programming guide).
The contexts are language models built, maintained, and tuned by AT&T.

AT&T is also offering the AT&T Application Resource Optimizer (ARO) as open source code. ARO is a free diagnostic tool that helps to optimize a mobile app’s performance, speed, network impact, and battery utilization.

Nuance Dragon version (cont.)
Continued from page 1

A new interactive tutorial is available to walk people through exercises that demonstrate best practices for dictating, editing and formatting to get up and running quickly.

Dragon’s adaptive features that personalize the speech recognition, vocabulary, and other aspects specific to a user have been further enhanced. Dragon 12 adds Smart Format Rules, a new technology that adapts to the way the user prefers to format their words. For example, if you work at Nuance (or write a speech newsletter), you probably want “dragon” to appear capitalized when you dictate it. Dragon automatically detects word, phrase, and format corrections, including abbreviations, numbers and more, so dictated letters, emails and documents reflect a person’s own writing style. (The software asks if you want the replacement to always be made.)

Dragon also offers more alternate word choices, and more likely ones, in its correction list. For example, if one dictates “Eric,” the correction list includes seven alternative spellings. Dragon 12 reminds users to use the feature that scans documents and emails they choose to find vocabulary and usage data for the program’s language model.

Dragon 12’s use within other programs has been improved. If Gmail and Hotmail are used through Internet Explorer 9, Mozilla Firefox 12 or higher, or Google Chrome 16 or higher, Dragon 12 offers full text control and adds commands for the most frequent actions.

Dragon 12 adds support for the Dragon Remote Microphone App in Android phones, previously available only for iOS. The feature lets one use a mobile phone as a wireless microphone over a Wi-Fi network using the free Dragon Remote Microphone App. Dragon 12 also supports wideband 16 kHz Bluetooth wireless headset microphones, providing increased accuracy through a higher-quality audio signal.

Some people find catching errors or poorly stated points in a document easier if they hear it read. Dragon 12’s text-to-speech now reads text with more control: fast-forward, rewind, speed, and volume. Hill indicated that the TTS itself is more natural-sounding in this version.

An improved help menu provides access to many resources, including the Accuracy Center, the Performance Assistant, Dragon’s Help, the Tip of the Day, the Sidebar, Tutorial and Interactive Tutorial, a link to printable documentation, and links to Web resources.
A user can get help at any time by saying “Give me help.”

When dictating into Dragon’s native dictation box and some other applications, all of Dragon’s features for text control are available. In the new release, Dragon automatically displays the dictation box when you dictate into a text field for which it does not have full text control. After you finish dictating, you can transfer the text from the dictation box to the desired application quickly by voice. (This option can be turned on or off based on your preference.)

Dragon 12 lets you specify preferences for commands within Dragon. By giving you the option to disable certain commands, Dragon can boost performance, as well as avoid an unintended command. In order to avoid unintended actions, Dragon now, by default, requires you to say “Click” before the name of a menu, button, check box, other interface control, or hyperlink. You can now turn this requirement on or off for menus separately from other controls.

The new command “Open Top Website for <keywords>” directly opens the top-ranked web page for the keywords you include when you dictate the command. You can say this command at any time, whether or not a Web browser is currently open. In particular, this is a convenient way to quickly open the website of a company or institution.

Professional and Legal Editions have added features for administrators, including a recognition log file for each end user; the usage information can be used to give users targeted advice and to measure return on investment. Dragon's Auto-Transcribe Folder Agent (ATFA) manages the flow of transcribed text and synchronized audio of digital voice recordings to streamline third-party review and correction.
Peter Mahoney, Chief Marketing Officer for Nuance and Senior Vice President, General Manager, Dragon, said that, with the new improvements, “The technology simply disappears and your ideas flow onto the screen in front of you.”

Dragon NaturallySpeaking 12 is available for preorder immediately starting at $99.99, with availability as a download on August 3.

Google voice search (cont.)
Continued from page 1

To access Google Now, one can swipe up from any screen. Once in Google Now, one can say “Google” to initiate voice search. The multiple-pane results can be displayed in response to a search, providing additional information that is closer to the answer to a search than the list of web sites provides.

Google says it has improved voice search so that it can display answers to spoken questions from sources including Wikipedia, the CIA World Factbook, and Freebase, a community-run knowledge database. For most text or voice search queries, the context is detected from the phrase, e.g., the name of a sports team (to get news on that team) or an airline flight (to get flight status). There are some less obvious alternatives, e.g., “area code 215” will give the location of that area code; “Translate to Spanish, Where is the Palace Hotel?” will provide the translated phrase; and “pictures of…” or “images of…” will launch an image search.

Google has also updated its search results to include a new scientific calculator. Formerly, the results would just show the calculated result if you were to type, for example, 5+5, but now the result appears in a full on-screen calculator with 34 buttons, including logarithmic functions.

Google is providing search capabilities that make search similar to a voice assistant, like Apple’s Siri. But in contrast to Siri, Google seems to be avoiding treating the search capability as a single personal assistant. Some insight into the philosophy behind this was provided in an interview with Hugo Barra, Android’s director of product management, published in July in Wired.

Barra told Wired: “It’s very deliberately not making jokes with you. Google is a neutral party; it’s not your friend, secretary or sister…It is an information retrieval entity…And it’s very important that this entity be impartial, and adding jokes and other mannerisms to the voice would take away from that.” Barra said that the name of the function is simply Google Voice Search.

Barra also emphasized that Google’s text-to-speech voice is something special. He said that the solution can speak in the same voice whether using a TTS engine on the device or in the network. The network-based solution, he said, uses a lot of speech data to give a natural feel. He said that, in contrast to TTS voices created for telephone applications, the voice was created for voice search; he said the voice was the “first conversational voice” in speech synthesis.

Barra said that some of the things that Google did in Jelly Bean are representative of where the company thinks the industry should go in the mobile space. One was the home screen experience, where “stuff appears and actions can be invoked, without having to dive into an application.” The second thing he mentioned is more efficient task switching. He gave the example of calling someone back. That function should not be three clicks away; it should be one click away. Google will be trying to provide easier access to all the specialized applications that are evolving.

Barra also addressed the objective of the Nexus pad computer. He said it is focused on delivering digital content: movies, books, magazines, and gaming. He emphasized the suitability for high-performance gaming as a distinguishing factor, with the device containing a gyroscope and a powerful Graphics Processing Unit, and Google Play as an integrated resource for content and games.
News briefs

VoiceVault releases new generation of its speaker verification
VoiceVault announced the release of the next generation of its voice biometric speaker verification engine on July 30 (see interview, p. 13). The new voice biometric engine delivers a false accept rate of 0.01% at false reject rates of less than 5%, the company indicated. VoiceVault says it has a verifiable equal error rate (EER) of only 0.1%, compared to a typical EER of around 2% in other voice biometric deployments, based on a “real-world financial services application,” used for authorizing high-value financial transactions on a smartphone, where voice biometrics is part of a 4-factor security cocktail.

Samsung reduces search capability in its Galaxy S III smartphone, apparently in response to Apple patent suit
When users update the software in Samsung’s Galaxy S III smartphone, in what appears to be a maintenance release, they will find they can no longer use the search function for device-local information such as contacts, apps, and other on-device material using software developed by Google as part of Android. This is apparently a response to Apple’s patent lawsuit against the device, and is an example of how patents on user interface features are not in the interest of consumers.

Nuance hints that its personal assistant aimed at corporations will be called “Nina”
In a brief announcement, Nuance indicated that it would release a corporate voice assistant branded Nina this summer, aimed at supporting customer service on mobile phones.

Voxeo Labs announces strategic partnership with Deutsche Telekom
Voxeo Labs, part of Voxeo, announced a strategic partnership with Deutsche Telekom AG in Europe. The new partnership introduces the Tropo API as an addition to Deutsche Telekom’s Developer Garden.
The Tropo API by Voxeo Labs enables developers to make and receive phone calls and text messages from any web or mobile application, using a web-based API and pay-as-you-go pricing. Tropo offers many advanced features including speech recognition, text-to-speech, conference calling, and call recording, all using web technologies and programming languages developers already know.

Pronexus adds web sites to support IVR developers
Pronexus announced that it has launched two new websites: VBVoice.com, for the free toolkit that Pronexus offers developers to build their own Interactive Voice Response (IVR) solutions, and Pronexus.com. Gary T. Hannah, President and CEO, Pronexus, said, “The VBVoice toolkit is widely used and continues to win in verticals like healthcare, government, financial, and consumer.”

Spoken Communications partners with Varolii to combine customer interaction applications with Spoken’s inbound capabilities
Spoken Communications, provider of a cloud platform for contact centers (see interview, SSN, May 2012, p. 17), announced a new partnership with Varolii. Varolii’s cloud-based communication services help organizations effectively interact with large numbers of customers and employees through voice, text messages, smartphone applications, and email. The new partnership supports Varolii’s customer interaction applications with Spoken’s inbound capabilities. Spoken’s enterprise cloud features the full suite of inbound contact center functionalities, including call switching, recording, monitoring and analytics as well as the company’s speech recognition IVR.

Chinese search giant Baidu opens tech lab in Singapore, with a partial focus on speech applications
Baidu, which already has offices in China and Silicon Valley, opened its first R&D lab in Singapore, the Baidu-I²R Research Centre (BIRC). The lab is a joint venture with Singapore’s Agency for Science, Technology and Research (A*STAR).
The objective is to create new technologies for the Southeast Asian region. The research group has reportedly already developed speaker authentication and other speech technology for the Vietnamese and Thai languages, apparently through a technology agreement with A*STAR (not yet a commercial product, but intended for mobile devices). The lab projects include natural language processing, information retrieval and information extraction, and speech processing systems. These should eventually find their way into Baidu’s Box Computing and Baidu Cloud mobile platforms.

SRI reveals voice assistant for Spanish banking group BBVA
According to a report from TechCrunch, SRI International is working on a new project for Spanish banking group BBVA. The browser-based system “Lola” is designed to help users with online banking, imitating the style and manner of a human bank teller. The system operates by chat or speech, at least as demonstrated to TechCrunch. This is the sort of company-specific personal agent that I’ve predicted every company of any size will need someday.

Northwest Multiple Listing Service selects Interactive Intelligence’s IP communications software suite
Northwest Multiple Listing Service (Northwest MLS), a consortium of real estate brokers, has selected Interactive Intelligence Group’s all-in-one IP communications software suite, Customer Interaction Center (CIC). The real estate listing service is replacing a Nortel telephony system with CIC, which will support all employees at its Kirkland, Washington, headquarters, and at the company’s 16 satellite locations. Interactive Intelligence reseller, KRP Communications, will provide CIC deployment services and ongoing maintenance for Northwest MLS. Northwest MLS president and CEO, Tom Hurdelbrink, said, “When call volume is heavy in one location, we’ll use CIC to automatically route calls to another office.
Similarly, when a particular area is affected by weather-related interruptions, calls can be routed to a different location or to employees working from home. This will enable us to respond more quickly and consistently to our members.”

CallMiner analytics solution adds new personalization features
CallMiner, which provides customer analytics solutions for contact centers (SSN, March 2012, p. 9), has announced the availability of Version 9.0 of its flagship Eureka! solution. Version 9.0 carries with it a new set of features called “myEureka,” which enable personalized portal functionality to be delivered directly to users. Scott Kendrick, VP of Product at CallMiner, said, “Until now, contact center analytics has been the preserve of dedicated analysts. myEureka…pushes actionable business intelligence insights directly into the workplace, to the people who need and can act on it in real-time. MyEureka delivers relevant data to stakeholders at every level: the VP who manages contact centers and/or BPOs, the Supervisor who manages a team of agents, and to agents themselves, to provide direct feedback on performance and where improvement is needed.”

Hungarian telecom operator introduces voice identification in customer service operations using Nuance voice biometric solution
Magyar Telekom provides fixed line and mobile communications services for residential (T-Home and T-Mobile brands) and SME customers (Telekom brand) in Hungary. Magyar Telekom is the first in Hungary to introduce voice-based identification to facilitate safer and more convenient customer service solutions. Powered by Nuance voice biometrics, the system currently identifies 20 million customers.

Sandata Technologies Electronic Visit Verification technology to improve visibility and oversight of home care delivery for Louisiana Department of Health and Hospitals
Sandata Technologies, a national provider of information technology solutions for the home care industry (SSN, May 2012, p.
14), announced that the Louisiana Department of Health and Hospitals has licensed its Electronic Visit Verification solution, Santrax Payor Management (SPM). Sandata’s partner, CNSI, Inc., was awarded the Medicaid Management Information System Replacement and Fiscal Intermediary Services contract for Louisiana’s Medicaid program. Through a subcontract with CNSI, Sandata will provide the SPM solution to meet the visit verification requirements. Santrax Electronic Visit Verification includes voice biometrics to perform speaker verification.

Easy Voice Biometrics allows finding closest match to individuals when comparing voice files
Easy Voice Biometrics is a partnership between several companies to provide forensic voice products, with advanced support. The organization offers a product of the same name that allows technicians to find the closest match when comparing voice files. The product is intended for professional audio forensics specialists to allow for quick comparison and identification of voice files. Mathematical voice ID methods are used along with other methods, including voiceprint, pitch, and formant analysis as well as linguistic and auditory analysis. The Easy Voice Biometrics product is designed for law enforcement agencies, state and private forensic audio investigators, detectives, and lawyers to perform the following tasks:
§ Facilitate voice expert identification analysis in multi-target forensic audio investigations by eliminating imposters and ranking the top-of-the-list speakers by the likelihood of their biometric traits.
§ Express attribution of the investigated speakers' voices by degree of proximity.

Google adds voice search to Google+ Local on iOS
Google modified its Google Places service in May, adding data from its purchase of Zagat and renaming it Google+ Local.
The company has now similarly revamped its iOS app (which is also now known as Google+ Local). Voice search is now included. Zagat scores are now included alongside Google user reviews, and one can rate businesses and locations in the app, making the feature somewhat of a competitor with Yelp.

VoiZapp app uses Android speech recognition to post to a Facebook news feed
VoiZapp Inc. launched the Android app “Friends Aloud.” The application allows Facebook participants to access, listen to, and post status updates and comments by voice to their Facebook news feed. It uses Android’s built-in cloud speech recognition from Google. It can also use text-to-speech capability to read aloud Facebook news feed posts and their associated comments.

Mossberg review of Android Jelly Bean criticizes the voice assistant
Walter Mossberg, in the July 11 issue of the Wall Street Journal, gave a generally good review of the new Google Nexus pad computer. He also discussed the latest Android release, “Jelly Bean,” commenting briefly that the voice assistant function didn’t seem to measure up to Apple’s Siri (without providing much discussion of how he arrived at that conclusion).

Wolfram|Alpha provides answers to Samsung’s S Voice as well as Apple’s Siri
Wolfram|Alpha, which attempts to answer a broad range of questions over a long list of subjects and contributes answers to Apple’s Siri, announced that it is also providing data to Samsung’s S Voice. The Samsung Galaxy S III, as well as the Galaxy Note, will now include the Wolfram|Alpha knowledge base with S Voice and the productivity app S Note. Users will be able to get answers to factual questions. Users can ask questions such as “How high is Mount Everest?,” “Who is Barack Obama?,” or “What is the weather like today?,” and Wolfram|Alpha will give the correct answer.

BMW to incorporate Nuance voice command in its dashboards
BMW will be the first automaker to incorporate Nuance Communications’ Dragon Drive voice messaging technology in its BMW 7 Series flagship luxury sedans as well as the BMW 3 Series Touring and ActiveHybrid. Nuance is starting small. The first Dragon Drive application will be an SMS service, allowing drivers to send a text message to a number or contact in their address books as well as dictate the message itself. That service will start appearing in vehicles on dealer lots this summer. But soon, Nuance is expected to start layering on more functions. BMW is implementing Dragon Drive’s initial service, messaging, which allows drivers to listen to speech-transcribed e-mail and SMS, as well as dictate, edit, format, and send messages via voice command.

Horizon Private Cloud provides outsourced services to Voice Automated, a Nuance Dragon reseller
Horizon Private Cloud (HPC) announced a cloud desktop services agreement with Voice Automated, a distributor of speech recognition applications for the healthcare, medical, and legal industries and a reseller of Nuance Dragon products. Under terms of the services agreement, Horizon Private Cloud is providing Voice Automated with cloud desktop hosting, application virtualization, and data protection and storage from HPC’s data center in Irvine, CA. Future plans call for additional healthcare and legal software hosting to serve customers throughout North America. Robert Christiansen, General Manager of HPC, said, “We provide a unified cloud solution, meaning all your apps and data are available anywhere, on any device…Companies don’t want to buy infrastructure, licensing, and employ IT staff. They simply want the service (their apps and data) delivered to them seamlessly while only paying for what they use.
We see a perfect fit with Voice Automated and the services they provide for their healthcare and legal customers.”

Motorola’s new ATRIX HD phone for AT&T automatically goes into car mode when docked
The new Motorola ATRIX HD Android-based smartphone is available from AT&T for $99.99 with a two-year agreement. The phone comes pre-loaded with SMARTACTIONS, a free app from Motorola that suggests ways to automatically change the phone’s settings throughout the day. For instance, when you place the Motorola ATRIX HD in the Vehicle Navigation Dock accessory and enable Drive Smart, it will set your phone to vehicle mode, read your text messages aloud, and auto-reply to incoming calls and texts, as well as provide turn-by-turn navigation.

Dictation in new Mac OS is done in the network; some personal data is used
The AppleInsider web site reviewed the speech recognition (dictation) features in the recently announced Apple Macintosh operating system, OS X 10.8 Mountain Lion (SSN, July 2012, p. 1). The speech-to-text works everywhere that one can type; one simply clicks on a microphone icon to dictate. The audio is not converted to text on the Mac; the audio is sent to Apple’s servers, thus requiring an Internet connection. AppleInsider emphasized that Apple is careful about making sure users understand the privacy issues. Dictation is turned off by default, for example, and users are warned that the audio leaves their computer. Apple says that it uses the data to improve the speech recognition accuracy. The Mac also sends other data; Apple’s warning includes: “Your computer will also send Apple other information, such as your first name and nickname; and the names, nicknames, and relationship with you (for example, “my dad”) of your address book contacts. All of this data is used to help the dictation feature understand you better and recognize what you say.
Your User Data is not linked to other data that Apple may have from your use of other Apple services.”

Mercedes-Benz adds connection to Apple Siri to its COMAND navigation system
Apple is working with some car manufacturers to integrate the Siri speech recognition system used in its iPhone to enhance the in-car infotainment options available. Working with voice control buttons that already exist in many Bluetooth-enabled cars, one will soon be able to access a range of Siri functions like selecting and playing music, hearing and composing text messages, using maps and getting directions, and getting calendar information and reminders. Input will be via the car’s built-in microphone and output via the vehicle speakers. Mercedes-Benz will apparently be first to market, having launched their new A-Class with a specific module on the COMAND navigation system.

It will be legal to text while driving in California if you use speech recognition in the new year
California lawmakers have made it illegal for you to type text while driving, but if you have speech recognition on your phone, it will be OK to speak your message through Apple’s Siri or a similar voice assistant. Gov. Jerry Brown just signed a bill that clarifies the state’s texting laws. Sending and receiving text messages through hands-free speech recognition and speech synthesis is legal.

SoundGecko web application and mobile phone app is a text-to-speech service
SoundGecko is a web application that’s essentially a text-to-speech service. Drop a URL into SoundGecko and it converts the article at the URL into speech. The web app also integrates with cloud services and an iPhone app. The simplest way to use SoundGecko is to have it send an email with the file, but it can be integrated with Dropbox or Google Drive for immediate syncing. The iPhone app will also directly sync up articles you've converted, and can use a Google Chrome web browser extension to add articles on the fly.
Veveo provides predictive search on Android phones, including personal info on phone
Veveo announced vtap QuickSearch, a predictive “search as you type” universal search application that is context-aware and personalized for Android smartphone users. vtap QuickSearch works on all Android devices (running version 2.1 or later) including the Samsung Galaxy III and Galaxy Nexus, which recently lost its local-device search functionality in the latest software upgrade, possibly as a response to Apple’s patent suit. vtap QuickSearch searches across device content including Contacts, Calendar, Music, Text Messages, Device Settings, and others, and seamlessly merges the results with online results from various Android app stores, Wikipedia, Wiktionary, movies, local business listings and places of interest. vtap QuickSearch then prioritizes the results based on learned user preferences. Using Veveo’s predictive search, the results are displayed as a user types, thereby providing instant search results that update with each additional keystroke. The application is available for download on the Google Play Android store. Veveo’s QuickSearch is an OEM application for smartphone manufacturers to embed universal search and personalization capabilities directly on their devices. With multi-lingual capabilities for more than 50 languages in QuickSearch, the localized versions of vtap QuickSearch will soon be available for other international stores.

Nuance Dragon Dictation and Dragon Search apps now available in Vietnam
Nuance announced that its Dragon Dictation and Dragon Search applications for the iPhone, iPod touch, and iPad are available free in the Vietnam App Store. Supporting the Vietnamese language across regional variations in pronunciation, the launch offers Vietnamese consumers a fast and convenient way to dictate SMS text messages, emails, social media updates, mobile Web searches and more.
Michael Thompson, executive vice president and general manager, Nuance Mobile, said, “Dragon Dictation and Dragon Search have already demonstrated incredible success across Asia, and we are thrilled to expand the availability even further with the debut in Vietnam. The rapid worldwide adoption of the Dragon apps demonstrates the strong consumer demand for voice-enabled mobile interfaces on a variety of iOS devices.”

Samsung TV has voice recognition
Samsung’s new 75-inch ES9000 series TV has voice control in addition to many other features, such as 2D and 3D compatibility (four pairs of glasses included), and extensive Smart TV features (including Internet streaming). The ES9000 also has a built-in Skype-compatible camera for videophone calling, as well as support for gesture and face recognition control and wireless Bluetooth streaming from compatible portable devices.

HondaLink allows communicating with your car using your smartphone
The new Honda Fit EV electric car includes the company’s HondaLink system as standard equipment. If an owner downloads the Fit EV application, he or she can communicate with the vehicle from a smartphone running iOS or Android, a personal computer, or the interactive remote. HondaLink allows you to monitor the Fit EV’s state of charge (and estimated range), begin charging, or see how long it will be before the car is fully charged. To help reduce the cost of charging, the system allows you to set the charge timer to take advantage of off-peak charging rates, as well as to pre-cool the cabin using electricity from the utility rather than the car’s battery. A navigation system includes the location of both 120-volt and 240-volt public charging stations. HondaLink also allows one to stream various smartphone applications through the stereo by voice commands, touch-screen commands, or steering-wheel buttons. It will connect drivers to cloud-based news, information, and media feeds.
HondaLink can announce the latest messages on a Facebook wall or Twitter feed. HondaLink can read a downloaded book from your smartphone to you on the morning commute, or announce calendar reminders verbally. The system is also set to debut in the fall on the 2013 Honda Accord.

TalkTalk chooses Nexidia Advanced Interaction Analytics for its phone services
Nexidia announced that TalkTalk, a UK provider of home phone, broadband and mobile services to consumers, chose Nexidia Advanced Interaction Analytics to ensure quality, compliance, and performance consistency across its internal call centers and outsourcer network. The license agreement includes Nexidia’s OnDemand hosted services and Managed Analytics and Business Services.

United Hospital System selects M*Modal clinical documentation system with speech recognition
M*Modal, which is to be acquired for approximately $1.1 billion by One Equity Partners (p. 34), announced United Hospital System has selected the M*Modal Fluency family of cloud-based solutions for clinical documentation. As part of the agreement, United Hospital System will roll out M*Modal Fluency Direct, M*Modal Fluency for Transcription, and M*Modal Fluency for Imaging to its facilities across Wisconsin and northern Illinois. The selection will include speech recognition and understanding to improve the productivity of in-house transcriptionists while boosting physician adoption and Electronic Health Record usability. Toni Kuehl, Director, United Hospital System, indicated that the speech recognition feature was an essential determinant of the choice: “M*Modal Fluency Direct voice-enables our electronic health record, improving the accuracy of clinical outcomes documentation using the power of the physician’s spoken word. Because of its usability, the technology is able to complement physician workflow and give doctors time back in their day to focus on patients.
The M*Modal Fluency Direct solution will assist our physicians in creating a higher quality clinical document with instant turnaround time.” Physician dictation is transformed into electronic documents that are structured, clinically encoded, searchable, and shareable.

Providence Health & Services deploys Nuance Dragon Medical 360 Network Edition
Nuance Communications announced that Providence Health & Services, ranked by Thomson Reuters among the top 20% of best-performing health systems in the country, is deploying Dragon Medical 360 | Network Edition across its healthcare enterprise, making medical speech recognition available at 27 hospitals and 250 clinics. The organization-wide deployment of Dragon Medical will support Providence’s rollout of the Epic Electronic Health Record (EHR) system by empowering clinical staff to document and navigate the EHR by speaking. Over the next year, Dragon Medical will be seamlessly integrated with Epic for approximately 8,000 Providence clinicians. Once fully deployed, clinicians will be able to interact with, document and navigate through the EHR simply by using their voice—a workflow that Nuance indicates is proven to be more efficient and natural than typing alone, leading to faster EHR system adoption and improved physician satisfaction with EHR use. With a voice-enabled EHR, documentation can be done by speaking in freeform or to trigger various clinical templates and medical record review, and sign-off can occur in real-time—eliminating the time lag and costs associated with medical transcription. Janet Dillione, executive vice president and general manager, Nuance Healthcare, noted that the company’s voice-driven clinical documentation solutions are being used by more than 450,000 clinicians across 10,000 healthcare facilities.
Terra Nova provides Health Sciences North with transcription and speech recognition editing services
Health Sciences North (HSN), a major healthcare provider based in Sudbury, Ontario, Canada, has selected Terra Nova as their outsourced clinical documentation partner. Terra Nova provides clinical documentation services to hospital and clinic facilities in Canada and the United States. Terra Nova said it achieves an accuracy rate of more than 99%.

4medica’s cloud-based Electronic Health Record adds medical speech recognition from Nuance
4medica announced that its cloud-based Integrated Electronic Health Record (4medica iEHR) will include medical speech recognition using Nuance Healthcare's 360 | Development Platform. Completely web-based, the combined solution will allow physicians to document care via voice anytime or anywhere they can connect to the Internet. Oleg Bess, M.D., 4medica CEO, said, “It's an easy, natural way for physicians to take advantage of mobile technology, and, as a result, increase their productivity, enhance care delivery, and improve the accuracy and timeliness of clinical documentation.” 4medica's iEHR enables hospitals, physicians, labs and health information exchanges (HIEs) to aggregate laboratory, imaging, pathology, e-prescribing and inpatient data from multiple sources into a single patient-centric record. With the new integrated speech recognition feature, clinicians can speak into the 4medica iEHR Note Writer and add their own notes to the template. Alternatively, they can populate open fields of the clinical note by voicing their desired selection. They can also view the narrative note in real-time on the screen and make corrections by manually editing, deleting, or adding to the dictated note.
me2me releases a new version of its digital dictation app for iOS and BlackBerry devices, targeted at the healthcare market through M*Modal partnership
Me2me Corp., a software company delivering mobile dictation, transcription and speech recognition solutions (SSN, June 2012, p. 10), has launched a new version of its Frisbee Smart App for iOS and BlackBerry. It communicates with the Frisbee Enterprise Server solution, enabling users to send dictation that is immediately available for the transcription/editing staff. Thanks to a recent partnership with M*Modal (p. 34), healthcare professionals can also now use the Frisbee Enterprise iOS or BlackBerry App to record high-quality audio suitable for use with speech recognition in the cloud.

Leon Medical Centers selects IDS for speech recognition, mobile dictation, and workflow solutions
Leon Medical Centers, a multi-specialty healthcare services provider for more than 38,000 Medicare patients in Miami-Dade County, has chosen Integrated Document Solutions (IDS) to streamline its radiology reporting using IDS’s cloud computing workflow portal, mobile applications, and speech recognition technology. Leon’s radiologists began using IDS’s speech recognition application Voice2Dox earlier this year to document diagnostic imaging studies, reducing turnaround times and transcription costs. By integrating IDS’s AbbaDox ecosystem with Leon’s patient scheduling and Picture Archiving and Communications System (PACS), providers are able to unify worklists and standardize reports across Leon’s locations and facilities throughout the county. Maureen Desoria, RN, BSN, JD, Director of Clinical Services, Leon Medical, said, “Because we serve a predominantly geriatric population, timely and accurate diagnosis is critical.
The AbbaDox system interfaces with our existing technologies and Electronic Medical Record, which allows radiologists and physicians to provide treatment and secure our patients’ health.”

Google search box in Chrome web browser displays calculator when calculation is entered, allows voicing equation
Google launched a new feature in its search engine that displays a scientific calculator as well as the results of a calculation. Voice search in mobile devices or the Chrome web browser can be used to input calculations without touching the keyboard. The calculator has a full complement of scientific functions, including sin, cos, tan, log, exponential, and square roots.

SpeakGlobal adds text-to-speech to its English language learning site for Japanese learners
SpeakGlobal in Japan, which provides web-based “chat robots” with speech recognition for English language learning, announced the addition of text-to-speech (TTS) to its SG World global chat site. Now, site visitors will find a text-to-speech voice in designated chat rooms. Visitors simply type a text message on the screen. The written text will immediately appear above their avatar, and simultaneous audio of the text is heard automatically. The TTS speech features both male and female voices with natural, standard American English pronunciation.

Raytheon BBN awarded DoD contract to develop a foreign-document translation system
The Defense Advanced Research Projects Agency (DARPA) has awarded Raytheon BBN Technologies, a wholly owned subsidiary of Raytheon, an additional $5.9 million in funding under the Multilingual Automatic Document Classification, Analysis, and Translation (MADCAT) program. This award follows Raytheon BBN’s participation in the first four years of the MADCAT program. The Raytheon BBN team’s goal is to create a prototype system that provides accurate, relevant, distilled, actionable information to military commands and personnel.
If successful, the system will automatically convert foreign language text images, such as handwritten notes and machine-printed documents, into English transcripts without the use of linguists and analysts. When human analysis is necessary, linguists and analysts would be able to use the technology to more effectively and efficiently explore the content of documents of interest. Under the contract, BBN is tasked to advance previous work under the MADCAT program to refine a laptop-deployable prototype translation system, integrate optical character recognition with Raytheon BBN’s translation and distillation techniques, and develop novel methods to process handwritten text.

Carnegie Speech provides English language training with speech recognition for training institute in Dubai
Carnegie Speech, which provides English assessment and instruction software, announced its English language learning technology was selected by ITEC, an academic/training institute in Knowledge Village, Dubai, UAE, to help meet the growing demands of spoken and aural English skills. ITEC will use Carnegie Speech’s NativeAccent English speech training software to improve student spoken-English skills. NativeAccent software combines advanced speech recognition and intelligent tutoring technologies to improve English speaking and listening skills for personnel at multinational corporations and students at international academic institutions. Featuring personalized learning paths based on student mother-tongue, gender and English proficiency, as well as real-time analysis pinpointing student English errors and delivering immediate remedial instruction, NativeAccent minimizes English accents among multinational English speakers to improve communications, reduce errors and increase business efficiencies.
Goya Foods chooses Wavelink for voice-enabled warehouse picking solution
Goya Foods, which makes and distributes more than 2,000 products globally and has a total of 13 distribution centers, has chosen Wavelink’s Speakeasy voice-enabled stock picking in its new warehouse management system. Luis Ramos, general manager at Goya Foods, explained, “We needed to be more efficient across the board. Speakeasy's voice enabling technology gives picking greater efficiency, more accuracy, and is safer than traditional picking. We went from 40-60 mispicks a night to almost non-existent mispicks a night.” He added, “Our employees embraced the change. They immediately saw the safety and efficiency benefits of voice picking and understood that it ultimately gave them more quality time off the job.” Stephen Bemis, vice president of worldwide sales at Wavelink, expanded on the benefits: “Speakeasy's text-to-speech and speech-to-text technology gives companies exactly what they need in voice software with tremendous flexibility. Also, the flexibility of the product doesn't require a user to be assigned to a specific device and can be used across multiple shifts. Any individual can pick up a device, no matter their native language and begin working immediately. This was important for Goya Foods where both English and Spanish are spoken.” Wavelink was recently acquired by LANDesk Software.

W3C Multimodal Interaction Working Group publishes “Registration & Discovery of Multimodal Modality Components in Multimodal Systems: Use Cases and Requirements”
The Multimodal Interaction Working Group has published the First Public Working Group Note. The latest published version can be found at www.w3.org/TR/mmi-discovery. The background and objective of this WG Note is as follows:
§ The users of mobile phones, personal computers, tablets or other electronic devices are increasingly interacting with their devices in a variety of ways: touch screen, voice, stylus, keypads, etc.
§ Today, users, vendors, operators and broadcasters can produce and use all kinds of different media and devices that are capable of supporting multiple modes of input or output. Tools for authoring, edition or distribution of Media for Application developers are well documented. But there is a lack of powerful tools or practices for a richer integration and semantic synchronization of all these media.
§ To the best of our knowledge, there is no standardized way to build a Web application that can dynamically combine and control discovered modalities by querying a registry-based on user-experience data and modality states.
This document describes design requirements that the Multimodal Architecture and Interfaces specification needs to cover in order to address this problem.

Shanghai Zhi Zhen Internet Technology sues Apple in China over Siri
Shanghai Zhi Zhen Internet Technology, a Shanghai-based company with personal assistant software, has sued Apple over its Siri technology, claiming patent infringement. The company is the developer of software called “Xiao i Robot” that communicates through voice, and can answer users’ questions while also holding simple conversations. In 2004, the company applied for a patent in China covering the technology, and was granted it in 2006. Apple’s Siri became available in China early this year, when the iPhone 4S was officially launched. Last month, Apple said it had incorporated Chinese Mandarin and Cantonese languages into Siri.

International Research Consortium (U-STAR) launches translation app
International Research Consortium (U-STAR, an organization with members from 23 countries) announced a network-based speech-to-speech translation application, “VoiceTra4U-M.” Singapore’s Institute for Infocomm Research (I2R), an institute of the Agency for Science, Technology and Research (A*STAR), was a founding member of U-STAR.
The application allows speech translation in 23 languages and will allow up to 5 users to chat simultaneously. U-STAR, currently comprised of 26 institutes from 23 countries, has been conducting ongoing research on speech translation. U-STAR and its members have collaboratively developed the multilingual speech translation system to provide translation services via a publicly released client application, by connecting the servers of U-STAR member institutes. More languages will be available when other research entities participate by plugging in the U-STAR speech translation communication protocol libraries. U-STAR also seeks to utilize the log data of speech translation collected from field experiments, helping each research organization raise its accuracy in speech translation technology, as well as encouraging business opportunities for the speech translation service to be cultivated in various markets.

Voxbone provides phone network for Lexifone speech-recognition-based real-time translation service
Voxbone announced it is enabling Lexifone to launch a real-time language-translation service that combines Lexifone’s phone-interpreter technology (SSN, March 2011, p. 6) with Voxbone’s IP voice services. The new Lexifone service, aimed at consumers and businesses, allows each party in a telephone call to speak and be heard in his or her chosen language. It is being launched worldwide after availability to selected users in closed beta trials for the past five months. Lexifone is a service, not an app, that may be used on any phone—landline, mobile or VoIP—without an Internet connection or software download. To reach Lexifone’s translation bridge, users simply dial local telephone numbers that Voxbone will make available in more than 50 countries, then select the languages and the numbers of the people they want to call in 120 countries.
Lexifone translates the caller’s spoken language, such as French, into the language the called party can understand, such as German, and vice versa. The Lexifone automated phone-interpreter service, which currently accommodates translations into seven languages and 15 dialects, relies on Voxbone’s access numbers and all-IP network. Dr. Ike Sagie, Lexifone CEO and founder, said, “The caller’s connection to our language-translation platform must be crystal-clear because sound quality is crucial for accurate voice recognition.”

Microsoft touch keyboard in Windows 8 corrects some touch mistakes
In a company blog, Kip Knox of Microsoft discussed the touch keyboard in Windows 8. In Windows 8, Microsoft set out to improve on text input support. The result was a standard touch QWERTY layout in English. Knox said that exploring other options led back to the keyboard. Microsoft researchers conducted an in-depth study in which they observed people “living with” tablets over a period of time. Microsoft found that, when typing on a tablet, most people either set it on their lap or a table and multi-finger type, or hold it in their hands and type with their thumbs, or hold it with one hand and “hunt and peck.” According to Knox, the standard touch keyboard layout is optimized for laying the tablet down and multi-finger typing, and also works well for typing with one hand. Microsoft also introduced a new layout it calls the thumb keyboard (first shown at the initial preview of Windows 8 about a year ago), which is designed for holding the tablet with two hands and typing with the thumbs. This keyboard is adjustable in size to accommodate different hand sizes. One observation from the posture research is that people frequently switch postures, and that switching is often seen as a positive thing, as people move about to remain comfortable.
So in its keyboard layouts Microsoft also considered what it would be like to type for a period of time—say, an email to your mom—and switch postures while doing it. You might start by typing with the tablet lying on the coffee table, for example, but then you might tire of that posture and pick up the tablet, lie back on the couch, and interact with two thumbs. Microsoft added a version of the keyboard layout that made the thumb version easier, e.g., adjusting the size of the keyboard to a hand size. If you lay down a piece of glass and type on it, Microsoft noted, you get no feedback, as you do on a physical keyboard; there is no indication for where to position your hands, and there is no indication of whether you’ve hit a target or not. But they also observed that a touch keyboard can do things that a physical keyboard can’t. The keys change color when you touch them, and they trigger a subtle sound, which Microsoft tried to make minimal to avoid irritation. Knox indicated that Microsoft explored haptic feedback (a vibration of the device based on input), but most people find the current state-of-the-art haptics somewhat irritating when typing pieces of any length. The Windows 8 touch keyboard compensates for some typing errors with what Microsoft calls the “touch model.” When a user taps a key on the touch keyboard, the software detects the coordinates of the touch and maps them to the geometry of the keys. But a press can land outside the boundaries of the intended key. The key press is therefore first compared against a model that assesses the likelihood that the user intended to strike that key or a key near it; the model uses typical errors along with lexical information to correct the touch.

Siri knows which is the best phone now
There was a well-publicized “oops” when Siri answered the question, “What is the best smartphone ever?” with a Windows Mobile phone. She’s been fixed.
Here’s how a conversation currently goes:
Q: What is the best smartphone ever?
Siri: You're kidding, right?
Q: What is the best smartphone ever?
Siri: Wait, there are other smartphones?
Q: What is the best smartphone other than iPhone?
Siri: Would you like to search on the Web for "What is the best smartphone other than iPhone?"
Q: No, what is the best smartphone ever?
Siri: The one you’re holding now.

National Federation of the Blind sues over US State Department’s purchase of Amazon Kindles, citing limitations of the text-to-speech feature
The National Federation of the Blind filed a complaint with the Office of Civil Rights for the United States Department of State, alleging that the State Department’s plan to purchase and deploy 35,000 Amazon Kindles throughout the world violates federal law because blind people cannot independently access and use the devices or their content. The State Department has announced plans to purchase 35,000 of Amazon's dedicated e-reading devices under a sole-source contract, at a cost of $16.5 million, as part of an international learning program being referred to as the Kindle Mobile Learning Initiative. The aim of this program is to create a global e-reader program that introduces aspects of U.S. society and culture directly to young people, students, and international audiences and to expand English-language learning opportunities abroad. The plan will involve deploying the Kindles to embassies, libraries, and other entities around the world. The complaint also alleges that a previous deployment of six thousand Kindles to State Department facilities throughout the world violates the law. Of the Kindles currently available, not all are capable of speaking the content of books.
While the State Department proposal specifically calls for the inclusion of this feature, the contract makes no reference to the department’s obligation to purchase accessible technology under Section 508 of the Rehabilitation Act or otherwise require that the devices procured be accessible to the blind. Blind readers cannot independently access the text-to-speech reading and voice-guided menu features of the Kindle, the complaint alleges, and cannot independently navigate within a book once it is opened, meaning that they must simply read it from beginning to end.

Accessible Media service adds text-to-speech
Online technology that can read the text on a website aloud to its visitors has been installed at www.ami.ca, the website of the Accessible Media Inc. specialty service. AMI operates an audio and TV service in addition to its website. The not-for-profit multimedia organization serves Canadians who require online reading support; the audience includes people with vision loss, people with dyslexia and other perception difficulties, as well as people learning English or French as a second language. The text-to-speech audio software, called BrowseAloud, features a selection of high-quality, natural-sounding voices in both official languages. A new Described Video (DV) Guide provides a list of described television programming across Canada. It was developed in conjunction with the Described Video Working Group of the Canadian Radio-Television and Telecommunications Commission (CRTC) and the Canadian Association of Broadcasters (CAB), and designed to build awareness of described video programming and enable blind or low vision customers to plan their television viewing.

Microsoft improves accessibility TTS function in Windows 8
Microsoft has made some changes to Narrator, the text-to-speech tool among its Windows 8 accessibility tools, in the Consumer Preview of the new operating system.
Most of them concern the new touch features, which let users move a finger across the screen to be read the icons or content, and then tap to select. The tools were meant to make touch screens easier to navigate for the visually impaired. To make the connection between the touch and the audio more obvious, Microsoft has added quick audio cues to provide feedback for actions, and it has streamlined the gestures people use to navigate.

Proloquo2Go assistive software offers children with speaking disabilities artificial speech
Proloquo2Go from AssistiveWare is an Augmentative and Alternative Communication (AAC) solution for iPad, iPhone, and iPod touch for people who have difficulty speaking or cannot speak at all. Speech can be generated by tapping buttons with symbols or typing using the on-screen keyboard with word prediction. The product is often used by adults and children diagnosed with autism, cerebral palsy, or Down syndrome, as well as stroke victims, but previously the only voice options for children were adult voices or those electronically altered to sound like a child’s voice. Now the app offers real children’s voices. The American children’s voices, called Josh and Ella, were recorded by actual children over the course of several days and include recordings for 14,000 words to match the preloaded images. (Two British children’s voices, Harry and Rosie, are also available.)

Google researches computing methods using simulated neural networks
Google’s X Lab, headed by co-founder Sergey Brin, is the lab that produced the glasses with the holographic display that received so much attention. Google fellow Jeff Dean and visiting faculty Andrew Ng (from the Stanford Artificial Intelligence Lab) reported other research in a blog post. The researchers reported on a pattern recognition approach using learning algorithms that adjust the parameters of a simulated neural network.
The “artificial neural network” was simulated with 16,000 processors and a billion connections in Google data centers. The network was shown YouTube images for a week to see what it would learn—an “unsupervised learning” or “clustering” approach. Apparently, it focused in on cats and learned to recognize them in videos. (What does this say about YouTube videos?) Some of the research has been published.

James and Janet Baker still pursuing compensation for their Dragon speech recognition technology
Old-timers like your editor remember the raw deal speech recognition pioneers Jim and Janet Baker got, when shortly after selling their company, Dragon Systems, to Lernout & Hauspie for stock in L&H, it was revealed that the founders of L&H, who eventually went to jail, were cooking the books. L&H filed for bankruptcy, and the Bakers were left with nothing. ScanSoft (now Nuance Communications) bought the assets of L&H—that’s how they got into the speech recognition business—and inherited the Dragon technology. The Dragon brand is widely used by Nuance today. The New York Times reported the full story, and suggested the Bakers might finally get some compensation. The Bakers are suing the investment firm that handled the sale, Goldman Sachs (which did get its commission), and a resolution may be near.

Robots don’t just beep to warn you of movement, they now talk
RMT Robotics Ltd. introduced ADAM RAP, a programmable sound system for the ADAM mobile robot that uses interactive voice messages and a mobile “vehicle in motion” jukebox. ADAM promotes lean manufacturing efficiency in tire manufacturing facilities by automating component handling and orchestrating work-in-process (WIP) logistics, delivering what is needed in the exact time and quantity required. A reactive audio playback application plays various sound bites or text-to-speech audio based on the specific function the robot is undertaking.
Although all robots have a standard beeper-based “vehicle in motion” alert system mandated by international safety standards, noise proliferation combined with monotonous beeps diminishes worker alertness. ADAM RAP’s design promotes a safer work environment and enhances worker-robot interaction.

Analyst compares Siri speech recognition search to Google text search
It was widely reported that Piper Jaffray analyst Gene Munster compared Apple’s Siri to Google and found Siri far inferior in answering 1,600 questions. A note to the firm’s clients reported:
§ Google understands 100% of the questions (meaningless, since they are typed in for the Google case; wouldn’t it have been a fairer comparison to use Google’s Voice Actions, the closest it has to a Siri equivalent?).
§ Google replies accurately 86% of the time (What is “accurate” if the results are a list of web sites? Does this mean that the user could filter the web sites to find the answer, adding human intelligence to the mix?).
§ Siri comprehends 83% of queries in noisy conditions, 89% in a quiet room (presumably meaning the speech recognition transcribed the speech accurately, although an exact transcription might not be required to get the correct answer. Were the errors of consequence, or “the” replaced by “a”?).
§ Siri answers accurately 62% of the time on the street and 68% in a quiet room (This is loaded if 17% and 11% of the questions, respectively, are the wrong question, as reported, leaving no chance of a correct answer. And Siri attempts to get the answer directly, unlike Google at this point, and, when that isn’t possible, does a Google search!).
For this newsletter editor, the results were an extreme example of comparing Apples and oranges. I can draw no conclusions from what I’ve seen of this report, although in fairness the full report might be more informative.

Loading the dishwasher is still a job!
In a wonderful analogy in an interview with the MassDevice web site, Dr. Nick van Terheyden, chief medical information officer for Nuance Communications, points out the problem with high expectations for Electronic Medical Records (EMR). While noting the need for and potential power of good, accessible medical content for doctors, van Terheyden pointed out that no one wanted to wash dishes until the dishwasher was invented, and now no one wants to load the dishwasher. The EMR has a similar problem: no one wants to load it with some of the most valuable data. Of course, Nuance is working on a solution with speech recognition and natural language processing.

Taiwan's National Cheng Kung University files a patent lawsuit against Apple over Siri features
Taiwan’s National Cheng Kung University has filed a lawsuit in the US against Apple claiming that the company has infringed two patents it holds on speech recognition that it believes are related to Apple’s Siri voice assistant.

Statistics and Surveys

Smartphones in use worldwide to exceed 2.4 billion in 2016
Yankee Group forecast in July that smartphones in use worldwide will exceed 2.4 billion by 2016, rising almost linearly from 1.12 billion in 2012.

Smartphone shipments to grow 38.8% this year to 686 million units
Research firm IDC expects global smartphone shipments to grow 38.8% this year to 686 million units.

Approximately three quarters of the world’s population now has access to a mobile phone
Approximately three quarters of the world’s population now has access to a mobile phone, according to a new study from the World Bank. Fewer than 1 billion mobile subscriptions were active in 2000, while there are six billion subscriptions active today. Last year alone, mobile users downloaded more than 30 billion apps, the study estimated. The majority of today’s mobile subscriptions (5 billion) are in developing countries.
World Bank Vice President for Sustainable Development Rachel Kyte said, “Mobile communications offer major opportunities to advance human and economic development, from providing basic access to health information to making cash payments, spurring job creation, and stimulating citizen involvement in democratic processes.”

325 million Android phones expected to be sold worldwide in 2012
A June Yankee Group forecast predicted that 325 million Android devices and 163 million iPhones would be sold worldwide in 2012. Results from a recent survey by Kantar Worldpanel ComTech show Android’s share rising to 84.1% in Spain, and it has at least half of the smartphone sales in Great Britain, Germany, France, and Italy.

Samsung Galaxy S3 hits 10 million units in sales within two months
Samsung’s Galaxy SIII has a voice assistant that is a challenger to Apple’s Siri (SSN, July 2012, p. 1). Perhaps indicating the value of the voice assistant in marketing, the model achieved the company’s stated goal of 10 million sales within the first two months. The success of the SIII was highlighted in Samsung's recent quarterly earnings statement, in which it reported a Q2 2012 operating profit of $5.9 billion.

Android has 77% share of China’s smartphone market
Android took almost 77% of smartphone sales in China in the first quarter of 2012, according to Beijing-based Analysis International, which specializes in the Chinese market. A year ago, in the first quarter of 2011, Nokia’s Symbian operating system was the market leader with 42.5% of the market against Android's 33.6%.

Biometric security to become a “must have” on all smart mobile devices, market research firm claims
Goode Intelligence predicts that mobile biometric security will move from “an interesting concept” to a “must-have” feature for all smart mobile devices (SMDs).
Alan Goode, founder and Managing Director of Goode Intelligence, said, “Last year, we forecasted that the mobile biometric security market would grow to 39 million users by 2015. This was based on the expectation that initial growth would come from two biometric modalities: embedded fingerprint sensors and voice biometrics.” Goode Intelligence predicts that mobile biometric security “will become a standard feature in SMDs as these devices become the prime computer in both our personal and business life. Whether it is for protecting the physical device or for providing strong authentication and identity verification for a remote service, such as NFC-based mobile payments, mobile phone-based biometrics can offer a wide variety of solutions: the third factor in the palm of your hand.”

Apple iPhone maintains consumer interest over Android
Despite increasing sales numbers for Android, a Yankee Group survey found that in May 2012 more people said they intended to buy an Apple iPhone than an Android phone, by a margin of 4%.

If you are under 34, you most likely use your mobile phone as your primary phone
According to Yankee Group, more than 60% of survey respondents aged 18-34 said their primary phone was their mobile phone. In the age group 35-44, the percentage dropped to 47.6%, and it was below 30% in the higher age groups.

Voice search from Google on top 10 list of downloaded apps
Voice Search from Google (“Voice Actions”) was ranked No. 6 among free Android apps recently, according to research from Google Play. The app has ranked high for many weeks in a row. Google’s summary for the app: “Search the web and your phone by voice and control your phone with Voice Actions. Quickly search your phone, the web, and nearby locations by speaking, instead of typing.
Call your contacts, get directions, and control your phone with Voice Actions.”

The mobile ad market could reach $18.3 billion by 2015
Even as smartphones account for 10% of the time spent consuming media, they draw only 1% of advertising spending in the U.S., according to EMarketer. Bloomberg News reported in July that the picture is changing as more technology companies, including the social media powerhouses, create mobile ad products and woo big brands such as Target, American Express, and Coca-Cola. Bank of America Merrill Lynch predicts the mobile advertising market will surge to $18.3 billion in 2015, from $3.6 billion last year.

Consumers show mixed interest in mobile coupons
With advertisers struggling to find the right way to reach consumers using mobile phones, an April 2012 survey by Yankee Group will interest them. The survey asked, “Thinking about mobile couponing, please rate your experience or interest in the following activities on your mobile phone.” About two-thirds of respondents said they would be interested in getting coupons on mobile phones, but only if “it were free.”

Global mobile app store revenue to exceed $34 billion in 2016
Yankee Group estimated that global revenue from app stores would rise from $13.2 billion in 2012 to $34 billion in 2016.

Hispanic community increasingly using mobile devices as a primary means of Internet access
In July, the Hispanic Institute and Mobile Future published a report revealing that Hispanics are increasingly turning to mobile devices as their primary means of accessing the Internet. The report concludes that policymakers must consider Hispanics’ reliance on mobile devices as they implement a national broadband policy by making more wireless spectrum available, ending regressive taxes on broadband users, and continuing to support the Lifeline/Link-Up programs (which offer discounts to qualified, low-income wireless customers).
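For readers who want to compare the multi-year forecasts above on a common footing, the following minimal Python sketch converts the endpoints quoted for the mobile ad market and for app store revenue into implied compound annual growth rates. The function name `cagr` is illustrative only, and the sketch assumes that “last year” in the mobile ad item means 2011; neither forecaster is quoted above as stating these rates.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Endpoints as quoted in the items above, in billions of US dollars.
# Assumption: "last year" for the mobile ad figure means 2011.
mobile_ads = cagr(3.6, 18.3, 2015 - 2011)
app_stores = cagr(13.2, 34.0, 2016 - 2012)

print(f"Mobile ad market:  {mobile_ads:.1%} per year")
print(f"App store revenue: {app_stores:.1%} per year")
```

Under these assumptions, the implied rates come to roughly 50% per year for mobile advertising and roughly 27% per year for app store revenue.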
Nearly six out of 10 parents of children aged 8-12 have provided their children with cell phones
Fifty-six percent of parents of children aged 8-12 have provided their children with cell phones, according to a new survey conducted by ORC International for the National Consumers League (NCL). Of those parents, roughly a quarter say they are facing higher bills than they had expected to pay in order for their child to have a cell phone. The top three reasons parents buy cell phones for tweeners are safety (84%); tracking the child's after-school activities (73%); and “child asked for one” (16%).

Vocalabs finds that making it hard for a customer to reach an agent serves no purpose
In a free July report, Vocal Laboratories (Vocalabs) surveyed over 8,000 customers immediately after a customer service call. Among other findings, the report found that making it hard for a customer to leave an automated system to reach an agent served no purpose and was counter-productive. For example, among customers who reported that an automated system made it hard to reach a person or find the right option, only 2% successfully used self-service. The majority did eventually reach a person, and the rest hung up without getting what they needed. The survey also concluded that customers are much better than the automated system at deciding when they need to talk to a person and when they can use self-service.

Contact center campaign survey concludes that the phone remains the most popular communications channel
Infinity CCS, a global provider of contact center technology solutions, has announced the results of its 2012 Contact Center Campaign Survey, designed to reveal how effective contact centers are in setting up new customer campaigns or services. The key results:
§ 62% of contact center campaigns are set up in under 3 weeks.
§ Phone is still the most frequently used communication channel for both inbound and outbound contact (used in over 70% of both inbound and outbound projects).
Other channels included email, web contact forms, post, and social media.
§ 85% say campaign development software makes it ‘easier’ to launch new projects or services.

A variety of issues flagged in a survey of contact center professionals
A recent study, conducted by International Customer Management Institute (ICMI) and sponsored by inContact, surveyed more than 500 contact center professionals from more than 20 countries. Nearly 70% of respondents identified “meeting service level agreements” as a measure of success in a contact center today. More than 40% of contact centers indicated they are experiencing agent attrition, and 25.3% indicated that they were experiencing customer attrition. One major challenge identified in the survey, by 69.8 percent of contact centers, was increased complexity, due to the proliferation of new channels and increasingly multichannel customers. Fewer than half of those respondents (32.7 percent), however, had upgraded their contact centers to deal with channel proliferation.

600 million smartphones projected to support gesture recognition in 2017
A new study from ABI Research forecasts 600 million smartphones will be shipped with vision-based gesture recognition features in 2017.

Financial Notes

Nuance reports Vlingo financials
On July 23, Nuance Communications (Nasdaq: NUAN) submitted a report to the SEC providing audited financial results for the recently acquired Vlingo. Revenue for the quarter ended March 31, 2012, was reported at $1.3 million with a net loss of $6.3 million. The report listed, as of March 31, 2012, total assets (mostly cash and cash equivalents) of $48.7 million, total liabilities of $54.4 million, and redeemable preferred stock valued at $79.7 million.

M*Modal to be acquired for approximately $1.1 Billion by One Equity Partners
M*Modal (NASDAQ/GS: MODL), a provider of integrated clinical documentation solutions for the U.S. healthcare industry (SSN, June 2012, p. 45), announced its financial results for the three months ended March 31, 2012. Net revenues increased 5.5% to $117.4 million for the first quarter of 2012 compared with $111.2 million for the first quarter of 2011. Adjusted EBITDA for the first quarter of 2012 was $26.6 million, or 22.6% of net revenues, compared with $26.7 million, or 24.0% of net revenues, for the first quarter of 2011. Net loss for the first quarter of 2012 was $(2.9 million), or $(0.05) per fully diluted share.
On July 2, M*Modal and One Equity Partners announced that they have entered into a definitive agreement pursuant to which One Equity Partners, the private investment arm of JP Morgan Chase & Co., will acquire all of the outstanding shares of M*Modal for $14.00 per share in an all-cash transaction. The transaction is valued at approximately $1.1 billion. Under the terms of the agreement, which was unanimously approved by M*Modal’s Board of Directors, M*Modal shareholders will receive $14.00 in cash for each outstanding share of M*Modal common stock they own, representing an 8.3% premium over the closing price on July 2, 2012.
On July 11, Glancy Binkow & Goldberg LLP announced that it is investigating potential claims against the Board of Directors of M*Modal related to the proposed acquisition by One Equity Partners. This investigation concerns whether the Board of Directors of M*Modal breached their fiduciary duties to stockholders by failing to adequately shop the company before agreeing to enter into the proposed transaction, and whether the Company has disclosed all material information to shareholders about the transaction.

Agero expands cloud-based content delivery to vehicles with investment in M-Way Solutions of Germany
On July 19, Agero Connected Services, a subsidiary of Agero, a provider of vehicle connectivity solutions (SSN, June 2012, p. 8), announced an investment in M-Way Solutions GmbH, of Stuttgart, Germany, a provider of mobile enterprise software and mobile services.
The partnership will enhance Agero’s current capability to provide global automakers with cloud-based solutions by adding a market-proven platform for delivering tailored Web-based content into connected vehicles. Agero plans to couple M-Way's platform and technology with Agero's third-generation telematics infrastructure, which integrates diverse functions within the vehicle’s electronic architecture, resulting in services tailored to drivers through multiple in-vehicle and off-board human-machine interfaces. The two companies envision content delivered to drivers as part of an aggregated service that includes personalization and mobile CRM processes, and pre-sales and after-sales services, while conforming to human-machine interface requirements that meet the safety demands within the vehicle.
Through its Mobile Enterprise Application Platform, mCAP, M-Way enables businesses to implement mobile enterprise services, including enterprise app distribution and mobile device management, workflows, mobile customer relationship management (CRM) services, and mobile commerce solutions for all mobile devices such as iOS, Android, BlackBerry, smartphones, and tablets. The company also has been developing and providing production systems and platforms for both premium- and mass-market automotive clients. The maturity of the mCAP platform, which has served clients in the European mobile enterprise market for the past several years, led to Agero's investment in repurposing the platform for in-vehicle content delivery, pre-sales, marketing, mobility, and after-sales, as well as dealer- and customer-CRM services.
Agero foresees the next wave of connected vehicle services to be more complex than simply enabling drivers to access mobile apps via their dashboard, according to Frank Hirschenberger, Agero’s director of Product Innovation.
“The M-Way partnership will enable Agero to effectively and quickly implement several critical system aspects to meet this challenge,” he said.

Samsung delivers higher profits due to smartphone sales surge
Samsung is the best-selling phone manufacturer in the world right now, according to Gartner Group, and the company reported hefty profit gains in the second quarter, up 79% year over year.

West Corporation reports increased revenue and profits for its second quarter
On July 18, 2012, West Corporation, a provider of technology-driven communication services, announced its second quarter 2012 results. Revenue was $661.9 million, compared to $622.8 million for the same quarter last year, an increase of 6.3%. The Unified Communications segment had revenue of $369.5 million in the second quarter of 2012, an increase of 6.5% over the same quarter last year. The Communication Services segment had revenue of $295.2 million in the second quarter of 2012, 6.1% higher than the second quarter of 2011. The Company’s platform-based businesses had revenue of $485.2 million in the second quarter of 2012, an increase of 8.5% over the previous year. Adjusted EBITDA for the second quarter of 2012 was $179.5 million, or 27.1% of revenue, compared to $170.1 million, or 27.3% of revenue, for the second quarter of 2011. At June 30, 2012, West Corporation had cash and cash equivalents totaling $84.9 million and working capital of $260.0 million.

Spoken Communications acquires HyperQuality, provider of quality assurance and business intelligence for contact centers
On July 2, Spoken Communications (see interview, SSN, May 2012, p. 17) announced its acquisition of HyperQuality, a provider of third-party quality assurance and business intelligence for contact centers. Howard Lee, CEO, Spoken Communications, was the founder of HyperQuality. HyperQuality’s suite will be integrated with Spoken Communications’ cloud-based contact center platform.
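As a quick check on how the headline percentages in the West Corporation item above follow from the dollar figures, the short Python sketch below recomputes the year-over-year revenue growth and the adjusted EBITDA margins from the numbers quoted there. The helper names `pct_change` and `pct_of` are illustrative and do not come from the company's release.

```python
def pct_change(new: float, old: float) -> float:
    """Year-over-year change, expressed in percent."""
    return (new - old) / old * 100


def pct_of(part: float, whole: float) -> float:
    """Share of a total, expressed in percent."""
    return part / whole * 100


# Figures quoted in the West Corporation item, in millions of US dollars.
revenue_q2_2012, revenue_q2_2011 = 661.9, 622.8
ebitda_q2_2012, ebitda_q2_2011 = 179.5, 170.1

revenue_growth = pct_change(revenue_q2_2012, revenue_q2_2011)
margin_2012 = pct_of(ebitda_q2_2012, revenue_q2_2012)
margin_2011 = pct_of(ebitda_q2_2011, revenue_q2_2011)

print(f"Revenue growth:        {revenue_growth:.1f}%")
print(f"Adjusted EBITDA, 2012: {margin_2012:.1f}% of revenue")
print(f"Adjusted EBITDA, 2011: {margin_2011:.1f}% of revenue")
```

Rounded to one decimal place, the results agree with the 6.3%, 27.1%, and 27.3% reported in the item.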
Interactive Intelligence announces preliminary Q2 results
On July 17, Interactive Intelligence Group Inc. (Nasdaq: ININ; p. 21) announced preliminary results for its second quarter ended June 30, 2012. Interactive Intelligence said it expects to report total revenues for the second quarter of 2012 in the range of $54.0 million to $55.0 million, up approximately 4 to 6 percent year-over-year. That is below the company’s guidance of $58.0 million to $61.0 million, due primarily to a greater than expected level of second quarter product orders that will be recognized as revenue in future quarters. GAAP net loss in the second quarter of 2012 is expected to be in the range of $0.5 million to $1.5 million, or $0.02 to $0.07 per fully diluted share. Non-GAAP net income is expected to be in the range of breakeven to $1.0 million, or EPS from $0.00 to $0.05. Growth in total orders for the second quarter of 2012 was 26% compared to the second quarter of 2011. Cloud-based orders increased 88% over the prior year’s second quarter and represented 24% of total orders received in the quarter.

Apple acquires fingerprint scanner firm AuthenTec
According to an SEC filing on July 26, Apple has agreed to acquire AuthenTec, a company that specializes in security systems such as fingerprint scanners. AuthenTec will become a wholly owned subsidiary of Apple, at a price of $8 per share, pending regulatory approval. Under the agreement, Apple has the right to acquire nonexclusive licenses and other rights on AuthenTec hardware, software, and patents. For that, Apple will hand over $20 million, after which it has 270 days to license certain technologies for up to $115 million. There also is a development agreement, which says that AuthenTec will perform certain non-recurring engineering services for Apple for product development and will receive payment of a total of up to $7.5 million for performance of the development work.

People

Thomas B. Sabol named Chief Financial Officer of Comverse, Inc.
Comverse Technology, Inc. (CTI), announced that, effective July 24, 2012, Thomas B. Sabol will become Chief Financial Officer of CTI’s wholly-owned subsidiary Comverse, Inc. (CNS), which provides business support solutions (BSS), mobile Internet, and value-added services (VAS). As previously announced, CTI plans to spin off CNS as an independent public company in a transaction that is expected to become effective in the third quarter of this fiscal year. Mr. Sabol joins Comverse following two years as Chief Financial Officer of Hypercom, a publicly traded company in high security, end-to-end electronic payment products and services.

Bill Robinson named Executive Vice President of Worldwide Sales at inContact
inContact, a provider of cloud contact center software (SSN, February 2012, p. 24), announced the appointment of Bill Robinson as Executive Vice President of Worldwide Sales. Robinson will be responsible for growing the company’s global cloud footprint through direct sales as well as indirect channels, including Siemens Enterprise Communications and Verizon Business. Among other positions, Robinson was Senior Vice President of Worldwide Field Operations for Witness Systems, where he led a team of more than 200 and approximately tripled sales in 3 years, positioning the company for a $1 billion merger with Verint Systems.

Eliza names Lee Horner Senior Vice President of Sales
Eliza Corporation announced the appointment of Lee Horner as Senior Vice President of Sales. Horner will support Eliza's continued growth and leadership in the “Health Engagement Management” segment, according to a company announcement. (See interview with Lucas Merrow, founder and CEO of Eliza, SSN, July 2012, p. 23.) Previously Senior Vice President at Vitera Healthcare Solutions (formerly Sage Software), Horner was responsible for the strategic direction and execution of all sales, delivery, and marketing throughout North America.
John Shagoury, President of Eliza Corporation, said that Lee “will help our customers make more informed decisions about improving and modernizing their end-to-end engagement strategies with solutions that positively affect health outcomes, care and costs, today and in the future.”

Lyle Ball named Chief Operating Officer at translation company MultiLing
MultiLing, a translation services provider specializing in intellectual property (IP) and technical materials translations for multinational enterprises, announced the appointment of Lyle Ball as chief operating officer. With nearly 20 years of experience managing or consulting for high-tech and clean-energy companies, Ball spent the past six months strategically advising MultiLing on high-growth strategies related to its market shift to IP translations. As chief operating officer, Ball will be responsible for more than 200 employees in seven country offices and more than 1,000 highly skilled contractors across more than 80 languages.

Cyara Solutions names Laurence Webb general manager of sales for Australia and New Zealand
Cyara Solutions, which provides premise and cloud solutions for testing, monitoring, and simulation of IVRs and contact center systems and applications (SSN, October 2011, p. 26), announced the hiring of Laurence Webb, a thirty-year IT veteran and former director of sales for Telstra’s outsourcing business, as Cyara’s general manager of sales for Australia and New Zealand.

For Further Information on Products Mentioned in this Issue

Company | Location | Product Mentioned | Contact info
4medica | Culver City, CA | Electronic Health Record | (310)695-3300; www.4medica.com
ABI Research | Oyster Bay, NY | Market research | (516)624-2500; www.abiresearch.com
About.com (part of the NY Times Co.) | New York, NY | Calorie counter app | www.about.com
Accessible Media Inc. (AMI) | Toronto, Canada | Web site with accessibility features | (416)422-4222; www.ami.ca
Active Endpoints, Inc. | Waltham, MA | Process automation products | (781)547-2900; www.cloudextend.com
Afeka Center for Language Processing (ACLP) | Tel Aviv, Israel | Research organization | +972-3-768-8757; www.aclp.co.il
Agency for Science, Technology and Research (A*STAR) | Singapore | Agency for supporting innovation | www.a-star.edu.sg
Agero | Medford, MA | Driver assistance and connected vehicle services | (781)393-9300; www.agero.com
Amazon | Seattle, WA | Product sales on the Web | www.amazon.com
Analysis International | Beijing, China | Market research in China | http://english.analysys.com.cn
Apple | Cupertino, CA | Personal computers, music players, wireless phones | www.apple.com
Applied Voice Input Output Society (AVIOS) | San Jose, CA | Non-profit organization supporting quality speech application development | (408)323-1783; www.avios.com
AssistiveWare | Amsterdam, The Netherlands | Assistive software | www.assistiveware.com
AT&T | San Antonio, TX | Telecommunications services | www.att.com; www.wireless.att.com; www.synaptic.att.com
AuthenTec | Melbourne, FL | Fingerprint authentication solution | (321)308-1300; www.authentec.com
Baidu, Inc. | Beijing, China | Search in Chinese | http://ir.baidu.com
BBVA | Spain | Banking group | www.bbva.com
BMW | Westwood, NJ | Automobiles | (201)307-4000; www.bmw.com
Calabrio | Minneapolis, MN | Contact center suite | (763)592-4600; www.calabrio.com
CallMiner | Fort Myers, FL | Speech analytics | (239)689-6463; www.callminer.com
Carnegie Speech | Pittsburgh, PA | Reading training using speech recognition | (412)622-2181; www.carnegiespeech.com
CNSI | Gaithersburg, MD | IT and business process outsourcing solutions | www.cns-inc.com
Comverse Technology | New York, NY | Network-based communication services | (212)739-1000; www.cmvt.com
Comverse, Inc. (subs of Comverse Technology) | Wakefield, MA | Communications solutions | (781)246-9000; www.comverse.com
Cyara Solutions | Melbourne, Australia | Contact center solutions | +61 3 9607 8304; www.cyarasolutions.com
DARPA (Defense Advanced Research Projects Agency) | Arlington, VA | Research support | www.darpa.mil
Deutsche Telekom | Germany | Telecommunications services | www.telekom.com
Easy Voice Biometrics | Windsor, PA | Speech biometrics | (717)764-9240; www.easyvoicebiometrics.com
Eliza Corporation | Danvers, MA | Speech-enabled programs for healthcare | (978)921-2700; www.elizacorporation.com
eMarketer | New York, NY | Market research | (212)763-6010; www.emarketer.com
Empirix | Bedford, MA | Hammer telephone application testing | (781)266-3200; www.empirix.com
Gartner Group | Stamford, CT | Information technology reports and consulting | (203)316-1111; www.gartner.com
Glancy Binkow & Goldberg LLP | Los Angeles, CA | Law firm | (310) 201-9150; www.glancylaw.com
Goode Intelligence | London, UK | Market research | +44 20 3356 4886; www.goodeintelligence.com
Google | Mountain View, CA, and Cambridge, MA | Voice and directory search | (650)253-0000; www.google.com; www.google.com/mobile; www.grandcentral.com
Goya Foods | Secaucus, NJ | Food company | (201)348-4900; www.goya.com
Grain Media | Hsinchu City, Taiwan | Integrated circuits | +886.3.564.5533; www.grainmedia.com
Health Sciences North | Sudbury, Canada | Healthcare organization | (705)523-7100; www.hsnsudbury.ca
Hearst Corporation | New York, NY | Media and information | www.hearst.com
Honda | Japan | Automobiles | www.honda.com
Horizon Private Cloud | Lake Forest, CA | Cloud desktop and application virtualization solutions | (888)652-2948; www.horizonprivatecloud.com
HTC | Taiwan, R.O.C. | Smartphone and PDA Phone devices | +886-3-3753252; www.htc.com
HyperQuality | Seattle, WA | Quality assurance and business intelligence for contact centers | (206)283-7119; www.hyperquality.com
ICMI | Colorado Springs, CO | Contact center services | (719)268-0328; www.icmi.com
IDC | Framingham, MA | Market research | (508)988-7988; www.idc.com
inContact, Inc. (formerly UCN) | Salt Lake City, UT | On-demand contact center services | (801)320-3200; www.inContact.com
Industrial Technology Research Institute (ITRI) | Chutung, Hsinchu, Taiwan | Research organization | +886-3-582-0100; http://www.itri.org.tw
Infinity CCS | Birmingham, UK | Contact Manager platform | +44 121 450 7830; www.infinityccs.com
Integrated Document Solutions (IDS) | Ft. Lauderdale, FL | Medical dictation service | (954)484-0969; www.idssite.com
Interactive Intelligence Group Inc. | Indianapolis, IN | Unified Communications and IVR | (317)872-3000; www.ININ.com
International Computer Science Institute (ICSI) | Berkeley, CA | Research institute | www.icsi.berkeley.edu
International Research Consortium (USTAR) | London, UK | Research organization | http://ustar-consortium.com
iSpeech | Newark, NJ | Application developer toolkit | (877)447-7332; www.iSpeech.org
ITEC | Dubai, UAE | Academic/training institute | www.itec.ae
Kantar Worldpanel ComTech | -- | Consumer panels | www.kantarworldpanel.com
KRP Communications | Burnaby, BC, Canada | Unified communications integrator | (604)433-1530; www.krpcomm.com
LANDesk Software | South Jordan, UT | IT software | (801)208-1500; www.landesk.com
Lexifone | Haifa, Israel | Voice translation by phone | www.lexifone.com
Louisiana Department of Health and Hospitals | Baton Rouge, LA | Health organization | (225)342-9500; http://new.dhh.louisiana.gov
M-Way Solutions | Stuttgart, Germany | Mobile solutions for the automotive industry | +49 711 49066 - 460; www.mwaysolutions.com
M*Modal (MModal, was MedQuist) | Pittsburgh, PA | Speech recognition technology for healthcare transcription | (412)422-2002; www.mmodal.com
Magyar Telekom | Budapest, Hungary | Telephone service provider | +36 1 458 0000; www.telekom.hu
me2me | Zurich, Switzerland | Storing and retrieving personal information by phone | www.me2me.com
Mercedes-Benz | Germany | Automobiles | www.mercedes-benz.com
Mercedes-Benz USA | Montvale, NJ | Automobiles | (201)573-0600; www.mbusa.com
Microsoft Corporation | Redmond, WA | Various applications, products, and services | (206)454-2030; www.microsoft.com/speech
Mobile Future | Washington, DC | Coalition advocating innovations in wireless technology and services | (866)459-5998; www.mobilefuture.org
Motorola Mobility (acquired by Google) | Downers Grove, IL | Mobile phones, portable devices | (630)353-8000; www.motorola.com
MultiLing | Provo, UT | Translation services | (801)377-2000; www.multiling.com
National Cheng Kung University | Taiwan | University | http://english.web.ncku.edu.tw
National Consumers League | Washington, D.C. | Consumers organization | (202)835-3323; www.nclnet.org
National Federation of the Blind | Baltimore, MD | TTS for visually impaired | (410)659-9314; www.nfb.org
Nexidia | Atlanta, GA | Audio content search | (404)495-7220; www.nexidia.com
Northwest Multiple Listing Service (NWMLS) | Kirkland, WA | Real Estate broker consortium | www.nwrealestate.com
Nuance Communications | Burlington, MA | Speech technology, applications, and services | (617)428-4444; www.nuance.com
Nuvoton Technology Corp. | Hsinchu Science Park, Taiwan | Integrated circuits | +886-3-5770066; www.nuvoton.com
OfCom | London, UK | Independent regulator and competition authority for UK communications industries | +44 300 123 3000; www.ofcom.org.uk
One Equity Partners | New York, NY | Investment firm | www.oneequitypartners.com
ORC International | Princeton, NJ | Surveys | (800)444-4672; www.orcinternational.com
Piper Jaffray | Minneapolis, MN | Market research | (612)303-6000; www.piperjaffray.com
Polska Telefonia Cyfrowa (subs. of Deutsche Telekom) | Warsaw, Poland | Telephone services | www.t-mobile.pl
Pronexus | Ottawa, Ontario, Canada | Speech recognition computer telephony | (613)271-8989; www.pronexus.com
Providence Health & Services | Renton, WA | Hospital and health services | www2.providence.org
Raytheon | Waltham, MA | Aerospace and defense company | (781)522-3000; www.raytheon.com
RMT Robotics Ltd. | Grimsby, ON, Canada | Industrial robots and accessories | (905)643-9700; www.rmtrobotics.com
Salesforce.com | San Francisco, CA | Hosted CRM software | (415)901-7000; www.salesforce.com
Samsung Electronics | Seoul, South Korea | Wireless telephones and TVs | www.samsung.com
Sandata Technologies | Port Washington, NY | IT solutions for home healthcare and social services | (516)484-4400; www.sandata.com
Siemens Enterprise Communications (SEN) | Munich, Germany | Enterprise business process software | +49 69 2222 7846; www.enterprisecommunications.siemens.com
SoundGecko | -- | App converting articles to speech | http://soundgecko.com
SpeakGlobal, Ltd. | Kobe, Japan | English as a Foreign Language training | www.speakglobal.co.jp
Speaktoit plc | Newark, DE | Personal assistant mobile app | www.speaktoit.com
Spoken Communications | Bellevue, WA | Call center and service provider solutions | (206)428-6044; www.spoken.com
SRI International | Menlo Park, CA | Speech recognition and language R&D | (650)859-2000; www.sri.com
Strikeforce Technologies | Edison, N.J. | Two-factor authentication for web sites | (732)661-9641; www.strikeforcetech.com
TalkTalk Group | London, UK | Fixed line broadband, voice telephony, and mobile services | +44 20 3417 1000; www.talktalkgroup.com
TeleNav | Santa Clara, CA | Navigation services | (408)245-3800; www.telenav.com
Terra Nova | St. John's, Canada | Clinical documentation solutions | (888)600-4178; http://terranovatrans.com
The Hispanic Institute | Washington, DC | Non-profit education forum | www.thehispanicinstitute.org
TradeHarbor | St. Louis, MO | Speaker verification applications | (314)878-1200; www.tradeharbor.com
Trapit | Palo Alto, CA | Personal assistant app | http://trap.it
United Hospital System | Wisconsin and northern Illinois | Regional healthcare system | www.uhsi.org
University of Kansas | Lawrence, KS | University | (785)864-2700; www.ku.edu
Verint Systems | Melville, NY | Call center and security solutions | (631)962-9600; www.verint.com
Verizon Business | Los Angeles, CA | Enterprise telephone solutions | (213)625-1005; www.verizonbusiness.com
Veveo | Andover, MA | Usability solutions for connected smart devices | (978)687-8240; http://corporate.veveo.net
Vlingo (acquired by Nuance) | Cambridge, MA | Voice-powered interface for mobile phones | (617)871-2987; www.vlingo.com; www.vlingomobile.com
Vocal Laboratories (Vocalabs) | Golden Valley, MN | Usability testing | (952)941-6580; www.vocalabs.com
Vocre | -- | Translation app | www.vocre.com
Voice Assist | Lake Forest, CA | Hosted speech services | (949)257-0923; www.voiceassist.com
Voice Automated | Lake Forest, CA | Voice to text workflows for vertical markets | (714)969-7632; http://store.voiceautomated.com
VoiceVault | Dublin, Ireland | Voice verification technology and service | +353 1 603 9500; www.voicevault.com
VoiZapp Inc. | Austin, TX | Tweet-to-voice and Facebook-to-voice | (512)850-5803; www.voizapp.com
Voxbone | Brussels, Belgium | Inbound VoIP provider | +32 28 08 00 00; www.voxbone.com
Voxeo | Orlando, FL | Voice hosting and contact center solutions | (407)418-1800; www.voxeo.com
Voxeo Labs (part of Voxeo) | San Francisco, CA | Phone platform research | www.voxeolabs.com
W3C Multimodal Interaction working group | -- | Standards effort | www.w3.org/2002/mmi
Wavelink Corporation | Midvale, UT | Mobile application development and mobile infrastructure management software | (888)697-9283; www.wavelink.com
Companies mentioned in this issue Company West Corporation West Interactive (unit of West Corp.)
Wolfram|Alpha World Bank Yankee Group Yelp Location Omaha, NE Omaha, NE Cambridge, MA Washington , DC Boston, MA San Francisco, CA Product Mentioned Communication solutions Out-sourcing of customer contact solutions Knowledge search web site International bank Market research Web review service Contact info www.west.com (402)963-1300; www.westinteractive.com (217)398-0700; www.wolframalpha.com www.worldbank.org (617)598-7200; www.yankeegroup.com www.yelp.com 42 Speech Strategy News August 2012 43 Free Blog (with a chance to comment!) Meisel-on-Mobile (www.meisel-on-mobile.com) Will this make you watch TV advertisements? Siri gets smarter The ultimate mobile user interface: brain implants!? Mobile marketing: Engaging your customer on a mobile device Major themes at the Mobile Voice Conference What does it take for mobile personal assistants to “understand” us? Voice control of your TV: Is it listening to everything you say? I wish to subscribe to Speech Strategy News for one year (12 issues), payable in US$ on US bank— Individual* Corporate* Individual* Corporate* PDF PDF PDF PDF 6 monthly issues 6 monthly issues 12 monthly issues 12 monthly issues $215 $750* $425 $1,495* * Corporate subscriptions: Unlimited users within a corporation for PDF version with Web access through corporate password. Individual subscriptions cannot be shared (neither passwords nor electronic copies). Please send information on your consulting. Name: Company: Address: Check enclosed, payable to TMA Associates (in U.S. $ on a U.S. bank). Invoice me. Charge my— Visa MasterCard American Express City, State ZIP/Postal code Card # Country Expiration date: Email (required for email alerts or a Web subscription): Signature: _______________________________________________ Phone: Copyright TMA Associates 2012; All rights reserved. TMA Associates, P.O. Box 570308, Tarzana, CA 91357- 0308 USA. Tel: (818) 708-0962. Fax: (818) 232-0368, or go to www.tmaa.com/subscribetossn. 
Speech Strategy News is published twelve times per year by TMA Associates, Editor: William S. Meisel. Trademarks mentioned in this publication are the property of the companies mentioned; they are used editorially. The material herein is based on data from sources believed to be reliable, but is not guaranteed as to accuracy and does not purport to be complete. From time to time, the author or TMA Associates may have consulting assignments, advisory positions, own stock, or have other business relations with organizations in speech recognition and associated areas, including companies discussed in this newsletter. Speech Strategy News is a trademark of TMA Associates.