Sea, speech and sun at Nuance Conversations in Cannes, France

Nuance Conversations Europe 2007 was held last week in the legendary Carlton Inter-Continental hotel in Cannes, France, home to many a movie star. After last year’s successful first edition in Mallorca, expectations among the 250+ speech technology crowd were high. Apart from the traditional contact centre business and technology sessions, the conference featured new mobile and automotive tracks, a testimony to the ubiquity of speech technology in our daily lives. The common thread of this year’s conference was “Elevating the User Experience”.

Carlton Inter-Continental hotel in Cannes, France

Nuance executive Peter Hauser painted a rosy picture of the state of the industry in general, and of his company in particular. As basic speech technologies have matured to the point of becoming commoditised, the real battleground for the industry has shifted to the customer experience. In the same year that Dragon NaturallySpeaking celebrates its 10th birthday, Nuance has combined the best-of-breed features of its various legacy ASR (automatic speech recognition) engines into Recognizer v9, which is now available worldwide. The new recognizer plays a pivotal yet unobtrusive role in Nuance’s vision for multi-modal mobile solutions, described below.

To bring this vision to market, Nuance relies heavily on its partner network, so naked motivator David Taylor was called in to fire up the audience about their own dreams, and to share some simple (cynics would say simplistic) principles on how to realise them. Whether or not you like Mr. Taylor’s style, I guess there’s some truth in his truisms.

But let’s not get distracted. What is the mobile vision about? Here’s how Nuance sees things: users push a single Voice button and talk. The device then does one of two things: it either performs the recognition by itself, or hands it off to a recognition server in the network. The recognition result is fed into a diverse set of applications, including directory assistance, voice mail, music catalogue search, music playback and navigation, to name just a few. Depending on the application context and the device capabilities, the requested information is then presented as text, graphics, audio and/or video. The basic technology to enable this kind of “elevated user experience” is there, and was convincingly demonstrated in Cannes. The million-dollar (or rather, billion-dollar) question is therefore no longer about technology. So what are the main drivers and barriers that will make or break this whole new range of mobile services?
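For readers who think in code, the flow Nuance described (one Voice button, recognition on the device or in the network, then output matched to the application and handset) can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the `Device` fields, the rule of preferring on-device recognition whenever an embedded recogniser exists, and the choice of output modalities are all hypothetical, not taken from Nuance’s products or APIs.

```python
from dataclasses import dataclass


# Hypothetical device profile; field names are illustrative, not a Nuance API.
@dataclass
class Device:
    embedded_asr: bool  # can the handset recognise speech on its own?
    online: bool        # is a network recognition server reachable?
    screen: bool        # can the handset display text and graphics?


def route_recognition(device: Device) -> str:
    """Decide where the utterance captured after the Voice button press is recognised."""
    # Assumed policy: use the on-device recogniser when there is one,
    # otherwise hand the audio off to a recognition server in the network.
    if device.embedded_asr:
        return "device"
    if device.online:
        return "network"
    raise RuntimeError("no recogniser available")


def choose_output(device: Device, application: str) -> list[str]:
    """Pick presentation modalities for the result of a given application."""
    modalities = ["audio"]  # assumption: spoken feedback is always possible
    if device.screen:
        modalities += ["text", "graphics"]
    if application == "music playback":
        modalities = ["audio"]  # playing a tune only needs the audio channel
    return modalities


phone = Device(embedded_asr=False, online=True, screen=True)
print(route_recognition(phone))                      # network
print(choose_output(phone, "directory assistance"))  # ['audio', 'text', 'graphics']
```

A real system would of course weigh far more factors (vocabulary size, network cost, battery), but the split between where recognition happens and how the answer is presented is the essence of the vision as presented in Cannes.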

Nuance Conversations Europe 2007 in Cannes, France

Elements of an answer were provided on Wednesday by a panel consisting of Peggy Ann Salz (Informa), Pascal Coutier (Logan Orviss International), Trond Lund (Fast) and Marcel Pirlich (Arvato Mobile). Drivers for the uptake of experience-rich multi-modal services include personalisation (knowledge of past usage patterns), context-awareness (current time, place and activity) and flat rates. Barriers include poor usability (see my previous post on free 411 systems in the US), cumbersome installation procedures for rich clients, the absence of data bundling in Europe, the limited availability of capable handsets and, above all, the lack of decent business plans. There is a consensus in the mobile industry that the mobile search market will eventually be driven by advertising (a market estimated at $6.5 billion by 2011). Speech is seen as a major enabling technology in this space, as it facilitates data input in application contexts where secrecy and discretion are not an issue. In hands-free contexts like cars or warehouses, speech is even the only legally allowed or practical means of inputting data.

Frederik Durant at Nuance Conversations 2007 in Cannes, France

To prove their point about the speed, naturalness and safety of the speech interface, Nuance put former Formula One driver Perry ‘The Stig’ McCarthy on stage behind the wheel of a virtual sports car, and asked him to read an SMS and select a given iPod tune while driving on the Monaco F1 circuit at “normal” speed. After 10 minutes and 20-odd crashes, the audience finally got to hear some music. Perry McCarthy’s opponent from Nuance got the same job done with TTS (text-to-speech) and speech input in a minute or so, without a single scratch to his car, although he did drive a bit slower, I must admit.

Just say 'The Supremes - Ain't No Mountain High Enough' and enjoy the multi-modal experience!

In the Contact Centre Business track I attended two customer testimonials.

Melanie Rowland, Head of Self Service and Automation-IVR at Vodafone UK, presented her company’s long journey from organically grown IVR silos (a real customer nightmare) to speech-enabled services hosted by Vicky, Vodafone’s virtual advisor. When the journey started in 2005, no less than 19% of the 6 million weekly inbound telephony contacts were simply abandoned, and only one caller in four was able to select the right option for their query. Vodafone first fixed the existing IVRs, which made the automation rate for post-pay customers, for example, jump from 20% to 45%. This investment paid for itself in less than five months. As customers were still dissatisfied with the existing voice, Vodafone then went a step further and created Vicky, a virtual persona who embodies Vodafone UK’s brand essence and personality. Since then, customer satisfaction rates have risen dramatically. Vodafone UK now plans to automate 60% of all inbound traffic by next year. As success factors, Ms. Rowland cited the primacy of the customer experience over technology concerns, and stressed the importance of well-defined, consistently tracked key performance indicators. Getting early buy-in from the business, marketing, technology and customer service departments through open communication is also key to success.

Melanie Rowland presents her colleague Vicky, the Voice of Vodafone UK 

On Wednesday afternoon Ross Moody, Head of Shared Services at Standard Bank in South Africa, explained how his company’s initial exploratory initiative to integrate two different platforms turned into a strategic thinking exercise as the full potential of VoiceXML-based speech technology became apparent. In a country with 11 official languages, customers more often than not use a language other than their mother tongue when interacting with service departments or systems. With help from Intelleca, Standard Bank therefore had to spend considerable effort on collecting data, tuning acoustic models, redesigning grammars and adding phonetic transcriptions to the pronunciation dictionaries in order to achieve the desired performance levels. The use of a separate male persona for the wizard/guide, as opposed to the female virtual agent, seemed awkward to me, as it made the whole user experience slower and overly complex. Despite this drawback, customer feedback on the new, integrated speech solution was said to be “overwhelmingly positive”.

The Carlton Inter-Continental on La Croisette in Cannes, France

Like last year, Chief Scientist Vlad Sejnoha offered a glimpse into Nuance’s R&D future. The new vision, called Care 2.0, is driven by a common technology base that integrates elements from Nuance’s traditional core technologies, Dictation, Network and Embedded, which are converging. The common platform is backed by huge amounts of available performance data and the computing grids to process them, by similar requirements and approaches across device categories and applications, and by seamless APIs. In this world of converged and always-available basic technologies, the new challenge is to find effective ways of combining speech and visual components in order to offer compelling multi-modal, experience-rich services. The new quest is to discover which applications users want to talk to, and what type of service users need in a given context.

Unless I was blinded by the April sun, the perfect temperature and the excellent food & drinks, Nuance Conversations Europe 2007 led me to one conclusion: the present and future of speech technology look very bright indeed.
