On the 12th of February the Belgian speech technology industry gathered in the Belgacom Towers for the second edition of the ContactCentres.be seminar “Speech Technology: Customer Experiences in Belgium”. Apart from seminar host Belgacom, the event was also sponsored by Acapela, DBScape, Genesys, NextiraOne, Nuance, Quentris and The Ring Ring Company. Chairman Alain Rondenbosch of the organizing ContactCentres.be Speech Technology Workgroup was very pleased to welcome a larger audience than at the first edition in November 2006.
The format of the seminar is simple: give the floor to the actual end users of speech technology – mainly call centre managers in our setting – rather than to the vendors. As a novelty, Voice User Interface (VUI) design expert Tom Houwing, director of VoiceAndVision, was invited this year as keynote speaker and moderator.
In the first part of his keynote address, Tom Houwing sketched a painful but – sadly enough – realistic picture of how otherwise professionally run call centres give the automated voice channel the proverbial stepmother treatment. Everyone knows how to speak and listen, right? Not! The causes of voice user interface neglect by automation project champions are naivety, ignorance, or downright complacency. These are my own terms, not Tom Houwing’s, but I’m sure he would agree. Mr. Houwing gave the example of a large utility company that failed to apply even the simplest of user interface design principles to its main customer service line: offer the most frequently chosen option in first place, instead of in sixth.
On a positive note, I must mention Tom Houwing’s wonderful account of a stepwise disambiguation strategy for the speech recognition of loosely structured alphanumeric license plates: the uncertainty implicit in the N-best result list is personified by a dumb subordinate who doesn’t know how to jot down license plate numbers; he is consequently yelled at by his condescending, bossy boss, who himself is interfacing with the caller – in a more charming tone. This is one brilliant example of how an inherent challenge of speech recognition is not just being overcome, but turned into a funny customer experience that also works, in all meanings of the word.
As could be hoped or feared, the end user presentations that followed would give the keynote speaker and the attentive audience more than enough inspiration on do’s and don’ts in VUI and customer interaction design.
Marketing Manager Catherine De Baets of the Belgacom Call Center explained how a legacy static script-based IVR was transformed into a fully dynamic version. The business goal was to react faster to a rapidly changing market in the sales, marketing and complaint handling domains. By relying on a centralized, cross-channel Customer Relationship Management (CRM) database, all IVR menus at Belgacom Call Center are now data-driven. This way, customers who are not eligible for a certain service (e.g. Belgacom TV) are no longer offered this option – and that is certainly a bonus. Even more important is the flexibility gained by Belgacom’s internal business users, who no longer rely on the IT department to deploy their marketing initiatives. As a consequence, the time to market of a new promotion or other marketing action has dropped to between 4 hours and 2 weeks.
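To make the data-driven idea concrete, here is a minimal sketch in Python of a menu that is filtered by the caller's profile. It is not Belgacom's implementation, which was not shown at the seminar. The `MenuOption` class, the `build_menu` function, the `"tv"` eligibility flag and the menu labels are all hypothetical names I made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class MenuOption:
    label: str
    # Hypothetical eligibility flag the caller's profile must contain;
    # None means the option is offered to everyone.
    requires: Optional[str] = None


def build_menu(options: List[MenuOption], eligibilities: Set[str]) -> List[str]:
    """Return the prompts to play, skipping options the caller is not eligible for."""
    offered = [o for o in options if o.requires is None or o.requires in eligibilities]
    # Renumber after filtering so the caller never hears a gap in the digits.
    return [f"Press {i} for {o.label}." for i, o in enumerate(offered, start=1)]


# Illustrative menu definition; in a data-driven IVR this would come from
# a database maintained by business users rather than from code.
OPTIONS = [
    MenuOption("billing questions"),
    MenuOption("Belgacom TV", requires="tv"),
    MenuOption("technical support"),
]

if __name__ == "__main__":
    for line in build_menu(OPTIONS, eligibilities=set()):
        print(line)
```

The point of the sketch is that the menu definition is data, not script: adding a promotion means adding a row, and the filter decides per caller whether it is played.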
Although, according to the speaker, the profile-based approach has enhanced customer satisfaction, the question must be asked whether the exclusive reliance on text-to-speech – identified from the start as a key success factor, by the way – is really necessary. It is a common misconception that because text-to-speech is particularly apt at reproducing dynamic text, all dynamic text necessarily needs to be reproduced by text-to-speech. The real decision criterion should be whether the structure of the text itself is predictable or not. Judging from a test call I made after the seminar, the IVR menus are indeed dynamic, but their structure is not. As a result, the audio quality of the overwhelming majority of menu items could be greatly enhanced by prerecording and postproducing them once and for all, with enough variation for good prosody. Audio file concatenation would then take care of the rest. This would, as an example, prevent phone numbers from still being read aloud one digit at a time, which is not customer friendly at all, let alone state of the art.
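As a rough illustration of what audio file concatenation could look like for phone numbers, the Python sketch below turns a number into a playlist of prerecorded files, grouping the digits in pairs and picking a different recording for the last group so that the intonation falls at the end. The file naming scheme (`num_23_mid.wav`, `num_89_final.wav`), the pairwise grouping and the function name `prompt_plan` are my own assumptions, not something presented at the seminar.

```python
from typing import List


def prompt_plan(number: str, group_size: int = 2) -> List[str]:
    """Map a phone number to a list of prerecorded audio files to concatenate.

    Assumes a hypothetical prompt library with one recording per digit group
    in two prosodic variants: "mid" (continuation) and "final" (falling).
    """
    digits = "".join(c for c in number if c.isdigit())
    groups = [digits[i:i + group_size] for i in range(0, len(digits), group_size)]
    files = []
    for index, group in enumerate(groups):
        contour = "final" if index == len(groups) - 1 else "mid"
        files.append(f"num_{group}_{contour}.wav")
    return files


if __name__ == "__main__":
    # Made-up number, used only to show the grouping.
    print(prompt_plan("0123 45 67 89"))
```

With two-digit groups in two variants, the prompt library stays small enough to record in a single studio session, which is the whole argument for prerecording text whose structure is predictable.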
In contrast to the kind of on-the-fly text-to-speech discussed above, BeTV’s System & Network Manager Jean-Michel Motte presented a use case of the same technology in the less dynamic setting of a movie ordering service by phone. Acapela’s Virtual Speaker, to name the product, was showcased as an off-line technology situated half-way between dynamic text-to-speech and prerecorded, concatenated prompts. The tool allows off-line prompt creation and enhancement, but without the need for a studio. Or a speaker, for that matter, but do note you can roll your own TTS voice these days. Interestingly enough, Mr. Motte showed how the correct pronunciation of foreign names like his own company’s name could be forced by misspelling it, rather than by creating an entry in the pronunciation lexicon. A pragmatic use of speech technology, for sure!
Valérie Nève, Call Center Manager at Leroy Merlin in Lille (Northern France, close to the Belgian border) presented a call center routing and CRM integration case based on Genesys, Vocabase and Acapela technology on the IVR side, with NextiraOne France as integrator. In 2007, the company’s 25 expert advisors handled 107,000 incoming phone calls. A major objective of the automation project was to bring down the abandonment rate from 30 to 5 percent. As each contact represents a revenue opportunity, losing one call out of three is clearly unacceptable. Through DTMF menus, customers now select the competence level required to answer their question, and get transferred to the right agent. In case no agent is available, the CRM system immediately proposes a call-back time frame. The IVR line also gives access to other personalized services like loyalty card information, credit services, and order taking.
At this stage the Leroy Merlin project focus seems to have been more on back-end CRM integration than on front-end speech automation – speech recognition is indeed still on the to-do list. As in the Banksys case presented at the previous seminar, speech technology often only comes into play after the necessary back-end processes have been put into place. In a way, this is good, because it reduces project complexity and prevents speech technology or classic IVR front-ends from getting blamed for failing back-end processes. This being said, there’s not much point either in creating wonderful back-end solutions if they can’t be reached because of an overly complex IVR menu structure. As far as I can recall, the presentation did not mention any hard figures on customer satisfaction. In any case, my advice to Leroy Merlin would be to benchmark the current DTMF-only interface, have a VUI expert – not an engineer! – design the speech interface, implement it, and measure customer satisfaction and other relevant business key performance indicators again.
After the break, we got news from Lisa, Belgacom’s fully automated directory assistance, reachable in Belgium at 1234. The last time I heard from her was almost three years ago at Voice World 2005 in London. When the service was launched in October 2004, it failed to live up to the high expectations raised by the publicity campaign; that some journalists openly tried – and still try – to ridicule the system did not help its reputation either. A bit cheap, because it would be just as easy to “prove” that DTMF recognition doesn’t work if you decide to wear boxing gloves. Anyway, whether we like it or not, many people inside and outside the industry still perceive and refer to Lisa as their personal benchmark of state-of-the-art speech technology in Belgium. So any news from Lisa is important news.
To put everything into perspective, Guido Vermeire, Projects & Technical Support Manager at Belgacom Directory Information Services, started his exposé with some figures from the human-manned DA numbers 1207 (Dutch), 1307 (French), 1407 (German) and 1405 (English). In 2006, about 480 call center FTEs handled a bit less than 42 million DA inquiries: that’s 3.4 million a month, 115 thousand a day, or 240 per FTE per day. In 2005, Lisa/1234 handled 33 thousand calls a month – a mere 1% of all DA inquiries. The goal of 83% of these automated inquiries was to find the phone number of a residential or business listing; the remaining 17% were reverse queries, starting from a known phone number. For 2006 and 2007, no figures were made available. In 2008, the monthly call volume of Lisa/1234 has risen to 46 thousand – still negligible compared to the non-automated case – but the breakdown by call type has completely changed: only 29% of normal business or residential queries, versus 71% of reverse ones.
I could not suppress a deep frown when hearing these numbers. But there was more disturbing news. Of all normal inquiries handled by Lisa, only 29% lead to a fully automated result. In 39% of the cases, Lisa decides to transfer the caller to a human colleague. That is, if the caller hasn’t asked for such a transfer in the first place, which happens in another 22% of the cases. The remaining 12% cover the premature hang-ups. In reverse search, the success and automation rates are around 85%, which is understandable as the automation task only involves recognition of a phone number, which can be done via DTMF.
All in all, this means that Lisa automates around 3900 normal business or residential inquiries a month. That’s 130 calls a day, or half an FTE.
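For readers who want to check the arithmetic, the back-of-the-envelope calculation can be written out as a few lines of Python. The call volume, the two 29% shares and the 240 inquiries per FTE per day are the figures quoted in the talk; the 30-day month is my own simplifying assumption, and the function name is just for illustration.

```python
def lisa_automation_estimate(
    monthly_calls=46_000,        # Lisa/1234 calls per month in 2008 (from the talk)
    share_normal=0.29,           # share of normal (non-reverse) inquiries
    share_automated=0.29,        # share of normal inquiries that are fully automated
    days_per_month=30,           # assumption: a 30-day month
    inquiries_per_fte_day=240,   # throughput of a human agent, as computed earlier
):
    """Return (automated inquiries per month, per day, FTE equivalent)."""
    normal_per_month = monthly_calls * share_normal
    automated_per_month = normal_per_month * share_automated
    automated_per_day = automated_per_month / days_per_month
    fte_equivalent = automated_per_day / inquiries_per_fte_day
    return automated_per_month, automated_per_day, fte_equivalent


if __name__ == "__main__":
    per_month, per_day, fte = lisa_automation_estimate()
    print(f"{per_month:.0f} per month, {per_day:.0f} per day, {fte:.2f} FTE")
```

Unrounded, this comes to roughly 3,870 inquiries a month, 129 a day and 0.54 FTE, which I have rounded to 3900, 130 and half an FTE.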
Mr. Vermeire should be admired for his openness and honesty in sharing the list of technical and voice user interface challenges that the Lisa project had to overcome. It was not the list itself that surprised me, but the apparent lack of answers to the various problem areas. For example, the “spoke too late” or “spoke too soon” effect needs to be tackled by good prompt design and timeout settings in any speech recognition project. Stopwords and garbage models have been inserted in speech recognition grammars for over a decade. If disambiguation by street name is not an ideal solution, then maybe the solution needs an update. If the dialog is described as “customer friendly but directed”, some serious misconceptions have taken root about good VUI design. And if customers are afraid to be transferred to an operator because it will cost them money – why not take away their fear with a simple prompt? In short: if the water tap is leaking, why not repair it?
The simple but disconcerting answer may be that there has never been a clear and integrated strategy behind the initial decision to develop and launch Lisa. Was there a need for cost reduction in the call center? Probably not: the unions more than likely would have disagreed. Did callers spend too much time in the waiting queue on the human-manned DA line before being served? We don’t know for sure. Was the money spent on marketing a new 1234 number in line with the expected quality of the service running behind it? I’m afraid not. Were the key business and technical performance indicators clearly stated from the start? If they were, they were not presented at the seminar.
I do apologize for all this frankness, but Belgium deserves a better automated DA service than Lisa is able to provide today. If Belgacom won’t or can’t do it, maybe someone else will. Time will tell.
The last speaker of the day was Hans Van Hauteghem, Manager of Product Development at the Royal Meteorological Institute of Belgium. He was assisted on the linguistic side by yours truly. Together with its partner The Ring Ring Company, the RMI has been offering weather information services by phone for more than 10 years, using prerecorded messages at the start. In October 2006, they switched to a text-to-speech-only solution, thereby addressing concerns about consistent message quality, and improving the speed of the service. Weather reports are now updated 5 times a day across all media channels, and made available to the general public and specialized target customers alike. Simple and highly structured data types like sea tides, ephemerides, and UV radiation are coded in XML and only transformed into text later on by a text generation module residing on the IVR provider’s side. More complex, less structured data like the full weather forecasts are manually written by the RMI meteorologists before they cross the wire to the TTS platform.
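To give an idea of what such a text generation step amounts to, here is a minimal Python sketch that turns a small XML fragment with tide data into sentences ready for a TTS engine. The XML schema, the element and attribute names, the sentence template and the sample times are all invented for this illustration; the actual RMI format is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Invented sample data in a hypothetical schema, for illustration only.
SAMPLE_TIDES_XML = """
<tides location="Ostend">
  <tide type="high" time="04:12"/>
  <tide type="low" time="10:37"/>
</tides>
"""


def tides_to_text(xml_string: str) -> str:
    """Generate one plain sentence per tide entry, in document order."""
    root = ET.fromstring(xml_string.strip())
    place = root.get("location")
    sentences = [
        f"{tide.get('type').capitalize()} tide at {place} is at {tide.get('time')}."
        for tide in root.findall("tide")
    ]
    return " ".join(sentences)


if __name__ == "__main__":
    print(tides_to_text(SAMPLE_TIDES_XML))
```

Because the data is this regular, a handful of sentence templates is enough, which is why only the free-form forecasts still need a human author.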
The solution is not exactly what you’d call rocket science, but it works fine. Indeed, the value of a speech application is not necessarily linked to the amount of technical sophistication that the average IT engineer would like to throw at it. Small and simple can be beautiful! The 160 thousand callers who use the fully automated weather line on a yearly basis surely seem to agree.
In his closing words, Tom Houwing sent a tactful but strong message to the audience that the speech technology world doesn’t stop at the Belgian border. Numerous successful, state-of-the-art and fun speech technology applications are being showcased every year at Voice Days in Germany or SpeechTek in New York.
To conclude in my own words: if the Belgian market is somewhat struggling, this has more to do with a lack of vision, ambition and belief from the large players, than with innate peculiarities of speech technology implementation practices in Belgium as such. If we need to learn one lesson, it’s that successful speech applications don’t depend on speech technology in the first place, but on human factors.
And now, back to work!