Archive for the ‘VUI Design’ Category

Google gives acte de présence with 1-800-GOOG-411

Saturday, April 14th, 2007

Barely waiting for Microsoft and Tellme to return from their honeymoon, Google Labs recently launched Google Voice Local Search, an experimental 411 (directory assistance) service. For the moment, 1-800-GOOG-411 just offers US local business listings, directly accessible from any US phone. But with a Grandstream SIP phone, an Asterisk PBX and a gateway like FreeWorldDialup, this minor nuisance is quickly bypassed.

So instead of speculating if, when and how Google will integrate the new service into its pay-per-click or pay-per-call advertising model, I just called 1-800-GOOG-411 for a quick try-out. Jingle Networks’ 1-800-FREE-411 service was chosen as Google’s sparring partner.

To make the test a bit more fun and real for myself, I decided to only search for US businesses that I have actually visited at some point in time. This way I not-so-randomly picked David’s World Famous catering service in Burlington, MA; the MIT COOP bookstore in Cambridge, MA; and the Starbucks on El Camino Real in Palo Alto, CA.

First some food from David’s World Famous. My call to 1-800-GOOG-411 was answered by a neutral-sounding male voice saying “calls recorded for quality”. Notice the absence of any verb? After two seconds I got a pre-recorded prompt “GOOG-411 experimental. What city and state?” My answer “Burlington, Massachusetts” was well recognized and explicitly confirmed by the system. To the next question “what business name or category?” I said “David’s World Famous”. There was a short database lookup and after 21 seconds into the call, I got presented with the top-2 results. I chose the first one and could have been connected directly to the catering service after 41 seconds, if I had wanted to. Instead I asked for more address details, which another male TTS voice read aloud twice, presumably to give me a chance to jot it down. The phone number was read correctly, in a conversational, natural way. After this self-chosen digression, I was connected to the David’s World Famous answering machine – not a surprise, really, as the local time in Massachusetts at that moment was well after midnight.

I then tried the same procedure through 1-800-FREE-411, at least that’s what I had in mind. “Welcome to 1-800-FREE-411! Press 9 now to get the last number you requested”, said a female pre-recorded voice. I wasn’t interested in that, so I kept silent. After 12 seconds, a first commercial invited me to take part in Stonebridge Life’s $25,000 give-away. Er, maybe some other time. Thirty-one seconds into the call, I got a “What city and state, please?” prompt, and said “Burlington, Massachusetts”. There was no explicit confirmation; instead the system immediately continued with “Are you looking for a business, government or residential listing?” “A business listing”, I said. Again no confirmation, but another prompt “Would you like to search by name or by category?” “By name”, I answered. “OK, what listing?” “David’s World Famous”, I said. Now things became funny. The call was sponsored by “Girls Gone Wild”, who offered me two videos for free, meaning I just had to pay shipping and handling costs. Yeah, right. Not that I dislike oriental food, but hot ‘n’ spicy DVDs were not exactly what I had asked for. Anyway, back to the call. A flat female voice brought me down to earth with the message “the number you requested is seven eight one – two two nine – eight seven eight six”. You would think any decent VUI designer knows by now that US phone numbers don’t get read this way, but apparently not so at 1-800-FREE-411. What’s worse, after I’d heard the requested phone number, I was presented with two options: hear it again, or get connected to … Girls Gone Wild. While I was waiting for the obvious third option that would connect me to David’s World Famous, the system again threw the flat-spoken number at me, and prompted me for yet another repeat. Just when I thought I was finally going to be connected, the system thanked me for calling, made some more publicity about their own website “to learn about other special offers” and then hung up.
Two minutes and five seconds had gone by, and I was still left with an empty stomach.

After the stomach, time for the brain. I called 1-800-GOOG-411 again, now searching for the MIT Coop bookstore. The speech recognition of “Cambridge, Massachusetts” went smoothly, as expected. Alas, the business name turned out to be more problematic, with its two abbreviations. “MIT” stands for Massachusetts Institute of Technology, and is customarily pronounced one letter at a time: M-I-T. The word “Coop”, although an abbreviation for “cooperative”, is pronounced as an acronym over there, rhyming with “loop” or “soup”. Being a foreigner, I pretended not to know this and said “M-I-T Co-op” at first. Successive attempts to recognize this same pronunciation generated a “no match” leading to a “try again” prompt, and a low-confidence false match with an attached explicit confirmation prompt. The system then presented me with some indirect matches from its database, all of which were irrelevant. After the fourth list item, the Google voice suggested starting all over again, so that’s what I did and said. I now pronounced MIT as an acronym, sounding like the German preposition “mit”, and stuck to “Co-op” for the second part. Apparently I guessed right, because the system literally confirmed my incorrect pronunciations and offered me a short list of three MIT Coop locations. I chose the second one, and after one minute and fifty-five seconds, I was connected to the answering machine of the MIT Coop on Kendall Square in Cambridge, Massachusetts.

My first search for the MIT Coop at 1-800-FREE-411 failed immediately with the message “We’re sorry but no live operators are available at this time. Please try again later”. For an automated system, that’s an illogical answer, especially since 1-800-FREE-411 explains in its own FAQ that they are “no longer supporting live operator services from certain localities”. Subsequent calls [1,2,3,4] did go through, but they all suffered from no matches and false matches, irrespective of my pronunciation of “MIT Coop”. I couldn’t verify if “MIT Coop” was in-grammar or out-of-grammar, but the corresponding web search did return one entry. On the positive side, 1-800-FREE-411 transfers callers to an operator after two failed recognition attempts.

My last search for Starbucks Coffee on El Camino Real in Palo Alto, California went without a glitch at both 1-800-GOOG-411 and 1-800-FREE-411. With Google, I was transferred after 45 seconds; with the other system I got to hear the complete number after one minute and fifty seconds. This time the irrelevant ads were from InCharge Debt Solutions and American Express, respectively.

Before we draw some conclusions, first a warning: no speech recognition system should ever be evaluated on the basis of a few calls and utterances made by a single speaker over a single channel. To do so would not only be unfair, but also unscientific and possibly completely wrong. This being said, my first impression is that Google’s potential entry in the automated DA space should be a major concern for all other players on the US 411 market. As could be expected, the 1-800-GOOG-411 voice user interface is clean and snappy, with various error recovery mechanisms already in place; speech recognition looks good; and the direct transfer to the requested number is an obvious functionality that’s blatantly missing with 1-800-FREE-411. So looking from the technology side, Google seems to know what they’re doing – hardly a surprise.

A bigger challenge for Google or any competitor will be to balance the economic aspects of sponsored local audio ads (remember the DMarc acquisition) with the human interaction limitations of a spoken phone interface. A caller’s tolerance for inserted ads is inversely proportional to the degree of certainty with which the business or category name is entered. If I ask for Starbucks, I want Starbucks’ phone number; but if I just want coffee, multiple relevant results are expected, including sponsored transfers and special offers. With its army of natural language processing specialists, the richness and vastness of its data, and its very deep pockets, Google is well placed to shake the US Directory Assistance industry, if it wants to. Unless it has other priorities, with even bigger returns.

Get human – or get lost!

Monday, August 14th, 2006

Last year Paul English published his notorious IVR cheat sheet to help US customers bypass automated phone systems. Many customer service managers were not amused: they risked seeing their personnel costs skyrocket and the return on their technology investments plummet. Quite an unfortunate development, as the oft-cited goals of (speech-enabled) IVR technology are to reduce costs as well as to increase customer satisfaction.

In a remarkable keynote address held at last week’s SpeechTEK conference in New York City, Mr. English reiterated his plea to customer service managers and CEOs: stop running your customer service department as a cost center; instead, reach out to your customers, and start considering customer contact as an important company asset – which it is. In short: “get human” (again).

The trouble with such a general statement, of course, is that every company executive will agree in principle. But will they also act on it? In my opinion, key elements to bring about a positive evolution are:

  • a growing awareness of the real (= known plus hidden) costs of badly designed IVR systems;
  • the willingness to make current IVR systems more user-friendly, by using speech technology (and other technologies) knowledgeably;
  • the promotion of the customer service function to the top level of the company hierarchy, in order to enable the definition of integrated customer service strategies and policies.

A first tentative answer to the self-proclaimed American IVR crisis is Paul English’s proposal for a “GetHuman™ earcon standard”, currently open for discussion. In its current preliminary version, the very first rule reads:

If a human operator is available when a consumer calls, the human should answer the phone

Assuming human operators don’t sleep while working, this rule is a tautology. The real question is whether a sufficient number of operators are being made available at a given moment in time – which brings us back to workforce optimization, i.e. company policy.

As a sign of the importance given by the speech industry to the GetHuman™ initiative, it was also announced that Microsoft, Nuance Communications and others “will work with the GetHuman™ project to drive adoption of these standards”. Cynics might say that a dangerous initiative has been neutralized. Whatever the point of view, the GetHuman™ initiative is important because it stresses what should be self-evident: that customer service is about serving customers.

If customer service departments don’t get human, customers will vote with their feet, and tell them to … get lost.

ABN AMRO introduces speaker verification for phone banking in The Netherlands

Friday, July 21st, 2006

ABN AMRO, a major international bank with Dutch origins, yesterday announced the progressive introduction of voice verification for its phone banking customers in the Netherlands. According to the press release (available in English and Dutch), “voice verification will initially be applied to customers making balance enquiries, transfers and investment orders via the telephone.” The technology is “fast, easy and, above all, secure”, and its introduction “means better access and more convenience for the customer”. The system has been thoroughly tested on 1,450 people, including relatives, six twins and testers temporarily suffering from a cold. ABN AMRO claims to be “the first major bank in the world to introduce this technology in this way”. The system has been launched for “a[n] initial group of customers”, whose “experiences will provide the basis for the continued roll-out in 2007”.

Judging from my own experience with another recent but as yet under-the-radar speaker verification project, I can generally agree with ABN AMRO’s security and convenience claims. Let me explain how this can work in technical terms, and provide some nuances:

ABN AMRO’s system initially asks customers to say their account number, and then first performs a voice/speech recognition (not: voice/speaker verification) task. By saying this publicly known piece of information, any caller (whether he/she is genuine or an imposter) can thereby claim the identity of the person who owns the account number in question. The accurate recognition of this account number is made easier by the embedded check digits, which help with ruling out invalid recognition candidates in the N-best result list returned by the speech recognizer. A correct first recognition is convenient because the caller doesn’t have to waste time retrying. From a pure speech recognition (again, not: verification) point of view, the prime objective is not security, but convenience.
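To make the check-digit idea concrete, here is a minimal Python sketch of how an N-best list could be filtered. It is not ABN AMRO’s actual implementation: the press release does not say which checksum is used, so the “eleven test” (elfproef) for nine-digit Dutch account numbers is assumed here, and the function names, candidate strings and confidence scores are all made up for illustration.

```python
def passes_eleven_test(digits: str) -> bool:
    """Assumed checksum: the weighted digit sum (weights 9 down to 1)
    of a nine-digit number must be divisible by 11."""
    if len(digits) != 9 or not digits.isdigit():
        return False
    weights = range(9, 0, -1)  # 9, 8, ..., 1
    total = sum(w * int(d) for w, d in zip(weights, digits))
    return total % 11 == 0


def best_valid_candidate(n_best):
    """Return the highest-scoring recognition candidate that passes the
    checksum, or None if every candidate is invalid.

    n_best is a list of (digit_string, confidence) pairs, a hypothetical
    stand-in for what a speech recognizer might return."""
    valid = [(text, score) for text, score in n_best if passes_eleven_test(text)]
    if not valid:
        return None
    return max(valid, key=lambda candidate: candidate[1])[0]


# Hypothetical N-best list: the top-scoring hypothesis fails the checksum,
# so the second one is selected instead of asking the caller to repeat.
n_best = [("123456799", 0.62), ("123456789", 0.58), ("123456780", 0.31)]
print(best_valid_candidate(n_best))
```

The point of the sketch is that a misrecognized digit usually breaks the checksum, so the system can silently fall back to a lower-ranked but valid candidate, which serves convenience and not security.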

Security really comes into play in the next phase, after a successful recognition of the account number. In this voice/speaker verification phase, the claimed identity is verified. Is the caller really who he/she pretends to be? Instead of a difficult to remember PIN code (convenience!), various biometric characteristics of the caller’s voice are compared to a previously recorded and stored voiceprint from the genuine owner of the account number at hand. This reference voiceprint has been created beforehand during a one-time enrollment phase, which is equivalent to the one-time assignment of a PIN code to a genuine customer. If the caller’s voice sufficiently resembles the reference voiceprint, the caller is considered genuine, and access is granted. In the other case, if the caller’s voice differs too much from the reference voiceprint, the caller is considered an imposter, and access is refused. The whole question now is: what exactly do “sufficient” or “too much” mean in the previous sentences? ABN AMRO does not want to reject genuine customers, and certainly does not want to grant access to imposters. The answer to this question must come from a tuning exercise.

Any speaker verification (sub)system – or biometrics (sub)system, for that matter – needs to find the right balance between security and convenience. When a verification engine compares the caller’s voiceprint to a reference voiceprint, it returns a score. If this verification score is above a certain threshold, the caller is allowed access. If not, access is refused. The major goal of verification tuning is to set the “right” verification threshold. This is done by analyzing thousands of actual verification attempts in a controlled test setting, and allowing for a certain percentage of false rejects or false accepts. In ABN AMRO’s case, 25,000 test calls have been made so far with 1,450 different people. Whichever verification threshold was chosen, ABN AMRO has had to strike a balance between security (as few allowed-in imposters as desired) and convenience (as few refused genuine customers as desired).

Ultimately, the trade-off between security and convenience in the voice/speaker verification part of an IVR system, reflected in a higher or lower verification threshold, is a matter of policy and risk analysis. The progressive roll-out of the technology in the coming months allows ABN AMRO to fine-tune this threshold even better, based on real caller data.
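The tuning exercise described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented, the function names are hypothetical, and the policy used here (pick the lowest threshold whose false accept rate stays within a chosen limit) is just one possible way to express the security side of the trade-off, not necessarily the one ABN AMRO used.

```python
def error_rates(genuine_scores, imposter_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) for one threshold.

    A caller is accepted only if the score is strictly above the threshold."""
    false_rejects = sum(1 for s in genuine_scores if s <= threshold)
    false_accepts = sum(1 for s in imposter_scores if s > threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(imposter_scores))


def lowest_threshold_within_far(genuine_scores, imposter_scores, max_far):
    """Try every observed score as a candidate threshold, lowest first, and
    return (threshold, frr, far) for the first one whose false accept rate
    does not exceed max_far. The lowest such threshold is the most
    convenient one that still meets the security limit."""
    candidates = sorted(set(genuine_scores) | set(imposter_scores))
    for threshold in candidates:
        frr, far = error_rates(genuine_scores, imposter_scores, threshold)
        if far <= max_far:
            return threshold, frr, far
    return None  # only reached if the score lists are empty


# Invented scores from a (tiny) controlled test: higher means "more similar".
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
imposter = [0.20, 0.35, 0.50, 0.70, 0.10]

for limit in (0.2, 0.0):
    print(limit, lowest_threshold_within_far(genuine, imposter, limit))
```

Tightening the false accept limit pushes the threshold up and, with it, the share of genuine callers who get refused. Which limit is acceptable is exactly the policy and risk decision discussed above.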

Does all of the above sound alarming? It shouldn’t. Even though 100% security and convenience do not exist in a voice/speaker verification setting, customers must realize that they do not exist in a PIN code based system either! In fact, PIN codes are much less secure: if my PIN code falls into the hands of an imposter who also knows my (publicly known!) account number, the probability of that imposter getting access is almost 100%. My voice, however, is not that easily stolen or duplicated. If, on top of the account number, ABN AMRO’s voice/speaker verification system asks the customer for an answer to a secret question, security is enhanced even further. The press release did not expressly mention any such feature, but chances are it will be present.

Although ABN AMRO’s claim to be “the first major bank in the world to introduce [voice/speaker verification] technology in this way” is probably exaggerated, the announcement is important news for other financial institutions in the Benelux and Europe.

So, who’s next?

Technology vs. user-centered design

Saturday, August 20th, 2005

I ran into two articles this morning that are not linked, but in fact tell the same story.

First article, from De Standaard (in Dutch): chip-maker Intel is opening a “Concept Store” in Brussels, a temporary shop where they will showcase not their newest technology, but how to use it. Sales staff have been expressly forbidden to use techie words like “gigabytes” or “megahertz”, let alone “dual-core technology”. Instead, they are supposed to ask customers … what they would like to do with their computer. Is that revolutionary, or what?

Second article, from the Butler Group Blog: how to develop a speech application that traps your callers into an endless loop. Funny to read, but not so funny for the caller, and even less for the (anonymous) company that dares to treat its callers this way.

Moral of the stories: users don’t care what technology you use, as long as they can get their things done, and their problems solved. So if you are considering speech-enabling your current IVR system, make sure you know what you’re doing, or get some professional advice first.