Archive for the ‘VoiceXML’ Category

Bronze INCA Award for voice application offering realtime Belgian train times

Thursday, May 14th, 2009

Last Tuesday at the iMinds conference in Ghent my speech application prototype with realtime Belgian train times got a Bronze INCA Award. Here’s what the jury had to say:

A practical application shows the clear potential for new applications by voice via any phone using public data and open APIs (that are unfortunately not available yet).

About 25% of my code and time was indeed spent on the development of an (obviously) invisible screen-scraping Web service layer on top of Judging from the feedback I got before, during and after iMinds, there would have been 5 or 10 times as many public transport related INCA submissions, if only Infrabel, De Lijn, MIVB etcetera had made the effort to open up their internal web services to the development community.

Bronze INCA Award for Realtime Belgian train times application at iMinds 2009 in Ghent

So, in case the folks at IBBT and the soon-to-be-elected politicians are looking for a low-cost initiative with an immediate impact on innovation and value creation in the local ICT sector: have the Belgian and Flemish government agencies and companies open up their data. Make this your top priority for this year. No, this month. No, this week!

INCA Award submission, in French

Tuesday, April 28th, 2009

Since the train delay voice application demo submitted for the INCA Award does not only exist in Dutch, I thought a French video was also in order. Here it is:

INCA Award submission

Tuesday, April 28th, 2009

Yesterday afternoon I submitted my proposal for the INCA Award. Read all about it on

Or just watch this video:

Announcement: iMinds conference in Ghent on May 12, 2009

Sunday, April 26th, 2009

In two weeks’ time the Flemish Institute for BroadBand Technology (IBBT) is hosting a new networking event called iMinds. This conference is the sequel to the five-year-old IBBT Brokerage event.

What’s more, IBBT also announced an excellent new initiative for innovative developers: the INCA Awards, with €20K in prize money. The deadline for submission is tomorrow evening – still working on my project! The awards will be presented at the same iMinds conference on May 12.

See you in Ghent!

Voxeo acquires VoiceObjects

Tuesday, December 9th, 2008

Two of my favourite companies in the speech-driven call automation field today announced their union. By acquiring VoiceObjects of Cologne, Germany, the American voice hosting company Voxeo further expands its presence in mainland Europe.

Voxeo of Orlando, Florida runs the world’s largest VoiceXML platform and is reputed for its extreme service. Three years ago I had the pleasure of working with them while developing the now defunct Beavis and Butthead hotline. I was just as impressed with the stability of their platform as with the professionalism and accessibility of their staff, up to and including CEO Jonathan Taylor.

In summer 2004, just after I started as an independent consultant in this business, VoiceObjects was so kind as to offer me a free voucher for their Consulting Partner Certification. Apart from this gesture, I also appreciated the flexibility and ease of use of their eponymous flagship product which greatly simplifies the development of multilingual speech applications.

Developers like me will surely welcome the announcement that “VoiceObjects will also be available in extremely cost-effective on-demand and on-premise offerings bundled with Voxeo’s own Prophecy VoiceXML Platform.”

Voxeo: “Largest VoiceXML IVR Hosting Provider in Europe”

Thursday, September 27th, 2007

Yesterday Voxeo announced their leader status as VoiceXML hosting provider in … Europe. The Orlando, Florida based company boasts a capacity of no fewer than 5000 concurrent ports from their existing data centre in Slough, UK, with local phone number availability from every European country. An additional hosting facility in continental Europe is in the process of being deployed.

I had the chance to develop and deploy sizeable VoiceXML applications on the Voxeo platforms a while ago, and was impressed by the company’s excellent support. My favourite debugging tool at the time, however, was BeVocal’s: it contained less low-level information, and the colouring log filter capabilities were as simple as they were brilliant.

As I have wondered before, it will be interesting to see if and how Nuance/BeVocal and Microsoft/Tellme will react to this outright challenge on the European front. Not to mention the local champions in the various European countries.

VoiceXML Application Developer Certification

Thursday, June 28th, 2007

After having planned and postponed it for three years, this morning I finally took the official VoiceXML Application Developer Certification test, at the Cronos Campus in Brussels. It was tougher than I thought, but I did pass, with a decent margin. So from now on my wife is married to a certified VoiceXML developer – ain’t that a thrill!

The exam questions are crafted in such a way that it doesn’t make much sense to study the language elements exhaustively from books; a few years of development experience seem to do the trick just as well, if not better. Most questions are of a practical nature, in the sense that you’re expected to predict the behaviour of a piece of VoiceXML, SRGS, SISR, SSML, CCXML and/or JavaScript code. Or vice versa: what code is required to achieve a specific outcome?

For me the toughest questions were about subdialogs, CCXML, and the “initial” tag – probably because I haven’t come across these areas too often in the recent past. The easier questions for me had to do with grammars – both in XML and ABNF notation. If you do want to study one subject thoroughly, I would suggest the Form Interpretation Algorithm.

Some facts and figures: the certification cost me 122 euro, and I did need the full 120 minutes to fill out and review the 59 questions. So far the VoiceXML certification test is only offered on a desktop computer – if anyone’s working on a VoiceXML implementation, let me know :-)

Interview with IET Magazine on Voice Biometrics in Phone Banking

Saturday, January 20th, 2007

A few weeks ago I was interviewed by IET Magazine about the application of voice biometrics in financial call centre environments. The article “Look Who’s Talking” by Juan Pablo Conti is now available on the Institution of Engineering and Technology website. The IET is the largest professional engineering society in Europe and the second largest of its type in the world.

The article opens rather spectacularly with the statement by Andrew Moloney, head of international marketing at RSA Security, that innovative, entrepreneurial fraudsters are moving their criminal activities from online banking to phone banking. To counter this new form of fraud, financial institutions increasingly base their security not only on what their customers know or have, but also on what they are. Enter voice biometrics. The author mentions the two publicised cases from the Low Countries that have been covered in previous posts on this weblog: ABN AMRO in The Netherlands and Dexia in Belgium. As frequent readers of this blog know, I have had the pleasure of contributing to the latter project.

Whereas the RSA Security executive unsurprisingly stresses the security aspects, my personal contribution to the IET article focused on the convenience benefits for customers. In a previous weblog article I explained that from a narrow technology-only view of voice biometrics, a heart-rending trade-off between security and convenience seems inevitable. The real value of the IET article is that Andrew Moloney shows a way out of this dilemma.

Mr. Moloney is quoted as saying that there is probably a 10% level of [false] reject rate (my emphasis). Note that this figure means nothing if we don’t know the corresponding false accept rate (FAR). But for the sake of argument, let’s assume the FAR is at an acceptable level (as defined by the financial institution, based on a policy decision, a given accept/reject threshold and test results). Now, to lower the 10% false reject rate (FRR) – indeed unacceptable in large-scale roll-outs – while keeping the FAR at a fixed low level, the RSA Security executive’s strategy of framing the voice biometrics application in a broader security and convenience perspective is absolutely right. Mr. Moloney explains that by looking at a genuine caller’s past usage patterns, it becomes possible to factor in more security-related attributes in the final accept/reject decision. How can this work?

My interpretation is that at first, the pure voice biometrics threshold is lowered. As a result, FRR goes down, while FAR goes up – that’s the name of the trade-off game. But to compensate for this temporary loss of security, the call’s actual (non-voice-related) attributes are then compared with the expected attributes as learnt from the (assumed) genuine customer’s past usage patterns: filtering out abnormal behaviour brings the FAR down again. In the end, FRR goes down, while FAR is still stable at an acceptable level. So everyone wins.
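A minimal sketch of this two-stage decision, under stated assumptions: the attribute names (`hour_of_day`, `caller_number_known`), the score scale and the threshold value are all hypothetical and chosen for illustration only; the IET article does not describe RSA Security's actual attributes or logic.

```python
from dataclasses import dataclass


@dataclass
class Call:
    voice_score: float         # similarity score from the verification engine (assumed 0..1)
    hour_of_day: int           # hypothetical non-voice attribute
    caller_number_known: bool  # hypothetical non-voice attribute


def behaviour_is_normal(call: Call, usual_hours: range) -> bool:
    """Hypothetical usage-pattern check: known phone number, usual calling hours."""
    return call.caller_number_known and call.hour_of_day in usual_hours


def accept(call: Call, usual_hours: range, lowered_threshold: float = 0.65) -> bool:
    # Stage 1: a deliberately lowered voice threshold (fewer false rejects,
    # more false accepts on its own).
    voice_ok = call.voice_score >= lowered_threshold
    # Stage 2: the non-voice attributes must match past usage patterns,
    # which filters out part of the extra false accepts.
    return voice_ok and behaviour_is_normal(call, usual_hours)
```

With `usual_hours = range(8, 20)`, a genuine customer with a cold (score 0.70, calling at 10:00 from a known number) is accepted, whereas a caller with the same score at 03:00 from an unknown number is refused.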

ABN AMRO introduces speaker verification for phone banking in The Netherlands

Friday, July 21st, 2006

ABN AMRO, a major international bank with Dutch origins, yesterday announced the progressive introduction of voice verification for its phone banking customers in the Netherlands. According to the press release (available in English and Dutch), “voice verification will initially be applied to customers making balance enquiries, transfers and investment orders via the telephone.” The technology is “fast, easy and, above all, secure”, and its introduction “means better access and more convenience for the customer”. The system has been thoroughly tested on 1,450 people, including relatives, six twins and testers temporarily suffering from a cold. ABN AMRO claims to be “the first major bank in the world to introduce this technology in this way”. The system has been launched for “a[n] initial group of customers”, whose “experiences will provide the basis for the continued roll-out in 2007”.

Judging from my own experience with another recent but as yet under-the-radar speaker verification project, I can generally agree with ABN AMRO’s security and convenience claims. Let me explain how this can work in technical terms, and provide some nuances:

ABN AMRO’s system initially asks customers to say their account number, and then first performs a voice/speech recognition (not: voice/speaker verification) task. By saying this publicly known piece of information, any caller (whether he/she is genuine or an imposter) can thereby claim the identity of the person who owns the account number in question. The accurate recognition of this account number is made easier by the embedded check digits, which help with ruling out invalid recognition candidates in the N-best result list returned by the speech recognizer. A correct first recognition is convenient because the caller doesn’t have to waste time retrying. From a pure speech recognition (again, not: verification) point of view, the prime objective is not security, but convenience.
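The check-digit filtering of the N-best list can be sketched as follows. This is an illustration under assumptions: the press release does not say which check-digit scheme is used, so the weighted modulo-11 test commonly applied to 9-digit Dutch account numbers (the "elfproef") stands in here, and the function names and confidence values are hypothetical.

```python
from typing import List, Optional, Tuple


def passes_elfproef(account: str) -> bool:
    """Weighted mod-11 check for a 9-digit number (weights 9 down to 1)."""
    if len(account) != 9 or any(c not in "0123456789" for c in account):
        return False
    total = sum(int(d) * w for d, w in zip(account, range(9, 0, -1)))
    return total % 11 == 0


def best_valid_candidate(n_best: List[Tuple[str, float]]) -> Optional[str]:
    """Return the highest-confidence hypothesis that passes the check-digit test."""
    for account, _confidence in sorted(n_best, key=lambda c: c[1], reverse=True):
        if passes_elfproef(account):
            return account
    return None  # no valid candidate: the dialogue would have to re-prompt
```

For an N-best list such as `[("123456798", 0.81), ("123456789", 0.77), ("123456780", 0.40)]`, the top hypothesis fails the check and the second one is selected, so the caller is spared a retry.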

Security really comes into play in the next phase, after a successful recognition of the account number. In this voice/speaker verification phase, the claimed identity is verified. Is the caller really who he/she claims to be? Instead of a difficult-to-remember PIN code (convenience!), various biometric characteristics of the caller’s voice are compared to a previously recorded and stored voiceprint from the genuine owner of the account number at hand. This reference voiceprint has been created beforehand during a one-time enrollment phase, which is equivalent to the one-time assignment of a PIN code to a genuine customer. If the caller’s voice sufficiently resembles the reference voiceprint, the caller is considered genuine, and access is granted. In the other case, if the caller’s voice differs too much from the reference voiceprint, the caller is considered an imposter, and access is refused. The whole question now is: what exactly do “sufficient” or “too much” mean in the previous sentences? ABN AMRO does not want to reject genuine customers, and certainly does not want to grant access to imposters. The answer to this question must come from a tuning exercise.

Any speaker verification (sub)system – or biometrics (sub)system, for that matter – needs to find the right balance between security and convenience. When a verification engine compares the caller’s voiceprint to a reference voiceprint, it returns a score. If this verification score is above a certain threshold, the caller is allowed access. If not, access is refused. The major goal of verification tuning is to set the “right” verification threshold. This is done by analyzing thousands of actual verification attempts in a controlled test setting, and allowing for a certain percentage of false rejects or false accepts. In ABN AMRO’s case, 25,000 test calls have been made so far with 1,450 different people. Whichever verification threshold was chosen, ABN AMRO has had to strike a balance between security (as few allowed-in imposters as desired) and convenience (as few refused genuine customers as desired).
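As a rough sketch of such a tuning exercise, assuming the test calls have been labelled as genuine or imposter attempts and each carries a verification score: the function names, the "accept when score >= threshold" convention and the policy of fixing a maximum false accept rate are assumptions for illustration, not ABN AMRO's documented procedure.

```python
from typing import List, Optional, Tuple


def error_rates(genuine: List[float], impostor: List[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FRR, FAR) when callers are accepted at score >= threshold."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


def pick_threshold(genuine: List[float], impostor: List[float],
                   max_far: float) -> Optional[Tuple[float, float, float]]:
    """Lowest observed score usable as threshold while FAR <= max_far.

    FAR can only fall as the threshold rises, so the first hit also gives the
    lowest FRR among the candidates. Returns (threshold, FRR, FAR), or None
    if no observed score satisfies the FAR target.
    """
    for t in sorted(set(genuine) | set(impostor)):
        frr, far = error_rates(genuine, impostor, t)
        if far <= max_far:
            return t, frr, far
    return None
```

Given a small labelled set, tightening `max_far` pushes the chosen threshold up and the false reject rate with it, which is exactly the security versus convenience balance that has to be settled by policy.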

Ultimately, the trade-off between security and convenience in the voice/speaker verification part of an IVR system, reflected in a higher or lower verification threshold, is a matter of policy and risk analysis. The progressive roll-out of the technology in the coming months allows ABN AMRO to fine-tune this threshold even better, based on real caller data.

Does all of the above sound alarming? It shouldn’t. Even though 100% security and convenience do not exist in a voice/speaker verification setting, customers must realize that they do not exist in a PIN code based system either! In fact, PIN codes are much less secure: if my PIN code falls into the hands of an imposter who also knows my (publicly known!) account number, the probability of that imposter getting access is almost 100%. My voice, however, is not that easily stolen or duplicated. If, on top of the account number, ABN AMRO’s voice/speaker verification system asks the customer for an answer to a secret question, security is enhanced even further. The press release did not expressly mention any such feature, but chances are it will be present.

Although ABN AMRO’s claim to be “the first major bank in the world to introduce [voice/speaker verification] technology in this way” is probably exaggerated, the announcement is important news for other financial institutions in the Benelux and Europe.

So, who’s next?

The shrinking VoiceXML ecosystem: curse or blessing?

Tuesday, June 13th, 2006

Two months ago this blog expressed mixed feelings about Genesys’ acquisition of VoiceGenie, arguing that a takeover of Audium or VoiceObjects would have been more instrumental in bridging speech application ecosystem bottlenecks. John Chambers must be among our readers, because last Thursday Cisco Systems, the worldwide leader in networking for the Internet, announced an agreement to acquire Audium for $19.8 million in cash.

According to the press release, the acquisition will “enable enterprises to easily build automated voice response applications that are integrated with not only their converged IP network but also work well within their Services Oriented Architecture (SOA) enabling the use of common services across the network”.

Cisco IOS VoiceXML interpreters have been integrated in various gateway and router products for quite a while, but this huge potential – given the installed base – was not yet matched by state-of-the-art tools on the application development side. So from Cisco’s point of view, the acquisition clearly makes sense.

The continued shrinking of the VoiceXML ecosystem may mean that technology providers will have an easier time offering well-integrated, well-supported (and, on the whole, more affordable?) product suites to their customers. If integration issues are the parasites of the VoiceXML ecosystem, consolidation will reduce their visible impact or kill them altogether – over time. A Good Thing, one might say. On the other hand, the reduction of the VoiceXML ecosystem to a smaller number of vertically integrated players may hamper standardization, openness and innovation. For one thing, the gobbling up of Audium by Cisco may very well make it practically or commercially impossible for non-Cisco customers to use this excellent tool. Unless they also switch to Cisco hardware, that is …

Let’s hope that common sense prevails, in the (common) interest of accelerated market development. Once we’re there, may the best win.