Archive for May, 2006

Forthcoming: Genesys Customer Interaction Portal for Self-Service

Thursday, May 11th, 2006

Previous posts ([1], [2]) in this blog pointed to the increased interest in hosted or managed service models for speech applications. Another post suggested that Genesys’ recent acquisition of VoiceGenie brought nothing new in terms of rapid application development. Apparently Genesys was well aware of this gap in their offering, as they have just announced the Genesys Customer Interaction Portal, “a browser-based portal to simplify the development and provisioning of voice self-service and speech applications”. The announcement is somewhat premature, though, since the portal will only be available in a month or so.

According to the press release, the Genesys Customer Interaction Portal will enable managed services providers to quickly productize speech applications (themselves built from so-called Voiclets) which are easily configurable by the end customer via a web interface. Sounds somewhat like a carrier-hosted variant of, if you ask me. Exactly which tool the speech applications and/or the Voiclets are themselves developed in, however, is unclear: Voicint, the company that built the Customer Interaction Portal for Genesys, strangely also promotes VoiceObjects X5 as a complementary tool for building and managing speech applications. On top of this third-party tool, Voicint also offers VUI design services. All of which suggests that the Customer Interaction Portal is great for provisioning prepackaged applications and/or Voiclets, but not meant for quick-and-clean voice application development.

The dust will hopefully settle as soon as the Genesys Customer Interaction Portal is transformed from vaporware into testable software. To be continued!

Voxeo and MAP Telecom offer VoiceXML & CCXML hosting services in EMEA through strategic partnership

Tuesday, May 9th, 2006

Orlando, FL-based Voxeo and Monaco-based MAP Telecom have announced a strategic partnership whereby “Voxeo will provide the IVR infrastructure for MAP Telecom’s current facilities in Europe and four planned facilities in the Middle East”.

Current customers would be given “the choice to move from MAP Telecom’s legacy platform” (Voxbuilder from Voxpilot), and migrate to “a new and expanded multi-language developer community portal based on the Voxeo Evolution site” (more particularly

It is not the first time that a US-based VoiceXML hosting company has tried to gain a foothold in mainland Europe. In May 2001, Tellme Networks merged with Brussels-based MagicPhone, but the unconsummated marriage ended in poverty and dispute 15 months later.

From the point of view of Voxeo, a strategic partnership with a pan-European player makes sense in various ways. First, the financial risks linked to the setup of new platforms are shared. Second, MAP Telecom’s local knowledge in number provisioning on a pan-European scale (and beyond) offers Voxeo hassle-free access to a market of hundreds of millions of callers. Third, the respective companies’ core competencies are clearly complementary.

MAP Telecom, on the other hand, will benefit from Voxeo’s excellent reputation in system reliability and uptime.

Here are a number of questions I’d like to see answered:
1) Why should MAP Telecom customers or development partners really care about which VoiceXML browser they’re using? Haven’t these become commodities, just like MS IE or Apache in the web browser world?
2) Will Voxeo’s excellent customer service be replicated in EMEA? If so, to what extent will the service be localized to a multilingual audience? Which party will take care of this, Voxeo or MAP Telecom?
3) Will MAP Telecom and its ecosystem of development partners commercially benefit from Voxeo’s customer base as far as global accounts are concerned? In other words, does the partnership offer any commercial synergy?
4) Will the addition of an alternative platform bring about lower prices, and hence market acceleration?
5) Will MAP Telecom’s legacy Voxpilot platform be maintained forever, or phased out?
6) How will the partnership succeed in convincing conservative European call center managers to adopt the hosted or managed services model for more than just the speech interface?

Irrespective of the answers to these questions, Voxeo’s crossing the Atlantic is a clear vote of confidence in the future of the European speech technology market. Finally, it will also be interesting to see if the partnership can be a boon to the Skype Voice Services program, in which both MAP Telecom and Voxeo play a role.

Job: Principal Consultant in GVP & speech applications at Genesys, Europe

Monday, May 8th, 2006

See the bottom of this page.

Nuance Conversations in Mallorca, Spain

Tuesday, May 2nd, 2006

Last Thursday and Friday I was at Nuance Conversations Europe in the picturesque town of Porto Petro on the isle of Mallorca, Spain. With 235 people from 25 countries, the first European edition of this conference was well attended. This post presents some highlights from the plenary sessions, from Thursday’s technical track and from Friday’s business track.

Nuance Conversations Mallorca, 27-28 April 2006

In the introductory session, Steve Chambers outlined Nuance’s (network) speech strategy. The first focal point in the short term is – or rather, remains – customer care: what callers want is speedy service, and a sense of control. “Human Touch” is the unifying theme under which Nuance is addressing the customer’s desire for more natural, unconstrained input. OpenSpeech Dialog (OSD) is the flagship product that should help make this dream come true. The second short-term focus is about “Googlizing Voice”, with “Mobi” as a concept of mobile dictation in a multi-modal setting. In the longer term, Nuance’s marketing strategy is centered around the idea of the Visible Customer: organizations should take a holistic approach towards customer care, by integrating currently siloed customer personas of repeat callers into a single view. Mass-personalization means new revenue opportunities. Mr. Chambers organized an automated vote, in which the audience identified lower prices and advances in speech accuracy as major factors for accelerating the adoption of speech technology in the telephony network.

The keynote speaker at the conference was Eckhard Geulen, Senior Exec. VP Marketing & Sales of Value Added Solutions at T-Com, Deutsche Telekom’s fixed-line branch. Dr. Geulen argued strongly for customer-motivated speech initiatives, as a necessary complement to the more traditional, organizationally motivated cost-cutting exercises. Speech technology should indeed be positioned in the larger context of value-added services, and not just as a replacement for expensive customer service representatives. Referring to the Visible Customer concept, Eckhard Geulen openly admitted that Deutsche Telekom still has a long way to go: today, the respective identity records of a customer switching from T-Com to T-Mobile (or vice versa) are not linked (yet). Another frank statement: “the [German or European] market isn’t there yet”. To offer better risk-return ratios for its customers, T-Com has heavily invested in a managed services model. This way, small and mid-size companies that are unable or unwilling to run their own voice platform can profitably develop speech initiatives without incurring heavy up-front investments. To further support market acceleration, T-Com has partnered with Genesys, Nuance and VoiceObjects to create a Voice Community for the German-speaking market.

Frederik Durant in Porto Petro, Mallorca, at the Mediterranean

Thursday’s technical sessions featured the products OpenSpeech Dialog (OSD) and PromptSculptor, and also presented some best practices in information-driven VUI design and multi-lingual speech systems.

OpenSpeech Dialog supports a holistic approach to voice application development, based on the higher-level xHMI (eXtensible Human Machine Interface) language. xHMI was developed by Nuance and 20 partners to enable a “simpler and quicker” implementation of adaptive calls, i.e. automated calls that adapt sensibly to callers speaking in their own way. At runtime, the OSD application flow is controlled by a conversation manager that keeps track of which slots from the conversation memory still need to be filled. To implement the same functionality, xHMI code should be more compact and powerful than VoiceXML, which is more susceptible to “state explosion”. The compactness should be no surprise as xHMI does not live next to VoiceXML, but on top of it: the xHMI runtime processor indeed generates VoiceXML. To further ease voice application development, the next version of V-Builder (4.0) will support creation of xHMI code through a graphical interface. This is one example of a “blue” Nuance tool (V-Builder) integrating with a tool from the Scansoft/SpeechWorks legacy (OSD).
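
To make the slot-tracking idea a bit more tangible, here is a minimal sketch in Python. It is emphatically not xHMI and not how OSD is implemented; the class and function names (ConversationMemory, absorb, next_prompt) and the travel-booking slots are my own hypothetical illustration of a conversation manager that asks only for what is still missing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConversationMemory:
    """Hypothetical store of the slots an application needs and has collected."""
    required_slots: List[str]
    filled: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        # Slots that still have to be asked for, in their declared order.
        return [s for s in self.required_slots if s not in self.filled]


def absorb(memory: ConversationMemory, recognized: Dict[str, str]) -> None:
    """Store every known slot the caller supplied in a single utterance."""
    for name, value in recognized.items():
        if name in memory.required_slots:
            memory.filled[name] = value


def next_prompt(memory: ConversationMemory) -> Optional[str]:
    """Return the next question, or None once all slots are filled."""
    remaining = memory.missing()
    if not remaining:
        return None
    return f"Please tell me your {remaining[0]}."


# The caller volunteers two slots at once; only the third is asked for.
memory = ConversationMemory(["origin", "destination", "date"])
absorb(memory, {"destination": "Brussels", "date": "Friday"})
first_question = next_prompt(memory)
absorb(memory, {"origin": "Palma"})
final_question = next_prompt(memory)
```

The point of the design is that the dialog is driven by what is missing rather than by a fixed sequence of states, which is where the claimed protection against “state explosion” would come from.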

Comment: with respect to the xHMI vs. VoiceXML discussion, Nuance acknowledged that “innovation leads standardization by 3-5 years or more”. It remains to be seen whether xHMI will ever make it into an industry-backed standard as widely supported as VoiceXML is nowadays. Given the announced integration of xHMI, OSD and V-Builder, the first question developers should ask themselves is not whether they need xHMI or VoiceXML, but rather whether they need (and can afford!) a VoiceXML-generating tool at all. If the answer is positive, the next question is whether OSD/xHMI/V-Builder is suited for the job, as compared to e.g. VoiceObjects X5 or Audium (whose respective companies were, by the way, present at the conference as sponsors).

On the text-to-speech side, PromptSculptor allows VUI designers/implementors to tune statically or dynamically generated prompts. PromptSculptor’s GUI allows users to manually edit any input text at the word or even phoneme level, by adapting, among other things, duration, pitch or stress. The adapted prompt elements are stored back in the acoustic database, where they can be used for offline or online TTS generation. PromptSculptor is Nuance’s counterpart of Loquendo’s TTS Director and, to a lesser extent, Acapela’s VirtualSpeaker. Next to PromptSculptor, Nuance’s senior TTS director Jan De Moortel also presented CustomVoices, a program/process allowing brand-aware companies to (have Nuance) develop their own custom TTS voice. Interested readers should count on about one month for voice talent selection, script selection and recording, and another 3 to 5 months for building the actual new TTS voice. The first audio samples are available about 6 weeks after the end of the recording sessions.

To conclude the TTS session, Michel Arsac-England from VoltDelta International presented some lessons from a DA implementation at Telix AG. He pointed out the importance of TTS quality to user acceptance: most people dislike mispronounced names. Of course, even with an acoustic tuning tool like PromptSculptor, input data quality remains a prerequisite for TTS quality, as the GIGO (garbage in, garbage out) principle fully applies.

From the session on multi-lingual speech applications, I recall the following lessons: modularize “just enough” (?) to preserve flexibility, find experts on local culture, be aware that requirements, expectations and success metrics depend on culture, and don’t assume that language equals culture. As always, best practices are quite easy to enumerate, but more difficult to realize. Experience really makes the difference here.

Cultural awareness is key to developing excellent multilingual speech applications

On Friday I attended two sessions from the business track.

Christian Pereira, CEO of dtms Solutions, made the economic case for the hosted and managed service models for speech applications. Mr. Pereira complemented Eckhard Geulen’s observations by pointing out the cost advantage of a mutualized, outsourced voice (application) platform. His company boasts 30% growth in platform capacity per month, which does not, however, require commensurate growth in support personnel. In a non-mutualized environment, similar growth would be prohibitively costly due to an increase in operational staff, and would therefore hamper market development. By centralizing and mutualizing the voice (application) platform, the cost per port per year drops from approx. 3000 euros (in a 30-port setup) to only 1000 euros (in a 1000-port setup). The cost advantage is shared between the customer and the platform provider (i.e. dtms Solutions). Mr. Pereira identified professional services and network limitations as the bottlenecks in dtms Solutions’ growth path. They are respectively addressed by stable partnerships with external application development companies, and by contracts with alternative network carriers.
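
For readers who like to see the arithmetic spelled out, here is a back-of-the-envelope sketch in Python using only the two per-port figures quoted above. The function and variable names are mine, and the comparison assumes (my assumption, not Mr. Pereira’s) that a customer needing 30 ports would otherwise run a dedicated 30-port setup. How the saving is actually split between customer and provider was not disclosed.

```python
def annual_cost_eur(ports: int, cost_per_port_eur: int) -> int:
    """Total yearly platform cost for a given number of ports."""
    return ports * cost_per_port_eur


# Approximate per-port figures as quoted in the presentation.
DEDICATED_COST_PER_PORT_EUR = 3000  # 30-port setup
SHARED_COST_PER_PORT_EUR = 1000     # 1000-port setup

# Assumption: a customer needing 30 ports compares both models.
ports_needed = 30
dedicated_total = annual_cost_eur(ports_needed, DEDICATED_COST_PER_PORT_EUR)
shared_total = annual_cost_eur(ports_needed, SHARED_COST_PER_PORT_EUR)

# Gross saving per year, before it is split between customer and provider.
gross_saving = dedicated_total - shared_total
saving_ratio = gross_saving / dedicated_total

print(f"Dedicated: {dedicated_total} EUR/year")
print(f"Shared:    {shared_total} EUR/year")
print(f"Gross saving: {gross_saving} EUR/year ({saving_ratio:.0%})")
```

In other words, roughly two thirds of the yearly platform cost is up for grabs, which leaves room for both parties to benefit.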

Comment: Mr. Pereira singled out T-Com as his largest competitor, but that may only be true as far as the German (speaking) market is concerned. For the moment, dtms Solutions clearly focuses on the German (speaking) market, as their website does not even exist in English (yet). Similar companies like Monaco-based MAP Telecom openly target the pan-European market, but it is unclear today whether that broader focus (a contradictio in terminis?) is an asset or a liability. As the market matures and the demand for speech-driven phone applications rises, the value of a stable partner network with local ramifications – remember the importance of culture – may indeed prove to be the differentiating factor.

In the next session, Peter Mahoney, Nuance’s Vice President of Worldwide Marketing, explained how to launch a speech application. Surprisingly often, speech projects focus on planning and building the application, but then neglect to bring the application to the market in a structured, controlled way. As user acceptance leads to loyalty and drives ROI, no one can really afford to neglect the last part of the project. Mr. Mahoney presented best practices like building a launch team, and involving the marcom function as well as potentially concerned customer agents up front. In other words, he argued for not neglecting the internal audience; their acceptance of the speech initiative is just as important for overall success as the external end customers’ experience.

Nuance Conversations Europe was concluded with a presentation by Vlad Sejnoha, Nuance’s Chief Scientist. The main message I recall is that the traditional categories that prevail today in the speech world (network speech, embedded speech, dictation) are bound to become blurred in the coming decade.

All in all, it was a great event in a great setting; if speech technology and/or its European market penetration indeed evolve as quickly as announced over the next 12 to 24 months, I’d love to be present next time as well.

Job: Skype Voice Services Program Manager (London, UK)

Monday, May 1st, 2006

See the Monster job ad.