Note: Make blog post then link to top-level menu!
"Text-to-Speech" "Speech-to-Text" ("instant message" OR "instant messaging" OR "instant messenger")
Notes
What is an IM-Voice Bridge, and what does it need to do?
A lot of the tawdriness associated with chatterbots results from the current text interface impasse. A truly interactive voice interface would do wonders for the "face" of AI, and help it break through the glass ceiling where it has been stuck for some decades. Clearly, Internet infrastructure is moving in the direction of the mobile platform, which is even more suitable for voice interaction than the desktop of yore.
An IM-Voice Bridge goes from text to speech as well as from speech to text. It takes standard instant messages and lets you hear them, and it lets you speak into an instant messenger and have your speech sent as text. A good application would include an animated avatar, or talking head, with lip-sync. A really good application would let you customize the avatar's hair, skin, clothes, and so on. Most instant messaging applications are based on XMPP, formerly known as "Jabber". Recently, even big holdouts like Skype and Facebook have come into line with the XMPP standard for instant messaging. In fact, XMPP instant messaging is the primary transport mechanism for conversational AI engines; in other words, textual conversation travels across the Internet in the form of instant messages.
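To make the last point concrete: on the wire, an XMPP chat message is a small XML stanza carrying a recipient address and a text body. The sketch below builds one with Python's standard library. The address `bot@example.org` and the helper name `chat_stanza` are made up for illustration, and a real XMPP client library would also handle the stream, namespaces, and authentication.

```python
import xml.etree.ElementTree as ET


def chat_stanza(to_jid: str, text: str) -> str:
    """Build a bare XMPP <message type="chat"> stanza (illustrative helper)."""
    message = ET.Element("message", {"to": to_jid, "type": "chat"})
    body = ET.SubElement(message, "body")
    body.text = text  # ElementTree escapes characters such as < and &
    return ET.tostring(message, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical recipient; any XMPP address (JID) would go here.
    print(chat_stanza("bot@example.org", "Hello, are you there?"))
```

Whatever a speech recognizer produces only has to end up in that body element, which is why the voice layer can stay independent of any particular chat service.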
Various elements of this concept have been implemented in assistive technologies for the benefit of disabled people. There have also been a number of similar products in the past, which for one reason or another became defunct. In the past, speech-to-text reporters (STTR), or "captioners", were people who manually transcribed speech to text for use by the disabled. Assistive technologies include both Screen Readers and Voice Browsers; for example, both have been used to speak instant messages aloud. More recently, there have even been moves toward "voice operating systems".
My primary interest is to be able to talk with AIs via any IM-Voice Bridge; however, such a system could equally be used by anyone for "hands-free" communication, while driving for example. Instant messaging and SMS are indeed closely compatible. Increasingly, there are mobile apps for hands-free SMS while driving.
If there were an existing application, such as an IM-Voice Bridge, then I could easily interface with any AI engine via XMPP (formerly "Jabber"). Ideally, an IM-Voice Bridge would include its own lip-synced animated avatar. However, there is still no decent consumer IM-Voice Bridge application available, neither on the web nor on mobile (with or without an avatar). I want a talking avatar app for instant messaging that can plug and play with any XMPP IM account on the backend. After investigating assistive technology such as screen readers and voice browsers, I've concluded they are not adequate.
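The plug-and-play idea can be sketched as a small class whose speech engines and IM transport are all swappable. This is only a rough outline of what I have in mind, not an existing product: the class name `ImVoiceBridge` and its three callables are hypothetical stand-ins for a real text-to-speech engine, a real speech recognizer, and a real XMPP client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImVoiceBridge:
    """Hypothetical two-way bridge; all three callables are stand-ins."""

    synthesize: Callable[[str], bytes]    # text-to-speech engine (assumed)
    recognize: Callable[[bytes], str]     # speech-to-text engine (assumed)
    send_message: Callable[[str], None]   # IM transport, e.g. an XMPP client (assumed)

    def on_incoming_message(self, text: str) -> bytes:
        """IM -> voice: turn a received instant message into audio to play."""
        return self.synthesize(text)

    def on_spoken_input(self, audio: bytes) -> str:
        """Voice -> IM: transcribe the user's speech and send it as a message."""
        text = self.recognize(audio).strip()
        if text:  # do not send empty transcriptions
            self.send_message(text)
        return text
```

An avatar front-end would sit on top of `on_incoming_message`, driving lip-sync from the same audio it plays, while the backend account is just whatever `send_message` is wired to.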
Currently, there is no web-based voice-in / voice-out solution available for instant messaging. I am mystified as to why there seem to be no plug-and-play animated web avatar front-ends for IM. There ought to be a product that can speak instant messages on the fly (IM-to-Speech).
I've searched hard for any Windows 7-compatible desktop avatar (talking head) front-end that can easily accept *ANY* XMPP/Jabber IM account. Generally, an avatar system will include text-to-speech (TTS), aka speech synthesis, with lip-sync.
I have succeeded in uncovering two enterprise-level voice-IM gateways, by 4DK and Gold Systems.
I've asked many in the industry for advice, including:
VoiceXML
Wikipedia has a helpful "Comparison of screen readers", which includes a listing of "Contemporary screen readers". Screen readers are available for Windows, Mac, open source, and mobile platforms. Apple iOS has a built-in screen reader accessibility feature called "VoiceOver". (Apparently, Apple iChat does do IM to speech, but I don't know about speech to IM.) Generally speaking, assistive screen readers do not provide an ideal solution for people without special needs. Perhaps so-called voice browsers are a step further in that direction.
Fire Vox is a browser plugin, an open source extension for the Mozilla Firefox web browser that transforms it into a self-voicing application.
SkypeTalking is a Python program that reads incoming and outgoing chat messages using the Skype API and your screen reader.
WebAnywhere is a web-based screen reader for the web, and so requires no special software to be installed.
Can a "Voice Browser", such as http://getvocal.com, be used to hear and speak generic instant messaging (IM) sessions?
CPEeK-Up
"CPEeK-Up" .. telephony text <=> speech bridge to help speech-disabled people
SoftBridge
SIMBA enables a text-based IM client to communicate with an IP phone, a telephone or a cellular phone. Similar to the Deaf Telephony SoftBridge, SIMBA provides a bridging service, which enables a Deaf user with a text-based IM client to communicate with a hearing user with a telephone or a cellular phone. Using an IM client, a Deaf user sends a text message to a telephone user through SIMBA. SIMBA establishes a call to the telephone user and converts text messages to speech via a Media Adapter Server (MAS). When the called user picks up the phone, he/she hears the synthetic voice and speaks to the Deaf user. After receiving audio from the hearing user, SIMBA then controls the MAS to convert the incoming audio stream to text and sends the text message to the Deaf user.
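The relay described above can be modelled in a few lines. This is a toy sketch of the message flow only, not SIMBA's actual code: the class and method names are invented, the two converter callables stand in for the Media Adapter Server, and holding synthesized speech until the phone is picked up is my own assumption about how the call setup behaves.

```python
from typing import Callable, List


class SimbaCallSketch:
    """Toy model of a text <-> telephone relay (names are hypothetical)."""

    def __init__(self, tts: Callable[[str], bytes], stt: Callable[[bytes], str]):
        self._tts = tts                          # stands in for MAS text-to-speech
        self._stt = stt                          # stands in for MAS speech-to-text
        self.answered = False
        self._pending: List[bytes] = []          # speech waiting for pickup (assumed)
        self.played_to_phone: List[bytes] = []   # audio heard by the telephone user
        self.sent_to_im: List[str] = []          # text delivered to the IM user

    def im_user_sends(self, text: str) -> None:
        """IM side: convert the text to speech, play it now or queue it."""
        audio = self._tts(text)
        if self.answered:
            self.played_to_phone.append(audio)
        else:
            self._pending.append(audio)

    def phone_picked_up(self) -> None:
        """Telephone side: once answered, play everything that was queued."""
        self.answered = True
        self.played_to_phone.extend(self._pending)
        self._pending.clear()

    def phone_user_speaks(self, audio: bytes) -> None:
        """Telephone side: convert incoming audio to text for the IM user."""
        self.sent_to_im.append(self._stt(audio))
```

Swap the telephone leg for a microphone and speaker on the same device as the IM client, and this is essentially the IM-Voice Bridge I am looking for.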
Google API (+ App Engine ?)
Wondering how to bridge undocumented @googlechrome speech recognition with undocumented @googlevoice API .. in order to *speak* to chatbots
Open Source
Desktop
http://smart-butler.com .. 2009 text to speech (TTS) virtual agent for instant messaging (IM) .. USD 15 .. Microsoft Agent
Enterprise Solutions
Defunct
Mobile (iPhone & Android: Hands Free SMS)
http://chatopus.com .. by @asiayeah .. "Speak instant messages (text-to-speech support)" [Palm OS platform]
Listen To Incoming SMS On Your Android Phone
Listen To Incoming SMS And Respond via Speech-To-Text [Android]
References