| Website | vlingomobile.com |
| Category | Web |
| Phone | (617) 868-0227 |
| info@vlingomobile... | |
| Employees |
| Total funding | $26.5M |
| Series A: Charles River Ventures, Sigma Partners | $6.5M |
| Series B (4/08): Yahoo!, Charles River Ventures, Sigma Partners | $20M |
Cambridge-based Vlingo is trying to make voice-enabling applications easier with its own speech-to-text API for J2ME and Brew (Windows and Symbian support is planned for later this year). Using the API, developers will be able to translate a user’s voice to text and use it in their application as if it had been typed directly into the program. One of the company’s first examples was for local search and shopping: Vlingo voice-enabled a text box that you could fill out by holding down the talk button and saying a phrase, like “Pizza in San Francisco”. The system then fills in the form with what you said, letting you edit the text normally if it gets something wrong.
Vlingo plans to monetize the service by charging developers per month or per user. Its only direct competitor is Yap, though alternative mobile voice-to-text services such as Jott rely on human transcribers instead.
The team behind the service has significant experience in speech recognition. The two co-founders, Mike Phillips and John Nguyen, worked for SpeechWorks, which was acquired by ScanSoft, which then renamed itself Nuance. Nuance most recently paid $293 million for VoiceSignal, a company using speech recognition for mobile search in 21 languages.
| Stage | Live |
| Tags | voicetotext, text, voice, mobile, phone |
Vlingo’s system starts with a basic statistical language model to make the best guess about what you said. It then improves on that guess by taking into account context and both positive and negative user feedback, down to the individual user. Context helps by narrowing the set of words you could have said. For instance, if the context is an address, the possible street names are limited to the ones in that city. User feedback, whether you correct the system’s output or leave it unchanged, helps the system learn how you speak (e.g., correcting “Austin” to “Boston”).
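The idea of combining a base language model with context and per-user feedback can be illustrated with a small sketch. This is not Vlingo’s algorithm or API: the class name `ToyRecognizer`, its methods, the scores, and the update factors are all assumptions made up for illustration. It simply shows one plausible way a context filter and a per-user adjustment could re-rank candidate words.

```python
from collections import defaultdict


class ToyRecognizer:
    """Hypothetical re-ranking layer; names and numbers are illustrative only."""

    def __init__(self, base_scores):
        # base_scores: word -> prior from a general language model (made-up values)
        self.base_scores = dict(base_scores)
        # Per-user multiplicative adjustment for each word, starting at 1.0
        self.user_bias = defaultdict(lambda: defaultdict(lambda: 1.0))

    def best_guess(self, user, acoustic_scores, allowed=None):
        """Pick the highest-scoring word, optionally restricted by context."""
        best_word, best_score = None, 0.0
        for word, acoustic in acoustic_scores.items():
            # Context narrows the candidates (e.g. only street names in one city)
            if allowed is not None and word not in allowed:
                continue
            prior = self.base_scores.get(word, 1e-6)
            score = acoustic * prior * self.user_bias[user][word]
            if score > best_score:
                best_word, best_score = word, score
        return best_word

    def feedback(self, user, guessed, corrected=None):
        """Leaving the guess alone reinforces it; a correction shifts weight."""
        if corrected is None or corrected == guessed:
            self.user_bias[user][guessed] *= 1.1   # assumed reward factor
        else:
            self.user_bias[user][guessed] *= 0.5   # assumed penalty factor
            self.user_bias[user][corrected] *= 2.0  # assumed boost factor


rec = ToyRecognizer({"austin": 0.5, "boston": 0.5})
acoustic = {"austin": 0.55, "boston": 0.45}  # made-up acoustic match scores

first = rec.best_guess("alice", acoustic)
rec.feedback("alice", first, corrected="boston")
second = rec.best_guess("alice", acoustic)
other_user = rec.best_guess("bob", acoustic)
with_context = rec.best_guess("bob", acoustic, allowed={"boston"})
```

In this sketch, one correction from "austin" to "boston" is enough to flip the result for that user, while a different user is unaffected, and a context restriction overrides the raw scores entirely. A real system would presumably work on word sequences and acoustic features rather than single words with fixed scores.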