Voice interfaces are finally ready for prime time, thanks to mature technology and strong user demand. However, you wouldn’t always know it from looking at some of the products out there.
Based on what we know, here are five common obstacles that you might need to overcome to make your voice UI simpler and more satisfying to use.
1) It Is Not Exactly Clear How It Works
When your only input is a microphone, people have no idea what your product can (and cannot) do. As a result, they try things far beyond your system’s capabilities and end up disappointed.
This is why it’s a good idea to frame your voice interface through its visual design, its name, and the way it introduces itself.
By having your voice interface ask a specific question (like ‘What clothes are you looking for?’), you’ll help users focus on topics it can answer, rather than oddball questions like ‘Who is Keyser Söze?’ or even simply ‘When is the shop open?’.
2) You Haven’t Considered Privacy and Social Contexts
People are happiest using voice interfaces somewhere they aren’t likely to be heard: at home, in their cars, or even in the ‘privacy’ of a crowded street. At work, or in static public spaces like the train, it’s a different matter.
Asking people to shout the name of the medical condition they need help with is unlikely to be a good idea in most contexts.
Naturally, voice will not be appropriate for every task in every social context. That doesn’t mean, however, that voice can never be used for sensitive information. In the privacy of their own home, people will most likely prefer dictating secret information, such as their Netflix password, to typing it on an awkward on-screen keyboard.
3) Your UI Has a Short Memory
Most conversations involve a series of related statements or commands (‘What was the last message from my husband? Send him a reply.’).
Voice systems that lose track of the conversation (‘Searching the web for “Send him a reply”…’) are infuriating. We expect to build on what’s already been said, as we would in a conversation with another person.
Make your system smart enough to keep key information in mind and build on it to interpret later queries. For example, ‘Just the formal ones’ should be understood as narrowing the results when it comes right after ‘Show me the shirts on offer’.
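To make the idea of ‘keeping key information in mind’ concrete, here is a minimal sketch of context carryover, assuming a toy keyword matcher instead of a real language-understanding service. The catalogue, the `DialogueContext` class and the `handle` function are all hypothetical names for illustration. A query that names a category starts a new result set; a follow-up that names no category refines the previous results instead of being treated as a brand-new search.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical product catalogue standing in for a real backend.
CATALOGUE = [
    {"name": "Oxford shirt", "category": "shirts", "style": "formal"},
    {"name": "Linen shirt", "category": "shirts", "style": "casual"},
    {"name": "Poplin shirt", "category": "shirts", "style": "formal"},
    {"name": "Chinos", "category": "trousers", "style": "casual"},
]
CATEGORIES = sorted({item["category"] for item in CATALOGUE})
STYLES = ("formal", "casual")


@dataclass
class DialogueContext:
    """Remembers what the previous turn was about (hypothetical design)."""
    last_category: Optional[str] = None
    last_results: List[dict] = field(default_factory=list)


def handle(utterance: str, ctx: DialogueContext) -> List[dict]:
    text = utterance.lower()
    # A full query names a category: start a fresh result set and remember it.
    for category in CATEGORIES:
        if category in text:
            ctx.last_category = category
            ctx.last_results = [i for i in CATALOGUE if i["category"] == category]
            return ctx.last_results
    # A follow-up such as "just the formal ones" names no category:
    # narrow the previous results rather than starting a new search.
    for style in STYLES:
        if style in text and ctx.last_results:
            ctx.last_results = [i for i in ctx.last_results if i["style"] == style]
            return ctx.last_results
    # Nothing to build on: the caller should ask a clarifying question.
    return []
```

With a shared context, ‘Show me the shirts on offer’ returns the three shirts, and a following ‘Just the formal ones’ returns only the Oxford and Poplin shirts. The same follow-up in a fresh context returns nothing, which is the cue to ask what the user means rather than guess.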
4) It Is Exclusively Voice-Based
There are tasks that a purely voice-based UI will never be good at. Reading out a long list of options, one after the other after the other, is time-consuming and tedious.
Consider combining voice with a graphical UI, using each one’s strengths to compensate for the other’s weaknesses. A screen, for example, can display several options at once in a manageable way.
Take Amazon’s Echo: it has a companion mobile app, so the shopping list you’ve built up by voice can be viewed and edited on screen instead of being read out painfully, item by item.
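One way to decide which channel to use is sketched below, assuming a device that may or may not have a screen attached. The `MAX_SPOKEN_ITEMS` threshold of three, the `Response` type and the `present_list` function are hypothetical choices for illustration, not taken from any real assistant platform. Short lists are simply spoken; long lists get a spoken summary plus the full list on screen; voice-only devices read the first few items and offer to continue.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: beyond this many items, reading a list aloud becomes tedious.
MAX_SPOKEN_ITEMS = 3


@dataclass
class Response:
    speech: str
    screen_items: Optional[List[str]] = None  # None means nothing to display


def present_list(items: List[str], has_screen: bool) -> Response:
    if len(items) <= MAX_SPOKEN_ITEMS:
        # Short list: speaking it is quick enough.
        return Response(speech="You have: " + ", ".join(items) + ".")
    if has_screen:
        # Long list: summarize aloud, show everything on screen for browsing and editing.
        return Response(
            speech=f"You have {len(items)} items. I've put the full list on your screen.",
            screen_items=items,
        )
    # Voice only: read a few items and offer the rest instead of reading everything.
    head = ", ".join(items[:MAX_SPOKEN_ITEMS])
    return Response(
        speech=f"You have {len(items)} items, starting with {head}. Want to hear the rest?"
    )
```

The point of the split is that voice stays the entry point while the screen handles the part voice is bad at: scanning and editing many items.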
5) You Don’t Think About the Audio Context
In a noisy environment, such as a moving car, a factory floor, or a room with loud music, your system will struggle. It’s obvious but often forgotten: a voice UI needs to be able to pick out the user’s voice clearly.
When designing a voice interface, consider explaining why the system is having trouble understanding (‘Sorry, I couldn’t hear that clearly – it’s noisy right now.’). It helps users understand why there’s a problem and how they might fix it.
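A simple way to tell ‘noisy’ apart from ‘unclear’ is to measure the background level before choosing an error message. The sketch below assumes you can get the ambient audio as float samples in the range -1.0 to 1.0 (for instance, the moments just before the user spoke). It computes the RMS level in dB relative to full scale (dBFS), 20·log10(RMS), and compares it to a hypothetical `NOISY_THRESHOLD_DBFS` of -30 dBFS that you would need to tune for your own hardware. Production systems would likely use a more robust noise estimate than a single RMS value.

```python
import math
from typing import Sequence

# Assumption: ambient level (dBFS) above which we attribute failures to noise.
# This value is illustrative and would need tuning per device and microphone.
NOISY_THRESHOLD_DBFS = -30.0


def level_dbfs(samples: Sequence[float]) -> float:
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def reprompt(background: Sequence[float]) -> str:
    """Pick an error message that tells the user *why* recognition failed."""
    if level_dbfs(background) > NOISY_THRESHOLD_DBFS:
        return ("Sorry, I couldn't hear that clearly. It's noisy right now; "
                "could you move somewhere quieter or speak closer to the mic?")
    return "Sorry, I didn't catch that. Could you say it again?"
```

For example, a background signal with an RMS of 0.5 sits at about -6 dBFS and triggers the ‘it’s noisy’ message, while one with an RMS of 0.001 (-60 dBFS) gets the generic reprompt.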
If you control the hardware, a more sophisticated audio capture system will work better, as Amazon did with the Echo’s seven-microphone array. If the only audio input available is a smartphone mic, consider providing an alternative way to interact, such as physical or on-screen buttons.