For people with a physical impairment, using switches and buttons to control everyday technology and devices is not always easy. Although voice control could be a viable solution, vocal interfaces for so-called ‘assistive devices’ are not yet widely implemented, for several reasons:
- Users who could benefit from voice control often have a speech pathology as well, which makes even state-of-the-art speech recognizers virtually useless.
- A user’s voice may change over time due to progressive speech impairments, the environment, changing body position, etc. The voice control solution therefore has to adapt and learn ‘spontaneously’ for the system to keep working properly.
The ALADIN (Adaptation and Learning for Assistive Domestic Vocal Interfaces) project addresses these shortcomings of current solutions, making voice control a viable aid. The ALADIN interface learns what a given command means, which words the user employs, and what the user’s vocal characteristics are. This gives users much more flexibility and helps them overcome current limitations: they can formulate commands as they like, using the words they are able to pronounce. The commands are learned while users are using the device.
ALADIN is designed as a modular framework. The goal is to interface with existing control systems such as Z-Wave, X10, KNX or IR remotes, as well as to provide a native experience via a tablet interface. Currently we use KNX hardware for demonstration purposes, and we are working on a native television application and IR remote control integration.
KNX setup used during the iMinds The Conference demo
To cope with dysarthric speech patterns, ALADIN is able to learn vocabulary and induce grammar in a robust and adaptive manner. This means users do not have to adapt their speech to the system and can use their own words and sentences to control their home and devices.
We distinguish two phases: a training phase and a usage phase. In the training phase, a command is learned by speaking the desired vocal command and demonstrating the corresponding action with the manual control. For example, the user could say “Turn on the television” while pressing the standby button on the television remote control.
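As an illustration of what one training example might look like in software, the sketch below pairs a recorded utterance with a machine-readable description of the demonstrated action. The class names, field names and the `toggle_standby` label are hypothetical and chosen only for this example; they are not taken from the ALADIN code base.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SemanticFrame:
    """Description of one demonstrated action (hypothetical layout)."""
    device: str
    action: str


@dataclass(frozen=True)
class TrainingExample:
    """One vocal command paired with the action demonstrated alongside it."""
    audio: tuple          # raw audio samples; a placeholder for real recordings
    frame: SemanticFrame


@dataclass
class TrainingSession:
    """Collects examples while the user keeps using the manual controls."""
    examples: list = field(default_factory=list)

    def record(self, audio, frame):
        self.examples.append(TrainingExample(tuple(audio), frame))


# The user says "Turn on the television" while pressing the standby button.
session = TrainingSession()
session.record(
    audio=[0.0, 0.1, -0.1],  # stand-in for the recorded waveform
    frame=SemanticFrame(device="television", action="toggle_standby"),
)
print(session.examples[0].frame)
```

Note that no transcription is stored: the only supervision is the demonstrated action, which matches the idea that the system starts without prior speech knowledge.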
A vocabulary finding module uses a semantic frame description of the manual controls together with the vocal commands to find the words or phrases that constitute the user’s commands. The main challenge here is that, since the vocal user interface operates without prior speech knowledge, we work without segmentation and without knowledge of word order. The vocal command is processed into low-level (spectrographic) and intermediate-level (utterance-based) acoustic representations. The word finding is based on non-negative matrix factorization (NMF), which decomposes the utterance-based representations into a low-rank product of recurrent acoustic units, representing subwords, words or phrases, and their activations across sentences. For example, the learned ‘vocabulary’ could become “Turn on” and “the television”.
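The factorization step can be sketched as follows. This is a minimal, dependency-free illustration of NMF with the standard multiplicative updates for the squared (Frobenius) error, not necessarily the cost function or variant used in ALADIN. The matrix `V` is toy data: each column stands for the utterance-based representation of one command, and the six feature rows are invented for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rr in zip(v, wh) for x, y in zip(rv, rr))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative V (features x utterances) as W @ H.

    Columns of W play the role of recurring acoustic units; rows of H
    hold the activation of each unit in each utterance.
    """
    rng = random.Random(seed)
    n_features, n_utterances = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_features)]
    h = [[rng.random() + eps for _ in range(n_utterances)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[x * n / (d + eps) for x, n, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(h, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[x * n / (d + eps) for x, n, d in zip(wr, nr, dr)]
             for wr, nr, dr in zip(w, num, den)]
    return w, h


# Toy data: four utterances built from three recurring patterns
# (think "turn on", "the television", "the light").
V = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
]
W, H = nmf(V, rank=3)
print("reconstruction error:", frobenius_error(V, W, H))
```

Because the updates only ever multiply by non-negative ratios, `W` and `H` stay non-negative throughout, which is what lets the columns of `W` be read as additive parts of an utterance rather than arbitrary basis vectors.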
In the usage phase, the most likely command is induced from a vocal command by factorizing its utterance-based representation using the acoustic units found in training. Because the acoustic units were mapped to elements of semantic frames during training, the recognized action can then be sent to the device. The effectiveness of this approach has been evaluated on a database with spoken home automation commands [source], and is visualized below.
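A rough sketch of this decoding step, under simplifying assumptions: the acoustic units are held fixed, only the activations of the new utterance are estimated (again with multiplicative updates for the squared error), and each frame slot takes the value of its most strongly activated unit. The unit patterns, slot names and the per-slot arg-max rule are hypothetical and serve only to make the example concrete.

```python
def estimate_activations(features, units, iterations=200, eps=1e-9):
    """Find h >= 0 such that features ~ sum_k h[k] * units[k], units fixed."""
    k = len(units)
    h = [1.0] * k
    # Precompute W^T v and W^T W, where the units are the columns of W.
    wtv = [sum(u * f for u, f in zip(unit, features)) for unit in units]
    wtw = [[sum(a * b for a, b in zip(ui, uj)) for uj in units] for ui in units]
    for _ in range(iterations):
        den = [sum(wtw[i][j] * h[j] for j in range(k)) for i in range(k)]
        h = [h[i] * wtv[i] / (den[i] + eps) for i in range(k)]
    return h


def decode(features, labelled_units):
    """Return the most activated value for every slot of the semantic frame."""
    patterns = [pattern for _, pattern in labelled_units]
    activations = estimate_activations(features, patterns)
    best = {}
    for ((slot, value), _), act in zip(labelled_units, activations):
        if slot not in best or act > best[slot][1]:
            best[slot] = (value, act)
    return {slot: value for slot, (value, _) in best.items()}


# Hypothetical units as they might look after training:
# ((slot, value), acoustic pattern).
UNITS = [
    (("action", "switch_on"), [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
    (("action", "switch_off"), [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]),
    (("device", "television"), [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]),
    (("device", "light"), [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]),
]

# A noisy new utterance resembling "switch on" + "television".
utterance = [0.9, 1.1, 0.0, 0.1, 1.0, 0.8, 0.1, 0.0]
command = decode(utterance, UNITS)
print(command)
```

The resulting dictionary is the kind of structure that a device back end (for example a KNX gateway) could translate into an actual control message.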
Vocal interfaces can help users who are unable to easily use their upper limbs to operate regular remote control systems. There are numerous causes of reduced motor control of the limbs, such as Parkinson’s disease, stroke (CVA), multiple sclerosis (MS), amyotrophic lateral sclerosis (ALS), or spinal cord injury. Some of these conditions also cause a voice impairment, most notably stroke, Parkinson’s disease, and ALS.
During the development of ALADIN, we involve users throughout the design process: we chart their needs and expectations and test prototypes with them in iterative cycles.