It’s time virtual assistants got smarter, and Amazon is taking a step forward by teaching Alexa how to hold more natural conversations. The new ‘Conversation Mode’, which will soon roll out to Echo devices, allows Alexa to participate in free-flowing conversations with users. What makes the process more natural is that users will not necessarily have to use the wake word ‘Alexa’ to keep the conversation going.
Under normal circumstances, users wake virtual assistants with wake words such as ‘Hey Siri’, ‘Hey Google’, or, in this case, ‘Hey Alexa’. This makes the exchange between the assistant and the user sound robotic. The new Conversation Mode should make the exchange resemble a normal conversation, in which people do not constantly use the names of those they are speaking to.
Conversation Mode can be enabled and disabled by voice command, so it can be turned on only when needed, which at least partially addresses the privacy concern.
Amazon introduced the new mode at its hardware event in 2020, and the feature is now ready to roll out. Last year, the company demoed it with two people talking about ordering a pizza. The demo showed how Alexa was able to follow the exchange and make the necessary changes to the order, including picking a topping and deciding on the size.
Echo users can draw Alexa into a conversation by saying -- “Alexa, join our conversation”. Now, while normal conversations are natural for us, it’s a hard space for artificial intelligence (AI) to navigate.
Amazon says it uses a combination of visual cues (which explains why Echo Show devices will be the first to get the feature) and acoustic cues to understand when the conversation is directed at the device and whether a reply is expected. The company has pointed out that this is a hard problem for AI, since many questions in a conversation could be meant either for the device or for a person; this is where the visual cues help. Amazon says it has developed a method for the device to work out whether it is being spoken to by estimating the head orientation of the people in the conversation who are within its field of view.
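Amazon has not published the details of its head-orientation method, but the basic idea of gating on head pose can be sketched with a toy decision function. Everything below — the angle names, the thresholds, and the structure — is an illustrative assumption, not Amazon's implementation:

```python
# Toy sketch of visual device-directedness. Assume some upstream
# vision model (hypothetical here) gives each visible face a head-pose
# estimate; a face roughly oriented toward the camera counts as
# "directed at the device".
from dataclasses import dataclass

@dataclass
class HeadPose:
    yaw_deg: float    # left/right rotation; 0 = facing the camera
    pitch_deg: float  # up/down rotation; 0 = level with the camera

def is_directed_at_device(pose: HeadPose,
                          yaw_limit: float = 30.0,
                          pitch_limit: float = 20.0) -> bool:
    """True if the head is oriented roughly toward the device.
    Threshold values are illustrative, not Amazon's actual numbers."""
    return abs(pose.yaw_deg) <= yaw_limit and abs(pose.pitch_deg) <= pitch_limit

def any_speaker_directed(poses: list[HeadPose]) -> bool:
    # The device should consider replying only if at least one person
    # in its field of view is looking toward it.
    return any(is_directed_at_device(p) for p in poses)
```

In a real system the pose estimates would come from a neural network running on camera frames; the point of the sketch is only that head orientation can be reduced to a per-person "directed or not" signal.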
“We trained a deep-neural-network model to infer the coefficients of the templates for a given input image and to determine the orientation of the head in the image. Then we quantised the weights of the model, to reduce its size and execution time. In our experiments, this approach reduced the false-rejection rate (FRR) for visual device directedness detection by almost 80% relative to the [standard perspective-n-point] approach,” Amazon explained in a blog post.
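Quantising a model's weights, as the quote describes, generally means mapping 32-bit floating-point values to low-bit integers plus a scale factor, shrinking the model and speeding up inference. Here is a minimal sketch of generic symmetric 8-bit post-training quantisation — a standard technique, not Amazon's specific scheme:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in
    [-127, 127] plus a single float scale. Storing int8 instead of
    float32 cuts weight storage roughly 4x."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction used at inference time.
    return [v * scale for v in q]
```

The trade-off is a small rounding error in the reconstructed weights in exchange for a smaller, faster model, which is why quantisation is common for on-device inference.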
In addition, Amazon is using an audio-based device voice activity detection (DVAD) model to pick up audio cues and figure out whether Alexa should respond. With the audio and visual models working in tandem, Amazon claims it was able to reduce false wakes triggered by ambient noise by 89 per cent and false wakes triggered by the device’s own responses by 42 per cent.
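Amazon has not said exactly how the two signals are combined. One common pattern for cutting false positives is to require independent detectors to agree before acting, which can be sketched as follows (the function name, score convention, and threshold are all assumptions):

```python
def should_respond(dvad_score: float,
                   visually_directed: bool,
                   audio_threshold: float = 0.7) -> bool:
    """Respond only when the audio model is confident the speech is
    device-directed AND the visual cue agrees. Requiring both signals
    is one way to suppress false wakes from ambient noise or from the
    device's own TTS output."""
    return dvad_score >= audio_threshold and visually_directed
```

An AND-style fusion like this trades a slightly higher miss rate for far fewer spurious responses, which matches the false-wake reductions the article describes.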
Once Conversation Mode is triggered by saying “Alexa, join the conversation”, a solid blue border appears around the Echo Show 10 screen along with a light blue bar at the bottom, indicating that requests are being sent to the cloud.
To end the conversation, you need to say -- “Leave the conversation.” Alexa will also automatically exit Conversation Mode if interactions stop for a certain period of time.
The new Conversation Mode feature is coming to Echo Show 10 (3rd Gen) devices first and then will roll out to more devices.