Google previewed a new skill for its Google Assistant during Google I/O this week. Called Google Duplex, it makes calls on our behalf, conducting conversations in a natural-sounding, flowing manner to help carry out real-world tasks.
Any sufficiently advanced technology is indistinguishable from magic
Arthur C. Clarke
It is an overused phrase, banned in journalism, but this editor honestly felt it applied here after hearing Google’s demos of Duplex making table reservations and booking hair appointments unaided.
If you’ve not heard Google Duplex do its thing then listen to the example below:
Duplex scheduling a hair appointment:
Audio courtesy of Google
Eerie-sounding, isn’t it? Google Duplex has even been mentioned in relation to the Turing test, which Alan Turing proposed in 1950 as a way to determine whether a machine’s behavior is indistinguishable from that of a human. High praise indeed. Admittedly, we only got to hear the best examples, though we would love for Google to release a ‘gag reel’ of Google Duplex’s earlier conversations!
The step change made with Google Duplex is its ability to hold natural-sounding conversations, thanks to advances in understanding, interacting, timing and speaking, which means that recipients don’t have to adapt to talking to a machine. It’s this ability to successfully ‘fool’ the person at the other end of the line that prompted philosophical and ethical concerns.
Google has since responded by stating that Google Duplex would appropriately identify itself during its conversations.
How does Google Duplex achieve this?
Google Duplex is built around a Recurrent Neural Network (RNN) created with the TensorFlow Extended (TFX) machine learning platform. Speech processing is handled by an Automatic Speech Recognition (ASR) engine and a Text To Speech (TTS) engine, with the latter controlling intonation depending on the circumstances.
ASR converts the recipient’s speech to text, the text is analyzed in context, and the response is converted back to speech for the recipient by the TTS engine.
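To make that flow concrete, here is a minimal Python sketch of a single conversational turn passing through the three stages. It is purely illustrative: the class DuplexStylePipeline and the fake_asr, fake_model and fake_tts stand-ins are hypothetical names of our own, not anything Google has published, and real ASR, RNN and TTS components would take their place.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DuplexStylePipeline:
    """Hypothetical three-stage loop: speech -> text -> reply text -> speech."""
    recognize: Callable[[bytes], str]          # stand-in for the ASR engine
    respond: Callable[[str, List[str]], str]   # stand-in for the context-aware model
    synthesize: Callable[[str], bytes]         # stand-in for the TTS engine
    history: List[str] = field(default_factory=list)

    def handle_turn(self, audio_in: bytes) -> bytes:
        heard = self.recognize(audio_in)            # 1. speech to text
        reply = self.respond(heard, self.history)   # 2. analyze the text in context
        self.history.extend([heard, reply])         # remember both sides of the turn
        return self.synthesize(reply)               # 3. text back to speech


# Toy stand-ins so the sketch runs without any speech libraries.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_model(heard: str, history: List[str]) -> str:
    if "what time" in heard.lower():
        return "Around noon, please."
    return "Hi, I'd like to book a haircut."


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = DuplexStylePipeline(fake_asr, fake_model, fake_tts)
first = pipeline.handle_turn(b"Hello, how can I help?")
second = pipeline.handle_turn(b"Sure, what time works for you?")
```

Keeping the three stages as swappable callables mirrors the description above: each engine can be replaced without touching the conversational loop.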
To achieve the required quality of interaction, Google Duplex is trained in narrow domains, such as booking a hair appointment. Training is undertaken in real time and is supervised by a human operator who monitors the interactions and intervenes as and when appropriate. These highly trained instructors keep overseeing the training until the system performs at the quality level required. At this point, Google Duplex is free to operate on its own.
Is Google recording every conversation? In the UK and many US states, you only need one party’s consent to legally record a call. However, certain US states require consent from both parties. Will Google Duplex refuse to operate when calling those states, or will it ask for permission? That would be a conversation killer if we ever heard one.
Google Duplex, as it expands into other domains, may also need to know more personal information in order to fulfill your requests. We can decide not to provide that information, but to the detriment of the quality of the service that Google Duplex can provide.
Speaking and listening like us
Nuances around timing are also employed to help hold a natural conversation. Informed by user studies, Google was able to match its latency to people’s expectations.
Replying rapidly to a “Hello” and pausing before giving a more considered answer to a question better mimics how we hold a conversation. Additionally, Google has employed speech disfluencies such as “erms” and “hmms” to create breaks during the conversation, which make the speech produced by Google Duplex sound even more natural.
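As a rough illustration of the idea, the following Python sketch picks a reply delay and an optional filler word based on what was just heard. The function plan_reply, the delay values and the “Um...” filler are assumptions of ours for demonstration only; Google has not published the rules or timings it actually uses.

```python
GREETINGS = {"hello", "hi", "hey"}


def plan_reply(heard: str, reply: str):
    """Return (delay_in_seconds, text_to_speak) for one turn.

    The delays and the filler word are made-up illustrative values.
    """
    words = [w.strip("?!.,") for w in heard.lower().split()]
    # A bare greeting expects an almost immediate answer.
    if words and words[0] in GREETINGS and len(words) <= 2:
        return 0.2, reply
    # A question gets a longer pause and a disfluency, as if thinking it over.
    if heard.strip().endswith("?"):
        return 1.0, "Um... " + reply
    # Anything else gets a middling delay and no filler.
    return 0.5, reply


greeting_plan = plan_reply("Hello?", "Hi there.")
question_plan = plan_reply("What time would you like?", "Around noon works.")
statement_plan = plan_reply("We are fully booked.", "No problem.")
```

A production system would presumably learn these timings from data rather than hard-code them, but the sketch shows how the kind of utterance can drive both the pause and the filler.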
Understanding human responses is even more challenging, as we use complex, sometimes contradictory and often unstructured sentences that rely on context, all against background noise. During a longer conversation, “OK for 4” may refer to the time of the reservation or to the number of people.
Example of a complex statement:
Audio courtesy of Google
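The “OK for 4” ambiguity can be sketched in a few lines of Python. This is a toy, keyword-based resolver that uses the previous question as context; the function interpret_number and its slot names are hypothetical, and Duplex itself relies on a trained neural network rather than hand-written rules like these.

```python
import re
from typing import Optional


def interpret_number(utterance: str, last_question: Optional[str]) -> dict:
    """Guess what a bare number refers to from the question that preceded it."""
    match = re.search(r"\b(\d+)\b", utterance)
    if match is None:
        return {"slot": None, "value": None}
    value = int(match.group(1))
    question = (last_question or "").lower()
    if "how many" in question or "people" in question:
        return {"slot": "party_size", "value": value}
    if "what time" in question or "when" in question:
        return {"slot": "time", "value": value}
    # Without context, the number cannot be resolved safely.
    return {"slot": "ambiguous", "value": value}


as_party = interpret_number("OK for 4", "How many people?")
as_time = interpret_number("OK for 4", "What time would you like?")
no_context = interpret_number("OK for 4", None)
```

The same three words produce three different readings depending only on what came before, which is exactly why context matters so much in longer conversations.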
We’re going to commit another journalistic faux pas and bring out another overused quote here:
You only get one chance to make a first impression, you better get it right
For Google Duplex to become a daily part of our lives it has to get it right, for both us and the businesses. How many times have our digital assistants failed to do what we ask of them today?
As users, we often abandon functionality that fails to meet our expectations, but with Google Duplex, as the name implies, it flows both ways. If the business on the other end deems that its time has been wasted by a poor interaction, it won’t be long before we see businesses put the phone down on Google Duplex or block the numbers altogether.
Google states that Google Duplex is self-monitoring: when it encounters a task that it can’t complete autonomously, it signals a human operator, who completes the task. How this manifests itself in reality for some of these edge cases remains to be seen.
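One way such self-monitoring could work is sketched below in Python, purely as speculation on our part. The CallMonitor class, the confidence scores, the 0.6 threshold and the two-strikes rule are all assumptions; Google has not described how Duplex decides when to hand a call over.

```python
class CallMonitor:
    """Hypothetical monitor that escalates after repeated low-confidence turns."""

    def __init__(self, threshold: float = 0.6, max_low_turns: int = 2):
        self.threshold = threshold          # assumed minimum acceptable confidence
        self.max_low_turns = max_low_turns  # assumed number of strikes before escalating
        self.low_turns = 0

    def observe(self, confidence: float) -> str:
        """Record one turn's confidence and return 'continue' or 'handoff'."""
        if confidence < self.threshold:
            self.low_turns += 1
        else:
            self.low_turns = 0  # a confident turn resets the count
        if self.low_turns >= self.max_low_turns:
            return "handoff"    # signal the human operator
        return "continue"


monitor = CallMonitor()
decisions = [monitor.observe(c) for c in (0.9, 0.4, 0.3)]

recovering = CallMonitor()
recovered = [recovering.observe(c) for c in (0.4, 0.9, 0.4)]
```

Requiring consecutive low-confidence turns, rather than a single one, avoids escalating on one misheard word while still catching a conversation that is going off the rails.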
Predicted next steps?
Taken to its logical conclusion, we envisage Google Duplex making a difference in many scenarios.
Given a deeper level of integration with our cars than we have today, envisage a scenario where during a crash, if our airbags deploy, Google Duplex can summon the emergency services on our behalf automatically.
The onset of a stroke can result in trouble speaking, another scenario where Duplex could be utilized to summon the emergency services. With the simple placement of cost-effective smart home buttons around the home, combined with IFTTT, Google Duplex could summon help for the elderly in the event of a problem.
Tasks that seem mundane for the majority can be profoundly challenging for people with disabilities. In these scenarios Google Duplex will turn out to be truly liberating for those individuals.
Our experiences while on holiday, and those of expatriates, will be significantly enhanced once additional languages are supported.
Currently, Google has focused on Google Duplex initiating calls on our behalf. It’s not a huge leap to Duplex answering our calls when we’re busy and taking the appropriate actions.
Even today, it seems it would be capable of transcribing voicemails for us as a minimum. It could also take action on them: if the voicemail came from the hair salon that was busy when Duplex first called, Duplex could call the salon back or, more usefully, respond directly to the salon’s call-back.
With Google Duplex making calls, it’s natural to expand our thinking beyond smartphones. With Google’s ever-expanding reach, including our wrists, our TVs, speakers and digital displays in the kitchen, it’s entirely plausible that we’ll be able to initiate our requests from all of our connected devices without having to dig out our phones.
Literally and figuratively, you haven’t heard the last from Google Duplex, which Google plans to test this summer. In the meantime, you can listen to some additional examples below:
Duplex reserving a table:
Another restaurant reservation:
Asking for holiday hours:
Audio courtesy of Google