From Siri to Alexa to Google, we are surrounded by AI systems that have been designed with a single goal: to understand us.
We’ve seen incredible progress already. By performing hundreds of billions of calculations in the blink of an eye, the latest AI techniques can understand certain types of text with human-level accuracy. The challenge becomes significantly more daunting, however, when text is part of a larger conversation, where interpreting what the user means and deciding how to respond both require considering context. Still, chatbots like Facebook’s BlenderBot 2.0 seem to foreshadow far less frustrating interactions with AI.
But here’s the catch: The more complexity we add to these conversational AI bots, the more difficult it becomes to meet our expectation of a response in real time. BlenderBot 2.0 is a perfect example. Because it addresses key limitations of BlenderBot 1.0, including its lack of long-term memory, 2.0 is much more complicated than its predecessor. As a result, it’s harder to speed up the machine learning (ML) that makes it work behind the scenes.
The speed limit of conversational AI and chatbots
There’s no single secret to holding a natural conversation. Instead, it takes a mind-numbingly massive network of ML models, each of which solves a small piece of the puzzle in determining what to say next. One model might consider the user’s location, another the history of the interaction, and another the feedback that similar responses have received in the past, with every model adding precious milliseconds to the latency of the system.
The real limit for conversational AI, in other words, is our patience.
The depths of dependency hell
Our expectations for AI are fundamentally different in an academic context, where we’re content to wait hours or even days for results, compared to a live environment, where we demand an immediate response. For conversational AI bots in particular, every potential improvement must be weighed against the desire for lower latency.
That latency is a product of what’s called the “critical path”: the longest chain of dependent ML models that must run in sequence to get from an input, the user’s message, to an output, the bot’s response. Because the response can’t arrive before that chain finishes, the critical path sets the floor on latency. This is an old concept from project management, but it’s especially relevant to today’s ML networks, where every unnecessary step on that path costs time.
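To make the idea concrete, here is a minimal Python sketch that treats the models as a dependency graph. In project-management terms, the critical path is the chain of dependent steps with the largest total duration, because it bounds how quickly the whole job can finish. The model names, latencies, and dependencies below are hypothetical, chosen only for illustration; the sketch uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical per-model latencies in milliseconds (illustrative, not measured).
LATENCY_MS = {
    "spell_correct": 5,
    "entity_recognition": 12,
    "intent": 20,
    "user_location": 8,
    "retrieve_answers": 35,
    "rank_responses": 15,
}

# Each model maps to the set of models whose outputs it needs first.
DEPENDS_ON = {
    "spell_correct": set(),
    "user_location": set(),
    "entity_recognition": {"spell_correct"},
    "intent": {"spell_correct"},
    "retrieve_answers": {"entity_recognition", "intent", "user_location"},
    "rank_responses": {"retrieve_answers"},
}


def critical_path(latency, depends_on):
    """Return (total_ms, path) for the slowest chain of dependent models."""
    finish = {}     # earliest time each model can finish
    best_pred = {}  # predecessor on the slowest chain into each model
    # static_order() yields every model after all of its dependencies.
    for model in TopologicalSorter(depends_on).static_order():
        start, best_pred[model] = 0, None
        for pred in depends_on[model]:
            if finish[pred] > start:
                start, best_pred[model] = finish[pred], pred
        finish[model] = start + latency[model]
    # The critical path ends at whichever model finishes last.
    node = max(finish, key=finish.get)
    total, path = finish[node], []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return total, list(reversed(path))


print(critical_path(LATENCY_MS, DEPENDS_ON))
# (75, ['spell_correct', 'intent', 'retrieve_answers', 'rank_responses'])
```

In this toy graph, making `user_location` faster would not change the 75 ms total, since it is not on the critical path; only shortening or removing models on that chain reduces end-to-end latency.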
So how do you find the critical path? It all comes down to dependencies, which have long been a defining problem of software development in general. For any kind of connected software architecture, improving one application can force engineers to update the entire system. Sometimes, though, an update that’s essential for Application A is incompatible with Applications B, C, and D.
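A toy illustration of that bind, with hypothetical version ranges: application A needs version 2.0 of a shared library, while B, C, and D only accept older releases. The sketch simply checks which applications a candidate version would break; the app names and ranges are invented for the example.

```python
# Hypothetical version ranges each application accepts for one shared library,
# as (minimum inclusive, maximum exclusive) pairs of (major, minor) tuples.
ACCEPTED_RANGES = {
    "A": ((2, 0), (3, 0)),  # A's essential update needs 2.0 or later
    "B": ((1, 4), (2, 0)),
    "C": ((1, 0), (1, 9)),
    "D": ((1, 8), (2, 0)),
}


def incompatible_apps(new_version, accepted=ACCEPTED_RANGES):
    """Return, sorted by name, the apps whose accepted range excludes new_version."""
    return sorted(
        app
        for app, (low, high) in accepted.items()
        if not (low <= new_version < high)
    )


print(incompatible_apps((2, 0)))  # ['B', 'C', 'D']
print(incompatible_apps((1, 8)))  # ['A']
```

No single version satisfies all four applications, which is the dead end the paragraph describes.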
This is known as “dependency hell.” And without extraordinary attention to detail, machine learning dependencies take that frustration to new depths.
Normal software dependencies rely on APIs that communicate the simple, discrete state of a given application, such as a cell in a spreadsheet changing from red to green. APIs allow engineers to develop each application somewhat independently while ensuring that they stay on the same page. With ML dependencies, however, engineers deal with abstract probability distributions, so it’s not obvious how a change to one model will affect the larger ML network. Only by mastering these nuanced relationships between models can we make conversational AI a reality, and a real-time one at that.
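A small, hypothetical illustration of why probabilistic interfaces are harder to reason about than discrete ones. Two versions of an upstream intent model agree on the top label, so a check on the label alone would pass, yet a downstream component tuned to the old model’s confidence levels behaves differently. The intents, probabilities, and 0.8 threshold are invented for the example.

```python
# Hypothetical outputs of two versions of the same upstream intent model
# for one user message. Both versions agree on the most likely intent.
OLD_MODEL_OUTPUT = {"request_software": 0.86, "ask_policy": 0.10, "other": 0.04}
NEW_MODEL_OUTPUT = {"request_software": 0.62, "ask_policy": 0.30, "other": 0.08}

# Hypothetical downstream component, tuned against the old model: it only
# auto-routes a request when the top intent clears a fixed confidence threshold.
ROUTING_THRESHOLD = 0.8


def top_intent(probs):
    """The 'discrete' view of the output: just the most likely label."""
    return max(probs, key=probs.get)


def downstream_action(probs, threshold=ROUTING_THRESHOLD):
    """Route automatically if confident enough, otherwise ask a clarifying question."""
    label = top_intent(probs)
    if probs[label] >= threshold:
        return "route:" + label
    return "clarify"


# The label-level contract is unchanged...
assert top_intent(OLD_MODEL_OUTPUT) == top_intent(NEW_MODEL_OUTPUT)
# ...but the system's behavior is not.
print(downstream_action(OLD_MODEL_OUTPUT))  # route:request_software
print(downstream_action(NEW_MODEL_OUTPUT))  # clarify
```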
Saving time by skipping steps
To get good at conversational AI dependencies, you need to combine machine learning with human intuition.
For example, our conversational AI bot is designed to resolve employees’ requests, whether they want a PowerPoint license or have a question about the PTO policy. It turns out that even superficially simple issues lead you deep into dependency hell. The answer to a PTO question might be buried on page 53 of the employee handbook, and it might be different for a salesperson in Canada than for an engineer in Spain. Add on the challenge of ignoring irrelevant details, like the employee’s Hawaiian vacation plans, and you’ve got dozens of specialized ML models that must all operate as a unit.
The trick is determining which models — which steps in the critical path — are necessary to solve each issue. The first step is natural language understanding, or NLU, whose goal is to transform unstructured text into machine-actionable information. Our NLU is a pipeline of many ML models that correct for typos, recognize key entities, separate the signal from the noise, figure out the user’s intent, and so on. With this information in hand, we can start to winnow out unnecessary models downstream.
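The NLU pipeline described above can be sketched as a chain of stages, each consuming the previous stage’s output. In this minimal Python version, simple keyword rules stand in for the ML models; the lexicons, the stage order, and the `NLUResult` structure are all hypothetical, not a description of any production NLU.

```python
import re
from dataclasses import dataclass, field

# Toy stand-ins for the ML models; every lexicon below is hypothetical.
TYPO_FIXES = {"powerpont": "powerpoint", "licence": "license"}
KNOWN_ENTITIES = {"powerpoint": "software", "pto": "policy", "spain": "country"}
NOISE_WORDS = {"hawaii", "vacation", "trip"}
INTENT_KEYWORDS = {
    "request_software": {"license", "access", "install"},
    "ask_policy": {"policy", "how", "many", "days"},
}


@dataclass
class NLUResult:
    """Machine-actionable output handed to downstream models."""
    tokens: list
    entities: dict = field(default_factory=dict)
    intent: str = "unknown"


def correct_typos(text):
    """Stage 1: tokenize and map known misspellings to their corrections."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [TYPO_FIXES.get(t, t) for t in tokens]


def recognize_entities(tokens):
    """Stage 2: tag tokens that match known entity types."""
    return {t: KNOWN_ENTITIES[t] for t in tokens if t in KNOWN_ENTITIES}


def drop_noise(tokens):
    """Stage 3: discard details that should not influence the answer."""
    return [t for t in tokens if t not in NOISE_WORDS]


def classify_intent(tokens):
    """Stage 4: pick the intent whose keywords overlap the message most."""
    scores = {name: len(kw & set(tokens)) for name, kw in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def run_nlu(text):
    tokens = correct_typos(text)
    entities = recognize_entities(tokens)
    signal = drop_noise(tokens)
    return NLUResult(tokens=signal, entities=entities, intent=classify_intent(signal))


result = run_nlu("Can I get a Powerpont licence before my Hawaii vacation?")
print(result.intent, result.entities)  # request_software {'powerpoint': 'software'}
```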
That means predicting what a helpful solution to the issue might be before analyzing the actual solutions that the company has available. An employee who asks for access to PowerPoint might benefit from a software license or a request form, but they almost certainly don’t want a map of the new office. By leveraging the information from our NLU process, we can predict which models to activate and which to bypass, through what’s called a “pre-trigger” system.
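A pre-trigger can be sketched as a gate between NLU and the downstream models: score each model’s likely usefulness for the predicted intent and run only those above a threshold. The model names, latencies, relevance scores, and 0.5 threshold below are hypothetical; a real system would presumably predict the scores with a learned model rather than read them from a table.

```python
# Hypothetical downstream models and their latencies in milliseconds.
DOWNSTREAM_LATENCY_MS = {
    "license_lookup": 30,
    "request_form_search": 25,
    "handbook_search": 60,
    "office_map": 40,
}

# Hypothetical relevance predictions: how likely each model is to produce a
# useful answer for a given intent. A lookup table stands in for a learned model.
RELEVANCE_BY_INTENT = {
    "request_software": {"license_lookup": 0.9, "request_form_search": 0.7,
                         "handbook_search": 0.2, "office_map": 0.01},
    "ask_policy": {"license_lookup": 0.05, "request_form_search": 0.3,
                   "handbook_search": 0.95, "office_map": 0.02},
}


def pre_trigger(intent, threshold=0.5):
    """Return (models_to_run, ms_skipped) for the given intent.

    ms_skipped is the total model time bypassed; the wall-clock saving is
    smaller if the skipped models would have run in parallel.
    """
    scores = RELEVANCE_BY_INTENT.get(intent, {})
    if not scores:
        # Unknown intent: run everything rather than guess.
        return sorted(DOWNSTREAM_LATENCY_MS), 0
    run = sorted(m for m, p in scores.items() if p >= threshold)
    skipped = sum(ms for m, ms in DOWNSTREAM_LATENCY_MS.items() if m not in run)
    return run, skipped


print(pre_trigger("request_software"))  # (['license_lookup', 'request_form_search'], 100)
```

Falling back to every model on an unknown intent trades latency for safety; a stricter design might ask a clarifying question instead.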
Given the abstract nature of the probability distributions involved, our pre-trigger system relies on both machine learning inputs and intuition-based rules from human experts. Ultimately, spending time where it counts is both an art and a science.
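One hedged way to picture the mix of learned scores and expert rules: the model’s scores make the first selection, then hand-written rules force specific models on or off. The `ExpertRule` structure, the rules themselves, and the scores in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ExpertRule:
    """Hypothetical hand-written override: when applies(ctx) is true,
    force `model` on (enable=True) or off (enable=False)."""
    model: str
    applies: Callable[[dict], bool]
    enable: bool


EXPERT_RULES = [
    # Hypothetical: policies differ by country, so run the handbook search
    # whenever the NLU found a country entity, even if the ML score is low.
    ExpertRule("handbook_search", lambda ctx: "country" in ctx["entities"].values(), True),
    # Hypothetical: an office map is never a helpful answer to a software request.
    ExpertRule("office_map", lambda ctx: ctx["intent"] == "request_software", False),
]


def select_models(ml_scores, ctx, rules=EXPERT_RULES, threshold=0.5):
    """Start from the ML selection, then apply expert rules in order (later rules win)."""
    selected = {m for m, p in ml_scores.items() if p >= threshold}
    for rule in rules:
        if rule.applies(ctx):
            if rule.enable:
                selected.add(rule.model)
            else:
                selected.discard(rule.model)
    return sorted(selected)


scores = {"license_lookup": 0.9, "handbook_search": 0.3, "office_map": 0.6}
ctx = {"intent": "request_software", "entities": {"spain": "country"}}
print(select_models(scores, ctx))  # ['handbook_search', 'license_lookup']
```

Here the first rule adds the handbook search the model under-scored, and the second removes the office map it over-scored.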
Making room to make progress with conversational AI bots
No one knows what conversational AI will look like in ten years. What we do know, however, is that we need to optimize our chatbots now to make room for future progress. If we want to maintain a conversational experience, we can’t keep adding more and more complexity without considering the latency of the whole system.
Contrary to science fiction, the “breakthroughs” we see in artificial intelligence are the product of many small, incremental improvements to existing models and techniques. The work of optimizing conversational AI isn’t made for the movies, and it rarely happens overnight. But it’s these years of tireless effort, not single sparks of genius, that are allowing chatbots to understand us and help us in real time.