Master of Imitation: An Introduction to Imitation Learning in Artificial Intelligence

Reinforcement Learning has astonished the world, whether by defeating Go champions or through outstanding performance in complex video games. However, the elaborate training processes limit its use in real-world applications. In this blog post, we look at “Imitation Learning”, a machine learning method that uses expert knowledge to teach programs to behave like humans.


Since the historic defeat of top Go player Lee Sedol by AlphaGo, the computer program from Google DeepMind, the power of Reinforcement Learning has been evident to both research and industry. From mastering ancient board games to conquering virtual battlefields in StarCraft II and Dota 2, Reinforcement Learning has shown that it can handle complex tasks previously reserved for human intelligence. Beneath the surface of these achievements, however, lies the hard reality of the lengthy training processes involved. While Reinforcement Learning can be used successfully for simpler tasks, it is still far from accessible for a broad range of more complex real-world tasks.

Fortunately, many of these tasks are performed very well by human experts. In this blog post, we explore the core ideas of a method called “Imitation Learning”, which leverages the knowledge of these experts to develop programs that come very close to human performance.

Framework

First, let us briefly familiarize ourselves with the framework underlying Reinforcement Learning, which is best described by the following image. An “agent” is placed in an environment that is in a specific state (e.g. a car on a road, where the state tells the agent its speed and position). The agent can perform actions that move the environment into another state (e.g. accelerating to speed up), and each action also yields a certain reward (e.g. for staying within the lane boundaries and the speed limit). The agent then tries to act in such a way that it receives as much reward as possible. A more detailed explanation can be found in this blog post.
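The loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not a real driving simulator: the class `ToyRoad`, the policy `cautious_agent`, the speed limit, the lane width and the reward values are all assumptions made for this example.

```python
from dataclasses import dataclass

SPEED_LIMIT = 15.0       # assumed speed limit in m/s
LANE_HALF_WIDTH = 1.5    # assumed half lane width in m


@dataclass
class State:
    speed: float    # current speed in m/s
    offset: float   # lateral distance from the lane centre in m


class ToyRoad:
    """Hypothetical one-car environment used only for illustration."""

    def __init__(self):
        self.state = State(speed=10.0, offset=0.0)

    def step(self, accel, steer):
        # The action moves the environment into a new state ...
        self.state = State(self.state.speed + accel, self.state.offset + steer)
        # ... and yields a reward for staying in the lane and under the limit.
        in_lane = abs(self.state.offset) <= LANE_HALF_WIDTH
        in_limit = 0.0 <= self.state.speed <= SPEED_LIMIT
        reward = 1.0 if (in_lane and in_limit) else -1.0
        return self.state, reward


def cautious_agent(state):
    """Hand-written policy: speed up until the limit, steer back to the centre."""
    accel = 1.0 if state.speed < SPEED_LIMIT else 0.0
    steer = -0.5 * state.offset
    return accel, steer


def run_episode(env, policy, steps=20):
    """Let the policy act for a fixed number of steps and sum up the reward."""
    total = 0.0
    state = env.state
    for _ in range(steps):
        accel, steer = policy(state)
        state, reward = env.step(accel, steer)
        total += reward
    return total


cautious_return = run_episode(ToyRoad(), cautious_agent)
reckless_return = run_episode(ToyRoad(), lambda s: (2.0, 0.0))  # always accelerates
```

In this toy setting, the cautious policy collects more reward than the one that keeps accelerating past the limit. A Reinforcement Learning algorithm would have to discover such a policy by trial and error rather than have it written by hand.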

With Imitation Learning, we have a person demonstrate their competence at a task while we record every step. In the example above, we would ask a particularly good driver to drive through the city while we record the states in the form of camera and sensor data and the actions in the form of steering and throttle inputs.
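As a rough sketch, a recorded demonstration is just a list of (state, action) pairs. In the example below, the scripted function `expert_driver` stands in for the human driver, and a state is reduced to a (speed, lateral offset) pair instead of camera and sensor data. Both are simplifying assumptions, as are the toy dynamics.

```python
def expert_driver(state):
    """Stand-in for the human expert: hold 12 m/s and steer towards the lane centre."""
    speed, offset = state
    throttle = 0.5 * (12.0 - speed)
    steering = -0.5 * offset
    return throttle, steering


def record_demonstration(start_state, expert, steps):
    """Roll the expert forward and log every (state, action) pair."""
    demo = []
    state = start_state
    for _ in range(steps):
        action = expert(state)
        demo.append((state, action))
        speed, offset = state
        # Toy dynamics (assumed): actions are simply added to the state.
        state = (speed + action[0], offset + action[1])
    return demo


demo = record_demonstration(start_state=(8.0, 1.0), expert=expert_driver, steps=5)
for state, action in demo:
    print(state, action)
```

Each logged pair later becomes one training example: the state is the input and the expert's action is the target.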

In Reinforcement Learning (RL), an “agent” is an autonomous system or entity that makes decisions and performs actions in an environment in order to achieve specific goals. The agent learns through interaction with the environment and adapts its behavior to obtain the maximum reward.

Behavioral Cloning as a subfield of machine learning

This brings us to the first, very simple yet often effective, approach to Imitation Learning. In an exact replica of the environment, simply replaying the expert's actions would reproduce the expert's performance. To go beyond that replica, we can train a neural network to respond to new situations with the same action as the human demonstrator. Of course, reality is not perfect. The road the expert drove on yesterday might be wet today because it has rained, the road might be full of traffic because there is a concert at the weekend, or we might simply want to drive in a completely different country. In addition, all machine learning models have a certain degree of inaccuracy, which causes the agent to always deviate slightly from the path shown by the experts.
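At its core, behavioral cloning is ordinary supervised learning on the recorded (state, action) pairs. The sketch below makes two loud simplifications: it swaps the neural network for a one-dimensional linear model fitted by least squares, and the demonstration data is made up for illustration. The function names are hypothetical.

```python
def fit_linear_policy(states, actions):
    """Least-squares fit of action = w * state + b to the expert's data."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


# Made-up demonstrations: lateral offsets and the expert's steering response.
expert_offsets = [-1.0, -0.5, 0.0, 0.5, 1.0]
expert_steering = [0.5, 0.25, 0.0, -0.25, -0.5]

w, b = fit_linear_policy(expert_offsets, expert_steering)


def cloned_policy(offset):
    """Policy that imitates the expert on states it has (or has not) seen."""
    return w * offset + b


print(cloned_policy(0.8))  # steering for an offset that was never demonstrated
```

The cloned policy interpolates sensibly between demonstrated states here only because the made-up expert behaves linearly. With real sensor data, the fit is never exact, which is where the problems described next begin.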

All of this makes it increasingly difficult for the agent to follow the same paths as the expert, until it eventually strays too far from them and can no longer solve the task. For example, an autonomous vehicle trained with behavioral cloning would initially drive very close to the human path. But suppose the road is less busy than it was when the human drove, so the agent drives a little faster than the human and drifts slightly off the road in the next curve. If this happens several times, the vehicle moves farther and farther off the road until it can no longer recover and simply drives away. This is shown schematically in the following animation.

Furthermore, we must not forget that humans do not always act the same way in complex, real-world environments; for example, you might drive a little faster when you have an appointment than when you are going shopping. This inconsistency makes direct imitation even more difficult.
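The drift described in this section can be reproduced in a toy simulation. Everything here is assumed for illustration: a curve pushes the car outwards by `PUSH` per step, the expert steers back proportionally, and the cloned policy copies the expert with a small error, but only for offsets up to `SEEN_RANGE`, the largest offset assumed to appear in the demonstrations. Beyond that range, the clone has no data and applies no correction.

```python
PUSH = 0.05            # outward drift per step in a curve (assumed)
SEEN_RANGE = 0.13      # largest offset present in the demonstrations (assumed)
ROAD_HALF_WIDTH = 1.5  # the car has left the road beyond this offset (assumed)


def expert_steer(offset):
    """The expert always steers back towards the lane centre."""
    return -0.5 * offset


def cloned_steer(offset, error=0.03):
    """Imitates the expert near the demonstrated states, with a small error."""
    if abs(offset) <= SEEN_RANGE:
        return expert_steer(offset) + error
    return 0.0  # never-seen state: the clone does not know how to recover


def drive(steer_fn, steps=60):
    """Simulate the lateral offset over time for a given steering policy."""
    offset = 0.0
    trace = []
    for _ in range(steps):
        offset = offset + steer_fn(offset) + PUSH
        trace.append(offset)
    return trace


expert_trace = drive(expert_steer)
clone_trace = drive(cloned_steer)

print("expert max offset:", max(expert_trace))
print("clone final offset:", clone_trace[-1])
print("clone left the road:", clone_trace[-1] > ROAD_HALF_WIDTH)
```

The expert settles at a small, stable offset. The clone's small error is enough to push it just outside the demonstrated states after a few steps, and from there it keeps drifting until it leaves the road.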

Use of Reinforcement Learning

But Reinforcement Learning comes to the rescue! In modern Imitation Learning methods, a model is not trained merely to perform actions similar to those of a human; instead, it is rewarded when it behaves like the human demonstrator and penalized when it acts very differently. This reward can then be used to steer the agent back in the right direction when it begins to deviate from the learned path.

In our example above, as soon as the agent deviates too far from the road, it receives only a small reward, but it also recognizes that its reward increases again when it moves closer to the center of the road, allowing it to find its way back. This is illustrated in the schematic animation below.
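The idea of a reward that pulls the agent back can be sketched with a hand-written similarity reward. This is an assumption made for illustration only: methods such as IQ-Learn learn the reward from demonstrations instead of hard-coding it, and the function names, the `scale` parameter and the greedy agent are hypothetical.

```python
import math


def imitation_reward(agent_offset, expert_offsets, scale=0.5):
    """High reward close to states the expert visited, low reward far away."""
    nearest = min(abs(agent_offset - e) for e in expert_offsets)
    return math.exp(-((nearest / scale) ** 2))


def greedy_step(offset, expert_offsets, moves=(-0.2, 0.0, 0.2)):
    """Pick the lateral move whose resulting state earns the highest reward."""
    candidates = [offset + m for m in moves]
    return max(candidates, key=lambda o: imitation_reward(o, expert_offsets))


# Made-up expert data: the expert stayed close to the lane centre.
expert_offsets = [-0.1, 0.0, 0.1]

offset = 1.2  # the agent starts far from the centre of the road
for _ in range(10):
    offset = greedy_step(offset, expert_offsets)
final_offset = offset

print("final offset:", final_offset)
```

Because the reward rises steadily as the agent approaches the expert's states, even this naive greedy agent finds its way back to the lane centre from a state the expert never visited.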

IQ-Learn on the Robomimic Can task with a 6-DOF arm learning to collect cans and place them in the correct container (using 30 expert demos). Source: The Stanford AI Lab

Further links

  • Blog Post about Atari Games by DeepMind

  • Blog Post about beating Lee Sedol in Go by DeepMind

  • Blog Post about utilizing Inverse Reinforcement Learning in Google Maps by Google Research

  • Blog Post about IQ-Learn by the Stanford AI Lab

In fact, learning a reward instead of blindly trying to follow a human path captures much more of the knowledge and reasoning behind human behavior. To illustrate how powerful this approach can be even in very complex environments, this post includes examples from a blog post by the authors of IQ-Learn, one of the most powerful algorithms for Imitation Learning at the time of writing, which also builds on Inverse Reinforcement Learning. As these examples show, an agent can achieve strong performance with only 20-30 expert examples.

IQ-Learn on Minecraft when solving the “create waterfall” task (using 20 expert demos). Source: The Stanford AI Lab

Advancing companies with Imitation Learning

By harnessing the power of imitation, companies can unlock lucrative opportunities. From processing machine data and autonomous production to personalized recommendation systems on websites, the ability to replicate human expertise at scale holds immense potential for innovation and digital transformation. Algorithms such as IQ-Learn show that remarkable results can be achieved even with a modest number of examples.

With support from the FFG (Austrian Research Promotion Agency), STRG and FH St. Pölten are collaborating in a locally funded Austrian research project called STRG.agents to investigate the capabilities of reinforcement and imitation learning in the context of online web portals. The techniques we are researching, however, already have applications in industry.