Human-Aided Training of Deep Reinforcement Learning for AI Self-Driving Cars


When my daughter first started to drive a car, I was excited for her because she’d have a newfound sense of independence and would be able to go places without constantly having to find someone to give her a lift. I knew she was a very responsible teenager with a good head on her shoulders, and so the hefty responsibility of driving a car was something she would take quite seriously. The key was to help make sure that she was proficient enough to drive safely. This was not only so that she personally would drive safely, but also so that she would be wary of the idiot drivers out there who could readily put her in danger or even ram into her.

I sat in the front passenger seat of the car and watched as she got settled into the driving position, which was both shocking and exhilarating. She had watched me drive, many times, and often asked questions about the nature of driving. She was attuned to the controls of the car and also to the need to watch the traffic and pedestrians. It’s one thing though to have learned by observing someone else, and a whole other ballgame when you are in the actual driver’s seat. She was shifting from a learned observer to an active participant who was going to be in charge of the driving task.

One of the difficulties when a parent helps teach their child to drive involves the dynamics between the parent and the child. In essence, the same kinds of tensions or issues that arise in their day-to-day interactions are going to play out while the child is driving a car. This can be a very dangerous situation. If the child gets berated by the parent or frustrated by the parent, the ability to control the car can be lessened rather than improved. Parents who get into a bitter fight with their child during the driving task are unwittingly putting them both in danger and equally imperiling others on the road. This is why many parents choose to have a professional trainer do this task, partially to avoid the parent-child dynamics and of course also because the professional trainer knows techniques of car driving that the parents perhaps don’t know or don’t know how to teach.

Another aspect of teaching someone to drive involves the frequency of feedback during the driving task. For a teenager learning to drive, some parents are tempted to provide a stream of commentary. Ease your foot off the brake, don’t let the car veer to the right, keep your eyes on the road, watch your speed, face forward and keep a straight back, notice that kid on a bike behind you, and so on. The parent often thinks this is helpful, but it can actually bombard the teen driver with too much information at once, and also disrupt their concentration. It can wear on the student driver and lead to anger toward the parent. This can then trigger a verbal battle, and the learning devolves into a massive fight between parent and child.

With my daughter, I gauged that a minimal amount of feedback would be best, and that it should occur on an appropriately timed basis. If the feedback comes too long after something has occurred, it will not have as much of a learning impact as feedback provided closer to the actual situation. The feedback needed to be timed to occur either just before, during, or just after something noteworthy. As the passenger and trainer, if you can see that the car is heading toward sideswiping a parked car, you’ve got to make a quick judgement as to whether to comment before it occurs, or to wait and see whether the sideswipe actually happens and, if it does not, perhaps point out afterward how close things came. The amount of feedback and its timing also needed to change over time. The more proficient my daughter became at driving, the more the feedback needed to be adjusted accordingly.

What does this have to do with AI self-driving cars?

At the Cybernetic Self-Driving Car Institute, we are using human trainers to aid in the driver training of the AI of self-driving cars.

There are various ways to teach the AI of a self-driving car about the driving task.

First, AI developers can try to directly program the AI to drive a car. This involves identifying various driving algorithms and writing the programming code that implements those algorithms. Unfortunately, this can be very labor intensive, it can take a long time, and the odds of the code covering all of the various facets of driving and the myriad driving situations are low. Thus, this form of “teaching” is often done for the core of the AI in terms of the driving task, and then other techniques are used to augment it.

Second, there is learning by being directly taught. In this case, the AI is almost like a blank slate and has been developed to observe what the human does, and then try to mimic those actions. This can be handy, but it also often lacks the context of the driving task. In other words, the human driver might show the AI how to turn the wheel or how to make a quick start, but the AI won’t know in what context these actions should occur.

Third, let the AI try driving a car and then have some form of self-correcting feedback that the AI uses to adjust accordingly. This is popular with the use of car driving simulations. You devise the AI so that it is able to drive a simulated car. You set up the simulation so that the simulated car should not go off the roadway. The AI tries to drive the simulated car, and when it goes off the simulated road it docks itself points. It has a goal of trying to score points rather than lose points. So, it gradually converges toward not driving off the road. It does this by self-correcting based on a set of constraints or limits, and some kind of reward or punishment points system.

This approach doesn’t work so well in the real world since you wouldn’t want an actual car to continually be going off the road or crashing into walls, and so instead this is done with a simulation. And the nice thing about a simulation is that you can have it run hundreds, thousands, or even millions of times. The simulated car can go on and on, for as many simulated instances as needed, in order for the AI to catch onto what to do.
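To make the points system concrete, here is a minimal sketch in Python of that kind of self-correcting loop. It uses tabular Q-learning, which is one common technique for this and not necessarily what any particular self-driving system uses. Everything in it is a hypothetical toy made up for illustration: the five-position road, the random drift, the reward values of +1 and -10, and the names step, train, and greedy_action.

```python
import random

# Hypothetical toy road: lateral positions 0..4.
# Positions 0 and 4 are off the road; 1..3 are on it.
ACTIONS = (-1, 0, 1)  # steer left, hold, steer right
OFF_ROAD = {0, 4}


def step(position, action, rng):
    """Apply steering plus random drift; return (new_position, reward, done)."""
    drift = rng.choice((-1, 0, 0, 1))
    new_position = max(0, min(4, position + action + drift))
    if new_position in OFF_ROAD:
        return new_position, -10.0, True   # docked points, run ends
    return new_position, 1.0, False        # scored a point, keep driving


def train(episodes=5000, alpha=0.05, gamma=0.9, epsilon=0.1, seed=0):
    """Run many simulated drives and return the learned score table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}
    for _ in range(episodes):
        position = 2  # start in the middle of the road
        for _ in range(50):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # occasionally try something new
            else:
                action = max(ACTIONS, key=lambda a: q[(position, a)])
            new_position, reward, done = step(position, action, rng)
            best_next = 0.0 if done else max(q[(new_position, a)] for a in ACTIONS)
            # Nudge the estimate toward the points received plus future points.
            target = reward + gamma * best_next
            q[(position, action)] += alpha * (target - q[(position, action)])
            position = new_position
            if done:
                break
    return q


def greedy_action(q, position):
    """Return the steering action the table currently rates highest."""
    return max(ACTIONS, key=lambda a: q[(position, a)])
```

Calling q = train() and then greedy_action(q, 1) asks the learned table which way to steer when the car is one step from the left edge of the road. Because leaving the road costs points, steering further toward the edge ends up rated worse than steering away from it.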

Machine learning comes into play here. An artificial neural network can be fed hundreds, thousands, or hundreds of thousands of pictures of the backs of cars, and gradually devise a pattern of what cars look like from behind. This then helps with the self-driving car’s cameras, in that when an image is captured while the car is driving along, the neural network can readily identify what is a car ahead of the self-driving car and what might not be a car. In a sense, this form of machine learning involves making lots of observations (looking at pictures of the backs of cars), and then finding patterns that capture the key aspects of those pictures.

Another way to learn the driving task involves having the AI try driving the car and then having a human offer commentary to the AI system.

This is quite similar to my points earlier about teaching my daughter to drive. A human “passenger” provides feedback to the driver (the AI in this case), and the driver then adjusts based upon the feedback provided. Some call this feedback a “critique” and the AI is set up as a deep reinforcement learner. It is considered “deep” because the learner is built on a deep neural network, meaning one with many layers, and it is considered a form of “reinforcement” because the feedback advises the AI to either do more of something or do less of something. It reinforces proper behavior and, let’s say, reinforces avoidance of improper behavior.

A recent research paper presented at the annual conference of the Association for the Advancement of Artificial Intelligence (AAAI) described an AI setup akin to this notion of providing critiques for deep reinforcement learning (in a paper entitled “Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces”). The researchers Garrett Warnell, Nicholas Waytowich, Vernon Lawhern, and Peter Stone (associated with the U.S. Army Research Lab, Columbia University, and the University of Texas at Austin) were interested in seeing if they could use human trainers to guide deep neural networks in performing somewhat complex tasks. Over a series of sequential decision-making points, the interaction of the humans with the autonomous agents was intended to provide guidance to the AI system. The researchers called the system Deep TAMER as it was an extension of the TAMER system, and opted to try this out on the Atari game of bowling. Their efforts were fruitful and showed that in relatively quick time the human trainers were able to dramatically aid the AI in improving its scores in the game.
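The core idea, learning a model of the trainer’s critiques and then picking the action that the model rates highest, can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the researchers’ implementation: a lookup table replaces the deep neural network, a scripted simulated_trainer function replaces the human, and the three lane states and three steering actions are hypothetical.

```python
import random

# Hypothetical action and state names, chosen only for this illustration.
ACTIONS = ("steer_left", "hold", "steer_right")
STATES = ("drifting_left", "centered", "drifting_right")


class CritiqueLearner:
    """Learns a table that predicts the trainer's critique for (state, action)."""

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.predicted = {}  # (state, action) -> predicted critique

    def predict(self, state, action):
        return self.predicted.get((state, action), 0.0)

    def act(self, state):
        # Pick the action the trainer is predicted to approve of most.
        return max(ACTIONS, key=lambda a: self.predict(state, a))

    def learn(self, state, action, critique):
        # Move the prediction a step toward the critique actually received.
        old = self.predict(state, action)
        self.predicted[(state, action)] = old + self.learning_rate * (critique - old)


def simulated_trainer(state, action):
    """Scripted stand-in for a human: +1 for steering toward lane center, else -1."""
    wanted = {
        "drifting_left": "steer_right",
        "centered": "hold",
        "drifting_right": "steer_left",
    }
    return 1.0 if action == wanted[state] else -1.0


def run_session(learner, rounds=200, seed=0):
    """Let the learner drive while the trainer critiques each action."""
    rng = random.Random(seed)
    for _ in range(rounds):
        state = rng.choice(STATES)
        action = learner.act(state)
        learner.learn(state, action, simulated_trainer(state, action))
```

After run_session(CritiqueLearner()), the learner’s act method has settled on whichever action the trainer approved of in each state. Note that the learner never sees a game score or a crash penalty; the trainer’s critiques are its only signal.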

This can be done with AI self-driving cars too.

Real-time feedback (or critique) is conveyed to the AI deep reinforcement learning system in order to improve the driving skills of the AI. Similar to my description of teaching my daughter, the feedback needs to be provided on a timely basis, and associated more or less immediately with what is unfolding during a driving effort. The feedback needs to be clear-cut and focused on the nature of the driving task.
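One way to picture the timing issue is as a credit assignment problem: when a critique arrives, which of the recent driving actions was it about? The Python sketch below shows one hypothetical scheme, assumed here purely for illustration, in which an action’s share of the credit halves for every half_life seconds that pass between the action and the critique. The function name, the half-life value, and the exponential shape are all assumptions, and for simplicity the sketch ignores critiques given just before an event.

```python
def credit_weights(action_times, feedback_time, half_life=1.0):
    """Split one critique across recent actions, favoring the most recent.

    action_times: timestamps (in seconds) of the actions taken.
    feedback_time: timestamp (in seconds) at which the critique arrived.
    Returns one weight per action; the weights sum to 1 unless every
    action came after the critique, in which case all weights are 0.
    """
    raw = []
    for t in action_times:
        delay = feedback_time - t
        if delay < 0:
            raw.append(0.0)  # simplifying assumption: no credit for later actions
        else:
            raw.append(0.5 ** (delay / half_life))
    total = sum(raw)
    return [w / total if total else 0.0 for w in raw]
```

For example, credit_weights([0.0, 1.0, 2.0], feedback_time=2.0) gives the action taken at the moment of the critique the largest share and the action from two seconds earlier the smallest. The longer the trainer waits, the more thinly the critique is spread over unrelated actions, which is the sketch’s version of feedback losing its learning impact.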

My daughter could filter out feedback that was not relevant to the driving task, such as when we also talked while she was driving about her homework due the next day or about how sunny the weather was. With the AI system, we instead constrain the feedback to a focused set of feedback commands. You could argue that we ought to add a Natural Language Processing (NLP) element to the AI driving system so that the human trainer could just speak as though they were talking to another human. This is indeed part of the direction we are headed in these efforts. We are not quite there just yet.

It is also important to gauge how the learner is doing during the feedback sessions. You want to ensure that the AI is not becoming overly reliant on the feedback. This could become an unintended consequence of the training, namely that the AI system starts to over-fit to the human trainer. With my daughter, her desire for independence was a counterweight that prevented her from becoming overly reliant on my feedback while she was learning to drive. Her goal was to get rid of the human trainer as soon as possible, thus gaining her own independence (and it wasn’t because she didn’t want me there, but only because she wanted to be able to do the driving on her own).

The AI for the self-driving car exhibits a high-dimensional state space, meaning that when you consider all of the decision making factors involved in driving a car there are many dimensions involved. Rather than using large amounts of training data to try and provide complete guidance, we augment the training via the use of human trainers. Their input aids in the AI self-adjusting internally, after having undertaken other forms of training.

For the AI system, here are some aspects of the feedback that are notable for the design of the human training:

Too Little Feedback

The human trainer has to judge how much feedback to provide to the AI self-driving car. Too little feedback can be bad because the AI isn’t getting what it needs in order to improve in the driving task.

Too Much Feedback

The human trainer has to be cautious about giving excessive feedback. Besides cluttering up what the AI is learning, there is the other danger of the AI becoming overly reliant on the human training.

Disruptive Feedback

The feedback can be inadvertently disruptive to the AI. If the AI was in the midst of ascertaining an action plan, and the feedback occurs, the AI might not complete the action plan or be otherwise distracted from the needed elements of the driving task.

Irrelevant Feedback

To control for irrelevant feedback, we constrain the set of feedback statements that the human trainer can provide. This admittedly is not the way of the real world, in that a human training another human could be as irrelevant as they wanted to be, but even human learners might have a difficult time figuring out which feedback is on-target to the task and which feedback has no bearing on the task. We preempt that from happening by having a strict list of feedback possibilities.
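As a rough illustration of what a strict list of feedback possibilities could look like in code, here is a minimal Python sketch. The specific commands in the Critique enumeration and the parse_critique helper are hypothetical examples, not the actual command set.

```python
from enum import Enum


class Critique(Enum):
    """Hypothetical closed set of feedback commands a trainer may give."""
    SPEED_UP = "speed up"
    SLOW_DOWN = "slow down"
    STEER_LEFT = "steer left"
    STEER_RIGHT = "steer right"
    GOOD = "good"


def parse_critique(text):
    """Return the matching Critique, or None if the text is not on the list."""
    # Normalize case and collapse runs of whitespace before matching.
    normalized = " ".join(text.lower().split())
    try:
        return Critique(normalized)
    except ValueError:
        return None  # irrelevant or unrecognized feedback is dropped
```

With this in place, parse_critique("Slow down") yields Critique.SLOW_DOWN, while a remark about the weather or tomorrow’s homework yields None and never reaches the learner.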

Inconsistent Feedback

The potential for inconsistent and even conflicting feedback can be a difficulty for the AI system. Suppose that the human trainer says to speed up when taking a curve, but then later on the human says to slow down when taking that same curve. What is the AI to make of this seemingly inconsistent or conflicting feedback? We have the AI system indicate to the human trainer that the feedback being provided seems inconsistent, thus at least alerting the human trainer to it (and the human trainer can then adjust if indeed they are needlessly being inconsistent).
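A simple way to flag this kind of conflict can be sketched in Python as follows. This is a hypothetical illustration: it assumes each driving situation can be given an identifier (such as a label for a particular curve), that critiques are plain strings, and that the table of opposites below is known in advance.

```python
# Hypothetical pairs of critiques that contradict each other.
OPPOSITES = {
    "speed up": "slow down",
    "slow down": "speed up",
    "steer left": "steer right",
    "steer right": "steer left",
}


class ConsistencyMonitor:
    """Remembers the last critique per situation and flags reversals."""

    def __init__(self):
        self.last_critique = {}  # situation id -> most recent critique

    def record(self, situation, critique):
        """Store the critique; return a warning string if it reverses the last one."""
        previous = self.last_critique.get(situation)
        self.last_critique[situation] = critique
        if previous is not None and OPPOSITES.get(previous) == critique:
            return (
                f"Inconsistent feedback for {situation!r}: "
                f"{previous!r} earlier, {critique!r} now"
            )
        return None
```

Recording “speed up” and then “slow down” for the same curve returns a warning that can be shown to the trainer, while the same two critiques on different curves return None. A real system would also need to allow for the trainer legitimately changing their mind as conditions change, which this sketch does not attempt.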

Apt, Contributory, Timely Feedback

The aim is to have human trainers who provide apt, contributory, and timely feedback to the AI system. This is accomplished by having human trainers who are well versed in doing this training and who are earnestly trying to do it. This is much the same as the training with my daughter, namely that I earnestly wanted to help her learn to drive (you can bet that was the case!). Imagine if she had someone in the car who was not so earnest and instead was maybe even purposely trying to confuse her about the nature of the driving task.


Providing human training to the AI of a self-driving car is a means to rapidly improve the AI’s capability for the driving task. It does not replace other means of teaching the AI to drive a car; instead, it is used to augment the other techniques. Designing the AI for this purpose is an added challenge and not something that the AI would normally be structured to do. It involves making the tactical and strategic AI driving elements ready to receive feedback and able to adjust according to the feedback provided.

We are all trying to head toward AI self-driving cars that are true self-driving cars, normally referred to as Level 5, which is the highest level and refers to a self-driving car that can drive in whatever manner a human could drive a car. Even so, just imagine if we not only taught the AI by using human trainers, but one day had AI self-driving cars that taught humans to drive.

I realize that the self-driving utopians want to eventually do away with all human driving, but I am not so sure that’s the world that everyone agrees should be our future. Some believe that we will always want to reserve the ability to drive a car. In a world of predominantly AI self-driving cars, humans might gradually forget how to drive a car. In that case, maybe we could have the AI of a self-driving car be the driver trainer for a human driver.

Thankfully, that day had not yet arrived when I got a chance to teach my daughter how to drive. It was a memorable experience for us both.

by Dr. Lance Eliot, AI Trends Insider