Turing Test for AI Self-Driving Cars


By Dr. Lance Eliot, the AI Trends Insider

When my children were young, I used to teach them various games such as checkers, chess, Monopoly, and other fun pastimes. In doing so, I would watch in rapt fascination as they played. I was interested in how they played the games and how far they could go beyond whatever rudimentary tactics and strategies I might have shown them. It was intriguing to gauge how much cognitive capability they had to extend their initial learning and develop their own expanded strategies and tactics.

As I watched them make moves in the games (let’s take chess as an example), at first I could see them making the simplest foundational moves, along with their gambits to tie several moves together into a larger tactic or strategy. When I saw one of them play against another child of the same age, I realized that just by watching the chess moves I could pretty quickly estimate the age of the child, and even whether a child or an adult was playing.

I could even detect whether the game was being played by my own son or daughter, because I had gotten used to their line of play in chess. In other words, even if I could not see who was playing the chess game, if you gave me a list of the moves made by each participant, I could tell you which one was my son or daughter and which one was someone else. If you tried to trick me and gave me a list of game moves in which neither my son nor my daughter was playing, I could pretty reliably say that none of the players was them.

Of course, as they got older, their game playing became more sophisticated and it became harder to recognize their specific style or approach. It also took longer and longer stretches of play for me to figure out who was who. Early on, I could discern from the opening moves alone who was who. But as my children grew more complex in their thinking and their game playing, their style became increasingly hard to spot, and at times it was only after much of the game had been played that I could make an educated guess about who the player was.

Recently, researchers at the Universite de Toulouse and the Universite Paris-Saclay conducted an interesting study of play in the game Go. You are likely aware that Go has become a great fascination for AI, since its complexities differ from those of chess and make it a lively source of insight into how to program computers to play games. Some had predicted that it would be a long time before AI could play as well as the top-rated human Go players. But in 2017, Google’s AI Go-playing software AlphaGo beat the world’s top-ranked player at the time, and the software was granted a 9-dan ranking.

The researchers at Toulouse and Paris-Saclay were curious about how the game of Go was being played. They wanted to study the moves made by the players, including both human players and AI game-playing software packages. Remember how I mentioned earlier that when my children played chess I could recognize them by the moves they made? Well, these university researchers asked whether one could distinguish a human player from an AI player by the nature of the moves made.

What do you think they discovered?

Let me tell you what they did and what they found. They first assembled databases of Go games: 8,000 games played by humans, 8,000 games played by a software package called Gnugo (known for using a deterministic game-playing approach), 8,000 games played by a software package called Fuego (known for using a Monte Carlo approach), and 50 games played by the top AI winner, AlphaGo. From these games, the researchers built weighted directed network graphs to help identify patterns of play.
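To make the idea of a weighted directed network of moves a bit more concrete, here is a minimal Python sketch. It assumes each game has already been reduced to a sequence of move labels; the labels "A", "B", and so on are hypothetical placeholders, and the researchers’ actual encoding of moves (described in their paper) is not reproduced here. Each time one label immediately follows another, the edge between them gains weight, and a node’s total outgoing weight serves as a crude hub indicator.

```python
from collections import defaultdict


def build_move_graph(games):
    """Build a weighted directed graph from sequences of move labels.

    Each game is a list of hypothetical move labels. An edge (a, b) is
    counted each time label b immediately follows label a, so the edge
    weight is the number of times that transition occurs.
    """
    weights = defaultdict(int)
    for moves in games:
        for src, dst in zip(moves, moves[1:]):
            weights[(src, dst)] += 1
    return dict(weights)


def out_strength(graph):
    """Total outgoing edge weight per node, a crude indicator of hubs."""
    strength = defaultdict(int)
    for (src, _dst), weight in graph.items():
        strength[src] += weight
    return dict(strength)


# Toy input: three short "games" made of placeholder labels.
games = [["A", "B", "C"], ["A", "B", "D"], ["B", "C", "A"]]
graph = build_move_graph(games)
print(graph)                # {('A', 'B'): 2, ('B', 'C'): 2, ('B', 'D'): 1, ('C', 'A'): 1}
print(out_strength(graph))  # {'A': 2, 'B': 3, 'C': 1}
```

A real analysis would use a proper graph library and community-detection metrics; the point here is only how game records turn into weighted, directed links.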

They found that some of the networks had hubs or communities, essentially areas of the network with a dense set of incoming and outgoing links (their mathematical metrics are elaborated in their paper “Distinguishing Humans from Computers in the Game of Go: A Complex Network Approach” by Coquide, Georgeot, and Giraud). In their analysis, these hubs or communities appeared readily and frequently in the network graphs of the computer players’ moves, and they seemed to be only weakly linked to other areas of the network. In contrast, the human players’ networks appeared to have fewer such communities, and interestingly those communities tended to be strongly linked to other areas of the network.

In short, the researchers contend that this telltale sign acts as a kind of signature, making it presumably possible to distinguish whether a computer or a human was playing. We might quibble with that claim, since their sample (especially the 50 AlphaGo games) is smaller than one would want for a truly convincing analysis. Also, one might wonder whether the play characteristics of computer players will change over time: perhaps this is just an early stage in the evolution of AI Go playing, and with further advances the AI might begin to show a pattern more akin to the human one.

Anyway, the researchers went even further outside the box and proposed an interesting idea. They suggested that it might be feasible to devise a form of the Turing Test based on game play, using the moves to ascertain which player was the human and which was the computer. If you aren’t familiar with the Turing Test, allow me a moment to explain what it is all about, and why this out-of-the-box thinking is interesting.

The Turing Test is named after the famous mathematician Alan Turing, whom you’ve likely heard of in your math classes or in the movies and documentaries made about his life. In 1950, he wrote a paper on the question of how we could discern whether computers can think. Rather than getting mired in what “thinking” is (you would first need to define the nature of thinking to be able to say whether something does it), he proposed that we construct a test that would presumably illustrate whether someone or something was able to think.

The test is rather simple to describe. Imagine that you had a human behind one door and an AI system behind another door, and a human interrogator stood outside the two doors and asked questions of the two hidden players. The questions and answers would be passed back and forth in writing, which eliminates the chance of telling the human from the AI merely by their voice or appearance. When Turing wrote his paper, there was no texting like we have now, so such a test is much easier to envision today than it would have been to arrange logistically in the 1950s (back then the computer would have been a gigantic hardware system with far less computing power than today’s typical hand-sized smartphone).

The interrogator then asks questions of each of the two players, and eventually the interrogator either is able to discern which one is the computer or cannot tell the difference between them, in which case the computer is considered equivalent to the human player. If the computer is indistinguishable from the human player, it has passed the Turing Test and (apparently) demonstrated that it is a thinking system, since it equaled a thinking human. Notice that this sidesteps the whole issue of what “thinking” is: the test is solely about whether the AI can do as well as a human who thinks, so we have no need to understand how thinking actually operates.

The Turing Test is frequently mentioned both inside and outside the AI field. It has become a kind of legendary icon, though many who refer to it aren’t actually aware of its downsides.

 Let’s cover some of those downsides.

One obvious concern is that the human being used in the Turing Test as a player must be intelligent enough that we would grant that they are able to appropriately play this game. In other words, if we placed a nitwit behind one of the doors and they were mentally out of it, we would be hard pressed to say that just because the AI matched the human that it means that the AI really has shown itself to be intelligent per se.

An equal concern is the nature of the interrogator. If the interrogator does not know what they are doing, they might ask questions that do little to probe for intelligence. Imagine that the interrogator asks the two to say how they feel. One says it feels really good and happy, while the other says it feels sad and downtrodden. Tell me, which of the two is the human and which is the AI? You can’t know.

I had one smart aleck tell me that they would ask each player to calculate pi to a million digits, and whichever of the two could give the answer must be the computer. Sigh. I pointed out that the human might have a computer with them (we haven’t said that they cannot) and might use it to do the calculation. Or the AI could calculate it but has been programmed not to fall for such a silly trap, and pretends that it cannot, thus appearing as unable as an unaided human. If both players then indicated they could not calculate pi to a million digits, could you conclude that the computer was indistinguishable from the human player? I think not.

There is a slew of other problems with the Turing Test as a way of discerning whether a computer might be considered a thinking thing. I won’t go into them here, but I caution you not to blindly believe that the Turing Test will one day soon allow us to declare that AI has finally arrived. Whatever test we all decide to use needs to be rigorous enough that we would all feel confident in what it actually reveals.

Now, back to the researchers who claim they can distinguish between humans and computers playing Go. Their added idea was that for the Turing Test we could dispense with asking questions of the two players and instead have them play a game, such as Go. We could then analyze their game play afterward, and if we could not find a statistically significant difference in their play, we might say that the computer has passed this modified version of the Turing Test.
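One way to make “no statistically significant difference” concrete is a permutation test, sketched below in Python. It assumes each game has already been reduced to a single number per player, some hypothetical play feature such as a hub score; the feature, the 0.05 threshold, and the function names are illustrative assumptions, not the researchers’ procedure.

```python
import random
from statistics import mean


def permutation_p_value(xs, ys, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean feature values.

    xs, ys: per-game values of a hypothetical play feature for the
    human player and the computer player, respectively.
    """
    rng = random.Random(seed)
    observed = abs(mean(xs) - mean(ys))
    pooled = list(xs) + list(ys)
    n = len(xs)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle away the human/computer labels and re-split the games.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


def passes_game_turing_test(human_values, computer_values, alpha=0.05):
    """Hypothetical pass rule: no significant difference at level alpha."""
    return permutation_p_value(human_values, computer_values) >= alpha
```

Note that a high p-value only means the test failed to detect a difference; it does not prove the two players are equivalent, which echoes the concerns about the original Turing Test.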

Of course, this modified version of the Turing Test is not much better than the original. For example, we need to decide what game is going to be played. Is Go an appropriate game? Maybe some other game instead? Either way, playing a game covers only a narrow swath of intelligence as we know it. In other words, does the ability to play a game, whether it be Go or chess or whatever, really embody the full range of what we consider thinking to involve? I don’t think it does, and I am guessing you agree with me.

I do, though, applaud the researchers for raising the idea. It points out that the questions themselves need not be the only focus of a Turing Test; the way someone responds over time also matters, in the same sense that the moves in a game presumably showcase some kind of “thinking” without our having to probe directly into what thinking is.

 What does this have to do with AI self-driving cars?

Glad you asked!

At the Cybernetic Self-Driving Car Institute, we are keenly interested in, and actively pursuing, how we will all be able to agree that an AI self-driving car has achieved the vaunted Level 5.

By the word “all” here, we mean that AI researchers, government officials, auto makers, the general public, and basically everyone would be able to agree when a self-driving car has earned a Level 5 badge of honor.

 In the standard definition for the levels of AI self-driving cars, the Level 5 is the topmost level and means that a self-driving car is driven by the AI in whatever manner that a human could drive a car. On the surface of things, maybe this seems like an airtight way to describe the Level 5. Unfortunately, it is not.

Suppose I have developed a self-driving car and I place it into a parking lot that I have purposely built and shaped (putting special markers on the roadway, putting handy barriers around the perimeter, etc.). I have a human drive the self-driving car, and watch as the human parks the car, maneuvers in and around the parking spots, and so on. We then make the human become a passenger, or maybe even remove the human from the car, and we have the AI try to do the same things that the human did.

If the AI is able to drive the self-driving car in the same manner that the human did, can we declare that this self-driving car is a Level 5?

 You might say, yes, indeed it is a Level 5 because it drove as a human would. Really? That’s all we need to do to achieve a Level 5?

 Notice that we severely constrained the driving environment. It is a confined space with specialized markings to make life easier for the AI system. Also, there wasn’t much driving needed per se in the constrained space. You couldn’t go faster than maybe 10-15 miles per hour. You didn’t need to avoid obstacles because we made sure the parking lot was pristine. And on, and on.

Do you still believe that this self-driving car merits a Level 5? I would contend that we can’t really say from such a limited test that the self-driving car fulfills the true sense of what we all seem to be thinking is a Level 5 self-driving car. If you are willing to say this is a Level 5 then you would presumably be willing to say that the cars on a Disneyland ride are also at a Level 5 because they can do whatever a human can do when driving those cars.

 In a sense, we need a better way to test whether a self-driving car is a true Level 5 – you might say we need a type of Turing Test.

 Could we just use the Turing Test as is? Not really. The Turing Test is aimed towards general intelligence, while the task of driving a car is more of a limited scope form of intelligence. We can though borrow from the Turing Test and say that if we want to test a self-driving car we should have some form of interrogatives about driving and some kind of judge or judges to be able to decide whether the self-driving car has achieved a Level 5.

We need an open-ended kind of test. If we revealed all of the test aspects beforehand, a clever developer could prepare the AI to pass that particular test, and yet the AI still might not achieve what we intend, namely that it be able to drive the car in whatever manner a human could drive a car.

 Speaking of which, when we say that a human could drive a car, are we referring to a novice driver that has just taken a beginner’s class in driving, or are we referring to the average adult driver, or maybe are we referring to a race car driver?  The nature of the comparison is crucial, just as we pointed out in the case of the Turing Test that the human behind the door must be someone we would agree is a thinking being of intelligence, otherwise we won’t have a suitable basis for comparison.

Some say to me that we’re getting overly complicated and that if an AI self-driving car is able to drive around a neighborhood without being aided by a human, it has certainly achieved a Level 5. Any typical neighborhood is okay by these pundits, as they claim that a typical location with pedestrians, dogs, potholes, and the rest is enough. I then ask them about freeways: shouldn’t a Level 5 self-driving car be able to handle those too? What about weather conditions, such as a sunny day versus a snowy day with ice on the roads?

 And, how long does the test need to be? If the self-driving car can successfully drive around for let’s say 10 minutes, is that enough time to have “proven” what it can do? I doubt any of us would think such a short time is sufficient. Also, suppose the self-driving car drives around for 5 hours, but does not encounter a child that runs out into the street. Wouldn’t we want to know that the AI is able to handle that kind of circumstance?

Some believe that we should have a detailed list of the myriad driving situations and use that list to test whether a self-driving car is really a Level 5. This might be better than no test, and better than a simplistic test, so it does get us somewhat closer to being able to agree on a Level 5 when we see it.
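The idea of a detailed list can be sketched as a scenario catalog that tracks coverage. The Python below is a minimal illustration under hypothetical assumptions: the three dimensions and their values are taken loosely from situations mentioned in this column (neighborhoods, freeways, weather, a child running into the street) and are nowhere near a complete catalog.

```python
from itertools import product

# Hypothetical scenario dimensions, loosely drawn from this column.
ROAD_TYPES = ["neighborhood", "freeway", "parking lot"]
WEATHER = ["sunny", "snow and ice"]
EVENTS = ["none", "child runs into street"]


def scenario_catalog():
    """Every combination of road type, weather, and event."""
    return set(product(ROAD_TYPES, WEATHER, EVENTS))


def coverage(passed):
    """Fraction of catalog scenarios the car has passed."""
    catalog = scenario_catalog()
    return len(catalog & set(passed)) / len(catalog)


def missing(passed):
    """Catalog scenarios not yet passed, in sorted order."""
    return sorted(scenario_catalog() - set(passed))


# A car tested only in a pristine, sunny parking lot covers 1 of 12 scenarios.
passed = [("parking lot", "sunny", "none")]
print(coverage(passed))
print(len(missing(passed)))
```

A catalog like this also illustrates the limitation raised earlier: any fixed, published list is exactly what a clever developer could tune the AI to pass, so full coverage would be necessary evidence rather than proof of Level 5.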

Keep in mind that we’re not getting bogged down in semantics about whether something can be labeled as a Level 5. Level 5 is an important marker of progress in the field of self-driving cars. It is the ultimate goal for all self-driving car makers that want to achieve a true self-driving car. The issue is that we might not know when we have gotten there. Without some agreed, substantive means to test for it, we’ll likely have false claims that it has been achieved. This will confuse and confound everyone, including other AI developers, government regulators, and the general public. Let’s all work together on a Turing Test for self-driving cars, and whoever comes up with it, we might agree to name it after that person (your chance for immortality!).

This content was originally posted on AI Trends.
