A Pioneering Scientist Explains ‘Deep Learning’


Buzzwords like “deep learning” and “neural networks” are everywhere, but so much of the popular understanding is misguided, says Terrence Sejnowski, a computational neuroscientist at the Salk Institute for Biological Studies.


Sejnowski, a pioneer in the study of learning algorithms, is the author of The Deep Learning Revolution (out next week from MIT Press). He argues that the hype about killer AI or robots making us obsolete ignores exciting possibilities happening in the fields of computer science and neuroscience, and what can happen when artificial intelligence meets human intelligence.

The Verge spoke to Sejnowski about how “deep learning” suddenly became ubiquitous, what it can and cannot do, and the problem of hype.

First, I’d like to ask about definitions. People throw around words like “artificial intelligence” and “neural networks” and “deep learning” and “machine learning” almost interchangeably. But these are different things — can you explain?

AI goes back to 1956 in the United States, where engineers decided they would write a computer program that would try to imitate intelligence. Within AI, a new field grew up called machine learning. Instead of writing a step-by-step program to do something — which is a traditional approach in AI — you collect lots of data about something that you’re trying to understand. For example, envision you’re trying to recognize objects, so you collect lots of images of them. Then, with machine learning, it’s an automated process that dissects out various features, and figures out that one thing is an automobile and the other is a stapler.

Machine learning is a very large field and goes way back. Originally, people were calling it “pattern recognition,” but the algorithms became much broader and much more sophisticated mathematically. Within machine learning are neural networks inspired by the brain, and then deep learning. Deep learning algorithms have a particular architecture with many layers that data flows through. So basically, deep learning is one part of machine learning and machine learning is one part of AI.
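As a rough illustration of the "many layers" idea, here is a minimal sketch in plain Python of an input passing through a stack of layers. The layer sizes, the random weights, and the helper names (`make_layer`, `dense_layer`, `forward`) are assumptions made for this example; the network is untrained and only shows the layered structure, not any particular system discussed in the interview.

```python
import random

def make_layer(n_in, n_out, rng):
    """Create one layer as (weights, biases) with small random weights."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense_layer(inputs, weights, biases):
    """Weighted sum per unit, followed by a ReLU nonlinearity."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers):
    """Pass the input through every layer in order."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

rng = random.Random(0)                 # fixed seed so the run is repeatable
sizes = [4, 8, 8, 3]                   # hypothetical: 4 inputs, two hidden layers, 3 outputs
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward([0.5, -1.0, 0.25, 2.0], layers)
print(len(layers), "layers,", len(output), "outputs")
```

"Deep" here simply means that `sizes` has several entries, so the data is transformed several times before an output is produced.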

What can deep learning do that other programs can’t?

Writing a program is extremely labor-intensive. Back in the old days, computers were so slow and memory was so expensive that they resorted to logic, which is what computers work on. That’s their fundamental machine language: manipulating bits of information. Computers were just too slow and computation was too expensive.

But now, computing is getting less and less expensive, and labor is getting more expensive. And computing got so cheap that it became much more efficient to have a computer learn than to have a human being write a program. At that point, deep learning actually began to solve problems that no human had ever written a program for before, in fields like computer vision and translation.

Learning is incredibly computationally intensive, but you only have to write one program, and by giving it different data sets you can solve different problems. You don’t have to be a domain expert. So there are thousands of applications for anything where there’s a lot of data.
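The "one program, different data sets" point can be sketched with a toy learner. The example below uses a classic perceptron, chosen here only because it fits in a few lines; it is not deep learning and not something Sejnowski describes. The same unchanged training function learns two different rules (logical AND and logical OR) purely from the examples it is given. The function names and the tiny data sets are assumptions for illustration.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights and a bias from (inputs, target) pairs with targets 0 or 1."""
    n_inputs = len(examples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            prediction = 1 if activation > 0 else 0
            error = target - prediction      # 0 when the guess is right
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(model, inputs):
    """Apply a trained (weights, bias) model to one input."""
    weights, bias = model
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# Two different data sets, one learning program.
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

and_model = train_perceptron(and_data)
or_model = train_perceptron(or_data)

print([predict(and_model, x) for x, _ in and_data])
print([predict(or_model, x) for x, _ in or_data])
```

Nothing in `train_perceptron` mentions AND or OR; the behavior comes entirely from the data, which is the property the answer above describes.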

“Deep learning” seems to be everywhere now. How did it become so dominant?

I can actually pinpoint that to a particular moment in history: December 2012 at the NIPS meeting, which is the biggest AI conference. There, [computer scientist] Geoff Hinton and two of his graduate students showed you could take a very large dataset called ImageNet, with 10,000 categories and 10 million images, and reduce the classification error by 20 percent using deep learning.

Traditionally on that dataset, error decreases by less than 1 percent in one year. In one year, 20 years of research was bypassed. That really opened the floodgates.

Deep learning is inspired by the brain. So how do these fields — computer science and neuroscience — work together?

The inspiration for deep learning really comes from neuroscience. Look at the most successful deep learning networks: convolutional neural networks, or CNNs, developed by Yann LeCun.

If you look at the architecture of CNNs, it’s not just lots of units; they’re connected in a fundamental way that mirrors the brain. The part of the brain that’s best studied is the visual system, and fundamental work in the visual cortex shows that there are simple and complex cells. If you look at the CNN architecture, there are the equivalents of simple cells and the equivalents of complex cells, and it comes directly from our understanding of the visual system.

Yann didn’t slavishly try to duplicate the cortex. He tried many different variations, but the ones he converged onto were the ones that nature converged onto. This is an important observation. The convergence of nature and AI has a lot to teach us and there’s much farther to go.
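One common way to read the simple-cell and complex-cell analogy, offered here as an assumption and not as LeCun's actual design, is that a convolutional filter plays the role of a simple cell (it responds to a local pattern at a specific position) and a pooling step plays the role of a complex cell (it responds to the same pattern over a small range of positions). The sketch below uses a hypothetical 5x5 image and a hand-written vertical-edge filter.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Keep positive responses only."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Take the strongest response in each non-overlapping size x size block."""
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

edge_kernel = [[-1, 1], [-1, 1]]           # responds to a dark-to-bright vertical edge

image = [[0, 0, 1, 1, 1] for _ in range(5)]    # edge between columns 1 and 2
shifted = [[0, 1, 1, 1, 1] for _ in range(5)]  # same edge, one pixel to the left

simple = relu(convolve2d(image, edge_kernel))            # "simple cell" layer
simple_shifted = relu(convolve2d(shifted, edge_kernel))
pooled = max_pool(simple)                                # "complex cell" layer
pooled_shifted = max_pool(simple_shifted)

print(simple[0], simple_shifted[0])   # responses move with the edge
print(pooled, pooled_shifted)         # pooled responses stay the same
```

The filter outputs differ when the edge shifts by one pixel, while the pooled outputs are identical, which is the position tolerance usually attributed to complex cells.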

How much does our understanding of computer science depend on our understanding of the brain?

Well, much of our present AI is based on what we knew about the brain in the ’60s. We know an enormous amount more now and more of that knowledge is getting incorporated into the architecture.

AlphaGo, the program that beat the Go champion, included not just a model of the cortex, but also a model of a part of the brain called the basal ganglia, which is important for making a sequence of decisions to meet a goal. There’s an algorithm there called temporal differences, developed back in the ’80s by Richard Sutton, that, when coupled with deep learning, is capable of very sophisticated plays that no human has ever seen before.

As we learn more about the architecture of the brain and begin to understand how its features can be integrated into an artificial system, that knowledge will provide more and more capabilities way beyond where we are now.
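For readers curious about the temporal-difference idea mentioned above, here is a minimal sketch of the textbook tabular TD(0) value update on a toy problem. It is not AlphaGo's implementation and involves no deep learning; the five-state chain, the reward of 1 at the goal, and the parameter values are all assumptions chosen to keep the example small and deterministic.

```python
def td0_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Estimate state values on a chain where the agent always steps right.

    Leaving the last state reaches the goal and pays a reward of 1;
    every other step pays 0. The goal itself has a value of 0.
    """
    values = [0.0] * (n_states + 1)          # extra slot is the terminal goal
    for _ in range(episodes):
        state = 0
        while state < n_states:
            next_state = state + 1
            reward = 1.0 if next_state == n_states else 0.0
            # TD error: how much better or worse things look one step later
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values[:n_states]

values = td0_chain()
print([round(v, 3) for v in values])
```

Each state's estimate is nudged toward the reward plus the discounted estimate of the next state, so information about the goal propagates backward through the sequence of steps. In this toy setup the estimates approach `gamma ** k`, where `k` is the number of steps remaining before the final, rewarded one.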

Read the source article in The Verge.