Geoffrey Hinton, a professor and former Google engineering fellow, is known as the “godfather of artificial intelligence” because of his contributions to the development of the technology. A cognitive psychologist and computer scientist, he pioneered work on developing artificial neural networks and deep learning techniques, such as back propagation — the algorithm that allows computers to learn.
Hinton, 75, is also a 2018 winner of the Turing Award, colloquially referred to as the Nobel Prize of computer science.
With that background, Hinton made waves recently when he announced his resignation from Google and wrote a statement to The New York Times warning of the dire consequences of AI and of his regret over having been involved in its development.
Asked about a recent online petition signed by more than 27,000 technologists, scientists and others calling for OpenAI to pause research on ChatGPT until safety protocols can be created, Hinton called the move “silly” because AI will not stop advancing.
Hinton spoke this week with Will Douglas Heaven, senior editor for AI at MIT Technology Review, at the publication’s EmTech conference on Wednesday.
The following are excerpts from that conversation.
[Heaven] It’s been in the news everywhere that you’ve stepped down from Google. Can you start by telling us why you made that decision? “There were a number of reasons. There are always a bunch of reasons for a decision like that. One was that I’m 75, and I’m not as good at doing technical work as I used to be. My memory is not as good and when I program, I forget to do things. So, it was time to retire.
“A second was, very recently, I’ve changed my mind a lot about the relationship between the brain and the kind of digital intelligence we’re developing. I used to think that the computer models we were developing weren’t as good as the brain. The aim was to see if you could understand more about the brain by seeing what it takes to improve the computer models.
“Over the last few months, I’ve changed my mind completely, and I think probably the computer models are working in a completely different way than the brain. They’re using back propagation and I think the brain’s probably not. And a couple things have led me to that conclusion and one of them is the performance of GPT-4.”
Do you have regrets that you were involved in making this? “[The New York Times reporter] tried very hard to get me to say I had regrets. In the end, I said maybe I had slight regrets, which got reported as my having regrets. I don’t think I made any bad decisions in doing research. I think it was perfectly reasonable back in the ’70s and ’80s to do research on how to make artificial neural networks. It wasn’t really foreseeable — this stage of it wasn’t foreseeable. Until very recently, I thought this existential crisis was a long way off. So, I don’t really have any regrets over what I did.”
Tell us what back propagation is. This is an algorithm you developed with a couple of colleagues back in the 1980s. “Many different groups discovered back propagation. The special thing we did was to use it and show that it could develop good internal representations. And curiously, we did that by implementing a tiny language model. It had embedding vectors that were only six components and a training set that was 112 cases, but it was a language model; it was trying to predict the next term in a string of symbols. About 10 years later, Yoshua Bengio took the same net and showed it actually worked for natural language, which was much bigger.
“The way back propagation works: …imagine you wanted to detect birds in images. So an image, let’s suppose it was a 100-pixel-by-100-pixel image, that’s 10,000 pixels, and each pixel is three channels, RGB (red, green, blue in color), so that’s 30,000 numbers, the intensity in each channel of each pixel, that represent the image. The way to think of the computer vision problem is how do I turn those 30,000 numbers into a decision as to whether it’s a bird or not. And people tried for a long time to do that and they weren’t very good at it.
“But here’s the suggestion for how you might do it. You might have a layer of feature detectors that detects very simple features in images, like for example edges. So a feature detector might have big positive weights to a column of pixels and then big negative weights to the neighboring column of pixels. So, if both columns are bright, it won’t turn on. If both columns are dim, it won’t turn on. But if the column on one side is bright and the column on the other side is dim, it’ll get very excited. And that’s an edge detector.
“So, I just told you how to wire an edge detector by hand by having one column with big positive weights and the other column with big negative weights. And we can imagine a big layer of those detecting the edges of different orientations and different scales all over the image.
“We’d need a rather large number of them.”
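The hand-wired edge detector described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `edge_detector`, the weight of 4.0, the bias of -2.0, the sigmoid squashing function and the pixel intensities scaled to the range 0 to 1 are all hypothetical choices, not details from the interview.

```python
import math

# A 100 x 100 image with 3 color channels gives the 30,000 input numbers.
NUM_INPUTS = 100 * 100 * 3

def edge_detector(left_column, right_column, weight=4.0, bias=-2.0):
    """Hand-wired unit: big positive weights on one column of pixels,
    big negative weights on the neighboring column (values are assumed)."""
    n = len(left_column)
    positive = sum(weight * p for p in left_column)
    negative = sum(weight * p for p in right_column)
    # Average over the column height so the result does not depend on n.
    z = (positive - negative) / n + bias
    # Squash to an activity between 0 (off) and 1 (very excited).
    return 1.0 / (1.0 + math.exp(-z))

bright = [1.0] * 5  # a column of bright pixels
dim = [0.0] * 5     # a column of dim pixels

both_bright = edge_detector(bright, bright)  # stays mostly off
both_dim = edge_detector(dim, dim)           # stays mostly off
edge = edge_detector(bright, dim)            # bright next to dim: turns on
```

The unit responds strongly only when the two columns differ in the wired direction. A detector for the opposite contrast, or for another orientation or scale, would need its own weights, which is why a whole layer of them is needed.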
The edge in an image is a line? “It’s a place where the intensity goes from light to dark. Then we might have a layer of feature detectors above that detects combinations of edges. So, for example, we might have something that detects two edges that join at a fine angle. So, it would have a big positive weight to those two edges and if both of those edges are there at the same time, it’ll get excited. That would detect something that might be a bird’s beak.
“You might also in that layer have a feature detector that would detect a whole bunch of edges arranged in a circle. That may be a bird’s eye, or it might be something else. It might be a knob on a fridge. Then in a third layer you may have a feature detector that detects this potential beak, and it detects a potential eye, and it’s wired up so that if a beak and an eye are in the right spatial relation to one another, it says, ‘Ah, this might be the head of a bird.’ And you can imagine if you keep wiring it like that, you can eventually have something that detects a bird.
“But wiring all that up by hand would be very difficult. It would be especially difficult because you’d want some intermediate layers for not just detecting birds but also for other things. So, it would be more or less impossible to wire it up by hand.
“So, the way back propagation works is you start with random weights. So these feature detectors are just rubbish. So you put in a picture of a bird and the output says, like, 0.5 that it’s a bird. Then you ask yourself the following question: how can I change each of the weights in the network so that instead of saying 0.5 that it’s a bird, it says 0.501 that it’s a bird and 0.499 that it’s not?
“And you change the weights in the directions that will make it more likely to say a bird is a bird and less likely to say a non-bird is a bird.
“And you just keep doing that, and that’s back propagation. Back propagation is how you take a discrepancy between what you want, which is a probability of 1 that it’s a bird, and what you’ve got at present, which is a probability of 0.5 that it’s a bird, and send it backwards through the network so you can compute, for every feature detector in the network, whether you’d like it to be a bit more active or a bit less active. And once you’ve computed that, if you know you want a feature detector to be a bit more active, you could increase the weights coming from feature detectors that are active and maybe put in some negative weights to feature detectors that are off, and now you have a better detector.
“Back propagation is just going backwards through the network to figure out which feature detectors you want a little more active and which ones you want a little less active.”
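The procedure Hinton describes (start from random weights, compare the output with what you want, send the discrepancy backwards and nudge every weight slightly) can be illustrated with a very small network. The sketch below is a toy under stated assumptions, not the network from the 1980s work: the three made-up input numbers, the four hidden feature detectors, the sigmoid units, the cross-entropy error and the learning rate of 0.5 are all illustrative choices.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Compute hidden feature-detector activities and the bird probability."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One small weight change; returns the output seen before the change."""
    hidden, output = forward(x, w_hidden, w_out)
    # Discrepancy at the output (sigmoid unit with cross-entropy error).
    delta_out = output - target
    # Send the discrepancy backwards: how much each hidden detector
    # should become more or less active.
    delta_hidden = [delta_out * w_out[j] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    # Nudge every weight in the direction that reduces the discrepancy.
    for j, h in enumerate(hidden):
        w_out[j] -= lr * delta_out * h
        for i, xi in enumerate(x):
            w_hidden[j][i] -= lr * delta_hidden[j] * xi
    return output

random.seed(0)
x = [0.9, 0.1, 0.8]  # made-up numbers standing in for one bird picture
w_hidden = [[random.uniform(-0.5, 0.5) for _ in x] for _ in range(4)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(4)]

# Keep repeating the small change; the target 1.0 means "it is a bird".
history = [backprop_step(x, 1.0, w_hidden, w_out) for _ in range(200)]
```

With random starting weights the first entries of `history` are uninformative, and repeated steps push the output toward 1 for this picture. A real image classifier would train on many labeled pictures, including non-birds, rather than on a single example.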
Image detection…is also the technique that underpins large language models. This technique, you initially thought of it as almost like a poor approximation of what biological brains do, but it has turned out to do things that I think have stunned you, particularly in large language models. Why has that…almost flipped your thinking of what back propagation or machine learning in general is? “If you look at these large language models, they have about a trillion connections. And things like GPT-4 know much more than we do. They have sort of common-sense knowledge about everything. And so they probably know about 1,000 times as much as a person. But they’ve got a trillion connections and we’ve got 100 trillion connections, so they’re much, much better at getting knowledge into a trillion connections than we are. I think it’s because back propagation may be a much better learning algorithm than what we’ve got. That’s scary.”
What do you mean by better? “It can pack more information into only a few connections; we’re defining a trillion as only a few.”
So these digital computers are better at learning than humans, which itself is a huge claim, but then you also argued that’s something we should be scared of. Why? “Let me give you a separate piece of the argument. If a computer is digital, which involves very high energy costs and very careful calculation, you can have many copies of the same model running on different hardware that do exactly the same thing. They can look at different data, but the models are exactly the same. What that means is, they can be looking at 10,000 different subsets of the data and whenever one of them learns something, all the others know it. One of them figures out how to change the weights so it can deal with this data, and so they all communicate with each other and they all agree to change the weights by the average of what all of them want. Now the 10,000 things are communicating very effectively with each other, so that they can see 10,000 times as much data as one agent could. And people can’t do that.
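The weight-sharing idea in this answer can be sketched in Python as follows. It is a minimal illustration under stated assumptions: the model is a single straight line `y = w * x + b`, there are four copies rather than 10,000, and the names `desired_change` and `synchronized_step`, the synthetic data and the learning rate are hypothetical.

```python
def desired_change(weights, shard, lr=0.1):
    """The weight change one copy wants after looking only at its own data."""
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in shard:
        err = (w * x + b) - y  # prediction error on one example
        grad_w += err * x
        grad_b += err
    n = len(shard)
    return (-lr * grad_w / n, -lr * grad_b / n)

def synchronized_step(weights, shards):
    """Every copy starts from the same weights, proposes a change from its
    own shard, and all copies apply the average of the proposals."""
    changes = [desired_change(weights, shard) for shard in shards]
    avg_w = sum(c[0] for c in changes) / len(changes)
    avg_b = sum(c[1] for c in changes) / len(changes)
    return (weights[0] + avg_w, weights[1] + avg_b)

# Synthetic data from y = 2x + 1, split into four non-overlapping shards.
data = [(k / 10, 2 * (k / 10) + 1) for k in range(-20, 20)]
shards = [data[i::4] for i in range(4)]

weights = (0.0, 0.0)  # identical starting weights for every copy
for _ in range(500):
    weights = synchronized_step(weights, shards)
```

Because every copy applies the same averaged change, the copies stay identical, and each one benefits from data it never looked at itself. That is the property the interview contrasts with human learning, where knowledge cannot be transferred by copying weights.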
“If I learn a whole lot about quantum mechanics, and I want you to know a lot of stuff about that, it’s a long painful process of getting you to understand it. I can’t just copy my weights into your brain because your brain isn’t exactly the same as mine. So, we have digital computers that can learn more things more quickly and they can instantly teach it to each other. It’s like if people in the room could instantly transfer into my head what they have in theirs.
“Why is that scary? They can learn so much more. Take an example of a doctor. Imagine you have one doctor who’s seeing 1,000 patients and another doctor who’s seeing 100 million patients. You’d expect the doctor who’s seeing 100 million patients — if he’s not too forgetful — to have noticed all sorts of trends in the data that just aren’t as visible if you’re seeing [fewer] patients. You may have only seen one patient with a rare disease; the other doctor has seen 100 million patients… and so will see all sorts of regularities that just aren’t apparent in small data.
“That’s why things that can get through a lot of data can probably see structure in data that we’ll never see.”
OK, but take me to the point of why I should be scared of this. “Well, if you look at GPT-4, it can already do simple reasoning. I mean, reasoning is the area where we’re still better. But I was impressed the other day with GPT-4 doing a piece of common-sense reasoning I didn’t think it would be able to do. I asked it, ‘I want all the rooms in my house to be white. At present, there are some white rooms, some blue rooms and some yellow rooms. And yellow paint fades to white within a year. What can I do if I want them all to be white in two years?’
“It said, ‘You should paint all the blue rooms yellow.’ That’s not the natural solution, but it works. That’s pretty impressive common-sense reasoning that’s been very hard to do using symbolic AI because you have to understand what fades means and you have to understand temporal stuff. So, they’re doing sensible reasoning with an IQ of like 80 or 90. And as a friend of mine said, it’s as if some genetic engineers said, we’re going to improve grizzly bears; we’ve already improved them with an IQ of 65, and they can talk English now, and they’re very useful for all sorts of things, but we think we can improve the IQ to 210.”
I’ve had that feeling when you’re interacting with these latest chatbots. You know, that hair-on-the-back-of-your-neck uncanny feeling, but when I’ve had that feeling, I’ve just closed my laptop. “Yes, but these things will have learned from us by reading all the novels that ever were and everything Machiavelli ever wrote [about] how to manipulate people. And if they’re much smarter than us, they’ll be very good at manipulating us. You won’t realize what’s going on. You’ll be like a two-year-old who’s being asked, ‘Do you want the peas or the cauliflower,’ and doesn’t realize you don’t have to have either. And you’ll be that easy to manipulate.
“They can’t directly pull levers, but they can certainly get us to pull levers. It turns out if you can manipulate people, you can invade a building in Washington without ever going there yourself.”
If there were no bad actors — people with bad intentions — would we be safe? “I don’t know. We’d be safer than in a world where people have bad intentions and where the political system is so badly broken that we can’t even decide not to give assault rifles to teenage boys. If you can’t solve that problem, how are you going to solve this problem?”