
Beyond ChatGPT, Russell on AI


Stuart Russell gives an informed and easy-to-follow set of responses to questions raised at the Commonwealth Club of California. There is little here that adds to my understanding, but he puts it forward concisely.

Stuart Russell is a Professor of Computer Science at the University of California, Berkeley, Director of the Kavli Center for Ethics, Science, and the Public, and Director of the Center for Human-Compatible AI. He is the author of Human Compatible: Artificial Intelligence and the Problem of Control.

An example of Russell's thought is below:

'But the drawback in doing that is that we have to specify those objectives, right? The machines don't dream them up by themselves. And if we mis-specify the objectives, then we have what's called a misalignment between the machine's behaviour and what humans want the future to be like.

The most obvious example of that is in social media, where we have specified objectives like maximising the number of clicks and maximising the amount of engagement of the user. The machine learning algorithms that decide what billions of people read and watch have more control over human cognitive intake than any dictator, you know, than the North Korean or Stalin or anyone has ever had.

And yet they're totally unregulated. So those algorithms learn how to maximise those objectives, and they figured out that the best way to do it is not to send you what you're interested in, but actually to manipulate you over time by thousands of little nudges so that you become a much more predictable version of yourself.

Because the more predictable you are, the more they can monetise you. And so they learned how to do that. And at least empirically, it looks as if the best way to do that is to make you more extreme, right? So that you start to consume that red meat that then whole human industries spring up to feed.

And this misalignment is the source of the concern that people have had about AI, going right back to Alan Turing, the founder of computer science, who said in a 1951 lecture that once the machine-thinking method had started, it would leave our feeble powers far behind, and we should have to expect the machines to take control.

So they take control not because they're evil or because they spontaneously develop consciousness or anything like that. It's just because we give them some objectives that are not aligned with what we want the future to be like. And because they're more capable than us, they achieve their objectives and we don't, right? So we set up a chess match which we proceed to lose.

So in order to fix that problem, I've been following a different approach to AI, which says that the AI system, while its only objective is to further the interests of human beings, doesn't know what those are and knows that it doesn't know what those are. It's explicitly uncertain about human objectives. And so to the extent that there's a moral theory, it's simply that the job of an AI system is to further human interest. It knows that it doesn't know what those are, but it can learn more by conversing with us, by observing the choices that we make and the choices that we regret, the things that we do and the things that we don't do. So this helps it to understand what we want the future to be like. And then as it starts to learn, it can start to be more helpful.

There are still some difficult moral questions, of course. The most obvious one is that it's not one person's interest. It's not one set of values. There's 8 billion of us, so there's 8 billion different preferences about the future and how do you trade those off? And this is a two-and-a-half-thousand-year-old question, at least, and there are several different schools of thought on that. And we better figure out which is the right one, because we're going to be implementing it fairly soon.

And then there are even more difficult questions like, well, what about not the 8 billion people who are alive, but what about all the people who have yet to live? How do we take into account their interests? Right, right. What if we take actions that change who's going to live? You change the number of people who are going to live. For example, the Chinese policy of one child per family probably eliminated 500 million people already. Now they never existed. So we don't know what they would have wanted, but how, you know, how should we make that type of decision? Right. These are really difficult questions that philosophers really struggle with. But when we have AI systems that are sufficiently powerful that they could make those decisions, we need to have an answer ready so that we don't get it wrong.'
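The approach Russell describes, an assistant that treats human objectives as unknown and refines its estimate by watching the choices we make, can be illustrated with a toy Bayesian update. To be clear, this is not Russell's own formulation (his book frames the idea as "assistance games"); the sketch below is only a minimal illustration, and the objectives, choices, and probability numbers are all invented for the example.

```python
# Minimal sketch (assumed, illustrative only): an assistant that is explicitly
# uncertain about which objective the human cares about, and updates that
# uncertainty from observed choices via a simple Bayesian update.

# Hypothetical candidate objectives standing in for the space of human preferences.
OBJECTIVES = ["health", "leisure", "work"]

# Invented likelihoods P(choice | objective): how strongly each observed
# choice counts as evidence for each candidate objective.
LIKELIHOOD = {
    "goes_for_a_run": {"health": 0.7, "leisure": 0.2, "work": 0.1},
    "watches_a_film": {"health": 0.1, "leisure": 0.7, "work": 0.2},
    "answers_email":  {"health": 0.1, "leisure": 0.1, "work": 0.8},
}

def update(prior, choice):
    """One Bayesian update: posterior is proportional to likelihood times prior."""
    unnormalised = {obj: LIKELIHOOD[choice][obj] * prior[obj] for obj in OBJECTIVES}
    total = sum(unnormalised.values())
    return {obj: value / total for obj, value in unnormalised.items()}

# The assistant starts out explicitly uncertain: a uniform prior over objectives.
belief = {obj: 1.0 / len(OBJECTIVES) for obj in OBJECTIVES}

# Observing what the person actually does gradually sharpens the belief,
# but never collapses it to complete certainty.
for observed_choice in ["goes_for_a_run", "goes_for_a_run", "answers_email"]:
    belief = update(belief, observed_choice)
    print(observed_choice, {obj: round(p, 3) for obj, p in belief.items()})
```

The point of the example is purely structural: the assistant's belief about what we want sharpens with evidence but never reaches certainty, which is what keeps it asking, observing, and deferring rather than optimising a fixed, possibly mis-specified objective.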
