Beyond ChatGPT, Russell on AI


Stuart Russell gives a set of responses to questions raised at the Commonwealth Club of California that is both informed and easy to follow. There is little in what he says here that adds to my understanding, but he puts it forward concisely.

Stuart Russell is a Professor of Computer Science at the University of California, Berkeley, Director of the Kavli Center for Ethics, Science, and the Public, and Director of the Center for Human-Compatible AI. He is the author of Human Compatible: Artificial Intelligence and the Problem of Control.

An example of Russell's thinking is below:

'But the drawback in doing that is that we have to specify those objectives, right? The machines don't dream them up by themselves. And if we mis-specify the objectives, then we have what's called a misalignment between the machine's behaviour and what humans want the future to be like.

The most obvious example of that is in social media, where we have specified objectives like maximising the number of clicks and maximising the amount of engagement of the user. The machine learning algorithms that decide what billions of people read and watch have more control over human cognitive intake than any dictator, you know, than the North Korean or Stalin or anyone has ever had.

And yet they're totally unregulated. So those algorithms learn how to maximise those objectives, and they figured out that the best way to do it is not to send you what you're interested in, but actually to manipulate you over time by thousands of little nudges so that you become a much more predictable version of yourself.

Because the more predictable you are, the more they can monetise you. And so they learned how to do that. And at least empirically, it looks as if the best way to do that is to make you more extreme, right? So that you start to consume that red meat that then whole human industries spring up to feed.

And this misalignment is the source of the concern that people have had about AI. Going right back to Alan Turing, who was the founder of computer science, in a 1951 lecture. He said that once the machine-thinking method had started, it would leave our feeble powers far behind. And we should have to expect the machines to take control.

So they take control not because they're evil or because they spontaneously develop consciousness or anything like that. It's just because we give them some objectives that are not aligned with what we want the future to be like. And because they're more capable than us, they achieve their objectives and we don't, right? So we set up a chess match which we proceed to lose.

So in order to fix that problem, I've been following a different approach to AI, which says that the AI system, while its only objective is to further the interests of human beings, doesn't know what those are and knows that it doesn't know what those are. It's explicitly uncertain about human objectives. And so to the extent that there's a moral theory, it's simply that the job of an AI system is to further human interest. It knows that it doesn't know what those are, but it can learn more by conversing with us, by observing the choices that we make and the choices that we regret, the things that we do and the things that we don't do. So this helps it to understand what we want the future to be like. And then as it starts to learn, it can start to be more helpful.

There are still some difficult moral questions, of course. The most obvious one is that it's not one person's interest. It's not one set of values. There's 8 billion of us, so there's 8 billion different preferences about the future and how do you trade those off? And this is a two-and-a-half-thousand-year-old question, at least, and there are several different schools of thought on that. And we better figure out which is the right one, because we're going to be implementing it fairly soon.

And then there are even more difficult questions like, well, what about not the 8 billion people who are alive, but what about all the people who have yet to live? How do we take into account their interests? Right, right. What if we take actions that change who's going to live? You change the number of people who are going to live. For example, the Chinese policy of one child per family probably eliminated 500 million people already. Now they never existed. So we don't know what they would have wanted, but how, you know, how should we make that type of decision? Right. These are really difficult questions that philosophers really struggle with. But when we have AI systems that are sufficiently powerful that they could make those decisions, we need to have an answer ready so that we don't get it wrong.'
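
To make the idea in that last part of the quote a little more concrete, here is a minimal, purely illustrative sketch of an assistant that starts explicitly uncertain about which preference model describes the human and updates a Bayesian posterior each time it observes a choice. This is not Russell's actual assistance-game formalism, and the outcomes and candidate preference models are made up for illustration:

```python
import numpy as np

# Toy illustration (not Russell's formalism): an assistant that is uncertain
# which of several hypothetical preference models describes the human, and
# updates a Bayesian posterior from the choices it observes.

# Hypothetical outcomes the human might be choosing between.
OUTCOMES = ["coffee", "tea", "water"]

# Hypothetical candidate preference models: each assigns a utility to every outcome.
CANDIDATE_PREFS = {
    "likes_coffee": {"coffee": 2.0, "tea": 0.5, "water": 0.0},
    "likes_tea":    {"coffee": 0.5, "tea": 2.0, "water": 0.0},
    "indifferent":  {"coffee": 1.0, "tea": 1.0, "water": 1.0},
}

# Start maximally uncertain: a uniform prior over the candidate models.
posterior = {name: 1.0 / len(CANDIDATE_PREFS) for name in CANDIDATE_PREFS}


def likelihood_of_choice(prefs, chosen):
    """Probability that a noisily rational human picks `chosen`,
    modelled as a softmax over that model's utilities."""
    utils = np.array([prefs[o] for o in OUTCOMES])
    probs = np.exp(utils) / np.exp(utils).sum()
    return probs[OUTCOMES.index(chosen)]


def observe(chosen):
    """Bayes update of the posterior after watching one human choice."""
    global posterior
    unnormalised = {
        name: posterior[name] * likelihood_of_choice(prefs, chosen)
        for name, prefs in CANDIDATE_PREFS.items()
    }
    total = sum(unnormalised.values())
    posterior = {name: p / total for name, p in unnormalised.items()}


# The assistant watches a few choices and becomes more confident,
# without ever collapsing its uncertainty to zero.
for choice in ["tea", "tea", "coffee", "tea"]:
    observe(choice)

print(posterior)  # most weight on "likes_tea", but not certainty
```

The point of the toy is the structure rather than the numbers: the assistant's confidence about what the human wants grows with evidence, but it never pretends to certainty, which is the property Russell is arguing for.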
