Eliezer Yudkowsky on Alignment: Can It Be Regulated For?

Yudkowsky is one of the leading figures in AI alignment. This is a one-hour discussion from the Center for Future Mind and the Gruber Sandbox at Florida Atlantic University. He recently gave a TED talk on the subjects raised here; this discussion covers them at greater length and depth.

Early in the discussion Yudkowsky states:

'Just this very day... China released its own preliminary set of regulations or something for AI models, it's actually stricter than what we've got. Possibly it was written by somebody who didn't quite understand how this works because it's things like all of the data that you're training it on needs to be like honest and accurate! So possibly regulations that are not factual.'

This is one of the significant issues with regulation as a means of controlling AI development. It demands a level of expertise rarely seen in governance, laws that are fit for purpose, and legislation that is not so tied to current technologies that it is already out of date by the time it is passed.

It's often pointed out that we can do this: human cloning is the usual example, where there has been a global consensus on stopping research. But that is a relatively easy area to legislate for: the goal is clear and, by and large, compliance can be monitored. What are the equivalent obvious goals for regulating AI research and practice?

Charting the emergence of AGI?
