
What is happening inside of the black box?

 


Neel Nanda works on mechanistic interpretability research at DeepMind and was formerly at AnthropicAI. What's fascinating about Nanda's research is that he gets to peer inside the black box and figure out how different kinds of AI models work. Anyone concerned with AI should understand how important this is. In this video Nanda discusses some of his findings, including 'induction heads', which turn out to have some vital properties.

Induction heads are a type of attention head that allows a language model to learn long-range dependencies in text. They do this by using a simple algorithm to complete token sequences of the form [A][B] ... [A] -> [B]. For example, if the model has already seen "The cat sat on the mat" earlier in the text and then encounters "The cat" again, an induction head looks back at what followed "cat" the first time and predicts that the next word will be "sat".
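
As a rough sketch of that completion rule (a toy illustration of the behaviour only, not of how the attention head is actually implemented inside the network), the algorithm can be written out directly in a few lines of Python: look back for the most recent earlier occurrence of the current token and predict whatever followed it.

def induction_prediction(tokens):
    """Toy version of the [A][B] ... [A] -> [B] completion rule.

    Look backwards for an earlier occurrence of the final token; if one
    exists, predict the token that followed it. This describes the
    behaviour an induction head implements, not its actual weights.
    """
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):  # most recent earlier positions first
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence, so the rule makes no prediction


# "The cat sat on the mat. The cat ..." -> the rule predicts "sat".
print(induction_prediction(["The", "cat", "sat", "on", "the", "mat", ".", "The", "cat"]))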

Induction heads were first described in 2022 by a team of researchers at Anthropic. They found that induction heads were present in all of the large language models they looked at, up to about 13 billion parameters, and that induction heads were essential to the models' ability to track long-range dependencies in text.

The discovery of induction heads has led to a better understanding of how large language models work. It has also opened up new possibilities for using these models for tasks such as translation, summarisation, and question answering.

The following is a transcript excerpt from the interview in which Nanda explains the concept further (two short code sketches after the excerpt show how these ideas are typically made concrete):

'So we found these induction heads by looking at tiny two-layer attention-only models, and then we looked at larger models. It turns out that all the models people have looked at, up to about 13 billion parameters, have these heads. Since leaving Anthropic, I actually had a fun side project of looking at all the open-source models I could find, and I found them in about 41 models that I checked. All of the ones that were big enough to have induction heads had them.

Not only do they appear everywhere, they also all appear in this sudden phase transition. As you're training the model, if you just keep checking "Does it have induction heads yet?", there's this narrow band of training, between about 5 and 10% of the way through, where the model goes from no induction heads to basically fully formed induction heads. That alone is a big deal, but if you look at the loss curve, which is the jargon for how good the model is at its task, there's this visible bump where these induction heads form: the model is smoothly getting better, briefly gets better much faster, and then returns to its previous rate of smoothly getting better. So that's wild.

Then the next totally wild thing about induction heads is that they're really important for this thing models can do called in-context learning. So a general fact about language models that are trained on public data is that the more previous words you give them, the better they are, which is kind of intuitive. If you're trying to predict what comes next in the sentence "The cat sat on the mat", like, what comes after "on the"? If you just have "the", it's really hard. If you've got a bit more, it's easier. If you've got "the cat sat on the", it's way easier. But it's not obvious that words more than 100 words back really matter. And in fact, older models weren't that good at using words more than 100 words back. And it's kind of not obvious how you do this, though clearly it should be possible. For example, if I'm reading a book, the chapter heading is probably relevant to figuring out what comes next. Or, like, if I'm reading an article, the introduction is pretty relevant. But it's definitely a weird thing that models can do this.

And it turns out that induction heads are a really big part of how they're good at this. Models that are capable of forming induction heads are much better at this thing of tracking long-range dependencies in text. The point where models get good at this coincides exactly with the dramatic bit of training where the induction heads are learned. And when we did things like tweaking a model that was too small to have induction heads, with a hard-coded change that made induction heads more natural to form, that model got much better at using text from far back to predict the next thing. And we even found some heads that seem to do more complicated things, like translation, where you give the model a text in English and a text in French, and the head looks at the word in English that came after the corresponding word in French. These also seem to be based on induction heads.

These induction heads pop up in many different neural networks; they've appeared in every network I've checked that was big enough to have them.'
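
To make "checking whether a model has induction heads" more concrete, here is a rough sketch of the usual diagnostic (the function name and tensor shapes are illustrative, and the exact metric varies between papers): feed the model a random token sequence repeated twice, then score each attention head on how much attention it pays, from each token in the second copy, back to the position just after that token's first occurrence. Tracking the best such score across training checkpoints is what reveals the sudden phase transition Nanda describes.

import torch

def prefix_matching_score(pattern: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Score each attention head on the induction pattern.

    `pattern` is assumed to have shape [n_heads, 2*seq_len, 2*seq_len]:
    the attention weights produced for a random sequence of length
    `seq_len` repeated twice. While reading token i of the second copy
    (position seq_len + i), an induction head attends back to position
    i + 1, the token that followed the same token in the first copy.
    Heads scoring close to 1 are strong induction-head candidates.
    """
    queries = torch.arange(seq_len, 2 * seq_len)  # positions in the second copy
    keys = queries - seq_len + 1                  # "one position after" in the first copy
    # Mean attention each head places on its induction target position.
    return pattern[:, queries, keys].mean(dim=-1)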
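
The "in-context learning" Nanda refers to can be quantified in a similarly simple way: measure the model's average loss at each token position and compare predictions made early in the context with predictions made hundreds of tokens in (comparing roughly the 50th and 500th positions is a common convention). A minimal sketch, assuming you already have per-token losses from the model:

import torch

def in_context_learning_score(per_token_loss: torch.Tensor,
                              early: int = 50, late: int = 500) -> float:
    """Loss early in the context minus loss much later in the context.

    `per_token_loss` is assumed to have shape [n_documents, seq_len],
    holding the model's next-token loss at every position for a batch of
    long documents. A larger positive score means the model predicts far
    better once it has hundreds of tokens of context, i.e. it is really
    using long-range information. As described above, this ability
    improves sharply at the same point in training where the induction
    heads form.
    """
    mean_loss = per_token_loss.mean(dim=0)  # average over documents
    return (mean_loss[early] - mean_loss[late]).item()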
