
What is happening inside of the black box?

 


Neel Nanda works on mechanistic interpretability research at DeepMind and was formerly at AnthropicAI. What's fascinating about Nanda's research is that he gets to peer inside the black box and figure out how different kinds of AI models work. Anyone concerned with AI should understand how important this is. In this video Nanda discusses some of his findings, including 'induction heads', which turn out to have some vital properties.

Induction heads are a type of attention head that allows a language model to learn long-range dependencies in text. They do this by using a simple algorithm to complete token sequences of the form [A][B] ... [A] -> [B]. For example, if the model has already seen "The cat sat on the mat" earlier in the text and then encounters "The cat" again, an induction head looks back at what followed "cat" the first time and predicts that the next word will be "sat".
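
As a rough sketch of that completion rule (a toy illustration of the behaviour only, not of how the attention head is actually implemented inside the network), the algorithm can be written out directly in a few lines of Python: look back for the most recent earlier occurrence of the current token and predict whatever followed it.

def induction_prediction(tokens):
    """Toy version of the [A][B] ... [A] -> [B] completion rule.

    Look backwards for an earlier occurrence of the final token; if one
    exists, predict the token that followed it. This describes the
    behaviour an induction head implements, not its actual weights.
    """
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):  # most recent earlier positions first
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence, so the rule makes no prediction


# "The cat sat on the mat. The cat ..." -> the rule predicts "sat".
print(induction_prediction(["The", "cat", "sat", "on", "the", "mat", ".", "The", "cat"]))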

Induction heads were first described in 2022 by a team of researchers at Anthropic. They found that induction heads were present in all of the large language models they looked at, up to about 13 billion parameters, and that induction heads were essential to the models' ability to track long-range dependencies in text.

The discovery of induction heads has led to a better understanding of how large language models work. It has also opened up new possibilities for using these models for tasks such as translation, summarisation, and question answering.

The following is a transcript excerpt from the interview in which Nanda explains the concept further (two short code sketches after the excerpt show how these ideas are typically made concrete):

'So we found these induction heads by looking at tiny two-layer attention-only models, and then we looked at larger models. It turns out that all the models people have looked at, up to about 13 billion parameters, have these heads. Since leaving Anthropic, I actually had a fun side project of looking at all the open-source models I could find, and I found them in about 41 models that I checked. All of the ones that were big enough to have induction heads had them.

Not only do they appear everywhere, they also all appear in this sudden phase transition. As you're training the model, if you just keep checking "Does it have induction heads yet?", there's this narrow band of training, between about 5 and 10% of the way through, where the model goes from no induction heads to basically fully formed induction heads. That alone is a big deal, but if you look at the loss curve, which is the jargon for how good the model is at its task, there's this visible bump where these induction heads form: the model is smoothly getting better, briefly gets better much faster, and then returns to its previous rate of smoothly getting better. So that's wild.

Then the next totally wild thing about induction heads is that they're really important for this thing models can do called in-context learning. So a general fact about language models that are trained on public data is that the more previous words you give them, the better they are, which is kind of intuitive. If you're trying to predict what comes next in the sentence "The cat sat on the mat", like, what comes after "on the"? If you just have "the", it's really hard. If you've got a bit more, it's easier. If you've got "the cat sat on the", it's way easier. But it's not obvious that words more than 100 words back really matter. And in fact, older models weren't that good at using words more than 100 words back. And it's kind of not obvious how you do this, though clearly it should be possible. For example, if I'm reading a book, the chapter heading is probably relevant to figuring out what comes next. Or, like, if I'm reading an article, the introduction is pretty relevant. But it's definitely a weird thing that models can do this.

And it turns out that induction heads are a really big part of how they're good at this. Models that are capable of forming induction heads are much better at this thing of tracking long-range dependencies in text. The point where models get good at this coincides exactly with the dramatic bit of training where the induction heads are learned. And when we did things like tweaking a model that was too small to have induction heads, with a hard-coded change that made induction heads more natural to form, that model got much better at using text from far back to predict the next thing. And we even found some heads that seem to do more complicated things, like translation, where you give the model a text in English and a text in French, and the head looks at the word in English that came after the corresponding word in French. These also seem to be based on induction heads.

These induction heads pop up in many different neural networks; they've appeared in every network I've checked that was big enough to have them.'
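
To make "checking whether a model has induction heads" more concrete, here is a rough sketch of the usual diagnostic (the function name and tensor shapes are illustrative, and the exact metric varies between papers): feed the model a random token sequence repeated twice, then score each attention head on how much attention it pays, from each token in the second copy, back to the position just after that token's first occurrence. Tracking the best such score across training checkpoints is what reveals the sudden phase transition Nanda describes.

import torch

def prefix_matching_score(pattern: torch.Tensor, seq_len: int) -> torch.Tensor:
    """Score each attention head on the induction pattern.

    `pattern` is assumed to have shape [n_heads, 2*seq_len, 2*seq_len]:
    the attention weights produced for a random sequence of length
    `seq_len` repeated twice. While reading token i of the second copy
    (position seq_len + i), an induction head attends back to position
    i + 1, the token that followed the same token in the first copy.
    Heads scoring close to 1 are strong induction-head candidates.
    """
    queries = torch.arange(seq_len, 2 * seq_len)  # positions in the second copy
    keys = queries - seq_len + 1                  # "one position after" in the first copy
    # Mean attention each head places on its induction target position.
    return pattern[:, queries, keys].mean(dim=-1)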
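
The "in-context learning" Nanda refers to can be quantified in a similarly simple way: measure the model's average loss at each token position and compare predictions made early in the context with predictions made hundreds of tokens in (comparing roughly the 50th and 500th positions is a common convention). A minimal sketch, assuming you already have per-token losses from the model:

import torch

def in_context_learning_score(per_token_loss: torch.Tensor,
                              early: int = 50, late: int = 500) -> float:
    """Loss early in the context minus loss much later in the context.

    `per_token_loss` is assumed to have shape [n_documents, seq_len],
    holding the model's next-token loss at every position for a batch of
    long documents. A larger positive score means the model predicts far
    better once it has hundreds of tokens of context, i.e. it is really
    using long-range information. As described above, this ability
    improves sharply at the same point in training where the induction
    heads form.
    """
    mean_loss = per_token_loss.mean(dim=0)  # average over documents
    return (mean_loss[early] - mean_loss[late]).item()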
