What is happening inside the black box?

 


Neel Nanda works on mechanistic interpretability research at DeepMind, having previously been at Anthropic. What is fascinating about Nanda's research is that he gets to peer into the black box to figure out how different types of AI models work, and anyone concerned with AI should understand how important this is. In this video Nanda discusses some of his findings, including 'induction heads', which turn out to have some vital properties.

Induction heads are a type of attention head that allows a language model to learn long-range dependencies in text. They do this by using a simple algorithm to complete token sequences of the form [A][B] ... [A] -> [B]. For example, if the name 'Neel Nanda' appears earlier in a document, then the next time the model reaches the token 'Neel', an induction head looks back at what followed it last time and predicts 'Nanda'.
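To make the [A][B] ... [A] -> [B] pattern concrete, here is a minimal, purely illustrative Python sketch of the rule an induction head approximates. The function name and example text are invented for illustration; a real induction head is a learned, soft attention pattern inside the model, not hard-coded string matching.

```python
# Minimal sketch of the induction rule [A][B] ... [A] -> [B]: find the most recent
# earlier occurrence of the current token and predict whatever followed it then.

def induction_predict(tokens: list[str]) -> str | None:
    """Predict the next token by copying what followed the current token last time."""
    current = tokens[-1]
    # Scan backwards for an earlier occurrence of `current` (the [A] token).
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]   # the [B] that followed [A] earlier in the text
    return None                    # no earlier occurrence, so nothing to copy


tokens = "Neel Nanda studies induction heads , and Neel".split()
print(induction_predict(tokens))   # -> "Nanda"
```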

Induction heads were first described in 2022 by a team of researchers at Anthropic. They found that induction heads were present in all of the large language models they looked at, up to about 13 billion parameters. They also found that induction heads were essential to the models' ability to track long-range dependencies in text.

The discovery of induction heads has led to a better understanding of how large language models work. It has also opened up new possibilities for using these models for tasks such as translation, summarisation, and question answering.

The following text is from the interview, in which Nanda explains this concept further:

'So we found these induction heads by looking at tiny two-layer attention-only models, and then we looked at larger models. It turns out that all models that people have looked at, up to about 13 billion parameters, have these heads. Since leaving Anthropic, I actually had a fun side project of looking at all the open-source models I could find; I checked about 41 models, and every one of them that was big enough to have induction heads had them.

Not only do they appear everywhere, they also all appear in this sudden phase transition. As you're training the model, if you just keep checking, "Does it have induction heads? Does it have induction heads?", there's this narrow band of training, between about 5 and 10% of the way through, where the model goes from no induction heads to basically fully formed induction heads. That's already a big deal, but if you look at the loss curve, which is the jargon for how good the model is at its task, there's this visible bump when these induction heads form: the model is smoothly getting better, briefly gets better much faster, and then returns to its previous rate of smoothly getting better. So that's wild.

Then the next totally wild thing about induction heads is that they're really important for this thing models can do called in-context learning. So a general fact about language models that are trained on public data is that the more previous words you give them, the better they are, which is kind of intuitive. If you're trying to predict what comes next in the sentence "The cat sat on the mat", say the final word "mat": if you just have "the", it's really hard. If you've got a bit more, it's easier. If you've got "the cat sat on the", it's way easier. But it's not obvious that adding more than 100 words really matters, and in fact, older models weren't that good at using words more than 100 words back. And it's kind of not obvious how you do this, though clearly it should be possible. For example, if I'm reading a book, the chapter heading is probably relevant to figuring out what comes next. Or, if I'm reading an article, the introduction is pretty relevant. But it's definitely a weird thing that models can do this.

And it turns out that induction heads are a really big part of how they're good at this. Models that are capable of forming induction heads are much better at this thing of tracking long-range dependencies in text, and the ability of models to do this coincides perfectly with the dramatic bit of training where the induction heads are learned. And when we did things like tweaking a model that was too small to have induction heads, with a hard-coded thing that made induction heads more natural to form, that model got much better at using text far back to predict the next thing. And we even found some heads that seem to do more complicated things, like translation: where you give it a text in English and a text in French, and it looks at the word in the English text that came after the word corresponding to the current French word. These also seem to be based on induction heads.

These induction heads pop up in many different neural networks: in basically every network I've checked that was above a certain size.'
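Two of the measurements Nanda alludes to above can be sketched in code. The first scores an attention head for induction-like behaviour on a sequence of random tokens repeated twice: at each position in the second half, an induction head should attend to the token just after that position's earlier copy. The second is a rough proxy for in-context learning: how much lower the loss is late in the context than early on. Both functions are illustrative sketches under stated assumptions, not the exact metrics or code used in the original research, and the inputs `attention_pattern`, `per_token_loss`, and `patterns` are assumed to have been extracted from the reader's own model.

```python
import numpy as np

def induction_score(attention_pattern: np.ndarray, rep_len: int) -> float:
    """Score one attention head for induction-like behaviour.

    Assumes the model was run on `rep_len` random tokens followed by the same
    tokens repeated, and that `attention_pattern` is a [seq_len, seq_len] array
    for a single head (row = query position, column = key position). Scores near
    1 mean the head attends from each second-half token to the token one position
    after that token's earlier copy, which is the induction pattern.
    """
    seq_len = attention_pattern.shape[0]
    assert seq_len == 2 * rep_len, "expected a twice-repeated sequence"
    scores = [attention_pattern[dest, dest - rep_len + 1]
              for dest in range(rep_len, seq_len)]
    return float(np.mean(scores))


def in_context_learning_proxy(per_token_loss: np.ndarray,
                              early: int = 50, late: int = 500) -> float:
    """Rough proxy for in-context learning: average loss at an early context
    position minus average loss at a late one. A larger positive value means
    the extra context is genuinely helping. `per_token_loss` is assumed to be
    the loss per position, averaged over many sequences."""
    return float(per_token_loss[early] - per_token_loss[late])


# Hypothetical usage, assuming `patterns[layer][head]` holds attention patterns
# from a run on a twice-repeated random sequence of length 2 * 50:
# for layer, heads in enumerate(patterns):
#     for head, pattern in enumerate(heads):
#         if induction_score(pattern, rep_len=50) > 0.5:
#             print(f"layer {layer}, head {head} looks like an induction head")
```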
