
What is happening inside of the black box?

Neel Nanda works on mechanistic interpretability research at Google DeepMind, and formerly worked at Anthropic. What is fascinating about Nanda's research is that he gets to peer into the black box and figure out how different types of AI models actually work. Anyone concerned with AI should understand how important this is. In this video Nanda discusses some of his findings, including 'induction heads', which turn out to have some vital properties.

Induction heads are a type of attention head that allows a language model to learn long-range dependencies in text. They do this by using a simple algorithm to complete token sequences of the form [A][B] ... [A] -> [B]: if token A was followed by token B earlier in the text, then the next time A appears, predict B. For example, if the name "Harry Potter" has appeared earlier in the text, then the next time the model sees "Harry", an induction head looks back at the earlier occurrence and predicts that "Potter" comes next.
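To make the rule concrete, here is a toy sketch of the [A][B] ... [A] -> [B] completion rule, written as plain Python over token strings. This is purely illustrative: a real induction head implements this pattern with attention over learned representations, not an explicit backwards scan.

```python
# Toy illustration of the induction rule [A][B] ... [A] -> [B].
# A real induction head does this with attention, not an explicit loop.

def induction_predict(tokens: list[str]) -> str | None:
    """If the final token appeared earlier, predict the token that
    followed its most recent earlier occurrence; else predict nothing."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):  # scan backwards
        if tokens[i] == last:
            return tokens[i + 1]  # the B that followed A last time
    return None

print(induction_predict("Harry Potter was a wizard . Harry".split()))
# -> "Potter"
```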

Induction heads were first identified in 2022 by a team of researchers at Anthropic. They found that induction heads were present in all the large language models they looked at, up to about 13 billion parameters. They also found that induction heads were essential to the models' ability to track long-range dependencies in text.

The discovery of induction heads has led to a better understanding of how large language models work, including how they manage tasks such as translation, summarisation, and question answering.

The following excerpt is from the interview, in which Nanda explains the concept in more detail:

'So we found these induction heads by looking at tiny two-layer attention-only models, and then we looked at larger models. It turns out that all the models people have looked at have these heads, up to about 13 billion parameters. Since leaving Anthropic, I actually had a fun side project of looking at all the open-source models I could find, and of the roughly 41 models I checked, every one that was big enough to have induction heads had them.

Not only do they appear everywhere, they also all appear in a sudden phase transition. As you're training the model, if you just keep checking, "Does it have induction heads? Does it have induction heads?" there's this narrow band of training, between about 5% and 10% of the way through, where the model goes from no induction heads to basically fully formed induction heads. That's a big enough deal on its own, but if you look at the loss curve, which is the jargon for how good the model is at its task, there's a visible bump: the model is smoothly getting better, then briefly gets better much faster, and then returns to its previous rate of smoothly getting better as these induction heads form. So that's wild.
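As an aside, the standard way researchers run this "does it have induction heads?" check is to feed the model a random token sequence repeated twice and measure how much each attention head attends to the induction target: the position just after where the current token occurred in the first repeat. Below is a hedged sketch of that score in NumPy; the attention pattern `attn`, and how you extract it from your model, are assumptions rather than any particular library's API.

```python
# Sketch of an induction-head test: run the model on a random sequence
# of length seq_len repeated twice, extract an attention pattern of
# shape [n_heads, 2*seq_len, 2*seq_len], then measure attention from
# each position t in the second repeat back to position t - seq_len + 1,
# i.e. to the token that followed the same token one repetition earlier.

import numpy as np

def induction_score(attn: np.ndarray, seq_len: int) -> np.ndarray:
    """Mean attention each head pays to the induction offset.
    Scores near 1.0 suggest a fully formed induction head."""
    scores = np.zeros(attn.shape[0])
    for t in range(seq_len, 2 * seq_len):
        scores += attn[:, t, t - seq_len + 1]
    return scores / seq_len
```

Tracking this score across training checkpoints is one way to see the narrow band Nanda describes, where heads jump from no induction behaviour to fully formed induction heads.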

Then the next totally wild thing about induction heads is that they're really important for this thing models can do called in-context learning. A general fact about language models trained on public data is that the more previous words you give them, the better they are, which is kind of intuitive. If you're trying to predict what comes next in the sentence "The cat sat on the mat," like, what comes after "on the mat"? If you just have "the," it's really hard. If you've got a bit more, it's easier. If you've got "the cat sat on the," it's way easier. But it's not obvious that adding more than 100 words really matters, and in fact, older models weren't that good at using words more than 100 words back. And it's kind of not obvious how you do this, though clearly it should be possible. For example, if I'm reading a book, the chapter heading is probably relevant to figuring out what comes next. Or, if I'm reading an article, the introduction is pretty relevant. But it's definitely a striking thing that models can do this.
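The way this "more context helps" effect gets quantified in the induction-heads work is an in-context learning score: roughly, how much lower the model's loss is on a token late in the context than on one early in the context. A minimal sketch, assuming you have already averaged per-token losses over many documents:

```python
# Sketch of the in-context learning score from the induction-heads work:
# loss on the 500th token in context minus loss on the 50th token.
# `token_losses` is assumed to be per-position cross-entropy losses,
# averaged over many documents.

import numpy as np

def in_context_learning_score(token_losses: np.ndarray,
                              early: int = 50, late: int = 500) -> float:
    """More negative means the model makes better use of long contexts."""
    return float(token_losses[late] - token_losses[early])
```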

And it turns out that induction heads are a really big part of how they're good at this. Models that are capable of forming induction heads are much better at this thing of tracking long-range dependencies in text. The point where models get good at this coincides exactly with the dramatic phase transition where induction heads are learned. And when we did things like taking a model too small to form induction heads and giving it a hard-coded tweak that made induction heads more natural to form, that model got much better at using text far back to predict the next thing. And we even found some heads that seem to do more complicated things, like translation: you give the model a text in English and a text in French, and to predict the next French word, a head looks at the word that came after the corresponding word in the English text. These also seem to be based on induction heads.
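For the curious, the "hard-coded tweak" Nanda alludes to here sounds like the "smeared keys" modification described in Anthropic's induction-heads paper, where each attention key is blended with the key from the previous position, so that a single attention layer can see previous-token information. A minimal PyTorch sketch of that idea, not the paper's exact parameterisation:

```python
# Sketch of a "smeared keys" tweak: each key is interpolated with the
# previous position's key via a per-head learnable weight, handing the
# model the previous-token information an induction circuit would
# otherwise need an extra layer to compute.

import torch
import torch.nn as nn

class SmearedKeys(nn.Module):
    def __init__(self, n_heads: int):
        super().__init__()
        # One learnable interpolation weight per head (init is a guess).
        self.alpha = nn.Parameter(torch.zeros(n_heads))

    def forward(self, keys: torch.Tensor) -> torch.Tensor:
        # keys: [batch, n_heads, seq_len, d_head]
        a = torch.sigmoid(self.alpha).view(1, -1, 1, 1)
        # Shift keys one position right; position 0 keeps its own key.
        prev = torch.cat([keys[:, :, :1, :], keys[:, :, :-1, :]], dim=2)
        return a * keys + (1 - a) * prev
```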

These induction heads pop up in many different neural networks: every network I've checked that's above a certain size has them.'
