Intermediate Activations in Llama 2 7B
There is a "country" layer in the Llama 2 transformer.
I found these parts of this LessWrong analysis of Llama 2's attention outputs interesting:
“By layer 24, the model is quite certain about the correct answer, and the remaining computations are mostly redundant, mainly re-weighting alternative less obvious completion paths such as ‘The capital of Germany is {a, the, one, home, located…}’. Interestingly, the model becomes less certain about ‘Berlin’ from layers 24-31 as it figures out more alternative options.”
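This kind of layer-by-layer certainty can be checked with a "logit lens" style probe: project each layer's hidden state through the unembedding and watch the probability of " Berlin". The sketch below is my own minimal illustration, assuming the HuggingFace transformers Llama implementation; the checkpoint name, prompt, and the choice to reuse the final RMSNorm before unembedding are assumptions, not details from the original post.

```python
# Minimal logit-lens sketch: track P(" Berlin") at every layer for the
# prompt "The capital of Germany is". Illustrative only; the original
# analysis may have measured this differently.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; requires HF access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompt = "The capital of Germany is"
inputs = tokenizer(prompt, return_tensors="pt")
berlin_id = tokenizer.encode(" Berlin", add_special_tokens=False)[0]

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors:
# the embeddings plus the residual stream after each layer.
for layer, h in enumerate(out.hidden_states):
    last = h[0, -1]                                 # hidden state at the final token position
    logits = model.lm_head(model.model.norm(last))  # project into vocabulary space
    p_berlin = torch.softmax(logits.float(), dim=-1)[berlin_id].item()
    print(f"layer {layer:2d}: P(' Berlin') = {p_berlin:.3f}")
```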
“The attention output of layer 24 of the llama 2 transformer consistently represents relevant information related to countries, even when neither the prompt nor the higher probability completions are related to countries.”
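To look at the attention output of a single layer in isolation, one way is to capture it with a forward hook and project it through the unembedding, the same way as above. This is a minimal sketch under the same assumptions (HuggingFace Llama module layout, projecting through the final RMSNorm and lm_head); layer index 24, the prompt, and the top-k value are illustrative choices.

```python
# Capture the attention output of layer 24 with a forward hook and see
# which tokens it points toward when projected into vocabulary space.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; requires HF access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

captured = {}

def hook(module, inputs, output):
    # The Llama self-attention module returns a tuple; element 0 is the attention output.
    captured["attn_out"] = output[0].detach()

handle = model.model.layers[24].self_attn.register_forward_hook(hook)

prompt = "The capital of Germany is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    model(**inputs)
handle.remove()

# Project the captured attention output at the last token position.
attn_out = captured["attn_out"][0, -1]               # (hidden_size,)
logits = model.lm_head(model.model.norm(attn_out))
top = torch.topk(logits.float(), k=10)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))
```

Printing the top-k tokens for prompts that have nothing to do with geography is one way to probe the post's claim that this layer's attention output keeps surfacing country-related directions.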