The context vector
where
where
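Pipeline interface for inference:
from transformers import pipeline
# No model is named, so the library picks a default model for this task.
nlp = pipeline("sentiment-analysis")  # for example
# Returns a list with one dict per input, each holding a label and a score.
nlp("Cats are fickle creatures")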
The attention weights represent probabilities: they are non-negative and sum to 1!
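A minimal sketch of how a context vector can be built from probability-like weights, assuming dot-product scores normalized with a softmax. The scoring function, the names `softmax` and `context_vector`, and the toy inputs are assumptions for illustration and are not taken from the notes.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(query, keys, values):
    # Score each position with a dot product (an assumed scoring function).
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    # Normalized weights: non-negative and summing to 1.
    weights = softmax(scores)
    # The context vector is the weighted sum of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


# Hypothetical toy inputs: three positions, two-dimensional vectors.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

context, weights = context_vector(query, keys, values)
print(weights)
print(context)
```

Because the weights form a distribution, the context vector is a weighted average of the value vectors, and the position whose key best matches the query contributes the most.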
All training sequences contain early tokens, but only the longer ones contain later tokens, so later positions are seen less often during training. The transformer will be "distracted" by far-away tokens.
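As a rough illustration of the first point, the sketch below counts how many sequences reach each position, given a made-up list of sequence lengths. The lengths and the helper name `position_counts` are hypothetical.

```python
from collections import Counter


def position_counts(lengths):
    # counts[p] = number of sequences that have a token at position p (0-indexed).
    counts = Counter()
    for n in lengths:
        for p in range(n):
            counts[p] += 1
    return [counts[p] for p in range(max(lengths))]


# Hypothetical training-set sequence lengths.
lengths = [3, 5, 5, 8, 12]
counts = position_counts(lengths)
print(counts)
```

Every sequence contributes to the earliest positions, while the last positions are covered only by the longest sequence.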