Since the publication of the paper “Attention Is All You Need” by Vaswani et al. in 2017, the Transformer neural network has become a ubiquitous architecture, used in machine learning tasks such as neural machine translation, language modelling, image classification, image generation and time series forecasting. In that short time, a large number of variants of the model have been proposed to tackle different tasks, all based on the underlying mechanism that powers Transformers: attention. This talk acts as an introduction to the world of Transformers and Transformer variants, starting with a brief explanation of the model and the attention mechanism, then continuing with different use cases. The best part? Most of the code you need to reproduce these models is open source. This way, you can find out for yourself whether attention is truly all you need.
The Transformer Revolution
Attention-based Machine Learning