In recent years, many groups have used the transformer architecture, a deep learning model, to train neural networks on large quantities of text. With increasing compute power, these models have grown to billions or even hundreds of billions of parameters. As model size grew, noteworthy abilities emerged, such as generating text that shows surprising reasoning skills, to the point that the leading models can now successfully pass college-level exams.
Currently, some of the best and most famous models are proprietary and offered to the public as a service. However, a large open-source community has emerged that trains and fine-tunes free models that can be self-hosted. This is a challenging task due to potential copyright issues with the training text, the large computational cost of the training itself, and the supervised fine-tuning step needed to adapt a model to its final use case.
In this talk I will give an overview of the most promising projects in this space and how they compare to the proprietary state-of-the-art models from the large players.