2024 Recipients of the C&C Prize
Group B
Transformer Team
Members: Eight authors of the paper “Attention Is All You Need”
Ashish Vaswani CEO, Essential AI |
Noam Shazeer VP, Gemini Technical Co-Lead, Google DeepMind |
Niki Parmar Research Scientist, Essential AI, Google |
Jakob Uszkoreit Inceptive |
Llion Owen Jones CTO, Sakana AI |
Aidan Gomez CEO, Cohere |
Lukasz Kaiser Member of Technical Staff, OpenAI |
Illia Polosukhin NEAR Protocol |
Citation
For pioneering research on the Transformer deep learning model, which serves as the foundation of generative AI
Achievements
Artificial intelligence (AI) technology has evolved remarkably in recent years and has come to permeate not only industry but society as a whole. The appearance of generative AI, in particular, has sent shock waves around the world as AI possessing creativity. The third AI boom arrived in the 2000s: machine learning came into practical use, deep learning made its appearance, and the performance of AI improved in many fields, including image recognition, natural language processing, and speech recognition, using convolutional neural networks (CNNs), recurrent neural networks (RNNs), and other models. Then, in 2017, the innovative Transformer model appeared, revamping conventional neural networks and achieving performance that greatly exceeded that of existing models. It is said that the present progress in AI could not have occurred without the Transformer.
Seven years have passed since this groundbreaking event in the world of AI, and since the memories of those involved have grown vague, the account may not be exact, but this is roughly what happened. The story begins in 2012, when Jakob Uszkoreit joined a team organized by Google to develop a module that could converse directly with users. The team built this interactive tool using an RNN. However, the RNN hit a wall because it could not interpret long sentences well, and long short-term memory (LSTM), which improved upon the RNN, had its limitations as well.

One day, Uszkoreit had lunch with Illia Polosukhin at the Google cafeteria and talked about a self-attention mechanism. They then invited Ashish Vaswani to join them, and the three created a design document titled “Transformers: Iterative Self-Attention and Processing for Various Tasks.” Niki Parmar and Llion Jones also joined the team. This Transformer research attracted researchers at Google Brain, and Lukasz Kaiser and Aidan Gomez, an intern working under Kaiser, joined as well. The Transformer team undertook the construction of a self-attention model for translating sentences from one language into another.

The self-attention mechanism is a technology that focuses on the important parts of the input data so that learning and inference can be performed effectively. It is also good at learning relationships between distant elements, which makes it better at understanding long sentences and complex context. In addition, it can process the input sequence in parallel, which significantly improves computational efficiency and enables training on large-scale datasets. The Transformer model exhibited excellent performance, equivalent to or better than the best models of the time, but its progress then stalled at that level. One day in 2017, while Kaiser and Vaswani were having a heated discussion about self-attention, Noam Shazeer happened to pass by; feeling that their discussion showed promise, he joined the team as well.
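To make the mechanism concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of the Transformer. It is written in Python with NumPy purely for illustration; the variable names, dimensions, and random inputs are assumptions, not the authors' implementation.

```python
# Minimal illustrative sketch of scaled dot-product self-attention.
# Names and dimensions are assumptions, not the paper's reference code.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the same input into queries, keys, and values.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Pairwise relevance of every position to every other position,
    # computed in one matrix product (no step-by-step recurrence).
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)  # focus on the important parts
    return weights @ V                  # weighted mix of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                    # 5 tokens, embedding width 8
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (5, 4)
```

Because the attention weights for all token pairs are computed at once, the input sequence can be processed in parallel, which is what lets training scale to large datasets in a way that recurrent models could not.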
The team set its sights on presenting at the 2017 Conference on Neural Information Processing Systems (NeurIPS). To meet the submission deadline of May 19, the eight members worked with little sleep from February 2017 onward and sent off their paper just before the deadline. The paper included evaluations of two Transformer models: the base model, trained for 12 hours, exceeded the performance of competing models, and the “big” high-performance version, trained for 3.5 days, achieved scores clearly better than the records of existing models.
The team was in agreement about launching a model that used attention alone, rejecting the best practice of the day, slow recurrent connections. Each of the eight authors contributed equally to the paper, and their names were listed in random order. The paper attracted a great deal of attention, and researchers packed the poster session at NeurIPS on December 6. As of August 2024, its citations exceeded 136,000, a truly astonishing number. At the end of the paper, the eight authors described the possibility of extending attention-based models to image, audio, and video processing.
Following the presentation of the paper, AI studies using the Transformer architecture were announced one after another. In addition to its high speed and high performance, the Transformer's accuracy improves significantly as the scale of the training data increases, which has brought about competition over the scale of AI models. In 2018, OpenAI announced the initial version of its generative pre-trained transformer (GPT), a large language model, and has been releasing updated versions ever since. ChatGPT, an AI that can carry on conversations like a human, appeared in 2022 and topped 100 million users less than three months after its release. Besides being talked about as AI that can foster innovation in a variety of fields, including business, work, education, medical care, and everyday life, it has also given rise to discussions of ethical issues. OpenAI also announced DALL-E, an image generation system, and Whisper, a speech recognition model. In 2018, Google announced a natural language processing model called BERT (Bidirectional Encoder Representations from Transformers), which was followed by ViT (Vision Transformer), a model specializing in image recognition. Then, in 2023, it announced Gemini (formerly known as Bard), a generative AI chatbot. The eight individuals who presented the Transformer and changed the history of AI have all since left Google. Seven of them have founded their own start-ups, and all but one of these companies conduct business based on Transformer technology.
The Transformer not only broke through the limitations of the advanced technologies of its time, such as the RNN and LSTM, but also enabled the seamless integration of diverse types of data in multimodal AI, greatly improving the capabilities of AI systems. Although deep learning gave rise to AI-based innovation from around 2010, the Transformer can be called the second great innovation in the sense that it revamped existing neural networks. The spread of generative AI built on the Transformer has had an enormous impact on society, and in view of these achievements, the eight members of the Transformer team are deserving recipients of the C&C Prize.