ML | JURASSIC-1 - Language Model
Last Updated: 29 Oct, 2021
Jurassic-1 is a pair of auto-regressive Natural Language Processing (NLP) models developed by Israel's AI21 Labs as a competitor to OpenAI's GPT-3. The pair consists of J1-Jumbo and J1-Large. Jurassic-1 is notable not only for J1-Jumbo's size of 178 billion parameters, which made it the largest language model released at the time, but also for its reach: it was among the first models of this scale made openly available to developers and researchers.
The model was introduced with the goal of making machines thought partners for humans, and it is designed to handle a wide range of language and operational tasks. It also lets users build their own applications and services on top of it. Some of its key capabilities are described below.
- Text summarization and simplification: Jurassic-1 can condense text of any length into a shorter version that keeps only the relevant information. This can be used to produce meeting minutes, capture the gist of long emails or documents, or determine whether a review or piece of feedback is positive or negative.
- Classification: The model can classify text into labels or categories, and it is not limited to binary classification. Sentiment analysis is one of its main classification use cases.
- World knowledge and creativity: Because the model has been trained on huge amounts of data, it is proficient at answering questions, offering suggestions, and clarifying doubts. It can also write articles on its own and even manage humor, something AI traditionally struggles with. These abilities have applications in copywriting, ideation, marketing, and interactive chatbots.
Other capabilities include translating code from one programming language to another, generating code from plain-text instructions, extracting information, and formatting text. It can also write song or rap lyrics, play a game of charades, or play chess against you. All of these are accessed by sending the model a text prompt, as sketched in the example below.
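As a concrete illustration, here is a minimal sketch of prompting the model for summarization through AI21 Studio's HTTP completion endpoint. The URL, request fields, and response shape follow the Studio v1 API as publicly documented around the model's launch and should be treated as assumptions; the passage and API key are placeholders.

```python
# Minimal sketch: summarizing a passage with J1-Large via AI21 Studio's
# completion endpoint. Endpoint URL, payload fields, and response shape are
# assumptions based on the Studio v1 API; check the current AI21 docs.
import requests

API_KEY = "YOUR_AI21_API_KEY"  # placeholder; obtain one from AI21 Studio

passage = (
    "The quarterly meeting covered hiring plans, the delayed product launch, "
    "and a proposal to move the team to a new office in March."
)

response = requests.post(
    "https://api.ai21.com/studio/v1/j1-large/complete",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "prompt": f"Summarize the following text in one sentence:\n{passage}\nSummary:",
        "numResults": 1,
        "maxTokens": 50,
        "temperature": 0.3,       # low temperature for a factual summary
        "stopSequences": ["\n"],  # stop at the end of the summary line
    },
)
response.raise_for_status()
print(response.json()["completions"][0]["data"]["text"].strip())
```

The same pattern covers the other use cases above: only the prompt changes, e.g. a list of labelled examples followed by a new input for classification.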
Storing roughly 178 billion parameters in half precision requires a little more than 356 GB of memory. Since even the best GPUs top out at about 80 GB, the model had to be trained across multiple nodes. It was trained on 300 billion tokens (a token is a small piece of text produced by splitting larger text into units the model can process) drawn from publicly available sources, which is what gives it such broad coverage of world knowledge.
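The 356 GB figure follows directly from the parameter count. A quick back-of-the-envelope check (assuming 2 bytes per parameter for half precision, and 80 GB cards such as the A100):

```python
# Back-of-the-envelope check of the memory figure quoted above: 178 billion
# parameters stored in half precision (2 bytes each).
n_params = 178e9          # parameters in J1-Jumbo
bytes_per_param = 2       # float16 / half precision
total_bytes = n_params * bytes_per_param

print(f"{total_bytes / 1e9:.0f} GB")  # ~356 GB, more than any single GPU holds
print(f"min GPUs at 80 GB each: {int(-(-total_bytes // 80e9))}")  # ceil -> 5
```

Even ignoring activations and optimizer state, the weights alone need at least five 80 GB devices, which is why multi-node training was unavoidable.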
Jurassic-1 differs from its predecessor GPT-3 in several ways. GPT-3 has 175 billion parameters, making it the second-largest language model after J1-Jumbo. Jurassic-1's tokenizer uses a vocabulary of about 250,000 unique tokens (where a token can represent a word or a word piece), compared with roughly 50,000 for GPT-3. The larger vocabulary lowers the tokens-per-byte (TPB) ratio, so the same text can be represented with fewer tokens in Jurassic-1 than in GPT-3. If both models shared the same architecture, this alone would speed up Jurassic-1's query processing by about 1.4 times.
The architectures are not identical, however: Jurassic-1 uses a different depth-to-width ratio in its neural network, as shown in Table 1. Taking both the architecture and the larger vocabulary into account, Jurassic-1 processes queries up to about 1.8 times faster. Because of this improved computational efficiency, Jurassic-1 can fit more examples into a prompt than GPT-3 in few-shot learning settings.
Another notable feature is that users can customize the model with only a small number of examples (correctly labelled input-output pairs). The makers claim that about 50-100 examples are enough for fairly accurate results, although accuracy generally improves as more examples are provided. Unlike GPT-3 at the time, this customization is open to users, who can apply it to build applications such as chatbots.
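The tokens-per-byte argument can be made concrete with a small calculation. The TPB values below are purely illustrative assumptions (chosen to reproduce the roughly 1.4x ratio quoted above), not measurements of either tokenizer.

```python
# Illustration of the tokens-per-byte (TPB) argument: a larger vocabulary lets
# the same text be encoded in fewer tokens, so fewer decoding steps are needed
# per query. The TPB values here are illustrative assumptions only.
text_bytes = 10_000   # size of some input text in bytes

tpb_gpt3 = 0.28       # assumed tokens per byte with a ~50K vocabulary
tpb_j1 = 0.20         # assumed tokens per byte with a ~250K vocabulary

tokens_gpt3 = text_bytes * tpb_gpt3
tokens_j1 = text_bytes * tpb_j1

# With identical architectures, generation cost scales roughly with the number
# of tokens processed, so this ratio approximates the speed-up.
print(f"GPT-3 tokens: {tokens_gpt3:.0f}, Jurassic-1 tokens: {tokens_j1:.0f}")
print(f"approximate speed-up: {tokens_gpt3 / tokens_j1:.1f}x")
```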
Table 1: Architecture comparison of GPT-3 and Jurassic-1
- n_params: number of parameters in the model
- n_layers: number of layers in the model
- d_model: number of units in each bottleneck layer
- d_head: dimension of each attention head
- n_heads: number of attention heads
- n_vocab: number of unique tokens used in training
AI21 Studio is currently in open beta, so anyone can sign up and experiment with Jurassic-1. Go experiment.