
LongChat-13B: An Open-Source Chatbot With 16k Tokens Memory

Meet LongChat-13B, a new conversational model that can generate long and engaging dialogues on any topic. It can remember up to 16k tokens of previous dialogue history and handle various tasks or queries. In this document, you will learn more about its features, performance, and limitations.

Uploaded by

My Social

To read more such articles, please visit our blog https://socialviews81.blogspot.com/

LongChat-13B: An Open-Source Chatbot with 16k Tokens Memory

Introduction

Over the years, numerous researchers and developers have dedicated
their efforts to building conversational models that can hold natural,
open-ended conversations. This pursuit is far from easy. Existing
models often suffer from repetition, monotony, irrelevance, and a lack
of diversity. They also struggle to sustain extended dialogues spanning
multiple turns and diverse topics.

In response to these challenges, a new conversational model has
emerged, one that is capable of generating engaging and lengthy
dialogues in an open-domain context. It was developed by LMSYS Org (the
Large Model Systems Organization), an open research organization
working on large language models. The driving force behind the new
model was to provide users with an experience closely resembling human
conversation. The model is known as 'LongChat-13B'.

What is LongChat-13B?

LongChat-13B is a neural network model that uses the transformer
architecture to generate natural language. It is built on LLaMA-13B, an
open language model from Meta with 13 billion parameters, rather than
on a much larger model such as GPT-3 (175 billion parameters). LMSYS
extended the base model's context window to 16k tokens and then
fine-tuned it on a curated set of user-shared conversations, so that it
learns from real multi-turn dialogues rather than only from general web
text.

The fine-tuning process allows LongChat-13B to learn the patterns and
nuances of human conversations, such as how to switch topics, how to
express emotions, how to use humor, and how to handle ambiguity. It
also enables LongChat-13B to generate responses that are relevant to
the context and the user’s input, without relying on pre-defined rules or
templates.

Key Features of LongChat-13B

LongChat-13B has several features that make it stand out from other
conversational models. Some of these features are:

1. Long-term memory: LongChat-13B can remember up to 16k tokens
(roughly 12k English words, at about three-quarters of a word per
token) of previous dialogue history. This allows it to maintain a
coherent and consistent dialogue that spans multiple turns and
topics. It also helps it avoid repetition and contradiction.
2. Topic control: LongChat-13B can follow the user’s lead in
choosing the topic of conversation. It can also initiate new topics or
switch topics when appropriate. It can handle both specific and
general topics, such as movies, sports, politics, or philosophy.
3. Diversity generation: LongChat-13B can generate diverse
responses that are not predictable or boring. It can use different
words, phrases, sentences, or paragraphs to convey the same
meaning. It can also use different rhetorical devices, such as
metaphors, analogies, jokes, or quotes.
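
To make the 16k-token memory concrete, here is a minimal sketch of how an application might keep only the most recent dialogue turns that fit in that budget. It is an illustration, not LongChat's own code: count_tokens is a hypothetical whitespace-based stand-in for the model's real tokenizer, trim_history is an assumed helper name, and 16k is taken here as 16 * 1024 tokens.

```python
from typing import List

MAX_CONTEXT_TOKENS = 16 * 1024  # "16k tokens", taken here as 16384 (assumption)


def count_tokens(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(turns: List[str], budget: int = MAX_CONTEXT_TOKENS) -> List[str]:
    """Keep the newest turns whose combined token count fits within the budget."""
    kept: List[str] = []
    used = 0
    # Walk backwards so the most recent turns are considered first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "USER: hello there",
    "ASSISTANT: hi how can I help",
    "USER: tell me a story",
]
print(trim_history(history, budget=10))
```

With a small budget only the latest turn survives; with the full 16k budget, a short history like this one is kept whole. A larger budget simply means fewer old turns have to be dropped.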

Capabilities and Use Cases of LongChat-13B

LongChat-13B has many capabilities and use cases in various domains
and scenarios. Some of them are:

● Entertainment: LongChat-13B can provide entertainment for
users who want to have fun or kill time by chatting with an AI
chatbot. It can engage users in interesting and amusing
conversations about various topics.
● Education: LongChat-13B can provide education for users who
want to learn new things or improve their language skills by
chatting with an AI chatbot. It can teach users about various
subjects or topics in an interactive and personalized way.
● Social: LongChat-13B can provide social support for users who
want to have someone to talk to or share their feelings with. It can
listen to users’ problems or stories and respond with empathy or
humor. It can also help users cope with loneliness or isolation.
● Business: LongChat-13B can provide business solutions for users
who want to have a professional or formal conversation with an AI
chatbot. It can handle various tasks or queries, such as customer
service, sales, marketing, or recruitment.

How does LongChat-13B work?

LongChat-13B is a conversational model built in two stages: generative
pre-training and supervised fine-tuning. Generative pre-training trains
a large base model (LLaMA-13B in this case, not GPT-3) on a large
amount of text from the web. This teaches the model the basics of
language, such as grammar and logic. Fine-tuning then continues
training that same model on a specific dataset of dialogues. This
teaches the model the details of conversations, such as topic and
emotion, over contexts of up to 16k tokens.

Besides the original weights, LongChat-13B is available in two
community-made quantized formats: GPTQ and GGML. The GPTQ build is a
4-bit model: it uses fewer bits to store each parameter, which makes it
smaller and faster to run on a GPU than the full-precision weights, at
the cost of a small loss in accuracy. GGML is not a meta-learning
method; it is a tensor library and file format used by llama.cpp, aimed
mainly at running quantized models on CPUs.

The two builds are not separate components of one system. Each is a
complete copy of the model that generates responses based on the user's
input and dialogue history. Neither one assesses the other's output;
the relevance, coherence, consistency, and diversity of the responses
come from the underlying model itself.
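
To see what storing each parameter in 4 bits buys, the following sketch works through the memory arithmetic and shows a deliberately naive form of quantization. It makes two assumptions: "13B" is taken as exactly 13 billion parameters, and quantize_4bit is a simple round-to-nearest scheme written for illustration. It is not the GPTQ algorithm, which chooses the quantized values more carefully to limit the loss in accuracy.

```python
from typing import List, Tuple

PARAMS = 13_000_000_000  # "13B" in the model name, taken as 13 billion (assumption)


def weight_gigabytes(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 10**9 bytes), ignoring overhead."""
    return num_params * bits_per_param / 8 / 1e9


def quantize_4bit(weights: List[float]) -> Tuple[List[int], float, float]:
    """Naive round-to-nearest quantization to 16 levels (NOT the GPTQ algorithm)."""
    lo, hi = min(weights), max(weights)
    # 4 bits give 2**4 = 16 levels, i.e. 15 steps between lo and hi.
    scale = (hi - lo) / 15 or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Map 4-bit codes back to approximate floating-point weights."""
    return [lo + c * scale for c in codes]


print(weight_gigabytes(PARAMS, 16))  # 26.0 GB for 16-bit weights
print(weight_gigabytes(PARAMS, 4))   # 6.5 GB for 4-bit weights

original = [-1.0, -0.2, 0.0, 0.5, 1.0]
codes, scale, lo = quantize_4bit(original)
print(codes, dequantize(codes, scale, lo))
```

Under these assumptions the 4-bit weights need about a quarter of the space of 16-bit weights, and each restored weight differs from the original by at most half a quantization step.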

Performance Evaluation

LongChat-13B has been tested on benchmarks that measure how well it
can sustain long and engaging dialogues on any topic. This article
focuses on one of them:

LongEval: LongEval is a new benchmark created by LMSYS that
measures how well chatbots can handle long context in dialogues. Long
context means remembering and using information from previous turns
and topics in the conversation. LongEval tests the chatbot’s ability to
retrieve and associate relevant information from long sequences of text.

In the finer-grained line retrieval test, the MPT-7B-StoryWriter
model's accuracy dropped sharply compared with the coarser-grained
test, and the ChatGLM2-6B model also degraded. The LongChat-13B-16K
model, by contrast, performed reliably, achieving accuracy almost on
par with GPT-3.5 and Anthropic's Claude within a context length of
12K.
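
The line retrieval test can be pictured as follows: the model is given a long list of records and asked to return the value stored in one of them. The sketch below builds such a synthetic prompt and scores a reply. The record wording, the function names, and the scoring rule are assumptions made for illustration; the actual test cases and scoring are in the LongChat repository listed under 'source'.

```python
import random
from typing import Dict, Tuple


def build_line_retrieval_prompt(num_lines: int, seed: int = 0) -> Tuple[str, str, int]:
    """Build a synthetic document of records plus one retrieval question."""
    rng = random.Random(seed)
    # Each record maps a line name to a random five-digit number.
    records: Dict[str, int] = {
        f"line-{i}": rng.randint(10_000, 99_999) for i in range(num_lines)
    }
    target = rng.choice(list(records))
    body = "\n".join(
        f"{name}: REGISTER_CONTENT is <{value}>" for name, value in records.items()
    )
    question = f"What is the REGISTER_CONTENT in {target}?"
    return body + "\n" + question, target, records[target]


def is_correct(model_answer: str, expected: int) -> bool:
    """Count a reply as correct if it contains the expected number."""
    return str(expected) in model_answer


prompt, target, expected = build_line_retrieval_prompt(num_lines=200)
print(len(prompt.split("\n")), target)
print(is_correct(f"The value is {expected}.", expected))
```

Raising num_lines lengthens the prompt, which is how a test of this kind probes whether accuracy holds up as the context grows.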


source - https://lmsys.org/blog/2023-06-29-longchat/

The researchers concluded that many open-source models advertising a
large context window do not actually work well at the context length
they claim, whereas LongChat-13B, trained with a specialized method,
works very well.
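
The LMSYS blog post listed under 'source' describes the specialized method as condensing the rotary position embeddings: position indices are divided by a fixed ratio, so that a 16k-token sequence falls inside the range of positions the base model saw during pre-training, and the model is then fine-tuned on long conversations. Below is a minimal sketch of the condensing step only. It assumes a 2048-token pre-training length (hence a ratio of 8), and the function names are illustrative rather than taken from the LongChat code base.

```python
from typing import List

PRETRAIN_LEN = 2048   # assumed context length of the base model during pre-training
TARGET_LEN = 16384    # extended context length (16k)
RATIO = TARGET_LEN // PRETRAIN_LEN  # 8


def rotary_angles(position: float, dim: int = 8, base: float = 10000.0) -> List[float]:
    """Rotation angles for one position under standard rotary embeddings."""
    return [position / (base ** (2 * i / dim)) for i in range(dim // 2)]


def condensed_angles(position: int, ratio: int = RATIO) -> List[float]:
    """Same angles, computed on the condensed (divided) position index."""
    return rotary_angles(position / ratio)


last = TARGET_LEN - 1
print(last / RATIO)                             # 2047.875, still below 2048
print(condensed_angles(8) == rotary_angles(1))  # True
```

After condensing, the last position of a 16k sequence maps to 2047.875, so every position stays within the original 0 to 2048 range instead of extrapolating beyond it.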

For a more detailed look at the benchmarks and their results, please see
their blog post. The blog post includes information about the model's
training process, its performance on various benchmarks, and more.

How to access and use this model?

LongChat-13B is open-source but not licensed for commercial use. You
can find its code and documentation on GitHub, and you can download the
model weights from Hugging Face. You can use LongChat-13B for your own
projects or applications, as long as you follow its license and
citation requirements.

If you are interested in learning more about the LongChat-13B model,
all relevant links are provided under the 'source' section at the end
of this article.

Limitations

LongChat-13B is an impressive conversational model, but it has some
limitations that need improvement. Some of these limitations are:


● Safety: LongChat-13B is trained on texts from the web, which may
have harmful or offensive content, such as hate speech, profanity,
or misinformation. This may make LongChat-13B generate
responses that are bad or harmful for some users or situations. So
LongChat-13B needs to have some ways to filter or flag such
content and ensure its safety and ethics.
● Evaluation: LongChat-13B is evaluated on metrics and
benchmarks that measure its quality and performance in dialogue
generation. But these metrics and benchmarks may not measure
all aspects of dialogue quality, such as user satisfaction,
engagement, or trust. So LongChat-13B needs to have more
complete and strong evaluation methods that can show its
real-world impact and value.
● Generalization: LongChat-13B is fine-tuned on a dataset of
open-domain dialogues, which may limit its ability to handle other
domains or tasks that need different skills or knowledge. So
LongChat-13B needs to have more flexible and adaptive methods
that can let it learn from new data sources or domains without
forgetting its previous knowledge or skills.

Conclusion

LongChat-13B is a new conversational model that can generate long
and engaging dialogues on any topic. It aims to provide a human-like
conversational experience for users, and it is a notable step forward
for open-source natural language understanding and generation over long
contexts.

source
blog post - https://lmsys.org/blog/2023-06-29-longchat/
github repo - https://github.com/DachengLi1/LongChat
Model details - https://huggingface.co/lmsys/longchat-13b-16k
GPTQ Model - https://huggingface.co/TheBloke/LongChat-13B-GPTQ
GGML Model - https://huggingface.co/TheBloke/LongChat-13B-GGML
