💥 New Qwen2 series
The developers behind the Qwen series have unveiled the next-generation Qwen2 models, marking a significant advance over Qwen1.5. The new models bring greater size diversity, multilingual proficiency, extended context handling, and state-of-the-art performance across various benchmarks.
👏 Diverse Model Sizes: The Qwen2 series comes in five sizes - Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B - each with both base and instruction-tuned versions.
🌏 Multilingual Training: Qwen2 models were trained on data spanning 27 additional languages besides English and Chinese. This broad linguistic coverage improves generalization and understanding across languages, setting a new standard in multilingual LLM performance.
📥 Enhanced Context Length: Qwen2-7B-Instruct and Qwen2-72B-Instruct now support context lengths up to 128K tokens, while all base models were pretrained with a 32K-token context. Handling such extensive contexts lets these models manage and interpret lengthy documents and conversations more effectively.
🧐 Grouped Query Attention (GQA): GQA is applied across all Qwen2 model sizes, improving inference speed and reducing memory usage for both small and large models.
🎢 Improved Performance in Coding and Mathematics: The Qwen2 series shows significantly improved performance on coding and mathematical tasks, reflecting advances in model architecture and training.
📢 Open-Source Availability: The Qwen2 models have been open-sourced on Hugging Face and ModelScope.
🛠 Instruction-Tuned Models: The instruction-tuned models' ability to manage long contexts is assessed through tasks like "Needle in a Haystack," showing they can handle context lengths up to 128K tokens, especially when augmented with YaRN.
📚 Dataset Augmentation: Extensive effort went into expanding and improving the pretraining and instruction-tuning datasets, increasing both the volume and quality of data across languages and strengthening the models' competence beyond English and Chinese.
Link to the model repo in the comments. #llm #nlp #machinelearning https://lnkd.in/disrdyrF
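For anyone who wants to try the release, a minimal sketch of loading one of the instruction-tuned checkpoints with the Hugging Face transformers library might look like this (the repo ID, dtype, and generation settings are my assumptions, not taken from the post - adjust for your hardware):

```python
# Minimal sketch: load a Qwen2 instruction-tuned checkpoint with transformers.
# Assumes the "Qwen/Qwen2-7B-Instruct" repo ID and enough GPU memory; adjust as needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the key features of Qwen2."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a short reply; raise max_new_tokens for longer outputs.
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```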
Eduardo Muñoz’s Post
More Relevant Posts
-
Google recently released Gemini 1.5 Pro, which can process context windows from 128,000 up to 1 million tokens; models like this are called long-context language models (LCLMs). As the name implies, these models can ingest and reason over more than 100 documents of 5,000 words each in a single prompt. That is insane!!! How does this help? You can conduct complex research, information retrieval becomes more efficient, and responses stay coherent even in lengthy conversations. Other LCLMs include Claude 3 Opus, which can process from 200,000 up to 1 million tokens, and possibly GPT-4 and GPT-4o.
In this paper, researchers show that LCLMs can rival RAG-based LLM systems even without being explicitly trained for retrieval. Here are some key takeaways from the paper:
1. Challenge: A RAG-based LLM system is complex because it integrates many components, including retrieval systems or databases, which introduces cascading errors. Solution: Process the entire corpus natively inside the LCLM itself; the LCLM can then apply sophisticated prompting techniques across the whole corpus.
2. Challenge: The authors argue that existing benchmarks are flawed because they don't resemble real-world workloads, where data is much more complex. Solution: They introduce Long-Context Frontiers, a benchmark that evaluates LLMs with long-context capabilities, essentially checking whether a model can retain contextual information up to millions of tokens.
3. Challenge: RAG requires extensive training and a highly sophisticated pipeline. Solution: LCLMs don't require this extra machinery, yet they were able to rival state-of-the-art retrieval and RAG systems.
Lastly, these are my two favorite highlights:
1. Corpus-in-Context (CiC) Prompting: the relevant corpus is placed directly in the context window, letting the model process and leverage all the important information in one shot (see the sketch below). This eliminates the need for multiple, separate retrieval steps: the LLM can take a whole set of documents and generate a cohesive response for a long conversation, whereas a RAG-based system only sees the top 10 to 50 retrieved documents.
2. Many-Shot In-Context Learning: in many-shot ICL you provide the LLM with many examples inside the prompt to help it perform better. This is something I do whenever I need to understand complex subjects.
Read the paper to learn more about the topics I have explained: https://lnkd.in/gRYFuisK Github: https://lnkd.in/gf3sCmQn
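A minimal sketch of what Corpus-in-Context prompting could look like in practice (the prompt layout, document numbering, and instructions here are illustrative assumptions, not the paper's exact format):

```python
# Illustrative Corpus-in-Context (CiC) prompt builder: put the whole corpus,
# numbered, into one prompt together with the query, instead of retrieving chunks.
def build_cic_prompt(documents: list[str], question: str) -> str:
    corpus = "\n".join(f"[Doc {i + 1}] {doc}" for i, doc in enumerate(documents))
    return (
        "You are given a corpus of documents. Answer the question using only the corpus,\n"
        "and cite the document numbers you relied on.\n\n"
        f"=== Corpus ===\n{corpus}\n\n"
        f"=== Question ===\n{question}\n\nAnswer:"
    )

docs = [
    "Gemini 1.5 Pro supports context windows of up to 1 million tokens.",
    "Claude 3 Opus supports context windows of 200K tokens and beyond.",
]
print(build_cic_prompt(docs, "Which models support million-token contexts?"))
```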
-
Natural Language with Embedded Programs
LLMs are doing fantastic things with unstructured data, but still struggle with structured data and reasoning on well-defined problems. And there are the usual issues with prompting, context, hallucination, and parsing the response from LLMs. Function calling alleviates the response-parsing part to a certain extent, but the other problems still drive failure rates, keeping LLMs out of production for a good portion of enterprise use cases.
In the promising technique outlined here https://lnkd.in/gAec--uB, some of these problems are solved in two steps. The first step is to identify the problem and generate the data and code needed to solve it. The second step is to execute the code, take the result, and hand it back to the LLM to produce the natural-language response.
We have been building a similar solution for deterministic answers with our code generation service, and are seeing some promising results. A good use case is to give the LLM some sales numbers and ask it to forecast them with lean, regular, and high growth percentages. Linear projections are a solved problem and a fairly simple algorithm in most languages. We solve this in a few steps (a minimal sketch follows below):
1. The user asks for a sales forecast, given monthly sales numbers for 2020, 2021, 2022, and 2023 in a table, with forecasts for the next 3 years starting Jan 2024 at 6%, 10%, and 20% growth options.
2. We add a system prompt that helps the LLM break the request down into an intent, extract the table data, identify the specific algorithm needed, and return a structured response.
3. The LLM returns the intent, the table as JSON, and algorithm suggestions. This is passed to a code generation engine backed by a repository of algorithms, which generates the code needed to compute the forecasts accurately. The code generation engine uses a fine-tuned code generation LLM.
4. The code is executed in a temporary VM, and the result is returned to the LLM in a new prompt, with the previous prompt and the forecasted values in the context.
5. The LLM then takes the question and the answers and presents them, along with its interpretation, for all three cases.
Our technique is a variation of the one suggested in the paper, with multiple steps, but it is helpful in more complex scenarios. This approach of designing a "Compound AI" system with multiple LLMs stitched together will always yield more deterministic and precise results. It aligns closely with our philosophy at kis.ai: a well-designed Compound AI system will always be more efficient and accurate than a foundational model alone.
#nlep #ai #aicode #code #compoundai
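As an illustration, the kind of deterministic code the generation step might emit for the forecasting example could look like this (the data values, function name, and growth handling are hypothetical, not from the post or paper):

```python
# Hypothetical example of generated code for the sales-forecast intent:
# apply compound annual growth to the last observed yearly total.
def forecast_sales(last_year_total: float, growth_rates: list[float], years: int = 3) -> dict:
    """Return {growth_rate: [year1, year2, ...]} projections."""
    projections = {}
    for rate in growth_rates:
        values, current = [], last_year_total
        for _ in range(years):
            current *= (1 + rate)          # compound the growth each year
            values.append(round(current, 2))
        projections[rate] = values
    return projections

# 2023 total sales (hypothetical), forecast 2024-2026 at 6%, 10%, 20% growth.
print(forecast_sales(1_200_000.0, [0.06, 0.10, 0.20]))
```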
Paper: arXiv 2309.10814 (arxiv.org)
-
Layer-of-Thoughts Prompting (LoT): A Unique Approach that Uses Large Language Model (LLM) Based Retrieval with Constraint Hierarchies
Using Large Language Models (LLMs) through different prompting strategies has become popular in recent years. However, many current methods offer very general frameworks that neglect the particular difficulties of crafting effective prompts. Differentiating prompts in multi-turn interactions, which involve several exchanges between the user and the model, is a crucial problem that remains largely unresolved. A recent study from the Center of Juris-Informatics, ROIS-DS, Tokyo, Japan, tries to close that gap by studying how hierarchical relationships between prompts can enhance these interactions. It introduces the idea of "thought hierarchies," which help refine and filter candidate answers, enabling more accurate, understandable, and structured retrieval procedures. The hierarchical structure of these thoughts is essential for creating algorithms that are both effective and easy to comprehend.
To filter and improve query responses, the study presents a technique called Layer-of-Thoughts Prompting (LoT) based on hierarchical constraints. The method delivers a better organized and more explainable information retrieval process by automating the steps needed to make retrieval more efficient. What distinguishes LoT from other approaches is its use of constraint hierarchies, which systematically narrow the set of potential answers according to the specific criteria of a query (a rough sketch of this idea follows below).
LLMs can be prompted in many ways, but most approaches rely on generalized frameworks that don't adequately handle the complexity of multi-turn interactions, in which users and models exchange information multiple times before concluding. These earlier methods lack the depth required to keep prompts consistent and context-aware across several exchanges. LoT instead highlights the prompts' hierarchical structure and interrelationships.
An important component of LoT's effectiveness is its conceptual framework: the system arranges prompts and their answers into a layered, hierarchical structure, yielding retrieval algorithms that are effective and easy to understand. Because the system can explain why certain information is being retrieved and how it relates to the original question, the results are more accurate and easier to grasp. Building on the strengths of LLMs, LoT uses their capabilities to enhance information retrieval tasks. By guiding the model through a structured process of filtering responses and imposing constraints at various tiers, the approach attains greater precision in retrieving relevant data. Furthermore, the thought hierarchies improve the transparency of the retrieval process, facilitating users'...
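The paper's exact mechanics aren't reproduced here, but the core idea of narrowing candidates layer by layer with increasingly specific constraints could be sketched roughly like this (the constraint functions, ordering, and example documents are illustrative assumptions, not LoT's implementation):

```python
# Rough illustration of layered constraint filtering: each layer applies a
# stricter constraint to the surviving candidates, mimicking a constraint hierarchy.
from typing import Callable

def layered_filter(candidates: list[str], layers: list[Callable[[str], bool]]) -> list[str]:
    """Apply constraint layers in order; keep only candidates passing every layer so far."""
    surviving = candidates
    for constraint in layers:
        surviving = [c for c in surviving if constraint(c)]
        if not surviving:          # nothing left: stop early and return the empty set
            break
    return surviving

# Hypothetical constraints for a legal-retrieval query.
layers = [
    lambda text: "contract" in text.lower(),          # layer 1: topical constraint
    lambda text: "termination" in text.lower(),       # layer 2: narrower sub-topic
    lambda text: len(text.split()) > 5,               # layer 3: enough substance to cite
]
docs = [
    "Contract termination requires thirty days of written notice.",
    "Contracts define obligations.",
    "Employment law overview.",
]
print(layered_filter(docs, layers))
```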
-
A New Open-Source LLM, DBRX, Claims to Be the Most Powerful – Here Are the Scores
A whole new contender has entered the ring of large language models (LLMs). Databricks, a company specializing in data processing, has unveiled DBRX, claiming it to be the most powerful open-source LLM yet. But is it backing those claims up? Let's find out.
DBRX uses a transformer architecture and boasts a massive 132 billion parameters. It leverages a Mixture-of-Experts (MoE) design consisting of 16 individual expert networks; for any given token, only 4 of these experts are active, using 36 billion parameters for efficiency (a toy routing sketch follows below). GPT-4 also reportedly uses an MoE design.
Databricks compares DBRX to other prominent open-source LLMs such as Meta's Llama 2-70B, Mixtral (from France's Mistral AI), and Grok-1 (developed by Elon Musk's xAI). DBRX reportedly outperforms its rivals in several key areas:
Language Understanding: DBRX scores 73.7%, surpassing GPT-3.5 (70.0%), Llama 2-70B (69.8%), Mixtral (71.4%), and Grok-1 (73.0%).
Programming Ability: DBRX shows a significant lead at 70.1%, compared to GPT-3.5's 48.1%, Llama 2-70B's 32.3%, Mixtral's 54.8%, and Grok-1's 63.2%.
Mathematics: DBRX takes another win at 66.9%, edging out GPT-3.5 (57.1%), Llama 2-70B (54.1%), Mixtral (61.1%), and Grok-1 (62.9%).
Databricks attributes DBRX's speed to its MoE architecture, built on their MegaBlocks research and open-source projects, which lets the model output tokens at a very high rate. Databricks also positions DBRX as the most advanced open-source MoE model currently available, potentially paving the way for future advances in the field.
The open-source nature of DBRX allows wider adoption and contributions from the developer community, which could accelerate further development and solidify DBRX's position as a leading LLM.
For more news visit www.2yodoindia.com
#2YoDoINDIA #LargeLanguageModels #LLM #DBRX #GPT
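To make the "4 of 16 experts" idea concrete, here is a toy sketch of top-k expert routing (a generic MoE illustration with random weights, not DBRX's actual implementation):

```python
# Toy top-k MoE routing: a router scores experts per token and only the top-k run.
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, hidden = 16, 4, 8

token = rng.normal(size=hidden)                      # one token's hidden state
router_weights = rng.normal(size=(hidden, num_experts))

scores = token @ router_weights                      # router logits, one per expert
top_experts = np.argsort(scores)[-top_k:]            # pick the 4 best-scoring experts
gates = np.exp(scores[top_experts])
gates /= gates.sum()                                 # softmax over the selected experts only

# Each expert is a tiny feed-forward layer here; outputs are mixed by the gate weights.
expert_weights = rng.normal(size=(num_experts, hidden, hidden))
output = sum(g * np.tanh(token @ expert_weights[e]) for g, e in zip(gates, top_experts))
print("active experts:", sorted(top_experts.tolist()))
```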
-
Exploring the trade-offs between Retrieval-Augmented Generation (RAG) and long-context LLMs: can we achieve top performance without breaking the bank? This new study dives deep into the efficiency vs. accuracy debate with a hybrid approach.
The paper compares two approaches for handling long texts in large language models (LLMs): Retrieval-Augmented Generation (RAG) and long-context (LC) LLMs. RAG retrieves relevant information from a large database and then uses it to generate responses. LC LLMs, on the other hand, can directly process and understand very long input texts. The researchers conducted a comprehensive study comparing these two methods using various public datasets and three recent LLMs.
The study found that, given sufficient resources, LC LLMs consistently outperformed RAG in average performance across tasks. However, RAG retains a significant advantage in cost-efficiency, because it needs less computational power and fewer input tokens, which typically determine the cost of LLM APIs. The researchers also observed that for over 60% of queries, RAG and LC produced identical predictions, suggesting that RAG could be used in those cases to cut costs without sacrificing performance.
Based on these findings, the researchers propose a new method called SELF-ROUTE. It combines the strengths of RAG and LC by letting the LLM itself decide whether to use RAG or LC for each query. SELF-ROUTE significantly reduces overall computational cost while maintaining performance comparable to LC LLMs: for example, it cut costs by 65% for Gemini-1.5-Pro and 39% for GPT-4o compared to using LC alone.
SELF-ROUTE works in two steps (a rough sketch follows below). First, in the RAG-and-Route step, the LLM is given the query and the retrieved chunks and asked to predict whether the query can be answered from them and, if so, to generate an answer. If the LLM deems the query answerable, the RAG prediction is used as the final answer. For queries deemed unanswerable, the full context is provided to the long-context LLM in a second step to obtain the final prediction.
The researchers evaluated the method with three recent LLMs on nine datasets. SELF-ROUTE consistently outperformed RAG by over 5% across all models. Compared to LC, there was a slight performance drop for GPT-4o (-0.2%) and Gemini-1.5-Pro (-2.2%), but an improvement for GPT-3.5-Turbo (+1.7%). Importantly, SELF-ROUTE achieved these results while using significantly fewer tokens: GPT-4o, for example, used only 61% of the tokens required by LC while achieving comparable performance, which translates to substantial cost savings.
For more information check out the paper: https://lnkd.in/dN7_SZaA
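A rough sketch of how the RAG-and-Route step could be wired together (the prompt wording and the `retrieve` and `llm` callables are placeholders I am assuming, not the paper's code):

```python
# Sketch of SELF-ROUTE's two steps: try RAG first; fall back to long-context
# only when the model says the retrieved chunks are insufficient.
def self_route(query: str, full_context: str, retrieve, llm) -> str:
    chunks = retrieve(query, k=5)  # placeholder retriever returning top-k chunks
    rag_prompt = (
        "Answer the question using ONLY the passages below. "
        "If they are insufficient, reply exactly 'UNANSWERABLE'.\n\n"
        + "\n\n".join(chunks)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    rag_answer = llm(rag_prompt)
    if "UNANSWERABLE" not in rag_answer:
        return rag_answer                      # cheap path: RAG answer accepted
    # Expensive path: give the model the entire context.
    lc_prompt = f"{full_context}\n\nQuestion: {query}\nAnswer:"
    return llm(lc_prompt)
```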
-
LLM360 Group Introduces TxT360: A Top-Quality LLM Pre-Training Dataset with 15T Tokens
In the ever-evolving world of large language models (LLMs), pre-training datasets form the backbone of how AI systems comprehend and generate human-like text. LLM360 has recently unveiled TxT360, a groundbreaking pre-training dataset comprising 15 trillion tokens. The release combines diversity, scale, and rigorous data filtering to produce one of the most sophisticated open-source datasets to date.
A Dataset Built on New Foundations
TxT360 differentiates itself from previous datasets by including fresh sources such as FreeLaw (legal corpora), PG-19 (a collection of books), scientific papers, and Wikipedia. By blending these sources, TxT360 presents a richer and more nuanced dataset designed to bolster the capabilities of the next generation of LLMs.
From Common Crawl to Clean Data
The creation of TxT360 began with Common Crawl, the publicly available web scrape that underpins many modern language models. Simply using raw web data would not meet the standards LLM360 aimed for, so the team ran a rigorous filtering pipeline to extract the most useful text from the massive collection of WARC (Web ARChive) files:
Text Extraction: Clean, coherent text was isolated from noisy web data in the WARC files.
Language Filtering: Non-English content was removed to keep the dataset consistent.
URL Filtering: Redundant or low-value sources, including spammy or promotional sites, were filtered out.
Repetition Removal: Extensive effort targeted repeated lines, paragraphs, and n-grams.
Document- and Line-Level Filtering: Heuristics removed documents and lines that did not meet quality benchmarks.
In total, 97.65% of the original data was filtered out, retaining only high-quality, meaningful text to support robust and nuanced language models.
Global Deduplication
Building a high-quality dataset like TxT360 required effective deduplication. LLM360 tackled this with two approaches: exact deduplication using a Bloom filter and fuzzy deduplication using a MinHash algorithm (a minimal illustration follows below). Together these ensured that the dataset contained unique content, avoiding the pitfalls of repetitive learning.
High-Quality Sources
After filtering, LLM360 added handpicked, high-quality corpora, including scientific papers, legal documents, classic books, and curated Wikipedia content. Each specialized source went through a tailored pipeline to preserve data integrity and quality, so the resulting language models can handle a wide range of topics.
TxT360: A New Era for Open-Source AI
The release of TxT360 marks a significant leap forward for AI and NLP research. LLM360's meticulous construction and filtering demonstrate that quality and quantity can coexist. With 15 trillion tokens, TxT360 supports the development of nuanced, capable, and intelligent language models. Moreover, LLM360's...
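For intuition, fuzzy deduplication with MinHash boils down to comparing compact signatures instead of full documents; a minimal, illustrative sketch (not LLM360's pipeline code, and the shingle size, hash count, and threshold are arbitrary choices) might look like this:

```python
# Minimal MinHash sketch for fuzzy deduplication: documents whose estimated
# Jaccard similarity exceeds a threshold are treated as near-duplicates.
import hashlib

def shingles(text: str, n: int = 3) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def minhash_signature(items: set, num_hashes: int = 64) -> list:
    # One minimum per seeded hash function approximates the set for Jaccard estimation.
    return [
        min(int(hashlib.md5(f"{seed}:{item}".encode()).hexdigest(), 16) for item in items)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
doc2 = "the quick brown fox jumps over the lazy dog near the river"
sig1, sig2 = minhash_signature(shingles(doc1)), minhash_signature(shingles(doc2))
print("near-duplicate:", estimated_jaccard(sig1, sig2) > 0.8)
```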
-
🔍 Improved Retrieval Augmented Generation with ALL-SORT 🔍
The issue with traditional vector-based search (RAG, retrieval augmented generation) is that vector comparison can miss relevant snippets of information. This recent paper from Netflix [https://lnkd.in/dgw3WWyq] shows that cosine comparisons can often be arbitrary and badly represent the underlying information.
So, I started playing around with small language models - as a preprocessing step - to decide which chunks of a document are most relevant to a query. However, this ends up being slow and expensive because you are waiting for the language model to return the relevant text. Then, instead of asking the LLM to summarise or extract, I thought of using it to simply rate the relevance of a chunk on a scale of 1-5. This requires the LLM to respond with only one character/token (i.e. 1, 2, 3, 4, or 5), which gives a big speedup. Getting a consistent one-character response is tricky - because LLMs sometimes blab on. However, there's now a technique called regex forcing that forces the model to output a certain format (see Outlines on GitHub). This method is not as quick and scalable as vector search, but it improves on quality. Compared to using a long-context model like Claude or GPT-4-Turbo, it can also improve on quality while significantly reducing costs.
➡️ How it works (a rough sketch follows below):
Break your long input text into chunks (similar to vector database search)
Instead of using vector search, use a language model to rate the relevance of each chunk to the question on a 1-5 scale (using regex forcing)
Take the highest rated chunks (4s and 5s) and include those in the final prompt (with a minimum of 3 chunks)
➡️ Why it's promising:
* Language models are very good at determining if a text snippet is relevant to a question
* Allows flexibly including more or less context based on relevance
* Can outperform both providing the full context and using standard retrieval augmented generation (RAG) with vector search
➡️ Some key implementation details:
* Use a grammar field (regex forcing) to restrict the language model to only outputting 1-5 relevance scores
* Hit the language model API in parallel for each chunk to utilize the GPU effectively
* Experiment with different base language models (instruction-tuned ones like Smaug 34B performed best)
➡️ Potential advantages over other approaches:
* More accurate than vector search at identifying truly relevant snippets
* More focused than dumping in the full context, which can confuse the model
* Costs can be comparable to commercial APIs when utilizing the GPU well
I ran some initial experiments with ALL-SORT on some tricky queries about Berkshire Hathaway annual meetings and reports. The results were promising, outperforming both full-context and RAG baselines.
And shout-outs for some of the underlying tools:
- Smaug 34B by Abacus (Bindu Reddy), built on Jon Durbin's fine-tune
- Generation with regex: see Outlines on GitHub, by Rémi Louf
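A rough sketch of the chunk-rating step with constrained decoding: the Outlines calls below reflect the pre-1.0 interface (`outlines.models.transformers`, `outlines.generate.regex`) and may differ in newer releases, and the model ID and prompt wording are my assumptions, not ALL-SORT's actual code:

```python
# Sketch of ALL-SORT-style relevance rating with regex-constrained decoding.
import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")  # any HF chat model
rate = outlines.generate.regex(model, r"[1-5]")   # force a single digit 1-5

def score_chunk(chunk: str, question: str) -> int:
    prompt = (
        f"Question: {question}\n\nPassage:\n{chunk}\n\n"
        "Rate the passage's relevance to the question from 1 (irrelevant) to 5 (highly relevant). "
        "Answer with a single digit:\n"
    )
    return int(rate(prompt))

def select_chunks(chunks: list[str], question: str, min_chunks: int = 3) -> list[str]:
    # Keep the 4s and 5s, but never fewer than min_chunks overall.
    scored = sorted(((score_chunk(c, question), c) for c in chunks), reverse=True)
    top = [c for score, c in scored if score >= 4]
    return top if len(top) >= min_chunks else [c for _, c in scored[:min_chunks]]
```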
-
Huge release by Daekeun Kim and Hyo (HK) Choi from our AI GBB team in Korea, sharing Korean-language benchmarks for our latest Azure OpenAI model, gpt-4o-mini. Like gpt-4o, the model excels on Korean language tasks; in this case mini completely eclipses gpt-35-turbo and nearly matches gpt-4-turbo performance. The benchmarks and code are open source and extensible beyond our Azure models, so we encourage our customers and the larger community to try it out and even fork/share your own findings - and please let me know in the comments what other Asian languages beyond Korean you may be looking to evaluate with your language models.
As different LLM/SLM models continue to emerge, many customers want to quickly gauge how an LLM/SLM performs on basic evaluation datasets. We have implemented and released benchmarking code that measures how accurately an LLM solves multiple-choice questions on the CLIcK (Cultural and Linguistic Intelligence in Korean) and HAE_RAE_BENCH 1.0 datasets. The code builds on the implementation at https://lnkd.in/gnVsNtq9, with many modifications and additions to make it suitable for benchmarking.
Together with Hyo (HK) Choi, I benchmarked 4 models: GPT-4o-mini (2024-07-18), GPT-4o (2024-05-13), GPT-4-turbo (2024-04-09), and GPT-3.5-turbo (2023-06-13). The results show that GPT-4o-mini's performance is very impressive: it outperforms GPT-3.5-turbo on all metrics and comes close to GPT-4-turbo on some. You can also benchmark custom models, including Hugging Face models, so feel free to compare other models using these numbers as a baseline.
Code: https://lnkd.in/geCXW2Nz
#azureopenai #gpt4omini #gpt4o
GitHub: daekeun-ml/evaluate-llm-on-korean-dataset - Performs benchmarking on two Korean datasets with minimal time and effort (github.com)
-
Leveraging the Full Potential of LLMs: RAG vs. Fine-Tuning
Background on LLMs and RAG
Large Language Models (LLMs) are pre-trained on vast datasets, enabling them to perform a variety of tasks like text generation, question answering, and translation. However, their knowledge can be limited and may not reflect the most current information. Retrieval-Augmented Generation (RAG) enhances LLMs by integrating relevant external knowledge from a database before generating responses. For instance, a financial advisor LLM could pull a client's investment history to provide tailored financial advice. This combination allows RAG systems to be more knowledgeable and reliable than standard LLMs.
RAG Advantages Over Fine-Tuning LLMs
While fine-tuning involves training a pre-trained LLM on specific datasets to improve task performance, it has several drawbacks:
Forgetting: Fine-tuned models may lose general capabilities from pre-training, which can impair their performance on non-specific tasks.
Training Data Dependence: Performance heavily relies on the availability and quality of domain-specific training data, which can be costly to obtain.
Lacks External Knowledge: Fine-tuned models are limited to their training data and may not incorporate up-to-date real-world knowledge.
Not Customizable: Updating a fine-tuned model requires retraining, which is resource-intensive.
In contrast, RAG systems:
Retain Pre-training Capabilities: Since the LLM isn't modified, it retains its general capabilities.
Utilize External Knowledge: RAG allows the integration of customizable, domain-specific knowledge.
Flexibility: Knowledge sources can be adjusted without the need for retraining the LLM.
Lower Data Requirements: Less reliance on extensive domain-specific training datasets.
As a result, RAG often outperforms fine-tuning while maintaining the versatility of the original LLM.
When to Fine-Tune vs. RAG for Different Model Sizes
Large Language Models (e.g., GPT-4): RAG is usually preferable because it retains the extensive capabilities of large models while leveraging external databases for up-to-date information. Fine-tuning risks catastrophic forgetting, where the model may lose critical skills.
Medium Language Models (e.g., Llama 2 7B): Both RAG and fine-tuning are viable. Fine-tuning might be favored for tasks that require heavy memorization, such as specific question-answering. RAG excels in domain-specific tasks where retrieval of relevant knowledge is beneficial.
Small Language Models (e.g., Zephyr, Phi-2): Fine-tuning is generally more suitable, as smaller models lack the breadth of knowledge found in larger models. Fine-tuning can directly imbue domain knowledge without the risks of forgetting, and retraining small models is often easier and less costly.
https://lnkd.in/dDsMC7Hc
-
ULLME: Advancing Text Embeddings with Large Language Models
A new paper from researchers at the University of Oregon introduces ULLME - a Unified framework for Large Language Model Embeddings that addresses key challenges in this space.
👉 The Text Embedding Challenge
Text embeddings allow us to represent words and documents as dense vectors, enabling powerful applications in search, recommendation systems, and more. However, existing frameworks for LLM-based embeddings have been limited in flexibility and performance.
👉 ULLME: A Versatile Solution
ULLME offers a modular, plug-and-play approach to leveraging LLMs for text embeddings:
- Supports multiple LLM architectures
- Enables bidirectional attention for improved context understanding
- Integrates various fine-tuning strategies
It's like having a universal set of building blocks that can be assembled to create optimal embedding solutions for diverse needs.
👉 Innovative Fine-Tuning with GRL
A key innovation in ULLME is Generation-augmented Representation Learning (GRL). This novel fine-tuning strategy combines the strengths of contrastive learning and text generation to create more nuanced embeddings (a loss sketch follows below). Here's how it works:
1. Taste Testing (Contrastive Learning): The model learns to recognize similar and different text passages.
2. Cooking Practice (Generation Task): The model generates text based on given prompts or queries.
3. Consistency Check: We align the model's understanding of relevance in both embedding and generation spaces.
4. Preference Learning: The model learns to generate more relevant and high-quality text.
By bridging these skills, GRL creates a more well-rounded "AI chef" capable of handling a wide variety of language tasks.
👉 Practical Impact
ULLME demonstrates significant performance improvements across various embedding tasks, with potential applications in:
- Information retrieval
- Question answering systems
- Semantic search
- And more
By providing a flexible, high-performance framework, ULLME can accelerate research and development in LLM-based text understanding.
👉 Looking Ahead
The open-source nature of ULLME invites collaboration and further innovation. As LLM architectures continue to evolve, ULLME provides a foundation for exploring their full potential in text embedding tasks.
Interested in advanced text embeddings? Check out the ULLME GitHub repository and paper in comments.
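To make the idea of mixing a contrastive objective with a generation objective concrete, here is an illustrative loss sketch in PyTorch (the function name, weighting, and shapes are my assumptions in the spirit of generation-augmented representation learning, not ULLME's actual code):

```python
# Illustrative combination of a contrastive embedding loss with a generation loss.
import torch
import torch.nn.functional as F

def grl_style_loss(query_emb, pos_emb, neg_embs, gen_logits, gen_targets,
                   temperature: float = 0.05, gen_weight: float = 0.5):
    # Contrastive part: pull the query toward the positive passage, away from negatives.
    q = F.normalize(query_emb, dim=-1)
    candidates = F.normalize(torch.stack([pos_emb] + list(neg_embs)), dim=-1)
    sims = (q @ candidates.T) / temperature             # similarity to positive + negatives
    contrastive = F.cross_entropy(sims.unsqueeze(0), torch.tensor([0]))
    # Generation part: standard next-token cross-entropy on the generation head.
    generation = F.cross_entropy(gen_logits.view(-1, gen_logits.size(-1)), gen_targets.view(-1))
    return contrastive + gen_weight * generation

# Tiny random example just to show the shapes involved.
d, vocab, seq = 16, 100, 5
loss = grl_style_loss(
    torch.randn(d), torch.randn(d), [torch.randn(d) for _ in range(3)],
    torch.randn(seq, vocab), torch.randint(0, vocab, (seq,)),
)
print(loss.item())
```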
Software Architecture | Data Engineer | Project Management Lead | Machine Learning | NLP | ITSM Manager
Link to the model on HF: https://huggingface.co/Qwen