
Building Data-Driven Applications with LlamaIndex: A practical guide to retrieval-augmented generation (RAG) to enhance LLM applications

Ebook · 878 pages · 6 hours

Language: English
Publisher: Packt Publishing
Release date: May 10, 2024
ISBN: 9781805124405


    Building Data-Driven Applications with LlamaIndex - Andrei Gheorghiu


    Building Data-Driven Applications with LlamaIndex

    As this ebook edition doesn't have fixed pagination, the page numbers below are hyperlinked for reference only, based on the printed edition of this book.

    Copyright © 2024 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Niranjan Naikwadi

    Publishing Product Manager: Nitin Nainani

    Book Project Manager: Aparna Ravikumar Nair

    Content Development Editor: Priyanka Soam

    Technical Editor: Rahul Limbachiya

    Copy Editor: Safis Editing

    Indexer: Pratik Shirodkar

    Production Designer: Shankar Kalbhor

    DevRel Marketing Coordinator: Vinishka Kalra

    First published: May 2024

    Production reference: 1150424

    Published by

    Packt Publishing Ltd.

    Grosvenor House

    11 St Paul’s Square

    Birmingham

    B3 1RB, UK

    ISBN 978-1-83508-950-7

    www.packtpub.com

    For the past six months, the focus required to create this book has sadly kept me away from the people I love. To my family and friends, your understanding and support have been my harbor in the storm of long hours and endless revisions.

    Andreea, your love has been the gentle beacon guiding me through this journey. To my daughter Carla and every young reader out there: never stop learning! Life is a journey with so many possible destinations. Make sure you are the one choosing yours. My dear friends at ITAcademy, you guys rock! Thanks for supporting me along the way. Also, finalizing this book would not have been possible without the dedicated efforts and unwavering commitment of the Packt team. I extend my heartfelt gratitude to everyone involved in this project.

    – Andrei Gheorghiu

    Contributors

    About the author

    Andrei Gheorghiu is a seasoned IT professional and accomplished trainer at ITAcademy with over two decades of experience as a trainer, consultant, and auditor. With an impressive array of certifications, including ITIL Master, CISA, ISO 27001 Lead Auditor, and CISSP, Andrei has trained thousands of students in IT service management, information security, IT governance, and audit. His consulting experience spans the implementation of ERP and CRM systems, as well as conducting security assessments and audits for various organizations. Andrei’s passion for groundbreaking innovations drives him to share his vast knowledge and offer practical advice on leveraging technology to solve real-world challenges, particularly in the wake of recent advancements in the field. As a forward-thinking educator, his main goal is to help people upskill and reskill in order to increase their productivity and remain relevant in the age of AI.

    About the reviewers

    With a specialization in AI and ML, Rajesh Chettiar brings over 13 years of experience in machine learning, Generative AI, automation, and ERP solutions. He is passionate about keeping up with cutting-edge advancements in AI and is committed to improving his skills to foster innovation.

    Rajesh resides in Pune with his parents, his wife, Pushpa, and his son, Nishith. In his free time, he likes to play with his son, watch movies with his family, and go on road trips. He also has a fondness for listening to Bollywood music.

    Elliot helped write parts of the LlamaIndexTS (the TypeScript version of LlamaIndex) codebase. He is actively looking to take on new generative AI projects (as of early 2024) and is available on GitHub and LinkedIn.

    I thank the Lord for everything. Thank you, Dad, Mom, and twin sister for your amazing support. Thank you to my friends who gave me their honest opinions and helped me grow. Thank you Yi Ding at LlamaIndex for helping me start this GenAI journey, and Yujian Tang for introducing me to Yi and always being supportive of open-source. Finally, thank you to everyone who has reached out to talk about generative AI; I learn new things every day from each of you.

    Srikannan Balakrishnan is an experienced AI/ML professional and a technical writer with a passion for translating complex information into simpler forms. He has a background in data science, including AI/ML, which fuels his ability to understand the intricacies of the subject matter and present it in a way that is accessible to both technical and non-technical audiences. He also has experience in Generative AI and has worked with different clients to solve their business problems with the power of data and AI. Beyond his technical expertise, he is a skilled communicator with a keen eye for detail. He is dedicated to crafting user-friendly documentation that empowers readers to grasp new concepts and navigate complex systems with confidence.

    Arijit Das is an experienced Data Scientist with over 5 years of commercial experience, providing data-driven solutions to Fortune 500 clients across the US, UK, and EU. With expertise in Finance, Banking, Logistics, and HR management, Arijit excels in the Data Science lifecycle, from data extraction to model deployment and MLOps. Proficient in Supervised and Unsupervised ML techniques, including NLP, Arijit is currently focused on implementing cutting-edge ML practices at Citi globally.

    Table of Contents

    Preface

    Part 1: Introduction to Generative AI and LlamaIndex

    1

    Understanding Large Language Models

    Introducing GenAI and LLMs

    What is GenAI?

    What is an LLM?

    Understanding the role of LLMs in modern technology

    Exploring challenges with LLMs

    Augmenting LLMs with RAG

    Summary

    2

    LlamaIndex: The Hidden Jewel - An Introduction to the LlamaIndex Ecosystem

    Technical requirements

    Optimizing language models – the symbiosis of fine-tuning, RAG, and LlamaIndex

    Is RAG the only possible solution?

    What LlamaIndex does

    Discovering the advantages of progressively disclosing complexity

    An important aspect to consider

    Introducing PITS – our LlamaIndex hands-on project

    Here’s how it will work

    Preparing our coding environment

    Installing Python

    Installing Git

    Installing LlamaIndex

    Signing up for an OpenAI API key

    Discovering Streamlit – the perfect tool for rapid building and deployment!

    Installing Streamlit

    Finishing up

    One final check

    Familiarizing ourselves with the structure of the LlamaIndex code repository

    Summary

    Part 2: Starting Your First LlamaIndex Project

    3

    Kickstarting Your Journey with LlamaIndex

    Technical requirements

    Uncovering the essential building blocks of LlamaIndex – documents, nodes, and indexes

    Documents

    Nodes

    Manually creating the Node objects

    Automatically extracting Nodes from Documents using splitters

    Nodes don’t like to be alone – they crave relationships

    Why are relationships important?

    Indexes

    Are we there yet?

    How does this actually work under the hood?

    A quick recap of the key concepts

    Building our first interactive, augmented LLM application

    Using the logging features of LlamaIndex to understand the logic and debug our applications

    Customizing the LLM used by LlamaIndex

    Easy as 1-2-3

    The temperature parameter

    Understanding how Settings can be used for customization

    Starting our PITS project – hands-on exercise

    Let’s have a look at the source code

    Summary

    4

    Ingesting Data into Our RAG Workflow

    Technical requirements

    Ingesting data via LlamaHub

    An overview of LlamaHub

    Using the LlamaHub data loaders to ingest content

    Ingesting data from a web page

    Ingesting data from a database

    Bulk-ingesting data from sources with multiple file formats

    Parsing the documents into nodes

    Understanding the simple text splitters

    Using more advanced node parsers

    Using relational parsers

    Confused about node parsers and text splitters?

    Understanding chunk_size and chunk_overlap

    Including relationships with include_prev_next_rel

    Practical ways of using these node creation models

    Working with metadata to improve the context

    SummaryExtractor

    QuestionsAnsweredExtractor

    TitleExtractor

    EntityExtractor

    KeywordExtractor

    PydanticProgramExtractor

    MarvinMetadataExtractor

    Defining your custom extractor

    Is having all that metadata always a good thing?

    Estimating the potential cost of using metadata extractors

    Follow these simple best practices to minimize your costs

    Estimate your maximal costs before running the actual extractors

    Preserving privacy with metadata extractors, and not only

    Scrubbing personal data and other sensitive information

    Using the ingestion pipeline to increase efficiency

    Handling documents that contain a mix of text and tabular data

    Hands-on – ingesting study materials into our PITS

    Summary

    5

    Indexing with LlamaIndex

    Technical requirements

    Indexing data – a bird’s-eye view

    Common features of all Index types

    Understanding the VectorStoreIndex

    A simple usage example for the VectorStoreIndex

    Understanding embeddings

    Understanding similarity search

    OK, but how does LlamaIndex generate these embeddings?

    How do I decide which embedding model I should use?

    Persisting and reusing Indexes

    Understanding the StorageContext

    The difference between vector stores and vector databases

    Exploring other index types in LlamaIndex

    The SummaryIndex

    The DocumentSummaryIndex

    The KeywordTableIndex

    The TreeIndex

    The KnowledgeGraphIndex

    Building Indexes on top of other Indexes with ComposableGraph

    How to use the ComposableGraph

    A more detailed description of this concept

    Estimating the potential cost of building and querying Indexes

    Indexing our PITS study materials – hands-on

    Summary

    Part 3: Retrieving and Working with Indexed Data

    6

    Querying Our Data, Part 1 – Context Retrieval

    Technical requirements

    Learning about query mechanics – an overview

    Understanding the basic retrievers

    The VectorStoreIndex retrievers

    The DocumentSummaryIndex retrievers

    The TreeIndex retrievers

    The KnowledgeGraphIndex retrievers

    Common characteristics shared by all retrievers

    Efficient use of retrieval mechanisms – asynchronous operation

    Building more advanced retrieval mechanisms

    The naive retrieval method

    Implementing metadata filters

    Using selectors for more advanced decision logic

    Understanding tools

    Transforming and rewriting queries

    Creating more specific sub-queries

    Understanding the concepts of dense and sparse retrieval

    Dense retrieval

    Sparse retrieval

    Implementing sparse retrieval in LlamaIndex

    Discovering other advanced retrieval methods

    Summary

    7

    Querying Our Data, Part 2 – Postprocessing and Response Synthesis

    Technical requirements

    Re-ranking, transforming, and filtering nodes using postprocessors

    Exploring how postprocessors filter, transform, and re-rank nodes

    SimilarityPostprocessor

    KeywordNodePostprocessor

    PrevNextNodePostprocessor

    LongContextReorder

    PIINodePostprocessor and NERPIINodePostprocessor

    MetadataReplacementPostprocessor

    SentenceEmbeddingOptimizer

    Time-based postprocessors

    Re-ranking postprocessors

    Final thoughts about node postprocessors

    Understanding response synthesizers

    Implementing output parsing techniques

    Extracting structured outputs using output parsers

    Extracting structured outputs using Pydantic programs

    Building and using query engines

    Exploring different methods of building query engines

    Advanced uses of the QueryEngine interface

    Hands-on – building quizzes in PITS

    Summary

    8

    Building Chatbots and Agents with LlamaIndex

    Technical requirements

    Understanding chatbots and agents

    Discovering ChatEngine

    Understanding the different chat modes

    Implementing agentic strategies in our apps

    Building tools and ToolSpec classes for our agents

    Understanding reasoning loops

    OpenAIAgent

    ReActAgent

    How do we interact with agents?

    Enhancing our agents with the help of utility tools

    Using the LLMCompiler agent for more advanced scenarios

    Using the low-level Agent Protocol API

    Hands-on – implementing conversation tracking for PITS

    Summary

    Part 4: Customization, Prompt Engineering, and Final Words

    9

    Customizing and Deploying Our LlamaIndex Project

    Technical requirements

    Customizing our RAG components

    How LLaMA and LLaMA 2 changed the open source landscape

    Running a local LLM using LM Studio

    Routing between LLMs using services such as Neutrino or OpenRouter

    What about customizing embedding models?

    Leveraging the Plug and Play convenience of using Llama Packs

    Using the Llama CLI

    Using advanced tracing and evaluation techniques

    Tracing our RAG workflows using Phoenix

    Evaluating our RAG system

    Introduction to deployment with Streamlit

    Hands-on – a step-by-step deployment guide

    Deploying our PITS project on Streamlit Community Cloud

    Summary

    10

    Prompt Engineering Guidelines and Best Practices

    Technical requirements

    Why prompts are your secret weapon

    Understanding how LlamaIndex uses prompts

    Customizing default prompts

    Using advanced prompting techniques in LlamaIndex

    The golden rules of prompt engineering

    Accuracy and clarity in expression

    Directiveness

    Context quality

    Context quantity

    Required output format

    Inference cost

    Overall system latency

    Choosing the right LLM for the task

    Common methods used for creating effective prompts

    Summary

    11

    Conclusion and Additional Resources

    Other projects and further learning

    The LlamaIndex examples collection

    Moving forward – Replit bounties

    The power of many – the LlamaIndex community

    Key takeaways, final words, and encouragement

    On the future of RAG in the larger context of generative AI

    A small philosophical nugget for you to consider

    Summary

    Index

    Other Books You May Enjoy

    Preface

    Beyond the initial hype that the fast advance of Generative AI and Large Language Models (LLMs) has produced, we have been able to observe both the abilities and shortcomings of this technology. LLMs are versatile and powerful tools driving innovation across various fields, serving as the foundation for natural language generation technology. Despite their potential, though, LLMs have limitations such as a lack of access to real-time data, difficulty distinguishing truth from falsehood, trouble maintaining context over long documents, and unpredictable failures in reasoning and fact retention. Retrieval-Augmented Generation (RAG) attempts to solve many of these shortcomings, and LlamaIndex is perhaps the simplest and most user-friendly way to begin your journey into this new development paradigm.

    Driven by a flourishing and expanding community, this open source framework provides a huge number of tools for different RAG scenarios. Perhaps that’s also why this book is needed. When I first encountered the LlamaIndex framework, I was impressed by its comprehensive official documentation. However, I soon realized that the sheer number of options can be overwhelming for someone who’s just starting out. Therefore, my goal was to provide a beginner-friendly guide that helps you navigate the framework’s capabilities and use them in your projects. The more you explore the inner mechanics of LlamaIndex, the more you’ll appreciate its effectiveness. By breaking down complex concepts and offering practical examples, this book aims to bridge the gap between the official documentation and your understanding, ensuring that you can confidently build RAG applications while avoiding common pitfalls.

    So, join me on a journey through the LlamaIndex ecosystem. From understanding fundamental RAG concepts to mastering advanced techniques, you’ll learn how to ingest, index, and query data from various sources, create optimized indexes tailored to your use cases, and build chatbots and interactive web applications that showcase the true potential of Generative AI. The book contains a lot of practical code examples, several best practices in prompt engineering, and troubleshooting techniques that will help you navigate the challenges of building LLM-based applications augmented with your data.

    By the end of this book, you’ll have the skills and expertise to create powerful, interactive, AI-driven applications using LlamaIndex and Python. Moreover, you’ll be able to predict costs, deal with potential privacy issues, and deploy your applications, helping you become a sought-after professional in the rapidly growing field of Generative AI.

    Who this book is for

    This book has been specifically designed for developers at varying stages of their careers who are eager to understand and exploit the capabilities of Generative AI, particularly through the use of RAG. It aims to serve as a foundational guide for those with a basic understanding of Python development and a general familiarity with Generative AI concepts.

    Here are the key audiences who will find this book invaluable:

    Entry-level developers: Individuals who have a foundational understanding of Python and are beginning their journey into the world of generative AI will find this book an excellent starting point. It will guide you through the initial steps of using the LlamaIndex framework to create robust and innovative applications. You’ll learn the core components, basic workflows, and best practices to kickstart your RAG application development journey.

    Experienced developers: For those who are already familiar with the landscape of generative AI and are looking to deepen their expertise, this book offers insight into advanced topics within the LlamaIndex framework. You’ll discover how to leverage your existing skills to develop and deploy more complex RAG applications, enhancing the capabilities of your projects and pushing the boundaries of what’s possible with AI.

    Professionals seeking to harness the full power of LLMs: If you’re looking to improve your productivity by building quick solutions for data-driven problems, this book will teach you the basic concepts and provide you with powerful abilities. If you’re a natural learner and want to experiment with this wonderful technology, this book will provide you with the tools to solve complex problems with greater efficiency and creativity.

    What this book covers

    Chapter 1, Understanding Large Language Models, serves as an introduction to generative AI and LLMs. It explains what LLMs are, their role in modern technology, and their strengths and weaknesses. The chapter aims to provide you with a foundational understanding of the capabilities of LLMs that LlamaIndex builds upon.

    Chapter 2, LlamaIndex: The Hidden Jewel - An Introduction to the LlamaIndex Ecosystem, introduces the LlamaIndex ecosystem and how it can augment LLMs. It explains the general structure of the book – starting with basic concepts and gradually introducing more complex elements of the LlamaIndex framework. The chapter also introduces the PITS (Personalized Intelligent Tutoring System) project, which will be used to apply the concepts studied in the book, and covers the preparation of the development environment.

    Chapter 3, Kickstarting Your Journey with LlamaIndex, covers the basics of starting your first LlamaIndex project. It explains the essential components of a RAG application in LlamaIndex, such as documents, nodes, indexes, and query engines. The chapter provides a typical workflow model and a simple hands-on example, where readers will begin building the PITS project.

    Chapter 4, Ingesting Data into Our RAG Workflow, focuses on importing our proprietary data into LlamaIndex, emphasizing the usage of the LlamaHub connectors. We learn how to break down and organize documents by parsing them into coherent, indexable chunks of information. The chapter also covers ingestion pipelines, important data privacy considerations, metadata extraction, and simple cost estimation methods.

    Chapter 5, Indexing with LlamaIndex, explores the topic of data indexing. It provides an overview of how indexing works, comparing different indexing techniques to help readers choose the most suitable one for their use cases. The chapter also explains the concept of layered indexing and covers persistent index storage and retrieval, cost estimation, embeddings, vector stores, similarity search, and storage contexts.

    Chapter 6, Querying Our Data, Part 1 – Context Retrieval, explains the mechanics of querying data and various querying strategies and architectures within LlamaIndex, with a deep focus on retrievers. It covers advanced concepts such as asynchronous retrieval, metadata filters, tools, selectors, retriever routers, and query transformations. The chapter also discusses fundamental paradigms such as dense retrieval and sparse retrieval, along with their strengths and weaknesses.

    Chapter 7, Querying Our Data, Part 2 – Postprocessing and Response Synthesis, continues the query mechanics topic, explaining the role of node post-processing and response synthesizers in the RAG workflow. It presents the overall query engine construct and its usage, as well as output parsing. The hands-on part of this chapter focuses on using LlamaIndex to generate personalized content in the PITS application.

    Chapter 8, Building Chatbots and Agents with LlamaIndex, introduces the essentials of chatbots, agents, and conversation tracking with LlamaIndex, applying this knowledge to the hands-on project. You will learn how LlamaIndex facilitates fluid interaction, retains context, and manages custom retrieval/response strategies, which are essential aspects for building effective conversational interfaces.

    Chapter 9, Customizing and Deploying Our LlamaIndex Project, provides a comprehensive guide to personalizing and launching LlamaIndex projects. It covers tailoring different components of the RAG pipeline, a beginner-friendly tutorial on deploying with Streamlit, advanced tracing methods for debugging, and techniques for evaluating and fine-tuning a LlamaIndex application.

    Chapter 10, Prompt Engineering Guidelines and Best Practices, explains the essential role of prompt engineering in enhancing the effectiveness of a RAG pipeline, highlighting how prompts are used under the hood of the LlamaIndex framework. It guides readers on the nuances of customizing and optimizing prompts to harness the full power of LlamaIndex and ensure more reliable and tailored AI outputs.

    Chapter 11, Conclusion and Additional Resources, serves as a comprehensive conclusion, highlighting other projects and pathways for extended learning and summarizing the core insights from the book. It offers an overview of the main features of the framework, provides a curated list of additional resources for further exploration, and includes an index for quick terminology reference.

    To get the most out of this book

    You will need to have a basic understanding of Python development. General experience in using Generative AI models is also recommended. All the examples provided in the book have been specifically designed to run in a local Python environment, and because several libraries will be required along the way, it is recommended that you have a minimum of 20 GB of storage space available on your computer.

    Because most of the examples presented in the book rely on the OpenAI API, you’ll also need to obtain an OpenAI API key.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    As many of the code examples rely on the OpenAI API, keep in mind that running them will incur costs. Everything has been optimized for minimum cost, but neither the author nor the publisher is responsible for these costs. You should also be aware of the security implications of using a public API such as the one provided by OpenAI. If you choose to use your own proprietary data to experiment with different examples, make sure you consult OpenAI’s privacy policy in advance.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Building-Data-Driven-Applications-with-LlamaIndex. The repository is organized into different folders, with one folder for each chapter, titled ch<n>, where <n> represents the chapter number. The folder called PITS_APP contains the source code of the main project presented throughout the book. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: […] using the download_llama_pack() method and specifying a download location such as […]

    A block of code is set as follows:

    from llama_index.llms.openai import OpenAI
    llm = OpenAI(
        api_base='http://localhost:1234/v1',
        temperature=0.7
    )

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    from llama_index.llms.openai import OpenAI
    llm = OpenAI(
        api_base='http://localhost:1234/v1',
        temperature=0.7
    )

    Any command-line input or output is written as follows:

    $ pip install llama-index-llms-neutrino

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Select System info from the Administration panel.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you’ve read Building Data-Driven Applications with LlamaIndex, we’d love to hear your thoughts! Please go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Download a free PDF copy of this book

    Thanks for purchasing this book!

    Do you like to read on the go but are unable to carry your print books everywhere?

    Is your eBook purchase not compatible with the device of your choice?

    Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

    Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

    The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.

    Follow these simple steps to get the benefits:

    Scan the QR code or visit the link below


    https://packt.link/free-ebook/9781835089507

    Submit your proof of purchase

    That’s it! We’ll send your free PDF and other benefits to your email directly.

    Part 1: Introduction to Generative AI and LlamaIndex


    This first part begins by introducing generative AI and Large Language Models (LLMs), discussing their ability to produce human-like text, their limitations, and how Retrieval-Augmented Generation (RAG) can address these issues by enhancing accuracy, reasoning, and relevance. We then progress to understand how LlamaIndex leverages RAG to bridge the gap between LLMs’ extensive knowledge and proprietary data, elevating the potential of interactive AI applications.

    This part has the following chapters:

    Chapter 1, Understanding Large Language Models

    Chapter 2, LlamaIndex: The Hidden Jewel - An Introduction to the LlamaIndex Ecosystem

    1

    Understanding Large Language Models

    If you are reading this book, you have probably explored the realm of large language models (LLMs) and already recognize their potential applications as well as their pitfalls. This book aims to address the challenges LLMs face and provides a practical guide to building data-driven LLM applications with LlamaIndex, taking developers from foundational concepts to advanced techniques for implementing retrieval-augmented generation (RAG) to create high-performance interactive artificial intelligence (AI) systems augmented by external data.

    This chapter introduces generative AI (GenAI) and LLMs. It explains how LLMs generate human-like text after training on massive datasets. We’ll also give an overview of LLM capabilities and limitations, such as outdated knowledge, the potential for false information, and a lack of reasoning. You’ll be introduced to RAG as a potential solution, combining retrieval models that use indexed data with generative models to increase factual accuracy, logical reasoning, and context relevance. Overall, you’ll gain a basic understanding of LLMs and learn about RAG as a way to overcome some of their weaknesses, setting the stage for using LLMs practically.

    In this chapter, we will cover the following main topics:

    Introducing GenAI and LLMs

    Understanding the role of LLMs in modern technology

    Exploring challenges with LLMs

    Augmenting LLMs with RAG

    Introducing GenAI and LLMs

    Introductions are sometimes boring, but here, it is important for us to set the context and help you familiarize yourself with GenAI and LLMs before we dive deep into LlamaIndex. I will try to be as concise as possible; if you are already familiar with this information, I apologize for the brief digression.

    What is GenAI?

    GenAI refers to systems that are capable of generating new content such as text, images, audio, or video. Unlike more specialized AI systems that are designed for specific tasks such as image classification or speech recognition, GenAI models can create completely new assets that are often very difficult – if not impossible – to distinguish from human-created content.

    These systems use machine learning (ML) techniques such as neural networks (NNs) that are trained on vast amounts of data. By learning patterns and structures within the training data, generative models can model the underlying probability distribution of the data and sample from this distribution to generate new examples. In other words, they act as big prediction machines.
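    To make the “big prediction machine” idea concrete, here is a minimal, purely illustrative sketch in Python. The distribution below is hand-made for demonstration (no real model learned these words or probabilities); it simply shows what “modeling a probability distribution and sampling from it” means in the smallest possible case.

    ```python
    import random

    # A toy "learned" probability distribution over candidate next words,
    # e.g. continuations of "The cat sat on the ...". The words and
    # probabilities here are invented for illustration only.
    next_word_probs = {
        "mat": 0.5,
        "sofa": 0.3,
        "roof": 0.15,
        "moon": 0.05,
    }

    def sample_next_word(probs: dict) -> str:
        """Sample one word according to the given probability distribution."""
        words = list(probs.keys())
        weights = list(probs.values())
        return random.choices(words, weights=weights, k=1)[0]

    print(sample_next_word(next_word_probs))  # most often "mat", occasionally others
    ```

    Real generative models do essentially this at every step, except the distribution is computed on the fly by a neural network conditioned on everything generated so far.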

    We will now discuss LLMs, which are one of the most popular fields in GenAI.

    What is an LLM?

    One of the most prominent and rapidly advancing branches of GenAI is natural language generation (NLG) through LLMs (Figure 1.1):

    Figure 1.1 – LLMs are a sub-branch of GenAI


    LLMs are NNs that are specifically designed and optimized to understand and generate human language. They are large in the sense that they are trained on massive amounts of text containing billions or even trillions of words scraped from the internet and other sources. Larger models show increased performance on benchmarks, better generalization, and new emergent abilities. In contrast with earlier, rule-based generation systems, the main distinguishing feature of an LLM is that it can produce novel, original text that reads naturally.

    By learning patterns from many sources, LLMs acquire various language skills found in their training data – from nuanced grammar to topic knowledge and even basic common-sense reasoning. These learned patterns allow LLMs to extend human-written text in contextually relevant ways. As they keep improving, LLMs create new possibilities for automatically generating natural language (NL) content at scale.

    During the training process, LLMs gradually learn probabilistic relationships between words and the rules that govern language structure from their huge training datasets. Once trained, they are able to generate remarkably human-like text by predicting the probability of the next word in a sequence, based on the previous words. In many cases, the text they generate is so natural that it makes you wonder: aren’t we humans just a similar but more sophisticated prediction machine? But that’s a topic for another book.
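    As an illustration of this next-word prediction loop, here is a toy bigram model in Python. Everything in it is a deliberately simplified stand-in: the tiny corpus is invented, the “training” is just counting word pairs, and generation greedily picks the most frequent follower. Real LLMs learn vastly richer statistics with neural networks, but the loop of repeatedly predicting the next word from the previous ones is conceptually the same.

    ```python
    from collections import Counter, defaultdict

    # A tiny invented corpus to "train" on.
    corpus = "the cat sat on the mat the cat ate the fish".split()

    # "Training": count which word follows which.
    followers = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        followers[prev][nxt] += 1

    def generate(start: str, length: int = 5) -> list[str]:
        """Greedily pick the most probable next word at each step."""
        words = [start]
        for _ in range(length):
            counts = followers.get(words[-1])
            if not counts:
                break  # no known continuation for this word
            words.append(counts.most_common(1)[0][0])
        return words

    print(" ".join(generate("the")))  # -> "the cat sat on the cat"
    ```

    Swapping the greedy choice for sampling from the counts would already make the output less repetitive, which hints at why decoding strategy matters so much for real models.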

    One of the key architectural innovations is the transformer (that is the T in GPT), which uses an attention mechanism to learn contextual relationships between words. Attention allows the model to learn long-range dependencies in text. It’s like listening carefully in a conversation: you pay attention to the context to understand the full meaning. This means the model understands not just words that are close together but also how words that are far apart in a sentence or paragraph relate to each other.

    Attention allows the model to selectively focus on relevant parts of the input sequence when making predictions, thus capturing complex patterns and dependencies within the data. This feature makes it possible for particularly large transformer models (with many parameters and trained on massive datasets) to demonstrate surprising new abilities such as in-context learning, where they can perform tasks with just a few examples in their prompt. To learn more about transformers and Generative Pre-trained Transformer (GPT), you can refer to Improving Language Understanding with Unsupervised Learning – Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever (https://openai.com/research/language-unsupervised).
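    The core computation behind attention can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation of scaled dot-product attention applied to random toy embeddings; it is not code from LlamaIndex or any production transformer, just the mechanism in its simplest self-attention form (queries, keys, and values all derived from the same tokens, with no learned projection matrices).

    ```python
    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Minimal scaled dot-product attention.

        Each row of Q "queries" all rows of K; the softmaxed scores say how
        much attention each position pays to every other position, and the
        output is the correspondingly weighted mix of the value vectors V.
        """
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
        # Numerically stable softmax over each row of scores.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V, weights

    # Three toy "token embeddings" of dimension 4 (random, for illustration).
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(3, 4))
    output, attn = scaled_dot_product_attention(tokens, tokens, tokens)
    print(attn.round(2))  # each row sums to 1: how strongly a token attends to the others
    ```

    Notice that nothing in the formula cares how far apart two tokens are in the sequence: token 1 can attend to token 3 just as easily as to its neighbor, which is exactly the long-range dependency property described above.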

    The best-performing LLMs, such as GPT-4, Claude 2.1, and Llama 2, contain tens of billions to (reportedly) over a trillion parameters and have been trained on internet-scale datasets using advanced deep learning (DL) techniques. The resulting models have an extensive vocabulary, a broad knowledge of language structure such as grammar and syntax, and knowledge about the world in general. Thanks to these traits, LLMs are able to generate text that is coherent, grammatically correct, and semantically relevant. The outputs they produce may not always be completely logical or factually accurate, but they usually read convincingly, as if written by a human. But it’s not all about size. The quality of the data and the training algorithms, among other factors, can also play a huge role in the resulting performance of a particular model.

    Many models feature a user interface that allows for response generation through prompts. Additionally, some offer an
