Showing posts with label AI. Show all posts
Showing posts with label AI. Show all posts

Wednesday, 18 October 2023

Building generative AI/ChatGPT on your data solutions - considerations, pitfalls and lessons learnt

The last article focused on combining organisational data with ChatGPT and Large Language Models, specifically using Microsoft’s 'Azure OpenAI on your data' accelerator which is designed to simplify this. I’ve been focused on the general area of 'AI with your data' (though not the AOI accelerator specifically) for a while now with colleagues, and I don’t think it’s any exaggeration to say that combining generative AI and organisational data will be a big thing for the next few years. The results can be astonishing – we all know what ChatGPT is capable of, but seeing it answer questions and generate content related to an organisation’s clients, products, services, people, and projects rather than its original internet training data immediately shows huge value – providing a “second brain” for employees and supporting many use cases at work. Platform solutions like Microsoft 365 Copilot offer amazing capabilities for core collaboration and productivity, but building your own AI and data solution (often to supplement Copilot) using available building blocks is often the way to go for better results with your data.

The overall message from my last article (Integrating your data with ChatGPT - exploring Microsoft's "Azure OpenAI on your data" accelerator) was that the tool is a useful accelerator in some respects, but in reality only gets you so far in terms of what you probably need. For AI that gives relevant, accurate, and transparent responses to prompts and queries for real world use cases, the implementors need to understand concepts such as retrieval augmentation (RAG), chunking, vector generation, and more. There are various ways to slice this but here’s one way of thinking of the top-level considerations:

All of these concepts are inter-related.

This article tries to help you understand each in more detail, sharing info on our approach and technology selection (for Microsoft-centric solutions) as well as some lessons learnt. I'll finish with some predictions on where the space is going and what I believe will remain important.

Data platform for Retrieval Augmented Generation

Retrieval augmentation has emerged as a key concept for combining generative AI with data, representing arguably the first thing to learn about the space. I summarised it in the “RAG and other concepts” section of the last article, so if the concept is new to you, the three bullet points outlined there may help.

In order to be able to do RAG, you need a platform for your data and it may not be the one where it is currently held. A likely scenario is that data you wish to integrate with AI is spread across multiple platforms rather than conveniently batched up in one place anyway – in our clients, organisational knowledge is often spread across documents in Microsoft 365 (Teams and SharePoint), various data sources in Azure (e.g. structured data in Azure SQL or Cosmos DB, files in Azure BLOBs, perhaps some other flavours too), and some single-purpose SaaS applications. While you *may* have some success going directly to a myriad of platforms like this, there are two fundamental reasons why it’s likely to be difficult:

  • The data in its native form will not be suited to AI – it will not be chunked or represented as vector embeddings, meaning that poor answers are likely to be returned due to issues with relevance and similarity search (both needed by generative AI)
  • Establishing which data source to go to when (across all of the prompts and queries your users might enter) is likely to be difficult, especially when results should be returned in seconds – similarly, responses which combine data from multiple sources will be a challenge if you’re hopping across them

So, what’s often needed is a vector database which also acts as a data aggregation point. This allows you to run one retrieval operation across the right kind of data for AI, where data from your various sources has already been brought together and converted to embeddings. We favour Azure Cognitive Search in our solutions today since it has lots of connectors, a ready-made indexing platform, and support for vector storage, but as discussed last time many vector database options have sprung up in the AI era - from dedicated vector DBs such as Pinecone, Qdrant and Weaviate, to additions to existing technologies like Azure Cosmos DB (MongoDB flavour), Databricks, and Redis. Microsoft promote Azure Cognitive Search for generative AI applications and it does have some fairly unique capabilities, but we regularly review options in this fast-changing space.

See the “Generating vector embeddings” section for more on what vectors are why they’re needed in AI solutions.

Azure Cognitive Search – a competitive advantage for RAG?
While just about every data platform under the sun now has a vector database offering (I count three options from Microsoft alone - Azure Cognitive Search and two Cosmos DB options), an interesting consideration comes up in terms of choosing between search index and database architectures for storing vectors (i.e. what I described earlier as your RAG platform). Microsoft quite heavily promote Azure Cognitive Search as being especially suited to “AI on your data” solutions, by virtue of things possible with a search engine but not (easily) with a database. In particular, Cognitive Search offers a hybrid search option which combines both vector and full-text searching in the same query. The benefit of this can be improved accuracy of answers from the AI, stemming from increased relevance of initial results retrieved. The theory (quite logical) is that whereas embeddings are great for finding related concepts, keyword matching/full-text search works better with specialised terms, product codes, dates, names and so on because of the nature of exact matching.

We use this option in our solutions today and get good results, though without some fairly academic research it’s hard to pin down whether it’s definitively related to what happens in a hybrid search. In Cognitive Search, hybrid entails combining vector search and keyword search but also a re-ranking step based on “deep learning models adapted from Microsoft Bing”, all of which is detailed in a Microsoft article Azure Cognitive Search: Outperforming vector search with hybrid retrieval and ranking capabilities. The article goes some way to explaining Microsoft’s testing, methodology, and results - and therefore their rationale for positioning Azure Cognitive Search as the answer to retrieval augmented generation – all I can say here is that it works for us at the moment, but we’re open minded as to whether this is the only game in town.

Expertise in your RAG platform of choice is key – and you may need to bring in support or consultancy if Azure Cognitive Search (or your chosen vector database) isn’t a common skill today.

Chunking

Chunking refers to the practice of splitting long documents which go beyond the limitations of prompt size, e.g 4000 or 32000 tokens for GPT-4 (a token is around 4 characters of text). Remember that RAG is all about retrieving some data/information to give to the AI in a big long prompt, but the limitations we have today mean that a long document will never fit into the prompt in its entirety. What we need is for the most relevant part(s) of the document to be passed to the AI – and that means the documents need to be split into chunks in the RAG data platform. Additionally, models used to generate vectors have similar limits on the maximum input, so chunking is needed both for storing your data in the right format as well as retrieving it. The cut-off point for a chunk is equivalent to around 6000 words if you’re using the Azure OpenAI embeddings models for vector generation, so chunks need to be smaller than that. You can split your documents into:

  • Fixed-size chunks
  • Variable chunks
  • A hybrid, with some special chunking strategies added (e.g. to deal with specific formats in your documents like tables in PDFs or smart art)

In our experience, that last point needs some special thought - I expand on it in the later section on “Content tuning”.

Getting chunking right is vital. I speculated in the last article that some of my poor results with the “Azure OpenAI on your data” accelerator were due to inadequate chunking - there is a chunking mechanism in there, but it’s not used under all circumstances and the parameters used in the chunking script may not have suited my data.

In terms of existing tools to help you implement chunking, there are various scripts and options out there. The LangChain splitter is a common one and Semantic Kernel, Microsoft’s AI orchestrator library, also has one. Whatever script or approach you use, in most cases you’ll need to integrate it into your indexing/ingestion pipeline so that as documents and data change and need to be re-indexed, the chunking and other steps happen automatically. More on this in the “Content ingestion/indexing” section.

The following Microsoft documentation is a good read on chunking and related considerations:

Chunk documents in vector search - Azure Cognitive Search | Microsoft Learn

Generating vector embeddings

As touched on earlier, in most cases your AI solution will be much more powerful if your data is converted from its original form (e.g. text in a document) to embeddings. This allows concepts like similarity search, which is where much of the power of ChatGPT and generative AI comes from, in particular the feeling that the AI understands what you’re asking for regardless of the exact words you used. Most classic search solutions rely on keyword matching - a search for the word "dog" will only get results containing “dog". However, cats are somewhat related to dogs - and both are related to household pets. When your information is represented as embeddings, these semantic links and relationships can be understood – enabling AI solutions which use search, classification, recommendations, data visualization and more. The approach can work not just across text, but across other content types like images, audio, and video – different content types can all be converted to embeddings, enabling interesting scenarios like finding images and video related to concepts discussed in a conversation or document.

Embeddings are created by passing your data (e.g. text inside a document) into an AI model which returns the information as embeddings, i.e. arrays of numbers. OpenAI have this image which nicely represents the process:


 

Most solutions using Azure OpenAI will generate their embeddings using a model behind the Embeddings API e.g. text-embedding-ada-002. New versions get created as models evolve, and since these use different weights/measures internally the format is different, some care is needed that your embeddings generation matches the AI model you’re querying/prompting against.

AI orchestration

When developing AI applications, it quickly becomes apparent that some middleware is needed to do some of the heavy lifting of storing data and calling plugins. LangChain emerged as a popular open source library for this, followed by Semantic Kernel as a Microsoft equivalent. Semantic Kernel provides quite a few valuable functions:

  • Connectors to vector databases – including Azure Cognitive Search, but also Azure PostgreSQL, Chroma, Pinecone, Qdrant, Redis, Sqlite, Weaviate and a couple of others
  • A plugins model – allowing you to call out to other apps and systems from the conversation the user has with the AI. If you heard about ChatGPT plugins (e.g. those for Expedia, Zapier, Slack etc.) then this is the SK equivalent – and since the model provides an abstraction over different plugin architectures, both OpenAI and Azure OpenAI plugins can be used. Importantly, SK also provides some ready-to-go plugins, allowing you to do some common operations easily – calling out to a HTTP API, doing file IO, summarising conversations, getting the current time etc., and also doing some things LLMs aren’t suited to such as math operations
  • Memories – context is crucial in generative AI. The AI needs to understand things previously said in the conversation, so the user can ask contextual questions like “Can you expand on that?” Additionally, SK provides the concept of document memories, enabling the AI to have context of a particular document the user is working with closely. In this case, SK does the work of generating vectors embeddings for documents (e.g. those uploaded by the user in a front-end app), thus joining up several of the concepts discussed here

The real power of orchestration comes with chaining plugins and functions together in both predetermined and non-predetermined ways. In the latter case, we are allowing the LLM to decide how best to use a set of additional capabilities to meet a certain goal i.e. a request made by a user in a prompt which is extremely interesting. For this to be effective, functions need to be described well so that the AI can decide whether they will be useful. The concept of giving the AI agency to decide which tools from an extended toolkit may be useful for a given task (i.e. beyond what it was initially trained on) has huge potential for organisational use. Consider an insurance company offering home/car/pet insurance policies to a large client base – with the right set of plugins, it would be possible to make complex requests in prompts such as:

“Find all clients with a total annual contract value in the bottom 50%, and for each generate a personalised e-mail recommending policy extras not currently taken. Upload the draft e-mails to SharePoint and post a summary of client numbers and key themes to the ‘Client Retention’ team in Microsoft Teams to allow review”.

Such a request could simplify a complex data analysis, content generation, and approval exercise massively, not only reducing effort and cost but potentially bringing in new revenue through the campaign results. The capability is ground-breaking because we are able to approximate human work – taking a fairly open-ended input and establishing the process and tools to get to an outcome, perhaps via certain milestones. This is generative AI supporting automation within the workplace, leveraging GPT’s ability to process data, identify anomalies, establish trends, generate content, and take action via plugins.

Semantic Kernel is particularly strong in this, with several planner types offered to suit different “thinking approaches”. Simple cases will use the ActionPlanner, with more complex multi-step processes using one of the others:


See the planner capability in the SK docs for more info.

Content tuning

In the Chunking section earlier I touched on some of the complexities of chunking for specific content, such as tables in PDF files. Attention needs to be paid to what’s IN the files you are working with – not all content is created equal, and text in paragraphs is more easily understood by AI than tables, graphs, and other visualisations. Some of the specific examples we’ve run into where the AI did not initially give great answers on include:

  • Tables (in both PDF and Office docs), especially:
    • Long tables spanning multiple pages
    • Tables where some rows are effectively a “sub-header row”
  • Scanned/OCR’d documents where the content is effectively an image
  • HTML content
  • Images
  • Smart art
  • Document structure elements (headings, subheadings etc.) which convey semantics

We needed to take specific steps to deal with such content, and as mentioned in the last article I think it’s where accelerators like Azure OpenAI on your data can run out of steam. For a production-grade AI platform, you’ll need to establish what you need to solve for in this area and prioritise accordingly - there’s almost no upper limit to how much tuning and content optimisation you could implement. Note also that while I label this “content tuning”, the tuning actually takes place in your platform mechanisms – your content ingestion pipeline and the chunking script/code most specifically. You’re not changing content to suit the AI, because the business will create content as the business needs to. That said, one tactic for special content may to index a modified version of a file rather than the original – so long as you have a mechanism for ongoing ingestion of content created by the business.

So what are the specific steps you might need to take? A possible toolbox here includes:

  • Modifying chunking to recognise long tables and adopt tactics such as:
  • Create a larger chunk than normal so the entire table fits into one
  • Ensure the table header is repeated every time the table is split
  • Implementing ‘document cracking’ (aka document understanding) using something like:
    • Microsoft Syntex – perhaps to leverage its extraction capabilities with important values inside documents (e.g. contract value, start date, end date, special clauses etc.); this can ensure vital details are indexed properly
    • Azure AI Document Intelligence – similar to Syntex, using the Layout model allows you to crack PDFs or images to text, even if it’s a scanned document where the content is actually an image

Both of those document cracking approaches (Syntex and Azure Document Intelligence) allow tables to be processed since they understand headers, rows and columns.

In cases where high value information is expressed in such constructs, be ready to spend time in this area gradually tuning and improving the AI’s understanding of the content. To close, perhaps considering the image helps convey why gen AI needs help in this arena:

Content ingestion/indexing

All the previous aspects need to be worked into an indexing pipeline of some sort which can continually ingest data from source platforms - the only exception would be if you’re creating a simple solution based on a one-time upload of some static content, which is certainly more straightforward. Most scenarios, however, require generative AI to work against continually changing data (e.g. new and changing documents in Microsoft 365), and this means ensuring all of your steps to support RAG - in terms of content processing, chunking, embeddings generation and so on - are called as part of an automated pipeline.

But what triggers the process? You could run on a scheduled basis, but in many cases you can piggyback onto existing content indexing mechanisms which may be scheduled or based on detection of content changes. Another benefit of Azure Cognitive Search as the RAG platform is the support for indexers (see the list of connectors above). In our solutions, to bring GPT capabilities to documents stored in Microsoft 365 we use the SharePoint indexer in Cognitive Search to do the initial gathering, but extend using skillsets to integrate document cracking, chunking, embeddings generation and other steps into the ingestion process. A few considerations come with this, including:

  • The SharePoint indexer is still in preview at the time of writing
  • ACS has certain thresholds of how many indexes and indexers you can have – this varies based on pricing tier, but needs consideration when indexing at scale
  • The SharePoint indexer doesn’t currently deal well with some content scenarios such as deletions and folder renaming – this can lead to content staying in your gen AI platform when it shouldn’t, and missed content and/or broken links in citations

On the second point, our team have needed to augment the indexer to deal with these shortcomings. On the first, we have some views on challenges Microsoft might be running into with the SharePoint indexer (consider ACS ingesting a Microsoft 365 tenant with 30+ TB of data for example) – and we hope this isn’t one of those cases where Microsoft tech gets pulled without even making it out of preview. Having Cognitive Search index documents in SharePoint is a common scenario for many reasons, not just generative AI – leaving the world to create their own indexing mechanisms would take away a big value-add for Microsoft’s premier search technology.

Summary

Today, there’s no such a thing as a genuine turn-key platform for “generative AI on large amounts of your organisational data across different platforms”. On a related note, Microsoft 365 Copilot is amazing for many scenarios (and we had early exposure through the limited Early Access Program), but it’s not the answer to every generative AI use case. Sure, data from other platforms can be integrated via Copilot plugins, but in my view the pattern is better suited to small scale ‘callouts’ to the other systems (e.g. read or write a record) – this isn’t quite the model for “ingest TB of data from different company platforms to work with gen AI”.

However, with a talented team (or partner), such platforms can be built in a few weeks or months depending on your scope, and many parts of the stack will come from assembly of building blocks which exist already. Without a doubt, lots of the challenges above will be abstracted further in the next year or two – but at the same time, I’ll be surprised if Microsoft or anyone else cracks all parts of the puzzle in a way that works for everyone. Some elements will always be organisation-specific, and priorities will vary. Cost will always be a factor too – budgets will be found for AI projects demonstrating a path to return, but no-one wants to license a hugely expensive product only to find it can’t be easily configured or extended to work with company apps, data, and processes.

Similarly, no-one wants to spend a year building a platform because the team didn’t know what they were doing or weren’t following developments closely. Being plugged in to the firehose of generative AI changes is vital to avoid missteps and wasted effort. For implementors, I feel this is the new web development or the new databases - solutions of immense value can be built, so relevant expertise will be in demand. Following a series of “AI hacks” and client projects this year, I’m feeling good about how we’re shaping up at Content+Cloud/Advania to respond to this new era.

More fundamentally, the results we’re seeing from combining GPT (via Azure OpenAI) with our client’s organisational data are hugely encouraging and show the power of generative AI in the workplace. Seeing the AI perform reasoning and answer deep questions over organisational data which came from different platforms and in different formats provides a vision into how AI will power organisations and how work will get done over the next few years. As I keep saying, it’s a magical time.

Tuesday, 5 September 2023

Integrating your data with ChatGPT - exploring Microsoft's "Azure OpenAI on your data" accelerator

The idea of combining the power of ChatGPT and LLMs with organisational data has caught the attention of many. It seems to form the basis for many of the conversations I'm having with CIOs and tech leaders at the moment, and with good reason I think. After all, if you could "train" ChatGPT/generative AI everything about your company, your products and services, clients, employees and expertise, past projects and other valuable information, the potential would be huge. If you could further add a sprinkling of the most relevant content on the internet such as the latest industry regulations, analyst reports, or information from accredited suppliers, the potential could be increased further. "Instead of searching and creating, can't I just ask generative AI to give me what I need?" is a common theme of questioning. In my view we're only starting to understand the possibilities and accuracy rates, but in our client projects so far where we've integrated organisational data with ChatGPT, the results are pretty incredible. As one example, being able to ask natural language questions about past projects and get high quality, easy to understand answers, seems to bring out organisational knowledge in a powerful way that helps with decision-making and winning business.  

There are many approaches to integrating custom data with AI. For most Microsoft-centric organisations, when we talk about ChatGPT it's actually Azure OpenAI which is the starting point for generative AI. This is because it allows safe and controlled use of OpenAI models such as GPT-4, but delivered with all the benefits of trusted Azure such as improved privacy controls, data sovereignty, governance policies, and integration into existing cloud billing. The approach described here revolves around Azure OpenAI and you'll need to have an instance of the service created. 

Focus of this article
With this context, this article covers:
  • Core concepts when integrating data with ChatGPT/Azure OpenAI
  • Overview of Azure OpenAI on your data, with a focus on integrating Microsoft 365/SharePoint data in particular
  • The setup process for Azure OpenAI on your data
  • What the solution looks like and findings from testing
  • My thoughts on where the solution fits in combining AI with your data
 

RAG and other concepts in integrating data with ChatGPT and gen AI

Stitching together custom data with LLMs requires work. There are several overarching approaches, including training your own model (expensive and complex), fine-tuning an existing model (limited to small pieces of data), to techniques like Retrieval Augmented Generation (or RAG) which essentially combine searching across your dataset - that's the retrieval part - with the answer and content generation we commonly associate with LLMs. RAG is essentially a multi-step process, consisting of at least these steps:

  • Take user prompt and search across a dataset (i.e. your organisational data) for relevant information 
  • Construct a long, detailed prompt for the LLM which includes the fetched data - this is known as grounding
  • Generate a natural language response based on the retrieved information

The response will therefore feel like ChatGPT has not only been trained on internet data, but your custom company data too. The user does not know or care that a few things have happened under the surface. RAG is essentially the approach used by Microsoft 365 Copilot, where the data being returned in the initial step is from the Microsoft Graph - documents, relationships, meetings, 
activities, and other data in Microsoft 365.

In RAG, information is often converted to vectors or embeddings to better support natural language processing.

Overview - Azure OpenAI on your data  

To help with the data integration question, Microsoft provide the Azure OpenAI on your data capability (shortened to "AOI on your data" in this article). This is effectively a PaaS accelerator where much of the back-end complexity of integrating LLMs with your data is taken care of. It takes care of creating a back-end data store, allowing your custom data to be ingested, creating embeddings/vectors from your data (at least in some circumstances - more on that later), and allows you to quickly deploy a sample app to provide a basic user interface with some of the useful features you might want (e.g. chat history and citations). It does use resources in your chosen Azure subscription though - you'll either create these at the time of initial config or point to resources you've already provisioned.

Azure Cognitive Search is a key ingredient

In Azure OpenAI on your data, the key technology which allows your documents and data to be combined with AI is Azure Cognitive Search. Cognitive Search provides the information store from which the initial information is retrieved, before feeding this into the prompt to ChatGPT/the LLM. Conceptually you can use any queryable data platform in Retrieval Augmented Generation, but it helps a lot if the platform can store vector data. Azure Cognitive Search has been extended with this capability, but know that many vector database options have sprung up in the AI era - from dedicated vector DBs such as Pinecone, Qdrant and Weaviate, to additions to existing technologies like Azure Cosmos DB (MongoDB flavour), Databricks, and Redis. Microsoft promote Azure Cognitive Search for generative AI applications, and it does have some fairly unique capabilities. Azure OpenAI on your data supports the following data sources:

  • Azure BLOBs
  • Files you upload
  • An existing Azure Cognitive Search instance you have (which could hold information you've indexed from lots of sources)

Needless to say, the last option is the most powerful and flexible, so it's the one we'll look at here. One reason is that Azure Cognitive Search has an array of connectors which will allow you to bring in content quite easily from lots of platforms. These essentially break down as:

  • Native Microsoft connectors:
    • SharePoint Online, Azure SQL, Azure Cosmos DB, Azure MySQL, Azure BLOBs, Files, Tables, Data Lake Gen 2 etc.
  • Third party connectors - there are many, including:
    •  Adobe AEM, Amazon S3, Atlassian, Bentley Connectwise, Box.com, Elasticsearch and lots more - see the ACS connectors gallery
  • Your custom connector:
    • Essentially you can index anything by generating some JSON conforming to a particular structure

Using the 'existing Cognitive Search' option in Azure OpenAI on your data

As you might expect, you need an Azure Cognitive Search instance already and to have some data indexed, so if you're experimenting with this you'll need to get one created. If you're interested in "AI on your data" I recommend spending the time on this - it will help you understand how to combine ChatGPT with all sorts of data and platforms.

Unfortunately the free tier of ACS is not supported for AOI on your data, so you'll need an instance created on at least the 'Basic' tier (£61.05 per month in UK pricing at this time). A good resource for getting started is Create an Azure Cognitive Search service in the portal - the process described there will get you the base service provisioned in Azure. The next step is to connect to some content.

Indexing content in Microsoft 365/SharePoint Online with the SharePoint indexer

One popular scenario will be to combine ChatGPT/Azure OpenAI powers with the knowledge contained within documents in Microsoft 365. Sure, it's exactly what Microsoft 365 Copilot will do when it arrives, but for me there are still many reasons to explore going this way - perhaps in addition to adoption of Copilot. For one thing, licensing of all users in an organisation may be a difficult investment case at $30 per user per month - it's unlikely to be something rolled out to the entire organisation for most. In contrast, a tool you stitch together yourself could be - and it could be quite cost effective since there are building blocks like Azure Cognitive Search to support the journey. An AI strategy which combines Microsoft 365 Copilot usage (for those who derive the most value), with a supplementary AI tool which understands organisational data but has no per-user costs, could be a powerful approach to leveraging AI over the next few years. Regarding the latter, Azure Cognitive Search can bring together data from many sources quite easily - meaning it's a good foundation for AI that understands LOTS of how your organisation works. A key benefit is that it can go beyond just data in Microsoft 365. 

To get set up with Azure Cognitive Search indexing some of your M365/SharePoint content, I recommend following these instructions:

SharePoint indexer (preview) - Azure Cognitive Search | Microsoft Learn

Note that there are some technical steps in there since the config is done via Postman and the ACS REST API, but the process doesn't take too long. Once you've done this, it's now time for the fun part - configuring Azure OpenAI on your data and pointing to your Cognitive Search instance. 

Configuring Azure OpenAI on your data with ACS

The config steps for this part are done in the Azure AI Studio for your Azure OpenAI instance. As a reminder, you can get to this from the main Azure portal - your OpenAI instance will provide a link. 

Once there, head into the chat playground and find the "Add your data" tab. Click the "Add a data source" button as shown below:

In the dropdown which appears, select the Azure Cognitive Search option:


In the next dialog you're going to point to your Azure Cogntive Search instance by selecting the parent Azure subscription then choosing the ACS service. Note that you also select a specific index within Cognitive Search here - which is why you need all the Cognitive Search config to be in place already using a process like that described above in the "Indexing content in Microsoft 365/SharePoint Online with the SharePoint indexer" section: 

The next step involves telling ACS how to establish the various bits of data to display in search results. Since the '10 blue links' we associate with search results are always comprised of a title, a URL, a filename and a snippet of content, we need to tell ACS what they should be for the content being indexed. If you were indexing SQL data this might need more thought, but since SharePoint content is a set of files which naturally have these elements the mappings are quite logical. Just use the dropdowns to map each field to the relevant item specified when you created the indexer:

The final option relates to semantic search in Azure Cognitive Search, which is the ability of ACS to semantically understand relationships between concepts in your data. I'd recommend treating this as an advanced capability that you might not start off with - it's chargeable for one thing, and we've been finding good quality results without it, most likely because vector search is already doing some of this. So, I suggesting skipping past this one for now:  



The final step is to confirm your settings:

Once confirmed, you'll be back in the main area of the chat playground with your configuration displayed. Note the "limit responses to your data content" checkbox - this constrains the LLM to only your added data and ignores the core internet data it knows already. Whether you check this or not will depend on the solution you're building (i.e. whether you want both sources involved), but I suggest that you definitely want this during testing at least:

Config is now complete and we can think about the front-end interface and starting to test. 

Deploying the sample app front-end

Azure OpenAI on your data provides a deployable web application which can serve as the front-end. In reality, this isn't something you could deploy to an organisation without further work but it can be useful for testing and/or to accelerate the creation of a real front-end app. To provision it into your Azure subscription, start by finding the "Deploy to" button in the top-right hand corner of the Azure OpenAI Studio:

Choose your preferred option, but in my case I'm choosing the web app:

Specify the details for your web app - here's what mine looked like:




Once the web app has finished deploying, the AOI Studio will display this in the top-right corner:


Alternatively you can navigate straight in with the URL you specified. When you get there you'll see a basic web app which is talking to your data:


Looks good! But is it the promised land of ChatGPT and generative AI that truly understands your data? I'll start to answer that here, but there's a lot to consider so I expand on things in the next article - for those working in this space it's worth discussing findings and recommendations in more detail. 

Testing generative AI on your data

In short the results from my testing were.....mixed. I put this down to the Azure OpenAI accelerator taking care of some things for you, but for a production-grade solution my view is that you need more control and there's more work to do. Take this how you wish, but for now we are not using the "AOI on your data" accelerator in our client projects at Advania/Content+Cloud which combine generative AI and custom data. We're using similar principles and the very same technologies, but more 'grown-up' approaches based on the Microsoft documentation and other info. More on

Background - my scenario and data for testing

As a set of documents to interrogate, I'm using some of Microsoft's earnings reports from recent quarters. I spend quite a bit of time analysing these each quarter to understand Microsoft performance and strategy - they are full of dense information and it would be highly beneficial to be able to ask the AI simple questions and get simple answers, rather than lengthy digestion and interpreting of the contents which I do today. The documents take the format of both PowerPoint documents and Word transcripts from the quarterly earnings calls. I only have a few documents but as I say, they are full of complex information - here they are in the SharePoint document library which ACS is indexing:

The Word call transcripts look like this:

The PowerPoint files look like this:

Results overview

So let's ask some questions of the data. Initial results seem quite promising, like the answers in this converation thread:

Looks good! Any solid "generative AI on your data" solution should help you understand how it's finding the answers, and expanding the citation helps me see the source content:

However, the solution runs into challenges with some requests. Here's an example which I feel should have been answered:

That's a bit surprising because the answer isn't hard to find in the document set. In a different case, I see a bit of hallucination happening. The data is actually being misunderstood, and an answer is given but it's incorrect. The question I'm asking should again be quite easy to obtain from the documents - total revenue for a specific quarter:

The reason I know it's incorrect is because the answer is quite easy to find in both the PowerPoint and Word documents. Here it is in the deck for example:

Expanding the citation starts to explain what's happening here:

The AI has found something referring to revenue for the quarter, but in fact these numbers relate purely to Intelligent Cloud, one of Microsoft's segments, rather than total revenue. The fact that this part of the discussion in the call transcript relates purely to this has has been misunderstood. This is obviously somewhat concerning. As we combine AI with our data, the need for accuracy and precision tends to increase compared to consumer uses of ChatGPT for example. So why is this happening? Let's consider this and expand out into overall conclusions.

My high-level conclusions on Azure OpenAI on your data

My speculation on why AOI on your data doesn't always give great results in these cases comes down to what it does and does not do. Specifically, I put the AI misses above down to the fact that the data is not chunked properly. Sidebar - in the context of AI on your data, "chunking" is a key concept and refers to the practice of splitting long documents which go beyond the limitations of prompt size, e.g 4000 or 32000 tokens for GPT-4 (a token is around 4 characters of text). Clearly, a long document in it's entirety will not fit into the maximum prompt size allowed by LLMs today, so the typical approach involves splitting documents into smaller chunks. Indeed, Microsoft's documentation for AI on your data is explicit in calling out that you might need to do this - the "Ingesting your data into Azure Cognitive Search" section of the AOI on your data documentation (also linked below) discusses this and links to a commonly used 'data preparation' script - however it's something critical you'll need to take care of if you're building any kind of production solution. In some ways, this illustrates the issue with Microsoft's AOI on your data solution today - while it helps in provisioning a starter point for some elements, it doesn't necessarily do the hard bits which you'll need.

By it's nature, Azure OpenAI is an accelerator which tries to simplify the complex aspects of combining AI with your data, but realistically it cannot take care of everything. Colleagues and I are currently viewing it as a low-code route to AI on your data, and like many low-code solutions there are some trade-offs and you hope you don't run into brick walls. In Power Apps for example, it's possible to break past constraints by calling out to an Azure Function to run custom code or bringing in PCF components to go past out-of-the-box UX controls. In the same way, it's necessary to understand where the boundaries lie with AOI on your data. Let me try to be more specific.

Where Azure OpenAI on your data helps and where it doesn't

AOI on your data is helpful in the following ways: 

  • Provisioning a sample web app front-end to Azure App Service - this uses a GitHub sample which isn't a bad starting point, and the solution provisions an App Service and App Service Plan for you. The sample code surfaces capabilities such as SSO auth, chat history and citations, various config options in app settings, and while the UX is very basic it certainly could be extended (the exact sample used is linked below)
  • Provisioning a back-end data store - a Cosmos DB instance used to store chat history, which is configured with 'provisioned throughput' capacity mode (i.e. consumption-based pricing), and the Azure Cognitive Search instance if you're not pointing to an existing one 
  • Hooking up the front-end to the back-end - integrating the sample app to various infrastructure pieces via app config settings - your Cosmos DB, Azure Cognitive Search instance, and your Azure OpenAI instance etc.
  • Helping you connect Azure OpenAI in a basic way to simple custom data sources - as described above, this provides the basics to connect to Azure Blob Storage, the file upload option, and an existing Azure Cognitive Search instance (the approach used in this article)
However it's less helpful with other things you need:
  • Chunking of your data/content - when you bring an existing Azure Cognitive Search instance, which you'll do for anything other than Azure Blobs or the upload option (e.g. when you want to connect to a wider set of documents in Microsoft 365/SharePoint), the solution will use the data in it's non-chunked form - resulting in potential accuracy challenges
  • Generating vectors/embeddings from your data - this is required to provide similarity search, the capability that allows ChatGPT and generative AI to be so powerful in truly 'understanding' the training data
  • Support for a wide variety of data - the solution supports Word, PowerPoint, PDF and some simple file types (.html, .text, .md) but for anything else you're on your own. Additionally, the processing of these formats is somewhat 'black box' and if it doesn't do the right things for you (e.g. deal with images, graphs, or tables in your PDFs in the right way), it seems there's no control to improve things
  • Aligning with enterprise-grade Azure architecture practices - support here is patchy, and I could imagine some organisations may feel the solution doesn't quite align with their Azure standards and governance. For example, if your Azure OpenAI instance is protected by a vNet and private endpoint, Azure OpenAI on your data can connect to this if you complete an application form but not otherwise. Storage accounts with private endpoints are currently not supported
Providing a production-grade front-end which you can roll out to the business - in the end, the solution is deploying a sample app, and sample apps aren't meant for production - they are meant as a starting point for development. We've found there's fairly significant coding work to do on this front, and for our client projects (and internal deployment) we choose to use a different GitHub sample as our starting point to this one (there are several around and we've looked at all the major ones)

In the end, if your goal is to get ChatGPT (by which we really mean Azure OpenAI) talking to your data in Microsoft 365 or Azure, then you'll need to understand some of the deeper mechanics and building blocks involved in creating these solutions. My view is that while AOI on your data takes core of some useful pieces on the journey, those pieces aren't necessarily where the most complexity is. Of course, the capabilities of Azure OpenAI on your data will expand from where they are at the time of writing - there's absolutely no doubt about that. However, my recommendation is to consider the accelerator as the starting point for a technical team to use in a project - either simply as a reference architecture off to the side, or as the basis of a solution they will expand quite significantly. It's a great entry point to the space, but perhaps not the entire solution to providing a solution to the business which combines generative AI and organisational data.   

Beyond sample apps - delving deeper into building "AI on your data" solutions

In the next article, I'll go into more detail on some of the concepts you're likely to need to deal with in building a production "AI on your data" solution, and also some of the Microsoft-centric building blocks which are useful like Semantic Kernel. By the way, I certainly wouldn't want to claim I personally have all the answers - some of the wider thinking described above comes from my talented Advania/Content+Cloud colleagues, and even as a collective we're finding that this is definitely an emerging space where things are moving quickly and there's a lot to learn. Consider this info more as an attempt to share key findings and conclusions perhaps - but if Azure OpenAI on your data on it's own doesn't answer all the questions, in the next article I'll share more thoughts on what might work.

It truly is an exciting time, and the possibilities of AI with organisational data are huge from our perspective.


References

Monday, 27 February 2023

Call ChatGPT/GPT-3 from Power Apps and Power Automate via Azure OpenAI

Anyone in technology will know the buzz caused by ChatGPT since its launch, and beyond fooling schoolteachers with human-like essays and passing law exams we’ve seen many interesting real-world use cases already. Of course, the point isn’t that it’s just an interesting plaything - generative AI using large language models are powerful technology to integrate into apps, tools, and automated processes. The potential is almost limitless and entirely new categories of digital experience are opened up. As we all know, Microsoft were quick to identify this and their $11 billion investment in GPT creator OpenAI means cutting-edge AI is becoming integrated into their products and services.

So we know that Teams, SharePoint, Viva, Dynamics etc. will evolve quite quickly and with features like Power Apps Ideas and Power Automate AI Copilot, Microsoft are starting to include GPT-3 and Codex capabilities into the maker experience within Power Platform. However, alongside that we want to build GPT into *our* apps and solutions! In this post we’ll explore how to integrate GPT-3 into what you build with the Power Platform – this is significant because being able to call into GPT capabilities from Power Apps, Power Automate, Power Virtual Agent etc. can be hugely powerful. I won’t go into use cases too much here, but perhaps you want to generate content for new products in development or automate contact centre responses. Maybe you want to use it as a classic chatbot able to answer questions or be a digital assistant of some kind. Or maybe you want to analyse, summarise, translate, or classify some content. In this post I’m going to keep us focused on how to implement rather than usage, but the limit might just your imagination.

The (not particularly) bad news – some implementation work is required, there’s no magic switch in the Power Platform
The good news – it’s as simple as deploying some services in Azure and creating a Power Platform custom connector (which you, or someone in your organisation, may have done already)

What you'll need
Here are the ingredients for calling GPT from the Power Platform:
  • The ability to create an Azure OpenAI instance – note that this requires an application and is currently only available for managed customers and partners. Microsoft assess your use case in line with their responsible AI commitment
  • The ability to create and use a custom connector in the Power Platform
  • A paid Power Platform license which allows you to use a custom connector (e.g. Power Apps/Power Automate per user or per app plan)

Overall here’s what the process looks like:
If you have experience of connectors in the Power Platform things are reasonably straightforward, though there are some snags in the process. This article presents a step-by-step process as well as some things that will hopefully accelerate you in getting started.

Azure OpenAI – the service that makes it happen


In my approach I’m using Azure OpenAI rather than the API hosted by the OpenAI organisation, because this is the way I expect the vast vast majority of businesses to tap into the capabilities. For calling GPT or other OpenAI models to be workable in the real world, it can’t just be another service on the internet. Azure OpenAI is essentially Microsoft-hosted GPT – providing all the security, compliance, trust, scalability, reliability, and responsible AI governance that a mature organisation would look for. Unsurprisingly, the OpenAI service folds under Azure Cognitive Services within Azure, and this means that working with the service and integrating it into solutions is familiar. Microsoft are the exclusive cloud provider behind OpenAI’s models, meaning you won’t find them in AWS, GCP or another cloud provider.

Azure OpenAI is a gated service
To work with Azure OpenAI you need to apply for access, which involves completing a form to describe how you plan to use the service and providing some other details. You’ll also provide the ID of the Azure subscription you plan to use the service from – if your application is approved, this is the subscription where the service will be enabled for you. Note also that the service is only available in three Microsoft regions for now. Details about the application process can be found in the How do I get access to Azure OpenAI section in the Microsoft docs.

Link – form to apply for OpenAI: https://aka.ms/oai/access

Once you’ve received notification that your application for Azure OpenAI has been approved, you’re ready to start building.

Step 1 – create your Azure OpenAI instance


Firstly, in the Azure portal navigate into the Azure subscription where your usage of OpenAI has been approved. Start the process by creating a new resource and searching for “Azure OpenAI”:

Click the ‘Create’ link. We’ll now provide the details to create the resource in your chosen region and resource group – in my case I already have a resource group ready to go:
There’s only one pricing tier and currently you’ll be selecting from one of three regions where the OpenAI service is available:
  • West Europe
  • East US
  • South Central US
Hit the ‘Review + create’ button and you should see validation has passed:
Now finish off by clicking the ‘Create’ button:
Creation will now start:
Once complete, your OpenAI resource will be available. The main elements you’ll use are highlighted below:

Exploring the Azure OpenAI Studio


For the next steps you’ll use the link highlighted above to navigate into the Azure OpenAI Studio – this is the area where OpenAI models can be deployed for use. Like many other Azure AI capabilities, configuration isn't done in a normal blade in the Azure portal - instead there's a more complete experience in a sub-portal. Here’s what the Studio looks like:

Azure OpenAI Studio has a GPT-3 playground, similar to the openai.com site where you may have played with ChatGPT or the main playground. In the Azure OpenAI Studio the playground can be used once you have a model deployed (which we’ll get to in a second), and it’s exactly like OpenAI’s own GPT-3 playground with some ready-to-play examples and tuning options. In the image below I’m summarising some longer text using the text-davinci-003 model:
The playground gives you the ability to test various prompts and the AI-generated results, also giving you info on how many tokens were used in the operations. Pricing is consumption-based and will depend on how many tokens you use overall.

To give a further sense of what’s available in the Azure OpenAI Studio, here’s the navigation:
To move forward with integrating GPT-3 into our apps, we need to deploy one of the AI models and make it available for use.

Step 2 – choose an OpenAI model and deploy it


Using the navigation above, go into the Models area. Here you’ll find all the OpenAI models available for selection, and for production use you’ll want to spend time in the OpenAI model documentation to establish the best model for your needs. Here’s a sample of the models you can choose from:
Each model has a different blend of functional performance from the model for the cost and speed of operations. The Finding the right model area in the OpenAI documentation has the information you need and here’s a quick extract to give you a sense:

Model (latest version) Description Training data
text-davinci-003 Most capable GPT-3 model. Can do any task the other models can do, often with higher quality, longer output and better instruction-following. Also supports inserting completions within text. Up to Jun 2021
text-curie-001 Very capable, but faster and lower cost than Davinci. Up to Oct 2019
text-babbage-001 Capable of straightforward tasks, very fast, and lower cost. Up to Oct 2019
text-ada-001 Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost. Up to Oct 2019

Text-davinci-003 is the model most of us have been exposed to the most as we’ve been playing with GPT-3, and as noted above it’s the most powerful. It’s worth noting that if your use case relates to GPT’s code capabilities (e.g. debugging, finding errors, translating from natural language to code etc.) then you’ll need one of the Codex models such as code-davinci-002 or code-cushman-001. Again, more details in the documentation. Most likely you just want to get started experimenting, so we’ll deploy an instance of the text-davinci-003 model.

To move forward with this step, go into the Deployments area in the navigation:

From there, hit the ‘Create new deployment’ button:
In the dlalog which appears, select the text-davinci-003 model (or another you choose) and give it a name – I’m simply giving mine a prefix:
Hit the ‘Create’ button and model creates instantly:
The model is now ready for use – you can test this in the playground, ensuring your deployment is selected in the dropdown. The next step is to start work on making it available in the Power Platform.

Step 3.1 – create the custom connector in the Power Platform


Background:

Azure OpenAI provides a REST API, and in this step we’ll create the custom connector to the API endpoint of your Azure OpenAI service and a connection instance to use. From the REST APIs provided, it’s the Completions API which is the one we’ll use – this is the primary service to provide responses to a given prompt. A call to this endpoint looks like this:
POST https://{your-resource-name}.openai.azure.com/openai/deployments/{deployment-id}/completions?api-version={api-version}
In a JSON body, you’ll pass parameters for the prompt (i.e. your question/request) and various optional tokens to instruct GPT on how to operate, including max_tokens and temperature to use. All this is documented and the good news is you can stay simple and use defaults.

You can authenticate to the API using either AAD (which you should do for production use) or with an API key. In my example below I’m using the API key to keep things simple, and this approach is fine as you get started. A header named “api-key” is expected and you pass one of the keys from your Azure OpenAI endpoint in this parameter.

Process:

If you’ve ever created a Power Platform connector before you’ll know there are various approaches to this – the API docs link to Swagger definitions on Github, however these will not work for the Power Platform because they’re in OpenAPI 3.0 format and the Power Platform currently needs OpenAPI 2.0. To work around this I created my connector from scratch and used the Swagger editor to craft the method calls to match the API docs. To save you some of this hassle you can copy/paste my Swagger definition below, and that should simplify the process substantially.

Before we get started, you’re going to need the details of your Azure OpenAI endpoint, so collect those now and store them safely (remembering that your API key is a sensitive credential). To do this, go back into the main Azure portal (not the OpenAI Studio) and head into the ‘Keys and Endpoint’ area:


Once there, use the screen to collect one of your keys and the endpoint URI:


Once you have those details, we’re ready to create the connector in the Power Platform. To do this, go to https://make.powerautomate.com/ and ensure you’re in the Power Platform environment to use.

From there, go to the Custom connectors area:

Hit the ‘New custom connector’ button and select ‘from blank’:

Give your connector a name:

We now start defining the connector – you can define each individual step if you like but you should find it easier to import from my definition below. Before that step, you can add a custom logo if you like in the ‘General’ area – this will make the connector easier to identify in Power Automate. 

The next step is to download the Swagger definition and save it locally. If you choose to use my Swagger definition, you can get it from the embed below (NOTE: if you're reading on a phone this will not be shown!). Copy the definition and save it locally:


Make the following changes to the file and save:
  • Replace ‘[YOUR ENDPOINT PREFIX HERE] in line 6 with your prefix – the final line should be something like cob-openai-instance1.openai.azure.com
  • Replace ‘[YOUR DEPLOYMENT ID HERE]’ with your AI model deployment name from the Azure OpenAI Studio – in the steps above, my model name was cob-textdavinci-003 for example
In the connector definition screens, toggle the Swagger Editor toggle so it’s on:


The editor UI will now show on the left - paste your amended Swagger definition in here. The image below shows my host (prefixed ‘COB’) but you should see your equivalent:

Once this is done, simply hit the ‘Create connector’ button:
Your custom connector has now been created using the details from the Swagger definition.

Step 3.2 – create connection instance


Once the custom connector exists we create a connection from it – if you haven’t done this before, this is where credentials are specified. In the Power Automate portal, if you’re not there already go back into the Data > Custom connectors area and find the custom connector we created in the previous step – when you find it click the plus icon (+):
Paste in the API key previously retrieved for your Azure endpoint and hit ‘Create connection’:
The connection is now created and should appear in your list:
We’re now ready to test calling GPT from a Flow or Power App – in the next step we’ll use a Flow.

Step 4 – test calling GPT-3 from a Flow


Follow the normal process in Power Automate to create a Flow, deciding which flavour is most appropriate for your test. As a quick initial test you may want to simply create a manually-triggered Flow. Once in there, use the normal approach to add a new step:
In the add step, choose the ‘Custom’ area and find the connection created in the previous step:
If you used my YAML, the connector definition will give you a Flow action which exposes the things you might need to vary with some elements auto-populated – remember to make sure your model deployment ID is specified in the first parameter (it will be if you overwrote mine in the earlier step):
Let’s now add some details for the call – choose a prompt that you like (red box below) and specify the number of tokens and temperature you’d like the model to use (green box below):
If you now run the Flow, you should get a successful response with the generated text returned in the body:
If we expand this, here’s what our call to GPT-3 created in full for these parameters:

Power Platform Governance is an important component of any organization’s digital transformation effort that allows customers to stay in control of the data, applications, and processes. It is an integrated system of policies, standards, and procedures that ensure the correct use and management of technology. This governance framework provides organizations the assurance, confidence, and trust that their technology investments are secure, optimized, and compliant with the industry standards.

Power Platform Governance consists of three main components: Governance Policies, Governance Model, and Governance Reporting. Governance Policies are the foundation of the framework and define the standards, procedures, and rules that organizations must adhere to. The Governance Model is a set of tools, processes, and best practices that organizations use to implement and enforce the governance policies. Lastly, Governance Reporting is the process of monitoring, analyzing, and reporting on the adherence to the governance policies.

Overall, Power Platform Governance is an important component of any digital transformation effort, as it helps organizations to stay in control of their investments in technology, making sure that it is secure, optimized and compliant with the industry standards. It consists of three main components: governance policies, governance model, and governance reporting. This framework helps organizations to create a reliable and secure environment for their technology investments, ensuring that they are in compliance with the industry standards.

SUCCESS! You now have the full power of GPT-3 and other OpenAI models at your disposal in the Power Platform.

I recommend playing around with the max_tokens and temperature parameters to see different length responses and greater/lesser experimentation in the generation. Again, spend time in the OpenAI documentation to gain more understanding of the effect of these parameters.

GREAT! But how much will some experimentation cost?
The text-davinci-003 model used in this example costs $0.02 per 1000 tokens used, so you'll need to do a lot of playing to generate significant costs. Since the max_tokens parameter is exposed in the Swagger definition above, you can easily control how large or small the responses should be to allow you to control things as you're playing. Since this is just Azure consumption, you can also set up budgets and notifications in Azure Cost Analysis to help you monitor usage. I'll cover this in an upcoming article.

Summary


Being able to call into powerful generative AI from the Power Platform provides a huge range of possibilities - imagine integrating GPT-3 capabilities into your apps and processes. You can automate responses to queries, generate content, or even translate, summarise, classify or moderate something coming into a process. With the Codex model, scenarios around code generation and checking are also open. Most organisations invested in Microsoft tech will want to use the Azure-hosted version of the OpenAI models, and this is the Azure OpenAI service as part of Azure Cognitive Services. The service requires an application to be approved before you can access it, but it's as simple as completing a form in most cases. The process outlined above allows you to connect to the service through a Power Platform custom connector, and this can be shared across your organisation and/or controlled as you need.

The connection essentially gives you all the standard GPT-3 and Codex possibilities, meaning everything you may have experienced or read about online. Going beyond that, in many use cases you'll want to explore model fine-tuning or patterns like Retrieval Augmented Generation - for example, to bring in industry or organisation-specific data and decision-making from your own prompts. This guides the AI into understanding your particular domain much more than the default. That's beyond the scope of this article but is just one of the many possibilities for going further with generative AI in your apps. 

I also highly recommend putting some controls in place to ensure you are protected from costs spiralling out of control - this is the subject of my next article.