
Build a Chatbot on Your CSV Data With LangChain and OpenAI


Chat with your CSV file through a chatbot with memory — made with LangChain and OpenAI

Image made with Stable Diffusion


In this article, we'll see how to build a simple chatbot with memory that can answer questions about your own CSV data.
Hi everyone! In the past few weeks, I have been experimenting with the fascinating potential of large language models to create all sorts of things, and it's time to share what I've learned!
We'll use LangChain to link gpt-3.5 to our data and Streamlit to create a user interface for our chatbot.
Unlike ChatGPT, which offers only limited context on our data (we can provide a maximum of 4096 tokens per request), our chatbot will be able to process CSV data and manage a large database thanks to embeddings and a vectorstore.
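To make that 4096-token limit concrete, here is a quick sketch (my own illustration, not part of the final app; the file name is a placeholder) that counts how many tokens a CSV consumes using tiktoken, one of the libraries we install below:

import tiktoken

# Tokenizer matching gpt-3.5-turbo
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

# "my_data.csv" is a placeholder; any reasonably large CSV will do
with open("my_data.csv", "r", encoding="utf-8") as f:
    text = f.read()

# A large file easily exceeds the 4096-token window, which is why we
# will retrieve only the most relevant rows instead of sending everything.
print(len(enc.encode(text)))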
A diagram of the process used to create a chatbot on your data, from LangChain Blog

The code
Now let's get practical! We'll develop our chatbot on CSV data with very little Python code.
Disclaimer: This code is a simplified version of the chatbot I created; it is not optimized to
reduce OpenAI API costs. For a more performant and optimized chatbot, feel free to check
out my GitHub project yvann-hub/ChatBot-CSV or just test the app at chatbot-csv.com.
• First, we’ll install the necessary libraries:
pip install streamlit streamlit_chat langchain openai faiss-cpu tiktoken

• Import the libraries needed for our chatbot:


import streamlit as st
from streamlit_chat import message
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders.csv_loader import CSVLoader
from langchain.vectorstores import FAISS
import tempfile

• We ask the user to enter their OpenAI API key and upload the CSV file on which the chatbot
will be based.
• To test the chatbot at a lower cost, you can use this lightweight CSV file: fishfry-locations.csv
user_api_key = st.sidebar.text_input(
    label="#### Your OpenAI API key",
    placeholder="Paste your OpenAI API key, sk-",
    type="password")

uploaded_file = st.sidebar.file_uploader("upload", type="csv")
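Note that user_api_key is collected here but never passed to LangChain explicitly in the snippets below. One simple way to wire it up (my addition, not in the original code) is to export it as an environment variable, which OpenAIEmbeddings and ChatOpenAI read by default:

import os

# My addition: LangChain's OpenAI wrappers fall back to the
# OPENAI_API_KEY environment variable when no key is passed explicitly.
if user_api_key:
    os.environ["OPENAI_API_KEY"] = user_api_key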

• If a CSV file is uploaded by the user, we load it using the CSVLoader class from LangChain:
if uploaded_file:
    # use tempfile because CSVLoader only accepts a file_path
    with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
        tmp_file.write(uploaded_file.getvalue())
        tmp_file_path = tmp_file.name

    loader = CSVLoader(file_path=tmp_file_path, encoding="utf-8")
    data = loader.load()

• The LangChain CSVLoader class splits a CSV file into one document per row. This can be
seen by displaying the content of data:
st.write(data)

0:"Document(page_content='venue_name: McGinnis Sisters\nvenue_type: Market\


nvenue_address: 4311 Northern Pike, Monroeville, PA\nwebsite: http://www.mcginnis-
sisters.com/\nmenu_url: \nmenu_text: \nphone: 412-858-7000\nemail: \nalcohol: \
nlunch: True', metadata={'source': 'C:\\Users\\UTILIS~1\\AppData\\Local\\Temp\\
tmp6_24nxby', 'row': 0})"
1:"Document(page_content='venue_name: Holy Cross (Reilly Center)\nvenue_type:
Church\nvenue_address: 7100 West Ridge Road, Fairview PA\nwebsite: \nmenu_url: \
nmenu_text: Fried pollack, fried shrimp, or combo. Adult $10, Child $5. Includes
baked potato, homemade coleslaw, roll, butter, dessert, and beverage. Mac and
cheese $5.\nphone: 814-474-2605\nemail: \nalcohol: \nlunch: ', metadata={'source':
'C:\\Users\\UTILIS~1\\AppData\\Local\\Temp\\tmp6_24nxby', 'row': 1})"

• Splitting the CSV file into rows now allows us to feed it to our vectorstore (FAISS) using OpenAI
embeddings.
• Embeddings transform the chunks produced by CSVLoader into vectors, which then serve as an
index over the content of each row of the given file.
• In practice, when the user makes a query, a search is performed in the vectorstore, and the
best-matching row(s) are returned to the LLM, which rephrases their content to provide a
formatted response to the user (see the sketch after the code below).
• I recommend deepening your understanding of the vectorstore and embeddings concepts for better
comprehension.
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(data, embeddings)
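As a quick illustration of the retrieval step described above (a sketch of mine with an example query, not part of the app), you can search the vectorstore directly:

# Return the rows whose embeddings best match the query; these are the
# matches that will be handed to the LLM for rephrasing.
docs = vectorstore.similarity_search("Which venues serve fried shrimp?", k=2)
for doc in docs:
    print(doc.page_content[:120])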

• We then add the ConversationalRetrievalChain, providing it with the desired chat
model gpt-3.5-turbo (or gpt-4) and the FAISS vectorstore storing our file transformed into
vectors by OpenAIEmbeddings().
• This chain allows us to have a chatbot with memory while relying on a vectorstore to find
relevant information in our document.
chain = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0.0, model_name='gpt-3.5-turbo'),
    retriever=vectorstore.as_retriever())

• This function lets us pass the user's question and the conversation history to the
ConversationalRetrievalChain to generate the chatbot's response (a short illustration
follows the function below).
• st.session_state['history'] stores the user's conversation history while they are on
the Streamlit site.
• If you want to add improvements to this chatbot, you can check my GitHub.
def conversational_chat(query):
    result = chain({"question": query,
                    "chat_history": st.session_state['history']})
    st.session_state['history'].append((query, result["answer"]))

    return result["answer"]
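To see the memory at work, here is a hypothetical two-turn exchange (illustrative only, written as if it ran inside the app): the second question only makes sense because the (question, answer) pair from the first turn was appended to the history:

# Hypothetical follow-up: "them" can only be resolved because the first
# turn is now stored in st.session_state['history'].
print(conversational_chat("Which venues serve fried shrimp?"))
print(conversational_chat("Do any of them also serve lunch?"))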

• We initialize the chatbot session by creating st.session_state['history'] and the first messages
displayed in the chat.
• ['generated'] corresponds to the chatbot's responses.
• ['past'] corresponds to the messages provided by the user.
• Containers are not essential but help improve the UI by placing the user's question area below
the chat messages.
if 'history' not in st.session_state:
    st.session_state['history'] = []

if 'generated' not in st.session_state:
    st.session_state['generated'] = ["Hello ! Ask me anything about " + uploaded_file.name]

if 'past' not in st.session_state:
    st.session_state['past'] = ["Hey !"]

# container for the chat history
response_container = st.container()
# container for the user's text input
container = st.container()

• Now that the session state and containers are configured, we can set up the UI part that lets
the user enter and send their question to our conversational_chat function.
with container:
    with st.form(key='my_form', clear_on_submit=True):
        user_input = st.text_input("Query:", placeholder="Talk about your csv data here (:", key='input')
        submit_button = st.form_submit_button(label='Send')

    if submit_button and user_input:
        output = conversational_chat(user_input)

        st.session_state['past'].append(user_input)
        st.session_state['generated'].append(output)

• This last part allows displaying the user’s and chatbot’s messages on the Streamlit site using the
streamlit_chat module.
if st.session_state['generated']:
    with response_container:
        for i in range(len(st.session_state['generated'])):
            message(st.session_state["past"][i], is_user=True, key=str(i) + '_user', avatar_style="big-smile")
            message(st.session_state["generated"][i], key=str(i), avatar_style="thumbs")

• All that’s left is to launch the script:


streamlit run name_of_your_chatbot.py #run with the name of your file
The result after launching the last command
Et voilà! You now have a beautiful chatbot running with LangChain, OpenAI, and Streamlit, capable
of answering your questions based on your CSV file!
I hope this article helps you to create nice things; do not hesitate to contact me on Twitter or at
[email protected] if you need anything.
You also can find the full project on my GitHub.
