PineconeBee evaluates multiple LLMs on a user’s question, using similarity search over a local knowledge base to score which model produced the most helpful answer. It then returns all model responses with scores and highlights the top pick in real time via WebSockets.
- Retrieves relevant context with FAISS (from scraped UAE.gov pages).
- Queries multiple LLMs.
- Scores each answer against a ground‑truth snippet from the same retrieval.
- Returns every model’s score and response, plus the best one.
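The real‑time layer is easiest to see as a handler sketch. This is a minimal illustration, not the project’s actual code: the event names (`ask`, `results`) and the `compare_models()` entry point are hypothetical stand‑ins for the pipeline described next.

```python
# Minimal sketch of the real-time flow. Event names ("ask", "results") and
# compare_models() are hypothetical; see the backend code for the real handlers.
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app, cors_allowed_origins="*")

def compare_models(question: str) -> dict:
    # Placeholder for the retrieve -> generate -> score pipeline sketched below.
    return {"question": question, "results": {}, "best_model": None}

@socketio.on("ask")
def handle_ask(data):
    # Run retrieval, multi-model generation, and scoring for the question,
    # then push every model's scored answer back to the caller.
    emit("results", compare_models(data["question"]))

if __name__ == "__main__":
    socketio.run(app, port=5000)
```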
- Retrieve: Use FAISS to fetch top‑k chunks for the query.
- Generate: Prompt each configured model with the question + retrieved context.
- Ground truth: Concatenate the retrieved chunks as an approximate truth reference.
- Evaluate: Compute a similarity score (token‑overlap baseline) for each answer.
- Select: Choose the highest‑scoring answer as “most helpful”, then return all results (see the scoring sketch after this list).
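A sketch of the evaluate‑and‑select step, assuming the model answers have already been collected into a dict and using sklearn’s `CountVectorizer` plus cosine similarity as the token‑overlap baseline; the project’s scorer may differ in detail.

```python
# Token-overlap baseline: vectorize the concatenated retrieved chunks
# (the approximate ground truth) and each answer, then compare by cosine.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score_answers(answers: dict[str, str], chunks: list[str]) -> dict:
    ground_truth = " ".join(chunks)  # approximate truth reference (step 3)
    results = {}
    for model, answer in answers.items():
        # Token-count vectors for the reference and the candidate answer.
        vectors = CountVectorizer().fit_transform([ground_truth, answer])
        score = float(cosine_similarity(vectors[0], vectors[1])[0][0])
        results[model] = {"answer": answer, "score": round(score, 3)}
    best = max(results, key=lambda m: results[m]["score"])
    return {"results": results, "best_model": best}
```

Swapping in another metric (e.g., embedding cosine similarity) only means replacing the vectorizer line.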
- Frontend: React (Vite) UI, connects via Socket.IO and displays scores/results.
- Backend: Flask + Flask‑SocketIO for real‑time messaging.
- Retriever: SeleniumURLLoader → OpenAIEmbeddings → FAISS index (build sketch after this list).
- Models: OpenAI GPT‑3.5 and GPT‑4; optional LLaMA via Replicate.
- Evaluator: token‑overlap baseline (sklearn), easy to swap for other metrics.
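A sketch of how the index might be built and queried. Module paths follow LangChain’s current split packages and may differ by version; the URL is a stand‑in for the scraped UAE.gov pages.

```python
# Build the FAISS index once, then reuse it at query time.
# Requires OPENAI_API_KEY in the environment and Chrome for Selenium.
from langchain_community.document_loaders import SeleniumURLLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

urls = ["https://u.ae/en/information-and-services"]  # example source page

docs = SeleniumURLLoader(urls=urls).load()  # drives Chrome to scrape the pages
chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
index = FAISS.from_documents(chunks, OpenAIEmbeddings())
index.save_local("faiss_index")  # persist so pages aren't re-scraped each run

# Query time: the top-k chunks feed both the model prompts and the ground truth.
retrieved = index.similarity_search("How do I renew an Emirates ID?", k=4)
```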
- Python 3.11+ (3.12 recommended)
- Node.js 18+
- Google Chrome (for Selenium)
- Conda or venv
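The model clients read their credentials from environment variables. The names below are the openai and replicate libraries’ defaults; confirm them against the project’s own config:

```bash
export OPENAI_API_KEY="sk-..."       # GPT-3.5/GPT-4 and embeddings
export REPLICATE_API_TOKEN="r8_..."  # only if using LLaMA via Replicate
```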
Backend:

```bash
cd backend
conda create -n pinecone-bee python=3.12 -y
conda activate pinecone-bee
pip install -r requirements.txt
python main.py
```
Frontend (in a second terminal):

```bash
cd frontend
npm install
npm run dev
```
