An experimental platform to test whether LLMs can reason strategically or just pattern-match to game theory terminology. Built with Next.js, Supabase, and the Vercel AI SDK.
Can LLMs think strategically, or are they sophisticated echo chambers?
This project recreates Robert Axelrod's famous Prisoner's Dilemma tournaments (popularized in his 1984 book *The Evolution of Cooperation*), but with LLMs as the players. By testing models in both overt (explicit game theory framing) and cloaked (business scenario framing) conditions, we can measure whether models genuinely reason about strategic dynamics or simply retrieve training-data patterns.
| Provider | Model |
|---|---|
| Anthropic | Claude Sonnet 4.5, Claude Opus 4.5 |
| OpenAI | GPT-5.1 Thinking |
| xAI | Grok 4.1 Fast Reasoning |
| Google | Gemini 3 Pro Preview |
| Perplexity | Sonar Pro |
| Moonshot | Kimi K2 Thinking Turbo |
| DeepSeek | DeepSeek V3.2 Thinking |
Overt Prompt: Classic Prisoner's Dilemma framing with explicit payoff matrix and game theory terminology.
Cloaked Prompts: The same payoff structure disguised as:
- Sales Territory - Regional directors deciding to SHARE or HOLD leads
- Research Lab - Competing labs choosing OPEN or GUARDED data sharing
- Content Creator - YouTubers deciding to SUPPORT or stay INDEPENDENT
If behavior is consistent across framings → evidence for genuine strategic reasoning. If behavior diverges significantly → evidence for sophisticated pattern matching.
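A minimal sketch of how the overt and cloaked framings can share one payoff structure while swapping only the surface vocabulary (the project's real templates live in `lib/prompts.ts`; every name and wording below is illustrative):

```typescript
// Illustrative sketch only; the actual templates live in lib/prompts.ts.
type Scenario = "overt" | "sales" | "research" | "creator";

interface FramingConfig {
  cooperateLabel: string; // the action scored as "cooperate"
  defectLabel: string;    // the action scored as "defect"
  coverStory: string;     // scenario-specific framing text
}

const FRAMINGS: Record<Scenario, FramingConfig> = {
  overt:    { cooperateLabel: "COOPERATE", defectLabel: "DEFECT",
              coverStory: "You are playing an iterated Prisoner's Dilemma." },
  sales:    { cooperateLabel: "SHARE", defectLabel: "HOLD",
              coverStory: "You are a regional director deciding whether to share leads." },
  research: { cooperateLabel: "OPEN", defectLabel: "GUARDED",
              coverStory: "Your lab must decide how openly to share data with a rival lab." },
  creator:  { cooperateLabel: "SUPPORT", defectLabel: "INDEPENDENT",
              coverStory: "You are a YouTuber deciding whether to support a fellow creator." },
};

// Same payoff structure everywhere; only the vocabulary changes, so any
// behavioral divergence implicates the framing, not the incentives.
function buildPrompt(scenario: Scenario, history: string): string {
  const f = FRAMINGS[scenario];
  return `${f.coverStory}\n\nHistory so far:\n${history}\n\nRespond with ${f.cooperateLabel} or ${f.defectLabel}.`;
}
```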
| | Opponent Cooperates | Opponent Defects |
|---|---|---|
| You Cooperate | 3, 3 | 0, 5 |
| You Defect | 5, 0 | 1, 1 |
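The matrix translates directly into scoring code. A minimal sketch (the project's actual logic lives in `lib/game-logic.ts`; the function and type names here are illustrative):

```typescript
type Move = "cooperate" | "defect";

// Returns [yourScore, opponentScore] per the payoff matrix above.
function scoreRound(you: Move, opponent: Move): [number, number] {
  if (you === "cooperate" && opponent === "cooperate") return [3, 3]; // mutual cooperation
  if (you === "cooperate" && opponent === "defect") return [0, 5];    // sucker's payoff
  if (you === "defect" && opponent === "cooperate") return [5, 0];    // temptation payoff
  return [1, 1];                                                      // mutual defection
}
```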
- Framework: Next.js 15 (App Router)
- Database: Supabase (PostgreSQL)
- AI: Vercel AI SDK with AI Gateway (see the sketch after this list)
- Styling: Tailwind CSS + shadcn/ui
- Charts: Recharts
- Animations: Framer Motion
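For context, a hedged sketch of how a single decision might be requested through the Vercel AI SDK, assuming AI SDK 5's string model IDs that route through the AI Gateway when `AI_GATEWAY_API_KEY` is set (the model ID and helper name are illustrative, not the project's actual run-match code):

```typescript
import { generateText } from "ai";

// Illustrative helper: ask one model for one round's decision.
async function getDecision(modelId: string, prompt: string) {
  const { text, usage } = await generateText({
    model: modelId, // e.g. "anthropic/claude-sonnet-4.5", resolved via the AI Gateway
    prompt,
  });
  return { decision: text.trim().toUpperCase(), usage }; // usage feeds token tracking
}
```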
```
├── app/
│   ├── api/
│   │   ├── run-match/           # Main game loop
│   │   ├── model-stats/         # Analytics API
│   │   └── cancel-match/        # Match cancellation
│   ├── model-explorer/          # Analytics dashboard
│   └── page.tsx                 # Main game interface
├── components/
│   ├── game-feed.tsx            # Live match visualization
│   ├── test-match-modal.tsx     # Match configuration
│   ├── experiment-design.tsx    # Methodology documentation
│   └── strategy-stats.tsx       # Behavior metrics
├── lib/
│   ├── prompts.ts               # Overt & cloaked prompt templates
│   ├── models.ts                # Model definitions
│   ├── game-logic.ts            # Scoring & outcome logic
│   └── supabase/                # Database utilities
└── scripts/                     # Database migrations
```
- Live Match Streaming: Watch AI decisions in real time with round-by-round visualization
- Model Explorer: Bar charts comparing cooperation rates, wins/losses, scenario results, and errors
- Prompt Templates: View exact prompts used in each scenario type
- Error Tracking: Categorized error logging (format, timeout, parse, API)
- Token Tracking: Monitor reasoning effort via token consumption
- `matches` - Match metadata, models, scores, scenarios
- `rounds` - Individual round decisions, tokens, errors
- `game_rounds` - Aggregated game records for display
- `ai_models` - Model registry
- Cooperation/Defection per round
- Token usage (input/output per model)
- Error types and messages
- Scenario type per game
- Win/loss/draw outcomes
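Putting those signals together, a logged round might look roughly like this (an illustrative TypeScript shape only; the real column names live in the Supabase migrations under `scripts/`):

```typescript
// Illustrative shape, not the actual schema.
interface RoundRecord {
  matchId: string;
  roundNumber: number;
  scenario: "overt" | "sales" | "research" | "creator";
  decisions: { a: "cooperate" | "defect"; b: "cooperate" | "defect" };
  scores: { a: number; b: number };
  tokens: { input: number; output: number }; // per-model reasoning-effort proxy
  error?: { type: "format" | "timeout" | "parse" | "api"; message: string };
}
```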
```
pnpm install
pnpm dev
```

Create a `.env.local` file with:

```
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
AI_GATEWAY_API_KEY=your_ai_gateway_key
```