Livepair

Livepair is a realtime multimodal desktop assistant. It uses the Gemini Live API to combine voice, screen context, and transcript-aware responses in an Electron app, backed by a NestJS API designed to run on Google Cloud Run.

Quick Start

Prerequisites

Node.js LTS
pnpm 9.x
Docker Engine with Docker Compose
Linux desktop environment with microphone and screen-capture support

1) Install dependencies

pnpm install

2) Create local environment files

cp apps/api/.env.example apps/api/.env
cp apps/desktop/.env.example apps/desktop/.env
cp infra/postgres/.env.example infra/postgres/.env

3) Set required environment values

In apps/api/.env, set your Gemini API key:

GEMINI_API_KEY=your-gemini-api-key

Keep these values aligned:

SESSION_TOKEN_AUTH_SECRET must match in apps/api/.env and apps/desktop/.env
SESSION_TOKEN_LIVE_MODEL in the API should match VITE_LIVE_MODEL in the desktop app
VITE_LIVE_API_VERSION should remain v1alpha for the current speech flow
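
The alignment rules above can be checked mechanically. The following is a minimal sketch, not part of the repository: parseEnv and findMisalignments are hypothetical helpers, and the sketch assumes VITE_LIVE_API_VERSION lives in apps/desktop/.env because of its VITE_ prefix.

```typescript
// Hypothetical helpers (not part of the Livepair repo): parse KEY=VALUE text
// and report values that break the alignment rules listed above.
type Env = Record<string, string>;

function parseEnv(text: string): Env {
  const env: Env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.charAt(0) === "#") continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // ignore lines without a key
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

function findMisalignments(api: Env, desktop: Env): string[] {
  const problems: string[] = [];
  const secret = api["SESSION_TOKEN_AUTH_SECRET"];
  if (!secret || secret !== desktop["SESSION_TOKEN_AUTH_SECRET"]) {
    problems.push("SESSION_TOKEN_AUTH_SECRET is missing or differs");
  }
  const model = api["SESSION_TOKEN_LIVE_MODEL"];
  if (!model || model !== desktop["VITE_LIVE_MODEL"]) {
    problems.push("SESSION_TOKEN_LIVE_MODEL does not match VITE_LIVE_MODEL");
  }
  // Assumption: the API version is read from the desktop env file.
  if (desktop["VITE_LIVE_API_VERSION"] !== "v1alpha") {
    problems.push("VITE_LIVE_API_VERSION is not v1alpha");
  }
  return problems;
}
```

An empty result means the two files agree on all three rules; each returned string names one rule that failed.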

4) Start local PostgreSQL

make postgres-up

5) Run the backend

pnpm --filter @livepair/api dev

The API is available at http://127.0.0.1:3000 by default.

6) Run the desktop app

pnpm --filter @livepair/desktop dev

If you want to start both apps together instead:

pnpm run dev

Useful checks:

make smoke-check
curl http://127.0.0.1:3000/health

7) Build Ubuntu desktop artifacts

The repository includes a Linux packaging flow for the desktop app that reuses the Electron runtime already installed in the workspace.

Build the artifacts from the repo root:

pnpm dist:desktop:linux

Artifacts are written under apps/desktop/release/linux/<arch>/:

portable Linux bundle directory
.deb package
.AppDir staging directory
.AppImage when appimagetool is installed on the host

Production configuration is still provided outside version control. Create a livepair.env file from apps/desktop/.env.example and set at least:

SESSION_TOKEN_AUTH_SECRET=replace-with-your-production-secret
BACKEND_URL=https://your-production-api.example.com

Use the env file in one of these locations:

next to the portable launcher or extracted .AppImage as livepair.env
/etc/livepair/livepair.env for the installed .deb
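
As a rough illustration of the two locations above, the sketch below lists the candidate paths and picks the first one that exists. The function names and the lookup order (launcher directory first, then /etc) are assumptions for illustration; the packaged app may resolve the file differently.

```typescript
// Hypothetical lookup sketch; the real desktop app may use a different order.
function candidateEnvFiles(launcherDir: string): string[] {
  const trimmed = launcherDir.replace(/\/+$/, ""); // drop trailing slashes
  return [
    trimmed + "/livepair.env",      // next to the portable launcher or extracted .AppImage
    "/etc/livepair/livepair.env",   // installed .deb
  ];
}

// `exists` is injected so the sketch stays free of filesystem dependencies.
function pickEnvFile(
  launcherDir: string,
  exists: (path: string) => boolean
): string | undefined {
  for (const candidate of candidateEnvFiles(launcherDir)) {
    if (exists(candidate)) return candidate;
  }
  return undefined;
}
```

Passing the existence check as a parameter keeps the sketch testable; in a real Node process it could be backed by a filesystem call.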

If you also want the packaging command to emit a real .AppImage, install appimagetool on the Ubuntu build host before running pnpm dist:desktop:linux.

What it does

Livepair gives users a desktop assistant that can listen, respond, and use screen context during a live session.

Starts a speech session by requesting an ephemeral Gemini token from the backend
Connects directly from the desktop app to Gemini Live API for low-latency realtime interaction
Accepts voice plus typed follow-up turns inside the same active Live session
Shows transcript and conversation state in the desktop UI
Persists chats, messages, summaries, and live-session metadata through backend chat-memory APIs

Why it matters

It makes desktop assistance feel conversational instead of step-by-step and modal
It keeps latency low by keeping the backend out of the audio/video hot path
It combines voice, screen context, and transcript feedback in one workflow
It uses ephemeral tokens and a strict Electron bridge for safer local AI interactions

Key capabilities

Gemini-powered voice interaction: speech mode is built on Gemini Live API
Realtime transcript handling: transcript and response state update during the session
Multimodal screen context: users can share screen context in manual or continuous modes during an active Live session
Interruption support: local barge-in handling stops playback quickly when the user speaks
Durable memory: the backend stores chats, messages, summaries, and live-session records in Postgres
Session continuity: the desktop supports token refresh and session resumption flows
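
To make the interruption behaviour concrete, here is a minimal sketch of local barge-in: queued assistant audio is dropped as soon as user speech is detected, and late chunks from the interrupted reply are ignored. PlaybackQueue and its methods are hypothetical and do not mirror the desktop app's actual classes.

```typescript
// Hypothetical playback queue illustrating local barge-in handling.
class PlaybackQueue {
  private chunks: Uint8Array[] = [];
  private interrupted = false;

  // Queue assistant audio unless the current reply was interrupted.
  enqueue(chunk: Uint8Array): void {
    if (this.interrupted) return; // late audio from the interrupted reply is dropped
    this.chunks.push(chunk);
  }

  // Hand the next chunk to the audio output, if any.
  next(): Uint8Array | undefined {
    return this.chunks.shift();
  }

  // Called when the user starts speaking: clear pending audio immediately.
  onUserSpeech(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    this.interrupted = true;
    return dropped;
  }

  // Called when a new assistant reply begins, so playback can resume.
  onNewReply(): void {
    this.interrupted = false;
  }

  pending(): number {
    return this.chunks.length;
  }
}
```

Clearing the queue locally, rather than waiting for a server-side signal, is what lets playback stop quickly in this sketch.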

Preview

Screenshots: Welcome Screen, Active Session, Chat History, Preferences, Settings

Current MVP boundaries:

The backend handles control-plane work and persistence, not realtime audio/video proxying
Typed input is available once a Live session is active
Backend-backed tools, checkpoint restore, and broader error-reporting flows are planned but not fully implemented yet

Architecture overview

Desktop app

Built with Electron, React, and TypeScript
Captures microphone input
Manages Live session state, transcript UI, playback, interruption, and screen sharing
Connects directly to Gemini Live API for realtime speech interactions

Backend API

Built with NestJS and TypeScript
Exposes GET /health, POST /session/token, and /chat-memory/*
Issues short-lived Gemini session tokens
Persists durable chat memory in Postgres

Runtime boundary

Important: the backend stays out of the realtime audio/video path. The desktop talks directly to Gemini Live API, while the backend focuses on authentication, health, and persistence.

Tech stack

Desktop: Electron, React, TypeScript
Backend: NestJS, TypeScript
AI: Gemini Developer API, Gemini Live API
Data: PostgreSQL
Cloud: Google Cloud Run, Cloud Build, Artifact Registry
Infrastructure: Terraform modules under infra/terraform

Google Cloud deployment

The backend deployment path is built for Google Cloud.

Runtime: Google Cloud Run
CI/CD: cloudbuild.yaml performs build, push, migration, deploy, and smoke test steps
Images: Artifact Registry stores the API and migration images
Infrastructure as code: Terraform modules manage Cloud Run service and job shape, Secret Manager wiring, Cloud SQL attachment, ingress, scaling, and IAM
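
The ordered rollout can be pictured as a list of steps where a failure stops everything after it, so a failed migration never reaches deploy. This is a conceptual sketch only; RolloutStep and runRollout are hypothetical names, and the actual ordering is defined by cloudbuild.yaml.

```typescript
// Conceptual sketch of an ordered rollout; not the actual Cloud Build logic.
interface RolloutStep {
  name: string;
  run: () => boolean; // returns true on success
}

function runRollout(
  steps: RolloutStep[]
): { completed: string[]; failedAt: string | null } {
  const completed: string[] = [];
  for (const step of steps) {
    if (!step.run()) {
      // Stop at the first failure so later steps never execute.
      return { completed: completed, failedAt: step.name };
    }
    completed.push(step.name);
  }
  return { completed: completed, failedAt: null };
}
```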

For full deployment details, see infra/terraform/README.md.

Architecture diagrams

1) Product architecture overview

This diagram shows the fastest judge-facing story: what runs on the user device, what runs in Google Cloud, and how Gemini fits into the multimodal flow.

flowchart LR
  subgraph Device[User Device]
    U((User))
    D[Desktop App<br/>Electron + React<br/>voice • screen • transcript]
  end

  subgraph Cloud[Google Cloud]
    API[Cloud Run API<br/>NestJS]
    DB[(Cloud SQL<br/>Postgres)]
  end

  subgraph AI[Gemini / Google AI]
    AUTH[Gemini auth_tokens]
    LIVE[Gemini Live API]
  end

  U -->|voice, text, screen| D
  D -->|token + memory APIs| API
  API --> DB
  API -->|mint ephemeral token| AUTH
  D -->|live audio, screen, text| LIVE
  LIVE -->|audio + transcript events| D

Note: the backend is intentionally out of the realtime media path. The desktop connects directly to Gemini Live after the backend mints a short-lived token.
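
Because the token is short-lived, the desktop has to decide when to request a fresh one. The sketch below shows one way to time that decision. The SessionToken shape, the millisecond expiry field, and the safety margin are assumptions for illustration, not the contract returned by POST /session/token.

```typescript
// Hypothetical token shape; the real response fields may differ.
interface SessionToken {
  token: string;
  expiresAtMs: number; // expiry as epoch milliseconds (assumed representation)
}

// True when the token expires within the safety margin (or already has).
function needsRefresh(
  t: SessionToken,
  nowMs: number,
  safetyMarginMs: number
): boolean {
  return t.expiresAtMs - nowMs <= safetyMarginMs;
}

// Delay before a refresh should be scheduled; never negative.
function msUntilRefresh(
  t: SessionToken,
  nowMs: number,
  safetyMarginMs: number
): number {
  return Math.max(0, t.expiresAtMs - safetyMarginMs - nowMs);
}
```

Refreshing ahead of expiry by a margin leaves room for the backend round trip before the current token stops working.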

2) Runtime interaction flow

This diagram shows what happens during a live multimodal session: token issuance, direct Live connection, multimodal input, assistant response, and durable persistence.

sequenceDiagram
  participant U as User
  participant D as Desktop App
  participant API as Backend API
  participant G as Gemini Live API
  participant DB as Cloud SQL

  U->>D: Start live session
  D->>API: POST /session/token
  API->>G: Mint ephemeral token (auth_tokens)
  G-->>API: Token + expiry
  API-->>D: Token + expiry

  D->>G: Open Gemini Live session
  U->>D: Speak
  opt Screen context enabled
    U->>D: Share screen
  end
  D->>G: Stream mic audio + screen + typed follow-up
  G-->>D: Assistant audio + transcript + session events
  D-->>U: Play reply + render live transcript UI

  D->>API: Persist chat + live-session state
  API->>DB: Store durable memory

Note: typed follow-up turns currently reuse the active Gemini Live session; there is no separate backend text-chat endpoint in the current repo state.

3) Google Cloud infrastructure

This diagram makes the deployed Google Cloud footprint obvious while keeping the presentation simple enough for README and Devpost.

flowchart LR
  subgraph Client[Client]
    Desktop[Desktop App<br/>Electron + React]
  end

  subgraph GCP[Google Cloud]
    subgraph Delivery[Delivery]
      Build[Cloud Build]
      AR[Artifact Registry]
    end

    subgraph Runtime[Runtime]
      Run[Cloud Run API<br/>NestJS]
      Job[Migration Job<br/>Cloud Run]
      SQL[(Cloud SQL<br/>Postgres)]
      SM[Secret Manager]
      Obs[Logs + Alerts]
    end

    SA[Runtime Service Accounts]
  end

  subgraph AI[Gemini / Google AI]
    Auth[Gemini auth_tokens]
    Live[Gemini Live API]
  end

  Desktop -->|health, token, memory APIs| Run
  Desktop -->|audio, screen, typed turns| Live

  Run -->|mint token| Auth
  Run --> SQL
  Job --> SQL

  Build --> AR
  AR --> Run
  AR --> Job
  Build -->|deploy| Run
  Build -->|update + execute| Job

  SM -->|secret env vars| Run
  SM -->|DATABASE_URL| Job
  SA -.attached to .-> Run
  SA -.attached to .-> Job

  Run -->|structured logs| Obs
  Obs -->|uptime + telemetry alerts| Run

Note: this matches the current Terraform and deployment files. The diagram omits a load balancer, VPC connector, Redis, Pub/Sub, and queues because none of them are provisioned in this repo today.

Project structure

.
├── apps/
│   ├── api/                # NestJS backend API
│   └── desktop/            # Electron + React desktop app
├── packages/
│   └── shared-types/       # Shared serializable contracts
├── infra/                  # Deployment and local infrastructure
├── docs/                   # Architecture and supporting docs
├── cloudbuild.yaml         # Google Cloud build/deploy pipeline
└── THIRD_PARTY_NOTICES.md  # Third-party runtime notices

Development instructions

The final API container image also carries the SQL migration files and an API-local migration script, so a Cloud Run Job can run the runtime-safe command below instead of a workspace-level pnpm --filter ... invocation:

npm run migration:up

If you want to mimic Cloud Run's default port locally, override PORT when you start the container:

docker run --rm \
  -p 8080:8080 \
  --env-file apps/api/.env \
  -e PORT=8080 \
  livepair-api:local

GET /health should respond without a database connection. Routes backed by durable chat-memory persistence still require a reachable Postgres via DATABASE_URL.

☁️ API deploy pipeline

Wave 6 turns the API path into a staging-first CD flow:

GitHub Actions deploys main automatically to staging
Production deploys are a separate manual workflow step
cloudbuild.yaml now performs the full ordered rollout: build, push, migrate, deploy, smoke-test
Deploys use immutable commit-SHA image tags for both the API image and the migration image
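
As an illustration of commit-SHA tagging, the sketch below builds an Artifact Registry image reference and rejects anything that is not a full commit SHA. The imageRef helper is hypothetical, and whether cloudbuild.yaml composes the reference exactly this way is an assumption; the hostname pattern follows the standard Artifact Registry Docker format.

```typescript
// Hypothetical helper: builds REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG
// and insists on a full 40-character commit SHA so tags stay immutable.
function imageRef(
  region: string,
  projectId: string,
  repository: string,
  imageName: string,
  commitSha: string
): string {
  if (!/^[0-9a-f]{40}$/.test(commitSha)) {
    throw new Error("expected a full 40-character commit SHA, got: " + commitSha);
  }
  return (
    region + "-docker.pkg.dev/" + projectId + "/" + repository + "/" +
    imageName + ":" + commitSha
  );
}
```

Rejecting mutable tags such as latest means a given tag always points at the same build, which keeps rollbacks and audits unambiguous.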

Responsibility stays split on purpose:

Terraform remains the source of truth for Artifact Registry, Cloud Run service shape, Cloud Run migration job shape, runtime service accounts, Secret Manager wiring, Cloud SQL attachment, scaling, ingress, and public/private access.
Cloud Build owns the ordered rollout execution.
GitHub Actions owns the staging and production entry points.

The Cloud Run Terraform modules ignore image-only drift so a later terraform apply does not roll back a successful release. Keep both api_service.image and api_migration_job.image in the environment terraform.tfvars files pointed at valid bootstrap images for first creation or any future recreate.

Manual fallback:

PROJECT_ID=your-gcp-project-id
REGION=us-central1
REPOSITORY=livepair-staging-containers
SERVICE=livepair-staging-api
MIGRATION_JOB=livepair-staging-api-migrate
IMAGE_TAG="$(git rev-parse HEAD)"

gcloud builds submit \
  --project "$PROJECT_ID" \
  --config cloudbuild.yaml \
  --substitutions=_REGION="$REGION",_AR_REPOSITORY="$REPOSITORY",_IMAGE_NAME=api,_MIGRATION_IMAGE_NAME=api-migrator,_IMAGE_TAG="$IMAGE_TAG",_SERVICE_NAME="$SERVICE",_MIGRATION_JOB_NAME="$MIGRATION_JOB",_SMOKE_PATH=/health \
  .

That path keeps secrets out of the image and out of the pipeline config. Runtime secrets still stay in Secret Manager and are injected by the Terraform-managed Cloud Run service/job.

For the full operator flow, including GitHub environment setup, manual migration reruns, and rollback commands, see infra/terraform/README.md.

🐘 Local infrastructure helpers

make postgres-up
make postgres-down
make postgres-reset

Database and smoke checks

pnpm migration:up
make smoke-check

Run workspace checks

pnpm lint
pnpm typecheck
pnpm test

Focused package checks:

pnpm verify:api
pnpm verify:desktop
pnpm verify:shared-types

Optional API container build

docker build -f apps/api/Dockerfile -t livepair-api:local .
docker run --rm -p 3000:3000 --env-file apps/api/.env livepair-api:local

Helpful docs

docs/ARCHITECTURE.md for the current architecture and product model
docs/MILESTONE_MATRIX.md for implementation status
docs/KNOWN_ISSUES.md for known gaps and risks

Acknowledgements and notices

Gemini Developer API and Gemini Live API power the assistant experience
Google Cloud Run and Cloud Build power the backend deployment path
Third-party runtime notices are listed in THIRD_PARTY_NOTICES.md
This repository does not currently include a standalone LICENSE file
