Livepair is a realtime multimodal desktop assistant that uses Gemini Live API to combine voice, screen context, and transcript-aware responses in an Electron app backed by a NestJS API that is designed to run on Google Cloud Run.
- Node.js LTS
- pnpm 9.x
- Docker Engine with Docker Compose
- Linux desktop environment with microphone and screen-capture support
pnpm install
cp apps/api/.env.example apps/api/.env
cp apps/desktop/.env.example apps/desktop/.env
cp infra/postgres/.env.example infra/postgres/.env
In apps/api/.env, set your Gemini API key:
GEMINI_API_KEY=your-gemini-api-key
Keep these values aligned:
- SESSION_TOKEN_AUTH_SECRET must match in apps/api/.env and apps/desktop/.env
- SESSION_TOKEN_LIVE_MODEL in the API should match VITE_LIVE_MODEL in the desktop app
- VITE_LIVE_API_VERSION should remain v1alpha for the current speech flow
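The alignment rules above can be checked mechanically before starting the apps. The following is a minimal sketch, not code from the repository: `parseEnv` and `checkEnvAlignment` are hypothetical helpers, and the parser assumes plain `KEY=value` lines without quoting or variable expansion.

```typescript
// Hypothetical helpers for illustration; they are not part of the Livepair repo.
type EnvMap = Record<string, string>;

// Parse simple KEY=value lines, skipping blank lines and # comments.
// Assumption: values are unquoted and contain no variable expansion.
function parseEnv(text: string): EnvMap {
  const env: EnvMap = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", or no "=" at all
    env[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return env;
}

// Return one message per rule that the two env files violate.
function checkEnvAlignment(api: EnvMap, desktop: EnvMap): string[] {
  const problems: string[] = [];
  const apiSecret = api["SESSION_TOKEN_AUTH_SECRET"];
  if (!apiSecret || apiSecret !== desktop["SESSION_TOKEN_AUTH_SECRET"]) {
    problems.push("SESSION_TOKEN_AUTH_SECRET differs between API and desktop");
  }
  if (api["SESSION_TOKEN_LIVE_MODEL"] !== desktop["VITE_LIVE_MODEL"]) {
    problems.push("SESSION_TOKEN_LIVE_MODEL does not match VITE_LIVE_MODEL");
  }
  if (desktop["VITE_LIVE_API_VERSION"] !== "v1alpha") {
    problems.push("VITE_LIVE_API_VERSION is not v1alpha");
  }
  return problems;
}
```

An empty result means the three rules hold. Reading the two files from disk is left to the caller so the check itself stays free of file-system dependencies.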
make postgres-up
pnpm --filter @livepair/api dev
The API is available at http://127.0.0.1:3000 by default.
pnpm --filter @livepair/desktop dev
If you want to start both apps together instead:
pnpm run dev
Useful checks:
make smoke-check
curl http://127.0.0.1:3000/health
The repository now includes a Linux packaging flow for the desktop app that uses the Electron runtime already installed in the workspace.
Build the artifacts from the repo root:
pnpm dist:desktop:linux
Artifacts are written under apps/desktop/release/linux/<arch>/:
- portable Linux bundle directory
- .deb package
- .AppDir staging directory
- .AppImage when appimagetool is installed on the host
Production configuration is still provided outside version control. Create a livepair.env file from apps/desktop/.env.example and set at least:
SESSION_TOKEN_AUTH_SECRET=replace-with-your-production-secret
BACKEND_URL=https://your-production-api.example.com
Use the env file in one of these locations:
- next to the portable launcher or extracted .AppImage as livepair.env
- /etc/livepair/livepair.env for the installed .deb
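As an illustration of the two locations above, the lookup could be sketched like this. Both function names are hypothetical, and the search order (launcher directory first, then `/etc/livepair`) is an assumption; the source does not say which location takes precedence when both exist.

```typescript
// Hypothetical sketch; the real desktop app may resolve its env file differently.

// Build the candidate paths named in the README.
// Assumption: the launcher-adjacent file is tried before the system-wide one.
function candidateEnvFiles(launcherDir: string): string[] {
  const dir = launcherDir.replace(/\/+$/, ""); // drop trailing slashes
  return [`${dir}/livepair.env`, "/etc/livepair/livepair.env"];
}

// Pick the first candidate that exists. The existence check is injected
// so the function stays independent of any file-system API.
function pickEnvFile(
  candidates: string[],
  exists: (path: string) => boolean,
): string | undefined {
  return candidates.find((path) => exists(path));
}
```

In a Node or Electron main process, the `exists` argument would typically be a thin wrapper around a file-system check.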
If you also want the packaging command to emit a real .AppImage, install appimagetool on the Ubuntu build host before running pnpm dist:desktop:linux.
Livepair gives users a desktop assistant that can listen, respond, and use screen context during a live session.
- Starts a speech session by requesting an ephemeral Gemini token from the backend
- Connects directly from the desktop app to Gemini Live API for low-latency realtime interaction
- Accepts voice plus typed follow-up turns inside the same active Live session
- Shows transcript and conversation state in the desktop UI
- Persists chats, messages, summaries, and live-session metadata through backend chat-memory APIs
- It makes desktop assistance feel conversational instead of step-by-step and modal
- It keeps latency low by keeping the backend out of the audio/video hot path
- It combines voice, screen context, and transcript feedback in one workflow
- It uses ephemeral tokens and a strict Electron bridge for safer local AI interactions
- Gemini-powered voice interaction: speech mode is built on Gemini Live API
- Realtime transcript handling: transcript and response state update during the session
- Multimodal screen context: users can share screen context in manual or continuous modes during an active Live session
- Interruption support: local barge-in handling stops playback quickly when the user speaks
- Durable memory: the backend stores chats, messages, summaries, and live-session records in Postgres
- Session continuity: the desktop supports token refresh and session resumption flows
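The token refresh mentioned under session continuity comes down to a timing decision. Here is a minimal sketch of that decision, assuming the desktop knows the token expiry as a millisecond timestamp; the function names and the 60-second safety margin are illustrative choices, not values taken from the repository.

```typescript
// Hypothetical helpers; names and the default margin are assumptions.

// True when the token is already within `marginMs` of its expiry.
function shouldRefreshToken(
  expiresAtMs: number,
  nowMs: number,
  marginMs: number = 60000,
): boolean {
  return expiresAtMs - nowMs <= marginMs;
}

// Milliseconds to wait before refreshing; 0 means refresh immediately.
function msUntilRefresh(
  expiresAtMs: number,
  nowMs: number,
  marginMs: number = 60000,
): number {
  return Math.max(0, expiresAtMs - marginMs - nowMs);
}
```

Passing the current time in explicitly, rather than calling `Date.now()` inside, keeps both functions deterministic and easy to test.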
Screenshots: Welcome Screen, Active Session, Chat History, Preferences, Settings.
Current MVP boundaries:
- The backend handles control-plane work and persistence, not realtime audio/video proxying
- Typed input is available once a Live session is active
- Backend-backed tools, checkpoint restore, and broader error-reporting flows are planned but not fully implemented yet
- Built with Electron, React, and TypeScript
- Captures microphone input
- Manages Live session state, transcript UI, playback, interruption, and screen sharing
- Connects directly to Gemini Live API for realtime speech interactions
- Built with NestJS and TypeScript
- Exposes GET /health, POST /session/token, and /chat-memory/*
- Issues short-lived Gemini session tokens
- Persists durable chat memory in Postgres
Important: the backend stays out of the realtime audio/video path. The desktop talks directly to Gemini Live API, while the backend focuses on authentication, health, and persistence.
- Desktop: Electron, React, TypeScript
- Backend: NestJS, TypeScript
- AI: Gemini Developer API, Gemini Live API
- Data: PostgreSQL
- Cloud: Google Cloud Run, Cloud Build, Artifact Registry
- Infrastructure: Terraform modules under infra/terraform
The backend deployment path is built for Google Cloud.
- Runtime: Google Cloud Run
- CI/CD: cloudbuild.yaml performs build, push, migration, deploy, and smoke test steps
- Images: Artifact Registry stores the API and migration images
- Infrastructure as code: Terraform modules manage Cloud Run service and job shape, Secret Manager wiring, Cloud SQL attachment, ingress, scaling, and IAM
For full deployment details, see infra/terraform/README.md.
This diagram shows the fastest judge-facing story: what runs on the user device, what runs in Google Cloud, and how Gemini fits into the multimodal flow.
flowchart LR
subgraph Device[User Device]
U((User))
D[Desktop App<br/>Electron + React<br/>voice • screen • transcript]
end
subgraph Cloud[Google Cloud]
API[Cloud Run API<br/>NestJS]
DB[(Cloud SQL<br/>Postgres)]
end
subgraph AI[Gemini / Google AI]
AUTH[Gemini auth_tokens]
LIVE[Gemini Live API]
end
U -->|voice, text, screen| D
D -->|token + memory APIs| API
API --> DB
API -->|mint ephemeral token| AUTH
D -->|live audio, screen, text| LIVE
LIVE -->|audio + transcript events| D
Note: the backend is intentionally out of the realtime media path. The desktop connects directly to Gemini Live after the backend mints a short-lived token.
This diagram shows what happens during a live multimodal session: token issuance, direct Live connection, multimodal input, assistant response, and durable persistence.
sequenceDiagram
participant U as User
participant D as Desktop App
participant API as Backend API
participant G as Gemini Live API
participant DB as Cloud SQL
U->>D: Start live session
D->>API: POST /session/token
API->>G: Mint ephemeral token (auth_tokens)
G-->>API: Token + expiry
API-->>D: Token + expiry
D->>G: Open Gemini Live session
U->>D: Speak
opt Screen context enabled
U->>D: Share screen
end
D->>G: Stream mic audio + screen + typed follow-up
G-->>D: Assistant audio + transcript + session events
D-->>U: Play reply + render live transcript UI
D->>API: Persist chat + live-session state
API->>DB: Store durable memory
Note: typed follow-up turns currently reuse the active Gemini Live session; there is no separate backend text-chat endpoint in the current repo state.
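The first steps of the sequence, where the desktop asks the backend for a token before opening the Live session, might look roughly like the sketch below. Only the `POST /session/token` route comes from this README. The response field names (`token`, `expiresAt`), the empty request body, and the injected `PostJson` transport are assumptions, and any authentication the real endpoint requires is omitted because it is not described here.

```typescript
// Hypothetical sketch of the desktop-side token request.
interface SessionToken {
  token: string; // assumed field name
  expiresAt: string; // assumed field name and format
}

// Injected transport: POSTs a JSON body and resolves with the parsed response.
type PostJson = (url: string, body: unknown) => Promise<unknown>;

async function requestSessionToken(
  backendUrl: string,
  postJson: PostJson,
): Promise<SessionToken> {
  const url = `${backendUrl.replace(/\/+$/, "")}/session/token`;
  const payload = await postJson(url, {});
  if (typeof payload !== "object" || payload === null) {
    throw new Error("unexpected token response");
  }
  const { token, expiresAt } = payload as Record<string, unknown>;
  if (typeof token !== "string" || typeof expiresAt !== "string") {
    throw new Error("token response is missing fields");
  }
  return { token, expiresAt };
}
```

Validating the response shape before use means a backend change surfaces as a clear error at the token step rather than as a failed Live connection later.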
This diagram makes the deployed Google Cloud footprint obvious while keeping the presentation simple enough for README and Devpost.
flowchart LR
subgraph Client[Client]
Desktop[Desktop App<br/>Electron + React]
end
subgraph GCP[Google Cloud]
subgraph Delivery[Delivery]
Build[Cloud Build]
AR[Artifact Registry]
end
subgraph Runtime[Runtime]
Run[Cloud Run API<br/>NestJS]
Job[Migration Job<br/>Cloud Run]
SQL[(Cloud SQL<br/>Postgres)]
SM[Secret Manager]
Obs[Logs + Alerts]
end
SA[Runtime Service Accounts]
end
subgraph AI[Gemini / Google AI]
Auth[Gemini auth_tokens]
Live[Gemini Live API]
end
Desktop -->|health, token, memory APIs| Run
Desktop -->|audio, screen, typed turns| Live
Run -->|mint token| Auth
Run --> SQL
Job --> SQL
Build --> AR
AR --> Run
AR --> Job
Build -->|deploy| Run
Build -->|update + execute| Job
SM -->|secret env vars| Run
SM -->|DATABASE_URL| Job
SA -.attached to .-> Run
SA -.attached to .-> Job
Run -->|structured logs| Obs
Obs -->|uptime + telemetry alerts| Run
Note: this matches the current Terraform and deployment files. The diagram omits a load balancer, VPC connector, Redis, Pub/Sub, and queues because none of them are provisioned in this repo today.
.
├── apps/
│ ├── api/ # NestJS backend API
│ └── desktop/ # Electron + React desktop app
├── packages/
│ └── shared-types/ # Shared serializable contracts
├── infra/ # Deployment and local infrastructure
├── docs/ # Architecture and supporting docs
├── cloudbuild.yaml # Google Cloud build/deploy pipeline
└── THIRD_PARTY_NOTICES.md # Third-party runtime notices
The same final image also carries the SQL migration files and an API-local migration script, so a Cloud Run Job can use the runtime-safe command below instead of any workspace-level pnpm --filter ... invocation:
npm run migration:up
If you want to mimic Cloud Run's default port locally, override PORT when you start the container:
docker run --rm \
-p 8080:8080 \
--env-file apps/api/.env \
-e PORT=8080 \
livepair-api:local
GET /health should respond without a database connection. Routes backed by durable chat-memory persistence still require a reachable Postgres via DATABASE_URL.
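The PORT override above relies on the server reading its port from the environment. A minimal sketch of that resolution follows, assuming a fallback of 3000 to match the local default mentioned earlier; `resolvePort` is a hypothetical name, and the actual API may read its configuration differently.

```typescript
// Hypothetical helper; the real API may resolve its port another way.
function resolvePort(
  env: Record<string, string | undefined>,
  fallback: number = 3000,
): number {
  const raw = env["PORT"];
  // Unset or blank PORT: use the local default.
  if (raw === undefined || raw.trim() === "") return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${raw}`);
  }
  return port;
}
```

With `-e PORT=8080` this returns 8080, and without the variable it falls back to 3000. Failing fast on a malformed value avoids a container that starts but listens on an unexpected port.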
Wave 6 turns the API path into a staging-first CD flow:
- GitHub Actions deploys main automatically to staging
- production deploys are a separate manual workflow step
- cloudbuild.yaml now performs the full ordered rollout: build, push, migrate, deploy, smoke-test
- deploys use immutable commit-SHA image tags for both the API image and the migration image
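To make the commit-SHA tagging concrete, the sketch below composes an image reference in the standard Artifact Registry form `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. The `ImageRef` type and `imageReference` function are hypothetical, and whether cloudbuild.yaml assembles the name in exactly this way is an assumption.

```typescript
// Hypothetical helper; cloudbuild.yaml may compose the reference differently.
interface ImageRef {
  region: string;
  projectId: string;
  repository: string;
  image: string;
  commitSha: string;
}

function imageReference(ref: ImageRef): string {
  // Accept only a full 40-character SHA (the output of `git rev-parse HEAD`),
  // so mutable tags such as "latest" are rejected.
  if (!/^[0-9a-f]{40}$/.test(ref.commitSha)) {
    throw new Error("expected a full 40-character commit SHA");
  }
  return `${ref.region}-docker.pkg.dev/${ref.projectId}/${ref.repository}/${ref.image}:${ref.commitSha}`;
}
```

Because the tag is derived from the commit, the API image and the migration image built from the same commit share a tag and cannot drift apart silently.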
Responsibility stays split on purpose:
- Terraform remains the source of truth for Artifact Registry, Cloud Run service shape, Cloud Run migration job shape, runtime service accounts, Secret Manager wiring, Cloud SQL attachment, scaling, ingress, and public/private access.
- Cloud Build owns the ordered rollout execution.
- GitHub Actions owns the staging and production entry points.
The Cloud Run Terraform modules ignore image-only drift so a later terraform apply does not roll back a successful release. Keep both api_service.image and api_migration_job.image in the environment terraform.tfvars files pointed at valid bootstrap images for first creation or any future recreate.
Manual fallback:
PROJECT_ID=your-gcp-project-id
REGION=us-central1
REPOSITORY=livepair-staging-containers
SERVICE=livepair-staging-api
MIGRATION_JOB=livepair-staging-api-migrate
IMAGE_TAG="$(git rev-parse HEAD)"
gcloud builds submit \
--project "$PROJECT_ID" \
--config cloudbuild.yaml \
--substitutions=_REGION="$REGION",_AR_REPOSITORY="$REPOSITORY",_IMAGE_NAME=api,_MIGRATION_IMAGE_NAME=api-migrator,_IMAGE_TAG="$IMAGE_TAG",_SERVICE_NAME="$SERVICE",_MIGRATION_JOB_NAME="$MIGRATION_JOB",_SMOKE_PATH=/health \
.
That path keeps secrets out of the image and out of the pipeline config. Runtime secrets still stay in Secret Manager and are injected by the Terraform-managed Cloud Run service/job.
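The long `--substitutions` value in the manual fallback is easy to mistype. One way to assemble it from a map is sketched here; `buildSubstitutions` is a hypothetical helper. It enforces the Cloud Build naming rule for user-defined substitutions (leading underscore, uppercase letters, digits, underscores) and, as a simplifying assumption, rejects values containing commas instead of escaping them.

```typescript
// Hypothetical helper for building the --substitutions argument value.
function buildSubstitutions(values: Record<string, string>): string {
  return Object.entries(values)
    .map(([key, value]) => {
      if (!/^_[A-Z0-9_]+$/.test(key)) {
        throw new Error(`Invalid substitution key: ${key}`);
      }
      // Simplification: commas would need gcloud list escaping, so reject them.
      if (value.includes(",")) {
        throw new Error(`Value for ${key} must not contain a comma`);
      }
      return `${key}=${value}`;
    })
    .join(",");
}
```

The returned string can be passed as `--substitutions=<result>`; none of the secrets leave Secret Manager, since only names, regions, and tags go through this path.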
For the full operator flow, including GitHub environment setup, manual migration reruns, and rollback commands, see infra/terraform/README.md.
make postgres-up
make postgres-down
make postgres-reset
pnpm migration:up
make smoke-check
pnpm lint
pnpm typecheck
pnpm test
Focused package checks:
pnpm verify:api
pnpm verify:desktop
pnpm verify:shared-types
docker build -f apps/api/Dockerfile -t livepair-api:local .
docker run --rm -p 3000:3000 --env-file apps/api/.env livepair-api:local
- docs/ARCHITECTURE.md for the current architecture and product model
- docs/MILESTONE_MATRIX.md for implementation status
- docs/KNOWN_ISSUES.md for known gaps and risks
- Gemini Developer API and Gemini Live API power the assistant experience
- Google Cloud Run and Cloud Build power the backend deployment path
- Third-party runtime notices are listed in THIRD_PARTY_NOTICES.md
- This repository does not currently include a standalone LICENSE file





