Inspiration

We wanted to democratize music-making. Traditional jam sessions require instruments, skills, and being in the same physical space. We asked: what if anyone could pick up their phone and instantly become part of a band? What if the barrier to making music together was just scanning a QR code?

What it does

CornellJam transforms any gathering into a live band session. A host opens a session on their laptop, shares a QR code, and players join from their phones. The host controls tempo (80-140 BPM) and transport; players tap, swipe, or use motion controls to play Drums, Bass, Melody, or FX. Every input is quantized to the beat on the server and played through the host's speakers in perfect sync.

Key features:

  • QR Join - No app install, no account needed
  • 4 Instruments - Drums, Bass, Melody, FX with tap/hold/swipe gestures
  • Hand Tracking - Use your hands in front of the camera to play (TensorFlow.js + MediaPipe)
  • Motion Controls - Shake, tilt, and swipe your phone to trigger sounds
  • Auto-Harmony - All notes locked to C major pentatonic scale so everything sounds musical
  • WebRTC Video - See all players in a video grid on the host screen
  • AI Coach - Get real-time feedback on your performance using OpenAI
  • AI Player - Let an AI fill in an instrument slot with genre-based patterns
  • Session Recording - Record and replay jam sessions
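The Auto-Harmony lock above can be sketched as a small pure function (a hypothetical illustration, not the project's actual code): snap any incoming MIDI note to the nearest pitch of the C major pentatonic scale.

```typescript
// C major pentatonic as pitch classes: C, D, E, G, A.
const PENTATONIC = [0, 2, 4, 7, 9];

// Snap an arbitrary MIDI note to the nearest pentatonic pitch,
// also considering the C an octave up (pitch class 12) so that
// B (11) rounds up to C rather than down to A.
function snapToPentatonic(midiNote: number): number {
  const octave = Math.floor(midiNote / 12);
  const pc = midiNote % 12;
  let best = PENTATONIC[0];
  let bestDist = Math.abs(pc - best);
  for (const deg of [...PENTATONIC, 12]) {
    const d = Math.abs(pc - deg);
    if (d < bestDist) {
      best = deg;
      bestDist = d;
    }
  }
  return octave * 12 + best;
}
```

With a lock like this, every tap lands on a consonant note no matter where on the screen it originates, which is what makes a roomful of beginners sound like a band.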

How we built it

Frontend: React 19, Vite 7, Tailwind CSS 4, Framer Motion - delivering a responsive, mobile-first experience with smooth animations.

Backend: Node.js with Express, tRPC for type-safe APIs, Socket.IO for real-time WebSocket communication with sub-50ms latency on LAN.

Audio: Web Audio API - all sounds are synthesized in real-time using oscillators, filters, and envelopes. No audio files are streamed; only lightweight control signals travel over the network. This eliminates buffering latency entirely.
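The percussive envelope shape can be pictured as a pure function of time (an illustrative sketch, not the project's code; in the browser this is the curve that `gain.exponentialRampToValueAtTime` produces on a GainNode feeding an OscillatorNode):

```typescript
// Exponential decay from `peak` down to `floor` over `duration` seconds.
// This is the classic envelope for kick drums and plucked bass sounds.
function expDecay(
  peak: number,
  floor: number,
  t: number,
  duration: number
): number {
  if (t <= 0) return peak;
  if (t >= duration) return floor;
  // Geometric interpolation: equal ratios per unit time.
  return peak * Math.pow(floor / peak, t / duration);
}
```

Because only the trigger event crosses the network, the host synthesizes this envelope locally the instant the message arrives.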

Machine Learning: TensorFlow.js with MediaPipe Hands for real-time hand landmark detection and gesture recognition - play instruments with just your hands in front of the camera.

Real-time Video: WebRTC peer-to-peer connections between players and host, with dynamic TURN server support via Metered for NAT traversal.

Database: TiDB Cloud (MySQL-compatible) with Drizzle ORM for session persistence and player records.

Cloud: AWS (EC2/S3) for hosting and recording storage, Docker for containerization.


AI Integration: OpenAI API for the AI Coach feature; ElevenLabs for potential voice feedback.

Challenges we faced

  1. Timing synchronization: Getting multiple devices to play in perfect sync. We solved this with server-side beat quantization - the server runs the metronome and snaps all player inputs to the nearest beat, compensating for network jitter.

  2. Audio latency: Streaming audio would add 100-500ms of buffering delay. Our solution: synthesize all audio locally on the host device using Web Audio API, triggered by tiny JSON events over WebSockets.

  3. Cross-browser WebRTC: Different browsers handle ICE candidates and peer connections differently. We implemented ICE buffering to handle candidates arriving before SDP is set, plus reconnection logic for dropped connections.

  4. Hand tracking performance: ML models can be heavy. We lazy-load TensorFlow.js and use the "lite" MediaPipe model for real-time 30fps tracking on mobile devices.

  5. iOS motion permissions: iOS 13+ requires explicit user permission for device motion. We built a permission flow that gracefully handles denied permissions with fallback touch controls.
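The server-side beat quantization from challenge 1 can be sketched as follows (function and parameter names are ours, not the project's): the server owns the metronome, so every player event is snapped to the nearest beat of the server clock before it is played.

```typescript
// Snap an event timestamp (ms, server clock) to the nearest beat.
// sessionStartMs anchors the beat grid; bpm sets its spacing.
function quantizeToBeat(
  eventMs: number,
  sessionStartMs: number,
  bpm: number
): number {
  const beatMs = 60_000 / bpm; // duration of one beat at this tempo
  const elapsed = eventMs - sessionStartMs;
  const nearestBeat = Math.round(elapsed / beatMs);
  return sessionStartMs + nearestBeat * beatMs;
}
```

Since the snap happens on the server, network jitter shifts *when* an event arrives but not *when* it sounds: an event 40ms late still lands on the same beat.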

What we learned

  • The power of the Web Audio API for real-time synthesis - creating drums, bass, and synth sounds from scratch with oscillators
  • Socket.IO room management for efficient event broadcasting
  • WebRTC peer connection lifecycle and ICE candidate handling
  • Making ML models performant on mobile browsers
  • The importance of quantization for networked music applications

Built With

Frontend: React, TypeScript, Vite, Tailwind CSS, Framer Motion, Radix UI

Backend: Node.js, Express, tRPC, Socket.IO, Web Audio API

Machine Learning: TensorFlow.js, MediaPipe Hands

Real-time: WebRTC, Socket.IO WebSockets

Database: MySQL, TiDB Cloud via Manus, Drizzle ORM

Cloud Services: AWS Lightsail (Containers, S3)

AI APIs: OpenAI, ElevenLabs

Sponsor Track Justifications

  1. Best Project Built with ElevenLabs

CornellJam integrates ElevenLabs Music Generation API (eleven-music-v1) as a core feature, not just a bolt-on. Our AI Musician agent uses it to:

  • Generate genre-specific backing tracks - Full 60-second jam session backings in Jazz, Rock, Electronic, or Ambient styles at any BPM
  • Create instrument-specific parts - Individual drum patterns, bass lines, melodies, and ambient textures that lock to the session tempo
  • Fill empty instrument slots - When a player drops out, the AI Musician generates contextually appropriate music to keep the band going

This is a novel use case for ElevenLabs beyond text-to-speech - we're using it for real-time music synthesis in a collaborative multiplayer setting. The generated audio integrates seamlessly with our Web Audio pipeline, letting human players jam alongside AI-generated parts.

  2. Technology & Consumer Track by AWS

CornellJam is designed for everyday users - no music theory knowledge, no instruments, no app downloads required. It delights users by:

  • Zero-friction onboarding - Scan a QR code and you're playing in a band. No account creation, no app install.
  • Instant gratification - Auto-harmony (C major pentatonic lock) means everything sounds musical. A complete beginner can pick up a phone and sound good within seconds.
  • Social entertainment - Turns any gathering (parties, classrooms, team events) into an interactive music experience. Players see each other via WebRTC video grid while jamming.
  • Cross-device - Works on any phone or laptop with a browser. Responsive design with motion controls for mobile, touch for tablets, keyboard for desktop.

We're built on AWS infrastructure:

  • Lightsail object storage (S3-compatible) for session recordings (players can replay their jams)
  • Lightsail Containers for consistent deployment and reliable WebSocket connections (sub-50ms real-time sync latency)

This is exactly what the track asks for: a consumer-facing app that people love to use - an entertainment product that makes music accessible to everyone.

  3. OpenAI: Best use of Codex

Our AI Coach feature uses OpenAI's GPT-5.4 mini to provide real-time, personalized music education:

  • Context-aware coaching - The CoachingAgent receives live session context (BPM, current bar, which instruments are active) plus each player's recent performance events
  • Personalized feedback - Generates specific, actionable tips for each musician: "On drums, try adding a ghost note on the 'and' of 2 to add groove"
  • Encouraging tone - Designed to be constructive and motivating, not critical

The agent system (server/agents/coaching.ts) uses structured prompts and parses JSON responses for consistent, parseable feedback that's broadcast to all players via WebSocket. This is AI-powered education in action - helping users learn music skills while they play.
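The JSON-parsing step can be sketched roughly like this (field names and the function are assumptions for illustration, not the project's actual schema in server/agents/coaching.ts): validate the model's output before broadcasting, since LLM responses are not guaranteed to be well-formed.

```typescript
// Shape of one piece of coach feedback (hypothetical schema).
interface CoachTip {
  player: string;
  instrument: string;
  tip: string;
}

// Parse the model's response, tolerating malformed output: anything
// that is not a JSON array of well-typed tips yields an empty list
// rather than crashing the session.
function parseCoachResponse(raw: string): CoachTip[] {
  try {
    const data = JSON.parse(raw);
    if (!Array.isArray(data)) return [];
    return data.filter(
      (t): t is CoachTip =>
        typeof t?.player === "string" &&
        typeof t?.instrument === "string" &&
        typeof t?.tip === "string"
    );
  } catch {
    return [];
  }
}
```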

  4. EdTech Track by ETEST

CornellJam reimagines music education by removing traditional barriers:

Traditional music learning:

  • Requires instrument purchase
  • Needs music theory knowledge
  • Individual practice
  • Teacher-led instruction
  • Slow feedback loop (weekly lessons)

How we remove these barriers:

  • AI Coach - Real-time, personalized feedback on timing, groove, dynamics, and musicality
  • Visual metronome - Beat dots help internalize rhythm and timing
  • Auto-harmony - Players learn pentatonic scales implicitly by playing within them
  • Recording & replay - Review sessions to hear progress over time
  • Multiplayer learning - Beginners learn from watching others, experienced players mentor newcomers

We're making music education accessible, engaging, and social - anyone can learn rhythm, timing, and ensemble playing through play, not practice.
