Demo Online: https://demo.byteplus.com/rtc/solution/aigc
- In the AIGC conversation scenario, the Volcengine AIGC-RTC Server cloud service provides an end-to-end AIGC capability link based on streaming voice by integrating RTC audio and video stream processing, ASR voice recognition, large-model interface calls, and TTS voice generation.
- Users only need to call the standards-based OpenAPI interfaces to configure the required ASR, LLM, and TTS types and parameters. The Volcengine cloud computing service is responsible for edge user access, cloud resource scheduling, audio and video stream compression, text-to-voice conversion, and data subscription and transmission. This simplifies the development process, allowing developers to focus on training and debugging the core capabilities of large models, thus rapidly advancing innovation in AIGC product applications.
- At the same time, Volcengine RTC has mature technologies such as audio 3A processing and video processing, as well as large-scale audio and video chat capabilities. These enable AIGC products to more conveniently support scenarios such as multimodal and multi-person interaction while keeping the interaction natural and efficient.
This demo supports two operation modes with different configuration requirements:
**Flexible Mode**: Allows independent configuration of the ASR, LLM, and TTS components, with multiple provider options.
You need to fill in the following fields under `VOICE_CHAT_MODE` in `./Server/sensitive.js`, according to the ASR/LLM/TTS vendor you use.
In addition, if using `BytePlusArk` as the LLM module, you need to fill in the model endpoint ID in `ArkModel` (file: `src/config/voiceChat/llm.ts`), which can be obtained from the BytePlus Ark Console.
VOICE_CHAT_MODE
├── ASRConfig (Speech Recognition)
│ ├── amazon
│ │ ├── ID (Access Key ID)
│ │ └── Secret (Secret Access Key)
│ ├── google
│ │ └── CredentialsJSON (Service Account JSON)
│ └── BytePlus
│ ├── AppId (BytePlus ASR App ID)
│ ├── AccessToken (BytePlus ASR Access Token)
│ └── ApiResourceId (Service Plan Type)
│
├── TTSConfig (Text-to-Speech)
│ ├── byteplus_Bidirectional_streaming
│ │ └── app
│ │ ├── appid (BytePlus TTS App ID)
│ │ └── token (BytePlus TTS Access Token)
│ ├── amazon
│ │ ├── ID (Access Key ID)
│ │ └── Secret (Secret Access Key)
│ └── openai
│ ├── URL (API Endpoint)
│ └── APIKey (OpenAI API Key)
│
└── LLMConfig (Large Language Model)
├── CustomLLM (OpenAI Compatible)
│ ├── Url (API Endpoint)
│ └── APIKey (API Key)
└── BytePlusArk
└── APIKey (BytePlus Ark API Key)
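As a concrete illustration, the tree above might translate into a `sensitive.js` fragment like the following. This is a minimal sketch showing only the BytePlus providers; the field names mirror the configuration tree, and every value is a placeholder you must replace with your own credentials:

```javascript
// Sketch of VOICE_CHAT_MODE in ./Server/sensitive.js.
// Field names follow the configuration tree above; all values are
// placeholders, and only one provider per component is shown.
const VOICE_CHAT_MODE = {
  ASRConfig: {
    BytePlus: {
      AppId: 'your-byteplus-asr-app-id',
      AccessToken: 'your-byteplus-asr-access-token',
      ApiResourceId: 'your-service-plan-type',
    },
  },
  TTSConfig: {
    byteplus_Bidirectional_streaming: {
      app: {
        appid: 'your-byteplus-tts-app-id',
        token: 'your-byteplus-tts-access-token',
      },
    },
  },
  LLMConfig: {
    BytePlusArk: {
      APIKey: 'your-byteplus-ark-api-key',
    },
  },
};

module.exports = { VOICE_CHAT_MODE };
```

If you use a different vendor (e.g. `amazon` or `google` for ASR), replace the corresponding branch with that vendor's fields from the tree above.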
**Realtime Mode**: Uses OpenAI's integrated ASR+TTS solution, with only the LLM being configurable.
You need to fill in the OpenAI token under `REALTIME_API_MODE` in `./Server/sensitive.js`.
REALTIME_API_MODE
└── LLMConfig (OpenAI Realtime API)
└── Token (OpenAI API Key)
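For Realtime Mode, the corresponding `sensitive.js` fragment is much smaller. A minimal sketch (the Token value is a placeholder; supply your own OpenAI API key):

```javascript
// Sketch of REALTIME_API_MODE in ./Server/sensitive.js.
// Only the OpenAI API key is needed; the value below is a placeholder.
const REALTIME_API_MODE = {
  LLMConfig: {
    Token: 'your-openai-api-key',
  },
};

module.exports = { REALTIME_API_MODE };
```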
- Node Version: 16.0+
- Two terminals are required, to start the server and the front-end page respectively.
- RTC Basic Configuration (`src/config/config.ts`)
  - AppId: Your BytePlus RTC App ID (required)
  - RoomId: Auto-generated UUID or custom room ID (optional)
  - UserId: Auto-generated UUID or custom user ID (optional)
  - Token: A token generated in the BytePlus Console, or leave it undefined for auto-generation; the demo will invoke an API (defined in `./Server/app.js`) to generate the token, which requires your `RTC_APP_KEY` in `./Server/sensitive.js`.
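As a sketch of these fields (the real file is TypeScript; this plain object literal is valid in both JS and TS, and every value below is a placeholder):

```javascript
// Sketch of the basic RTC fields in src/config/config.ts.
// Leaving the optional fields undefined triggers the demo's defaults.
const config = {
  AppId: 'your-rtc-app-id', // required: BytePlus RTC App ID
  RoomId: undefined,        // optional: auto-generated UUID when undefined
  UserId: undefined,        // optional: auto-generated UUID when undefined
  Token: undefined,         // optional: auto-generated via ./Server/app.js
                            // when undefined (requires RTC_APP_KEY in
                            // ./Server/sensitive.js)
};

module.exports = { config };
```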
- Server Configuration (`Server/sensitive.js`)
  - RTC Basic Configuration: Configure `RTC_APP_KEY` if you want the server to auto-generate tokens.
  - API Provider Credentials:
    - Flexible Mode:
      - Configure `ASR`, `TTS`, and `LLM` provider credentials as shown in the configuration tree above.
      - If using `BytePlusArk` as the LLM module, fill in the model endpoint ID in `ArkModel` (file: `src/config/voiceChat/llm.ts`), which can be obtained from the BytePlus Ark Console.
    - Realtime Mode: Only requires the OpenAI `APIKey` in `LLMConfig.Token`.
Refer to the configuration tree structure above for the complete list of required fields.
Please note that both the server and the web need to be started. The steps are as follows:
**Server**

Enter the project root directory:

```shell
cd Server
yarn
node app.js
```

or

```shell
yarn dev
```

**Web**

Enter the project root directory:

```shell
yarn
```

For Flexible Mode (customizable ASR/LLM/TTS):

```shell
yarn dev:flexible
```

For Realtime Mode (OpenAI integrated solution):

```shell
yarn dev:realtime
```

| Issue | Solution |
|---|---|
| After starting the AI agent, there is no response to conversation, or it keeps showing "AI preparing..." | Please check that the model configuration (e.g. the endpoint ID in `src/config/voiceChat/llm.ts`) and other configurations are correct and available. |
| Browser shows `Uncaught (in promise) r: token_error` error | Please check if the RTC Token filled in your project is valid. Verify that the UserId and RoomId used to generate the Token, and the Token itself, match what's configured in the project. The Token may also have expired; try regenerating it. |
| [StartVoiceChat]Failed(Reason: The task has been started. Please do not call the startup task interface repeatedly.) error | If you've set fixed values for RoomId and UserId, repeatedly calling startAudioBot will cause errors. Simply call stopAudioBot first, then call startAudioBot again. |
| Why aren't my devices working normally even though my microphone and camera are functioning? | This may be due to device permissions not being granted. Please check device permission settings. |
| API calls return "Invalid 'Authorization' header, Pls check your authorization header" error | The AK/SK in `Server/sensitive.js` is incorrect. Double-check the credentials for the provider you configured. |
| What is RTC? | Real Time Communication. For more information about RTC concepts, please refer to the official documentation. |
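For the repeated-start error above, the fix can be sketched as follows. `startAudioBot` and `stopAudioBot` are the function names mentioned in the table; the `restartAudioBot` helper and the mock bot are hypothetical, purely to illustrate the call order:

```javascript
// Hypothetical helper: when RoomId/UserId are fixed, stop the running task
// before starting a new one to avoid "The task has been started" errors.
async function restartAudioBot(bot) {
  await bot.stopAudioBot();   // end the existing task first
  await bot.startAudioBot();  // now it is safe to start again
}

// Illustrative mock recording the call order.
const calls = [];
const mockBot = {
  stopAudioBot: async () => { calls.push('stop'); },
  startAudioBot: async () => { calls.push('start'); },
};
```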
If you encounter issues beyond those listed above, please feel free to contact us for feedback.
Refer to OpenAPI Updates for updates related to Real-time Conversational AI.
- 2025-08-04
- Upgrade SDK version to 4.67.2
- Support BytePlus ASR (Speech-to-Text (ASR) - Streaming)
- Fix AI Settings UI
- 2025-07-08
- Upgrade SDK version to 4.66.20
- 2025-06-16
- Updated Demo guide and fixed some parameter errors.
- Modified the endpoint ID of the BytePlus Ark model, which now needs to be filled in by the user.
- 2025-06-06
- Initial Release