# Multi-Agent Realtime Voice Interaction Example

This example demonstrates how to use AgentScope's `ChatRoom` class to create a multi-agent real-time voice interaction system where two AI agents can have autonomous conversations without user input.

## Features

- 🗣️ **Real-time Voice Interaction**: Two agents communicate through voice in real-time
- 🤖 **Autonomous Conversation**: Agents converse with each other without user intervention
- ⚙️ **Customizable Configuration**: Configure agent names and instructions through the web interface
- 🎨 **Modern UI**: Clean, shadcn-inspired interface for easy interaction
- 📊 **Live Transcript**: See the conversation transcripts in real-time

## Architecture

The example uses:

- **Backend**: FastAPI server with WebSocket support
- **Frontend**: HTML5 with Web Audio API for audio playback
- **AgentScope Components**:
  - `ChatRoom`: Manages multiple `RealtimeAgent` instances
  - `RealtimeAgent`: Handles real-time voice interaction with AI models
  - `DashScopeRealtimeModel`: DashScope's Qwen3-Omni realtime model

## Prerequisites

1. **Python Dependencies**:

   ```bash
   pip install agentscope[dashscope]
   pip install fastapi uvicorn
   ```

2. **DashScope API Key**:
   - Set your DashScope API key as an environment variable:

     ```bash
     export DASHSCOPE_API_KEY="your-api-key-here"
     ```

## Usage

1. **Start the Server**:

   ```bash
   python run_server.py
   ```

2. **Open the Web Interface**:
   - Navigate to `http://localhost:8000` in your web browser

3. **Configure Agents**:
   - Set names and instructions for both Agent 1 and Agent 2
   - Example configurations:
     - **Agent 1 (Alice)**: "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
     - **Agent 2 (Bob)**: "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."

4. **Start the Conversation**:
   - Click the "▶️ Start Conversation" button
   - The agents will begin conversing autonomously
   - You'll see transcripts and system messages in the message panel
   - Audio playback will stream in real-time

5. **Stop the Conversation**:
   - Click the "⏹️ Stop Conversation" button when you want to end the session

## How It Works

### Backend Flow

1. **WebSocket Connection**: Client connects via WebSocket to `/ws/{user_id}/{session_id}`
2. **Session Creation**:
   - Client sends `client_session_create` event with agent configurations
   - Server creates two `RealtimeAgent` instances with specified names and instructions
   - Server creates a `ChatRoom` with both agents
   - Server starts the chat room and returns `session_created` event
3. **Message Broadcasting**:
   - `ChatRoom` automatically broadcasts messages between agents
   - All events (audio, transcripts, etc.) are forwarded to the frontend
4. **Session End**: Client sends `client_session_end` event to stop the conversation

### Frontend Flow

1. **WebSocket Setup**: Establishes connection and waits for server events
2. **Session Management**: Sends configuration and manages conversation state
3. **Audio Playback**:
   - Receives base64-encoded PCM16 audio chunks
   - Decodes and queues audio data
   - Uses Web Audio API `ScriptProcessorNode` for streaming playback at 24kHz
4. **Transcript Display**: Shows real-time transcripts from both agents

## Key Components

### ChatRoom

The `ChatRoom` class manages multiple `RealtimeAgent` instances:

- Establishes connections for all agents
- Broadcasts messages between agents automatically
- Forwards events to the frontend
- Handles lifecycle management (start/stop)

### RealtimeAgent

Each `RealtimeAgent`:

- Connects to the DashScope realtime API
- Processes audio input from other agents
- Generates voice responses
- Emits events for transcripts, audio, and status updates

## Customization

### Changing the Model

To use a different model, modify the `DashScopeRealtimeModel` configuration in `run_server.py`:

```python
model=DashScopeRealtimeModel(
    model_name="your-model-name",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
)
```

### Adding More Agents

To add more agents, modify the agent creation section in `run_server.py`:

```python
agent3 = RealtimeAgent(
    name=agent3_name,
    sys_prompt=agent3_instructions,
    model=DashScopeRealtimeModel(
        model_name="qwen3-omni-flash-realtime",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    ),
)

chat_room = ChatRoom(agents=[agent1, agent2, agent3])
```

And update the frontend to include configuration fields for the additional agents.
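
If the number of agents is expected to vary, the per-agent creation code can be driven from a list of configurations instead of hand-written `agent1`, `agent2`, `agent3` variables. The following is a minimal sketch, not code from `run_server.py`: `AgentConfig`, `build_agents`, and `fake_agent` are hypothetical names introduced for illustration, and the stand-in factory returns plain dictionaries so the snippet runs without AgentScope or an API key.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, TypeVar

AgentT = TypeVar("AgentT")


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for one agent's settings (not an AgentScope type)."""

    name: str
    instructions: str


def build_agents(
    configs: Iterable[AgentConfig],
    make_agent: Callable[[AgentConfig], AgentT],
) -> List[AgentT]:
    """Validate the configs, then create one agent per entry, in order."""
    configs = list(configs)
    if len(configs) < 2:
        raise ValueError("a conversation needs at least two agents")
    seen = set()
    for cfg in configs:
        if not cfg.name.strip():
            raise ValueError("agent name must not be empty")
        if not cfg.instructions.strip():
            raise ValueError(f"agent {cfg.name!r} has empty instructions")
        if cfg.name in seen:
            raise ValueError(f"duplicate agent name: {cfg.name!r}")
        seen.add(cfg.name)
    return [make_agent(cfg) for cfg in configs]


# Stand-in factory (assumption). In run_server.py the factory would presumably
# wrap the constructor shown above, for example:
#   lambda cfg: RealtimeAgent(name=cfg.name, sys_prompt=cfg.instructions, model=...)
def fake_agent(cfg: AgentConfig) -> dict:
    return {"name": cfg.name, "sys_prompt": cfg.instructions}


agents = build_agents(
    [
        AgentConfig("Alice", "You are Alice. Keep your responses brief."),
        AgentConfig("Bob", "You are Bob. Keep your responses brief."),
        AgentConfig("Carol", "You are Carol. Keep your responses brief."),
    ],
    fake_agent,
)
print([agent["name"] for agent in agents])
```

With a real factory, the resulting list could be passed straight to `ChatRoom(agents=...)` as in the snippet above. Validating names and instructions up front also catches the empty-instruction case before any connection to the realtime API is attempted.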
## Troubleshooting

### No Audio Playback

- Ensure your browser supports Web Audio API
- Check browser console for audio-related errors
- Verify the audio format matches the expected PCM16 at 24kHz

### Connection Issues

- Verify your DashScope API key is set correctly
- Check that port 8000 is not blocked by firewall
- Review server logs for error messages

### Agents Not Responding

- Ensure both agent configurations have valid instructions
- Check that the instructions encourage conversational behavior
- Review the console logs for API errors

## References

- [AgentScope Documentation](https://modelscope.github.io/agentscope/)
- [DashScope API Documentation](https://help.aliyun.com/zh/model-studio/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)