# Multi-Agent Realtime Voice Interaction Example

This example demonstrates how to use AgentScope's `ChatRoom` class to create a multi-agent real-time voice interaction system where two AI agents can have autonomous conversations without user input.

## Features

- 🗣️ **Real-time Voice Interaction**: Two agents communicate through voice in real time
- 🤖 **Autonomous Conversation**: Agents converse with each other without user intervention
- ⚙️ **Customizable Configuration**: Configure agent names and instructions through the web interface
- 🎨 **Modern UI**: Clean, shadcn-inspired interface for easy interaction
- 📊 **Live Transcript**: See the conversation transcripts in real time

## Architecture

The example uses:
- **Backend**: FastAPI server with WebSocket support
- **Frontend**: HTML5 with Web Audio API for audio playback
- **AgentScope Components**:
  - `ChatRoom`: Manages multiple `RealtimeAgent` instances
  - `RealtimeAgent`: Handles real-time voice interaction with AI models
  - `DashScopeRealtimeModel`: DashScope's Qwen3-Omni realtime model

## Prerequisites

1. **Python Dependencies**:
   ```bash
   # Quote the extra so shells such as zsh do not expand the brackets
   pip install "agentscope[dashscope]"
   pip install fastapi uvicorn
   ```

2. **DashScope API Key**:
   - Set your DashScope API key as an environment variable:
     ```bash
     export DASHSCOPE_API_KEY="your-api-key-here"
     ```

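
Because the server reads the key from the environment, it can help to confirm the variable is visible to Python before starting. The following is a minimal sketch; the helper name `require_api_key` is hypothetical and not part of AgentScope.

```python
import os


def require_api_key(env=None, name="DASHSCOPE_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the server")
    return key
```

Calling `require_api_key()` at the top of a startup script would fail fast with a readable message instead of surfacing later as an API error.
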
## Usage

1. **Start the Server**:
   ```bash
   python run_server.py
   ```

2. **Open the Web Interface**:
   - Navigate to `http://localhost:8000` in your web browser

3. **Configure Agents**:
   - Set names and instructions for both Agent 1 and Agent 2
   - Example configurations:
     - **Agent 1 (Alice)**: "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
     - **Agent 2 (Bob)**: "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."

4. **Start the Conversation**:
   - Click the "▶️ Start Conversation" button
   - The agents will begin conversing autonomously
   - You'll see transcripts and system messages in the message panel
   - Audio playback will stream in real time

5. **Stop the Conversation**:
   - Click the "⏹️ Stop Conversation" button when you want to end the session

## How It Works

### Backend Flow

1. **WebSocket Connection**: Client connects via WebSocket to `/ws/{user_id}/{session_id}`
2. **Session Creation**:
   - Client sends `client_session_create` event with agent configurations
   - Server creates two `RealtimeAgent` instances with specified names and instructions
   - Server creates a `ChatRoom` with both agents
   - Server starts the chat room and returns `session_created` event
3. **Message Broadcasting**:
   - `ChatRoom` automatically broadcasts messages between agents
   - All events (audio, transcripts, etc.) are forwarded to the frontend
4. **Session End**: Client sends `client_session_end` event to stop the conversation

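
The session lifecycle above can be sketched as a small event dispatcher. This is an illustration only, not the code in `run_server.py`: the `type` field, the `agents` payload key, and the `session_ended` and `error` event names are assumptions, and a plain dict stands in for the real `RealtimeAgent` and `ChatRoom` objects.

```python
import json


def handle_client_event(session, raw_message):
    """Update `session` for one client event and return the events to send back.

    `session` is a plain dict standing in for the real per-connection state;
    the JSON payload layout used here is assumed, not taken from the server.
    """
    event = json.loads(raw_message)
    event_type = event.get("type")

    if event_type == "client_session_create":
        if session.get("active"):
            return [{"type": "error", "message": "session already active"}]
        # The real server would build the RealtimeAgent objects and a ChatRoom here.
        session["agents"] = [
            {"name": cfg["name"], "instructions": cfg["instructions"]}
            for cfg in event.get("agents", [])
        ]
        session["active"] = True
        return [{"type": "session_created"}]

    if event_type == "client_session_end":
        # The real server would stop the ChatRoom here.
        session["active"] = False
        return [{"type": "session_ended"}]  # hypothetical acknowledgement event

    return [{"type": "error", "message": f"unknown event type: {event_type!r}"}]
```

Keeping the dispatch logic in a pure function like this makes it possible to unit-test the session state transitions without opening a WebSocket.
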
### Frontend Flow

1. **WebSocket Setup**: Establishes connection and waits for server events
2. **Session Management**: Sends configuration and manages conversation state
3. **Audio Playback**:
   - Receives base64-encoded PCM16 audio chunks
   - Decodes and queues audio data
   - Uses Web Audio API `ScriptProcessorNode` for streaming playback at 24 kHz
4. **Transcript Display**: Shows real-time transcripts from both agents

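
The decode step in the audio playback path can be illustrated outside the browser. The frontend does this in JavaScript; the Python sketch below shows the same arithmetic, assuming mono, little-endian samples. The function name `decode_pcm16_chunk` is hypothetical.

```python
import base64
import struct


def decode_pcm16_chunk(b64_chunk):
    """Decode a base64 string of little-endian PCM16 into floats in [-1.0, 1.0).

    Byte order and channel count are assumptions; adjust if the stream differs.
    """
    raw = base64.b64decode(b64_chunk)
    if len(raw) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    # Each sample is a signed 16-bit integer; divide by 2**15 to normalize.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in samples]
```

At 24 kHz, a chunk of `n` samples covers `n / 24000` seconds of audio, which is useful when sizing the playback queue.
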
## Key Components

### ChatRoom

The `ChatRoom` class manages multiple `RealtimeAgent` instances:
- Establishes connections for all agents
- Broadcasts messages between agents automatically
- Forwards events to the frontend
- Handles lifecycle management (start/stop)

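
To make the broadcast idea concrete, here is a toy sketch in which a message from one participant is delivered to every other participant's inbox. It is not AgentScope's `ChatRoom` implementation; `ToyRoom` and `demo` are hypothetical names used only for illustration.

```python
import asyncio


class ToyRoom:
    """Toy stand-in for the broadcast idea; not AgentScope's ChatRoom."""

    def __init__(self, names):
        # One inbox per participant.
        self.inboxes = {name: asyncio.Queue() for name in names}

    async def broadcast(self, sender, message):
        # Deliver to every participant except the sender.
        for name, inbox in self.inboxes.items():
            if name != sender:
                await inbox.put((sender, message))


async def demo():
    room = ToyRoom(["Alice", "Bob", "Carol"])
    await room.broadcast("Alice", "hello")
    # Report how many messages each participant has waiting.
    return {name: inbox.qsize() for name, inbox in room.inboxes.items()}
```

Running `asyncio.run(demo())` shows that the sender's own inbox stays empty, which is what keeps an agent from responding to its own output.
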
### RealtimeAgent

Each `RealtimeAgent`:
- Connects to the DashScope realtime API
- Processes audio input from other agents
- Generates voice responses
- Emits events for transcripts, audio, and status updates

## Customization

### Changing the Model

To use a different model, modify the `DashScopeRealtimeModel` configuration in `run_server.py`:

```python
model=DashScopeRealtimeModel(
    model_name="your-model-name",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
)
```

### Adding More Agents

To add more agents, modify the agent creation section in `run_server.py`:

```python
# Create the additional agent the same way as the first two
agent3 = RealtimeAgent(
    name=agent3_name,
    sys_prompt=agent3_instructions,
    model=DashScopeRealtimeModel(
        model_name="qwen3-omni-flash-realtime",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    ),
)

# Register all three agents with the chat room
chat_room = ChatRoom(agents=[agent1, agent2, agent3])
```

Then update the frontend to include configuration fields for the additional agents.

## Troubleshooting

### No Audio Playback
- Ensure your browser supports Web Audio API
- Check browser console for audio-related errors
- Verify the audio format matches the expected PCM16 at 24 kHz

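
One way to check the audio format is to save a few received chunks as a WAV file and play it in any audio player. The sketch below assumes mono PCM16 and takes the 24 kHz rate from this README; `chunks_to_wav` is a hypothetical helper, and how you capture the base64 chunks (for example, from the browser's network panel) is up to you.

```python
import base64
import io
import wave


def chunks_to_wav(b64_chunks, sample_rate=24000):
    """Join base64 PCM16 chunks into WAV bytes (assumes mono, 16-bit samples)."""
    pcm = b"".join(base64.b64decode(c) for c in b64_chunks)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)       # mono (assumption)
        wav.setsampwidth(2)       # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

Write the returned bytes to a `.wav` file. If the speech sounds sped up or slowed down, the sample rate probably does not match; if it is pure noise, the byte order or sample width is a more likely culprit.
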
|
### Connection Issues
- Verify your DashScope API key is set correctly
- Check that port 8000 is not blocked by a firewall
- Review server logs for error messages

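
A quick way to tell whether anything is listening on the port is a plain TCP connection attempt. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper name, and the defaults mirror the `http://localhost:8000` address used in this README.

```python
import socket


def port_is_open(host="localhost", port=8000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If this returns `False` while the server is running, check whether the server bound to a different host or port, or whether a firewall is in the way.
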
### Agents Not Responding
- Ensure both agent configurations have valid instructions
- Check that the instructions encourage conversational behavior
- Review the console logs for API errors

## References

- [AgentScope Documentation](https://modelscope.github.io/agentscope/)
- [DashScope API Documentation](https://help.aliyun.com/zh/model-studio/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)