# Multi-Agent Realtime Voice Interaction Example
This example demonstrates how to use AgentScope's `ChatRoom` class to create a multi-agent real-time voice interaction system where two AI agents can have autonomous conversations without user input.
## Features
- 🗣️ **Real-time Voice Interaction**: Two agents communicate through voice in real time
- 🤖 **Autonomous Conversation**: Agents converse with each other without user intervention
- ⚙️ **Customizable Configuration**: Configure agent names and instructions through the web interface
- 🎨 **Modern UI**: Clean, shadcn-inspired interface for easy interaction
- 📊 **Live Transcript**: See the conversation transcripts in real time
## Architecture
The example uses:
- **Backend**: FastAPI server with WebSocket support
- **Frontend**: HTML5 with Web Audio API for audio playback
- **AgentScope Components**:
  - `ChatRoom`: Manages multiple `RealtimeAgent` instances
  - `RealtimeAgent`: Handles real-time voice interaction with AI models
  - `DashScopeRealtimeModel`: Connects to DashScope's Qwen3-Omni realtime model
## Prerequisites
1. **Python Dependencies** (the quotes keep shells such as zsh from expanding the brackets):
   ```bash
   pip install "agentscope[dashscope]"
   pip install fastapi uvicorn
   ```
2. **DashScope API Key**: Set your DashScope API key as an environment variable:
   ```bash
   export DASHSCOPE_API_KEY="your-api-key-here"
   ```
## Usage
1. **Start the Server**:
   ```bash
   python run_server.py
   ```
2. **Open the Web Interface**:
   - Navigate to `http://localhost:8000` in your web browser
3. **Configure Agents**:
   - Set names and instructions for both Agent 1 and Agent 2
   - Example configurations:
     - **Agent 1 (Alice)**: "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
     - **Agent 2 (Bob)**: "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."
4. **Start the Conversation**:
   - Click the "▶️ Start Conversation" button
   - The agents will begin conversing autonomously
   - Transcripts and system messages appear in the message panel
   - Audio streams and plays back in real time
5. **Stop the Conversation**:
   - Click the "⏹️ Stop Conversation" button when you want to end the session
## How It Works
### Backend Flow
1. **WebSocket Connection**: The client connects via WebSocket to `/ws/{user_id}/{session_id}`
2. **Session Creation**:
   - The client sends a `client_session_create` event with the agent configurations
   - The server creates two `RealtimeAgent` instances with the specified names and instructions
   - The server creates a `ChatRoom` with both agents
   - The server starts the chat room and returns a `session_created` event
3. **Message Broadcasting**:
   - `ChatRoom` automatically broadcasts messages between agents
   - All events (audio, transcripts, etc.) are forwarded to the frontend
4. **Session End**: The client sends a `client_session_end` event to stop the conversation
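The event exchange above can be sketched as a small dispatcher. This is a stdlib-only illustration, not the code in `run_server.py`: the event type names come from this README, but the payload field names (`type`, `agents`, `name`, `instructions`) and the `SessionState` / `handle_client_event` names are hypothetical assumptions.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Hypothetical per-connection state kept by the server."""
    running: bool = False
    agent_names: list[str] = field(default_factory=list)


def handle_client_event(state: SessionState, raw: str) -> dict | None:
    """Dispatch one JSON text frame from the WebSocket client.

    Returns the event to send back to the client, or None.
    """
    event = json.loads(raw)
    event_type = event.get("type")

    if event_type == "client_session_create":
        # Assumed payload: {"type": ..., "agents": [{"name": ..., "instructions": ...}, ...]}
        configs = event.get("agents", [])
        # The real server would create the RealtimeAgents and start the ChatRoom here.
        state.agent_names = [cfg["name"] for cfg in configs]
        state.running = True
        return {"type": "session_created", "agents": state.agent_names}

    if event_type == "client_session_end":
        # The real server would stop the ChatRoom here.
        state.running = False
        return None

    # Unknown event types are ignored in this sketch.
    return None
```

Keeping the dispatch logic free of WebSocket and model code makes it easy to test on its own; the WebSocket handler would only read frames, call the dispatcher, and send back any returned event.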
### Frontend Flow
1. **WebSocket Setup**: Establishes the connection and waits for server events
2. **Session Management**: Sends the agent configuration and tracks the conversation state
3. **Audio Playback**:
   - Receives base64-encoded PCM16 audio chunks
   - Decodes and queues the audio data
   - Uses a Web Audio API `ScriptProcessorNode` for streaming playback at 24 kHz
4. **Transcript Display**: Shows real-time transcripts from both agents
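The browser performs the audio steps above in JavaScript; the Python sketch below only mirrors them to make the data format concrete. It assumes little-endian, signed 16-bit mono samples at 24 kHz, and a consumer that pulls fixed-size frames the way a `ScriptProcessorNode` fills its output buffer in each `onaudioprocess` callback. `decode_pcm16_chunk` and `PlaybackQueue` are hypothetical names.

```python
import base64
import struct
from collections import deque

SAMPLE_RATE = 24_000  # Hz, PCM16 mono as described above


def decode_pcm16_chunk(b64_chunk: str) -> list[float]:
    """Decode a base64 PCM16 (little-endian) chunk into floats in [-1.0, 1.0)."""
    raw = base64.b64decode(b64_chunk)
    n_samples = len(raw) // 2
    # '<' = little-endian, 'h' = signed 16-bit integer
    ints = struct.unpack(f"<{n_samples}h", raw[: n_samples * 2])
    return [s / 32768.0 for s in ints]


class PlaybackQueue:
    """Decoded samples waiting to be played, drained in fixed-size frames."""

    def __init__(self) -> None:
        self._samples: deque[float] = deque()

    def push(self, b64_chunk: str) -> None:
        self._samples.extend(decode_pcm16_chunk(b64_chunk))

    def buffered_seconds(self) -> float:
        """How much audio is currently queued."""
        return len(self._samples) / SAMPLE_RATE

    def pull(self, frame_size: int) -> list[float]:
        """Return exactly frame_size samples, padding with silence on underrun."""
        available = min(frame_size, len(self._samples))
        frame = [self._samples.popleft() for _ in range(available)]
        frame.extend([0.0] * (frame_size - len(frame)))
        return frame
```

Padding with silence on underrun means the audio callback always gets a full frame even when the network falls behind, instead of replaying stale data.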
## Key Components
### ChatRoom
The `ChatRoom` class manages multiple `RealtimeAgent` instances:
- Establishes connections for all agents
- Broadcasts messages between agents automatically
- Forwards events to the frontend
- Handles lifecycle management (start/stop)
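To make the broadcasting idea concrete, here is a toy asyncio model of a chat room. It is not AgentScope's `ChatRoom` implementation: `ToyAgent`, `ToyChatRoom`, and the plain string messages are hypothetical stand-ins for `RealtimeAgent` instances and their audio/transcript events.

```python
import asyncio


class ToyAgent:
    """Stand-in for a realtime agent: a name plus an inbox of (sender, message)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox: asyncio.Queue = asyncio.Queue()


class ToyChatRoom:
    """Forwards each agent's output to every *other* agent and to the frontend."""

    def __init__(self, agents: list[ToyAgent]) -> None:
        self.agents = agents
        self.frontend: asyncio.Queue = asyncio.Queue()

    async def publish(self, sender: ToyAgent, message: str) -> None:
        # Every event is also forwarded to the frontend (e.g. for transcripts).
        await self.frontend.put((sender.name, message))
        for agent in self.agents:
            if agent is not sender:  # an agent never receives its own output
                await agent.inbox.put((sender.name, message))


def drain(q: asyncio.Queue) -> list:
    """Collect everything currently in a queue without waiting."""
    items = []
    while not q.empty():
        items.append(q.get_nowait())
    return items


async def demo() -> tuple[list, list, list]:
    alice, bob = ToyAgent("Alice"), ToyAgent("Bob")
    room = ToyChatRoom([alice, bob])
    await room.publish(alice, "Hi Bob!")
    await room.publish(bob, "Hello Alice.")
    return drain(alice.inbox), drain(bob.inbox), drain(room.frontend)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The important detail is the `agent is not sender` check: without it, each agent would also receive its own voice as input.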
### RealtimeAgent
Each `RealtimeAgent`:
- Connects to the DashScope realtime API
- Processes audio input from other agents
- Generates voice responses
- Emits events for transcripts, audio, and status updates
## Customization
### Changing the Model
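To use a different model, modify the `DashScopeRealtimeModel` configuration (for example, its `model_name`) in `run_server.py`:
```python
# Passed as the `model` argument of each RealtimeAgent(...) in run_server.py
model=DashScopeRealtimeModel(
    model_name="your-model-name",  # placeholder: the realtime model to use
    api_key=os.getenv("DASHSCOPE_API_KEY"),
),
```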
### Adding More Agents
To add more agents, modify the agent creation section in `run_server.py`:
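```python
# agent3_name and agent3_instructions are placeholders for the new agent's
# name and instructions (e.g. taken from the frontend's session config).
agent3 = RealtimeAgent(
    name=agent3_name,
    sys_prompt=agent3_instructions,
    model=DashScopeRealtimeModel(
        model_name="qwen3-omni-flash-realtime",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    ),
)

# Register the new agent alongside the existing two
chat_room = ChatRoom(agents=[agent1, agent2, agent3])
```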
You will also need to update the frontend to include configuration fields for the additional agents.
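If the frontend sends a list of agent configurations, the server can build any number of agents in a loop instead of naming `agent3`, `agent4`, and so on by hand. The sketch below is hypothetical: `AgentConfig`, `parse_agent_configs`, `build_agents`, the `name`/`instructions` keys, and the two-agent minimum are assumptions. In `run_server.py`, the `make_agent` factory would construct a `RealtimeAgent` like the one shown above.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class AgentConfig:
    name: str
    instructions: str


def parse_agent_configs(payload: list[dict]) -> list[AgentConfig]:
    """Validate raw config dicts (assumed keys: 'name', 'instructions')."""
    configs: list[AgentConfig] = []
    seen: set[str] = set()
    for i, item in enumerate(payload):
        name = str(item.get("name", "")).strip()
        instructions = str(item.get("instructions", "")).strip()
        if not name or not instructions:
            raise ValueError(f"agent #{i + 1} needs a name and instructions")
        if name in seen:
            raise ValueError(f"duplicate agent name: {name!r}")
        seen.add(name)
        configs.append(AgentConfig(name, instructions))
    if len(configs) < 2:
        raise ValueError("a conversation needs at least two agents")
    return configs


def build_agents(configs: list[AgentConfig], make_agent: Callable[[AgentConfig], T]) -> list[T]:
    """Create one agent per config using the supplied factory."""
    return [make_agent(cfg) for cfg in configs]


# Hypothetical wiring inside run_server.py:
# agents = build_agents(configs, lambda cfg: RealtimeAgent(
#     name=cfg.name, sys_prompt=cfg.instructions, model=DashScopeRealtimeModel(...)))
# chat_room = ChatRoom(agents=agents)
```

Validating up front turns an empty or duplicated configuration into a clear error before any model connection is opened.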
## Troubleshooting
### No Audio Playback
- Ensure your browser supports the Web Audio API
- Check the browser console for audio-related errors
- Verify that the audio is 16-bit PCM (PCM16) at 24 kHz, as the player expects
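One way to sanity-check the format is to compute how long each chunk should play. PCM16 mono uses 2 bytes per sample, so one second at 24 kHz is 48,000 bytes; a 9,600-byte chunk, for example, should last 9600 / 2 / 24000 = 0.2 s. An odd byte count indicates a truncated chunk, and audio that sounds sped up and higher-pitched (or slowed down and lower-pitched) usually means the samples are being played at a rate other than 24 kHz. The helper below is a hypothetical sketch that assumes mono audio.

```python
import base64

BYTES_PER_SAMPLE = 2  # PCM16
CHANNELS = 1          # mono (assumed)
SAMPLE_RATE = 24_000  # Hz


def chunk_duration_seconds(b64_chunk: str, sample_rate: int = SAMPLE_RATE) -> float:
    """Return the playback duration of a base64 PCM16 chunk, validating its size."""
    # validate=True rejects non-base64 characters (binascii.Error is a ValueError)
    raw = base64.b64decode(b64_chunk, validate=True)
    frame_bytes = BYTES_PER_SAMPLE * CHANNELS
    if len(raw) % frame_bytes != 0:
        raise ValueError(f"{len(raw)} bytes is not a whole number of PCM16 frames")
    return len(raw) / frame_bytes / sample_rate
```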
### Connection Issues
- Verify your DashScope API key is set correctly
- Check that port 8000 is not blocked by a firewall
- Review server logs for error messages
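Some of these checks can be scripted before starting the server. The hypothetical `preflight` helper below verifies that `DASHSCOPE_API_KEY` is non-empty and that the port can be bound locally. Note that a successful local bind only shows that no other local process holds the port; it cannot detect a firewall blocking connections from other machines.

```python
from __future__ import annotations

import os
import socket
from collections.abc import Mapping


def preflight(port: int = 8000, host: str = "127.0.0.1",
              env: Mapping[str, str] | None = None) -> list[str]:
    """Return a list of problems found before starting run_server.py (empty = OK)."""
    env = os.environ if env is None else env
    problems: list[str] = []
    if not env.get("DASHSCOPE_API_KEY", "").strip():
        problems.append("DASHSCOPE_API_KEY is not set")
    # Try to bind the port ourselves; failure usually means another process holds it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            problems.append(f"cannot bind {host}:{port} ({exc})")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```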
### Agents Not Responding
- Ensure both agent configurations have valid instructions
- Check that the instructions encourage conversational behavior
- Review the console logs for API errors
## References
- [AgentScope Documentation](https://modelscope.github.io/agentscope/)
- [DashScope API Documentation](https://help.aliyun.com/zh/model-studio/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)