chore: initialize sandbox and overwrite remote content
codex-bot
2026-03-02 22:32:27 +08:00
commit a64378956a
584 changed files with 93604 additions and 0 deletions

@@ -0,0 +1,160 @@
# Multi-Agent Realtime Voice Interaction Example
This example demonstrates how to use AgentScope's `ChatRoom` class to create a multi-agent real-time voice interaction system where two AI agents can have autonomous conversations without user input.
## Features
- 🗣️ **Real-time Voice Interaction**: Two agents communicate through voice in real-time
- 🤖 **Autonomous Conversation**: Agents converse with each other without user intervention
- ⚙️ **Customizable Configuration**: Configure agent names and instructions through the web interface
- 🎨 **Modern UI**: Clean, shadcn-inspired interface for easy interaction
- 📊 **Live Transcript**: See the conversation transcripts in real-time
## Architecture
The example uses:
- **Backend**: FastAPI server with WebSocket support
- **Frontend**: HTML5 with Web Audio API for audio playback
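- **AgentScope Components**:
  - `ChatRoom`: Manages multiple `RealtimeAgent` instances
  - `RealtimeAgent`: Handles real-time voice interaction with AI models
  - `DashScopeRealtimeModel`: DashScope's Qwen3-Omni realtime model (the default; `run_server.py` can also select `GeminiRealtimeModel` or `OpenAIRealtimeModel` through the `model_provider` setting)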
## Prerequisites
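1. **Python Dependencies**:
   ```bash
   pip install "agentscope[dashscope]"
   pip install fastapi uvicorn
   ```
2. **Model API Key**:
   - Set your DashScope API key as an environment variable:
     ```bash
     export DASHSCOPE_API_KEY="your-api-key-here"
     ```
   - To use the Gemini or OpenAI provider instead, set `GEMINI_API_KEY` or `OPENAI_API_KEY` in the same way. The server reports which keys are present at `/model_availability`.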
## Usage
1. **Start the Server**:
   ```bash
   python run_server.py
   ```
2. **Open the Web Interface**:
   - Navigate to `http://localhost:8000` in your web browser
3. **Configure Agents**:
   - Set names and instructions for both Agent 1 and Agent 2
   - Example configurations:
     - **Agent 1 (Alice)**: "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
     - **Agent 2 (Bob)**: "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."
4. **Start the Conversation**:
   - Click the "▶️ Start Conversation" button
   - The agents will begin conversing autonomously
   - You'll see transcripts and system messages in the message panel
   - Audio playback will stream in real-time
5. **Stop the Conversation**:
   - Click the "⏹️ Stop Conversation" button when you want to end the session
## How It Works
### Backend Flow
1. **WebSocket Connection**: Client connects via WebSocket to `/ws/{user_id}/{session_id}`
2. **Session Creation**:
   - Client sends `client_session_create` event with agent configurations
   - Server creates two `RealtimeAgent` instances with specified names and instructions
   - Server creates a `ChatRoom` with both agents
   - Server starts the chat room and returns `session_created` event
3. **Message Broadcasting**:
   - `ChatRoom` automatically broadcasts messages between agents
   - All events (audio, transcripts, etc.) are forwarded to the frontend
4. **Session End**: Client sends `client_session_end` event to stop the conversation
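The session messages in steps 2 and 4 can be sketched from the client side. The snippet below only builds the JSON payloads. The wire schema (a top-level `type` field plus a `config` object) is an assumption inferred from how `run_server.py` reads `client_event.config`, and the helper names `build_session_create` and `build_session_end` are hypothetical.

```python
import json


def build_session_create(
    agent1: dict,
    agent2: dict,
    provider: str = "dashscope",
) -> str:
    """Build a `client_session_create` message (assumed schema)."""
    payload = {
        "type": "client_session_create",
        # Keys below mirror the ones run_server.py looks up in `config`
        "config": {
            "agent1_name": agent1["name"],
            "agent1_instructions": agent1["instructions"],
            "agent2_name": agent2["name"],
            "agent2_instructions": agent2["instructions"],
            "model_provider": provider,
        },
    }
    return json.dumps(payload)


def build_session_end() -> str:
    """Build a `client_session_end` message (assumed schema)."""
    return json.dumps({"type": "client_session_end"})
```

A client would send the first string right after the WebSocket opens and the second one when the user stops the conversation.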
### Frontend Flow
1. **WebSocket Setup**: Establishes connection and waits for server events
2. **Session Management**: Sends configuration and manages conversation state
3. **Audio Playback**:
   - Receives base64-encoded PCM16 audio chunks
   - Decodes and queues audio data
   - Uses Web Audio API `ScriptProcessorNode` for streaming playback at 24kHz (this node is deprecated in the Web Audio specification in favor of `AudioWorkletNode`, but browsers still widely support it)
4. **Transcript Display**: Shows real-time transcripts from both agents
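The decoding part of the audio playback step can be illustrated outside the browser. This is a minimal Python sketch of the same arithmetic, not the frontend's actual JavaScript. It assumes little-endian, mono samples, and `decode_pcm16_chunk` is a hypothetical helper name.

```python
import base64
import struct


def decode_pcm16_chunk(b64_audio: str) -> list[float]:
    """Decode a base64 PCM16 chunk into float samples in [-1.0, 1.0)."""
    raw = base64.b64decode(b64_audio)
    if len(raw) % 2:
        raise ValueError("PCM16 data must contain an even number of bytes")
    count = len(raw) // 2
    # "<h" means little-endian signed 16-bit (byte order is an assumption)
    ints = struct.unpack(f"<{count}h", raw)
    # Scale integers to the float range that audio APIs usually expect
    return [sample / 32768.0 for sample in ints]
```

Dividing by 32768 maps the most negative sample to exactly -1.0 and keeps positive samples just below 1.0.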
## Key Components
### ChatRoom
The `ChatRoom` class manages multiple `RealtimeAgent` instances:
- Establishes connections for all agents
- Broadcasts messages between agents automatically
- Forwards events to the frontend
- Handles lifecycle management (start/stop)
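The broadcast behaviour listed above can be pictured with a toy model. The sketch below is not AgentScope's `ChatRoom` implementation. `ToyRoom` and `demo` are hypothetical names, and plain strings stand in for audio and transcript events.

```python
import asyncio


class ToyRoom:
    """Toy stand-in: relays each message to every member except the sender."""

    def __init__(self, names: list[str]) -> None:
        self.inboxes = {name: asyncio.Queue() for name in names}
        self.frontend: asyncio.Queue = asyncio.Queue()

    async def publish(self, sender: str, message: str) -> None:
        # Every event is mirrored to the frontend queue
        await self.frontend.put((sender, message))
        # and delivered to all members other than the sender
        for name, inbox in self.inboxes.items():
            if name != sender:
                await inbox.put((sender, message))


async def demo() -> list:
    room = ToyRoom(["Alice", "Bob"])
    await room.publish("Alice", "Hi Bob!")
    return [
        room.inboxes["Bob"].get_nowait(),
        room.inboxes["Alice"].qsize(),
        room.frontend.qsize(),
    ]
```

Running `asyncio.run(demo())` shows that Bob receives Alice's message, Alice's own inbox stays empty, and the frontend queue holds one event.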
### RealtimeAgent
Each `RealtimeAgent`:
- Connects to the DashScope realtime API
- Processes audio input from other agents
- Generates voice responses
- Emits events for transcripts, audio, and status updates
## Customization
### Changing the Model
To use a different model, modify the `DashScopeRealtimeModel` configuration (`model1` and `model2`) in `run_server.py`:
```python
import os

from agentscope.realtime import DashScopeRealtimeModel

model1 = DashScopeRealtimeModel(
    model_name="your-model-name",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
)
```
### Adding More Agents
To add more agents, modify the agent creation section in `run_server.py`:
```python
# agent3_name and agent3_instructions must first be read from
# client_event.config, in the same way as the values for agent1 and agent2
agent3 = RealtimeAgent(
    name=agent3_name,
    sys_prompt=agent3_instructions,
    model=DashScopeRealtimeModel(
        model_name="qwen3-omni-flash-realtime",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    ),
)
chat_room = ChatRoom(agents=[agent1, agent2, agent3])
```
Also update the frontend to include configuration fields for the additional agent.
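When the number of agents grows, reading each name and instruction by hand becomes repetitive. One possible approach is a small helper that follows the `agentN_name` / `agentN_instructions` key pattern and the defaults already used in `run_server.py`. The helper name `collect_agent_configs` and its `count` parameter are hypothetical.

```python
def collect_agent_configs(config: dict, count: int) -> list[tuple[str, str]]:
    """Return (name, instructions) pairs for agents 1..count, with defaults."""
    agents = []
    for i in range(1, count + 1):
        name = config.get(f"agent{i}_name", f"Agent{i}")
        instructions = config.get(
            f"agent{i}_instructions",
            "You are a helpful assistant.",
        )
        agents.append((name, instructions))
    return agents
```

Each returned pair could then be passed to `RealtimeAgent(name=..., sys_prompt=..., model=...)` in a loop, with one model instance per agent.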
## Troubleshooting
### No Audio Playback
- Ensure your browser supports Web Audio API
- Check browser console for audio-related errors
- Verify the audio format matches the expected PCM16 at 24kHz
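A quick way to sanity-check a received chunk against the expected format is to look at its byte count. The sketch below assumes mono audio with 2 bytes per sample, and `describe_pcm16_chunk` is a hypothetical helper name.

```python
def describe_pcm16_chunk(raw: bytes, sample_rate_hz: int = 24_000) -> dict:
    """Summarize a raw PCM16 chunk (mono, 2 bytes per sample assumed)."""
    n_bytes = len(raw)
    n_samples = n_bytes // 2
    return {
        "bytes": n_bytes,
        # An odd byte count means the chunk cannot be whole 16-bit samples
        "aligned": n_bytes % 2 == 0,
        "samples": n_samples,
        "duration_ms": n_samples * 1000 / sample_rate_hz,
    }
```

If the reported duration is far from how long the chunk actually sounds, the sample rate or channel count likely differs from what the player assumes.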
### Connection Issues
- Verify your DashScope API key is set correctly
- Check that port 8000 is not blocked by a firewall
- Review server logs for error messages
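The first two checks can be scripted. This is a minimal sketch that uses only the standard library. The helper names `port_is_open` and `missing_api_keys` are hypothetical, and a successful TCP connection only shows that something is listening, not that it is this server.

```python
import os
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising
        return sock.connect_ex((host, port)) == 0


def missing_api_keys(names: tuple = ("DASHSCOPE_API_KEY",)) -> list[str]:
    """Return the environment variable names that are unset or empty."""
    return [name for name in names if not os.getenv(name)]
```

For example, `port_is_open("127.0.0.1", 8000)` should return `True` while the server is running, and `missing_api_keys()` should return an empty list once the key is exported.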
### Agents Not Responding
- Ensure both agent configurations have valid instructions
- Check that the instructions encourage conversational behavior
- Review the console logs for API errors
## References
- [AgentScope Documentation](https://modelscope.github.io/agentscope/)
- [DashScope API Documentation](https://help.aliyun.com/zh/model-studio/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)

@@ -0,0 +1,220 @@
# -*- coding: utf-8 -*-
"""A multi-agent realtime voice interaction server using ChatRoom."""
import asyncio
import os
import traceback
from pathlib import Path

import uvicorn
from fastapi import FastAPI, WebSocket
from fastapi.responses import FileResponse

from agentscope import logger
from agentscope.agent import RealtimeAgent
from agentscope.message import TextBlock
from agentscope.pipeline import ChatRoom
from agentscope.realtime import (
    ClientEvents,
    ServerEvents,
    ClientEventType,
    DashScopeRealtimeModel,
    GeminiRealtimeModel,
    OpenAIRealtimeModel,
)

app = FastAPI()


@app.get("/")
async def get() -> FileResponse:
    """Serve the HTML test page."""
    html_path = Path(__file__).parent / "multi_agent.html"
    return FileResponse(html_path)


@app.get("/model_availability")
async def model_availability() -> dict:
    """Check which model API keys are available in environment variables."""
    return {
        "dashscope": bool(os.getenv("DASHSCOPE_API_KEY")),
        "gemini": bool(os.getenv("GEMINI_API_KEY")),
        "openai": bool(os.getenv("OPENAI_API_KEY")),
    }


async def frontend_receive(
    websocket: WebSocket,
    frontend_queue: asyncio.Queue,
) -> None:
    """Forward the message received from the agents to the frontend."""
    try:
        while True:
            msg: ServerEvents.EventBase = await frontend_queue.get()

            # Send the message as JSON
            await websocket.send_json(msg.model_dump())

    except Exception as e:
        print(f"[ERROR] frontend_receive error: {e}")
        traceback.print_exc()


@app.websocket("/ws/{user_id}/{session_id}")
async def multi_agent_endpoint(
    websocket: WebSocket,
    user_id: str,
    session_id: str,
) -> None:
    """WebSocket endpoint for multi-agent realtime voice interaction."""
    # Chat room and forwarding task, both created inside the try block
    chat_room = None
    forward_task = None
    try:
        await websocket.accept()

        logger.info(
            "Connected to WebSocket: user_id=%s, session_id=%s",
            user_id,
            session_id,
        )

        # Create the queue to forward messages to the frontend
        frontend_queue = asyncio.Queue()
        # Keep a reference so the task is not garbage-collected while running
        forward_task = asyncio.create_task(
            frontend_receive(websocket, frontend_queue),
        )

        while True:
            # Handle the incoming messages from the frontend
            # i.e. ClientEvents
            data = await websocket.receive_json()

            client_event = ClientEvents.from_json(data)

            if isinstance(
                client_event,
                ClientEvents.ClientSessionCreateEvent,
            ):
                # Create agents by the given session arguments
                agent1_name = client_event.config.get("agent1_name", "Agent1")
                agent1_instructions = client_event.config.get(
                    "agent1_instructions",
                    "You are a helpful assistant.",
                )
                agent2_name = client_event.config.get("agent2_name", "Agent2")
                agent2_instructions = client_event.config.get(
                    "agent2_instructions",
                    "You are a helpful assistant.",
                )
                model_provider = client_event.config.get(
                    "model_provider",
                    "dashscope",
                )

                # Create the appropriate model based on provider
                if model_provider == "dashscope":
                    model1 = DashScopeRealtimeModel(
                        model_name="qwen3-omni-flash-realtime",
                        api_key=os.getenv("DASHSCOPE_API_KEY"),
                        voice="Dylan",
                        enable_input_audio_transcription=False,
                    )
                    model2 = DashScopeRealtimeModel(
                        model_name="qwen3-omni-flash-realtime",
                        api_key=os.getenv("DASHSCOPE_API_KEY"),
                        voice="Peter",
                        enable_input_audio_transcription=False,
                    )
                elif model_provider == "gemini":
                    model1 = GeminiRealtimeModel(
                        model_name=(
                            "gemini-2.5-flash-native-audio-preview-09-2025"
                        ),
                        api_key=os.getenv("GEMINI_API_KEY"),
                        voice="Puck",
                    )
                    model2 = GeminiRealtimeModel(
                        model_name=(
                            "gemini-2.5-flash-native-audio-preview-09-2025"
                        ),
                        api_key=os.getenv("GEMINI_API_KEY"),
                        voice="Charon",
                    )
                elif model_provider == "openai":
                    model1 = OpenAIRealtimeModel(
                        model_name="gpt-4o-realtime-preview",
                        api_key=os.getenv("OPENAI_API_KEY"),
                        voice="alloy",
                    )
                    model2 = OpenAIRealtimeModel(
                        model_name="gpt-4o-realtime-preview",
                        api_key=os.getenv("OPENAI_API_KEY"),
                        voice="echo",
                    )
                else:
                    raise ValueError(
                        f"Unsupported model provider: {model_provider}",
                    )

                # Create the first agent
                agent1 = RealtimeAgent(
                    name=agent1_name,
                    sys_prompt=agent1_instructions,
                    model=model1,
                )

                # Create the second agent
                agent2 = RealtimeAgent(
                    name=agent2_name,
                    sys_prompt=agent2_instructions,
                    model=model2,
                )

                # Create chat room with both agents
                chat_room = ChatRoom(agents=[agent1, agent2])

                await chat_room.start(frontend_queue)

                # Send session_created event to frontend
                await websocket.send_json(
                    ServerEvents.ServerSessionCreatedEvent(
                        session_id=session_id,
                    ).model_dump(),
                )

                # Prompt the first agent so the conversation can begin
                await agent1.model.send(
                    TextBlock(
                        type="text",
                        text="<system>Now you can talk.</system>",
                    ),
                )

            elif client_event.type == ClientEventType.CLIENT_SESSION_END:
                # End the session with the chat room
                if chat_room:
                    await chat_room.stop()
                    chat_room = None
            else:
                # Forward other events to the chat room
                if chat_room:
                    await chat_room.handle_input(client_event)

    except Exception as e:
        print(f"[ERROR] WebSocket endpoint error: {e}")
        traceback.print_exc()
        raise
    finally:
        # Release resources when the client disconnects or an error occurs
        if forward_task:
            forward_task.cancel()
        if chat_room:
            await chat_room.stop()


if __name__ == "__main__":
    uvicorn.run(
        "run_server:app",
        host="localhost",
        port=8000,
        reload=True,
        log_level="info",
    )