chore: initialize sandbox and overwrite remote content
Some checks failed
Pre-commit / run (ubuntu-latest) (push) Has been cancelled
Deploy Sphinx documentation to Pages / build_en (ubuntu-latest, 3.10) (push) Has been cancelled
Deploy Sphinx documentation to Pages / build_zh (ubuntu-latest, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (macos-15, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (macos-15, 3.11) (push) Has been cancelled
Python Unittest Coverage / test (macos-15, 3.12) (push) Has been cancelled
Python Unittest Coverage / test (ubuntu-latest, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (ubuntu-latest, 3.11) (push) Has been cancelled
Python Unittest Coverage / test (ubuntu-latest, 3.12) (push) Has been cancelled
Python Unittest Coverage / test (windows-latest, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (windows-latest, 3.11) (push) Has been cancelled
Python Unittest Coverage / test (windows-latest, 3.12) (push) Has been cancelled
160
examples/workflows/multiagent_realtime/README.md
Normal file
@@ -0,0 +1,160 @@
# Multi-Agent Realtime Voice Interaction Example

This example demonstrates how to use AgentScope's `ChatRoom` class to create a multi-agent real-time voice interaction system where two AI agents can have autonomous conversations without user input.

## Features

- 🗣️ **Real-time Voice Interaction**: Two agents communicate through voice in real-time
- 🤖 **Autonomous Conversation**: Agents converse with each other without user intervention
- ⚙️ **Customizable Configuration**: Configure agent names and instructions through the web interface
- 🎨 **Modern UI**: Clean, shadcn-inspired interface for easy interaction
- 📊 **Live Transcript**: See the conversation transcripts in real-time

## Architecture

The example uses:
- **Backend**: FastAPI server with WebSocket support
- **Frontend**: HTML5 with Web Audio API for audio playback
- **AgentScope Components**:
  - `ChatRoom`: Manages multiple `RealtimeAgent` instances
  - `RealtimeAgent`: Handles real-time voice interaction with AI models
  - `DashScopeRealtimeModel`: DashScope's Qwen3-Omni realtime model

## Prerequisites

1. **Python Dependencies**:
   ```bash
   pip install "agentscope[dashscope]"
   pip install fastapi uvicorn
   ```

2. **DashScope API Key**:
   - Set your DashScope API key as an environment variable:
   ```bash
   export DASHSCOPE_API_KEY="your-api-key-here"
   ```

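Before starting the server, it can help to confirm which provider keys are visible to the Python process. The sketch below mirrors the environment variables that the `/model_availability` endpoint in `run_server.py` checks; the helper name `available_providers` is hypothetical and not part of AgentScope.

```python
import os
from typing import Mapping, Optional

# Environment variables read by run_server.py, keyed by provider name.
PROVIDER_ENV_VARS = {
    "dashscope": "DASHSCOPE_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def available_providers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return {provider: bool} telling whether each API key is set and non-empty."""
    env = os.environ if env is None else env
    return {name: bool(env.get(var)) for name, var in PROVIDER_ENV_VARS.items()}


if __name__ == "__main__":
    for provider, ok in available_providers().items():
        print(f"{provider}: {'set' if ok else 'missing'}")
```

Only the DashScope key is needed for the default configuration described in this README.
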
## Usage

1. **Start the Server**:
   ```bash
   python run_server.py
   ```

2. **Open the Web Interface**:
   - Navigate to `http://localhost:8000` in your web browser

3. **Configure Agents**:
   - Set names and instructions for both Agent 1 and Agent 2
   - Example configurations:
     - **Agent 1 (Alice)**: "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
     - **Agent 2 (Bob)**: "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."

4. **Start the Conversation**:
   - Click the "▶️ Start Conversation" button
   - The agents will begin conversing autonomously
   - You'll see transcripts and system messages in the message panel
   - Audio playback will stream in real-time

5. **Stop the Conversation**:
   - Click the "⏹️ Stop Conversation" button when you want to end the session

## How It Works

### Backend Flow

1. **WebSocket Connection**: Client connects via WebSocket to `/ws/{user_id}/{session_id}`
2. **Session Creation**:
   - Client sends `client_session_create` event with agent configurations
   - Server creates two `RealtimeAgent` instances with specified names and instructions
   - Server creates a `ChatRoom` with both agents
   - Server starts the chat room and returns `session_created` event
   - Server sends an initial text prompt to the first agent so that it opens the conversation
3. **Message Broadcasting**:
   - `ChatRoom` automatically broadcasts messages between agents
   - All events (audio, transcripts, etc.) are forwarded to the frontend
4. **Session End**: Client sends `client_session_end` event to stop the conversation

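The client side of this flow can be sketched as two small message builders. The config keys and the default provider match what `run_server.py` reads from the session config, and the event type strings are the ones named above. The JSON envelope (`{"type": ..., "config": ...}`) is an assumption, since the exact schema expected by `ClientEvents.from_json` is not shown here; the function names are hypothetical.

```python
import json


def build_session_create(
    agent1_name: str,
    agent1_instructions: str,
    agent2_name: str,
    agent2_instructions: str,
    model_provider: str = "dashscope",
) -> str:
    """Serialize a session-create message for the WebSocket endpoint.

    ASSUMPTION: the envelope is {"type": ..., "config": {...}}. Only the
    config keys are taken from run_server.py; the envelope shape is a guess.
    """
    event = {
        "type": "client_session_create",
        "config": {
            "agent1_name": agent1_name,
            "agent1_instructions": agent1_instructions,
            "agent2_name": agent2_name,
            "agent2_instructions": agent2_instructions,
            "model_provider": model_provider,
        },
    }
    return json.dumps(event)


def build_session_end() -> str:
    """Serialize a session-end message (same envelope assumption as above)."""
    return json.dumps({"type": "client_session_end"})
```

A client would send the first string right after the WebSocket opens and the second when the user stops the conversation.
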
### Frontend Flow

1. **WebSocket Setup**: Establishes connection and waits for server events
2. **Session Management**: Sends configuration and manages conversation state
3. **Audio Playback**:
   - Receives base64-encoded PCM16 audio chunks
   - Decodes and queues audio data
   - Uses Web Audio API `ScriptProcessorNode` for streaming playback at 24 kHz
4. **Transcript Display**: Shows real-time transcripts from both agents

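The decoding step in the audio playback path is simple arithmetic. The browser does it in JavaScript; the Python sketch below only illustrates the same conversion from a base64 PCM16 chunk to floating-point samples. It assumes little-endian, mono samples, which this README does not state, and the function names are hypothetical.

```python
import base64
import struct

SAMPLE_RATE_HZ = 24_000  # playback rate stated in this README


def decode_pcm16_chunk(b64_chunk: str) -> list:
    """Decode a base64 PCM16 chunk into floats in the range [-1.0, 1.0)."""
    raw = base64.b64decode(b64_chunk)
    count = len(raw) // 2  # two bytes per 16-bit sample
    # ASSUMPTION: little-endian, mono samples
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [sample / 32768.0 for sample in samples]


def chunk_duration_seconds(sample_count: int) -> float:
    """Playback time of a chunk at the 24 kHz sample rate."""
    return sample_count / SAMPLE_RATE_HZ
```

At 24 kHz, a chunk of 2,400 samples corresponds to 0.1 seconds of audio, which is a useful sanity check when playback sounds too fast or too slow.
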
## Key Components

### ChatRoom

The `ChatRoom` class manages multiple `RealtimeAgent` instances:
- Establishes connections for all agents
- Broadcasts messages between agents automatically
- Forwards events to the frontend
- Handles lifecycle management (start/stop)

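To make the broadcast idea concrete, here is a toy model of the fan-out described above. It is not AgentScope's implementation: `ToyAgent` and `ToyRoom` are hypothetical stand-ins that only show one agent's output reaching every other agent while a copy goes to the frontend queue.

```python
import asyncio


class ToyAgent:
    """Stand-in for a realtime agent: it only records what it hears."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.heard = []

    async def handle_input(self, sender: str, message: str) -> None:
        self.heard.append((sender, message))


class ToyRoom:
    """Fan each agent's output out to the other agents and to the frontend."""

    def __init__(self, agents: list) -> None:
        self.agents = agents

    async def broadcast(
        self,
        sender: ToyAgent,
        message: str,
        frontend_queue: asyncio.Queue,
    ) -> None:
        # The frontend receives every event, including the sender's own output
        await frontend_queue.put({"agent": sender.name, "text": message})
        # Every agent except the sender receives the message as input
        for agent in self.agents:
            if agent is not sender:
                await agent.handle_input(sender.name, message)


async def demo() -> tuple:
    alice, bob = ToyAgent("Alice"), ToyAgent("Bob")
    room = ToyRoom([alice, bob])
    frontend_queue = asyncio.Queue()
    await room.broadcast(alice, "Hi Bob!", frontend_queue)
    return alice.heard, bob.heard, frontend_queue.qsize()
```

Excluding the sender from the fan-out is the design choice that matters here: an agent that hears its own output as input would end up replying to itself.
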
### RealtimeAgent

Each `RealtimeAgent`:
- Connects to the DashScope realtime API
- Processes audio input from other agents
- Generates voice responses
- Emits events for transcripts, audio, and status updates

## Customization

### Changing the Model

To use a different model, modify the `DashScopeRealtimeModel` configuration in `run_server.py`:

```python
model=DashScopeRealtimeModel(
    model_name="your-model-name",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
)
```

### Adding More Agents

To add more agents, modify the agent creation section in `run_server.py`:

```python
agent3 = RealtimeAgent(
    name=agent3_name,
    sys_prompt=agent3_instructions,
    model=DashScopeRealtimeModel(
        model_name="qwen3-omni-flash-realtime",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
    ),
)

chat_room = ChatRoom(agents=[agent1, agent2, agent3])
```

Read `agent3_name` and `agent3_instructions` from the session config in the same way as for the first two agents, and update the frontend to include configuration fields for the additional agents.

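With more than two agents, reading each name and prompt by hand becomes repetitive. One possible approach is a small helper that collects the per-agent settings in a loop. The key pattern (`agent{i}_name`, `agent{i}_instructions`) and the defaults follow what `run_server.py` uses for the first two agents; the helper `parse_agent_configs` itself is hypothetical.

```python
def parse_agent_configs(config: dict, count: int) -> list:
    """Read agent{i}_name / agent{i}_instructions pairs from a session config."""
    agents = []
    for i in range(1, count + 1):
        agents.append(
            {
                # Defaults mirror those used in run_server.py
                "name": config.get(f"agent{i}_name", f"Agent{i}"),
                "sys_prompt": config.get(
                    f"agent{i}_instructions",
                    "You are a helpful assistant.",
                ),
            },
        )
    return agents
```

Each returned dictionary carries the `name` and `sys_prompt` arguments for one `RealtimeAgent`; the `model` argument still has to be created per agent, as in the snippet above.
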
## Troubleshooting

### No Audio Playback
- Ensure your browser supports the Web Audio API
- Check the browser console for audio-related errors
- Verify the audio format matches the expected PCM16 at 24 kHz

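One way to check the audio format outside the browser is to wrap a captured chunk in a WAV container and open it in any audio player. The sketch below uses Python's standard `wave` module; it assumes mono audio, and the helper name `pcm16_to_wav_bytes` is hypothetical.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 24_000) -> bytes:
    """Wrap raw PCM16 audio in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)  # ASSUMPTION: mono audio
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

If the resulting file plays at the wrong pitch or speed, the sample rate of the stream probably differs from 24 kHz.
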
### Connection Issues
- Verify your DashScope API key is set correctly
- Check that port 8000 is not blocked by a firewall
- Review server logs for error messages

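A quick way to tell whether the server is reachable at all is to attempt a TCP connection to the port. This is a minimal sketch using the standard `socket` module; the helper name `is_listening` is hypothetical, and a successful connection only shows that something is listening, not that it is this server.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    print("port 8000 reachable:", is_listening(8000))
```

If this prints `False` while the server is running, look at the host binding and local firewall rules before debugging the WebSocket itself.
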
### Agents Not Responding
- Ensure both agent configurations have valid instructions
- Check that the instructions encourage conversational behavior
- Review the console logs for API errors

## References

- [AgentScope Documentation](https://modelscope.github.io/agentscope/)
- [DashScope API Documentation](https://help.aliyun.com/zh/model-studio/)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)

1273
examples/workflows/multiagent_realtime/multi_agent.html
Normal file
File diff suppressed because it is too large
220
examples/workflows/multiagent_realtime/run_server.py
Normal file
@@ -0,0 +1,220 @@
# -*- coding: utf-8 -*-
"""A multi-agent realtime voice interaction server using ChatRoom."""
import asyncio
import os
import traceback
from pathlib import Path

import uvicorn
from fastapi import FastAPI, WebSocket
from fastapi.responses import FileResponse

from agentscope import logger
from agentscope.agent import RealtimeAgent
from agentscope.message import TextBlock
from agentscope.pipeline import ChatRoom
from agentscope.realtime import (
    ClientEvents,
    ServerEvents,
    ClientEventType,
    DashScopeRealtimeModel,
    GeminiRealtimeModel,
    OpenAIRealtimeModel,
)

app = FastAPI()


@app.get("/")
async def get() -> FileResponse:
    """Serve the HTML test page."""
    html_path = Path(__file__).parent / "multi_agent.html"
    return FileResponse(html_path)


@app.get("/model_availability")
async def model_availability() -> dict:
    """Check which model API keys are available in environment variables."""
    return {
        "dashscope": bool(os.getenv("DASHSCOPE_API_KEY")),
        "gemini": bool(os.getenv("GEMINI_API_KEY")),
        "openai": bool(os.getenv("OPENAI_API_KEY")),
    }


async def frontend_receive(
    websocket: WebSocket,
    frontend_queue: asyncio.Queue,
) -> None:
    """Forward the message received from the agents to the frontend."""
    try:
        while True:
            msg: ServerEvents.EventBase = await frontend_queue.get()

            # Send the message as JSON
            await websocket.send_json(msg.model_dump())

    except Exception as e:
        print(f"[ERROR] frontend_receive error: {e}")
        traceback.print_exc()


@app.websocket("/ws/{user_id}/{session_id}")
async def multi_agent_endpoint(
    websocket: WebSocket,
    user_id: str,
    session_id: str,
) -> None:
    """WebSocket endpoint for multi-agent realtime voice interaction."""
    # Declared before the try block so the cleanup in `finally` can see them
    chat_room = None
    forward_task = None

    try:
        await websocket.accept()

        logger.info(
            "Connected to WebSocket: user_id=%s, session_id=%s",
            user_id,
            session_id,
        )

        # Create the queue to forward messages to the frontend
        frontend_queue = asyncio.Queue()
        # Keep a reference to the task so that it is not garbage-collected
        # while it is still running
        forward_task = asyncio.create_task(
            frontend_receive(websocket, frontend_queue),
        )

        while True:
            # Handle the incoming messages from the frontend
            # i.e. ClientEvents
            data = await websocket.receive_json()

            client_event = ClientEvents.from_json(data)

            if isinstance(
                client_event,
                ClientEvents.ClientSessionCreateEvent,
            ):
                # Create agents by the given session arguments
                agent1_name = client_event.config.get("agent1_name", "Agent1")
                agent1_instructions = client_event.config.get(
                    "agent1_instructions",
                    "You are a helpful assistant.",
                )

                agent2_name = client_event.config.get("agent2_name", "Agent2")
                agent2_instructions = client_event.config.get(
                    "agent2_instructions",
                    "You are a helpful assistant.",
                )

                model_provider = client_event.config.get(
                    "model_provider",
                    "dashscope",
                )

                # Create the appropriate model based on provider
                if model_provider == "dashscope":
                    model1 = DashScopeRealtimeModel(
                        model_name="qwen3-omni-flash-realtime",
                        api_key=os.getenv("DASHSCOPE_API_KEY"),
                        voice="Dylan",
                        enable_input_audio_transcription=False,
                    )
                    model2 = DashScopeRealtimeModel(
                        model_name="qwen3-omni-flash-realtime",
                        api_key=os.getenv("DASHSCOPE_API_KEY"),
                        voice="Peter",
                        enable_input_audio_transcription=False,
                    )

                elif model_provider == "gemini":
                    model1 = GeminiRealtimeModel(
                        model_name=(
                            "gemini-2.5-flash-native-audio-preview-09-2025"
                        ),
                        api_key=os.getenv("GEMINI_API_KEY"),
                        voice="Puck",
                    )
                    model2 = GeminiRealtimeModel(
                        model_name=(
                            "gemini-2.5-flash-native-audio-preview-09-2025"
                        ),
                        api_key=os.getenv("GEMINI_API_KEY"),
                        voice="Charon",
                    )

                elif model_provider == "openai":
                    model1 = OpenAIRealtimeModel(
                        model_name="gpt-4o-realtime-preview",
                        api_key=os.getenv("OPENAI_API_KEY"),
                        voice="alloy",
                    )
                    model2 = OpenAIRealtimeModel(
                        model_name="gpt-4o-realtime-preview",
                        api_key=os.getenv("OPENAI_API_KEY"),
                        voice="echo",
                    )
                else:
                    raise ValueError(
                        f"Unsupported model provider: {model_provider}",
                    )

                # Create the first agent
                agent1 = RealtimeAgent(
                    name=agent1_name,
                    sys_prompt=agent1_instructions,
                    model=model1,
                )

                # Create the second agent
                agent2 = RealtimeAgent(
                    name=agent2_name,
                    sys_prompt=agent2_instructions,
                    model=model2,
                )

                # Create chat room with both agents
                chat_room = ChatRoom(agents=[agent1, agent2])

                await chat_room.start(frontend_queue)

                # Send session_created event to frontend
                await websocket.send_json(
                    ServerEvents.ServerSessionCreatedEvent(
                        session_id=session_id,
                    ).model_dump(),
                )

                # Prompt the first agent so that it opens the conversation
                await agent1.model.send(
                    TextBlock(
                        type="text",
                        text="<system>Now you can talk.</system>",
                    ),
                )

            elif client_event.type == ClientEventType.CLIENT_SESSION_END:
                # End the session with the chat room
                if chat_room:
                    await chat_room.stop()
                    chat_room = None

            else:
                # Forward other events to the chat room
                if chat_room:
                    await chat_room.handle_input(client_event)

    except Exception as e:
        print(f"[ERROR] WebSocket endpoint error: {e}")
        traceback.print_exc()
        raise

    finally:
        # Release resources when the client disconnects or an error occurs
        if forward_task:
            forward_task.cancel()
        if chat_room:
            await chat_room.stop()


if __name__ == "__main__":
    uvicorn.run(
        "run_server:app",
        host="localhost",
        port=8000,
        reload=True,
        log_level="info",
    )