chore: initialize sandbox and overwrite remote content

2026-03-02 22:32:27 +08:00
commit a64378956a
584 changed files with 93604 additions and 0 deletions
--- a/examples/workflows/multiagent_realtime/README.md
+++ b/examples/workflows/multiagent_realtime/README.md
@@ -0,0 +1,160 @@
+# Multi-Agent Realtime Voice Interaction Example
+
+This example uses AgentScope's `ChatRoom` class to build a real-time voice system in which two AI agents converse autonomously, without user input.
+
+## Features
+
+- 🗣️ **Real-time Voice Interaction**: Two agents communicate through voice in real-time
+- 🤖 **Autonomous Conversation**: Agents converse with each other without user intervention
+- ⚙️ **Customizable Configuration**: Configure agent names and instructions through the web interface
+- 🎨 **Modern UI**: Clean, shadcn-inspired interface for easy interaction
+- 📊 **Live Transcript**: See the conversation transcripts in real-time
+
+## Architecture
+
+The example uses:
+- **Backend**: FastAPI server with WebSocket support
+- **Frontend**: HTML5 with Web Audio API for audio playback
+- **AgentScope Components**:
+  - `ChatRoom`: Manages multiple `RealtimeAgent` instances
+  - `RealtimeAgent`: Handles real-time voice interaction with AI models
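+  - `DashScopeRealtimeModel`: DashScope's Qwen3-Omni realtime model (the default; `run_server.py` also accepts `gemini` and `openai` as the `model_provider`, backed by `GeminiRealtimeModel` and `OpenAIRealtimeModel`)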
+
+## Prerequisites
+
+1. **Python Dependencies**:
+   ```bash
+   pip install "agentscope[dashscope]"
+   pip install fastapi uvicorn
+   ```
+
+2. **DashScope API Key**:
+   - Set your DashScope API key as an environment variable (for the Gemini or OpenAI provider, set `GEMINI_API_KEY` or `OPENAI_API_KEY` instead):
+     ```bash
+     export DASHSCOPE_API_KEY="your-api-key-here"
+     ```
+
+## Usage
+
+1. **Start the Server**:
+   ```bash
+   python run_server.py
+   ```
+
+2. **Open the Web Interface**:
+   - Navigate to `http://localhost:8000` in your web browser
+
+3. **Configure Agents**:
+   - Set names and instructions for both Agent 1 and Agent 2
+   - Example configurations:
+     - **Agent 1 (Alice)**: "You are Alice, a cheerful and optimistic person who loves to share stories and ask questions. Keep your responses brief and conversational."
+     - **Agent 2 (Bob)**: "You are Bob, a thoughtful and analytical person who enjoys deep conversations. Keep your responses brief and conversational."
+
+4. **Start the Conversation**:
+   - Click the "▶️ Start Conversation" button
+   - The agents will begin conversing autonomously
+   - You'll see transcripts and system messages in the message panel
+   - Audio playback will stream in real-time
+
+5. **Stop the Conversation**:
+   - Click the "⏹️ Stop Conversation" button when you want to end the session
+
+## How It Works
+
+### Backend Flow
+
+1. **WebSocket Connection**: Client connects via WebSocket to `/ws/{user_id}/{session_id}`
+2. **Session Creation**:
+   - Client sends `client_session_create` event with agent configurations
+   - Server creates two `RealtimeAgent` instances with specified names and instructions
+   - Server creates a `ChatRoom` with both agents
+   - Server starts the chat room and returns `session_created` event
+3. **Message Broadcasting**:
+   - `ChatRoom` automatically broadcasts messages between agents
+   - All events (audio, transcripts, etc.) are forwarded to the frontend
+4. **Session End**: Client sends `client_session_end` event to stop the conversation
+
+### Frontend Flow
+
+1. **WebSocket Setup**: Establishes connection and waits for server events
+2. **Session Management**: Sends configuration and manages conversation state
+3. **Audio Playback**:
+   - Receives base64-encoded PCM16 audio chunks
+   - Decodes and queues audio data
+   - Uses Web Audio API `ScriptProcessorNode` for streaming playback at 24kHz
+4. **Transcript Display**: Shows real-time transcripts from both agents
+
+## Key Components
+
+### ChatRoom
+
+The `ChatRoom` class manages multiple `RealtimeAgent` instances:
+- Establishes connections for all agents
+- Broadcasts messages between agents automatically
+- Forwards events to the frontend
+- Handles lifecycle management (start/stop)
+
+### RealtimeAgent
+
+Each `RealtimeAgent`:
+- Connects to the DashScope realtime API
+- Processes audio input from other agents
+- Generates voice responses
+- Emits events for transcripts, audio, and status updates
+
+## Customization
+
+### Changing the Model
+
+To use a different model, modify the `DashScopeRealtimeModel` configuration in `run_server.py`:
+
+```python
+model=DashScopeRealtimeModel(
+    model_name="your-model-name",
+    api_key=os.getenv("DASHSCOPE_API_KEY"),
+)
+```
+
+### Adding More Agents
+
+To add more agents, modify the agent creation section in `run_server.py`:
+
+```python
+agent3 = RealtimeAgent(
+    name=agent3_name,
+    sys_prompt=agent3_instructions,
+    model=DashScopeRealtimeModel(
+        model_name="qwen3-omni-flash-realtime",
+        api_key=os.getenv("DASHSCOPE_API_KEY"),
+    ),
+)
+
+chat_room = ChatRoom(agents=[agent1, agent2, agent3])
+```
+
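+Also update the frontend to include configuration fields for the additional agent, and read `agent3_name` and `agent3_instructions` from `client_event.config` in `run_server.py`, as is done for the first two agents.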
+
+## Troubleshooting
+
+### No Audio Playback
+- Ensure your browser supports the Web Audio API
+- Check browser console for audio-related errors
+- Verify the audio format matches the expected PCM16 at 24kHz
+
+### Connection Issues
+- Verify that the API key for the selected provider (DashScope by default) is set correctly
+- Check that port 8000 is not blocked by a firewall
+- Review server logs for error messages
+
+### Agents Not Responding
+- Ensure both agent configurations have valid instructions
+- Check that the instructions encourage conversational behavior
+- Review the console logs for API errors
+
+## References
+
+- [AgentScope Documentation](https://modelscope.github.io/agentscope/)
+- [DashScope API Documentation](https://help.aliyun.com/zh/model-studio/)
+- [FastAPI Documentation](https://fastapi.tiangolo.com/)
+- [Web Audio API](https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API)
+
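Note on "Changing the Model": the README names only `DashScopeRealtimeModel`, while `run_server.py` in this commit picks one of three providers from the `model_provider` config field. The selection could be written as a lookup table, roughly as sketched below. `PROVIDERS` and `provider_settings` are hypothetical helper names, not part of AgentScope; the model names, environment variables, and voices are copied from `run_server.py`.

```python
import os

# Hypothetical lookup table mirroring the if/elif chain in run_server.py.
# The DashScope branch there also passes
# enable_input_audio_transcription=False, which this sketch leaves out.
PROVIDERS = {
    "dashscope": {
        "model_name": "qwen3-omni-flash-realtime",
        "api_key_env": "DASHSCOPE_API_KEY",
        "voices": ("Dylan", "Peter"),
    },
    "gemini": {
        "model_name": "gemini-2.5-flash-native-audio-preview-09-2025",
        "api_key_env": "GEMINI_API_KEY",
        "voices": ("Puck", "Charon"),
    },
    "openai": {
        "model_name": "gpt-4o-realtime-preview",
        "api_key_env": "OPENAI_API_KEY",
        "voices": ("alloy", "echo"),
    },
}


def provider_settings(provider: str, agent_index: int) -> dict:
    """Return model keyword arguments for agent 0 or agent 1."""
    if provider not in PROVIDERS:
        # Same error message as the else branch in run_server.py.
        raise ValueError(f"Unsupported model provider: {provider}")
    entry = PROVIDERS[provider]
    return {
        "model_name": entry["model_name"],
        "api_key": os.getenv(entry["api_key_env"]),
        "voice": entry["voices"][agent_index],
    }
```

With a table like this, switching models would mean editing one entry rather than two constructor calls per provider, though the explicit branches in the commit are easier to extend with provider-specific arguments.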
--- a/examples/workflows/multiagent_realtime/multi_agent.html
+++ b/examples/workflows/multiagent_realtime/multi_agent.html
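The README describes the playback path in this page as receiving base64-encoded PCM16 chunks, decoding them, and streaming at 24 kHz. The decoding step can be sketched in Python as below. This is an illustration only, not the page's JavaScript; it assumes little-endian mono samples, which the README does not state, and `decode_pcm16_chunk` and `chunk_duration_seconds` are hypothetical names.

```python
import base64
import struct

SAMPLE_RATE_HZ = 24_000  # playback rate stated in the README


def decode_pcm16_chunk(b64_chunk: str) -> list[float]:
    """Decode base64 PCM16 (assumed little-endian, mono) to floats in [-1, 1)."""
    raw = base64.b64decode(b64_chunk)
    count = len(raw) // 2  # two bytes per sample; drop a trailing odd byte
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    # Scale signed 16-bit integers to the float range used for playback.
    return [s / 32768.0 for s in samples]


def chunk_duration_seconds(samples: list[float]) -> float:
    """Playback time of a decoded chunk at the assumed sample rate."""
    return len(samples) / SAMPLE_RATE_HZ


chunk = base64.b64encode(struct.pack("<3h", 0, 16384, -32768)).decode("ascii")
print(decode_pcm16_chunk(chunk))  # [0.0, 0.5, -1.0]
```

If playback sounds like noise or plays at the wrong pitch, the byte order and sample rate assumed here are the first things to compare against what the server actually sends.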
--- a/examples/workflows/multiagent_realtime/run_server.py
+++ b/examples/workflows/multiagent_realtime/run_server.py
@@ -0,0 +1,220 @@
+# -*- coding: utf-8 -*-
+"""A multi-agent realtime voice interaction server using ChatRoom."""
+import asyncio
+import os
+import traceback
+from pathlib import Path
+
+import uvicorn
+from fastapi import FastAPI, WebSocket
+from fastapi.responses import FileResponse
+
+from agentscope import logger
+from agentscope.agent import RealtimeAgent
+from agentscope.message import TextBlock
+from agentscope.pipeline import ChatRoom
+from agentscope.realtime import (
+    ClientEvents,
+    ServerEvents,
+    ClientEventType,
+    DashScopeRealtimeModel,
+    GeminiRealtimeModel,
+    OpenAIRealtimeModel,
+)
+
+app = FastAPI()
+
+
+@app.get("/")
+async def get() -> FileResponse:
+    """Serve the HTML test page."""
+    html_path = Path(__file__).parent / "multi_agent.html"
+    return FileResponse(html_path)
+
+
+@app.get("/model_availability")
+async def model_availability() -> dict:
+    """Check which model API keys are available in environment variables."""
+    return {
+        "dashscope": bool(os.getenv("DASHSCOPE_API_KEY")),
+        "gemini": bool(os.getenv("GEMINI_API_KEY")),
+        "openai": bool(os.getenv("OPENAI_API_KEY")),
+    }
+
+
+async def frontend_receive(
+    websocket: WebSocket,
+    frontend_queue: asyncio.Queue,
+) -> None:
+    """Forward the message received from the agents to the frontend."""
+    try:
+        while True:
+            msg: ServerEvents.EventBase = await frontend_queue.get()
+
+            # Send the message as JSON
+            await websocket.send_json(msg.model_dump())
+
+    except Exception as e:
+        print(f"[ERROR] frontend_receive error: {e}")
+        traceback.print_exc()
+
+
+@app.websocket("/ws/{user_id}/{session_id}")
+async def multi_agent_endpoint(
+    websocket: WebSocket,
+    user_id: str,
+    session_id: str,
+) -> None:
+    """WebSocket endpoint for multi-agent realtime voice interaction."""
+    try:
+        await websocket.accept()
+
+        logger.info(
+            "Connected to WebSocket: user_id=%s, session_id=%s",
+            user_id,
+            session_id,
+        )
+
+        # Create the queue to forward messages to the frontend
+        frontend_queue = asyncio.Queue()
+        asyncio.create_task(
+            frontend_receive(websocket, frontend_queue),
+        )
+
+        # Chat room and agents
+        chat_room = None
+
+        while True:
+            # Handle the incoming messages from the frontend
+            # i.e. ClientEvents
+            data = await websocket.receive_json()
+
+            client_event = ClientEvents.from_json(data)
+
+            if isinstance(
+                client_event,
+                ClientEvents.ClientSessionCreateEvent,
+            ):
+                # Create agents from the given session arguments
+                agent1_name = client_event.config.get("agent1_name", "Agent1")
+                agent1_instructions = client_event.config.get(
+                    "agent1_instructions",
+                    "You are a helpful assistant.",
+                )
+
+                agent2_name = client_event.config.get("agent2_name", "Agent2")
+                agent2_instructions = client_event.config.get(
+                    "agent2_instructions",
+                    "You are a helpful assistant.",
+                )
+
+                model_provider = client_event.config.get(
+                    "model_provider",
+                    "dashscope",
+                )
+
+                # Create the appropriate model based on provider
+                if model_provider == "dashscope":
+                    model1 = DashScopeRealtimeModel(
+                        model_name="qwen3-omni-flash-realtime",
+                        api_key=os.getenv("DASHSCOPE_API_KEY"),
+                        voice="Dylan",
+                        enable_input_audio_transcription=False,
+                    )
+                    model2 = DashScopeRealtimeModel(
+                        model_name="qwen3-omni-flash-realtime",
+                        api_key=os.getenv("DASHSCOPE_API_KEY"),
+                        voice="Peter",
+                        enable_input_audio_transcription=False,
+                    )
+
+                elif model_provider == "gemini":
+                    model1 = GeminiRealtimeModel(
+                        model_name=(
+                            "gemini-2.5-flash-native-audio-preview-09-2025"
+                        ),
+                        api_key=os.getenv("GEMINI_API_KEY"),
+                        voice="Puck",
+                    )
+                    model2 = GeminiRealtimeModel(
+                        model_name=(
+                            "gemini-2.5-flash-native-audio-preview-09-2025"
+                        ),
+                        api_key=os.getenv("GEMINI_API_KEY"),
+                        voice="Charon",
+                    )
+
+                elif model_provider == "openai":
+                    model1 = OpenAIRealtimeModel(
+                        model_name="gpt-4o-realtime-preview",
+                        api_key=os.getenv("OPENAI_API_KEY"),
+                        voice="alloy",
+                    )
+                    model2 = OpenAIRealtimeModel(
+                        model_name="gpt-4o-realtime-preview",
+                        api_key=os.getenv("OPENAI_API_KEY"),
+                        voice="echo",
+                    )
+                else:
+                    raise ValueError(
+                        f"Unsupported model provider: {model_provider}",
+                    )
+
+                # Create the first agent
+                agent1 = RealtimeAgent(
+                    name=agent1_name,
+                    sys_prompt=agent1_instructions,
+                    model=model1,
+                )
+
+                # Create the second agent
+                agent2 = RealtimeAgent(
+                    name=agent2_name,
+                    sys_prompt=agent2_instructions,
+                    model=model2,
+                )
+
+                # Create chat room with both agents
+                chat_room = ChatRoom(agents=[agent1, agent2])
+
+                await chat_room.start(frontend_queue)
+
+                # Send session_created event to frontend
+                await websocket.send_json(
+                    ServerEvents.ServerSessionCreatedEvent(
+                        session_id=session_id,
+                    ).model_dump(),
+                )
+
+                await agent1.model.send(
+                    TextBlock(
+                        type="text",
+                        text="<system>Now you can talk.</system>",
+                    ),
+                )
+
+            elif client_event.type == ClientEventType.CLIENT_SESSION_END:
+                # End the session with the chat room
+                if chat_room:
+                    await chat_room.stop()
+                    chat_room = None
+
+            else:
+                # Forward other events to the chat room
+                if chat_room:
+                    await chat_room.handle_input(client_event)
+
+    except Exception as e:
+        print(f"[ERROR] WebSocket endpoint error: {e}")
+        traceback.print_exc()
+        raise
+
+
+if __name__ == "__main__":
+    uvicorn.run(
+        "run_server:app",
+        host="localhost",
+        port=8000,
+        reload=True,
+        log_level="info",
+    )
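One thing to watch in `multi_agent_endpoint`: the task returned by `asyncio.create_task` is not stored, and nothing cancels it when the socket closes. The asyncio documentation advises saving a reference to the task so it cannot disappear mid-execution. A minimal sketch of that pattern follows, using a stand-in forwarder instead of a WebSocket; `forward`, `session`, and the `sent` list are hypothetical, so this shows the technique and is not the server's code.

```python
import asyncio


async def forward(queue: asyncio.Queue, sent: list) -> None:
    """Stand-in for frontend_receive: drain the queue until cancelled."""
    while True:
        sent.append(await queue.get())


async def session(events: list) -> list:
    """Run one session: forward every event, then shut the forwarder down."""
    queue: asyncio.Queue = asyncio.Queue()
    sent: list = []
    # Keep a reference so the task cannot be garbage-collected mid-run.
    forward_task = asyncio.create_task(forward(queue, sent))
    try:
        for event in events:
            await queue.put(event)
        # Yield to the event loop so the forwarder can drain the queue.
        while not queue.empty():
            await asyncio.sleep(0)
    finally:
        # Runs on normal exit and on errors such as a client disconnect.
        forward_task.cancel()
        await asyncio.gather(forward_task, return_exceptions=True)
    return sent


print(asyncio.run(session(["a", "b"])))  # ['a', 'b']
```

In the endpoint itself, the same `finally` block would also be a natural place to stop a `chat_room` that is still running, so a dropped connection does not leave the agents talking.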