Some checks failed
Pre-commit / run (ubuntu-latest) (push) Has been cancelled
Deploy Sphinx documentation to Pages / build_en (ubuntu-latest, 3.10) (push) Has been cancelled
Deploy Sphinx documentation to Pages / build_zh (ubuntu-latest, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (macos-15, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (macos-15, 3.11) (push) Has been cancelled
Python Unittest Coverage / test (macos-15, 3.12) (push) Has been cancelled
Python Unittest Coverage / test (ubuntu-latest, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (ubuntu-latest, 3.11) (push) Has been cancelled
Python Unittest Coverage / test (ubuntu-latest, 3.12) (push) Has been cancelled
Python Unittest Coverage / test (windows-latest, 3.10) (push) Has been cancelled
Python Unittest Coverage / test (windows-latest, 3.11) (push) Has been cancelled
Python Unittest Coverage / test (windows-latest, 3.12) (push) Has been cancelled
117 lines
3.2 KiB
Markdown
117 lines
3.2 KiB
Markdown
# Realtime Voice Agent Example
|
|
|
|
This example demonstrates how to build a **real-time voice conversation agent** using AgentScope's RealtimeAgent. The agent supports bidirectional voice streaming, enabling natural voice conversations with low latency and real-time audio transcription.
|
|
|
|
## Prerequisites
|
|
|
|
- Python 3.10 or higher
|
|
- Your DashScope API key in an environment variable `DASHSCOPE_API_KEY`
|
|
|
|
Install the required packages:
|
|
|
|
```bash
|
|
uv pip install agentscope fastapi uvicorn websockets
|
|
# or
|
|
# pip install agentscope
|
|
```
|
|
|
|
## Usage
|
|
|
|
### 1. Start the Server
|
|
|
|
Run the FastAPI server:
|
|
|
|
```bash
|
|
cd examples/agent/realtime_voice_agent
|
|
python run_server.py
|
|
```
|
|
|
|
The server will start on `http://localhost:8000` by default.
|
|
|
|
### 2. Open the Web Interface
|
|
|
|
Open your web browser and navigate to:
|
|
|
|
```
|
|
http://localhost:8000
|
|
```
|
|
|
|
You will see a web interface with:
|
|
- Configuration panel (instructions and user name)
|
|
- Voice control buttons (Start Recording, Stop Recording, Disconnect)
|
|
- Video recording button (Start Video Recording)
|
|
- Text input field
|
|
- Message display area
|
|
- Video preview area (when video recording is active)
|
|
|
|
### 3. Start Conversation
|
|
|
|
1. **Configure the Agent** (optional):
|
|
- Modify the "Instructions" to customize the agent's behavior
|
|
- Enter your name in the "User Name" field
|
|
|
|
2. **Start Voice Recording**:
|
|
- Click the "🎤 Start Recording" button
|
|
- Allow microphone access when prompted by your browser
|
|
- Speak naturally to the agent
|
|
- The agent will respond with voice and text
|
|
|
|
3. **Stop Recording**:
|
|
- Click "⏹️ Stop Recording" to pause voice input
|
|
|
|
4. **Video Recording** (Optional):
|
|
- Click the "📹 Start Video Recording" button to start video recording
|
|
- Allow camera access when prompted by your browser
|
|
- The system will automatically capture and send video frames to the server at 1 frame per second (1 fps)
|
|
- A video preview will be displayed while recording
|
|
- Click "🔴 Stop Video Recording" to stop recording
|
|
- **Note**: Video recording requires an active voice chat session. Please start voice chat first before starting video recording.
|
|
|
|
## Switching Models
|
|
|
|
AgentScope supports multiple realtime voice models. By default, this example uses DashScope's `qwen3-omni-flash-realtime` model, but you can easily switch to other providers.
|
|
|
|
### Supported Models
|
|
|
|
- **GeminiRealtimeModel**
|
|
- **OpenAIRealtimeModel**
|
|
|
|
### How to Switch Models
|
|
|
|
Edit `run_server.py` and replace the model initialization code:
|
|
|
|
**For OpenAI:**
|
|
|
|
```python
|
|
from agentscope.realtime import OpenAIRealtimeModel
|
|
|
|
agent = RealtimeAgent(
|
|
name="Friday",
|
|
sys_prompt=sys_prompt,
|
|
model=OpenAIRealtimeModel(
|
|
model_name="gpt-4o-realtime-preview",
|
|
api_key=os.getenv("OPENAI_API_KEY"),
|
|
voice="alloy", # Options: "alloy", "echo", "marin", "cedar"
|
|
),
|
|
)
|
|
```
|
|
|
|
**For Gemini:**
|
|
|
|
```python
|
|
from agentscope.realtime import GeminiRealtimeModel
|
|
|
|
agent = RealtimeAgent(
|
|
name="Friday",
|
|
sys_prompt=sys_prompt,
|
|
model=GeminiRealtimeModel(
|
|
model_name="gemini-2.5-flash-native-audio-preview-09-2025",
|
|
api_key=os.getenv("GEMINI_API_KEY"),
|
|
voice="Puck", # Options: "Puck", "Charon", "Kore", "Fenrir"
|
|
),
|
|
)
|
|
```
|
|
|
|
Don't forget to set the corresponding API key environment variable before starting the server!
|
|
|