chore: initialize sandbox and overwrite remote content

2026-03-02 22:32:27 +08:00
commit a64378956a
584 changed files with 93604 additions and 0 deletions
--- a/examples/functionality/short_term_memory/memory_compression/README.md
+++ b/examples/functionality/short_term_memory/memory_compression/README.md
@@ -0,0 +1,317 @@
+# MemoryWithCompress
+
+- [ ] TODO: The memory module with compression will be added to the agentscope library in the future.
+
+## Overview
+
+MemoryWithCompress is a memory management system designed for AgentScope's `ReActAgent`. It automatically compresses conversation history when the memory size exceeds a specified token limit, using a Large Language Model (LLM) to create concise summaries that preserve key information. This allows agents to maintain context over long conversations while staying within token constraints.
+
+The system maintains two separate storage mechanisms:
+- **`chat_history_storage`**: Stores the complete, unmodified conversation history (uses `MessageStorageBase` interface)
+- **`memory_storage`**: Stores messages that may be compressed when token limits are exceeded (uses `MessageStorageBase` interface)
+
+Both storage mechanisms are abstracted through the `MessageStorageBase` interface, allowing for flexible storage backends. By default, `InMemoryMessageStorage` is used for both.
+
+## Core Features
+
+### Automatic Memory Compression
+- **Token-based Triggering**: Automatically compresses memory when the total token count exceeds `max_token`
+- **LLM-Powered Summarization**: Uses an LLM to intelligently compress conversation history while preserving essential information
+- **Structured Output**: Uses Pydantic schemas to ensure consistent compression format
+
+### Dual Storage System
+- **Complete History**: Maintains original, unmodified messages in `chat_history_storage` for reference
+- **Compressed Memory**: Stores potentially compressed messages in `memory_storage` for efficient context management
+
+### Flexible Memory Management
+- **Filtering Support**: Provides `filter_func` parameter for custom memory filtering
+- **Recent N Retrieval**: Supports retrieving only the most recent N messages
+- **State Persistence**: Includes `state_dict()` and `load_state_dict()` methods for saving and loading memory state
+- **Storage Abstraction**: Uses `MessageStorageBase` interface for flexible storage backends
+- **Compression Triggers**: Supports both token-based and custom trigger functions for compression
+- **Compression Timing Control**: Configurable compression on add (`compression_on_add`) and get (`compression_on_get`) operations
+
+## File Structure
+
+```
+memory_with_compression/
+├── README.md                   # This documentation file
+├── main.py                     # Example demonstrating MemoryWithCompress usage
+├── _memory_with_compress.py    # Core MemoryWithCompress implementation
+├── _memory_storage.py          # Storage abstraction layer (MessageStorageBase, InMemoryMessageStorage)
+└── _mc_utils.py                # Utility functions (formatting, token counting, compression schema)
+```
+
+## Prerequisites
+
+### Clone the AgentScope Repository
+This example depends on AgentScope. Please clone the full repository to your local machine.
+
+### Install Dependencies
+**Recommended**: Python 3.10+
+
+Install the required dependencies:
+```bash
+pip install agentscope
+```
+
+### API Keys
+This example uses DashScope APIs by default. You need to set your API key as an environment variable:
+```bash
+export DASHSCOPE_API_KEY='YOUR_API_KEY'
+```
+
+You can easily switch to other models by modifying the configuration in `main.py`.
+
+## How It Works
+
+### 1. Memory Addition Flow
+1. **Message Input**: New messages are added via the async `add()` method
+2. **Dual Storage**: Messages are deep-copied and added to both `chat_history_storage` and `memory_storage`
+3. **Optional Compression on Add**: If `compression_on_add=True`, compression may be triggered immediately after adding messages
+
+### 2. Memory Retrieval and Compression Flow
+When `get_memory()` is called (if `compression_on_get=True`):
+1. **Token Counting**: The system calculates the total token count of all messages in `memory_storage`
+2. **Compression Check**:
+   - First checks if token count exceeds `max_token` (automatic compression)
+   - Then checks if `compression_trigger_func` returns `True` (custom trigger)
+3. **LLM Compression**: If compression is needed, all messages in `memory_storage` are sent to the LLM with a compression prompt
+4. **Structured Output**: The LLM returns a structured response containing the compressed summary
+5. **Memory Replacement**: The entire `memory_storage` is updated with the compressed message(s)
+6. **Filtering & Selection**: Optional filtering and recent_n selection are applied
+7. **Return**: The processed memory is returned
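The retrieval steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `_memory_with_compress.py`: messages are plain dicts instead of `Msg` objects, `summarize` is a hypothetical placeholder for the LLM call, and the token count uses the character-length fallback described for `token_counter=None`. Filtering and `recent_n` selection are left out.

```python
import asyncio
import json
from typing import Awaitable, Callable, Optional

Message = dict  # stand-in for agentscope's Msg, used only in this sketch


def fallback_token_count(messages: list[Message]) -> int:
    """Character count of the JSON dump (the documented fallback counter)."""
    return len(json.dumps(messages, ensure_ascii=False))


async def summarize(messages: list[Message]) -> list[Message]:
    """Hypothetical placeholder for the LLM compression call."""
    text = " | ".join(str(m["content"])[:20] for m in messages)
    wrapped = f"<compressed_memory>{text}</compressed_memory>"
    return [{"role": "assistant", "content": wrapped}]


async def get_memory_sketch(
    memory: list[Message],
    max_token: int,
    trigger: Optional[Callable[[list[Message]], Awaitable[bool]]] = None,
) -> list[Message]:
    # Steps 1-2: check the token limit first, then the custom trigger.
    needs_compression = fallback_token_count(memory) > max_token
    if not needs_compression and trigger is not None:
        needs_compression = await trigger(memory)
    # Steps 3-5: compress and replace the whole memory in place.
    if needs_compression:
        memory[:] = await summarize(memory)
    # Step 7: return a copy of the (possibly compressed) memory.
    return list(memory)
```

Calling `asyncio.run(get_memory_sketch(history, max_token=100))` on a long `history` list replaces it with a single summary message, while a short history is returned unchanged.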
+
+### 3. Compression Process
+The compression uses a structured output approach:
+- **Prompt**: Instructs the LLM to summarize conversation history while preserving key information
+- **Customizable Prompt**: Supports `customized_compression_prompt` parameter for custom prompt templates
+- **Schema**: Uses `MemoryCompressionSchema` (Pydantic model) to ensure consistent output format
+- **Output Format**: Returns a message with content wrapped in `<compressed_memory>` tags
+- **Async Support**: All compression operations are asynchronous
+
+## Usage Examples
+
+### Running the Example
+To see `MemoryWithCompress` in action, run the example script:
+```bash
+python ./main.py
+```
+
+### Basic Initialization
+Here is a snippet from `main.py` showing how to set up the agent and memory:
+
+```python
+import os
+
+from agentscope.agent import ReActAgent
+from agentscope.model import DashScopeChatModel
+from agentscope.formatter import DashScopeChatFormatter
+from agentscope.token import OpenAITokenCounter
+from agentscope.message import Msg
+from _memory_with_compress import MemoryWithCompress
+
+# 1. Create the model for agent and memory compression
+model = DashScopeChatModel(
+    api_key=os.environ.get("DASHSCOPE_API_KEY"),
+    model_name="qwen-max",
+    stream=False,
+)
+
+# 2. Optional: Define a custom compression trigger function
+async def trigger_compression(msgs: list[Msg]) -> bool:
+    # Trigger compression if the number of messages exceeds 2
+    # and the last message is from the assistant
+    return len(msgs) > 2 and msgs[-1].role == "assistant"
+
+# 3. Initialize MemoryWithCompress
+memory_with_compress = MemoryWithCompress(
+    model=model,
+    formatter=DashScopeChatFormatter(),
+    max_token=3000,  # Compress when memory exceeds 3000 tokens
+    token_counter=OpenAITokenCounter(model_name="qwen-max"),
+    compression_trigger_func=trigger_compression,  # Optional custom trigger
+    compression_on_add=False,  # Don't compress on add (default)
+    compression_on_get=True,   # Compress on get (default)
+)
+
+# 4. Initialize ReActAgent with the memory instance
+agent = ReActAgent(
+    name="Friday",
+    sys_prompt="You are a helpful assistant named Friday.",
+    model=model,
+    formatter=DashScopeChatFormatter(),
+    memory=memory_with_compress,
+)
+```
+
+### Custom Compression Function
+You can provide a custom compression function:
+
+```python
+from typing import List
+
+from agentscope.message import Msg
+
+
+async def custom_compress(messages: List[Msg]) -> List[Msg]:
+    # Your custom compression logic
+    # Must return a List[Msg], not a single Msg
+    compressed_content = "..."
+    return [Msg("assistant", compressed_content, "assistant")]
+
+# `model` and `formatter` are assumed to exist already,
+# e.g. created as in "Basic Initialization"
+memory_with_compress = MemoryWithCompress(
+    model=model,
+    formatter=formatter,
+    max_token=300,
+    compress_func=custom_compress,
+)
+```
+
+### Custom Storage Backend
+You can provide custom storage backends by implementing the `MessageStorageBase` interface:
+
+```python
+from _memory_storage import MessageStorageBase
+
+class CustomStorage(MessageStorageBase):
+    # Implement required methods: start, stop, health, add, delete, clear, get, replace, __aenter__, __aexit__
+    ...
+
+memory_with_compress = MemoryWithCompress(
+    model=model,
+    formatter=formatter,
+    max_token=300,
+    chat_history_storage=CustomStorage(),
+    memory_storage=CustomStorage(),
+)
+```
+
+## API Reference
+
+### MemoryWithCompress Class
+
+#### `__init__(...)`
+Initializes the memory system. Key parameters include:
+
+- `model` (ChatModelBase): The LLM model to use for compression
+- `formatter` (FormatterBase): The formatter to use for formatting messages
+- `max_token` (int): The maximum token count for `memory_storage`. Default: 28000. Compression is triggered when exceeded
+- `chat_history_storage` (MessageStorageBase): Storage backend for complete chat history. Default: `InMemoryMessageStorage()`
+- `memory_storage` (MessageStorageBase): Storage backend for compressed memory. Default: `InMemoryMessageStorage()`
+- `token_counter` (Optional[TokenCounterBase]): The token counter for counting tokens. Default: None. If None, it will return the character count of the JSON string representation of messages (i.e., len(json.dumps(messages, ensure_ascii=False))).
+- `compress_func` (Callable[[List[Msg]], Awaitable[List[Msg]]] | None): Custom compression function. Must be async and return `List[Msg]`. If None, uses the default `_compress_memory` method
+- `compression_trigger_func` (Callable[[List[Msg]], Awaitable[bool]] | None): Optional function to trigger compression when token count is below `max_token`. Must be async and return `bool`. If None, compression only occurs when token count exceeds `max_token`
+- `compression_on_add` (bool): Whether to check and compress memory when adding messages. Default: False
+- `compression_on_get` (bool): Whether to check and compress memory when getting messages. Default: True
+- `customized_compression_prompt` (str | None): Optional customized compression prompt template. Should include placeholders: `{max_token}`, `{messages_list_json}`, `{schema_json}`. Default: None (uses default template)
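As an illustration of the three placeholders, the sketch below fills a custom template with `str.format`. Only the placeholder names come from this README. The template wording, the stand-in JSON schema, and the use of `str.format` for substitution are assumptions made for the example.

```python
import json

# Hypothetical template; only the three placeholder names come from the README.
CUSTOM_PROMPT = (
    "Summarize the conversation below in at most {max_token} tokens.\n"
    "Conversation (JSON): {messages_list_json}\n"
    "Reply with JSON matching this schema: {schema_json}"
)

# Hypothetical JSON schema standing in for MemoryCompressionSchema.
schema = {"type": "object", "properties": {"compressed_text": {"type": "string"}}}
messages = [{"role": "user", "content": "Book a flight to Paris."}]

# Assumed substitution mechanism: plain str.format with keyword arguments.
prompt = CUSTOM_PROMPT.format(
    max_token=3000,
    messages_list_json=json.dumps(messages, ensure_ascii=False),
    schema_json=json.dumps(schema),
)
print(prompt)
```

If the template really is filled with `str.format`, any literal braces in a custom template would need to be doubled (`{{` and `}}`).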
+
+#### Main Methods
+
+**`async add(msgs: Union[Sequence[Msg], Msg, None], compress_func=None, compression_trigger_func=None)`**
+- Adds new messages to both `chat_history_storage` and `memory_storage`
+- Messages are deep-copied to avoid modifying originals
+- Raises `TypeError` if non-Msg objects are provided
+- Parameters:
+  - `msgs`: Messages to be added
+  - `compress_func` (Optional): Override the instance-level compression function for this call
+  - `compression_trigger_func` (Optional): Override the instance-level trigger function for this call
+- If `compression_on_add=True`, may trigger compression after adding
+
+**`async direct_update_memory(msgs: Union[Sequence[Msg], Msg, None])`**
+- Directly updates the `memory_storage` with new messages (does not update `chat_history_storage`)
+- Useful for replacing memory content directly
+
+**`async get_memory(recent_n=None, filter_func=None, compress_func=None, compression_trigger_func=None)`**
+- Retrieves memory content, automatically compressing if token limit is exceeded (if `compression_on_get=True`)
+- Parameters:
+  - `recent_n` (Optional[int]): Return only the most recent N messages
+  - `filter_func` (Optional[Callable[[int, Msg], bool]]): Custom filter function that takes (index, message) and returns bool
+  - `compress_func` (Optional): Override the instance-level compression function for this call
+  - `compression_trigger_func` (Optional): Override the instance-level trigger function for this call
+- Returns: `list[Msg]` - The memory content (potentially compressed)
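The effect of `filter_func` and `recent_n` can be sketched on plain dicts. The function name `select_messages` is hypothetical, and applying the filter before the `recent_n` cut is an assumption; the README only says both are applied after compression.

```python
from typing import Callable, Optional

Message = dict  # stand-in for agentscope's Msg, used only in this sketch


def select_messages(
    messages: list[Message],
    recent_n: Optional[int] = None,
    filter_func: Optional[Callable[[int, Message], bool]] = None,
) -> list[Message]:
    selected = list(messages)
    # filter_func receives (index, message) and keeps messages where it returns True.
    if filter_func is not None:
        selected = [m for i, m in enumerate(selected) if filter_func(i, m)]
    # Keep only the most recent N messages (assumed to run after filtering).
    if recent_n is not None and recent_n > 0:
        selected = selected[-recent_n:]
    return selected


history = [
    {"role": "user", "content": "a"},
    {"role": "assistant", "content": "b"},
    {"role": "user", "content": "c"},
    {"role": "assistant", "content": "d"},
    {"role": "user", "content": "e"},
]
# Keep only user messages, then the two most recent of those.
recent_user = select_messages(history, recent_n=2, filter_func=lambda i, m: m["role"] == "user")
print([m["content"] for m in recent_user])  # ['c', 'e']
```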
+
+**`async delete(indices: Union[Iterable[int], int])`**
+- Deletes memory fragments from `memory_storage` (note: does not delete from `chat_history_storage`)
+- Indices can be a single int or an iterable of ints
+
+**`async size() -> int`**
+- Returns the number of messages in `chat_history_storage`
+
+**`async clear()`**
+- Clears all memory from both `chat_history_storage` and `memory_storage`
+
+**`state_dict() -> dict`**
+- Returns a dictionary containing the serialized state:
+  - `chat_history_storage`: List of message dictionaries from chat history
+  - `memory_storage`: List of message dictionaries from memory
+  - `max_token`: The max_token setting
+- Note: This method handles async operations internally, so it can be called from both sync and async contexts
+
+**`load_state_dict(state_dict: dict, strict: bool = True)`**
+- Loads memory state from a dictionary
+- Restores `chat_history_storage`, `memory_storage`, and `max_token` settings
+- Note: This method handles async operations internally, so it can be called from both sync and async contexts
+
+**`async retrieve(*args, **kwargs)`**
+- Not implemented. Use `get_memory()` instead.
+- Raises `NotImplementedError`
+
+## Internal Methods
+
+**`async _compress_memory(msgs: List[Msg]) -> List[Msg]`**
+- Internal method that compresses messages using the LLM
+- Uses structured output with `MemoryCompressionSchema`
+- Returns a `List[Msg]` containing the compressed summary (typically a single message)
+- Supports both streaming and non-streaming models
+
+**`async _check_length_and_compress(compress_func=None) -> bool`**
+- Checks if memory token count exceeds `max_token` and compresses if needed
+- Returns `True` if compression was triggered, `False` otherwise
+
+**`async check_and_compress(compress_func=None, compression_trigger_func=None, memory=None) -> tuple[bool, List[Msg]]`**
+- Checks if compression is needed based on `compression_trigger_func`
+- Returns a tuple: (was_compressed: bool, compressed_memory: List[Msg])
+- If `memory` is provided, checks that instead of `memory_storage`
+
+## Utility Functions
+
+The `_mc_utils.py` module provides:
+
+- **`format_msgs(msgs)`**: Formats a list of `Msg` objects into a list of dictionaries
+- **`async count_words(token_counter, text)`**: Counts tokens in text (supports both string and list[dict] formats). Must be awaited.
+- **`MemoryCompressionSchema`**: Pydantic model for structured compression output
+- **`DEFAULT_COMPRESSION_PROMPT_TEMPLATE`**: Default prompt template for compression (includes placeholders: `{max_token}`, `{messages_list_json}`, `{schema_json}`)
+
+## Storage Abstraction
+
+The `_memory_storage.py` module provides:
+
+- **`MessageStorageBase`**: Abstract base class for message storage backends
+  - Required async methods: `start()`, `stop()`, `health()`, `add()`, `delete()`, `clear()`, `get()`, `replace()`, `__aenter__()`, `__aexit__()`
+- **`InMemoryMessageStorage`**: Default in-memory implementation
+  - Stores messages in a simple list
+  - Suitable for most use cases
+
+## Best Practices
+
+- **Token Limit Selection**: Choose `max_token` based on your model's context window and typical conversation length
+- **Compression Timing**:
+  - Set `compression_on_get=True` (default) for compression during retrieval
+  - Set `compression_on_add=False` (default) to avoid compression during add operations, as it may not complete before `get_memory()` is called
+- **Async Operations**: All main methods are async, so use `await` when calling them
+- **State Persistence**: Use `state_dict()` and `load_state_dict()` to save/restore conversation state between sessions
+- **Custom Compression**: For domain-specific compression needs, implement a custom `compress_func` (must be async and return `List[Msg]`)
+- **Compression Triggers**: Use `compression_trigger_func` for custom compression logic beyond token limits (e.g., compress after N messages, compress on specific conditions)
+- **Storage Backends**: Implement custom `MessageStorageBase` subclasses for persistent storage (e.g., database, file system)
+
+## Troubleshooting
+
+- **Compression Not Triggering**:
+  - Check that `compression_on_get=True` if you expect compression during retrieval
+  - Verify that `max_token` is set appropriately
+  - Ensure `get_memory()` is being called (and awaited)
+  - If using `compression_trigger_func`, verify it returns `True` when compression should occur
+- **Structured Output Errors**: Ensure your model supports structured output (e.g., DashScope models with `structured_model` parameter)
+- **Token Counting Issues**: Verify that your `token_counter` is compatible with your model and correctly configured
+- **Async/Await Errors**: Remember that most methods are async - use `await` when calling them
+- **Storage Issues**: If using custom storage backends, ensure all required methods are properly implemented and async
+
+## Reference
+
+- [AgentScope Documentation](https://github.com/agentscope-ai/agentscope)
+- [Pydantic Documentation](https://docs.pydantic.dev/)
--- a/examples/functionality/short_term_memory/memory_compression/main.py
+++ b/examples/functionality/short_term_memory/memory_compression/main.py
@@ -0,0 +1,46 @@
+# -*- coding: utf-8 -*-
+"""The main entry point of the MemoryWithCompress example."""
+import asyncio
+import os
+from agentscope.agent import ReActAgent, UserAgent
+from agentscope.formatter import DashScopeChatFormatter
+from agentscope.model import DashScopeChatModel
+from agentscope.token import CharTokenCounter
+
+
+async def main() -> None:
+    """The main entry point of the MemoryWithCompress example."""
+
+    # Create the agent with built-in memory compression enabled
+    agent = ReActAgent(
+        name="Friday",
+        sys_prompt="You are a helpful assistant named Friday.",
+        model=DashScopeChatModel(
+            api_key=os.getenv("DASHSCOPE_API_KEY"),
+            model_name="qwen3-max",
+        ),
+        formatter=DashScopeChatFormatter(),
+        compression_config=ReActAgent.CompressionConfig(
+            enable=True,
+            agent_token_counter=CharTokenCounter(),
+            # We set a small trigger threshold for demonstration purposes.
+            trigger_threshold=1000,
+            keep_recent=3,
+        ),
+    )
+    user = UserAgent("User")
+
+    # Simulate a conversation to trigger memory compression
+    msg = None
+    while True:
+        msg = await user(msg)
+        if msg.get_text_content() == "exit":
+            break
+        msg = await agent(msg)
+
+    print("The memory of the agent:")
+    for msg in await agent.memory.get_memory():
+        print(msg.to_dict(), end="\n")
+
+
+asyncio.run(main())
--- a/examples/functionality/short_term_memory/reme/README.md
+++ b/examples/functionality/short_term_memory/reme/README.md
@@ -0,0 +1,479 @@
+# ReMe Short-Term Memory in AgentScope
+
+This example demonstrates how to
+
+- use ReMeShortTermMemory to provide automatic working memory management for AgentScope agents,
+- handle long conversation contexts with intelligent summarization and compaction,
+- integrate short-term memory with ReAct agents for efficient tool usage and context management, and
+- configure DashScope models for memory operations.
+
+## Why Short-Term Memory?
+
+### The Challenge: From Prompt Engineering to Context Engineering
+
+As AI agents evolved from simple chatbots to sophisticated autonomous systems, the focus shifted from "prompt engineering" to "context engineering". While prompt engineering focused on crafting effective instructions for language models, context engineering addresses a more fundamental challenge: **managing the ever-growing conversation and tool execution history that agents accumulate**.
+
+### The Core Problem: Context Explosion
+
+Agentic systems work by binding LLMs with tools and running them in a loop where the agent decides which tools to call and feeds results back into the message history. This creates a snowball effect:
+
+- **Rapid Growth**: A seemingly simple task can trigger 50+ tool calls, with production agents often running hundreds of conversation turns
+- **Large Outputs**: Each tool call can return substantial text, consuming massive amounts of tokens
+- **Memory Pressure**: The context window quickly fills up as messages and tool results accumulate chronologically
+
+### The Consequence: Context Rot
+
+When context grows too large, model performance degrades significantly—a phenomenon known as **"context rot"**:
+
+- **Repetitive Responses**: The model starts generating redundant or circular answers
+- **Slower Reasoning**: Inference becomes noticeably slower as context length increases
+- **Quality Degradation**: Overall response quality and coherence decline
+- **Lost Focus**: The model struggles to identify relevant information in the bloated context
+
+### The Fundamental Paradox
+
+Agents face a critical tension:
+
+- **Need Rich Context**: Agents require comprehensive historical information to make informed decisions
+- **Suffer from Large Context**: Excessive context causes performance degradation and inefficiency
+
+**Context management aims to keep "just enough" information in the window**—sufficient for effective decision-making while leaving room for retrieval and expansion, without overwhelming the model.
+
+### Why Short-Term Memory Management Matters
+
+Effective short-term memory management is essential for:
+
+1. **Maintaining Performance**: Keeping context within optimal size prevents quality degradation
+2. **Enabling Long-Running Tasks**: Agents can handle complex, multi-step workflows without hitting context limits
+3. **Cost Efficiency**: Reducing token usage directly lowers API costs
+4. **Preserving Reasoning Quality**: Clean, focused context helps models maintain coherent reasoning chains
+5. **Scalability**: Proper memory management allows agents to scale to production workloads
+
+### The Solution: Intelligent Context Management
+
+ReMeShortTermMemory implements proven context management strategies:
+
+- **Context Offloading**: Moving large tool outputs to external storage while keeping references
+- **Context Reduction**: Compacting tool results into minimal representations and summarizing when necessary
+- **Smart Retention**: Keeping recent messages intact to maintain continuity and provide usage examples
+- **Automatic Triggering**: Monitoring token usage and applying strategies before performance degrades
+
+By implementing these strategies, ReMeShortTermMemory enables agents to handle arbitrarily long conversations and complex tasks while maintaining optimal performance throughout.
+
+## Prerequisites
+
+- Python 3.10 or higher
+- DashScope API key from Alibaba Cloud
+
+
+## QuickStart
+
+Install agentscope and ensure you have a valid DashScope API key in your environment variables.
+
+> Note: The example is built with the DashScope chat model. If you want to use OpenAI models instead,
+> modify the model initialization in the example code accordingly.
+
+```bash
+# Install agentscope from source
+cd {PATH_TO_AGENTSCOPE}
+pip install -e .
+# Install dependencies
+pip install reme-ai python-dotenv
+```
+
+Set up your API key:
+
+```bash
+export DASHSCOPE_API_KEY='YOUR_API_KEY'
+```
+
+Or create a `.env` file:
+
+```bash
+DASHSCOPE_API_KEY=YOUR_API_KEY
+```
+
+Run the example:
+
+```bash
+python short_term_memory_example.py
+```
+
+The example will:
+1. Initialize a ReMeShortTermMemory instance with DashScope models
+2. Demonstrate automatic memory compaction for long tool responses
+3. Show integration with ReActAgent for context-aware conversations
+4. Use grep and read_file tools to search and retrieve information from files
+
+## Key Features
+
+- **Automatic Memory Management**: Intelligent summarization and compaction of working memory to handle long contexts
+- **Tool Response Optimization**: Automatic truncation and summarization of large tool responses to stay within token limits
+- **Flexible Configuration**: Configurable thresholds for compaction ratio, token limits, and recent message retention
+- **ReAct Agent Integration**: Seamless integration with AgentScope's ReActAgent and tool system
+- **Async Operations**: Full async support for non-blocking memory operations
+
+## Basic Usage
+
+This section provides a detailed walkthrough of the `short_term_memory_example.py` code, explaining how each component works together to create an agent with intelligent context management.
+
+### Configuration Parameters
+
+#### `ReMeShortTermMemory` Class Parameters
+
+The `ReMeShortTermMemory` class accepts the following initialization parameters:
+
+- **`model`** (`DashScopeChatModel | OpenAIChatModel | None`): Language model for compression operations. Must be either `DashScopeChatModel` or `OpenAIChatModel`. This model is used for LLM-based compression when generating compact state snapshots. **Required**.
+
+- **`reme_config_path`** (`str | None`): Optional path to ReMe configuration file for custom settings. Use this to provide advanced ReMe configurations beyond the standard parameters. Default: `None`.
+
+- **`working_summary_mode`** (`str`): Strategy for working memory management. Controls how the memory system handles context overflow:
+  - `"compact"`: Only compact verbose tool messages by storing full content externally and keeping short previews in the active context.
+  - `"compress"`: Only apply LLM-based compression to generate compact state snapshots of conversation history.
+  - `"auto"`: First run compaction, then optionally run compression if the compaction ratio exceeds `compact_ratio_threshold`. This is the recommended mode for most use cases.
+
+  Default: `"auto"`.
+
+- **`compact_ratio_threshold`** (`float`): Threshold for compaction effectiveness in AUTO mode. If `(compacted_tokens / original_tokens) > compact_ratio_threshold`, compression is applied after compaction. This ensures compression only runs when compaction alone isn't sufficient. Valid range: 0.0 to 1.0. Default: `0.75`.
+
+- **`max_total_tokens`** (`int`): Maximum token count threshold before compression is triggered. This limit does **not** include `keep_recent_count` messages or system messages, which are always preserved. Should be set to 20%-50% of your model's context window size to leave room for new tool calls and responses. Default: `20000`.
+
+- **`max_tool_message_tokens`** (`int`): Maximum token count for individual tool messages before compaction. Tool messages exceeding this limit are stored externally in files, with only a short preview kept in the active context. This is the maximum tolerable length for a single tool response. Default: `2000`.
+
+- **`group_token_threshold`** (`int | None`): Maximum token count per compression group when splitting messages for LLM compression. When set to a positive integer, long message sequences are split into smaller batches for compression. If `None` or `0`, all messages are compressed in a single group. Use this to control the granularity of compression operations. Default: `None`.
+
+- **`keep_recent_count`** (`int`): Number of most recent messages to preserve without compression or compaction. These messages remain in full in the active context to maintain conversation continuity and provide usage examples for the agent. The example uses `1` for demonstration purposes; **in production, a value of `10` is recommended** to maintain better conversation flow. Default: `10`.
+
+- **`store_dir`** (`str`): Directory path for storing offloaded message content and compressed history files. This is where external files containing full tool responses and compressed message history are saved. The directory will be created automatically if it doesn't exist. Default: `"inmemory"`.
+
+- **`**kwargs`** (`Any`): Additional arguments passed to `ReMeApp` initialization. Use this to pass any extra configuration options supported by the underlying ReMe application.
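To make the compaction idea behind `max_tool_message_tokens` and `store_dir` concrete, here is a minimal sketch that offloads an oversized tool message to a file and keeps a short preview. It is not ReMe's implementation: the function name, the file naming, and the preview format are hypothetical, and character counts stand in for token counts.

```python
import tempfile
from pathlib import Path


def compact_tool_message(
    message: dict,
    max_chars: int,
    store_dir: Path,
    preview_chars: int = 100,
) -> dict:
    """Offload an oversized tool message to disk and keep only a preview."""
    content = message["content"]
    # Only tool messages above the limit are compacted; others pass through untouched.
    if message.get("role") != "tool" or len(content) <= max_chars:
        return message
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / f"{message['tool_call_id']}.txt"
    # The full content is written out, so nothing is lost.
    path.write_text(content, encoding="utf-8")
    preview = content[:preview_chars]
    return {**message, "content": f"{preview}... [full output stored in {path}]"}


with tempfile.TemporaryDirectory() as tmp:
    big = {"role": "tool", "content": "x" * 5000, "tool_call_id": "call_1"}
    compacted = compact_tool_message(big, max_chars=2000, store_dir=Path(tmp))
    print(len(big["content"]), "->", len(compacted["content"]))
```

Because the full output stays on disk, the agent can still recover it later with a retrieval tool such as `read_file`.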
+
+#### Parameter Relationships and Best Practices
+
+- **Token Budget Strategy**: Set `max_total_tokens` to 20%-50% of your model's context window. For example, if your model has a 128K context window, set `max_total_tokens` between 25,600 and 64,000 tokens.
+
+- **Compaction vs Compression**:
+  - Compaction is fast and lossless (full content is stored externally)
+  - Compression is slower but more aggressive (uses LLM to summarize)
+  - Use `"auto"` mode to benefit from both strategies
+
+- **Recent Message Retention**: Higher `keep_recent_count` values (e.g., 10) provide better context continuity but consume more tokens. Lower values (e.g., 1) are more aggressive but may lose important recent context.
+
+- **Tool Message Handling**: Adjust `max_tool_message_tokens` based on your typical tool response sizes. If your tools frequently return large outputs (e.g., file contents, API responses), consider a higher threshold or ensure compaction is enabled.
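The token budget guideline and the `"auto"` mode decision rule can be written out as a short worked example. The helper names are hypothetical; the 20%-50% range and the ratio comparison follow the parameter descriptions above.

```python
def token_budget_range(context_window: int) -> tuple[int, int]:
    """Return the 20%-50% range suggested for max_total_tokens."""
    return context_window * 20 // 100, context_window * 50 // 100


def needs_compression(
    original_tokens: int,
    compacted_tokens: int,
    compact_ratio_threshold: float = 0.75,
) -> bool:
    """In auto mode, compress only if compaction did not shrink the context enough."""
    if original_tokens <= 0:
        return False
    return compacted_tokens / original_tokens > compact_ratio_threshold


print(token_budget_range(128_000))       # (25600, 64000)
print(needs_compression(20_000, 18_000))  # True: ratio 0.9 is above 0.75
print(needs_compression(20_000, 8_000))   # False: ratio 0.4 is below 0.75
```

A ratio close to 1.0 means compaction removed little, which is when the slower LLM-based compression is worth running.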
+
+### Code Flow Diagram
+
+```mermaid
+flowchart TD
+    A[Start: Load Environment] --> B[Create Toolkit]
+    B --> C[Register Tools: grep & read_file]
+    C --> D[Initialize LLM Model]
+    D --> E[Create ReMeShortTermMemory]
+    E --> F[Enter Async Context Manager]
+    F --> G[Add Initial Messages with Large Tool Response]
+    G --> H[Memory Auto-Compacts Large Content]
+    H --> I[Create ReActAgent with Memory]
+    I --> J[User Sends Query]
+    J --> K[Agent Uses Tools to Search/Read]
+    K --> L[Tool Responses Added to Memory]
+    L --> M{Memory Token Limit?}
+    M -->|Exceeded| N[Auto-Compact/Summarize]
+    M -->|OK| O[Agent Generates Response]
+    N --> O
+    O --> P[Return Response to User]
+    P --> Q[Exit Context Manager]
+    Q --> End[End]
+
+    style H fill:#e1f5ff
+    style N fill:#ffe1e1
+    style O fill:#e1ffe1
+```
+
+### Step-by-Step Code Walkthrough
+
+The example demonstrates a complete workflow from tool registration to agent interaction. Here's a detailed breakdown:
+
+#### 1. Environment Setup and Imports
+
+```python
+import asyncio
+import os
+from dotenv import load_dotenv
+
+load_dotenv()
+```
+
+The code starts by loading environment variables (including the DashScope API key) from a `.env` file.
+
+#### 2. Tool Registration
+
+The example defines two custom tools that demonstrate how to integrate retrieval operations:
+
+**`grep` Tool**: Searches for patterns in files using regular expressions
+```python
+from agentscope.message import TextBlock
+from agentscope.tool import ToolResponse
+
+
+async def grep(file_path: str, pattern: str, limit: str) -> ToolResponse:
+    """A powerful search tool for finding patterns in files..."""
+    from reme_ai.retrieve.working import GrepOp
+
+    op = GrepOp()
+    await op.async_call(file_path=file_path, pattern=pattern, limit=limit)
+    return ToolResponse(
+        content=[TextBlock(type="text", text=op.output)],
+    )
+```
+
+**`read_file` Tool**: Reads specific line ranges from files
+```python
+async def read_file(file_path: str, offset: int, limit: int) -> ToolResponse:
+    """Reads and returns the content of a specified file..."""
+    from reme_ai.retrieve.working import ReadFileOp
+
+    op = ReadFileOp()
+    await op.async_call(file_path=file_path, offset=offset, limit=limit)
+    return ToolResponse(
+        content=[TextBlock(type="text", text=op.output)],
+    )
+```
+
+> **Important Note on Tool Replaceability**:
+> - The `grep` and `read_file` tools shown here are **example implementations** using ReMe's built-in operations
+> - You can **replace them with your own retrieval tools**, such as:
+>   - Vector database embedding retrieval (e.g., ChromaDB, Pinecone, Weaviate)
+>   - Web search APIs (e.g., Google Search, Bing Search)
+>   - Database query tools (e.g., SQL queries, MongoDB queries)
+>   - Custom domain-specific search solutions
+> - Similarly, the **offline write operations** (used internally by ReMeShortTermMemory to store compacted content) can be customized by modifying the `write_text_file` function in AgentScope's tool system
+> - The key requirement is that your tools return `ToolResponse` objects with appropriate content blocks
+
+#### 3. LLM Model Initialization
+
+```python
+llm = DashScopeChatModel(
+    model_name="qwen3-coder-30b-a3b-instruct",
+    api_key=os.environ.get("DASHSCOPE_API_KEY"),
+    stream=False,
+    generate_kwargs={
+        "temperature": 0.001,
+        "seed": 0,
+    },
+)
+```
+
+The model is configured with a near-zero temperature and a fixed seed so that responses are as consistent and reproducible as possible. The same model is used for both agent reasoning and memory summarization.
+
+#### 4. Short-Term Memory Initialization
+
+```python
+short_term_memory = ReMeShortTermMemory(
+    model=llm,
+    working_summary_mode="auto",           # Automatic memory management
+    compact_ratio_threshold=0.75,          # Trigger compaction at 75% capacity
+    max_total_tokens=20000,                # Set to 20%-50% of model's context window
+    max_tool_message_tokens=2000,          # Maximum tolerable tool response length
+    group_token_threshold=None,            # Max tokens per LLM compression batch; None means no splitting
+    keep_recent_count=1,                   # Keep 1 recent message intact (set to 1 for demo; use 10 in production)
+    store_dir="inmemory",                  # Storage directory for offloaded content
+)
+```
+
+This configuration enables automatic memory management that will:
+- Monitor token usage
+- Automatically compact large tool responses when they exceed `max_tool_message_tokens`
+- Trigger summarization when total tokens exceed `max_total_tokens` and compaction ratio exceeds `compact_ratio_threshold`
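
The ratio check in `auto` mode can be sketched in plain Python. This is a minimal illustration, not ReMe's actual implementation: the helper name `should_compress` and its arguments are hypothetical. Only the rule itself comes from the `compact_ratio_threshold` description (compression runs when compacted tokens divided by original tokens exceeds the threshold). The `max_total_tokens` check is left out for brevity.

```python
def should_compress(
    original_tokens: int,
    compacted_tokens: int,
    compact_ratio_threshold: float = 0.75,
) -> bool:
    """Hypothetical helper: decide whether auto mode should also compress.

    A ratio close to 1.0 means compaction removed little, so LLM compression
    is still needed. A small ratio means compaction alone was effective.
    """
    if original_tokens <= 0:
        return False  # nothing to compress
    ratio = compacted_tokens / original_tokens
    return ratio > compact_ratio_threshold


print(should_compress(20000, 18000))  # ratio 0.9 exceeds 0.75
print(should_compress(20000, 5000))   # ratio 0.25 does not
```

With the default threshold of 0.75, compaction that only shrinks the context from 20000 to 18000 tokens would still be followed by compression, while a reduction to 5000 tokens would not.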
+
+#### 5. Async Context Manager Usage
+
+```python
+async with short_term_memory:
+    # All memory operations happen here
+```
+
+The `async with` statement ensures proper initialization and cleanup of memory resources. This is the recommended approach for using `ReMeShortTermMemory`.
+
+#### 6. Simulating Long Context
+
+The example demonstrates memory compaction by adding a large tool response:
+
+```python
+# Read the README; it is repeated 10 times below to simulate a large tool response
+readme_path = "../../../../README.md"
+with open(readme_path, encoding="utf-8") as f:
+    readme_content = f.read()
+
+memories = [
+    {
+        "role": "user",
+        "content": "Search for project information",
+    },
+    {
+        "role": "assistant",
+        "content": None,
+        "tool_calls": [...],  # Tool call metadata
+    },
+    {
+        "role": "tool",
+        "content": readme_content * 10,  # Large tool response (10x README)
+        "tool_call_id": "call_6596dafa2a6a46f7a217da",
+    },
+]
+
+await short_term_memory.add(
+    ReMeShortTermMemory.list_to_msg(memories),
+    allow_duplicates=True,
+)
+```
+
+When this large content is added, `ReMeShortTermMemory` will:
+1. Detect that the tool response exceeds `max_tool_message_tokens` (the maximum tolerable tool response length, set to 2000 in this example)
+2. Automatically compact it by storing the full content in an external file
+3. Keep only a short preview in the active memory
+4. Do all of this transparently, without manual intervention
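
The compaction steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not ReMe's implementation: `estimate_tokens` is a crude stand-in for a real tokenizer (roughly 4 characters per token), and `compact_tool_message`, `preview_chars`, and the file name `tool_result_0.txt` are hypothetical.

```python
import tempfile
from pathlib import Path


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer (assumption: ~4 characters per token)."""
    return len(text) // 4


def compact_tool_message(
    content: str,
    store_dir: Path,
    max_tool_message_tokens: int = 2000,
    preview_chars: int = 200,
) -> str:
    """Return small content unchanged; offload large content, keep a preview."""
    if estimate_tokens(content) <= max_tool_message_tokens:
        return content
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / "tool_result_0.txt"  # hypothetical file name
    path.write_text(content, encoding="utf-8")
    return f"{content[:preview_chars]}... [full content stored in {path}]"


with tempfile.TemporaryDirectory() as tmp:
    large = "AgentScope README line\n" * 1000  # about 5,750 estimated tokens
    preview = compact_tool_message(large, Path(tmp))
    print(len(large), "->", len(preview))
```

The preview keeps a pointer to the stored file, which is what lets a retrieval tool such as `grep` or `read_file` recover the full content later.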
+
+#### 7. ReAct Agent Creation
+
+```python
+agent = ReActAgent(
+    name="react",
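+    sys_prompt=(
+        "You are a helpful assistant. "
+        "Tool calls may be cached locally. "
+        "You can first use `Grep` to match keywords or regular expressions to find line numbers, then use `ReadFile` to read the code near that location. "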
+        # ... more instructions
+    ),
+    model=llm,
+    formatter=DashScopeChatFormatter(),
+    toolkit=toolkit,
+    memory=short_term_memory,  # Memory is integrated here
+    max_iters=20,
+)
+```
+
+The agent is configured with:
+- The same LLM that is used for memory operations
+- The toolkit containing `grep` and `read_file` tools
+- The `short_term_memory` instance for automatic context management
+- A system prompt that guides the agent on tool usage patterns
+
+#### 8. Agent Interaction
+
+```python
+msg = Msg(
+    role="user",
+    content="In the project documentation, who is the first author of the agentscope_v1 paper?",
+    name="user",
+)
+msg = await agent(msg)
+print(f"✓ Agent response: {msg.get_text_content()}\n")
+```
+
+When the agent processes this message:
+1. It receives the user query
+2. Decides to use tools (e.g., `grep` to search for "agentscope_v1")
+3. Tool responses are automatically added to memory
+4. If memory grows too large, automatic compaction occurs
+5. The agent generates a response based on the managed context
+6. The response is returned to the user
+
+### Complete Example Code Structure
+
+```python
+async def main() -> None:
+    # 1. Create toolkit and register tools
+    toolkit = Toolkit()
+    toolkit.register_tool_function(grep)
+    toolkit.register_tool_function(read_file)
+
+    # 2. Initialize LLM
+    llm = DashScopeChatModel(...)
+
+    # 3. Create short-term memory
+    short_term_memory = ReMeShortTermMemory(...)
+
+    # 4. Use async context manager
+    async with short_term_memory:
+        # 5. Add initial messages (with large content to demo compaction)
+        await short_term_memory.add(messages, allow_duplicates=True)
+
+        # 6. Create agent with memory
+        agent = ReActAgent(..., memory=short_term_memory, ...)
+
+        # 7. Interact with agent
+        response = await agent(user_message)
+```
+
+### Key Takeaways
+
+1. **Automatic Memory Management**: Memory compaction and summarization happen automatically when thresholds are exceeded
+2. **Tool Integration**: Tools return `ToolResponse` objects that are seamlessly integrated into memory
+3. **Async Context Manager**: Always use `async with short_term_memory:` to ensure proper resource management
+4. **Flexible Tool System**: The `grep` and `read_file` tools are examples—you can replace them with any retrieval mechanism that fits your use case
+5. **Transparent Operation**: Memory management is transparent to the agent—it just sees a clean, focused context
+
+### Using Async Context Manager
+
+`ReMeShortTermMemory` implements the async context manager protocol, which ensures proper initialization and cleanup of resources. There are two ways to use it:
+
+#### Recommended: Using `async with` Statement
+
+The recommended approach is to use the `async with` statement, which automatically handles resource management:
+
+```python
+async with short_term_memory:
+    # Memory is initialized here
+    await short_term_memory.add(messages)
+    response = await agent(msg)
+    # Memory is automatically cleaned up when exiting the block
+```
+
+#### Alternative: Manual `__aenter__` and `__aexit__` Calls
+
+You can also manually call `__aenter__` and `__aexit__` if you need more control:
+
+```python
+# Manually initialize
+await short_term_memory.__aenter__()
+
+try:
+    # Use the memory
+    await short_term_memory.add(messages)
+    response = await agent(msg)
+finally:
+    # Manually cleanup
+    await short_term_memory.__aexit__(None, None, None)
+```
+
+> **Note**: It's recommended to use the `async with` statement as it ensures proper resource cleanup even if an exception occurs.
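
The cleanup guarantee can be seen with a small, self-contained sketch. `DemoResource` is a hypothetical stand-in for any object that implements the async context manager protocol. It is not part of ReMe or AgentScope.

```python
import asyncio


class DemoResource:
    """Hypothetical stand-in for an object with async setup and teardown."""

    def __init__(self) -> None:
        self.started = False

    async def __aenter__(self) -> "DemoResource":
        self.started = True
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        self.started = False  # runs even if the block raised


async def run_demo() -> bool:
    resource = DemoResource()
    try:
        async with resource:
            raise RuntimeError("simulated failure inside the block")
    except RuntimeError:
        pass
    return resource.started


print(asyncio.run(run_demo()))  # False: cleanup ran despite the exception
```

Because `__aexit__` returns `None`, the exception still propagates to the caller after cleanup. The manual `try`/`finally` form above achieves the same effect with more code.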
+
+## Advanced Configuration
+
+You can customize the ReMe config by passing a config path:
+
+```python
+short_term_memory = ReMeShortTermMemory(
+    model=llm,
+    reme_config_path="path/to/your/config.yaml",  # Pass your custom ReMe configuration
+    # ... other parameters
+)
+```
+
+For more configuration options, refer to the [ReMe documentation](https://github.com/agentscope-ai/ReMe).
+
+## What's in the Example
+
+The `short_term_memory_example.py` file demonstrates:
+
+1. **Tool Integration**: Registering `grep` and `read_file` tools for searching and reading files
+2. **Memory Initialization**: Setting up ReMeShortTermMemory with appropriate parameters for handling long contexts
+3. **Long Context Handling**: Adding a large tool response (README content × 10) to demonstrate automatic memory compaction
+4. **ReAct Agent Usage**: Using the agent with short-term memory to answer questions based on retrieved information
+
+## Example Workflow
+
+The example shows a typical workflow:
+
+1. User asks to search for project information
+2. Agent uses `grep` tool to find relevant content
+3. Agent uses `read_file` tool to read specific sections
+4. Large tool responses are automatically compacted by the memory system
+5. Agent answers the user's question based on the retrieved information
+
--- a/examples/functionality/short_term_memory/reme/reme_short_term_memory.py
+++ b/examples/functionality/short_term_memory/reme/reme_short_term_memory.py
@@ -0,0 +1,349 @@
+# -*- coding: utf-8 -*-
+"""ReMe-based short-term memory implementation for AgentScope."""
+import json
+from pathlib import Path
+from typing import Any, List
+from uuid import uuid4
+
+from agentscope import logger
+from agentscope._utils._common import _json_loads_with_repair
+from agentscope.formatter import DashScopeChatFormatter, OpenAIChatFormatter
+from agentscope.memory import InMemoryMemory
+from agentscope.message import Msg, TextBlock, ToolUseBlock, ToolResultBlock
+from agentscope.model import DashScopeChatModel, OpenAIChatModel
+from agentscope.tool import write_text_file
+
+
+class ReMeShortTermMemory(InMemoryMemory):
+    """Short-term memory implementation using ReMe for message management.
+
+    This class provides automatic working-memory management through a
+    multi-stage pipeline that reduces token usage while preserving
+    essential information:
+
+    1. **Compaction**: Truncates large tool messages by storing full
+       content in external files and keeping only short previews in the
+       active context.
+    2. **Compression**: Uses LLM to generate dense summaries of older
+       conversation history, creating a compact state snapshot.
+    3. **Offload**: Orchestrates compaction and optional compression
+       based on the configured working_summary_mode (COMPACT, COMPRESS,
+       or AUTO).
+
+    The memory management is triggered automatically when `get_memory()`
+    is called, ensuring the agent's context stays within token limits
+    while maintaining access to detailed historical information through
+    external storage.
+    """
+
+    def __init__(
+        self,
+        model: DashScopeChatModel | OpenAIChatModel | None = None,
+        reme_config_path: str | None = None,
+        working_summary_mode: str = "auto",
+        compact_ratio_threshold: float = 0.75,
+        max_total_tokens: int = 20000,
+        max_tool_message_tokens: int = 2000,
+        group_token_threshold: int | None = None,
+        keep_recent_count: int = 10,
+        store_dir: str = "inmemory",
+        **kwargs: Any,
+    ) -> None:
+        """Initialize ReMe-based short-term memory.
+
+        Args:
+            model: Language model for compression operations. Must be
+                either DashScopeChatModel or OpenAIChatModel.
+            reme_config_path: Optional path to ReMe configuration file
+                for custom settings.
+            working_summary_mode: Strategy for working memory management.
+                - "compact": Only compact verbose tool messages by
+                  storing full content externally and keeping short
+                  previews.
+                - "compress": Only apply LLM-based compression to
+                  generate compact state snapshots.
+                - "auto": First run compaction, then optionally run
+                  compression if the compaction ratio exceeds
+                  compact_ratio_threshold.
+                Defaults to "auto".
+            compact_ratio_threshold: Threshold for compaction
+                effectiveness in AUTO mode. If (compacted_tokens /
+                original_tokens) > this threshold, compression is
+                applied. Defaults to 0.75.
+            max_total_tokens: Maximum token count threshold before
+                compression is triggered. Does not include
+                keep_recent_count messages or system messages.
+                Defaults to 20000.
+            max_tool_message_tokens: Maximum token count for individual
+                tool messages before compaction. Tool messages exceeding
+                this are stored externally. Defaults to 2000.
+            group_token_threshold: Maximum token count per compression
+                group when splitting messages for LLM compression. If
+                None or 0, all messages are compressed in a single
+                group. Defaults to None.
+            keep_recent_count: Number of most recent messages to
+                preserve without compression or compaction. These
+                messages remain in full in the active context.
+                Defaults to 10.
+            store_dir: Directory path for storing offloaded message
+                content and compressed history files. Defaults to
+                "inmemory".
+            **kwargs: Additional arguments passed to ReMeApp
+                initialization.
+
+        Raises:
+            ValueError: If model is not a DashScopeChatModel or
+                OpenAIChatModel.
+            ImportError: If reme_ai library is not installed.
+        """
+        super().__init__()
+
+        # Store working memory parameters
+        self.working_summary_mode = working_summary_mode
+        self.compact_ratio_threshold = compact_ratio_threshold
+        self.max_total_tokens = max_total_tokens
+        self.max_tool_message_tokens = max_tool_message_tokens
+        self.group_token_threshold = group_token_threshold
+        self.keep_recent_count = keep_recent_count
+        self.store_dir = store_dir
+
+        config_args = []
+
+        if isinstance(model, DashScopeChatModel):
+            llm_api_base = "https://dashscope.aliyuncs.com/compatible-mode/v1"
+            llm_api_key = model.api_key
+            self.formatter = DashScopeChatFormatter()
+
+        elif isinstance(model, OpenAIChatModel):
+            llm_api_base = str(getattr(model.client, "base_url", None))
+            llm_api_key = str(getattr(model.client, "api_key", None))
+            self.formatter = OpenAIChatFormatter()
+
+        else:
+            raise ValueError(
+                "model must be a DashScopeChatModel or "
+                "OpenAIChatModel instance. "
+                f"Got {type(model).__name__} instead.",
+            )
+
+        llm_model_name = model.model_name
+
+        if llm_model_name:
+            config_args.append(f"llm.default.model_name={llm_model_name}")
+
+        try:
+            from reme_ai import ReMeApp
+        except ImportError as e:
+            raise ImportError(
+                "The 'reme_ai' library is required for ReMe-based "
+                "short-term memory. Please try `pip install reme-ai`, "
+                "and visit: https://github.com/agentscope-ai/ReMe for more "
+                "information.",
+            ) from e
+
+        self.app = ReMeApp(
+            *config_args,
+            llm_api_key=llm_api_key,
+            llm_api_base=llm_api_base,
+            embedding_api_key=llm_api_key,  # fake api key
+            embedding_api_base=llm_api_base,  # fake api base
+            config_path=reme_config_path,
+            **kwargs,
+        )
+
+        self._app_started = False
+
+    async def __aenter__(self) -> "ReMeShortTermMemory":
+        """Async context manager entry.
+
+        Initializes the ReMe application for async operations.
+        """
+        if self.app is not None:
+            await self.app.__aenter__()
+            self._app_started = True
+        return self
+
+    async def __aexit__(
+        self,
+        exc_type: Any = None,
+        exc_val: Any = None,
+        exc_tb: Any = None,
+    ) -> None:
+        """Async context manager exit.
+
+        Cleans up the ReMe application resources.
+        """
+        if self.app is not None:
+            await self.app.__aexit__(exc_type, exc_val, exc_tb)
+        self._app_started = False
+
+    async def get_memory(self) -> list[Msg]:
+        """Retrieve and manage working memory with automatic summarization.
+
+        This method performs the core working-memory management pipeline:
+
+        1. **Format messages**: Converts internal Msg objects to standard
+           message format using the appropriate formatter (DashScope or
+           OpenAI).
+        2. **Execute offload pipeline**: Calls ReMe's
+           summary_working_memory_for_as operation which orchestrates:
+           - Message compaction: Large tool messages are truncated and
+             stored externally with only previews kept in context.
+           - Message compression: If needed (based on
+             working_summary_mode), older messages are compressed using
+             LLM into dense summaries.
+           - File storage: Offloaded content is written to external
+             files for potential retrieval.
+        3. **Update content**: Replaces the internal message list with
+           the managed version, ensuring subsequent operations work with
+           the optimized context.
+
+        The operation respects configuration parameters like
+        max_total_tokens, keep_recent_count, and working_summary_mode to
+        balance context size with information preservation.
+
+        Returns:
+            List of Msg objects representing the managed working memory,
+            with large tool messages compacted and/or older history
+            compressed as needed.
+
+        Note:
+            This method automatically writes offloaded content to files
+            in the configured store_dir. The write_file_dict metadata
+            contains paths and content for all externally stored
+            messages.
+        """
+        messages: list[dict[str, Any]] = await self.formatter.format(
+            msgs=self.content,  # type: ignore[has-type]
+        )
+        for message in messages:
+            if isinstance(message.get("content"), list):
+                msg_content = message.get("content")
+                logger.warning(
+                    "Skipping message with content as list. content=%s",
+                    msg_content,
+                )
+                message["content"] = ""
+
+        # Execute ReMe's working memory offload pipeline
+        # This orchestrates compaction and/or compression based on
+        # working_summary_mode
+        result: dict = await self.app.async_execute(
+            name="summary_working_memory_for_as",
+            messages=messages,
+            working_summary_mode=self.working_summary_mode,
+            compact_ratio_threshold=self.compact_ratio_threshold,
+            max_total_tokens=self.max_total_tokens,
+            max_tool_message_tokens=self.max_tool_message_tokens,
+            group_token_threshold=self.group_token_threshold,
+            keep_recent_count=self.keep_recent_count,
+            store_dir=self.store_dir,
+            chat_id=uuid4().hex,
+        )
+        logger.info(
+            "summary_working_memory_for_as.result=%s",
+            json.dumps(result, ensure_ascii=False, indent=2),
+        )
+
+        # Extract managed messages and file write operations from result
+        messages = result.get("answer", [])
+        write_file_dict: dict = result.get("metadata", {}).get(
+            "write_file_dict",
+            {},
+        )
+        # Write offloaded content to external files
+        # This includes full tool message content and compressed message
+        # history
+        if write_file_dict:
+            for path, content_str in write_file_dict.items():
+                file_dir = Path(path).parent
+                if not file_dir.exists():
+                    file_dir.mkdir(parents=True, exist_ok=True)
+                await write_text_file(path, content_str)
+
+        # Update internal content with managed messages
+        self.content = self.list_to_msg(messages)
+        return self.content
+
+    @staticmethod
+    def list_to_msg(messages: list[dict[str, Any]]) -> list[Msg]:
+        """Convert a list of message dictionaries to Msg objects.
+
+        This method handles the conversion from standard message format
+        (used by ReMe and LLM APIs) back to AgentScope's Msg objects.
+        It properly handles:
+        - Text content for user, system, and assistant messages
+        - Tool result blocks (converting role="tool" to role="system")
+        - Tool use blocks from tool_calls in assistant messages
+
+        Args:
+            messages: List of message dictionaries with role, content,
+                and optional tool_calls or tool-related fields.
+
+        Returns:
+            List of Msg objects with properly structured content blocks.
+        """
+        msg_list: list[Msg] = []
+        for msg_dict in messages:
+            role = msg_dict["role"]
+            content_blocks: List[
+                TextBlock | ToolUseBlock | ToolResultBlock
+            ] = []
+            content = msg_dict.get("content")
+
+            # Convert text content to appropriate content blocks
+            if content:
+                if role in ["user", "system", "assistant"]:
+                    content_blocks.append(TextBlock(type="text", text=content))
+                elif role in ["tool"]:
+                    # Tool messages are converted to system messages with
+                    # ToolResultBlock
+                    role = "system"
+                    content_blocks.append(
+                        ToolResultBlock(
+                            type="tool_result",
+                            name=msg_dict.get("name"),
+                            id=msg_dict.get("tool_call_id"),
+                            output=[TextBlock(type="text", text=content)],
+                        ),
+                    )
+
+            # Convert tool_calls to ToolUseBlock content blocks
+            if msg_dict.get("tool_calls"):
+                for tool_call in msg_dict["tool_calls"]:
+                    # Parse tool arguments with repair for malformed JSON
+                    input_ = _json_loads_with_repair(
+                        tool_call["function"].get(
+                            "arguments",
+                            "{}",
+                        )
+                        or "{}",
+                    )
+                    content_blocks.append(
+                        ToolUseBlock(
+                            type="tool_use",
+                            name=tool_call["function"]["name"],
+                            input=input_,
+                            id=tool_call["id"],
+                        ),
+                    )
+
+            msg_obj = Msg(
+                name=role,
+                content=content_blocks,
+                role=role,
+                metadata=msg_dict.get("metadata"),
+            )
+            msg_list.append(msg_obj)
+        return msg_list
+
+    async def retrieve(self, *args: Any, **kwargs: Any) -> None:
+        """Retrieve operation is not implemented for ReMe short-term memory.
+
+        ReMe focuses on working memory management (compaction and compression)
+        rather than retrieval from long-term storage.
+
+        Raises:
+            NotImplementedError: This operation is not supported.
+        """
+        raise NotImplementedError
--- a/examples/functionality/short_term_memory/reme/short_term_memory_example.py
+++ b/examples/functionality/short_term_memory/reme/short_term_memory_example.py
@@ -0,0 +1,188 @@
+# -*- coding: utf-8 -*-
+"""Example demonstrating ReMeShortTermMemory usage with ReActAgent."""
+# noqa: E402
+import asyncio
+import os
+
+from dotenv import load_dotenv
+
+from agentscope.agent import ReActAgent
+from agentscope.formatter import DashScopeChatFormatter
+from agentscope.message import Msg, TextBlock
+from agentscope.model import DashScopeChatModel
+from agentscope.tool import ToolResponse, Toolkit, view_text_file
+
+load_dotenv()
+
+
+async def main() -> None:
+    """Main function demonstrating ReMeShortTermMemory with tool usage."""
+    from reme_short_term_memory import ReMeShortTermMemory
+
+    toolkit = Toolkit()
+
+    async def grep(file_path: str, pattern: str, limit: str) -> ToolResponse:
+        """A powerful search tool for finding patterns in files using regular
+        expressions.
+
+        Supports full regex syntax (e.g., "log.*Error", "function\\s+\\w+"),
+        glob pattern filtering, and result limiting. Ideal for searching code
+        or text content across multiple files.
+
+        Args:
+            file_path (`str`):
+                The path to the file to search in. Can be an absolute or
+                relative path.
+            pattern (`str`):
+                The search pattern or regular expression to match. Supports
+                full regex syntax for complex pattern matching.
+            limit (`str`):
+                The maximum number of matching results to return. Use this to
+                control output size for large files. Should not exceed 50.
+        """
+        from reme_ai.retrieve.working import GrepOp
+
+        op = GrepOp()
+        await op.async_call(file_path=file_path, pattern=pattern, limit=limit)
+        return ToolResponse(
+            content=[
+                TextBlock(
+                    type="text",
+                    text=op.output,
+                ),
+            ],
+        )
+
+    async def read_file(
+        file_path: str,
+        offset: int,
+        limit: int,
+    ) -> ToolResponse:
+        """Reads and returns the content of a specified file.
+
+        For text files, it can read specific line ranges using the 'offset' and
+        'limit' parameters. Use offset and limit to paginate through large
+        files.
+
+        Note: It's recommended to use the `grep` tool first to locate the line
+        numbers of interest before calling this function.
+
+        Args:
+            file_path (`str`):
+                The path to the file to read. Can be an absolute or relative
+                path.
+            offset (`int`):
+                The starting line number to read from (0-indexed). Use this to
+                skip to a specific position in the file.
+            limit (`int`):
+                The maximum number of lines to read from the offset position.
+                Helps control memory usage when reading large files. Should
+                not exceed 100.
+        """
+
+        return await view_text_file(file_path, ranges=[offset, offset + limit])
+
+    # These two tools are provided as examples. You can replace them with your
+    # own retrieval tools, such as vector database embedding retrieval or other
+    # search solutions that fit your use case.
+    toolkit.register_tool_function(grep)
+    toolkit.register_tool_function(read_file)
+
+    llm = DashScopeChatModel(
+        model_name="qwen3-max",
+        # model_name="qwen3-coder-30b-a3b-instruct",
+        api_key=os.environ.get("DASHSCOPE_API_KEY"),
+        stream=False,
+        generate_kwargs={
+            "temperature": 0.001,
+            "seed": 0,
+        },
+    )
+    short_term_memory = ReMeShortTermMemory(
+        model=llm,
+        working_summary_mode="auto",
+        compact_ratio_threshold=0.75,
+        max_total_tokens=20000,
+        max_tool_message_tokens=2000,
+        group_token_threshold=None,  # Max tokens per compression batch
+        keep_recent_count=1,  # Set to 1 for demo; use 10 in production
+        store_dir="inmemory",
+    )
+
+    async with short_term_memory:
+        # Simulate ultra long context
+        readme_path = "../../../../README.md"
+        with open(readme_path, encoding="utf-8") as f:
+            readme_content = f.read()
+
+        memories = [
+            {
+                "role": "user",
+                "content": "Search for project information",
+            },
+            {
+                "role": "assistant",
+                "content": None,
+                "tool_calls": [
+                    {
+                        "index": 0,
+                        "id": "call_6596dafa2a6a46f7a217da",
+                        "function": {
+                            "arguments": "{}",
+                            "name": "web_search",
+                        },
+                        "type": "function",
+                    },
+                ],
+            },
+            {
+                "role": "tool",
+                "content": readme_content * 10,
+                "tool_call_id": "call_6596dafa2a6a46f7a217da",
+            },
+        ]
+        await short_term_memory.add(
+            ReMeShortTermMemory.list_to_msg(memories),
+            allow_duplicates=True,
+        )
+
+        agent = ReActAgent(
+            name="react",
+            sys_prompt=(
+                "You are a helpful assistant. "
+                "Tool calls may be cached locally. "
+                "You can first use `Grep` to match keywords or regular "
+                "expressions to find line numbers, then use `ReadFile` "
+                "to read the code near that location. "
+                "If no matches are found, never give up trying - try "
+                "other parameters or relax the matching conditions, such "
+                "as searching for only partial keywords. "
+                "After `Grep`, you can use the `ReadFile` command to "
+                "view content starting from a specified offset position "
+                "`offset` with length `limit`. "
+                "The maximum limit is 100. "
+                "If the current content is insufficient, the `ReadFile` "
+                "command can continuously try different `offset` and "
+                "`limit` parameters."
+            ),
+            model=llm,
+            formatter=DashScopeChatFormatter(),
+            toolkit=toolkit,
+            memory=short_term_memory,
+            max_iters=20,
+        )
+
+        msg = Msg(
+            role="user",
+            content=(
+                "In the project documentation, who is the first author "
+                "of the agentscope_v1 paper?"
+            ),
+            name="user",
+        )
+        msg = await agent(msg)
+        print(f"✓ Agent response: {msg.get_text_content()}\n")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())