OceanBase Vector Store
This example demonstrates how to use OceanBaseStore for vector storage and semantic search in AgentScope. It includes CRUD operations, metadata filtering, document chunking, and distance metric tests.
Quick Start
Install dependencies (including pyobvector):
```bash
pip install -e ".[full]"
```
Start seekdb (a minimal OceanBase-compatible instance):
```bash
docker run -d -p 2881:2881 oceanbase/seekdb
```
Run the example script:
```bash
python main.py
```
Note: The script defaults to `127.0.0.1:2881`, user `root`, and database `test`. If you use a multi-tenant OceanBase account (e.g., `root@test`), override these settings via environment variables.
Usage
Initialize Store
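```python
from agentscope.rag import OceanBaseStore

store = OceanBaseStore(
    collection_name="test_collection",
    dimensions=768,      # must equal your embedding model's output size
    distance="COSINE",   # COSINE, L2, or IP; cannot be changed after creation
    uri="127.0.0.1:2881",
    user="root",
    password="",
    db_name="test",
)
```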
Add Documents
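```python
import asyncio

from agentscope.message import TextBlock
from agentscope.rag import Document, DocMetadata


async def main() -> None:
    doc = Document(
        metadata=DocMetadata(
            content=TextBlock(type="text", text="Your document text"),
            doc_id="doc_1",
            chunk_id=0,
            total_chunks=1,
        ),
        # Placeholder vector: its length must equal the store's `dimensions` (768).
        embedding=[0.1] * 768,
    )
    # `store` is the OceanBaseStore created in "Initialize Store".
    await store.add([doc])


asyncio.run(main())
```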
Search
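```python
# `store.search` is a coroutine: call it from inside an async function.
results = await store.search(
    query_embedding=[0.1] * 768,  # length must equal the store's `dimensions`
    limit=5,
    score_threshold=0.9,
)
```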
Filter Search
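```python
client = store.get_client()
# Load the collection's table so its columns can be used in SQLAlchemy expressions.
table = client.load_table(collection_name="test_collection")
# Inside an async function:
results = await store.search(
    query_embedding=[0.1] * 768,  # length must equal the store's `dimensions`
    limit=5,
    flter=[table.c["doc_id"].like("doc%")],  # doc_id starts with "doc"
)
```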
Note: The parameter name is `flter` (missing the "i") to avoid clashing with Python's built-in `filter`; it follows the underlying library's convention.
Delete
```python
client = store.get_client()
table = client.load_table(collection_name="test_collection")
# Inside an async function: delete rows whose doc_id equals "doc_1".
await store.delete(where=[table.c["doc_id"] == "doc_1"])
```
Distance Metrics
| Metric | Description | Best For |
|---|---|---|
| COSINE | Cosine similarity | Text embeddings (recommended) |
| L2 | Euclidean distance | Spatial data |
| IP | Inner product | Recommendation systems |
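The three metrics reduce to simple formulas. The minimal pure-Python sketch below illustrates the math only; it makes no claim about how OceanBase computes them internally or how it converts them into the scores returned by `store.search`. The function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance (0.0 means identical vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: list[float], b: list[float]) -> float:
    """Dot product; unlike cosine, it grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


a = [1.0, 0.0]
b = [2.0, 0.0]
print(cosine_similarity(a, b))  # 1.0: same direction, magnitude ignored
print(l2_distance(a, b))        # 1.0
print(inner_product(a, b))      # 2.0: magnitude counts
```

For unit-length vectors, inner product equals cosine similarity, which is why IP is often paired with normalized embeddings.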
Filter Expressions
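Build filters using SQLAlchemy expressions and pass them via `flter`:
```python
table = store.get_client().load_table("test_collection")
filters = [
    table.c["doc_id"] == "doc_1",       # exact match
    table.c["doc_id"].like("prefix%"),  # prefix match (SQL LIKE)
    table.c["chunk_id"] >= 0,           # numeric comparison
]
# Inside an async function:
results = await store.search(
    query_embedding=[0.1] * 768,
    limit=5,
    flter=filters,
)
```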
Advanced Usage
Access Underlying Client
```python
client = store.get_client()
stats = client.get_collection_stats(collection_name="test_collection")
```
Document Metadata
- `content`: Text content (`TextBlock`)
- `doc_id`: Unique document identifier
- `chunk_id`: Chunk position (0-indexed)
- `total_chunks`: Total number of chunks in the document
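To show how `chunk_id` and `total_chunks` relate when a document is split, here is a minimal sketch using a hypothetical `chunk_text` helper. It does plain fixed-size character splitting, which is an assumption for illustration and not AgentScope's own reader logic.

```python
def chunk_text(text: str, doc_id: str, chunk_size: int) -> list[dict]:
    """Split text into fixed-size character chunks with DocMetadata-like fields.

    Hypothetical helper: the field names mirror DocMetadata, but the
    splitting strategy is illustrative only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pieces = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [
        {
            "content": piece,
            "doc_id": doc_id,              # shared by every chunk of the document
            "chunk_id": idx,               # 0-indexed position of this chunk
            "total_chunks": len(pieces),   # same value on every chunk
        }
        for idx, piece in enumerate(pieces)
    ]


for chunk in chunk_text("abcdefghij", doc_id="doc_1", chunk_size=4):
    print(chunk)
# {'content': 'abcd', 'doc_id': 'doc_1', 'chunk_id': 0, 'total_chunks': 3}
# {'content': 'efgh', 'doc_id': 'doc_1', 'chunk_id': 1, 'total_chunks': 3}
# {'content': 'ij', 'doc_id': 'doc_1', 'chunk_id': 2, 'total_chunks': 3}
```

Each dictionary could then populate a `DocMetadata`, with `content` wrapped in a `TextBlock` as in "Add Documents".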
FAQ
What embedding dimension should I use? Match your embedding model's output dimension (e.g., 768 for BERT, 1536 for OpenAI ada-002).
Can I change the distance metric after creation? No. The metric is set when the collection is created; create a new collection with the desired metric and re-add your documents.
How do I clean up test data? Drop the collection via the underlying client or remove the seekdb container volume.
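Because embeddings must match the collection's `dimensions`, a pre-flight check before calling `store.add` can surface mismatches early. The sketch below uses a hypothetical `check_embedding_dimensions` helper; it is not part of AgentScope or pyobvector.

```python
def check_embedding_dimensions(embeddings: list[list[float]], dimensions: int) -> None:
    """Raise ValueError if any embedding's length differs from `dimensions`.

    Hypothetical helper for illustration only.
    """
    for i, vec in enumerate(embeddings):
        if len(vec) != dimensions:
            raise ValueError(
                f"embedding {i} has {len(vec)} values, expected {dimensions}"
            )


check_embedding_dimensions([[0.1] * 768], dimensions=768)  # passes silently
try:
    check_embedding_dimensions([[0.1, 0.2, 0.3]], dimensions=768)
except ValueError as err:
    print(err)  # embedding 0 has 3 values, expected 768
```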
Environment Variables
The script supports the following environment variables to override connection settings:
```bash
export OCEANBASE_URI="127.0.0.1:2881"
export OCEANBASE_USER="root"
export OCEANBASE_PASSWORD=""
export OCEANBASE_DB="test"
```
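A minimal sketch of the override pattern, assuming the script falls back to the documented defaults when a variable is unset. `DEFAULTS` and `load_connection_settings` are hypothetical names; the actual logic in `main.py` may differ.

```python
import os

# Defaults mirror the values documented in this README. How main.py actually
# reads these variables is an assumption in this sketch.
DEFAULTS = {
    "OCEANBASE_URI": "127.0.0.1:2881",
    "OCEANBASE_USER": "root",
    "OCEANBASE_PASSWORD": "",
    "OCEANBASE_DB": "test",
}


def load_connection_settings() -> dict[str, str]:
    """Return connection settings, letting environment variables override defaults."""
    return {name: os.environ.get(name, default) for name, default in DEFAULTS.items()}


settings = load_connection_settings()
# Keyword names match the OceanBaseStore example in "Initialize Store".
store_kwargs = {
    "uri": settings["OCEANBASE_URI"],
    "user": settings["OCEANBASE_USER"],
    "password": settings["OCEANBASE_PASSWORD"],
    "db_name": settings["OCEANBASE_DB"],
}
print(store_kwargs)
```

With this pattern, a multi-tenant account such as `root@test` only requires setting `OCEANBASE_USER` before running the script; unset variables keep their defaults.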