# Agentic Memory

## Executive Summary
This document describes a Proof of Concept for Agentic Memory in the Semantic Router. Agentic Memory enables AI agents to remember information across sessions, providing continuity and personalization.
⚠️ POC Scope: This is a proof of concept, not a production design. The goal is to validate the core memory flow (retrieve → inject → extract → store) with acceptable accuracy. Production hardening (error handling, scaling, monitoring) is out of scope.
### Core Capabilities
| Capability | Description |
|---|---|
| Memory Retrieval | Embedding-based search with simple pre-filtering |
| Memory Saving | LLM-based extraction of facts and procedures |
| Cross-Session Persistence | Memories stored in Milvus (survives restarts; production backup/HA not tested) |
| User Isolation | Memories scoped per user_id (see note below) |
⚠️ **User Isolation - Milvus Performance Note:**

| Approach | POC | Production (10K+ users) |
|---|---|---|
| Simple filter | ✅ Filter by `user_id` after search | ❌ Degrades: searches all users, then filters |
| Partition Key | ❌ Overkill | ✅ Physical separation, O(log N) per user |
| Scalar Index | ❌ Overkill | ✅ Index on `user_id` for fast filtering |

POC: uses simple metadata filtering (sufficient for testing).
Production: configure `user_id` as a Partition Key or scalar indexed field in the Milvus schema.
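The trade-off between simple filtering and per-user partitioning can be illustrated in plain Python (a toy sketch, not the Milvus API — `search_simple_filter` and `search_partitioned` are illustrative names):

```python
# Toy illustration: post-search filtering vs. per-user partitioning.
# Simple filter scores EVERY stored vector, then keeps the target user's hits;
# partitioning only ever scores the target user's own vectors.

def search_simple_filter(store, user_id, score):
    """store: list of (user_id, vector) pairs. Scores all entries, then filters."""
    scored = [(uid, score(vec)) for uid, vec in store]   # O(total vectors)
    return [s for uid, s in scored if uid == user_id]

def search_partitioned(partitions, user_id, score):
    """partitions: dict user_id -> list of vectors. Scores one user's entries."""
    return [score(vec) for vec in partitions.get(user_id, [])]  # O(user's vectors)

store = [("alice", 1.0), ("bob", 2.0), ("alice", 3.0)]
partitions = {"alice": [1.0, 3.0], "bob": [2.0]}
score = lambda v: v * 2  # stand-in for vector similarity

print(search_simple_filter(store, "alice", score))   # [2.0, 6.0]
print(search_partitioned(partitions, "alice", score))  # [2.0, 6.0]
```

Both return the same hits; the difference is how many vectors must be scored to get them, which is why the POC's simple filter is acceptable for testing but degrades at 10K+ users.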
### Key Design Principles
- A simple pre-filter decides whether a query should search memory
- A context window from recent history disambiguates the query
- An LLM extracts facts and classifies their type when saving
- Threshold-based filtering prunes low-similarity search results
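A minimal sketch of the retrieval-side principles (the keyword cues, `Hit` structure, and function names are illustrative, not the actual router implementation):

```python
from dataclasses import dataclass

# Hypothetical keyword pre-filter: decide whether a query should search memory.
MEMORY_CUES = ("my", "remember", "last time", "again", "previous")

def should_search_memory(query: str) -> bool:
    q = query.lower()
    return any(cue in q for cue in MEMORY_CUES)

@dataclass
class Hit:
    text: str
    score: float  # similarity in [0, 1]

def filter_hits(hits: list[Hit], threshold: float = 0.6) -> list[Hit]:
    """Threshold-based filtering on search results (0.6 is the POC default)."""
    return [h for h in hits if h.score >= threshold]

hits = [Hit("budget for Hawaii is $10K", 0.82), Hit("likes window seats", 0.41)]
print(should_search_memory("What's my budget for the trip?"))  # True
print([h.text for h in filter_hits(hits)])  # ['budget for Hawaii is $10K']
```

A production pre-filter would likely be a small classifier rather than keyword matching, but the control flow — gate first, search, then threshold — is the same.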
### Explicit Assumptions (POC)
| Assumption | Implication | Risk if Wrong |
|---|---|---|
| LLM extraction is reasonably accurate | Some incorrect facts may be stored | Memory contamination (fixable via Forget API) |
| 0.6 similarity threshold is a starting point | Tuning may be needed (could miss relevant or include irrelevant memories) | Adjustable based on retrieval quality logs |
| Milvus is available and configured | Feature disabled if down | Graceful degradation (no crash) |
| Embedding model produces 384-dim vectors | Must match Milvus schema | Startup failure (detectable) |
| History available via Response API chain | Required for context | Skip memory if unavailable |
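The embedding-dimension assumption above can be made "detectable at startup" with a cheap guard; `EXPECTED_DIM` and `check_embedding_dim` are hypothetical names for a sketch of that check:

```python
EXPECTED_DIM = 384  # must match the Milvus collection schema

def check_embedding_dim(embed_fn) -> None:
    """Fail fast at startup if the embedding model's output dimension
    does not match the Milvus schema, instead of failing on first insert."""
    dim = len(embed_fn("dimension probe"))
    if dim != EXPECTED_DIM:
        raise RuntimeError(
            f"Embedding dim {dim} != Milvus schema dim {EXPECTED_DIM}"
        )

# Usage with a stand-in embedder that returns 384 zeros:
check_embedding_dim(lambda text: [0.0] * 384)  # passes silently
```

Running this once during initialization turns a silent schema mismatch into a clear startup failure, matching the "detectable" risk noted in the table.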
## Table of Contents
- Problem Statement
- Architecture Overview
- Memory Types
- Pipeline Integration
- Memory Retrieval
- Memory Saving
- Memory Operations
- Data Structures
- API Extension
- Configuration
- Failure Modes and Fallbacks
- Success Criteria
- Implementation Plan
- Future Enhancements
## 1. Problem Statement

### Current State
The Response API provides conversation chaining via previous_response_id, but knowledge is lost across sessions:
```text
Session A (March 15):
  User: "My budget for the Hawaii trip is $10,000"
  → Saved in session chain

Session B (March 20) - NEW SESSION:
  User: "What's my budget for the trip?"
  → No previous_response_id → Knowledge LOST ❌
```
### Desired State
With Agentic Memory:
```text
Session A (March 15):
  User: "My budget for the Hawaii trip is $10,000"
  → Extracted and saved to Milvus

Session B (March 20) - NEW SESSION:
  User: "What's my budget for the trip?"
  → Pre-filter: memory-relevant ✓
  → Search Milvus → Found: "budget for Hawaii is $10K"
  → Inject into LLM context
  → Assistant: "Your budget for the Hawaii trip is $10,000!" ✅
```
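The two sessions above exercise the full POC flow (retrieve → inject → extract → store). A condensed sketch, with an in-memory list standing in for Milvus and stub functions standing in for LLM extraction and embedding search (all names are illustrative):

```python
# End-to-end POC flow: extract -> store (Session A), retrieve -> inject (Session B).
memory_store: list[str] = []  # one entry per extracted fact; stand-in for Milvus

def extract_facts(user_msg: str) -> list[str]:
    # Stand-in for LLM extraction: keep declarative statements.
    return [user_msg] if " is " in user_msg else []

def retrieve(query: str) -> list[str]:
    # Stand-in for embedding search: naive word overlap.
    qwords = set(query.lower().split())
    return [m for m in memory_store if qwords & set(m.lower().split())]

# Session A: extract and store.
memory_store.extend(extract_facts("My budget for the Hawaii trip is $10,000"))

# Session B (new session): retrieve and inject into the LLM context.
found = retrieve("What's my budget for the trip?")
context = "Known facts:\n" + "\n".join(found)
print(found)  # ['My budget for the Hawaii trip is $10,000']
```

The real pipeline replaces the stubs with the LLM extractor, the embedding model, and Milvus search, but the sequencing is exactly this loop.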
## 2. Architecture Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENTIC MEMORY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────┤