Memory Protocol
What it is, and when to use it
Memory handles an agent's stateful recall: it stores past conversation turns and retrieves them later under some query, so they can be folded into a new prompt. The key difference from skills, the hook helpers (analyzer/validator), and workflows is that memory always carries state: a buffer, an index, an embedding matrix. None of that can be expressed cleanly as a single pure function call.
Why there is no @memory decorator and no register_memory
This is a deliberate asymmetry, and one of the framework's teaching points:
- The other protocols can start from a stateless function, so a decorator entry and a function-pass entry are the friendliest starting points for students.
- The smallest viable memory implementation (ShortTermMemory) needs a deque(maxlen=n) the moment you construct it; the BM25 and embedding variants need to build an index and cache embeddings. State cannot be written as a single function, and forcing it into a decorator would mislead students into thinking "memory is the same kind of thing as a skill."
- For that reason, from cantus import memory and from cantus import register_memory raise ImportError on purpose at the package surface, and tests guard that contract.
When students notice the gap, they learn that "can this be written as a decorator?" is a real design judgement, and that it correlates strongly with whether there is state involved.
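
To make the state argument concrete, here is a minimal sketch of a deque-backed short-term memory. It is not the cantus ShortTermMemory source: Turn and SketchShortTermMemory are stand-ins written for this example so that it runs without cantus installed.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """Stand-in for cantus's Turn (assumed shape: user text + assistant text)."""
    user: str
    assistant: str


class SketchShortTermMemory:
    """Hypothetical illustration: keeps only the n most recent turns."""

    def __init__(self, n: int = 10) -> None:
        # The state lives here; a bare function would have nowhere to keep it.
        self._turns = deque(maxlen=n)

    def remember(self, turn: Turn) -> None:
        # Once the deque is full, appending silently drops the oldest turn.
        self._turns.append(turn)

    def recall(self, query: str) -> list:
        # The query is ignored: recall is "most recent n, in arrival order".
        return list(self._turns)


mem = SketchShortTermMemory(n=2)
for i in range(3):
    mem.remember(Turn(user=f"q{i}", assistant=f"a{i}"))

recent = [t.user for t in mem.recall("anything")]
print(recent)  # ['q1', 'q2']
```

Even this smallest case needs a constructor to create the buffer and two methods that share it, which is the shape a decorator on a single function cannot express.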
Class-first style (the only canonical style)

    from cantus.protocols.memory import Memory, Turn

    # _classify(text) -> str is a topic-keyword helper you supply; it is not defined here.

    class TopicMemory(Memory):
        """Group turns by topic keyword and recall the matching bucket."""

        def __init__(self) -> None:
            self._buckets: dict[str, list[Turn]] = {}

        def remember(self, turn: Turn) -> None:
            topic = _classify(turn.user)
            self._buckets.setdefault(topic, []).append(turn)

        def recall(self, query: str) -> list[Turn]:
            topic = _classify(query)
            return list(self._buckets.get(topic, []))

An implementation only needs to override two methods: remember(turn) and recall(query). Turn is a frozen dataclass, Turn(user: str, assistant: str).
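
Because Turn is frozen, a stored turn cannot be edited in place. The sketch below uses a stand-in Turn with only the two fields named above (the real class may carry more) to show what that means in practice.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Turn:
    """Stand-in for cantus's Turn; only the two documented fields are assumed."""
    user: str
    assistant: str


turn = Turn(user="Where is Tainan?", assistant="In southern Taiwan.")

try:
    turn.assistant = "edited"  # frozen dataclasses reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# To "change" a turn, build a new one; the original is left untouched.
revised = replace(turn, assistant="In the south of Taiwan.")

print(mutated)         # False
print(turn.assistant)  # In southern Taiwan.
```

A memory implementation can therefore hand out the Turn objects it stores without worrying that a caller will rewrite its history.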
Trade-offs across the four built-in implementations
| Class | Mechanism | Trade-off |
|---|---|---|
| ShortTermMemory(n=10) | collections.deque(maxlen=n), strictly in arrival order | Fastest and simplest; ignores the query; remembers only the most recent turns, and old ones get pushed out |
| BM25Memory(top_k=5) | rank-bm25 keyword retrieval | No model weights needed; accurate when the relevant keyword actually appears; the tokenizer is plain whitespace splitting, so judge it carefully for mixed CJK and English text |
| EmbeddingMemory(top_k=5, model_name=...) | sentence-transformers cosine similarity | Catches semantically similar sentences; slow to load the first time and needs an extra dependency; for short sentences or rare words it does not always beat BM25 |
| MarkdownMemory(path, top_k=10) | YAML-frontmatter chunks written into a single .md file | Human-readable and git-diff friendly; recall is a case-insensitive substring match; returns in file order; the cap is tunable through top_k |
The teaching progression maps neatly onto four levels: data structure (deque) → information retrieval (BM25) → machine learning (embeddings) → file persistence (markdown). You can introduce them one at a time as students advance.
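
The BM25 caveat about whitespace tokenization is easy to check with plain str.split(). The helper below is an assumption made for illustration: it approximates the "plain whitespace splitting" described in the table, and the actual BM25Memory tokenizer is not shown in this document.

```python
def whitespace_tokens(text: str) -> list:
    """Lowercase and split on whitespace, as a plain-whitespace tokenizer would."""
    return text.lower().split()


english = whitespace_tokens("Find books about Tainan")
mixed = whitespace_tokens("我想找台南的書 about Tainan")

print(english)  # ['find', 'books', 'about', 'tainan']
print(mixed)    # ['我想找台南的書', 'about', 'tainan']

# The English keyword is its own token, but the CJK keyword is buried
# inside one long token, so an exact-token match on "台南" finds nothing.
print("tainan" in mixed)  # True
print("台南" in mixed)     # False
```

For text like this, a keyword-based memory only matches on the English words unless the CJK text is segmented before indexing.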
The two-tier API
cantus splits Memory into two tiers: the lower tier of four explicit Memory implementations, and the upper-tier AutoMemory that exposes four LLM-facing tools.
- Lower tier: host code calls mem.recall(query) and mem.remember(turn) itself. The student controls exactly when each retrieval and each write happens, which suits teaching and deterministic flows.
- Upper tier: AutoMemory(backend=mem) wraps any lower-tier memory into four cantus Skills (view, create, str_replace, delete), aligned with the Anthropic Memory tool spec. Feed auto.tools into an agent and the LLM decides for itself when to do CRUD.

    from cantus.protocols.memory import MarkdownMemory, AutoMemory, Turn

    backend = MarkdownMemory("memo.md")   # lower-tier explicit API
    backend.remember(Turn(user="q", assistant="a"))
    print(backend.recall("q"))            # [Turn(user='q', assistant='a', ...)]

    auto = AutoMemory(backend=backend)    # upper tier: 4 Skills for the LLM
    print([t.name for t in auto.tools])   # ['view', 'create', 'str_replace', 'delete']

Design details:
- AutoMemory uses composition: it holds any Memory as its backend rather than inheriting from Memory, so it does not disturb the lower-tier ABI. AutoMemory is itself not a Memory subclass.
- auto.tools is an instance-level cache: every access returns the same list object, so the spec the LLM sees does not drift between turns.
- The docstring on the tools property always contains the literal string "LLM has full CRUD access", so static introspection and IDE hover both surface the warning.
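
The three design details can be sketched with stand-in classes. Everything named Sketch... below is hypothetical and written for this example; it is not the AutoMemory source, and only two of the four tools are modelled to keep it short.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SketchTool:
    """Hypothetical stand-in for a cantus Skill: a name plus a callable."""
    name: str
    run: Callable[..., object]


class SketchBackend:
    """Hypothetical stand-in for any lower-tier memory."""

    def __init__(self) -> None:
        self.notes = []


class SketchAutoMemory:
    """Holds a backend (composition) instead of inheriting from it."""

    def __init__(self, backend: SketchBackend) -> None:
        self._backend = backend
        self._tools: Optional[list] = None  # built once, then reused

    @property
    def tools(self) -> list:
        """LLM has full CRUD access"""
        if self._tools is None:
            self._tools = [
                SketchTool("view", lambda: list(self._backend.notes)),
                SketchTool("create", lambda text: self._backend.notes.append(text)),
            ]
        return self._tools


auto = SketchAutoMemory(SketchBackend())
same_object = auto.tools is auto.tools       # cached: identical list each time
names = [t.name for t in auto.tools]
doc = SketchAutoMemory.tools.__doc__         # the warning is visible statically

print(same_object)  # True
print(names)        # ['view', 'create']
print(doc)          # LLM has full CRUD access
```

Caching the list on the instance is what keeps the tool spec stable across turns; rebuilding it on every access would hand the agent new objects each time.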
MarkdownMemory path safety
MarkdownMemory(path) runs a four-step "resolve-then-classify" check in its constructor:
- Windows UNC: a raw string beginning with \\ or // raises ValueError("path traversal ...").
- Path traversal: a raw string containing .. whose path.resolve() lands outside the current cwd subtree raises ValueError("path traversal ...").
- System path: a path that resolves under /etc, /sys, /proc, /dev, or /root (including the macOS canonical forms such as /private/etc) raises ValueError("system path ..."). Symlink attacks (for example /tmp/memo.md pointing at /etc/passwd) are caught here because resolve() unwraps the link before classification.
- Unsafe file type: a resolved target that is a FIFO, socket, or block device raises ValueError("unsafe file type ...").
Every rejection completes before the file is opened. A rejected path is never created, opened, or touched by any IO beyond the stat() needed to classify it.
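
The four steps above can be sketched with pathlib and stat. This is an illustration under stated assumptions, not the MarkdownMemory source: check_memo_path, rejection, and SYSTEM_ROOTS are hypothetical names, only /private/etc is listed as a macOS form, and the exact error wording is invented apart from the three prefixes quoted above.

```python
import stat
from pathlib import Path

# Assumed list; the text names these roots plus macOS forms such as /private/etc.
SYSTEM_ROOTS = ("/etc", "/sys", "/proc", "/dev", "/root", "/private/etc")


def check_memo_path(raw: str) -> Path:
    """Hypothetical resolve-then-classify check; returns the resolved path."""
    # 1. Windows UNC prefixes are rejected on the raw string.
    if raw.startswith("\\\\") or raw.startswith("//"):
        raise ValueError(f"path traversal: UNC-style path {raw!r}")

    resolved = Path(raw).resolve()  # follows symlinks and collapses ".."

    # 2. A ".." that escapes the current working directory subtree.
    if ".." in raw and not resolved.is_relative_to(Path.cwd().resolve()):
        raise ValueError(f"path traversal: {raw!r} escapes the working directory")

    # 3. System locations, checked after resolve() so symlinks are unwrapped.
    for root in SYSTEM_ROOTS:
        if resolved == Path(root) or Path(root) in resolved.parents:
            raise ValueError(f"system path: {resolved}")

    # 4. Special files: only stat() is used; the file is never opened.
    if resolved.exists():
        mode = resolved.stat().st_mode
        if stat.S_ISFIFO(mode) or stat.S_ISSOCK(mode) or stat.S_ISBLK(mode):
            raise ValueError(f"unsafe file type: {resolved}")

    return resolved


def rejection(raw: str) -> str:
    """Return the error message for a rejected path, or '' if it is accepted."""
    try:
        check_memo_path(raw)
    except ValueError as exc:
        return str(exc)
    return ""


print(rejection("//server/share/memo.md"))  # message starts with "path traversal"
print(rejection("/etc/passwd"))             # on Linux/macOS: starts with "system path"
```

Classifying the resolved path rather than the raw string is the important choice: a harmless-looking name that is really a symlink into /etc is judged by where it points.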
AutoMemory: autonomous LLM CRUD, and a production warning
The four Skills returned by AutoMemory.tools expose full CRUD to the LLM by default, with no built-in content filter. In a teaching setting that is intentional: students need to see the trade-offs of letting an LLM write and delete on its own. Before any production use, wrap them with a filter using cantus's existing hook mechanism:

    from cantus.protocols.memory import AutoMemory, MarkdownMemory

    def block_secrets(result):
        # post_hook example: reject the write if it detects a sensitive string
        return result  # ... your filtering logic

    backend = MarkdownMemory("memo.md")
    auto = AutoMemory(backend=backend)

    # Attach the filter to the create tool in place (the other 3 tools are untouched).
    # Look the tool up by name rather than by list position.
    create_skill = next(t for t in auto.tools if t.name == "create")
    create_skill._post_hook = block_secrets
    agent_tools = list(auto.tools)

EventStream JSON-Lines persistence
cantus.core.event_stream_persistence.JsonLinesPersistence(path) is an optional persistence plug for the EventStream. Each append(event) writes one line to a single .jsonl file and calls os.fsync immediately afterwards; load() rebuilds the event list from that file. The default EventStream stays in memory, so this plug is an explicit opt-in.

    from cantus.core.event_stream_persistence import JsonLinesPersistence

    p = JsonLinesPersistence("session-001.jsonl")
    p.append({"action": "search", "query": "Tainan"})
    p.append({"observation": "found 3 books"})

    # Reload across sessions
    restored = JsonLinesPersistence("session-001.jsonl").load()
    print(restored)  # [{'action': 'search', ...}, {'observation': ...}]

Design constraints:
- json.dumps runs before open(). A non-serializable event raises TypeError("... not JSON serializable ..."), and the file is neither created (cold start) nor modified (existing file).
- A newly created file uses POSIX mode 0o600, so other users on a shared machine cannot read sensitive conversation logs.
- Every append calls fsync. The teaching positioning assumes a single-digit number of events per second, where the performance cost is negligible. For production-scale persistence, wait for a later release.
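
The three constraints can be sketched in a few lines of standard-library Python. SketchJsonLines is a hypothetical class written for this example, not the JsonLinesPersistence source, and the permission check assumes a POSIX system.

```python
import json
import os
import stat
import tempfile


class SketchJsonLines:
    """Hypothetical sketch: serialize first, create with 0o600, fsync each append."""

    def __init__(self, path: str) -> None:
        self._path = path

    def append(self, event: dict) -> None:
        # Serialize first: a TypeError here means the file is never touched.
        line = json.dumps(event, ensure_ascii=False) + "\n"
        # The 0o600 mode only applies when O_CREAT actually creates the file.
        fd = os.open(self._path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
        with os.fdopen(fd, "a", encoding="utf-8") as fh:
            fh.write(line)
            fh.flush()             # push Python's buffer to the OS
            os.fsync(fh.fileno())  # ask the OS to flush to disk

    def load(self) -> list:
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session-001.jsonl")
    log = SketchJsonLines(path)
    log.append({"action": "search", "query": "Tainan"})
    log.append({"observation": "found 3 books"})
    restored = SketchJsonLines(path).load()
    mode = stat.S_IMODE(os.stat(path).st_mode)

    # A non-serializable event must not leave a file behind.
    bad_path = os.path.join(tmp, "never-created.jsonl")
    try:
        SketchJsonLines(bad_path).append({"payload": object()})
    except TypeError:
        pass
    bad_file_exists = os.path.exists(bad_path)

print(restored)         # [{'action': 'search', 'query': 'Tainan'}, {'observation': 'found 3 books'}]
print(bad_file_exists)  # False
```

Opening and syncing on every append is simple and crash-tolerant, which fits the low event rates described above; a higher-throughput design would batch writes instead.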
Common mistakes
- Forgetting to initialize the internal container in __init__, so the first call to recall raises AttributeError.
- Assuming you can register with @memory: the package surface raises ImportError directly. Use a class instead.
- Putting an LLM call inside recall: memory should be pure retrieval. For summarization, split it into a skill or a workflow, then let memory store the result.
- Using BM25Memory or EmbeddingMemory without installing the extras: at runtime you will be told to run pip install 'cantus[memory]'.