Most AI products work by pouring as much chat history as the model will accept into every request. ownify doesn't. We retrieve a handful of well-ranked memories per call — inspectable, deletable, exportable. The result is roughly 4× cheaper than naive long-context on real production traffic, and the answers are usually better because the model isn't hunting through noise.
We use ~28k tokens of precisely retrieved context per request, not 120k+ of raw history. Same answer quality, often better, at roughly 1/4 the model cost, dropping to roughly 1/9 once stable system prompts hit the cached-input rate. That's why we can offer flat per-agent pricing instead of metering by context length.
Long context isn't free quality — LLMs degrade on it (the “lost-in-the-middle” effect). Five well-ranked memory drawers beat 200 messages of chat history. We measured this on the LongMemEval-S benchmark: R@5 = 95.4%, within 1.2 points of the best published number on the same split.
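For the curious, recall@k is the standard way to score this kind of retrieval. A minimal sketch of the metric (the `retrieve` callable stands in for ownify's ranker, which isn't shown here):

```python
def recall_at_k(queries, retrieve, k=5):
    """Fraction of queries whose gold memory appears in the top-k results.

    queries:  list of (query_text, gold_memory_id) pairs
    retrieve: callable returning memory ids ranked by relevance
    """
    hits = sum(
        1 for query_text, gold_id in queries
        if gold_id in retrieve(query_text)[:k]
    )
    return hits / len(queries)

# R@5 = 95.4% means: for 95.4% of LongMemEval-S questions, the gold
# memory was among the top five drawers retrieved.
```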
Memory in ownify is discrete drawers, not an opaque blob. You can list them, search them, export them, delete them one by one. Ask “what does my agent know about me?” and we can answer literally. A giant context window can't do that.
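What that looks like in practice, as a hedged sketch: the endpoint paths and field names below are illustrative assumptions, not ownify's published API. The point is that each drawer is an addressable object you can enumerate and delete individually.

```python
import requests

BASE = "https://api.ownify.example/v1"   # hypothetical base URL
HEADERS = {"Authorization": "Bearer <token>"}

# "What does my agent know about me?" answered literally:
drawers = requests.get(f"{BASE}/agents/my-agent/memory", headers=HEADERS).json()
for drawer in drawers:
    print(drawer["id"], drawer["summary"])

# Delete one specific memory, not the whole history:
requests.delete(f"{BASE}/agents/my-agent/memory/{drawers[0]['id']}", headers=HEADERS)
```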
Per-message cost is bounded by design. You don't get surprised by a €400 month because your team had long chats. We pass that predictability through as flat per-agent pricing — and we can hold the line on it because we engineered the cost shape, not just the marketing.
We don't promise infinite memory. We promise the right memory at the right moment. If the answer is in your archive, we'll find it. If it isn't, we'll tell you. When you genuinely need long context (analysing one huge document), we route to a long-context model for that turn — we don't pretend retrieval beats long-context for every task.
Inspectable memory is also auditable memory. Every read and write against your agent's store goes through ACL-checked endpoints and ends up in a log you can see. Retrieval-style memory makes “what did the agent look at, when?” a real, answerable question — not a vibe.
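The shape of an audit record might look like the sketch below. The field names are assumptions for illustration, but every memory read and write produces one entry of this kind, so "what did the agent look at, when?" becomes a query over the log:

```python
# Illustrative audit record; field names are assumptions, not a spec.
audit_entry = {
    "timestamp": "2026-04-12T09:31:05Z",
    "agent_id": "agent_7f3a",
    "actor": "user:anna",                     # who triggered the call
    "action": "memory.read",                  # or memory.write / memory.delete
    "drawer_ids": ["mem_0192", "mem_0417"],   # exactly what was retrieved
    "acl_decision": "allow",                  # outcome of the ACL check
}
```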
All numbers are in EUR, on Kimi K2.6, at public rates of €1.40/1M input tokens (€0.24 cached) and €5.88/1M output tokens (Fireworks list × 1.47 markup). Production averages are pulled from our LiteLLM call log across 532 real Kimi K2.6 calls (April 2026).
| Approach | Avg prompt tokens | Avg output tokens | € / message | € / 1,000 msgs |
|---|---|---|---|---|
| Naive long-context (full history each turn, no retrieval) | 120,000 | 976 | €0.1733 | €173 |
| ownify, uncached (retrieved memory + recent turns + soul) | 27,670 | 976 | €0.0444 | €44 |
| ownify, ~80% of prompt cached (stable soul + skills) | 27,670 | 976 | €0.0187 | €19 |
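The table is plain arithmetic over the rates above; a few lines of Python reproduce it (the last decimal differs slightly from the table because the published per-million rates are themselves rounded):

```python
IN_RATE, CACHED_RATE, OUT_RATE = 1.40, 0.24, 5.88   # EUR per 1M tokens

def cost_per_message(prompt_tokens, output_tokens, cached_fraction=0.0):
    cached = prompt_tokens * cached_fraction
    fresh = prompt_tokens - cached
    return (fresh * IN_RATE + cached * CACHED_RATE + output_tokens * OUT_RATE) / 1e6

print(cost_per_message(120_000, 976))       # naive long-context   ≈ €0.1737
print(cost_per_message(27_670, 976))        # ownify, uncached     ≈ €0.0445
print(cost_per_message(27_670, 976, 0.8))   # ownify, ~80% cached  ≈ €0.0188
```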
A few caveats we'd rather you hear from us than discover later.
If the task is "read this 200-page contract end-to-end and tell me every reference to indemnification", retrieval over a knowledge base isn't the right shape; you want long-context inference for that one turn. Our default routing handles this: the request is sent with the document attached, to a long-context model, billed at that model's rate.
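In code terms, the routing decision is roughly the following. The threshold and model names are illustrative assumptions, not our actual configuration:

```python
LONG_CONTEXT_THRESHOLD = 100_000   # tokens; illustrative cutoff

def route_turn(attached_doc_tokens: int) -> str:
    """Decide, per turn, between retrieval and long-context inference."""
    if attached_doc_tokens > LONG_CONTEXT_THRESHOLD:
        # Whole-document tasks: ship the document verbatim to a
        # long-context model, billed at that model's rate.
        return "long-context-model"
    # Everything else: retrieve ~5 well-ranked drawers, use the default model.
    return "default-model-with-retrieval"
```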
Retrieval is only as good as what got indexed. If the agent's memory is full of half-finished notes and contradictions, it will retrieve half-finished notes and contradictions. We surface what was indexed (the external-memory page lets you read it directly), so you can see and clean it. But the cleaning is real work — we don't hide that.
Memory holds what the agent has seen. If you need fresh information from the live web, that's what tools are for, not memory. ownify agents can call MCP tools (search, fetch, scrape) and the result lands in memory for next time. But the first call has to actually go out.
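A sketch of that flow, with an assumed agent interface (the method names here are illustrative, not ownify's SDK):

```python
def answer_with_fresh_data(agent, query):
    """Try memory first; fall back to an MCP tool, then persist the result."""
    hits = agent.memory.search(query, k=5)
    if not hits:
        # Nothing indexed yet: the first call has to actually go out.
        result = agent.tools.call("web_search", {"query": query})
        # The tool result lands in memory, so the *next* retrieval
        # finds it without another live call.
        agent.memory.write(summary=result.summary, source=result.url)
        hits = [result]
    return agent.respond(query, context=hits)
```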
ownify charges for retained, retrievable knowledge (drawers under management × retrieval quality × concurrency), not for raw token volume. That's a cleaner economic unit and an easier one to defend publicly: customers can see what they're paying for instead of a metering bill that scales with chat length.
Concretely: each plan ships with an LLM budget large enough to cover typical usage at the rates above, plus top-up credits (billed at Fireworks list × 1.47, in EUR) if you exceed it. No per-context-token surprises.