stefandango.dev

Self-Reliance

Building a personal AI assistant for my notes: Your tool catalogue is your data-flow specification

Notes on building a small AI assistant over my Obsidian vault, and why the decision that looks like over-engineering is the one that makes a credible data-sovereignty story possible at all.

There's a failure mode most people who keep notes will recognise. You write things down (a decision at work, a config that took an afternoon to get right, a thought worth coming back to) and then you never read any of it again. The notes accumulate. The writing itself stops doing any work.

I've been migrating about a year's worth of notes into a fresh Obsidian vault. Maybe fifty notes so far. The reason isn't tidiness; it's that I want to fix the never-read-again problem at the structural level, which means treating the vault as the front half of a knowledge system whose back half is software I haven't built yet. The back half is what this post is about, specifically one design decision that I think separates AI architectures you can defend to a security review from ones you can't.

I'm a senior .NET developer based in Copenhagen, ten-plus years in case management. That domain runs on data that can't leave a specific jurisdiction, or a specific machine, or in some cases a specific room, and the question of how you put AI on top of data like that without compromising any of those constraints is one I've been thinking about for the last year. The project where I'm working that question out in code is agentic-rag: a small .NET service that lets a language model answer questions over my own notes, on my own infrastructure. It's open at github.com/stefandango/agentic-rag, and this post is the long version of one decision documented there.

The decision looks like over-engineering on first inspection. I want to explain why I think it's the most important thing in the codebase.

What this kind of system actually is

For readers who haven't built one of these: the pattern is called retrieval-augmented generation, or RAG. You take a question, find the most relevant pieces of text from your own corpus, hand those pieces to a large language model, and the model answers using your text rather than its training data. The answer is grounded in material you control.

The "agentic" prefix means the model is doing more than receiving chunks blindly. It decides how to search: which sources to look at, which filters to apply, whether to chain a second lookup off the first. The loop itself is simple: the system gives the model tools, the model picks which to call, the system runs them, and the model synthesises an answer from the results.
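
A minimal sketch of that loop, with a stub standing in for the language model. Every name here (ToolCall, ModelTurn, AgentLoop, the search_knowledge tool key) is hypothetical and chosen for illustration; none of it is taken from the agentic-rag codebase, and a real loop would call an actual model API where the stub sits.

```csharp
using System;
using System.Collections.Generic;

// One tool in the catalogue (hypothetical name), returning canned text.
var tools = new Dictionary<string, Func<string, string>>
{
    ["search_knowledge"] = query => $"(chunks matching '{query}')"
};

// Stub model: on the first turn it asks for a search, afterwards it answers.
ModelTurn StubModel(IReadOnlyList<string> transcript) =>
    transcript.Count == 1
        ? new ModelTurn(new ToolCall("search_knowledge", "postgres backups"), null)
        : new ModelTurn(null, $"Answer based on: {transcript[transcript.Count - 1]}");

Console.WriteLine(AgentLoop.Run("What did I write about Postgres backups?", StubModel, tools));

// Hypothetical shapes: a model turn is either a tool call or a final answer.
public record ToolCall(string Name, string Argument);
public record ModelTurn(ToolCall? Call, string? Answer);

public static class AgentLoop
{
    // Ask the model, run whichever tool it requests, feed the result back,
    // and stop when it answers or the step budget runs out.
    public static string Run(
        string question,
        Func<IReadOnlyList<string>, ModelTurn> model,
        IReadOnlyDictionary<string, Func<string, string>> tools,
        int maxSteps = 4)
    {
        var transcript = new List<string> { $"user: {question}" };
        for (var step = 0; step < maxSteps; step++)
        {
            var turn = model(transcript);
            if (turn.Answer is not null) return turn.Answer;
            if (turn.Call is null) break;

            var result = tools.TryGetValue(turn.Call.Name, out var tool)
                ? tool(turn.Call.Argument)
                : $"error: unknown tool {turn.Call.Name}";
            transcript.Add($"tool {turn.Call.Name}: {result}");
        }
        return "No answer within the step budget.";
    }
}
```

The part worth noticing is the dictionary: whatever keys it holds are the only calls through which data can reach the model.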

The system has one source of knowledge in v0.5: the vault. A self-hosted bookmark manager is queued up to become the second source. Beyond that, anything that accumulates content I'd want to query later (saved articles, an RSS reader, recipe notes, whatever) is a candidate. The point isn't a specific list of integrations. The point is that any source you keep should be queryable through the same interface. Otherwise you've built a search engine that gets worse with every new corpus you add to it.

That's the design constraint. Here's why it matters more than it sounds.

Where data sovereignty actually lives

When people say "sovereign AI" they usually mean one of two things: the model runs on infrastructure they control, or the data stays in a particular jurisdiction. Both are necessary. Neither is sufficient.

The sovereignty surface of an agentic system, the place where you can credibly say "I know where this data goes and who can see it," isn't the model. It isn't the network boundary either. It's the contract between the agent and the knowledge layer, because that contract is where data crosses every single time the model is asked anything. Query in, retrieval call out, chunks back, chunks into the model's context. The retrieval call is the moment your storage starts talking to a language model, and what that call can and cannot do is what determines whether you have a sovereignty story or just a deployment diagram.

The tool catalogue you expose to a language model is, in effect, the data-flow specification for the system. It says: these are the calls that move data, these are the parameters that constrain what comes back, this is the contract. If your catalogue has one retrieval tool with explicit filters, your data-flow specification is one function signature. A security reviewer can read it in thirty seconds and reason about it completely. If your catalogue has N retrieval tools, one per source, your specification is N function signatures plus whatever rules the model follows when deciding between them. That second part is fundamentally unauditable. The model's tool-selection behaviour is observable but not predictable, and it shifts as you add more tools.

Sovereignty isn't a feature you bolt on. It's a property of the seams. The seam between the agent and the knowledge layer is the most important one to get right, because it's the one that runs on every query.

The naive plan, and why it would have worked

When you build an LLM-driven agent, you give it tools. A tool is a function the model can call, with a name, a description, and arguments, and the model decides when to invoke it based on the user's question. For a retrieval system, the obvious tools are retrieval tools: functions that fetch relevant text from a particular place.

The original plan, written down before I'd settled on an embedding model, was two of them. search_vault to query my notes, search_bookmarks to query the bookmark manager. The model would look at the question ("what did I write about Postgres backups" goes to the vault, "did I save that article about Postgres tuning" goes to bookmarks) and pick the right tool. Each would have its own implementation talking to its own backend.

This was sensible. It mirrors how the two systems are actually organised. It would have been easy to explain in a README. The agent would have answered most questions correctly. And it would have given me two data-flow contracts to audit, then three, then five, with no clean way to reason about which one fired when.

The shape of the failure isn't obvious until you sit with it.

What goes wrong with one-tool-per-source

The set of tools you expose to a model is its contract with you. Every tool is a routing decision the model has to make on every turn. One tool means no routing: the model retrieves, then answers. Two similar tools means the model has to choose. Five means the model is doing the work the system should be doing.

The first failure is the one most agent builders hit first. A perfectly reasonable question, "have I written or saved anything about Postgres backups?", now requires the model to call search_vault and search_bookmarks and (eventually) search_rss, then merge the results in its head. Sometimes it forgets one. Almost always it spends tokens deciding what to call before it spends any tokens answering. The agent fanning out across parallel tools that should have been one is a common enough pattern that it has a name: the fan-out anti-pattern.

The second failure is subtler and harder to fix once it's in. Each retrieval tool returns its own relevance scores, computed by its own retrieval system. The vault tool might return cosine-similarity scores against vector embeddings, numbers that in practice fall between roughly 0 and 1 (the metric itself ranges from -1 to 1) and quantify how semantically close a chunk is to the question. The bookmark tool might return whatever ranking the bookmark manager's text search produces, which uses a different formula entirely. There is no honest way to merge a top hit at vector cosine 0.61 with a top hit at "the other system's relevance, whatever that means." You end up either trusting one source arbitrarily, or asking the model to reconcile rankings it can't reliably reconcile.
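
A small illustration of why the merge is dishonest. The titles and scores are invented for the example (only the 0.61 echoes the paragraph above), and the bookmark scores stand in for an unbounded text-search ranking of unknown formula. Hit and NaiveMerge are hypothetical names.

```csharp
using System;
using System.Linq;

// Vault hits: cosine similarity, so the best possible score is 1.
var vaultHits = new[]
{
    new Hit("vault", "postgres-backups.md", 0.61),
    new Hit("vault", "pg-restore-notes.md", 0.48)
};

// Bookmark hits: some unbounded text-search score on a different scale.
var bookmarkHits = new[]
{
    new Hit("bookmarks", "Postgres tuning article", 7.3),
    new Hit("bookmarks", "Unrelated cron tutorial", 2.1)
};

foreach (var hit in NaiveMerge.ByRawScore(vaultHits, bookmarkHits))
    Console.WriteLine($"{hit.Score,5:F2}  {hit.Source,-9}  {hit.Title}");

public record Hit(string Source, string Title, double Score);

public static class NaiveMerge
{
    // Concatenate every source's hits and sort by the raw number,
    // ignoring the fact that the numbers mean different things.
    public static Hit[] ByRawScore(params Hit[][] perSourceHits) =>
        perSourceHits
            .SelectMany(hits => hits)
            .OrderByDescending(hit => hit.Score)
            .ToArray();
}
```

Sorted this way, every bookmark hit lands above every vault hit, including the unrelated one, purely because of scale. Normalising each list first only hides the problem: the two scales still measure different things.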

The third failure is silent drift. Adding a fifth retrieval tool changes how the model handles the first four. Tool descriptions interact in ways that aren't predictable from reading them in isolation. Imagine adding search_bookmarks with a description that mentions "saved articles" and "web content." On the next deploy, the agent might start calling search_vault less often on questions where it's still the right tool, because the new description partly overlaps with the kinds of questions the vault tool used to win cleanly, and the model now hedges between them. This is observable in practice but very hard to test for, and it gets worse as the catalogue grows.

That third failure is also the one with the sovereignty consequence. If you can't predict how the agent routes between sources, you can't credibly claim you know where the data flows. Worse, the routing rules aren't even fixed: the model reinterprets them quietly as the catalogue grows, which means whatever audit story you told about the old shape no longer matches the new one, and you don't get a warning when it shifts. The agent should not be a router for your storage layout. The system underneath the agent should unify retrieval before the agent sees anything.

What "source-agnostic" actually means

The replacement that solves all three failure modes, and gives you one auditable contract instead of N, is one retrieval tool with an optional filter for which sources to consider. In code:

ToolResult<IReadOnlyList<SearchHit>> SearchKnowledge(
    string query,
    int topK = 5,
    string[]? sources = null,
    string[]? tags = null,
    string? type = null,
    string[]? folders = null);

In English: give me the topK most relevant chunks for this question (five by default), optionally narrowed to specific sources, tags, types, or folder paths. sources defaults to "all of them." In v0.5 that means the vault, because the vault is the only thing indexed. In v1, when bookmark ingestion lands, it means the vault and the bookmarks, with the same call signature. The model's mental model does not change. The tool description does not change. The system's data-flow specification does not change.

What changes is the index underneath, not the interface above.
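
To make "one function signature" concrete, here is roughly what a catalogue entry for it could look like once it is described to a model. This is a hypothetical sketch: the snake_case tool name, the description text, and the JSON Schema layout are assumptions for illustration and are not copied from the repository. Only the parameter names and the topK default come from the signature above.

```csharp
using System;
using System.Text.Json;

// Hypothetical tool descriptor mirroring the SearchKnowledge parameters.
var searchKnowledgeTool = new
{
    name = "search_knowledge",
    description = "Search all indexed knowledge. Optionally narrow by source, tag, type, or folder.",
    parameters = new
    {
        type = "object",
        properties = new
        {
            query = new { type = "string" },
            topK = new { type = "integer", @default = 5 },
            sources = new { type = "array", items = new { type = "string" } },
            tags = new { type = "array", items = new { type = "string" } },
            type = new { type = "string" },
            folders = new { type = "array", items = new { type = "string" } }
        },
        required = new[] { "query" }
    }
};

// Print the descriptor as indented JSON, the form a reviewer would read.
Console.WriteLine(JsonSerializer.Serialize(
    searchKnowledgeTool,
    new JsonSerializerOptions { WriteIndented = true }));
```

Adding a source later would add an accepted value for sources, not another entry like this one.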

The mechanism is straightforward and the README has the full version: every chunk in the vector store carries a source field in its metadata, the search applies a filter when sources is constrained, and all chunks compete in the same similarity space against the same query vector. Adding a new source becomes an ingestion-layer change, not an agent-layer change. The second integration is cheaper than the first, the third cheaper still. Critically, the contract you defend to a security reviewer doesn't grow with the system. One signature, one set of parameters, one place where data crosses from your storage to the model.
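
The same mechanism as an in-memory sketch. It assumes a toy index with hand-made three-dimensional vectors in place of real embeddings and a brute-force scan in place of a vector store. Chunk, ScoredChunk, and KnowledgeIndex are hypothetical names, and the tags, type, and folders filters are left out to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy index: three chunks from two sources, each tagged with its source.
var index = new List<Chunk>
{
    new("vault", "Nightly pg_dump to the NAS", new[] { 0.9, 0.1, 0.0 }),
    new("vault", "Sourdough starter schedule", new[] { 0.0, 0.2, 0.9 }),
    new("bookmarks", "Article: Postgres backup strategies", new[] { 0.8, 0.3, 0.1 })
};
var queryVector = new[] { 1.0, 0.2, 0.0 }; // stand-in for an embedded question

Print("all sources", KnowledgeIndex.Search(index, queryVector, topK: 2));
Print("vault only", KnowledgeIndex.Search(index, queryVector, topK: 2, sources: new[] { "vault" }));

static void Print(string label, IReadOnlyList<ScoredChunk> hits)
{
    Console.WriteLine(label);
    foreach (var hit in hits)
        Console.WriteLine($"  {hit.Score:F3}  [{hit.Chunk.Source}] {hit.Chunk.Text}");
}

public record Chunk(string Source, string Text, double[] Embedding);
public record ScoredChunk(Chunk Chunk, double Score);

public static class KnowledgeIndex
{
    public static IReadOnlyList<ScoredChunk> Search(
        IEnumerable<Chunk> index, double[] queryVector, int topK = 5, string[]? sources = null)
    {
        // A null or empty filter means "all sources".
        var candidates = sources is { Length: > 0 }
            ? index.Where(chunk => sources.Contains(chunk.Source))
            : index;

        // Every surviving chunk is scored against the same query vector,
        // so scores are comparable regardless of where the chunk came from.
        return candidates
            .Select(chunk => new ScoredChunk(chunk, Cosine(queryVector, chunk.Embedding)))
            .OrderByDescending(hit => hit.Score)
            .Take(topK)
            .ToList();
    }

    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (var i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

With no filter, the vault note about pg_dump and the bookmarked article rank first and second on the same scale. With sources narrowed to the vault, the article drops out and the sourdough note takes second place. Nothing about the call changes except one argument.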

What it costs

Every architectural recommendation should be paired with what it costs, and this one isn't free.

The first cost is that you have to design the surface before you've felt the pain of the second source. There's a sensible version of YAGNI (you aren't gonna need it, the principle that you shouldn't build for cases you don't have yet) that would push back here. Why design for bookmarks when bookmarks aren't even indexed in v0.5? The defence is asymmetry. Designing the surface this way up front costs one parameter on one tool and one field on the index payload. Retrofitting it later means rewriting the tool catalogue, every test that touched it, and the agent's system prompt, and migrating whatever audit story you'd told about the old shape to the new one. Doing it early is cheap. Doing it late is not.

The second cost is concrete and visible. v0.5 ships with a sources parameter that does nothing useful, because the only valid value is ["vault"], which is the same as omitting it. This is deliberate over-engineering of the right kind: the API is the part that's hard to change, and the implementation is the part that's easy to extend.

The third cost is the deferred-source decision itself. The original plan had bookmark search shipping in v0.5, talking directly to the bookmark manager's HTTP API. The source-agnostic surface forced a choice: ship a second tool for bookmarks now and live with the fan-out problem from the start, or hold bookmarks until ingestion can land them in the same store. I chose to hold. v0.5 has a narrower story than the original plan, vault only with no bookmarks, but the story is architecturally honest and the data-flow contract is one function signature. A narrower story you can defend is more credible than a wider story that quietly falls apart.

What actually matters

The catalogue you can hand a security reviewer in one screen is the one you can defend. Every tool you add splits that contract further, and the model picks up where the contract leaves off, which is exactly the part you cannot audit.

agentic-rag v0.5 is vault-only and, deliberately, ready for everything else. The interesting engineering in this space isn't whether to put AI on top of your knowledge. It's how you do it in a way that survives a serious look from someone responsible for what your software does with that knowledge.


Stefan is a senior .NET developer based in Copenhagen, interested in how AI integrates with enterprise systems under real data-sovereignty constraints. More writing at stefandango.dev