A question becomes a program
dacular lets a powerful remote model help with your private data without ever seeing it. The trick — the headgate harness — is to treat the frontier model as an untrusted code generator, not a data processor. Here's how one question turns into code that runs on your Mac.
Two asymmetric models
There are two models, deliberately asymmetric:
- The frontier model (untrusted) is the planner/coder. It sees only a sanitized manifest of your vault: file aliases (file_0), kinds, and aliased column schemas (col_2), never contents, names, or values. From that it writes one Mojo program that calls a fixed set of vault tools.
- The local model (trusted, on your device) is the reader. When the program needs to understand content ("is this a travel expense?", "read the renewal date"), it calls ask_local(...), which runs on the millrace inference engine and is the only thing that ever sees real text.
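To make the "sanitized manifest" concrete, here is a minimal Python sketch of how real file and column names could be swapped for aliases before anything is shown to the frontier model. It is an illustration only, not headgate's implementation: `VaultFile`, `build_manifest`, the manifest's dictionary layout, and per-file column numbering are all assumptions. Only the alias shapes (file_0, col_2) and the fact that kinds are exposed come from the description above.

```python
from dataclasses import dataclass


@dataclass
class VaultFile:
    """A real vault entry. Hypothetical type; name and columns are private."""
    name: str
    kind: str            # e.g. "csv" or "pdf"; kinds are exposed in the manifest
    columns: list[str]   # empty for files without a tabular schema


def build_manifest(files: list[VaultFile]) -> tuple[list[dict], dict[str, str]]:
    """Return (sanitized manifest, private alias -> real name map).

    Only the manifest would ever be sent out; the alias map stays on-device.
    """
    manifest: list[dict] = []
    alias_map: dict[str, str] = {}
    for i, f in enumerate(files):
        file_alias = f"file_{i}"
        alias_map[file_alias] = f.name
        col_aliases = []
        for j, col in enumerate(f.columns):
            col_alias = f"col_{j}"  # assumption: columns are numbered per file
            alias_map[f"{file_alias}.{col_alias}"] = col
            col_aliases.append(col_alias)
        manifest.append({"file": file_alias, "kind": f.kind, "columns": col_aliases})
    return manifest, alias_map


# Made-up example vault
files = [
    VaultFile("amex_2025.csv", "csv", ["date", "merchant", "amount"]),
    VaultFile("lease_renewal.pdf", "pdf", []),
]
manifest, alias_map = build_manifest(files)
print(manifest)
```

The printed manifest contains aliases and kinds only; the real names live in `alias_map`, which in this sketch never leaves the machine.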
The contract the frontier model is given is a single document — the
headgate system prompt — loaded at runtime. It
spells out the confidentiality rules, the current Mojo dialect, and the
tool surface (search, csv_rows,
pdf_text, ask_local, print_answer, …).
Example: a question becomes a program
You ask "How much did I spend on travel last year?". The frontier model — seeing only aliases — writes this, and headgate compiles it and runs it in a sandbox that can reach only your local model:
# written by the frontier model; it never sees a single real value
from vault import *

def main() raises:
    var hits = search("travel transportation flights hotels expenses", 40)
    var total = 0.0
    for c in hits:
        # ask_local reads the REAL chunk on-device; returns "amount|yes" or "0|no"
        var verdict = ask_local(
            "If this is a 2025 travel expense, reply '<amount>|yes', else '0|no'.", c.text)
        var parts = verdict.split("|")
        if len(parts) == 2 and String(parts[1]) == "yes":
            total += atof(String(parts[0]))
    print_answer("You spent about $" + String(total) + " on travel in 2025.")
The frontier model orchestrates over aliases; search and
ask_local do the real work locally; the sum is computed on
your machine and print_answer surfaces it there. The
search results and the answer are never returned to the
frontier model — which is exactly why the program model is load-bearing,
not an implementation detail.
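The "amount|yes" reply format used in the example program is a small text protocol between the generated code and the local model. The Python sketch below mirrors that loop with a canned stand-in for the model, so the parsing can be run in isolation. `parse_verdict`, `total_travel`, and `fake_ask_local` are hypothetical names, and the tolerant handling of currency symbols and malformed replies is an added assumption: the Mojo program above converts the amount directly with atof.

```python
def parse_verdict(verdict: str) -> float:
    """Return the amount for an '<amount>|yes' reply, otherwise 0.0."""
    parts = verdict.strip().split("|")
    if len(parts) != 2 or parts[1].strip().lower() != "yes":
        return 0.0
    try:
        # Assumption: the local model may decorate the amount with "$" or commas.
        return float(parts[0].replace("$", "").replace(",", ""))
    except ValueError:
        return 0.0  # malformed amount: skip the chunk rather than crash


def total_travel(chunks: list[str], ask_local) -> float:
    """Sum the amounts of all chunks the local model classifies as travel."""
    prompt = "If this is a 2025 travel expense, reply '<amount>|yes', else '0|no'."
    return sum(parse_verdict(ask_local(prompt, text)) for text in chunks)


# Canned replies standing in for the on-device model (made-up data).
CANNED = {
    "Flight to Denver": "412.50|yes",
    "Grocery run": "0|no",
    "Hotel, two nights": "289.00|yes",
    "Blurry receipt": "unsure",
}


def fake_ask_local(prompt: str, text: str) -> str:
    return CANNED[text]


total = total_travel(list(CANNED), fake_ask_local)
print(f"You spent about ${total} on travel in 2025.")
```

Everything in this loop runs where the real text lives; only the final sentence is shown to the user.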
Why it holds
Containment lives outside the model, at the OS level. The generated program runs under a Seatbelt profile that denies all network except loopback to your local engine — it can't phone home. An egress guard gates every message to the frontier (fails closed), and the compile-feedback loop only ever sends back aliased source, never runtime output that might contain real content. Your documents never leave the Mac, and never reach the frontier model. See the walkthrough to try it, or headgate for the full design.
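"Fails closed" means that when the guard cannot prove a message is safe, the message is not sent. The Python sketch below shows that pattern with one deliberately simple check, a scan for known vault strings. The text above does not say what headgate's egress guard actually inspects, so `guard_outbound`, `EgressBlocked`, and the term scan are assumptions chosen only to illustrate the behavior.

```python
class EgressBlocked(Exception):
    """Raised whenever a message must not be sent to the frontier model."""


def guard_outbound(message: str, vault_terms: set[str]) -> str:
    """Return the message only if it is provably free of known vault terms.

    Any error inside the check is treated as a block, never as a pass.
    """
    try:
        lowered = message.lower()
        leaked = any(term.lower() in lowered for term in vault_terms)
    except Exception as exc:
        # Fail closed: a broken guard refuses to send.
        raise EgressBlocked("guard failed; refusing to send") from exc
    if leaked:
        raise EgressBlocked("message contains a known vault term")
    return message


# Made-up vault terms for the demonstration
terms = {"amex_2025.csv", "merchant"}

# Aliased source passes through unchanged.
print(guard_outbound("var rows = csv_rows(file_0)", terms))

# A message carrying a real column name is blocked.
try:
    guard_outbound("error: unknown column 'merchant'", terms)
except EgressBlocked as err:
    print("blocked:", err)

# A malformed term list (a non-string entry) also blocks instead of passing.
try:
    guard_outbound("var rows = csv_rows(file_0)", {42})
except EgressBlocked as err:
    print("blocked:", err)
```

The third call is the important one: the message itself is harmless, but the guard could not complete its check, so nothing is sent.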