A personal AI agent, built from the inside
I spent a weekend building NAVis, a personal always-on AI agent running on a Hetzner VPS with Telegram as the interface. I was inspired by Marc Andreessen's framing that Agent = LLM + Shell + Filesystem + Markdown + Cron: a Unix-for-AI composition in which every component was already known, but whose combination unlocks autonomous operation. I wanted to understand this architecture from the inside, not just use it. The question underneath every design decision: what happens when I'm not watching and something goes wrong?
You are not building a chatbot. You are building a system you can trust to act in the world without you watching. Every capability in NAVis exists to answer a specific question about unsupervised operation.
contain it → define it → limit blast radius → gate irreversible actions → prove it before expanding scope
NAVis is not a morning-email bot. The brief is proof point six of six.
- SOUL, USER, AGENTS, MEMORY — loaded per turn. The filesystem does the remembering the model cannot.
- SOUL.md, explicit failure modes, its own model routing. A specialist the generalist delegates to.
- PASS/FAIL per checkpoint. Introspection as a first-class operation.
- The gateway on :18789. Tailscale for my laptop. The public internet cannot see the port.
- curl to the Telegram Bot API. The smallest, loudest proof the other five work.

I drew these diagrams to pressure-test my own understanding. Each one answers a different question.
Audience: a recruiter, a PM peer, a curious friend. What it is in thirty seconds.
Audience: an engineer. Four layers, three entry paths, the shape of the system without the wiring detail.
Audience: me at 2am, debugging. Every port, every file path, every session type and payload.
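The last proof point is small enough to show whole. A hedged sketch of the curl-to-Telegram path: the token and chat id are placeholders, and the live request is gated behind an environment variable (my own convention for this sketch, not part of NAVis) so the snippet runs dry by default.

```shell
# Placeholders -- substitute a real bot token and chat id.
BOT_TOKEN="000000:placeholder-token"
CHAT_ID="123456789"
API_URL="https://api.telegram.org/bot${BOT_TOKEN}/sendMessage"

# Gate the live call; leave NAVIS_SEND unset and nothing leaves the machine.
if [ -n "${NAVIS_SEND:-}" ]; then
  curl -s --max-time 10 -X POST "$API_URL" \
    -d chat_id="$CHAT_ID" \
    --data-urlencode text="Morning brief ready."
fi
```

The real bot does more around this call, but this is the entire transport: one HTTPS POST to the Bot API's sendMessage method.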
Every autonomous system has to answer these five, in this order. Scheduling sits on top — it only becomes safe once the four below are solved.
You cannot schedule what you cannot isolate.
Everything below is what I had to ground in my own running setup before I could trust the agent to operate on its own.
- What I assumed: scheduled jobs would land in /etc/crontab somewhere. OS cron was the only scheduler I knew.
- What actually happens: jobs are defined in jobs.json and run through the gateway's own scheduler. /etc/crontab is untouched; OS-level cron is irrelevant to NAVis, and journalctl -u cron shows nothing.
- What I assumed: a scheduled job would call the gateway over :18789, the same way my laptop does.
- What actually happens: the scheduler reads jobs.json from inside the same process. It creates an isolated session, loads identity files, calls the LLM. No shell, no curl, no port.
- jobs.json is the persistence layer: every job lives in jobs.json on disk. On boot the gateway reads it, rebuilds its timer table, and continues. The file is the system of record.
- Verify that sessionKey and sessionTarget are what you intended. Isolated sessions (used by cron) carry agentTurn payloads, not systemEvent payloads. Named sessions (session:xxx) persist across runs. Mixing payload type and session type fails silently.

The lessons that follow are thematic. Each one changed how I think about agentic systems.
The agent misreported its own work twice. First: it reported editing jobs.json when the file on disk was unchanged. Second: it reported creating a script in scripts/ that had never been written. Both times the natural-language response was confident and specific.
Verify with cat, ls, or git log before trusting that a write happened. Any agent will lie by omission when a tool call quietly fails.

NAVis created a scheduled job with a malformed sessionKey, despite sessionTarget: isolated being set correctly in the request. The job fired, produced nothing, and left no error. Only inspecting jobs.json directly revealed the mismatch.
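What inspecting the file directly looks like. A hypothetical jobs.json entry: only the field names sessionKey and sessionTarget and the payload types come from this post; the surrounding structure, the key format, and the path are assumptions for illustration.

```shell
# Hypothetical jobs.json -- field names from the post, everything else assumed.
JOBS=/tmp/jobs.json
cat > "$JOBS" <<'EOF'
{
  "jobs": [
    {
      "id": "morning-brief",
      "schedule": "0 7 * * *",
      "sessionTarget": "isolated",
      "sessionKey": "isolated:morning-brief",
      "payload": { "type": "agentTurn", "message": "Compile the morning brief." }
    }
  ]
}
EOF

# Isolated sessions must carry agentTurn payloads; the scheduler will not
# flag a mismatch, so check the pairing yourself before the job fires.
if grep -q '"sessionTarget": "isolated"' "$JOBS" \
    && grep -q '"type": "agentTurn"' "$JOBS"; then
  echo "pairing OK"
else
  echo "pairing MISMATCH -- fix it before the job fails silently"
fi
```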
Inspect jobs.json after every job creation. The field you did not set is the one that breaks you.

The model I was using interpreted strict-output prompts instead of forwarding them verbatim. A cron job meant to echo an exact string instead produced a helpful paraphrase. The prompt said "output exactly"; the model heard "do something useful with this."
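The paraphrase failure is cheap to detect after the fact: compare what came back against the exact string the prompt demanded, byte for byte. Both strings below are illustrative.

```shell
# The strict-output prompt demanded exactly this:
EXPECTED="HEARTBEAT OK"
# What a helpful model might return instead:
ACTUAL="The heartbeat looks good!"

if [ "$ACTUAL" = "$EXPECTED" ]; then
  echo "PASS: verbatim output"
else
  echo "FAIL: the model paraphrased a strict-output prompt"
fi
```

A fuzzy "looks right" check would have passed this; only the byte-for-byte comparison catches it.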
The community imap-smtp-email skill had four undocumented gaps: app-password setup, IMAP folder casing, attachment handling, and rate-limit behaviour. All invisible until the first real run against a real inbox.
The heartbeat mechanism was disabled at the start -- the stated reason was that there was nothing useful for it to do yet. The real reason is sharper: without identity files loaded, the agent has full action permissions and zero constraints. A heartbeat firing into that vacuum is an unsupervised agent with no rules acting on the infrastructure. The same principle surfaced when installing the community email skill -- it ran with full agent credentials from the moment it landed, before I had verified what it could access.
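The guard this lesson implies can be made mechanical: refuse an unsupervised turn unless the identity files are actually on disk. Only SOUL.md is named in this post; the other filenames and the directory are assumptions, and the half-configured setup is simulated for the demo.

```shell
# Demo setup: a half-configured identity directory (path and filenames assumed;
# only SOUL.md is confirmed above).
IDENTITY_DIR=/tmp/navis-identity-demo
mkdir -p "$IDENTITY_DIR"
printf '# SOUL\n' > "$IDENTITY_DIR/SOUL.md"   # only one of four present

missing=0
for f in SOUL.md USER.md AGENTS.md MEMORY.md; do
  [ -f "$IDENTITY_DIR/$f" ] || { echo "missing: $f"; missing=1; }
done

if [ "$missing" -eq 0 ]; then
  echo "heartbeat allowed: constraints are loaded"
else
  echo "heartbeat blocked: full permissions, zero constraints"
fi
```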
Skills installed in the workspace tier are invisible to isolated cron sessions. A skill that worked perfectly in interactive webchat did not exist in the scheduled execution context. The cron session had no error, no warning -- it simply proceeded without the skill. The only way to catch it was knowing that workspace-tier and global-tier skills are different capability surfaces.
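The tier mismatch is checkable before the first scheduled run: look for the skill in the tier the cron session will actually resolve. Both directory paths below are assumptions -- the post establishes only that workspace-tier and global-tier are different capability surfaces -- and the workspace-only install is simulated for the demo.

```shell
# Assumed tier locations -- substitute your gateway's real paths.
SKILL="imap-smtp-email"
WORKSPACE_TIER="/tmp/navis-demo/workspace/skills"
GLOBAL_TIER="/tmp/navis-demo/global/skills"

# Simulate the situation in the lesson: skill installed in workspace only.
mkdir -p "$WORKSPACE_TIER/$SKILL"

# Interactive webchat saw the workspace tier; the isolated cron session
# did not. Check the tier your scheduled job actually resolves.
if [ -d "$GLOBAL_TIER/$SKILL" ]; then
  echo "cron session will see $SKILL"
else
  echo "cron session will NOT see $SKILL -- and will not warn you"
fi
```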
Each one answers a different question about building with AI.
The 'under-the-hood' primitives for AI agents — you are here.
Together these cover design, validation, and operation. PRACtis, an autonomous intelligence pipeline, is next. Same primitives, different application.
The five problems don't change at scale -- isolation, state, intelligence, action surface, and scheduling. What changes is what implements each primitive when it has to serve 100,000 users instead of one. Shell becomes containerisation. Markdown splits into a database for memory and a prompt management system for behaviour -- the most AI-native change in the stack. The LLM call becomes a gateway with routing, fallbacks, and cost caps. The filesystem becomes distributed object storage. Cron becomes a job queue with retry logic and dead-letter queues. Two things appear that don't exist at personal scale at all: auth and observability.

I didn't know any of this abstractly. I built the personal-scale version, hit each wall, and worked backwards to understand why each production tool exists. Read the full teardown →