NAVis

A personal AI agent, built from the inside

Always-on agent · Identity-driven · Autonomous scheduling · Hetzner · Telegram · OpenClaw

I spent a weekend building NAVis, a personal always-on AI agent running on a Hetzner VPS with Telegram as the interface. I was inspired by Marc Andreessen's framing that Agent = LLM + Shell + Filesystem + Markdown + Cron: a Unix-for-AI composition in which every component was already known, but the combination unlocks autonomous operation. I wanted to understand this architecture from the inside, not just use it. The question underneath every design decision: what happens when I'm not watching and something goes wrong?


You are not building a chatbot. You are building a system you can trust to act in the world without you watching. Every capability in NAVis exists to answer a specific question about unsupervised operation.

Security first
Who can reach the agent when you're not there?
Identity files
What does it believe about itself when there's no one to correct it?
Read before write
What's the worst it can do if it gets something wrong?
Approval gates
What pulls you back into the loop before something irreversible happens?
Incremental capability
Have you actually seen it handle the smaller thing correctly before it gets the bigger one?

contain it → define it → limit blast radius → gate irreversible actions → prove it before expanding scope

Six capabilities. Architecture first, output last.

NAVis is not a morning-email bot. The brief is proof point six of six.

01 · Persistent identity across stateless sessions
Four markdown files — SOUL, USER, AGENTS, MEMORY — loaded per turn. Filesystem does the remembering the model cannot.
02 · Named writer sub-agent with its own narrow identity
Separate SOUL.md, explicit failure modes, own model routing. A specialist the generalist delegates to.
03 · Self-audit across ten capability checkpoints
The agent inspects its own filesystem and cron state, reports PASS/FAIL per checkpoint. Introspection as a first-class operation.
04 · Always-on Telegram chat interface
Token-authed loopback gateway on :18789. Tailscale for my laptop. The public internet cannot see the port.
05 · Live web search, scoped to the session
Brave Search API installed as a skill at session scope. Capability appears when useful, vanishes when not.
06 · Morning Gmail summary to Telegram at 08:00
IMAP fetch via shell wrapper, curl to the Telegram Bot API. The smallest, loudest proof the other five work.
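The shape of that wrapper can be sketched in a few lines of shell. Everything here is illustrative: `BOT_TOKEN`, `CHAT_ID`, and the stubbed subject list stand in for NAVis's real IMAP fetch and credentials, which this post does not show.

```shell
#!/bin/sh
# Hypothetical shape of the 08:00 wrapper. The IMAP fetch is stubbed;
# BOT_TOKEN and CHAT_ID are assumptions, not NAVis's real configuration.

summarise() {
  # Turn raw subject lines into a numbered digest.
  awk '{ printf "%d. %s\n", NR, $0 }'
}

send_telegram() {
  # POST the digest to the Telegram Bot API sendMessage endpoint.
  curl -fsS "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${CHAT_ID}" \
    --data-urlencode "text=${1}"
}

# Stand-in for the IMAP fetch (the real wrapper pulls unread subjects).
subjects='Invoice due Friday
Standup moved to 10:00'

digest=$(printf '%s\n' "$subjects" | summarise)

if [ -n "${BOT_TOKEN:-}" ] && [ -n "${CHAT_ID:-}" ]; then
  send_telegram "$digest"
else
  printf '%s\n' "$digest"   # dry run when no credentials are set
fi
```

The deterministic parts (fetch, format, deliver) live in the script; only the summary prose, if any, needs a model.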

Same system. Three audiences. Three diagrams.

I drew these to pressure-test my own understanding. Each one answers a different question.

Audience: a recruiter, a PM peer, a curious friend. What it is in thirty seconds.

Audience: an engineer. Four layers, three entry paths, the shape of the system without the wiring detail.

Audience: me at 2am, debugging. Every port, every file path, every session type and payload.

Autonomy is a load-bearing question, not a feature.

Every autonomous system has to answer these five, in this order. Scheduling sits on top — it only becomes safe once the four below are solved.

05 · Scheduling
When does it run, and can you trust it to run alone? Autonomy only works once the four below are solved.
04 · Action surface
What can the agent actually do in the world? Skills, shell wrappers, outbound APIs, write permissions.
03 · Intelligence
Which model, which prompts, what context window. The stateless decision-maker that every turn rebuilds.
02 · State & behaviour
How does a stateless model carry memory? Through markdown files the agent reads on every turn. Identity lives on disk.
01 · Isolation
Can components fail independently without taking the rest down? Sessions, gateway, cron, skills — each with its own blast radius.

You cannot schedule what you cannot isolate.

Four distinctions that had to click before anything worked.

Everything below is what I had to ground in my own running setup before I could trust the agent to operate on its own.

OS cron versus OpenClaw's internal scheduler

Initially assumed
That anything called a "cron job" would be in /etc/crontab somewhere. OS cron was the only scheduler I knew.
Worked out
Jobs created inside OpenClaw persist to jobs.json and run through the gateway's own scheduler. /etc/crontab is untouched. OS-level cron is irrelevant to NAVis.
Why it matters
Debugging starts in the wrong place if you assume OS cron. The gateway logs are the first stop, not journalctl -u cron.

The cron chain has no shell-to-port handshake

Initially assumed
A scheduled job would execute a shell command that curled the gateway on :18789, the same way my laptop does.
Worked out
The gateway's scheduler reads jobs.json from inside the same process. It creates an isolated session, loads identity files, calls the LLM. No shell, no curl, no port.
Why it matters
If the gateway is down, jobs do not fire. There is no external trigger to fall back on. The gateway is both the point of failure and the single source of scheduling truth.
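Because the gateway is both the single point of failure and the only thing guarding :18789, two checks are worth scripting: is anything listening on the port at all, and is that bind loopback-only? The `ss` invocation in the comment is a common idiom; the classifier itself is just a sketch.

```shell
# Classify a listen address: the gateway should only ever bind loopback.
# Anything else means the token-authed port is visible beyond the machine.
bind_is_safe() {
  case "$1" in
    127.0.0.1:* | '[::1]:'*) echo "safe (loopback)" ;;
    "")                      echo "gateway not listening" ;;
    *)                       echo "EXPOSED: $1" ;;
  esac
}

# Usage on the VPS (ss column 4 is the local address:port):
#   bind_is_safe "$(ss -tln | awk '/:18789/ { print $4; exit }')"
bind_is_safe "127.0.0.1:18789"
```

An empty result is as important as a bad one: no listener means no scheduler, which means no jobs fire.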

jobs.json is the persistence layer

Initially assumed
Scheduled jobs lived somewhere in memory inside the gateway process.
Worked out
Scheduling survives gateway restarts because state lives in jobs.json on disk. On boot the gateway reads it, rebuilds its timer table, and continues. The file is the system of record.
Why it matters
To inspect or edit a job, go to the file. Always stop the gateway before editing directly. Always verify after creation that sessionKey and sessionTarget are what you intended.
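That verification step can be a dependency-free one-liner you run after every creation. The default `JOBS_FILE` path is an assumption (point it at wherever your gateway keeps state); `jq` would be cleaner if it's installed, but grep is always there.

```shell
# Post-creation check: did sessionKey and sessionTarget land the way you
# intended? JOBS_FILE is an assumed location, not a documented default.
JOBS_FILE="${JOBS_FILE:-jobs.json}"

check_fields() {
  # Crude but dependency-free: print each field's raw value, or flag it.
  for field in sessionKey sessionTarget; do
    line=$(grep -o "\"$field\"[^,}]*" "$1" | head -n 1)
    if [ -n "$line" ]; then
      echo "$line"
    else
      echo "MISSING: $field"
    fi
  done
}

# Usage, with the gateway stopped: check_fields "$JOBS_FILE"
```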

Three kinds of session, each with its own payload shape

Initially assumed
A session was a session. One abstraction.
Worked out
Three kinds. Main sessions carry systemEvent payloads. Isolated sessions (used by cron) carry agentTurn payloads. Named sessions (session:xxx) persist across runs. Mismatching payload type and session type fails silently.
Why it matters
Most of my early cron failures were this. The job fires, returns no error, produces no output, and the only evidence of the mismatch is a run record that looks fine at a glance.
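A cheap guard against exactly this mismatch: encode the mapping once and check every job against it. The main → systemEvent and isolated → agentTurn pairs come from the distinction above; what named sessions carry isn't stated in this post, so the sketch leaves it unknown.

```shell
# Map session type to the payload type it must carry. Mixing these is the
# silent-failure mode described above: no error, no output, plausible logs.
expected_payload() {
  case "$1" in
    main)      echo "systemEvent" ;;
    isolated)  echo "agentTurn" ;;
    session:*) echo "unknown (not covered here)" ;;
    *)         echo "unrecognised session type: $1" ;;
  esac
}

expected_payload isolated    # prints: agentTurn
```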

Six failures. Six habits.

Grouped by theme, not chronology. Each one changed how I think about agentic systems.

It happened twice. First, the agent reported editing jobs.json when the file on disk was unchanged. Second, it reported creating a script in scripts/ that had never been written. Both times the natural-language response was confident and specific.

Always verify writes with cat, ls, or git log before trusting a write happened. Any agent will lie by omission when a tool call quietly fails.
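The habit, scripted. The path and needle string are illustrative; the point is that the transcript is never the evidence, the filesystem is.

```shell
# Trust cat/ls/grep/git, not the agent's claim. Succeeds only if the
# claimed write is actually visible on disk.
verify_write() {
  path="$1"; needle="$2"
  if [ -f "$path" ] && grep -q "$needle" "$path"; then
    echo "VERIFIED: $path contains '$needle'"
  else
    echo "NOT ON DISK: $path (claimed write did not land)"
    return 1
  fi
}
```

After the agent says it edited jobs.json: `verify_write jobs.json sessionTarget`. In a git workspace, `git log -1 --stat` is the second witness.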

NAVis created a scheduled job with a malformed sessionKey, despite sessionTarget: isolated being set correctly in the request. The job fired, produced nothing, and left no error. Only inspecting jobs.json directly revealed the mismatch.

Inspect jobs.json after every job creation. The field you did not set is the one that breaks you.

The model I was using interpreted strict-output prompts instead of reproducing them verbatim. A cron job meant to echo an exact string instead produced a helpful paraphrase. The prompt said "output exactly"; the model heard "do something useful with this."

For deterministic steps, bypass the LLM loop with a shell script. Skills are instruction manuals. The model decides whether to follow them.
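What "bypass the LLM loop" means concretely: the deterministic step becomes a script, so "output exactly" is enforced by the shell rather than hoped for from the model. A minimal sketch; the marker string is illustrative.

```shell
#!/bin/sh
# Deterministic by construction: the shell reproduces bytes, a model
# interprets them. The marker string below is illustrative.
exact_line() {
  printf '%s\n' 'NAVIS-HEALTHCHECK-OK'
}

# Run twice and compare: a paraphrase-prone LLM step cannot make this
# guarantee, a script can.
a=$(exact_line); b=$(exact_line)
[ "$a" = "$b" ] && echo "deterministic: $a"
```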

The community imap-smtp-email skill had four undocumented gaps: app-password setup, IMAP folder casing, attachment handling, and rate-limit behaviour. All invisible until the first real run against a real inbox.

Community skills are instruction manuals for the happy path. Deployment details live below the surface. Read the source before trusting the README.

The heartbeat mechanism was disabled at the start -- the stated reason was that there was nothing useful for it to do yet. The real reason is sharper: without identity files loaded, the agent has full action permissions and zero constraints. A heartbeat firing into that vacuum is an unsupervised agent with no rules acting on the infrastructure. The same principle surfaced when installing the community email skill -- it ran with full agent credentials from the moment it landed, before I had verified what it could access.

Capability before constraints is the most common mistake in AI product design. The question is never just "what can this agent do" -- it's "what is the worst case if it acts right now, with what it currently knows and doesn't know."

Skills installed in the workspace tier are invisible to isolated cron sessions. A skill that worked perfectly in interactive webchat did not exist in the scheduled execution context. The cron session had no error, no warning -- it simply proceeded without the skill. The only way to catch it was knowing that workspace-tier and global-tier skills are different capability surfaces.

"It worked in testing" is often a deployment context mismatch, not a code bug. Before shipping any autonomous feature: what does the agent actually have access to at runtime? This is a product spec question, not an engineering detail.

Three projects. Design, validation, operation.

Each one answers a different question about building with AI.

DEALta

Built a product with AI. Multi-agent orchestration for contract review.

dealta.mandava.in →

EVALens

Validated the quality of AI systems. A CI quality gate for retrieval.

evalens.mandava.in →

NAVis

The 'under-the-hood' primitives for AI agents.

you are here
NAVis in operation — a man in autumn Berlin reading a self-audit report on his phone, beside a wooden shed in the clouds where two friendly robots tend to identity files, a run log, and a Telegram pipe.

Together these cover design, validation, and operation. PRACtis, an autonomous intelligence pipeline, is next. Same primitives, different application.

The five problems don't change at scale -- isolation, state, intelligence, action surface, and scheduling. What changes is what implements each primitive when it has to serve 100,000 users instead of one. Shell becomes containerisation. Markdown splits into a database for memory and a prompt management system for behaviour, the most AI-native change in the stack. The LLM call becomes a gateway with routing, fallbacks, and cost caps. The filesystem becomes distributed object storage. Cron becomes a job queue with retry logic and dead letter queues. Two things appear that don't exist at personal scale at all: auth and observability. I didn't know any of this abstractly. I built the personal-scale version, hit each wall, and worked backwards to understand why the production tool exists. Read the full teardown →