
AgentOps — One-Page Brief

The problem: AI coding tools behave like contractors with amnesia. Every session starts from zero — no memory of what broke last week, no record of decisions already made, no awareness of what was tried and abandoned. You brief them today. Tomorrow you brief them again.


What It Is

A repo-native operational layer for coding agents.

AgentOps gives every session bookkeeping, validation, primitives, and flows. Work begins with repo context instead of a blank prompt, passes through explicit validation before shipping, and ends by feeding findings, learnings, and next work back into the environment.

The institutional knowledge stops walking out the door because the repo keeps it.


Internal Proof Contract

Most coding-agent tooling handles prompt construction and routing well. The failure mode comes after that. Internally, AgentOps proves the product through a three-gap lifecycle contract (see docs/context-lifecycle.md):

Validation (internal: judgment validation)
  Problem: The agent ships without the risk context that would challenge its choices.
  AgentOps response: /pre-mortem before implementation, /vibe before commit, /council for multi-judge review.

Bookkeeping (internal: durable learning)
  Problem: Solved problems recur because nothing extracts, scores, or retrieves the lesson.
  AgentOps response: .agents/ ledger, ao lookup, finding registry, /retro extraction, freshness curation.

Closure (internal: loop closure)
  Problem: Completed work does not produce better next work.
  AgentOps response: /post-mortem harvests learnings and next work; the finding compiler promotes failures into constraints; GOALS.md + /evolve turn findings into measurable improvements.

The compound effect described below depends on all three gaps being closed: validation catches the problem, bookkeeping preserves the lesson, and closure ensures the next session loads better context before it repeats the mistake.
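The following minimal Python sketch makes the closure gap concrete. It shows a past failure being promoted into a constraint that a task-completion gate can execute. The `Finding` and `Constraint` types, the substring-based check, and the `F-001` example are hypothetical illustrations. The real finding compiler and task-validation-gate.sh may represent constraints very differently.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical finding record; field names are illustrative, not AgentOps' schema.
@dataclass(frozen=True)
class Finding:
    id: str
    summary: str
    forbidden_pattern: str  # text that caused the original failure

@dataclass(frozen=True)
class Constraint:
    finding_id: str
    check: Callable[[str], bool]  # returns True when a change is acceptable

def compile_finding(f: Finding) -> Constraint:
    """Promote a past failure into a mechanical check on future changes."""
    return Constraint(f.id, lambda diff: f.forbidden_pattern not in diff)

def task_gate(diff: str, constraints: list[Constraint]) -> list[str]:
    """Return the ids of findings whose constraint the diff violates."""
    return [c.finding_id for c in constraints if not c.check(diff)]

findings = [Finding("F-001", "shell=True caused an injection bug", "shell=True")]
constraints = [compile_finding(f) for f in findings]
print(task_gate("subprocess.run(cmd, shell=True)", constraints))  # ['F-001']
print(task_gate("subprocess.run(['ls', '-l'])", constraints))     # []
```

The point of the design is that the lesson is no longer advice a future agent must remember. It becomes a check that runs whether or not anyone recalls the original incident.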


Four Load-Bearing Layers

Skills — The Primitive and Flow Layer

Structured primitives and named flows for discovery, execution, validation, recovery, and release. Skills route work into the right chain instead of leaving the agent to improvise the lifecycle.

Hooks — The Enforcement Layer

Runtime hooks fire at session start/end, prompt submission, tool boundaries, stop, and task completion. They keep guidance, validation, and flywheel closure active even when the operator does not remember every step.

ao CLI — The Control Plane

The CLI handles retrieval, ratchet checkpoints, flywheel closure, goals, curation, and phased execution support. It is the mechanical bridge between plain files, flows, and enforceable progress.
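The ratchet idea can be sketched in a few lines of Python: progress is recorded only after validation and never slides back. The step order, the `Ratchet` class, and its methods are assumptions made for illustration. They are not the interface of `ao ratchet`.

```python
from dataclasses import dataclass, field

# Hypothetical step order, borrowed from skill names in this brief;
# not the real `ao ratchet` schema.
STEPS = ["research", "plan", "pre-mortem", "implement", "vibe", "post-mortem"]

@dataclass
class Ratchet:
    recorded: list[str] = field(default_factory=list)

    def record(self, step: str, validated: bool) -> bool:
        """Lock in `step` only if it passed validation and is the next step in order."""
        position = len(self.recorded)
        if position >= len(STEPS) or not validated or step != STEPS[position]:
            return False
        self.recorded.append(step)  # deliberately no method to undo a recorded step
        return True

    def status(self) -> str:
        return self.recorded[-1] if self.recorded else "none"

r = Ratchet()
print(r.record("research", validated=True))   # True
print(r.record("implement", validated=True))  # False: "plan" is not recorded yet
print(r.record("plan", validated=False))      # False: failed validation does not advance
print(r.record("plan", validated=True))       # True
print(r.status())                             # plan
```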

.agents/ — The Bookkeeping Layer

Plain files on disk: research, brainstorms, findings, learnings, handoffs, ratchet traces, next-work queues. No single session has to remember the plan. The environment carries it forward.
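The Python function below is a sketch of what "plain files on disk" can mean in practice: it writes one learning as a timestamped Markdown file. The `.agents/learnings/` subdirectory, the date-slug file naming, and the `created:` front-matter key are assumptions, not the documented AgentOps layout.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

def write_learning(root: Path, slug: str, body: str,
                   now: Optional[datetime] = None) -> Path:
    """Persist one learning as a plain, timestamped file (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    out_dir = root / ".agents" / "learnings"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{now:%Y-%m-%d}-{slug}.md"
    path.write_text(f"---\ncreated: {now.isoformat()}\n---\n\n{body}\n", encoding="utf-8")
    return path

# Usage: write into a throwaway directory instead of a real repo.
with tempfile.TemporaryDirectory() as tmp:
    path = write_learning(Path(tmp), "pin-clock", "Pin the clock in date-dependent tests.",
                          now=datetime(2025, 1, 2, tzinfo=timezone.utc))
    print(path.name)  # 2025-01-02-pin-clock.md
```

Because the result is an ordinary file, it can be read, diffed, and committed with normal tools. That is what lets the knowledge outlive any single session.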


How a Session Works

Session starts
  -> Startup hooks retrieve lightweight context and continuity hints
  -> Discovery scopes the work and pressure-tests the plan

Implementation runs
  -> Fresh workers execute in bounded waves
  -> Validation gates challenge the output before closure

Session ends
  -> Learnings, findings, and next work are harvested
  -> Flywheel closure updates what the next session will see

Next session starts with a richer environment than this one did.
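Here is the same loop reduced to a minimal Python sketch: each session loads what earlier sessions left behind and appends its own harvest. `Environment` is a hypothetical in-memory stand-in for the `.agents/` directory. The one-lesson-per-session rule is an assumption chosen to keep the example small.

```python
from dataclasses import dataclass, field

@dataclass
class Environment:
    """Stand-in for what persists between sessions (in AgentOps, files under .agents/)."""
    learnings: list[str] = field(default_factory=list)
    next_work: list[str] = field(default_factory=list)

def run_session(env: Environment, task: str) -> int:
    """Run one session and return how many prior learnings it started with."""
    # Session start: a startup hook loads whatever earlier sessions left behind.
    loaded = len(env.learnings)
    # Implementation and validation gates are elided; assume one lesson per session.
    lesson = f"lesson from {task}"
    # Session end: harvest the lesson and queue follow-up work for the next session.
    env.learnings.append(lesson)
    env.next_work.append(f"follow-up to {task}")
    return loaded

env = Environment()
print([run_session(env, t) for t in ["auth", "billing", "search"]])  # [0, 1, 2]
```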

Key Properties

Local-only: No telemetry, no cloud, no vendor accounts. Nothing phones home.
Open source: Every line is auditable. Apache 2.0 licensed.
Multi-tool: Works with Claude Code, Codex, Cursor, and OpenCode. Not locked to one vendor.
Air-gap compatible: Runs fully offline. The knowledge base is plain files.
Auditable trail: Every learning, decision, and review verdict is written to .agents/ with timestamps.

The Compound Effect

Without AgentOps:  [2 hrs] → [2 hrs] → [2 hrs] → [2 hrs]  =  8 hours total
With AgentOps:     [2 hrs] → [10 min] → [2 min] → instant  =  ~2.2 hours total
                    learn     recall     refine    mastered
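The totals in the figure follow from simple arithmetic on its own per-session durations. Those durations are illustrative, not measured. Treating "instant" as zero minutes:

```python
# Per-session durations in minutes, taken from the figure; "instant" counts as 0.
without = [120, 120, 120, 120]
with_agentops = [120, 10, 2, 0]

print(sum(without) / 60)                  # 8.0 (hours)
print(round(sum(with_agentops) / 60, 1))  # 2.2 (hours, i.e. 132 minutes)
```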

By session 100, the repo already carries prior failures, design choices, planning rules, and validated patterns that new sessions can load before they repeat old mistakes.


Development Model

The most accurate current framing is:

Public category    -> operational layer for coding agents
Public value       -> bookkeeping, validation, primitives, and flows
Internal proof     -> three-gap lifecycle contract
Technical frame    -> context compiler
Runtime mechanics  -> Brownian Ratchet + Stigmergic Spiral + Knowledge Flywheel

The claim is not "better models." The claim is "better repo mechanics around the models you already have." Publicly, that means bookkeeping, validation, primitives, and flows. Internally, the three-gap contract remains the proof model that verifies those claims are real.


AgentOps — github.com/boshu2/agentops


Appendix: System Map

Scale

┌──────────────────────────────────────────────────────────────────┐
│                    AgentOps at a Glance                          │
├───────────────────┬──────────────────────┬───────────────────────┤
│ 66 shared skills  │   `ao` Control Plane │   7 Hook Events       │
│ plus runtime      │ repo-native retrieval│  runtime manifest     │
│    artifacts      │ goals, and automation│                       │
└───────────────────┴──────────────────────┴───────────────────────┘

The Pipeline — Primitive Chains in Motion

/rpi orchestrates the macro lifecycle. Each phase expands into its own skill chain.

GOALS.md
  -> /evolve
      -> /rpi
          -> Discovery: /brainstorm -> /research -> /plan -> /pre-mortem
          -> Implementation: /crank -> /swarm -> /implement
          -> Validation: /validation -> /vibe -> /post-mortem -> /retro -> /forge
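One way to picture the phase expansion is as ordered skill chains with gates that can stop the run early. This Python sketch treats each chain as a flat sequence. It also assumes that a failing /pre-mortem or /vibe halts everything after it. The real /rpi orchestration is richer than this; for example, /crank, /swarm, and /implement may nest rather than run one after another.

```python
from typing import Callable

# Phase -> skill chain, taken from the pipeline diagram.
RPI_PHASES = {
    "Discovery": ["/brainstorm", "/research", "/plan", "/pre-mortem"],
    "Implementation": ["/crank", "/swarm", "/implement"],
    "Validation": ["/validation", "/vibe", "/post-mortem", "/retro", "/forge"],
}
GATES = {"/pre-mortem", "/vibe"}  # assumption: skills whose failure halts the run

def run_rpi(gate_passes: Callable[[str], bool]) -> list[str]:
    """Run every chain in order; return the skills executed before any halt."""
    executed = []
    for chain in RPI_PHASES.values():
        for skill in chain:
            executed.append(skill)
            if skill in GATES and not gate_passes(skill):
                return executed  # later phases never start
    return executed

print(run_rpi(lambda s: s != "/pre-mortem"))
# ['/brainstorm', '/research', '/plan', '/pre-mortem']
```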

Validation Layer — Everything Flows Through Council

                   ┌──────────────────────────────┐
                   │           /council           │
                   │  (independent reviewers      │
                   │   debate, verdict gates work)│
                   └───────────┬──────────────────┘
                               │ used by
          ┌────────────────────┼────────────────────┐
          ▼                    ▼                    ▼
   /pre-mortem              /vibe              /post-mortem
   (validate plans          (validate code     (wrap-up +
    before building)         before shipping)   learnings)
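The following is a minimal sketch of a council gate in Python. It assumes each independent reviewer returns PASS, WARN, or FAIL and that the most severe verdict wins. The verdict names and the aggregation rule are assumptions for illustration, not a description of how /council actually reaches its verdict.

```python
from enum import IntEnum

class Verdict(IntEnum):
    PASS = 0
    WARN = 1
    FAIL = 2

def council(verdicts: list[Verdict]) -> Verdict:
    """Aggregate independent reviewer verdicts; the most severe one wins."""
    if not verdicts:
        raise ValueError("council needs at least one reviewer")
    return max(verdicts)

def gate(verdicts: list[Verdict]) -> bool:
    """Work proceeds only when no reviewer returned FAIL."""
    return council(verdicts) != Verdict.FAIL

print(council([Verdict.PASS, Verdict.WARN, Verdict.PASS]).name)  # WARN
print(gate([Verdict.PASS, Verdict.FAIL]))                        # False
```

The same `gate` function can sit in front of plans (/pre-mortem), code (/vibe), and wrap-up (/post-mortem). This mirrors the diagram, where one council serves all three.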

Knowledge Handoff — Skills and CLI Working Together

SURFACE                  CLI / FILE PRIMITIVE       RESULT
───────                  ────────────────────       ──────
/research          ->    ao lookup + ao search      Prior repo context loaded
/plan              ->    findings registry          Reusable risks loaded pre-decomposition
/post-mortem       ->    ao forge + ao session      Learnings harvested and session closed
/vibe              ->    ao ratchet record          Validation checkpoint persisted
/evolve            ->    ao goals measure           Worst fitness gap selected
/recover           ->    handoff artifacts          Interrupted work resumed from disk
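Retrieval plus freshness curation can be illustrated with a toy ranker. It scores keyword overlap with the query and discounts that score by age so stale lessons sink. The `Learning` type, the 30-day half-life, and the scoring formula are hypothetical. This is not the ranking that `ao lookup` or `ao search` implements.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Learning:
    text: str
    age_days: float

def score(item: Learning, query: str, half_life_days: float = 30.0) -> float:
    """Keyword overlap with the query, halved for every `half_life_days` of age."""
    overlap = len(set(query.lower().split()) & set(item.text.lower().split()))
    return overlap * 0.5 ** (item.age_days / half_life_days)

def lookup(items: list[Learning], query: str, k: int = 2) -> list[str]:
    """Return up to k matching learnings, most relevant and freshest first."""
    scored = sorted(((score(i, query), i) for i in items),
                    key=lambda pair: pair[0], reverse=True)
    return [i.text for s, i in scored[:k] if s > 0]

notes = [
    Learning("retry flaky integration tests with a fixed clock", age_days=3),
    Learning("flaky tests came from shared temp dirs", age_days=90),
    Learning("release script needs the GPG key loaded", age_days=1),
]
print(lookup(notes, "flaky tests"))
# ['retry flaky integration tests with a fixed clock', 'flaky tests came from shared temp dirs']
```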

Hooks — Automatic Enforcement

TRIGGER           HOOK                        WHAT IT DOES
───────           ────                        ────────────
Session starts    session-start.sh            Inject prior knowledge
Session ends      session-end-maintenance.sh  Harvest learnings
Agent stops       ao-flywheel-close.sh        Close learning loop
Prompt submit     prompt-nudge.sh             Remind missing intent / ratchet state
Pre tool use      pre-mortem-gate.sh          Require review before risky work
Post tool use     go-complexity-precommit.sh  Block over-complex edits
Task complete     task-validation-gate.sh     Execute compiled validation constraints
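The enforcement pattern in this table can be sketched in Python: every event fans out to its hooks, and any refusal blocks the action. The sketch models hooks as Python callables rather than shell scripts. The event key `pre_tool_use` and the pre-mortem rule are hypothetical. Only one hook is registered, to keep the example short.

```python
from typing import Callable

# A hook takes an event payload and returns (allowed, message).
Hook = Callable[[dict], tuple[bool, str]]

def pre_mortem_gate(event: dict) -> tuple[bool, str]:
    # Hypothetical rule: risky work requires a completed pre-mortem.
    if event.get("risky") and not event.get("pre_mortem_done"):
        return False, "run /pre-mortem before risky work"
    return True, ""

HOOKS: dict[str, list[Hook]] = {
    "pre_tool_use": [pre_mortem_gate],
}

def fire(name: str, event: dict) -> tuple[bool, list[str]]:
    """Run every hook registered for an event; any refusal blocks the action."""
    allowed, messages = True, []
    for hook in HOOKS.get(name, []):
        ok, msg = hook(event)
        allowed = allowed and ok
        if msg:
            messages.append(msg)
    return allowed, messages

print(fire("pre_tool_use", {"risky": True}))
# (False, ['run /pre-mortem before risky work'])
print(fire("pre_tool_use", {"risky": True, "pre_mortem_done": True}))  # (True, [])
```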

CLI Command Groups

RETRIEVAL / CURATION        VALIDATION / RATCHETS    WORKFLOW / FITNESS
────────────────────        ─────────────────────    ──────────────────
ao lookup                   ao ratchet status        ao rpi phased
ao search                   ao ratchet record        ao rpi status
ao forge                    ao ratchet check         ao goals measure
ao curate                   ao constraint activate   ao goals steer
ao maturity                 ao constraint review     ao flywheel status
ao dedup                    ao session close         ao hooks list
ao contradict               ao temper validate       ao status
ao notebook                                          ao doctor
ao extract
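In the handoff table, /evolve uses `ao goals measure` to select the worst fitness gap. Below is a hypothetical Python sketch of that selection. It assumes every goal has a positive numeric target and that higher measurements are better. The `Goal` type and the relative-gap formula are illustrative choices, not the GOALS.md format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Goal:
    name: str
    target: float    # assumed > 0, with higher measurements being better
    measured: float

    @property
    def gap(self) -> float:
        """Fraction of the target still missing; 0.0 means the goal is met."""
        return max(0.0, (self.target - self.measured) / self.target)

def worst_gap(goals: list[Goal]) -> Optional[Goal]:
    """Pick the unmet goal furthest from its target, or None if all are met."""
    unmet = [g for g in goals if g.gap > 0]
    return max(unmet, key=lambda g: g.gap) if unmet else None

goals = [
    Goal("test coverage (%)", target=80, measured=72),
    Goal("lint-clean packages", target=40, measured=40),
    Goal("docs pages with examples", target=50, measured=30),
]
print(worst_gap(goals).name)  # docs pages with examples
```

Using a relative gap rather than an absolute one keeps goals with different units comparable. That is one reasonable choice among several.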