Architecture
NightShift is built on 9 pillars that form one unified system. Every feature flows through a single engine. No special code paths. No hardcoded behavior.
System Overview
At a high level, NightShift takes a problem description, builds a team of agents using learned patterns, executes the plan, evaluates the result, and writes everything it learned back to the Knowledge Base.
Design Principles
- Zero hardcoding: no hardcoded agents, thresholds, or task-type gates. The Coordinator invents roles. The system doesn't care what an agent is called, only what it produces.
- Honest evaluation: git diff is the source of truth, not stale file state. Quality 0 = FAIL, not "needs improvement."
- File interfaces: any tool integrates by reading files. No server required for monitoring.
- Oversight pair: Auditor+Investor prevent stuck loops. They wake each other and escalate naturally.
- Output is sacred: the system never finishes empty. Even a partial run produces something.
- Patterns evolve: AR patterns mutate and compete, skills accumulate. The system gets better at the class of problems you run.
The 9 Pillars
Every run produces knowledge: strategies that worked, errors encountered, domain facts discovered. The KB stores all of it using LanceDB with ModernBERT contextual embeddings and hybrid BM25 + vector search that returns results in 3ms.
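The hybrid retrieval step can be illustrated with a small score-fusion sketch. Reciprocal rank fusion is one common way to merge a BM25 ranking with a vector ranking; NightShift's actual fusion inside LanceDB may differ, and `hybrid_rank` is an illustrative name, not its API:

```python
def hybrid_rank(bm25_hits, vector_hits, k=60):
    """Merge two ranked result lists with reciprocal rank fusion (RRF).

    Each argument is a list of doc ids, best first. A document ranked
    well by both BM25 and the vector index accumulates the highest score.
    """
    scores = {}
    for hits in (bm25_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Best fused score first
    return sorted(scores, key=scores.get, reverse=True)

# "b" is near the top of both lists, so it wins the fused ranking
merged = hybrid_rank(["a", "b", "c"], ["b", "d", "a"])
```

The point of fusing the two rankings is that keyword matches (exact error strings, identifiers) and semantic matches (similar strategies in different words) surface different KB entries, and both matter here.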
Two tiers: global (~/.nightshift/kb/) holds cross-project knowledge; local (.nightshift/kb/) holds project-specific facts. After each run, the Librarian (Claude Haiku) consolidates: merges duplicates, drops noise, keeps insights.
Every agent reads and writes: Coordinator, Evaluator, Auditor, Investor, Predictor. Skill entries, evaluation reviews, node failures, investor valuations, and auditor diagnoses are all searchable.
Key insight: The KB is the shared memory that makes the system smarter. Without it, every run starts from zero.
AR is the HR department. It stores team patterns (which agents, what roles, what dependencies) and scores them with UCB1, the same math used in Monte Carlo tree search. Proven patterns get exploited. Untried patterns get an exploration bonus.
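UCB1 balances a pattern's observed quality against how little it has been tried. A minimal sketch, assuming AR's `perf_avg_quality` and `perf_runs` style fields and the textbook exploration constant (the real tuning is not specified here):

```python
import math

def ucb1_score(avg_quality, runs, total_runs, c=math.sqrt(2)):
    """UCB1: exploitation term plus exploration bonus.

    Untried patterns score infinity, so each gets tried at least once.
    c scales the exploration bonus; sqrt(2) is the conventional default.
    """
    if runs == 0:
        return float("inf")
    return avg_quality + c * math.sqrt(math.log(total_runs) / runs)
```

At equal average quality, a pattern tried 10 times outscores one tried 100 times, which is exactly the "untried patterns get an exploration bonus" behavior described above.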
When the Coordinator plans a run, AR proposes a team. Coordinator can accept, modify, or override. If the Coordinator invents a new pattern that works, it saves back to AR. Patterns mutate, compete, and the best survive.
AR also records per-node performance data: `perf_runs`, `perf_successes`, `perf_avg_quality`. From these, the system builds up a model of which agent types perform well on which problem classes.
Key insight: The way a team is organized determines outcomes as much as individual agent quality. AR learns which team structures work for which problem types.
Auditor is an event-driven observer. It watches every event: node failures, output growth, cost burn, repeated errors. It holds a mailbox (`inbox.jsonl`) where users and the Investor send messages. At each replan, the Auditor delivers a consolidated summary to the Coordinator.
Investor reads everything (KB, AR, episodic memory, budget) and makes one cheap Haiku call to judge the whole picture. It outputs a pressure signal:
- `explore`: try new approaches; we don't know what works yet
- `exploit`: keep going with what's working
- `deliver`: time's running out; produce output from what we have
- `reboot`: the current approach is stuck; need a completely fresh plan
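For illustration, the four signals and a toy decision rule might look like the sketch below. The real Investor delegates this judgment to a Haiku call over the full KB/AR/budget picture; the thresholds and the `pick_signal` name here are invented:

```python
from enum import Enum

class Pressure(Enum):
    EXPLORE = "explore"
    EXPLOIT = "exploit"
    DELIVER = "deliver"
    REBOOT = "reboot"

def pick_signal(budget_left_frac, best_quality, repeated_failures):
    """Toy rule: deadline pressure beats everything, then stuck-loop
    detection, then the usual explore/exploit split on observed quality."""
    if budget_left_frac < 0.15:
        return Pressure.DELIVER      # almost out of budget: ship something
    if repeated_failures >= 3:
        return Pressure.REBOOT       # same failure pattern keeps recurring
    if best_quality >= 4:
        return Pressure.EXPLOIT      # a good approach exists: push on it
    return Pressure.EXPLORE          # nothing proven yet: widen the search
```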
They wake each other. Auditor triggers Investor after each attempt. Investor pushes signals back through Auditor. Together they prevent the system from grinding in a safe-play loop.
Key insight: Without external pressure, any optimization system converges on the safest strategy, even when it's failing. The Auditor+Investor pair is the immune system against convergence.
After execution, the Evaluator scores the output (quality 1-5) and the approach (was the pipeline sane?). For complex output, it spawns sub-evaluators: specialists that each check one aspect independently.
The Evaluator uses git diff as its source of truth, not stale file snapshots. This means it sees exactly what changed in this run, not what might have been there before. Execution errors (`_error_` entries from the graph) are visible to the Evaluator.
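Getting the run's true delta is a single git call. A minimal sketch, assuming the base commit is recorded when the run starts (`diff_for_run` is an illustrative name, not NightShift's actual API):

```python
import subprocess

def diff_for_run(repo_dir, base_commit):
    """Return exactly what changed since the run started, via git diff.

    Diffing against the recorded base commit (rather than trusting file
    snapshots) also surfaces deletions and reverted edits, so the
    Evaluator scores what this run actually did.
    """
    result = subprocess.run(
        ["git", "diff", base_commit],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```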
If can_improve is true, the system iterates. Reviews go to KB. Next time a similar problem comes in, the Coordinator reads past evaluations and knows what the Evaluator penalized.
Key insight: An honest evaluator is the foundation of a self-improving system. Quality 0 = FAIL. Gentle scores on bad outputs corrupt the learning signal.
Every strategy (pipeline, iterations, decomposition, sub-problems) runs through one function: `_solve_pipeline`. There are no separate code paths. Decomposition is just pipeline nodes with the `sub_solve` archetype. Iterations are small pipelines with a replan loop.
13 features integrated into one engine: KB queries, simulation, auditor events, RCA, blame tracking, checkpoints, status updates, cost tracking, timeout management, predictions, improvement loops, output assembly, and learning writes.
The Coordinator can invent agent roles that don't exist yet; unknown archetypes are handled gracefully. The engine doesn't care what the agent is called, only what it produces.
Key insight: A unified engine means every feature is available to every strategy. There's no "limited mode" for decomposed problems or sub-tasks.
The entire system state is expressed as files in .nightshift/:
- `status.json`: full run state (nodes, dependencies, descriptions, costs, phase, current output)
- `events.jsonl`: append-only event stream (every node start, completion, failure, and attempt)
- `inbox.jsonl`: the Auditor's mailbox (user messages and Investor signals, all flowing through the same path)
- `control`: control signals (`stop`, `output-now`)
Any tool reads these files. The dashboard is just an HTML page polling status.json. No server, no WebSocket, no API contract. The file protocol stays the same whether the backend is a local disk, a database, or cloud storage.
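Because the protocol is just files, an observer needs nothing beyond the standard library. A sketch of a minimal reader (the key names below follow the descriptions above; the exact schema is not specified here):

```python
import json
from pathlib import Path

def read_run_state(root=".nightshift"):
    """Observe a run by reading its state files directly.

    No server, no WebSocket, no API client: a dashboard, a cron job,
    or a shell one-liner all use this same surface.
    """
    status = json.loads(Path(root, "status.json").read_text())
    # JSONL: one event object per line, append-only
    events = [json.loads(line)
              for line in Path(root, "events.jsonl").read_text().splitlines()
              if line.strip()]
    return status, events
```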
Key insight: File interfaces are the most durable integration surface. Any tool, any language, any platform can integrate by reading a JSON file.
The system learns at multiple levels simultaneously:
- Episodic memory: records each run's strategy, cost, quality, and criteria met/unmet. UCB1 uses this to score strategies.
- KB: evaluator feedback, node failures, auditor diagnoses, investor valuations, all searchable next run.
- AR: pattern success/failure updates UCB scores. Better teams get proposed more often.
- Predictor: before each node runs, queries KB for past failures on similar nodes. Flags risks in advance.
- Librarian: after each run, consolidates the KB, merging related entries, dropping noise, and keeping actionable insights.
Key insight: Run 50 is fundamentally different from run 1. The knowledge compounds: strategies, patterns, domain facts, known pitfalls.
Without exploration pressure, any system defaults to what worked before, even when it's not working anymore. The Investor prevents this.
High uncertainty (new problem type, no KB entries, untried patterns) leads to high risk appetite. The Investor reads the full picture and pushes: "3 safe attempts failed with the same pattern; try something radically different." Valuations are saved to the KB, so future Investors learn from past speculation.
AR's UCB exploration bonus works at the pattern level: untried team configurations get a mathematical boost, ensuring the system doesn't converge on one approach forever.
Key insight: Exploration isn't random. It's calibrated. High uncertainty = explore more. Low uncertainty = exploit what works. The same math as bandit algorithms.
User messages flow through the same path as Investor signals: into the Auditor inbox, consolidated into the summary, delivered to the Coordinator at the next replan. The system doesn't distinguish between human hints and algorithmic pressure: both are just signals that inform the next decision.
- `nightshift inject "try a different approach"`: the message goes to the Auditor inbox
- `nightshift stop`: writes to the control file, triggers a graceful stop
- `nightshift status`: reads `status.json`, shows the current state
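`nightshift inject` reduces to an append. A sketch of what the write side might look like (the field names are assumptions; only the one-JSON-object-per-line convention is given above):

```python
import json
import time
from pathlib import Path

def inject(message, root=".nightshift", sender="user"):
    """Append a message to the Auditor inbox as one JSON object per line.

    The Investor uses the same path with a different sender, which is why
    the system can't (and doesn't need to) tell hints from pressure.
    """
    entry = {"ts": time.time(), "from": sender, "message": message}
    with open(Path(root, "inbox.jsonl"), "a") as f:
        f.write(json.dumps(entry) + "\n")
```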
Key insight: An open system is more powerful than a closed one. Users and external tools can always influence a run in progress, not just at the start.