Architecture

NightShift is built on 9 pillars that form one unified system. Every feature flows through a single engine. No special code paths. No hardcoded behavior.

System Overview

At a high level, NightShift takes a problem description, builds a team of agents using learned patterns, executes the plan, evaluates the result, and writes everything it learned back to the Knowledge Base.

User (inject / stop / status)
  ↓
Auditor (mailbox + observer) ↔ Investor (valuation + pressure)
  ↓                          ↓
  • summary() → Coordinator (planner)
                   ↓
                   AR (proposes teams from learned patterns)
                   ↓
                   _solve_pipeline (one engine, all strategies)
                   ↓
                   Evaluator (honest judge — git diff is truth)
                   ↓
                   KB (shared memory ← everyone reads/writes)

Design Principles

The 9 Pillars

Pillar 01
🧠
Knowledge Base
LanceDB + ModernBERT · hybrid search · 3ms

Every run produces knowledge: strategies that worked, errors encountered, domain facts discovered. The KB stores all of it using LanceDB with ModernBERT contextual embeddings and hybrid BM25 + vector search that returns results in 3ms.
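The fusion step of hybrid search can be sketched with reciprocal rank fusion, a common way to combine a BM25 ranking with a vector ranking. This is an illustration only: the function name, the fusion constant, and the example ids are assumptions, and the real KB delegates hybrid retrieval to LanceDB.

```python
# Illustrative reciprocal-rank-fusion of a BM25 ranking and a vector ranking.
# All names here are invented; NightShift's KB uses LanceDB's hybrid search.

def rrf(bm25_ranked: list[str], vector_ranked: list[str], k: int = 60) -> list[str]:
    """Fuse two ranked lists of document ids into one hybrid ranking."""
    scores: dict[str, float] = {}
    for ranking in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranking):
            # Each appearance contributes 1/(k + rank); k damps the head.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# A doc ranked well by both retrievers beats one ranked well by only one.
hybrid = rrf(["kb-7", "kb-2", "kb-9"], ["kb-2", "kb-4", "kb-7"])
```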

Two tiers: global (~/.nightshift/kb/) holds cross-project knowledge; local (.nightshift/kb/) holds project-specific facts. After each run, the Librarian (Claude Haiku) consolidates: merging duplicates, dropping noise, keeping insights.

Every agent reads and writes: Coordinator, Evaluator, Auditor, Investor, Predictor. Skill entries, evaluation reviews, node failures, investor valuations, auditor diagnoses: all of it is searchable.

Key insight: The KB is the shared memory that makes the system smarter. Without it, every run starts from zero.

LanceDB · ModernBERT · BM25 · Hybrid search · Two tiers · Librarian consolidation
Pillar 02
๐Ÿข
Agent Resources (AR)
UCB1 scoring · mutate · evolve

AR is the HR department. It stores team patterns (which agents, what roles, what dependencies) and scores them with UCB1, the same math used in Monte Carlo tree search. Proven patterns get exploited. Untried patterns get an exploration bonus.

When the Coordinator plans a run, AR proposes a team. Coordinator can accept, modify, or override. If the Coordinator invents a new pattern that works, it saves back to AR. Patterns mutate, compete, and the best survive.

Per-node performance data: perf_runs, perf_successes, perf_avg_quality. The system builds up a model of which agent types perform well on which problem classes.
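The scoring described above can be sketched as standard UCB1. The field names mirror perf_runs and perf_successes from the text, but the exact formula and exploration constant AR uses are assumptions.

```python
import math

def ucb1(successes: int, runs: int, total_runs: int, c: float = math.sqrt(2)) -> float:
    """UCB1: exploitation term (success rate) plus an exploration bonus.
    Untried patterns (runs == 0) score infinity, so they get tried first."""
    if runs == 0:
        return float("inf")
    return successes / runs + c * math.sqrt(math.log(total_runs) / runs)

# A proven pattern vs. an untried one: the untried pattern wins the next pick,
# which is exactly the exploration bonus the text describes.
proven = ucb1(successes=8, runs=10, total_runs=12)
untried = ucb1(successes=0, runs=0, total_runs=12)
```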

Key insight: The way a team is organized determines outcomes as much as individual agent quality. AR learns which team structures work for which problem types.

UCB1 · Pattern evolution · Team mutation · Per-node performance
Pillar 03
🔄
Auditor ↔ Investor
Oversight pair · prevents safe-play loops

Auditor is an event-driven observer. It watches every event: node failures, output growth, cost burn, repeated errors. It holds a mailbox (inbox.jsonl) where users and the Investor send messages. At each replan, the Auditor delivers a consolidated summary to the Coordinator.

Investor reads everything (KB, AR, episodic memory, budget) and makes one cheap Haiku call to judge the whole picture. It outputs a pressure signal:

  • explore: try new approaches; we don't know what works yet
  • exploit: keep going with what's working
  • deliver: time's running out; produce output from what we have
  • reboot: the current approach is stuck; a completely fresh plan is needed

They wake each other. Auditor triggers Investor after each attempt. Investor pushes signals back through Auditor. Together they prevent the system from grinding in a safe-play loop.

Key insight: Without external pressure, any optimization system converges on the safest strategy, even when it's failing. The Auditor+Investor pair is the immune system against convergence.

Event-driven · Mailbox · explore/exploit/deliver/reboot · Escalation
Pillar 04
⚖️
Evaluator
Git diff truth · sub-evaluators · quality 1–5

After execution, the Evaluator scores the output (quality 1–5) and the approach (was the pipeline sane?). For complex output, it spawns sub-evaluators: specialists that each check one aspect independently.

The Evaluator uses git diff as its source of truth, not stale file snapshots. This means it sees exactly what changed in this run, not what might have been there before. Execution errors (_error_ entries from the graph) are also visible to the Evaluator.
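Reading "what actually changed" from git can be sketched as parsing `git diff --numstat` output, which lists lines added, lines removed, and the path per file. The helper name is invented; whether the Evaluator uses numstat or the full patch is an assumption.

```python
# Sketch of extracting change facts from `git diff --numstat` output,
# the kind of ground truth the Evaluator scores against. Name invented.

def parse_numstat(numstat: str) -> dict[str, tuple[int, int]]:
    """Map each changed path to (lines added, lines removed)."""
    changes: dict[str, tuple[int, int]] = {}
    for line in numstat.strip().splitlines():
        added, removed, path = line.split("\t")
        changes[path] = (int(added), int(removed))
    return changes

# Two files touched in this run: one edited, one mostly deleted.
sample = "12\t3\tsrc/engine.py\n0\t40\ttests/old_test.py"
diff = parse_numstat(sample)
```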

If can_improve is true, the system iterates. Reviews go to KB. Next time a similar problem comes in, the Coordinator reads past evaluations and knows what the Evaluator penalized.

Key insight: An honest evaluator is the foundation of a self-improving system. Quality 0 = FAIL. Gentle scores on bad outputs corrupt the learning signal.

Git diff · Sub-evaluators · can_improve loop · recommend_dry_run
Pillar 05
⚙️
Unified Engine
_solve_pipeline · 13 features · one path

Every strategy (pipeline, iterations, decomposition, sub-problems) runs through one function: _solve_pipeline. There are no separate code paths. Decomposition is just pipeline nodes with the sub_solve archetype. Iterations are small pipelines with a replan loop.

13 features integrated into one engine: KB queries, simulation, auditor events, RCA, blame tracking, checkpoints, status updates, cost tracking, timeout management, predictions, improvement loops, output assembly, and learning writes.

The Coordinator can invent agent roles that don't exist yet โ€” unknown archetypes are handled gracefully. The engine doesn't care what the agent is called, only what it produces.
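The graceful handling of unknown archetypes can be sketched as a lookup with a generic fallback. Everything here is an assumption except the sub_solve archetype named above: the prompt templates, role names, and function are invented to show the single-path idea.

```python
# Hedged sketch: one rendering path for every node, known archetype or not.
# Templates and the quantum_poet role are invented for illustration.

KNOWN_PROMPTS = {
    "coder": "Write code for: {}",
    "sub_solve": "Recursively solve this sub-problem: {}",
}

def render_prompt(archetype: str, task: str) -> str:
    template = KNOWN_PROMPTS.get(archetype)
    if template is None:
        # Unknown archetype: degrade gracefully to a generic role prompt.
        # The engine cares what the node produces, not what it is called.
        return f"You are a {archetype}. Task: {task}"
    return template.format(task)

p1 = render_prompt("coder", "add retries")
p2 = render_prompt("quantum_poet", "add retries")  # invented role, still works
```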

Key insight: A unified engine means every feature is available to every strategy. There's no "limited mode" for decomposed problems or sub-tasks.

Single function · 13 features · Blame retry · Unknown archetypes
Pillar 06
📂
File-Based Monitoring
Files are the API

The entire system state is expressed as files in .nightshift/:

  • status.json: full run state, including nodes, dependencies, descriptions, costs, phase, and current output
  • events.jsonl: append-only event stream recording every node start, completion, failure, and attempt
  • inbox.jsonl: the Auditor's mailbox, where user messages and Investor signals flow through the same path
  • control: control signals such as stop and output-now

Any tool reads these files. The dashboard is just an HTML page polling status.json. No server, no WebSocket, no API contract. The file protocol stays the same whether the backend is a local disk, a database, or cloud storage.
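The whole protocol fits in a few standard-library calls. The paths mirror the .nightshift/ layout above; the specific JSON fields used here are assumptions for illustration.

```python
# The file protocol in miniature: any tool can read run state with nothing
# but the standard library. Field values below are invented sample data.
import json
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp()) / ".nightshift"
root.mkdir()

# A runner writes state...
(root / "status.json").write_text(json.dumps({"phase": "executing", "cost": 0.42}))
with (root / "events.jsonl").open("a") as f:  # append-only event stream
    f.write(json.dumps({"event": "node_start", "node": "coder"}) + "\n")
    f.write(json.dumps({"event": "node_done", "node": "coder"}) + "\n")

# ...and any other tool (dashboard, script, CI job) reads it back.
status = json.loads((root / "status.json").read_text())
events = [json.loads(line) for line in (root / "events.jsonl").read_text().splitlines()]
```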

Key insight: File interfaces are the most durable integration surface. Any tool, any language, any platform can integrate by reading a JSON file.

status.json · events.jsonl · inbox.jsonl · Zero servers
Pillar 07
📈
Self-Learning
Multiple feedback loops · UCB1 everywhere

The system learns at multiple levels simultaneously:

  • Episodic memory: records each run's strategy, cost, quality, and criteria met/unmet. UCB1 uses this to score strategies.
  • KB: evaluator feedback, node failures, auditor diagnoses, investor valuations, all searchable on the next run.
  • AR: pattern successes and failures update UCB scores. Better teams get proposed more often.
  • Predictor: before each node runs, queries the KB for past failures on similar nodes and flags risks in advance.
  • Librarian: after each run, consolidates the KB by merging related entries, dropping noise, and keeping actionable insights.
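The episodic-memory level can be sketched as appending one record per run and aggregating per-strategy counts that a UCB1 scorer can consume. The record fields and the passing-quality threshold are assumptions.

```python
# Hedged sketch of episodic memory: per-run records rolled up into the
# per-strategy counts UCB1 needs. Field names and threshold are invented.
episodes = [
    {"strategy": "pipeline",      "quality": 4, "cost": 0.8},
    {"strategy": "pipeline",      "quality": 2, "cost": 1.1},
    {"strategy": "decomposition", "quality": 5, "cost": 2.0},
]

def strategy_stats(episodes: list[dict], passing_quality: int = 3) -> dict:
    """Roll up runs/successes per strategy; a run counts as a success
    when its quality meets the passing threshold."""
    stats: dict[str, dict[str, int]] = {}
    for ep in episodes:
        s = stats.setdefault(ep["strategy"], {"runs": 0, "successes": 0})
        s["runs"] += 1
        s["successes"] += ep["quality"] >= passing_quality
    return stats

stats = strategy_stats(episodes)
```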

Key insight: Run 50 is fundamentally different from run 1. The knowledge compounds: strategies, patterns, domain facts, known pitfalls.

Episodic memory · UCB1 scoring · Predictor · Librarian
Pillar 08
🎲
Exploration
High uncertainty → high risk appetite

Without exploration pressure, any system defaults to what worked before, even when it's not working anymore. The Investor prevents this.

High uncertainty (new problem type, no KB entries, untried patterns) leads to high risk appetite. The Investor reads the full picture and pushes: "3 safe attempts failed with the same pattern; try something radically different." Valuations are saved to the KB, so future Investors learn from past speculation.

AR's UCB exploration bonus works at the pattern level: untried team configurations get a mathematical boost, ensuring the system doesn't converge on one approach forever.

Key insight: Exploration isn't random. It's calibrated. High uncertainty = explore more. Low uncertainty = exploit what works. The same math as bandit algorithms.

Uncertainty scoring · Risk appetite · UCB exploration bonus · Valuations to KB
Pillar 09
🔌
External Integration
Human hints = algorithmic signals

User messages flow through the same path as Investor signals: into the Auditor inbox, consolidated into the summary, delivered to the Coordinator at the next replan. The system doesn't distinguish between human hints and algorithmic pressure; both are just signals that inform the next decision.

  • nightshift inject "try a different approach": sends the message to the Auditor inbox
  • nightshift stop: writes to the control file and triggers a graceful stop
  • nightshift status: reads status.json and shows the current state
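The inject path reduces to a single file append: a user hint becomes one more JSONL line in the Auditor inbox, the same shape as an Investor signal. The message fields here are assumptions; only the inbox.jsonl filename comes from the text.

```python
# Sketch of `nightshift inject` as a file write. Message field names are
# invented; the point is that human and algorithmic signals share one path.
import json
import tempfile
import time
from pathlib import Path

inbox = Path(tempfile.mkdtemp()) / "inbox.jsonl"

def inject(message: str, sender: str = "user") -> None:
    """Append one signal to the Auditor inbox (append-only JSONL)."""
    with inbox.open("a") as f:
        f.write(json.dumps({"from": sender, "msg": message, "ts": time.time()}) + "\n")

inject("try a different approach")          # human hint
inject("explore", sender="investor")        # algorithmic pressure, same path
messages = [json.loads(line) for line in inbox.read_text().splitlines()]
```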

Key insight: An open system is more powerful than a closed one. Users and external tools can always influence a run in progress โ€” not just at the start.

inject · stop · status · External signals