Architecture
NightShift is built on 9 pillars that form one unified system. Every feature flows through a single engine. No special code paths. No hardcoded behavior.
System Overview
At a high level, NightShift takes a problem description, builds a team of agents using learned patterns, executes the plan, evaluates the result, and writes everything it learned back to the Knowledge Base.
Design Principles
- Zero hardcoding: no hardcoded agents, thresholds, or task-type gates. The Coordinator invents roles. The system doesn't care what an agent is called, only what it produces.
- Honest evaluation: git diff is the source of truth, not stale file state. Quality 0 = FAIL, not "needs improvement."
- File interfaces: any tool integrates by reading files. No server required for monitoring.
- Oversight pair: Auditor+Investor prevent stuck loops. They wake each other and escalate naturally.
- Output is sacred: the system never finishes empty. Even a partial run produces something.
- Patterns evolve: AR patterns mutate and compete, skills accumulate. The system gets better at the class of problems you run.
The 9 Pillars
Every run produces knowledge: strategies that worked, errors encountered, domain facts discovered. The KB stores all of it using LanceDB with ModernBERT contextual embeddings and hybrid BM25 + vector search that returns results in 3ms.
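The hybrid retrieval step can be illustrated with a small score-fusion sketch. Reciprocal rank fusion is one common way to merge a BM25 ranking with a vector ranking; NightShift's actual fusion inside LanceDB may differ, and `hybrid_rank` is an illustrative name, not its API:

```python
def hybrid_rank(bm25_hits, vector_hits, k=60):
    """Merge two ranked result lists with reciprocal rank fusion (RRF).

    Each argument is a list of doc ids, best first. A document ranked
    well by both BM25 and the vector index accumulates the highest score.
    """
    scores = {}
    for hits in (bm25_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Best fused score first
    return sorted(scores, key=scores.get, reverse=True)

# "b" is near the top of both lists, so it wins the fused ranking
merged = hybrid_rank(["a", "b", "c"], ["b", "d", "a"])
```

The point of fusing the two rankings is that keyword matches (exact error strings, identifiers) and semantic matches (similar strategies in different words) surface different KB entries, and both matter here.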
Two tiers: global (~/.nightshift/kb/) holds cross-project knowledge; local (.nightshift/kb/) holds project-specific facts. After each run, the Librarian (Claude Haiku) consolidates: merges duplicates, drops noise, keeps insights.
Every agent reads and writes: Coordinator, Evaluator, Auditor, Investor, Predictor. Skill entries, evaluation reviews, node failures, investor valuations, and auditor diagnoses are all searchable.
Key insight: The KB is the shared memory that makes the system smarter. Without it, every run starts from zero.
AR is the HR department. It stores team patterns (which agents, what roles, what dependencies) and scores them with UCB1, the same math used in Monte Carlo tree search. Proven patterns get exploited. Untried patterns get an exploration bonus.
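UCB1 balances a pattern's observed quality against how little it has been tried. A minimal sketch, assuming AR's `perf_avg_quality` and `perf_runs` style fields and the textbook exploration constant (the real tuning is not specified here):

```python
import math

def ucb1_score(avg_quality, runs, total_runs, c=math.sqrt(2)):
    """UCB1: exploitation term plus exploration bonus.

    Untried patterns score infinity, so each gets tried at least once.
    c scales the exploration bonus; sqrt(2) is the conventional default.
    """
    if runs == 0:
        return float("inf")
    return avg_quality + c * math.sqrt(math.log(total_runs) / runs)
```

At equal average quality, a pattern tried 10 times outscores one tried 100 times, which is exactly the "untried patterns get an exploration bonus" behavior described above.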
When the Coordinator plans a run, AR proposes a team. Coordinator can accept, modify, or override. If the Coordinator invents a new pattern that works, it saves back to AR. Patterns mutate, compete, and the best survive.
AR also records per-node performance data: `perf_runs`, `perf_successes`, `perf_avg_quality`. From these, the system builds up a model of which agent types perform well on which problem classes.
Key insight: The way a team is organized determines outcomes as much as individual agent quality. AR learns which team structures work for which problem types.
Auditor is an event-driven observer. It watches every event: node failures, output growth, cost burn, repeated errors. It holds a mailbox (`inbox.jsonl`) where users and the Investor send messages. At each replan, the Auditor delivers a consolidated summary to the Coordinator.
Investor reads everything (KB, AR, episodic memory, budget) and makes one cheap Haiku call to judge the whole picture. It outputs a pressure signal:
- `explore`: try new approaches; we don't know what works yet
- `exploit`: keep going with what's working
- `deliver`: time's running out; produce output from what we have
- `reboot`: the current approach is stuck; need a completely fresh plan
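For illustration, the four signals and a toy decision rule might look like the sketch below. The real Investor delegates this judgment to a Haiku call over the full KB/AR/budget picture; the thresholds and the `pick_signal` name here are invented:

```python
from enum import Enum

class Pressure(Enum):
    EXPLORE = "explore"
    EXPLOIT = "exploit"
    DELIVER = "deliver"
    REBOOT = "reboot"

def pick_signal(budget_left_frac, best_quality, repeated_failures):
    """Toy rule: deadline pressure beats everything, then stuck-loop
    detection, then the usual explore/exploit split on observed quality."""
    if budget_left_frac < 0.15:
        return Pressure.DELIVER      # almost out of budget: ship something
    if repeated_failures >= 3:
        return Pressure.REBOOT       # same failure pattern keeps recurring
    if best_quality >= 4:
        return Pressure.EXPLOIT      # a good approach exists: push on it
    return Pressure.EXPLORE          # nothing proven yet: widen the search
```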
They wake each other. Auditor triggers Investor after each attempt. Investor pushes signals back through Auditor. Together they prevent the system from grinding in a safe-play loop.
Key insight: Without external pressure, any optimization system converges on the safest strategy, even when it's failing. The Auditor+Investor pair is the immune system against convergence.
After execution, the Evaluator scores the output (quality 1-5) and the approach (was the pipeline sane?). For complex output, it spawns sub-evaluators: specialists that each check one aspect independently.
The Evaluator uses git diff as its source of truth, not stale file snapshots. This means it sees exactly what changed in this run, not what might have been there before. Execution errors (`_error_` entries from the graph) are visible to the Evaluator.
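Getting the run's true delta is a single git call. A minimal sketch, assuming the base commit is recorded when the run starts (`diff_for_run` is an illustrative name, not NightShift's actual API):

```python
import subprocess

def diff_for_run(repo_dir, base_commit):
    """Return exactly what changed since the run started, via git diff.

    Diffing against the recorded base commit (rather than trusting file
    snapshots) also surfaces deletions and reverted edits, so the
    Evaluator scores what this run actually did.
    """
    result = subprocess.run(
        ["git", "diff", base_commit],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```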
If can_improve is true, the system iterates. Reviews go to KB. Next time a similar problem comes in, the Coordinator reads past evaluations and knows what the Evaluator penalized.
Key insight: An honest evaluator is the foundation of a self-improving system. Quality 0 = FAIL. Gentle scores on bad outputs corrupt the learning signal.
Every strategy (pipeline, iterations, decomposition, sub-problems) runs through one function: `_solve_pipeline`. There are no separate code paths. Decomposition is just pipeline nodes with the `sub_solve` archetype. Iterations are small pipelines with a replan loop.
13 features integrated into one engine: KB queries, simulation, auditor events, RCA, blame tracking, checkpoints, status updates, cost tracking, timeout management, predictions, improvement loops, output assembly, and learning writes.
The Coordinator can invent agent roles that don't exist yet; unknown archetypes are handled gracefully. The engine doesn't care what the agent is called, only what it produces.
Key insight: A unified engine means every feature is available to every strategy. There's no "limited mode" for decomposed problems or sub-tasks.
The entire system state is expressed as files in .nightshift/:
- `status.json`: full run state (nodes, dependencies, descriptions, costs, phase, current output)
- `events.jsonl`: append-only event stream (every node start, completion, failure, and attempt)
- `inbox.jsonl`: the Auditor's mailbox (user messages and Investor signals, all flowing through the same path)
- `control`: control signals (`stop`, `output-now`)
Any tool reads these files. The dashboard is just an HTML page polling status.json. No server, no WebSocket, no API contract. The file protocol stays the same whether the backend is a local disk, a database, or cloud storage.
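Because the protocol is just files, an observer needs nothing beyond the standard library. A sketch of a minimal reader (the key names below follow the descriptions above; the exact schema is not specified here):

```python
import json
from pathlib import Path

def read_run_state(root=".nightshift"):
    """Observe a run by reading its state files directly.

    No server, no WebSocket, no API client: a dashboard, a cron job,
    or a shell one-liner all use this same surface.
    """
    status = json.loads(Path(root, "status.json").read_text())
    # JSONL: one event object per line, append-only
    events = [json.loads(line)
              for line in Path(root, "events.jsonl").read_text().splitlines()
              if line.strip()]
    return status, events
```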
Key insight: File interfaces are the most durable integration surface. Any tool, any language, any platform can integrate by reading a JSON file.
The system learns at multiple levels simultaneously:
- Episodic memory: records each run's strategy, cost, quality, and criteria met/unmet. UCB1 uses this to score strategies.
- KB: evaluator feedback, node failures, auditor diagnoses, investor valuations, all searchable next run.
- AR: pattern success/failure updates UCB scores. Better teams get proposed more often.
- Predictor: before each node runs, queries KB for past failures on similar nodes. Flags risks in advance.
- Librarian: after each run, consolidates the KB, merging related entries, dropping noise, and keeping actionable insights.
Key insight: Run 50 is fundamentally different from run 1. The knowledge compounds: strategies, patterns, domain facts, known pitfalls.
Without exploration pressure, any system defaults to what worked before, even when it's not working anymore. The Investor prevents this.
High uncertainty (new problem type, no KB entries, untried patterns) leads to high risk appetite. The Investor reads the full picture and pushes: "3 safe attempts failed with the same pattern; try something radically different." Valuations are saved to the KB, so future Investors learn from past speculation.
AR's UCB exploration bonus works at the pattern level: untried team configurations get a mathematical boost, ensuring the system doesn't converge on one approach forever.
Key insight: Exploration isn't random. It's calibrated. High uncertainty = explore more. Low uncertainty = exploit what works. The same math as bandit algorithms.
User messages flow through the same path as Investor signals: into the Auditor inbox, consolidated into the summary, delivered to the Coordinator at the next replan. The system doesn't distinguish between human hints and algorithmic pressure: both are just signals that inform the next decision.
- `nightshift inject "try a different approach"`: the message goes to the Auditor inbox
- `nightshift stop`: writes to the control file, triggers a graceful stop
- `nightshift status`: reads `status.json`, shows the current state
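`nightshift inject` reduces to an append. A sketch of what the write side might look like (the field names are assumptions; only the one-JSON-object-per-line convention is given above):

```python
import json
import time
from pathlib import Path

def inject(message, root=".nightshift", sender="user"):
    """Append a message to the Auditor inbox as one JSON object per line.

    The Investor uses the same path with a different sender, which is why
    the system can't (and doesn't need to) tell hints from pressure.
    """
    entry = {"ts": time.time(), "from": sender, "message": message}
    with open(Path(root, "inbox.jsonl"), "a") as f:
        f.write(json.dumps(entry) + "\n")
```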
Key insight: An open system is more powerful than a closed one. Users and external tools can always influence a run in progress, not just at the start.