Open Source · AGPL-3.0 · 665 tests passing

Describe the problem.
It solves — and remembers.

NightShift is an autonomous AI problem solver with a persistent memory. Every run feeds its Knowledge Base. Run 50 is fundamentally faster and cheaper than run 1.

9 core pillars
3ms KB query time
665 tests passing
17.5K lines of code
problem.yaml
title: SWOT analysis for EV conversion startup
description: |
  Research the market for converting ICE cars
  to electric. Find real market data.
acceptance:
  - Real citations, not summaries
  - Executive summary included
max_budget_usd: 3.0
terminal
# run it
nightshift solve problem.yaml

# watch live
nightshift status

# hint mid-run
nightshift inject "try a different approach"

✓ Run complete · Quality: 4/5 · Cost: $0.42
✓ 12 KB entries written · patterns updated

Every AI tool today
is stateless.

Devin forgets your codebase. Cursor doesn't remember yesterday's bugs. Every session starts from zero. NightShift breaks this pattern.

| Capability | Devin · Cursor · Copilot | NightShift |
|---|---|---|
| Learns between runs | forgets everything | Knowledge Base: LanceDB hybrid search, 3ms |
| Strategy from experience | same prompts, same results | AR patterns: UCB1 scoring, mutate & evolve |
| Non-code tasks | coding only | Any task: research, analysis, SWOT, design |
| Smart retry on failure | full restart | Blame-driven: only failed agents re-run |
| User input mid-run | must restart | Auditor inbox: inject hints anytime |
| Self-evaluation | basic or none | Sub-evaluators: spawn specialists per dimension |
| Exploration drive | safe-play loops forever | Investor: prevents repetitive failure patterns |

Watch the team work

Five specialist agents coordinate on every run. See how they collaborate, monitor, evaluate, and learn — all from a single problem description.

NightShift · autonomous run
Problem
Agent Team
🎯
Coordinator
CEO · Planner
waiting for problem...
👁
Auditor
Observer · Mailbox
monitoring...
💡
Investor
VC · Exploration pressure
assessing conditions...
Dynamic Team
🔍
Researcher
spawned by Coordinator · AR pattern #7
pending...
📊
Analyst
spawned by Coordinator
pending...
✍️
Writer
spawned by Coordinator
pending...
⚖️
Evaluator
Judge · Quality 1–5
waiting for output...
📚
Librarian
KB consolidation · Haiku
waiting...
Output
coordinator: plan ready
kb: 3 relevant entries found
team: researcher, analyst, writer
ar: using pattern #7 (score 0.81)
researcher: 14 sources collected
auditor: anomalies=0, cost=$0.12
analyst: SWOT matrix complete
investor: signal → exploit
writer: 1,847 words generated
evaluator: quality=4/5 ✓
librarian: 12 KB entries written
─────────────────────────
✓ run complete · cost=$0.42 · 2m 14s
next run: patterns updated

Nine pillars.
One engine.

Every feature flows through a single, unified engine. No special code paths. No hidden gates. One architecture that handles every problem type.

Pillar 01
🧠
Knowledge Base

Every run produces knowledge: strategies, errors, domain facts. Stored in LanceDB with ModernBERT embeddings. Two tiers: global (~/.nightshift/kb) and project-local. Hybrid BM25 + vector search, 3ms.

LanceDB ModernBERT BM25 3ms queries
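The hybrid retrieval described above can be sketched as score fusion between a keyword ranker and a vector ranker. This is an illustrative, self-contained approximation in pure Python (a term-overlap stand-in for BM25 and a weighted blend), not NightShift's actual LanceDB query path; the entry schema and the `alpha` weight are assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, text):
    # Toy stand-in for BM25: fraction of query terms found in the entry.
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in text.lower())
    return hits / len(terms) if terms else 0.0

def hybrid_search(query, query_vec, entries, alpha=0.5, top_k=3):
    # entries: list of dicts with "text" and "vec" keys (hypothetical schema).
    # Final score blends keyword overlap and vector similarity.
    scored = [
        (alpha * keyword_score(query, e["text"])
         + (1 - alpha) * cosine(query_vec, e["vec"]), e["text"])
        for e in entries
    ]
    return sorted(scored, reverse=True)[:top_k]
```

Real engines such as LanceDB compute BM25 and approximate-nearest-neighbor scores internally and fuse ranked lists rather than raw scores; the blend above only conveys the idea.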
Pillar 02
🏢
Agent Resources

Team patterns with UCB1 scoring — the same math as Monte Carlo tree search. Proven patterns get exploited, untried patterns get an exploration bonus. Patterns mutate, compete, and evolve.

UCB1 Pattern evolution Team mutation
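UCB1 picks the pattern that maximizes observed mean reward plus an exploration bonus that shrinks as a pattern accumulates trials, which is exactly why proven patterns get exploited while untried ones still get a shot. A minimal sketch (the exploration constant `c` and the tuple layout are illustrative assumptions, not NightShift's tuned values):

```python
import math

def ucb1_score(mean_reward, n_tries, n_total, c=1.414):
    # Untried patterns get infinite priority: the exploration bonus dominates.
    if n_tries == 0:
        return float("inf")
    # Exploit term (observed mean) + explore term (uncertainty bonus).
    return mean_reward + c * math.sqrt(math.log(n_total) / n_tries)

def pick_pattern(patterns):
    # patterns: list of (name, mean_reward, n_tries) tuples.
    n_total = sum(n for _, _, n in patterns) or 1
    return max(patterns, key=lambda p: ucb1_score(p[1], p[2], n_total))[0]
```

With this scoring, a never-tried mutation always gets sampled at least once before the system settles back into exploiting its best known pattern.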
Pillar 03
🔄
Auditor ↔ Investor

Auditor watches every event — failures, cost burn, repeated errors. Investor reads the full picture and pushes exploration signals: explore, exploit, deliver, reboot. Together they prevent safe-play loops.

Event-driven Exploration pressure Mailbox
Pillar 04
⚖️
Evaluator

Scores output quality (1–5) and approach sanity. Spawns sub-evaluators for complex output. Git diff is the source of truth — not stale file state. Quality 0 means FAIL, not "needs improvement."

Git diff truth Sub-evaluators can_improve loop
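The hard-fail semantics above can be sketched as a gate: quality 0 aborts immediately, and scores below an acceptance bar loop back only while the evaluator believes another pass can improve the output. The threshold and return labels here are illustrative assumptions, not NightShift's actual API.

```python
def evaluate_gate(quality, can_improve, threshold=4):
    # quality: integer 0-5 from the evaluator; 0 is a hard FAIL.
    if quality == 0:
        return "fail"              # hard stop, no retry
    if quality >= threshold:
        return "accept"            # good enough, deliver
    # Below the bar: retry only while the evaluator says another
    # pass can actually improve the output (the can_improve loop).
    return "improve" if can_improve else "accept_with_warning"
```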
Pillar 05
⚙️
Unified Engine

Every strategy — pipeline, iterations, decomposition, sub-problems — runs through one function: _solve_pipeline. 13 features integrated: KB, simulation, auditor, blame, checkpoints, cost tracking…

No special paths 13 features Blame retry
Pillar 06
📂
File Monitoring

Files are the API. status.json has full state. events.jsonl is the event stream. inbox.jsonl is the mailbox. Any tool integrates — no server, no WebSocket, no API contract.

status.json events.jsonl Zero servers
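Because the API is just files, any script can follow a run by reading the event stream. A sketch that parses events.jsonl, assuming each line is one JSON object; the `type` and `msg` field names are hypothetical, not the documented schema:

```python
import json
from pathlib import Path

def read_events(path):
    # Each line of events.jsonl is one JSON object; skip blank lines.
    events = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events

# Demo with a synthetic event stream (field names are assumptions).
Path("events.jsonl").write_text(
    '{"type": "plan", "msg": "coordinator: plan ready"}\n'
    '{"type": "eval", "msg": "evaluator: quality=4/5"}\n'
)
for e in read_events("events.jsonl"):
    print(e["type"], "-", e["msg"])
```

A dashboard, a cron job, or a shell one-liner can consume the same file; that is the point of having no server and no API contract.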
Pillar 07
📈
Self-Learning

Episodic memory scores strategies with UCB1. KB captures evaluator reviews, node failures, investor valuations. Predictor queries KB before each node runs to flag known risks. Librarian consolidates after each run.

UCB1 scoring Predictor Librarian
Pillar 08
🎲
Exploration

Without exploration pressure, any system defaults to what worked before — even when it's failing. High uncertainty → high risk appetite → bold moves. UCB exploration bonus at the pattern level ensures novelty.

Uncertainty scoring Risk appetite UCB bonus
Pillar 09
🔌
External Integration

User messages flow through the same path as Investor signals — into the Auditor inbox, delivered at next replan. The system doesn't distinguish human hints from algorithmic pressure. Both are just signals.

inject stop status External signals

Every agent reads & writes
the same shared memory.

User (inject / stop / status)
Auditor (mailbox + observer) ←→ Investor (valuation + pressure)
└─────→ summary() ──→ Coordinator (planner)
AR (proposes teams from learned patterns)
_solve_pipeline (one engine, all strategies)
Evaluator (honest judge — git diff is truth)
KB (shared memory ← everyone reads/writes)

Run 50 is different
from run 1.

As the Knowledge Base grows, every metric improves. NightShift doesn't just get better at your specific problem — it gets better at the entire class of problems you run.

Cost per run
  Run 1: $1.80 · Run 10: $0.95 · Run 25: $0.54 · Run 50: $0.31

Output quality (1–5)
  Run 1: 2.1 · Run 10: 3.2 · Run 25: 3.9 · Run 50: 4.4
🧠

KB grows with every run

Successful strategies, error patterns, domain facts — all indexed with hybrid search. The Coordinator reads relevant history before planning.

🏆

AR evolves better team patterns

Patterns with high quality scores get proposed more often. Patterns that fail repeatedly get mutated. The gene pool improves.

🎯

Predictor flags known risks

Before each node runs, the Predictor queries KB for past failures on similar nodes. You don't repeat the same mistakes.

📚

Librarian keeps KB clean

After each run, Librarian (Claude Haiku) consolidates — merges duplicates, drops noise, keeps actionable insights. Signal stays high.

Pay for what
you actually use.

Self-host for free, or use the cloud with usage-based billing. No seat licenses. No arbitrary rate limits. You pay when knowledge is created.

Open Source · Self-Hosted
Free forever · AGPL-3.0
  • Full 9-pillar architecture
  • Local LanceDB knowledge base
  • Unlimited runs on your hardware
  • CLI tools (solve, run, status, inject)
  • File-based monitoring API
  • Community support
Enterprise · Team
Custom · contact us
  • Everything in Pro
  • Shared team Knowledge Base
  • Private deployment (your infra)
  • SSO / SAML
  • Custom budget controls
  • SLA + dedicated support
  • Volume pricing

How usage-based pricing works: KB queries happen when agents search the Knowledge Base (typically 2–8 per run). Runs are billed on completion. A typical $3 problem YAML produces $0.20–$0.60 in platform cost on top of your LLM API usage.

Open source · Free to self-host

Start solving problems
that remember what worked.

Join developers using NightShift to solve complex problems autonomously — and watch performance compound across every run.