Open Source · AGPL-3.0 · 665 tests passing

Describe the problem.
It solves — and remembers.

NightShift is an autonomous AI problem solver with a persistent memory. Every run feeds its Knowledge Base. Run 50 is fundamentally faster and cheaper than run 1.

9 core pillars
3ms KB query time
665 tests passing
17.5K lines of code
problem.yaml
title: SWOT analysis for EV conversion startup
description: |
  Research the market for converting ICE cars
  to electric. Find real market data.
acceptance:
  - Real citations, not summaries
  - Executive summary included
max_budget_usd: 3.0
terminal
# run it
nightshift solve problem.yaml

# watch live
nightshift status

# hint mid-run
nightshift inject "try a different approach"

✓ Run complete · Quality: 4/5 · Cost: $0.42
✓ 12 KB entries written · patterns updated

Every AI tool today
is stateless.

Devin forgets your codebase. Cursor doesn't remember yesterday's bugs. Every session starts from zero. NightShift breaks this pattern.

| Capability | Devin · Cursor · Copilot | NightShift |
|---|---|---|
| Learns between runs | forgets everything | Knowledge Base: LanceDB hybrid search, 3ms |
| Strategy from experience | same prompts, same results | AR patterns: UCB1 scoring, mutate & evolve |
| Non-code tasks | coding only | Any task: research, analysis, SWOT, design |
| Smart retry on failure | full restart | Blame-driven: only failed agents re-run |
| User input mid-run | must restart | Auditor inbox: inject hints anytime |
| Self-evaluation | basic or none | Sub-evaluators: spawn specialists per dimension |
| Exploration drive | safe-play loops forever | Investor: prevents repetitive failure patterns |

Watch the team work

Five specialist agents coordinate on every run. See how they collaborate, monitor, evaluate, and learn — all from a single problem description.

NightShift · autonomous run
Problem
Agent Team
🎯
Coordinator
CEO · Planner
waiting for problem...
👁
Auditor
Observer · Mailbox
monitoring...
💡
Investor
VC · Exploration pressure
assessing conditions...
Dynamic Team
🔍
Researcher
spawned by Coordinator · AR pattern #7
pending...
📊
Analyst
spawned by Coordinator
pending...
✍️
Writer
spawned by Coordinator
pending...
⚖️
Evaluator
Judge · Quality 1–5
waiting for output...
📚
Librarian
KB consolidation · Haiku
waiting...
Output
coordinator: plan ready
kb: 3 relevant entries found
team: researcher, analyst, writer
ar: using pattern #7 (score 0.81)
researcher: 14 sources collected
auditor: anomalies=0, cost=$0.12
analyst: SWOT matrix complete
investor: signal → exploit
writer: 1,847 words generated
evaluator: quality=4/5 ✓
librarian: 12 KB entries written
─────────────────────────
✓ run complete · cost=$0.42 · 2m 14s
next run: patterns updated

Nine pillars.
One engine.

Every feature flows through a single, unified engine. No special code paths. No hidden gates. One architecture that handles every problem type.

Pillar 01
🧠
Knowledge Base

Every run produces knowledge: strategies, errors, domain facts. Stored in LanceDB with ModernBERT embeddings. Two tiers: global (~/.nightshift/kb) and project-local. Hybrid BM25 + vector search, 3ms.

LanceDB ModernBERT BM25 3ms queries
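The hybrid retrieval described above can be sketched as score fusion between a keyword ranker and a vector ranker. This is an illustrative, self-contained approximation in pure Python (a term-overlap stand-in for BM25 and a weighted blend), not NightShift's actual LanceDB query path; the entry schema and the `alpha` weight are assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query, text):
    # Toy stand-in for BM25: fraction of query terms found in the entry.
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in text.lower())
    return hits / len(terms) if terms else 0.0

def hybrid_search(query, query_vec, entries, alpha=0.5, top_k=3):
    # entries: list of dicts with "text" and "vec" keys (hypothetical schema).
    # Final score blends keyword overlap and vector similarity.
    scored = [
        (alpha * keyword_score(query, e["text"])
         + (1 - alpha) * cosine(query_vec, e["vec"]), e["text"])
        for e in entries
    ]
    return sorted(scored, reverse=True)[:top_k]
```

Real engines such as LanceDB compute BM25 and approximate-nearest-neighbor scores internally and fuse ranked lists rather than raw scores; the blend above only conveys the idea.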
Pillar 02
🏢
Agent Resources

Team patterns with UCB1 scoring — the same math as Monte Carlo tree search. Proven patterns get exploited, untried patterns get an exploration bonus. Patterns mutate, compete, and evolve.

UCB1 Pattern evolution Team mutation
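UCB1 picks the pattern that maximizes observed mean reward plus an exploration bonus that shrinks as a pattern accumulates trials, which is exactly why proven patterns get exploited while untried ones still get a shot. A minimal sketch (the exploration constant `c` and the tuple layout are illustrative assumptions, not NightShift's tuned values):

```python
import math

def ucb1_score(mean_reward, n_tries, n_total, c=1.414):
    # Untried patterns get infinite priority: the exploration bonus dominates.
    if n_tries == 0:
        return float("inf")
    # Exploit term (observed mean) + explore term (uncertainty bonus).
    return mean_reward + c * math.sqrt(math.log(n_total) / n_tries)

def pick_pattern(patterns):
    # patterns: list of (name, mean_reward, n_tries) tuples.
    n_total = sum(n for _, _, n in patterns) or 1
    return max(patterns, key=lambda p: ucb1_score(p[1], p[2], n_total))[0]
```

With this scoring, a never-tried mutation always gets sampled at least once before the system settles back into exploiting its best known pattern.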
Pillar 03
🔄
Auditor ↔ Investor

Auditor watches every event — failures, cost burn, repeated errors. Investor reads the full picture and pushes exploration signals: explore, exploit, deliver, reboot. Together they prevent safe-play loops.

Event-driven Exploration pressure Mailbox
Pillar 04
⚖️
Evaluator

Scores output quality (1–5) and approach sanity. Spawns sub-evaluators for complex output. Git diff is the source of truth — not stale file state. Quality 0 means FAIL, not "needs improvement."

Git diff truth Sub-evaluators can_improve loop
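The hard-fail semantics above can be sketched as a gate: quality 0 aborts immediately, and scores below an acceptance bar loop back only while the evaluator believes another pass can improve the output. The threshold and return labels here are illustrative assumptions, not NightShift's actual API.

```python
def evaluate_gate(quality, can_improve, threshold=4):
    # quality: integer 0-5 from the evaluator; 0 is a hard FAIL.
    if quality == 0:
        return "fail"              # hard stop, no retry
    if quality >= threshold:
        return "accept"            # good enough, deliver
    # Below the bar: retry only while the evaluator says another
    # pass can actually improve the output (the can_improve loop).
    return "improve" if can_improve else "accept_with_warning"
```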
Pillar 05
⚙️
Unified Engine

Every strategy — pipeline, iterations, decomposition, sub-problems — runs through one function: _solve_pipeline. 13 features integrated: KB, simulation, auditor, blame, checkpoints, cost tracking…

No special paths 13 features Blame retry
Pillar 06
📂
File Monitoring

Files are the API. status.json has full state. events.jsonl is the event stream. inbox.jsonl is the mailbox. Any tool integrates — no server, no WebSocket, no API contract.

status.json events.jsonl Zero servers
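Because the API is just files, any script can follow a run by reading the event stream. A sketch that parses events.jsonl, assuming each line is one JSON object; the `type` and `msg` field names are hypothetical, not the documented schema:

```python
import json
from pathlib import Path

def read_events(path):
    # Each line of events.jsonl is one JSON object; skip blank lines.
    events = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            events.append(json.loads(line))
    return events

# Demo with a synthetic event stream (field names are assumptions).
Path("events.jsonl").write_text(
    '{"type": "plan", "msg": "coordinator: plan ready"}\n'
    '{"type": "eval", "msg": "evaluator: quality=4/5"}\n'
)
for e in read_events("events.jsonl"):
    print(e["type"], "-", e["msg"])
```

A dashboard, a cron job, or a shell one-liner can consume the same file; that is the point of having no server and no API contract.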
Pillar 07
📈
Self-Learning

Episodic memory scores strategies with UCB1. KB captures evaluator reviews, node failures, investor valuations. Predictor queries KB before each node runs to flag known risks. Librarian consolidates after each run.

UCB1 scoring Predictor Librarian
Pillar 08
🎲
Exploration

Without exploration pressure, any system defaults to what worked before — even when it's failing. High uncertainty → high risk appetite → bold moves. UCB exploration bonus at the pattern level ensures novelty.

Uncertainty scoring Risk appetite UCB bonus
Pillar 09
🔌
External Integration

User messages flow through the same path as Investor signals — into the Auditor inbox, delivered at next replan. The system doesn't distinguish human hints from algorithmic pressure. Both are just signals.

inject stop status External signals

Every agent reads & writes
the same shared memory.

User (inject / stop / status)
Auditor (mailbox + observer) ←→ Investor (valuation + pressure)
└─────→ summary() ──→ Coordinator (planner)
AR (proposes teams from learned patterns)
_solve_pipeline (one engine, all strategies)
Evaluator (honest judge — git diff is truth)
KB (shared memory ← everyone reads/writes)

Run 50 is different
from run 1.

As the Knowledge Base grows, every metric improves. NightShift doesn't just get better at your specific problem — it gets better at the entire class of problems you run.

Cost per run
  Run 1: $1.80 · Run 10: $0.95 · Run 25: $0.54 · Run 50: $0.31

Output quality (1–5)
  Run 1: 2.1 · Run 10: 3.2 · Run 25: 3.9 · Run 50: 4.4
🧠

KB grows with every run

Successful strategies, error patterns, domain facts — all indexed with hybrid search. The Coordinator reads relevant history before planning.

🏆

AR evolves better team patterns

Patterns with high quality scores get proposed more often. Patterns that fail repeatedly get mutated. The gene pool improves.

🎯

Predictor flags known risks

Before each node runs, the Predictor queries KB for past failures on similar nodes. You don't repeat the same mistakes.

📚

Librarian keeps KB clean

After each run, Librarian (Claude Haiku) consolidates — merges duplicates, drops noise, keeps actionable insights. Signal stays high.

Pay for what
you actually use.

Self-host for free, or use the cloud with usage-based billing. No seat licenses. No arbitrary rate limits. You pay when knowledge is created.

Open Source · Self-Hosted
Free forever · AGPL-3.0
  • Full 9-pillar architecture
  • Local LanceDB knowledge base
  • Unlimited runs on your hardware
  • CLI tools (solve, run, status, inject)
  • File-based monitoring API
  • Community support
Enterprise · Team
Custom · contact us
  • Everything in Pro
  • Shared team Knowledge Base
  • Private deployment (your infra)
  • SSO / SAML
  • Custom budget controls
  • SLA + dedicated support
  • Volume pricing

How usage-based pricing works: KB queries happen when agents search the Knowledge Base (typically 2–8 per run). Runs are billed on completion. A typical $3 problem YAML produces $0.20–$0.60 in platform cost on top of your LLM API usage.

Open source · Free to self-host

Start solving problems
that remember what worked.

Join developers using NightShift to solve complex problems autonomously — and watch performance compound across every run.