The full engine — every kind of work, proven by morning

Your whole backlog,
worked overnight.

Bugs fixed. Exploits closed. Hot paths made faster and cheaper. Messy modules refactored. Eval scores raised. New features built as ready-to-pick variants. Every one implemented and proven while you sleep — never merged without you.

0 experiments run
0 verified wins surfaced
0 discarded by the skeptic
$0.00 spent in the trough

A single night on one repo. 0 merged — every win is a branch waiting for your review.

One morning. Every kind of win.

This morning, while you slept
7:14 AM · 1 repo
crash fixed · checkout 500 on empty cart

repro failed on baseline → passes now · full suite green

IDOR closed · /orders/:id leaked other users’ data

exploit succeeded → blocked · authz test added · nothing else broke

4.2× faster · dashboard analytics query

p95 820ms → 195ms over 1,000 runs · output identical (equivalence guard)

−38% $/req · summarizer LLM pipeline

$0.0042 → $0.0026 · responses equivalent on 200 sampled inputs

−41% complexity · auth module untangled

behavior pinned by 60 characterization tests · identical after

0.71 → 0.86 · invoice field extraction

scored against your 200 labelled examples · +0.15 accuracy

3 variants · dark mode + mobile layout

builds · e2e green · breaks nothing · live previews ready

skipped · “make onboarding feel snappier”

no objective test to prove it — surfaced as a note, not a claim

7 verified · 1 needs your pick · 1 deferred · 0 merged — your call on all of it.
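
As a rough illustration of the arithmetic behind lines like "p95 820ms → 195ms" and "−38% $/req", here is a minimal Python sketch. The function names are hypothetical and are not Night Shift's API. It only shows how a percentile, a speedup, a relative drop, and an output-equivalence guard could be computed; it does not cover the statistical significance test a real measurement would need.

```python
import statistics
from typing import Any, Callable, Iterable, Sequence


def p95(samples_ms: Sequence[float]) -> float:
    """Return the 95th percentile of latency samples (needs at least 2 samples)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100)[94]


def outputs_identical(
    baseline: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    inputs: Iterable[Any],
) -> bool:
    """Equivalence guard: the candidate must return what the baseline returns."""
    return all(baseline(x) == candidate(x) for x in inputs)


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the candidate is at the chosen percentile."""
    return before_ms / after_ms


def relative_drop(before: float, after: float) -> float:
    """Fractional reduction, e.g. in cost per request."""
    return (before - after) / before
```

With the figures from the digest above, `speedup(820, 195)` comes to about 4.2 and `relative_drop(0.0042, 0.0026)` to about 0.38.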

When there is no objective “better”

It builds working variants. You pick.

For features and UI, “correct” is taste, not a test. So Night Shift certifies what it can — it builds, runs, e2e passes, breaks nothing — and hands you several finished options with live previews. Ship one, or keep all three behind a flag.

Variant A (preview · variant-a) · minimal · system fonts · e2e green
Variant B (preview · variant-b) · high-contrast · bold · e2e green
Variant C (preview · variant-c) · soft · rounded cards · e2e green

Many attempts. Only survivors surface.

Each hypothesis is tried several ways in parallel worktrees. The skeptic refutes what it can — gamed tests, masked regressions, fixes that don’t depend on the change — and only the proven few reach your digest.

verified → surfaced · refuted → discarded
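
One way to picture the skeptic's filter is the sketch below. It is an assumption-laden illustration, not the actual implementation: the `Attempt` fields and the order of the checks are hypothetical, chosen to mirror the three refutations named above (gamed tests, masked regressions, results that do not depend on the change).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Attempt:
    """Hypothetical record of one attempt made in its own worktree."""
    name: str
    repro_passes_on_baseline: bool   # did the proving test pass before the change?
    repro_passes_with_change: bool   # does it pass after the change?
    full_suite_green: bool           # does everything else still pass?
    edited_existing_tests: bool      # did the attempt rewrite tests it is graded by?


def skeptic_verdict(a: Attempt) -> str:
    """Return "verified" only if no refutation applies."""
    if a.repro_passes_on_baseline:
        return "refuted: result does not depend on the change"
    if not a.repro_passes_with_change:
        return "refuted: change does not fix the repro"
    if a.edited_existing_tests:
        return "refuted: possible gamed test"
    if not a.full_suite_green:
        return "refuted: masked regression"
    return "verified"


def surface(attempts: Iterable[Attempt]) -> List[Attempt]:
    """Keep only the survivors; everything refuted is discarded."""
    return [a for a in attempts if skeptic_verdict(a) == "verified"]
```

The design choice worth noting is that every check can only remove an attempt, never promote one, so adding checks makes the digest stricter rather than noisier.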

The covering library

Every category is a fixed, human-vetted test

A category is defined by its answer to “what proves it worked?” Five fitness families cover the work. The engine only routes to them; it never invents the bar it grades against.

A · Assertion fix
bug-fix · security · correctness
fitness: a test that fails on baseline passes after the fix
Verdict
B · Measured improvement
performance · cost
fitness: statistically-significant metric gain, behavior unchanged
Verdict
C · Behavior-locked transform
refactor
fitness: behavior identical (characterization tests), complexity down
Verdict
D · Eval score
quality · prompts · pipelines
fitness: scores higher against your eval set
Verdict*
E · Works + variants
feature · frontend / UI
fitness: builds, runs, e2e passes, breaks nothing — N options
Variant

*Verdict when you have (or we can infer) an eval set; otherwise it degrades gracefully to variant mode.
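
The routing described above can be sketched as a fixed lookup. This is a minimal illustration under stated assumptions: the category strings follow the labels in the library above, while the `route` function, its `has_eval_set` flag, and the "note" fallback for unknown work are hypothetical names introduced here.

```python
from typing import Optional, Tuple

# Fixed, human-vetted mapping: the router looks things up, it never invents a bar.
FAMILY_BY_CATEGORY = {
    "bug-fix": "A", "security": "A", "correctness": "A",
    "performance": "B", "cost": "B",
    "refactor": "C",
    "quality": "D", "prompts": "D", "pipelines": "D",
    "feature": "E", "frontend": "E", "ui": "E",
}

# Families A to D produce a verdict; family E produces variants to pick from.
OUTPUT_BY_FAMILY = {"A": "verdict", "B": "verdict", "C": "verdict",
                    "D": "verdict", "E": "variant"}


def route(category: str, has_eval_set: bool = False) -> Tuple[Optional[str], str]:
    """Return (fitness family, output mode) for a category of work."""
    family = FAMILY_BY_CATEGORY.get(category)
    if family is None:
        # Assumption: work with no objective test is surfaced as a note, not a claim.
        return (None, "note")
    if family == "D" and not has_eval_set:
        # Without an eval set, eval work degrades to variant mode.
        return ("D", "variant")
    return (family, OUTPUT_BY_FAMILY[family])
```

For example, `route("prompts", has_eval_set=True)` gives `("D", "verdict")`, while `route("prompts")` falls back to `("D", "variant")`.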

Why you can trust the digest

Built to under-promise

It never merges

Every win lands as a branch (and a preview, for UI). Night Shift proposes; you decide what ships. Your main branch is never touched.

It never overclaims

No test, no claim. A green metric means a real measurement on a real workload — and “deferred” is stated plainly, not dressed up.

Quiet day, quiet night

Nothing to do → nothing runs → you pay nothing. Even a busy night still costs cents, because it runs in the trough.

Powered by InferRoute

An entire backlog, tested overnight, for the price of a coffee.

The more kinds of work it does, the more inference it burns — and overnight work is latency-insensitive, so InferRoute runs it in the cheapest trough capacity (~2.5× cheaper). That cost floor is what makes testing dozens of hypotheses a night economical instead of absurd.
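
To make the "~2.5× cheaper" figure concrete, here is a small worked sketch. Only the 2.5 factor comes from the text above. The function names and any dollar amounts passed in are hypothetical, and real pricing will vary by model and time of day.

```python
TROUGH_FACTOR = 2.5  # "~2.5x cheaper" from the text above; approximate


def trough_cost(daytime_cost: float) -> float:
    """Cost of the same inference when it runs in trough capacity."""
    return daytime_cost / TROUGH_FACTOR


def runs_for_same_budget(daytime_runs: int) -> float:
    """How many runs the budget for `daytime_runs` daytime runs buys overnight."""
    return daytime_runs * TROUGH_FACTOR
```

Under that assumption, work that would cost a hypothetical $1.00 at daytime rates costs about $0.40 overnight, and a budget that covers 20 daytime experiments covers roughly 50 in the trough.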

Route your daytime Claude Code too · Same engine, best price per model, day and night.

Hand it the backlog.
Review the wins at breakfast.

Bugs, speed, cost, refactors, eval scores, whole features — proven overnight, surfaced as branches and previews. You stay the one who decides.

Night Shift — by InferRoute