The full engine — every kind of work, proven by morning

Your whole backlog,
worked overnight.

Bugs fixed. Exploits closed. Hot paths made faster and cheaper. Messy modules refactored. Eval scores raised. New features built as ready-to-pick variants. Every one implemented and proven while you sleep — never merged without you.

Put it to work tonight

See a morning digest

experiments run

verified wins surfaced

discarded by the skeptic

$0.00

spent in the trough

A single night on one repo. 0 merged — every win is a branch waiting for your review.

One morning. Every kind of win.

☀ This morning, while you slept

7:14 AM · 1 repo

crash fixed: checkout 500 on empty cart

repro failed on baseline → passes now · full suite green

bug-fix · branch ↗

IDOR closed: /orders/:id leaked other users’ data

exploit succeeded → blocked · authz test added · nothing else broke

security · branch ↗

4.2× faster: dashboard analytics query

p95 820ms → 195ms over 1,000 runs · output identical (equivalence guard)

performance · branch ↗

−38% $/req: summarizer LLM pipeline

$0.0042 → $0.0026 · responses equivalent on 200 sampled inputs

cost · branch ↗

−41% complexity: auth module untangled

behavior pinned by 60 characterization tests · identical after

refactor · branch ↗

0.71 → 0.86: invoice field extraction

scored against your 200 labelled examples · +0.15 accuracy

quality · branch ↗

3 variants: dark mode + mobile layout

builds · e2e green · breaks nothing · live previews ready

feature · UI · previews ↗

skipped: “make onboarding feel snappier”

no objective test to prove it — surfaced as a note, not a claim

deferred

7 verified · 1 needs your pick · 1 deferred · 0 merged — your call on all of it.
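The headline figures in the digest follow directly from the before and after numbers printed under each entry. A minimal sketch of that arithmetic, assuming plain ratio and percent-change definitions (the function names are illustrative and are not part of Night Shift):

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the 'after' latency is than the 'before' latency."""
    return before_ms / after_ms


def percent_change(before: float, after: float) -> float:
    """Signed change relative to the starting value, in percent."""
    return (after - before) / before * 100


# p95 820ms -> 195ms, from the performance entry
print(round(speedup(820, 195), 1))            # 4.2

# $0.0042 -> $0.0026 per request, from the cost entry
print(round(percent_change(0.0042, 0.0026)))  # -38

# 0.71 -> 0.86 accuracy, from the quality entry
print(round(0.86 - 0.71, 2))                  # 0.15
```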

When there is no objective “better”

It builds working variants. You pick.

For features and UI, “correct” is taste, not a test. So Night Shift certifies what it can — it builds, runs, e2e passes, breaks nothing — and hands you several finished options with live previews. Ship one, or keep all three behind a flag.

preview · variant-a

Variant A

minimal · system fonts

e2e green

preview · variant-b

Variant B

high-contrast · bold

e2e green

preview · variant-c

Variant C

soft · rounded cards

e2e green

Many attempts. Only survivors surface.

Each hypothesis is tried several ways in parallel worktrees. The skeptic refutes what it can — gamed tests, masked regressions, fixes that don’t depend on the change — and only the proven few reach your digest.

verified → surfaced · refuted → discarded
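To make the filtering concrete, here is a rough sketch of how a skeptic-style check could refute a bug-fix attempt. It is an illustration under stated assumptions, not Night Shift's actual implementation: the `Attempt` record, its fields, the `skeptic_verdict` function, and the sample attempt names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One candidate fix plus the three observations the check needs (hypothetical)."""
    name: str
    repro_passes_on_baseline: bool  # repro test run WITHOUT the change
    repro_passes_with_change: bool  # same test run WITH the change
    full_suite_green: bool          # whole suite run WITH the change


def skeptic_verdict(attempt: Attempt) -> str:
    """Return 'surfaced' only when nothing refutes the attempt."""
    if attempt.repro_passes_on_baseline:
        # The test never failed, so its pass does not depend on the change.
        return "discarded"
    if not attempt.repro_passes_with_change:
        # The change does not actually fix the reproduced failure.
        return "discarded"
    if not attempt.full_suite_green:
        # The fix works but breaks something else: a masked regression.
        return "discarded"
    return "surfaced"


attempts = [
    Attempt("guard-empty-cart", False, True, True),
    Attempt("test-that-always-passes", True, True, True),
    Attempt("fix-that-breaks-tax", False, True, False),
]
digest = [a.name for a in attempts if skeptic_verdict(a) == "surfaced"]
print(digest)  # ['guard-empty-cart']
```

Only the first attempt survives: the second is refuted because its test passes on baseline too, and the third because the full suite is no longer green.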

The covering library

Every category is a fixed, human-vetted test

A category is its answer to “what proves it worked?” Five fitness families cover the work; the engine only routes to them — it never invents the bar it grades against.

A · Assertion fix

bug-fix · security · correctness

fitness: a test that fails on baseline passes after the fix

Verdict

B · Measured improvement

performance · cost

fitness: statistically significant metric gain, behavior unchanged

Verdict

C · Behavior-locked transform

refactor

fitness: behavior identical (characterization tests), complexity down

Verdict

D · Eval score

quality · prompts · pipelines

fitness: scores higher against your eval set

Verdict*

E · Works + variants

feature · frontend / UI

fitness: builds, runs, e2e passes, breaks nothing — N options

Variant

*Verdict when you have (or we can infer) an eval set; otherwise it degrades gracefully to variant mode.
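The five families above amount to a fixed lookup: a category maps to one family, and the family decides whether the result is a verdict or a set of variants. A minimal sketch of that routing, assuming a plain dictionary and a hypothetical `route` function (the category names and family letters come from the list above; everything else is illustrative):

```python
# Category -> fitness family, copied from the five families listed above.
FAMILIES = {
    "bug-fix": "A", "security": "A", "correctness": "A",
    "performance": "B", "cost": "B",
    "refactor": "C",
    "quality": "D", "prompts": "D", "pipelines": "D",
    "feature": "E", "frontend / UI": "E",
}


def route(category: str, has_eval_set: bool = False) -> tuple[str, str]:
    """Return (family, output kind) for a category.

    Unknown categories raise KeyError: the router only picks from the
    fixed library and never makes up a new bar to grade against.
    """
    family = FAMILIES[category]
    if family == "E":
        return family, "variant"
    if family == "D" and not has_eval_set:
        # No eval set to score against, so fall back to variant mode.
        return family, "variant"
    return family, "verdict"


print(route("security"))                    # ('A', 'verdict')
print(route("quality", has_eval_set=True))  # ('D', 'verdict')
print(route("quality"))                     # ('D', 'variant')
print(route("feature"))                     # ('E', 'variant')
```

Raising on an unknown category, rather than guessing, mirrors the "skipped" entry in the digest: work with no objective test is deferred instead of being given an invented one.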

Why you can trust the digest

Built to under-promise

It never merges

Every win lands as a branch (and a preview, for UI). Night Shift proposes; you decide what ships. Your main branch is never touched.

It never overclaims

No test, no claim. A green metric means a real measurement on a real workload — and “deferred” is stated plainly, not dressed up.

Quiet day, quiet night

Nothing to do → nothing runs → you pay nothing. A whole night of this still costs cents, because it runs in the trough.

An entire backlog, tested overnight, for the price of a coffee.

The more kinds of work it does, the more inference it burns — and overnight work is latency-insensitive, so InferRoute runs it in the cheapest trough capacity (~2.5× cheaper). That cost floor is what makes testing dozens of hypotheses a night economical instead of absurd.

Route your daytime Claude Code too

Same engine, best price per model, day and night.

Hand it the backlog.
Review the wins at breakfast.

Bugs, speed, cost, refactors, eval scores, whole features — proven overnight, surfaced as branches and previews. You stay the one who decides.

Put it to work tonight

Night Shift — by InferRoute
