Your whole backlog,
worked overnight.
Bugs fixed. Exploits closed. Hot paths made faster and cheaper. Messy modules refactored. Eval scores raised. New features built as ready-to-pick variants. Every one implemented and proven while you sleep — never merged without you.
A single night on one repo. 0 merged — every win is a branch waiting for your review.
One morning. Every kind of win.
repro failed on baseline → passes now · full suite green
exploit succeeded → blocked · authz test added · nothing else broke
p95 820ms → 195ms over 1,000 runs · output identical (equivalence guard)
$0.0042 → $0.0026 · responses equivalent on 200 sampled inputs
behavior pinned by 60 characterization tests · identical after
scored against your 200 labelled examples · +0.15 accuracy
builds · e2e green · breaks nothing · live previews ready
no objective test to prove it — surfaced as a note, not a claim
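As an illustration of what a line like "p95 820ms → 195ms over 1,000 runs · output identical (equivalence guard)" could mean in practice, here is a minimal sketch. It is not Night Shift's implementation: the names `p95` and `compare`, the nearest-rank percentile method, and the shape of the returned dictionary are all assumptions made for this example.

```python
import time
from typing import Any, Callable, Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile (one common definition; an assumption here)."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def compare(
    baseline: Callable[[Any], Any],
    candidate: Callable[[Any], Any],
    inputs: Sequence[Any],
    runs: int = 1000,
) -> dict:
    """Hypothetical check: report timings only if outputs are identical."""
    # Equivalence guard: a faster function that changes the output is not a win.
    for x in inputs:
        if baseline(x) != candidate(x):
            return {"verdict": "rejected", "reason": f"output differs on {x!r}"}

    def timings(fn: Callable[[Any], Any]) -> list:
        samples = []
        for i in range(runs):
            x = inputs[i % len(inputs)]  # cycle through the sample inputs
            start = time.perf_counter()
            fn(x)
            samples.append(time.perf_counter() - start)
        return samples

    return {
        "verdict": "measured",
        "p95_before": p95(timings(baseline)),
        "p95_after": p95(timings(candidate)),
    }
```

The guard runs before any timing, so a speedup is never reported for a change that alters behavior on the sampled inputs.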
When there is no objective “better”
It builds working variants. You pick.
For features and UI, “correct” is taste, not a test. So Night Shift certifies what it can (it builds, it runs, e2e passes, nothing else breaks) and hands you several finished options with live previews. Ship one, or keep them all behind a flag.
Many attempts. Only survivors surface.
Each hypothesis is tried several ways in parallel worktrees. The skeptic refutes what it can — gamed tests, masked regressions, fixes that don’t depend on the change — and only the proven few reach your digest.
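The filtering described above can be pictured with a small sketch. Everything in it is hypothetical: the `Attempt` record, its field names, and the `refutations` and `survivors` helpers are assumptions for illustration, not the product's actual data model or checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attempt:
    """One candidate fix from one worktree (hypothetical record)."""
    name: str
    repro_fails_on_baseline: bool   # the bug reproduces without the change
    repro_passes_with_change: bool  # the bug is gone with the change
    suite_green: bool               # the full test suite still passes
    tests_weakened: bool            # assertions were deleted or loosened


def refutations(a: Attempt) -> List[str]:
    """Return every reason a skeptic could use to reject this attempt."""
    reasons = []
    if not a.repro_fails_on_baseline:
        reasons.append("repro passes without the change: fix does not depend on it")
    if not a.repro_passes_with_change:
        reasons.append("repro still fails with the change")
    if not a.suite_green:
        reasons.append("regression elsewhere in the suite")
    if a.tests_weakened:
        reasons.append("gamed test: assertions were weakened")
    return reasons


def survivors(attempts: List[Attempt]) -> List[Attempt]:
    """Only attempts with no refutation would reach the digest."""
    return [a for a in attempts if not refutations(a)]
```

An attempt whose repro already passes on the baseline is dropped even if everything else is green, since the change cannot be shown to be what fixed it.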
The covering library
Every category is a fixed, human-vetted test
A category is defined by its answer to “what proves it worked?” Five fitness families cover the work; the engine only routes to them and never invents the bar it grades against.
*Verdict when you have (or we can infer) an eval set; otherwise it degrades gracefully to variant mode.
Why you can trust the digest
Built to under-promise
It never merges
Every win lands as a branch (and a preview, for UI). Night Shift proposes; you decide what ships. Your main branch is never touched.
It never overclaims
No test, no claim. A green metric means a real measurement on a real workload — and “deferred” is stated plainly, not dressed up.
Quiet day, quiet night
Nothing to do → nothing runs → you pay nothing. A whole night of this still costs cents, because it runs on off-peak (trough) capacity.
An entire backlog, tested overnight, for the price of a coffee.
The more kinds of work it does, the more inference it burns. But overnight work is latency-insensitive, so InferRoute runs it on the cheapest trough capacity (~2.5× cheaper). That lower cost is what makes testing dozens of hypotheses a night economical instead of absurd.
Hand it the backlog.
Review the wins at breakfast.
Bugs, speed, cost, refactors, eval scores, whole features — proven overnight, surfaced as branches and previews. You stay the one who decides.
Night Shift — by InferRoute