Rollback reference (deploymill)
deploymill supports image-swap rollback: after a bad deploy, swap the running container back to the image of a previous deploy in seconds — no rebuild required.
How it works
When rollback: true is set in .deploymill/project.json and reconcile_project has been run:
- Reconcile wires up a container registry on the app (using credentials configured on the server).
- It enables rollback recording on the application.
- The next deploy builds the image as normal, then pushes it to the registry and creates a rollback record (with a rollbackId).

list_deployments exposes each deploy's rollbackId. rollback swaps the running image to whichever rollbackId you pass: the platform pulls the image from the registry and restarts the container.
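The record flow can be pictured with a small in-memory model. This is a conceptual sketch, not deploymill's implementation. The class, its methods, and the deployId / rollbackId formats are hypothetical, and a Map stands in for the container registry. The model shows three things: only deploys made while recording is on get a rollbackId, listing exposes those ids, and a rollback swaps the running image without touching the registry.

```typescript
// Conceptual model only (NOT deploymill's implementation). A Map stands in for the
// container registry so the deploy -> record -> rollback flow is visible.
// All names and id formats here are hypothetical.

interface DeployRecord {
  deployId: string;
  image: string;       // image reference the deploy built
  rollbackId?: string; // set only when rollback recording was enabled at deploy time
}

class RollbackModel {
  rollbackEnabled: boolean;
  running: string | null = null;                // image currently serving traffic
  private registry = new Map<string, string>(); // rollbackId -> pushed image
  private records: DeployRecord[] = [];
  private seq = 1;

  constructor(rollbackEnabled: boolean) {
    this.rollbackEnabled = rollbackEnabled;
  }

  // deploy analogue: build as normal; with rollback enabled, also push and record a rollbackId.
  deploy(image: string): DeployRecord {
    const record: DeployRecord = { deployId: `d-${this.seq}`, image };
    if (this.rollbackEnabled) {
      record.rollbackId = `rb-${this.seq}`;
      this.registry.set(record.rollbackId, image);
    }
    this.seq += 1;
    this.records.push(record);
    this.running = image;
    return record;
  }

  // list_deployments analogue: every deploy, with a rollbackId where one was captured.
  listDeployments(): DeployRecord[] {
    return this.records.map((r) => ({ ...r }));
  }

  // rollback analogue: pull the recorded image and swap it in. The registry is not
  // modified, so rolling forward again is just another rollback call.
  rollback(rollbackId: string): string {
    const image = this.registry.get(rollbackId);
    if (image === undefined) throw new Error(`unknown rollbackId: ${rollbackId}`);
    this.running = image;
    return image;
  }
}
```

In the model, a deploy made before rollbackEnabled is set has no rollbackId. This mirrors the note that deploys made before enabling rollback have no captured image.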
Enabling rollback
- Set "rollback": true in .deploymill/project.json.
- Commit and push.
- Run reconcile_project with the app's applicationId and repoUrl. Reconcile flips the rollback toggle and configures the container registry.
- Run deploy. This is the first deploy whose image will be available to roll back to. Deploys made before enabling rollback have no captured image.
This requires a container registry to be configured on the deploymill server. If none is configured, reconcile fails with a clear error.
Performing a rollback
1. list_deployments({ applicationId }) → find the deploy you want to revert to; copy its rollbackId
2. rollback({ applicationId, rollbackId }) → the image swap completes within seconds
The swap is non-destructive to the registry. The current image stays available, so you can roll forward again by calling rollback with the rollbackId of the deploy you just rolled away from.
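Step 1 usually means "the deploy before the latest one that has a captured image." The helper below is a minimal sketch of that choice. The DeploymentSummary shape (deploymentId, createdAt, rollbackId) is an assumption for illustration, and the real list_deployments payload may name or structure its fields differently. The helper assumes the newest deploy is the one currently running.

```typescript
// Hypothetical shape of one list_deployments entry; adjust to the real payload.
interface DeploymentSummary {
  deploymentId: string;
  createdAt: string;          // ISO-8601 timestamp
  rollbackId?: string | null; // absent for deploys made before rollback was enabled
}

// Returns the rollbackId of the newest deploy older than the latest one,
// skipping deploys that have no captured image. Returns null if none qualifies.
// Assumes the latest deploy is the one currently running.
function previousRollbackId(deploys: DeploymentSummary[]): string | null {
  const newestFirst = [...deploys].sort(
    (a, b) => Date.parse(b.createdAt) - Date.parse(a.createdAt),
  );
  for (const d of newestFirst.slice(1)) {
    if (d.rollbackId) return d.rollbackId;
  }
  return null;
}
```

Pass a non-null result as rollbackId in step 2. A null result matches the troubleshooting case where recent deploys show no rollbackId.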
What rollback does NOT do
- Doesn't roll back database migrations. If the deploy you're reverting from added a column or table, the image you're rolling back to will likely throw at startup. Use rollback for code-only changes. Schema changes need a forward fix or a manual alembic downgrade / node-pg-migrate down.
- Doesn't roll back env vars. set_env_vars is not history-tracked. If a bad deploy went out with a wrong env var, fix the env var first (call set_env_vars again). Otherwise the rolled-back code will hit the same bad value.
- Doesn't roll back data writes. Reverting code that corrupted data only restores the code; the data stays corrupted.
- Doesn't roll back mounts or domain changes. Those are app-level config, not per-deploy.
When to use rollback
- ✅ A code change broke a route or threw an exception at startup. Roll back, fix in a PR, redeploy.
- ✅ A perf regression you can't immediately diagnose. Roll back to buy time.
- ✅ Misconfigured runtime behavior (wrong feature-flag default, wrong static file).
- ⚠️ A schema migration broke things. Rollback alone may leave the schema ahead of the code. Usually you need a forward fix + redeploy, or a manual downgrade first.
- ❌ Data corruption. Restore from backup, not from a rollback record.
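The guidance above, together with the env-var caveat from the previous section, can be condensed into a small triage helper. This is an illustrative sketch only. The Incident labels and step strings are hypothetical, not anything deploymill reports. The helper encodes three rules: code-only failures get rolled back, schema problems get a forward fix or downgrade, and corrupted data is restored from backup.

```typescript
// Hypothetical incident labels; deploymill does not classify incidents for you.
type Incident =
  | "startup_exception"
  | "broken_route"
  | "perf_regression"
  | "runtime_misconfig"
  | "schema_migration"
  | "data_corruption";

function recoveryPlan(incident: Incident, badEnvVar = false): string[] {
  // Env vars are not history-tracked: fix them first, or rolled-back code hits the same bad value.
  const steps: string[] = badEnvVar ? ["set_env_vars"] : [];
  switch (incident) {
    case "startup_exception":
    case "broken_route":
    case "perf_regression":
    case "runtime_misconfig":
      // Code-only problem: roll back now (or to buy time), fix in a PR, redeploy.
      return [...steps, "list_deployments", "rollback", "fix in a PR", "deploy"];
    case "schema_migration":
      // Rollback alone may leave the schema ahead of the code.
      return [...steps, "forward fix or manual downgrade", "deploy"];
    case "data_corruption":
      // Rollback restores code only; the data stays corrupted.
      return [...steps, "restore from backup"];
  }
}
```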
Automatic rollback (self-healing)
Set "rollback": "auto" (instead of true) in .deploymill/project.json, commit, reconcile. This enables rollback recording exactly like true, and arms post-deploy self-healing: when a deploy builds and swaps a new image but the health gate comes back unhealthy, deploy automatically reverts to a known-good image — no second tool call from you.
- The health gate is the trigger. Auto-rollback keys off the same health-endpoint contract that deploy, rollback, and get_app_health use: a deploy whose health endpoint (default /healthz) doesn't return 200 within "retries" consecutive attempts is unhealthy. See the health guide (deploymill://guides/health) for the contract, the health config block, and the strict-vs-lenient / 404-fallback rules. Put your real readiness checks in /healthz so "healthy" means what you need it to mean.
- You don't pass anything to deploy. The intent is persisted by reconcile_project, and deploy acts on it. reconcile's plan.rollback.auto shows whether it's armed.
- **Reverts to the last healthy deploy, not just the previous one.** deploy records each deploy's health verdict. Auto-rollback walks back to the most recent earlier deploy that was recorded healthy, falling back to the most recent earlier image when nothing has a recorded result. So if two broken deploys go out back to back, auto-rollback skips past the earlier broken one instead of landing on it. deploy's response carries an autoRollback object and an autoRollbackNote:
  - { attempted: true, toLastHealthy: true, recovered: true, … } → reverted to the last recorded-healthy image and it recovered. Fix forward in a PR.
  - { attempted: true, recovered: false, … } → reverted, but the app is still unhealthy (degraded). Something broader is wrong (DB, a dependency, or the target image is also bad); investigate with get_logs / get_app_health.
  - { attempted: false, reason: "no_rollback_point" } → there was no earlier captured image to revert to (the first deploy after enabling rollback). The bad image is still live; fix forward.
- It triggers only when the new image is live but all edges fail the health gate (a single-domain blip won't trip it), and it reverts at most once (no flapping). Workers have no domains, so auto-rollback never fires for them.
- Orchestrator-level gate (recommended). Declaring a health block also wires the endpoint into the container's Swarm HEALTHCHECK with a start-first / failure-action=rollback rollout. The platform then won't cut over to (or complete the rollout on) an unhealthy new task, and it self-heals at the orchestrator layer before our probe even runs. See the health guide.
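The trigger rule, the walk-back target choice, and the reading of the autoRollback response can be sketched as three pure functions. This is a sketch of the behavior as described above, not deploymill's code. The DeployHistoryEntry shape and all function names are hypothetical. AutoRollbackResult lists only the fields shown above, and the real object carries more (the "…"). The documentation doesn't cover one case: earlier deploys that have verdicts, but none of them healthy. This sketch assumes no target in that case.

```typescript
// Hypothetical history entry; deploymill's internal records are not documented here.
interface DeployHistoryEntry {
  rollbackId: string;
  healthy?: boolean; // undefined when no health verdict was recorded
}

// Fires only when there is at least one edge, every edge failed the health gate,
// and no auto-rollback was attempted yet (at most once, no flapping).
// Workers have no domains, so an empty edge list never triggers.
function shouldAutoRollback(edgeHealthy: boolean[], alreadyAttempted: boolean): boolean {
  return !alreadyAttempted && edgeHealthy.length > 0 && edgeHealthy.every((ok) => !ok);
}

// earlierNewestFirst: deploys before the failing one, newest first.
function pickAutoRollbackTarget(earlierNewestFirst: DeployHistoryEntry[]): string | null {
  // Prefer the most recent earlier deploy that was recorded healthy.
  const healthy = earlierNewestFirst.find((d) => d.healthy === true);
  if (healthy) return healthy.rollbackId;
  // With no recorded verdicts at all, fall back to the most recent earlier image.
  const anyVerdict = earlierNewestFirst.some((d) => d.healthy !== undefined);
  if (!anyVerdict && earlierNewestFirst.length > 0) return earlierNewestFirst[0].rollbackId;
  // No earlier image (e.g. first deploy after enabling rollback), or only
  // recorded-unhealthy ones (this second case is an assumption of the sketch).
  return null;
}

// Only the fields shown in the documentation; the real object has more.
interface AutoRollbackResult {
  attempted: boolean;
  toLastHealthy?: boolean;
  recovered?: boolean;
  reason?: string;
}

function describeAutoRollback(r: AutoRollbackResult): string {
  if (!r.attempted) return "bad image still live: fix forward";
  if (r.recovered) return "recovered: fix forward in a PR";
  return "still degraded: investigate with get_logs / get_app_health";
}
```

For example, with history A (healthy), B (unhealthy) and a failing deploy C, the walk-back lands on A rather than B.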
Disabling rollback
Set "rollback": false in .deploymill/project.json, commit, reconcile. The next deploy won't push to the registry; existing rollback records remain queryable but new ones won't accumulate. Disabling also disarms auto-rollback.
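The three documented values of "rollback" map to behavior as sketched below. The function and field names are hypothetical. The sketch treats a missing key as off, which is an assumption since the default isn't stated here. Whatever the file says, it only takes effect after you commit and run reconcile_project.

```typescript
// The documented values of "rollback" in .deploymill/project.json.
type RollbackSetting = boolean | "auto";

interface RollbackBehavior {
  recordImages: boolean; // push to the registry and create rollback records on deploy
  autoRollback: boolean; // revert automatically when the post-deploy health gate fails
}

function rollbackBehavior(setting: RollbackSetting | undefined): RollbackBehavior {
  switch (setting) {
    case "auto":
      return { recordImages: true, autoRollback: true };
    case true:
      return { recordImages: true, autoRollback: false };
    default:
      // false (or absent, assumed off): no new records, and auto-rollback is disarmed.
      return { recordImages: false, autoRollback: false };
  }
}

// Reads the "rollback" key from raw project.json text; other keys are ignored.
function readRollbackSetting(projectJson: string): RollbackSetting | undefined {
  const parsed: unknown = JSON.parse(projectJson);
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const value = (parsed as { rollback?: unknown }).rollback;
  if (typeof value === "boolean") return value;
  if (value === "auto") return "auto";
  return undefined;
}
```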
What NOT to do
- Don't roll back past a migration. Check what shipped between the running deploy and the one you're targeting; if a migration ran in that gap, forward-fix instead.
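The "check what shipped in the gap" step can be made mechanical if you track which deploys ran migrations. This is a minimal sketch under that assumption. The shippedMigration flag is hypothetical: you would derive it from your migration tool's history or the diff, because list_deployments is not documented to expose it. The history must be oldest-first and end with the running deploy.

```typescript
// Hypothetical: shippedMigration comes from your own tracking, not from deploymill.
interface ShippedDeploy {
  rollbackId: string;
  shippedMigration: boolean;
}

// history: oldest-first, ending with the currently running deploy.
// Rolling back to the target un-ships every deploy after it, so any migration
// among those means the target image is older than the current schema.
function migrationsInGap(history: ShippedDeploy[], targetRollbackId: string): ShippedDeploy[] {
  const idx = history.findIndex((d) => d.rollbackId === targetRollbackId);
  if (idx === -1) throw new Error(`rollbackId not in history: ${targetRollbackId}`);
  return history.slice(idx + 1).filter((d) => d.shippedMigration);
}

function safeToRollBack(history: ShippedDeploy[], targetRollbackId: string): boolean {
  return migrationsInGap(history, targetRollbackId).length === 0;
}
```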
Troubleshooting
- list_deployments shows no rollbackId on recent deploys → rollback was never enabled, or the deploy ran before reconcile turned rollback on. Confirm the current toggle state with a dryRun: true reconcile.
- rollback fails with an auth error → the registry credentials expired or were rotated on the server. The operator needs to refresh them.
- Rolled-back container won't start → likely a schema mismatch: the image is older than the current DB schema. Forward-fix or downgrade manually.