Online Schema Changes Still Need a Rollback Story
MySQL 8.4 LTS and modern migration tools reduce schema-change risk, but they do not remove the need to monitor replication, locks, and application behavior.
Zero downtime is a process, not a flag
Online schema-change tools made it possible to alter busy MySQL tables without stopping the application. That was a major improvement over long blocking migrations, and it still matters in 2026, when production planning covers MySQL 8.4 LTS and newer LTS lines.
But "online" does not mean invisible. A migration can still increase replication lag, create lock waits, amplify write load, or expose application code that depends on the old shape of the table.
Watch the database and the app together
The schema change is only one part of the rollout. Monitor the database for lock waits, replication delay, disk growth, temporary table usage, and error rates. Monitor the application for slower requests, failed writes, background job backlog, and elevated 500s.
If the app slows down while the database looks fine, the migration may have changed query plans; a new or dropped index, for example, can lead the optimizer to choose a different plan. If the database slows down while the app still responds, the migration may be consuming the headroom you will need during peak traffic.
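One way to make "watch both together" concrete is to evaluate database and application signals in a single check. The sketch below is illustrative only: the `Snapshot` fields, the limit values, and the function names are assumptions rather than part of any monitoring product, and the numbers would need to be tuned against your own baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One sample of database and application health (hypothetical fields)."""
    replica_lag_seconds: float
    lock_waits: int
    disk_used_ratio: float      # 0.0 to 1.0
    app_p95_latency_ms: float
    app_error_rate: float       # fraction of requests returning 5xx
    job_backlog: int


# Illustrative limits only; these are assumptions, not recommendations.
DB_LIMITS = {
    "replica_lag_seconds": 30.0,
    "lock_waits": 20,
    "disk_used_ratio": 0.85,
}
APP_LIMITS = {
    "app_p95_latency_ms": 800.0,
    "app_error_rate": 0.01,
    "job_backlog": 5000,
}


def breaches(snapshot: Snapshot, limits: dict) -> list[str]:
    """Return the names of signals that exceed their limit."""
    return [name for name, limit in limits.items()
            if getattr(snapshot, name) > limit]


def interpret(snapshot: Snapshot) -> str:
    """Classify a snapshot by which side of the stack is degraded."""
    db = breaches(snapshot, DB_LIMITS)
    app = breaches(snapshot, APP_LIMITS)
    if db and app:
        return "both degraded: consider pausing or aborting"
    if app:
        return "app degraded, database quiet: check for changed query plans"
    if db:
        return "database degraded, app responsive: safety margin is shrinking"
    return "healthy"
```

Keeping the two limit tables separate is a deliberate choice here: it lets the check report which side degraded first, and that ordering is what suggests the next diagnostic step.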
Keep migrations reversible in practice
Rollback is often treated as a sentence in the deploy notes. It should be a tested path. Before the change starts, decide what happens if you need to stop halfway:
- Can the old application version read the new schema?
- Can the new application version tolerate missing columns or indexes?
- Is the migration tool safe to pause?
- Are replicas allowed to fall behind, and for how long?
- Who decides when to abort?
These answers matter more than the tool name. gh-ost, pt-online-schema-change, native DDL improvements, and managed database helpers all have different failure modes.
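The questions above can be captured as data before the change starts, so that an unanswered one blocks the rollout instead of surfacing mid-migration. This is a minimal sketch under stated assumptions: `RollbackPlan`, its field names, and `lag_budget_spent` are hypothetical and not tied to any particular migration tool.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RollbackPlan:
    """Answers to the pre-change questions; None means 'not yet decided'."""
    old_app_reads_new_schema: Optional[bool] = None
    new_app_tolerates_missing_objects: Optional[bool] = None
    tool_safe_to_pause: Optional[bool] = None
    max_replica_lag_seconds: Optional[float] = None
    max_lag_duration_seconds: Optional[float] = None
    abort_owner: Optional[str] = None


def unanswered(plan: RollbackPlan) -> list[str]:
    """List the questions that still have no answer."""
    return [f.name for f in fields(plan) if getattr(plan, f.name) is None]


def lag_budget_spent(samples: list[tuple[float, float]],
                     plan: RollbackPlan) -> bool:
    """Return True if lag stayed above the limit for the allowed duration.

    `samples` is a time-ordered list of (timestamp_seconds, lag_seconds).
    """
    breach_started = None
    for timestamp, lag in samples:
        if lag > plan.max_replica_lag_seconds:
            if breach_started is None:
                breach_started = timestamp
            if timestamp - breach_started >= plan.max_lag_duration_seconds:
                return True
        else:
            # Lag recovered, so the continuous-breach clock resets.
            breach_started = None
    return False
```

A rollout script could refuse to start while `unanswered(plan)` is non-empty, and page the `abort_owner` once `lag_budget_spent` returns True.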
Alert on symptoms, not just migration state
A migration dashboard that says "running" is helpful for the person driving the change. It is not enough for the business. Add alerts for customer-facing symptoms: checkout errors, API write failures, login failures, and queue age.
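As a rough illustration, a symptom alert can carry the migration context with it, so that whoever receives it sees the likely cause at once. The `Symptom` type, the metric names, and the thresholds below are assumptions made for the example, not the interface of a real alerting system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Symptom:
    """A customer-facing measurement and its alert threshold (hypothetical)."""
    name: str
    value: float
    threshold: float


def build_alerts(symptoms: list[Symptom],
                 running_migration: Optional[str] = None) -> list[str]:
    """Return one message per breached symptom, tagged with any active migration."""
    alerts = []
    for symptom in symptoms:
        if symptom.value <= symptom.threshold:
            continue
        message = (f"{symptom.name} at {symptom.value:g} "
                   f"exceeds {symptom.threshold:g}")
        if running_migration is not None:
            message += f" while schema migration '{running_migration}' is running"
        alerts.append(message)
    return alerts
```

For example, with a migration named `add_orders_index` in flight (a made-up name), a checkout error rate of 0.04 against a threshold of 0.01 would produce a single alert that names both the symptom and the migration.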
The safest schema changes are boring because they are observable. When the migration changes from a database task into a customer-impacting event, the alert should already know how to say so.