Every team eventually hits the same wall: the database schema needs to change, but the application is live, processing real transactions, and a maintenance window is either politically impossible or operationally unacceptable. The knee-jerk answer is "we'll do it Sunday at 2 AM." The better answer is learning which operations are safe to run hot and which ones require a deliberate sequence.
Here's the playbook we actually use at Romanov Solutions when migrating production PostgreSQL databases — including dental practice management systems running Open Dental, multi-tenant SaaS platforms, and financial data pipelines where even a 30-second lock is a customer complaint.
Why Schema Migrations Lock Tables (and When They Don't)PostgreSQL's MVCC architecture is generous, but DDL statements still acquire AccessExclusiveLock by default. That lock blocks every read and write until the migration completes. On a table with millions of rows, ADD COLUMN DEFAULT now() used to be a full rewrite in Postgres 10 and earlier. Since Postgres 11, adding a column with a volatile default is instant — but only if you know the version you're running and the exact conditions that qualify.
The operations that are genuinely safe with no lock impact: adding a nullable column with no default, creating an index CONCURRENTLY, adding a foreign key with NOT VALID. Everything else deserves scrutiny.
Renaming a column sounds trivial. It is not. A naive ALTER TABLE patients RENAME COLUMN dob TO date_of_birth will break every query, ORM mapping, and stored procedure referencing the old name — simultaneously, the moment it runs.
The expand-contract pattern splits the change across three deployment phases:
date_of_birth) alongside the old one. Write application code that writes to both columns and reads from the new one with a fallback to the old.UPDATE on the full table. Use a loop with LIMIT 5000 and a short sleep between batches to avoid I/O saturation.This pattern adds deployment complexity but eliminates risk. For a 10-million-row appointments table we migrated for a dental group client, the backfill ran over 90 minutes with zero user impact.
Standard CREATE INDEX holds a lock for the entire build duration. On a large table, that's minutes. The fix is one keyword:
CREATE INDEX CONCURRENTLY idx_patients_last_name ON patients (last_name);
Concurrent index builds take roughly twice as long but acquire only a weaker lock that doesn't block normal reads and writes. The tradeoff is worth it in every production scenario we've encountered. One caveat: if the build fails midway, it leaves an invalid index that must be dropped manually before retrying.
Foreign Key Constraints: Add First, Validate LaterAdding a foreign key with ADD CONSTRAINT ... REFERENCES validates the entire table immediately, holding a lock throughout. The two-step alternative:
NOT VALID — this prevents future bad inserts but skips historical validation. Lock duration is milliseconds.VALIDATE CONSTRAINT — this scans existing rows but only holds a ShareUpdateExclusiveLock, which allows concurrent reads and writes.This is especially useful when retrofitting referential integrity onto legacy schemas that were migrated from MySQL or SQL Server without constraints.
Lock Timeouts Are Not OptionalEvery migration script we ship includes this at the top:
SET lock_timeout = '2s'; SET statement_timeout = '30s';
If your migration can't acquire its lock within two seconds, it fails fast rather than queuing behind a long-running transaction and then blocking everything behind it. A failed migration you can retry is better than a silent queue buildup that takes down your connection pool.
Tooling: What We Reach ForFor Python/Django stacks, django-safemigrate adds safety checks at the framework level. For Node and raw SQL workflows, we maintain a lightweight migration runner that enforces lock timeout headers and logs every DDL statement with its execution time to a schema_migrations_audit table — invaluable during post-incident reviews.
For Open Dental integrations specifically, where the underlying MySQL-compatible schema has hundreds of tables with implicit constraints, we layer a read replica promotion strategy: validate the migration against a promoted replica before touching production.
Practical TakeawayBefore your next schema change, classify it: instant-safe, expand-contract required, or concurrent-build eligible. Write that classification as a comment at the top of the migration file. It takes 30 seconds and has saved us from at least a dozen production incidents. If you're not sure which category a change falls into, assume it needs the full expand-contract treatment — the extra deployment cycle costs far less than an unplanned outage.