The consequences of waiting until the last minute to deal with the database

One common technique for managing database changes when releasing software is to establish a development database in addition to a production database. The development database is often a copy of production from some point in the past and serves as a point of experimentation for changes.

Whether there’s one developer or many, this database becomes the record of truth for the next version of the application.

When the time comes to release the next version of the application, the changes made in the development database have to be migrated to the production database.

Some rely upon memory to do this. Chances are, at least if you want to double check your work, you’ll use a tool to compare the production and development database. The tool will spit out changes. You’ll review the changes, perhaps make adjustments, and then apply them in production.

This approach may work with small projects but quickly grows unruly. Here are a few of the problems:

  • The development database contains what changed, at least once you’ve made a comparison and produced a list of changes, but it doesn’t contain who made the change, why it was made, when it was made and how it was made.
    • This makes it difficult to understand the historical context that guided the current state of the database.
    • Not having this information makes the nature of deciding what changes are necessary even more difficult. Especially with larger teams. It’s possible someone experimented in development and didn’t rollback their changes. It takes time, during the middle of trying to release your application, to do the detective work necessary to figure out what’s necessary.
  • Mistakes will be made and changes that aren’t intended will eventually be released to production. This will cause problems and risk losing production data.
  • While structural (schema) comparisons may seem straight forward and many tools can automate recommended migration scripts, detecting data changes is impossible to automate. You can compare the data, but that’s not going to scale well. You will miss data changes. By data changes, I mean things like splitting the data in a column into two columns. For example, splitting a name field into first and last name. Or, splitting a table to create a one to many relationship. For example, a contact table that has name and address being split so that a contact can have multiple addresses. The development database alone cannot tell you how the change was made. Not only will you likely miss it, even if you notice it, you may not replicate the correct change in production.
  • Because you’re reverse engineering changes, you can’t have much confidence in the result.
  • If you release changes by diffing, chances are you don’t have any tests of the changes. There may be edge cases you fail to replicate when you reverse engineer what changed.
  • It’s difficult to align changes to the database with changes to the application’s code. This makes it difficult to understand how the two pieces work together. The database and application code may diverge over time.
  • Changing the database is painful. The more you do it, the more painful it is.
  • People will fear making changes to the database. The more that’s changed, the more investigative work that will be necessary come time to release. Nobody likes to delay releases. So naturally people will tend not to make changes, even necessary changes like renaming constructs in the database to match the application’s code evolution.
  • People will tend to reuse existing database structures for unintended purposes, just to avoid making changes to the database.
  • You’re duplicating efforts to reverse engineer the changes. This is wasteful. The changes were made at one point in time, not long ago most likely. Someone, perhaps your past self, had the change in their hand at one point.
  • This process inevitably delays releasing your software and it adds an extreme degree of risk at a time when risks should be minimized.
  • Until you reverse engineer the changes, you won’t have anything to test with to find out about problems. Once you have the scripts you can test them with a backup of production, but you’re not getting these scripts until it’s time to release. Perhaps a change doesn’t jive with the data in a production database but works smoothly with the data in a development database. At best you’ll catch problems when you’re releasing. But at that point, why the change was made is likely lost. Ideally you would find out about problems when you were making the changes.
  • The rushed nature of releasing software, because of all the problems that arise, mean the problems will tend to arise after you release because you don’t take the time to be thorough during the release.
  • You’ll be more anxious during a release, and rightfully so. Instead of being excited about delivering valuable new software.
  • You’ll tend to put off releases because of this. That only makes the amount of changes pile up and makes things worse. Fear leads to delay, delay leads to changes piling up, bigger changes increases risk which in turn increases fear. A positive feedback loop emerges (note: positive feedback loops are not a good thing here).
  • Manually applying changes to production databases takes magnitudes longer than automatically applying the changes. It also requires manual verification that the changes were successful. Perhaps running another comparison to make sure things match up and then manually reviewing the results. Another opportunity to miss something.
  • Hopefully you have a backup of your development database.

Many of these problems are similar to the problems you would have if you didn’t use a system of version control to manage your application’s code.