Schema migration, an important part of continuous delivery

Most of you have a relational database involved in your persistence. As soon as your first version has been put into use, you can no longer change the database schema as easily, or you might lose valuable production data.

At the same time, continuous delivery demands that there should be as few manual steps as possible. See here for motivation on continuous delivery.

You no longer have just a few database instances; there are numerous instances for different levels of testing, and nowadays every developer runs a personal database. Keeping track of all these instances and keeping them up to date becomes a steep task.

To tackle this, we started using Flyway as a tool to manage our database scripts. Our applications migrate the database automatically on startup, so we get a hands-free solution that guarantees that the code and the database schema are in sync.

Here is a technical recipe for accomplishing this when using Flyway in combination with Spring.

I wrote about schema versions back in 2008, and at that time it was not evident that you should keep track of which schema changes had been applied to a given database. Today, that seems to be an established baseline. Still, there is a lot of manual work involved, and back then we only prevented mistakes.

The next step is to remove the manual steps that require understanding the relationship between code version and schema version.

Our recipe for Flyway and Spring involves two steps:

  1. A Spring bean that calls the Flyway API to invoke the migration.
  2. A Spring configuration that injects the DataSource that Flyway needs.

Flyway will not do anything magical to your schema. It is focused on running your SQL scripts and Java migration logic in a controlled manner, nothing more. Therefore, you will also need to write migration scripts and name them “Vn__description.sql”, where n is an integer.
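
For illustration, here is a sketch of what a couple of such scripts could look like; the file names and the customer table are made up, but the naming pattern is the one Flyway expects:

    -- V1__create_customer_table.sql
    CREATE TABLE customer (
        id   BIGINT PRIMARY KEY,
        name VARCHAR(100) NOT NULL
    );

    -- V2__add_customer_email.sql, a later incremental change
    ALTER TABLE customer ADD COLUMN email VARCHAR(255);

Flyway picks the scripts up from the classpath (db/migration by default), applies the ones that have not been run yet in version order, and records each applied version in its metadata table.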

The Spring bean looks like this (note that it is not annotated as a bean; it is declared in the XML configuration below):

package our.service;

import com.googlecode.flyway.core.Flyway;
import javax.annotation.PostConstruct;
import javax.inject.Inject;
import javax.sql.DataSource;

public class MigrationService {

  private final DataSource dataSource;

  @Inject
  public MigrationService(DataSource dataSource) {
    this.dataSource = dataSource;
  }

  // Runs automatically after dependency injection, i.e. when the application starts.
  @PostConstruct
  public void migrate() {
    Flyway flyway = new Flyway();
    flyway.setDataSource(dataSource);
    flyway.migrate();
  }
}

The Spring configuration:

    <bean id="dataSource" class="org.springframework.jndi.JndiObjectFactoryBean">
        <property name="jndiName" value="java:comp/env/jdbc/SdgDS"/>
    </bean>

    <bean id="migration" class="our.service.MigrationService">
        <constructor-arg ref="dataSource"/>
    </bean>
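
One caveat that the configuration above does not show: if other beans touch the schema during startup (for example a JPA entityManagerFactory that validates the mapping against the database), you probably want them to wait until the migration has run. Spring’s depends-on attribute is one way to express that ordering; the entityManagerFactory bean below is only a hypothetical example:

    <bean id="entityManagerFactory"
          class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean"
          depends-on="migration">
        <property name="dataSource" ref="dataSource"/>
    </bean>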

Also see this Wikipedia article on Schema Migration.

8 responses on “Schema migration, an important part of continuous delivery”

  1. Hi, Arne

    The thought has passed my mind 😉

    But still, we need to keep code and data in sync, which is what this is fundamentally about.

    Let’s say we use Mongo. If I have added an attribute to a document that is compulsory according to the business, what shall I do?

    – old objects don’t need it, but new ones must comply. My code must then have logic for both, which gets messy over time.
    – old objects are given a default value for the new attribute. My code is simple, but the change is not quite true to the business logic.

    If you don’t migrate your data, your code must deal with it and that will get messy eventually.

    So if I use an RDBMS, I try to make it work with me and act as a safety net alongside my tests and static type checking.

  2. How do you handle backing up the database? I can create rollback scripts, but they cannot be relied upon for deleted data. That data would need to be recovered from a backup, or a lot of effort is needed to make migrations that keep old data around with a fixed clean-up date. Thus, during deployment we need to back up the database first. Do you incorporate that concept?

    1. Doing a specific backup before each schema migration is usually too costly.

      Rollback scripts can help, but note that DBMSs behave differently when a transaction aborts. Most of them, Postgres being an exception, do not roll back schema changes. That means that if a script fails, you may be left in an undefined state.

      It is very important to test the scripts before running them in production, and in my experience a setup like this gets the scripts run many more times than when you have to apply them manually or when you copy some common test database all the time.

      If you are about to delete data in a migration script, I would advise you to instead move it to another table or similar, verify it there, and then remove it from its new location in a later script (a rough sketch follows after the comments).

  3. Also, how do you handle version control of this? We are implementing continuous delivery, and the version numbers of the app will change frequently as they reflect build numbers in Jenkins. What is a strategy for keeping track of which versions of the database go with which versions of the app?

    1. Our strategy was to keep the scripts together with the core of the system, i.e. where all the database access is made. The version numbers of the scripts were completely unrelated to the version number of the system.

      Thus, as an example, version 1 of the system had versions 1 to 5 of the scripts, while version 2 of the system had versions 1 to 9.

      Sometimes two developers would use the same version number for their scripts, a race condition in RT terms. The last one to commit would have to back off. It takes some discipline.
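
To illustrate the move-instead-of-delete advice in the reply above, here is a rough sketch with hypothetical table and script names; the first script parks the rows, and a later one drops them once they have been verified:

    -- V7__park_obsolete_orders.sql: move rows aside instead of deleting them
    CREATE TABLE obsolete_orders AS
        SELECT * FROM orders WHERE status = 'OBSOLETE';
    DELETE FROM orders WHERE status = 'OBSOLETE';

    -- V9__drop_obsolete_orders.sql: run in a later release, once the parked rows have been checked
    DROP TABLE obsolete_orders;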
