In software development where the system persists data in a relational database, it is important to keep track of changes to the versions of the schema.
The importance comes from that you always have several database instances to keep track of. There is the production database, the database for system tests, the database for acceptance test, the database for performance test, the database for development team and the database each developer has.
All these will be at different versions and aligned with different versions of the code.
In this blog I will describe how you come about to control your schema versions. I have tried it in two completly different settings.
What you do is to create a table called, say, “version”. Here you store all versions that have affected the database schema and preloaded control data. Thus, when in doubt you can always check the table to see which version a particular database is in.
major | minor |
0.1 | 1 |
0.1 | 2 |
You may use any notation and number of columns to describe your versions, but each must be unique and guranteed through a unique index.
Furthermore, all changes to the schema are described in SQL script files. For every version or change, there is one file. You can never change the file once it has come to use. There must be no doubt how the file relates to the contents of the version table.
S_00.01_1.sql
S_00.01_2.sql
The file should start with a begin and end with a commit to make running of the script atomic. The first line should be an insert into the version table. Since the lines are unique, you are guarded against applying the same SQL script twice.
begin
insert into version values (‘0,1’, 2);
…
commit
The files are named after the version. You should use numbers with leading zeroes padded so they line up nicely in correct order when you list them.
Order matters, since some changes depend on earlier ones.
By this simple technique you get a database where you know which version of the schema it has and you are never afraid of running the database script too many times.
And, besides, at Spa Franchorchamps, keep your foot down through Eau Rouge… 😉
Maybe of interest: my colleague Dan Bergh Johnsson blogged about this subject recently: "Version of data in database"
/ET
I use this scheme myself with the addition of one column: db_version_info.
In the db_version_info column I let Subversion (or whatever version control system being used) fill in the repository version and other information about the SQL file.
The db_version_info column might in this example contain something like:
This way I will always know exactly which version of a SQL script that has been run.
It seems simple enough to actually work:-)
I’m interested in if anyone has come up with a good solution for managing schema evolution and upgrading existing databases with existing data.
In a project I worked in we used to have one SQL script that was version stamped that handled creation and initialization of the database. This was used for setting up new databases. Then we had another script that was used for upgrading and migration existing databases. The upgrade script was only valid betwen two consecutive versionsi.e. 1.3 to 1.4. So if you wanted to upgrade from say 1.1 to .1.4 you had to run three scripts 1.1->1.2->1.3->1.4.
A really nice easy to use no-nonsense tool for exactly this sort of versioning is DBDeploy available in both Java and .NET flavours.
I’ve used it on multiple projects with very nice results.
And at SPA – if you keep your foot down Eau Rouge I will show within a meter where you will hit the wall 🙂
hi,
the very same idea is used in “deltasql” (implemented in php and mysql as a web server), available at
http://gpu-grid.net/wiki/index.php/Deltasql
Basically, you enter the scripts on the webserver (numbering is done automatically) and the webserver computes for you the needed set of scripts, if you give it the current schema number.
bye 🙂