Surprisingly often, an organization exposes itself to a multitude of risks simply by not knowing enough about its own systems, infrastructure, and applications. This doesn’t manifest itself as a lack of “enterprise architecture” documents (though some could help). The implications are far more down-to-earth. Does any of this sound familiar?
- Upgrading both hardware and software is disproportionately difficult, and the outcome isn’t predictable
- There exist tasks that must be performed by a specific person
- There exist artifacts that everybody thinks need to be there, but no one dares to touch, delete, or upgrade
- Introducing new employees is difficult
All of the above is scary, and it also makes system development a real pain. Productivity will be low and the rate of delivery unpredictable. Nowadays organizations need to be able to deliver their software in a timely fashion, at a high rate, and with a level of quality appropriate for the product they are making. That level is never low.
The symptoms above can be summarized as “the black box architecture”: the antithesis of efficient development and continuous delivery. Let’s look at them in detail.
In black box systems, upgrading or moving software or infrastructure is difficult because nobody actually knows what software is being run, what systems it talks to, how it has been customized and configured, and what the initial non-functional requirements were. In some cases, a five-year-old document describing the system can be found, but it’s obsolete today, and it wasn’t really accurate on the day it was written either. Another version of the same phenomenon is being stuck with a contractor or partner that runs the system and cannot hand it over to someone else in a controlled manner. Of course it can be done, but not at a reasonable cost. The solution to this has always been spelled “documentation”, and while it may be the obvious one, it’s never there.
People become the masters of specific tasks, either willingly or unwillingly. In the beginning, it may be tempting to be the guru of a particular system. Everybody must turn to that person, who becomes a critical resource. This may be fulfilling, until the day the system crashes, takes some heat, or the person just wants to go on vacation or parental leave. People may also take care of systems unwillingly. They are simply the only ones familiar with how those systems work, and new employees get to do cooler stuff instead of learning the black box. After all, it’s quite convenient when it works.
Systems are composed of various artifacts. They may be databases, message queues, and servers of various sorts. The same goes for their software, which in turn is composed of source files, binaries, configuration, and third-party libraries. Regardless of size and scope, a black box system never shrinks. It always expands, since nobody can remove any component at a reasonable cost without the fear of breaking something. In a previous post, I called this “horizontal development” when applied to system development.
Introducing a new employee to all of this is difficult, especially if it’s a developer. There’s no simple way for that person to set up a development environment, configure it, and start developing against a system in a known state. That person will rely on his or her new colleagues to copy their configuration and perform some “tweaks” that have been passed down through the generations, or require the guru to perform them.
If you’re a developer, does this sound scary? If you’re a manager, how does this map to your risk reduction strategies and business continuity plans? If you’re a tester, will you ever be able to repeat your tests, be they automated or manual?
Black box systems also go under the name “untestable systems”, and they behave accordingly.
Now, imagine the “untestable” black box system on one end of a scale, and the ability to deliver working software with predictable results as frequently as you want on the other. What does it take to get there?
Of course, the solution will be specific to your particular organization, and since it has taken years to accumulate enough technical debt to turn a system into a black box, it will take some time to reverse that condition. This is time well spent! Here are some pointers and hints.
Databases
Most of today’s systems require some kind of database. What does the database need to contain to bootstrap an “empty” system? Typically it needs some reference data, like user accounts and various lookup tables. Which ones are required? And by the way… can you recreate your database in an automated manner? Can you recreate it at all? This is absolutely crucial to get control of. Harnessing your database lets you migrate the system at will (from a database perspective) and enables automated integration and acceptance testing. Getting the database under control is tedious work (in an upcoming post I’ll give you some tips about how to do it). Basically, you need to start with an empty schema and work from there, but it’s really rewarding when it’s finished. Also, the database guru can go on vacation, since the scripts that contain the automated setup are accessible and readable by everybody, because they’re under version control, right?
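To make that concrete, here is a minimal sketch of what such an automated setup could look like, assuming a version-controlled SQL schema file, a reference data file, and SQLite standing in as the engine (the file names and the choice of driver are illustrative, not a prescription):

```python
#!/usr/bin/env python3
"""Recreate an empty, bootstrapped database from version-controlled scripts.

Sketch only: the schema file, the reference data file, and sqlite3 as the
engine are assumptions for illustration -- substitute your own DDL files
and database driver.
"""
import sqlite3
from pathlib import Path

SCHEMA = Path("db/schema.sql")             # CREATE TABLE statements, under version control
REFERENCE_DATA = Path("db/reference.sql")  # lookup tables, default accounts, and so on


def bootstrap(db_path: str = "system.db") -> None:
    conn = sqlite3.connect(db_path)
    try:
        # Start from an empty schema and layer the required reference data on top.
        conn.executescript(SCHEMA.read_text())
        conn.executescript(REFERENCE_DATA.read_text())
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    bootstrap()
```

If a script like this runs in every test environment, you find out immediately when the setup is incomplete.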
Operating systems and environments
Environment configuration is difficult. A system that’s been running on the same machine for a couple of years has “broken in” that environment. Of course, nothing has been documented, so no one really knows what binaries and libraries are required to support the system. Is a specific version of the OS or kernel required? Or has a special kernel been compiled? Discovering that the path is incomplete is easy, but what about the library path? What’s been patched, and what patches are required? Getting the environment configuration right may be harder than getting control of the database. The latter is just a chunk of work, while the former may require some real detective effort. Nonetheless, deploy a copy of your system to a fresh installation of the operating system it’s running on now. Maybe the exact same version isn’t available at this point, but try to find something similar. Resist the temptation to upgrade before the system environment is understood. Note every change and write a script in your favorite scripting language, or use a tool like Puppet, that automates the transformation of a fresh OS into something that supports the system. Again, place that script under version control.
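The “note every change” part can start out as nothing fancier than a check script that a fresh machine has to pass before the system is deployed to it. A sketch, where the package names, binaries, and the dpkg-based check are assumptions for illustration only:

```python
#!/usr/bin/env python3
"""Verify that a fresh OS installation has what the system needs.

Sketch only: the package list, required binaries, and the Debian-style
dpkg check are placeholders for whatever your detective work uncovers.
"""
import shutil
import subprocess
import sys

REQUIRED_PACKAGES = ["libxml2", "openssl"]  # discovered one failed deployment at a time
REQUIRED_BINARIES = ["java", "convert"]     # tools the system shells out to


def package_installed(name: str) -> bool:
    # 'dpkg -s' exits non-zero when the package is missing (Debian/Ubuntu).
    return subprocess.run(["dpkg", "-s", name], capture_output=True).returncode == 0


def main() -> int:
    missing = [p for p in REQUIRED_PACKAGES if not package_installed(p)]
    missing += [b for b in REQUIRED_BINARIES if shutil.which(b) is None]
    if missing:
        print("Environment not ready, missing:", ", ".join(missing))
        return 1
    print("Environment looks complete.")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Every time a deployment fails on something the script didn’t catch, the script grows, and the knowledge ends up in version control instead of in someone’s head.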
Middleware
Application servers, enterprise message buses, and additional support servers each have their own configuration. The procedure is similar to the one outlined for the operating system: deploy the system to the server and see where it breaks. Compare the configuration to the black box version, note the difference, script it, and place it under version control.
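For the “note the difference” step, even a plain recursive diff of the fresh server’s configuration directory against the black box one goes a long way. A sketch, with placeholder paths:

```python
#!/usr/bin/env python3
"""Compare a freshly installed server's configuration with the black box one.

Sketch only: the two directory paths are placeholders for wherever your
application server keeps its configuration.
"""
import filecmp

FRESH = "/opt/appserver-fresh/conf"
BLACK_BOX = "/opt/appserver-prod/conf"


def report(cmp: filecmp.dircmp) -> None:
    for name in cmp.diff_files:
        print(f"differs: {cmp.left}/{name}")
    for name in cmp.left_only + cmp.right_only:
        print(f"only on one side: {name}")
    for sub in cmp.subdirs.values():
        report(sub)  # recurse into nested configuration directories


if __name__ == "__main__":
    report(filecmp.dircmp(FRESH, BLACK_BOX))
```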
Development environment
Setting up a developer’s environment is usually difficult because of the obstacles outlined above. However, if the environment doesn’t depend on shared resources (and it seldom actually does), there’s no excuse. Setting up a local development environment shouldn’t be more difficult than checking out a couple of projects from the repository, maybe making some changes to one configuration file, and maybe executing a script. If rituals are required to make the local environment work, start afresh, see what the first part of the ritual is, automate or configure it, rinse and repeat. Now new members of the development team can start learning the system from day one.
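The end state could be a single entry point that a new developer runs on day one. A sketch, assuming a couple of Git repositories, a version-controlled configuration template, and the database bootstrap script from earlier (every name here is a placeholder):

```python
#!/usr/bin/env python3
"""One-command local development setup: check out, configure, bootstrap.

Sketch only: the repository URLs, the configuration template, and the
db/bootstrap.py script are hypothetical stand-ins for your own artifacts.
"""
import shutil
import subprocess
from pathlib import Path

REPOS = [
    "git@example.com:team/app.git",
    "git@example.com:team/app-config.git",
]


def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def main() -> None:
    for url in REPOS:
        target = url.rsplit("/", 1)[-1].removesuffix(".git")
        if not Path(target).exists():
            run(["git", "clone", url])
    # Local overrides live in one file; start from the version-controlled template.
    if not Path("app/local.properties").exists():
        shutil.copy("app/local.properties.template", "app/local.properties")
    run(["python", "db/bootstrap.py"])  # the database bootstrap from the Databases section


if __name__ == "__main__":
    main()
```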
Once all of this is done, care must be taken to ensure that configuration changes make it into the scripts you have laboriously created. The best way to ensure this happens is to actually deploy the system to an environment that’s been prepared by your scripts. If that feels scary, or is infeasible because of some other limitation, at least let all development and testing make use of this infrastructure.
Regardless of the final step, the system is no longer a black box!