When refactoring legacy code, two problems seem to occur again and again. One is that the code is tangled up with interdependencies; the other is that there is no specification of what the system is supposed to do.
Still, we are asked to add features or fix problems without breaking anything. Everything that is in there should stay there.
The weapons I wish to wield this time are the sword of dependency breaking and the shield of characterization tests.
Dependency breaking is the technique you use to bring a part of the system into isolation. The purpose is to get a manageable piece to work with: a component with well-defined dependencies. Once you have that, testing becomes possible.
Characterization tests are where you send stimuli to your now isolated component and record its behavior and output. Preferably, use a test coverage tool to check that the stimuli are varied enough to exercise the code thoroughly. This gives you a defined starting point and minimizes the risk of breaking anything later.
The characterization tests are now your specification. Show the parts that cover business logic to your product owner to verify that this is actually what is meant by “not changing anything”. Eight times out of ten, you will also get ideas for other test cases. So do it.
Swinging the sword
So how do you break dependencies, then? It is a bit of a chicken-and-egg problem: you don’t want to change the code without tests, but with the current design you can’t add tests without first changing the code.
You need to find the seams: the places where it is most feasible to add a level of indirection that behaves exactly as before but makes it possible to put the isolated component in a test jig.
A seam can be where a client class that is part of what you wish to isolate accesses another class directly. Instead of accessing that class directly, the client should access an interface, which the other class then implements.
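A minimal sketch of such a seam in Java could look like the code below. All names are hypothetical and made up for illustration: `InvoiceCalculator` is the client we want to isolate, `LegacyTaxService` is the class it used to call directly, and `TaxRateLookup` is the interface introduced at the seam.

```java
import java.util.Map;

// The interface introduced at the seam (hypothetical name).
interface TaxRateLookup {
    double rateFor(String countryCode);
}

// Placeholder for the existing class. The only change made to it is the
// "implements" clause; in a real system it might call a database.
class LegacyTaxService implements TaxRateLookup {
    @Override
    public double rateFor(String countryCode) {
        throw new UnsupportedOperationException("calls production systems");
    }
}

// The client we want to isolate. It used to create LegacyTaxService itself;
// now it only knows the interface and receives an implementation from outside.
class InvoiceCalculator {
    private final TaxRateLookup taxRates;

    InvoiceCalculator(TaxRateLookup taxRates) {
        this.taxRates = taxRates;
    }

    double totalWithTax(double net, String countryCode) {
        return net * (1.0 + taxRates.rateFor(countryCode));
    }
}

// A stub for the test jig: fixed rates and no external dependencies.
class FixedTaxRates implements TaxRateLookup {
    private final Map<String, Double> rates;

    FixedTaxRates(Map<String, Double> rates) {
        this.rates = rates;
    }

    @Override
    public double rateFor(String countryCode) {
        return rates.getOrDefault(countryCode, 0.0);
    }
}
```

In production the client is wired up as `new InvoiceCalculator(new LegacyTaxService())` and behaves exactly as before, while a test can pass `new FixedTaxRates(Map.of("SE", 0.25))` instead.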
A database is often a central point of misery, as knowledge of the data model is spread all over the place. Relational databases have views that can be used to limit how much of the model is exposed. Inserts and updates are preferably placed in dedicated components behind interfaces.
Use your knowledge of the domain and let its concepts guide you when looking for places to put your seams. If possible, examine the change history of the system and see which parts change together. They probably belong together.
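As an illustration of putting writes behind an interface, here is a sketch with hypothetical names (`Order`, `OrderStore`, `DiscountService`). The business logic only sees `OrderStore`, never table or column names. The in-memory implementation is what a test would use; a production implementation, not shown here, would presumably talk JDBC to the real tables or views. It assumes Java 16 or later for the `record`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical record holding the data the business logic cares about.
record Order(String id, String customerId, long amountCents) {}

// All inserts and updates go through this interface.
interface OrderStore {
    void insert(Order order);
    void updateAmount(String orderId, long newAmountCents);
    Optional<Order> findById(String orderId);
}

// In-memory implementation for tests; state disappears with the object.
class InMemoryOrderStore implements OrderStore {
    private final Map<String, Order> orders = new HashMap<>();

    @Override
    public void insert(Order order) {
        if (orders.putIfAbsent(order.id(), order) != null) {
            throw new IllegalStateException("duplicate order id " + order.id());
        }
    }

    @Override
    public void updateAmount(String orderId, long newAmountCents) {
        Order existing = orders.get(orderId);
        if (existing == null) {
            throw new IllegalStateException("no order " + orderId);
        }
        orders.put(orderId, new Order(orderId, existing.customerId(), newAmountCents));
    }

    @Override
    public Optional<Order> findById(String orderId) {
        return Optional.ofNullable(orders.get(orderId));
    }
}

// Hypothetical business logic that depends only on the interface.
class DiscountService {
    private final OrderStore store;

    DiscountService(OrderStore store) {
        this.store = store;
    }

    void applyPercentDiscount(String orderId, int percent) {
        Order order = store.findById(orderId).orElseThrow();
        long discounted = order.amountCents() * (100 - percent) / 100;
        store.updateAmount(orderId, discounted);
    }
}
```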
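One way to see which parts change together is to count, for every pair of files, how many commits touched both. The sketch below assumes you have already turned the history into one set of file paths per commit, for example by parsing the output of `git log --name-only` (the parsing is not shown). The class name `ChangeCoupling` and the `"a|b"` pair key are made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Counts how often each pair of files is changed in the same commit.
final class ChangeCoupling {
    private ChangeCoupling() {}

    // commits: one set of file paths per commit.
    // Returns a map from "a|b" (a sorted before b) to the number of shared commits.
    static Map<String, Integer> coChangeCounts(List<Set<String>> commits) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> files : commits) {
            // Sort so each unordered pair gets one stable key.
            List<String> sorted = new ArrayList<>(new TreeSet<>(files));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(sorted.get(i) + "|" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // The most frequently co-changed pairs, highest count first.
    static List<Map.Entry<String, Integer>> topPairs(List<Set<String>> commits, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(coChangeCounts(commits).entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(limit, entries.size()));
    }
}
```

Pairs near the top of the list are candidates for living on the same side of a seam. Commits that touch hundreds of files (mass renames, reformatting) tend to drown the signal, so it may be worth filtering those out first.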
You don’t always need to break dependencies to create characterization tests. Tests can be at the system level and defined using tools such as FitNesse, Cucumber or Spock. Behavior at the system level is also of greater interest to stakeholders, which makes it easier to get the feedback mentioned earlier.
At any level, whether unit, integration or system test, the goal is to find a way to test and enough example inputs to feel certain that nothing will break unexpectedly later on. You also need a way to record the output.
As always, remember to design your tests well. They should not interfere with each other: it should be possible to run them in any order and in parallel. Thus each test case must set up its own test data, unless that becomes completely impractical, in which case you instead verify that the test data is present before running the actual test.
How you record output depends, of course, on the level at which you are testing. Here are some examples.
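A characterization test can be as simple as feeding a grid of inputs through the component, writing one line per case, and comparing that text with a previously approved copy. This is often called the "golden master" approach. The sketch below is hypothetical throughout: `LegacyShipping` stands in for whatever component you isolated, and its fee rules are invented. The test records those rules as they are, without judging whether they are right.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical stand-in for the isolated legacy component.
final class LegacyShipping {
    static int feeCents(int weightGrams, boolean express) {
        int fee = weightGrams <= 1000 ? 500 : 500 + ((weightGrams - 1000) / 500) * 150;
        if (express) {
            fee = fee * 2 - 100;
        }
        return fee;
    }
}

final class Characterization {
    // Feeds every combination of inputs to the component and records one line per case.
    static String recordBehavior(List<Integer> weights, List<Boolean> expressFlags) {
        StringBuilder out = new StringBuilder();
        for (int weight : weights) {
            for (boolean express : expressFlags) {
                out.append("weight=").append(weight)
                   .append(" express=").append(express)
                   .append(" -> ").append(LegacyShipping.feeCents(weight, express))
                   .append('\n');
            }
        }
        return out.toString();
    }

    // First run: no approved file exists yet, so the current behavior is stored.
    // Later runs: any difference from the stored text means behavior has changed.
    static boolean matchesApproved(Path approvedFile, String actual) {
        try {
            if (Files.notExists(approvedFile)) {
                Files.writeString(approvedFile, actual);
                return true;
            }
            return Files.readString(approvedFile).equals(actual);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Choosing inputs on both sides of each boundary (here 1000 and 1001 grams) is where a coverage tool helps: it shows which branches the grid never reaches.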
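Here is a small sketch of both cases, using hypothetical names. A `ConcurrentHashMap` stands in for a shared database. The test creates its own row under a unique id and removes only that row afterwards. When per-test setup is impractical, a precondition check makes a missing fixture fail with a clear message.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared table, standing in for a database that many tests use at once.
final class SharedCustomerTable {
    static final Map<String, String> ROWS = new ConcurrentHashMap<>();
}

// Hypothetical component under test.
final class CustomerService {
    private final Map<String, String> table;

    CustomerService(Map<String, String> table) {
        this.table = table;
    }

    void rename(String customerId, String newName) {
        if (table.replace(customerId, newName) == null) {
            throw new IllegalArgumentException("unknown customer " + customerId);
        }
    }
}

final class CustomerTests {
    // The test creates its own row under a unique id, so it can run in any
    // order, or in parallel with other tests, without touching their data.
    static boolean renameSetsNewName() {
        String id = "test-" + UUID.randomUUID();
        SharedCustomerTable.ROWS.put(id, "Original Name");
        try {
            new CustomerService(SharedCustomerTable.ROWS).rename(id, "New Name");
            return "New Name".equals(SharedCustomerTable.ROWS.get(id));
        } finally {
            SharedCustomerTable.ROWS.remove(id); // remove only this test's own row
        }
    }

    // When per-test setup is impractical (say, a large reference data set),
    // check that the data is present first, so a missing fixture fails
    // loudly instead of looking like a change in behavior.
    static void requireReferenceData(String key) {
        if (!SharedCustomerTable.ROWS.containsKey(key)) {
            throw new IllegalStateException("missing reference data: " + key);
        }
    }
}
```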
When unit testing in Java, Mockito is a framework with features for capturing how your software interacts with test doubles such as mocks.
A database can be replaced by an in-memory equivalent that is faster and disappears when the test has run.
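With Mockito you would typically create the double with `mock(Mailer.class)` and inspect the calls afterwards with `verify(...)`, optionally using an `ArgumentCaptor` to capture the arguments. To keep this sketch free of dependencies, it records the calls with a hand-written spy instead, which follows the same idea. `Mailer`, `ReminderJob` and `RecordingMailer` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator that the isolated component talks to.
interface Mailer {
    void send(String to, String subject);
}

// Hypothetical component under test.
final class ReminderJob {
    private final Mailer mailer;

    ReminderJob(Mailer mailer) {
        this.mailer = mailer;
    }

    void run(List<String> overdueCustomers) {
        for (String email : overdueCustomers) {
            mailer.send(email, "Payment reminder");
        }
    }
}

// Hand-written spy: records every call so the test can inspect the interaction.
final class RecordingMailer implements Mailer {
    final List<String> calls = new ArrayList<>();

    @Override
    public void send(String to, String subject) {
        calls.add(to + ": " + subject);
    }
}
```

The recorded `calls` list is the captured output of the interaction. A characterization test asserts on it exactly as it asserts on return values.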
The two problems of refactoring legacy code are that the code is tangled up, forming a monolith, and that the correct behavior is not specified, making it risky to change anything.
By using dependency breaking and characterization tests, these problems are mitigated.
Thanks for reading so far!