I recently ran up against a large XML file and the legacy code that manipulated it. The code had mysteriously stopped working, there were no unit tests, and the XML file was large and had no line breaks. What do you do in this situation?
Loading the 30MB file in any kind of editor made the editor slow. Trying to format the file left the editors unresponsive. There’s no point in running grep on a one-line XML file, since every match prints the entire file, and while writing a cleanup script to add line breaks would make life better, why reinvent the wheel? Googling yielded several Windows programs at first, but digging further and using social networking finally produced two fantastic tools for Unix/Linux-based systems!
The first and easiest tool to use turns out to be xmllint! It ships with libxml2, so it’s most likely already available with your Unix/Linux distribution; just type xmllint at the command prompt to check. Running xmllint --format my_file.xml > my_file_formatted.xml will add line breaks and indentation where they make sense in your XML file. Now you can easily grep the file, and you’re halfway there!
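To make the effect of the formatting step concrete, here is a minimal Python sketch that does something similar on a tiny made-up document. It is not xmllint and not how xmllint works internally; it uses the standard library’s xml.dom.minidom as a stand-in, and the sample XML and the function name pretty_print_xml are assumptions for illustration only. It also loads the whole document into memory, so for a real 30MB file xmllint is still the better choice.

```python
import xml.dom.minidom


def pretty_print_xml(xml_text: str) -> str:
    """Return xml_text re-serialized with line breaks and indentation."""
    dom = xml.dom.minidom.parseString(xml_text)
    return dom.toprettyxml(indent="  ")


# Made-up one-line document, standing in for the real file.
sample = (
    '<catalog><item id="1"><name>first</name></item>'
    '<item id="2"><name>second</name></item></catalog>'
)

formatted = pretty_print_xml(sample)
print(formatted)
```

Once every element sits on its own line, a grep for something like id="2" returns one short line instead of the whole file.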
The second tool, the Perl module XML-Twig, was the icing on the cake! You can download it here: http://search.cpan.org/dist/XML-Twig/ and then all you need to do is install it using the provided Makefile and you’re up and running. How did this help? Well, among the hundreds of options provided with XML-Twig there’s a tool called xml_split. Just type xml_split my_file.xml and it will split your large, unreadable file into smaller, manageable ones. You can find more information about the usage here: http://search.cpan.org/dist/XML-Twig/tools/xml_split/xml_split
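The idea behind splitting can be sketched in a few lines of Python. This is only an illustration of the concept, not xml_split itself: it uses xml.etree.ElementTree from the standard library, cuts a made-up document at the children of the root element, and keeps the pieces in a list rather than writing files. The function name split_top_level and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def split_top_level(xml_text: str) -> list[str]:
    """Return each direct child of the root element as its own XML string."""
    root = ET.fromstring(xml_text)
    return [ET.tostring(child, encoding="unicode") for child in root]


# Made-up document, standing in for the real file.
sample = (
    '<catalog><item id="1"><name>first</name></item>'
    '<item id="2"><name>second</name></item></catalog>'
)

chunks = split_top_level(sample)
for number, chunk in enumerate(chunks, start=1):
    print(number, chunk)
```

Each chunk is a small, well-formed document of its own, which is what makes the pieces easy to open, read, and test in isolation.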
With just these two tools it’s easy to write unit tests against smaller sections of the XML file. It’s also much easier to find the problematic sections, make changes, and test the fix!
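A unit test against one small section might look like the following sketch. Everything in it is assumed for illustration: the item section, the price field, and the item_price function are hypothetical stand-ins for whatever your legacy code actually reads from the file.

```python
import unittest
import xml.etree.ElementTree as ET

# Hypothetical section, as it might look after splitting the big file.
SECTION = '<item id="1"><name>first</name><price>9.99</price></item>'


def item_price(section_xml: str) -> float:
    """Hypothetical stand-in for the legacy code under test."""
    item = ET.fromstring(section_xml)
    return float(item.findtext("price"))


class ItemSectionTest(unittest.TestCase):
    def test_price_is_parsed(self):
        self.assertAlmostEqual(item_price(SECTION), 9.99)

    def test_missing_price_is_an_error(self):
        # findtext returns None when the element is absent, so float() raises.
        broken = '<item id="2"><name>second</name></item>'
        with self.assertRaises(TypeError):
            item_price(broken)
```

Saved as a test file, this runs with python -m unittest. Once a problematic section is found, copying it into a test like this pins the bug down before you touch the legacy code.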
What tools do you use?
Get in touch via my homepage if you have questions or comments!