This blog post was written by Intechnica Senior Developer Edward Woodcock. It originally appeared on his personal website, which you can view here.
Test Driven Development (TDD) is a development methodology created by Kent Beck (or at least, popularised by him), which focuses on testing as not just the verification of your code, but the force that drives you to write the code in the first place.
Everyone who’s been exposed to TDD has come across the mantra:
Red. Green. Refactor. Repeat.
For the uninitiated that means you write a unit test that fails first. Yup, you aim to fail. Then, you write the simplest code that’ll make that test (and JUST that test) pass, which is the “Green” step (obviously you have to re-run the test for it to go green). Then you’re safe to refactor the code, as you know you’ve got a test that shows you whether it still works or not.
So, that’s your standard dev cycle, RGRR. Every now and again you might drop out and have a poke around the UI if you’ve plugged anything in, but you’ll generally just be doing your RGRR, tiny little steps towards a working system.
Finally, when you’re happy with the release you’re working on, you deploy your code to production and go for a beer, safe in the knowledge that your code will work, right?
Well, yes, and no. You’re safe in the knowledge that your business logic is “correct” as you understand it, and that your code is as robust as your tests are. But if your app is a large-scale internet (or even intranet) application with many concurrent users, you’ve only covered half the bases, because having the green tests that say that the system works as expected is useless if the system is so clogged that no-one can get on it in the first place. Even a system that simply degrades in performance under high- (or worse, normal-) load scenarios can leave a sour taste in the mouth of your otherwise happy customers.
So, the question is: how can you slide in performance as part of your RGRR cycle? Well, TDD says that the tests should drive the system, but they don’t specify what type of test you need to use. A common thought is to add automated UI tests into the mix, to be run in batches on a release, but why not add in a performance test as part of the build-verification process.
Going from our RGRR cycle, first we need a failing test. So, decide on a load model for the small section of the system you’re working on right now. Does it need to handle a hundred users at once, or will it likely be more like ten? It makes sense to go a little over what you might expect on an average day, just to give yourself some extra headroom.
Next, pick a load time that you think is acceptable for the action under test. If you’re loading search results, do they need to be quick, or is it more of a report that can be a fire-and-forget action for the user? Obviously input may be required from your client or Product Owner as to what they consider to be an acceptable time to carry out the action, as there’s nothing worse than being proud of the performance of something when it didn’t need to be fast, as then you’ve wasted effort that could be used elsewhere!
Now you’re going to want to run the test, in as close a match to your live environment as possible. At this point I need to point out that this is more of a theoretical process than a set of steps to take, as you’re unlikely to have one production-like environment for each developer! If needs be, group up with other developers in your team and run all your tests back-to-back. Make sure you have some sort of profiling tool available, as running the tests in a live-like environment is the key here, if you run them locally you’ll not be able to replicate the load effectively (unless you actually develop on the live server!).
If you’re using an iterative development approach and this is the first time you’ve run a test on this particular piece of functionality, most likely your test will fail. Your average response time under load will be above your target, and you may not even get up to the number of users you need to account for.
So, that’s the “Red” step accounted for, so how do you get to green? This is where we start to diverge from the RGRR pattern, as to get good performance you’ll need to refactor to make the test green. If you can’t run the same test, just take a few stabs through the UI manually to get some profiler results, and spend the time you’ve saved waiting for the test to complete thinking about how you can implement tests that can be run locally and from a load injector.
Hopefully your profile should have some lovely big red bars that show you where the hotspots are for your particular piece of functionality and you can use this information to refactor to make the algorithm less complex, the DB call faster, or to add some caching. If you’re being a rigorous TDD-adherent you’ll probably only want to make a single architectural change before you re-run the performance test, but in most cases you’ll want to do as many things as you can think of as performance tests on a live-like system won’t be available all the time.
Once you’re happy with that, re-run your test, this should be the “Green” step, if it’s not you should go back and refactor the code again until you hit your target. If you’re struggling to find enough headroom from code or architecture changes to hit your performance targets you most likely need to either leverage more hardware or refactor your UI to divert traffic to other areas.
Right then, we’ve done Red, Refactor, Green, next comes “Repeat”. If you think you can eke out more speed from the area you’re working on, you can go back and adjust your test load, but if you’ve gone for a known (or expected) production load with a little extra on top you probably don’t want to waste time on that. After all, when you’re practicing TDD you do just enough to hit your target, and then move on.
So what’s next? Well, next you implement another piece of functionality, and do another load test. As you go along you’ll eventually build up quite a collection of load test scripts, one for each functionality area in the system, and you should run these together each time you add a new piece of functionality, just like in a unit testing session. However, I’d avoid doing tests on multiple new pieces of functionality at once the first time around, as you will likely come across a situation where a single piece of functionality knocks off performance across the board, giving you a big spreadsheet full of red boxes.
If you follow this method (RRGR) throughout the development lifetime of your system you should have a rock-stable system that can quantifiably cope with the expected amount of load, and then some. This is a great situation to be in when you’re planning new functionality, as you’ll rarely have to worry about whether you have enough headroom on your boxes to implement killer feature X, and can instead worry about really nailing that cool new bit of functionality.