test driven development

Performance testing is not a luxury, it’s a necessity

Recently I chanced upon a post posing the question of whether Software Testing is a luxury or a necessity. My first thought was that testing should not be a luxury; it’s much more expensive to face an issue when it arises in a live system, so if you can afford to do that, perhaps that’s the true “luxury”. However, testing is now accepted as a fundamental aspect of the software lifecycle and Test-Driven Development (TDD) stresses this aspect.

Unfortunately, software testing is too often understood only as functional testing, and this is a serious limitation. Only if your software is supposed to be used by a very small number of users can you avoid caring about performance; only if your software manages completely trivial data can you avoid caring about security. Nevertheless, even the more advanced companies that use TDD often don’t consider performance and penetration tests; or, at least, they do so only at the User Acceptance Test (UAT) stage.

Working for a company that is very often asked to run performance tests for clients in the last few days before a live release, we are frequently faced with all the problems that the development team has ignored. It’s hard to have to say “your system can’t go live with the expected workload” when the client’s marketing team has already advertised the new release.

Intechnica is stressing the performance aspect so much that we are now following the “Performance-Driven Development” approach. The performance requirements are collected together with the functional ones, addressing the design in terms of system architecture, software and hardware. Then the performance tests are run together with the unit tests during the entire software lifecycle (following the Continuous Integration practice).

I think that such an extreme approach may not be suitable for everybody. Recently, however, I tested the performance of a new web application for a financial institution. We had already tested it 3 months earlier, but during those 3 months the development team had added new features and fixed some issues, and the web interface had changed slightly. As a result, almost all the test scripts became unusable and had to be fixed.

This tells me that the performance requirements must be considered throughout the entire development stage. They must be expressed and included in the unit tests because that is the only place where defined software contracts are used. Performance tests can be done at the UAT stage, but, just as no serious company would implement a UAT without having first functionally tested the code, so should you measure performance during the development stage. Additionally, an Application Performance Management (APM) tool is highly advisable to monitor the application and to find the cause of performance issues rapidly, in development as well as in the production environment.
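As a rough illustration of what "performance requirements in the unit tests" could look like, here is a minimal Python sketch. The function `find_matching_accounts` and the half-second budget are assumptions made up for this example; a real budget would come from the agreed requirements.

```python
import time
import unittest


def find_matching_accounts(accounts, prefix):
    """Hypothetical function under test: filter account names by prefix."""
    return [name for name in accounts if name.startswith(prefix)]


class SearchPerformanceTest(unittest.TestCase):
    # Assumed time budget in seconds, not a value from any real requirement.
    MAX_SECONDS = 0.5

    def test_search_meets_time_budget(self):
        accounts = [f"acct-{i:06d}" for i in range(100_000)]

        start = time.perf_counter()
        result = find_matching_accounts(accounts, "acct-0999")
        elapsed = time.perf_counter() - start

        # Functional check and performance check sit side by side.
        self.assertEqual(len(result), 100)
        self.assertLess(elapsed, self.MAX_SECONDS)
```

A test like this can be run with `python -m unittest` alongside the functional tests. A single-threaded timing on a developer machine says little about behaviour under load, so treat it as an early warning, not a replacement for a proper load test.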

Is testing a luxury? I’d prefer the luxury of a good dinner to the “luxury” of wasting money on an application found unfit for launch on the “go live” day.

This post was contributed by Cristian Vanti, one of our Performance Consultants here at Intechnica.

Approaching Application Performance with TDD

This blog post was written by Intechnica Senior Developer Edward Woodcock. It originally appeared on his personal website, which you can view here

Test Driven Development (TDD) is a development methodology created by Kent Beck (or at least, popularised by him), which focuses on testing as not just the verification of your code, but the force that drives you to write the code in the first place.

Everyone who’s been exposed to TDD has come across the mantra:

Red. Green. Refactor. Repeat.

For the uninitiated that means you write a unit test that fails first. Yup, you aim to fail. Then, you write the simplest code that’ll make that test (and JUST that test) pass, which is the “Green” step (obviously you have to re-run the test for it to go green). Then you’re safe to refactor the code, as you know you’ve got a test that shows you whether it still works or not.
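To make the cycle concrete, here is a tiny sketch in Python. The `apply_discount` function is a made-up example. In a real session the test class would be written first and would fail (Red) because the function does not exist yet; the function shown is then the simplest code that makes it pass (Green).

```python
import unittest


def apply_discount(price, percent):
    """Simplest implementation that makes the test below pass (the Green step)."""
    return price * (100 - percent) / 100


class ApplyDiscountTest(unittest.TestCase):
    # Written before apply_discount existed, so the first run was Red.
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(200, 10), 180)
```

With the test green, the body of `apply_discount` can be refactored freely, and re-running the test tells you whether it still works.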

Test Cycle - Red is where I'm going to do my next dev step (or a long-outstanding issue I'm marking with a failing test)

So, that’s your standard dev cycle, RGRR. Every now and again you might drop out and have a poke around the UI if you’ve plugged anything in, but you’ll generally just be doing your RGRR, tiny little steps towards a working system.

Finally, when you’re happy with the release you’re working on, you deploy your code to production and go for a beer, safe in the knowledge that your code will work, right?

Continuous Integration - Useful for seeing your project's red/green status

Well, yes, and no. You’re safe in the knowledge that your business logic is “correct” as you understand it, and that your code is as robust as your tests are. But if your app is a large-scale internet (or even intranet) application with many concurrent users, you’ve only covered half the bases, because having the green tests that say that the system works as expected is useless if the system is so clogged that no-one can get on it in the first place. Even a system that simply degrades in performance under high- (or worse, normal-) load scenarios can leave a sour taste in the mouth of your otherwise happy customers.

So, the question is: how can you slide in performance as part of your RGRR cycle? Well, TDD says that the tests should drive the system, but it doesn’t specify what type of test you need to use. A common thought is to add automated UI tests into the mix, to be run in batches on a release, so why not add a performance test as part of the build-verification process too?

Going from our RGRR cycle, first we need a failing test. So, decide on a load model for the small section of the system you’re working on right now. Does it need to handle a hundred users at once, or will it likely be more like ten? It makes sense to go a little over what you might expect on an average day, just to give yourself some extra headroom.
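One way to write that decision down is as a small data structure. This is only a sketch: the `LoadModel` name, its fields, and the 20% default headroom are assumptions for illustration, not figures from any particular project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadModel:
    """Hypothetical load model for one area of the system."""

    expected_users: int          # concurrent users on an average day
    headroom_percent: int = 20   # assumed extra margin on top of that

    @property
    def target_users(self) -> int:
        # Integer arithmetic, rounded up, so the target is never
        # below the expected load plus the requested headroom.
        scaled = self.expected_users * (100 + self.headroom_percent)
        return (scaled + 99) // 100
```

For example, `LoadModel(expected_users=100).target_users` gives 120, which becomes the number of concurrent users the load test needs to reach.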

Response Time Comparison - If your graph looks something like this you're probably doing OK!

Next, pick a load time that you think is acceptable for the action under test. If you’re loading search results, do they need to be quick, or is it more of a report that can be a fire-and-forget action for the user? You may need input from your client or Product Owner on what they consider an acceptable time for the action. There’s nothing worse than being proud of the performance of something that didn’t need to be fast, because you’ve wasted effort that could have been used elsewhere!

Now you’re going to want to run the test in as close a match to your live environment as possible. At this point I need to point out that this is more of a theoretical process than a set of steps to take, as you’re unlikely to have one production-like environment for each developer! If needs be, group up with other developers in your team and run all your tests back-to-back. Make sure you have some sort of profiling tool available. Running the tests in a live-like environment is the key here: if you run them locally you won’t be able to replicate the load effectively (unless you actually develop on the live server!).
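To show the shape of such a test, here is a toy load injector in Python. It is a sketch only: `run_load` and its parameters are names invented for this example, and `action` stands in for whatever request you would fire at the system. Real load would normally come from a dedicated load testing tool running on separate injector machines.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(action, users, requests_per_user):
    """Call action() from up to `users` threads and return the
    elapsed time in seconds of every individual call."""

    def one_user():
        timings = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            action()
            timings.append(time.perf_counter() - start)
        return timings

    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(one_user) for _ in range(users)]
        # Flatten the per-user timing lists into one list.
        return [t for future in futures for t in future.result()]
```

Calling `run_load(my_request, users=10, requests_per_user=5)`, where `my_request` is your own function, returns 50 timings that can then be compared against the target response time.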

If you’re using an iterative development approach and this is the first time you’ve run a test on this particular piece of functionality, most likely your test will fail. Your average response time under load will be above your target, and you may not even get up to the number of users you need to account for.
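The pass/fail decision described here can be sketched as a small check. The `LoadTestResult` type and the `evaluate` function are hypothetical names for this example; the inputs would come from whichever load testing tool you use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class LoadTestResult:
    """Hypothetical summary of one load test run."""

    users_reached: int
    response_times: Sequence[float]  # seconds, one entry per request


def evaluate(result, target_users, target_avg_seconds):
    """Return (passed, reasons); reasons lists why the run is Red."""
    reasons = []
    if result.users_reached < target_users:
        reasons.append(
            f"reached {result.users_reached} of {target_users} users"
        )
    if not result.response_times:
        reasons.append("no responses recorded")
    else:
        average = mean(result.response_times)
        if average > target_avg_seconds:
            reasons.append(
                f"average {average:.2f}s is above target {target_avg_seconds:.2f}s"
            )
    return (not reasons, reasons)
```

A first run that only reaches 60 of 100 users with a 3 second average against a 2.5 second target would come back as failed with two reasons, which is exactly the "Red" you expect at this point.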

So, that’s the “Red” step accounted for, so how do you get to green? This is where we start to diverge from the RGRR pattern, as to get good performance you’ll need to refactor to make the test green. If you can’t run the same test, just take a few stabs through the UI manually to get some profiler results, and spend the time you’ve saved waiting for the test to complete thinking about how you can implement tests that can be run locally and from a load injector.

Profiling in action: DynaTrace giving us a realtime comparison. We're looking for some red bars, which indicate worse performance.

With luck, your profile will have some lovely big red bars that show you where the hotspots are for your particular piece of functionality. You can use this information to refactor: make the algorithm less complex, make the DB call faster, or add some caching. If you’re being a rigorous TDD adherent you’ll probably only want to make a single architectural change before you re-run the performance test, but in most cases you’ll want to do as many things as you can think of, as performance tests on a live-like system won’t be available all the time.

Once you’re happy with that, re-run your test. This should be the “Green” step; if it’s not, go back and refactor the code again until you hit your target. If you’re struggling to find enough headroom from code or architecture changes to hit your performance targets, you most likely need to either add more hardware or refactor your UI to divert traffic to other areas.

Right then, we’ve done Red, Refactor, Green, next comes “Repeat”. If you think you can eke out more speed from the area you’re working on, you can go back and adjust your test load, but if you’ve gone for a known (or expected) production load with a little extra on top you probably don’t want to waste time on that. After all, when you’re practicing TDD you do just enough to hit your target, and then move on.

Repeat Load Test - When you do your repeat you can add in the "Change" column, which helps identify possible areas of concern for the next cycle.

So what’s next? Well, next you implement another piece of functionality, and do another load test. As you go along you’ll eventually build up quite a collection of load test scripts, one for each functionality area in the system, and you should run these together each time you add a new piece of functionality, just like in a unit testing session. However, I’d avoid doing tests on multiple new pieces of functionality at once the first time around, as you will likely come across a situation where a single piece of functionality knocks off performance across the board, giving you a big spreadsheet full of red boxes.
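The run-to-run comparison across your collection of scripts can be sketched like this. The `change_report` function and the 10% tolerance are assumptions for the example; the inputs are average response times in seconds keyed by script name.

```python
def change_report(previous, current, tolerance_percent=10.0):
    """Map each script name to (percent change, flagged).

    Scripts with no usable previous result get (None, False),
    as there is nothing to compare them against yet.
    """
    report = {}
    for name, now in current.items():
        before = previous.get(name)
        if not before:  # missing or zero baseline
            report[name] = (None, False)
            continue
        change = (now - before) / before * 100
        # Only slowdowns beyond the tolerance are flagged.
        report[name] = (round(change, 1), change > tolerance_percent)
    return report
```

If `search` goes from 2.0s to 3.0s while `login` stays at 1.0s, the report flags `search` with a 50.0% change and leaves `login` alone. When every entry is flagged after adding one feature, that feature is the first place to look.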

If you follow this method (RRGR) throughout the development lifetime of your system you should have a rock-stable system that can quantifiably cope with the expected amount of load, and then some. This is a great situation to be in when you’re planning new functionality, as you’ll rarely have to worry about whether you have enough headroom on your boxes to implement killer feature X, and can instead worry about really nailing that cool new bit of functionality.