Author: Edward Woodcock

I'm a Senior Developer at Intechnica. In my spare (coding) time I like to write programs involving Neural Networks, usually games, so I don't get too rusty at the AI programming I learned at University. I'm also usually in the middle of writing various bits of jQuery-plugin, node.js or C# goodness, usually for the purpose of making things either faster or easier to make fast.

The Twitter Performance Revamp

So, Twitter have attempted to improve the performance of their site again.

Interestingly, they've decided to back away from the JS-driven #! (hashbang) approach that caused so much fuss last year, and move back to an initial-payload method (i.e. they're sending the content with the original response, instead of sending back a very lightweight response and loading the content with JavaScript).

Their reasons for doing so all boil down to “We think it makes the site more performant”. They’ve based their measurement of “How fast is my site?” on the time until the first tweet on any page is rendered, and with some (probably fairly involved) testing, they’ve decided that the initial-payload method gives them the fastest “first tweet” time.

The rest of their approach involves grouping their JavaScript into modules, and serving modules in bundles based on the required functionality. This should theoretically allow them to only deliver content (JavaScript, in this case) that they need, and only when they need it.

I have a few issues with the initial-payload approach, from a performance optimisation standpoint (ironic, huh?), and a few more with their reasons for picking this route.

Let's start right at the beginning, with the first request from a user who's never been to your site before (or any site at all, for that matter). They've got a completely empty cache, so they need to load all your assets. These users probably account for between 40 and 60 percent of your traffic.

For a blank request to an arbitrary tweet, the user has to load 18 separate items of content, plus perform a POST request to "/scribe", which seems to be some form of tracking. Interestingly, this POST request is the second request fired from the page, and blocks the load of static content, which is obviously fairly bad.

A Firebug net panel break-down of the load times from the linked tweet (click to view full size)

Interestingly, most of the content that's loaded by the page is not actually rendered in the initial page request; it's additional content that is needed for functionality that's not available from the get-go. The response is also not minified; there's a lot of whitespace in there that could quite easily be removed, either on-the-fly (which is admittedly CPU intensive) or at application deploy time.

There we have stumbled upon my issue with the reasons for Twitter taking the initial-payload approach to content delivery: They’re doing it wrong.

They are sending more than just the initial required content with the payload; in fact, they're sending more than ten modal popouts, each of which is obviously only used in its own specific circumstance. Quite simply, this is additional data that does not need to be transferred to the client yet.

I've un-hidden all the additional content from the linked tweet page in this screenshot; there are around 10 popups. I particularly like the SMS codes one, I wonder what percentage of Twitter users have ever seen that?

The worst thing is, if someone comes onto the new Twitter page without JavaScript enabled, they're going to get a payload that includes a bunch of modal popouts that they cannot use without a postback and a new payload. So, people who have JS enabled are getting data in the initial payload that could be loaded (and cached) on the fly after the page is done rendering; people without JS enabled are getting data in the initial payload that they simply can't make use of.

How would I handle this situation, in an ideal world?

Well, first I'd ditch the arbitrary "first-tweet" metric. If they were really worried about the time until the first tweet is rendered, they'd be sending the first tweet, and the first tweet only, with the initial payload. That's a really easy way to hit your performance target: make an arbitrary one, and arbitrarily hit it. My performance target would be the time until the page render is complete: the time from the initial request to the server right up to the point where the background is done rendering and the page is fully functional, i.e. the user can achieve the action they came to the page to do. If the page displays a single Tweet, it's fully functional when the Tweet is readable; if it's a sign-up page, it's fully functional when the user can begin to fill out the sign-up form, etc. This forces the developers to think about what really needs to go with the initial payload, and what content can be dynamically loaded after rendering is complete.

I would then look at the fastest way to get to a fully functional page, as soon as possible. In my opinion, the way to do this is to have a basic payload, which is as small as possible but large enough to give the end-user an indication that something is happening, and then dynamically load the minimum amount of data that will fill the page content. After the page content is retrieved and rendered, that’s the time to start adding things like popouts, adverts, fancy backgrounds, etc., things that add to the user experience, but that are not needed in the initial load in 100% of cases.

I’ve written a really simple application to test how fast I can feasibly load a small amount of data after the initial payload, and I estimate that by using the bare minimum of JavaScript (a raw XHR request, no jQuery, no imports) at the bottom of the BODY tag, I can cut parsing down to around 10ms, and return a full page in around 100ms locally (obviously the additional content loads are not counted towards this, I’d probably not load the images automatically if this were a more comprehensive PoC).

A break-down of the mockup I made, which loads the tweet content from the original linked tweet. It's a contrived example, but it's also a good indicator of the sort of speed you can get when you really think about optimising a page.

Add in some static assets and render time, and you’re still looking at a very fast page turnaround, as well as a very low-bandwidth cost.
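To make this concrete, here's a minimal sketch of the kind of page I'm describing (not the actual mockup; the endpoint URL and element id are invented for illustration):

// Minimal sketch of the "basic payload" approach: this script sits at the
// very bottom of the BODY tag, so it runs as soon as the page shell has
// parsed. The /api/tweet/123 endpoint and element id are assumptions.
var xhr = new XMLHttpRequest();
xhr.onreadystatechange = function () {
  if (xhr.readyState === 4 && xhr.status === 200) {
    // Render the core content the moment it arrives; popouts, adverts and
    // fancy backgrounds can all be fetched after this has rendered.
    document.getElementById('tweet-container').innerHTML = xhr.responseText;
  }
};
xhr.open('GET', '/api/tweet/123', true);
xhr.send();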

The idea that Twitter will load all their JS in modules is not a completely bad one. Obviously there will be functionality that has its own JS scripts that can be loaded separately, and loading them when not needed is fairly daft. However, it really depends on numbers; each separate JS payload costs you a few hundred milliseconds of wait time (if you're not using a cached version of the script), so you'll want to group the JS into as few packages as possible. If you can avoid loading packages until they're really needed (i.e. if you have lightbox functionality, you could feasibly load the JS that drives the functionality while the lightbox is fading into view) then you can really improve the perceived performance of your site. The iPhone is a good example of this; the phone itself is not particularly performant, but by adding animations that run while the phone is loading new content (such as when switching between pages of apps on the homescreen) the perceived speed of the phone is very high, as the user doesn't notice the delay.
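To sketch that lightbox example (all names here are hypothetical, not Twitter's actual module loader): start the download as the fade begins, so the bundle is usually parsed by the time the animation ends.

// Hypothetical sketch of on-demand module loading: the bundle URL and the
// initLightbox entry point are assumptions, not a real API.
function loadScript(src, done) {
  var script = document.createElement('script');
  script.src = src;
  script.onload = done; // fires once the bundle has downloaded and parsed
  document.body.appendChild(script);
}

function openLightbox(element) {
  element.className += ' fade-in'; // CSS transition starts the animation
  // Fetch the lightbox JS while the fade is running; by the time the
  // animation finishes, the functionality is usually ready to wire up.
  loadScript('/js/lightbox-bundle.js', function () {
    window.initLightbox(element);
  });
}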

The final thing I would look at is JavaScript templates. John Resig wrote a very small, very simple JS template function that can work very well as a micro view engine; my testing indicates it can be parsed in around 1ms on my high-spec laptop. Sticking this inline in the initial payload, along with a small JS template that can be used to render data returned from a web service, will allow very fast turnaround of render times. Admittedly, on browsers with slow JS parsing this will provide a worse response turnaround, but that's where progressive enhancement comes into play. A quick browser sniff would let you pass a parameter to your service telling it to return raw HTML instead of a JSON string, which would allow faster rendering where JS is your bottleneck.
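To give a flavour of the idea, here's a drastically simplified stand-in for that kind of micro view engine (this is not Resig's actual implementation, and the data shape is invented):

// A toy micro-template: swap {name} placeholders for values on a data
// object returned from a web service.
function render(template, data) {
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    return data[key] != null ? data[key] : '';
  });
}

// Usage with a made-up tweet payload:
var html = render(
  '<div class="tweet"><strong>{user}</strong><p>{text}</p></div>',
  { user: 'edwoodcock', text: 'Initial payloads should be tiny.' }
);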

The key thing to take from this is that if you are looking for extreme performance gains you should look to minimise the initial payload of your application. Whether you should do this via JavaScript with a JS view engine and JSON web services, or via a minified HTML initial payload, really depends on your user base; there's never a one-size-fits-all approach in these situations.

Using JS as the main driver for your website is a fairly new approach that really shows how key JavaScript is as the language of the web. It's an approach that requires some careful tweaking, as well as a very in-depth knowledge of JS, user experience, and performance optimisation; it's also an approach that, when done correctly, can really drive your website performance into the stratosphere. Why a website like Twitter, with no complex logic and a vested interest in being as performant as possible, would take a step back towards the bad old days of "the response is the full picture" is a question I did not expect to have to ask myself.

Visit Ed’s blog at http://ed-j.co.uk/

Anaemic Domain Model

Martin Fowler once wrote an article “AnemicDomainModel” decrying the “anti-pattern” of a service layer that executes actions (as opposed to delegating them to an entity), combined with entities that are just data buckets of getters and setters.

His claim is that this kind of application architecture is wrong because “…it’s so contrary to the basic idea of object-oriented design; which is to combine data and process together.”

Right.

Object oriented design, i.e. designing around conceptual “object” blocks that model distinct entities in your system, has always been a bit of a flawed model, in my opinion. When you go shopping, let’s say to buy a DVD, you grab a basket and wander down to the DVD aisle. You check for the next boxed set of your favourite TV series, grab a copy, put it in your basket, wander across to the till (yeah yeah, no-one gets a basket for one item, just roll with it for now) and pay.

In a standard OO model the Service layer is simply a dumb abstraction that provides delegation, whereas the Domain contains the bulk of the application code.

When you “added” your item to the basket, the product did nothing, and neither did the basket; the entity that performed the action was your hand. That’s because products and baskets can’t do anything, a product is just an inanimate thing (obviously there are exceptions to this, such as if you’re buying a puppy, but there are exceptions to almost everything), and a basket is just a receptacle for inanimate (or animate) things. Still think your product needs an AddToBasket(Basket basket)?

So, if we're building an eCommerce system (a full-on ecosystem, not just the B2C frontend; warehouse management, invoicing etc.), which of these will have product data? I'm going to say all of them.

You’ve got options at this point. You can say “No! I’m not going to pollute my applications with shared logic and shared datatypes and shared… well, anything”. I admit, there are some serious downsides to having shared functionality, the main one being the inability to make changes without risk of regression. But risk can be managed, and the organisational upsides of the development velocity that can come from only writing code once, as opposed to once per system, are massive.

Common functionality is unlikely to live on your Product class. Warehouses don't have baskets; customers don't buy quarterly overviews; your accountant probably doesn't care too much about how much stuff fits onto a pallet. But common data very much is shared: everyone will need to know what the product is (warehouses need to know that Product #45126 is a crate of beans, not the wrongly-marked ostrich steak they can't fit on a pallet; the accountant will need to know that a crate of tennis balls can't possibly be priced at $4,000 a pop), and everyone will need pretty much the same level of basic information.

How does this fit into a domain model?

In the ADM (I would like to co-opt this as the moniker for this pattern; I don't even slightly think it's insulting), our "domain model" is only "anaemic" in the data layer. We have a full-blown domain model at the "service" layer. There's just an extra conceptual leap between having a polymorphic product and a polymorphic service.

In the ADM, the Service layer contains the bulk of the logic, and therefore the bulk of the code, and the Domain is only used to model the data.

Under “normal” circumstances (old-skool circumstances, more like) you’ll probably have a factory that takes the “productType” attribute from the raw dataset and decides which type of product to instantiate, then whichever class that’s handling the basket addition delegates the AddToBasket() call to the newly-instantiated SubProduct.

With the newschool (super-awesome) method, we'll probably have a factory that takes the "productType" attribute from the Product class and decides which type of IBasketHandler to instantiate, then whatever class that's handling the basket addition delegates the AddToBasket() call to the newly-instantiated SubProductBasketHandler.
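Here's a rough sketch of both styles side by side (in JavaScript for brevity; every name is illustrative rather than lifted from a real codebase):

// --- Old-skool: behaviour lives on the product subtype ---
class Product {
  constructor(row) {
    this.id = row.id;
    this.productType = row.productType;
  }
  addToBasket(basket) {
    basket.items.push(this);
  }
}
class DvdProduct extends Product {
  addToBasket(basket) {
    // subtype-specific rules (age checks, bundle offers, ...) live here
    basket.items.push(this);
  }
}
function createProduct(row) {
  return row.productType === 'dvd' ? new DvdProduct(row) : new Product(row);
}

// --- New-school (ADM): the product is pure data; a handler owns the process ---
class DefaultBasketHandler {
  addToBasket(product, basket) {
    basket.items.push(product);
  }
}
class DvdBasketHandler extends DefaultBasketHandler {
  addToBasket(product, basket) {
    // the same subtype-specific rules, now stateless and easy to test
    super.addToBasket(product, basket);
  }
}
function createBasketHandler(product) {
  // the factory keys off the productType attribute on the data class
  return product.productType === 'dvd'
    ? new DvdBasketHandler()
    : new DefaultBasketHandler();
}

// Identical call-sites, very different separation of concerns:
var basket = { items: [] };
var row = { id: 45126, productType: 'dvd' };
createProduct(row).addToBasket(basket);            // old-skool
createBasketHandler(row).addToBasket(row, basket); // ADM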

See the difference? That's right, very little. But in terms of separation of concerns, we've gained a whole lot. Products don't need to know anything, so there's no danger of one of the juniors deciding that it's fine to have the Product(string productId) constructor load the product data from the database (performance and maintenance problems, here we come); IoC comes naturally, and testing is much easier with the kind of statelessness that comes with this pattern.

If we're talking semantics, yeah, we don't have a "domain" model anymore; we have a data model and a process model, and they're separate. But really, do your products have behaviour, or do various procedures happen to the products? Data and process should be separate; they're fundamentally different things. Each shapes the other, but there's a fundamental divide between the two, in that process can (and will) change, while data is generally so static that it simply cannot.

So, who’s right? Well, I’d say that both models are appropriate, depending on context. If you’ve not got an ecosystem, and you’re fairly sure your single system will always be the one and only, and you’re confident you’ll get the best results from mixing process and data then go for it. If you’re worried about maintainability and have more than one interlinked system, then think seriously about what’s going to give you the best results in the long term. Just be consistent, and if you’re not, be clear about why you’re deviating from your standards or you’ll end up with an unholy amalgamation of different patterns.

This post originally appeared on Ed’s personal site.

Approaching Application Performance with TDD

This blog post was written by Intechnica Senior Developer Edward Woodcock. It originally appeared on his personal website.

Test Driven Development (TDD) is a development methodology created by Kent Beck (or at least, popularised by him), which focuses on testing as not just the verification of your code, but the force that drives you to write the code in the first place.

Everyone who’s been exposed to TDD has come across the mantra:

Red. Green. Refactor. Repeat.

For the uninitiated, that means you write a unit test that fails first. Yup, you aim to fail. Then you write the simplest code that'll make that test (and JUST that test) pass, which is the "Green" step (obviously you have to re-run the test for it to go green). Then you're safe to refactor the code, as you know you've got a test that shows you whether it still works or not.
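As a tiny sketch of one loop (using Node's built-in assert module; the Basket class is invented for illustration):

// Red: these assertions fail first, because Basket doesn't exist yet.
// Green: the class below is the simplest code that makes JUST this pass.
var assert = require('assert');

function Basket() {
  this.items = [];
}
Basket.prototype.add = function (item) {
  this.items.push(item);
};

var basket = new Basket();
assert.strictEqual(basket.items.length, 0);
basket.add({ id: 1 });
assert.strictEqual(basket.items.length, 1);
console.log('green'); // safe to refactor now, the test has our back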

Test Cycle - Red is where I'm going to do my next dev step (or a long-outstanding issue I'm marking with a failing test)

So, that’s your standard dev cycle, RGRR. Every now and again you might drop out and have a poke around the UI if you’ve plugged anything in, but you’ll generally just be doing your RGRR, tiny little steps towards a working system.

Finally, when you’re happy with the release you’re working on, you deploy your code to production and go for a beer, safe in the knowledge that your code will work, right?

Continuous Integration - Useful for seeing your project's red/green status

Well, yes, and no. You’re safe in the knowledge that your business logic is “correct” as you understand it, and that your code is as robust as your tests are. But if your app is a large-scale internet (or even intranet) application with many concurrent users, you’ve only covered half the bases, because having the green tests that say that the system works as expected is useless if the system is so clogged that no-one can get on it in the first place. Even a system that simply degrades in performance under high- (or worse, normal-) load scenarios can leave a sour taste in the mouth of your otherwise happy customers.

So, the question is: how can you slide in performance as part of your RGRR cycle? Well, TDD says that the tests should drive the system, but it doesn't specify what type of test you need to use. A common thought is to add automated UI tests into the mix, to be run in batches on a release, but why not add in a performance test as part of the build-verification process?

Going from our RGRR cycle, first we need a failing test. So, decide on a load model for the small section of the system you’re working on right now. Does it need to handle a hundred users at once, or will it likely be more like ten? It makes sense to go a little over what you might expect on an average day, just to give yourself some extra headroom.

Response Time Comparison - If your graph looks something like this you're probably doing OK!

Next, pick a load time that you think is acceptable for the action under test. If you're loading search results, do they need to be quick, or is it more of a report that can be a fire-and-forget action for the user? Obviously input may be required from your client or Product Owner as to what they consider an acceptable time to carry out the action; there's nothing worse than being proud of the performance of something that didn't need to be fast, as then you've wasted effort that could have been used elsewhere!
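As a hedged sketch of what such a test might look like (the URL, user count, and 500ms target below are all invented; a real run would use a proper load injector against a live-like environment):

// Fire N concurrent requests and assert on the average response time.
var http = require('http');

function timeRequest(url) {
  return new Promise(function (resolve, reject) {
    var start = Date.now();
    http.get(url, function (res) {
      res.resume(); // drain the body; we only care about the timing
      res.on('end', function () {
        resolve(Date.now() - start);
      });
    }).on('error', reject);
  });
}

function loadTest(url, users, targetMs) {
  var requests = [];
  for (var i = 0; i < users; i++) {
    requests.push(timeRequest(url));
  }
  return Promise.all(requests).then(function (times) {
    var avg = times.reduce(function (a, b) { return a + b; }, 0) / times.length;
    console.log(users + ' users, average ' + avg.toFixed(0) + 'ms (target ' + targetMs + 'ms)');
    if (avg > targetMs) {
      throw new Error('Red: performance target missed');
    }
  });
}

loadTest('http://localhost:8080/search?q=beans', 100, 500);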

Now you're going to want to run the test, in as close a match to your live environment as possible. At this point I need to point out that this is more of a theoretical process than a set of steps to take, as you're unlikely to have one production-like environment for each developer! If needs be, group up with other developers in your team and run all your tests back-to-back. Make sure you have some sort of profiling tool available, as running the tests in a live-like environment is the key here; if you run them locally you'll not be able to replicate the load effectively (unless you actually develop on the live server!).

If you’re using an iterative development approach and this is the first time you’ve run a test on this particular piece of functionality, most likely your test will fail. Your average response time under load will be above your target, and you may not even get up to the number of users you need to account for.

So, that's the "Red" step accounted for; how do you get to green? This is where we start to diverge from the RGRR pattern, as to get good performance you'll need to refactor to make the test green. If you can't re-run the full load test, just take a few stabs through the UI manually to get some profiler results, and spend the time you've saved waiting for the test to complete thinking about how you can implement tests that can be run both locally and from a load injector.

Profiling in action: DynaTrace giving us a realtime comparison. We're looking for some red bars, which indicate worse performance.

Hopefully your profile will have some lovely big red bars that show you where the hotspots are for your particular piece of functionality, and you can use this information to refactor to make the algorithm less complex, to make the DB call faster, or to add some caching. If you're being a rigorous TDD-adherent you'll probably only want to make a single architectural change before you re-run the performance test, but in most cases you'll want to do as many things as you can think of, as performance tests on a live-like system won't be available all the time.

Once you're happy with that, re-run your test; this should be the "Green" step. If it's not, go back and refactor the code again until you hit your target. If you're struggling to find enough headroom from code or architecture changes to hit your performance targets, you most likely need to either leverage more hardware or refactor your UI to divert traffic to other areas.

Right then, we've done Red, Refactor, Green; next comes "Repeat". If you think you can eke out more speed from the area you're working on, you can go back and adjust your test load, but if you've gone for a known (or expected) production load with a little extra on top you probably don't want to waste time on that. After all, when you're practising TDD you do just enough to hit your target, and then move on.

Repeat Load Test - When you do your repeat you can add in the "Change" column, which helps identify possible areas of concern for the next cycle.

So what's next? Well, next you implement another piece of functionality, and do another load test. As you go along you'll eventually build up quite a collection of load test scripts, one for each functionality area in the system, and you should run these together each time you add a new piece of functionality, just like in a unit testing session. However, I'd avoid testing multiple new pieces of functionality at once the first time around, as you'll likely come across a situation where a single piece of functionality drags down performance across the board, giving you a big spreadsheet full of red boxes.

If you follow this method (RRGR) throughout the development lifetime of your system you should have a rock-stable system that can quantifiably cope with the expected amount of load, and then some. This is a great situation to be in when you’re planning new functionality, as you’ll rarely have to worry about whether you have enough headroom on your boxes to implement killer feature X, and can instead worry about really nailing that cool new bit of functionality.