
How to create templates of Amazon EC2 environments

Sometimes, when you’re developing an application (or even in the testing stage), it’s really useful to be able to revert to a previous iteration or branch off into several distinct environments. You might also want to set up identical environments to hand off to other people, such as team members, clients or even sales teams. Cloud platforms like Amazon Web Services and its Elastic Compute Cloud (EC2) service offer a cost-effective solution to this problem, especially compared to traditional (“tin”) infrastructure, but managing all these environments and machine images in the AWS Management Console can quickly become confusing. It would be easier if you could set up a template for each environment iteration and fire up environments directly from those templates.

This is actually very simple to achieve – even for those without deep technical knowledge of AWS or similar IaaS offerings – by using a neat application called CloudFlex (disclosure: yes, it’s made by Intechnica, but we find it very handy).

Step 1: Set up the template

CloudFlex uses a step-by-step wizard to guide you through the process of defining what your environment should look like. This includes the number and type of AMIs (Amazon Machine Images) that should be deployed, their security groups, elastic IPs, load balancers and everything else you might need. You can also give each template a descriptive name to help you identify it at a glance. All of your templates are shown in one place so you can see what you have saved and launch environments from them easily.
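
Under the hood, a template like this boils down to a saved set of EC2 launch parameters. As a minimal sketch (CloudFlex’s internal format isn’t public, so every field name, ID and value below is illustrative, not the product’s actual schema):

```python
# A hypothetical environment template expressed as plain data. The fields
# map onto standard EC2 launch parameters; the AMI ID, security group and
# load balancer names are made-up examples.
WEB_TIER_TEMPLATE = {
    "name": "webstore-uat-v2",            # descriptive name shown in the template list
    "image_id": "ami-0123456789abcdef0",  # the AMI to deploy
    "instance_type": "m1.large",
    "instance_count": 2,
    "security_groups": ["webstore-uat-sg"],
    "use_elastic_ip": True,
    "load_balancer": "webstore-uat-elb",
}
```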

Step 2: Launch an environment

Once you have your template set up, you can start an environment from it. You can do this manually, or schedule environments to start whenever and as frequently as you need (such as every morning at the start of the working day). When you start an environment manually, all the details are pre-populated from your chosen template, which keeps everything consistent across multiple environments. After the environment has finished spinning up, CloudFlex gives you quick access to details such as its public DNS; you can also connect to a machine image’s remote desktop from this screen. From there you can do whatever work you need to do on that machine.
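
To give a feel for what the launch step automates, here is a rough sketch of the equivalent EC2 API calls using boto3, the current AWS SDK for Python. This is purely illustrative and is not CloudFlex’s actual implementation; the region and the template fields are assumptions carried over from the sketch in step 1.

```python
# Sketch of the EC2 calls that a "launch environment" step wraps.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # region is an assumption


def launch_environment(template):
    """Start instances from a saved template and return their public DNS names."""
    response = ec2.run_instances(
        ImageId=template["image_id"],
        InstanceType=template["instance_type"],
        MinCount=template["instance_count"],
        MaxCount=template["instance_count"],
        SecurityGroups=template["security_groups"],  # security group *names*
    )
    instance_ids = [i["InstanceId"] for i in response["Instances"]]

    # Block until the instances are running, then read back their public DNS.
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)
    reservations = ec2.describe_instances(InstanceIds=instance_ids)["Reservations"]
    return [
        instance["PublicDnsName"]
        for reservation in reservations
        for instance in reservation["Instances"]
    ]
```

Calling `launch_environment(WEB_TIER_TEMPLATE)` with the template sketched in step 1 would start the instances and hand back the DNS names you’d use to connect.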

Step 3: Save environment snapshots

Now that you’ve connected to your AMI and done your work, you might want to keep a snapshot of that image, in its current state, to go back to later (or to distribute to team members etc.). You might not want to allow access to the AMI you’re working on in case something gets changed. The best solution is to create a new template from a snapshot of your AMI. To do this, go to the details page of your environment in CloudFlex, give the image a name and click “Create Machine Image”; the AMI will be copied in its current state to your AWS account. Now repeat steps 1 and 2, this time choosing your new AMI as the machine image for your template. You can then start up as many concurrent versions of the environment as you need and send remote desktop files for each to whoever needs access.
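
The “Create Machine Image” step corresponds to a single EC2 API call. A minimal sketch, again using boto3 purely for illustration (the instance ID and image name are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # region is an assumption


def snapshot_instance(instance_id: str, name: str) -> str:
    """Create a reusable AMI from an instance's current state."""
    response = ec2.create_image(
        InstanceId=instance_id,
        Name=name,       # the name you give the image
        NoReboot=False,  # allow a reboot for a consistent filesystem snapshot
    )
    return response["ImageId"]  # choose this AMI when building your new template
```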

CloudFlex is available from Intechnica from £99 per month. If you want to know more, visit the website, where you can sign up for a trial, or leave a comment below.


Performance Testing in the Cloud [Presentation]

Intechnica recently sponsored the British Computer Society’s Special Interest Group in Software Testing (SIGiST) summer conference in London. The SIGiST is a great place to come and listen to the country’s top software testers talk about methodologies, tools, new technology and experiences, as well as to meet others in the world of testing.

View a Photosynth panorama of the SIGiST conference in London

One of the speakers for this summer’s SIGiST conference was Intechnica’s own Richard Bishop, who contributes blog posts for this site. Richard spoke about Intechnica’s findings and observations from the use of cloud platforms in performance testing (we use TrafficSpike, based on Facilita Forecast, to generate load from the cloud for tests, as well as developing and migrating applications for and to the cloud).


Richard Bishop, speaking at the BCS SIGiST summer conference

The presentation was well received, gaining praise on Twitter via the #SIGiST hashtag.

https://twitter.com/webcowgirl/status/215770663586234368

The slides have since been uploaded to SlideShare, where you can view and download them.

15 Web Performance Nightmares, and the damage they caused

In a new section of the Intechnica blog, titled “Performance Nightmares”, we’re going to take a closer look at some of the most notorious, high-profile, brand-damaging website performance failures in the history of the internet. Now, the internet is a huge place, and every day, all sorts of websites struggle with performance issues of all kinds; not just the big websites, but also many smaller sites. A quick search on Twitter for “slow website” or “website down” shows the scope of this.

https://twitter.com/lukey868/status/191857279111405568

https://twitter.com/Frewps/status/191704594269736961

Of course, the cause of each of these problems could be any number of things, but these comments are entirely public, and potentially hundreds or even thousands of people could read these negative tweets. Bad publicity is not the only harmful effect of slow or failing websites, as a previous post on this blog shows; the business impact can be very damaging too. So to kick off “Performance Nightmares”, let’s jump right in with 15 examples spanning the last 11 years. I’ve split them into three categories: Marketing Oversights, Overwhelming Public Interest and Technical Hiccups. Be sure to check back (or subscribe) to the blog to see more Performance Nightmares as they are reported!

Marketing Oversights

The goal of any marketing campaign is to generate awareness of and interest in a brand or product, and to drive customers to find out more, ultimately converting leads to sales. In the past, the more interest you directed towards your product the better, as long as supply met demand. But in the following cases, so much more demand was created than expected that the websites fell over, locking out potential customers (not to mention regular and existing ones). While the companies often say “it’s a good problem to have” or “we were too successful for our own good”, the under-performing website has shot them in the foot, and customers old and new end up frustrated. Here are a few high-profile examples…

1. Nectar

When directing people to a website for a special offer, it’s worth checking they will all be able to access it

Setting the scene: When it launched in 2002, loyalty card scheme Nectar pushed hard with TV adverts and email marketing directed at over 10 million households, driving people to their newly launched service. While they had phone lines and direct mail as means of collecting registrations, Nectar attempted to save costs by offering a rewards incentive to people registering online via the website.

Performance Nightmare: While Nectar prepared for this by increasing their server capacity six-fold, a peak of 10,000 visitors in one hour was enough to bring the site down for three days. Nectar cited the complexity of the registration process (security & encryption) as a bottleneck.

Source

2. Glastonbury Festival

Website crashes cause people to express real disappointment

Glastonbury: Messy business

Setting the scene: As an iconic music festival for the past 42 years, Glastonbury needs no introduction. In fact, its fame and popularity on the global music calendar have long been a curse as well as a gift: demand greatly outstrips the availability of tickets, £2 million was spent in 2002 on a giant fence to keep out people who did not have a ticket, and festival-goers long complained of having to pay inflated prices to ticket touts online. The problem was compounded in 2005 when information leaked that acts like Oasis and Paul McCartney were set to headline.

Performance Nightmare: The ticketing website got two million impressions in the first five minutes, overloading the system and resulting in disappointment for many people seeking tickets. Bad news spreads fast: the BBC was flooded with emails about the service, and there were reports of people selling t-shirts displaying the error message shown by the website.

Source

3. Dr Pepper

Offer a free drink to 300 million people… what’s the worst that can happen? 

Setting the scene: This is almost the poster boy for a marketing campaign that did not take the limits of a website’s performance into account. Dr Pepper promised that, if Guns n’ Roses released the “Chinese Democracy” album in 2008, they would give everyone in the US a free Dr Pepper. When the album was released, Dr Pepper made good on their offer… limiting it to just one day. There are 300 million people in the United States. I think you can see where this is going.

Performance Nightmare: If you guessed that the site was overwhelmed and crashed, you’d be right. The traffic spiked dramatically, forcing Dr Pepper to add more server capacity and extend the offer by a day… which probably would have been a good idea in the first place.

Source

4. Reiss

The gift of free publicity can be a double-edged sword for web performance

Setting the scene: The internet has had a massive effect on the fashion industry, from e-commerce retail through to trend-setting and social media. Since her engagement and the globally covered event of her marriage to Prince William, Kate Middleton has become a fashion icon. So when the US Presidential first family met “Kate & Wills” in May 2011, in an event watched around the world, every fashion-hunter had their eye on what the duchess chose to wear. This was great publicity for the designer in question, Reiss…

Performance Nightmare: … until the sheer volume of interest crashed their website for two and a half hours. While it wasn’t a formal marketing campaign as such, the exposure of the brand via Kate Middleton meeting the Obamas was a major fashion event, and such a high-profile endorsement draws more traffic than perhaps any traditional marketing campaign could; traffic the website was unable to cope with.

Source

5. Ticketmaster, Ticketline, See Tickets, The Ticket Factory

Some events are in such high demand, they can take out multiple sites in one day

Setting the scene: Take That were one of the biggest boy bands of the 90s, and when they returned for a nationwide tour, women in their mid-20s would trample over their fathers to get tickets. When the band announced a huge tour with the full line-up, including Robbie Williams, the response among fans was frenzied. Tickets were stocked by many major websites, including Ticketmaster, Ticketline, See Tickets and The Ticket Factory.

Performance Nightmare: The demand for tickets was so high that fans flooded and crashed all four sites mentioned above. Would-be ticket buyers were forced to wait, in some cases all day, for their orders to be processed successfully (if they could get on the sites at all), and the websites continued to run slowly even after the tickets were all gone. Considering the popularity of Take That, it should be no surprise that this negative experience was widely shared and reported on in the mainstream media. Maybe those Take That fans should have had a little patience.

Source

6. Paddy Power

You think your website will perform when it matters… Wanna bet? 

Don’t let your site fall at the first hurdle

Setting the scene: The Grand National is the biggest betting event of the year in the UK; for many, it’s the only betting event of the year. Business is at its peak for bookies and betting websites, with the British public spending £80 million in bets on each year’s race. With fierce competition between betting websites for the business of both regular and once-a-year punters, many offer special deals, such as Paddy Power’s “Five Places” payout offer.

Performance Nightmare: So great was the demand for bets on Paddy Power, higher than on any other day in its history in fact, that the website came crashing down shortly before the race. The site was down for just 20 minutes, coming back up 15 minutes before the off, but even such a short outage was costly when there are many other betting websites to choose from. Studies show that, at busy times, 75% of customers will move on to a competitor’s website rather than suffer delays; this can only be compounded in a case as time-sensitive as getting good odds on the Grand National. 88% of people won’t come back to a website after a bad experience, and Paddy Power has since offered a free bet to all of its customers as damage control.

Source

Overwhelming Public Interest

Clearly there is a costly disconnect between marketing efforts and website performance considerations, but sometimes simple public interest in a product or service can back a website into a corner. Government and public service websites are increasingly becoming essential resources for the general public, and especially at service launches or at times of peak interest, these key web applications need to be able to scale – but sometimes don’t…

7. Swine Flu Pandemic

Curiosity killed this website of high public interest

Setting the scene: Back in July 2009, the UK was caught up in the supposed “Swine Flu Pandemic”. Some reports went as far as to say that up to 60,000 people could die of swine flu. To help ease the strain on the medical sector, the government decided to launch a website with a checklist of swine flu symptoms, giving appropriate advice to those who had them while putting those without at ease.

Performance Nightmare: With media scaremongering at a high, the website received 2,600 hits per second, or 9.3 million hits per hour, just two hours after launching. Unsurprisingly, the website crashed temporarily, although it was quickly restored; the huge traffic was put down to most people visiting out of “curiosity” and quickly leaving the site once they had decided on their diagnosis.

Source

8. UK Police Crime Maps

Even with higher than expected demand, websites can still be guilty of not being built to scale 

Setting the scene: In a move to increase transparency in crime statistics, the UK government launched a website in February 2011 allowing members of the public to access information about crime rates in their areas via markers on an interactive map. This received mainstream news coverage, with questions being raised about the accuracy and impact of the reports; for example, the effect on insurance rates or house prices.

Performance Nightmare: The crime maps were of such great public interest that the site received 18 million hits an hour on its first day, bringing it tumbling down within a few hours in a very public fashion. While it might seem forgivable for a website to fall down under 18 million hits an hour, it was clear that the site simply wasn’t designed to perform at any kind of scale, despite using Amazon EC2 machines to spin up extra capacity; “you still need to build a site that scales without needing 1000s of servers”.

Source

9. Census (1901 UK & 1940 USA)

Learn from other people’s mistakes, and make sure you can scale up enough

Data entry for the 1940 census web archive… sort of

Setting the scene: Although separated by 39 years and the Atlantic Ocean, these two census reports caused a very similar problem for their respective websites. As the census information came into the public domain (in 2002 for the UK, 2012 for the States), each government commissioned a website to host the historical data, which was placed into databases, with scanned images available for download. Both sites expected a high level of interest, and part of their remit was to cope with high load (the US census site was expected to support 10 million hits a day, while the UK census site was required to cater for 1.2 million users per day).

Performance Nightmare: Demand for each service was so overwhelming that it exceeded the targets set for both websites, bringing them down within hours. To start with the 2002 failure of the 1901 UK census: the website hit its 1.2 million user limit within just 3 hours, and the site was closed while ways to make it scale were investigated. It was closed in January and eventually reopened in August, with full functionality restored in November. Ten years later, the US government apparently hadn’t heeded this lesson in scaling a census website, as its site took 22.5 million hits within 3 hours of launching. Again, despite being hosted in Amazon’s AWS cloud, the site didn’t scale to meet demand, and it was forced to restrict its functionality when it came back online the next day.

Source

10. London 2012 Olympics

Web performance can be more like a marathon than a sprint

Setting the scene: The Olympics often causes controversy through the sheer level of interest it generates. Cities all over the world clamour to host the global event, as it draws in tourism and revenue, but with that come social, economic and logistical challenges, as even advanced cities prepare to welcome a sudden spike in population.

Performance Nightmare: There is almost too much to write about this one. In April 2011, a window to buy 6.6 million tickets through a public ballot came to an end; as tickets were not sold on a first-come, first-served basis, many people waited until the last minute to decide which tickets to bid on. The website slowed to a crawl late on the last day under heavy load, forcing the six-week window to be extended by several hours. A few months later, the Olympics ticket resale site opened, allowing people to buy and sell official tickets with each other, but this too failed to cope with the strain of demand. More problems arose in December and January, with further ticket website outages and cases of events being oversold.

Source

11. UCAS

Increasing adoption of the internet in general can impact your service performance

Setting the scene: Compared to post or call centres, the internet is a cost-effective way to collect information from lots of people at once, and with more people than ever having access to internet services, it makes sense to expect the public to use them. Indeed, UCAS now uses its website to allow students to book places on courses with vacancies during the clearing stage of university applications.

Performance Nightmare: In 2011, 185,000 students were chasing just 29,000 unfilled course places. The number of hopeful students logging into the UCAS clearing site quadrupled from the previous year, and UCAS was forced to shut the site down for over an hour to cope with the volume of incoming traffic, just as students were dependent on the service to find out the status of their applications.

Source

12. Floodline

Sometimes it’s not the volume of traffic, but what the traffic is doing that causes problems

Disclaimer: Not a realistic danger of a flooded website

Setting the scene: The UK Environment Agency’s National Floodline was set up in 2002 to provide instant information, via a call centre or over the web, about potential flood dangers across the UK. However, heavy rainfall over Christmas and New Year 2002/2003 caused a surge of activity on both channels.

Performance Nightmare: The sudden demand for information made the website unavailable for many. As the risk of flooding rose, phone enquiries climbed to a peak of 32,650 calls a day, and as people failed to get through, many turned to the website, where they would execute complicated searches to establish the impact of flooding in their area. At the peak, on 2 January, 23,350 people were hitting the site, and while the site was built to support a high number of users (and had successfully done so in the past), it was the complexity of the searches that was the main cause of bottlenecks. As the Environment Minister told a parliamentary committee, the website crash (which took the site out for several days) was not helped by the fact that so many people were at home over that period “and had little else to do except surf the net and look for flood information”.

Source

Technical Hiccups

Website performance problems are usually only brought to light when the site in question needs to perform better than ever, and site owners find themselves “victims of their own success” when a marketing push or genuine public interest floods their website. But there are also times when a glitch or error can bring on a Performance Nightmare. From hardware failure through to human error, such incidents have caused serious problems ranging from bad PR through to legal woes, and of course have cost their victims a lot of money.

13. Tesco

Customers needing a service won’t hesitate to go elsewhere 

Setting the scene: Online grocery site Tesco.com is a service used to order shopping for home delivery in the UK. Many people use it for their weekly grocery shop, as part of a busy lifestyle or perhaps through being unable to physically get to and from a supermarket. Tesco makes an estimated £255 million a year through online sales.

Performance Nightmare: In September 2011, the Tesco online service was halted for 2 hours by “technical glitches”. Disgruntled customers, who in some cases depend on getting specific delivery slots, were quick to go elsewhere with their custom, as many other UK supermarkets now offer an online delivery service.

Source

14. TD Waterhouse

A case of the financial impact being all too apparent

Setting the scene: TD Waterhouse, now known in the US as TD Ameritrade and elsewhere as TD Direct Investing, is an individual investment services company. Customers use its online service to trade stocks and shares. As of 2001, it was the second largest discount broker in the US.

Performance Nightmare: The stock broker’s website suffered significant outages that prevented customer orders from being processed on 33 different trading days between November 1997 and April 2000, with outages lasting up to 1 hour 51 minutes. This, along with TD Waterhouse’s failure to advise customers about alternative order methods and a general lack of customer service around the matter, led the New York Stock Exchange to fine TD Waterhouse $225,000. The company put the outages down to “software issues”. The Securities and Exchange Commission released a report in January 2001 calling on brokerage firms to improve in areas such as performance.

Source

15. JP Morgan Chase

Communication and prompt action are key when customers suffer from a web performance failure

Setting the scene: American bank JP Morgan Chase, which as of 2010 had $2 trillion in assets, provides an online banking service for its customers to manage their accounts and make transactions.

Performance Nightmare: On 14th September 2010, Chase’s online banking service went down “sometime overnight”, causing inconvenience for customers, who took to Twitter to vent. One user was quoted as tweeting, “Dear Chase Bank, I have about 10 million expense reports to do, please get your act together so I can see my transactions online!” While occasional online banking outages aren’t unusual, in this case the outage lasted around 18 hours.

Source

Got a contribution to the list? Leave it in the comments below!

Want to avoid a web performance nightmare of your own? Check out Intechnica’s Event Performance Management service!

Webinar: Designing Applications for the Cloud

This webinar, from 6th March 2012, was hosted by Intechnica‘s Technical Director, Andy Still. Andy talked about the key principles of designing and migrating applications for the cloud: scaling out, taking new and imaginative approaches to data storage, making full use of the wide range of products and services on offer from cloud providers (beyond hosting), and exploring the many flavours of hybrid solution that let all types of business leverage the benefits of the cloud.

Andy has architected and built a number of cloud-based applications, specialising in highly scalable, high-performance, business critical applications.

If you’re planning or considering a move to the cloud in 2012, then this webinar is essential viewing.

More Intechnica webinars

AWS instances, their ever-changing hostnames and the implications for software licensing

I’ve recently been doing some performance testing for a client and evaluating the use of dynaTrace for monitoring application performance under load. In addition to the installation of dynaTrace at the client site, we have a demonstration/evaluation licence installed on an AWS cloud server. As well as being useful for client demonstrations, this gives us the opportunity to perform proof-of-concept exercises and “try things out” away from production systems.

Last week, in an effort to save on the cost of keeping the AWS instance up and running all the time, I decided to shut the server down using the AWS console. When I went back to the server and restarted it, dynaTrace greeted me with a licensing error.

I did some investigation and found that dynaTrace locks the licence key to the hostname of the server on which it is installed. This is all well and good in a normal environment, but I noticed that the name of the host server changed each time I rebooted. When I installed dynaTrace, my machine name was ip-03a4d76; when I restarted the server, the name had changed to ip-0a3b11c9.

I looked at the server’s IP addresses and saw that, as the server restarted, the hostname changed whenever the private (internal Amazon) IP address changed, even though I was using an Elastic IP address to reach the server externally. The hostname was simply a hexadecimal representation of the private IP address.

My IP address was 10.59.17.201 and the hostname (which has since changed again) was ip-0a3b11c9 (0A = 10, 3B = 59, 11 = 17 and C9 = 201).
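
In other words, each octet of the private IP address is rendered as two hex digits. A few lines of Python (my own illustration, not anything AWS ships) reproduce the mapping:

```python
# The default Windows hostname on EC2 is the private IPv4 address with each
# octet rendered as two lowercase hex digits, prefixed with "ip-".
def hostname_from_private_ip(ip: str) -> str:
    return "ip-" + "".join(f"{int(octet):02x}" for octet in ip.split("."))

# 10 -> 0a, 59 -> 3b, 17 -> 11, 201 -> c9
assert hostname_from_private_ip("10.59.17.201") == "ip-0a3b11c9"
```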

I spoke to dynaTrace, the supplier of the software, and they told me that the licence can be tied to a MAC address rather than a hostname if required, but that didn’t help me, since I understand that MAC addresses also change each time an AWS instance restarts. Instead I looked at ways of fixing the hostname, and found that it was remarkably easy (when you know where to look).

On each Windows AWS server there is a program on the Start menu called “EC2 Service Properties”. Run this program and uncheck the “Set Computer Name” box; you can then set a hostname normally, and it will persist across reboots. Your hostname-dependent software can then be reinstalled or re-licensed, and you can relax in the knowledge that it will run properly the next time you restart your server.
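
If you prefer to script the change, the “Set Computer Name” checkbox corresponds, as far as I can tell, to the Ec2SetComputerName plugin in the EC2Config service’s settings file. Here is a sketch of flipping it programmatically in Python; the file path is the default install location and may differ on your server, so verify both before relying on this.

```python
# Disable EC2Config's automatic computer naming by editing its settings file.
# Assumes the default install path and the Ec2SetComputerName plugin name.
import xml.etree.ElementTree as ET

CONFIG = r"C:\Program Files\Amazon\Ec2ConfigService\Settings\config.xml"

tree = ET.parse(CONFIG)
for plugin in tree.getroot().iter("Plugin"):
    if plugin.findtext("Name") == "Ec2SetComputerName":
        plugin.find("State").text = "Disabled"  # same effect as unchecking the box
tree.write(CONFIG)
```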