Scaling an ad platform

We scaled a startup advertising network's platform by 5x to handle over 3 million requests per minute, while halving their operating costs.

The client was an early-stage startup that had built an online advertisement publishing platform with unique tracking and targeting capabilities, already delivering ads for well-known brands to high-profile sites via third-party ad networks.

They had scaled up by adding racks of servers in distributed physical data centres, the most cost-effective way to handle regular traffic loads. However, when an ad went live on a high-traffic site, the resulting spike in demand would overwhelm capacity, and publisher sites were concerned that their users were sometimes left waiting for the ad content to load.

Adding more physical servers for sufficient reserve capacity was not going to be economical, so Uzeweb's experience managing servers at scale made us a strong partner to help find a more cost-effective solution.

Initial investigation

We put a lightweight web server in front of each frontend machine to implement enhanced logging, then analysed a month of logs to find the peak spike demand and the average capacity of each server.
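
As an illustration, here is a minimal sketch of the kind of log analysis involved, assuming a combined-format access log with the request time appended as the final field; the file path and field positions are illustrative rather than the client's actual configuration:

```python
# Sketch of the log analysis: peak per-minute demand and slow-response percentile.
# Assumes a combined-format access log with the request time (seconds) appended
# as the last field; paths and field positions are illustrative.
from collections import Counter
from datetime import datetime

requests_per_minute = Counter()
response_times = []

with open("access.log") as log:
    for line in log:
        fields = line.split()
        timestamp = fields[3].lstrip("[")          # e.g. 12/Mar/2014:09:05:13
        minute = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S").strftime("%Y-%m-%d %H:%M")
        requests_per_minute[minute] += 1
        response_times.append(float(fields[-1]))   # request time in seconds

peak_minute, peak_requests = requests_per_minute.most_common(1)[0]
response_times.sort()
p90 = response_times[int(len(response_times) * 0.9)]

print(f"Peak load: {peak_requests} requests in minute {peak_minute}")
print(f"90th percentile response time: {p90:.2f}s")
```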

We found that while the data centres were well scaled to cope with regular load, the spikes brought well over 5x the traffic. Additionally, 10% of answered requests took almost a minute to calculate a response, as the servers became backlogged trying to perform tracking and targeting calculations in real time. This compounded the problem: machines then fell out of the load balancers due to slow responses.

We also used the logs to perform offline request replays to profile the codebase, and identified the key areas in the analytics logic which were creating performance bottlenecks.
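
A rough sketch of that replay approach follows, assuming logged request paths are replayed against a staging copy of the frontend; the staging hostname and log format are illustrative:

```python
# Sketch of an offline request replay: re-issue logged requests against a
# staging instance and rank endpoints by total time to find hot spots.
import time
from collections import defaultdict
from urllib.parse import urlsplit
from urllib.request import urlopen

timings = defaultdict(list)

with open("access.log") as log:
    for line in log:
        path = line.split()[6]           # request path from a combined-format log line
        endpoint = urlsplit(path).path   # group timings by endpoint, ignoring query strings
        start = time.monotonic()
        urlopen("http://staging.internal" + path).read()
        timings[endpoint].append(time.monotonic() - start)

# Endpoints consuming the most total time are the first candidates for profiling
for endpoint, samples in sorted(timings.items(), key=lambda kv: -sum(kv[1]))[:10]:
    print(f"{endpoint}: {sum(samples):.1f}s total over {len(samples)} requests")
```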

We recommended a three-pronged approach: handling over-capacity spikes gracefully, restructuring the infrastructure to scale more cost-effectively, and refactoring the code to move more tracking logic into asynchronous queues.

Implementation

We started by implementing an escape valve for traffic spikes that exceeded capacity, to avoid impacting publisher sites in the worst case where other measures failed. We handled this by adding timeouts to the frontends: if the application server was unable to provide a response within 1 second, the web server would return a suitable default response while the application server worked through its tracking analysis backlog.
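
The behaviour can be sketched as a small asyncio reverse proxy in front of the application server; the upstream address and the default creative below are placeholders rather than the client's real values:

```python
# Minimal sketch of the escape valve: proxy the request to the application
# server, but fall back to a default response if it takes longer than 1 second.
import asyncio
import aiohttp
from aiohttp import web

UPSTREAM = "http://127.0.0.1:8080"                      # application server (placeholder)
DEFAULT_AD = "<div class='ad'>fallback creative</div>"  # served when upstream is slow

async def serve_ad(request: web.Request) -> web.Response:
    try:
        async with aiohttp.ClientSession() as session:
            # Give the application server at most 1 second to respond
            async with session.get(UPSTREAM + request.path_qs,
                                   timeout=aiohttp.ClientTimeout(total=1.0)) as upstream:
                body = await upstream.text()
                return web.Response(text=body, content_type="text/html")
    except asyncio.TimeoutError:
        # Escape valve: never leave the publisher's page waiting on a slow backend
        return web.Response(text=DEFAULT_AD, content_type="text/html")

app = web.Application()
app.router.add_get("/{tail:.*}", serve_ad)

if __name__ == "__main__":
    web.run_app(app, port=8000)
```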

We then addressed the scaling issue by adding cloud-based servers. The client had chosen physical machines over virtual ones because they were more cost-effective for the constant load, but for spikes it was more economical to scale up temporarily with virtual machines at a cloud provider, given that pricing was per impression served (CPM). We developed provisioning tools with a view to automating scaling, although we settled on manual provisioning to give the client more direct control over their costs.
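
The shape of such a manual provisioning helper is sketched below, assuming an AWS-style cloud API via boto3; the image ID, instance type and region are placeholders and the client's actual provider and tooling may differ:

```python
# Sketch of a manual scale-up helper: launch a chosen number of pre-baked
# frontend instances, leaving the decision (and the cost) with the operator.
import sys
import boto3

FRONTEND_IMAGE = "ami-0123456789abcdef0"   # pre-baked frontend image (placeholder)
INSTANCE_TYPE = "c3.large"                 # sized to match one physical frontend

def scale_up(count: int) -> None:
    ec2 = boto3.client("ec2", region_name="eu-west-1")
    response = ec2.run_instances(
        ImageId=FRONTEND_IMAGE,
        InstanceType=INSTANCE_TYPE,
        MinCount=count,
        MaxCount=count,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "role", "Value": "spike-frontend"}],
        }],
    )
    for instance in response["Instances"]:
        print("launched", instance["InstanceId"])

if __name__ == "__main__":
    # Operator chooses how many spike frontends to add
    scale_up(int(sys.argv[1]))
```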

Lastly, we added an option to offload the tracking logic into a queue-based asynchronous process that could even out the load from a traffic spike. Although targeting data would temporarily lag behind the collected data, during busy periods this let us dramatically increase capacity by turning off real-time targeting and reducing the total time taken to process a request.
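
A minimal sketch of the offload, assuming a Redis list as the queue; the key name, the flag and the shape of the tracking payload are illustrative:

```python
# Sketch of the queue-based offload: during spikes, tracking events are pushed
# onto a queue and processed later by a separate worker, instead of being
# calculated synchronously while the request waits.
import json
import redis

queue = redis.Redis()
REALTIME_TARGETING = False   # switched off during traffic spikes

def record_impression(event: dict) -> None:
    if REALTIME_TARGETING:
        update_targeting(event)                            # synchronous path: full work per request
    else:
        queue.rpush("tracking-events", json.dumps(event))  # defer to the worker

def tracking_worker() -> None:
    # Runs as a separate process, draining the queue at its own pace
    while True:
        _, raw = queue.blpop("tracking-events")
        update_targeting(json.loads(raw))

def update_targeting(event: dict) -> None:
    ...  # the existing tracking and targeting calculations
```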

Outcome

When the client experienced their next traffic spike, the asynchronous queue meant that we were able to service all requests with the existing servers in under 1s, with over 90% of responses in under 0.25s.

Although the response-time cap wasn't needed, if a future spike were higher than expected, every request would still return a full response within 1 second, addressing publishers' concerns.

These changes were so effective that the client was able to halve their number of physical servers and still have adequate capacity for regular load, using the cloud provisioning tools to scale up temporarily during spikes in demand.