Five considerations for building highly scalable applications

Non-scalable applications hurt the business bottom line in two ways. First, scalability issues limit the maximum number of users who can use the application at the same time. Second, once that limit is exceeded, the application may become unavailable as the load keeps increasing, which can lead visitors to change their minds and switch to an alternative provider. According to scalability experts Martin Abbott and Michael Fisher, authors of the excellent short book “Scalability Rules,” scalability should be designed for in three phases: the design itself, the implementation, and the deployment.


Scalability is the ability to accommodate a growing amount of work gracefully. This can mean different things for different applications, but it usually refers to a smooth, progressive degradation of service under load: the application keeps working, with somewhat increased response times. Scalable applications are usually more complex and more expensive to build than applications designed with no explicit scalability goal in mind. So when we build an application, we must make sure that the added complexity and cost of high scalability are really worth it.

Let’s have a look at five key considerations for building highly scalable applications:

1. Minimize Storage Locks

Bottlenecks are natural points of communication between the layers of an architecture (for example, the data access layer), and they introduce two main problems. First, a bottleneck limits the communication flow in the application, because requests cannot pass through it faster than it can handle them. Second, it is very easy for a bottleneck to become a single point of failure. A single source-of-truth database limits maximum throughput, that is, the maximum number of requests per second the application can serve. In addition, a single source of data makes the application vulnerable to locking, because it serves all the data operations, even those that are only reads and not updates.

Minimizing storage locks can lead to a more scalable system with fewer bottlenecks. We can split, or partition, the storage into different parts; we then get more throughput because the partitions work in parallel, and the total throughput is the sum of the throughput of the individual partitions. We can also split the database into read and write parts to reduce the locking caused by the database’s transaction isolation levels. We can even dispense with the relational database and use a NoSQL database or an in-memory database. All of these are options for minimizing the locking caused by data storage.
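As a rough illustration of partitioning, here is a minimal Python sketch. The `PartitionedStore` class and its method names are hypothetical, and each in-memory dictionary merely stands in for a separate database node; a real system would also have to handle rebalancing and cross-partition queries.

```python
import hashlib


class PartitionedStore:
    """Toy key-value store split into independent partitions (shards)."""

    def __init__(self, num_partitions=4):
        # Each dict stands in for a separate storage node with its own locks.
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_for(self, key):
        # A stable hash keeps a given key on the same partition across runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition_for(key)][key] = value

    def get(self, key, default=None):
        return self.partitions[self._partition_for(key)].get(key, default)


store = PartitionedStore(num_partitions=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

# Number of keys held by each partition; the load is spread across all of them.
sizes = [len(partition) for partition in store.partitions]
```

Because every key maps to exactly one partition, operations on keys in different partitions never contend for the same lock.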

2. Caching

When an application runs in a local network, all communication is very quick and the latency introduced by the network is almost negligible. When the application runs in the Cloud and is accessed by thousands of users outside the Cloud, latency increases, because many network components sit between the users and the application. The clearest cases are the requests from the browser, or client side, to the server side of the application. Each request has to travel across the internet to the data center, passing through routers, switches, and other network devices, all of which introduce some delay. Latency makes the application slower to respond.

Caching is often the cheapest way to reduce unnecessary round trips and latency: data fetched from the server is copied and stored locally, closer to where it is used. By using caching, we basically trade memory for response times.
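The trade of memory for response time can be sketched with a small time-to-live cache in Python. The `TTLCache` class, the `slow_lookup` function, and the 30-second lifetime are all assumptions made for illustration; they do not come from any particular caching library.

```python
import time


class TTLCache:
    """Keeps loaded values in memory and reuses them until they expire."""

    def __init__(self, loader, ttl_seconds=30.0, clock=time.monotonic):
        self._loader = loader      # function that performs the slow round trip
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1         # served from memory, no round trip
            return entry[1]
        self.misses += 1
        value = self._loader(key)  # slow path: go to the server
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def slow_lookup(key):
    # Hypothetical stand-in for a request that crosses the network.
    calls.append(key)
    return key.upper()


cache = TTLCache(slow_lookup, ttl_seconds=30.0)
first = cache.get("product:1")   # miss: calls slow_lookup
second = cache.get("product:1")  # hit: answered from memory
```

The expiry time bounds how stale a cached copy can get; choosing it is a trade-off between freshness and the number of round trips saved.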

3. Asynchronous or Non-blocking Requests

A web application is by its nature an example of the request-response model. One of the pages in the application issues a request to the backend, and the backend responds. From the moment the request is sent until the response arrives, the calling page and its executing threads are blocked. They just sit idle, waiting for the response to come from the backend. This limits the application’s maximum throughput, because there is a limited number of threads available to make requests. If all of them are blocked waiting for responses, the application makes any additional requests wait until a thread is free. This adds to the average response time, and if the wait is too long, we begin to see timeouts in the application.

If the programming language supports asynchronous calls, we can free the thread from waiting for the response. This can have an enormous impact on the throughput of the application, because the existing threads are used much more efficiently. The same threads can be reused again and again while earlier requests are still waiting for the backend to respond.
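A minimal sketch of the idea using Python’s `asyncio` follows. The `call_backend` and `handle_all` functions are hypothetical, and `asyncio.sleep` stands in for a real non-blocking network call to the backend.

```python
import asyncio
import time


async def call_backend(request_id, delay=0.1):
    # While this coroutine waits, the thread is free to run other requests.
    await asyncio.sleep(delay)
    return f"response-{request_id}"


async def handle_all(count):
    # Start all requests, then wait for every response on a single thread.
    return await asyncio.gather(*(call_backend(i) for i in range(count)))


start = time.perf_counter()
responses = asyncio.run(handle_all(20))
elapsed = time.perf_counter() - start
```

Issued one after another with blocking calls, the twenty requests would take about two seconds in total; here the waits overlap, so the elapsed time stays close to that of a single request.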

4. Queuing

With asynchronous calls we can optimize the use of threads, but in the end we still wait for a response to arrive. By using queues we can decouple the request-response model and just send a message to a queue. The sender is then free to attend to other requests. The receiver is also free to take messages from the queue and process them at its own pace. This lets us offload long operations to other nodes and applications, which send the results back through another queue. Queuing can also improve fault tolerance, as queue mechanisms usually allow multiple retries if a processing node fails.
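The following Python sketch shows the pattern with an in-process queue from the standard library, assuming a single worker thread. The `process` and `worker` functions, the `MAX_ATTEMPTS` limit, and the simulated failures are hypothetical; a production system would typically use a separate message broker instead.

```python
import queue
import threading

MAX_ATTEMPTS = 3           # hypothetical retry limit
requests = queue.Queue()   # sender -> worker
results = queue.Queue()    # worker -> sender
failed_once = set()


def process(payload):
    # Simulate a transient failure on the first attempt for odd payloads.
    if payload % 2 == 1 and payload not in failed_once:
        failed_once.add(payload)
        raise RuntimeError("transient failure")
    return payload * payload


def worker():
    while True:
        message = requests.get()
        if message is None:        # sentinel: stop the worker
            requests.task_done()
            break
        payload, attempt = message
        try:
            results.put((payload, process(payload)))
        except RuntimeError:
            if attempt < MAX_ATTEMPTS:
                # Put the message back so it is retried later.
                requests.put((payload, attempt + 1))
        finally:
            requests.task_done()


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

for n in range(5):
    requests.put((n, 1))   # the sender does not wait for processing

requests.join()            # wait until all messages, including retries, are done
requests.put(None)
consumer.join()

squares = {}
while not results.empty():
    payload, value = results.get()
    squares[payload] = value
```

The sender only enqueues work and reads results later, so a slow or temporarily failing worker does not block it.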

5. Avoid Single Point of Failure (SPOF)

Almost every component of an application can act as a single point of failure. A single point of failure, or SPOF, is a component that breaks the application when it goes down. In a single-node deployment with no load balancing, the entire application is one huge single point of failure. We can spin up more application nodes, but that usually just shifts the failure point to the database, since all the nodes typically share it. To avoid a single point of failure, the database must be replicated as well.

Usually, increasing load puts more strain on application components, and sooner or later the hardware fails. A redundant architecture shields the application from that failure and keeps it running. Adding redundant components has the additional benefit of letting us split the incoming load into different buckets to be processed by different nodes.
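As a simplified illustration of redundancy, the Python sketch below spreads requests across several nodes and skips a node that is down. The `Node` and `LoadBalancer` classes are hypothetical and run in a single process; a real deployment would rely on a dedicated load balancer with health checks.

```python
class NodeDown(Exception):
    """Raised when a node cannot serve a request."""


class Node:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise NodeDown(self.name)
        return f"{self.name} handled {request}"


class LoadBalancer:
    """Round-robin dispatch that fails over to the next node."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._next = 0

    def dispatch(self, request):
        # Try each node at most once, starting where the last call stopped.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            try:
                return node.handle(request)
            except NodeDown:
                continue
        raise NodeDown("all nodes are down")


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]
balancer = LoadBalancer(nodes)
nodes[1].healthy = False   # simulate a hardware failure on one node

served_by = [balancer.dispatch(f"req-{i}").split()[0] for i in range(4)]
```

With one of the three nodes down, every request is still served, and the load is shared between the two remaining nodes.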


Scalability is a very important and desirable property for modern web and mobile applications. It should be taken into account during application design and should never be an afterthought; however, scalability comes at a price, and that price has to be justified.
