Scaling Systems


Information Systems; Information Technology; System-Level Programming


System scalability is the ability of a computer system or product to continue to function correctly when its size or workload is changed to meet user needs. To be considered scalable, a system must be able not only to function properly while scaled but also to adapt to and take advantage of its new environment and increased capacity.



As computer technology rapidly advances, computer systems must handle increasingly large and complex demands. Rather than replacing an existing system each time these demands exceed its capacity, one can simply scale up or out.

Scalability is the ability of a computer system to adapt to and accommodate an increased workload.

Horizontal scaling works by distributing the workload across multiple servers or systems. Vertical scaling involves adding resources, such as hardware upgrades, to an existing server or system. Scalability allows data to be processed at a greater rate, decreasing load times and increasing productivity.


Both horizontal and vertical scaling play key roles in how networks and computers operate. How fast and how well a program or web application (app) functions depends on the resources at its disposal. For example, a new web app may have only a small number of users per day, who can easily be accommodated by a single server. However, as the number of users increases, that server will at some point become unable to support them all. Users may find that their connection is much slower, or they may be unable to connect to the server at all. At this point, the server must be scaled either vertically, such as by adding more memory, or horizontally, by adding more servers.

Vertical scaling is limited by the physical size of the existing system and its components. Fortunately, most modern computers employ open architecture, which supports the addition of hardware from different vendors. This allows for greater flexibility when upgrading components or even rebuilding part of the system to accommodate more components. One example of vertical scaling might be adding another processing unit to a single-processor system so that it can carry out parallel operations. An open-architecture system could use processors from different vendors or even different types of processors that perform different functions, such as a central processing unit (CPU) and a graphics processing unit (GPU). A system that can be scaled using such heterogeneous components is said to have heterogeneous scalability.

Vertical and horizontal scaling have their pros and cons. Vertical scaling increases computing power with a more powerful (and often more costly) computer. Horizontal scaling increases computing power with more computers of equal power (and equal price), but it requires a more complex network and more complex programming.
EBSCO illustration.

Ultimately, however, vertical scaling can only go so far. For larger-scale operations such as cloud computing, horizontal scaling is more common. Cloud computing connects multiple hardware or software components, such as servers or networks, so they can work as a single unit. The downside of horizontal scaling is that it is not immediate. It requires careful planning and preparation to make the components work logically together. While adding more servers is a good way to deal with increased traffic, for instance, if the system cannot properly balance the load across the available servers, a user may see little improvement.
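The load-balancing problem described above can be sketched with a simple round-robin dispatcher. This is a hypothetical, minimal illustration (class and server names are invented), not a production balancer: each incoming request is routed to the next server in rotation so that no single machine bears the whole load.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Distribute incoming requests evenly across a pool of servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._rotation = cycle(self.servers)

    def route(self, request):
        # Each request goes to the next server in rotation.
        server = next(self._rotation)
        return server, request

balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
routes = [balancer.route(f"request-{i}")[0] for i in range(6)]
print(routes)  # each server receives two of the six requests
```

Real balancers use more sophisticated policies (least connections, health checks), but the principle is the same: without such a dispatcher, adding servers yields little improvement.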


Another way to improve performance is to use a proxy server. This is particularly useful for a system with multiple servers. A proxy functions as an intermediary between the user and the main servers. It receives requests from users and directs them to the appropriate servers. A proxy can also combine requests to speed up processing. If multiple users request the same data, a system without a proxy has to perform multiple retrievals of that data. However, if the requests are filtered through a proxy server, the proxy can perform a single retrieval and then forward the data to each user.
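The request-combining behavior described above can be sketched as a caching proxy. This is an illustrative toy (the class and the backend fetch function are invented for the example): distinct resources are retrieved from the backend only once, and later requests for the same resource are served from the proxy's copy.

```python
class CachingProxy:
    """Intermediary that fetches each distinct resource once and
    forwards the cached copy to every later requester."""

    def __init__(self, fetch):
        self.fetch = fetch       # function that retrieves data from a backend server
        self.cache = {}
        self.backend_calls = 0

    def request(self, url):
        if url not in self.cache:
            self.backend_calls += 1          # only the first request hits the backend
            self.cache[url] = self.fetch(url)
        return self.cache[url]               # everyone else gets the cached copy

proxy = CachingProxy(lambda url: f"contents of {url}")
for _ in range(3):                           # three users ask for the same page
    data = proxy.request("/index.html")
print(proxy.backend_calls)                   # one retrieval served all three users
```

A real proxy would also expire stale entries and forward cache misses over the network, but the single-retrieval principle is the same.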


Unlike other computing systems, many cloud-computing services automatically scale up or down to be more efficient. Extra computing resources, such as additional servers, are provided when usage is high and then removed when they are no longer needed. This is known as “auto-scaling.” It was first introduced by Amazon Web Services and has since been adopted by other cloud-computing services, such as Microsoft Azure and Google Cloud Platform.
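A basic auto-scaling policy can be sketched as a utilization calculation. This is a simplified, hypothetical rule (the 70 percent target and the function itself are invented for illustration, not taken from any particular cloud provider): grow the pool when servers run hot, shrink it when they sit idle.

```python
import math

def autoscale(server_count, load_per_server, target=0.7, min_servers=1):
    """Return the server count needed to bring average utilization
    near the target: scale out when hot, scale in when idle."""
    total_load = server_count * load_per_server
    needed = math.ceil(total_load / target)
    return max(needed, min_servers)

print(autoscale(4, 0.9))   # overloaded pool: the policy adds servers
print(autoscale(10, 0.1))  # mostly idle pool: the policy removes servers
```

Production auto-scalers layer cooldown periods and predictive rules on top of this kind of threshold logic so the pool does not oscillate.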

For large Internet companies such as these, which support massive distributed storage and cloud-computing systems, auto-scaling alone is not enough. They must be able to scale up from a handful of servers to thousands rapidly, efficiently, and resiliently, without server outages that might impact user experience. This type of architecture is described as "hyper-scale." Hyper-scale data-center architecture is mainly software-based, replacing much of the hardware of a traditional data center with virtual machines. In addition to supporting rapid, efficient auto-scaling, this greatly reduces infrastructure costs for both established and new companies. Large companies that support hundreds of millions of daily users do not have to pay to operate the massive data centers that traditional architecture would require. Meanwhile, new start-ups can launch with just a few servers and then easily scale up to a few thousand when necessary.

Receive-side scaling (RSS) is another way in which companies can scale up. This directs incoming network traffic to various CPUs for processing, thereby speeding up the network. RSS has become a feature in some cloud-computing services, such as Microsoft Azure.
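The core idea of RSS can be sketched as hashing each network flow to a CPU receive queue. Note the simplification: real network cards typically use a Toeplitz hash over the packet's address fields, while this illustration substitutes CRC32, and the function name and addresses are invented. The property that matters is that packets of one connection always land on the same CPU while different connections spread across all CPUs.

```python
import zlib

def rss_queue(src_ip, src_port, dst_ip, dst_port, num_cpus):
    """Pick a CPU receive queue by hashing the packet's flow tuple:
    one connection stays on one CPU; many connections spread out."""
    flow = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(flow) % num_cpus

# Packets belonging to the same connection map to the same queue.
q1 = rss_queue("10.0.0.1", 4242, "10.0.0.9", 80, num_cpus=4)
q2 = rss_queue("10.0.0.1", 4242, "10.0.0.9", 80, num_cpus=4)
print(q1 == q2)  # True: consistent per-flow placement
```

Keeping a flow on one CPU preserves packet ordering and cache locality, which is why the hash is computed over the connection's addresses rather than assigned randomly.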

—Daniel Horowitz
