A data warehouse is a database that collects information from various disparate sources and stores it in a central location for easy access and analysis. The data stored in a data warehouse is a static record, or a snapshot, of what each item looked like at a specific point in time. This data is not updated; rather, if the information has changed over time, later snapshots of the same item are simply added to the warehouse. Data warehouses are commonly used in business to aid management in tracking a company's progress and making decisions.
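The append-only behavior described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the warehouse; the table name `product_snapshot`, its columns, and the helper `record_snapshot` are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_snapshot ("
    " product_id INTEGER, price REAL, snapshot_date TEXT)"
)


def record_snapshot(product_id, price, snapshot_date):
    """Append a new snapshot row; existing rows are never updated."""
    conn.execute(
        "INSERT INTO product_snapshot VALUES (?, ?, ?)",
        (product_id, price, snapshot_date),
    )


# The price of item 1 changes over time, so a later snapshot is added
# alongside the earlier one instead of overwriting it.
record_snapshot(1, 9.99, "2015-01-01")
record_snapshot(1, 11.49, "2015-06-01")

# Both snapshots remain available, ordered by date.
history = conn.execute(
    "SELECT snapshot_date, price FROM product_snapshot"
    " WHERE product_id = ? ORDER BY snapshot_date",
    (1,),
).fetchall()
```

Because no row is ever updated, the full history of an item can be read back by querying all of its snapshots in date order.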
Computer scientist William H. Inmon is largely credited with having codified and popularized the concept of data warehousing. Inmon began discussing the underlying principles in the 1970s and published the first book on the subject, Building the Data Warehouse, in 1992. The term itself was coined in 1988, when Barry A. Devlin and Paul T. Murphy published their paper “An Architecture for a Business and Information System,” which discusses a new software architecture they describe as a “business data warehouse.”
Simply put, a data warehouse is a central database that contains copies of information from several different sources, such as other, smaller databases and company records, stored in a read-only format so that nothing in the warehouse can be changed or overwritten. Since its inception, data warehousing has been associated with the field of business intelligence, which deals with analyzing a company's accumulated data for the purpose of developing business strategies. However, data warehouses are used in nonbusiness contexts as well, such as scientific research or analysis of crime statistics.
There are two main approaches to designing a data warehouse: from the top down and from the bottom up. Inmon champions the top-down model, in which the warehouse is designed as a central information repository. Data is entered into the warehouse in its most basic form, after which the warehouse can be subdivided into data marts, which group together related data for easy access by the team or department to which that information pertains. The advantages of creating data marts in a top-down data warehouse include increased performance for the user, because a data mart can be accessed with greater efficiency than the much larger warehouse, as well as the ability to institute individual security protocols on different subsets of data. However, data marts are not a necessary component of top-down warehouse design, and they have their drawbacks, such as a propensity toward duplicated or inconsistent data.
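As a rough sketch of the top-down model, the example below starts from a single central store and derives per-department data marts from it. The record fields (`dept`, `item`, `amount`) and the function `build_data_marts` are hypothetical; a real warehouse would use database tables or views rather than Python lists.

```python
from collections import defaultdict

# Central warehouse: every record is loaded here first, in its basic form.
warehouse = [
    {"dept": "sales", "item": "order-1001", "amount": 250.0},
    {"dept": "sales", "item": "order-1002", "amount": 120.0},
    {"dept": "hr", "item": "payroll-2015-05", "amount": 98000.0},
]


def build_data_marts(records):
    """Subdivide the central warehouse into one data mart per department."""
    marts = defaultdict(list)
    for record in records:
        marts[record["dept"]].append(record)
    return dict(marts)


marts = build_data_marts(warehouse)

# A department queries only its own, much smaller, data mart.
sales_total = sum(row["amount"] for row in marts["sales"])
```

Each mart here is derived from the warehouse, so the warehouse remains the single source from which the marts can be rebuilt.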
Alternatively, the bottom-up model of data warehousing, developed by Ralph Kimball, views a data warehouse as simply a union of the various data marts. In this method, the data marts (separate databases used and maintained by individual departments) are created first and then linked to create a data warehouse. The merits of one approach over the other are often debated. The cheaper, faster, and more flexible bottom-up design is generally regarded as better in small or medium-sized businesses but too unwieldy for large corporations, where the sheer number of different people entering new data into different data marts would make any efforts to maintain consistency and avoid duplication extremely time-consuming.
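The bottom-up model, and the consistency problem it can create, can be sketched as follows. The two departmental marts, their fields, and the helpers `union_marts` and `find_conflicts` are hypothetical names used only to illustrate the idea of linking independently maintained marts.

```python
# Data marts created and maintained independently by two departments.
sales_mart = [{"customer_id": 1, "name": "Acme Corp"}]
support_mart = [
    {"customer_id": 1, "name": "ACME Corporation"},
    {"customer_id": 2, "name": "Globex"},
]


def union_marts(**marts):
    """Link the marts into one warehouse, tagging each row with its source."""
    warehouse = []
    for mart_name, rows in marts.items():
        for row in rows:
            warehouse.append({**row, "source_mart": mart_name})
    return warehouse


def find_conflicts(warehouse, key, field):
    """Return keys whose `field` value differs between marts."""
    seen = {}
    for row in warehouse:
        seen.setdefault(row[key], set()).add(row[field])
    return {k: values for k, values in seen.items() if len(values) > 1}


warehouse = union_marts(sales=sales_mart, support=support_mart)
conflicts = find_conflicts(warehouse, "customer_id", "name")
```

In this sketch, customer 1 appears under two different names, which is the kind of inconsistency that becomes harder to police as the number of marts and contributors grows.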
Cloud computing is the use of a remote server network on the Internet, rather than a local server or hard drive, to store, manage, and process data. Cloud computing can be used for data warehousing, and warehousing data in the cloud offers both benefits and challenges. The benefits of moving a data warehouse to the cloud include the potential to lower costs and increase flexibility and access. Challenges include maintaining data security and privacy, ensuring adequate network performance, and managing intellectual property.
—Randa Tantawi, PhD
Chandran, Ravi. “DW-on-Demand: The Data Warehouse Redefined in the Cloud.” Business Intelligence Journal 20.1 (2015): 8–13. Business Source Complete. Web. 8 June 2015.
Devlin, Barry A., and Paul T. Murphy. “An Architecture for a Business and Information System.” IBM Systems Journal 27.1 (1988): 60–80. Print.
Greenfield, Larry. Data Warehousing Information Center. LGI Systems, 1995. Web. 8 Oct. 2013.
Griffith, Eric. “What Is Cloud Computing?” PC Magazine. Ziff Davis, PCMag Digital Group, 17 Apr. 2015. Web. 8 June 2015.
Henschen, Doug, Ben Werther, and Scott Gnau. “Big Data Debate: End Near for Data Warehousing?” InformationWeek. UBM Tech, 19 Nov. 2012. Web. 8 Oct. 2013.
Inmon, William H. Building the Data Warehouse. 4th ed. Indianapolis: Wiley, 2011. Print.
Laberge, Robert. The Data Warehouse Mentor: Practical Data Warehouse and Business Intelligence Insights. New York: McGraw, 2011. Print.
Mohamed, Arif. “A History of Cloud Computing.” Computer Weekly. TechTarget, 2000–2015. Web. 8 June 2015.
Singh, Ajit, D. C. Upadhyay, and Hemant Yadav. “The Analytical Data Warehouse: A Sustainable Approach for Empowering Institutional Decision Making.” International Journal of Engineering Science and Technology 3.7 (2011): 6049–57. PDF file.
Steier, Sandy. “To Cloud or Not to Cloud: Where Does Your Data Warehouse Belong?” Wired. Condé Nast, 29 May 2013. Web. 9 Oct. 2013.
Williams, Paul. “A Short History of Data Warehousing.” Dataversity. Dataversity Educ., 23 Aug. 2012. Web. 8 Oct. 2013.