Building a Data Warehouse in DBMS
Last Updated :
25 Apr, 2023
A Data warehouse is a heterogeneous collection of different data sources organized under unified schema. Builders should take a broad view of the anticipated use of the warehouse while constructing a data warehouse. During the design phase, there is no way to anticipate all possible queries or analyses. Some characteristic of Data warehouse are:
- Subject oriented
- Integrated
- Time Variant
- Non-volatile
Building a Data Warehouse - Some steps that are needed for building any data warehouse are as following below:
- To extract the data (transnational) from different data sources: For building a data warehouse, a data is extracted from various data sources and that data is stored in central storage area. For extraction of the data Microsoft has come up with an excellent tool. When you purchase Microsoft SQL Server, then this tool will be available at free of cost.
- To transform the transnational data: There are various DBMS where many of the companies stores their data. Some of them are: MS Access, MS SQL Server, Oracle, Sybase etc. Also these companies saves the data in spreadsheets, flat files, mail systems etc. Relating a data from all these sources is done while building a data warehouse.
- To load the data (transformed) into the dimensional database: After building a dimensional model, the data is loaded in the dimensional database. This process combines the several columns together or it may split one field into the several columns. There are two stages at which transformation of the data can be performed and they are: while loading the data into the dimensional model or while data extraction from their origins.
- To purchase a front-end reporting tool: There are top notch analytical tools are available in the market. These tools are provided by the several major vendors. A cost effective tool and Data Analyzer is released by the Microsoft on its own.
For the warehouse there is an acquisition of the data. There must be a use of multiple and heterogeneous sources for the data extraction, example databases. There is a need for the consistency for which formation of data must be done within the warehouse. Reconciliation of names, meanings and domains of data must be done from unrelated sources. There is also a need for the installation of the data from various sources in the data model of the warehouse. Conversion of the data might be done from object oriented, relational or legacy databases to a multidimensional model. One of the largest labor demanding component of data warehouse construction is data cleaning, which is one of the complex process. Before loading of the data in the warehouse, there should be cleaning of the data. All the work of loading must be done in warehouse for better performance. The only feasible and better approach for it is incremental updating. Data storage in the data warehouse:
- Refresh the data
- To provide the time variant data
- To store the data as per the data model of the warehouse
- Purging the data
- To support the updating of the warehouse data
Some of the important designs for the data warehouse are:
- Modular component design
- Consideration of the parallel architecture
- Consideration of the distributed architecture
- Usage protection
- Characteristics of available sources
- Design of the metadata component
- The fit of the data model
The major determining characteristics for the design of the warehouse is the architecture of the organizations distributed computing environment. The distributed warehouse and the federated warehouse are the two basic distributed architecture.There are some benefits from the distributed warehouse, some of them are:
- Improved load balancing
- Scalability of performance
- Higher availability
Federated warehouse is a decentralized confederation of autonomous data warehouses. Each of them has its own metadata repository.Now a days large organizations start choosing a federated data marts instead of building a huge data warehouse.
Advantages:
Familiarity: Building a data warehouse in a DBMS that an organization is already using can be advantageous, as it allows developers to use existing skills and knowledge to build and maintain the data warehouse.
Scalability: DBMSs are designed to handle large amounts of data, making them well-suited for building data warehouses that require storage and analysis of large volumes of data.
Integration: A data warehouse built in a DBMS can be integrated with other databases and applications in the organization, allowing for seamless data flow between systems.
Security: DBMSs offer robust security features that can help ensure the confidentiality, integrity, and availability of data in the data warehouse.
Performance: DBMSs are optimized for performance, which can result in faster data retrieval and processing times.
Disadvantages:
Complexity: Building a data warehouse in a DBMS can be complex, as it involves designing and implementing a database schema that is optimized for analytical processing.
Cost: DBMSs can be expensive, particularly for large-scale data warehouses that require high levels of processing power and storage.
Learning curve: Building a data warehouse in a DBMS may require specialized skills and knowledge, which can result in a steep learning curve for developers who are not familiar with the technology.
Maintenance: Maintaining a data warehouse in a DBMS requires ongoing effort, including monitoring for performance issues, ensuring data quality, and making updates as needed.
Data silos: Building a data warehouse in a DBMS can result in data silos if the data warehouse is not integrated with other databases and applications in the organization.
Similar Reads
Testing in Data warehouse
Data Warehouse stores huge amount of data, which is typically collected from multiple heterogeneous source like files, DBMS, etc to produce statistical result that help in decision making. Testing is very important for data warehouse systems for data validation and to make them work correctly and ef
2 min read
Attributes of Data Warehouse
A data warehouse is meant for only query and analysis rather than transaction processing. The data warehouse is essentially subject-oriented, non-volatile, integrated, time-variant, and consists of historical data stored over long periods of time. A blueprint of BI and data mining algorithms. Data m
3 min read
What is Data Abstraction in DBMS?
Data Abstraction is one of the most important concepts in DBMS. Data abstraction is the process of hiding unwanted and irrelevant details from the end user. It helps to store information in such a way that the end user can access data which is necessary, the user will not be able to see what data is
3 min read
Data Mining: Data Warehouse Process
INTRODUCTION: Data warehousing and data mining are closely related processes that are used to extract valuable insights from large amounts of data. The data warehouse process is a multi-step process that involves the following steps: Data Extraction: The first step in the data warehouse process is t
8 min read
Difference between Data Warehouse and Data Mart
Both Data Warehouse and Data Mart are used for store the data. The main difference between Data warehouse and Data mart is that, Data Warehouse is the type of database which is data-oriented in nature. while, Data Mart is the type of database which is the project-oriented in nature. The other differ
6 min read
Difference between Data Warehouse and Hadoop
Data Warehouse and Hadoop are two commonly used technologies that serve as the repositories of large amounts of data. In their essence, while both aim at addressing the need for data storage and analysis they are quite distinct in their structure, performance, and applications. This article will fur
5 min read
Difference between Data Lake and Data Warehouse
Data LakeData Lake is the concept where all sorts of data can be landed at a low cost but exceedingly adaptable storage/zone to be examined afterward for potential insights. It is another advancement of what ETL/DWH pros called the Landing Zone of data. Only presently we are looking at ALL sorts of
3 min read
Star Schema in Data Warehouse modeling
A star schema is a type of data modeling technique used in data warehousing to represent data in a structured and intuitive way. In a star schema, data is organized into a central fact table that contains the measures of interest, surrounded by dimension tables that describe the attributes of the me
5 min read
Building blocks of a Data Model
A data model is a structure of the data that contains all the required details of the data like the name of the data, size of the data, relationship with other data and constraints that are applied on the data. It is a communication tool. A data model is essential in order to store the database in a
2 min read
Data Models in DBMS
A Data Model in Database Management System (DBMS) is the concept of tools that are developed to summarize the description of the database. Data Models provide us with a transparent picture of data which helps us in creating an actual database. It shows us from the design of the data to its proper im
8 min read