Infrastructure of Data Warehouse: Ms. Ashwini Rao, Asst. Prof., IT

The document discusses the infrastructure that supports data warehouse architecture. It covers the key elements of infrastructure including operational, physical, hardware, operating systems and computing platforms. It also describes various parallel processing architectures like symmetric multiprocessing, clusters, massively parallel processing and NUMA that can be used to implement different components of a data warehouse architecture. The document summarizes that infrastructure provides the foundation for data warehouses and that hardware, operating systems and parallel server architectures are important design decisions.

Uploaded by

Tanushree Shenvi
Copyright
© Attribution Non-Commercial (BY-NC)

Infrastructure of Data Warehouse

Ms. Ashwini Rao, Asst. Prof., IT

Infrastructure supporting architecture

Infrastructure
Elements that enable the architecture to be implemented.

Operational
Helps keep the DW going: people, procedures, training, management software.

Physical
Hardware components, operating system, network, network software.

Physical Infrastructure

Features of Hardware & OS


Hardware
Scalability, vendor support, vendor stability, vendor reference

OS
Scalability, security, reliability, availability, preemptive multitasking, memory protection (mnemonic: RS SPAM)

Possible options of Hardware & OS


Mainframes
Old hardware, designed for OLTP, expensive, not easily scalable

Open System Servers
UNIX servers are the most common choice: robust and adapted for parallel processing

NT Servers
Suited to medium-sized data warehouses, limited parallel processing, cost effective for a small or medium DW

Platform Options
A computing platform is the set of hardware components, operating system, network and network software. Both Online Transaction Processing (OLTP) and Decision Support Systems (DSS) need a computing platform.

Single Platform Option


All functions, from back-end data extraction to front-end query processing, are performed on one platform.
Advantages: data flows smoothly with no conversions required; no middleware required.
Limitations: legacy platform stretched to capacity; non-availability of tools; multiple legacy platforms; company's migration policy.

Hybrid Platform Option


Eliminates the drawbacks of the single platform option.
Data extraction: each source is extracted on its own computing platform.
Initial reformatting and merging: the extracted files from each source are reformatted and merged on their respective platforms.
Preliminary data cleansing: verify extracted data for missing values and data types.
Transformation and consolidation: performed on the platform where the staging area resides.
Validation and final quality check.
Creation of load images.
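
The preliminary data cleansing step above (checking extracted data for missing values and data types) could look roughly like the following Python sketch. The column names, the EXPECTED_TYPES schema, and both function names are hypothetical and chosen only for illustration; a real warehouse would normally rely on its ETL tool for this.

```python
# Hypothetical schema for one extracted file: column name -> expected Python type.
EXPECTED_TYPES = {"customer_id": int, "region": str, "amount": float}


def check_record(record):
    """Return a list of problems (missing values, wrong types) in one record."""
    problems = []
    for column, expected in EXPECTED_TYPES.items():
        value = record.get(column)
        if value is None or value == "":
            problems.append(f"{column}: missing value")
        elif not isinstance(value, expected):
            problems.append(
                f"{column}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


def preliminary_cleanse(records):
    """Split extracted records into clean ones and rejected ones with reasons."""
    clean, rejected = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected
```

Only the records in the clean list would move on to transformation and consolidation in the staging area; the rejected ones carry their reasons with them so they can be corrected at the source.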

Data Movement Considerations


Shared Disk
Mass Data Transmission: through ports
Real-Time Connection: TCP/IP
Manual Methods: external medium

Data movement options

Client/Server architecture for DW

Considerations on client workstations


Depends on the type of user
Casual user: Web browser and HTML reports
Analyst: more powerful workstation

A practical solution is a minimum configuration on an appropriate platform that supports a standard set of information delivery tools in the DW.

Platform options as DW matures

Parallel processing
Symmetric multiprocessing (SMP)
Clusters
Massively parallel processing (MPP)
Cache-coherent non-uniform memory architecture (ccNUMA)

Symmetric Multiprocessing

Features:
This is a shared-everything architecture, the simplest parallel processing machine.
Each processor has full access to the shared memory through a common bus.
Communication between processors occurs through common memory.

Benefits:
Provides high concurrency; you can run many concurrent queries.
Balances workload very well.
Gives scalable performance; simply add more processors to the system bus.
Being a simple design, the server is easy to administer.

Limitations:
Available memory may be limited.
May be limited by bandwidth for processor-to-processor communication, I/O, and bus communication.
Availability is limited; the system is like a single computer with many processors.
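
As a rough illustration of the shared-everything idea, the Python sketch below lets several worker threads read the same list and combine their results through one shared value. The function name and the use of threads are assumptions made for this example; Python threads only imitate the communication pattern and do not give true multiprocessor speedup.

```python
import threading


def smp_style_sum(values, num_workers=4):
    """Sum `values` with workers that all read and write the same memory."""
    shared = {"total": 0}      # stands in for the common shared memory
    lock = threading.Lock()    # serializes updates to the shared total

    def worker(worker_id):
        # Every worker can address the whole list; it takes every n-th element.
        partial = sum(values[worker_id::num_workers])
        with lock:
            shared["total"] += partial

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(num_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["total"]
```

The lock hints at the limitation listed above: every worker has to go through the same shared resource, so adding workers eventually stops helping.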

Clusters

Features:
Each node consists of one or more processors and associated memory.
Memory is not shared among the nodes; it is shared only within each node.
Communication occurs over a high-speed bus.
Each node has access to the common set of disks.
This architecture is a cluster of nodes.

Benefits:
This architecture provides high availability; all data is accessible even if one node fails.
Preserves the concept of one database.
This option is good for incremental growth.

Limitations:
Bandwidth of the bus could limit the scalability of the system.
This option comes with a high operating system overhead.
Each node has a data cache; the architecture needs to maintain cache consistency for internode synchronization.
Main memory is like a big file cabinet stretching across the entire room.

Massively Parallel Processing

Features:
This is a shared-nothing architecture.
This architecture is more concerned with disk access than memory access.
Works well with an operating system that supports transparent disk access.
If a database table is located on a particular disk, access to that disk depends entirely on the processor that owns it.
Internode communication is by processor-to-processor connection.

Benefits:
This architecture is highly scalable.
The option provides fast access between nodes.
Any failure is local to the failed node; this improves system availability.
Generally, the cost per node is low.

Limitations:
The architecture requires rigid data partitioning.
Data access is restricted.
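
The shared-nothing idea and the rigid partitioning it requires can be sketched in Python as follows. The Node and MppSystem classes, the in-memory dict standing in for a disk, and the modulo partitioning rule are all assumptions made for this illustration, not a description of any particular product.

```python
class Node:
    """One shared-nothing node with its own private 'disk' (a dict here)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.disk = {}

    def put(self, key, row):
        self.disk[key] = row

    def get(self, key):
        return self.disk.get(key)

    def local_sum(self, column):
        # Each node aggregates only the rows it owns.
        return sum(row[column] for row in self.disk.values())


class MppSystem:
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]

    def owner(self, key):
        # Rigid partitioning rule (assumed): integer key modulo node count.
        return self.nodes[key % len(self.nodes)]

    def insert(self, key, row):
        self.owner(key).put(key, row)

    def lookup(self, key):
        # Access to a row always goes through the node that owns it.
        return self.owner(key).get(key)

    def total(self, column):
        # Combine the per-node partial results.
        return sum(node.local_sum(column) for node in self.nodes)
```

For example, with three nodes a row with key 4 always lands on node 1, and no other node can read it directly. That is the restricted data access listed among the limitations, and it is also why a failure stays local to one node.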

NUMA

Features:
This is the newest architecture.
The NUMA architecture is like a big SMP broken into smaller SMPs that are easier to build.
Hardware considers all memory units as one giant memory.
The system has a single real memory address space over the entire machine; memory addresses begin with 1 on the first node and continue on the following nodes.
Each node contains a directory of memory addresses within that node.
The time needed to retrieve a memory value varies, because the first node may need a value that resides in the memory of the third node. That is why this architecture is called non-uniform memory access (NUMA) architecture.

Benefits:
Provides maximum flexibility.
Overcomes the memory limitations of SMP.
Better scalability than SMP.

Limitations:
Programming for a NUMA architecture is more complex than even for MPP.
Software support for NUMA is fairly limited.
Technology is still maturing.
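
The single address space described in the NUMA features can be sketched in a few lines of Python. The words_per_node parameter and the local and remote cost figures are hypothetical units chosen for illustration; only the rule that addresses start at 1 on the first node and continue on the following nodes comes from the slides.

```python
def owning_node(address, words_per_node):
    """Return the 1-based node whose local memory holds `address`.

    Addresses start at 1 on the first node and continue on the next nodes.
    """
    if address < 1:
        raise ValueError("addresses start at 1")
    return (address - 1) // words_per_node + 1


def access_cost(requesting_node, address, words_per_node,
                local_cost=1, remote_cost=3):
    """Cost (in made-up units) for a node to read one address."""
    if owning_node(address, words_per_node) == requesting_node:
        return local_cost   # value is in the requesting node's own memory
    return remote_cost      # value must be fetched from another node
```

With 1000 words per node, address 2500 lives on node 3, so node 3 reads it at the local cost while node 1 pays the remote cost. That difference is the non-uniform part of the name.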

Database Software
Many operations can be parallelized:
mass loading of data
full table scans
queries with exclusion conditions
queries with grouping
selection with distinct values
aggregation
sorting
creation of tables using subqueries
creating and rebuilding indexes
inserting rows into a table from other tables
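
To make one of the operations above concrete, the Python sketch below splits a grouped aggregation over a table into partitions, scans each partition in a separate worker, and merges the partial results. The function names, the (region, amount) row layout, and the use of a thread pool are assumptions for illustration; database software does this internally across real processors.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows):
    """Partial aggregation for one partition: total amount per region."""
    totals = Counter()
    for region, amount in rows:
        totals[region] += amount
    return totals


def parallel_group_sum(rows, num_partitions=4):
    """Scan the table in partitions and merge the partial totals."""
    # Round-robin split of the table into partitions.
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(scan_partition, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the values per key
    return dict(merged)
```

The same split, scan, and merge pattern applies to full table scans, sorting, and index builds; only the per-partition work and the merge step change.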

Types of parallelization

Software Tools

Summing up
Infrastructure acts as the foundation supporting the data warehouse architecture.
Data warehouse infrastructure consists of operational infrastructure and physical infrastructure.
Hardware and operating systems make up the computing environment for the DW.
Several options exist for the computing platforms needed to implement the various architectural components.

Selecting the server hardware is a key decision. Invariably, the choice is one of the four parallel server architectures.
Current database software products are able to perform interquery and intraquery parallelization.
Software tools are used in the data warehouse for data modeling, data extraction, data transformation, data loading, data quality assurance, queries and reports, and online analytical processing (OLAP).
Tools are also used as middleware, alert systems, and for data warehouse administration.
