CS614 - Helping Material
Many organizations need to centralize data to improve corporate decision-making. However, their data may be stored in a variety of formats and in different locations. Data Transformation Services (DTS) addresses this vital business need by providing a set of tools that let you extract, transform, and consolidate data from disparate sources into single or multiple destinations supported by DTS connectivity.
Kimball's Approach
DWH Lifecycle: Key steps
1. Project Planning
2. Business Requirements Definition
3. Parallel Tracks
   3.1 Lifecycle Technology Track
      3.1.1 Technical Architecture
      3.1.2 Product Selection
   3.2 Lifecycle Data Track
      3.2.1 Dimensional Modeling
      3.2.2 Physical Design
      3.2.3 Data Staging Design and Development
   3.3 Lifecycle Analytic Applications Track
      3.3.1 Analytic Application Specification
      3.3.2 Analytic Application Development
4. Deployment
5. Maintenance
Clustered Index
Works well when a single index can be used for the majority of table accesses.
Selectivity requirements for making use of a clustered index are much less stringent than for a non-clustered index.
o Typically by an order of magnitude, depending on row width.
High maintenance cost to keep the sorted order, or frequent reorganizations to recover the clustering factor.
The optimizer must keep track of the clustering factor (which can degrade over time) to determine the optimal execution plan.
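To see why the selectivity requirement for a clustered index is looser, and why the gap depends on row width, here is a simplified page-count cost model. It is only a sketch under stated assumptions: 8 KB pages, a full scan reads every page, a clustered index reads only the pages holding the matching rows, and a non-clustered index in the worst case reads one page per matching row. Index-traversal cost and caching are ignored, and all names are hypothetical.

```python
import math

PAGE_SIZE = 8192  # assumed page size in bytes (hypothetical)

def rows_per_page(row_width: int) -> int:
    """Whole rows that fit on one page."""
    return max(1, PAGE_SIZE // row_width)

def full_scan_pages(n_rows: int, row_width: int) -> int:
    """A full table scan reads every page of the table."""
    return math.ceil(n_rows / rows_per_page(row_width))

def clustered_pages(n_rows: int, row_width: int, selectivity: float) -> int:
    """Matching rows are stored next to each other, so only their pages are read."""
    matches = round(n_rows * selectivity)
    return math.ceil(matches / rows_per_page(row_width))

def non_clustered_pages(n_rows: int, selectivity: float) -> int:
    """Worst case: every matching row sits on a different page."""
    return round(n_rows * selectivity)

def non_clustered_break_even(row_width: int) -> float:
    """Selectivity below which a non-clustered index beats a full scan."""
    return 1 / rows_per_page(row_width)

if __name__ == "__main__":
    n, s = 1_000_000, 0.05
    for width in (100, 800):
        print(width, full_scan_pages(n, width),
              clustered_pages(n, width, s), non_clustered_pages(n, s))
```

Under these assumptions, 100-byte rows fit 81 to a page, so at 5% selectivity the non-clustered index reads more pages than a full scan, while the clustered index reads about 5% of the table's pages. The non-clustered cost is roughly rows-per-page times the clustered cost, which is why the difference is typically an order of magnitude and grows as rows get narrower.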
Question No: 31 (Marks: 10)
Shared-nothing RDBMS architecture requires a static partitioning of each table in the database. How do you perform the partitioning?
1. Hash partitioning
2. Key range partitioning
3. List partitioning
4. Round-robin partitioning
5. Combinations (Range-Hash and Range-List) (p. 211)

How are time-contiguous log entries and HTTP's secure sockets layer (SSL) used for user session identification? What are the limitations of these techniques?

Time-contiguous log entries: a session is reconstructed by collecting the log entries that come from the same host (IP address) and lie close together in time.
Limitations:
1. This method breaks down for visitors from large ISPs, because different visitors may reuse dynamically assigned IP addresses over a brief time period.
2. Different IP addresses may be used within the same session for the same visitor.
3. This approach also presents problems when dealing with browsers that are behind some firewalls.

HTTP's secure sockets layer (SSL): SSL establishes a secure session between the browser and the server, and the requests exchanged within that session can be tied to one visitor session.
Limitations:
1. To track the session, the entire information exchange needs to be in high-overhead SSL.
2. The visitor may be put off by security advisories that can pop up when certain browsers are used.
3. Each host server must have its own unique security certificate.
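The partitioning schemes listed for Question No: 31 (Marks: 10) can each be sketched as a function that decides which node of a shared-nothing system receives a row. This is an illustrative Python sketch, not a description of any particular RDBMS; the function names, the key boundaries, and the value lists are hypothetical.

```python
import zlib
from bisect import bisect_right

def hash_partition(key: str, n_parts: int) -> int:
    """Hash partitioning: a stable hash of the key picks the node.
    zlib.crc32 is used because Python's built-in hash() of strings
    changes between interpreter runs."""
    return zlib.crc32(key.encode("utf-8")) % n_parts

def range_partition(key: int, upper_bounds: list[int]) -> int:
    """Key-range partitioning: partition i holds keys below upper_bounds[i];
    keys at or above the last bound go to a final overflow partition."""
    return bisect_right(upper_bounds, key)

def list_partition(value: str, mapping: dict[str, int], default: int) -> int:
    """List partitioning: explicit lists of values map to partitions."""
    return mapping.get(value, default)

class RoundRobin:
    """Round-robin partitioning: rows are dealt to the nodes in turn."""

    def __init__(self, n_parts: int) -> None:
        self.n_parts = n_parts
        self._next = 0

    def assign(self) -> int:
        part = self._next
        self._next = (self._next + 1) % self.n_parts
        return part

def range_hash_partition(key: int, upper_bounds: list[int], n_sub: int) -> tuple[int, int]:
    """Composite range-hash: the range picks the partition,
    the hash picks the sub-partition inside it."""
    return range_partition(key, upper_bounds), hash_partition(str(key), n_sub)

if __name__ == "__main__":
    bounds = [100, 200, 300]  # hypothetical key boundaries
    print([range_partition(k, bounds) for k in (50, 100, 299, 300)])
    rr = RoundRobin(3)
    print([rr.assign() for _ in range(5)])
```

Range and list partitioning keep related rows together (useful for pruning), while hash and round-robin spread rows evenly across nodes; the composite scheme combines both properties.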
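Session identification from time-contiguous log entries can be sketched as follows: entries from the same IP address stay in one session until the gap between consecutive hits exceeds an inactivity timeout. The 30-minute timeout and the function name are assumptions for illustration; the notes do not fix a value.

```python
from datetime import datetime, timedelta

# Assumed inactivity cut-off (hypothetical; not specified in the notes)
SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(entries: list[tuple[str, datetime]]) -> list[tuple[str, list[datetime]]]:
    """Group (ip, timestamp) log entries into sessions.

    Entries from the same IP belong to one session as long as
    consecutive hits are no more than SESSION_TIMEOUT apart."""
    sessions: list[tuple[str, list[datetime]]] = []
    open_session: dict[str, int] = {}  # ip -> index of its latest session
    for ip, ts in sorted(entries, key=lambda e: e[1]):
        idx = open_session.get(ip)
        if idx is None or ts - sessions[idx][1][-1] > SESSION_TIMEOUT:
            # first hit from this IP, or the gap is too long: start a new session
            sessions.append((ip, [ts]))
            open_session[ip] = len(sessions) - 1
        else:
            sessions[idx][1].append(ts)
    return sessions

if __name__ == "__main__":
    t = datetime(2024, 1, 1, 10, 0)
    log = [
        ("10.0.0.1", t),
        ("10.0.0.2", t + timedelta(minutes=5)),
        ("10.0.0.1", t + timedelta(minutes=10)),
        ("10.0.0.1", t + timedelta(minutes=60)),  # 50-minute gap: new session
    ]
    for ip, hits in sessionize(log):
        print(ip, len(hits))
```

The sketch also makes the listed limitations concrete: two visitors behind the same ISP proxy share one IP and would be merged into a single session, while one visitor whose IP changes mid-visit would be split into two.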
Question No: 31 (Marks: 2)
What are the two extremes for technical architecture design? Which one is better?
Theoretically there can be two extremes, i.e. free space and free performance. If storage is not an issue, then just pre-compute every cube at every unique combination of dimensions at every level, as it does not cost anything. This will result in maximum query performance. But in reality, this implies a huge cost in disk space and in the time for constructing the pre-aggregates. In the other case, where performance is free, i.e. infinitely fast machines and an infinite number of them, there is no need to build any summaries. This means zero cube space and zero pre-calculations, and it would give the minimum performance boost from summaries, because all of the speed would have to come from the (unrealistic) infinite hardware performance. Neither extreme is achievable in practice, so a workable design lies between the two: pre-aggregate selectively, trading disk space and build time against query performance.
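The "free space" extreme becomes expensive quickly because the number of aggregates grows multiplicatively. If each dimension can be summarized at any of its hierarchy levels or rolled up completely to "All", a dimension with L levels offers L + 1 choices, and the number of distinct group-by combinations is the product of (L + 1) over all dimensions. The sketch below only counts those combinations; the example dimensions and their levels are hypothetical.

```python
from math import prod

def count_aggregates(levels_per_dimension: list[int]) -> int:
    """Number of distinct group-by combinations (cuboids).

    Each dimension can be grouped at one of its hierarchy levels or
    rolled up entirely to 'All', giving (levels + 1) choices per dimension."""
    return prod(levels + 1 for levels in levels_per_dimension)

if __name__ == "__main__":
    # Hypothetical schema: Time (day, month, quarter, year),
    # Product (item, category), Location (city, region, country)
    print(count_aggregates([4, 2, 3]))
    # Ten hypothetical dimensions with four levels each
    print(count_aggregates([4] * 10))
```

Three modest dimensions already give 60 combinations, and ten four-level dimensions give 5^10 = 9,765,625, each of which would also have to be stored and refreshed.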
Question No: 32 (Marks: 2)
What is the value validation process?
Value validation is the process of ensuring that each value that is sent to the data warehouse is accurate.

Question No: 35 (Marks: 3)
Why is building a data warehouse a challenging activity? What are the three broad categories of data warehouse development methods?
1. Waterfall model
2. RAD model
3. Spiral model
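Value validation can be pictured as a per-record check that runs before loading: each value is tested against the rules for its column, and records that fail are set aside for correction instead of being loaded. The column names (quantity, gender, sale_date) and the rules below are hypothetical examples, not rules taken from the notes.

```python
from datetime import date

VALID_GENDERS = {"M", "F"}  # hypothetical domain for a gender column

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means every value passed."""
    errors = []
    qty = record.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(qty, int) or isinstance(qty, bool) or qty < 0:
        errors.append("quantity must be a non-negative integer")
    if record.get("gender") not in VALID_GENDERS:
        errors.append("gender must be M or F")
    try:
        # TypeError if the value is missing, ValueError if it is not a real date
        date.fromisoformat(record.get("sale_date"))
    except (TypeError, ValueError):
        errors.append("sale_date must be a valid YYYY-MM-DD date")
    return errors

def split_for_load(records: list[dict]) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Separate records that may be loaded from those sent back for correction."""
    clean, rejected = [], []
    for rec in records:
        problems = validate_record(rec)
        if problems:
            rejected.append((rec, problems))
        else:
            clean.append(rec)
    return clean, rejected
```

Keeping the rejected records together with their error messages lets the staging process report exactly which values failed, rather than silently dropping rows.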
Question No: 36 (Marks: 3)
What are the three fundamental reasons for warehousing Web data?
1. Web data is unstructured and dynamic; keyword search is insufficient.
2. The Web log contains a wealth of information, as it is a key touch point.
3. Shift from a distribution platform to a general communication platform.

Question No: 37 (Marks: 3)
What types of operations are provided by MS DTS?
1. Providing connectivity to different databases
2. Building queries graphically
3. Extracting data from disparate databases
4. Transforming data
5. Copying database objects
6. Providing support for different scripting languages (by default VBScript and JavaScript)
Reasons for web warehousing
1. Searching the web (web mining)
2. Analyzing web traffic
3. Archiving the web
Three major types of searches:
1. Keyword-based search
2. Querying deep Web sources
3. Random surfing

Drawbacks of traditional web searches:
1. Limited to keyword-based matching.
2. Cannot distinguish between the contexts in which a link is used.
3. Coupling of files has to be done manually.

Why web warehousing - Reason no. 1?
Web data is unstructured and dynamic, so keyword search is insufficient. To increase use of the web, we must make it more comprehensible. Data mining is required for understanding the web: it is used to rank and find high-quality pages, thus making the most of search time.

Why web warehousing - Reason no. 2?
The Web log contains a wealth of information, as it is a key touch point. Every customer interaction is recorded. The success of an email or other marketing campaign can be measured by integrating the log with other operational systems. Common measurements include:
1. Number of visitors
2. Number of sessions
3. Most requested pages
4. Robot activity

Why web warehousing - Reason no. 3?
Shift from a distribution platform to a general communication platform. New uses, from e-government to e-commerce to new forms of art, have emerged between different levels of society. Thus the web is worth archiving for use in several other projects, for example as snapshots that preserve a point in time.
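Reason no. 1 says data mining is used to rank and find high-quality pages. The notes do not name a ranking method; one well-known link-analysis technique for this is PageRank, sketched below with power iteration on a tiny hypothetical link graph. The damping factor 0.85 and the iteration count are conventional choices, assumed here rather than taken from the notes.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # every page first receives the "random jump" share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # pass this page's rank evenly along its outgoing links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # dangling page with no out-links: spread its rank over all pages
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical pages
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Unlike pure keyword matching, the score reflects how pages are linked: C ranks highest here because both A and B point to it.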
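Several of the common measurements listed under Reason no. 2 can be computed directly from log records. The sketch below assumes each record is an (ip, path, user_agent) tuple, counts distinct IPs as visitors (subject to the same IP-sharing caveats as session identification), and flags robots with a simple user-agent substring heuristic; the record layout and the marker list are hypothetical.

```python
from collections import Counter

ROBOT_MARKERS = ("bot", "crawler", "spider")  # assumed heuristic, not exhaustive

def is_robot(user_agent: str) -> bool:
    """Treat a hit as robot activity if its user agent contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)

def summarize(log: list[tuple[str, str, str]]) -> dict:
    """log: list of (ip, path, user_agent) tuples."""
    human = [entry for entry in log if not is_robot(entry[2])]
    return {
        "visitors": len({ip for ip, _, _ in human}),           # distinct IPs
        "top_pages": Counter(path for _, path, _ in human).most_common(3),
        "robot_hits": len(log) - len(human),
    }

if __name__ == "__main__":
    sample = [
        ("1.1.1.1", "/home", "Mozilla/5.0"),
        ("1.1.1.1", "/products", "Mozilla/5.0"),
        ("2.2.2.2", "/home", "Mozilla/5.0"),
        ("3.3.3.3", "/home", "Googlebot/2.1"),
    ]
    print(summarize(sample))
```

Separating robot hits first keeps crawlers from inflating the visitor and page counts.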
Lecture # 41 DTS
Data Transformation Services (DTS) is a set of graphical tools and programmable objects that allow you to extract, transform, and consolidate data from disparate sources into single or multiple destinations.
DTS includes
o Data Import/Export Wizard
o DTS Designer
o DTS Query Designer
o Package Execution Utilities
DTS Basics
o DTS Packages
o DTS Tasks
o DTS Transformations
o DTS Package Workflows
o DTS Tools
Meta Data
6. Select a table
Execution of a package
1. Connection with the source (text file) is established
2. Connection with the destination (MS SQL Server) is established
3. A new database is created at the destination
4. A new table is created
5. Data is extracted from the source
6. Data is loaded to the destination
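The six execution steps can be mirrored in a short Python sketch. DTS itself is configured through its designer and object model, which is not reproduced here; instead, the standard csv module stands in for the text-file source and SQLite stands in for the MS SQL Server destination. Both substitutions, the table name "imported", and the function name run_package are assumptions for illustration only.

```python
import csv
import os
import sqlite3
import tempfile

def run_package(source_path: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Copy a comma-delimited text file into a new table named 'imported'.

    Column names come from the file's header row and are assumed to be
    trusted; every column is created as TEXT for simplicity."""
    # 1. Connection with the source (text file) is established
    with open(source_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        # 2-3. Connection with the destination is established; SQLite creates
        # the database on first connect (":memory:" keeps it in RAM)
        conn = sqlite3.connect(db_path)
        # 4. A new table is created
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f"CREATE TABLE imported ({columns})")
        # 5. Data is extracted from the source
        rows = list(reader)
    # 6. Data is loaded to the destination
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO imported VALUES ({placeholders})", rows)
    conn.commit()
    return conn

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     newline="", encoding="utf-8") as f:
        f.write("id,name\n1,Ali\n2,Sara\n")  # hypothetical sample data
        path = f.name
    conn = run_package(path)
    print(conn.execute("SELECT * FROM imported").fetchall())
    conn.close()
    os.remove(path)
```

A real DTS package would add transformations between steps 5 and 6; here the rows are loaded unchanged.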