CS614 - Helping Material

The document discusses three topics: 1. Data Transformation Services (DTS) allow organizations to extract, transform, and consolidate data from different sources into single or multiple destinations to improve decision making. 2. Kimball's data warehouse lifecycle approach includes project planning, requirements definition, technology selection, dimensional modeling, and analytic application development. 3. Major issues in data cleansing for an agriculture data warehouse arose due to manual data handling and processing by different groups, including errors during hand recordings, typing, photocopying, and data entry.

Uploaded by

Azhar Khan
Copyright
© Attribution Non-Commercial (BY-NC)

Purpose of DTS Services (5)

Many organizations need to centralize data to improve corporate decision-making. However, their data may be stored in a variety of formats and in different locations. Data Transformation Services (DTS) address this vital business need by providing a set of tools that let you extract, transform, and consolidate data from disparate sources into single or multiple destinations supported by DTS connectivity.

Kimball's Approach
DWH Lifecycle: Key Steps
1. Project Planning
2. Business Requirements Definition
3. Parallel Tracks
   3.1 Lifecycle Technology Track
       3.1.1 Technical Architecture
       3.1.2 Product Selection
   3.2 Lifecycle Data Track
       3.2.1 Dimensional Modeling
       3.2.2 Physical Design
       3.2.3 Data Staging Design and Development
   3.3 Lifecycle Analytic Applications Track
       3.3.1 Analytic Application Specification
       3.3.2 Analytic Application Development
4. Deployment
5. Maintenance

Issues faced in data cleansing of AgriDWH (3)


Major issues of data cleansing arose due to data processing and handling at four levels by different groups of people:
1. Hand recordings by the scouts at the field level.
2. Typing the hand recordings into data sheets at the DPWQCP office.
3. Photocopying of the typed sheets by DPWQCP personnel.
4. Data entry or digitization by hired data entry operators.
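Errors introduced at these four levels (misreadings, typos, re-keying) are typically caught in cleansing by matching incoming values against a reference list of valid values. A minimal sketch using Python's difflib; the pest names are hypothetical examples, not actual AgriDWH domain values:

```python
import difflib

# Hypothetical reference list of valid values; the real AgriDWH
# domain vocabulary is not given in the course material.
VALID_PESTS = ["whitefly", "bollworm", "jassid", "thrips", "aphid"]

def clean_value(raw, valid=VALID_PESTS, cutoff=0.8):
    """Map a hand-recorded or re-keyed value to its closest valid spelling.

    Returns the matched valid value, or the normalized raw value unchanged
    if nothing is similar enough (a real pipeline would flag such rows
    for manual review).
    """
    norm = raw.strip().lower()
    match = difflib.get_close_matches(norm, valid, n=1, cutoff=cutoff)
    return match[0] if match else norm

# Typing and data-entry slips like these are exactly what levels 2-4 introduce.
print(clean_value("Whiteflly"))  # -> whitefly
print(clean_value("bolworm"))    # -> bollworm
```

The cutoff is a tunable trade-off: too low and distinct values get merged, too high and genuine typos slip through.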

Clustered Index: Issues

Works well when a single index can be used for the majority of table accesses. Selectivity requirements for making use of a clustered index are much less stringent than for a non-clustered index, typically by an order of magnitude, depending on row width.

Drawbacks:
1. High maintenance cost to keep the sorted order, or frequent reorganizations to recover the clustering factor.
2. The optimizer must keep track of the clustering factor (which can degrade over time) to determine the optimal execution plan.
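The "clustering factor" mentioned above can be illustrated with a toy calculation. This is not any vendor's actual statistic, only a sketch of the idea: scan the rows in index-key order and count how often the storage page changes — the fewer the page switches, the better clustered the table is on that key.

```python
def clustering_factor(rows):
    """rows: list of (index_key, page_id) pairs.

    Returns the fraction of key-order steps that jump to a different page.
    Near 0 means the table is well clustered on the key; near 1 means the
    clustered order has degraded and a reorganization may be needed.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    switches = sum(1 for a, b in zip(ordered, ordered[1:]) if a[1] != b[1])
    return switches / max(len(rows) - 1, 1)

# Freshly built clustered index: consecutive keys share pages.
good = [(k, k // 3) for k in range(9)]   # pages 0,0,0,1,1,1,2,2,2
# After many inserts/updates: keys scattered across pages.
bad = [(k, k * 7 % 3) for k in range(9)]  # pages 0,1,2,0,1,2,0,1,2
print(clustering_factor(good))  # -> 0.25
print(clustering_factor(bad))   # -> 1.0
```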

Question No: 31 (Marks: 10)
Shared-nothing RDBMS architecture requires a static partitioning of each table in the database. How do you perform the partitioning?
1. Hash partitioning
2. Key range partitioning
3. List partitioning
4. Round-robin
5. Combinations (Range-Hash & Range-List) (p. 211)

How are time-contiguous log entries and HTTP's secure sockets layer used for user session identification? What are the limitations of these techniques?

Time-contiguous Log Entries: Limitations
This method breaks down for visitors from large ISPs, because different visitors may reuse dynamically assigned IP addresses over a brief time period, and different IP addresses may be used within the same session for the same visitor.
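Returning to Question 31's partitioning schemes: the four basic methods (hash, key range, list, round-robin) can be sketched in a few lines of Python. The partition count, range bounds, and value lists below are illustrative assumptions, not anything the course material specifies:

```python
import hashlib

NUM_PARTS = 4  # assumed number of nodes in the shared-nothing cluster

def hash_partition(key):
    # Stable hash so every node maps the same key to the same partition.
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % NUM_PARTS

def range_partition(key, bounds=(100, 200, 300)):
    # Key ranges: (-inf,100) -> 0, [100,200) -> 1, ... (bounds assumed).
    for i, b in enumerate(bounds):
        if key < b:
            return i
    return len(bounds)

def list_partition(value, lists={"north": 0, "south": 1}):
    # Explicit value lists (assumed example regions).
    return lists[value]

def round_robin_partition(row_number):
    # Spreads rows evenly regardless of content: good for load balance,
    # useless for partition pruning.
    return row_number % NUM_PARTS

print(range_partition(150))       # -> 1
print(round_robin_partition(10))  # -> 2
```

Range-Hash and Range-List combinations simply apply one scheme inside each partition produced by the other.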

This approach also presents problems with browsers that sit behind some firewalls.

HTTP's Secure Sockets Layer (SSL): Limitations
The downside to using this method is that, to track the session, the entire information exchange needs to be in high-overhead SSL, and the visitor may be put off by the security advisories that pop up in certain browsers. Each host server must also have its own unique security certificate.
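The time-contiguous method can be sketched as a small grouping pass over the log: entries from the same IP belong to one session until the inactivity gap grows too large. The 30-minute gap is a commonly used convention, not something fixed by the technique; the limitations above (shared and reassigned IPs) follow directly from keying on IP alone.

```python
SESSION_GAP = 30 * 60  # seconds of inactivity that ends a session (common choice)

def sessionize(log_entries):
    """log_entries: iterable of (ip, timestamp_seconds), assumed sorted by time.

    Groups time-contiguous entries from the same IP into sessions.
    """
    sessions = []
    open_session = {}  # ip -> index of that ip's open session in `sessions`
    for ip, ts in log_entries:
        idx = open_session.get(ip)
        if idx is not None and ts - sessions[idx][-1][1] <= SESSION_GAP:
            sessions[idx].append((ip, ts))   # still within the gap: same session
        else:
            sessions.append([(ip, ts)])      # gap exceeded or first hit: new session
            open_session[ip] = len(sessions) - 1
    return sessions

log = [("1.2.3.4", 0), ("1.2.3.4", 600), ("1.2.3.4", 600 + 31 * 60)]
print(len(sessionize(log)))  # -> 2: the 31-minute gap splits the visit
```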

Question No: 31 (Marks: 2)
What are the two extremes for technical architecture design? Which one is better?

Theoretically there can be two extremes, i.e. free space and free performance. If storage is not an issue, then just pre-compute every cube at every unique combination of dimensions at every level, as it does not cost anything; this results in maximum query performance. In reality, however, this implies a huge cost in disk space and in the time for constructing the pre-aggregates. In the other case, where performance is free, i.e. infinitely fast machines and an infinite number of them, there is no need to build any summaries, meaning zero cube space and zero pre-calculations; in reality, building no summaries gives the minimum performance boost, which is tolerable only in the presence of infinite performance.
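The space side of this trade-off grows multiplicatively with the dimensions. A minimal sketch of counting the pre-computed aggregates ("cuboids") in the free-space extreme; the hierarchy depths are chosen purely for illustration:

```python
from math import prod

def cuboid_count(levels_per_dim):
    """Number of distinct aggregates if every combination of hierarchy
    levels is pre-computed. Each dimension contributes its level count
    plus one for the fully aggregated ("all") level.
    """
    return prod(l + 1 for l in levels_per_dim)

# Assumed example: time (year/quarter/month/day = 4 levels),
# product (category/item = 2), geography (region/city/store = 3).
print(cuboid_count([4, 2, 3]))  # -> 5 * 3 * 4 = 60 pre-computed aggregates
```

Adding one more dimension with even a shallow hierarchy multiplies the count again, which is why neither extreme is used in practice and a subset of summaries is chosen instead.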

Question No: 32 (Marks: 2)
What is the value validation process?
Value validation is the process of ensuring that each value that is sent to the data warehouse is accurate.

Question No: 35 (Marks: 3)
Why is building a data warehouse a challenging activity? What are the three broad categories of data warehouse development methods?
1. Waterfall model
2. RAD model
3. Spiral model
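Value validation can be sketched as a per-record rule check run before loading. The column names and rules below are invented for illustration; real warehouses also validate code lists, formats, and referential integrity:

```python
def validate_record(record, rules):
    """Return a list of error strings; an empty list means the record is clean.

    `rules` maps column -> (expected_type, (min, max) or None).
    """
    errors = []
    for col, (typ, bounds) in rules.items():
        val = record.get(col)
        if not isinstance(val, typ):
            errors.append(f"{col}: expected {typ.__name__}, got {val!r}")
        elif bounds and not (bounds[0] <= val <= bounds[1]):
            errors.append(f"{col}: {val} outside {bounds}")
    return errors

# Hypothetical rules for a sales fact row.
RULES = {"quantity": (int, (0, 10_000)), "unit_price": (float, (0.0, 1e6))}

print(validate_record({"quantity": 5, "unit_price": 9.99}, RULES))   # -> []
print(validate_record({"quantity": -3, "unit_price": "n/a"}, RULES)) # two errors
```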

Question No: 36 (Marks: 3)
What are the three fundamental reasons for warehousing Web data?

1. Web data is unstructured and dynamic; keyword search is insufficient.
2. The Web log contains a wealth of information, as it is a key touch point.
3. Shift from a distribution platform to a general communication platform.

Question No: 37 (Marks: 3)
What types of operations are provided by MS DTS?
1. Providing connectivity to different databases
2. Building queries graphically
3. Extracting data from disparate databases
4. Transforming data
5. Copying database objects
6. Providing support for different scripting languages (by default VB-Script and Java-Script)

Reasons for web warehousing:
1. Searching the web (web mining).
2. Analyzing web traffic.
3. Archiving the web.

Three major types of searches:
1. Keyword-based search
2. Querying deep Web sources
3. Random surfing

Drawbacks of traditional web searches:
1. Limited to keyword-based matching.
2. Cannot distinguish between the contexts in which a link is used.
3. Coupling of files has to be done manually.

Why web warehousing - Reason no. 1?
Web data is unstructured and dynamic; keyword search is insufficient. To increase usage, the web must be made more comprehensible. Data mining is required for understanding the web: it is used to rank and find high-quality pages, thus making the most of search time.

Why web warehousing - Reason no. 2?

The Web log contains a wealth of information, as it is a key touch point: every customer interaction is recorded. The success of an email or other marketing campaign can be measured by integrating the log with other operational systems. Common measurements are: number of visitors, number of sessions, most requested pages, robot activity, etc.

Why web warehousing - Reason no. 3?
Shift from a distribution platform to a general communication platform: new uses from e-government to e-commerce and new forms of art, etc., between different levels of society. Thus the web is worth archiving for use in several other projects, such as snapshots preserving time.
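Two of the common measurements named under Reason no. 2 (number of visitors, most requested pages) can be computed directly from parsed log entries. The log tuples below are illustrative, and equating distinct IPs with distinct visitors carries the session-identification caveats discussed earlier:

```python
from collections import Counter

def log_measurements(entries):
    """entries: (ip, page) pairs parsed from a web server log.

    Returns (distinct visitor count approximated by distinct IPs,
    the two most requested pages with their hit counts).
    """
    visitors = len({ip for ip, _ in entries})
    top_pages = Counter(page for _, page in entries).most_common(2)
    return visitors, top_pages

log = [("1.1.1.1", "/home"), ("1.1.1.1", "/buy"),
       ("2.2.2.2", "/home"), ("3.3.3.3", "/home")]
print(log_measurements(log))  # -> (3, [('/home', 3), ('/buy', 1)])
```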

Lecture # 41 DTS
Data Transformation Services (DTS) is a set of graphical tools and programmable objects that allow you to extract, transform, and consolidate data from disparate sources into single or multiple destinations.

DTS includes

1. Data Import/Export Wizard
2. DTS Designer
3. DTS Query Designer
4. Package Execution Utilities

DTS Basics
1. DTS Packages
2. DTS Tasks
3. DTS Transformations
4. DTS Package Workflows
5. DTS Tools

Meta Data

DTS Packages: (4 Operations) Packages can be:


1. Edited 2. Password protected 3. Scheduled for execution 4. Retrieved by version

DTS Package: Contents


DTS Package is an organized collection of
1. Connections
2. DTS tasks
3. DTS transformations
4. Workflows

DTS Package: Creating


Package can be created by one of the following three methods:
1. Import/Export Wizard
2. DTS Designer
3. Programming DTS applications

The slide shows the environment of DTS Designer. In the designer we can see four windows: A. Connection toolbar, B. Task toolbar, C. General toolbar, D. Design area. The Import/Export Wizard is sufficient to perform all the above-mentioned tasks easily, so we will use the wizard, as it provides good functionality in this scenario.

Seven Steps to Extract Data Using Wizard


1. Launch the Wizard
2. Choose a Data Source
3. Choose a Database
4. Specify the Destination
5. Choose the Destination Database (specify the file format in case of text files; select an existing database or create a new one)
6. Select a Table (select an existing table or create a new one)
7. Finalize and schedule the package

Execution of a package
1. Connection with the source (text file) is established.
2. Connection with the destination (MS-SQL Server) is established.
3. A new database at the destination is created.
4. A new table is created.
5. Data is extracted from the source.
6. Data is loaded to the destination.
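DTS itself is a graphical SQL Server tool, but the six execution steps above can be mirrored in a few lines of Python, with sqlite3 standing in for MS-SQL Server and an in-memory string standing in for the source text file (both substitutions are ours, not part of DTS):

```python
import csv
import io
import sqlite3

source_text = "id,name\n1,widget\n2,gadget\n"           # step 1: source "connection"
dest = sqlite3.connect(":memory:")                       # step 2: destination connection
dest.execute("CREATE TABLE products (id INTEGER, name TEXT)")  # steps 3-4: new db/table
rows = list(csv.DictReader(io.StringIO(source_text)))    # step 5: extract from source
dest.executemany("INSERT INTO products VALUES (?, ?)",   # step 6: load to destination
                 [(r["id"], r["name"]) for r in rows])
print(dest.execute("SELECT COUNT(*) FROM products").fetchone()[0])  # -> 2
```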

Why Correct Before Transform?


One reason is obvious, the other less so. If the SQL package encounters an error, it rolls back all transactions. And sometimes the error reporting is ambiguous.

"OLE" originally stood for

Object Linking and Embedding
