Data Mining Primer
By: Arlene Zaima, Data Mining Marketing Manager, Teradata
Contributor: James Kashner, CTO, Teradata Data Mining Lab
Executive Summary
By now, you've probably heard or read about the rewards that data mining can bring to your business. But very little has been written to explain the challenges facing many Information Technology (IT) organizations as they try to make data mining part of their business intelligence operations. This paper explores data mining from the IT perspective, giving a quick overview of data mining technology, technical challenges, and solutions for implementing successful data mining projects.
This white paper explains data mining in terms that can be understood by data warehouse professionals. These explanations include:
> How data mining is used for business advantage today
> The integral relationship between data mining and data warehousing
> The challenges that may be encountered with data mining
> The details about how to get started with data mining
Data Mining Makes Its Way to the Business World
Since the mid-1980s, data mining has been very effective in select and focused situations such as medical diagnosis, scientific research, and behavioral profiling. In the past ten years, data mining technology has journeyed from the scientific and academic worlds into the business world, where it adds a new dimension of predictive
[Table: Analytic Application; Business Question; Business Value; OLAP; Data Mining]
Business problems that lend themselves to data mining are predictive and descriptive in nature. Predictive models are used to predict an outcome, referred to as the dependent or target variable, based on the value of other variables in the data set. For example, a predictive model could determine the likelihood that a customer will purchase a product based on her income, number of children, current product ownership, or debt. Predictive techniques build models based on a training set of data with a known outcome, such as prior buying patterns. The algorithm analyzes the values of all input variables and identifies which variables are significant as predictors for a desired outcome.
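The purchase-likelihood example above can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the training rows, the feature scaling (income and debt in units of $10,000), and the choice of logistic regression trained by gradient descent are all assumptions made for the example.

```python
import math

# Hypothetical training set with a known outcome (1 = purchased, 0 = did not).
# Features: income ($10k), number of children, owns related product (0/1), debt ($10k).
FEATURE_NAMES = ("income", "children", "owns_product", "debt")
TRAINING_ROWS = [
    ((9.0, 2, 1, 1.0), 1),
    ((8.5, 1, 1, 0.5), 1),
    ((7.0, 3, 0, 1.5), 1),
    ((6.5, 2, 1, 0.8), 1),
    ((3.0, 0, 0, 2.5), 0),
    ((2.5, 1, 0, 3.0), 0),
    ((4.0, 0, 1, 4.0), 0),
    ((3.5, 2, 0, 3.5), 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def purchase_likelihood(weights, bias, features):
    """Score one customer: estimated probability of the target outcome."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, epochs=2000, learning_rate=0.05):
    """Fit one weight per input variable by stochastic gradient descent."""
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, outcome in rows:
            error = outcome - purchase_likelihood(weights, bias, features)
            bias += learning_rate * error
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights, bias


weights, bias = train_logistic(TRAINING_ROWS)

# Score two new (hypothetical) customers with the trained model.
likely_buyer = purchase_likelihood(weights, bias, (8.0, 1, 1, 1.0))
unlikely_buyer = purchase_likelihood(weights, bias, (3.0, 1, 0, 3.0))
print(dict(zip(FEATURE_NAMES, weights)), likely_buyer, unlikely_buyer)
```

The sign and size of each learned weight give only a rough indication of which variables drive the prediction, since the inputs here are not standardized.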
[Figure labels: Develop; Design, Train Model; Model Test and Validate; Deploy; Deploy Reports, Application; Knowledge; Model Integration; Project Management; Business Problem Definition; Data Preparation]
the graphical Windows interface. In some cases, Teradata Warehouse Miner breaks the algorithms into steps so that the steps that require data access are performed via SQL, while other steps requiring numerical processing are handled by the Teradata Warehouse Miner client. Teradata Warehouse Miner processes functions in the most efficient manner, leveraging the parallelism of Teradata Database whenever possible.
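The split between SQL data access and client-side numerical processing can be illustrated with a small sketch. It is not Teradata Warehouse Miner's actual implementation: Python's built-in sqlite3 module stands in for the database, the customer table and its columns are hypothetical, and a one-variable linear regression is used only because its fitting step is short.

```python
import sqlite3

# sqlite3 stands in for the data warehouse; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (income REAL, spend REAL)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
)

# Step 1 (data access, in SQL): the database scans the rows and returns
# only five aggregate values, however large the table is.
n, sum_x, sum_y, sum_xy, sum_xx = conn.execute(
    "SELECT COUNT(*), SUM(income), SUM(spend), "
    "SUM(income * spend), SUM(income * income) FROM customer"
).fetchone()

# Step 2 (numerical processing, on the client): solve the least-squares
# equations for slope and intercept from those aggregates alone.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n
print(slope, intercept)
```

Only the aggregates cross the network in this arrangement; the detail rows never leave the database.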
[Figure labels: Teradata Database; Analytical Data; Results]
Traditionally, data mining technologies require that you move data out of the centralized data warehouse and into proprietary or flat file structures. With this technique, many copies of the data will reside in various analytical servers or data marts. Imagine how much time it could take to create 20 samples of a terabyte-sized database, extract them into different locations, convert them into different formats and finally, import them into applications. Can you afford the time and inefficiencies of this method? Teradata Warehouse Miner's analytic operations can be performed on the data within the Teradata Database. Results from the analysis are stored within your enterprise data warehouse providing access to all users as necessary. (See Figure 6.)
sophisticated set of analytical algorithms and graphical interfaces. However, they fail to provide a robust set of data visualization and data preprocessing functions. Since the bulk of the data mining process is spent exploring and conditioning data, you need tools that will facilitate data exploration, visualization, transformation, and data management. Tools must also process large data volumes and provide an interface that enables integration of analytical models into business applications.
Data Mining with Teradata
Data warehouse solution providers, such as Teradata, a division of NCR, fully understand the data mining challenges and issues facing companies today. Teradata's in-database data mining approach sets us apart from other data mining solution providers in the industry. Our centralized solution permits users to do data exploration, data preprocessing, analytic modeling, scoring, and deployment all within the database using SQL, taking advantage of Teradata Database's unlimited scalability and exceptional performance. Performing data mining in the database streamlines the process by eliminating data movement and the overhead associated with managing the data and the systems involved in a distributed environment. In-database mining also reduces data redundancy and improves data reliability.
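As a rough sketch of in-database scoring, the example below applies an already trained model to every customer with a single SQL statement and stores the results in a table beside the source data, so no rows are extracted. The assumptions are loud ones: sqlite3 stands in for the data warehouse, the table names and columns are hypothetical, and the linear model coefficients are invented for illustration.

```python
import sqlite3

# sqlite3 stands in for the data warehouse; all names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, income REAL, debt REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, 9.0, 1.0), (2, 3.0, 3.0), (3, 6.0, 0.5)],
)

# Invented coefficients of an already trained linear scoring model.
INTERCEPT, W_INCOME, W_DEBT = -2.0, 0.5, -0.4

# Score every customer inside the database and keep the results there,
# where other users and applications can query them.
conn.execute(
    "CREATE TABLE customer_score (customer_id INTEGER PRIMARY KEY, score REAL)"
)
conn.execute(
    "INSERT INTO customer_score "
    "SELECT customer_id, ? + ? * income + ? * debt FROM customer",
    (INTERCEPT, W_INCOME, W_DEBT),
)

scores = dict(conn.execute("SELECT customer_id, score FROM customer_score"))
print(scores)
```

Because the scores live in an ordinary table, any tool that can issue SQL against the warehouse can join them to other customer data.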
Not Just Better, but the Best: Our Benchmarks Prove It!
Teradata and NCR are registered trademarks of NCR Corporation. Windows is a registered trademark of Microsoft Corporation. NCR continually enhances products as new technologies and components become available. NCR, therefore, reserves the right to change specifications without prior notice. All features, functions and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or Teradata.com for the latest information. © 2004 NCR Corporation Dayton, OH U.S.A. All Rights Reserved.