Crivat B., MacLennan J. - Detect Anomalies in Excel Spreadsheets


Access Advisor :: Detect Anomalies in Excel Spreadsheets -- Microsoft Excel Microsoft SQL Server Data Integration Database Da...


SQL SERVER DEVELOPMENT

Detect Anomalies in Excel Spreadsheets
Use SQL Server 2005 Data Mining inside Excel.

By Bogdan Crivat and Jamie MacLennan, Microsoft SQL Server Data Mining

ARTICLE INFO
Access Advisor
Web issue: 2004 week 36
Print issue: October 2004
Length: 6.25 pages
Doc #14413
Files for this article are on this issue's Professional Resource CD.
File description: Example to find anomalies in spreadsheets (763,329 bytes).

Microsoft Excel becomes more and more versatile with each release and
solves a wider variety of business needs. Its flexibility and
programmability let you integrate different technologies to better
understand and process the data in your spreadsheets. From its inception
in SQL Server 2000, Microsoft's data mining solution has provided a
programming model to access data mining technologies, which has
expanded with SQL Server 2005. This article shows you how these two
technologies can work together seamlessly.

[…] workbook after a few mouse clicks. The code that accompanies this
article is an Excel add-in you can install in Microsoft Excel and use for
detecting anomalies in any Excel worksheet.

You don't need any previous knowledge of data mining or programming to
use the solution in this article. However, the last part of the article (the
"Add-in details" section) contains details about the solution
implementation. The Excel add-in is developed with Visual Basic for
Applications (VBA) and uses Data Mining eXtensions (DMX) for SQL
statements in modeling the data and detecting the anomalies. Some
knowledge of these technologies might help you understand and further
extend the solution. The "Add-in details" section also contains a brief
description of the DMX language and the location of the specification.

http://accessvbsqladvisor.com/doc/14413 09/03/2005 17:24:34

Requirements

We tested the Excel add-in presented here with the SQL Server 2005 Beta
2 release, which Microsoft shipped to more than 200,000 Microsoft
Developer Network (MSDN) subscribers. The Microsoft Analysis Services
2005 server, part of which is Microsoft Data Mining, is included in this beta
release. For the add-in to work, you have to install the Connectivity
Components included with SQL Server 2005 Beta 2 on the machine.

The add-in also requires that you have permissions to create a temporary
file on the C:\ drive, although you can change the code to use any folder
where the users have permissions.

The problem
Real, meaningful data usually contains patterns. You can describe these
patterns in terms of relations between various column values. An example
is "IF the value of the Age column is smaller than 18 THEN the value of
the Occupation column IS LIKELY TO BE Student." Sometimes you can
detect a simple rule easily just by visually inspecting the data or by using
common sense. A spreadsheet entry with 10 as the value of the Age
column and Lawyer as the value of the Occupation column usually raises
eyebrows and will most likely be treated as an anomaly.

If the spreadsheet has a small number of columns (we also use the term
"attributes" for columns), data visualization tools (such as the graph
component of Excel) are helpful. With scatter plots (such as the X-Y
graphs in Excel), you can usually detect simple relations between two or
three columns just by visually inspecting the spreadsheet.

However, the complexity of these patterns grows with the dimensionality
of the data: the more columns a spreadsheet has, the more complex the
rules describing the patterns tend to be. For example:

"IF
the value of the Age column is greater than 21
BUT smaller than 25 AND
the value of the Credit Score column is smaller than 720
AND
the value of the Income column is greater than 50000 BUT
the value of the Number of Children column IS NOT 0
THEN
the value of the Home Ownership column IS LIKELY TO BE
Rent"

Now, such a rule isn't easy to find. A good understanding of the data
always helps, but visually inspecting a few thousand rows in a
spreadsheet is a daunting task even for a person familiar with the
columns.

The problem is even more complex because the relations between two
columns may change completely for different values in a third column. For
example, the relations between the number of children and the home
ownership status change a lot with age. For instance, regardless of the
number of children, people at the beginning of their careers are less likely
to own a home than seasoned professionals. Therefore, you must
consider all the possible values of a column before attempting to use that
column in a rule.

The anomaly detection process can only start when you've completely
determined the set of rules, and it requires reading all the data again and
verifying, for each row, any rule that might apply.

The problem treated in this article is finding the values that are anomalies
for a specified column (we refer to this column as "the target column"),
that is, those spreadsheet entries that don't abide by the rules that, in
general, relate the values in the target column to the values in the other
columns. After you resolve this problem, the spreadsheet user can take
action and clean the spreadsheet in various ways, depending on his
purpose. For example, he can:

Recheck the data for the entries that contain anomalies
Eliminate entries considered abnormal
Correct the abnormal entries by changing the anomaly values to the
ones suggested by the rules

For this article, the spreadsheet of interest looks like figure 1. The
spreadsheet has six columns: Student ID, Gender, Parent Income, IQ
(intelligence quotient), Parent Encouragement, and College Plans. These
are the values in each column:

Student ID contains a unique identifier (a number) for each entry in
the spreadsheet. In terms of relational databases, this column is a
key: It can help you connect this spreadsheet to another one
containing the name and address for each student, or to one
containing the exam grades for each course a student has taken.
The values in this column depend only on the order in which the
student information was entered in the spreadsheet. They're in no
way related to the other columns, so you can ignore this column.
The Gender column contains demographic information about each
student. The values are Male and Female.
The Parent Income column contains the income information for the
parents of each student. It contains values between 5,000 and
74,900.
The IQ column contains the intelligence quotient for each student.
The values are between 60 and 140.
The Parent Encouragement column describes whether the parents
encourage a student to continue his education through college. It
contains values of Encouraged and Not Encouraged.
The College Plans column shows whether a student intends to go to
college.

Here's the problem we want to solve: Who are the students whose college
plans don't fit their potential?

A first guess would indicate that students with:

high IQ
parental encouragement
and high-income parents

would plan to go to college. Therefore any student who didn't follow this
pattern would be an anomaly. But are these the only ones? Are there
other cases that don't fit the general pattern?

The problem becomes even more difficult when the user of the
spreadsheet doesn't fully understand the data.

How Data Mining can help

You can think of SQL Server 2005 Data Mining as a set of technologies
that deal with automatically discovering meaning in data, as opposed to
imperative technologies such as query languages, where the user explicitly
asks for certain properties of the data. This isn't really a definition, but
this explanation describes how we use data mining in this article. You'll
see how data mining can help in finding rules and anomalies in your
spreadsheet.

How can data mining find rules in my spreadsheet?
For Microsoft SQL Server 2005 Data Mining, data is always represented as
a set of input cases. These input cases share a set of attributes. Generally,
each case has a value for each attribute. However, for some cases, certain
attributes may be missing. For the Excel spreadsheet, each row is an input
case, with column values acting as attributes.

In a typical usage scenario, you first train a Data Mining engine on all
the data or just a subset. During training, the engine learns the rules
and patterns in that data. In a second phase, you apply the rules for
various purposes, such as detecting anomalies in your spreadsheet.

You can reformulate the problem of detecting anomalies in terms of data
mining technology: You have to find the rules and patterns behind the
columns in the spreadsheet.

With these rules at hand, for each row in the spreadsheet, you decide
what's the most likely value for the College Plans column based on the
values in the other columns. In other words, for each student whose IQ,
Parent Income, and Parent Encouragement are known, you have to decide
whether he's likely to continue with college education. Then, if the likely
college plans don't match the actual college plans, you treat that student
as an anomaly from the set of discovered rules.
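
The comparison step described above can be sketched in a few lines. This is only an illustration in Python (the add-in itself is written in VBA and DMX): the predict_college_plans function is a hypothetical stand-in for the trained model, its thresholds are invented, and "Does not plan to attend" is an assumed label for the negative value of the College Plans column.

```python
def predict_college_plans(row):
    """Hypothetical stand-in for the trained model (invented thresholds)."""
    if row["IQ"] >= 100 and row["Parent Encouragement"] == "Encouraged":
        return "Plans to attend"
    return "Does not plan to attend"


def find_anomalies(rows, predict, target="College Plans"):
    """Return (row, expected) pairs whose actual target value differs
    from the value the model considers most likely."""
    anomalies = []
    for row in rows:
        expected = predict(row)
        if row[target] != expected:
            anomalies.append((row, expected))
    return anomalies


# Three made-up students; only student 2 contradicts the prediction.
students = [
    {"Student ID": 1, "IQ": 120, "Parent Encouragement": "Encouraged",
     "College Plans": "Plans to attend"},
    {"Student ID": 2, "IQ": 125, "Parent Encouragement": "Encouraged",
     "College Plans": "Does not plan to attend"},
    {"Student ID": 3, "IQ": 85, "Parent Encouragement": "Not Encouraged",
     "College Plans": "Does not plan to attend"},
]

for row, expected in find_anomalies(students, predict_college_plans):
    print(f"Student {row['Student ID']}: expected {expected!r}, "
          f"found {row['College Plans']!r}")
```

Keeping the predictor as a separate argument mirrors the two phases in the text: the rules are learned once, then applied to every row.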

This kind of problem, in terms of data mining, is a classification problem.
You're classifying each row based on the student's plans to go to college.
SQL Server 2005 Data Mining provides a few algorithms for solving
classification problems. The Microsoft_Decision_Trees algorithm is a
particularly good fit for your spreadsheet problem because it's proven to
find good rules with high accuracy, it's highly optimized for performance,
and it describes the rules in an intuitive form.

For a specific attribute, the Microsoft_Decision_Trees algorithm can
determine the factors that most influence the value of that column.
Furthermore, it's able to clearly describe the relative importance of these
factors.

Let's assume, for now, the factors affecting a student's college plans are
(in the descending order of importance) IQ and the parents' income. The
Microsoft_Decision_Trees algorithm will find and organize rules like those
shown in table 1.

You can think of this structure of rules as a tree because all the students
are first divided based on the most important attribute (here, IQ). Then,
each branch is divided again based on the most important factor for that
subset of data. In table 1, we assumed the parents' income to be the
most important factor for those students with high IQ.

Here's an example of the format of a rule the Microsoft_Decision_Trees
algorithm discovers:

IF "IQ >= 100" AND "Parent's income > 20000" THEN (the student)
MOST LIKELY "Plans to attend"
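
To make the tree structure concrete, here is a minimal Python sketch of how a rule of this form can be read off a path from the root to a leaf. The tree is built by hand for illustration: its thresholds, the "Does not plan to attend" label, and the dictionary layout are assumptions, not the internal representation used by Microsoft_Decision_Trees.

```python
# A tiny hand-built tree; thresholds and leaf values are illustrative only.
tree = {
    "description": "IQ >= 100",
    "test": lambda r: r["IQ"] >= 100,
    "yes": {
        "description": "Parent Income > 20000",
        "test": lambda r: r["Parent Income"] > 20000,
        "yes": {"leaf": "Plans to attend"},
        "no": {"leaf": "Does not plan to attend"},
    },
    "no": {"leaf": "Does not plan to attend"},
}


def classify(node, row, path=()):
    """Walk from the root to a leaf and return the predicted value
    together with the list of conditions met along the way."""
    if "leaf" in node:
        return node["leaf"], list(path)
    if node["test"](row):
        return classify(node["yes"], row, path + (node["description"],))
    return classify(node["no"], row,
                    path + ("NOT (" + node["description"] + ")",))


value, conditions = classify(tree, {"IQ": 110, "Parent Income": 35000})
print("IF " + " AND ".join(conditions) + " THEN MOST LIKELY " + value)
```

Each leaf corresponds to one rule, and the conditions appear in the order in which the tree splits the data, most important attribute first.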

How likely is "most likely"?

Before moving further, let's see how much you can trust these rules. The
Microsoft_Decision_Trees algorithm never generates rules that aren't
reflected in the data. However, some rules are more important than
others, and some rules are to be trusted more than others.

Clearly, a rule that applies to 1,000 students deserves more consideration
than a rule that applies to only two students. So, a first measure of
confidence is the support: the number of students (spreadsheet rows) for
whom the rule applies, or, in data mining terminology, the number of
cases that support this rule.

Now, let's take another look at the rule above. What does it mean by
"MOST LIKELY Plans to attend"? How likely is "MOST LIKELY"? Let's
assume, in the context of the rule above, that the support is 100. This
means that, in the whole spreadsheet, we found 100 students who have
an IQ greater than or equal to 100 and parents with income greater than
$20,000. Now, if all these students plan to attend college, this is a strong
rule; there seems to be no exception. However, this hardly happens in real
life. You usually end up with something like this: 82 of the students plan
to attend college, but 17 don't plan to attend college, and one didn't
mention any plans.

Now, you can define most likely college plans as those plans shared by
most of the students who match this rule. This is "Plans to attend", with
82 votes. The likelihood of such plans is 82 out of 100, or 82 percent
(0.82). This value is the confidence (or probability) of the rule.

Microsoft_Decision_Trees finds both the support and the confidence for
each rule and provides access to this information through the SQL-like
DMX language.

Integration with Excel

Microsoft Data Mining provides an environment for describing the data to
be mined and for intuitively displaying the rules and patterns you found.
However, we want to solve this problem completely inside Excel. Microsoft
Data Mining also comes with an extensive programmability solution, a set
of libraries that simplify the task of integration in various applications. It's
this set of libraries that allows seamless integration with Excel and other
applications.

Excel provides a handy feature called add-ins. An add-in is a library that
can extend the workbook functionality. The DataMining Anomaly
Detection.xla file provided with this article is such an add-in. To install it,
perform these steps:

1. Open a workbook in Excel (for example, open the CollegePlans.xls file
that comes with this article).
2. In the Tools menu, select the "Add-Ins …" menu item. A box labeled
Add-Ins appears.
3. In that box, look for the "Browse …" button. After you click on it, the
usual Windows file selection box comes up. Select the DataMining Anomaly
Detection.xla file from the location where you saved it.
4. A new box might show up, with a message like this:

"Copy Data Mining Anomaly Detection.xla to the Add-Ins folder
for <your user name>?"

If you select Yes, Excel saves a copy of this add-in into a special Add-Ins
folder. Otherwise, the add-in only works as long as its original location is
still valid.

5. A new entry shows up in the list of add-ins: Data Mining Anomaly
Detection.
6. After you close the Add-Ins box, Excel adds a new entry to the Tools
menu, Data Mining Anomaly Detection.

Please note that, depending on your current Excel security settings, this
procedure might not work if you've disabled macros. Usually, an error
message indicates this problem. You can solve it by selecting Tools >
Options and, in the Security panel of the resulting dialog, clicking on the
"Macro Security …" button. You see a new dialog that lets you select the
security level for running macros. The Medium level lets you choose
whether to allow macros; in particular, you can select whether you want to
allow the Anomaly Detection add-in to run. After adjusting security
settings, you have to re-add the data mining add-in.

Now, you can use the Anomaly Detection add-in to find problems in the
CollegePlans.xls spreadsheet. You can apply these steps to any
spreadsheet:

1. Select the range of data you want to analyze.

In the spreadsheet, select the columns and rows that contain the data you
want to analyze. The selection must include the names of the Excel
columns in the first row, and it must have at least two rows (at least one
data row, besides the column names). You don't have to select all the
data in the spreadsheet, but this is how we got the results we describe below.

After you select the range of data, select the new entry by choosing Data
Mining Anomaly Detection from the Tools menu. A dialog like figure 2
displays.

2. Tell the add-in what to do.

The first input of this dialog (the one containing the range "$A$1:$F$9001"
in figure 2) lets you change or make a selection. After you make the
selection, the add-in populates the two drop-down lists for the key column
and anomaly detection column.

If your spreadsheet contains a key column, such as a row identifier,
indicate this in the "Select the key column, if any" field to instruct the
Anomaly Detection add-in that no significant information is in that column.
Or, simply don't select that column at all and leave the key column set to
"<none>". Then, select a column to search for anomalies. For the data set
described in this article, select the College Plans column.

Microsoft Data Mining also lets you see the reasons behind the anomalies.
If you don't want this, deselect the box marked "Click on anomalies to
show the rule they break." When enabled, this option creates a hyperlink
inside each Excel cell containing an anomaly. By clicking on that cell, you'll
be able to see the rule that identified the cell as an anomaly. However, if
the column to search for anomalies in your particular spreadsheet already
contains hyperlinks, leave this box unchecked, as the new hyperlinks will
remove the existing ones.

If you have Microsoft Analysis Services 2005 installed on your machine,
you can export the set of rules for your spreadsheet to a file. This lets you
import the rules into Analysis Services 2005 and further explore them. The
"What else can Data Mining do with my data?" section describes the
procedure to follow. If you don't have Microsoft Analysis Services installed
or don't want to export your data, just leave the file name empty.

After you've made all the selections, click on the OK button to instruct the
add-in to start looking for rules and anomalies. This process took us about
2 to 3 minutes for all 9,000 rows of the CollegePlans.xls spreadsheet. A
small dialog appears and informs you of the status of the operation.

After the anomalies are detected, you can move on to inspect the results.

3. Inspect the results.

After Excel completes the analysis and detects the anomalies, it highlights
them on the spreadsheet. The cells detected as anomalies are red and
have comments associated with them. If you selected the "Click on
anomalies to show the rule they break" option, these cells are also
hyperlinks (that is, clicking on the cells shows the rule they break).

Figure 3 shows how the spreadsheet looks after you run the Anomaly
Detection add-in.

As you can see, each anomaly cell has a comment now, describing the
expected college plans for that particular student and the probability
(confidence) of the rule that fits the student.

Finally, the add-in adds a new worksheet to the Excel workbook
containing those rules the Data Mining add-in found were relevant in
detecting the anomalies.

The newly created "Rules found by Data Mining" worksheet looks like
figure 4.

For each rule, the following columns are present:

Rule Description -- Contains the verbose description of the rule. As we
discussed in the "How Data Mining can help" section, the conditions in a
rule description are ordered by importance. Notice the most
important factor for determining the college plans of a student is the
encouragement he receives from the parents.

Confidence and Support -- These are measures for the quality of the
rule, as mentioned before in the "How likely is 'most likely'?" section.

Likely value for College Plans -- This is the most likely value for the
College Plans of students that fit into this rule.

Note that the rules may differ a lot depending on the spreadsheet data
you're analyzing. For example, if you only select the first few rows, you'll
likely find fewer rules and each rule will have fewer conditions.

What else can Data Mining do with my data?

Microsoft Data Mining can do a lot with your data. In addition to detecting
anomalies, this add-in can suggest the most likely value when the
information is missing. To try this, just empty a few cells in the College
Plans column and run the Anomaly Detection add-in again. The add-in will
fill the empty cells with the "[Missing data]" string, and they'll have a
comment with their most likely value and a link to a rule that justifies the
comment.

With small changes, you can use the add-in for a purpose other than
detecting anomalies. For instance, you can use it to partition the
spreadsheet in groups of rows with common characteristics (a problem
known in Data Mining as clustering, or segmentation).

While running the add-in, if you indicate a file to export the rules to, you
can load the set of rules on Microsoft Analysis Services 2005. To do this,
open the SQL Server Management Studio and connect to a running
instance of Microsoft Analysis Services 2005. In the context menu
associated with the Databases node of the Object Explorer, select Restore
and indicate the name of the file to which you exported the rules. Also,
enter a name for the database to contain the local mining model you're
restoring and go ahead with the Restore operation. As a result, you create
a new database on the Analysis Services server containing the local
mining model built on your spreadsheet. This model doesn't contain the
actual data, only the set of rules Data Mining discovered while processing
the Excel spreadsheet.

After the data is loaded on the server, a rich set of tools is available for
graphically displaying the rules.

Figure 5 shows how the rules discovered inside the CollegePlans
spreadsheet display inside the Microsoft Decision Tree Viewer. You can
easily follow the way Data Mining discovers rules as well as the
importance of various attributes (spreadsheet columns) in determining the
outcome. You can also easily understand the confidence and support for
each rule.

On the server side, Data Mining analysis can handle large volumes of data
by taking advantage of multiple processors. Also, a collection of various
Data Mining algorithms can help with various business problems.

Microsoft Data Mining comes with a query language similar to SQL. The
DMX query language lets you model data, train algorithms, and execute
business intelligence operations, such as retrieving the rules for the
spreadsheet or determining the most likely values for various attributes.

The next section includes a few examples of the DMX syntax. It also
describes how the add-in works and shows you the implementation
details.

Add-in details
Microsoft Data Mining is designed as a platform for developing the various
applications that can take advantage of the Data Mining technology. For
data warehouse applications, it contains a powerful, scalable server that
can handle large volumes of data and help many users. It also contains a
solution for lightweight, embedded Data Mining usage, such as finding
patterns in Excel spreadsheets.

This embedded solution is called "local mining models" and is a library with
many of the most commonly used functionalities of the Data Mining
server.

For the server and the local mining models, communication with the Data
Mining framework occurs via an OLE DB provider that lets you send
commands in the SQL-like DMX language and read results. The Excel add-
in uses this local server to perform data mining on the spreadsheet data.

First, you have to initialize a connection to the OLE DB provider for
Analysis Services. In the connection string, you can set the Data
Source property, which is usually a server name, to a file name. When a
file name is detected, the provider understands that it's supposed to load
the local server. In VBA, the ADODB library is a great instrument for
dealing with OLE DB providers. The add-in uses ADODB for sending DMX
requests to the local mining model. This is the VBA code snippet that
opens an ADODB connection to a local mining model, hosted in a
temporary file on the root of the C:\ drive:

' Module-level connection to the local mining model
Private m_cnAS As ADODB.Connection

' Create the connection and open it; because Data Source is a file name,
' the provider loads the local server
Set m_cnAS = New ADODB.Connection
m_cnAS.Open "Provider=MSOLAP.3;Data Source=c:\ExcelAddIn.cub"

Note the following elements specific to the OLE DB provider for Analysis
Services:

The provider signature is "MSOLAP.3".

The data source is a file name because the connection is created
against a local mining model.

To perform the analysis, Data Mining must first model the data. For this to
happen, you must create a mining model object. This object will be the
container of all the rules and patterns in the data you analyze. Creating a
mining model is similar to creating a table in SQL. Here's the DMX
statement that creates the model associated with the spreadsheet:

CREATE MINING MODEL __ExcelTemp(
    [StudentID] TEXT KEY,
    [Gender] TEXT DISCRETE,
    [ParentIncome] DOUBLE CONTINUOUS,
    [IQ] DOUBLE CONTINUOUS,
    [ParentEncouragement] TEXT DISCRETE,
    [CollegePlans] TEXT DISCRETE PREDICT
) USING Microsoft_Decision_Trees

You obtain the name of the model columns from the first line of the
selection to which you apply the add-in. The type of the columns is
inferred from the values in the second row of the selection (the first row of
data). Note the PREDICT keyword that marks the CollegePlans column; it
signifies the model is supposed to find rules for that column.
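
The statement-building step can be sketched as follows. This Python version is only an illustration of the idea (the add-in does this in VBA, and the article does not list its exact inference rules): it assumes that numeric sample values map to DOUBLE CONTINUOUS and everything else to TEXT DISCRETE, and the function names and sample row are hypothetical.

```python
def dmx_column_type(sample_value):
    """Assumed inference rule: numbers become continuous doubles,
    everything else becomes discrete text."""
    is_number = (isinstance(sample_value, (int, float))
                 and not isinstance(sample_value, bool))
    return "DOUBLE CONTINUOUS" if is_number else "TEXT DISCRETE"


def build_create_statement(model_name, header, first_data_row,
                           key_column, target_column):
    """Build a CREATE MINING MODEL statement from the header row and
    the first row of data in the selection."""
    lines = []
    for name, sample in zip(header, first_data_row):
        if name == key_column:
            definition = f"[{name}] TEXT KEY"
        else:
            definition = f"[{name}] {dmx_column_type(sample)}"
            if name == target_column:
                definition += " PREDICT"   # column to find rules for
        lines.append("    " + definition)
    return (f"CREATE MINING MODEL {model_name} (\n"
            + ",\n".join(lines)
            + "\n) USING Microsoft_Decision_Trees")


header = ["StudentID", "Gender", "ParentIncome", "IQ",
          "ParentEncouragement", "CollegePlans"]
first_row = [1, "Male", 45000, 110, "Encouraged", "Plans to attend"]

statement = build_create_statement("__ExcelTemp", header, first_row,
                                   "StudentID", "CollegePlans")
print(statement)
```

Inferring types from a single row is simple but fragile: a numeric column whose first data cell happens to be empty or textual would be modeled as discrete text.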

After you create the model, you have to train it, that is, feed it with data
to find rules. Here's the DMX statement (again, similar to the SQL INSERT
statement) to do this:

INSERT INTO __ExcelTemp (
    [Gender],
    [ParentIncome],
    [IQ],
    [ParentEncouragement],
    [CollegePlans])
@MySpreadsheet

The Analysis Services OLE DB provider supports parameters, such as
@MySpreadsheet. More specifically, @MySpreadsheet is a data table
parameter: its value is a set of data rows. You pass this type of parameter
in the format described by the XML for Analysis (XMLA) 1.1 specification,
which is available at http://www.xmla.org.

The add-in contains code that reads the entire selection and packs it
into the XMLA format. This code is in the XMLARowsetGen class
module, which is included in the add-in. The XMLARowsetGen object reads
the rows one by one and serializes each row in the XML format described
by the XMLA 1.1 specification. The GenerateRowset method of this class
module returns a string, which contains the XML serialization of all the
rows added so far.
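
As a rough picture of what such a serializer does, here is a simplified Python sketch. It is not the XMLARowsetGen code: the class and method names are hypothetical, and the output keeps only the basic shape of an XMLA rowset (a root element in the rowset namespace containing one row element per spreadsheet row). A complete XMLA 1.1 rowset also carries an inline schema and encodes column names that are not valid XML names, both of which are omitted here.

```python
from xml.sax.saxutils import escape


class RowsetSketch:
    """Simplified, hypothetical analogue of a rowset serializer."""

    ROWSET_NS = "urn:schemas-microsoft-com:xml-analysis:rowset"

    def __init__(self, column_names):
        # Column names are assumed to be valid XML element names.
        self.column_names = list(column_names)
        self.rows = []

    def add_row(self, values):
        """Serialize one row; missing values (None) are left out."""
        cells = "".join(
            f"<{name}>{escape(str(value))}</{name}>"
            for name, value in zip(self.column_names, values)
            if value is not None
        )
        self.rows.append(f"<row>{cells}</row>")

    def generate_rowset(self):
        """Return the XML for all rows added so far."""
        return (f'<root xmlns="{self.ROWSET_NS}">'
                + "".join(self.rows)
                + "</root>")


gen = RowsetSketch(["Gender", "IQ", "CollegePlans"])
gen.add_row(["Male", 110, "Plans to attend"])
gen.add_row(["Female", 95, None])   # missing College Plans value
xml_text = gen.generate_rowset()
print(xml_text)
```

Leaving out the element for a missing value is one way to represent an empty cell; whether the add-in does the same is not stated in the article.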

Here's the code that attaches an XMLA 1.1 rowset as a parameter to the
ADODB command:

' Execute Training command
Dim cmd As New ADODB.Command
cmd.ActiveConnection = m_cnAS

' The INSERT INTO DMX Statement
cmd.CommandText = strInsert
cmd.NamedParameters = True

Dim param As ADODB.Parameter
Set param = cmd.CreateParameter

param.Name = "MySpreadsheet"
param.Type = adBSTR
param.Direction = adParamInput
param.Attributes = adParamLong

' The XMLA 1.1 serialized rowset
param.Value = m_xmla.GenerateRowset

' Attach the parameter to the command (standard ADODB call; this line is
' not shown in the original listing)
cmd.Parameters.Append param

Note you have to name the parameter. This is a requirement for the OLE
DB for Analysis Services provider. Also, the parameter is of type adBSTR.
Then, this statement is sent to the local Data Mining server, which
processes the mining model.

With the processed model, you can start predicting the most likely value
for the College Plans column. You can use a statement similar to SQL
SELECT with JOIN:

SELECT
Predict(__ExcelTemp.[CollegePlans], EXCLUDE_NULL),
PredictProbability(__ExcelTemp.[CollegePlans]),
PredictNodeId(__ExcelTemp.[CollegePlans])
FROM
__ExcelTemp
NATURAL PREDICTION JOIN
@MySpreadsheet as __Input

@MySpreadsheet has the same meaning as above: a parameter that
contains the selection in which you're looking for anomalies.

We'll analyze the semantics of this prediction statement, as it's important
for understanding how Microsoft Data Mining works.

The __ExcelTemp local mining model, created with the CREATE MINING
MODEL … DMX statement, contains a number of columns, matching the
columns in the spreadsheet. The @MySpreadsheet table input parameter
also contains the columns in the spreadsheet.

The NATURAL PREDICTION JOIN part of the prediction statement instructs
the local mining model to map the columns of each case in the input table
to the columns of the mining model by name. If the columns in the input
table have different names from the ones in the mining model, you have to
use PREDICTION JOIN (without NATURAL) and specify the mappings
explicitly with a syntax similar to SQL JOIN:

ON
[Mining Model Column] = [Input Column]
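
To make the explicit mapping concrete, here is a sketch of a VBA helper that assembles a prediction statement with an ON clause. The helper name BuildPredictionJoin and the input column names "Sex" and "Income" are hypothetical and chosen only for illustration; the sketch also assumes both arrays have the same bounds and list the columns in matching order.

```vba
' Hypothetical helper: builds a DMX prediction statement with explicit
' column mappings instead of NATURAL PREDICTION JOIN.
Public Function BuildPredictionJoin(ByVal modelName As String, _
                                    ByVal targetColumn As String, _
                                    ByVal modelColumns As Variant, _
                                    ByVal inputColumns As Variant) As String
    Dim dmx As String
    Dim i As Long

    dmx = "SELECT Predict([" & modelName & "].[" & targetColumn & "], EXCLUDE_NULL) " & _
          "FROM [" & modelName & "] " & _
          "PREDICTION JOIN @MySpreadsheet AS __Input ON "

    ' One "[Model].[Column] = __Input.[Column]" pair per mapping, joined by AND.
    For i = LBound(modelColumns) To UBound(modelColumns)
        If i > LBound(modelColumns) Then dmx = dmx & " AND "
        dmx = dmx & "[" & modelName & "].[" & modelColumns(i) & "] = " & _
                    "__Input.[" & inputColumns(i) & "]"
    Next i

    BuildPredictionJoin = dmx
End Function

' Usage: prints the generated statement to the Immediate window.
Public Sub DemoPredictionJoin()
    Debug.Print BuildPredictionJoin("__ExcelTemp", "CollegePlans", _
        Array("Gender", "ParentIncome"), Array("Sex", "Income"))
End Sub
```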

The statement, translated to plain English, is:

"For each row in the @MySpreadsheet input table, using the rules
detected while you trained the __ExcelTemp mining model, compute:

- The predicted value for the [College Plans] column, excluding null
  values
- The probability of this prediction
- The node identifier of the rule that governs this prediction

When:

- The [Student Id] column of the mining model takes the value of the
  [Student Id] column of the input table
- The [Gender] column of the mining model takes the value of the
  [Gender] column of the input table
- The [ParentIncome] column of the mining model takes the value of
  the [ParentIncome] column of the input table
- And so on"

The predicted value is the most likely value for the [College Plans]
column, according to the rules that apply to the current row in the input
table. The probability of the prediction is the confidence of the rule. The
add-in uses the node identifier of the rule later to fetch the description,
support, and confidence for the respective rule from the Data Mining local
server.

For each row in the input table, the statement computes three values;
the result is therefore a table with three columns and one row per input
row.

Because MySpreadsheet contains the entire current selection, each row in
the response holds the three computed columns for one row in the
selection.

After the add-in fetches this result, it walks the selection row by row. If
the value of the College Plans column differs from the most likely value
the Data Mining server returned, the respective row is marked as
containing an anomaly.
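
The row-by-row comparison can be sketched as follows. This isn't the add-in's actual code. It assumes that the prediction rows come back in the same order as the rows in the selection, that the first field of the result holds the predicted value, and that targetColumn is the single-column range of College Plans cells without the header. The helper name, the yellow fill, and the comment text are illustrative choices.

```vba
' Hypothetical helper: compares each cell with its predicted value and
' highlights the cells that differ.
' Requires a reference to the Microsoft ActiveX Data Objects library.
Public Sub MarkAnomalies(ByVal targetColumn As Range, _
                         ByVal predictions As ADODB.Recordset)
    Dim rowIndex As Long
    Dim predicted As String

    rowIndex = 1
    Do While Not predictions.EOF And rowIndex <= targetColumn.Rows.Count
        ' Field 0 is assumed to hold Predict(...); appending "" converts
        ' a Null prediction to an empty string.
        predicted = predictions.Fields(0).Value & ""

        With targetColumn.Cells(rowIndex, 1)
            ' Skip rows without a prediction; flag rows whose text differs.
            If Len(predicted) > 0 Then
                If StrComp(CStr(.Value), predicted, vbTextCompare) <> 0 Then
                    .Interior.ColorIndex = 6   ' yellow in the default palette
                    If .Comment Is Nothing Then
                        .AddComment "Most likely value: " & predicted
                    End If
                End If
            End If
        End With

        predictions.MoveNext
        rowIndex = rowIndex + 1
    Loop
End Sub
```

The comparison is done on text, so a numeric target column may need a different equality check.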

The last DMX statement the add-in issues exports the local mining model
into a file you can reuse in Analysis Services 2005. The statement is:

EXPORT MINING MODEL __ExcelTemp TO
'c:\MySpreadsheetRulesFile.abf'

This statement creates a new file on the hard disk, named
c:\MySpreadsheetRulesFile.abf, which contains an Analysis Services 2005
database with a single mining model. You can later restore this file as a
database or import it into an existing database with an IMPORT
statement:

IMPORT FROM 'c:\MySpreadsheetRulesFile.abf'

The DMX language is powerful and not hard to learn for someone familiar
with SQL syntax. The detailed specification for DMX is included in the
"OLE DB for Data Mining" specification, which you can find on the
Microsoft Web site at
http://www.microsoft.com/downloads/details.aspx?FamilyID=c66af00d-51be-4d8d-9056-82cb2410ae3f&displaylang=en.

Simply by using a different algorithm name in the CREATE MINING MODEL
statement and changing some of the functions in the prediction
statement, you can modify the add-in to solve other business problems.

If the data source isn't an Excel spreadsheet and doesn't support an add-
in development language such as VBA, you can solve the anomaly
detection problem inside the Analysis Services 2005 server (although with
a few more clicks than required to run the add-in).

Power at your fingertips

Microsoft SQL Server 2005 Data Mining gives you a lot of versatility in
finding meaning in your data. Use this add-in with your spreadsheets or,
even better, play with the add-in code to further explore what local mining
models can do for you. You can change the algorithm from
Microsoft_Decision_Trees to Microsoft_Clustering. The add-in continues to
work as it did before, but by adding a few more DMX statements, you can
see how your data is partitioned and receive a detailed description for
each of these partitions.
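
As an illustration of the kind of statement you might add, the sketch below builds a prediction query for a clustering model. It assumes the model was recreated with USING Microsoft_Clustering and relies on the DMX functions Cluster() and ClusterProbability(); the helper name BuildClusterQuery is hypothetical.

```vba
' Hypothetical helper: returns a DMX statement that asks a clustering
' model which cluster each input row most likely belongs to, and with
' what probability.
Public Function BuildClusterQuery(ByVal modelName As String) As String
    BuildClusterQuery = _
        "SELECT Cluster(), ClusterProbability() " & _
        "FROM [" & modelName & "] " & _
        "NATURAL PREDICTION JOIN @MySpreadsheet AS __Input"
End Function
```

You could assign the result of BuildClusterQuery("__ExcelTemp") to the CommandText of an ADODB command and attach the same MySpreadsheet parameter used for training.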

In the past, data mining has been a field of academics and high-end
researchers and analysts. The simplicity and ease of use of the new
Microsoft SQL Server 2005 Data Mining platform bring this powerful
technology to everybody's fingertips.

Figure 1: Example -- The CollegePlans spreadsheet.

Figure 2: The Data Mining Anomaly Detection dialog -- Select your options
before finding the anomalies.

Figure 3: Result of the Anomaly Detection add-in -- Note the highlighted
cells with anomalies and the comment that describes the likely value in
those cells.

Figure 4: Rules found by Data Mining -- The rule's verbose description is
accompanied by the confidence and the support, as well as the most likely
College Plans value for cases matching that rule.

Figure 5: Microsoft Decision Tree Viewer -- The cases are divided based on
various attributes. The Node Legend window shows the confidence and the
support for the selected rule node.

Table 1: Rules organized by Microsoft_Decision_Trees -- From left to
right, each row is divided based on values of the attributes.

              Most important     Secondary factor          Most likely
              factor and value   and value                 college plans
All Students  IQ >= 100          Parents' income > 20000   Plans to attend
                                 Parents' income <= 20000  Does not plan to attend
              IQ < 100           ...                       ...

Posted 08/27/2004 Modified 03/07/2005 03:46:38 PM
