Data Mining Metrics
INTRODUCTION
A natural starting point is how data mining models are evaluated. Commonly used performance metrics include the F-Score, Area under the ROC Curve, Average Precision, Precision/Recall Break-Even Point, Squared Error, Cross Entropy, and Probability Calibration. Multidimensional scaling (MDS) shows that these metrics span a low-dimensional manifold.
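As an illustration of the MDS step, the sketch below embeds a handful of metrics in two dimensions using scikit-learn. The pairwise dissimilarities are hypothetical placeholders (in practice they might come from, say, one minus the correlation of metric scores across many models), so only the mechanics are meaningful here.

    # Embedding performance metrics in 2-D with multidimensional scaling.
    # The dissimilarity values are illustrative placeholders, not results.
    import numpy as np
    from sklearn.manifold import MDS

    metrics = ["F-Score", "ROC Area", "Avg Precision", "Squared Error"]
    D = np.array([          # symmetric dissimilarity matrix
        [0.0, 0.3, 0.2, 0.7],
        [0.3, 0.0, 0.1, 0.6],
        [0.2, 0.1, 0.0, 0.5],
        [0.7, 0.6, 0.5, 0.0],
    ])

    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(D)
    for name, (x, y) in zip(metrics, coords):
        print(f"{name:>14}: ({x:+.2f}, {y:+.2f})")

Metrics that land near each other in the embedding behave similarly across models, which is how the low-dimensional structure becomes visible.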
Unexpectedly, margin-based learning methods (e.g., support vector machines) also show excellent performance on ordering metrics such as ROC area and average precision. One proposed composite metric, SAR, combines Squared error, Accuracy, and ROC area into a single score. MDS and correlation analysis show that SAR is centrally located and correlates well with the other metrics, suggesting that it is a good general-purpose metric to use when more specific criteria are not known.
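A common published definition of SAR is the mean of accuracy, ROC area, and one minus the root-mean-squared error; the sketch below assumes that definition, and the 0.5 classification threshold is likewise an assumption.

    # SAR = (ACC + AUC + (1 - RMS)) / 3, where RMS is the root-mean-squared
    # error of the predicted probabilities against the 0/1 labels.
    import numpy as np
    from sklearn.metrics import accuracy_score, roc_auc_score

    def sar(y_true, y_prob, threshold=0.5):
        acc = accuracy_score(y_true, y_prob >= threshold)
        auc = roc_auc_score(y_true, y_prob)
        rms = np.sqrt(np.mean((y_true - y_prob) ** 2))
        return (acc + auc + (1.0 - rms)) / 3.0

    y_true = np.array([0, 0, 1, 1, 1])
    y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.9])
    print(f"SAR = {sar(y_true, y_prob):.3f}")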
DATA MINING TECHNIQUES
1. Choice modeling. Choice models are used to identify the most important factors driving customer choices. Typically, a choice model enables a firm to compute an individual's likelihood of purchase, or of another behavioral response, from variables the firm already holds in its database, such as geo-demographics, past purchase behavior for similar products, attitudes, or psychographics.
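One common way to implement such a choice model is a logistic regression over customer attributes; in the sketch below, the feature columns and data are hypothetical.

    # Toy choice model: logistic regression predicting purchase likelihood.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical columns: [age, past similar purchases, income band]
    X = np.array([[25, 0, 1], [40, 3, 2], [35, 1, 2], [50, 5, 3], [30, 0, 1]])
    y = np.array([0, 1, 0, 1, 0])  # 1 = purchased

    model = LogisticRegression().fit(X, y)
    new_customer = np.array([[45, 2, 2]])
    print("P(purchase) =", model.predict_proba(new_customer)[0, 1])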
2. Rule induction. Rule induction involves extracting formal rules from a set of observations. The extracted rules may represent a scientific model of the data or local patterns in it. One major rule-induction paradigm is the association rule. Association rules are about discovering interesting relationships between variables in large databases; in retail, the technique uses such rules to uncover regularities between products. For example, if someone buys peanut butter and jelly, he or she is likely to also buy bread. The idea behind association rules is that when a customer does X, he or she will most likely also do Y. Understanding those kinds of relationships can help with forecasting sales, promotional pricing, or product placement.
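The two numbers behind an association rule are support and confidence. A minimal, library-free sketch, reusing the peanut butter example over a handful of made-up transactions:

    # Support and confidence for {peanut butter, jelly} -> {bread}.
    transactions = [
        {"peanut butter", "jelly", "bread"},
        {"peanut butter", "jelly", "bread", "milk"},
        {"peanut butter", "bread"},
        {"jelly", "milk"},
        {"peanut butter", "jelly"},
    ]
    antecedent = {"peanut butter", "jelly"}
    rule = antecedent | {"bread"}

    n_antecedent = sum(antecedent <= t for t in transactions)
    n_rule = sum(rule <= t for t in transactions)

    support = n_rule / len(transactions)  # how often the full rule occurs
    confidence = n_rule / n_antecedent    # P(bread | peanut butter, jelly)
    print(f"support={support:.2f}, confidence={confidence:.2f}")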
3. Network/link analysis. This is another technique for associating like records. Link analysis is a subset of network analysis: it explores relationships and associations among many objects of different types that are not apparent from isolated pieces of information. It is commonly used for fraud detection and by law enforcement, and several Web-search ranking algorithms also rely on it.
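Since Web-search ranking is mentioned, a tiny PageRank-style power iteration over a hypothetical link graph shows the flavor of link analysis:

    # PageRank-style power iteration; node names and links are made up.
    links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    damping, n = 0.85, len(links)
    rank = {v: 1.0 / n for v in links}

    for _ in range(50):  # iterate toward the stationary ranking
        new_rank = {v: (1 - damping) / n for v in links}
        for v, outs in links.items():
            share = damping * rank[v] / len(outs)
            for w in outs:
                new_rank[w] += share
        rank = new_rank

    for v in sorted(rank, key=rank.get, reverse=True):
        print(v, round(rank[v], 3))

Pages pointed to by highly ranked pages become highly ranked themselves, which is the relationships-over-isolated-facts idea in miniature.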
4. Neural networks. Neural networks were designed to mimic how the brain learns and analyzes information, and organizations apply artificial neural networks to predictive analytics as a single, general-purpose modeling framework.
The idea is that a neural network is more efficient and accurate where complex predictive analytics is required, because it comprises a series of interconnected calculating nodes designed to map a set of inputs into one or more output signals. Neural networks excel at deriving meaning from complicated or imprecise data and can extract patterns and detect trends too complex to be noticed by humans or by other computing techniques. Marketing organizations find neural networks useful for predicting customer demand and for customer segmentation.
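A small scikit-learn multilayer perceptron makes the inputs-to-outputs mapping concrete; the XOR data below is a standard toy example of a pattern no single linear rule can capture.

    # Tiny neural network: interconnected nodes mapping inputs to an output.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])  # XOR: not linearly separable

    clf = MLPClassifier(hidden_layer_sizes=(8,), solver="lbfgs",
                        max_iter=1000, random_state=0)
    clf.fit(X, y)
    print(clf.predict(X))  # should recover [0, 1, 1, 0]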
5. Decision trees. Decision trees apply data-mining algorithms to classification and make the resulting logic explicit: a decision-tree process generates the rules it follows. Decision trees are useful for choosing among several courses of action, letting you explore the possible outcomes of each option and weigh its risks and rewards. Such an analysis is especially valuable when you must choose among different strategies or investment opportunities with limited resources.
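The rule-generating behavior is easy to see with scikit-learn's decision tree and its export_text helper; the features and labels below are hypothetical.

    # Decision tree whose learned logic prints as human-readable rules.
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical features: [income_band, prior_purchases]; 1 = responded
    X = [[1, 0], [2, 3], [2, 1], [3, 5], [1, 1], [3, 2]]
    y = [0, 1, 0, 1, 0, 1]

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(export_text(tree, feature_names=["income_band", "prior_purchases"]))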
CONCLUSION
The detection of function clones in software systems is valuable for maintenance activities such as code adaptation and error checking. This assignment presents an efficient metrics-based data mining approach to clone detection. First, metrics are collected for all functions in the software system.
A data mining algorithm, fractal clustering, is then used to partition the software system into a relatively small number of clusters. Each resulting cluster encapsulates functions that lie within a specified proximity of each other in the metrics space. Finally, clone classes, rather than clone pairs, are extracted directly from the clusters. For large software systems the approach is very space efficient and scales linearly in the size of the data set. Evaluation is performed on medium and large open-source systems, investigating the effect of the chosen metrics on detection precision.
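Fractal clustering is not available in standard libraries, so the sketch below substitutes DBSCAN purely for illustration: it groups functions whose (hypothetical) metric vectors lie close together, and each dense cluster is read off as a candidate clone class.

    # Simplified stand-in for metrics-based clone grouping. DBSCAN replaces
    # fractal clustering here; metric values are invented for illustration.
    import numpy as np
    from sklearn.cluster import DBSCAN

    functions = ["parse_a", "parse_b", "init", "render", "render_v2"]
    metrics = np.array([   # hypothetical [LOC, fan-out, complexity]
        [120, 5, 10],
        [118, 5, 10],
        [30, 2, 3],
        [200, 8, 15],
        [198, 8, 14],
    ])

    labels = DBSCAN(eps=5.0, min_samples=2).fit_predict(metrics)
    for cluster in set(labels) - {-1}:   # -1 marks unclustered functions
        clones = [f for f, l in zip(functions, labels) if l == cluster]
        print("candidate clone class:", clones)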
REFERENCES
http://www.marketingprofs.com/articles/2010/3567/the-nine-most-common-data-mining-techniques-used-in-predictive-analytics
http://www.networkworld.com/article/2231920/microsoft-subnet/data-mining-your-performance-metrics---uncover-that-nugget---.html
http://trrjournalonline.trb.org/doi/abs/10.3141/2072-15?journalCode=trr
http://dl.acm.org/citation.cfm?id=1014063