Acp Excise
Theme
Data Mining
Summary:
Introduction
Definition
Conclusion
Bibliography
Introduction:
We are in an age often referred to as the information age. In this information age, because we believe that
information leads to power and success, and thanks to sophisticated technologies such as computers, satellites,
etc., we have been collecting tremendous amounts of information. Initially, with the advent of computers and
means for mass digital storage, we started collecting and storing all sorts of data, counting on the power of
computers to help sort through this amalgam of information. Unfortunately, these massive collections of data
stored on disparate structures very rapidly became overwhelming. This initial chaos has led to the creation of
structured databases and database management systems (DBMS). The efficient database management systems
have been very important assets for management of a large corpus of data and especially for effective and
efficient retrieval of particular information from a large collection whenever needed. The proliferation of
database management systems has also contributed to recent massive gathering of all sorts of information.
Today, we have far more information than we can handle, from business transactions and scientific data to
satellite pictures, text reports and military intelligence. Information retrieval is simply not enough anymore for
decision-making. Confronted with huge collections of data, we have now created new needs to help us make
better managerial choices. These needs are automatic summarization of data, extraction of the “essence” of
information stored, and the discovery of patterns in raw data.
Definition:
Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from
different perspectives and summarizing it into useful information that can be used to increase revenue, cut
costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to
analyze data from many different dimensions or angles, categorize it, and summarize the relationships
identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in
large relational databases. [1]
Example:
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local
buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended
to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on
Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased
the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered
information in various ways to increase revenue. For example, they could move the beer display closer to the
diaper display. Moreover, they could make sure beer and diapers were sold at full price on Thursdays. [2]
Data:
● Operational or transactional data, such as sales, cost, inventory, payroll, and accounting.
● Nonoperational data, such as industry sales, forecast data, and macro-economic data.
● Metadata: data about the data itself, such as logical database design or data dictionary definitions.
Information:
The patterns, associations, or relationships among all this data can provide information. For example, analysis of
retail point of sale transaction data can yield information on which products are selling and when.
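As a rough illustration of turning transaction data into information, the sketch below totals units sold per product and weekday. The records, the field layout, and the function name `units_by_product_and_day` are hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical point-of-sale records: (weekday, product, quantity).
transactions = [
    ("Thu", "diapers", 1),
    ("Thu", "beer", 2),
    ("Sat", "diapers", 3),
    ("Sat", "beer", 4),
    ("Sat", "milk", 2),
]

def units_by_product_and_day(records):
    """Sum the quantity sold for every (product, weekday) pair."""
    totals = Counter()
    for weekday, product, quantity in records:
        totals[(product, weekday)] += quantity
    return totals

totals = units_by_product_and_day(transactions)
for (product, weekday), units in sorted(totals.items()):
    print(f"{product:8s} {weekday}: {units}")
```

The resulting table of totals is the kind of "which products are selling and when" information described above.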
Knowledge:
Information can be converted into knowledge about historical patterns and future trends. For example,
summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide
knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are
most susceptible to promotional efforts.
Scientific Viewpoint:
● Data are collected and stored at enormous speeds (GB/hour) by, for example:
    ● remote sensors on a satellite
    ● telescopes scanning the skies
    ● microarrays generating gene expression data
This act of model building is thus something that people have been doing for a long time, certainly before the
advent of computers or data mining technology. What happens on computers, however, is not much different
than the way people build models. Computers are loaded up with lots of information about a variety of
situations where an answer is known and then the data mining software on the computer must run through that
data and distill the characteristics of the data that should go into the model. Once the model is built, it can then
be used in similar situations where you do not know the answer. For example, say that you are the director of
marketing for a telecommunications company and you would like to acquire some new long distance phone
customers. You could just randomly go out and mail coupons to the general population - just as you could
randomly sail the seas looking for sunken treasure. In neither case would you achieve the results you desired
and of course you have the opportunity to do much better than random - you could use your business
experience stored in your database to build a model.
As the marketing director, you have access to a lot of information about all of your customers: their age, sex,
credit history and long distance calling usage. The good news is that you also have a lot of information about
your prospective customers: their age, sex, credit history etc. Your problem is that you do not know the long
distance calling usage of these prospects (since they are most likely now customers of your competition). You
would like to concentrate on those prospects who have large amounts of long distance usage. You can
accomplish this by building a model. Table 2 illustrates the data used for building a model for new customer
prospecting in a data warehouse.[5]
Table 2 (columns: Customers, Prospects)
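The prospecting idea above can be sketched as follows: a model is learned from customers whose long distance usage is known and is then applied to prospects whose usage is unknown. This is only a minimal sketch under stated assumptions. The records, the age and credit segments, and the use of per-segment averages as the "model" are all hypothetical and are not taken from [5].

```python
from collections import defaultdict

# Hypothetical customer records: long distance usage (minutes/month) is known.
customers = [
    {"age": 25, "credit": "good", "usage": 300},
    {"age": 28, "credit": "good", "usage": 340},
    {"age": 52, "credit": "good", "usage": 90},
    {"age": 47, "credit": "poor", "usage": 60},
    {"age": 33, "credit": "poor", "usage": 150},
]

# Hypothetical prospects: same attributes, usage unknown.
prospects = [
    {"name": "P1", "age": 27, "credit": "good"},
    {"name": "P2", "age": 55, "credit": "good"},
    {"name": "P3", "age": 31, "credit": "poor"},
]

def segment(record):
    """Map a record to a coarse segment: (age band, credit rating)."""
    band = "under40" if record["age"] < 40 else "40plus"
    return (band, record["credit"])

def build_model(known):
    """Learn the average usage of each segment from records with known usage."""
    sums = defaultdict(lambda: [0.0, 0])
    for rec in known:
        entry = sums[segment(rec)]
        entry[0] += rec["usage"]
        entry[1] += 1
    model = {seg: total / count for seg, (total, count) in sums.items()}
    overall = sum(r["usage"] for r in known) / len(known)
    return model, overall

def predict(model, overall, record):
    """Estimate usage; fall back to the overall mean for unseen segments."""
    return model.get(segment(record), overall)

model, overall = build_model(customers)
# Rank prospects so that those with the highest estimated usage come first.
ranked = sorted(prospects, key=lambda p: predict(model, overall, p), reverse=True)
print([p["name"] for p in ranked])
```

A real model would use many more attributes and a proper learning algorithm, but the structure is the same: learn from cases where the answer is known, then score cases where it is not.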
Clustering
Clustering identifies groups of similar objects in data that has no predefined classes. It is the task of grouping a set of objects
in such a way that objects in the same group are more similar to each other than to those in other groups. Once the
clusters are decided, the objects are labelled with their corresponding clusters, and the common features of the objects
in each cluster are summarized to form a class description. For example, a bank may cluster its customers into several
groups based on similarities in income, age, sex, residence, etc., and the common characteristics of the
customers in a group can be used to describe that group. This helps the bank to understand its
customers better and thus provide customized services.
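The text does not name a particular clustering algorithm. As one possible illustration, the sketch below uses plain k-means on hypothetical bank customers described by (age, income in thousands). The data, the starting centroids, and the function name `kmeans` are assumptions made for this example.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance to every centroid; pick the closest.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # New centroid is the per-dimension mean of the cluster members.
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (age, income in thousands).
customers = [(25, 30), (27, 35), (30, 32), (52, 90), (55, 95), (60, 88)]
centroids, clusters = kmeans(customers, centroids=[customers[0], customers[3]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Each final centroid acts as a simple class description of its group, for example "younger customers with lower income". In practice the attributes would usually be scaled first so that no single attribute dominates the distance.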
Classification
Classification learns rules that can be applied to new data and typically includes the following steps:
preprocessing of data, model design, learning/feature selection, and validation/evaluation. Classification
predicts categorical class labels, whereas numeric prediction models continuous-valued functions. For example, we can build a classification model to categorize
bank loan applications as either safe or risky. Classification is the derivation of a model that determines the class
of an object based on its attributes. A set of objects is given as a training set, in which every object is represented
by a vector of attributes along with its class. By analyzing the relationship between the attributes and the class of the
objects in the training set, a classification model can be constructed. Such a classification model can be used to
classify future objects and to develop a better understanding of the classes of the objects in the database. For
example, from a set of loan borrowers (Name, Age, and Income) who serve as the training set, a classification
model can be built that labels a bank loan application as either safe or risky (for example, if age = Youth then loan
decision = risky).
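To make the loan example concrete, here is a minimal sketch of a very simple rule learner in the spirit of the "one rule" (1R) approach. It picks the single attribute whose majority-class rules make the fewest errors on the training set. The training data, attribute values, and function names are hypothetical, and the source does not prescribe this particular algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical training set: attributes plus the known class label.
training_set = [
    {"age": "youth",  "income": "low",    "decision": "risky"},
    {"age": "youth",  "income": "medium", "decision": "risky"},
    {"age": "youth",  "income": "high",   "decision": "risky"},
    {"age": "middle", "income": "low",    "decision": "safe"},
    {"age": "middle", "income": "high",   "decision": "safe"},
    {"age": "senior", "income": "medium", "decision": "safe"},
    {"age": "senior", "income": "low",    "decision": "safe"},
]

def rules_for(attribute, rows, label="decision"):
    """For one attribute, map each value to the majority class seen with it."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[attribute]][row[label]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}

def train_one_rule(rows, attributes, label="decision"):
    """Pick the attribute whose majority-class rules make the fewest training errors."""
    best = None
    for attribute in attributes:
        rules = rules_for(attribute, rows, label)
        errors = sum(rules[row[attribute]] != row[label] for row in rows)
        if best is None or errors < best[2]:
            best = (attribute, rules, errors)
    return best  # (attribute, value -> class rules, training errors)

def classify(model, row, default="risky"):
    """Apply the learned rules; unseen attribute values get the default class."""
    attribute, rules, _ = model
    return rules.get(row[attribute], default)

model = train_one_rule(training_set, ["age", "income"])
print(model[0], model[1])
print(classify(model, {"age": "youth", "income": "high"}))
```

On this toy data the learner selects the age attribute and recovers a rule of the form "if age = youth then loan decision = risky". A real classifier would be validated on data that was not used for training.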
Regression
Regression finds a function that models the data with minimal error. It is a statistical methodology that is most often
used for numeric prediction. Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning. Regression analysis is also used to understand which
of the independent variables are related to the dependent variable, and to explore the forms of these
relationships. In restricted circumstances, regression analysis can be used to infer causal relationships between
the independent and dependent variables. However, this can lead to illusory or false relationships, so caution
is advisable [6]; for example, correlation does not imply causation.
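As a small worked example of "finding a function with minimal error", the sketch below fits a straight line by ordinary least squares. The choice of a single-variable linear model and the sample numbers are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: advertising spend (x) and resulting sales (y).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

slope, intercept = fit_line(spend, sales)
forecast = slope * 6 + intercept  # numeric prediction for an unseen x
print(slope, intercept, forecast)
```

The fitted line can be used for forecasting, but a good fit by itself says nothing about whether spend causes sales.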
Association
Association looks for relationships between variables or objects. It aims to extract interesting associations,
correlations or causal structures among the objects, i.e. cases where the appearance of one set of objects is strongly related to the
appearance of another set of objects [7]. Association rules can be useful for marketing, commodity management, advertising, etc. Association rule learning
is a popular and well-researched method for discovering interesting relations between variables in large
databases. It is intended to identify strong rules discovered in databases using different measures of
interestingness [6]. Based on the concept of strong rules presented in [8], association rules were introduced for
discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS)
systems in supermarkets. For example, the rule {onions, potatoes} ⇒ {burger} found in the sales data of a
supermarket would indicate that if a customer buys onions and potatoes together, he or she is likely to also buy
hamburger meat. Such information can be used as the basis for decisions about marketing activities such as
promotional pricing or product placement. In addition to the above example from market basket analysis,
association rules are employed today in many application areas, including Web usage mining, intrusion
detection, continuous production, and bioinformatics.
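The text mentions "measures of interestingness" without naming them. Two commonly used ones are support and confidence, and the sketch below computes both for the rule {onions, potatoes} ⇒ {burger}. The baskets and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"onions", "potatoes", "burger"},
    {"onions", "potatoes", "burger", "milk"},
    {"onions", "potatoes"},
    {"milk", "bread"},
    {"potatoes", "burger"},
]

rule_support = support(baskets, {"onions", "potatoes", "burger"})
rule_confidence = confidence(baskets, {"onions", "potatoes"}, {"burger"})
print(rule_support, rule_confidence)
```

A rule is usually called strong when both values exceed thresholds chosen by the analyst.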
● Data mining is used for market basket analysis to provide information on which product combinations
were purchased together, when they were bought, and in what sequence. This information helps
businesses promote their most profitable products and maximize profit. In addition, it encourages
customers to purchase related products that they may have missed or overlooked.
● Retail companies use data mining to identify customers’ buying behavior patterns.
● Data mining is used to identify customer loyalty by analyzing data on customers’ purchasing
activities, such as the frequency of purchases in a period of time, the total monetary value of all
purchases, and the date of the last purchase. After analyzing those dimensions, a relative measure is
generated for each customer. The higher the score, the more loyal the customer is.
● Data mining is applied to help banks retain credit card customers. By analyzing past data, data
mining can help banks predict which customers are likely to change their credit card affiliation, so they can plan
and launch special offers to retain those customers.
● Credit card spending by customer groups can be identified by using data mining.
● Hidden correlations between different financial indicators can be discovered by using data mining.
● From historical market data, data mining makes it possible to identify stock trading rules.
● Data mining is applied in claims analysis, such as identifying which medical procedures are claimed
together.
● Data mining makes it possible to forecast which customers will potentially purchase new policies.
● Data mining allows insurance companies to detect risky customers’ behavior patterns.
● Data mining helps identify the patterns of successful medical therapies for different illnesses.
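The customer loyalty measure in the list above combines how recently, how often, and how much each customer buys. The sketch below shows one possible way to turn those three dimensions into a relative score. The scaling and the equal weighting are assumptions made for illustration, and the purchase data and function names are hypothetical.

```python
from datetime import date

# Hypothetical purchase histories: customer -> list of (purchase date, amount).
purchases = {
    "alice": [(date(2016, 10, 1), 40.0), (date(2016, 10, 20), 60.0), (date(2016, 11, 1), 50.0)],
    "bob":   [(date(2016, 6, 15), 30.0)],
    "carol": [(date(2016, 9, 5), 200.0), (date(2016, 10, 28), 100.0)],
}

def rfm(history, today):
    """Return (days since last purchase, number of purchases, total spend)."""
    last = max(day for day, _ in history)
    return (today - last).days, len(history), sum(amount for _, amount in history)

def loyalty_scores(purchases, today):
    """Scale each dimension to [0, 1] using the largest observed value, then average."""
    raw = {name: rfm(history, today) for name, history in purchases.items()}
    max_recency = max(r for r, _, _ in raw.values()) or 1
    max_frequency = max(f for _, f, _ in raw.values())
    max_monetary = max(m for _, _, m in raw.values()) or 1
    scores = {}
    for name, (recency, frequency, monetary) in raw.items():
        recency_part = 1 - recency / max_recency  # more recent -> closer to 1
        frequency_part = frequency / max_frequency
        monetary_part = monetary / max_monetary
        scores[name] = (recency_part + frequency_part + monetary_part) / 3
    return scores

scores = loyalty_scores(purchases, today=date(2016, 11, 7))
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Because the score is relative, it only ranks customers within the data set being analyzed. It is not an absolute measure of loyalty.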
Marketing / Retail
Data mining helps marketing companies build models based on historical data to predict who will respond to
new marketing campaigns such as direct mail and online marketing campaigns. Using the results, marketers
can take an appropriate approach to selling profitable products to targeted customers.
Data mining benefits retail companies in the same way as marketing companies. Through market basket
analysis, a store can arrange its products so that customers can conveniently buy frequently
purchased products together. In addition, it helps retail companies offer discounts on
particular products that will attract more customers.
Finance / Banking
Data mining gives financial institutions information about loans and credit reporting. By building a
model from historical customer data, banks and other financial institutions can distinguish good loans from bad ones. In
addition, data mining helps banks detect fraudulent credit card transactions to protect card owners.
Manufacturing
By applying data mining to operational engineering data, manufacturers can detect faulty equipment and
determine optimal control parameters. For example, semiconductor manufacturers face the challenge that even
when the conditions of the manufacturing environments at different wafer production plants are similar, the quality of
the wafers is not the same, and some wafers have defects for unknown reasons. Data mining has been applied to
determine the ranges of control parameters that lead to the production of the golden wafer. Those optimal
control parameters are then used to manufacture wafers of the desired quality.
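As a deliberately simplified illustration of determining parameter ranges, the sketch below takes the minimum and maximum of each control parameter over the runs that produced good wafers. The parameter names, values, and quality labels are hypothetical, and real process analysis would use far more data and more robust methods.

```python
# Hypothetical process records: control parameters plus a quality label.
runs = [
    {"temperature": 350, "pressure": 1.2,  "quality": "good"},
    {"temperature": 355, "pressure": 1.3,  "quality": "good"},
    {"temperature": 370, "pressure": 1.1,  "quality": "defect"},
    {"temperature": 352, "pressure": 1.25, "quality": "good"},
    {"temperature": 340, "pressure": 1.6,  "quality": "defect"},
]

def good_ranges(records, parameters, label="quality", good="good"):
    """For each parameter, return the (min, max) observed across good runs."""
    good_runs = [r for r in records if r[label] == good]
    return {p: (min(r[p] for r in good_runs), max(r[p] for r in good_runs))
            for p in parameters}

def in_range(record, ranges):
    """Check whether every parameter of a run lies inside its good range."""
    return all(low <= record[p] <= high for p, (low, high) in ranges.items())

ranges = good_ranges(runs, ["temperature", "pressure"])
print(ranges)
print([in_range(r, ranges) for r in runs])
```

On this toy data both defective runs fall outside the observed good ranges.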
Governments
Data mining helps government agencies by digging through and analyzing records of financial transactions to build
patterns that can detect money laundering or other criminal activities.
Privacy Issues
Concerns about personal privacy have increased enormously in recent years, especially as the internet
booms with social networks, e-commerce, forums, blogs, etc. Because of privacy issues, people are afraid that
their personal information will be collected and used in unethical ways that could cause them a lot of
trouble. Businesses collect information about their customers in many ways in order to understand their purchasing
behavior and trends. However, businesses do not last forever; some day they may be acquired by others or go out of business. At
that point, the personal information they own may be sold to others or leaked.
Security issues
Security is a big issue. Businesses own information about their employees and customers, including social
security numbers, birthdays, payroll, etc. However, how well this information is protected is still in question.
There have been many cases in which hackers accessed and stole large volumes of customer data from big corporations
such as Ford Motor Credit Company and Sony. With so much personal and financial information available,
credit card theft and identity theft have become big problems.
Misuse of information
Information collected through data mining for ethical purposes can be misused. This information
may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against
a group of people. In addition, data mining techniques are not perfectly accurate. Therefore, if inaccurate
information is used for decision-making, it can have serious consequences.
Conclusion:
Data mining is an important part of the knowledge discovery process, in which we analyze an
enormous set of data and extract hidden and useful knowledge. Data mining is applied effectively
not only in the business environment but also in other fields such as weather forecasting,
medicine, transportation, healthcare, insurance, and government. Data mining has many
advantages when used in a specific industry. Besides those advantages, data mining also has its
own disadvantages, e.g. privacy, security, and misuse of information.
Bibliography
[1]
[2]
[3]
[4]
[5] http://www.thearling.com/text/dmwhite/dmwhite.htm \ 7_11_2016
[6] R. Kaur, S. Kaur, A. Kaur, R. Kaur, A. Kaur, “An Overview of Database Management System, Data Warehousing
and Data Mining”, IJARCCE, Vol. 2, Issue 7, July 2013.
[7] Y. Fu, “Data Mining: Tasks, Techniques and Applications”.
[8] Y. Ramamohan, K. Vasantharao, C. Kalyana Chakravarti, and A.S.K.Ratnam, “A Study of Data Mining Tools in
Knowledge