Assignment 1
Q1. Present an example where data mining is crucial to the success of a business. What
data mining functions does this business need? Can they be performed alternatively by
data query processing or simple statistical analysis?
Ans. The following example shows how data mining can be crucial to the success of a
business:
A department store can use data mining to assist with its targeted marketing mail
campaign.
Using data mining functions such as association analysis, the store can mine strong
association rules to determine which products bought by one group of customers are
likely to lead to the purchase of certain other products. With this information, the store
can then mail marketing materials only to those customers who exhibit a high likelihood
of purchasing additional products.
Data query processing is used for data or information retrieval and does not have the
means for finding association rules. Similarly, simple statistical analysis cannot handle
the large volumes of data involved, such as the customer records of a department store.
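As a rough illustration of what a query language cannot do directly, the sketch below computes the support and confidence of one candidate association rule on a tiny, made-up basket table (the product names, the data, and any thresholds are placeholders, not figures from any real store):

```python
# Minimal sketch of checking one association rule on made-up basket data.
import pandas as pd

# One-hot encoded transactions: each row is one customer basket (invented data).
baskets = pd.DataFrame(
    [[1, 1, 0, 1],
     [1, 1, 1, 0],
     [0, 1, 1, 0],
     [1, 1, 1, 1]],
    columns=["bread", "milk", "diapers", "beer"],
).astype(bool)

n = len(baskets)

# Support and confidence for the candidate rule {bread} -> {milk}.
support_bread = baskets["bread"].sum() / n
support_both = (baskets["bread"] & baskets["milk"]).sum() / n
confidence = support_both / support_bread

# A rule is "strong" if it clears minimum support and confidence thresholds.
print(f"support({{bread, milk}}) = {support_both:.2f}")
print(f"confidence(bread -> milk) = {confidence:.2f}")
```

In practice a library implementation of an algorithm such as Apriori or FP-growth would enumerate all frequent itemsets and rules rather than checking a single hand-picked rule as done here.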
Q2. Outline the major research challenges of data mining in Finance and Marketing.
Ans. Data mining is not an easy task: the algorithms involved can become very complex,
and the data is not always available in one place, so it has to be integrated from various
heterogeneous data sources. In areas such as finance and marketing, these factors give
rise to the following major research challenges:
Efficiency and scalability of data mining algorithms − To effectively extract information
from the huge amounts of data stored in databases, data mining algorithms must be
efficient and scalable.
Parallel, distributed, and incremental mining algorithms − Factors such as the huge size
of databases, the wide distribution of data, and the complexity of data mining methods
motivate the development of parallel and distributed data mining algorithms. These
algorithms divide the data into partitions, process the partitions in parallel, and then
merge the partial results. Incremental algorithms update existing mining results as the
database changes, without mining all the data again from scratch (a minimal sketch of
the partition-and-merge idea is given after this list).
Handling of relational and complex types of data − The database may contain complex
data objects, multimedia objects, spatial data, temporal data, etc. It is not feasible for a
single system to mine all these kinds of data.
Mining information from heterogeneous databases and global information systems −
The data may reside in different data sources on a LAN or WAN, and these sources may
be structured, semi-structured, or unstructured. Mining knowledge across them
therefore adds further challenges to data mining.
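As referenced above, here is a minimal sketch of the partition-and-merge idea behind parallel mining, assuming a toy setting in which the local "mining" step is simply item counting (the data and number of partitions are invented for illustration):

```python
# Partition-and-merge sketch: count item frequencies in parallel (illustrative only).
from collections import Counter
from multiprocessing import Pool

def mine_partition(partition):
    """Stand-in for a local mining step: count items in one data partition."""
    return Counter(partition)

def parallel_mine(transactions, n_partitions=4):
    # Split the data into roughly equal partitions.
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    # Process each partition in a separate worker process.
    with Pool(n_partitions) as pool:
        local_results = pool.map(mine_partition, partitions)
    # Merge the partial results from all partitions.
    merged = Counter()
    for result in local_results:
        merged.update(result)
    return merged

if __name__ == "__main__":
    data = ["milk", "bread", "milk", "beer", "bread", "milk"]
    print(parallel_mine(data, n_partitions=2))
```

Real parallel and distributed miners split far larger datasets across machines rather than local processes, but the split, process, and merge structure is the same.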
Q3. How can Machine Learning be enhanced to improve prediction and modelling?
Explain.
Ans. My key observation was that we humans are very biased in the kinds of information
(i.e. what we see, hear, feel, taste and smell) we pay attention to, while disregarding the
rest. That is why we don't get to see the whole picture and hence remain stuck in a
confusing state of mind, sometimes for many years, because we tend to neglect almost
all information outside the spectrum of our very subjective and limited range of sensory
perception. Since we humans tend not to be systematic in selecting data sources and
dimensions of information (i.e. feature selection), often without even being aware of it,
we need the help of less biased artificial intelligence.
The fact that we cannot perceive this information does not make it any less important or
relevant to our lives. That is what I realized when trying to gain a better understanding of
the regulation of the aging process. My problem was, and actually still is, that I could not
find the kind of datasets I would need to test new hypotheses. This implies that nobody
before me seems to have felt that collecting transcriptome, proteome, metabolome and
epigenetic data every 5 minutes throughout the entire lifespan of yeast would be worth
the effort. We are entirely capable of generating the data needed to advance our
understanding of aging and many other complex, still obscure phenomena, but the
wet-lab scientists who design our biological and medical studies do not seem to be
aware that they are missing something very important.
A good example is the magnetic field. We humans tend to ignore it because we cannot
feel it, but it nevertheless affects our lives. It can be used to treat depression. It can make
your thumbs move involuntarily. Some birds use it to navigate the globe on their
seasonal migration.
I am worried that there are other fundamental phenomena similar to the magnetic field
of which none of us is yet aware, because so far we have not tried to look for similarly
imperceptible information-carrying dimensions. For example, spiders, ants and bats are
blind; however, visible light affects their lives regardless of whether or not they have a
concept of vision, since they have never experienced it. There could be other
information-carrying dimensions that, like light for the spiders, ants and bats, are
imperatively hidden objects (IHOs), even though they affect our lives so profoundly that
we cannot understand aging and many other complex phenomena without considering
such information as well. That is why I recommend using artificial intelligence to reduce
our observational bias.
Often, scientific progress has been made by accident: a mistake changed the otherwise
constant experimental environment in such a way that an unexpected result or
observation followed, and that is what finally helped us make progress. That is why I
propose intentionally varying external experimental conditions, methods, measurements,
study designs, etc., so that new features which affect the outcome are discovered much
sooner.
Q4. What are proper machine learning algorithms to extract relationships among
variables?
Ans. Proper machine learning algorithms for extracting relationships among variables are:
1. Linear Regression
This is probably the simplest algorithm in machine learning. Regression algorithms are
used when you want to predict a continuous value, as opposed to classification, where
the output is categorical. So whenever you are asked to predict a future value of a
process that is currently running, you can go with a regression algorithm. Linear
regression is, however, unstable when features are redundant, i.e. when there is
multicollinearity.
Some examples where linear regression can be used are:
Estimating the time to travel from one location to another
Predicting sales of a particular product next month
Measuring the impact of blood alcohol content on coordination
Predicting monthly gift card sales to improve yearly revenue projections
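A minimal sketch of fitting a linear regression with scikit-learn, using synthetic data as a stand-in for any of the examples above (the sample size and noise level are arbitrary assumptions):

```python
# Linear regression sketch on synthetic data (illustrative, not a real sales dataset).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic continuous target driven by a few features.
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the model and inspect the learned relationship between features and target.
model = LinearRegression().fit(X_train, y_train)
print("coefficients:", model.coef_)
print("R^2 on test data:", model.score(X_test, y_test))
```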
2. Logistic Regression
Logistic regression performs binary classification, so the label outputs are binary. It takes
a linear combination of the features and applies a non-linear function (the sigmoid) to it,
so it can be seen as a very small instance of a neural network.
Logistic regression provides many ways to regularize your model, and you don't have to
worry as much about your features being correlated as you do with Naive Bayes. You
also get a nice probabilistic interpretation, and you can easily update your model to take
in new data, unlike decision trees or SVMs. Use it if you want a probabilistic framework or
if you expect to receive more training data in the future that you want to be able to
incorporate quickly. Logistic regression can also help you understand the contributing
factors behind a prediction; it is not just a black-box method.
Logistic regression can be used in cases such as:
Predicting the Customer Churn
Credit Scoring & Fraud Detection
Measuring the effectiveness of marketing campaigns
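A minimal sketch of a regularized logistic regression for a binary label such as churn vs. no churn, using synthetic data as a stand-in for a real customer table:

```python
# Logistic regression sketch on synthetic binary data (e.g. churn vs. no churn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C controls the strength of L2 regularization (smaller C = stronger penalty).
model = LogisticRegression(C=1.0, max_iter=1000).fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
print("class probabilities for one customer:", model.predict_proba(X_test[:1]))
print("feature weights (contributing factors):", model.coef_)
```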
3. Decision trees
Single trees are rarely used on their own, but combined with many others they form very
effective algorithms such as Random Forest or Gradient Tree Boosting.
Decision trees easily handle feature interactions and they're non-parametric, so you don't
have to worry about outliers or whether the data is linearly separable. One disadvantage
is that they don't support online learning, so you have to rebuild your tree when new
examples come in. Another disadvantage is that they easily overfit, but that's where
ensemble methods like random forests (or boosted trees) come in. Decision trees can also
take a lot of memory (the more features you have, the deeper and larger your decision
tree is likely to be).
Trees are excellent tools for helping you to choose between several courses of action,
for example:
Investment decisions
Customer churn
Banks loan defaulters
Build vs Buy decisions
Sales lead qualifications
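A minimal sketch comparing a single decision tree with a random forest ensemble on synthetic data (the dataset and the number of trees are arbitrary choices for illustration):

```python
# Decision tree vs. random forest sketch on synthetic data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single tree tends to overfit; the forest averages many trees to reduce that.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print("single tree accuracy:", tree.score(X_test, y_test))
print("random forest accuracy:", forest.score(X_test, y_test))
```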
4. K-means
Sometimes you don’t know any labels and your goal is to assign labels according to the
features of objects. This is called clusterization task. Clustering algorithms can be used for
example, when there is a large group of users and you want to divide them into particular
groups based on some common attributes.
If there are questions like how is this organized or grouping something or concentrating on
particular groups etc. in your problem statement then you should go with Clustering.
The biggest disadvantage is that K-Means needs to know in advance how many clusters
there will be in your data, so this may require a lot of trials to “guess” the best K number of
clusters to define.
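A minimal sketch of K-means on synthetic user data, including a simple loop over candidate values of K (the inertia-based "elbow" check shown here is one common heuristic for guessing K, not the only one):

```python
# K-means sketch on synthetic data, trying several values of K (illustrative only).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic user features with 3 underlying groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Inertia (within-cluster sum of squares) usually flattens out near a good K.
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"K={k}: inertia={km.inertia_:.1f}")

# Final model with the chosen K; labels_ assigns each user to a group.
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments for first 10 users:", final.labels_[:10])
```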
5. Neural networks
Neural networks learn the weights of the connections between neurons. The weights are
adjusted one training data point after another. Once all the weights are trained, the
neural network can be used to predict the class of a new input data point, or a quantity
in the case of regression. With neural networks, extremely complex models can be
trained, and they can be used as a kind of black box without performing complicated
feature engineering before training the model. Combined with the "deep" approach,
even more complex models can be learned, opening up new possibilities. For example,
object recognition has recently been greatly improved using deep neural networks.
Applied to unsupervised learning tasks such as feature extraction, deep learning can also
extract features from raw images or speech with much less human intervention.
On the other hand, neural networks are very hard to interpret, and their parameterization
is extremely complex. They are also very resource and memory intensive.
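A minimal sketch of a small feed-forward neural network classifier using scikit-learn's MLPClassifier (the layer sizes, iteration count, and synthetic dataset are arbitrary choices for illustration, not a tuned configuration):

```python
# Small feed-forward neural network sketch on synthetic data (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Neural networks are sensitive to feature scale, so standardize first.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Two hidden layers; the connection weights are adjusted iteratively during fit().
model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
model.fit(X_train, y_train)

print("test accuracy:", model.score(X_test, y_test))
```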