
UNIT- IV

Clustering

Clustering is the process of partitioning a set of data into meaningful, similar subclasses, each of which is called a cluster.

[Or]

Clustering is grouping a set of objects in such a way that objects of the same
group are placed together, i.e., while doing cluster analysis, we first partition the data
into groups based on similarity.

Examples of clustering application:

Marketing
Land use
Insurance
City-planning
A categorization of major clustering methods:
1. Partitioning approach:

Construct various partitions and then evaluate them by some criterion, e.g., minimizing the
sum of squared errors.

Typical methods: k-means, k-medoids, CLARANS

2. Hierarchical approach:

Create a hierarchical decomposition of the set of data (or objects) using some criterion.

Typical methods: Agglomerative, Divisive clustering

BIRCH
ROCK
CHAMELEON
3. Density-based approach:
Based on connectivity and density functions.

Typical methods:

 DENCLUE [DENsity-based CLUstEring]

 DBSCAN [Density-Based Spatial Clustering of Applications with Noise: a density-based clustering method based on
connected regions with sufficiently high density]
 OPTICS [Ordering Points To Identify the Clustering Structure]

4. Grid-based methods:

Based on a multiple-level granularity structure.

Typical methods: WaveCluster, STING [STatistical INformation Grid]

CLIQUE [CLustering In QUEst]

5. Model-based methods:

A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to the
given model.

Typical methods: COBWEB, EM [Expectation-Maximization],

SOM [Self-Organizing feature Map]

6. Constraint-based methods:

Clustering by considering user-specified or application-specific constraints.

Typical methods: COD (Clustering with Obstructed Distance), constrained clustering.

Partitioning methods:
1. k-means algorithm:

Step-1: Choose k initial mean values (randomly).

Step-2: Assign each object to the cluster whose mean is nearest to it.

Step-3: Recompute the mean of each cluster, and repeat Step-2 and Step-3 until the means no longer change.

Step-4: The k-means method typically uses the square-error criterion function.

EX: O = {2, 3, 4, 10, 11, 12, 20, 25, 30}; K = 2

Initial means: M1 = 4, M2 = 12
C1 = {2, 3, 4}                C2 = {10, 11, 12, 20, 25, 30}

M1 = 9/3 = 3                  M2 = 108/6 = 18
C1 = {2, 3, 4, 10}            C2 = {11, 12, 20, 25, 30}

M1 = 19/4 = 4.75              M2 = 98/5 = 19.6
C1 = {2, 3, 4, 10, 11, 12}    C2 = {20, 25, 30}

M1 = 42/6 = 7                 M2 = 75/3 = 25
C1 = {2, 3, 4, 10, 11, 12}    C2 = {20, 25, 30}

M1 = 7                        M2 = 25

The means no longer change, so the algorithm stops with clusters {2, 3, 4, 10, 11, 12} and {20, 25, 30}; the iteration is reproduced in the sketch below.
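
This is a minimal one-dimensional k-means sketch in plain Python; the data set and the initial means are taken from the example above, while the loop itself is an illustrative implementation rather than code from any particular library.

```python
# Minimal 1-D k-means sketch reproducing the worked example above.
O = [2, 3, 4, 10, 11, 12, 20, 25, 30]
means = [4.0, 12.0]          # initial means M1, M2 (chosen as in the example)

while True:
    # Assignment step: put each object into the cluster with the nearest mean.
    clusters = [[] for _ in means]
    for x in O:
        nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
        clusters[nearest].append(x)

    # Update step: recompute each mean from its cluster members.
    new_means = [sum(c) / len(c) for c in clusters]
    if new_means == means:   # stop when the means no longer change
        break
    means = new_means

print(clusters)   # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(means)      # [7.0, 25.0]
```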

2. K-Medoids algorithm:

The k-means algorithm is sensitive to outliers, because an object with an extremely large value
may substantially distort the distribution of the data.

Instead of taking the mean value of the objects in a cluster as a reference point, we can
pick actual objects to represent the clusters, using one representative object per cluster.

Each remaining object is assigned to the cluster whose representative object it is most
similar to.

Case-1: ‘p’ currently belongs to representative object Oj. If Oj is replaced by Orandom as a
representative object and p is closest to one of the other representative objects Oi (i≠j), then p is reassigned to Oi.

Case-2: ‘p’ currently belongs to representative object Oj. If Oj is replaced by Orandom as a
representative object and p is closest to Orandom, then p is reassigned to Orandom.

Case-3: ‘p’ currently belongs to representative object Oi (i≠j). If Oj is replaced by Orandom
as a representative object and p is still closest to Oi, then the assignment does not change.

Case-4: ‘p’ currently belongs to representative object Oi (i≠j). If Oj is replaced by
Orandom and p is closest to Orandom, then p is reassigned to Orandom.

(In the usual illustration of these cases: • → data object, + → cluster centre, solid line → before swapping, dashed line → after swapping.)
PAM (Partitioning Around Medoids) algorithm:

1. PAM is one of the earliest k-medoids algorithms.

2. It uses k medoids (actual representative objects) to identify the k clusters, iteratively replacing a medoid with a non-medoid object whenever the replacement lowers the total clustering cost (a small sketch of this swap-based search is given below).
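
In the sketch below, written in plain Python, the pam helper repeatedly tries to replace a medoid with a non-medoid object and keeps any swap that lowers the total cost; the helper names (total_cost, pam) and the one-dimensional data are illustrative assumptions, not part of any standard library.

```python
# A compact PAM-style k-medoids sketch (1-D data, absolute distance).
def total_cost(data, medoids):
    # Cost = sum of distances from each object to its nearest medoid.
    return sum(min(abs(x - m) for m in medoids) for x in data)

def pam(data, k):
    medoids = list(data[:k])                  # naive initial medoids
    improved = True
    while improved:
        improved = False
        for m in list(medoids):
            for candidate in data:
                if candidate in medoids:
                    continue
                trial = [candidate if x == m else x for x in medoids]
                if total_cost(data, trial) < total_cost(data, medoids):
                    medoids = trial           # accept the improving swap
                    improved = True
    return medoids

data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
medoids = pam(data, k=2)
clusters = {m: [x for x in data if min(medoids, key=lambda c: abs(x - c)) == m]
            for m in medoids}
print(medoids, clusters)
```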

3. CLARA [Clustering LARge Applications]:

1. CLARA is a sampling-based method used to partition large data sets.

2. The idea behind CLARA is that, instead of taking the whole data set into
consideration, a small portion of the actual data is chosen as a sample.

3. Medoids are then chosen from this sample using PAM.

4. CLARA applies PAM on each sample and returns its best clustering as the output.

5. CLARA can deal with larger data sets than PAM. The complexity of each iteration now
becomes O(ks² + k(n − k)),

where ‘s’ is the size of the sample,
‘k’ is the number of clusters, and
‘n’ is the total number of objects.

A minimal sketch of this sampling idea appears below.
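
The sketch draws several small random samples, uses a brute-force k-medoids search in place of PAM on each sample, and keeps the medoid set with the lowest cost on the full data set; the sample size, number of samples, and function names (clara, cost) are illustrative assumptions.

```python
# A minimal CLARA-style sketch: search for medoids on small random samples and
# keep the medoid set with the lowest cost on the FULL data.
import random
from itertools import combinations

def cost(data, medoids):
    return sum(min(abs(x - m) for m in medoids) for x in data)

def clara(data, k, n_samples=5, sample_size=6):
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = random.sample(data, sample_size)
        # Stand-in for PAM on the sample: exhaustively pick the best k medoids.
        medoids = min(combinations(sample, k), key=lambda m: cost(sample, m))
        c = cost(data, medoids)          # evaluate on the whole data set
        if c < best_cost:
            best_medoids, best_cost = list(medoids), c
    return best_medoids

data = [2, 3, 4, 10, 11, 12, 20, 25, 30]
print(clara(data, k=2))
```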

Hierarchical Clustering Method (HCM):

1. HCM works by grouping data objects into a tree of clusters.

2. HCM can be further classified as either agglomerative or divisive, depending on
whether the hierarchical decomposition is formed in a bottom-up (merging) or top-down (splitting) fashion.

1. Agglomerative hierarchical clustering:

1. This is also known as the bottom-up method.

2. It starts by placing each object in its own cluster and merges these atomic clusters into
larger and larger clusters, until all of the objects are in a single cluster.

2. Divisive hierarchical clustering:

1. This is also known as the top-down method.

2. It does the reverse of agglomerative hierarchical clustering by starting with all objects in
one cluster.

3. It subdivides (splits) the cluster into smaller and smaller pieces, until each object forms a
cluster on its own.
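
For a quick agglomerative (bottom-up) example, the sketch below uses SciPy's hierarchical clustering routines; SciPy itself, the 'average' linkage choice, and the reuse of the earlier sample points are assumptions made here for illustration.

```python
# Agglomerative (bottom-up) clustering sketch using SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[2], [3], [4], [10], [11], [12], [20], [25], [30]])

# Build the merge tree bottom-up; 'average' linkage merges the two clusters
# with the smallest average pairwise distance at each step.
Z = linkage(points, method="average")

# Cut the tree so that at most 2 clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)        # cluster label (1 or 2) for each point
```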

Disadvantages:

1. Once a merge or split step is done, it can never be undone or redone.

2. To overcome this problem and to improve the quality of hierarchical methods, hierarchical
clustering is integrated with other clustering techniques, for example:

1. BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies.

2. ROCK - RObust Clustering using linKs.
3. CHAMELEON - hierarchical clustering using dynamic modeling.

BIRCH: It is designed for clustering large amounts of numerical data by integrating
hierarchical clustering (at the initial micro-clustering stage) with other clustering methods (at the later macro-clustering stage).
BIRCH introduces two concepts

(a) Clustering feature.


(b). clustering feature tree (cf-tree)

Given ‘n’ d-dimensional data objects or points in a cluster, we can define the centroid x₀,
radius R, and diameter D of the cluster.

Clustering feature (CF): A CF is a three-dimensional vector summarizing information
about a cluster of objects.

CF = {n, LS, SS}

where n is the number of points in the cluster,
LS is the linear sum of the n points (i.e., Σ_{i=1}^{n} x_i), and
SS is the square sum of the data points (i.e., Σ_{i=1}^{n} x_i²).

Ex: Suppose that there are three points (2,5), (3,2) and (4,3) in a cluster C1.

CF1 = {n, LS, SS}
CF1 = {3, (2+3+4, 5+2+3), (2²+3²+4², 5²+2²+3²)}
    = {3, (9, 10), (29, 38)}

Suppose that C1 is disjoint from a second cluster C2,
where CF2 = {3, (35, 36), (417, 440)}. Because CFs are additive, the CF of the cluster formed by merging C1 and C2 is simply CF1 + CF2 = {6, (44, 46), (446, 478)}.
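
The CF computation and the additivity property used above can be checked with a few lines of Python; the helper names clustering_feature and merge are illustrative, not part of any BIRCH library.

```python
# Sketch of a clustering feature: CF = (n, LS, SS), with LS and SS kept
# per dimension; merge() illustrates the CF additivity property.
def clustering_feature(points):
    n = len(points)
    dims = len(points[0])
    ls = tuple(sum(p[d] for p in points) for d in range(dims))        # linear sum
    ss = tuple(sum(p[d] ** 2 for p in points) for d in range(dims))   # square sum
    return (n, ls, ss)

def merge(cf1, cf2):
    # Two disjoint clusters can be merged by simply adding their CFs.
    n = cf1[0] + cf2[0]
    ls = tuple(a + b for a, b in zip(cf1[1], cf2[1]))
    ss = tuple(a + b for a, b in zip(cf1[2], cf2[2]))
    return (n, ls, ss)

cf1 = clustering_feature([(2, 5), (3, 2), (4, 3)])
print(cf1)                 # (3, (9, 10), (29, 38)) -- matches the example above
cf2 = (3, (35, 36), (417, 440))
print(merge(cf1, cf2))     # (6, (44, 46), (446, 478))
```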

CF-tree: A CF-tree is a height-balanced tree that stores the clustering features for a hierarchical
clustering. The size of a CF-tree is determined by two factors:

1. Branching factor: the maximum number of child nodes allowed for a non-leaf node.
2. Threshold: the maximum diameter allowed for the subclusters stored at the leaf nodes.
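
If scikit-learn is available, its Birch estimator exposes both of these factors directly; the threshold and branching_factor values and the synthetic data below are illustrative assumptions, not recommended settings.

```python
# BIRCH sketch using scikit-learn.
import numpy as np
from sklearn.cluster import Birch

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(8, 1, (50, 2))])

# threshold        -> maximum diameter of a leaf subcluster
# branching_factor -> maximum number of CF children per non-leaf node
model = Birch(threshold=1.5, branching_factor=50, n_clusters=2)
labels = model.fit_predict(X)
print(np.bincount(labels))   # rough cluster sizes
```
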
CHAMELEON:

Measures the similarity based on a dynamic model: two clusters are merged only if the
interconnectivity and closeness between the two clusters are high relative to the internal
interconnectivity of the clusters and the closeness of items within the clusters.

→ CURE ignores information about the interconnectivity of objects.

→ ROCK ignores information about the closeness of two clusters.

The algorithm has two phases:

1. Use a graph-partitioning algorithm: cluster objects into a large number
of relatively small sub-clusters.
2. Use an agglomerative hierarchical clustering algorithm: find the genuine
clusters by repeatedly combining these sub-clusters.

Density-based Clustering

The Density-based Clustering tool works by detecting areas where points are concentrated and
where they are separated by areas that are empty or sparse. Points that are not part of a cluster are
labeled as noise.

This tool uses unsupervised machine learning clustering algorithms which automatically detect
patterns based purely on spatial location and the distance to a specified number of neighbors.
These algorithms are considered unsupervised because they do not require any training on what
it means to be a cluster.

Clustering Methods

The Density-based Clustering tool provides the following clustering methods with which to
find clusters in your point data:

 Defined distance (DBSCAN)—uses a specified distance to separate dense clusters from
sparser noise. The DBSCAN algorithm is the fastest of the clustering methods, but it is only
appropriate if there is a very clear Search Distance to use that works well for all
potential clusters. This requires that all meaningful clusters have similar densities.
 Multi-scale (OPTICS)—uses the distance between neighboring features to create a
reachability plot which is then used to separate clusters of varying densities from noise.
The OPTICS algorithm offers the most flexibility in fine-tuning the clusters that are
detected, though it is computationally intensive, particularly with a large Search
Distance.
For Defined distance (DBSCAN), if the Minimum Features per Cluster cannot be
found within the Search Distance from a particular point, then that point will be marked
as noise. In other words, if the core-distance (the distance required to reach the minimum
number of features) for a feature is greater than the Search Distance, the point is marked
as noise. The Search Distance, when using Defined distance (DBSCAN), is treated as a
search cut-off.
Multi-scale (OPTICS) will search all neighbor distances within the specified Search Distance,
comparing each of them to the core-distance. If any distance is smaller than the core-distance,
then that feature is assigned that core-distance as its reachability distance. If all of the distances
are larger than the core-distance, then the smallest of those distances is assigned as the
reachability distance.

While only Multi-scale (OPTICS) uses the reachability plot to detect clusters, the plot can be
used to help explain, conceptually, how these methods differ from each other: a reachability
plot reveals clusters of varying densities and separation distances. A minimal usage sketch of
both methods (using scikit-learn) is given below.
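
This sketch assumes scikit-learn rather than the GIS tool described above; the eps, min_samples, and max_eps values and the synthetic data are illustrative only, and noise points are labeled -1 by both estimators.

```python
# DBSCAN and OPTICS sketch using scikit-learn.
import numpy as np
from sklearn.cluster import DBSCAN, OPTICS

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.3, (40, 2)),      # dense cluster
               rng.normal(5, 1.0, (40, 2)),      # sparser cluster
               rng.uniform(-5, 10, (10, 2))])    # scattered noise

# Defined distance: a single search radius (eps) separates clusters from noise.
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Multi-scale: reachability distances allow clusters of varying density.
opt_labels = OPTICS(min_samples=5, max_eps=3.0).fit_predict(X)

print(set(db_labels), set(opt_labels))
```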

Grid-Based Clustering Methods:

These methods use a multi-resolution grid data structure.

1. STING [A STatistical INformation Grid approach]

→ The spatial area is divided into rectangular cells.

→ There are several levels of cells, corresponding to different levels of resolution.

→ Each cell at a high level is partitioned into a number of smaller cells at the next lower
level.

→ Statistical information of each cell is calculated and stored beforehand and is used to
answer queries.

→ Parameters of higher-level cells can easily be calculated from the parameters of lower-level
cells.

→ Remove the irrelevant cells from further consideration.

→ When finished examining the current layer, proceed to the next lower level.

→ Repeat this process until the bottom layer is reached.

Advantages:

1. Query-independent, easy to parallelize, incremental update


2. O(k), where k is the number of grid cells at the lowest level

Disadvantages: All the cluster boundaries are either horizontal or vertical, and no
diagonal boundary is detected.
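
A toy sketch of the STING idea is given below, assuming one-dimensional data and hand-picked cell widths: each cell stores precomputed statistics, and a range query is answered top-down by pruning irrelevant cells. The answer is approximate at the cell granularity, and all names here are illustrative.

```python
# A toy STING-style sketch: a 2-level grid over 1-D data where each cell stores
# precomputed statistics (count, mean) that are reused to answer range queries.
data = [2, 3, 4, 10, 11, 12, 20, 25, 30]

def cell_stats(lo, hi):
    pts = [x for x in data if lo <= x < hi]
    return {"lo": lo, "hi": hi, "count": len(pts),
            "mean": sum(pts) / len(pts) if pts else None}

# Bottom layer: cells of width 8; top layer: one parent per two children.
bottom = [cell_stats(lo, lo + 8) for lo in range(0, 32, 8)]
top = [cell_stats(lo, lo + 16) for lo in range(0, 32, 16)]

def query_count(lo, hi):
    # Start at the top layer, descend only into relevant (overlapping) cells.
    total = 0
    for parent in top:
        if parent["hi"] <= lo or parent["lo"] >= hi or parent["count"] == 0:
            continue                      # irrelevant or empty cell: pruned
        for child in bottom:
            if parent["lo"] <= child["lo"] < parent["hi"]:
                if not (child["hi"] <= lo or child["lo"] >= hi):
                    total += child["count"]
    return total

print(query_count(8, 24))   # bottom cells overlapping [8, 24) hold 3 + 1 = 4 points
```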

WaveCluster: clustering by wavelet analysis

A multi-resolution clustering approach which applies a wavelet transform to the feature
space. How is the wavelet transform applied to find clusters?

It summarizes the data by imposing a multidimensional grid structure onto the data space.
These multidimensional spatial data objects are represented in an n-dimensional feature
space.

A wavelet transform is then applied to the feature space to find the dense regions in the feature space.

Model-Based Clustering Methods


Model-based clustering methods attempt to optimize the fit between the given data and some
mathematical model. Such methods are often based on the assumption that the data are generated
by a mixture of underlying probability distributions.
Expectation-Maximization (EM):
In practice, each cluster can be represented mathematically by a parametric probability
distribution. The entire data is a mixture of these distributions, where each individual distribution
is typically referred to as a component distribution.
The EM (Expectation-Maximization) algorithm is a popular iterative refinement algorithm that
can be used for finding the parameter estimates. It can be viewed as an extension of the k-means
paradigm, which assigns an object to the cluster with which it is most similar, based on the
cluster mean.
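
A short sketch of EM-based clustering, assuming scikit-learn's Gaussian mixture implementation, is shown below; the two-component synthetic data and the n_components value are illustrative assumptions.

```python
# EM sketch via a Gaussian mixture model in scikit-learn.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(0, 1, 200), rng.normal(10, 2, 200)]).reshape(-1, 1)

# Fit a mixture of 2 Gaussian component distributions with the EM algorithm.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(gmm.means_.ravel())          # estimated component means (~0 and ~10)
print(gmm.predict(X[:5]))          # hard cluster assignment for some points
print(gmm.predict_proba(X[:5]))    # soft (probabilistic) membership from the E-step
```
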
Conceptual Clustering
Conceptual clustering is a form of clustering in machine learning that, given a set of unlabeled
objects, produces a classification scheme over the objects. Unlike conventional clustering, which
primarily identifies groups of like objects, conceptual clustering goes one step further by also
finding characteristic descriptions for each group, where each group represents a concept or
class. Hence, conceptual clustering is a two-step process: clustering is performed first, followed
by characterization.

COBWEB is a popular and simple method of incremental conceptual clustering. Its input objects
are described by categorical attribute-value pairs. COBWEB creates a hierarchical clustering in
the form of a classification tree.

Outlier Analysis

“What is an outlier?” Very often, there exist data objects that do not comply with the general
behavior or model of the data. Such data objects, which are grossly different from or inconsistent
with the remaining set of data, are called outliers. Outliers can be caused by measurement or
execution error.

Many data mining algorithms try to minimize the influence of outliers or eliminate them
altogether. This, however, could result in the loss of important hidden information because one
person’s noise could be another person’s signal. In other words, the outliers may be of particular
interest, such as in the case of fraud detection, where outliers may indicate fraudulent activity.
Thus, outlier detection and analysis is an interesting data mining task, referred to as outlier
mining.

Outlier mining has wide applications. As mentioned previously, it can be used in fraud detection,
for example, by detecting unusual usage of credit cards or telecommunication services. In
addition, it is useful in customized marketing for identifying the spending behavior of customers
with extremely low or extremely high incomes, or in medical analysis for finding unusual
responses to various medical treatments.

Web data mining:


Web mining can broadly be seen as the application of adapted data mining techniques to the web,
whereas data mining is defined as the application of algorithms to discover patterns in mostly
structured data, embedded in a knowledge discovery process. Web mining has the distinctive
property of dealing with a variety of data types.

The web has multiple aspects that yield different approaches for the mining process: web
pages consist of text, web pages are linked via hyperlinks, and user activity can be monitored via
web server logs. These three features lead to the differentiation between three areas: web
content mining, web structure mining, and web usage mining.

Applications of Web Mining:

Web mining is the process of discovering patterns, structures, and relationships in web data. It
involves using data mining techniques to analyze web data and extract valuable insights. The
applications of web mining are wide-ranging and include:
Personalized marketing:
Web mining can be used to analyze customer behavior on websites and social media platforms.
This information can be used to create personalized marketing campaigns that target customers
based on their interests and preferences.
E-commerce:
Web mining can be used to analyze customer behavior on e-commerce websites. This
information can be used to improve the user experience and increase sales by recommending
products based on customer preferences.
Search engine optimization:
Web mining can be used to analyze search engine queries and search engine results pages
(SERPs). This information can be used to improve the visibility of websites in search engine
results and increase traffic to the website.
Fraud detection:
Web mining can be used to detect fraudulent activity on websites. This information can be
used to prevent financial fraud, identity theft, and other types of online fraud.
Sentiment analysis:
Web mining can be used to analyze social media data and extract sentiment from posts,
comments, and reviews. This information can be used to understand customer sentiment
towards products and services and make informed business decisions.
Web content analysis:
Web mining can be used to analyze web content and extract valuable information such as
keywords, topics, and themes. This information can be used to improve the relevance of web
content and optimize search engine rankings.
Customer service:
Web mining can be used to analyze customer service interactions on websites and social media
platforms. This information can be used to improve the quality of customer service and
identify areas for improvement.
Healthcare:
Web mining can be used to analyze health-related websites and extract valuable information
about diseases, treatments, and medications. This information can be used to improve the
quality of healthcare and inform medical research.
Process of Web Mining:

Web mining can be broadly divided into three different types of mining techniques: Web
Content Mining, Web Structure Mining, and Web Usage Mining. These are explained
below.
What is Web Content Mining?
Web Content Mining can be used to mine useful data, information, and knowledge
from web page content. Web content mining scans and mines the text, images,
and groups of web pages according to the content of the input query, for example when a search
engine displays its list of results.

It is also quite different from data mining because web data are mainly semi-structured or
unstructured, while data mining deals primarily with structured data. Web content mining is also
different from text mining because of the semi-structured nature of the web, while text mining
focuses on unstructured texts. Thus, Web content mining requires creative applications of data
mining and text mining techniques and its own unique approaches.

In the past few years, there has been a rapid expansion of activities in the web content mining
area. This is not surprising because of the phenomenal growth of web content and the significant
economic benefit of such mining. However, due to the heterogeneity and the lack of structure of
web data, automated discovery of targeted or unexpected knowledge still presents
many challenging research problems. Web content mining can be carried out using two
approaches:

1. Agent-based Approach

This approach involves intelligent systems. It aims to improve information finding and filtering.
It usually relies on autonomous agents that can identify relevant websites. It can be placed
into the following three categories:

o Intelligent Search Agents: These agents search for relevant information using domain
characteristics and user profiles to organize and interpret the discovered information.
o Information Filtering or Categorization: These agents use information retrieval
techniques and the characteristics of open hypertext Web documents to automatically
retrieve, filter, and categorize them.
o Personalized Web Agents: These agents learn user preferences and discover Web
information based on other users' preferences with similar interests.

2. Data-based approach

The data-based approach is used to organize the semi-structured data present on the internet into
structured data. It aims to model the web data in a more structured form so that standard
database querying mechanisms and data mining applications can be applied to analyze it.

Web Content Mining Challenges

Web content mining has the following problems or challenges also with their solutions, such as:

o Data Extraction: Extraction of structured data from Web pages, such as products and
search results. Extracting such data allows one to provide services. Two main types of
techniques, machine learning and automatic extraction, are used to solve this problem.
o Web Information Integration and Schema Matching: Although the Web contains a
huge amount of data, each website (or even page) represents similar information
differently. Identifying or matching semantically similar data is an important problem
with many practical applications.
o Opinion extraction from online sources: There are many online opinion sources, e.g.,
customer reviews of products, forums, blogs, and chat rooms. Mining opinions is of
great importance for marketing intelligence and product benchmarking.
o Knowledge synthesis: Concept hierarchies or ontologies are useful in many applications.
However, generating them manually is very time-consuming. The main task here is to
synthesize and organize the pieces of information on the web to give the user a coherent
picture of the topic domain. A few existing methods do this by exploiting the web's
information redundancy.
o Segmenting Web pages and detecting noise: In many Web applications, one only wants
the main content of a Web page, without advertisements, navigation links, or copyright
notices. Automatically segmenting Web pages to extract their main content is an
interesting problem.
What is Web Structure Mining?

The challenge for Web structure mining is to deal with the structure of the hyperlinks within the
web itself. Link analysis is an old area of research. However, with the growing interest in Web
mining, the research of structure analysis has increased. These efforts resulted in a newly
emerging research area called Link Mining, which is located at the intersection of the work in
link analysis, hypertext, web mining, relational learning, inductive logic programming, and graph
mining.

Web structure mining uses graph theory to analyze a website's node and connection structure.
According to the type of web structural data, web structure mining can be divided into two kinds:

o Extracting patterns from hyperlinks in the web: a hyperlink is a structural component


that connects the web page to a different location.
o Mining the document structure: analysis of the tree-like structure of page structures to
describe HTML or XML tag usage.

The web contains a variety of objects with almost no unifying structure, with differences in the
authoring style and content much greater than in traditional collections of text documents. The
objects in the WWW are web pages, and links are in, out, and co-citation (two pages linked to by
the same page). Attributes include HTML tags, word appearances, and anchor texts. Web
structure mining includes the following terminology, such as:

o Web graph: directed graph representing web.


o Node: web page in the graph.
o Edge: hyperlinks.
o In degree: the number of links pointing to a particular node.
o Out degree: number of links generated from a particular node.

An example of a web structure mining technique is the PageRank algorithm used by Google
to rank search results. A page's rank is decided by the number and quality of links pointing to the
target node.
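
A toy power-iteration sketch of the PageRank idea is shown below; the four-page link graph and the damping factor 0.85 are illustrative assumptions, not Google's actual implementation.

```python
# A tiny PageRank sketch (power iteration with damping).
links = {                      # page -> pages it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}
d = 0.85                       # damping factor

for _ in range(50):            # power iteration
    new_rank = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new_rank[q] += d * rank[p] / len(outs)   # share rank over out-links
    rank = new_rank

print({p: round(r, 3) for p, r in rank.items()})     # C collects the most rank
```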

Link mining has given new twists to some traditional data mining tasks. Below we
summarize some of these link mining tasks that are applicable in Web structure
mining:

1. Link-based Classification: The most recent upgrade of a classic data mining task to
linked domains. The task is to predict the category of a web page based on the words that
occur on the page, the links between pages, anchor text, HTML tags, and other possible
attributes found on the web page.
2. Link-based Cluster Analysis: The data is segmented into groups, where similar objects
are grouped together and dissimilar objects are grouped into different groups. Unlike the
previous task, link-based cluster analysis is unsupervised and can be used to discover
hidden patterns in data.
3. Link Type: There is a wide range of tasks concerning predicting the existence of links,
such as predicting the type of link between two entities or predicting the purpose of a
link.
4. Link Strength: Links could be associated with weights.
5. Link Cardinality: The main task is to predict the number of links between objects.

Link-based page categorization is also used for finding related pages, and for finding
duplicated websites and the similarity between them.

What is Web Usage Mining?

Web Usage Mining focuses on techniques that can predict the behavior of users while they
interact with the WWW. Web usage mining discovers user navigation patterns from web
data, trying to extract useful information from the secondary data derived from users'
interactions while surfing the web. Web usage mining collects data from web log records to
discover user access patterns of web pages. Several available research projects and commercial
tools analyze those patterns for different purposes. The resulting knowledge can be utilized in
personalization, system improvement, site modification, business intelligence, and usage
characterization.

The only information left behind by many users visiting a Web site is the path through the pages
they have accessed. Most Web information retrieval tools only use textual information,
while they ignore the link information, which can be very valuable. In general, there are mainly
four kinds of data mining techniques applied to the web mining domain to discover user
navigation patterns:

1. Association Rule Mining

The association rule is the most basic data mining method and is used more than other
methods in web usage mining. This method enables a website to organize its content more
efficiently or to provide recommendations for effective cross-selling of products.

These rules are statements of the form X => Y, where X and Y are sets of items available in
a series of transactions. The rule X => Y states that transactions that contain the items in X are
likely to also contain the items in Y. In web usage mining, association rules are used to find
relationships between pages that frequently appear next to one another in user sessions.
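
As a small illustration, the sketch below computes the support and confidence of a page-level rule X => Y over a handful of made-up user sessions; the sessions and the rule itself are assumptions chosen only for this example.

```python
# Support / confidence for a page-level rule X => Y over user sessions.
sessions = [
    {"home", "products", "cart"},
    {"home", "products"},
    {"home", "blog"},
    {"products", "cart"},
    {"home", "products", "cart", "checkout"},
]

X, Y = {"products"}, {"cart"}

support = sum(1 for s in sessions if X | Y <= s) / len(sessions)
confidence = (sum(1 for s in sessions if X | Y <= s)
              / sum(1 for s in sessions if X <= s))

print(f"support({X} => {Y}) = {support:.2f}")        # 0.60
print(f"confidence({X} => {Y}) = {confidence:.2f}")  # 0.75
```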

2. Sequential Patterns

Sequential patterns are used to discover subsequences in a large volume of sequential data. In
web usage mining, sequential patterns are used to find user navigation patterns that frequently
appear across sessions. Sequential patterns may look like association rules, but sequential
patterns also include time, meaning that the order of the events that occurred is part of the
pattern. Algorithms that are used to extract association rules can also be used to
generate sequential patterns. Two types of algorithms are used for mining sequential patterns.

o The first type of algorithm is based on association rule mining. Many common
sequential pattern mining algorithms are modified association rule mining algorithms.
For example, GSP and AprioriAll are two variants of the Apriori algorithm that
are used to extract sequential patterns. However, some researchers believe that association
rule mining algorithms do not perform well enough on long sequential patterns.
o In the second type of sequential pattern mining algorithm, a tree structure or a Markov
chain is used to represent navigation patterns. For example, in
one of these algorithms, called WAP-mine, a tree structure called the WAP-tree is used to
explore access patterns to the web. Evaluation results show that its performance is higher
than that of algorithms such as GSP.

3. Clustering

Clustering techniques identify groups of similar items among high volumes of data. This is
done based on distance functions which measure the degree of similarity between different items.
Clustering in web usage mining is used for grouping similar sessions. What is important in this
kind of analysis is distinguishing groups of users from one another. Two types of interesting
clustering can be found in this area: user clustering and page clustering.

Clustering of user records is commonly used in web mining and web analytics tasks. The
knowledge derived from clustering is used, for example, to partition the market in e-commerce.
Different methods and techniques are used for clustering, including:

o Using a similarity graph and the amount of time spent viewing a page to estimate the
similarity of sessions.
o Using genetic algorithms and user feedback.
o Clustering matrices.
o The k-means algorithm, which is the most classic clustering method.

In other clustering methods, repetitive patterns are first extracted from the users' sessions using
association rules. These patterns are then used to construct a graph whose nodes are the
visited pages. The edges of the graph connect two or more pages; if these pages exist in an
extracted pattern, a weight is assigned to the edge to show the strength of the relationship
between the nodes. Then, for clustering, this graph is recursively partitioned until groups of
user behavior are detected.

4. Classification Mining

Discovering classification rules allows one to develop a profile of items belonging to a particular
group according to their common attributes. This profile can then be used to classify new data
items added to the database. In web mining, classification techniques allow one to develop a
profile for clients who access particular server files, based on demographic information available
about those clients or on their navigation patterns.

What is a Search Engine?


A search engine is software that brings users relevant information (what they search for) from
the vast library of data available on the World Wide Web. Users can search for many kinds of
things, including queries, documents, images, videos, webpages, and other content, on a search engine.

Search engines are built in such a way that they effectively retrieve the required information
by crawling across the web and searching the available databases on the internet.

Working of a search engine

A search engine acts like a librarian that gathers the relevant books, i.e., the required
information, from the library of data available on the internet.

To summarize, when a user searches for particular data, the web crawlers scan, or crawl through,
the data available on the web and gather all the relevant information (crawling).

The search engine then picks the most relevant results according to their ranking and finally
displays them on the results page, or SERP. It is quite a technical process, but all of this happens so
quickly that the user gets the results as soon as they search for something on the search engine.
Architecture of a Search Engine
The architecture, or framework, of a search engine can be described in terms of
three main components:
• Web crawlers – As the name suggests, these act as spiders which crawl all over the web to
collect the required information. They are special bots that search throughout the internet and
accumulate data by following links.
• Database – A collection of data gathered by the web crawlers while searching
throughout the World Wide Web.
• Search interface – A medium, or interface, through which users can access and
search the database for the required information.

Prediction:
Prediction is a data analysis task similar to classification, but instead of predicting
categorical class labels it models continuous-valued functions. To predict numeric data,
prediction commonly uses a method called linear regression.
Linear regression:
Linear regression is the method used in prediction to model numeric data. This
technique involves two variables: one is called the response (dependent) variable, y, and
the other is called the predictor (independent) variable, x.
In this technique we estimate the response variable y as follows:

y = w0 + w1 x

where

w0 = ȳ − w1 x̄

w1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²

and x̄ and ȳ are the mean values of x and y.

Example:

The following table contains values of x and y:

Mid-exam marks (x)    Final marks (y)

72                    84
50                    63
81                    71
74                    78
94                    96
86                    75

SOL:
Step-1:

Calculate the mean values x̄ and ȳ as follows:

x̄ = (72 + 50 + 81 + 74 + 94 + 86) / 6 = 457 / 6 ≈ 76.17

ȳ = (84 + 63 + 71 + 78 + 96 + 75) / 6 = 467 / 6 ≈ 77.83

Step-2:

Calculate the regression coefficients and the response variable y as follows:

w1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²

Numerator: (72 − 76.17)(84 − 77.83) + (50 − 76.17)(63 − 77.83) + (81 − 76.17)(71 − 77.83)
+ (74 − 76.17)(78 − 77.83) + (94 − 76.17)(96 − 77.83) + (86 − 76.17)(75 − 77.83) ≈ 625.17

Denominator: (72 − 76.17)² + (50 − 76.17)² + (81 − 76.17)² + (74 − 76.17)² + (94 − 76.17)² + (86 − 76.17)² ≈ 1144.83

w1 ≈ 625.17 / 1144.83 ≈ 0.546

w0 = ȳ − w1 x̄ ≈ 77.83 − (0.546)(76.17) ≈ 36.24

For the given value x = 87:

y = w0 + w1 x ≈ 36.24 + (0.546)(87) ≈ 83.7

Hence, for a mid-exam mark x = 87, the predicted final mark is y ≈ 84.

Plot the graph according to the x and y values.
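
The arithmetic above can be reproduced with a short script; this is a plain-Python sketch using the notation of these notes (w0, w1), with no external libraries assumed.

```python
# Least-squares fit for the mid-exam / final-exam example.
x = [72, 50, 81, 74, 94, 86]
y = [84, 63, 71, 78, 96, 75]
n = len(x)

x_bar = sum(x) / n                      # ~76.17
y_bar = sum(y) / n                      # ~77.83

w1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
w0 = y_bar - w1 * x_bar

print(round(w1, 2), round(w0, 2))       # ~0.55 and ~36.24
print(round(w0 + w1 * 87, 1))           # predicted final mark for x = 87, ~83.7
```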
