Cluster Analysis

Discriminant analysis is a statistical technique used to classify observations into predefined groups based on a set of predictor variables. It can be used for descriptive purposes to assess how well observations have been classified into groups, or for predictive purposes to assign new observations to existing groups. Common discriminant analysis methods include multiple discriminant analysis, Fisher's linear discriminant analysis, and K-nearest neighbors discriminant analysis.

Uploaded by Deepak Bhardwaj
Copyright © Attribution Non-Commercial (BY-NC)

Discriminant analysis

Discriminant analysis is a technique for classifying a set of observations into predefined classes. The purpose is to determine the class of an observation based on a set of variables known as predictors or input variables. Discriminant analysis may be used for two objectives: to assess the adequacy of a classification, given the group memberships of the objects under study; or to assign objects to one of a number of known groups. It may thus have a descriptive or a predictive objective. In both cases, some group assignments must be known before carrying out the analysis. Such group assignments, or labelling, may be arrived at in any way. Hence discriminant analysis can be employed as a useful complement to cluster analysis (in order to judge the results of the latter) or principal components analysis. Alternatively, in star/galaxy separation, for instance, using digitised images, the analyst may define group (star, galaxy) membership visually for a conveniently small training set or design set.

Methods implemented in this area are Multiple Discriminant Analysis, Fisher's Linear Discriminant Analysis, and K-Nearest Neighbours Discriminant Analysis.

Multiple Discriminant Analysis (MDA) is also termed Discriminant Factor Analysis and Canonical Discriminant Analysis. It adopts a similar perspective to PCA: the rows of the data matrix to be examined constitute points in a multidimensional space, as do the group mean vectors. Discriminating axes are determined in this space in such a way that optimal separation of the predefined groups is attained. As with PCA, the problem reduces mathematically to the eigen-reduction of a real, symmetric matrix. The eigenvalues represent the discriminating power of the associated eigenvectors. The nY groups lie in a space of dimension at most nY - 1. This is the number of discriminant axes or factors obtainable in the most common practical case, when n > m > nY (where n is the number of rows and m the number of columns of the input data matrix, and nY the number of groups).

Linear Discriminant Analysis is the two-group case of MDA. It optimally separates the two groups using the generalised (Mahalanobis) distance, and gives the same linear separating decision surface as Bayesian maximum-likelihood discrimination when the class covariance matrices are equal.

K-Nearest Neighbours (K-NN) Discriminant Analysis is non-parametric (distribution-free): it dispenses with assumptions about the probability density function, which has made it very popular, especially in image processing. The K-NN method assigns an object of unknown affiliation to the group to which the majority of its K nearest neighbours belongs.
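The K-NN rule is simple enough to sketch directly. The snippet below is an illustrative Python sketch (the data points and group labels are hypothetical, loosely echoing the star/galaxy example): an object of unknown affiliation is assigned to the majority group among its K nearest labelled neighbours.

```python
from collections import Counter
import math

def knn_classify(train, labels, point, k=3):
    """Assign `point` to the group to which the majority of its
    k nearest training points belongs (K-NN discriminant analysis)."""
    dists = sorted(
        (math.dist(x, point), lab) for x, lab in zip(train, labels)
    )
    nearest = [lab for _, lab in dists[:k]]
    return Counter(nearest).most_common(1)[0][0]

# Hypothetical training (design) set with known group labels
train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
labels = ["star", "star", "galaxy", "galaxy"]

print(knn_classify(train, labels, (1.1, 0.9)))  # -> star
print(knn_classify(train, labels, (5.1, 5.0)))  # -> galaxy
```

Note that no distributional assumptions are made anywhere: only distances and a majority vote, which is what makes the method non-parametric.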

Factor analysis
Factor analysis is a statistical method used to describe variability among observed variables in terms of a potentially lower number of unobserved variables called factors. In other words, it is possible, for example, that variations in three or four observed variables mainly reflect the variations in a single unobserved variable, or in a reduced number of unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modeled as linear combinations of the potential factors, plus "error" terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Factor analysis originated in psychometrics, and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.

Factor analysis is related to principal component analysis (PCA), but the two are not identical. Because PCA performs a variance-maximizing rotation of the variable space, it takes into account all variability in the variables. In contrast, factor analysis estimates how much of the variability is due to common factors ("communality"). The two methods become essentially equivalent if the error terms in the factor analysis model (the variability not explained by common factors) can be assumed to all have the same variance.
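The factor model described here, observed variables as linear combinations of common factors plus error terms, can be illustrated with a small simulation. The loadings and noise level below are hypothetical, chosen only to show how a single latent factor induces correlation among the observed variables:

```python
import numpy as np

rng = np.random.default_rng(0)

# One latent factor drives three observed variables (hypothetical loadings).
n = 1000
factor = rng.standard_normal(n)            # unobserved common factor
loadings = np.array([0.9, 0.8, 0.7])       # strength of each variable's loading
noise = 0.3 * rng.standard_normal((3, n))  # unique ("error") variance

# Observed variables: linear combinations of the factor plus error terms
X = loadings[:, None] * factor + noise     # shape (3, n)

# The observed variables are strongly correlated because they share a factor.
print(np.round(np.corrcoef(X), 2))
```

Factor analysis runs this construction in reverse: starting from the correlations among the rows of X, it estimates the loadings and the communality of each variable.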

Cluster analysis

Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning, and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis, information retrieval, and bioinformatics.

Types of Clustering:

1. Hierarchical clustering

Hierarchical clustering creates a hierarchy of clusters which may be represented in a tree structure called a dendrogram. The root of the tree consists of a single cluster containing all observations, and the leaves correspond to individual observations. Algorithms for hierarchical clustering are generally either agglomerative, in which one starts at the leaves and successively merges clusters together, or divisive, in which one starts at the root and recursively splits the clusters. Any non-negative-valued function may be used as a measure of similarity between pairs of observations. The choice of which clusters to merge or split is determined by a linkage criterion, which is a function of the pairwise distances between observations.
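The agglomerative strategy just described can be sketched in a few lines. The following is an illustrative single-linkage implementation on one-dimensional points (the data values are hypothetical): each point starts as its own leaf cluster, and the two closest clusters are merged repeatedly.

```python
def single_linkage(points, target_clusters):
    """Agglomerative hierarchical clustering: start with each point as
    its own cluster (the leaves) and repeatedly merge the two closest
    clusters, using single linkage (minimum pairwise distance)."""
    clusters = [[p] for p in points]
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage criterion: distance between the
                # closest pair of points across the two clusters.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

print(single_linkage([1.0, 1.1, 5.0, 5.2, 9.9], 3))
# -> [[1.0, 1.1], [5.0, 5.2], [9.9]]
```

Swapping the `min` for a `max` (complete linkage) or an average (average linkage) changes only the linkage criterion, not the overall agglomerative scheme.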

2. Spectral clustering

This type of cluster analysis is used in biological studies, medicine, psychology and neuroscience, market research, educational research, etc. Given a set of data points A, the similarity matrix may be defined as a matrix S where Sij represents a measure of the similarity between points i and j. Spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction for clustering in fewer dimensions. One such technique is the Normalized Cuts algorithm by Shi and Malik, commonly used for image segmentation. It partitions points into two sets (S1, S2) based on the eigenvector v corresponding to the second-smallest eigenvalue of the Laplacian matrix

L = I - D^(-1/2) S D^(-1/2)

of S, where D is the diagonal matrix with entries

Dii = Σj Sij.

This partitioning may be done in various ways, such as by taking the median m of the components in v, and placing all points whose component in v is greater than m in S1, and the rest in S2. The algorithm can be used for hierarchical clustering by repeatedly partitioning the subsets in this fashion.
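The partitioning step can be sketched as follows, assuming a small hypothetical similarity matrix. NumPy's `eigh` returns eigenvalues in ascending order, so the second column of eigenvectors corresponds to the second-smallest eigenvalue.

```python
import numpy as np

def spectral_split(S):
    """Partition points into two sets using the eigenvector of
    L = I - D^(-1/2) S D^(-1/2) associated with the second-smallest
    eigenvalue, splitting at the median component."""
    d = S.sum(axis=1)                      # row sums: diagonal of D
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(S)) - d_inv_sqrt @ S @ d_inv_sqrt
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh sorts eigenvalues ascending
    v = eigvecs[:, 1]                      # second-smallest eigenvalue's vector
    m = np.median(v)
    return v > m                           # boolean mask: True -> S1

# Hypothetical similarity matrix: two tight groups of two points each
S = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
mask = spectral_split(S)
print(mask)  # points 0 and 1 land in one set, points 2 and 3 in the other
```

Applying `spectral_split` recursively to each resulting subset yields the hierarchical variant mentioned above.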

3. Partitional Analysis

K-means and derivatives


k-means clustering

The k-means algorithm assigns each point to the cluster whose center (also called centroid) is nearest. The center is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster.

Example: The data set has three dimensions and the cluster has two points: X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where

z1 = (x1 + y1)/2,  z2 = (x2 + y2)/2  and  z3 = (x3 + y3)/2.
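The centroid computation and the nearest-centre assignment step can be sketched directly; the example points below form the two-point, three-dimensional cluster from the text.

```python
def centroid(points):
    """Centroid of a cluster: the arithmetic mean of each coordinate,
    taken separately over all points in the cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centers):
    """k-means assignment step: each point goes to the index of
    the nearest center (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return [min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            for p in points]

X = (1.0, 2.0, 3.0)
Y = (3.0, 4.0, 5.0)
print(centroid([X, Y]))  # -> (2.0, 3.0, 4.0)
```

A full k-means run simply alternates `assign` with recomputing `centroid` for each cluster until the assignments stop changing.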

Advantages of cluster analysis

It is easy for users to assign or nominate themselves into the cluster they would most like to compare themselves with (for example, in a school cluster database) because each cluster is clearly named in understandable terms. Cluster analysis provides a simple profile of individuals: given a number of analysis units, for example school size, student ethnicity, region, size of civil jurisdiction and socio-economic status, each of which is described by a set of characteristics and attributes, cluster analysis suggests how groups of units can be determined such that units within groups are similar in some respect and unlike those from other groups.

Disadvantages of cluster analysis

An object can be assigned to one cluster only. For example, in 'Schools Like Mine', schools are automatically assigned into the first twenty-two clusters; if schools want to compare themselves with integrated schools, they have to manually assign themselves into cluster twenty-three. Data-driven clustering may not represent reality, because once a school is assigned to a cluster it cannot be assigned to another one, yet some schools may have more than one significant property or fall on the edge of two clusters. Clustering may also have detrimental effects on teachers who work in low-decile schools, on the students educated in them, and on the parents who support them, by labelling the schools as ineffective when in fact many are doing well in some unique aspects that are not sufficiently captured by the clusters formed.

Conjoint Analysis

Conjoint analysis is a statistical technique used in market research to determine how people value the different features that make up an individual product or service. The objective of conjoint analysis is to determine what combination of a limited number of attributes is most influential on respondent choice or decision making. A controlled set of potential products or services is shown to respondents, and by analyzing how they choose among these products, the implicit valuation of the individual elements making up the product or service can be determined. These implicit valuations (utilities or part-worths) can be used to create market models that estimate market share, revenue and even profitability of new designs. Conjoint analysis techniques may also be referred to as multiattribute compositional modelling, discrete choice modelling, or stated preference research, and are part of a broader set of trade-off analysis tools used for systematic analysis of decisions. These tools include Brand-Price Trade-Off, Simalto, and mathematical approaches such as evolutionary algorithms or Rule Developing Experimentation.
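Part-worth utilities are commonly estimated by least squares on a dummy-coded design matrix of product profiles. The sketch below is illustrative only: the two attributes (brand and price level) and the respondent's ratings are made up.

```python
import numpy as np

# Hypothetical ratings of four product profiles with two attributes,
# brand (A vs B) and price (low vs high), dummy-coded as 0/1 columns.
design = np.array([
    # intercept, brand=B, price=high
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
])
ratings = np.array([9.0, 5.0, 7.0, 3.0])  # one respondent's preference scores

# Least squares recovers the part-worth utility of each attribute level.
partworths, *_ = np.linalg.lstsq(design, ratings, rcond=None)
print(np.round(partworths, 2))
# -> [ 9. -2. -4.]  (baseline, utility of brand B, utility of high price)
```

Here the negative part-worths say this respondent penalises brand B by 2 points and a high price by 4 points, so price matters more to them than brand, which is exactly the kind of trade-off conjoint analysis is designed to surface.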

Advantages

- estimates the psychological tradeoffs that consumers make when evaluating several attributes together
- measures preferences at the individual level
- uncovers real or hidden drivers which may not be apparent to respondents themselves
- presents a realistic choice or shopping task
- is able to use physical objects
- if appropriately designed, can model interactions between attributes and be used to develop needs-based segmentation

Disadvantages

- designing conjoint studies can be complex
- with too many options, respondents resort to simplification strategies
- difficult to use for product positioning research because there is no procedure for converting perceptions about actual features into perceptions about a reduced set of underlying features
- respondents are unable to articulate attitudes toward new categories, or may feel forced to think about issues they would otherwise not give much thought to
- poorly designed studies may over-value emotional/preference variables and undervalue concrete variables
- does not take into account the number of items per purchase, so it can give a poor reading of market share

Ratio Scale
When a scale consists not only of equidistant points but also has a meaningful zero point, we refer to it as a ratio scale. Sales figures, quantities purchased and market share are all expressed on a ratio scale. Ratio scales should be used to gather quantitative information, and we see them perhaps most commonly when respondents are asked for their age, income, years of participation, etc.

Ratio scales are the most sophisticated of scales, since they incorporate all the characteristics of nominal, ordinal and interval scales. As a result, a large number of descriptive calculations are applicable.

