UNIT IV
(Data Analytics)
Object Segmentation: Regression vs. Segmentation – Supervised and
Unsupervised Learning, Tree Building – Regression, Classification,
Overfitting, Pruning and Complexity, Multiple Decision Trees, etc. Time
Series Methods: ARIMA, Measures of Forecast Accuracy, STL approach,
Extract features from generated model such as Height, Average Energy, etc.,
and analyze for prediction.
Segmentation:
Segmentation is a methodology that involves dividing a broad
market (items, customers, etc.) into subsets of entities with common
characteristics, i.e., homogeneous groups. Designing and implementing
strategies specific to these segments then makes decision making easier.
Segmentation is used in different areas of Risk Management like credit
risk, operational risk, reserving and investment among others.
Segmentation is often used for modeling Credit risk. Applicants are
segmented based on the estimated credit risk and decisions are made
based on the segment in which the applicant falls.
Supervised Machine Learning:
In Supervised learning, you train the machine using data which is well
"labeled." It means some data is already tagged with the correct answer.
It can be compared to learning which takes place in the presence of a
supervisor or a teacher.
A supervised learning algorithm learns from labeled training data and
helps you predict outcomes for unseen data. Successfully
building, scaling, and deploying an accurate supervised machine learning
model takes time and technical expertise from a team of
highly skilled data scientists. Moreover, data scientists must rebuild
models as the underlying data changes to make sure the insights they
give remain true.
Why Supervised Learning?
Supervised learning allows you to collect data or produce a data
output from previous experience.
It helps you optimize performance criteria using experience.
Decision Trees:
Decision trees are used to solve both classification and regression
problems. The tree is built incrementally by splitting the dataset
into smaller and smaller subsets (on numerical and categorical
attributes), and the results are represented in the leaf nodes.
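As a concrete illustration, here is a minimal sketch of fitting a decision tree classifier; scikit-learn and the iris dataset are choices made for this example, not prescribed by these notes:

# A minimal decision-tree classification sketch (scikit-learn assumed).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)          # labeled training data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The tree is grown by repeatedly splitting the data into smaller subsets;
# predictions are read off the leaf nodes.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))

Limiting max_depth here is one simple guard against overfitting; pruning and complexity control are discussed later in this unit.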
CHAID:
CHAID (Chi-square Automatic Interaction Detector) analysis is an
algorithm used for discovering relationships between a categorical
response variable and other categorical predictor variables. It is
useful when looking for patterns in datasets with lots of categorical
variables and is a convenient way of summarizing the data as the
relationships can be easily visualized.
In practice, CHAID is often used in direct marketing to understand how
different groups of customers might respond to a campaign based on their
characteristics. So suppose, for example, that we run a marketing
campaign and are interested in understanding what customer
characteristics (e.g., gender, socio-economic status, geographic
location, etc.) are associated with the response rate achieved. We build
a CHAID “tree” showing the effects of different customer characteristics
on the likelihood of response.
Regression Trees:
A regression tree refers to an algorithm in which the target variable is
continuous and the tree is used to predict its value. As an example of a
regression-type problem, you may want to predict the selling price of a
residential house, which is a continuous dependent variable.
This will depend on continuous factors like square footage as well as
categorical factors like the style of the home, the area in which the
property is located, and so on.
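A minimal regression-tree sketch for this house-price example, using made-up data (square footage plus a style category encoded as an integer; all numbers here are illustrative, not real housing data):

# Regression tree on a continuous target (scikit-learn assumed).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
sqft = rng.uniform(800, 3500, size=200)
style = rng.integers(0, 3, size=200)        # hypothetical encoding: 0=ranch, 1=colonial, 2=condo
price = 50_000 + 120 * sqft + 15_000 * style + rng.normal(0, 10_000, size=200)

X = np.column_stack([sqft, style])
reg = DecisionTreeRegressor(max_depth=4, random_state=0)
reg.fit(X, price)

# Predict the selling price of a 2000 sq ft colonial-style house.
print(reg.predict([[2000, 1]]))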
1. Entropy:
Entropy is a measure of the impurity of a node, used for categorical
target variables. If a node contains items of classes i = 1, 2, ..., m and
pi is the fraction of items of class i, then Entropy = -Σ pi log2(pi).
Entropy is zero for a pure node and largest when the classes are evenly
mixed; the split that produces the largest reduction in entropy (the
information gain) is chosen.
2. Reduction in Variance:
So far, we have discussed algorithms for a categorical target variable.
Reduction in variance is an algorithm used for continuous target
variables (regression problems). It uses the standard formula of
variance, Variance = Σ(X - X̄)^2 / n, where X̄ is the mean and n is the
number of values, to choose the best split. The split with the lower
(weighted) variance across its sub-nodes is selected as the criterion to
split the population, as in the sketch below.
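A short sketch of the reduction-in-variance criterion; the helper name and the sample values are illustrative:

# For a candidate binary split, compute the weighted variance of the two
# child nodes; the split with the lowest value is preferred.
import numpy as np

def weighted_child_variance(y_left, y_right):
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * np.var(y_left) + (len(y_right) / n) * np.var(y_right)

y = np.array([10.0, 12.0, 11.0, 30.0, 32.0, 31.0])
# Candidate split A separates the low and high values cleanly.
print(weighted_child_variance(y[:3], y[3:]))     # low weighted variance -> good split
# Candidate split B mixes them.
print(weighted_child_variance(y[::2], y[1::2]))  # high weighted variance -> poor split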
3. Gini Index:
Gini says that if we select two items from a population at random, then
they must be of the same class, and the probability of this is 1 if the
population is pure.
1. It works with a categorical target variable such as "Success" or "Failure".
2. It performs only binary splits.
3. The higher the value of Gini, the higher the homogeneity.
4. CART (Classification and Regression Tree) uses the Gini method to
create binary splits.
Steps to Calculate Gini for a split
1. Calculate Gini for the sub-nodes, using the formula: sum of squares
of the probabilities of success and failure (p^2 + q^2).
2. Calculate Gini for the split using the weighted Gini score of each
node of that split.
You might often come across the term 'Gini Impurity', which is
determined by subtracting the Gini value from 1. So mathematically we
can say,
Gini Impurity = 1 - Gini
To compute Gini impurity for a set of items, suppose i ∈ {1, 2, ..., m},
and let fi be the fraction of items labeled with value i in the set. Then
Gini Impurity = 1 - (f1^2 + f2^2 + ... + fm^2).
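A sketch of the two Gini steps above; the node sizes and success counts are illustrative:

# Per-node Gini (p^2 + q^2) and the weighted Gini of a binary split;
# Gini impurity is 1 - Gini.
def gini_node(p_success):
    q = 1.0 - p_success
    return p_success ** 2 + q ** 2

def gini_split(n_left, p_left, n_right, p_right):
    n = n_left + n_right
    return (n_left / n) * gini_node(p_left) + (n_right / n) * gini_node(p_right)

# Example: left node has 10 items with 8 successes, right node 20 items with 5.
g = gini_split(10, 8 / 10, 20, 5 / 20)
print("weighted Gini:", g, "impurity:", 1 - g)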
4. Chi-Square:
It is an algorithm to find the statistical significance of the
differences between sub-nodes and the parent node. We measure it by the sum of
squares of the standardized differences between observed and expected
frequencies of the target variable.
1. It works with a categorical target variable such as "Success" or "Failure".
2. It can perform two or more splits.
3. The higher the value of Chi-square, the higher the statistical
significance of the differences between the sub-node and the parent node.
4. The Chi-square of each node is calculated using the formula:
Chi-square = ((Actual - Expected)^2 / Expected)^(1/2)
5. It generates the tree called CHAID (Chi-square Automatic
Interaction Detector).
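A sketch of the Chi-square value for a candidate split, following the per-node formula above. The counts are illustrative, and taking the expected frequencies from the parent node's success rate is an assumption about the exact bookkeeping:

# Sum sqrt((Actual - Expected)^2 / Expected) over Success and Failure for
# each sub-node, then over all sub-nodes of the split.
import math

def chi_square_split(children, parent_success_rate):
    """children: list of (actual_success, actual_failure) per sub-node."""
    total_chi = 0.0
    for succ, fail in children:
        n = succ + fail
        exp_succ = n * parent_success_rate        # expected counts under the
        exp_fail = n * (1 - parent_success_rate)  # parent's success rate
        total_chi += math.sqrt((succ - exp_succ) ** 2 / exp_succ)
        total_chi += math.sqrt((fail - exp_fail) ** 2 / exp_fail)
    return total_chi

# Example: parent has a 50% success rate; two sub-nodes after a candidate split.
print(chi_square_split([(8, 2), (5, 15)], parent_success_rate=0.5))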
Components of a Time Series:
A time series has four components:
Trend
Seasonal Variations
Cyclic Variations
Random or Irregular movements
Seasonal and cyclic variations are the periodic changes or short-term
fluctuations.
Long term trend – The smooth long-term direction of a time series,
where the data can increase or decrease in some pattern.
Seasonal variation – Patterns of change in a time series within a
year which tend to repeat every year.
Cyclical variation – It is much like seasonal variation, but the
rise and fall of the time series occur over periods longer than one year.
Irregular variation – Any variation that is not explainable by
any of the three components mentioned above. It can be
classified into stationary and non-stationary variation.
Time series models can be simulated, estimated from data, and used to
produce forecasts of future behavior.
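A minimal sketch of estimating a time series model from data and producing a forecast; statsmodels, the simulated series, and the ARIMA(1, 1, 1) order are all assumptions made for this example:

# Fit an ARIMA model and forecast future behavior.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.5, 1.0, size=120))   # simulated upward-trending series

model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.forecast(steps=12))                 # forecast the next 12 periods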
White Noise:
A series is called white noise if it is purely random in nature. Let {εt}
denote such a series; then it has zero mean [E(εt) = 0], a constant
variance [V(εt) = σ²], and is uncorrelated [ρk = 0 for all lags k ≠ 0].
The scatter plot of such a series across time will indicate no pattern,
and hence forecasting the future values of such a series is not possible.
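A small simulation sketch of white noise, checking the three properties above; sigma is an arbitrary choice:

# White noise: zero mean, constant variance sigma^2, no correlation across time.
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0
eps = rng.normal(0.0, sigma, size=500)

print("mean ~ 0:", eps.mean())
print("variance ~ sigma^2:", eps.var())
# Lag-1 autocorrelation ~ 0, confirming the series is uncorrelated.
print("lag-1 autocorrelation:", np.corrcoef(eps[:-1], eps[1:])[0, 1])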
Measures of Forecast Accuracy:
1. Mean Forecast Error (MFE)
The mean forecast error is the average of the forecast errors,
MFE = Σ (At - Ft) / n, where At is the actual value and Ft the forecast
for period t.
Ideal value = 0;
MFE > 0: the model tends to under-forecast.
MFE < 0: the model tends to over-forecast.
While MFE is a measure of forecast model bias, MAD indicates the
absolute size of the errors.
Uses of Forecast error:
Forecast model bias
Absolute size of the forecast errors
Compare alternative forecasting models
Identify forecast models that need adjustment
2. Mean Absolute Deviation (MAD)
It is also called MAD for short, and it is the average of the absolute
values of the differences between the actual values and the forecast
values; it is used for the calculation of demand variability. For n time
periods where we have actual demand and forecast values, it is expressed
by the following formula:
MAD = Σ |At - Ft| / n
Where:
n is the number of fitted points,
At is the actual value,
Ft is the forecast value,
Σ is summation notation (the absolute value is summed for every
forecasted point in time).
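A sketch computing MFE and MAD from actuals At and forecasts Ft, following the formulas above; the numbers are illustrative:

# MFE measures bias; MAD measures the absolute size of the errors.
import numpy as np

actual = np.array([100.0, 110.0, 105.0, 120.0])
forecast = np.array([98.0, 112.0, 100.0, 115.0])

errors = actual - forecast
mfe = errors.mean()           # > 0 here, so the model tends to under-forecast
mad = np.abs(errors).mean()   # average absolute error

print("MFE:", mfe, "MAD:", mad)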
ETL Approach:
Extract, Transform and Load (ETL) refers to a process in database usage,
and especially in data warehousing, that:
extracts data from homogeneous or heterogeneous data sources;
transforms the data into the proper format or structure for querying
and analysis; and
loads it into the final target (database or data warehouse).
The three phases can run in parallel: while data is being extracted,
transformation of already-received data can proceed, and
the data loading kicks off without waiting for the completion of the
previous phases.
ETL systems commonly integrate data from multiple applications
(systems), typically developed and supported by different vendors or
hosted on separate computer hardware.
The disparate systems containing the original data are frequently
managed and operated by different employees. For example, a cost
accounting system may combine data from payroll, sales, and
purchasing.
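A toy end-to-end ETL sketch along these lines, combining two hypothetical source files into one SQLite target; the file names, columns, and table schema are illustrative inventions, and the source files are assumed to exist:

# Extract rows from two source systems, transform them into one
# cost-accounting schema, and load them into a warehouse table.
import csv
import sqlite3

def extract(path):
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows, source):
    # Normalize the disparate source schemas into one record layout.
    return [(source, r["employee"], float(r["amount"])) for r in rows]

conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS costs (source TEXT, employee TEXT, amount REAL)")
for source in ("payroll.csv", "purchasing.csv"):
    conn.executemany("INSERT INTO costs VALUES (?, ?, ?)",
                     transform(extract(source), source))
conn.commit()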