
Module 1.

Introduction: Machine learning, Examples of Machine Learning Problems, Learning versus
Designing, Training versus Testing, Characteristics of Machine learning tasks, Predictive and
descriptive tasks.

Machine learning
Arthur Samuel described machine learning as a "field of study that gives computers the capability to learn without being explicitly programmed".

Machine Learning (ML) can be described as automating and improving the learning process of computers based on their own experience, without being explicitly programmed, i.e. without direct human assistance.

Data in Machine Learning


DATA: Any unprocessed fact, value, text, sound, or picture that has not yet been interpreted and analyzed. Data is the most important part of Data Analytics, Machine Learning, and Artificial Intelligence: without data we cannot train any model, and modern research and automation would be in vain. Large enterprises spend a great deal of money just to gather as much relevant data as possible.
INFORMATION: Data that has been interpreted and manipulated so that it now carries some meaningful inference for its users.
KNOWLEDGE: The combination of inferred information, experience, learning, and insight. It results in awareness or concept building for an individual or organization.

How do we split data in Machine Learning?


● Training Data: The part of the data used to train the model. This is the data the model actually sees (both input and output) and learns from.
● Validation Data: The part of the data used for frequent evaluation of the model as it is fit on the training dataset, and for tuning the hyperparameters (the parameters set before the model begins learning). This data plays its part while the model is training.
● Testing Data: Once the model is completely trained, the testing data provides an unbiased evaluation. We feed in the inputs of the testing data and the model predicts values without seeing the actual outputs; we then evaluate the model by comparing its predictions with the actual outputs present in the testing data. This shows how much the model has learned from the experience fed in as training data. A minimal split is sketched below.
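As an illustration (not part of the original text), here is a minimal sketch of how such a split might be produced with scikit-learn; the dataset, the 60/20/20 ratio and the random_state are illustrative assumptions only.

# Minimal sketch of a train/validation/test split using scikit-learn.
# The iris dataset and the 60/20/20 ratio are illustrative choices only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve out 20% of the data as the held-out test set.
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Then split the remainder into training (60% overall) and validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # roughly 90, 30, 30 for the 150-row iris data

The validation portion would then be used for tuning hyperparameters, and the test portion only once, at the very end, for the unbiased evaluation described above.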

Applications of Machine Learning include:


● Web Search Engine: One of the reasons why search engines like Google and Bing work so well is that the system has learnt how to rank pages through a complex learning algorithm.
● Photo Tagging Applications: Be it Facebook or any other photo tagging application, the ability to tag friends makes it far more engaging. This is possible because of a face recognition algorithm that runs behind the application.
● Spam Detector: Mail agents like Gmail or Hotmail do a lot of hard work for us in classifying mails and moving spam to the spam folder. This is achieved by a spam classifier running in the back end of the mail application.

Today, companies are using Machine Learning to improve business decisions, increase productivity, detect disease, forecast weather, and do many more things. With the exponential growth of technology, we not only need better tools to understand the data we currently have, but we also need to prepare ourselves for the data we will have. To achieve this goal we need to build intelligent machines. We can write a program to do simple things, but most of the time hard-wiring intelligence into it is difficult. The better way is to give machines some means of learning things themselves: if a machine can learn from input, it does the hard work for us. This is where Machine Learning comes into action. Some examples of machine learning are:
● Database mining for the growth of automation: typical applications include web-click data for better UX (user experience), medical records for better automation in healthcare, biological data, and many more.
● Applications that cannot be programmed directly: some tasks cannot be programmed by hand because the computers we use are not modelled that way. Examples include autonomous driving, recognition tasks on unordered data (face recognition, handwriting recognition), natural language processing, computer vision, etc.
● Understanding human learning: this is the closest we have come to understanding and mimicking the human brain, and the start of a new revolution: real AI.
Now, after this brief insight, let us look at the characteristics of Machine Learning.

7 CHARACTERISTICS OF MACHINE LEARNING

1- THE ABILITY TO PERFORM AUTOMATED DATA VISUALIZATION

A massive amount of data is generated by businesses and ordinary people on a regular basis. By visualizing notable relationships in data, businesses can not only make better decisions but also build confidence. Machine learning offers a number of tools that provide rich snippets of data and can be applied to both unstructured and structured data. With the help of user-friendly automated data visualization platforms in machine learning, businesses can obtain a wealth of new insights and increase the productivity of their processes.

2- AUTOMATION AT ITS BEST

One of the biggest characteristics of machine learning is its ability to automate repetitive tasks and thus increase productivity. A huge number of organizations are already using machine learning-powered paperwork and email automation. In the financial sector, for example, a huge number of repetitive, data-heavy and predictable tasks need to be performed, so this sector uses different types of machine learning solutions to a great extent. They make accounting tasks faster, more insightful, and more accurate. Some aspects that have already been addressed by machine learning include answering financial queries with the help of chatbots, making predictions, managing expenses, simplifying invoicing, and automating bank reconciliations.

3- CUSTOMER ENGAGEMENT LIKE NEVER BEFORE


For any business, one of the most crucial ways to drive engagement, promote brand loyalty and establish long-lasting customer relationships is to trigger meaningful conversations with its target customer base. Machine learning plays a critical role in enabling businesses and brands to spark more valuable conversations around customer engagement. The technology analyzes the particular phrases, words, sentences, idioms, and content formats that resonate with certain audience members. Think of Pinterest, which successfully uses machine learning to personalize suggestions for its users: it uses the technology to surface content users will be interested in, based on the objects they have already pinned.

4- THE ABILITY TO TAKE EFFICIENCY TO THE NEXT LEVEL WHEN MERGED WITH IOT

Thanks to the huge hype surrounding the IoT, machine learning has experienced a great rise in popularity. IoT is being designated as a strategically significant area by many companies, and many others have launched pilot projects to gauge the potential of IoT in the context of business operations. But attaining financial benefits through IoT isn't easy. In order to achieve success, companies offering IoT consulting services and platforms need to clearly determine the areas that will change with the implementation of IoT strategies, and many of these businesses have failed to address this. In this scenario, machine learning is probably the best technology that can be used to attain higher levels of efficiency. By merging machine learning with IoT, businesses can boost the efficiency of their entire production processes.

5- THE ABILITY TO CHANGE THE MORTGAGE MARKET

It's a fact that building a positive credit score usually takes discipline, time, and a lot of financial planning for many consumers. For lenders, the consumer credit score is one of the biggest measures of creditworthiness, involving a number of factors including payment history, total debt, length of credit history, etc. But wouldn't it be great if there were a simpler and better measure? With the help of machine learning, lenders can now obtain a more comprehensive picture of the consumer. Banks can now predict whether a customer is a low spender or a high spender and understand his or her tipping point of spending. Apart from mortgage lending, financial institutions are using the same techniques for other types of consumer loans.

6- ACCURATE DATA ANALYSIS

Traditionally, data analysis has relied on trial-and-error methods, an approach which becomes impossible when we are working with large and heterogeneous datasets. Machine learning offers an effective alternative for analyzing massive volumes of data. By developing efficient and fast algorithms, as well as data-driven models for processing data in real time, machine learning is able to generate accurate analyses and results.
7- BUSINESS INTELLIGENCE AT ITS BEST

Machine learning, when merged with big data analytics, can generate a high level of business intelligence with which several different industries frame strategic initiatives. From retail to financial services to healthcare, and many more, machine learning has already become one of the most effective technologies for boosting business operations.

Descriptive Analytics: Insight into the past

Descriptive analytics does exactly what the name implies: it "describes", or summarizes, raw data and turns it into something interpretable by humans. It is analytics that describes the past, where the past refers to any point in time at which an event has occurred, whether one minute ago or one year ago. Descriptive analytics is useful because it allows us to learn from past behaviour and understand how it might influence future outcomes.

The vast majority of the statistics we use fall into this category (think basic arithmetic like sums, averages, and percent changes). Usually, the underlying data is a count or aggregate of a filtered column of data to which basic math is applied. For all practical purposes, there is an infinite number of these statistics. Descriptive statistics are useful for showing things like total stock in inventory, average dollars spent per customer and year-over-year change in sales. Common examples of descriptive analytics are reports that provide historical insights into a company's production, financials, operations, sales, inventory and customers.

Use descriptive analytics when you need to understand at an aggregate level what is
going on in your company, and when you want to summarize and describe different
aspects of your business.

Predictive Analytics: Understanding the future

Predictive analytics has its roots in the ability to “predict” what might happen. These
analytics are about understanding the future. Predictive analytics provides companies with actionable insights based on data: estimates of the likelihood of a future outcome. It is important to remember that no statistical algorithm can "predict" the future with 100% certainty, because the foundation of predictive analytics is based on probabilities; companies use these statistics to forecast what might happen in the future.
These statistics try to take the data that you have, and fill in the missing data with best
guesses. They combine historical data found in ERP, CRM, HR and POS systems to
identify patterns in the data and apply statistical models and algorithms to capture
relationships between various data sets. Companies use predictive statistics and analytics
any time they want to look into the future. Predictive analytics can be used throughout the
organization, from forecasting customer behavior and purchasing patterns to identifying
trends in sales activities. They also help forecast demand for inputs from the supply
chain, operations and inventory.

One common application most people are familiar with is the use of predictive analytics
to produce a credit score. These scores are used by financial services to determine the
probability of customers making future credit payments on time. Typical business uses
include understanding how sales might close at the end of the year, predicting what items
customers will purchase together, or forecasting inventory levels based upon a myriad of
variables.

Use predictive analytics any time you need to know something about the future, or fill in
the information that you do not have.
Module 1.2
Machine learning Models: Geometric Models, Logical Models, and Probabilistic Models.

Models form the central concept in machine learning, as they are what is learned from the data in order to solve a given task.

Module 1.3
Features: Feature types, Feature Construction and Transformation, Feature Selection.
Features determine much of the success of a machine learning application, because a
model is only as good as its features. A feature can be thought of as a kind of
measurement that can be easily performed on any instance.

Mathematically, they are functions that map from the instance space to some set of
feature values called the domain of the feature.

Features are nothing but the independent variables in machine learning models. What is
required to be learned in any specific machine learning problem is a set of these features
(independent variables), coefficients of these features, and parameters for coming up with
appropriate functions or models (also termed hyperparameters).
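As a purely hypothetical illustration of this view (not from the original text), each feature can be written as a small function from an instance to a value in its domain; the car attributes below are made up.

# Hypothetical sketch: features viewed as functions from the instance space
# to feature values. The instance here is a simple dictionary describing a car.
instance = {"model": "Sedan X", "year": 2015, "miles": 78000}

# Each feature is a function mapping an instance to a value in its domain.
def age(car, current_year=2024):
    return current_year - car["year"]          # numerical feature, domain: non-negative integers

def is_high_mileage(car, threshold=100000):
    return car["miles"] > threshold            # Boolean feature, domain: {True, False}

print(age(instance), is_high_mileage(instance))   # 9 False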

Feature Transformation

Data preprocessing is one of the crucial steps of any data science project. Real-life data is often unorganized and messy, so we first have to preprocess it and then feed the processed data to our models to get good performance. One part of preprocessing is feature transformation.

Feature transformation is a technique we should always use regardless of the model, whether the task is classification, regression, or unsupervised learning.

Feature transformation is a mathematical transformation in which we apply a mathematical formula to a particular column (feature) and transform its values in a way that is useful for further analysis. It is a technique by which we can boost model performance. It is also known as feature engineering, which creates new features from existing features in order to improve model performance.

It refers to the family of algorithms that create new features from the existing ones. These new features may not have the same interpretation as the original features, but they may have more explanatory power in a different space than in the original space. Feature transformation can also be used for feature reduction. It can be done in many ways: by linear combinations of the original features or by using non-linear functions. It helps machine learning algorithms converge faster.

Some data science models, like linear and logistic regression, assume that the variables follow a normal distribution. More often, variables in real datasets follow a skewed distribution. By applying transformations to these skewed variables, we can map the skewed distribution to a normal distribution and increase the performance of our models.

The normal distribution is a very important distribution in statistics and is key to solving many statistical problems. Many quantities in nature, such as age, income, height, and weight, are roughly normally distributed, but the features in real-life data are often not. Still, the normal distribution is the best approximation when we are unaware of the underlying distribution pattern.

Feature Transformation Techniques

The following transformation techniques can be applied to data sets, such as:
1. Log Transformation: Generally, this transformation brings the data closer to a normal distribution, although it cannot make it exactly normal. It is not applied to features that have negative values and is mostly applied to right-skewed data. It converts data from an additive scale to a multiplicative scale.

2. Reciprocal Transformation: This transformation is not defined for zero. It is a powerful transformation with a radical effect. It reverses the order among values of the same sign, so large values become smaller and vice versa.

3. Square Transformation: This transformation mostly applies to left-skewed data.

4. Square Root Transformation: This transformation is defined only for positive numbers. It can be used to reduce the skewness of right-skewed data, and it is weaker than the log transformation. A short sketch of transformations 1-4 follows.
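A minimal sketch of the first four transformations with NumPy; the sample values are made up, and log1p (log of 1 + x) is used so that zeros do not cause problems.

import numpy as np

# Made-up, right-skewed sample values purely for illustration.
x = np.array([1.0, 2.0, 3.0, 10.0, 100.0, 1000.0])

log_x = np.log1p(x)        # log transformation (log(1 + x) avoids problems at zero)
recip_x = 1.0 / x          # reciprocal transformation (undefined at zero; reverses order within a sign)
square_x = x ** 2          # square transformation (mostly for left-skewed data)
sqrt_x = np.sqrt(x)        # square root transformation (defined only for non-negative values)

print(log_x.round(2))
print(sqrt_x.round(2))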

5. Custom Transformation: A FunctionTransformer forwards its X (and optionally y) arguments to a user-defined function or function object and returns the result of this function. The resulting transformer will not be pickleable if a lambda is used as the function. This is useful for stateless transformations such as taking the log of frequencies or doing custom scaling; a minimal sketch follows.
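A minimal sketch using scikit-learn's FunctionTransformer; the data values are made up, and np.log1p is passed as a named function rather than a lambda so that the transformer stays pickleable, as noted above.

import numpy as np
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1.0, 10.0], [2.0, 100.0], [3.0, 1000.0]])

# FunctionTransformer forwards X to the supplied function and returns its result.
# A named function (np.log1p) is used rather than a lambda so the transformer
# remains pickleable.
log_transformer = FunctionTransformer(np.log1p)
X_logged = log_transformer.fit_transform(X)
print(X_logged)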

6. Power Transformations: Power transforms are a family of parametric, monotonic transformations that make data more Gaussian-like. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood. This is useful for modelling problems with non-constant variance or other situations where normality is desired. Currently, the PowerTransformer supports the Box-Cox transform and the Yeo-Johnson transform.

Box-Cox requires the input data to be strictly positive (not even zero is acceptable), while Yeo-Johnson supports both positive and negative data.

By default, zero-mean, unit-variance normalization is applied to the transformed data.

● Box-Cox Transformation: square root, square and log are special cases of this transformation.
● Yeo-Johnson Transformation: a variation of the Box-Cox transform that also handles zero and negative values (sketched below).
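A minimal sketch with scikit-learn's PowerTransformer, which implements both transforms; the data values are made up, and standardize=True (the default) applies the zero-mean, unit-variance normalization mentioned above.

import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up, strictly positive, right-skewed data for illustration.
X = np.array([[1.0], [2.0], [5.0], [20.0], [100.0]])

# Box-Cox requires strictly positive inputs; Yeo-Johnson also accepts zero and negatives.
boxcox = PowerTransformer(method="box-cox")          # standardize=True by default
yeojohnson = PowerTransformer(method="yeo-johnson")

print(boxcox.fit_transform(X).ravel().round(3))
print(yeojohnson.fit_transform(np.vstack([X, [[-3.0]]])).ravel().round(3))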

Feature Selection

Feature selection is a way of selecting the subset of the most relevant features from the
original features set by removing the redundant, irrelevant, or noisy features.

While developing a machine learning model, only a few variables in the dataset are typically useful for building the model; the rest are either redundant or irrelevant. If we input the dataset with all of these redundant and irrelevant features, it may negatively impact and reduce the overall performance and accuracy of the model. Hence it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important ones, which is done with the help of feature selection in machine learning.

Feature selection is one of the important concepts of machine learning, which highly
impacts the performance of the model. As machine learning works on the concept of
"Garbage In Garbage Out", so we always need to input the most appropriate and relevant
dataset to the model in order to get a better result.

A feature is an attribute that has an impact on the problem or is useful for solving it, and choosing the important features for the model is known as feature selection. Each machine learning process depends on feature engineering, which mainly contains two processes: feature selection and feature extraction. Although feature selection and extraction may share the same objective, they are completely different from each other: feature selection is about selecting a subset of the original feature set, whereas feature extraction creates new features. Feature selection is a way of reducing the input variables for the model by using only relevant data, in order to reduce overfitting.

So, we can define feature Selection as, "It is a process of automatically or manually
selecting the subset of most appropriate and relevant features to be used in model
building." Feature selection is performed by either including the important features or
excluding the irrelevant features in the dataset without changing them.

Before implementing any technique, it is really important to understand why it is needed, and the same is true of feature selection. As we know, in machine learning it is necessary to provide a pre-processed and good input dataset in order to get better outcomes. We collect a huge amount of data to train our model and help it learn better. Generally, such a dataset consists of noisy data, irrelevant data, and some portion of useful data. Moreover, the huge volume of data also slows down the training process of the model, and with noise and irrelevant data, the model may not predict and perform well. So it is very necessary to remove such noise and less important data from the dataset, and this is what feature selection techniques are used for.

Selecting the best features helps the model perform well. For example, suppose we want to create a model that automatically decides which car should be crushed for spare parts, and we have a dataset containing the Model of the car, Year, Owner's name, and Miles. In this dataset, the name of the owner does not contribute to the model's performance, as it does not decide whether the car should be crushed or not, so we can remove this column and select the rest of the features (columns) for model building, as sketched below.
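A minimal sketch of that example with pandas; the column names and the tiny dataset are hypothetical.

import pandas as pd

# Hypothetical toy dataset for the car example above.
cars = pd.DataFrame({
    "Model": ["A", "B", "C"],
    "Year": [2005, 2012, 1998],
    "Owner": ["Ravi", "Meera", "John"],
    "Miles": [180000, 60000, 250000],
})

# The owner's name carries no information about whether the car should be
# crushed, so it is dropped; the remaining columns are kept as features.
features = cars.drop(columns=["Owner"])
print(features.columns.tolist())   # ['Model', 'Year', 'Miles']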

Below are some benefits of using feature selection in machine learning:

● It helps in avoiding the curse of dimensionality.
● It helps simplify the model so that it can be more easily interpreted by researchers.
● It reduces the training time.
● It reduces overfitting, hence enhancing generalization.

Feature Selection Techniques

There are mainly two types of Feature Selection techniques, which are:
● Supervised Feature Selection technique
Supervised Feature selection techniques consider the target variable and can be
used for the labelled dataset.
● Unsupervised Feature Selection technique
Unsupervised Feature selection techniques ignore the target variable and can be
used for the unlabelled dataset.

1. Wrapper Methods

In the wrapper methodology, the selection of features is treated as a search problem, in which different combinations of features are made, evaluated, and compared with other combinations. The algorithm is trained iteratively using subsets of features. On the basis of the output of the model, features are added or removed, and the model is trained again with the new feature set.

Some techniques of wrapper methods are:

● Forward selection - Forward selection is an iterative process which begins with an empty set of features. In each iteration it adds a feature and evaluates the performance to check whether it improves. The process continues until adding a new variable/feature no longer improves the performance of the model.
● Backward elimination - Backward elimination is also an iterative approach, but it is the opposite of forward selection. This technique begins by considering all the features and removes the least significant feature. This elimination process continues until removing features no longer improves the performance of the model.
● Exhaustive Feature Selection - Exhaustive feature selection is one of the most thorough feature selection methods, evaluating every feature set by brute force. It tries each possible combination of features and returns the best-performing feature set.
● Recursive Feature Elimination - Recursive feature elimination is a greedy optimization approach in which features are selected by recursively considering smaller and smaller subsets of features. An estimator is trained on each set of features, and the importance of each feature is determined through the coef_ attribute or the feature_importances_ attribute. A minimal sketch follows this list.
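A minimal sketch of recursive feature elimination with scikit-learn; the synthetic dataset, the logistic regression estimator and the choice of keeping 5 features are illustrative assumptions only.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which only a few are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# RFE repeatedly fits the estimator and drops the weakest feature(s),
# using coef_ (or feature_importances_) to judge importance.
selector = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)

print(selector.support_)    # Boolean mask of the selected features
print(selector.ranking_)    # rank 1 = selected

The same idea carries over to the other wrapper methods; only the search strategy over feature subsets changes.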

2. Filter Methods

In the filter method, features are selected on the basis of statistical measures. This method does not depend on the learning algorithm and chooses the features as a pre-processing step.

The filter method filters out irrelevant features and redundant columns from the model by ranking them with different metrics.

The advantage of filter methods is that they need little computational time and do not overfit the data.
Some common techniques of Filter methods are as follows:

● Information Gain: Information gain measures the reduction in entropy obtained by transforming the dataset. It can be used as a feature selection technique by calculating the information gain of each variable with respect to the target variable.

● Chi-square Test: The chi-square test is a technique for determining the relationship between categorical variables. The chi-square value is calculated between each feature and the target variable, and the desired number of features with the best chi-square values is selected.

● Fisher's Score: Fisher's score is one of the popular supervised techniques for feature selection. It ranks the variables on Fisher's criterion in descending order; we can then select the variables with a large Fisher's score.

● Missing Value Ratio: The missing value ratio can be used to evaluate features against a threshold value. The ratio is obtained by dividing the number of missing values in each column by the total number of observations. Variables whose ratio exceeds the threshold can be dropped. A short sketch of the chi-square filter and the missing value ratio follows this list.
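A minimal sketch of two of these filters: SelectKBest with the chi-square score, and the missing value ratio computed with pandas. The datasets and the 0.3 threshold are illustrative assumptions only.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Chi-square filter: score each (non-negative) feature against the target
# and keep the k best. Iris is used only because it ships with scikit-learn.
X, y = load_iris(return_X_y=True)
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)
print(selector.get_support())            # mask of the two highest-scoring features

# Missing value ratio: missing values per column divided by the number of rows.
df = pd.DataFrame({"a": [1, None, 3, None], "b": [1, 2, 3, 4]})
ratio = df.isnull().sum() / len(df)
print(ratio)                             # a: 0.50, b: 0.00
keep = ratio[ratio <= 0.3].index         # drop columns above an illustrative 0.3 threshold
print(list(keep))                        # ['b']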

3. Embedded Methods

Embedded methods combine the advantages of both filter and wrapper methods by considering the interaction of features along with low computational cost. They are fast, like filter methods, but more accurate.

These methods are also iterative: each iteration of the model training process is evaluated, and the features that contribute most to that training iteration are identified. Some techniques of embedded methods are:

● Regularization - Regularization adds a penalty term to the parameters of the machine learning model to avoid overfitting. This penalty is applied to the coefficients, and it can shrink some coefficients to zero; the features with zero coefficients can then be removed from the dataset. Common techniques are L1 regularization (Lasso regularization) and elastic nets (combined L1 and L2 regularization).
● Random Forest Importance - Tree-based methods provide feature importances that give a way of selecting features. Here, feature importance specifies which features matter more in model building or have a greater impact on the target variable. Random Forest is such a tree-based method: a bagging algorithm that aggregates a number of decision trees. It automatically ranks the nodes by their performance, or decrease in impurity (Gini impurity), over all the trees. Nodes are arranged according to the impurity values, allowing the tree to be pruned below a chosen node; the remaining nodes correspond to a subset of the most important features. A minimal sketch of both approaches follows this list.
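A minimal sketch of both ideas with scikit-learn on synthetic regression data; the alpha value and the forest size are illustrative choices only.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=8, n_informative=3, noise=5.0, random_state=0)

# L1 regularization: Lasso shrinks some coefficients exactly to zero,
# and those features can then be dropped.
lasso = Lasso(alpha=1.0).fit(X, y)
print(np.where(lasso.coef_ != 0)[0])          # indices of the features Lasso keeps

# Tree-based importance: a random forest ranks features by how much they
# reduce impurity across all trees.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(np.argsort(forest.feature_importances_)[::-1])   # features from most to least important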

How to choose a Feature Selection Method?


For machine learning engineers, it is very important to understand which feature selection method will work properly for their model. The better we know the data types of the variables, the easier it is to choose the appropriate statistical measure for feature selection.

To know this, we need to first identify the type of input and output variables. In machine
learning, variables are of mainly two types:

● Numerical Variables: variables with continuous values, such as integers and floats.
● Categorical Variables: variables with categorical values, such as Boolean, ordinal, and nominal variables.
Below are some univariate statistical measures, which can be used for filter-based feature
selection:

1. Numerical Input, Numerical Output:

Numerical input variables with a numerical output correspond to predictive regression modelling. The common measures for such a case are correlation coefficients (a short sketch follows):

● Pearson's correlation coefficient (for linear correlation).
● Spearman's rank coefficient (for non-linear correlation).
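A minimal sketch of both coefficients with SciPy on made-up data.

import numpy as np
from scipy.stats import pearsonr, spearmanr

# Made-up numerical input and numerical output for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3 + np.array([0.1, -0.2, 0.05, 0.3, -0.1])   # monotonic but non-linear relationship

r, _ = pearsonr(x, y)        # linear correlation
rho, _ = spearmanr(x, y)     # rank (monotonic) correlation

print(round(float(r), 3), round(float(rho), 3))   # Spearman's rho is 1.0 here; Pearson's r is lower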

2. Numerical Input, Categorical Output:

Numerical input with a categorical output is the case for classification predictive modelling problems. Here, too, correlation-based techniques are used, but adapted to a categorical output.

● ANOVA correlation coefficient (linear).
● Kendall's rank coefficient (non-linear).

3. Categorical Input, Numerical Output:

This is the case of regression predictive modelling with categorical input. It is a less common kind of regression problem; we can use the same measures as in the previous case, but with the roles of input and output reversed.
4. Categorical Input, Categorical Output:

This is a case of classification predictive modelling with categorical Input variables.

The commonly used technique for such a case is the Chi-Squared test. We can also use Information Gain in this case.

We can summarise the above cases with appropriate measures in the below table:

Input Variable    Output Variable    Feature Selection Technique
Numerical         Numerical          Pearson's correlation coefficient (linear); Spearman's rank coefficient (non-linear)
Numerical         Categorical        ANOVA correlation coefficient (linear); Kendall's rank coefficient (non-linear)
Categorical       Numerical          Kendall's rank coefficient (linear); ANOVA correlation coefficient (non-linear)
Categorical       Categorical        Chi-Squared test (contingency tables); Mutual Information
