
UNIT-3

Data Mining and Machine Learning

Data Mining: The origins of Data Mining- Data Mining Tasks- OLAP and
Multidimensional-Data Analysis-Basic Concept of Association Analysis and Cluster
Analysis.
Machine Learning: History and Evolution – AI Evolution- Statistics vs. Data Mining vs.
Data Analytics vs. Data Science – Supervised Learning- Unsupervised Learning-
Reinforcement Learning- Frameworks for Building Machine Learning Systems.

Data Mining:
The origins of Data Mining

 Data mining is a discipline with a long history. It begins with early methods such as
Bayes' Theorem (1700s) and regression analysis (1800s), which were used mostly for
identifying patterns in data.
 Data mining is the process of analyzing large data sets (Big Data) from different
perspectives and uncovering correlations and patterns to summarize them into useful
information.
 Nowadays it is blended with many techniques such as artificial intelligence, statistics,
data science, database theory and machine learning.
 The increasing power of technology and the complexity of data sets have led data mining to
evolve from static data delivery to more dynamic and proactive information delivery;
from tapes and disks to advanced algorithms and massive databases.
 In the late 1980s the term data mining began to be known and used within the research
community by statisticians, data analysts, and the management information systems
(MIS) communities.
 By the early 1990s, data mining was recognized as a sub-process, or step, within a
larger process called Knowledge Discovery in Databases (KDD). The most commonly
used definition of KDD is "the nontrivial process of identifying valid, novel,
potentially useful, and ultimately understandable patterns in data" (Fayyad, 1996).

The sub-processes that form part of the KDD process are;


1. Understanding of the application and identifying the goal of the KDD process
2. Creating a target data set
3. Data cleaning and pre-processing
4. Matching the goals of the KDD process (step 1) to a particular data-mining method.
5. Research analysis and hypothesis selection
6. Data mining: Searching for patterns of interest in a particular form, including
classification rules, regression, and clustering
7. Interpreting mined patterns
8. Acting on the discovered knowledge
Data Mining Tasks:

Data mining, also known as knowledge discovery in data (KDD), is the process of
uncovering patterns and other valuable information from large data sets. Given the evolution
of data warehousing technology and the growth of big data, adoption of data mining
techniques has rapidly accelerated over the last couple of decades, assisting companies by
transforming their raw data into useful knowledge.

Data mining functionalities are used to specify the kinds of patterns to be discovered in data
mining tasks. Data mining is widely applied for forecasting and characterizing data in big
data.

Data mining tasks are majorly categorized into two categories: descriptive and predictive.

1. Descriptive data mining:


Descriptive data mining offers a detailed description of the data; it gives insight into
what is going on inside the data without any prior hypothesis and highlights the
common characteristics of the data.
2. Predictive Data Mining:
This allows users to infer features that are not directly available. For example,
projecting the market performance of the next quarters from the output of the previous
quarters. In general, predictive analysis forecasts or infers new values from the data
that is already available, for instance predicting the likely illness of a patient from the
outcomes recorded for patients with similar medical records.

Key Data Mining Tasks


1) Characterization and Discrimination

 Data Characterization: Data characterization is a description of the general
characteristics of objects in a target class, producing what are called characteristic
rules.
A database query typically collects the data relevant to a user-specified class and
runs it through a description module to summarize the data at various levels of
abstraction.
E.g., bar charts, curves, and pie charts.
 Data Discrimination: Data discrimination produces a set of rules, called discriminant
rules, that contrast the general characteristics of objects in the target class with those
of objects in a contrasting class.

2) Prediction

Prediction uses regression analysis to estimate inaccessible or missing numeric values in the
data. When the missing value is a class label rather than a number, classification is used to
make the prediction. Prediction is common because of its relevance in business intelligence.
There are thus two ways of predicting data: predicting a class label using a previously built
classification model, and predicting missing or incomplete numeric values using regression
analysis.
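
To make this concrete, here is a minimal Python sketch (not part of the original notes) that uses linear regression from scikit-learn to fill in a missing numeric value; the experience/salary figures are invented purely for illustration.

```python
# A minimal sketch (not from the source): predicting a missing numeric value
# with regression. The data below is invented for illustration only.
from sklearn.linear_model import LinearRegression

# Known records: years_of_experience -> salary (in thousands)
X_known = [[1], [3], [5], [7], [9]]
y_known = [30, 42, 55, 68, 80]

model = LinearRegression()
model.fit(X_known, y_known)          # learn the trend from the complete records

# A record whose salary value is missing; regression estimates it
predicted = model.predict([[6]])
print(round(predicted[0], 1))        # roughly 61 (thousands)
```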
3) Classification

Classification is used to build models of predefined classes; the model is then used to classify
new instances whose class is not known. The instances used to build the model are known as
training data. Such a classification process produces a decision tree or a set of classification
rules that can be used to classify future data, for example predicting the likely salary band of
an employee based on the classified salaries of related employees in the company.
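
The salary example can be sketched with a decision tree. The following minimal Python illustration (not from the original notes) uses scikit-learn on invented employee records, so the feature encoding and salary bands are assumptions.

```python
# A minimal sketch (not from the source): a decision tree trained on invented
# "related employee" records predicts the salary band of a new employee.
from sklearn.tree import DecisionTreeClassifier

# Features: [years_of_experience, education_level (0=BSc, 1=MSc, 2=PhD)]
X_train = [[1, 0], [2, 1], [5, 1], [7, 2], [10, 2], [3, 0]]
y_train = ["low", "low", "medium", "high", "high", "low"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)            # learn classification rules from training data

# Classify a new employee whose salary band is not yet known
print(clf.predict([[6, 1]]))         # expected: ['medium'] for this toy data
```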

4) Association Analysis

Association analysis discovers the links between data items and the rules that bind them,
relating two or more data attributes. It associates attributes that are frequently transacted
together, working out what are called association rules, which are commonly used in
market-basket analysis. Two measures are used to evaluate a rule: confidence, which
indicates the probability of the associated items occurring together, and support, which
reflects how often the association has occurred in the past.

5) Outlier Analysis

Data objects that cannot be grouped into any class or cluster are outliers. They are
often referred to as anomalies or surprises, and they are often important to examine.

Although in some contexts, outliers can be called noise and discarded, they can disclose
useful information in other areas, and hence can be very important and beneficial for their
study.
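
As a rough illustration of outlier analysis, the following Python sketch (not from the original notes) flags values that lie far from the mean using a simple z-score rule; the data and the two-standard-deviation cut-off are arbitrary choices.

```python
# A minimal sketch (not from the source): flagging outliers in a numeric column
# with a simple z-score rule. Values more than 2 standard deviations from the
# mean are treated as potential outliers/anomalies.
import statistics

values = [12, 14, 13, 15, 14, 13, 95, 12, 14, 13]   # invented data; 95 is suspicious
mean = statistics.mean(values)
std = statistics.pstdev(values)

outliers = [v for v in values if abs(v - mean) > 2 * std]
print(outliers)   # [95]
```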

6) Cluster Analysis

Clustering is the arrangement of data in groups. Unlike classification, however, class labels
are undefined in clustering and it is up to the clustering algorithm to find suitable classes.
Clustering is often called unsupervised classification because the grouping is not guided by
provided class labels. Most clustering methods are based on maximizing the similarity
between objects of the same class (intra-class similarity) and minimizing the similarity
between objects in different classes (inter-class similarity).

7) Evolution & Deviation Analysis

By analysing changes and shifts in behaviour over time, we can uncover features such as
time-series patterns, periodicity, and similarities in trends. Such evolution and deviation
analysis is applied in many domains, from space science to retail marketing.

 OLAP (online analytical processing)

 OLAP (online analytical processing) is a computing method that enables users to


easily and selectively extract and query data in order to analyze it from different
points of view. OLAP business intelligence queries often aid in trends analysis,
financial reporting, sales forecasting, budgeting and other planning purposes.
 For example, a user can request that data be analyzed to display a spreadsheet
showing all of a company's beach ball products sold in Florida in the month of July,
compare revenue figures with those for the same products in September and then see a
comparison of other product sales in Florida in the same time period.

How OLAP systems work

 To facilitate this kind of analysis, data is collected from multiple data sources and
stored in data warehouses then cleansed and organized into data cubes.

 Each OLAP cube contains data categorized by dimensions (such as customers,
geographic sales region and time period) derived from dimension tables in the data
warehouse.

 Dimensions are then populated by members (such as customer names, countries and
months) that are organized hierarchically. OLAP cubes are often pre-summarized
across dimensions to drastically improve query time over relational databases.

 Types of OLAP Servers


We have four types of OLAP servers −

 Relational OLAP (ROLAP)


 Multidimensional OLAP (MOLAP)
 Hybrid OLAP (HOLAP)
 Specialized SQL Servers

1. Relational OLAP
ROLAP servers are placed between the relational back-end server and the client front-end tools.
To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP includes the following −

 Implementation of aggregation navigation logic.


 Optimization for each DBMS back end.
 Additional tools and services.

2. Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views of
data. With multidimensional data stores, the storage utilization may be low if the data set is
sparse. Therefore, many MOLAP servers use two levels of data storage representation to
handle dense and sparse data sets.

3. Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of
ROLAP and the faster computation of MOLAP. HOLAP servers allow storing large volumes
of detailed information, while aggregations are stored separately in the MOLAP store.
4. Specialized SQL Servers
Specialized SQL servers provide advanced query language and query processing support for
SQL queries over star and snowflake schemas in a read-only environment.

 OLAP Operations
Since OLAP servers are based on a multidimensional view of data, OLAP operations are
performed on multidimensional data.
Here is the list of OLAP operations −

 Roll-up
 Drill-down
 Slice and dice
 Pivot (rotate)

 Roll-up. Also known as consolidation, or drill-up, this operation summarizes the
data along a dimension.

 Drill-down. This allows analysts to navigate deeper among the dimensions of


data, for example drilling down from "time period" to "years" and "months" to
chart sales growth for a product.

 Slice. This enables an analyst to take one level of information for display, such as
"sales in 2017."

 Dice. This allows an analyst to select data from multiple dimensions to analyze,
such as "sales of blue beach balls in Iowa in 2017."

 Pivot. Analysts can gain a new view of data by rotating the data axes of the cube.

OLAP software then locates the intersection of dimensions, such as all products sold in the
Eastern region above a certain price during a certain time period, and displays them. The
result is the "measure"; each OLAP cube has at least one to perhaps hundreds of measures,
which are derived from information stored in fact tables in the data warehouse.

OLAP begins with data accumulated from multiple sources and stored in a data warehouse.
The data is then cleansed and stored in OLAP cubes, which users run queries against.
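
The OLAP operations above can be approximated on a small scale with ordinary group-by and pivot operations. The following Python/pandas sketch (not from the original notes, and not a real OLAP server) uses an invented sales table to imitate roll-up, drill-down, slice, dice and pivot.

```python
# A minimal sketch (not from the source) of OLAP-style operations on a tiny
# invented sales "cube" using pandas.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "East", "West"],
    "year":    [2016, 2017, 2016, 2017, 2017, 2017],
    "product": ["beach ball", "beach ball", "frisbee", "beach ball", "frisbee", "frisbee"],
    "revenue": [100, 150, 80, 120, 60, 90],
})

# Roll-up: summarize revenue along the region dimension
print(sales.groupby("region")["revenue"].sum())

# Drill-down: from region totals down to region-and-year detail
print(sales.groupby(["region", "year"])["revenue"].sum())

# Slice: fix one dimension value ("sales in 2017")
print(sales[sales["year"] == 2017])

# Dice: select on several dimensions at once
print(sales[(sales["year"] == 2017) & (sales["product"] == "beach ball")])

# Pivot: rotate the axes to view region x year
print(sales.pivot_table(index="region", columns="year", values="revenue", aggfunc="sum"))
```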
 Association Rule Mining

Association analysis is useful for discovering interesting relationships hidden in large data
sets. The uncovered relationships can be represented in the form of association rules or sets
of frequent items.

Given a set of transactions, find rules that will predict the occurrence of an item based on the
occurrences of other items in the transaction

Market- Basket transactions

TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
Implication means co-occurrence, not causality!
Example of Association Rules
{Beer} → {Diaper}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Support Count (σ) – The frequency of occurrence of an itemset.

Here σ({Milk, Bread, Diaper}) = 2
Frequent Itemset – An itemset whose support is greater than or equal to a minsup
threshold.
Association Rule – An implication expression of the form X → Y, where X and Y are any
two itemsets.
Example: {Milk, Diaper} → {Beer}
Rule Evaluation Metrics –
Support (s) –
The number of transactions that include all the items in both X and Y as a fraction of
the total number of transactions. It is a measure of how frequently the collection of
items occurs together relative to all transactions.
Support(X → Y) = σ(X ∪ Y) / |T|
It is interpreted as the fraction of transactions that contain both X and Y.
Confidence (c) –
It is the ratio of the number of transactions that include all items in X as well as Y to
the number of transactions that include all items in X.
Conf(X → Y) = Supp(X ∪ Y) / Supp(X)
It measures how often the items in Y appear in transactions that also contain the items
in X.
 Lift (l) –
The lift of the rule X → Y is the confidence of the rule divided by the expected
confidence, assuming that the itemsets X and Y are independent of each other. Under
that independence assumption, the expected confidence is simply the support
(frequency) of {Y}.
 Lift(X → Y) = Conf(X → Y) / Supp(Y)
A lift value near 1 indicates that X and Y appear together about as often as expected;
greater than 1 means they appear together more often than expected, and less than 1
means they appear together less often than expected. Larger lift values indicate a
stronger association.

 Example – From the above table, for the rule {Milk, Diaper} → {Beer}:

s = σ({Milk, Diaper, Beer}) / |T|
  = 2/5
  = 0.4

c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper})
  = 2/3
  = 0.67

l = Supp({Milk, Diaper, Beer}) / (Supp({Milk, Diaper}) × Supp({Beer}))
  = 0.4 / (0.6 × 0.6)
  = 1.11
The Association rule is very useful in analyzing datasets. The data is collected using
bar-code scanners in supermarkets. Such databases consist of a large number of
transaction records which list all items bought by a customer on a single purchase. So
the manager could know if certain groups of items are consistently purchased together
and use this data for adjusting store layouts, cross-selling, promotions based on
statistics.
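
The following short Python sketch (not from the original notes) recomputes the support, confidence and lift values above from the five market-basket transactions; it is a brute-force illustration rather than a real association-mining algorithm such as Apriori.

```python
# A minimal sketch (not from the source) that recomputes the support,
# confidence and lift figures above from the five market-basket transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = support(X | Y)                 # 0.4
c = support(X | Y) / support(X)    # 0.67
l = c / support(Y)                 # 1.11
print(round(s, 2), round(c, 2), round(l, 2))
```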

 Data Mining – Cluster Analysis

Cluster analysis is the process of finding similar groups of objects in order to form clusters. It
is an unsupervised machine learning-based technique that acts on unlabelled data. Data
points that are similar are grouped together to form a cluster, and all the objects in a cluster
belong to the same group.
Cluster:
The given data is divided into different groups by combining similar objects into a group.
This group is nothing but a cluster. A cluster is nothing but a collection of similar data
which is grouped together.
For example, consider a dataset of vehicles that contains information about different
vehicles such as cars, buses, bicycles, etc. As this is unsupervised learning, there are no
class labels like Cars or Bikes for the vehicles; all the data is combined and is not in a
structured form.
Now our task is to convert the unlabelled data to labelled data and it can be done using
clusters.
The main idea of cluster analysis is to arrange all the data points into clusters, such as a
cars cluster containing all the cars, a bikes cluster containing all the bikes, and so on.
Simply put, it is a partitioning of similar objects applied to unlabelled data.
Properties of Clustering:

1. Clustering Scalability: Nowadays there are vast amounts of data, and clustering

algorithms have to deal with huge databases. To handle extensive databases, the
clustering algorithm should be scalable; if it is not, the results may be inappropriate
or misleading.
2. High Dimensionality: The algorithm should be able to handle high
dimensional space along with the data of small size.
3. Algorithm Usability with multiple data kinds: Different kinds of data can
be used with algorithms of clustering. It should be capable of dealing with
different types of data like discrete, categorical and interval-based data,
binary data etc.
4. Dealing with unstructured data: Some databases contain missing values and
noisy or erroneous data. If the algorithm is sensitive to such data, it may lead to
poor-quality clusters, so the algorithm should be able to handle noisy and
unstructured data.
 Clustering Methods:
The clustering methods can be classified into the following categories:
o Partitioning Method
o Hierarchical Method
o Density-based Method
o Grid-Based Method
o Model-Based Method
o Constraint-based Method
1. Partitioning Method: It is used to make partitions of the data in order to form
clusters. If "n" partitions are made of the "p" objects in the database, then each partition
is represented by a cluster and n ≤ p. The two conditions which need to be satisfied
by this partitioning clustering method are:
o Each object must belong to exactly one group.
o No group should be empty; each group must contain at least one object.

In the partitioning method there is a technique called iterative relocation, which means an
object may be moved from one group to another to improve the partitioning (a small k-means
sketch illustrating this method appears after this list of methods).

2. Hierarchical Method: In this method, a hierarchical decomposition of the given


set of data objects is created. Hierarchical methods can be classified on the basis of how
the hierarchical decomposition is formed. There are two types of approaches for the
creation of the hierarchical decomposition:
 Agglomerative Approach: The agglomerative approach is also known as the
bottom-up approach. Initially, each object is placed in its own separate group. The
method then keeps merging the objects or groups that are close to one another, i.e.
that exhibit similar properties. This merging process continues until the termination
condition holds.
 Divisive Approach: The divisive approach is also known as the top-down
approach. In this approach, we start with all the data objects in the same cluster.
That cluster is then divided into smaller clusters by continuous iteration. The
iteration continues until the termination condition is met or until each cluster
contains a single object.
Once a group is split or merged, the step can never be undone; hierarchical clustering is a
rigid and not very flexible method. The two approaches which can be used to improve the
quality of hierarchical clustering in data mining are:
 One should carefully analyze the linkages of the object at every partitioning of
hierarchical clustering.
 One can integrate hierarchical agglomeration with other clustering techniques.
In this approach, the objects are first grouped into micro-clusters; after grouping
the data objects into micro-clusters, macro-clustering is performed on the
micro-clusters.
3. Density-Based Method: The density-based method mainly focuses on density. In
this method, a cluster keeps growing as long as the density in its neighbourhood
exceeds some threshold, i.e., for each data point within a given cluster, the
neighbourhood of a given radius has to contain at least a minimum number of
points.
4. Grid-Based Method: In the grid-based method, the object space is quantized into
a finite number of cells that form a grid structure. One of the major advantages of
the grid-based method is its fast processing time, which depends only on the
number of cells in each dimension of the quantized space rather than on the
number of data objects.
5. Model-Based Method: In the model-based method, all the clusters are
hypothesized in order to find the data which is best suited for the model. The
clustering of the density function is used to locate the clusters for a given model. It
reflects the spatial distribution of data points and also provides a way to
automatically determine the number of clusters based on standard statistics, taking
outlier or noise into account. Therefore it yields robust clustering methods.

6. Constraint-Based Method: The constraint-based clustering method is performed


by the incorporation of application or user-oriented constraints. A constraint refers
to the user expectation or the properties of the desired clustering results.
Constraints provide us with an interactive way of communication with the
clustering process. Constraints can be specified by the user or the application
requirement.
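
As mentioned under the partitioning method, here is a minimal Python sketch (not from the original notes) that uses k-means from scikit-learn to split invented, unlabelled 2-D points into two clusters by iterative relocation.

```python
# A minimal sketch (not from the source) of the partitioning method using
# k-means: unlabelled 2-D points are split into two clusters.
from sklearn.cluster import KMeans

# Invented, unlabelled data: two loose groups of points
points = [[1, 2], [1, 4], [2, 3], [8, 8], [9, 10], [8, 9]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)

print(labels)                   # e.g. [0 0 0 1 1 1] - each point assigned to one cluster
print(kmeans.cluster_centers_)  # the two cluster centres found by iterative relocation
```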

Applications of Cluster Analysis:


 It is widely used in image processing, data analysis, and pattern recognition.
 It helps marketers to find the distinct groups in their customer base and they can
characterize their customer groups by using purchasing patterns.
 It can be used in the field of biology, by deriving animal and plant taxonomies,
identifying genes with the same capabilities.
 It also helps in information discovery by classifying documents on the web.
 Machine Learning

 Machine learning (ML) is the study of computer algorithms that can improve
automatically through experience and by the use of data. It is seen as a part
of artificial intelligence. Machine learning algorithms build a model based on sample
data, known as training data, in order to make predictions or decisions without being
explicitly programmed to do so. Machine learning algorithms are used in a wide
variety of applications, such as in medicine, email filtering, speech recognition,
and computer vision, where it is difficult or unfeasible to develop conventional
algorithms to perform the needed tasks.
 A subset of machine learning is closely related to computational statistics, which
focuses on making predictions using computers; but not all machine learning is
statistical learning. The study of mathematical optimization delivers methods, theory
and application domains to the field of machine learning. Data mining is a related
field of study, focusing on exploratory data analysis through unsupervised learning.
Some implementations of machine learning use data and neural networks in a way
that mimics the working of a biological brain. In its application across business
problems, machine learning is also referred to as predictive analytics.

 How machine learning works

1. A Decision Process: In general, machine learning algorithms are used to make a


prediction or classification. Based on some input data, which can be labelled or
unlabeled, your algorithm will produce an estimate about a pattern in the data.

2. An Error Function: An error function serves to evaluate the prediction of the


model. If there are known examples, an error function can make a comparison to
assess the accuracy of the model.

3. A Model Optimization Process: If the model can fit better to the data points in the
training set, then weights are adjusted to reduce the discrepancy between the
known example and the model estimate. The algorithm will repeat this evaluate and
optimize process, updating weights autonomously until a threshold of accuracy has
been met.
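
A tiny Python sketch (not from the original notes) of these three ingredients: a decision process (a linear prediction), an error function (mean squared error), and an optimization process (gradient descent on a single weight). The data and learning rate are invented.

```python
# A minimal sketch (not from the source): predict, measure error, adjust weight.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]      # invented data, roughly y = 2x

w = 0.0                         # initial weight
learning_rate = 0.01

for step in range(200):
    # 1. Decision process: predict with the current weight
    preds = [w * x for x in xs]
    # 2. Error function: mean squared error against the known examples
    error = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # 3. Optimization: adjust the weight to reduce the error
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    w -= learning_rate * grad

print(round(w, 2))   # approximately 2.0
```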

 Machine learning methods

Machine learning classifiers fall into three primary categories.

1. Supervised machine learning

Supervised learning, also known as supervised machine learning, is defined by its use of
labelled datasets to train algorithms to classify data or predict outcomes accurately. As
input data is fed into the model, the model adjusts its weights until it has been fitted
appropriately. This occurs as part of the cross-validation process to ensure that the model
avoids overfitting or underfitting. Supervised learning helps organizations solve a variety
of real-world problems at scale, such as classifying spam into a separate folder from your
inbox. Some methods used in supervised learning include neural networks, naïve Bayes,
linear regression, logistic regression, random forests, support vector machines (SVM), and
more.
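
The spam example can be sketched as follows. This minimal Python illustration (not from the original notes) trains a naïve Bayes classifier from scikit-learn on four invented messages, so the vocabulary and labels are purely illustrative.

```python
# A minimal sketch (not from the source) of supervised learning on an invented,
# tiny "spam vs. ham" dataset using a naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "cheap pills free offer",
            "meeting moved to friday", "lunch with the team tomorrow"]
labels   = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)     # turn labelled text into word-count features

clf = MultinomialNB()
clf.fit(X, labels)                          # fit the model to the labelled data

print(clf.predict(vectorizer.transform(["free prize offer"])))   # likely ['spam']
```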

2. Unsupervised machine learning

Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden
patterns or data groupings without the need for human intervention. The ability to discover
similarities and differences in information makes it the ideal solution for exploratory data
analysis, cross-selling strategies, customer segmentation, and image and pattern recognition. It’s
also used to reduce the number of features in a model through the process of dimensionality
reduction; principal component analysis (PCA) and singular value decomposition (SVD) are
two common approaches for this. Other algorithms used in unsupervised learning include
neural networks, k-means clustering, probabilistic clustering methods, and more.
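
A minimal Python sketch (not from the original notes) of dimensionality reduction with PCA from scikit-learn; the five four-feature rows are invented and the choice of two components is arbitrary.

```python
# A minimal sketch (not from the source): compressing four correlated features
# (invented data) down to two principal components.
from sklearn.decomposition import PCA

data = [[2.5, 2.4, 1.0, 0.5],
        [0.5, 0.7, 0.2, 0.1],
        [2.2, 2.9, 1.1, 0.6],
        [1.9, 2.2, 0.9, 0.4],
        [3.1, 3.0, 1.3, 0.7]]

pca = PCA(n_components=2)
reduced = pca.fit_transform(data)

print(reduced.shape)                     # (5, 2): same rows, fewer features
print(pca.explained_variance_ratio_)     # how much variance each component keeps
```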

3. Semi-supervised learning

Semi-supervised learning offers a happy medium between supervised and unsupervised


learning. During training, it uses a smaller labelled data set to guide classification and feature
extraction from a larger, unlabeled data set. Semi-supervised learning can solve the problem
of having not enough labelled data (or not being able to afford to label enough data) to train a
supervised learning algorithm.
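
A minimal Python sketch (not from the original notes) of the idea, using scikit-learn's SelfTrainingClassifier: a few labelled points guide the labelling of points whose class is unknown (marked -1). All the data is invented.

```python
# A minimal sketch (not from the source) of semi-supervised learning: a small
# labelled set plus unlabeled points (label -1), all invented data.
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.linear_model import LogisticRegression

X = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9], [2, 2], [9, 9]]
y = [0,      0,      0,      1,      1,      1,      -1,     -1]   # -1 = unlabeled

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y)            # the small labelled set guides labelling of the rest

print(model.predict([[1.5, 1.5], [8.5, 8.5]]))   # likely [0 1]
```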

 Reinforcement machine learning

Reinforcement machine learning is a behavioural machine learning model that is similar to


supervised learning, but the algorithm isn’t trained using sample data. This model learns as it
goes by using trial and error. A sequence of successful outcomes will be reinforced to
develop the best recommendation or policy for a given problem.
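
A minimal Python sketch (not from the original notes) of trial-and-error learning: an epsilon-greedy agent repeatedly tries one of three actions, receives an invented reward, and reinforces whichever action has paid off best so far. This is a bandit-style toy, not a full reinforcement learning algorithm.

```python
# A minimal sketch (not from the source): epsilon-greedy trial and error on
# three actions with invented, hidden pay-off probabilities.
import random

random.seed(0)
true_reward = {"A": 0.2, "B": 0.5, "C": 0.8}      # hidden pay-off probabilities
value = {a: 0.0 for a in true_reward}             # the agent's running estimates
counts = {a: 0 for a in true_reward}

for step in range(1000):
    # explore occasionally, otherwise exploit the best-known action
    if random.random() < 0.1:
        action = random.choice(list(true_reward))
    else:
        action = max(value, key=value.get)
    reward = 1 if random.random() < true_reward[action] else 0
    counts[action] += 1
    # incremental average: successful outcomes reinforce the action's value
    value[action] += (reward - value[action]) / counts[action]

print(max(value, key=value.get))   # almost always "C", the best action
```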

Machine Learning is a sub-set of artificial intelligence where computer algorithms are used to
autonomously learn from data and information. In machine learning computers don’t have to
be explicitly programmed but can change and improve their algorithms by themselves.

Machine learning algorithms enable computers to communicate with humans, autonomously


drive cars, write and publish sport match reports, and find terrorist suspects.

 The origins of machine learning


 The term machine learning was coined in 1959 by Arthur Samuel, an
American IBMer and pioneer in the field of computer gaming and artificial
intelligence. Also the synonym self-teaching computers was used in this time
period. A representative book on machine learning research during the 1960s was
Nilsson's book on Learning Machines, dealing mostly with machine learning for
pattern classification. Interest related to pattern recognition continued into the 1970s,
as described by Duda and Hart in 1973. In 1981 a report was given on using teaching
strategies so that a neural network learns to recognize 40 characters (26 letters, 10
digits, and 4 special symbols) from a computer terminal.
 Tom M. Mitchell provided a widely quoted, more formal definition of the algorithms
studied in the machine learning field: "A computer program is said to learn from
experience E with respect to some class of tasks T and performance measure P if its
performance at tasks in T, as measured by P, improves with experience E." This
definition of the tasks with which machine learning is concerned offers a
fundamentally operational definition rather than defining the field in cognitive terms.
This follows Alan Turing's proposal in his paper "Computing Machinery and
Intelligence", in which the question "Can machines think?" is replaced with the
question "Can machines do what we (as thinking entities) can do?".
 Modern day machine learning has two objectives, one is to classify data based on
models which have been developed, and the other purpose is to make predictions for
future outcomes based on these models. A hypothetical algorithm specific to
classifying data may use computer vision of moles coupled with supervised learning
in order to train it to classify the cancerous moles. A machine learning algorithm for
stock trading may inform the trader of future potential predictions.

EVOLUTION OF MACHINE LEARNING:

Machine Learning Frameworks


A Machine Learning Framework is an interface, library or tool which allows developers to
build machine learning models easily, without getting into the depth of the underlying
algorithms. Let’s discuss the Top 10 Machine Learning Frameworks in detail:

TensorFlow

Google’s Tensorflow is one of the most popular frameworks today. It is an open-source


software library for numerical computation using data flow graphs. TensorFlow implements
data flow graphs, where batches of data or tensors can be processed by a series of algorithms
described by a graph.
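
A minimal TensorFlow 2 sketch (not from the original notes) in which a couple of tensors flow through two operations; the values are arbitrary and it only hints at how a computation graph is expressed.

```python
# A minimal sketch (not from the source): a tiny TensorFlow computation where
# tensors flow through a matrix multiply and a sum.
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 0.0], [0.0, 1.0]])

product = tf.matmul(a, b)          # one node of the computation
total = tf.reduce_sum(product)     # another node consuming its output

print(product.numpy())
print(total.numpy())               # 10.0
```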

Theano
Theano is neatly wrapped by Keras, a high-level neural networks library that runs
almost in parallel with the Theano library. Keras' main advantage is that it is a
minimalist Python library for deep learning that can run on top of Theano or
TensorFlow.

It was developed to make implementing deep learning models as fast and simple as possible
for research and development. Released under the permissive MIT license, it runs on Python
2.7 or 3.5 and can seamlessly execute on GPUs and CPUs given the underlying frameworks.

Sci-Kit Learn

Scikit-learn is one of the most well-known ML libraries. It supports both supervised and
unsupervised learning algorithms. Examples include linear and logistic regression,
decision trees, clustering, k-means, etc.

This framework provides a large number of algorithms for common machine learning and
data mining tasks, including clustering, regression, and classification.

Caffe

Caffe is another popular deep learning framework, made with expression, speed, and
modularity in mind. It was created by the Berkeley Vision and Learning Center
(BVLC) and by community contributors.

Google's DeepDream is based on the Caffe framework. This framework is a BSD-licensed C++
library with a Python interface.

H2O

H2O is an open-source machine learning platform. It is an artificial intelligence tool that is
business-oriented, helps in making decisions based on data, and enables the user to draw
insights. It is mostly used for predictive modeling, risk and fraud analysis, insurance
analytics, advertising technology, healthcare, and customer intelligence.

Amazon Machine Learning

Amazon Machine Learning provides visualization tools that help you go through the process
of creating machine learning (ML) models without having to learn complex ML
algorithms and technology.

It is a service that makes it easy for developers of all skill levels to use machine learning
technology. It connects to data stored in Amazon S3, Redshift, or RDS, and can run binary
classification, multiclass categorization, or regression on the data to build a model.

Torch

This framework provides wide support for machine learning algorithms, with GPUs put first.
It is easy to use and efficient thanks to an easy and fast scripting language, LuaJIT, and an
underlying C/CUDA implementation.

The goal of Torch is to have maximum flexibility and speed in building your scientific
algorithms along with an extremely simple process.

Google Cloud ML Engine


Cloud Machine Learning Engine is a managed service that helps developers and data
scientists in building and running superior machine learning models in production.

It offers training and prediction services that can be used together or individually. It is used
by enterprises to solve problems such as ensuring food safety, detecting clouds in satellite
images, and responding four times faster to customer emails.

Azure ML Studio

This Framework allows Microsoft Azure users to create and train models, then turn them into
APIs that can be consumed by other services. Also, you can connect your own Azure storage
to the service for larger models.

To use the Azure ML Studio, you don’t even need an account to try out the service. You can
log in anonymously and use Azure ML Studio for up to eight hours.

Spark ML Lib

This is Apache Spark’s machine learning library. The goal of this framework is to make
practical machine learning scalable and easy.

 A Brief History of Artificial Intelligence

The beginnings of modern AI can be traced to classical philosophers' attempts to describe


human thinking as a symbolic system. But the field of AI wasn't formally founded until 1956,
at a conference at Dartmouth College, in Hanover, New Hampshire, where the term "artificial
intelligence" was coined.


Although the idea of artificial intelligence has been present for millennia, it was not until the
1950s that its real potential was investigated. A generation of scientists, physicists, and
intellectuals had the idea of AI, but it was not until Alan Turing, a British polymath, suggested
that, since people solve problems and make decisions using available information as well as
reason, machines might do the same.

The limitations of computers were the major stumbling block to growth: they needed to
change fundamentally before AI could expand any further. Machines could execute
commands but could not store them. Until 1974, funding was also a problem.

By 1974, computers had become extremely popular. They were now quicker, less expensive,
and capable of storing more data.

 AI Research Today
AI research is ongoing and expanding in today's world. AI research has grown at a pace of
12.9 percent annually over the last five years, according to Alice Bonasio, a technology
journalist. China is expected to overtake the United States as the world's leading source of AI
technology within the next four years, having taken second position from the United States in
2004, and it is rapidly closing in on Europe's top rank.

In the area of artificial intelligence development, Europe is the largest and most diverse
continent, with significant levels of international collaboration. India is the 3rd largest
country in AI research output, behind China and the USA.

 AI in the Present
Artificial intelligence is being utilized for so many things and has so much promise that it’s
difficult to imagine our future without it, especially where business is concerned.

Artificial intelligence technologies are boosting productivity like never before, from
workflow management solutions to trend forecasting and even the way companies buy
advertisements.

Artificial Intelligence can gather and organize vast volumes of data in order to draw
inferences and estimates that are outside of the human ability to comprehend manually. It
also improves organizational efficiency while lowering the risk of a mistake, and it identifies
unusual patterns, such as spam and frauds, instantaneously to alert organizations about
suspicious behaviour, among other things. AI has grown in importance and sophistication to
the point that a Japanese investment firm became the first to propose an AI Board Member
for its ability to forecast market trends faster than humans.

Artificial intelligence will indeed be and is already being used in many aspects of life, such as
self-driving cars in the coming years, more precise weather forecasting, and earlier health
diagnoses, to mention a few.

 AI in The Future
It has been suggested that we are on the verge of the 4th Industrial Revolution, which will be
unlike any of the previous three. From steam and water power, through electricity and
assembly lines, to computerization, and now the very question of what it is to be human is
being challenged.

Smarter technology in our factories and workplaces, as well as linked equipment that will
communicate, view the entire production process, and make autonomous choices, are just a
few of the methods the Industrial Revolution will lead to business improvements. One of the
most significant benefits of the 4th Industrial Revolution is the ability to improve the world’s
populace’s quality of life and increase income levels. As robots, humans, and smart devices
work on improving supply chains and warehousing, our businesses and organizations are
becoming “smarter” and more productive.
 AI in Different Industries
Artificial intelligence (AI) may help you enhance the value of your company in a variety of
ways. It may help you optimize your operations, increase total revenue, and focus your staff
on more essential duties if applied correctly. As a result, AI is being utilized in a variety of
industries throughout the world, including health care, finance, manufacturing, and others.

Health Care
AI is proving to be a boon for the healthcare business. It’s enhancing nearly every area of the
industry, from data security to robot-assisted operations. AI is finally giving this sector,
which has been harmed by inefficient procedures and growing prices, a much-needed facelift.

Automotive
Self-driving vehicles are certainly something you’ve heard of, and they’re a hint that the
future is almost here. It’s no longer science fiction; the autonomous car is already a reality.
As per recent projections, by 2040, roughly 33 million automobiles with self-driving
capability are projected to be on the road.

Finance
According to experts, the banking industry and AI are a perfect combination. Real-time data
transmission, accuracy, and large-scale data processing are the most important elements
driving the financial sector. Because AI is ideal for these tasks, the banking industry is
recognizing its effectiveness and precision and incorporating machine learning, statistical
arbitrage, adaptive cognition, chatbots, and automation into its business operations.

Transportation and Travel


From recommending the best route for drivers to arranging travel reservations remotely, AI
has now become a gigantic trend in this business. End consumers are finding it easier to
navigate thanks to artificial intelligence. Furthermore, travel businesses that integrate AI into
their systems profit from Smartphone usage.

E-Commerce
Have you ever come upon a picture of clothing that you were hunting for on one website but
couldn’t find on another? Well, that is done by AI. It’s due to the machine learning
techniques that businesses employ to develop strong client connections. These technologies
not only personalize customers’ experiences but also assist businesses in increasing sales.

Conclusion
In the early twenty-first century, no place has had a larger influence on AI than the
workplace. Machine-learning techniques are resulting in productivity gains that have never
been observed before. AI is transforming the way we do business, from workflow
management solutions to trend forecasts and even the way businesses buy advertising. AI
research has so much promise that it’s becoming difficult to envisage a world without it. Be
its self-driving vehicles, more precise weather predictions, or space travel, AI will be
prevalent in everyday life by 2030.

 Statistics vs. Data Mining

1. Data mining is the process of extracting useful information, patterns, and trends from huge
data sets and using them to make data-driven decisions, whereas statistics refers to the
analysis and presentation of numeric data and is a major part of every data mining algorithm.

2. The data used in data mining may be numeric or non-numeric; the data used in statistics is
numeric only.

3. In data mining, data collection is less important; in statistics, data collection is more
important.

4. The types of data mining include clustering, classification, association, neural networks,
sequence-based analysis, visualization, etc.; the types of statistics are descriptive statistics
and inferential statistics.

5. Data mining is suitable for huge data sets; statistics is suitable for smaller data sets.

6. Data mining is an inductive process, meaning new theories are generated from data;
statistics is a deductive process and does not indulge in making predictions.

7. Data cleaning is a part of data mining; in statistics, clean data is used to apply the
statistical method.

8. Data mining requires less user interaction to validate the model, so it is easy to automate;
statistics requires user interaction to validate the model, so it is complex to automate.

9. Data mining applications include financial data analysis, the retail industry, the
telecommunication industry, biological data analysis, certain scientific applications, etc.;
the applications of statistics include biostatistics, quality control, demography, operational
research, etc.

 Data Science vs. Data Analytics
