Data Science Solutions IA 2
Based on the methods and the way of learning, machine learning is mainly divided into four
types: supervised learning, unsupervised learning, semi-supervised learning, and
reinforcement learning.
1. Supervised Learning
Let's understand supervised learning with an example. Suppose we have an input dataset of
cat and dog images. First, we train the machine to understand the images: the shape and size
of the tail of a cat and a dog, the shape of the eyes, colour, height (dogs are taller, cats are
smaller), and so on. After training is complete, we input a picture of a cat and ask the
machine to identify the object and predict the output. Since the machine is now well trained,
it will check all the features of the object, such as height, shape, colour, eyes, ears, and tail,
and find that it is a cat, so it will put it in the Cat category. This is how a machine identifies
objects in supervised learning.
Supervised machine learning can be classified into two types of problems, which are given
below:
Classification:- These are used to predict categorical output variables, such as "spam" or
"not spam", "cat" or "dog", etc.
Regression:- These are used to predict continuous output variables, such as market trends,
weather prediction, etc.
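A rough sketch of this train-then-predict workflow, assuming scikit-learn is installed; the
[height, tail length] feature encoding and all numbers are invented for illustration:

```python
# Supervised learning: fit on labelled examples, then predict labels for new data.
from sklearn.svm import SVC

# Hypothetical features per image: [height_cm, tail_length_cm]; labels: 0 = cat, 1 = dog.
X_train = [[25, 30], [23, 28], [60, 35], [65, 40]]
y_train = [0, 0, 1, 1]                    # known answers act as the "supervision"

model = SVC()
model.fit(X_train, y_train)               # the training phase described above

print(model.predict([[24, 29]]))          # a new picture -> [0], i.e. the Cat category
```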
2. Unsupervised Learning
Unsupervised learning is different from the supervised learning technique; as its name
suggests, there is no need for supervision. In unsupervised machine learning, the machine is
trained using an unlabelled dataset, and it predicts the output without any supervision.
In unsupervised learning, the models are trained with data that is neither classified nor
labelled, and the model acts on that data without any supervision.
The main aim of an unsupervised learning algorithm is to group or categorize the unsorted
dataset according to similarities, patterns, and differences. Machines are instructed to find
the hidden patterns in the input dataset.
Let's take an example to understand it more precisely: suppose there is a basket of fruit
images, and we input it into the machine learning model. The images are totally unknown to
the model, and its task is to find the patterns and categories of the objects. The machine
discovers patterns and differences on its own, such as differences in colour and shape, and
predicts the output when it is tested with the test dataset.
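A rough clustering sketch in the same spirit, assuming scikit-learn; note that fit_predict
receives no labels, so the groups come purely from similarity (all numbers invented):

```python
# Unsupervised learning: no labels are given; the model groups similar items itself.
from sklearn.cluster import KMeans

# Hypothetical fruit features: [colour_score, diameter_cm] -- no category labels at all.
X = [[0.9, 7.0], [0.85, 7.5], [0.1, 3.0], [0.15, 2.8]]

model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)             # clusters discovered from similarity alone
print(labels)                             # e.g. [1 1 0 0]: two groups, names unknown
```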
3. Semi-Supervised Learning
To overcome the drawbacks of supervised and unsupervised learning algorithms, the concept
of semi-supervised learning was introduced. The main aim of semi-supervised learning is to
use all the available data effectively, rather than only the labelled data as in supervised
learning. Initially, similar data is clustered using an unsupervised learning algorithm, and
this then helps to label the unlabelled data. This matters because labelled data is
comparatively more expensive to acquire than unlabelled data.
We can picture these algorithms with an example. Supervised learning is where a student is
under the supervision of an instructor at home and college. If that student instead analyses
the same concept on their own without any help from the instructor, it comes under
unsupervised learning. Under semi-supervised learning, the student first studies the concept
under the guidance of an instructor at college and then revises it on their own.
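A minimal semi-supervised sketch, assuming scikit-learn, whose convention marks
unlabelled samples with -1; the data is invented:

```python
# Semi-supervised learning: a few labelled points help label the unlabelled ones.
from sklearn.semi_supervised import LabelPropagation

X = [[1.0], [1.2], [0.9], [5.0], [5.2], [4.9]]
y = [0, -1, -1, 1, -1, -1]                # -1 marks the unlabelled samples

model = LabelPropagation()
model.fit(X, y)                           # labels spread out from the two known points
print(model.transduction_)                # inferred labels for all samples: [0 0 0 1 1 1]
```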
4. Reinforcement Learning
Reinforcement learning works on a feedback-based process, in which an AI agent (a
software component) automatically explores its surroundings by trial and error: taking
actions, learning from experience, and improving its performance. The agent is rewarded for
each good action and punished for each bad action; hence the goal of a reinforcement
learning agent is to maximize the rewards.
In reinforcement learning, there is no labelled data as in supervised learning, and agents
learn from their experiences only.
Due to its way of working, reinforcement learning is employed in different fields such as
game theory, operations research, information theory, and multi-agent systems.
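A minimal tabular Q-learning sketch on an invented five-cell corridor, where the agent is
rewarded only for reaching the rightmost cell; all constants are illustrative assumptions:

```python
import random

# Toy environment: states 0..4 on a line; reaching state 4 yields reward +1.
N_STATES, ACTIONS = 5, [-1, +1]           # actions: step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.2     # learning rate, discount, exploration rate

for episode in range(200):
    s = 0
    while s != 4:
        # Trial and error: usually exploit the best known action, sometimes explore.
        a = random.choice(ACTIONS) if random.random() < epsilon \
            else max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == 4 else 0.0       # feedback: reward for the good outcome
        # Learn from experience: nudge Q towards reward plus discounted future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# The learned policy should step right (+1) from every non-terminal state.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(4)])
```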
Supervised Learning vs. Unsupervised Learning:
1. A supervised learning model takes direct feedback to check if it is predicting the correct
output or not, whereas an unsupervised learning model does not take any feedback.
2. A supervised learning model predicts the output, whereas an unsupervised learning model
finds the hidden patterns in data.
3. In supervised learning, input data is provided to the model along with the output; in
unsupervised learning, only input data is provided to the model.
4. The goal of supervised learning is to train the model so that it can predict the output when
it is given new data; the goal of unsupervised learning is to find the hidden patterns and
useful insights from the unknown dataset.
5. Supervised learning needs supervision to train the model; unsupervised learning does not
need any supervision to train the model.
6. Supervised learning can be used for cases where we know the input as well as the
corresponding output; unsupervised learning can be used for cases where we have only input
data and no corresponding output data.
7. A supervised learning model produces an accurate result; an unsupervised learning model
may give a less accurate result as compared to supervised learning.
8. Supervised learning is not close to true Artificial Intelligence because we first train the
model for each piece of data, and only then can it predict the correct output; unsupervised
learning is closer to true Artificial Intelligence, as it learns the way a child learns daily
routine things from experience.
9. Supervised learning includes algorithms such as Linear Regression, Logistic Regression,
Support Vector Machine, Multi-class Classification, Decision Tree, and Bayesian Logic;
unsupervised learning includes algorithms such as Clustering, K-means, C-means, and the
Apriori algorithm.
1. Trend forecasting
2. Fraud detection
3. Market research
4. Investment management
5. Risk analysis
6. Task automation
7. Customer service
8. Scalability
Retail data is growing enormously in volume, variety, and value every year. Retailers are
using data science to turn insights into profitable margins by developing data-driven plans.
These uses of data science in retail give retailers an opportunity to stay competitive in the
market, improve the customer experience, and increase their sales and hence revenue. And
as technology continues to advance, data science still has much more to offer the retail
world.
1. Recommendation Engines
2. Personalized Target Marketing
3. Price Optimization
4. Intelligent cross-selling and upselling
5. Inventory management
6. Customer sentiment analysis
7. Foretelling trends through social media
8. Managing real estate
9. Customer lifetime value prediction
10. Powering Augmented Reality
Using the data we receive from vehicular sensors and turning it into live information on the
condition of the assets saves companies significant amounts of money every year. Rather
than having to maintain everything on a time or distance basis, maintenance is done based
on the actual condition. This improves safety, which is a primary concern, helps avoid
delays in delivering the assets, and avoids inflicting unnecessary costs.
3. Delivering on time
4. Route optimization
5. Dynamic pricing
One of the most innovative aspects in the telecommunications industry today is data
science. With greater volumes of data than ever before, telecom providers are increasingly
leveraging data science tools and Artificial Intelligence to make sense of it all.
Since the main activities of companies working in the telecommunication sector
involve data transfer, exchange, and import, it is imperative that telecom providers invest in
data science tools that can manage and extract useful insights from the vast amount of data
generated every day.
4. Customer segmentation
Classification vs. Prediction:
1. Classification is the process of identifying which category a new observation belongs to,
based on a training data set containing observations whose category membership is known;
prediction is the process of identifying the missing or unavailable numerical data for a new
observation.
2. In classification, the accuracy depends on finding the class label correctly; in prediction,
the accuracy depends on how well a given predictor can guess the value of a predicted
attribute for new data.
3. In classification, the model can be known as the classifier; in prediction, the model can be
known as the predictor.
4. In classification, a model or classifier is constructed to find the categorical labels; in
prediction, a model or predictor is constructed that predicts a continuous-valued function or
an ordered value.
5. For example, the grouping of patients based on their medical records can be considered a
classification, whereas predicting the correct treatment for a particular disease for a person
can be thought of as a prediction.
11. Briefly explain any five machine learning algorithms with examples.
Decision Tree Algorithm
A decision tree is a supervised learning algorithm that is mainly used to solve classification
problems but can also be used for solving regression problems. It can work with both
categorical and continuous variables. It has a tree-like structure of nodes and branches,
starting with the root node, which expands into further branches until the leaf nodes. The
internal nodes represent the features of the dataset, the branches represent the decision rules,
and the leaf nodes represent the outcome of the problem.
Some real-world applications of decision tree algorithms are distinguishing between
cancerous and non-cancerous cells, and suggesting which car a customer should buy.
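A minimal decision tree sketch, assuming scikit-learn; the tiny cell dataset is invented:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [tumour_size_mm, cell_density]; labels: 0 = benign, 1 = cancerous.
X = [[2, 0.1], [3, 0.2], [9, 0.8], [8, 0.9]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)                   # learns the if/else decision rules from the data
print(tree.predict([[7, 0.7]]))  # follows root -> branch -> leaf and prints [1]
```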
K-Nearest Neighbour Algorithm
K-Nearest Neighbour (KNN) is a supervised learning algorithm that can be used for both
classification and regression problems. It works by measuring the similarity between a new
data point and the available data points; based on these similarities, the new data point is
put in the most similar category. It is also known as the lazy learner algorithm, as it stores
the entire dataset and classifies each new case with the help of its K nearest neighbours. The
new case is assigned to the class with the most similar neighbours, and a distance function
measures the distance between the data points. The distance function can be the Euclidean,
Minkowski, Manhattan, or Hamming distance.
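A minimal KNN sketch, assuming scikit-learn; the one-feature data is invented:

```python
from sklearn.neighbors import KNeighborsClassifier

# Invented data: two groups of points, labelled 0 and 1.
X = [[1], [2], [3], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)  # K = 3; Euclidean distance by default
knn.fit(X, y)                              # "lazy": essentially just stores the data
print(knn.predict([[2.5]]))                # majority vote of the 3 nearest -> [0]
```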
Random Forest Algorithm
Random forest is a supervised learning algorithm that can be used for both classification and
regression problems in machine learning. It is an ensemble learning technique that produces
predictions by combining multiple classifiers, thereby improving the performance of the
model.
It contains multiple decision trees trained on subsets of the given dataset and averages their
results to improve the predictive accuracy of the model. A random forest should typically
contain 64 to 128 trees; a greater number of trees leads to higher accuracy.
To classify a new dataset or object, each tree gives a classification result, and the algorithm
predicts the final output based on the majority vote.
Random forest is a fast algorithm and can deal efficiently with missing and incorrect data.
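A minimal random forest sketch, assuming scikit-learn; the data is invented:

```python
from sklearn.ensemble import RandomForestClassifier

# Invented data: [feature_1, feature_2] with binary labels.
X = [[0, 0], [1, 1], [0, 1], [8, 8], [9, 9], [8, 9]]
y = [0, 0, 0, 1, 1, 1]

# 100 trees, each fit on a random subset; the final answer is their majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict([[7, 8]]))            # the trees vote -> [1]
```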
Apriori Algorithm
The Apriori algorithm is an unsupervised learning algorithm that is used to solve association
problems. It uses frequent itemsets to generate association rules, and it is designed to work
on databases that contain transactions. With the help of these association rules, it determines
how strongly or how weakly two objects are connected to each other. The algorithm uses a
breadth-first search and a hash tree to count itemsets efficiently.
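A plain-Python sketch of the frequent-itemset idea behind Apriori, over an invented
transaction database; a full implementation would also prune candidates level by level and
derive association rules from the frequent itemsets:

```python
from itertools import combinations

# Invented transaction database: each row is one shopping basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
min_support = 0.5   # "frequent" = appears in at least 50% of the baskets

items = sorted(set().union(*transactions))
for size in (1, 2):                         # level-wise search, like Apriori's BFS
    for itemset in combinations(items, size):
        support = sum(set(itemset) <= t for t in transactions) / len(transactions)
        if support >= min_support:
            print(itemset, support)         # e.g. ('bread', 'milk') 0.5
```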
Linear Regression
Linear regression is one of the most popular and simplest machine learning algorithms; it is
used for predictive analysis. Here, predictive analysis means predicting something, and
linear regression makes predictions for continuous numbers such as salary, age, etc.
It models the linear relationship between the dependent and independent variables, fitting a
straight line y = a0 + a1*x that shows how the dependent variable (y) changes according to
the independent variable (x).
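A minimal linear regression sketch, assuming scikit-learn; the salary numbers are invented:

```python
from sklearn.linear_model import LinearRegression

# Invented data: years of experience -> salary (a continuous output).
X = [[1], [2], [3], [4]]
y = [30000, 35000, 40000, 45000]

reg = LinearRegression()
reg.fit(X, y)                        # fits the straight line y = a0 + a1 * x
print(reg.intercept_, reg.coef_[0])  # a0 = 25000.0, a1 = 5000.0
print(reg.predict([[5]]))            # [50000.]
```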
3) Many data scientists get their data in raw formats from various sources of data and
information.
4) But for many data scientists, as well as business decision-makers, particularly in big
enterprises, the main sources of data and information are corporate data warehouses.
6) After the data is loaded, it is often cleansed, transformed, and checked for quality before
it is used for analytics reporting, data science, machine learning, or other uses.
Model Data: -
1. “Where the magic happens”
2. Reduce the dimensionality of your data set (see the sketch after this list).
3. Not all of your features or values are essential to your model's predictions.
4. What you need to do is select the relevant ones that contribute to the prediction of the
results.
5. There are a few tasks we can perform in modelling.
5.1. Classification, e.g. differentiating the emails you receive into "Inbox" and "Spam" using
logistic regression.
6. We can also use modelling to group data to understand the logic behind those clusters.
6.1. For example, we group our e-commerce customers to understand their behaviour on our
website.
7. In short, we use regression and prediction to forecast future values, classification to
identify, and clustering to group values.
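A minimal dimensionality-reduction sketch for step 2, assuming scikit-learn; PCA is just
one common choice for this task, and the three-feature data is invented:

```python
from sklearn.decomposition import PCA

# Invented data: 4 samples with 3 features; the third feature carries little information.
X = [[1.0, 2.0, 0.1],
     [2.0, 4.1, 0.1],
     [3.0, 6.0, 0.2],
     [4.0, 8.1, 0.2]]

pca = PCA(n_components=2)               # keep only the 2 most informative directions
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                  # (4, 2): same samples, fewer features
print(pca.explained_variance_ratio_)    # share of the variance each component keeps
```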
Interpreting Data: -
1. Interpreting data refers to the presentation of your results to a non-technical audience.
2. We deliver the results to answer the business questions we asked when we first started the
project, together with the actionable insights that we found through the data science
process.
3. Actionable insights are a key outcome: they show how data science can bring about
predictive analytics and, later on, prescriptive analytics, in which we learn how to repeat a
positive result or prevent a negative outcome.
4. It is essential to present your findings in a way that is useful to the organization;
otherwise it would be pointless to your stakeholders.
5. In this process, technical skills alone are not sufficient.
6. One essential skill you need is the ability to tell a clear and actionable story.
7. If your presentation does not trigger action in your audience, it means that your
communication was not effective.
8. Remember that you will be presenting to an audience with no technical background, so
the way you communicate the message is key.
Single Contiguous Allocation: This technique involves dividing the available memory into a
single, continuous block. The operating system allocates memory to processes in a
contiguous manner. It is a simple approach but can lead to fragmentation issues, both external
fragmentation (unused memory blocks scattered throughout the system) and internal
fragmentation (wasted memory within allocated blocks).
Partitioned Allocation: Partitioned allocation divides the memory into fixed-size partitions
or variable-sized partitions. Each partition can accommodate a single process. This technique
can be further categorized into:
Fixed Partitioning: The memory is divided into fixed-size partitions, and each
partition is allocated to a process. The remaining unused memory in each partition can cause
internal fragmentation.
Variable Partitioning: The memory is divided into variable-sized partitions based on
process requirements. It reduces internal fragmentation but can lead to external
fragmentation.
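A tiny worked sketch of internal fragmentation under fixed partitioning, with invented
partition and process sizes:

```python
PARTITION_SIZE = 100              # assumed fixed partition size, in KB
process_sizes = [37, 90, 12, 64]  # invented memory requests, in KB

# Each process occupies a whole partition; the unused remainder is internal fragmentation.
waste = [PARTITION_SIZE - size for size in process_sizes]
print(waste)                                            # [63, 10, 88, 36]
print(sum(waste), "KB lost to internal fragmentation")  # 197 KB
```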
Paging: Paging divides the logical address space of a process into fixed-size blocks called
pages, and divides the physical memory into blocks of the same size called page frames.
The mapping between logical and physical addresses is maintained through a page table.
Paging helps to manage memory efficiently, reduces external fragmentation, and enables
non-contiguous allocation. However, it may incur some overhead due to page table
management.
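A minimal sketch of page-table address translation, with an invented page size and page
table (real systems perform this lookup in the MMU):

```python
PAGE_SIZE = 4096                  # assumed page size: 4 KiB
page_table = {0: 5, 1: 9, 2: 1}   # invented mapping: page number -> frame number

def translate(logical_address: int) -> int:
    """Split a logical address into (page, offset) and look up the frame."""
    page, offset = divmod(logical_address, PAGE_SIZE)
    frame = page_table[page]      # a missing entry here would be a page fault
    return frame * PAGE_SIZE + offset

print(translate(8200))            # page 2, offset 8 -> frame 1 -> 4104
```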
Segmentation: Segmentation divides the logical address space of a process into variable-
sized blocks called segments. Each segment represents a logical unit such as code, data, or
stack. Segmentation allows for flexible memory allocation and supports dynamic data
structures. However, it can lead to external fragmentation, and efficient management of
variable-sized segments is complex.
Virtual Memory: Virtual memory is a technique that allows processes to use more memory
than physically available by utilizing disk space as an extension of physical memory. It
provides the illusion of a large, contiguous address space to each process. Virtual memory
techniques include demand paging, where only required pages are brought into physical
memory, and page replacement algorithms (e.g., LRU, FIFO) to manage page movements
between physical memory and disk.
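A minimal FIFO page-replacement sketch over an invented reference string, counting page
faults (an LRU variant would differ only in which resident page it evicts):

```python
from collections import deque

def fifo_faults(references, n_frames):
    """Simulate FIFO page replacement and return the number of page faults."""
    resident, order, faults = set(), deque(), 0
    for page in references:
        if page not in resident:                  # page fault: load the page from disk
            faults += 1
            if len(resident) == n_frames:
                resident.remove(order.popleft())  # evict the oldest resident page
            resident.add(page)
            order.append(page)
    return faults

print(fifo_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], n_frames=3))  # 9 faults
```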
Swapping: Swapping involves moving an entire process from main memory to secondary
storage (e.g., disk) when it is not actively executing. This technique helps in freeing up
memory for other processes but introduces higher latency due to disk I/O.