Data Science Solutions IA 2

1. Explain different types of Machine Learning techniques.

Based on the methods and ways of learning, machine learning is mainly divided into four types:

1. Supervised Machine Learning
2. Unsupervised Machine Learning
3. Semi-Supervised Machine Learning
4. Reinforcement Learning

1. Supervised Machine Learning

As its name suggests, supervised machine learning is based on supervision. In the supervised learning technique, we train the machine using a "labelled" dataset, and based on that training, the machine predicts the output. Here, labelled data means that some of the inputs are already mapped to outputs. More precisely, we first train the machine with inputs and their corresponding outputs, and then ask the machine to predict the output for a test dataset.

Let's understand supervised learning with an example. Suppose we have an input dataset of cat and dog images. First, we train the machine to recognise the images using features such as the shape and size of the tail, the shape of the eyes, colour, and height (dogs are taller, cats are smaller). After training, we input a picture of a cat and ask the machine to identify the object and predict the output. Since the machine is now well trained, it checks all the features of the object, such as height, shape, colour, eyes, ears, and tail, finds that it is a cat, and places it in the Cat category. This is how the machine identifies objects in supervised learning.

Some real-world applications of supervised learning are risk assessment, fraud detection, spam filtering, etc.

Categories of Supervised Machine Learning

Supervised machine learning can be classified into two types of problems, which are given
below:

o Classification: Classification algorithms are used to solve problems in which the output variable is categorical, such as "Yes" or "No", "Male" or "Female", "Red" or "Blue", etc.

Some popular classification algorithms are given below:

o Random Forest Algorithm
o Decision Tree Algorithm
o Logistic Regression Algorithm
o Support Vector Machine Algorithm

o Regression: Regression algorithms are used to predict continuous output variables, such as market trends, weather, etc.

Some popular regression algorithms are given below (a short supervised-learning sketch follows this list):

o Simple Linear Regression Algorithm
o Multivariate Regression Algorithm
o Decision Tree Algorithm
o Lasso Regression
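
To make the two categories concrete, here is a minimal sketch, assuming scikit-learn is installed; the tiny cat/dog measurements and monthly temperatures are invented purely for illustration.

```python
# Minimal supervised-learning sketch (assumes scikit-learn; toy data is illustrative).
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression

# Classification: toy features [height_cm, tail_length_cm] -> 0 = cat, 1 = dog
X_cls = [[25, 30], [23, 28], [24, 29], [60, 35], [55, 33], [58, 34]]
y_cls = [0, 0, 0, 1, 1, 1]
clf = SVC().fit(X_cls, y_cls)
print(clf.predict([[26, 31], [57, 36]]))   # categorical outputs, e.g. [0 1]

# Regression: toy data, temperature as a continuous function of the month
X_reg = [[1], [2], [3], [4], [5]]
y_reg = [21.0, 23.5, 26.0, 29.0, 31.5]
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[6]]))                  # a continuous-valued prediction
```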
2. Unsupervised Machine Learning

Unsupervised learning is different from the supervised learning technique; as its name suggests, there is no need for supervision. In unsupervised machine learning, the machine is trained using an unlabelled dataset and predicts the output without any supervision.

In unsupervised learning, the models are trained with the data that is neither classified nor
labelled, and the model acts on that data without any supervision.

The main aim of an unsupervised learning algorithm is to group or categorise the unsorted dataset according to similarities, patterns, and differences. The machine is instructed to find the hidden patterns in the input dataset.

Let's take an example to understand this more precisely. Suppose there is a basket of fruit images, and we feed it into the machine learning model. The images are completely unknown to the model, and its task is to find the patterns and categories of the objects.

The machine will discover patterns and differences on its own, such as differences in colour and shape, and predict the output when it is tested with the test dataset.
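
A minimal clustering sketch of this idea, assuming scikit-learn is available; the 2-D points below are illustrative stand-ins for image features.

```python
# Minimal unsupervised-learning sketch (assumes scikit-learn; toy data is illustrative).
from sklearn.cluster import KMeans

X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural group
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]   # another natural group

# No labels are given; KMeans groups the data by similarity alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(labels)  # e.g. [0 0 0 1 1 1] (cluster ids, not class names)
```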

3. Semi-Supervised Learning

Semi-supervised learning is a type of machine learning that lies between supervised and unsupervised machine learning. It represents the middle ground between supervised learning (with labelled training data) and unsupervised learning (with no labelled training data), and it uses a combination of labelled and unlabelled datasets during the training period.

Although semi-supervised learning operates on data that contains a few labels, most of the data is unlabelled. Labels are costly to obtain, so in practice an organisation may have only a few of them. Semi-supervised learning differs from both supervised and unsupervised learning, which are defined by the presence or absence of labels respectively.

The concept of semi-supervised learning was introduced to overcome the drawbacks of supervised and unsupervised learning algorithms. Its main aim is to make effective use of all the available data, rather than only the labelled data as in supervised learning. Typically, similar data is first clustered using an unsupervised learning algorithm, and the clusters then help to assign labels to the unlabelled data, since labelled data is considerably more expensive to acquire than unlabelled data.

We can picture these approaches with an example. Supervised learning is like a student learning under the supervision of an instructor at home and at college. If the student then analyses the same concept on their own, without any help from the instructor, that corresponds to unsupervised learning. Under semi-supervised learning, the student revises the concept by themselves after first studying it under the guidance of an instructor at college.
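
A minimal semi-supervised sketch, assuming scikit-learn is available; it uses label propagation to spread a few known labels to unlabelled points (marked with -1), and the toy dataset is invented for illustration.

```python
# Minimal semi-supervised sketch (assumes scikit-learn is installed).
# Unlabelled points are marked with -1, following scikit-learn's convention.
from sklearn.semi_supervised import LabelPropagation

X = [[1.0, 1.0], [1.1, 0.9], [8.0, 8.0], [7.9, 8.1],  # labelled points
     [1.2, 1.1], [8.2, 7.8]]                           # unlabelled points
y = [0, 0, 1, 1, -1, -1]   # -1 means "label unknown"

# The model spreads the few known labels to the unlabelled points.
model = LabelPropagation()
model.fit(X, y)
print(model.transduction_)  # inferred labels for every sample, e.g. [0 0 1 1 0 1]
```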

4. Reinforcement Learning
Reinforcement learning works on a feedback-based process in which an AI agent (a software component) automatically explores its surroundings by hit and trial: taking actions, learning from experience, and improving its performance. The agent is rewarded for each good action and punished for each bad action; hence the goal of a reinforcement learning agent is to maximise its rewards.

In reinforcement learning there is no labelled data, unlike in supervised learning; agents learn only from their own experience.

The reinforcement learning process is similar to how a human being learns; for example, a child learns various things through experience in day-to-day life. An example of reinforcement learning is playing a game, where the game is the environment, the agent's moves at each step define the states, and the goal of the agent is to get a high score. The agent receives feedback in terms of rewards and punishments.

Due to the way it works, reinforcement learning is employed in different fields such as game theory, operations research, information theory, and multi-agent systems.
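
A minimal tabular Q-learning sketch of the reward/punishment loop, using only NumPy; the tiny corridor environment (states 0 to 4, reward only at the rightmost state) and all parameter values are invented for illustration.

```python
# Minimal tabular Q-learning sketch (pure Python/NumPy; toy environment).
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.3
rng = np.random.default_rng(0)

def step(state, action):
    """Environment: move left/right; reward 1 for reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

for episode in range(300):
    state = 0
    for _ in range(100):                     # cap the episode length
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        action = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[state]))
        next_state, reward = step(state, action)
        # Q-learning update: move Q towards reward + discounted best future value
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
        if state == n_states - 1:            # goal reached, episode ends
            break

print(Q)  # after enough episodes, "right" should have the higher value in states 0-3
```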

2. Write the differences between Supervised Learning and Unsupervised Learning

Supervised Learning | Unsupervised Learning
Supervised learning algorithms are trained using labelled data. | Unsupervised learning algorithms are trained using unlabelled data.
A supervised learning model takes direct feedback to check whether it is predicting the correct output. | An unsupervised learning model does not take any feedback.
A supervised learning model predicts the output. | An unsupervised learning model finds the hidden patterns in the data.
In supervised learning, input data is provided to the model along with the output. | In unsupervised learning, only input data is provided to the model.
The goal of supervised learning is to train the model so that it can predict the output when given new data. | The goal of unsupervised learning is to find hidden patterns and useful insights from the unknown dataset.
Supervised learning needs supervision to train the model. | Unsupervised learning does not need any supervision to train the model.
Supervised learning can be categorised into Classification and Regression problems. | Unsupervised learning can be classified into Clustering and Association problems.
Supervised learning can be used for cases where we know the inputs as well as the corresponding outputs. | Unsupervised learning can be used for cases where we have only input data and no corresponding output data.
A supervised learning model produces an accurate result. | An unsupervised learning model may give a less accurate result compared to supervised learning.
Supervised learning is not close to true Artificial Intelligence, since we must first train the model for each type of data before it can predict the correct output. | Unsupervised learning is closer to true Artificial Intelligence, since it learns similarly to how a child learns routine things from experience.
It includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | It includes algorithms such as Clustering (K-means, C-means), KNN, and the Apriori algorithm.

7. Explain any five use cases of Data Science implementation.


Data Science Use Cases for Healthcare
A number of factors make data science indispensable to healthcare in the present day, the
most important of them being the competitive demand for valuable information in the health
market.
The collection of patient data through proper channels can help provide improved quality
healthcare to consumers.
From doctors to health insurance providers to institutions, all of them rely on the collection of
factual data and its accurate analysis to make well-informed decisions about patients' health
situations.
Data Science use cases in Healthcare
1. monitoring real-time data from wearables
2. predictive analytics
3. medical image analysis
4. drug discovery
5. genetics research
6. virtual assistants
7. customer data management

Data Science Use Cases for the Finance Industry

Finance management used to require a lot of effort and time, but not anymore. Using data science, one can now quickly analyse finances and make better decisions to manage them. The use of data science in financial markets has helped the industry in many ways. It is through the use of data science in finance that firms are able to build a better bond with consumers by knowing their preferences, which in turn leads to an increase in sales and thus a higher profit margin. It also helps to identify risks and fraud and to protect the firm. A data scientist is therefore one of the most important assets to a firm, without which an organisation simply cannot perform.

1. Trend forecasting

2. Fraud detection

3. Market research

4. Investment management

5. Risk analysis

6. Task automation

7. Customer service

8. Scalability

Data Science Use Cases for Retail

Retail data is growing enormously in volume, variety, and value every year. Retailers are using data science to turn insights into profitable margins by developing data-driven plans. These uses of data science give retailers an opportunity to stay competitive in the market, improve the customer experience, and increase their sales and hence revenue. And as technology continues to advance, data science still has much more to offer the retail world.

1. Recommendation Engines
2. Personalized Target Marketing
3. Price Optimization
4. Intelligent cross-selling and upselling
5. Inventory management
6. Customer sentiment analysis
7. Foretelling trends through social media
8. Managing real estate
9. Customer lifetime value prediction
10. Powering Augmented Reality

Data Science Use Cases for Transport and Logistics

By using the data received from vehicle sensors and turning it into live information on the condition of the assets, companies save significant amounts of money every year. Rather than maintaining everything on a time or distance basis, maintenance is done based on the actual condition. This improves safety, which is a primary concern, helps ensure deliveries are not delayed, and avoids incurring unnecessary costs.

1. tracking the whole transportation process end-to-end,

2. making all the activities fully automated and transparent,

3. delivering on time,

4. route optimization,

5. dynamic pricing,

6. maintaining the stock of supplies,

7. protecting perishable goods,

8. monitoring the vehicles' conditions,

9. improving production networks,

10. forecasting demand

11. improving customer service.


Data Science Use Cases for Telecom Industry

One of the most innovative aspects in the telecommunications industry today is data
science. With greater volumes of data than ever before, telecom providers are increasingly
leveraging data science tools and Artificial Intelligence to make sense of it all. 
Since the main activities of companies working in the telecommunication sector
involve data transfer, exchange, and import, it is imperative that telecom providers invest in
data science tools that can manage and extract useful insights from the vast amount of data
generated every day. 

1. streamlining the operations,

2. optimizing the network,

3. filtering out spam,

4. customer segmentation

5. improving data transmissions,

6. call detail record analysis

7. performing real-time analytics,

8. developing efficient business strategies,

9. increased network security

10. creating successful marketing campaigns,

11. targeted marketing

12. increasing revenue.


8. Differentiate between Classification and Prediction.

Classification | Prediction
Classification is the process of identifying which category a new observation belongs to, based on a training dataset containing observations whose category membership is known. | Prediction is the process of identifying the missing or unavailable numerical data for a new observation.
In classification, the accuracy depends on finding the class label correctly. | In prediction, the accuracy depends on how well a given predictor can guess the value of a predicted attribute for new data.
In classification, the model can be known as the classifier. | In prediction, the model can be known as the predictor.
A model or classifier is constructed to find the categorical labels. | A model or predictor is constructed to predict a continuous-valued function or ordered value.
For example, the grouping of patients based on their medical records can be considered a classification. | For example, we can think of prediction as predicting the correct treatment for a particular disease for a person.
11. Briefly explain any five machine learning algorithms with examples.
Decision Tree Algorithm

A decision tree is a supervised learning algorithm that is mainly used to solve classification problems but can also be used for regression problems. It can work with both categorical and continuous variables. It has a tree-like structure of nodes and branches, starting with the root node, which expands into further branches until the leaf nodes are reached. Internal nodes represent the features of the dataset, branches represent the decision rules, and leaf nodes represent the outcome of the problem.

Some real-world applications of decision tree algorithms are distinguishing between cancerous and non-cancerous cells and suggesting which car a customer should buy.
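
A minimal decision-tree sketch, assuming scikit-learn; the cell-measurement features and labels are invented for illustration.

```python
# Minimal decision-tree sketch (assumes scikit-learn; toy data is illustrative).
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features: [cell_size, cell_growth_rate] -> 0 = non-cancerous, 1 = cancerous
X = [[1.0, 0.1], [1.2, 0.2], [3.5, 0.9], [3.8, 1.1]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)
print(export_text(tree))           # shows the learned decision rules
print(tree.predict([[3.6, 1.0]]))  # -> [1]
```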

K-Nearest Neighbour (KNN)

K-Nearest Neighbour is a supervised learning algorithm that can be used for both classification and regression problems. The algorithm works on the basis of similarity between the new data point and the available data points; based on these similarities, the new data point is placed in the most similar category. It is also known as the lazy learner algorithm because it stores the entire available dataset and classifies each new case with the help of its K nearest neighbours. The new case is assigned to the class it is most similar to, and a distance function measures the distance between data points; the distance function can be Euclidean, Minkowski, Manhattan, or Hamming distance.
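
A minimal KNN sketch, assuming scikit-learn; the toy points and the choice of k = 3 with Euclidean distance are illustrative.

```python
# Minimal k-nearest-neighbour sketch (assumes scikit-learn; toy data is illustrative).
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [2, 1],      # class 0
     [8, 8], [8, 9], [9, 8]]      # class 1
y = [0, 0, 0, 1, 1, 1]

# k = 3 neighbours, Euclidean distance (the default Minkowski metric with p = 2)
knn = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=2)
knn.fit(X, y)
print(knn.predict([[2, 2], [7, 9]]))  # -> [0 1]
```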

Random Forest Algorithm

Random forest is a supervised learning algorithm that can be used for both classification and regression problems in machine learning. It is an ensemble learning technique that makes predictions by combining multiple classifiers to improve the performance of the model.

It builds multiple decision trees on subsets of the given dataset and averages their results to improve the predictive accuracy of the model. A random forest typically contains 64-128 trees, and a greater number of trees generally leads to higher accuracy.

To classify a new dataset or object, each tree gives its classification result, and the algorithm predicts the final output based on the majority vote.

Random forest is a fast algorithm and can deal efficiently with missing and incorrect data.
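
A minimal random-forest sketch, assuming scikit-learn; the toy cat/dog measurements are illustrative.

```python
# Minimal random-forest sketch (assumes scikit-learn; toy data is illustrative).
from sklearn.ensemble import RandomForestClassifier

X = [[25, 30], [23, 28], [26, 31], [60, 35], [55, 33], [58, 34]]
y = [0, 0, 0, 1, 1, 1]   # 0 = cat, 1 = dog

# 100 trees, each trained on a bootstrap sample; predictions use majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
print(forest.predict([[24, 29], [57, 34]]))  # -> [0 1]
```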
Apriori Algorithm

The Apriori algorithm is an unsupervised learning algorithm used to solve association problems. It uses frequent itemsets to generate association rules and is designed to work on databases that contain transactions. With the help of these association rules, it determines how strongly or how weakly two objects are connected to each other. The algorithm uses a breadth-first search and a hash tree to count itemsets efficiently.
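
A simplified level-wise frequent-itemset count in the spirit of Apriori (without the candidate-pruning step), in pure Python; the tiny transaction database and support threshold are invented for illustration.

```python
# Simplified frequent-itemset sketch in the spirit of Apriori (pure Python; toy data).
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
]
min_support = 0.5  # an itemset must appear in at least half of the transactions

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted({i for t in transactions for i in t})
# Level-wise search: frequent 1-itemsets, then frequent 2-itemsets.
for k in (1, 2):
    for combo in combinations(items, k):
        s = support(set(combo))
        if s >= min_support:
            print(combo, round(s, 2))
```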

Linear Regression

Linear regression is one of the most popular and simple machine learning algorithms, used for predictive analysis. Predictive analysis means predicting something, and linear regression makes predictions for continuous numbers such as salary, age, etc.

It models the linear relationship between the dependent and independent variables, showing how the dependent variable (y) changes according to the independent variable (x).
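
A minimal linear-regression sketch, assuming scikit-learn; the experience/salary numbers are invented for illustration.

```python
# Minimal linear-regression sketch (assumes scikit-learn; toy salary data).
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]            # years of experience
y = [30000, 35000, 41000, 45000, 50000]  # salary

reg = LinearRegression()
reg.fit(X, y)
# y is modelled as: intercept + coefficient * x
print(reg.coef_, reg.intercept_)
print(reg.predict([[6]]))  # predicted salary for 6 years of experience
```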

12. Explain the role of a Data Warehouse in Data Science.

1) A Data Warehouse is built by combining data from multiple diverse sources; it supports analytical reporting, structured and unstructured queries, and decision making for the organisation.

2) Data Warehousing is a step-by-step approach for constructing and using a Data Warehouse.

3) Many data scientists get their data in raw formats from various sources of data and information.

4) But for many data scientists, as well as business decision-makers, particularly in big enterprises, the main sources of data and information are corporate data warehouses.

5) A data warehouse holds data from multiple sources, including internal databases and Software-as-a-Service (SaaS) platforms.

6) After the data is loaded, it is often cleansed, transformed, and checked for quality before it is used for analytics reporting, data science, machine learning, or anything else.

7) It is not frequently updated.

5. Demonstrate the stages of the Data Science Project Life Cycle.

• 5 Steps of a Data Science Project Lifecycle:
– Obtain Data
– Scrub Data
– Explore Data
– Model Data
– Interpreting Data
Obtain Data: -
1. In this step, you will need to query databases, using technical skills such as MySQL to process the data.
2. You may also receive data in file formats like Microsoft Excel.
3. The different types of databases you may encounter include PostgreSQL, Oracle, or even non-relational (NoSQL) databases like MongoDB.
4. Another way to obtain data is to scrape it from websites using web-scraping tools such as Beautiful Soup (a short scraping sketch follows this list).
5. Another popular option for gathering data is connecting to Web APIs. Websites such as Facebook and Twitter allow users to connect to their web servers.
6. The most traditional way of obtaining data is to read it directly from files.
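
A minimal data-gathering sketch, assuming the requests and beautifulsoup4 packages are installed; the URLs are placeholders, not real endpoints.

```python
# Minimal data-gathering sketch (assumes requests and beautifulsoup4 are installed).
# The URLs below are placeholders, not real endpoints.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/articles"      # hypothetical page to scrape
html = requests.get(url, timeout=10).text

soup = BeautifulSoup(html, "html.parser")
titles = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
print(titles)

# Connecting to a web API instead usually returns JSON directly:
# data = requests.get("https://api.example.com/v1/items", timeout=10).json()
```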
Scrub Data:-

1) This process is for us to "clean" and filter the data.
2) Remember the "garbage in, garbage out" philosophy: if the data is unfiltered and irrelevant, the results of the analysis will not mean anything.
3) In this process, you need to convert the data from one format to another and consolidate everything into one standardised format across all the data.
4) In some situations, you will also need to filter lines, for example when handling web log files, from which you can understand data such as user demographics and the time users enter your website.
5) Scrubbing data also includes the task of extracting and replacing values.
6) Lastly, you will also need to split, merge, and extract columns. For example, for the place of origin, you may have both "City" and "State"; depending on your requirements, you might need to either merge or split these data (see the sketch below).
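
A minimal scrubbing sketch, assuming pandas; the place-of-origin column and its City/State split are illustrative.

```python
# Minimal data-scrubbing sketch (assumes pandas; the tiny DataFrame is illustrative).
import pandas as pd

df = pd.DataFrame({"place_of_origin": ["Mumbai, Maharashtra", "Pune, Maharashtra"]})

# Split one column into two, then drop the original to standardise the format.
df[["City", "State"]] = df["place_of_origin"].str.split(", ", expand=True)
df = df.drop(columns=["place_of_origin"])
print(df)
```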
Explore Data:-
1. You will need to inspect the data and its properties.
2. Different data types, such as numerical, categorical, ordinal, and nominal data, require different treatments.
3. The next step is to compute descriptive statistics to extract features and test significant variables.
4. Lastly, we utilise data visualisation to help us identify significant patterns and trends in the data.
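
A minimal exploration sketch, assuming pandas; the toy age/salary/city table is invented for illustration.

```python
# Minimal data-exploration sketch (assumes pandas; toy data is illustrative).
import pandas as pd

df = pd.DataFrame({
    "age":    [23, 35, 31, 52, 46, 29],
    "salary": [30000, 52000, 48000, 90000, 76000, 41000],
    "city":   ["Pune", "Mumbai", "Pune", "Delhi", "Mumbai", "Pune"],
})

print(df.dtypes)                               # data types: numerical vs categorical
print(df.describe())                           # descriptive statistics for numerical columns
print(df["city"].value_counts())               # frequency table for a categorical column
print(df.select_dtypes(include="number").corr())  # correlations between numerical features
```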

Model Data: -
1. "Where the magic happens."
2. Reduce the dimensionality of your dataset.
3. Not all your features or values are essential to predicting your model.
4. What you need to do is select the relevant ones that contribute to the prediction of results.
5. There are a few tasks we can perform in modelling.
5.1. Classification, for example differentiating the emails you receive into "Inbox" and "Spam" using logistic regression.
6. We can also use modelling to group data in order to understand the logic behind those clusters.
6.1. For example, we group our e-commerce customers to understand their behaviour on the website.
7. In short, we use regression and prediction to forecast future values, classification to identify categories, and clustering to group values.
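
A minimal modelling sketch, assuming scikit-learn: dimensionality reduction with PCA followed by logistic regression. The random data and the choice of five components are purely illustrative.

```python
# Minimal modelling sketch (assumes scikit-learn): PCA + logistic regression.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))               # 100 samples, 20 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # label depends on just two features

# Keep only the leading components, then fit the classifier.
model = make_pipeline(PCA(n_components=5), LogisticRegression())
model.fit(X, y)
print(model.score(X, y))   # training accuracy of the fitted pipeline
```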
Interpreting Data: -
1. Interpreting data refers to presenting your data to a non-technical audience.
2. We deliver the results to answer the business questions we asked when we first started the project, together with the actionable insights that we found through the data science process.
3. Actionable insight is a key outcome: it shows how data science can bring about predictive analytics and, later on, prescriptive analytics, through which we learn how to repeat a positive result or prevent a negative outcome.
4. It is essential to present your findings in a way that is useful to the organisation, or else it would be pointless to your stakeholders.
5. In this process, technical skills alone are not sufficient.
6. One essential skill you need is the ability to tell a clear and actionable story.
7. If your presentation does not trigger actions in your audience, it means that your communication was not effective.
8. Remember that you will be presenting to an audience with no technical background, so the way you communicate the message is key.

6. Explain various components of Data Science projects.


1. Problem
• This is the top, fundamental component. It can be anything from building a market segmentation or a recommendation system, to association rule discovery for fraud detection, or simulations to predict extreme events such as floods.
2. Data
• It comes in many shapes: transactional (credit card transactions), real-time, sensor
data (IoT), unstructured data (tweets), big data, images or videos, and so on.
Typically, raw data needs to be identified or even built and put into databases
(NoSQL or traditional), then cleaned and aggregated using EDA (exploratory data
analysis). The process can include selecting and defining metrics.
3. Algorithms
• Also called techniques. Examples include decision trees, indexation algorithms, Bayesian networks, and support vector machines.
4. Models
• By models, I mean testing algorithms, selecting, fine-tuning, and combining the best
algorithms using techniques such as model fitting, model blending, data reduction,
feature selection, and assessing the yield of each model, over the baseline. It also
includes calibrating or normalizing data, imputation techniques for missing data,
outliers processing, cross-validation, over-fitting avoidance, robustness testing and
boosting, and maintenance. Criteria that make a model desirable include robustness or stability, scalability, simplicity, speed, portability, adaptability (to changes in the data), and accuracy (sometimes measured using R-squared, though other metrics may be preferable).
5. Programming
• There is almost always some code involved, even if you use a black-box solution.
Typically, data scientists use Python, R or Java, and SQL. However, I've
completed some projects that did not involve real coding, but instead machine-to-machine communication via APIs. Automation of code production (and of data science in general) is a hot topic, as evidenced by the publication of articles such as The Automated Statistician, and my own work on designing simple, robust black-box solutions.
6. Environments
• Some call it packages. It can be anything such as a bare Unix box accessed remotely
combined with scripting languages and data science libraries such as Pandas (Python),
or something more structured such as Hadoop. Or it can be an integrated database
system from Teradata, Pivotal or other vendors, or a package like SPSS, SAS,
RapidMiner or MATLAB, or typically, a combination of these.

3. Describe the features of non-relational databases.


• NoSQL databases, like MongoDB, are nonrelational, distributed database systems that
were designed to rise to the big data challenge.
• NoSQL databases step out past the traditional relational database architecture and
offer a much more scalable, efficient solution.
• NoSQL systems facilitate non-SQL querying of non-relational, schema-free, semi-structured, and unstructured data.
• In this way, NoSQL databases are able to handle the structured, semi-structured, and
unstructured data sources that are common in big data systems.
• NoSQL offers four categories of non-relational databases: graph databases, document databases, key-value stores, and column-family stores.
• Since NoSQL offers native functionality for each of these separate types of data
structures, it offers very efficient storage and retrieval functionality for most types of
non-relational data.
• This adaptability and efficiency make NoSQL an increasingly popular choice for
handling big data and for overcoming processing challenges that come along with it.
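
A minimal document-database sketch, assuming the pymongo package and a MongoDB server running on localhost; the database, collection, and document fields are all illustrative.

```python
# Minimal document-database sketch (assumes pymongo and a MongoDB server on
# localhost; database, collection, and field names are illustrative).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
collection = client["shop"]["customers"]

# Documents are schema-free: each one can carry different fields.
collection.insert_one({"name": "Asha", "city": "Pune", "orders": 3})
collection.insert_one({"name": "Ravi", "tags": ["premium", "newsletter"]})

# Query by field value; only documents that have the field can match.
print(collection.find_one({"city": "Pune"}))
```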

4. Discuss different memory management techniques in Operating Systems.

In operating systems, memory management refers to the techniques and mechanisms employed to efficiently allocate, utilise, and deallocate memory resources in a computer system. Some of the key memory management techniques commonly used are:

Single Contiguous Allocation: This technique involves dividing the available memory into a
single, continuous block. The operating system allocates memory to processes in a
contiguous manner. It is a simple approach but can lead to fragmentation issues, both external
fragmentation (unused memory blocks scattered throughout the system) and internal
fragmentation (wasted memory within allocated blocks).
Partitioned Allocation: Partitioned allocation divides the memory into fixed-size partitions
or variable-sized partitions. Each partition can accommodate a single process. This technique
can be further categorized into:
Fixed Partitioning: The memory is divided into fixed-size partitions, and each
partition is allocated to a process. The remaining unused memory in each partition can cause
internal fragmentation.
Variable Partitioning: The memory is divided into variable-sized partitions based on
process requirements. It reduces internal fragmentation but can lead to external
fragmentation.
Paging: Paging divides physical memory into fixed-size blocks called frames and divides the logical address space of a process into blocks of the same size called pages. The mapping between logical and physical addresses is maintained through a page table. Paging helps to manage memory efficiently, reduces external fragmentation, and enables non-contiguous allocation. However, it may incur some overhead due to page-table management.
Segmentation: Segmentation divides the logical address space of a process into variable-
sized blocks called segments. Each segment represents a logical unit such as code, data, or
stack. Segmentation allows for flexible memory allocation and supports dynamic data
structures. However, it can lead to external fragmentation, and efficient management of
variable-sized segments is complex.
Virtual Memory: Virtual memory is a technique that allows processes to use more memory
than physically available by utilizing disk space as an extension of physical memory. It
provides the illusion of a large, contiguous address space to each process. Virtual memory
techniques include demand paging, where only required pages are brought into physical
memory, and page replacement algorithms (e.g., LRU, FIFO) to manage page movements
between physical memory and disk.
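
A minimal FIFO page-replacement sketch in pure Python; the reference string and the three-frame memory are illustrative.

```python
# Minimal FIFO page-replacement sketch (pure Python; reference string is illustrative).
from collections import deque

def fifo_page_faults(reference_string, n_frames):
    frames = deque()          # pages currently in memory, oldest on the left
    faults = 0
    for page in reference_string:
        if page not in frames:
            faults += 1
            if len(frames) == n_frames:
                frames.popleft()   # evict the page that arrived first
            frames.append(page)
    return faults

print(fifo_page_faults([7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2], n_frames=3))  # -> 10
```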
Swapping: Swapping involves moving an entire process from main memory to secondary
storage (e.g., disk) when it is not actively executing. This technique helps in freeing up
memory for other processes but introduces higher latency due to disk I/O.
