Supervised Machine Learning
Supervised
Supervised machine learning algorithms learn from data sets that have been labeled and verified by human experts. Classification and regression are two algorithms used with supervised machine learning.
Unsupervised
Unsupervised machine learning algorithms do not require human experts but
autonomously discover patterns in data. Unsupervised learning mainly deals with
unlabeled data. The model must work on its own to find patterns and information.
Examples of problems solved with unsupervised methods are clustering and
association.
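As a concrete illustration, clustering can be sketched in a few lines with scikit-learn. The points, the number of clusters, and the use of KMeans are assumptions made purely for illustration; they are not part of the course material.

```python
# A minimal unsupervised-learning sketch: KMeans discovers groups in
# unlabeled points (the data and cluster count are invented for illustration).
import numpy as np
from sklearn.cluster import KMeans

# Raw, unlabeled measurements -- no human expert has categorized them.
points = np.array([[1.0, 1.1], [0.9, 1.0],
                   [8.0, 8.2], [8.1, 7.9],
                   [0.5, 9.0], [0.4, 9.2]])

# The algorithm works on its own to find three groups.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(points)
print(model.labels_)   # e.g. [0 0 1 1 2 2] -- cluster membership, learned without labels
```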
Reinforcement
Reinforcement learning teaches the machine through trial and error using feedback
from its actions and experiences, also known as learning from mistakes. It involves
assigning positive values to desired outcomes and negative values to undesired
effects. The result is an optimal solution: the system learns to avoid adverse outcomes
and to seek out positive ones. Practical applications of reinforcement learning include
building artificial intelligence for playing video games and for robotics and industrial
automation.
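As a toy sketch of the trial-and-error idea, the following tabular Q-learning example uses a made-up five-position corridor in which reaching the right end earns a reward of +1 and falling off the left end earns -1. The environment, rewards, and learning parameters are assumptions for illustration only, not part of the course material.

```python
# Toy reinforcement learning: the agent learns by trial and error, using
# feedback (rewards) from its actions to prefer the positive outcome.
import random

N_STATES, ACTIONS = 5, [-1, +1]          # positions 0..4, move left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration rate

for episode in range(500):
    state = 2                            # start in the middle of the corridor
    while 0 < state < N_STATES - 1:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)                     # explore (trial)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])  # exploit what was learned
        nxt = state + action
        reward = 1 if nxt == N_STATES - 1 else (-1 if nxt == 0 else 0)
        best_next = 0.0 if nxt in (0, N_STATES - 1) else max(Q[(nxt, a)] for a in ACTIONS)
        # Feedback from the action and its outcome updates the value estimate.
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = nxt

# After training, the learned policy prefers moving right, toward the positive reward.
print({s: ("right" if max(ACTIONS, key=lambda a: Q[(s, a)]) == 1 else "left")
       for s in range(1, N_STATES - 1)})
```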
Step 2a. Learning data - Create a learning data set used to train the model.
Step 2b. Testing data - Create a test data set used to evaluate the model's
performance. Only perform this step in the case of supervised learning (see the
sketch after Step 5 below).
Step 5. Model evaluation - Test the solution on the test data. Performance on the
learning data does not necessarily transfer to the test data. The more complex and
fine-tuned the model is, the higher the chance that it becomes prone to overfitting,
which means it cannot perform accurately against unseen data. Overfitting can mean
going back to the model learning step.
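Steps 2a, 2b, and 5 can be sketched together with scikit-learn. This is an illustrative sketch under assumptions: the sample data set and the deliberately unconstrained decision tree are chosen only to make the gap between learning-data and test-data accuracy (overfitting) visible; they are not prescribed by the course.

```python
# Steps 2a/2b: split labeled data into learning (train) and testing sets,
# then Step 5: evaluate on the held-out test data to check for overfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)           # a small expert-labeled data set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)              # hold out 20% as testing data

# An intentionally unconstrained (very deep) tree tends to memorize the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("accuracy on learning data:", model.score(X_train, y_train))  # typically near 1.00
print("accuracy on test data:    ", model.score(X_test, y_test))    # usually noticeably lower
# A large gap between the two scores is the classic sign of overfitting,
# which sends the work back to the model learning step.
```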
Pattern recognition uses the concept of learning to classify data based on statistical
information gained from patterns and their representations. Learning enables the
pattern recognition systems to be "trained" and adaptable to provide more accurate
results. When training the pattern recognition system, a portion of the data set is used
to prepare the system, and the remainder is used to test the system's accuracy. As
shown in the figure below, the data set is divided into two groups: one to train the
model and one to test the model. The training data set is used to build the model and consists of about
80% of the data. It contains the set of images used to train the system. The testing
data set consists of about 20% of the data and measures the model's accuracy. For
example, if the system that identifies categories of birds can correctly identify seven
out of ten birds, then the system's accuracy is 70%.
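The 70% figure is simply the number of correct predictions divided by the total number of predictions. A tiny sketch with made-up bird labels shows the arithmetic:

```python
# Accuracy = correct predictions / total predictions (bird labels are invented).
true_birds      = ["owl", "crow", "jay", "owl", "crow", "jay", "owl", "crow", "jay", "owl"]
predicted_birds = ["owl", "crow", "jay", "owl", "crow", "jay", "owl", "jay",  "owl", "crow"]

correct = sum(t == p for t, p in zip(true_birds, predicted_birds))
print(correct / len(true_birds))   # 7 correct out of 10 -> 0.7, i.e. 70% accuracy
```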
Question 1
Multiple choice question
When several items are grouped, which type of machine learning algorithm can
determine which items in the group predict the presence of other items?
Association
Classification
Clustering
Regression
Question 2
Multiple choice question
Which machine learning algorithm uses data sets verified by experts as its
learning basis?
Association
Clustering
Routing
Supervised
Question 3
Multiple choice question
Which method describes how a machine learns using the reinforcement machine
learning model?
Trial and error using feedback from the action and experiences.
Question 4
Multiple choice question
Which step in the machine learning process transforms data into a structured
format by removing missing data and corrupted observations?
Testing data
Learning data
Preparing data
Model evaluation
Question 5
Multiple choice question
In training a pattern recognition system, which data set measures the accuracy
achieved by the model?
Data Scientist
Data scientists apply statistics, machine learning, and analytic approaches to answer
critical business questions. Data scientists interpret and deliver the results of their
findings by using visualization techniques, building data science apps, or narrating
exciting stories about the solutions to their data (business) problems. They work with
data sets of different sizes and run algorithms on large data sets. Data scientists
must be current with the latest automation and machine learning technologies. The
requirements to perform these roles include statistical and analytical skills,
programming knowledge (Python, R, Java), and familiarity with Hadoop, a collection
of open-source software utilities that facilitates working with massive amounts of
data. Data scientists are data wranglers who organize and deliver value from data.
Data Engineer
Data engineers are responsible for building and operationalizing data pipelines to
collect and organize data. They ensure the accessibility and availability of quality
data for data scientists and data analysts by integrating data from disparate sources
and performing data cleaning and transformation. Skills needed for data engineering
roles include understanding the architecture, tools, and methods of data ingestion,
transformation and storage; and proficiency with multiple programming languages
(including Python and Scala). In summary, data engineers build and operate the data
infrastructure needed to prepare data for further analysis by data analysts and
scientists.
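As a hedged illustration of the cleaning and transformation work described above, the following pandas sketch (the column names and values are invented, not taken from the course) drops a corrupted observation and converts raw text fields into proper data types:

```python
# Minimal data-cleaning/transformation sketch with pandas.
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "revenue":  ["19.99", "5.00", None, "12.50"],   # ingested as text, one value missing
    "shipped":  ["Y", "N", "Y", "Y"],
})

orders = orders.dropna(subset=["revenue"])            # remove the missing observation
orders["revenue"] = orders["revenue"].astype(float)   # transform text into a numeric type
orders["shipped"] = orders["shipped"] == "Y"          # transform Y/N into a Boolean
print(orders.dtypes)                                  # data is now structured for analysis
```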
Data Analyst
Data analysts query and process data, provide reports, and summarize and visualize data.
They leverage existing tools and methods to solve a problem. They help people,
such as business analysts, to understand specific queries with ad-hoc reports and
charts. Data analysts must understand basic statistical principles, cleaning different
data types, visualization, and exploratory data analysis. In short, data analysts
analyze data to help businesses and other organizations make informed decisions.
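A minimal sketch of that kind of query-summarize-report work, again using pandas with invented sales figures:

```python
# Ad-hoc query, summary, and report-ready output (the sales data is made up).
import pandas as pd

sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "West"],
    "revenue": [1200, 800, 1500, 700, 950],
})

high_value = sales[sales["revenue"] > 750]               # ad-hoc query/filter
report = high_value.groupby("region")["revenue"].sum()   # summarize by region
print(report)                                             # ready for a chart or report
```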
Your answers will vary but may include the following platforms.
Social Media Platforms - Posting on social media (like Twitter, Quora, Reddit, and
LinkedIn) can build your legitimacy as a data professional and is a good way to gain
more visibility for your projects.
DataCamp Workspace - A collaborative cloud-based notebook where you can
instantly analyze data, collaborate with others, and publish an analysis. When you
create projects, you can share the link to your DataCamp profile so others can have
access.
GitHub - A website and cloud service that allows developers to store, manage, and
monitor their code repositories. It enables users to collaborate on or publish open-
source projects.
Kaggle - An online community platform for data enthusiasts to collaborate, find and
publish datasets, publish notebooks, and compete with others to solve data science
challenges. To showcase your work, create a notebook or Kernel that helps others
discover and understand your project.
Sites for Building and Hosting a Personal Website or Blog - Personal websites or
blogs are another way to have your projects all in one place and share them
inexpensively. These sites allow more control and customization of your content than
DataCamp Workspace and Kaggle. WordPress and Wix are good options for
building and hosting a blog or website.
Data professionals fill three primary roles in organizations: Data Analyst, Data
Engineer, and Data Scientist. A few tools and skills commonly mentioned in job
advertisements for entry-level positions include:
Topic Objective: Explain the next steps necessary to create a portfolio showcasing
data analytic skills.
Question 1
Multiple choice question
To query and process data, provide reports, summarize and visualize data.
To build and operationalize data pipelines for collecting and organizing data.
To enter and validate data, to improve the reliability of the data being collected.
Question 2
Multiple choice question
To query and process data, provide reports, summarize and visualize data.
To build and operationalize data pipelines for collecting and organizing data.
Question 3
Multiple choice question
Resume
Word document
Portfolio
Excel spreadsheet
Question 4
Multiple choice question
To query and process data, provide reports, summarize and visualize data.
To build and operationalize data pipelines for collecting and organizing data.
To enter and validate data, to improve the reliability of the data being collected.
Question 5
Multiple choice question
Which three skills are typical for an entry-level data analyst position?
(Choose three.)
Question 1
Multiple choice question
A data type that identifies either a true (T) state or a false (F) state.
A text data type to store confidential information such as social security numbers.
The Boolean data type represents either a logical True (T) or False (F) state. It can
be used to test the state of a variable or an expression in computer programming.
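A small Python illustration of the same idea (the order variable is hypothetical):

```python
# A Boolean holds only True or False and can be tested directly.
order_shipped = True                 # hypothetical Boolean variable
if order_shipped:                    # testing the state of the variable
    print("Send the customer a tracking number.")
print(5 > 3)                         # expressions also evaluate to a Boolean: True
```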
Question 2
Matching. Select from the lists and then submit.
Refer to the exhibit. Match the column with the data type that it contains.
Shipped - Boolean
Revenue - Floating point
Quantity - Integer
Order number - String
Product category - String
Question 3
Multiple choice question
A sales manager in a large automobile dealership wants to determine the top four
best-selling models based on sales data over the past two years. Which two charts
are suitable for the purpose? (Choose two.)
Column chart
Scatter chart
Line chart
Bar chart
Pie chart
Column and bar charts are the types of charts to use when the purpose is to
display the value of a specific data point and compare that value across similar
categories. Column charts are positioned vertically; bar charts are similar to
column charts except that they are positioned horizontally.
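A short matplotlib sketch, with invented model names and unit counts, shows the two orientations side by side:

```python
# Column chart (vertical) vs. bar chart (horizontal) for comparing categories.
import matplotlib.pyplot as plt

models = ["Model A", "Model B", "Model C", "Model D"]   # hypothetical car models
units_sold = [420, 380, 290, 250]                        # hypothetical sales counts

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3))
left.bar(models, units_sold)      # column chart: values drawn vertically
left.set_title("Column chart")
right.barh(models, units_sold)    # bar chart: same comparison, horizontal bars
right.set_title("Bar chart")
plt.tight_layout()
plt.show()
```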
Question 4
Multiple choice question
Spreadsheet data
Blogs
Newspaper articles
White papers
Structured data is entered and maintained in fixed fields within a file or record, such
as data found in relational databases and spreadsheets. Structured data entry
requires a certain format to minimize errors and make it easier for computer
interpretation.
Question 5
Multiple choice question
Integer
String
Floating point
Date and time
The date and time type is important in recording when a piece of data is generated.
Question 6
Multiple choice question
What is the most cost-effective way for businesses to store their big data?
On-premises
Cloud storage
Cloud storage is the most cost-effective way to store big data. Cloud storage enables
big data storage on servers maintained by a third-party service provider on their
network infrastructure. The cloud service provider purchases, installs, and maintains
all hardware, software, and supporting infrastructure in its data centers. When using
cloud services, an organization avoids the enormous costs of building and
supporting the infrastructure necessary to store such vast amounts of data.
Question 7
Multiple choice question
Changing the format, structure, or value of data takes place in which phase of the
data pipeline?
Storage
Analysis
Ingestion
Transformation
Data transformation involves the process of changing the format, structure, or values
of data so that it is clean and better organized, making it easier for both humans and
computers to use.
Question 8
Multiple choice question
Data that does not fit into the rows and columns of traditional relational data storage
systems.
Data that fits into the rows and columns of traditional relational data storage
systems.
Geolocation data.
Unstructured data is data that does not fit into the rows and columns of traditional
relational data storage systems. Unstructured data is vast and makes up the
largest segment of big data.
Question 9
This question component requires you to select the matching option. When you have selected your answers, select the submit button.
Velocity
Veracity
Variety
Volume
Describes the amount of data being transported and stored.
Is the process of preventing inaccurate data from spoiling data sets.
Describes the rate at which data is generated.
Describes a type of data that is not ready for processing and analysis.
Place the options in the following order:
Volume - Describes the amount of data being transported and stored.
Veracity - Is the process of preventing inaccurate data from spoiling data sets.
Velocity - Describes the rate at which data is generated.
Variety - Describes a type of data that is not ready for processing and analysis.
Question 10
Multiple choice question
What is a major challenge for storage of big data with on-premises legacy data
warehouse architectures?
The volume and variety of big data require the storage, management, and retrieval
of virtually limitless amounts of unstructured data, which is a challenge for
on-premises legacy data warehouse architectures.
Question 11
Multiple choice question
Which step in a typical machine learning process involves testing the solution on the
test data?
Model evaluation
Data preparation
Learning data
Question 12
Multiple choice question
Which type of machine learning algorithm would be used to train a system to detect
spam in email messages?
Clustering
Classification
Regression
Association
Question 13
Multiple choice question
Which type of machine learning algorithm can predict the value of a variable, such as
a loan interest rate, based on the values of other variables?
Regression
Classification
Clustering
Association
Question 14
Multiple choice question
What are two types of supervised machine learning algorithms? (Choose two.)
Regression
Mode
Association
Mean
Clustering
Classification
Two algorithms used with supervised machine learning are classification and
regression. Supervised machine learning algorithms are the most common
algorithms used in big data analytics.
Question 15
Multiple choice question
What are two applications that would gain artificial intelligence by using the
reinforcement learning model? (Choose two.)
The reinforcement learning model teaches the machine through trial and error using
feedback from its actions and experiences. It involves assigning positive values to
desired outcomes and negative values to undesired effects. Practical applications of
reinforcement learning include building artificial intelligence for playing video
games and in robotics and industrial automation.
Question 16
This question component requires you to select the matching option. When you have selected your answers, select the submit button.
Data Scientist
Data Analyst
Data Engineer
Leverage existing tools and problem-solving methods to query and process data,
provide reports, summarize and visualize data.
Build and operationalize data pipelines for collecting and organizing data while
ensuring the accessibility and availability of quality data.
Apply statistics, machine learning, and analytic approaches in order to interpret and
deliver visualized results to critical business questions.
Data Analyst - Leverage existing tools and problem-solving methods to query and process data, provide reports, summarize and visualize data.
Data Engineer - Build and operationalize data pipelines for collecting and organizing data while ensuring the accessibility and availability of quality data.
Data Scientist - Apply statistics, machine learning, and analytic approaches in order to interpret and deliver visualized results to critical business questions.
Question 17
This question component requires you to select the matching option. When you have selected your answers, select the submit button.
Match the data professional role with the skill sets required.
Data analyst
Data scientist
Data engineer
Ability to understand basic statistical principles, cleaning different types of data, data
visualization, and exploratory data analysis.
Ability to understand the architecture and distribution of data acquisition and storage,
multiple programming languages (including Python and Java), and knowledge of
SQL database design including an understanding of creating and monitoring
machine learning models.
Ability to use statistical and analytical skills, programming knowledge (Python, R,
Java), and familiarity with Hadoop; a collection of open-source software utilities that
facilitates working with massive amounts of data.
Data analyst - Ability to understand basic statistical principles, cleaning different types of data, data visualization, and exploratory data analysis.
Data engineer - Ability to understand the architecture and distribution of data acquisition and storage, multiple programming languages (including Python and Java), and knowledge of SQL database design including an understanding of creating and monitoring machine learning models.
Data scientist - Ability to use statistical and analytical skills, programming knowledge (Python, R, Java), and familiarity with Hadoop; a collection of open-source software utilities that facilitates working with massive amounts of data.
Question 18
Multiple choice question
What are two job roles normally attributed to a data analyst? (Choose two.)
Turning raw data into information and insight, which can be used to make business
decisions.
Reviewing company databases and external sources to make inferences about data
figures and complete statistical calculations.
Building systems that collect, manage, and convert raw data into usable information.
Working in teams to mine big data for information that can be used to predict
customer behavior and identify new revenue opportunities.
Question 19
Multiple choice question
Which skill set is important for someone seeking to become a data scientist?
The ability to ensure that the database remains stable, maintain backups of the
database, and execute database updates and modifications.
Data scientists apply statistics, machine learning, and analytic approaches to answer
critical business questions. They need thorough knowledge of the latest automation
and machine learning technologies, analytical skills, deep programming knowledge,
and familiarity with Hadoop.
Question 20
Multiple choice question
A data analyst is building a portfolio for future prospective employers and wishes to
include a previously completed project. What three process documentations would
be included in building that portfolio? (Choose three.)
A list of data analytic tools that did not work in the described manner for the project.