Managing Machine Learning Projects Final
In which scenario would it be more appropriate to use a heuristic rather than a machine learning model?
*A: When the problem is well-defined and the rules are clear.
Feedback: Correct! Heuristics work well when the problem space is well-defined and can be solved with
a set of clear rules.
Feedback: Incorrect. Machine learning models are better suited for large and complex datasets where
patterns need to be learned.
Feedback: Incorrect. Machine learning is better for scenarios where continuous improvement is needed
based on new data.
Why is adding business value an important consideration when using machine learning in products?
Feedback: Incorrect. Adding business value does not inherently increase model complexity.
*B: Because it ensures that the machine learning solution aligns with business objectives.
Feedback: Correct! Aligning with business objectives ensures the machine learning solution provides
tangible benefits.
C: Because it reduces the need for human intervention in the learning process.
Feedback: Incorrect. Adding business value does not necessarily reduce human intervention.
Feedback: Correct! A lack of high-quality data is a common reason why machine learning projects fail.
Feedback: Incorrect. While an overly broad problem statement can be an issue, it is not the main reason
for the failure of machine learning projects.
Feedback: Incorrect. Excessive stakeholder involvement is not a common reason for the failure of
machine learning projects.
Feedback: Incorrect. Conducting too many experiments, while potentially inefficient, is not a common
reason for failure in machine learning projects.
Which of the following activities is part of the iterative process of solution design in machine learning?
Feedback: Correct! Brainstorming solutions is a key activity in the iterative process of solution design.
Feedback: Incorrect. Ignoring user feedback is not a part of the iterative process of solution design.
C: Skipping experiments
Feedback: Incorrect. Skipping experiments is not a part of the iterative process of solution design.
D: Avoiding mockups
Feedback: Incorrect. Avoiding mockups is not a part of the iterative process of solution design.
Which of the following criteria is essential for determining a good opportunity to apply machine
learning?
Feedback: Correct! Having labeled training data is crucial for supervised learning models to be trained
effectively.
Feedback: Not quite. While computational cost is a factor, it's not the primary criterion for determining a
good opportunity for machine learning.
Feedback: Incorrect. Data preprocessing is often required and does not determine the suitability of
applying machine learning.
Feedback: Wrong choice. If a problem can be easily solved with traditional algorithms, it might not be a
good opportunity for applying machine learning.
What is a key criterion for identifying good opportunities to apply machine learning?
Feedback: Incorrect. Rule-based systems are not always suitable for problems that require learning from
data.
Feedback: Correct! Having a large amount of data is crucial for training machine learning models.
Feedback: Incorrect. Problems requiring human intuition are often not suitable for machine learning.
Which industry example did Jon Reifschneider share to explain the importance of data quality in ML
projects?
Feedback: Correct! Jon Reifschneider shared an example from healthcare analytics to emphasize data
quality.
B: Retail forecasting
Feedback: No, while retail forecasting is important, it was not the example used. Try again!
C: Financial modeling
Feedback: No, financial modeling was not the example discussed. Try again!
Feedback: No, supply chain optimization was not the example used to explain data quality. Try again!
Which of the following best describes the importance of using heuristics and baseline models in the
development of machine learning products?
*A: They provide a reference point and streamline the development process.
Feedback: Correct! Heuristics and baseline models provide a reference point and help streamline the
development process.
Feedback: Incorrect. Heuristics and baseline models are more about streamlining the development
process rather than aiding in the final evaluation.
Feedback: This is not correct. While time-saving can be a benefit, their primary importance lies in
providing a reference point and streamlining the process.
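To illustrate the "reference point" idea in the question above, here is a minimal Python sketch. The toy messages, the `keyword_heuristic` rule, and the `majority_baseline` function are all hypothetical, invented for this example and not taken from the course. The sketch only shows how a simple heuristic and a trivial baseline each yield a score that any later machine learning model would need to beat.

```python
from collections import Counter

# Hypothetical toy data: (message text, label) pairs, invented for illustration.
DATA = [
    ("win a free prize now", "spam"),
    ("meeting moved to 3pm", "ham"),
    ("free vacation offer", "spam"),
    ("lunch tomorrow?", "ham"),
    ("project status update", "ham"),
]

def keyword_heuristic(text):
    """Rule-based heuristic: flag a message as spam if it contains 'free'."""
    return "spam" if "free" in text.lower() else "ham"

def majority_baseline(labels):
    """Baseline model: always predict the most common label in the data."""
    most_common_label = Counter(labels).most_common(1)[0][0]
    return lambda text: most_common_label

def accuracy(predict, data):
    """Fraction of examples for which predict(text) matches the label."""
    correct = sum(1 for text, label in data if predict(text) == label)
    return correct / len(data)

labels = [label for _, label in DATA]
heuristic_acc = accuracy(keyword_heuristic, DATA)
baseline_acc = accuracy(majority_baseline(labels), DATA)

print(f"heuristic accuracy: {heuristic_acc:.2f}")
print(f"majority baseline accuracy: {baseline_acc:.2f}")
```

On this toy data the heuristic happens to classify every message correctly, which is an artifact of the invented examples. The point is only that both numbers exist before any model is trained, so later experiments have something to be compared against.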
What is one of the primary responsibilities of leading AI and machine learning development projects?
Feedback: Correct! Ensuring alignment across different functional teams is crucial for the success of ML
projects.
Feedback: Incorrect. Considering the business impact is essential along with model accuracy.
Feedback: Incorrect. Documentation is vital for maintaining and understanding the ML project.
Which of the following are important criteria for finding good opportunities to apply machine learning?
Feedback: Correct! Having large volumes of data is essential for training effective machine learning
models.
B: The problem can be solved with simple heuristics
Feedback: Incorrect. Problems that can be solved with simple heuristics are not ideal for machine
learning.
Feedback: Correct! Clear patterns in the data make it easier for machine learning models to learn and
make predictions.
Feedback: Incorrect. Problems requiring significant domain expertise may not always be suitable for
machine learning.
Which of the following best describes a limitation of using heuristics compared to machine learning?
Feedback: Incorrect. Heuristics typically do not require large amounts of data for training.
Feedback: Correct! Heuristics often lack the flexibility to adapt to new data or changing conditions.
C: Heuristics are generally more computationally intensive than machine learning models.
Feedback: Incorrect. Heuristics are generally less computationally intensive than machine learning
models.
Feedback: Incorrect. Overfitting is more commonly an issue with machine learning models, not
heuristics.
Which of the following is a key criterion for identifying a good opportunity to apply machine learning?
Feedback: Not quite. While investment cost is important, it is not a key criterion for identifying machine
learning opportunities.
Feedback: Incorrect. Limited computational resources can actually hinder the application of machine
learning.
Feedback: No, machine learning is typically used when tasks can be automated and do not rely heavily
on human intuition.
What is one of the primary benefits of using machine learning over heuristics?
Feedback: Correct! Machine learning models can learn from additional data and improve their
performance over time.
Feedback: Incorrect. Heuristics are simpler and less adaptable compared to machine learning when
handling large datasets.
Feedback: Incorrect. Machine learning typically requires more computational resources than heuristics.
Feedback: Incorrect. Machine learning is generally more flexible and adaptable to new data patterns
than heuristics.
Feedback: While timely completion is important, it's not the primary purpose of the data science
process.
Feedback: Communication is crucial but is a secondary benefit of the data science process.
Feedback: Correct! The data science process aims to systematically tackle problems and provide robust
ML solutions.
Feedback: Reducing computational costs can be a consideration, but it is not the primary purpose of the
data science process.
*A: When dealing with a large amount of data from which patterns can be learned.
Feedback: Correct! Machine learning is advantageous when there is a large amount of data to learn
patterns from.
B: When the problem requires real-time decision making with minimal computation.
Feedback: Incorrect. Heuristics are often better suited for real-time decision-making with minimal
computation.
*C: When the relationships in data are too complex for rule-based solutions.
Feedback: Correct! Machine learning excels in finding complex relationships in data that rule-based
solutions may miss.
D: When the solution needs to remain static and unchanged over time.
Feedback: Incorrect. Heuristics are better suited for solutions that need to remain static and unchanged
over time.
Why is it important to gather user feedback during the iterative process of solution design in machine
learning?
Feedback: Correct! User feedback helps in refining the solution to better meet user needs.
Feedback: Incorrect. Finalizing deployment is not the primary reason for gathering feedback.
Which of the following steps is the first in the data science process?
Feedback: Correct! Defining the problem is the first step in the data science process.
B: Collecting data
Feedback: No, collecting data comes after defining the problem. Try again!
C: Cleaning data
Feedback: No, cleaning data is not the first step. It's done after collecting the data. Try again!
D: Building models
Feedback: No, building models happens much later in the data science process. Try again!
What is one of the primary steps involved in the iterative process of solution design when working with
machine learning?
Feedback: Correct! Brainstorming is a crucial initial step in the iterative process of solution design.
Feedback: Not quite. Deploying the final model comes later in the process.
Feedback: Wrong. Skipping the experiment phase can result in unvalidated solutions.
Feedback: Correct! Understanding the business context is essential for framing machine learning
problems.
Feedback: Incorrect. Ignoring data quality is not a key consideration for framing machine learning
problems.
Feedback: Correct! Defining the problem clearly is fundamental for framing machine learning problems.
Feedback: Incorrect. Avoiding stakeholder input is not a key consideration for framing machine learning
problems.
Feedback: Incorrect. Focusing only on model accuracy is not a key consideration for framing machine
learning problems.
Which of the following are key elements in the design of machine learning systems?
Feedback: No, user interface design is important but not a key element in the ML systems design
process.
E: System deployment
Feedback: No, while system deployment is important, it is not considered a key element in the design
phase of ML systems.
Which of the following practices are crucial for validating potential solution ideas in machine learning?
*A: Using baseline models
Feedback: Correct! Baseline models provide a reference point for evaluating more complex models.
B: Ignoring heuristics
Feedback: Incorrect. Heuristics are important for guiding the development process.
Feedback: Correct! Heuristics are useful for making quick, informed decisions.
Identify the activities involved in evaluating the feasibility of using machine learning to solve a problem.
Feedback: Correct! Assessing data quality and availability is crucial in determining if a machine
learning approach is feasible.
Feedback: This is incorrect. Brainstorming solutions is more relevant during the initial stages of solution
design, not specifically for feasibility evaluation.
Feedback: This is not correct. Gathering user feedback is typically part of the iterative design process,
not specifically for feasibility evaluation.
*E: Considering computational resources required.
Feedback: Correct! Considering the computational resources required is essential in evaluating the
feasibility of a machine learning solution.
Which of the following are important considerations when deciding to transition from heuristics to
machine learning?
Feedback: Correct! High-quality and sufficient data is crucial for training effective machine learning
models.
Feedback: Correct! Machine learning is often more suitable for complex problems where simple
heuristics may not suffice.
Feedback: Incorrect. While historical performance might provide insights, it is not a direct consideration
when deciding to transition from heuristics to machine learning.
Feedback: Correct! Machine learning models can often scale better than heuristics, especially for large
datasets and real-time applications.
Feedback: Incorrect. While it's important to have the right expertise, the decision should be based on the
problem and data rather than the team's current familiarity.
Question 24 - numeric
Suppose implementing a machine learning model decreases error rates by 25-30%. What is the
maximum decrease in error rates?
*A: 30.0
Feedback: Correct! The maximum decrease in error rates is 30%.
Default Feedback: Incorrect. Consider the upper bound of the given range.
Question 25 - numeric
*A: 80.0
Feedback: Correct! A significant portion of time in machine learning projects is spent on data
preparation.
Default Feedback: Incorrect. Consider the time typically spent on tasks like data cleaning, formatting,
and preprocessing in machine learning projects.
Identify the term that describes the ability of machine learning to provide customized experiences for
individual users. Please answer in all lowercase.
*A: personalization
Default Feedback: Incorrect. Consider how machine learning can tailor experiences to individual user
preferences.
What is the term for the initial model used to evaluate a machine learning opportunity? Please answer in
all lowercase.
*A: prototype
Feedback: Correct! A prototype is often used to evaluate the feasibility of a machine learning
opportunity.
*B: proof
Feedback: Acceptable. Proof can also be used to refer to the initial evaluation model.
*C: concept
Feedback: Correct! Concept is another term used for the initial model.
Default Feedback: Not quite. Review the steps involved in evaluating machine learning opportunities.
Identify the term used to describe simple, rule-based solutions to problems that do not require learning
from data. Please answer in all lowercase.
*A: heuristics
*B: heuristic
Feedback: Correct! Heuristic is a term used to describe simple, rule-based solutions to problems.
Default Feedback: Incorrect. Please review the concept of rule-based solutions that do not learn from
data.
What is the key term for the initial, simplified version of a machine learning model used to validate
potential solution ideas? Please answer in all lowercase.
*A: baseline
Feedback: Correct! A baseline model is used to validate potential solution ideas in machine learning.
B: benchmark
Feedback: Incorrect. The correct term is not benchmark. Please try again.
Default Feedback: Incorrect. Please review the key terms related to validating potential solution ideas in
machine learning.
What is one essential criterion for a problem to be suitable for machine learning? Please answer in all
lowercase.
*A: data
Feedback: Correct! Having data is essential for training machine learning models.
*B: patterns
Feedback: Correct! Machine learning models need patterns in data to learn and make predictions.
Default Feedback: Incorrect. Please review the essential criteria for applying machine learning.
Question 31 - numeric
Estimate the percentage range of machine learning projects that fail due to poor problem framing.
Feedback: Correct! A significant percentage of machine learning projects fail due to poor problem
framing.
Default Feedback: Incorrect. Please review the statistics on why machine learning projects fail.
Question 32 - numeric
Estimate the percentage of machine learning projects that fail due to poor problem framing.
*A: 70.0
Feedback: Correct! Poor problem framing is a major reason why many machine learning projects fail.
Default Feedback: Incorrect. Consider the common reasons why machine learning projects fail.
Which of the following criteria is important when identifying good opportunities to apply machine
learning?
*A: The presence of a large amount of data
Feedback: Correct! The presence of a large amount of data is crucial for training machine learning
models.
Feedback: Not quite. Simplicity of the problem does not necessarily indicate a good opportunity for
machine learning.
Feedback: Incorrect. The lack of an existing solution does not alone justify the application of machine
learning.
Feedback: Incorrect. High computational cost is a consideration, but not a primary criterion for
identifying good machine learning opportunities.
Which of the following is a key consideration for framing machine learning problems?
Feedback: Correct! Clearly defining the problem is the first step in framing a machine learning problem.
Feedback: Incorrect. While having a large dataset is beneficial, it is not the primary consideration for
framing a problem.
Feedback: Incorrect. Using the latest algorithms is not a key consideration for problem framing.
Feedback: Incorrect. High computational power is more about implementation rather than problem
framing.
Which key element should primarily guide the initial phase of the data science process in an ML
project?
Feedback: Correct! Data collection and exploration are fundamental to understanding the problem space
and guiding subsequent steps.
B: Model deployment
Feedback: Not quite. Model deployment occurs much later in the process. Try to focus on the initial
steps.
Feedback: Incorrect. While important, user interface design is not part of the initial data science process.
D: Hyperparameter tuning
Feedback: Hyperparameter tuning is an advanced step that happens after building the initial models.
Consider what needs to happen first.
Which of the following best describes a scenario where machine learning would be more beneficial than
heuristics?
*A: A situation where a large amount of data is available and patterns need to be identified.
Feedback: Correct! Machine learning excels in scenarios where large datasets are involved and can
identify complex patterns that heuristics might miss.
B: A scenario where the rules to solve the problem are well-defined and do not change.
Feedback: Incorrect. Heuristics are better suited for well-defined and stable problems where the rules are
clear.
C: When the problem requires simple and quick decision-making with minimal data.
Feedback: Incorrect. Heuristics are more appropriate for quick and simple decision-making with
minimal data.
D: A situation where domain expertise is crucial and human judgment is required.
Feedback: Incorrect. Heuristics often rely on domain expertise and human judgment, whereas machine
learning leverages data to make decisions.
Feedback: Correct! Inadequate data quality is a common reason for machine learning project failure.
Feedback: Correct! Overfitting can cause the model to perform poorly on new data.
Feedback: Incorrect. The use of simple algorithms per se is not a reason for failure.
Feedback: Incorrect. High computational cost can be a challenge but not a direct reason for failure.
Feedback: Incorrect. Frequent updates to algorithms do not directly cause project failure.
Which of the following is a best practice when using heuristics in machine learning solution design?
Feedback: Correct! Heuristics should guide the process but not dictate it entirely.
Feedback: Relying solely on intuition can be risky. Balance it with data-driven insights.
Feedback: User feedback is crucial in the application of heuristics. Consider its importance.
Feedback: Great job! Understanding the problem and data is essential for defining a machine learning
problem.
Feedback: Rushing into coding might lead to solving an ill-posed problem. Consider reframing your
approach.
Feedback: Business objectives are crucial in defining machine learning problems. Revisit the importance
of alignment.
Feedback: While model accuracy is important, it's not the only factor in problem framing. Consider
other dimensions.
When should you consider transitioning from heuristics to machine learning in a project?
Feedback: Correct! Machine learning excels at identifying and dealing with complex patterns that
heuristics might miss.
B: When there is no available data to analyze.
Feedback: Incorrect. Machine learning relies on data to train models and make predictions.
Feedback: Not quite. Heuristics are often suitable for rule-based approaches, while machine learning is
more flexible.
Feedback: Incorrect. Machine learning often requires significant computational resources, unlike simple
heuristics.
*A: Machine learning can adapt to new data without human intervention.
Feedback: Correct! Machine learning models can learn from new data and improve over time, offering
flexibility and adaptability.
Feedback: Not quite. Heuristics are often simple rules of thumb and do not provide precise mathematical
models.
Feedback: Incorrect. Machine learning models require data to learn and make predictions.
Feedback: This is not necessarily true as machine learning models can be optimized for efficiency,
depending on the problem and data available.
*A: To ensure the idea meets a real user need and is feasible to implement.
Feedback: Correct! Validation ensures the product idea aligns with user needs and is technically
feasible.
Feedback: This isn't correct. Both technical feasibility and user needs must be considered in product
validation.
Feedback: Not quite. Skipping testing can lead to significant issues later in the development process.
Feedback: Reconsider this. While understanding competitor products is important, the focus should be
on unique user needs and feasibility.
What is an essential criterion for identifying a good opportunity to apply machine learning to a problem?
Feedback: Correct! Machine learning thrives on large datasets as they help improve the model's
accuracy and generalization.
Feedback: Not quite. While some problems may need less data, machine learning generally benefits
from larger datasets.
Feedback: Reconsider this. Machine learning is typically applied when traditional algorithms are
insufficient or inefficient.
Feedback: This is incorrect. Validation and testing are crucial steps in machine learning to ensure
models are effective and reliable.
Feedback: Correct! Defining the problem and project objectives is the crucial first step in the data
science process.
Feedback: Not quite. While collecting and cleaning data is essential, it comes after defining the problem
and objectives.
Feedback: Incorrect. Deployment is one of the final steps in the data science process.
Feedback: Not exactly. Model evaluation occurs after the model has been developed and tested.
Which of the following is an essential factor when framing a problem for a machine learning solution?
Feedback: Correct! Defining clear success metrics is essential for framing ML problems effectively.
Feedback: Not quite. It's more important to focus on problem relevance than the technology itself.
Feedback: Incorrect. Complexity is not a necessity; relevance and clarity are more critical.
Feedback: Not exactly. While data is important, storage maximization is not a primary concern when
framing ML problems.
Feedback: Correct! Without a clear problem definition, projects can easily go off track.
Feedback: While data security is important, it is not typically the primary reason for failure of ML
projects.
Feedback: Correct! Poor quality data can lead to unreliable models and failure of ML projects.
Feedback: Correct! Without proper testing and validation, models may not perform well in real-world
scenarios.
Feedback: Following best practices is usually beneficial, not a reason for failure.
Feedback: Correct! Machine learning can automate many tasks that were previously done manually,
increasing efficiency.
Feedback: Correct! Predictive analytics is a key benefit of machine learning, helping businesses
anticipate future outcomes.
Feedback: Incorrect. While machine learning can improve accuracy, 100% accuracy is rarely possible.
Feedback: Incorrect. Machine learning often requires human oversight and interpretation of results.
If a machine learning model's accuracy improves from 70% to 85%, what is the increase in accuracy, in
percentage points?
*A: 15.0
Feedback: Correct! You've accurately calculated the improvement in the model's accuracy.
Default Feedback: Remember to calculate the difference between the two percentages to find the
increase.
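The expected answer of 15 is the difference between the two percentages, as the default feedback states. A relative increase is a different quantity, and the short Python sketch below computes both so the distinction is explicit. The function names are hypothetical, chosen only for this illustration.

```python
def percentage_point_change(old_pct, new_pct):
    """Absolute change: the plain difference between two percentages."""
    return new_pct - old_pct

def relative_change_pct(old_pct, new_pct):
    """Relative change: the difference expressed as a percent of the old value."""
    return (new_pct - old_pct) / old_pct * 100.0

points = percentage_point_change(70.0, 85.0)
relative = relative_change_pct(70.0, 85.0)

print(f"percentage-point increase: {points:.1f}")
print(f"relative increase: {relative:.2f}%")
```

The first value, 15.0, matches the quiz answer. The second, roughly 21.43%, is what "percentage increase" would mean if read as a relative change.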
Which of the following best describes the importance of iteration in managing machine learning
projects?
*A: Iteration allows for continuous improvement and refinement of the model.
Feedback: Correct! Iteration is crucial for continuous improvement and refinement of the model.
Feedback: Incorrect. Iteration is more about ongoing improvement rather than initial problem definition.
Feedback: Incorrect. Iteration does not reduce the need for documentation; in fact, it often increases it.
D: Iteration eliminates the need for ongoing support after project completion.
Feedback: Incorrect. Iteration does not eliminate the need for ongoing support after project completion.
Why is it important to have a proper process in place for machine learning projects?
*A: It ensures the project stays on track and meets its goals.
Feedback: Correct! A proper process helps in maintaining the direction and achieving the objectives of
the project.
Feedback: Incorrect. A proper process does not reduce the need for skilled personnel; it complements
their skills.
Feedback: Incorrect. While a proper process helps in planning, it does not guarantee timely completion.
Feedback: Incorrect. A proper process can mitigate risks but not eliminate them entirely.
Which of the following statements describe the iterative nature of machine learning projects?
Feedback: Incorrect. Machine learning projects are dynamic and models often need to be updated.
Feedback: Incorrect. Machine learning projects are less linear and more iterative in nature.
Which of the following is a key difference between machine learning projects and traditional software
projects?
Feedback: Correct! Machine learning projects generally involve higher technical risks compared to
traditional software projects.
Feedback: Incorrect. Machine learning projects often have more complex requirements.
Feedback: Incorrect. Machine learning projects typically require more ongoing support after
deployment.
Feedback: Incorrect. Machine learning projects are usually more iterative in nature compared to
traditional software projects.
Which of the following best explains why having a proper process, like CRISP-DM, is crucial for
machine learning projects?
*A: It ensures that all aspects of the project are systematically addressed.
Feedback: Correct! A proper process ensures that every step of the project is carefully handled, leading
to better outcomes.
Feedback: Incorrect. While efficiency is important, bypassing steps can lead to incomplete or flawed
results.
Feedback: Incorrect. While a proper process increases the likelihood of success, it does not guarantee
perfection.
D: It allows the team to skip the data cleaning phase.
Feedback: Incorrect. Data cleaning is a critical phase in the CRISP-DM process that cannot be skipped.
Which phase in the CRISP-DM process involves transforming raw data into a format suitable for
modeling?
Feedback: Correct! Data Preparation involves transforming raw data into a format suitable for modeling.
B: Data Understanding
Feedback: Incorrect. Data Understanding involves exploring the data and identifying patterns.
C: Modeling
Feedback: Incorrect. Modeling involves selecting and applying various modeling techniques.
D: Evaluation
Feedback: Incorrect. Evaluation involves assessing the models to ensure they meet the business
objectives.
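As a rough illustration of what "transforming raw data into a format suitable for modeling" can look like in the Data Preparation phase, here is a minimal Python sketch. The records, the field names, and the choices of mean imputation and a 0/1 flag for the plan are assumptions made for this example; real projects choose cleaning rules based on their own data.

```python
# Hypothetical raw records as they might arrive from a CSV export:
# every value is a string and some values are missing.
RAW_ROWS = [
    {"age": "34", "plan": "basic", "monthly_spend": "20.5"},
    {"age": "", "plan": "premium", "monthly_spend": "99.0"},
    {"age": "51", "plan": "basic", "monthly_spend": ""},
]

def prepare(rows):
    """Clean and transform raw string records into numeric feature rows."""

    def mean_of(field):
        # Mean over the rows where the field is present.
        values = [float(r[field]) for r in rows if r[field] != ""]
        return sum(values) / len(values)

    age_mean = mean_of("age")
    spend_mean = mean_of("monthly_spend")

    prepared = []
    for r in rows:
        # Cleaning: fill missing numeric values with the column mean.
        age = float(r["age"]) if r["age"] != "" else age_mean
        spend = float(r["monthly_spend"]) if r["monthly_spend"] != "" else spend_mean
        # Transformation: encode the categorical plan as a 0/1 flag.
        is_premium = 1.0 if r["plan"] == "premium" else 0.0
        prepared.append([age, spend, is_premium])
    return prepared

features = prepare(RAW_ROWS)
for row in features:
    print(row)
```

Each output row is a list of numbers in the order age, monthly spend, premium flag, which is the kind of input most modeling libraries expect.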
What are some of the reasons behind the difficulty in managing machine learning projects?
Feedback: Correct! Managing machine learning projects often involves a diverse skill set.
Feedback: Incorrect. Machine learning projects generally have a higher degree of technical risk.
D: The straightforward nature of the work.
Feedback: Incorrect. The iterative and complex nature of machine learning work makes it challenging.
In a machine learning project team, who is typically responsible for deploying the model into
production?
*A: ML Engineer
Feedback: Correct! The ML Engineer is typically responsible for deploying the model into production.
B: Data Analyst
Feedback: Incorrect. The Data Analyst is more focused on analyzing data and generating insights.
C: Product Owner
Feedback: No. The Product Owner is responsible for maximizing the value of the product and may not
handle model deployment.
D: Business Analyst
Feedback: Not quite. The Business Analyst focuses on understanding business needs and requirements.
Which of the following statements are true about the iterative nature of machine learning projects?
*A: Machine learning projects often require multiple cycles of training and evaluation.
Feedback: Correct! Iteration is key in refining models and improving project outcomes.
Feedback: Correct! Early error detection and correction are significant benefits of an iterative approach.
D: Iteration in machine learning is unnecessary if the data is clean.
Feedback: Incorrect. Even with clean data, iteration is crucial for model improvement.
What is the main objective of the CRISP-DM process in a machine learning project?
*A: To provide a structured approach to planning and conducting data mining projects.
Feedback: Correct! The CRISP-DM process offers a structured framework for data mining projects.
Feedback: Incorrect. CRISP-DM does not eliminate the need for data preprocessing.
Which role is responsible for ensuring that the machine learning model aligns with business objectives?
A: Business Analyst
Feedback: Incorrect. While Business Analysts help interpret data, they do not ensure that the model
aligns with business objectives.
B: Data Scientist
Feedback: Incorrect. Data Scientists develop the model but do not necessarily ensure its alignment with
business objectives.
Feedback: Correct! The Product Owner ensures that the machine learning model aligns with business
objectives.
D: Software Engineer
Feedback: Incorrect. Software Engineers implement the model but are not responsible for ensuring its
alignment with business objectives.
Which phase of the CRISP-DM process involves data cleaning and transformation?
Feedback: Correct! The Data Preparation phase includes data cleaning and transformation to prepare the
data for modeling.
B: Business Understanding
Feedback: Incorrect. The Business Understanding phase focuses on understanding the project objectives
and requirements from a business perspective.
C: Modeling
Feedback: Incorrect. The Modeling phase involves selecting and applying various modeling techniques
to the prepared data.
D: Evaluation
Feedback: Incorrect. The Evaluation phase assesses the model to ensure it meets the business objectives.
What is the primary purpose of following the CRISP-DM process in machine learning projects?
Feedback: Correct! Following the CRISP-DM process provides a standardized approach to data mining,
which helps in maintaining consistency and quality throughout the project.
Feedback: Incorrect. While accuracy is important, the CRISP-DM process focuses on standardizing the
approach rather than guaranteeing model accuracy.
C: To reduce the computational cost of algorithms.
Feedback: Incorrect. The CRISP-DM process does not primarily focus on reducing computational costs
but rather on standardizing the data mining process.
Feedback: Incorrect. Data preprocessing is a crucial step within the CRISP-DM process itself.
Which tool is commonly used for version control in machine learning projects?
*A: Git
Feedback: Correct! Git is a widely used version control system that helps manage changes to code and
other project files.
B: JIRA
Feedback: Incorrect. JIRA is mainly used for project management and tracking tasks.
C: Docker
D: Kubernetes
Feedback: Incorrect. Kubernetes is used for container orchestration, not version control.
Which of the following is a key difference between machine learning projects and traditional software
projects?
Feedback: Correct! Machine learning projects often involve a higher degree of technical risk due to the
complexity and unpredictability of the models.
Feedback: Incorrect. Machine learning projects may require a broader set of skills, leading to the
involvement of more team members, not fewer.
Feedback: Incorrect. Machine learning projects generally require more ongoing support after
deployment to ensure models remain accurate over time.
Which team member is generally responsible for data preprocessing in a machine learning project?
Feedback: Correct! Data Engineers are typically in charge of data preprocessing to ensure the data is
ready for analysis and model training.
B: Project Manager
Feedback: Incorrect. Project Managers oversee the entire project but are not typically responsible for
data preprocessing.
C: Product Owner
Feedback: Incorrect. Product Owners define the business requirements and priorities but are not
involved in data preprocessing.
D: Business Analyst
Feedback: Incorrect. Business Analysts help interpret data and provide insights but do not usually
handle data preprocessing.
Feedback: Correct! The iterative nature of the work is a significant challenge in managing machine
learning projects.
Feedback: Incorrect. Managing machine learning projects requires a broad set of skills.
Feedback: Correct! Ongoing support after project completion is another challenge in managing machine
learning projects.
Which of the following is a key component of the business understanding phase in managing machine
learning projects?
Feedback: Correct! Problem definition is a crucial component in the business understanding phase.
B: Data collection
Feedback: Incorrect. Data collection is important, but it belongs to the later data understanding phase.
C: Model tuning
Feedback: Incorrect. Model tuning occurs later in the process, not in the business understanding phase.
D: Algorithm selection
Feedback: Incorrect. Algorithm selection is part of the modeling phase, not the business understanding
phase.
In the CRISP-DM process, which phase involves assessing the performance and validity of the model?
*A: Evaluation
Feedback: Correct! The evaluation phase is where the model's performance and validity are thoroughly
assessed.
B: Modeling
Feedback: Incorrect. The modeling phase involves building the model, not assessing its performance.
C: Deployment
Feedback: Incorrect. The deployment phase involves putting the model into production, not assessing its
performance.
D: Data Preparation
Feedback: Incorrect. The data preparation phase involves cleaning and transforming data, not assessing
model performance.
Which of the following skills is often required for managing machine learning projects but not typically
for traditional software projects?
Feedback: Correct! Data engineering is crucial for managing machine learning projects due to the need
for handling large datasets.
Feedback: Incorrect. User interface design is important but not exclusive to machine learning projects.
Consider the unique roles in ML projects.
C: System administration
Feedback: Incorrect. System administration is important for maintaining IT infrastructure but not unique
to ML projects.
D: Customer support
Feedback: Incorrect. Customer support is essential for all types of projects but not a unique requirement
for ML projects.
Question 69 - multiple choice, shuffle
Which team member is responsible for ensuring that the machine learning model aligns with business
goals and delivers value?
Feedback: Correct! The Product Manager ensures that the model aligns with business goals and delivers
value.
B: Data Engineer
Feedback: Not quite. The Data Engineer is responsible for data infrastructure and pipelines.
C: ML Engineer
Feedback: Incorrect. The ML Engineer focuses on designing and implementing the machine learning
model itself.
D: Data Scientist
Feedback: No. While the Data Scientist builds and tests models, ensuring alignment with business goals
is the Product Manager's responsibility.
Which roles are typically involved in a machine learning project's lifecycle? Select all that apply.
Feedback: Correct! Data Scientists are crucial for developing models and analyzing data.
B: UI/UX Designer
Feedback: Incorrect. UI/UX Designers are generally not involved in machine learning projects.
Feedback: Correct! Project Managers oversee the project's progress and ensure it meets business
objectives.
E: Content Writer
Feedback: Incorrect. Content Writers are generally not part of a machine learning project team.
Which of the following are important tools for collaboration in machine learning projects? Select all that
apply.
Feedback: Correct! Version control systems like Git are essential for managing changes and
collaboration.
Feedback: Correct! IDEs facilitate coding and debugging, and some also support real-time collaboration.
Feedback: Incorrect. ERP systems are used for business management, not specifically for collaboration
in machine learning projects.
Feedback: Correct! Communication platforms like Slack or Teams are crucial for effective
collaboration.
Feedback: Incorrect. While important for ensuring code quality, automated testing tools are not
primarily used for collaboration.
Question 72 - numeric
How many steps are there in the CRISP-DM process in data science and machine learning?
*A: 6.0
Feedback: Correct! The CRISP-DM process consists of six steps.
Default Feedback: Incorrect. Please refer to the course material on the steps in the CRISP-DM process.
Question 73 - numeric
During a machine learning project, how many key phases are there in the iterative process of model
development?
*A: 5.0
Feedback: Correct! There are five key phases in the iterative process of model development.
Default Feedback: Incorrect. Please review the course material on the iterative nature of machine
learning projects.
Question 74 - numeric
In managing machine learning projects, what is the typical range of weeks required for the iterative
model training and evaluation phase?
Feedback: Correct! The iterative model training and evaluation phase typically takes between 2 and 12
weeks, depending on project complexity.
Default Feedback: Incorrect. The iterative model training and evaluation phase usually spans several
weeks, requiring ongoing adjustments and evaluations.
Question 75 - numeric
*A: 6.0
Default Feedback: Incorrect. Please refer to the CRISP-DM process for the number of phases.
Question 76 - numeric
Question category: Module: Organizing ML Projects
During the evaluation phase of a machine learning project, a team decides to track the precision metric.
If the number of true positives is 75 and the number of false positives is 25, what is the precision value?
*A: 0.75
Feedback: Correct! Precision is calculated as the number of true positives divided by the sum of true
positives and false positives.
Default Feedback: Incorrect. Recall that precision is the number of true positives divided by the sum of
true positives and false positives.
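The arithmetic in this question can be checked with a short sketch. The function name is an assumption for illustration; the counts are the ones given in the question.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP)."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        # With no positive predictions the ratio is undefined.
        raise ValueError("precision is undefined without positive predictions")
    return true_positives / predicted_positives


result = precision(75, 25)
print(result)  # 0.75
```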
Question 77 - numeric
During model evaluation, a model correctly classified 85 out of 100 samples. What accuracy, in percent,
did the model achieve?
*A: 85.0
Feedback: Correct! The accuracy is 85% when 85 out of 100 samples are correctly classified.
Default Feedback: Incorrect. Remember to calculate the percentage of correctly classified samples based
on the total number of samples.
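The same calculation as a minimal sketch, assuming accuracy is reported as a percentage of all samples. The function name is hypothetical.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage: 100 * correct / total."""
    if total <= 0:
        raise ValueError("total must be a positive number of samples")
    return 100.0 * correct / total


print(accuracy_percent(85, 100))  # 85.0
```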
What is the first phase of the CRISP-DM process? Please answer in all lowercase.
*A: business
Feedback: Correct! The first phase of the CRISP-DM process is Business Understanding.
*B: businessunderstanding
Feedback: Correct! The first phase of the CRISP-DM process is Business Understanding.
*A: iterative
Feedback: Correct! Machine learning projects are iterative, meaning they involve repeated cycles of
improvement.
*B: adaptive
Feedback: Correct! Machine learning projects are adaptive, meaning they adjust based on new data and
findings.
Default Feedback: Think about how machine learning projects handle changes and improvements
compared to traditional software projects.
Question 80 - numeric
In a machine learning project, how often should performance metrics be tracked during the model
development phase? (Provide your answer in weeks)
*A: 2.0
Feedback: Correct! Tracking performance metrics every 2 weeks helps monitor progress and make
necessary adjustments.
Default Feedback: Incorrect. Consider how frequently you need to monitor progress to make iterative
improvements.
Which of the following is a key challenge associated with managing machine learning projects?
Feedback: Correct! Managing machine learning projects often involves iterative processes which require
continuous refinement and adjustments.
Feedback: Incorrect. Ongoing support for ML projects is often complex and challenging.
Which of the following roles is typically involved at every stage of a machine learning project life
cycle?
Feedback: Correct! The Project Manager is typically involved at every stage of the machine learning
project life cycle, ensuring that the project stays on track and meets its goals.
B: Data Engineer
Feedback: Not quite. Data Engineers are usually more involved during the data collection and
preprocessing stages.
Feedback: Incorrect. Machine Learning Researchers are primarily involved during the model
development and evaluation stages.
D: Business Analyst
Feedback: No. While Business Analysts are important, they are usually more involved in the initial
stages of defining the problem and the final stages of evaluating business outcomes.
What is the main advantage of following the CRISP-DM process in a machine learning project?
*A: It provides a structured approach that ensures comprehensive project planning and execution.
Feedback: Correct! The CRISP-DM process offers a structured approach to project planning and
execution, which is essential for the success of machine learning projects.
B: It guarantees the highest accuracy for the machine learning model.
Feedback: Not quite. While the CRISP-DM process helps in planning and execution, it does not
guarantee the highest accuracy for the model. Accuracy depends on various factors including data
quality and model choice.
Feedback: Incorrect. Data preprocessing is a crucial step in the CRISP-DM process and cannot be
eliminated.
Feedback: Incorrect. The CRISP-DM process does not directly affect the computational resources
required for model training.
Select all the performance metrics that are commonly tracked throughout a machine learning project.
*A: Accuracy
Feedback: Correct! Accuracy is a common performance metric tracked in machine learning projects.
*B: Precision
Feedback: Correct! Precision is also commonly tracked to understand the performance of a model.
*C: Recall
D: User Engagement
Feedback: Incorrect. User Engagement is a business metric rather than a direct performance metric in
machine learning.
E: Revenue
Feedback: No. Revenue is an outcome metric and is not directly used to track the performance of a
machine learning model.
Which of the following statements are true about the iterative nature of machine learning projects?
*A: Machine learning projects often require revisiting and refining earlier steps.
Feedback: Correct! Machine learning projects are iterative and often require revisiting and refining
earlier steps.
Feedback: Incorrect. Even after deployment, models may require adjustments based on new data and
performance monitoring.
*C: The iterative process helps in improving model performance over time.
Feedback: Correct! The iterative process allows for continuous improvement of the model based on
feedback and new data.
D: Iterative processes are only necessary during the initial phases of the project.
Feedback: Incorrect. Iterative processes are necessary throughout the lifecycle of the project, not just in
the initial phases.
E: Iteration is optional and can be skipped if the initial model performs well.
Feedback: Incorrect. Iteration is a crucial aspect of machine learning projects and should not be skipped,
even if the initial model performs well.
What is the acronym for the process model commonly used in data mining and machine learning
projects? Please answer in all lowercase.
*A: crisp-dm
Feedback: Correct! CRISP-DM stands for Cross Industry Standard Process for Data Mining.
*B: crispdm
Feedback: Correct! CRISP-DM stands for Cross Industry Standard Process for Data Mining.
*C: crisp_dm
Feedback: Correct! CRISP-DM stands for Cross Industry Standard Process for Data Mining.
Default Feedback: Incorrect. Please review the course material on the process models used in data
mining and machine learning projects.
What is the main purpose of the CRISP-DM process in machine learning projects?
Feedback: Correct! The CRISP-DM process helps in organizing and structuring data mining projects.
Feedback: Not quite. The CRISP-DM process focuses on structuring data mining projects, not coding
guidelines.
Feedback: Incorrect. CRISP-DM is a process for data mining, not a replacement for software
development methodologies.
Feedback: Correct! Iterative processes facilitate continuous improvement and adaptation in projects.
Feedback: Not quite. Iterative processes are about gradual improvement, not perfection from the start.
Feedback: Incorrect. Iterative processes are flexible and allow changes even after a phase is completed.
D: They strictly follow a linear path without loops.
Feedback: This isn't right. Iterative processes are characterized by loops and refinement, not a strict
linear path.
In a machine learning project team, which role is primarily responsible for maintaining the infrastructure
and updating the environment?
A: Data Scientist
Feedback: Data scientists focus on model development and data analysis, not infrastructure.
Feedback: Correct! DevOps engineers manage the infrastructure and ensure smooth operations.
C: Product Manager
Feedback: Product managers focus on project vision and requirements, not infrastructure.
D: UX Designer
What differentiates outcome metrics from output metrics in a machine learning project?
*A: Outcome metrics measure the impact on business goals, while output metrics measure the technical
performance of the model.
Feedback: Correct! Outcome metrics relate to business impact, while output metrics focus on technical
performance.
D: Outcome metrics require external validation, while output metrics are internally validated.
Feedback: Validation isn't the sole differentiator between outcome and output metrics.
Which of the following is a key reason for the difficulty in managing machine learning projects?
Feedback: Good try! While scope can be challenging, it's more about the nature of technological risk.
Feedback: Correct! Machine learning projects come with inherent technical risks that make them
difficult to manage.
Feedback: Not quite. Ongoing support is crucial for the success of machine learning projects.
Feedback: Automated processes are part of the work, but human intervention and understanding are
crucial.
Which of the following is crucial for the success of machine learning projects in terms of ongoing
processes?
Feedback: Correct! Iteration and documentation are crucial for machine learning projects.
Which of the following is a key component in the business understanding phase of a machine learning
project?
Feedback: Correct! Defining the problem statement is crucial for setting the direction of the project.
Feedback: Not quite. Algorithm selection comes later in the modeling phase.
Feedback: Not quite. Data gathering and cleaning belong to the later data understanding and data
preparation phases.
Feedback: Not quite. Model evaluation occurs after modeling and testing.
Select all the steps that are part of the CRISP-DM process.
Feedback: Correct! Understanding the data is a crucial step in the CRISP-DM process.
B: Data warehousing
Feedback: Incorrect. Data warehousing is not a specific step in the CRISP-DM process.
*C: Evaluation
Feedback: Correct! Deployment is the final step where the model is put into use.
E: System integration
Feedback: Not quite. System integration is not explicitly part of the CRISP-DM process.
Which of the following are considered performance metrics in a machine learning project?
*A: Precision
Feedback: Correct! Precision is a performance metric that measures the accuracy of positive predictions.
B: User Satisfaction
*C: Recall
Feedback: Correct! Recall measures the ability of a model to find all the relevant cases.
D: Financial Cost
Feedback: Financial cost is typically an outcome metric and not a direct performance metric.
*E: F1 Score
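For reference, the three metrics marked correct in this question can be computed from confusion-matrix counts. This is a minimal sketch: the counts are hypothetical and the function name is an assumption. The F1 score is taken as the harmonic mean of precision and recall.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    if tp == 0:
        # Sketch-level simplification: with no true positives, report all three as 0.0.
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)   # share of positive predictions that are right
    recall = tp / (tp + fn)      # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical counts: 60 true positives, 20 false positives, 40 false negatives.
p, r, f1 = precision_recall_f1(tp=60, fp=20, fn=40)
print(round(p, 3), round(r, 3), round(f1, 3))
```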
Which of the following are challenges associated with managing machine learning projects? Select all
that apply.
Feedback: Correct! Technical risks are a significant challenge in machine learning projects.
Which of the following challenges are associated with managing machine learning projects?
Feedback: Correct! Managing machine learning projects involves a higher degree of technical risk
compared to traditional software projects.
Feedback: Correct! The iterative nature of machine learning work is a significant challenge in project
management.
Feedback: Not quite. Machine learning projects require extensive documentation throughout the project
lifecycle.
Feedback: Not quite. Successful machine learning projects require a broader set of skills beyond just
data scientists.
Feedback: Correct! Ongoing support is necessary after project completion to ensure models remain
accurate and relevant.
Question 98 - numeric, medium
If a machine learning model goes through three iterations in a project, and each iteration improves
accuracy by 5%, starting from 60%, what will the final accuracy be?
Feedback: Great job! You correctly calculated the final accuracy after three iterations.
Default Feedback: It seems there was a miscalculation. Consider how compounding improvements work
in iterative processes.
Feedback: Correct! Collecting sufficient data helps the model capture the underlying patterns in the data.
Feedback: Incorrect. While data cleaning is important, collecting sufficient data is more about capturing
patterns.
Feedback: Incorrect. Collecting more data does not necessarily reduce model complexity.
Feedback: Incorrect. Collecting more data does not necessarily minimize noise; it helps in capturing
underlying patterns.
Which of the following is an essential factor to consider when evaluating data needs for a specific
machine learning project?
Feedback: Incorrect. The color of data visualization tools is not a factor in evaluating data needs.
Feedback: Incorrect. The physical location of a data scientist does not impact data needs for a project.
Feedback: Incorrect. The brand of computer used does not affect the evaluation of data needs.
What is a key benefit of collecting data with the right set of features for a Machine Learning project?
Feedback: Correct! Having the right set of features can significantly improve the accuracy of the model.
Feedback: Incorrect. The right features help in model performance but do not directly reduce the dataset
size.
Feedback: Not exactly. While important, the right set of features doesn't directly simplify data cleaning.
Feedback: Incorrect. While good features can help with visualization, the primary benefit is improving
model performance.
Which of the following are strategies to collect data to support modeling efforts?
B: Random guessing
Feedback: Correct! User surveys can provide valuable data for modeling.
D: Ignoring outliers
Feedback: Incorrect. Ignoring outliers is related to data cleaning, not data collection.
*A: Imputation
Feedback: Correct! Imputation is a common method for handling missing data by filling in missing
values.
B: Data encryption
Feedback: No, data encryption is used for securing data, not handling missing data.
C: Data sorting
Feedback: Incorrect. Data sorting does not address missing data issues.
D: Feature scaling
Feedback: No, feature scaling is used for normalizing data, not handling missing data.
Feedback: Incorrect. While using the latest hardware and software might improve performance, it does
not ensure reproducibility.
Feedback: Incorrect. Complex model architectures are not inherently related to reproducibility.
Feedback: Incorrect. Reproducibility does not reduce the need for data preprocessing.
Feedback: Correct! Ensuring data is representative helps avoid biases and ensures the model's
predictions are fair across different segments.
Feedback: Not quite. While important, the representativeness of data isn't directly related to the speed of
data processing.
Feedback: Incorrect. Ensuring data is representative does not directly affect the storage requirements.
Feedback: No, simplifying algorithms is not a direct result of having representative data.
Which of the following is a challenge when using user data in recommendation systems?
*A: Data privacy concerns
Feedback: Correct! Data privacy concerns are a significant challenge when using user data in
recommendation systems.
Feedback: Not quite. While storage costs can be an issue, they are not the primary challenge in
recommendation systems related to user data.
Feedback: Incorrect. Inconsistent data formats can be a challenge, but they are not the biggest concern
when using user data in recommendation systems.
Feedback: Not quite. The lack of data variety is a concern, but not the main challenge when it comes to
user data in recommendation systems.
Feedback: Correct! Surveys and questionnaires are common methods of collecting data for modeling
efforts.
Feedback: Incorrect. Privacy concerns should always be addressed when collecting data.
Feedback: Incorrect. While synthetic data can be useful, relying solely on it can limit the model's
applicability to real-world scenarios.
Which of the following is a best practice for collecting data in a machine learning project?
Feedback: Correct! Ensuring data is representative of the target population helps improve the model's
performance and generalizability.
Feedback: Incorrect. Quality is crucial in data collection, not just quantity. Poor quality data can
negatively impact your model.
Feedback: Incorrect. Convenience sampling can lead to biased data. It's important to use diverse and
relevant data sources.
Feedback: Incorrect. Outliers can contain important information and should be carefully considered, not
ignored.
Which method is used to handle missing data by filling in with the mean, median, or mode of the
column?
*A: Imputation
Feedback: Correct! Imputation involves filling in missing data with statistical values like mean, median,
or mode.
B: Reduction
Feedback: Incorrect. Data reduction doesn't refer to handling missing data this way.
C: Aggregation
Feedback: Incorrect. Aggregation involves combining data, not filling in missing values.
D: Transformation
Feedback: Incorrect. Transformation involves changing the data format or structure, not addressing
missing values.
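Imputation as described in this question can be sketched with the standard library alone. The column values and the function name are hypothetical, and missing entries are assumed to be represented as None.

```python
from statistics import mean, median, mode


def impute(column, strategy="mean"):
    """Fill None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    strategies = {"mean": mean, "median": median, "mode": mode}
    fill_value = strategies[strategy](observed)
    # Keep observed values as they are; replace only the missing ones.
    return [fill_value if v is None else v for v in column]


ages = [20, None, 30, 30, None, 40]  # hypothetical column with two gaps
print(impute(ages, "mean"))
print(impute(ages, "median"))
print(impute(ages, "mode"))
```

In this example all three strategies fill the gaps with 30, because the observed values are symmetric around it. On skewed data they would differ.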
What challenge is often faced when using user data in recommendation systems?
Feedback: Correct! Privacy concerns are a major challenge when using user data.
Feedback: Incorrect. While important, the availability of high-performance computers is not specific to
user data in recommendation systems.
Feedback: Incorrect. Access to large datasets is a broader challenge, not specific to user data in
recommendation systems.
Feedback: Incorrect. The choice of programming language is not a primary challenge in using user data
in recommendation systems.
Which of the following strategies is most effective for collecting high-quality data for machine learning
models?
A: Crowdsourcing
Feedback: Incorrect. Crowdsourcing can provide a lot of data, but the quality may vary.
B: Web scraping
Feedback: Incorrect. While web scraping can collect large amounts of data, the quality and reliability
can be questionable.
C: Data augmentation
Feedback: Incorrect. Data augmentation is used to increase the amount of data but not necessarily to
improve its quality.
Feedback: Correct! Manual data collection typically ensures high-quality, reliable data for machine
learning models.
Which of the following is a common challenge faced by data scientists when accessing data within
larger organizations?
Feedback: Incorrect. While important, lack of visualization tools is not the main challenge for data
access.
C: Budget constraints
Feedback: Incorrect. Budget constraints can be a challenge, but they are not specific to data access.
Feedback: Incorrect. The availability of machine learning algorithms is not the primary challenge in
accessing data.
What is a common challenge data scientists face when accessing data within larger organizations?
Feedback: Correct! Data silos restrict access to data within different parts of an organization.
Feedback: Incorrect. Inadequate data visualization tools can be an issue, but data silos are a more
fundamental challenge.
Feedback: No. While cloud storage is important, data silos are a more prevalent challenge in larger
organizations.
Which of the following is crucial for maintaining version control in a machine learning project?
Feedback: Correct! Using version control systems like Git is essential for tracking changes and
collaborating effectively.
Feedback: Incorrect. While updating the dataset is important, it doesn't help with version control.
Feedback: Incorrect. Hyperparameter tuning optimizes model performance but isn't related to version
control.
Feedback: Incorrect. Advanced hardware speeds up training but does not assist in maintaining version
control.
Why is it important to ensure that data used in a Machine Learning project is representative of the
problem domain?
Feedback: Incorrect. Ensuring data representativeness does not necessarily minimize the amount of data
needed for training.
Which of the following sources is most commonly used for collecting training data in a machine
learning project?
Feedback: Correct! Public datasets are commonly used as they are readily available and cover a wide
range of topics.
B: Personal anecdotes
Feedback: Incorrect. Personal anecdotes are not typically used as they lack the necessary scale and
objectivity.
C: Books
Feedback: Incorrect. Books are more useful for theoretical knowledge than for providing training
data.
D: Manual measurements
Feedback: Incorrect. Manual measurements are less common due to being time-consuming and prone to
human error.
Which of the following practices are important for ensuring reproducibility in a Machine Learning
project?
Feedback: Correct! Versioning the dataset helps in keeping track of changes and ensures reproducibility.
Feedback: Correct! Collaboration tools facilitate team communication and ensure everyone is on the
same page, aiding reproducibility.
Feedback: Incorrect. Ignoring data quality can lead to inconsistent results, undermining reproducibility.
Feedback: Correct! Good code documentation is essential for reproducibility as it helps others
understand and reproduce the results.
Feedback: This is partially correct but not directly related to ensuring reproducibility.
What are some best practices for ensuring reproducibility in machine learning projects?
Feedback: Correct! Proper documentation helps in tracking changes and understanding the workflow.
Feedback: Correct! Data lineage allows you to trace the origin and transformation of data.
C: Random testing
Feedback: Incorrect. Random testing is not a best practice for ensuring reproducibility.
*D: Versioning
Feedback: Correct! Versioning helps in managing different stages and iterations of the project.
Feedback: No, frequent team meetings are good for communication but do not directly ensure
reproducibility.
Identify the different types of data that are needed for a machine learning project.
Feedback: Correct! Training data is essential for building machine learning models.
Feedback: Correct! Testing data is used to evaluate the performance of machine learning models.
C: Backup data
Feedback: Incorrect. Backup data is not a specific type of data needed for machine learning projects.
E: Noise data
Feedback: Incorrect. Noise data is not a type of data needed for machine learning projects.
Which of the following practices are crucial for maintaining reproducibility in a Machine Learning
project?
Feedback: Correct! Versioning datasets and code is essential for ensuring that experiments can be
replicated.
Feedback: Correct! Collaborative tools facilitate communication and ensure that everyone is on the same
page, enhancing reproducibility.
Feedback: Incorrect. While updating models is important, it is not directly related to reproducibility.
Feedback: Correct! Thorough documentation is vital for reproducing experiments and understanding
their outcomes.
Which of the following are best practices for ensuring reproducibility in machine learning projects?
Feedback: Correct! Tracking data lineage helps in understanding the origin and changes to the data.
Feedback: Incorrect. Using proprietary software can hinder reproducibility because it may not be
accessible to all.
*D: Versioning
Feedback: Correct! Versioning ensures that different versions of data and code are tracked and
reproducible.
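One way to make data versioning concrete is to record a content fingerprint of the dataset alongside each experiment. This is a minimal sketch, assuming a SHA-256 hash of the raw bytes is an acceptable version identifier. The function name and sample data are hypothetical, and dedicated data versioning tools go well beyond this.

```python
import hashlib


def dataset_fingerprint(raw_bytes: bytes) -> str:
    """Return a SHA-256 hex digest that changes whenever the data changes."""
    return hashlib.sha256(raw_bytes).hexdigest()


# Two hypothetical versions of a tiny CSV dataset that differ in one label.
v1 = dataset_fingerprint(b"id,label\n1,cat\n2,dog\n")
v2 = dataset_fingerprint(b"id,label\n1,cat\n2,bird\n")
print(v1 == v2)  # False: any edit to the data yields a new fingerprint
```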
Which of the following are best practices for ensuring reproducibility in machine learning projects?
E: Ignoring outliers
Select all practices that are important for ensuring data quality in a Machine Learning project.
Feedback: Correct! Cleaning the data helps in removing anomalies that can affect the model's
performance.
Feedback: Correct! Collecting data from multiple sources can provide a more comprehensive dataset.
Feedback: Incorrect. Ignoring missing values can lead to inaccurate models. Missing data should be
handled appropriately.
Feedback: Incorrect. Using outdated data can lead to models that do not reflect current trends or
patterns.
If you have a dataset with 500 records and 30% of them are deemed to be noisy or incorrect, how many
records are considered good quality?
*A: 350.0
Feedback: Correct! If 30% of 500 records are noisy or incorrect, then 70% of them are good quality,
which is 350 records.
Default Feedback: Incorrect. Consider reviewing how to calculate the percentage of good quality records
from the total dataset.
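The calculation can be sketched with integer arithmetic, which avoids floating-point rounding. The function name is an assumption; the numbers come from the question.

```python
def good_quality_records(total: int, noisy_percent: int) -> int:
    """Records left after removing the given percentage of noisy ones."""
    noisy = total * noisy_percent // 100  # 500 * 30 // 100 = 150
    return total - noisy


print(good_quality_records(500, 30))  # 350
```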
What is the term used to describe the input variables in a machine learning model? Please answer in all
lowercase.
*A: features
*B: predictors
Default Feedback: Incorrect. Please review the terms used for input variables in machine learning
models.
What is the term used to describe the practice of tracing the origin and transformations applied to data?
Please answer in all lowercase.
*A: lineage
Feedback: Correct! Lineage is the term used for tracing the origin and transformations applied to data.
Feedback: Correct! Data lineage is the practice of tracing the origin and transformations applied to data.
Default Feedback: No, that's not the correct term. Please review the lesson on data lineage.
What is the term for converting string variables into numerical codes for machine learning models?
Please answer in all lowercase.
*A: encoding
Feedback: Correct! Encoding is the process of converting string variables into numerical codes.
B: labeling
C: mapping
Default Feedback: Incorrect. This term refers to converting categorical data into a numerical format.
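A minimal sketch of one simple encoding scheme: integer (label) encoding, where each distinct string gets a numeric code. The function name and sample values are hypothetical. Other schemes, such as one-hot encoding, are often preferable when the categories have no natural order.

```python
def label_encode(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused integer
        encoded.append(codes[value])
    return encoded, codes


encoded, mapping = label_encode(["red", "green", "red", "blue"])
print(encoded)  # [0, 1, 0, 2]
print(mapping)  # {'red': 0, 'green': 1, 'blue': 2}
```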
What is the term for errors or biases in data collection that can lead to skewed model outcomes? Please
answer in all lowercase.
*A: samplingbias
Feedback: Correct! Sampling bias refers to errors or biases in data collection that can lead to skewed
model outcomes.
*B: samplingerror
Feedback: Correct! Sampling error is another term for errors in data collection that affect model
outcomes.
Default Feedback: Incorrect. This term refers to errors or biases in data collection that affect model
outcomes.
What is the term used to describe the output variable in a machine learning model? Please answer in all
lowercase.
*A: label
Feedback: Correct! The output variable in a machine learning model is called the label.
*B: labels
Feedback: Correct! The output variables in a machine learning model are called labels.
Default Feedback: Incorrect. The output variable in a machine learning model is known as the label.
What is the term used to describe the process of removing errors and inconsistencies from data in a
Machine Learning project? Please answer in all lowercase.
*A: cleaning
Feedback: Correct! Cleaning refers to the process of removing errors and inconsistencies from data.
*B: preprocessing
Feedback: Correct! Preprocessing can also refer to data cleaning in some contexts.
*C: cleansing
Default Feedback: Incorrect. Please review the lesson on data cleaning to understand the importance and
process of removing errors and inconsistencies from data.
What is the term used to describe the input variable in a machine learning model? Please answer in all
lowercase.
*A: feature
*B: attribute
Feedback: Correct! The input variable is also known as an attribute in machine learning.
Default Feedback: Incorrect. Please review the lesson materials on machine learning terminology.
What is the process of identifying and correcting errors in the dataset called? Please answer in all
lowercase.
Feedback: Correct! Data cleaning is the process of identifying and correcting errors in the dataset.
*B: datacleaning
Feedback: Correct! Data cleaning is the process of identifying and correcting errors in the dataset.
*C: cleaning
Feedback: Correct! Data cleaning is the process of identifying and correcting errors in the dataset.
*D: cleansing
Feedback: Correct! Data cleaning is the process of identifying and correcting errors in the dataset.
Default Feedback: Incorrect. Please review the materials on data preprocessing to find the term.
Which of the following is a best practice for ensuring reproducibility in machine learning projects?
*A: Proper documentation
Feedback: Correct! Proper documentation helps in tracking changes and understanding the workflow.
Feedback: Incorrect. Using outdated software can lead to compatibility issues and unreliable results.
Feedback: Incorrect. Data lineage is crucial for understanding the data's origin and transformations.
Feedback: Incorrect. Version control is essential for tracking changes and collaborative work.
Which of the following is a best practice for collecting data for a machine learning project?
Feedback: Correct! Ensuring data quality and consistency is crucial for building reliable machine
learning models.
Feedback: Not quite. While having more data can be beneficial, relevance and quality of data are more
important.
Feedback: Incorrect. Data privacy concerns should always be taken into account when collecting data.
Feedback: Incorrect. Using multiple sources of data can provide a more comprehensive dataset and
improve model performance.
Feedback: Correct! Ensuring data is representative helps the model generalize well to new data.
Feedback: Incorrect. While reducing computational cost is important, ensuring data is representative is
crucial for model performance.
Feedback: Incorrect. Increasing complexity does not necessarily relate to the representativeness of the
data.
Feedback: Incorrect. The speed of data collection is not related to the representativeness of the data.
Identify the factors that influence the amount of data required for a machine learning project.
Feedback: Correct! The complexity of the model influences the amount of data required.
Feedback: Incorrect. While cost of data storage is a consideration, it does not directly influence the
amount of data required for modeling.
Feedback: Correct! Variability in data is an important factor in determining the amount of data needed.
Feedback: Incorrect. Time available for data collection is a logistical factor but does not influence the
amount of data required for a project.
Feedback: Correct! The number of features in the dataset influences the amount of data needed for a
project.
Question 137 - text match
What is the term used to describe the process of converting string variables into numerical codes for use
in machine learning models? Please answer in all lowercase.
*A: encoding
Feedback: Correct! Encoding is the process of converting string variables into numerical codes.
B: labeling
Feedback: Incorrect. Labeling refers to assigning labels to data points, not converting string variables to
numerical codes.
C: transcoding
Feedback: Incorrect. Transcoding generally refers to converting data from one format to another, but not
specifically string to numerical codes.
D: binarization
Feedback: Incorrect. Binarization is a specific type of encoding, often used for binary classification, but
not the general term for converting string variables to numerical codes.
Default Feedback: Incorrect. This term is essential for handling categorical data in machine learning.
What is the term for the output variable in a machine learning model? Please answer in all lowercase.
*A: label
Feedback: Correct! The output variable in a machine learning model is commonly referred to as the
label.
*B: target
Feedback: Correct! The output variable in a machine learning model is also known as the target.
Default Feedback: Incorrect. The output variable in a machine learning model has a specific term. Please
review the course material and try again.
Question 139 - numeric
If a dataset has 50 features, what is the minimum recommended number of data points (samples) to
ensure a robust machine learning model?
Feedback: Correct! A common recommendation is to have at least 10 to 20 times the number of features
as data points.
Default Feedback: Not quite. Please review the guidelines for the minimum recommended number of
data points based on the number of features.
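The rule of thumb quoted in the feedback (10 to 20 times the number of features) can be worked through as simple arithmetic. The sketch below assumes that rule as stated; the function name and the default factors are illustrative, and the result is a guideline rather than a guarantee of model robustness.

```python
def recommended_sample_range(n_features, low_factor=10, high_factor=20):
    """Return the (low, high) sample counts implied by an N-times-features rule of thumb."""
    return n_features * low_factor, n_features * high_factor


# For the 50-feature dataset in the question.
low, high = recommended_sample_range(50)
print(low, high)
```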
Feedback: While reducing overfitting is important, ensuring data representativeness primarily addresses
generalization to new data.
Feedback: Correct! Representative data ensures that the model's findings can be generalized to new data.
Feedback: Data collection speed is not typically affected by the representativeness of data.
Which of the following are best practices for collecting data for machine learning?
Feedback: Correct! Using diverse sources helps in capturing various aspects of the problem.
Feedback: Incorrect. Ignoring outliers without analysis can lead to biased models.
What is a critical factor that influences the amount of data required for a machine learning project?
Feedback: Correct! More complex models often require more data to generalize well.
Feedback: The color of data entries doesn't typically affect data requirements.
Feedback: The data collector's age doesn't affect the amount of data needed.
Which of the following are strategies to ensure reproducibility in machine learning projects?
Feedback: Correct! Using version control for datasets is a key strategy for ensuring reproducibility.
B: Avoiding documentation to save time
Feedback: Incorrect. Proper documentation is crucial to reproducibility; without it, others cannot
replicate your work.
Feedback: Correct! Data lineage helps track data sources, which is essential for reproducibility.
What is a common method for handling missing data in a machine learning dataset?
Feedback: Correct! Imputation using the mean value is a common method to handle missing data.
Feedback: Not quite. While ignoring missing data is sometimes viable, it’s not usually recommended as
it may lead to biased results.
Feedback: Incorrect. Filling missing values with zeros can introduce bias, especially if zero is not a
plausible value.
Feedback: Incorrect. Randomly generating new data is not a standard practice for handling missing data
as it can distort the dataset.
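A minimal sketch of mean imputation, assuming missing entries are represented as None in a plain Python list. The function name impute_mean and the sample values are hypothetical; the idea is only to show the mechanics of replacing gaps with the mean of the observed values.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [20, None, 40, None, 30]
imputed = impute_mean(ages)
print(imputed)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason to compare it against alternatives such as median imputation on skewed data.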
Which of the following strategies is most effective for identifying features and labels in training data?
Feedback: Domain expertise is crucial for labeling, but this strategy is not comprehensive for identifying
features.
Feedback: Automation can aid in feature selection but isn't solely effective for identifying labels.
Feedback: Correct! Combining both expert knowledge and data-driven methods provides a balanced
approach.
B: Atmospheric data
Feedback: While atmospheric data may be useful in some contexts, it is not essential for
recommendations.
C: Manufacturing data
D: Geological data
Which of the following practices contribute to maintaining high data quality in a Machine Learning
project?
A: Collect data from a single source to ensure consistency
Feedback: Correct! Regular updates ensure the data is current and relevant.
What is a key method to ensure data remains unbiased in a Machine Learning project?
Feedback: This is important, but doesn't specifically ensure the data is unbiased.
Which of the following are best practices for ensuring reproducibility in machine learning projects?
Feedback: Correct! Understanding the origin and changes in data contributes to reproducibility.
C: Data encryption
*D: Versioning
Feedback: This doesn't necessarily ensure reproducibility; open-source tools are often more transparent.
Which of the following are considered challenges when using user data in recommendation systems?
Feedback: While storage can be a concern, it's not a primary challenge specific to recommendation
systems.
Feedback: While cost is a consideration, it is not uniquely challenging for recommendation systems.
If a Machine Learning model requires a dataset with at least 1,000 entries to achieve reliable predictions,
what is the minimum number of new data entries needed if the current dataset contains 750 entries?
*A: 250.0
Feedback: Correct! You need 250 more entries to reach the minimum of 1,000.
Default Feedback: Think about how many more entries are needed to reach the dataset requirement.
When evaluating machine learning models, which key factor should be considered to prevent
overfitting?
*A: Regularization
B: Scalability
Feedback: Not quite. Scalability is important but not directly related to overfitting.
C: Accuracy
Feedback: Accuracy is a fundamental metric but does not address overfitting directly.
D: Interoperability
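To make the role of regularization concrete, the sketch below fits a one-feature linear model without an intercept using an L2 (ridge) penalty. It assumes the objective sum((y - w*x)^2) + penalty * w^2, whose minimizer is w = sum(x*y) / (sum(x^2) + penalty). The function name and data are illustrative only.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form slope for y ~ w * x with an L2 penalty on w (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

unregularized = ridge_slope(xs, ys, penalty=0.0)   # ordinary least squares
regularized = ridge_slope(xs, ys, penalty=14.0)    # penalty shrinks the weight
print(unregularized, regularized)
```

A larger penalty pulls the weight toward zero, which limits how closely the model can chase noise in the training data.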
What is one of the primary benefits of using Jupyter as a computational notebook in data science?
Feedback: Correct! Jupyter's support for multiple programming languages allows data scientists to use
the best tool for the job.
Feedback: Incorrect. Jupyter supports data visualization, but it typically requires some coding.
Feedback: Correct! Scheduling model training on a fixed schedule allows for predictable resource
allocation.
Feedback: Incorrect. Fixed schedules do not necessarily reduce latency in real-time applications.
Feedback: Correct! It simplifies maintenance as updates and checks can be planned in advance.
Feedback: Incorrect. Fixed schedules do not provide increased flexibility in handling sudden data spikes.
What is an important technology decision to consider when building machine learning applications?
Feedback: Incorrect. The marketing strategy is not related to the technical aspects of building machine
learning applications.
Feedback: Incorrect. Social media presence is not a technical decision for machine learning applications.
Feedback: Correct! Jupyter integrates various data manipulation processes, making it highly useful in
data science.
Feedback: Incorrect. While Jupyter is useful for developing models, deployment typically occurs on
other platforms.
Feedback: Incorrect. Jupyter does not provide a built-in database for data storage.
Feedback: Incorrect. Jupyter itself does not inherently ensure data privacy and security.
Which of the following best describes a scenario where cloud machine learning is preferred over edge
machine learning?
Feedback: Correct! Cloud machine learning is preferred when the device lacks sufficient computational
resources, as the heavy lifting is done in the cloud.
Feedback: Incorrect. Edge machine learning is usually preferred when data privacy is a major concern
because data can be processed locally.
Feedback: Incorrect. While this can be a factor, it alone is not the primary reason for choosing cloud
over edge machine learning.
Which of the following is a challenge associated with working with big data?
Feedback: Correct! High computational resource requirements are a significant challenge when working
with big data.
Feedback: Incorrect. Easier data management is generally not a challenge associated with big data.
Feedback: Incorrect. The need for data processing is not reduced when working with big data; in fact, it
often increases.
Feedback: Incorrect. Big data typically leads to higher storage costs, not lower.
*A: AUC-ROC
Feedback: Great job! AUC-ROC is a critical metric for evaluating the performance of a binary
classification model.
Feedback: Not quite. Mean Squared Error is more commonly used for regression tasks.
C: Silhouette Score
Feedback: Incorrect. Silhouette Score is used for clustering evaluation, not binary classification.
D: Perplexity
Feedback: Sorry, Perplexity is a metric used in natural language processing, not binary classification.
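As a sketch of what AUC-ROC measures, the code below computes it as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half. This pairwise formulation is equivalent to the area under the ROC curve; the function name and the sample labels and scores are assumptions for illustration.

```python
def auc_roc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical binary labels and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = auc_roc(labels, scores)
print(auc)
```

The pairwise loop is quadratic in the number of examples, so it suits small illustrations; library implementations typically sort the scores instead.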
*A: To provide an environment for running code, visualizing outputs, and documenting analysis
Feedback: Correct! Jupyter is designed to integrate code execution with visualization and narrative text.
Feedback: Not quite. While Jupyter is great for development, it's not typically used for deploying
models to production.
Feedback: That's not correct. Version control systems like Git are used for that purpose, not Jupyter.
Feedback: Incorrect. Although you can debug in Jupyter, its primary purpose is not debugging, but
combining code, results, and explanations.
Which key consideration should be taken into account to ensure the ethical use of machine learning
models?
Feedback: Correct! Ensuring bias and fairness is crucial for the ethical use of machine learning models.
B: Model complexity
C: Computational cost
Feedback: Incorrect. Data storage format is a technical consideration rather than an ethical one.
When designing a machine learning system, which aspect is essential for ensuring model
interpretability?
Feedback: Excellent! Feature importance helps in understanding which features most influence the
model's decisions.
B: Batch Size
Feedback: Not quite. Batch size is related to the training process but does not directly affect model
interpretability.
C: Learning Rate
Feedback: Incorrect. Learning rate impacts model training but not its interpretability.
D: Data Augmentation
Feedback: Sorry, data augmentation is used to increase the diversity of training data and does not relate
to model interpretability.
Question 163 - multiple choice, shuffle
Which component of a machine learning system is responsible for making predictions based on input
data?
*A: Model
Feedback: Correct! The model is the component that makes predictions based on input data.
B: Data pipeline
Feedback: Incorrect. The data pipeline is responsible for moving and transforming data, not making
predictions.
C: Feature extractor
Feedback: Not quite. The feature extractor processes raw data into features but does not make
predictions.
D: Training algorithm
Feedback: No, the training algorithm is used to train the model but doesn't make predictions itself.
*A: Machine learning that processes data locally on the device where it is generated
Feedback: Correct! Edge AI deals with processing data directly on the local device.
Feedback: Incorrect. Edge AI performs data processing on the device itself, not just preprocessing.
Which of the following is a key consideration when building machine learning applications?
Feedback: Correct! Data privacy and security are crucial when building machine learning applications.
Feedback: Incorrect. While UI/UX is important, the color scheme is not a key consideration for machine
learning applications.
Feedback: Incorrect. Marketing strategy, although important, is not a key technology consideration for
building machine learning applications.
Feedback: Incorrect. Logo design is not a key consideration for building machine learning applications.
What is a key benefit of using edge machine learning over cloud machine learning?
Feedback: Correct! Edge machine learning offers lower latency as data is processed closer to the source,
making it ideal for real-time applications.
Feedback: Not quite. While edge devices are becoming more powerful, they still have limited
computational resources compared to cloud computing.
C: Simpler to implement
Feedback: Incorrect. Implementing edge machine learning can be complex due to the need for
specialized hardware and software.
Feedback: No, edge machine learning often involves decentralized data processing, which can pose
security challenges.
What is a significant challenge of scheduling model training and prediction on a real-time basis?
Feedback: Incorrect. Real-time processing aims to reduce latency, not increase it.
Feedback: Incorrect. The challenge is more related to computational resources than data availability.
Which machine learning model is most suitable for a task where the relationship between features and
the target variable is nonlinear?
B: Linear Regression
Feedback: Not quite. Linear Regression is best suited for linear relationships.
C: Logistic Regression
Feedback: Not quite. Logistic Regression is typically used for binary classification tasks and assumes a
linear relationship between the features and the log-odds of the target.
D: Decision Trees
Feedback: Decision Trees can handle nonlinear relationships but may not be as powerful as Neural
Networks for complex tasks.
When considering different machine learning algorithms, which factor is essential in evaluating their
performance for a specific task?
Feedback: Correct! Computational efficiency is crucial for evaluating the performance of machine
learning algorithms, especially for large datasets.
C: Ease of implementation
D: Historical usage
Feedback: Historical usage does not necessarily reflect the performance of an algorithm for a specific
task.
Which of the following is a common technology used for data visualization in machine learning?
*A: Matplotlib
Feedback: Correct! Matplotlib is commonly used for data visualization in machine learning.
B: Scikit-learn
Feedback: Not quite. Scikit-learn is mainly used for machine learning algorithms.
C: SQLite
Feedback: Incorrect. SQLite is used for database management, not data visualization.
D: Django
Feedback: That's not right. Django is a web framework, not a data visualization tool.
Which of the following are important considerations when designing a machine learning system?
Feedback: Correct! Model interpretability ensures that the predictions can be understood and trusted.
Feedback: Not quite. While UI design is important for user interaction, it is not a core consideration in
ML system design.
*D: Scalability
Feedback: Correct! Scalability ensures that the system can handle increasing amounts of data efficiently.
Feedback: Not quite. Cloud storage is relevant but not a primary design consideration for ML systems.
Which of the following factors are important when deciding between a cloud machine learning system
and an edge machine learning model?
Feedback: Correct! Data privacy is important as edge computing can help keep sensitive data local.
Feedback: Incorrect. Storage capacity is usually a constraint in edge devices compared to cloud.
Feedback: Correct! Depending on the industry, regulatory requirements can influence the choice
between cloud and edge computing.
E: Ease of scaling up
Feedback: Incorrect. Cloud systems are generally easier to scale up compared to edge systems.
Which of the following are critical trade-offs when selecting a machine learning model?
Feedback: Correct! The bias-variance trade-off is a fundamental consideration when selecting a machine
learning model.
Feedback: Incorrect. The color scheme of the user interface is not a trade-off related to the model itself.
Feedback: Incorrect. While licensing fees might be a consideration, they are not a fundamental trade-off
in model selection.
Question 174 - checkbox, shuffle, partial credit
Which of the following are key criteria to consider when making technology selection decisions for
machine learning systems?
*A: Scalability
Feedback: Correct! Scalability is a crucial factor when selecting technology for ML systems.
*B: Cost
Feedback: Incorrect. The color scheme of the UI is not a key criterion for technology selection in ML
systems.
Feedback: Incorrect. Popularity alone does not determine the suitability of technology for ML systems.
Select the factors that influence the decision to use edge machine learning over cloud machine learning.
Feedback: Correct! Edge machine learning is often chosen for real-time processing requirements.
Feedback: Correct! Limited internet connectivity is a significant factor favoring edge machine learning.
Feedback: Incorrect. High computational power on the device is not a limiting factor for choosing edge
over cloud machine learning.
*D: Data privacy concerns
Feedback: Correct! Data privacy concerns can make edge machine learning more suitable as data is
processed locally.
Feedback: Incorrect. Centralized model management is more aligned with cloud machine learning.
What is the maximum acceptable latency in seconds for a machine learning system that needs to process
data with a delay of no more than 100 milliseconds?
*A: 0.1
If a machine learning model achieves an accuracy of 92% and is applied to a test set of 250 instances,
how many instances are classified incorrectly?
*A: 20.0
Feedback: Correct! The model misclassifies 8% of 250 instances, which equals 20 instances.
Default Feedback: Incorrect. Try calculating the number of misclassified instances from the given
accuracy and test set size.
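The arithmetic behind this question can be sketched as follows: the error rate is one minus the accuracy, and the misclassified count is that rate times the test set size. The function name is hypothetical, and rounding is used because the floating-point product may not land exactly on a whole number.

```python
def misclassified_count(accuracy, n_instances):
    """Number of misclassified instances implied by an accuracy and a test set size."""
    error_rate = 1 - accuracy
    return round(error_rate * n_instances)


# Values from the question: 92% accuracy on 250 instances.
wrong = misclassified_count(0.92, 250)
print(wrong)
```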
When considering technology decisions for building machine learning applications, how many main
options are there for building machine learning models?
*A: 4.0
Feedback: Correct! The main options are building from scratch, using open-source libraries, commercial
libraries, and auto ML.
Default Feedback: Incorrect. Revisit the main options for building machine learning models discussed in
the lesson.
How many primary categories are there when considering technology decisions for building machine
learning applications?
*A: 4.0
Feedback: Correct! There are 4 primary categories to consider: scalability, flexibility, performance, and
ease of use.
Default Feedback: Consider the key technology decisions outlined in the course.
What is the typical range in milliseconds for latency in edge machine learning applications?
Feedback: Correct! Latency in edge machine learning applications typically falls within this range.
Default Feedback: Incorrect. Please review the typical latency ranges for edge machine learning
applications.
What is the term for a type of machine learning algorithm that involves decision trees, often used for
classification and regression tasks? Please answer in all lowercase.
*A: randomforest
Feedback: Correct! A Random Forest algorithm involves decision trees and is used for classification and
regression tasks.
*B: random-forest
Feedback: Correct! A Random Forest algorithm involves decision trees and is used for classification and
regression tasks.
*C: random_forest
Feedback: Correct! A Random Forest algorithm involves decision trees and is used for classification and
regression tasks.
Default Feedback: Incorrect. Please review the types of machine learning algorithms involving decision
trees.
What is the term used to describe the continuous learning process where the model is updated as new
data comes in? Please answer in all lowercase.
*A: onlinelearning
Feedback: Correct! Online learning refers to the continuous learning process where models are updated
as new data arrives.
*B: incrementallearning
Default Feedback: Incorrect. This term refers to the continuous learning process where models are
updated as new data arrives.
What is the term for the ability of a machine learning model to generalize well to unseen data? Please
answer in all lowercase.
*A: generalization
Feedback: Correct! Generalization is the ability of a machine learning model to perform well on unseen
data.
*B: generalisation
Feedback: Correct! Generalisation (UK spelling) also describes the ability to perform well on unseen
data.
Default Feedback: Think about the key term used to describe the model's performance on new data.
Name a commonly used tool in machine learning for numerical computing. Please answer in all
lowercase.
*A: numpy
Feedback: Correct! NumPy is widely used for numerical computing in machine learning.
*B: tensorflow
Feedback: Correct! TensorFlow is also used for numerical computing in machine learning.
*C: pandas
Feedback: Correct! Pandas is another tool commonly used for numerical computing.
Default Feedback: Review the course materials on tools commonly used for numerical computing in
machine learning.
Identify a common open-source library used for building machine learning models. Please answer in all
lowercase.
*A: tensorflow
Feedback: Correct! TensorFlow is a widely used open-source library for machine learning.
*B: scikit-learn
Feedback: Correct! Scikit-learn is another popular open-source library for machine learning.
*C: pytorch
Feedback: Correct! PyTorch is also a common open-source library for machine learning.
Default Feedback: Incorrect. Consider the open-source libraries commonly used in machine learning.
What is the term for machine learning that updates its models continuously as new data comes in,
without retraining from scratch each time? Please answer in all lowercase.
*A: onlinelearning
Feedback: Correct! Online learning updates models continuously with new data.
*B: incrementallearning
*C: continuallearning
Default Feedback: Incorrect. This type of learning updates models continuously without retraining from
scratch.
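A minimal sketch of the online learning idea, assuming a one-weight linear model y ~ w * x trained with one stochastic gradient step per incoming example. The class name, learning rate, and data stream are hypothetical; the point is that each update uses only the current weight and the new example, with no retraining over past data.

```python
class OnlineLinearModel:
    """One-weight linear model updated incrementally, one example at a time."""

    def __init__(self, learning_rate=0.1):
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.weight * x

    def update(self, x, y):
        # Gradient step on squared error for this single example.
        error = y - self.predict(x)
        self.weight += self.learning_rate * error * x


# Hypothetical stream of (x, y) pairs drawn from y = 2x.
model = OnlineLinearModel()
for x, y in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    model.update(x, y)  # model is usable for prediction after every update

print(model.weight)
```

After three examples the weight has moved most of the way from 0 toward the underlying slope of 2, without any example being stored or revisited.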
Name a popular commercial machine learning library. Please answer in all lowercase.
*A: h2o
*B: databricks
*C: azureml
Default Feedback: Consider revisiting the lesson on commercial machine learning libraries.
*A: edge
Feedback: Correct! Edge machine learning processes data locally on the device.
*B: edgeai
Default Feedback: Remember, this type of machine learning processes data directly on local devices.
When evaluating different technology options for machine learning, which of the following is a key
consideration?
*A: Scalability
Feedback: Correct! Scalability is crucial for handling large datasets and growing workloads in machine
learning systems.
Feedback: Incorrect. While aesthetics can be important in some contexts, they are not a key
consideration in evaluating machine learning technologies.
C: Brand popularity
Feedback: Incorrect. Brand popularity does not necessarily correlate with the effectiveness or suitability
of a machine learning technology.
D: Keyboard shortcuts
Feedback: Incorrect. Keyboard shortcuts are relevant for user productivity but not a key consideration in
evaluating machine learning technology options.
Which of the following is a key technology decision involved in designing machine learning systems?
*A: Choosing the programming language for implementation
Feedback: Correct! Selecting the appropriate programming language is a critical decision in designing
machine learning systems.
Feedback: Incorrect. While important for operational aspects, it is not a key technology decision in
designing machine learning systems.
Feedback: Incorrect. The color scheme is related to design, not a key technology decision in machine
learning system development.
Feedback: Incorrect. The brand of computers does not directly impact the technological design of
machine learning systems.
What is the significance of Jupyter as a computational notebook in the field of data science?
Feedback: Correct! Jupyter enables interactive data analysis and visualization, making it a valuable tool
for data scientists.
Feedback: Incorrect. While Jupyter can work with large datasets, its primary significance is in
interactive data analysis and visualization.
Feedback: Incorrect. Jupyter is used for a variety of data science tasks, not just machine learning.
Which of the following are key considerations in designing machine learning systems? Select all that
apply.
Feedback: Correct! Model interpretability is crucial for understanding and trust in the predictions made
by the machine learning system.
Feedback: Correct! High-quality data is essential for training accurate and reliable machine learning
models.
Feedback: Incorrect. The font style used in the code editor does not impact the design of a machine
learning system.
Feedback: Correct! The deployment environment affects how the model runs and integrates with other
systems.
Feedback: Incorrect. While personal preference might influence workflow, it is not a key consideration
in designing machine learning systems.
What are factors to consider when deciding between a cloud machine learning system and an edge
machine learning model?
Feedback: Correct! Data privacy requirements are crucial when deciding between cloud and edge
machine learning models.
Feedback: Correct! Reliable internet connectivity is a significant factor in the decision between cloud
and edge machine learning systems.
*C: Cost of hardware
Feedback: Correct! The cost of hardware is a vital consideration when choosing between cloud and edge
machine learning models.
Feedback: Incorrect. The company logo design is irrelevant to the decision between cloud and edge
machine learning systems.
Feedback: Incorrect. Employee dress code is not a factor in making decisions about machine learning
system deployment.
Identify one common tool used in machine learning. Please answer in all lowercase.
*A: scikit-learn
*B: tensorflow
*C: pytorch
Feedback: Correct! PyTorch is also a popular tool in the field of machine learning.
Default Feedback: Incorrect. Please review the common tools and technologies used in machine
learning.
What is the term for the ability of a machine learning model to perform well on new, unseen data?
Please answer in all lowercase.
*A: generalization
Feedback: Correct! Generalization is the ability of a model to perform well on new, unseen data.
B: overfitting
Feedback: Incorrect. Overfitting refers to a model that performs well on training data but poorly on new
data.
C: underfitting
Feedback: Incorrect. Underfitting occurs when a model is too simple to capture the underlying patterns
in the data.
Default Feedback: Incorrect. Please review the concepts related to model evaluation and try again.
What is the term used to describe machine learning performed directly on devices like smartphones and
IoT devices? Please answer in all lowercase.
*A: edgeai
Feedback: Correct! Edge AI refers to machine learning performed directly on devices like smartphones
and IoT devices.
*B: edge_ai
Feedback: Correct! Edge AI refers to machine learning performed directly on devices like smartphones
and IoT devices.
*C: edge
Feedback: Correct! Edge AI refers to machine learning performed directly on devices like smartphones
and IoT devices.
Default Feedback: Incorrect. Please review the lesson on Edge AI and its distinction from cloud machine
learning.
What is a key factor when choosing between a cloud machine learning system and an edge machine
learning model?
B: Brand popularity
C: Cost of electricity
Feedback: While operational cost is important, electricity cost is not typically a primary factor here.
Feedback: UI design is not directly relevant to choosing between cloud and edge models.
Which of the following are benefits of using a cloud machine learning system?
Feedback: Incorrect. Cloud models typically have higher latency compared to edge models.
Feedback: Correct! Cloud systems benefit from centralized data processing capabilities.
Feedback: Correct! Choosing the right data storage solution is essential for efficient data handling and
processing.
B: Deciding the company logo design
Feedback: Not quite. The logo design isn't directly related to machine learning applications.
Feedback: Incorrect. While important for business operations, it doesn't affect machine learning
processes.
Feedback: That's not correct. A mission statement is important for overall strategy but not specific to
machine learning decisions.
What is one key feature of Jupyter notebooks that makes them essential for data science tasks?
Feedback: Correct! This allows data scientists to visualize data interactively which is crucial for
analysis.
Feedback: Not quite. Jupyter notebooks are great for computation and visualization, but data cleaning
requires additional code.
Feedback: That's not correct. Jupyter notebooks focus on computation and visualization rather than
real-time streaming.
Feedback: Incorrect. Jupyter itself doesn't have built-in machine learning models, but supports libraries
that do.
Which of the following considerations is essential in the design of machine learning systems?
*A: Bias-variance trade-off
Feedback: Correct! The bias-variance trade-off is a crucial consideration in machine learning system
design.
Feedback: Color palette selection is not a primary concern in machine learning systems. Focus on
technical aspects.
C: Typography choice
Feedback: Typography choice is not relevant to machine learning system design. Consider system
performance aspects instead.
Feedback: Page layout design does not impact the design of machine learning systems. Focus on
operational considerations.
*A: Scalability
Feedback: Correct! Scalability is crucial in designing systems that can handle increasing amounts of
work.
B: Font style
Feedback: Font style is not a primary consideration in machine learning system design. Consider
focusing on functional aspects.
C: Color scheme
Feedback: Color scheme is not a crucial consideration in designing machine learning systems. Focus on
technical aspects.
D: Brand logo
Feedback: Brand logo is unrelated to the core design of machine learning systems. Consider technical
and operational factors.
Question 203 - multiple choice, shuffle, easy difficulty
Feedback: Correct! Real-time training allows models to adapt quickly to new data.
Feedback: Real-time training often requires more computational resources, not less.
Feedback: Real-time processing can complicate data preprocessing due to continuous data inflow.
Feedback: Real-time scheduling doesn't directly impact how interpretable a model is.
What is one of the main challenges of working with big data in machine learning?
Feedback: Correct! Managing large volumes of data is a significant challenge in big data.
Feedback: While data labeling can be costly, it is not the main challenge in handling big data.
Feedback: Computational algorithms are continually improving, making this less of a challenge.
Feedback: Visualization tools are important but not the main challenge in big data.
Feedback: Correct! AutoML simplifies the process for those without extensive coding experience.
Feedback: Not quite. Data preprocessing is still a crucial step even when using AutoML.
Feedback: This is incorrect. AutoML aims to optimize models, but doesn't always guarantee the best
performance.
Feedback: This is misleading. Domain knowledge is still important in interpreting results even with
AutoML.
Which of the following are key considerations when designing a machine learning system?
Feedback: Correct! High-quality and sufficient data are crucial for training accurate and effective
machine learning models.
Feedback: Correct! Interpretability ensures that the outputs of the model can be understood and trusted
by stakeholders.
C: Server location
Feedback: Incorrect. While server location can affect data processing speed, it is not a fundamental
consideration in the design of a machine learning system.
E: Weather conditions
Feedback: Incorrect. Weather conditions generally do not affect the design of a machine learning system
unless specific weather-related data is being modeled.
What are some important design decisions when building a machine learning system?
Feedback: Correct! The format for storing data can affect the system's efficiency.
Feedback: Incorrect. Aesthetic decisions are less critical than functional ones in system design.
Feedback: Correct! The frequency of data collection impacts the system's performance and accuracy.
Which of the following are common tools and technologies used in machine learning?
Feedback: Correct! Jupyter Notebook is commonly used for interactive coding and data visualization.
B: Microsoft Word
Feedback: Not quite. While useful for documentation, it is not a tool used in machine learning model
development.
*C: TensorFlow
Feedback: Correct! TensorFlow is widely used for building machine learning models.
D: Adobe Photoshop
Feedback: This is incorrect. Photoshop is an image editing tool and not related to machine learning.
*E: Scikit-learn
What is the minimum number of components typically found in a standard machine learning pipeline?
*A: 3.0
Feedback: Correct! A basic machine learning pipeline often includes data collection, model training, and
prediction.
Default Feedback: Think about the essential stages involved in setting up a machine learning model.
*A: 3.0
Feedback: Correct! There are three primary types: supervised, unsupervised, and reinforcement learning.
Default Feedback: Try reviewing the module on types of machine learning for more insight.
What is the typical range for a learning rate that balances convergence speed and stability in gradient
descent optimization?
*A: (0.01, 0.1)
Feedback: Consider the balance between convergence speed and stability when setting a learning rate.
Default Feedback: Revisit learning rate settings in gradient descent optimization and their effects on
convergence.
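Worked example: the sketch below shows why the learning rate trades convergence speed against stability. It runs plain gradient descent on the toy function f(x) = x**2, which is an assumption chosen for illustration and not part of the course material; the function name gradient_descent and the three rates tried are likewise hypothetical.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(x) = x**2 with plain gradient descent; return the final x."""
    x = start
    for _ in range(steps):
        grad = 2.0 * x              # derivative of x**2
        x -= learning_rate * grad   # standard gradient descent update
    return x


# Illustrative rates only: two inside the quoted range, one far too large.
for lr in (0.01, 0.1, 1.1):
    print(lr, abs(gradient_descent(lr)))
```

On this toy problem the smaller rate is still far from the minimum after 50 steps, the larger in-range rate ends much closer to it, and the oversized rate moves away from the minimum on every step. Real loss surfaces behave less cleanly, so the range is a starting point rather than a rule.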
Why is it essential to implement robust monitoring systems for machine learning models in production?
Feedback: Correct! Robust monitoring helps detect performance degradation early and allows for timely
intervention to maintain model accuracy.
Feedback: Incorrect. While reducing computational resources can be important, robust monitoring is
primarily about ensuring sustained model performance.
Feedback: Incorrect. Interpretability is important, but it is not the primary reason for robust monitoring
of models in production.
Feedback: Incorrect. Automating data preprocessing is a separate concern from monitoring model
performance.
Feedback: Correct! Tracking changes and improvements over time is crucial for understanding the
evolution of the model.
Feedback: Incorrect. While versioning might indirectly affect performance, its primary goal is to track
changes, not speed.
Feedback: Wrong. Versioning does not directly influence the need for data preprocessing.
In a machine learning project, what is the primary purpose of the system design phase?
Feedback: Correct! The system design phase is focused on defining the hardware and software
infrastructure needed for the project.
Feedback: Incorrect. Data collection and preprocessing are usually handled in separate phases.
Feedback: Incorrect. Model evaluation comes after the system has been designed and the model has
been trained.
Feedback: Incorrect. Deployment is a subsequent phase that occurs after the system design is complete.
Which of the following best describes the iterative process of developing and versioning machine
learning models?
*A: A sequence of steps that include data collection, model training, evaluation, and deployment,
repeated as needed
Feedback: Correct! This iterative process ensures continuous improvement and adaptation of the model
to new data.
Feedback: Incorrect. Model development and versioning is an ongoing iterative process, not a one-time
task.
C: A process that involves only data collection and model training without evaluation
Feedback: Incorrect. Evaluation is a crucial part of the iterative process to ensure model effectiveness.
D: A process that ends with model deployment and does not consider model performance monitoring
Which of the following is a key reason for maintaining model versioning in machine learning?
Feedback: Correct! Maintaining versioning ensures that results can be reproduced accurately.
Feedback: Incorrect. Reducing computational power is not the primary reason for maintaining model
versioning.
Feedback: Incorrect. Increasing training speed is not the primary purpose of model versioning.
Feedback: Incorrect. Decreasing storage requirements is not a key reason for model versioning.
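Worked example: one possible way to make versioning support reproducibility is to derive the version identifier from everything that determines the trained model. The sketch below is a minimal illustration under that assumption; the function model_version and its three inputs (hyperparameters, a training data identifier, a code revision) are hypothetical and not taken from the course.

```python
import hashlib
import json


def model_version(hyperparams, training_data_id, code_revision):
    """Return a short, deterministic version id for one training run."""
    payload = json.dumps(
        {
            "hyperparams": hyperparams,
            "data": training_data_id,
            "code": code_revision,
        },
        sort_keys=True,  # key order must not change the id
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical inputs, for illustration only.
v1 = model_version({"lr": 0.1, "depth": 4}, "dataset-2024-01", "abc123")
v2 = model_version({"depth": 4, "lr": 0.1}, "dataset-2024-01", "abc123")
v3 = model_version({"lr": 0.01, "depth": 4}, "dataset-2024-01", "abc123")
print(v1 == v2, v1 == v3)
```

The same inputs always give the same identifier, and any change to the hyperparameters, data, or code gives a different one, which is what allows a past result to be traced back to the run that produced it.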
Which of the following factors is crucial for ensuring the accuracy of medical AI models in healthcare
applications?
*A: High-quality labeled data
Feedback: Correct! High-quality labeled data is essential for training accurate medical AI models.
Feedback: Not quite. While complex models can be powerful, the quality of the data used to train them
is more crucial.
Feedback: Frequent updates can help, but they won't ensure accuracy without high-quality labeled data.
Feedback: Unlabeled data alone isn't sufficient to ensure accuracy. Labeled data is crucial.
What is a key task involved in supporting and maintaining machine learning models?
Feedback: Correct! Regularly updating the model with new data is essential for maintaining its accuracy
and relevance.
Feedback: Incorrect. While important for team morale, it is not a key task in model maintenance.
Feedback: Wrong. Creating marketing materials is unrelated to maintaining machine learning models.
Feedback: Not quite. While reducing the dataset size might be useful, it is not a key maintenance task.
Which of the following is a major factor affecting the accuracy of medical AI in practical healthcare
applications?
*A: Quality of training data
Feedback: Correct! High-quality training data is crucial for accurate AI predictions in healthcare.
Feedback: Incorrect. While architecture matters, the quality of data used to train the model is more
critical.
Feedback: Incorrect. The activation function type is less significant compared to data quality.
Feedback: Incorrect. While computational resources are important, they don't affect accuracy as directly
as data quality does.
Feedback: Correct! Versioning helps in tracking changes and ensures that the model can be reproduced
at any point in time.
Feedback: Incorrect. Versioning does not directly affect the computational complexity of the model.
Feedback: Incorrect. Versioning is about tracking changes, not necessarily about using the latest
algorithms.
Feedback: Correct! Monitoring input and output data helps in maintaining data consistency and early
detection of anomalies.
Feedback: Incorrect. While speed is important, the primary purpose of monitoring input and output data
is to ensure consistency and detect anomalies.
Feedback: Incorrect. Enhancing the user interface is not related to monitoring input and output data.
Feedback: Incorrect. The complexity of data models is not directly influenced by monitoring input and
output data.
*A: To ensure the model continues to make accurate predictions over time
Feedback: Correct! Continuous monitoring helps maintain the accuracy of the model.
Feedback: Incorrect. While cost-saving is important, it is not the primary reason for monitoring models
in production.
Feedback: Incorrect. Data collection is an ongoing process and is necessary even after deployment.
Feedback: Incorrect. Monitoring does not eliminate the need for version control systems.
What is a significant risk of not properly maintaining machine learning models in production?
Feedback: Correct. Not maintaining models properly can result in them becoming stale and not
performing well on new data.
Feedback: Incorrect. Overfitting to the training data is addressed during the training phase, not the
maintenance phase.
Feedback: Incorrect. Proper maintenance focuses on model performance, not the utilization of
computational resources.
Feedback: Incorrect. Data augmentation is a technique used during training, not related to model
maintenance.
Which strategy is most effective for mitigating data and concept drift in machine learning models?
Feedback: Correct! Regular retraining with updated data helps mitigate drift issues.
Feedback: Incorrect. Higher complexity can lead to overfitting and doesn't address drift.
Feedback: Incorrect. A smaller dataset can reduce model performance and doesn't mitigate drift.
Question 225 - multiple choice, shuffle
Feedback: Incorrect. Advanced algorithms exist, but integration remains a bigger challenge.
Feedback: Not quite. While computational power is important, integration challenges are more pressing.
Feedback: Incorrect. There is growing interest, but integration challenges still pose significant hurdles.
Which of the following is a key aspect to consider when making technology decisions in a machine
learning project?
Feedback: Correct! Scalability is crucial to ensure that your solution can handle growing amounts of
data and users.
Feedback: Not quite. While popularity might bring community support, it isn't a key aspect when
making technology decisions.
Feedback: This is incorrect. The age of a technology does not necessarily reflect its suitability for your
project.
In the context of machine learning, which of the following best describes 'model management'?
Feedback: Correct! Model management involves tracking and versioning machine learning models to
ensure reproducibility and manage updates.
Feedback: Incorrect. Collecting data is a part of the data science process, not model management.
Feedback: Incorrect. Designing the architecture of a system is related to system design, not model
management.
Feedback: Incorrect. Choosing the technology stack pertains to technology decisions, not model
management.
What is one of the primary strategies to mitigate training-serving skew in machine learning models?
*A: Regularly retrain the model using a combination of historical and new data
Feedback: Correct! Regular retraining with a combination of historical and new data helps in mitigating
training-serving skew by keeping the model updated with the latest patterns.
Feedback: Incorrect. Using only historical data can lead to outdated predictions and does not address the
issue of training-serving skew effectively.
Feedback: Incorrect. Static thresholds may not adapt to changing patterns in the data, which can lead to
inaccurate predictions.
Which of the following are important steps in the model maintenance cycle?
Feedback: Correct! Regularly updating model parameters is crucial for maintaining model performance.
Feedback: Incorrect. Ignoring model performance metrics is not a part of the model maintenance cycle
and can lead to degradation in model performance.
Feedback: Correct! Routine evaluations are essential to ensure the model is performing as expected.
Feedback: Incorrect. Archiving models without review can result in valuable insights being missed.
Which of the following best describes the role of data considerations in machine learning projects?
*A: Ensuring the data is relevant, clean, and sufficient for the analysis
Feedback: Correct! Data considerations involve ensuring the data is relevant, clean, and sufficient for
the analysis.
Feedback: Incorrect. Deciding on the machine learning algorithm is related to model management, not
just data considerations.
C: Setting up the hardware and software environment
Feedback: Incorrect. Setting up the hardware and software environment pertains to technology
decisions.
Feedback: Incorrect. Defining the business problem is part of problem identification, not specifically
data considerations.
When making technology decisions for a machine learning project, which of the following factors is
most critical to consider?
Feedback: Correct! Scalability is crucial as it determines how well the solution can handle increased
loads and data.
Feedback: Incorrect. While important for user experience, the color scheme is not critical for technology
decisions in a machine learning project.
Feedback: Incorrect. Font style in documentation is not a critical factor for technology decisions.
Feedback: Incorrect. The number of team members is more relevant to project management than to
technology decisions.
Which of the following are key monitoring points for machine learning models in production?
Feedback: Correct! Monitoring the input data is essential to ensure the data fed into the model is
accurate and clean.
*B: Model performance
Feedback: Correct! Monitoring model performance helps in understanding how well the model is doing
its job.
Feedback: Incorrect. The number of team members is not a key point for monitoring machine learning
models.
Feedback: Correct! Monitoring output data is crucial for ensuring that the model's predictions are as
expected.
E: Office location
What are some of the risks associated with deploying machine learning models in production? (Select all
that apply)
Feedback: Correct! Model performance can degrade due to changes in the data distribution or other
external factors.
Feedback: Correct! Bias in training data or model algorithms can lead to unfair outcomes in production.
Feedback: Incorrect. While automation can reduce some oversight, human supervision remains critical
to handle unexpected issues.
Feedback: Correct! Machine learning models can be susceptible to adversarial attacks and other security
threats.
Which of the following are important considerations when identifying problems in a machine learning
project?
Feedback: Correct! Understanding the business objectives is crucial for defining the problem and
ensuring the machine learning solution aligns with business goals.
Feedback: Incorrect. While important, the programming language choice is part of technology decisions,
not problem identification.
Feedback: Correct! Assessing data quality is essential to determine if the data can support building a
reliable model.
Feedback: Correct! Recognizing and mitigating biases in data helps in building fair and effective
models.
Feedback: Incorrect. Configuring the software environment is related to technology setup, not problem
identification.
Identify the common risks associated with managing machine learning-based products in production.
Feedback: Correct! Training-serving skew is a common risk that can affect model performance.
B: Feature engineering
Feedback: Incorrect. Feature engineering is a part of the model development process, not a risk.
Feedback: Correct! Data drift is a significant risk in maintaining machine learning models.
D: Hyperparameter tuning
Feedback: Incorrect. Hyperparameter tuning is a step in model training, not a risk in production.
Feedback: Correct! Concept drift affects the relevance of the model over time and is a known risk.
Select the benefits of implementing robust monitoring systems for machine learning models in
production.
Feedback: Correct! Early detection of model drift is a major benefit of robust monitoring systems.
Feedback: Incorrect. Monitoring systems help detect issues but don't directly improve accuracy without
retraining.
Feedback: Correct! Monitoring systems help ensure compliance with regulatory requirements.
Feedback: Incorrect. Monitoring systems do not reduce the need for data preprocessing.
Feedback: Correct! Data drift is a common issue where the statistical properties of the input data change
over time, affecting model performance.
Feedback: Correct! Concept drift occurs when the statistical properties of the target variable, and in
particular its relationship to the input data, change over time, impacting the model's predictions.
Feedback: Incorrect. Consistent data quality generally supports good model performance and is not an
issue.
Feedback: Correct! Model staleness happens when a model becomes outdated because it hasn't been
retrained with new data, leading to reduced performance.
Feedback: Incorrect. High feature correlation is not typically an issue that affects model performance in
production.
If a machine learning model's accuracy drops from 95% to 85% after deployment, what is the decrease in
accuracy, in percentage points?
*A: 10.0
Default Feedback: Incorrect. Remember to calculate the difference between the initial and current
accuracy.
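Worked example: the keyed answer of 10.0 is the difference between the two accuracy values, which is a drop in percentage points. The relative decrease, measured against the starting accuracy, is slightly larger. The helper name accuracy_drop below is hypothetical.

```python
def accuracy_drop(before_pct, after_pct):
    """Return (drop in percentage points, relative drop in percent)."""
    points = before_pct - after_pct
    relative = points / before_pct * 100.0
    return points, relative


points, relative = accuracy_drop(95.0, 85.0)
print(points)              # 10.0
print(round(relative, 2))  # 10.53
```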
Feedback: Any accuracy value below 0.85 suggests the model needs retraining.
Default Feedback: Consider the threshold value for accuracy and what it indicates about the model's
performance.
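Worked example: a threshold rule like the one in the feedback above can be sketched as a simple check. Averaging several recent readings, instead of reacting to a single value, is an assumption added here to reduce noise; the function name needs_retraining is hypothetical, and 0.85 is the threshold quoted in the feedback.

```python
def needs_retraining(recent_accuracies, threshold=0.85):
    """Flag retraining when the mean of recent accuracy readings is below the threshold."""
    if not recent_accuracies:
        return False  # no evidence yet, so do not flag
    average = sum(recent_accuracies) / len(recent_accuracies)
    return average < threshold


print(needs_retraining([0.95, 0.93, 0.94]))  # False
print(needs_retraining([0.86, 0.84, 0.82]))  # True
```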
After how many months should a medical AI model ideally be reviewed for potential retraining to
ensure optimal performance?
*A: 6.0
Feedback: Correct! Regular reviews, typically every 6 months, help ensure the model's optimal
performance.
Default Feedback: Incorrect. Consider the industry standards for reviewing and retraining machine
learning models.
What is the range of acceptable accuracy rates (%) for a model to be considered reliable in most
industrial applications?
Feedback: Correct! Typical industrial applications require models to have high accuracy rates in this
range.
Default Feedback: Incorrect. Consider the industry standards for model reliability.
What is the term used for the continuous process of evaluating a machine learning model's performance
in production? Please answer in all lowercase.
*A: monitoring
Feedback: Correct! Monitoring is the continuous process of evaluating a machine learning model's
performance.
*B: surveillance
Feedback: Correct! Surveillance is another term for the continuous evaluation of model performance.
Default Feedback: Incorrect. The term you are looking for refers to the continuous process of evaluating
a machine learning model's performance.
What term describes the process of updating a machine learning model to adapt to new data? Please
answer in all lowercase.
*A: retraining
Feedback: Correct! Retraining is essential to adapt the model to new data and maintain its performance.
*B: fine-tuning
Feedback: Correct! Fine-tuning is another term often used to describe updating the model with new data.
Default Feedback: Incorrect. The term refers to regularly updating the model to ensure it performs well
on new data.
What is a critical component of model maintenance that involves checking the alignment of input data
with expected patterns? Please answer in all lowercase.
*B: datavalidation
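Worked example: data validation can be as simple as checking each incoming record against the fields and value ranges the model was trained on. The sketch below assumes numeric fields with known bounds; the field names, bounds, and the function validate_record are hypothetical.

```python
def validate_record(record, expected):
    """Return a list of problems found in one input record (empty if none)."""
    problems = []
    for name, (low, high) in expected.items():
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"non-numeric value for {name}")
        elif not low <= value <= high:
            problems.append(f"{name} out of range: {value}")
    return problems


# Hypothetical expectations for illustration.
EXPECTED = {"age": (0, 120), "heart_rate": (20, 250)}

print(validate_record({"age": 40, "heart_rate": 72}, EXPECTED))
print(validate_record({"age": 140}, EXPECTED))
```

A record that passes returns an empty list; anything else can be logged or rejected before it reaches the model.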
What is the term for a significant change in the statistical properties of the input data, which can affect
the performance of machine learning models? Please answer in all lowercase.
Feedback: Correct! Data drift refers to significant changes in the statistical properties of the input data.
Default Feedback: Incorrect. Please review the course materials on the types of data shifts that can affect
model performance.
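Worked example: one simple way to flag data drift in a single numeric feature is to measure how far the mean of recent inputs has moved from the training-time mean, in units of the training-time standard deviation. This is only one of many possible drift checks; the function names and the threshold of 2.0 are assumptions for illustration.

```python
from statistics import mean, stdev


def drift_score(baseline, live):
    """Shift of the live mean from the baseline mean, in baseline standard deviations."""
    return abs(mean(live) - mean(baseline)) / stdev(baseline)


def has_drifted(baseline, live, threshold=2.0):
    """Flag drift when the mean shift exceeds the (assumed) threshold."""
    return drift_score(baseline, live) > threshold


# Made-up feature values for illustration.
baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
print(has_drifted(baseline, [10, 9, 11, 10]))   # False
print(has_drifted(baseline, [15, 16, 14, 15]))  # True
```

A mean shift misses changes in spread or shape, so production systems often compare whole distributions instead.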
What is the term for the shift in the statistical properties of the target variable that the model is
predicting, which can impact model performance negatively? Please answer in all lowercase.
*A: conceptdrift
Feedback: Correct! Concept drift occurs when the statistical properties of the target variable change.
*B: concept-drift
Feedback: Correct! Concept drift occurs when the statistical properties of the target variable change.
*C: drift
Feedback: Correct! Concept drift occurs when the statistical properties of the target variable change.
Default Feedback: The answer refers to a common problem affecting model predictions when the target
variable changes over time.
*B: It helps detect data drift and model performance degradation over time.
Feedback: Correct! Monitoring helps in identifying any shifts in the data or performance issues,
ensuring the model remains reliable.
Feedback: This is incorrect. While monitoring can help identify when retraining is necessary, it does not
eliminate the need for retraining.
Feedback: Incorrect. Monitoring is primarily concerned with performance and reliability, not
computational efficiency.
Feedback: Correct! Understanding the business problem is crucial in the data science process as it
guides the entire project.
Feedback: Incorrect. Ignoring data quality can lead to inaccurate models and insights.
Feedback: Incorrect. While algorithms are important, understanding the problem and the data is
essential.
Feedback: Incorrect. Testing models is a critical step to ensure their reliability and accuracy.
Feedback: Correct! Versioning helps in tracking changes and improvements, ensuring that the best
performing model is used.
Feedback: Incorrect. Versioning isn't about increasing the model's size but about tracking changes and
improvements.
Feedback: Incorrect. Versioning does not affect the training time of a model.
Feedback: Incorrect. Versioning does not eliminate the need for monitoring; both are important for a
robust ML system.
Which of the following is a key factor affecting the accuracy of medical AI in practical healthcare
applications?
Feedback: Correct! Quality of training data is crucial for the accuracy of medical AI as it directly
impacts the model's learning process.
Feedback: Incorrect. The color of the user interface does not affect the accuracy of medical AI.
C: Number of developers
Feedback: Incorrect. While the number of developers can influence the development process, it does not
directly affect the accuracy of medical AI.
Feedback: Incorrect. The type of programming language used does not directly impact the accuracy of
medical AI.
Question 251 - checkbox, shuffle, partial credit
Which of the following are risks associated with machine learning models in production?
Feedback: Correct. Data drift is a significant risk as it can lead to model performance degradation.
B: Model interpretability
Feedback: Incorrect. While model interpretability is important, it is not considered a risk in itself.
Feedback: Correct. Security vulnerabilities can be exploited, leading to potential misuse of the model.
Feedback: Correct. Overfitting can cause the model to perform poorly on new, unseen data.
Select the key concepts that should be identified in a machine learning project.
Feedback: Correct! Identifying the problem is the first step in any machine learning project.
Feedback: Correct! Designing the system is crucial for implementing machine learning solutions.
Feedback: Correct! Considering the data is essential for building accurate models.
Feedback: Incorrect. Technology decisions play a significant role in the success of a machine learning
project.
Which of the following are key monitoring points for machine learning models?
Feedback: Correct! Monitoring model performance metrics helps ensure the model is performing as
expected.
Feedback: Incorrect. While training time is important, it is not a key monitoring point for models in
production.
Feedback: Incorrect. The number of layers in a model is not typically a key monitoring point.
What term is used to describe changes in the statistical properties of the input data over time? Please
answer in all lowercase.
*A: drift
Feedback: Correct! Drift refers to changes in the statistical properties of the input data over time.
*B: conceptdrift
Feedback: Correct! Concept drift is a specific type of drift affecting the model's understanding of the
data.
*C: datadrift
Feedback: Correct! Data drift is another term describing changes in the input data over time.
Default Feedback: Incorrect. This term refers to the phenomenon where the statistical properties of the
input data change over time, impacting model performance.
What is the first step in the data science process when addressing a machine learning problem?
Feedback: Data collection is important but identifying the problem takes precedence.
Feedback: Correct! Understanding the problem is the foundation of any successful project.
Feedback: Model selection is crucial but follows problem identification and data preparation.
Feedback: While system design is significant, it follows the identification of the problem and
understanding data needs.
Feedback: Cross-validation does not eliminate the need for testing but helps ensure robust model
evaluation.
Feedback: Feature engineering may benefit from insights during cross-validation, but it's not simplified
by it.
What is a key reason for maintaining versioning systems in machine learning models?
Feedback: Correct! Versioning helps track changes and maintain consistent performance.
Feedback: Not quite. Versioning is more about maintaining consistency than speed.
Feedback: This is not correct. Versioning does not eliminate preprocessing needs.
Which risk is most commonly associated with deploying machine learning models in production?
Feedback: Incorrect. User interface is not typically a risk during model deployment.
Feedback: This is not correct. Computational power is a consideration, but not a primary risk.
Which of the following are important aspects of model maintenance to ensure performance?
Feedback: Correct! Keeping models updated with new data is essential for maintaining performance.
Feedback: Incorrect. Ignoring data drift can lead to poor model performance.
Feedback: Not quite. Only simplify models when necessary, not arbitrarily.
Feedback: Correct! Model versioning helps in tracking changes and updates, ensuring reproducibility
and collaboration.
Feedback: Not quite. While speed is important, versioning primarily focuses on tracking changes.
C: To improve model accuracy
Feedback: Accuracy improvement is a goal, but versioning focuses on tracking model changes over
time.
Feedback: This is not the primary purpose. Versioning is about tracking changes rather than ensuring the
newest algorithms.
What is one of the critical tasks involved in supporting machine learning models in production?
Feedback: Correct! Updating documentation ensures that the team stays informed and models remain
maintainable.
Feedback: Not necessarily. While complexity might increase, support focuses more on stability and
performance.
Feedback: Always being online isn't the main support task; uptime is important but not always critical.
Feedback: Throughput is important, but supporting models involves more holistic tasks.
Which of the following factors most significantly affects the accuracy of medical AI in healthcare
applications?
Feedback: The number of developers does not directly influence the accuracy of AI models.
Feedback: While open-source software can be beneficial, it's the data quality that significantly impacts
accuracy.
What is a primary strategy to mitigate risks like training-serving skew in production machine learning
models?
Feedback: Correct! Monitoring and feedback are key to addressing such risks.
Feedback: Not quite. While important, it doesn't directly mitigate these specific risks.
Feedback: Feature selection is important but doesn't directly address training-serving skew.
Feedback: Using real-time data alone doesn't necessarily mitigate these risks.
Which of the following are important considerations when managing machine learning models?
Feedback: Right! Accuracy and precision are key metrics in evaluating model performance.
Feedback: Incorrect. The UI color scheme is usually not relevant to model management.
Feedback: Yes, regular updates and monitoring are essential to keep the model relevant.
Feedback: No, the developer's preferred language should not influence model management decisions.
Which of the following are key monitoring points for machine learning models?
Feedback: Correct! Monitoring input data is crucial for understanding data drift and model performance.
Feedback: Correct! Output data should be monitored to ensure the model performs as expected.
Feedback: Not quite. While development time is important, it is not a monitoring point for live models.
Feedback: Correct! Monitoring model performance helps in identifying when a model needs retraining
or adjustments.
E: Hardware specifications
Feedback: This isn't typically a key monitoring point for models in production. Focus is more on data
and performance.
Feedback: Correct! Meeting regulatory standards is crucial for deploying AI in clinical environments.
Feedback: This is less of a challenge as many tools are available, but integration and compliance are
more pressing.
If a model monitoring system flags a 0.05 increase in error rate from its baseline, what is the percentage
increase, assuming the baseline error rate was 0.10?
*A: 50.0
Feedback: Correct! A 0.05 increase from a 0.10 baseline is a 50% increase in error rate.
Default Feedback: Review the calculation methods for percentage increase from the course.
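Worked example: the percentage increase is the absolute increase divided by the baseline, times 100. The helper name percent_increase is hypothetical.

```python
def percent_increase(baseline, increase):
    """Return the increase as a percentage of the baseline value."""
    return increase / baseline * 100.0


print(percent_increase(0.10, 0.05))  # 50.0
```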
If a machine learning model's precision is 0.75 and its recall is 0.60, what is the F1 score? Use the
formula \[F1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}\] to
calculate.
*A: 0.6667
Feedback: Good job! You've applied the F1 score formula correctly.
Default Feedback: Check your calculations and ensure you apply the F1 score formula correctly.
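Worked example: substituting the values gives F1 = 2 x (0.75 x 0.60) / (0.75 + 0.60) = 0.90 / 1.35, which rounds to 0.6667. The same calculation as a small function (the name f1_score here is a local helper, not a library call):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.75, 0.60), 4))  # 0.6667
```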
Suppose a machine learning model's performance drops below an acceptable threshold after being in
production for 6 months. Based on monitoring strategies, within how many months should the model
ideally be evaluated for retraining?
*A: 6.0
Feedback: Correct! Regular evaluation, such as every 6 months, helps maintain model performance.
Default Feedback: Consider periodic evaluation times that are recommended for maintaining model
performance.