Active Machine Learning with Python: Refine and elevate data quality over quantity with active learning
Ebook, 359 pages, 2 hours

Language: English
Publisher: Packt Publishing
Release date: Mar 29, 2024
ISBN: 9781835462683

    Active Machine Learning with Python - Margaux Masson-Forsythe

    Active Machine Learning with Python

    Copyright © 2024 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    Group Product Manager: Niranjan Naikwadi

    Publishing Product Manager: Tejashwini R

    Book Project Manager: Kirti Pisat

    Senior Editor: Vandita Grover

    Technical Editor: Rahul Limbachiya

    Copy Editor: Safis Editing

    Proofreader: Safis Editing

    Indexer: Manju Arasan

    Production Designer: Vijay Kamble

    DevRel Marketing Coordinator: Vinishka Kalra

    First published: March 2024

    Production reference: 1270324

    Published by Packt Publishing Ltd.

    Grosvenor House

    11 St Paul’s Square

    Birmingham

    B3 1RB, UK.

    ISBN 978-1-83546-494-6

    www.packtpub.com

    To my beloved wife, Heather Masson-Forsythe, whose unwavering kindness and support are my pillars of strength with every new intense project I undertake each week.

    Contributors

    About the author

    Margaux Masson-Forsythe is a skilled machine learning engineer and advocate for advancements in surgical data science and climate AI. As the director of machine learning at Surgical Data Science Collective, she builds computer vision models to detect surgical tools in videos and track procedural motions. Masson-Forsythe manages a multidisciplinary team and oversees model implementation, data pipelines, infrastructure, and product delivery. With a background in computer science and expertise in machine learning, computer vision, and geospatial analytics, she has worked on projects related to reforestation, deforestation monitoring, and crop yield prediction.

    About the reviewer

    Mourya Boggarapa is a deep learning software engineer specializing in the end-to-end integration of large language models for custom AI accelerators. He holds a master’s degree in software engineering from Carnegie Mellon University. Prior to his current role, Mourya honed his skills through diverse experiences: developing backend systems for a major bank, building development infrastructure for a tech giant, and working on mobile app development. Through these roles, he cultivated a comprehensive understanding of software development across various domains. His primary passion lies in deep learning. He also maintains a keen interest in human-computer interaction, aiming to bridge the gap between tech and human experience.

    Table of Contents

    Preface

    Part 1: Fundamentals of Active Machine Learning

    1

    Introducing Active Machine Learning

    Understanding active machine learning systems

    Definition

    Potential range of applications

    Key components of active machine learning systems

    Exploring query strategies scenarios

    Membership query synthesis

    Stream-based selective sampling

    Pool-based sampling

    Comparing active and passive learning

    Summary

    2

    Designing Query Strategy Frameworks

    Technical requirements

    Exploring uncertainty sampling methods

    Understanding query-by-committee approaches

    Maximum disagreement

    Vote entropy

    Average KL divergence

    Labeling with EMC sampling

    Sampling with EER

    Understanding density-weighted sampling methods

    Summary

    3

    Managing the Human in the Loop

    Technical requirements

    Designing interactive learning systems and workflows

    Exploring human-in-the-loop labeling tools

    Common labeling platforms

    Handling model-label disagreements

    Programmatically identifying mismatches

    Manual review of conflicts

    Effectively managing human-in-the-loop systems

    Ensuring annotation quality and dataset balance

    Assess annotator skills

    Use multiple annotators

    Balanced sampling

    Summary

    Part 2: Active Machine Learning in Practice

    4

    Applying Active Learning to Computer Vision

    Technical requirements

    Implementing active ML for an image classification project

    Building a CNN for the CIFAR dataset

    Applying uncertainty sampling to improve classification performance

    Applying active ML to an object detection project

    Preparing and training our model

    Analyzing the evaluation metrics

    Implementing an active ML strategy

    Using active ML for a segmentation project

    Summary

    5

    Leveraging Active Learning for Big Data

    Technical requirements

    Implementing ML models for video analysis

    Selecting the most informative frames with Lightly

    Using Lightly to select the best frames to label for object detection

    SSL with active ML

    Summary

    Part 3: Applying Active Machine Learning to Real-World Projects

    6

    Evaluating and Enhancing Efficiency

    Technical requirements

    Creating efficient active ML pipelines

    Monitoring active ML pipelines

    Determining when to stop active ML runs

    Enhancing production model monitoring with active ML

    Challenges in monitoring production models

    Active ML to monitor models in production

    Early detection for data drift and model decay

    Summary

    7

    Utilizing Tools and Packages for Active ML

    Technical requirements

    Mastering Python packages for enhanced active ML

    scikit-learn

    modAL

    Getting familiar with the active ML tools

    Summary

    Index

    Other Books You May Enjoy

    Preface

    Welcome to Active Machine Learning with Python, a comprehensive guide designed to introduce you to the power of active machine learning. This book is written with the conviction that while data is plentiful, its quality and relevance hold the key to building models that are not only efficient but also robust and insightful.

    Active machine learning is a method used in machine learning where the algorithm can query an oracle to label new data points with the desired outputs. It stands at the crossroads of optimization and human-computer interaction, enabling machines to learn more effectively with less data. This is particularly valuable in scenarios where data labeling is costly, time-consuming, or requires expert knowledge.
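
    As a rough illustration of the query-an-oracle idea described above, the following sketch runs a small pool-based loop with scikit-learn. It is not taken from the book: the synthetic dataset, the logistic regression model, the least-confidence score, and the ten-query budget are all assumptions chosen for brevity, and the held-back array y stands in for a human oracle.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for a real unlabeled pool (assumption).
# The array y plays the role of the oracle: a label is only "revealed"
# when the loop queries the corresponding index.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
# Seed set: five examples per class so the first model sees both classes.
labeled = np.concatenate(
    [rng.choice(np.flatnonzero(y == c), size=5, replace=False) for c in (0, 1)]
)
pool = np.setdiff1d(np.arange(len(X)), labeled)

model = LogisticRegression(max_iter=1000)

for _ in range(10):  # query budget (assumption)
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    # Least-confidence score: 1 minus the probability of the predicted class.
    uncertainty = 1.0 - proba.max(axis=1)
    query = pool[np.argmax(uncertainty)]
    # "Ask the oracle": move the queried index from the pool to the labeled set.
    labeled = np.append(labeled, query)
    pool = pool[pool != query]

# Final fit on everything labeled so far.
model.fit(X[labeled], y[labeled])
print(len(labeled), len(pool))
```

    Each iteration labels only the single example the current model is least sure about, which is the sense in which active learning aims to learn more from less labeled data.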

    Throughout this book, we leverage Python, a leading programming language in the field of data science and machine learning, known for its simplicity and powerful libraries. Python serves as an excellent medium for exploring the concepts of active machine learning, providing both beginners and experienced practitioners with the tools needed to implement sophisticated models.

    Who this book is for

    This book is intended for data scientists, machine learning engineers, researchers, and anyone curious about optimizing machine learning workflows. Whether you are new to active machine learning or looking to enhance your current models, this book provides insights into making the most of your data through strategic querying and learning techniques.

    What this book covers

    Chapter 1, Introducing Active Machine Learning, explores the fundamental principles of active machine learning, a highly effective approach that significantly differs from passive methods. This chapter also offers insights into its distinctive strategies and advantages.

    Chapter 2, Designing Query Strategy Frameworks, presents a comprehensive exploration of the most effective and widely utilized query strategy frameworks in active machine learning and covers uncertainty sampling, query-by-committee, expected model change, expected error reduction, and density-weighted methods.

    Chapter 3, Managing the Human in the Loop, discusses the best practices and techniques for the design of interactive active machine learning systems, with an emphasis on optimizing human-in-the-loop labeling. Aspects such as labeling interface design, the crafting of effective workflows, strategies for resolving model-label disagreements, the selection of suitable labelers, and their efficient management are covered.

    Chapter 4, Applying Active Learning to Computer Vision, covers various techniques for harnessing the power of active machine learning to enhance computer vision model performance in tasks such as image classification, object detection, and semantic segmentation, also addressing the challenges in their application.

    Chapter 5, Leveraging Active Learning for Big Data, explores active machine learning techniques for managing big data such as videos. It acknowledges the challenges of developing video analysis models, which stem from the large size of videos and the frequent duplication of data across frames at typical frames-per-second rates, and demonstrates an active machine learning method for selecting the most informative frames for labeling.

    Chapter 6, Evaluating and Enhancing Efficiency, details the evaluation of active machine learning systems, encompassing metrics, automation, efficient labeling, testing, monitoring, and stopping criteria, with the aim of producing accurate evaluations and insights into system efficiency that guide informed improvements.

    Chapter 7, Utilizing Tools and Packages for Active ML, discusses the Python libraries, frameworks, and tools commonly used for active learning, highlighting their value in implementing various active learning techniques and offering an overview suitable for both beginners and experienced programmers.
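
    To make one of the query strategies named in the Chapter 2 summary more concrete, here is a minimal sketch of vote entropy, a common disagreement measure for query-by-committee. It is an illustration rather than the book's implementation: the function name vote_entropy, the hard-vote input format, and the toy vote matrix are assumptions.

```python
import numpy as np

def vote_entropy(votes: np.ndarray, n_classes: int) -> np.ndarray:
    """Entropy of committee votes per sample.

    votes: integer array of shape (n_members, n_samples) holding the class
    label predicted by each committee member for each sample.
    """
    n_members = votes.shape[0]
    # counts[c, i] = number of members that voted class c for sample i
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    frac = counts / n_members
    # Treat 0 * log(0) as 0 by only taking the log where frac > 0.
    logs = np.log(frac, where=frac > 0, out=np.zeros_like(frac))
    return -(frac * logs).sum(axis=0)

# Toy committee of three members voting on three samples (hypothetical data).
votes = np.array([
    [0, 0, 1],
    [0, 1, 2],
    [0, 1, 0],
])
scores = vote_entropy(votes, n_classes=3)
# The sample with the highest score is the one the committee disagrees on most.
most_disputed = int(np.argmax(scores))
print(scores, most_disputed)
```

    A sample on which every member agrees scores zero, while a sample on which every member votes differently reaches the maximum, so the highest-scoring samples are the natural candidates to send for labeling.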

    To get the most out of this book

    You should be proficient in Python and familiar with Google Colab, and have a foundational understanding of machine learning and deep learning principles. You also need to be familiar with machine learning frameworks such as PyTorch.

    This book is for individuals who possess a fundamental understanding of machine learning and deep learning and who aim to acquire knowledge about active learning in order to optimize the annotation process of their machine learning datasets. This optimization will enable them to train the most effective models possible.

    You will need to create accounts for several tools: Encord, Roboflow, and Lightly. You will also need access to an AWS EC2 instance for Chapter 6, Evaluating and Enhancing Efficiency.

    If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

    Download the example code files

    You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Active-Machine-Learning-with-Python. If there’s an update to the code, it will be updated in the GitHub repository.

    We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

    Conventions used

    There are a number of text conventions used throughout this book.

    Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: We define x_true and y_true.

    A block of code is set as follows:

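    import numpy as np

    # small_dataset is assumed to be a table-like object with 'label' and 'text' columns
    y_true = np.array(small_dataset['label'])
    x_true = np.array(small_dataset['text'])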

    Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: Anomaly detection is another domain where active learning proves to be highly effective.

    Tips or important notes

    Appear like this.

    Get in touch

    Feedback from our readers is always welcome.

    General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

    Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

    Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

    If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

    Share Your Thoughts

    Once you’ve read Active Machine Learning with Python, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

    Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

    Download a free PDF copy of this book

    Thanks for purchasing this book!

    Do you like to read on the go but are unable to carry your print books everywhere?

    Is your eBook purchase not compatible with the device of your choice?

    Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

    Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books
