We at Deep Breathe sought to train a deep learning model for the task of automating the distinction between normal and abnormal lung tissue on lung ultrasound.
This repository contains the work behind the development and validation of an A-line vs. B-line lung ultrasound image classifier, which underpins the paper *A deep learning solution for the classification of normal versus abnormal lung parenchyma on lung ultrasound: a multicentre study* (a link will be added upon publication).
- Getting Started
- Building a Dataset
- Use Cases
i) Train Single Experiment
ii) K-Fold Cross Validation
iii) Hyperparameter Optimization
iv) Predictions
v) Grad-CAM for Individual Frame Predictions
- Project Configuration
- Project Structure
- Contacts
- Clone this repository (for help see this tutorial).
- Install the necessary dependencies (listed in requirements.txt) and ensure that your Python version is 3.8. To do this, open a terminal in the root directory of the project and run the following:
  $ pip install -r requirements.txt
- Obtain lung ultrasound data and preprocess it accordingly. See Building a Dataset for more details.
- Update the TRAIN >> MODEL_DEF field of config.yml with the appropriate string representing the model type you wish to train. To train a single model, ensure the TRAIN >> EXPERIMENT_TYPE field is set to 'single_train'.
- Execute train.py to train your chosen model on your preprocessed data. The trained model will be serialized within results/models/, and its filename will resemble the following structure: model{yyyymmdd-hhmmss}.h5, where {yyyymmdd-hhmmss} is the timestamp at which training began.
  Note: When train.py is executed, the project root should be the working directory.
- Navigate to results/logs/ to see the TensorBoard log files. The folder name will be {yyyymmdd-hhmmss}. These logs can be used to create a TensorBoard visualization of the training results, e.g. by running tensorboard --logdir results/logs from the project root.
Create a database_config.yml file in the root directory as follows:
USERNAME: (database user name)
PASSWORD: (database password)
HOST: (database host name)
DATABASE: data

To build a dataset, run the following command:

python src/data/ab_line_dataset_creator.py

This runs a series of automated steps using the clips query specified. Note that the raw clips were scrubbed of all on-screen information (e.g. vendor logos, battery indicators, index marks, depth markers) extraneous to the ultrasound beam itself. This was done using a dedicated deep learning masking software for ultrasound (AutoMask, WaveBase Inc., Waterloo, Canada). Following this, all ultrasound clips were deconstructed into their constituent frames, and a frame table was generated linking each frame to its ground truth, associated clip, and patient ID.
The following intermediate csv headers are also extracted to create the final frames table csv file used to train a model:

| filename | patient_id | a_or_b_lines | class |
|----------|------------|--------------|-------|

where filename is the name of the labeled clip file, patient_id is a unique patient identifier, a_or_b_lines is a string label for the clip, and class is the label as a class integer.
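To make the clip deconstruction step concrete, below is a minimal illustrative sketch (not the project's exact implementation; the helper name, paths, and arguments are hypothetical) that splits one masked mp4 clip into png frames with OpenCV and builds the frame-table rows described above:

```python
import cv2
import pandas as pd

def clip_to_frames(clip_path, clip_filename, patient_id, label, class_int, frames_dir):
    """Hypothetical sketch: deconstruct one masked clip into png frames and return
    its rows for the frames table, using the columns described above."""
    rows = []
    cap = cv2.VideoCapture(clip_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()  # ok is False once the clip is exhausted
        if not ok:
            break
        cv2.imwrite(f"{frames_dir}/{clip_filename}_{frame_idx}.png", frame)
        rows.append({"filename": clip_filename, "patient_id": patient_id,
                     "a_or_b_lines": label, "class": class_int})
        frame_idx += 1
    cap.release()
    return pd.DataFrame(rows)
```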
With a pre-processed clip dataset, you can train a frame classification model of a chosen model definition.
- Assemble a pre-processed clip dataset (see Building a Dataset) and set the appropriate data paths in config.yml.
- Mask out all information extraneous to the ultrasound beam. In our group, this was done with the proprietary software mentioned above in Building a Dataset.
- Generate a frame dataset from the masked clips using build-dataset.py.
- Set the desired data and train configuration fields in config.yml, including setting a model type using the MODEL_DEF parameter and setting the EXPERIMENT_TYPE to 'single_train'.
- Set the associated hyperparameter values based on the chosen model definition.
- Run train.py.
- View all logs and trained weights in the results directory.
Note: We found that the cutoffvgg16 model definition had the best performance on our internal data.
With a pre-processed clip dataset, you can evaluate model performance using k-fold cross-validation.
- Assemble a pre-processed clip dataset (see Building a Dataset) and set the appropriate data paths in config.yml.
- Generate a frame dataset from the masked clips using build-dataset.py.
- Set the desired data and train configuration fields in config.yml, including setting a model type using the MODEL_DEF parameter and setting the EXPERIMENT_TYPE to 'cross_validation'.
- Set the number of folds in the train section of config.yml.
- Set the associated hyperparameter values based on the chosen model definition.
- Run train.py.
- View all logs and trained weights in the results directory. The partitions from each fold can be found in the partitions folder.
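Because every frame in the frames table is linked to a patient id, cross-validation folds are best constructed so that a patient's frames never appear in both the training and validation sets of the same fold. The sketch below illustrates that idea with scikit-learn's GroupKFold; it is an illustrative assumption, not necessarily the project's exact splitting logic, and the csv path is hypothetical:

```python
import pandas as pd
from sklearn.model_selection import GroupKFold

frames = pd.read_csv("frames_table.csv")  # hypothetical path to the frames table

# Group by patient_id so a patient's frames never span train and validation in one fold.
for fold, (train_idx, val_idx) in enumerate(
        GroupKFold(n_splits=10).split(frames, groups=frames["patient_id"])):
    train_df, val_df = frames.iloc[train_idx], frames.iloc[val_idx]
    print(f"Fold {fold}: {len(train_df)} training frames, {len(val_df)} validation frames")
```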
With a pre-processed clip dataset, you can perform a hyperparameter search to assist with hyperparameter optimization.
- Assemble a pre-processed clip dataset (see Building a Dataset) and set the appropriate data paths in config.yml.
- Generate a frame dataset from the masked clips using build-dataset.py.
- Set the desired data and train configuration fields in config.yml, including setting a model type using the MODEL_DEF parameter and setting the EXPERIMENT_TYPE to 'hparam_search'.
- Set the hyperparameter search fields in the train section of config.yml.
- Set the associated hyperparameter search configuration values based on the chosen model definition.
- Run train.py.
- View all logs in the logs folder and view Bayesian hyperparameter search results in the experiments folder.
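For context, the sketch below shows what a Bayesian hyperparameter search loop looks like with scikit-optimize, the library referenced in the Hyperparameter Search section of this README. The objective function, search space, and values are illustrative assumptions rather than the project's exact setup:

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real

def objective(params):
    lr, dropout = params
    # Stand-in for "train a model with these hyperparameters and return the validation
    # metric to be minimized" (per METRIC_NAME / METRIC_GOAL in config.yml).
    return (np.log10(lr) + 4) ** 2 + (dropout - 0.3) ** 2

search_space = [
    Real(1e-5, 1e-3, prior="log-uniform", name="lr"),
    Real(0.0, 0.5, name="dropout"),
]

# n_calls plays the role of the N_EVALS config field (number of search iterations).
result = gp_minimize(objective, search_space, n_calls=20)
print(result.x, result.fun)  # best hyperparameters and best objective value
```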
With a trained model, you can compute frame predictions and clip predictions using the following steps:
- Set the MODEL_TO_LOAD field in config.yml to point to a trained model (in .h5 format).
- Set the FRAME_TABLE and CLIPS_TABLE fields to the dataset of interest. Set the FRAMES field to point to the dataset's directory of LUS frames.
- Set the CLIP_PREDICTION >> ALGORITHM field to determine which algorithm is used to compute clip-wise predictions, given the clip's set of frame predictions produced by the model. A brief description of each available algorithm follows (see the illustrative sketch after this list):
  - "contiguous": If the number of contiguous frames for which the frame's predicted B-line probability meets or exceeds the classification threshold is at least the contiguity threshold, classify the clip as "B-lines".
  - "average": Compute the average prediction probabilities across the entire clip. If the average B-line probability meets or exceeds the classification threshold, classify the clip as "B-lines".
  - "sliding_window": Take the clip's B-line probability as the greatest average B-line probability present in any contiguous set of frames as large as the sliding window, then compare it against the classification threshold.
- Execute predict.py.
- Access the frame and corresponding clip predictions as CSV files, located in results/predictions.
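The sketch below restates the three clip-classification rules in code. It is illustrative only; the function and argument names are hypothetical, not those used in predict.py:

```python
import numpy as np

def classify_clip(frame_probs, algorithm, clf_thresh, contiguity_thresh=None, window=None):
    """Return True if the clip is classified as "B-lines", given per-frame B-line probabilities."""
    p = np.asarray(frame_probs, dtype=float)
    if algorithm == "average":
        return p.mean() >= clf_thresh
    if algorithm == "contiguous":
        run = longest = 0
        for prob in p:                       # longest run of frames at or above the threshold
            run = run + 1 if prob >= clf_thresh else 0
            longest = max(longest, run)
        return longest >= contiguity_thresh
    if algorithm == "sliding_window":
        if len(p) < window:
            return p.mean() >= clf_thresh    # clip shorter than the window: fall back to the mean
        window_means = np.convolve(p, np.ones(window) / window, mode="valid")
        return window_means.max() >= clf_thresh
    raise ValueError(f"Unknown clip prediction algorithm: {algorithm}")
```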
With a trained model and a collection of frame data, you can apply a Grad-CAM visualization to individual frames.
- Set the MODEL_TO_LOAD field in config.yml to point to a trained model (in .h5 format).
- Set the FRAME_TABLE field and set the FRAMES path field to point to a directory of LUS frames.
- Run gradcam.py.
- Select the frame that you want to apply Grad-CAM to.
- View Grad-CAM results in the img/heatmaps folder.
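As background, Grad-CAM weights the final convolutional feature maps by the pooled gradients of the predicted class score to produce a heatmap over the frame. A minimal Keras-style sketch is shown below; it is illustrative only, and the layer name, preprocessing, and overlay steps in gradcam.py may differ:

```python
import numpy as np
import tensorflow as tf

def gradcam_heatmap(model, img, conv_layer_name):
    """Illustrative Grad-CAM sketch: pooled gradients of the top class score weight the
    chosen convolutional layer's feature maps, followed by ReLU and normalization."""
    grad_model = tf.keras.Model(model.input,
                                [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(img[np.newaxis, ...])   # add a batch dimension
        class_score = tf.reduce_max(preds[0])                # score of the predicted class
    grads = tape.gradient(class_score, conv_out)             # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))          # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)      # weighted sum over channels
    heat = tf.nn.relu(cam)
    return heat / (tf.reduce_max(heat) + 1e-8)               # heatmap normalized to [0, 1]
```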
This project contains several configurable variables that are defined in the project config file: config.yml. When loaded into Python scripts, the contents of this file become a dictionary through which the developer can easily access its members.
For user convenience, the config file is organized into major components of the model development pipeline. Many fields need not be modified by the typical user, but others may be modified to suit the user's specific goals. A summary of the major configurable elements in this file is below.
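For example, a script might load and read the config as follows (a minimal sketch; the nested field names follow the TRAIN >> MODEL_DEF notation used in this README, and the commented values are illustrative only):

```python
import yaml

# Load the project configuration into a nested dictionary.
with open("config.yml") as f:
    cfg = yaml.safe_load(f)

model_def = cfg["TRAIN"]["MODEL_DEF"]              # e.g. 'cutoffvgg16'
experiment_type = cfg["TRAIN"]["EXPERIMENT_TYPE"]  # e.g. 'single_train'
```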
Paths
This section of the config contains all path definitions for reading data and writing outputs.
- CLIPS_TABLE: Clip table in csv format.
- FRAME_TABLE: Frame table in csv format.
- QUERY_TABLE: Raw SQL clips table output in csv format.
- DATABASE_QUERY: Table containing LUS exam IDs (unique to our setup).
- RAW_CLIPS: Location of raw clip data in mp4 format.
- MASKED_CLIPS: Location of masked clip data in mp4 format.
- FRAMES: Location of frame data in png format.
- PARTITIONS
- TEST_DF
- EXT_VAL_CLIPS_TABLE: Clip table in csv format for external dataset.
- EXT_VAL_FRAME_TABLE: Frame table in csv format for external dataset.
- EXT_VAL_FRAMES: Location of frame data in png format for external dataset.
- HEATMAPS
- LOGS
- IMAGES
- MODEL_WEIGHTS
- MODEL_TO_LOAD: Trained model in h5 file format.
- CLASS_NAME_MAP: Output class indices in pkl format.
- BATCH_PREDS
- METRICS
- EXPERIMENTS
- EXPERIMENT_VISUALIZATIONS
- PRETRAINED_WEIGHTS
- RT_ROOT_DIR
- RT_LABELBOX_ANNOTATIONS
- AUTOMASK_MODEL_PATH: Model used to mask clips.
- HOLDOUT_CLIPS_PATH: Path to store clips table of Holdout artifact.
- HOLDOUT_FRAMES_PATH: Path to store frames table of Holdout artifact.
- MODEL_DEV_CLIPS_PATH: Path to store the clips table of the ModelDev artifact.
- MODEL_DEV_FRAMES_PATH: Path to store the frames table of the ModelDev artifact.
- K_FOLDS_SPLIT_PATH: Path to store data for the KFolds artifact.
WandB
This section of the config relates specifically to the WandB MLOps tool.
- ENTITY: Name of workspace in WandB
- PROJECT_NAME: Name of project in WandB
- LOGGING: Various configuration parameters related to artifact logging.
- IMAGES: Whether to log Images artifacts.
- MODEL_DEV_HOLDOUT: Whether to log ModelDev and Holdout artifacts.
- K_FOLD_CROSS_VAL: Whether to log KFoldCrossValidation artifact.
- TRAIN_TEST_VAL: Whether to log TrainValTest artifact
- IMAGES_ARTIFACT_VERSION: Version of Images artifact to use for logging.
- MODEL_DEV_ARTIFACT_VERSION: Version of ModelDev artifact to use for logging.
- TRAIN_VAL_TEST_ARTIFACT_VERSION: Version of TrainValTest artifact to use for logging or training.
- K_FOLD_CROSS_VAL_ARTIFACT_VERSION: Version of KFoldCrossValidation artifact to use for logging or training.
- ARTIFACT_SEED: Random number generator seed used to generate an artifact. Stored in artifact metadata.
Data
- IMG_DIM: Dimensions for frame resizing.
- VAL_SPLIT: Validation split. Note this split is with reference to train, val and test data.
- TEST_SPLIT: Test split. Note this split is with reference to train, val and test data.
- K_FOLD_VALIDATION_SPLIT: Validation split for K-Fold data. Note this split is with reference to only train and val data; the test set is the held-out fold in k-fold cross-validation.
- HOLDOUT_ARTIFACT_SPLIT: Split used for splitting ModelDev data and Holdout data.
- CLASSES: A string list of data classes.
- RT_B_LINES_3_CLASS
- REAL_TIME_DATA: Flag indicating whether data is real-time or not.
- AUTOMASK: Configuration related to the automasking tool.
- VERSION: Indicates the tool and version of the automasking software.
- OUTPUT_FORMAT: Determines what file format is output by the tool.
- EDGE_PRESERVE: Represents how much of the clip edge to preserve.
- SAVE_CROPPED_ROI: Flag determining whether the cropped ROI is saved.
Train
- MODEL_DEF: Defines the type of frame model to train. One of {'vgg16', 'mobilenetv2', 'xception', 'efficientnetb7', 'custom_resnetv2', 'cutoffvgg16'}
- EXPERIMENT_TYPE: Defines the experiment type. One of {'single_train', 'cross_validation', 'hparam_search'}
- SEED
- N_CLASSES: Number of classes/labels.
- BATCH_SIZE: Batch size.
- EPOCHS: Number of epochs.
- PATIENCE: Number of epochs with no improvement after which training will be stopped.
- MIXED_PRECISION: Toggle mixed precision training. Necessary for training with Tensor Cores.
- N_FOLDS: Cross-validation folds.
- MEMORY_LIMIT: Max memory, in MB, for the virtual device configuration used for training.
- USE_PRETRAINED
- DATA_AUG: Data augmentation parameters (an illustrative sketch follows this field list).
- ZOOM_RANGE
- HORIZONTAL_FLIP
- WIDTH_SHIFT_RANGE
- HEIGHT_SHIFT_RANGE
- SHEAR_RANGE
- ROTATION_RANGE
- BRIGHTNESS_RANGE
- HPARAM_SEARCH:
- N_EVALS: Number of iterations in the hyperparameter search.
- METHOD: String identifier for method/algorithm used to perform hyperparameter search.
- METRIC_GOAL: Determines whether the metric should be minimized or maximized in the hyperparameter search.
- METRIC_NAME: String identifier for name of the metric to be optimized in the hyperparameter search.
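The DATA_AUG parameter names above mirror the arguments of Keras's ImageDataGenerator. Assuming that style of augmentation pipeline (an assumption, not necessarily the project's exact implementation), they would be applied roughly as follows; the values shown are examples, not project defaults:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative mapping of the DATA_AUG fields to Keras augmentation arguments.
train_augmenter = ImageDataGenerator(
    zoom_range=0.1,                # ZOOM_RANGE
    horizontal_flip=True,          # HORIZONTAL_FLIP
    width_shift_range=0.1,         # WIDTH_SHIFT_RANGE
    height_shift_range=0.1,        # HEIGHT_SHIFT_RANGE
    shear_range=5,                 # SHEAR_RANGE (degrees)
    rotation_range=10,             # ROTATION_RANGE (degrees)
    brightness_range=(0.8, 1.2),   # BRIGHTNESS_RANGE
)
```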
Clip prediction
- ALGORITHM: Choice of clip classification algorithm (one of "contiguous", "sliding_window" or "average")
- CLASSIFICATION_THRESHOLD: Classification threshold applied for clip classification
- CONTIGUITY_THRESHOLD: Number of contiguous frame predictions exceeding the classification threshold that constitute a clip-wise prediction for B-lines (for the "contiguous" algorithm)
- SLIDING_WINDOW: Number of contiguous frames over which the average prediction is calculated (for the "sliding_window" algorithm)
Hyperparameters
Each model type has a list of configurable hyperparameters defined here.
- MOBILENETV2
- LR
- DROPOUT
- L2_LAMBDA
- NODES_DENSE0
- FROZEN_LAYERS
- SHUFFLENETV2
- LR
- DROPOUT
- L2_LAMBDA
- VGG16
- LR
- DROPOUT
- L2_LAMBDA
- NODES_DENSE0
- FROZEN_LAYERS
- XCEPTION
- LR
- DROPOUT
- FROZEN_LAYERS
- L2_LAMBDA
- BiTR50x1
- LR
- DROPOUT
- L2_LAMBDA
- EFFICIENTNETB7
- LR
- DROPOUT
- L2_LAMBDA
- FROZEN_LAYERS
- CNN0
- LR
- DROPOUT
- L2_LAMBDA
- NODES_DENSE0
- KERNEL_SIZE
- STRIDES
- MAXPOOL_SIZE
- BLOCKS
- INIT_FILTERS
- FILTER_EXP_BASE
- CUSTOM_RESNETV2
- LR
- DROPOUT0
- DROPOUT1
- STRIDES
- BLOCKS
- INIT_FILTERS
- CUTOFFVGG16
- LR_EXTRACT
- LR_FINETUNE
- DROPOUT
- CUTOFF_LAYER
- FINETUNE_LAYER
- EXTRACT_EPOCHS
Hyperparameter Search
For each model there is a range of values that can be sampled for the hyperparameter search. The ranges are defined here in the config file. Each hyperparameter has a name, type, and range. The type dictates how samples are drawn from the range.
For more information on Bayesian hyperparameter optimization, visit the skopt documentation.
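For instance, a hyperparameter's TYPE and RANGE might translate into a scikit-optimize search dimension roughly as sketched below; the TYPE strings and the helper are hypothetical illustrations, not the project's actual identifiers:

```python
from skopt.space import Integer, Real

def to_dimension(name, hp_type, hp_range):
    """Hypothetical mapping from a config entry's TYPE/RANGE to an skopt dimension."""
    low, high = hp_range
    if hp_type == "float_log":       # sampled log-uniformly, e.g. learning rates
        return Real(low, high, prior="log-uniform", name=name)
    if hp_type == "float_uniform":   # sampled uniformly, e.g. dropout rates
        return Real(low, high, name=name)
    if hp_type == "int_uniform":     # integer-valued, e.g. number of blocks
        return Integer(low, high, name=name)
    raise ValueError(f"Unknown hyperparameter type: {hp_type}")

lr_dim = to_dimension("lr", "float_log", [1e-5, 1e-3])
```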
- MOBILENETV2
- LR
- TYPE
- RANGE
- DROPOUT
- TYPE
- RANGE
- CUTOFFVGG16
- LR_EXTRACT
- TYPE
- RANGE
- LR_FINETUNE
- TYPE
- RANGE
- DROPOUT
- TYPE
- RANGE
- EXTRACT_EPOCHS
- TYPE
- RANGE
- CUSTOM_RESNETV2
- LR
- TYPE
- RANGE
- DROPOUT0
- TYPE
- RANGE
- DROPOUT1
- TYPE
- RANGE
- BLOCKS
- TYPE
- RANGE
- INIT_FILTERS
- TYPE
- RANGE
The project has a directory structure similar to the one shown below. Disregard any .gitkeep files, as their only purpose is to force Git to track empty directories. Disregard any `__init__.py` files, as they are empty files that enable Python to recognize certain directories as packages.
├── img
| ├── experiments <- Visualizations for experiments
| ├── heatmaps <- Grad-CAM heatmap images
| └── readme <- Image assets for README.md
├── results
| ├── data
| | └── partition <- K-fold cross-validation fold partitions
| ├── experiments <- Experiment results
| ├── figures <- Generated figures
| ├── models <- Trained model output
| ├── predictions <- Prediction output
│ └── logs <- TensorBoard logs
├── src
│ ├── data
| | ├── build-dataset.py <- Builds a table of frame examples using a table of clip metadata
| | ├── database_pull.py <- Script for pulling clip mp4 files from the cloud - step 2 (specific to our setup)
| | └── query_to_df.py <- Script for pulling clip metadata from the cloud - step 1 (specific to our setup)
│ ├── explainability
| | └── gradcam.py <- Script containing gradcam application and heatmap generation
│ ├── models
| | └── models.py <- Script containing all model definitions
| ├── visualization
| | └── visualize.py <- Script for visualization production
| ├── predict.py <- Script for prediction on raw data using trained models
| ├── deploy.py <- Script for confirming preprocessing for inference on the WaveBase device
| └── train.py <- Script for training experiments
|
├── .gitignore <- Files to be ignored by git.
├── config.yml <- Values of several constants used throughout project
├── README.md <- Project description
└── requirements.txt <- Lists all dependencies and their respective versions
Robert Arntfield
Project Lead
Deep Breathe
[email protected]
Blake VanBerlo
Deep Learning Project Lead
Deep Breathe
[email protected]
