AutoML Tools
Azure AutoML
Accepts only CSV or TSV data; works for structured data only
Then choose the type of prediction task: classification, regression, or forecasting
Then choose the metric: accuracy, AUC, norm_macro_recall, avg_precision_score, or
precision_score
Then set the training time and the number of iterations
Then choose the validation type: k-fold cross-validation or Monte Carlo cross-validation
Then choose which algorithms to block; the algorithms listed include logistic regression, SGD,
naive Bayes, SVM, KNN, decision trees, random forest, gradient boosting, etc.
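The two validation types above differ in how rows are assigned to the validation set. K-fold cross-validation uses every row for validation exactly once. Monte Carlo cross-validation draws a fresh random split in each round, so a row may be validated several times or never. The plain-Python sketch below illustrates the difference; it is not Azure's implementation, and the function names, the fixed seed and the default 20% validation fraction are assumptions chosen for the example.

```python
import random
from typing import Iterator, List, Tuple

# Illustrative only: hypothetical helpers, not part of any Azure SDK.
Split = Tuple[List[int], List[int]]  # (train row indices, validation row indices)


def k_fold_splits(n: int, k: int, seed: int = 0) -> Iterator[Split]:
    """Shuffle once, cut into k near-equal folds; each fold is validation once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds get one extra row so that every row is used.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, valid
        start += size


def monte_carlo_splits(n: int, rounds: int, valid_frac: float = 0.2,
                       seed: int = 0) -> Iterator[Split]:
    """Draw an independent random train/validation split in every round.

    Unlike k-fold, a row may land in the validation set in several rounds or in none.
    """
    rng = random.Random(seed)
    n_valid = max(1, round(n * valid_frac))
    for _ in range(rounds):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_valid:], idx[:n_valid]


folds = list(k_fold_splits(10, k=3))
print([len(valid) for _, valid in folds])  # [4, 3, 3]
```

With 10 rows and k=3 the fold sizes come out as 4, 3 and 3, and the three validation folds together cover all 10 rows; the Monte Carlo splitter gives no such coverage guarantee.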
Can be deployed; deployment sets up an HTTPS endpoint that can be called via API for inference
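Once a model is deployed, inference is an ordinary HTTPS POST to its endpoint. The sketch below only builds such a request with Python's standard library and does not send it. The URL, the key, the bearer-token header and the {"data": [...]} body shape are all assumptions, since the exact request schema depends on how the model was deployed.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_scoring_request(endpoint_url: str, api_key: str,
                          rows: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a POST request to a deployed model endpoint.

    Assumption: the endpoint accepts key-based bearer auth and a JSON body of
    the form {"data": [...]}; check the deployment for the real schema.
    """
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Hypothetical URL, key and feature names.
req = build_scoring_request("https://example.invalid/score", "YOUR-KEY",
                            [{"age": 42, "income": 55000}])
print(req.get_method(), req.full_url)  # POST https://example.invalid/score
# Sending requires a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```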
Can generate feature importance; a validation dataset (X_valid) must be provided to get
feature importance. For documentation, refer to: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-auto-train#explain-the-model-interpretability
Feature importance can be accessed via the command line or the Azure portal
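One way to see why held-out validation data is needed for feature importance is permutation importance: shuffle one feature column of the validation data at a time and measure how much the score drops. This is a generic, model-agnostic technique swapped in for illustration, not necessarily the method Azure's explanation feature uses; the function names, the accuracy metric and the toy model below are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[List[float]]], List[int]],
                           X_valid: List[List[float]], y_valid: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in validation accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y_valid, predict(X_valid))
    importances = []
    for j in range(len(X_valid[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X_valid]
            rng.shuffle(column)
            # Build new rows so the caller's X_valid is left unmodified.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_valid, column)]
            drops.append(baseline - accuracy(y_valid, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(rows: List[List[float]]) -> List[int]:
    """Hypothetical classifier that only looks at column 0."""
    return [1 if r[0] > 0.5 else 0 for r in rows]


X_valid = [[i / 10, float((7 * i) % 3)] for i in range(10)]
y_valid = [1 if row[0] > 0.5 else 0 for row in X_valid]
print(permutation_importance(toy_model, X_valid, y_valid))
```

Because the toy model ignores the second column, that column's importance is exactly 0, while shuffling the first column lowers validation accuracy and so yields a positive importance.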
Reference link for understanding AutoML: https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-automated-ml
GCP AutoML
Works for
o vision, including images and video
o natural language, revealing the structure and meaning of text, and translation
o structured data
The following use cases are supported:
o Natural language classification
o Natural language entity extraction
o Natural language sentiment analysis
o Tables
o Translation
o Video intelligence classification
o Video object tracking
o Vision classification
o Vision edge
o Vision object detection
Has a manual/automated labelling service
Models can be deployed and used via a REST API for inference
Feature importance can be viewed
Evaluation metrics can be accessed via either the GCP console or the command line
AWS SageMaker
Can be deployed; deployment sets up an HTTPS endpoint that can be called via API for inference
Can be used for
o Regression
o Classification
o Time series forecasting
o Recommendations
Feature importance can be accessed via command line
The following algorithms are implemented in AWS SageMaker:
o BlazingText Word2Vec - A BlazingText implementation of the Word2Vec algorithm for
scaling and accelerating the generation of word embeddings from a large number of
documents.
o DeepAR - An algorithm that generates accurate forecasts by learning patterns from
many related time-series using recurrent neural networks (RNN).
o Factorization Machines - A model with the ability to estimate all of the
interactions between features even with a very small amount of data.
o Gradient Boosted Trees (XGBoost) - Short for “Extreme Gradient Boosting”, XGBoost
is an optimized distributed gradient boosting library.
o Image Classification (ResNet) - A popular neural network for developing image
classification systems.
o IP Insights - An algorithm to detect malicious users or learn the usage patterns of IP
addresses.
o K-Means Clustering - One of the simplest ML algorithms. It’s used to find groups
within unlabeled data.
o K-Nearest Neighbor (k-NN) - An index-based algorithm to address classification and
regression problems.
o Latent Dirichlet Allocation (LDA) - A model that is well suited to automatically
discovering the main topics present in a set of text files.
o Linear Learner (Classification) - Linear classification uses an object’s characteristics
to identify the appropriate group that it belongs to.
o Linear Learner (Regression) - Linear regression predicts a numeric target as a linear
function of one or more input features.
o Neural Topic Modelling (NTM) - A neural network based approach for learning topics
from text and image datasets.
o Object2Vec - A neural-embedding algorithm to compute nearest neighbors and to
visualize natural clusters.
o Object Detection - Detects, classifies, and places bounding boxes around
multiple objects in an image.
o Principal Component Analysis (PCA) - Often used in data pre-processing, this
algorithm takes a table or matrix of many features and reduces it to a smaller
number of representative features.
o Random Cut Forest - An unsupervised machine learning algorithm for anomaly
detection.
o Semantic Segmentation - Partitions an image to identify places of interest by
assigning a label to the individual pixels of the image.
o Sequence2Sequence - A general-purpose encoder-decoder for text that is often
used for machine translation, text summarization, etc.
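The Factorization Machines entry claims the model can estimate all feature interactions even with little data. In the standard second-order formulation, the weight of each pair x_i * x_j is not learned independently. Instead it is the dot product of two low-dimensional latent vectors, so a pair that rarely or never co-occurs still gets a weight from what each feature learned elsewhere. Below is a minimal scoring sketch (prediction only, no training) assuming that standard formulation; it is not SageMaker's implementation, and the function names and example numbers are made up.

```python
import math
from typing import Sequence


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: Sequence[Sequence[float]]) -> float:
    """Second-order factorization machine score for one input row.

    V[i] is the length-k latent vector of feature i; the weight of the
    interaction x_i * x_j is the dot product <V[i], V[j]>.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    # O(k*n) identity: sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: Sequence[Sequence[float]]) -> float:
    """Same score via the explicit O(k*n^2) double loop over feature pairs."""
    total = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


# Hypothetical parameters: 3 features, latent dimension k = 2.
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]]
print(math.isclose(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V)))  # True
```

The naive loop is included only to check the faster identity; the identity is what makes scoring linear in the number of features, which matters when inputs are wide and sparse.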