Data Science
AutoML
Outline
Domain Experts in DS
Fairness of Machine Learning Models (FairML)
Mention of ML Fairness in Research Papers
Thoughts?
● Group Fairness
Partitions a population into groups defined by protected
attributes (such as gender, caste, or religion) and requires some
statistical measure to be equal across groups.
● Individual Fairness
Similar individuals should be treated similarly.
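As a rough illustration of the group fairness idea, one commonly used statistical measure is the rate of positive predictions per group (often called demographic parity). The sketch below uses only the Python standard library. The function names, the toy predictions, and the group labels "A" and "B" are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

def positive_rate_by_group(predictions, groups):
    """Return the share of positive (1) predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(predictions, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: 1 = loan approved, 0 = loan rejected.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = positive_rate_by_group(predictions, groups)
gap = demographic_parity_difference(predictions, groups)
print(rates)  # positive-prediction rate per group
print(gap)    # 0.0 would mean equal rates across groups
```

A gap close to zero suggests the groups receive positive predictions at similar rates. Which measure to equalize (positive rate, error rate, and so on) depends on the application.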
ML Unfairness - Causes (Data)
● Skewed sample
● Tainted examples
● Limited features
● Sample size disparity
● Proxies
Difficulties in Ensuring an ML Algorithm Is Fair
Interpretable Machine Learning
IML Benefits
Fairness: Ensuring that predictions are unbiased and do not implicitly or explicitly
discriminate against protected groups. An interpretable model can tell you why it has
decided that a certain person should not get a loan, and it becomes easier for a human
to judge whether the decision is based on a learned demographic (e.g. racial) bias.
Privacy: Ensuring that sensitive information in the data is protected.
Reliability or Robustness: Ensuring that small changes in the input do not lead to
large changes in the prediction.
Trust: It is easier for humans to trust a system that explains its decisions compared to a
black box.
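The reliability/robustness point above can be probed empirically. The following is a minimal sketch, not a standard API: it perturbs an input by small random amounts and records the largest change in the prediction. Both toy models and the helper `max_prediction_change` are hypothetical stand-ins for a real model's predict function.

```python
import random

def max_prediction_change(predict, x, epsilon=0.01, trials=100, seed=0):
    """Largest change in predict(x) over random perturbations of size <= epsilon."""
    rng = random.Random(seed)
    base = predict(x)
    worst = 0.0
    for _ in range(trials):
        perturbed = [v + rng.uniform(-epsilon, epsilon) for v in x]
        worst = max(worst, abs(predict(perturbed) - base))
    return worst

# Hypothetical models: a smooth linear score and a hard threshold rule.
def smooth_model(x):
    return 0.5 * x[0] + 0.2 * x[1]

def brittle_model(x):
    return 1.0 if x[0] > 1.0 else 0.0

x = [1.0, 2.0]  # sits exactly on the brittle model's decision boundary
smooth_change = max_prediction_change(smooth_model, x)
brittle_change = max_prediction_change(brittle_model, x)
print(smooth_change, brittle_change)
```

The smooth model's output moves only slightly, while the threshold rule flips its decision for inputs near the boundary. Random probing like this can reveal instability but cannot prove its absence.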
IML Architecture
Preferred Explaining - Model Interpretation
Way to go…
Explainability and Fairness - Just one `pip` away
lime - https://github.com/marcotcr/lime
shap - https://github.com/slundberg/shap
import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
eli5 - https://github.com/TeamHG-Memex/eli5
scikit-lego - https://github.com/koaning/scikit-lego
from sklego.preprocessing import InformationFilter
from sklego.linear_model import FairClassifier
What-if Tool - https://pair-code.github.io/what-if-tool/
Captum - https://github.com/pytorch/captum
Is only the organization that holds the protected
data responsible for ensuring digital
fairness?
Of course not.
Data pre-processing
Feature engineering
Feature extraction
Feature selection
Algorithm selection
Hyperparameter optimization
Validation
As many of these steps are often beyond the abilities of non-experts, AutoML
was proposed as an artificial intelligence-based solution to the ever-growing
challenge of applying machine learning.
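To make two of the steps above concrete, here is a minimal sketch of hyperparameter optimization combined with validation, using only the standard library. It grid-searches the number of neighbours `k` for a toy 1-D nearest-neighbour classifier and scores each candidate on a held-out split. The synthetic data, the 10% label noise, and all function names are assumptions made for illustration. AutoML tools automate far more than this.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D feature)."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 > k else 0

def accuracy(train, validation, k):
    """Fraction of validation points that are classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)

def grid_search(train, validation, candidates):
    """Score every candidate k on the validation split and pick the best."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical synthetic data: label is 1 when x > 5, with 10% label noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    label = 1 if x > 5 else 0
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

train, validation = data[:150], data[150:]
best_k, scores = grid_search(train, validation, [1, 3, 5, 7, 9])
print(best_k, scores)
```

Only odd values of `k` are tried so that the majority vote can never tie. In practice the chosen configuration should also be checked on a separate test set, since the validation score used for selection is optimistically biased.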
Targets of AutoML
1) Automated data preparation and ingestion (from raw data and miscellaneous
formats)
Automated column type detection; e.g., boolean, continuous, or text
Automated column intent detection; e.g., target/label
Automated task detection; e.g., binary classification, regression, clustering.
2) Automated feature engineering
Feature selection
Feature extraction
Detection and handling of skewed data and/or missing values
3) Automated model selection
4) Hyperparameter optimization of the learning algorithm and featurization
5) Automated selection of evaluation metrics / validation procedures
6) Automated analysis of results obtained
7) User interfaces and visualizations for automated machine learning
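Automated column type detection (under target 1 above) can be sketched with a few simple heuristics. The example below is an illustrative assumption, not how any particular AutoML framework does it: it classifies a column of raw strings as boolean, continuous, or text. The token set, function names, and sample table are hypothetical.

```python
# Hypothetical set of strings treated as boolean values.
BOOLEAN_TOKENS = {"true", "false", "yes", "no"}

def _is_number(value):
    """True if the string parses as a float (note: also accepts 'nan' and 'inf')."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def detect_column_type(values):
    """Guess 'boolean', 'continuous', or 'text' for a column of raw strings."""
    # Ignore missing entries so they do not force a column to 'text'.
    cleaned = [v.strip().lower() for v in values if v is not None and v.strip() != ""]
    if not cleaned:
        return "unknown"
    if all(v in BOOLEAN_TOKENS for v in cleaned):
        return "boolean"
    if all(_is_number(v) for v in cleaned):
        return "continuous"
    return "text"

# Hypothetical raw table as it might arrive from a CSV file.
table = {
    "approved": ["yes", "no", "yes"],
    "income": ["52000", "61000.5", ""],
    "comment": ["late payer", "n/a", "ok"],
}
detected = {name: detect_column_type(column) for name, column in table.items()}
print(detected)
```

Real systems typically add further checks, for example treating low-cardinality integer columns as categorical or recognizing date formats.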
Advantages of AutoML
Increases productivity by automating repetitive tasks. This enables a data scientist to focus more on the problem rather
than the models.
Automating the ML pipeline also helps to avoid errors that might creep in manually.
Ultimately, AutoML is a step towards democratizing machine learning by making the power of ML accessible to everybody.
AutoML Frameworks
MLBox
Compatibilities:
By
Manikandan
Gmail - [email protected]
LinkedIn - www.linkedin.com/in/manikandan1191