this, the authors proposed defining fairness over an exponentially large, or even infinite, number of sub-segments determined over the space of sensitive feature values. To this end, they proposed an algorithm that produces the distribution over classifiers that is fairest with respect to these sub-segments. The algorithm achieves this by treating sub-segment fairness as a zero-sum game between a Learner and an Auditor and by relying on a series of heuristics.
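As a rough illustration of the kind of constraint involved (a sketch rather than the authors' exact formulation), subgroup fairness of this form can be expressed by requiring that, for every sub-segment $g$ in a (possibly infinite) collection $\mathcal{G}$ defined over the sensitive features, the group-specific false positive rate stays close to the overall one, weighted by the size of the sub-segment:

\[
\Pr[g(x)=1]\,\big|\Pr[\hat{y}=1 \mid y=0,\, g(x)=1] - \Pr[\hat{y}=1 \mid y=0]\big| \le \gamma \quad \text{for all } g \in \mathcal{G},
\]

where $\gamma$ is a tolerance parameter. In the game-theoretic view, the Learner selects a classifier (or a distribution over classifiers) minimising error subject to these constraints, while the Auditor searches $\mathcal{G}$ for the most violated constraint.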
Following up on other studies demonstrating that the exclusion of sensitive features cannot fully eradicate discrimination from model decisions, Kamishima et al. [99] presented and analysed three major causes of unfairness in machine learning: prejudice, underestimation, and negative legacy. To address the issue of indirect prejudice, they developed a regulariser capable of restricting the dependence of any probabilistic discriminative model on sensitive input features. By incorporating the proposed regulariser into logistic regression classifiers, the authors demonstrated its effectiveness in purging prejudice.
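As a minimal sketch of the regularisation idea (not the exact prejudice remover of [99], which penalises a mutual-information-based quantity), the following hypothetical example adds a penalty on the dependence between the predicted probabilities and a binary sensitive feature to a standard logistic regression objective:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fair_logreg_objective(w, X, y, s, eta=1.0, lam=0.01):
    """Logistic loss plus a simple dependence penalty (illustrative sketch).

    The penalty is the squared gap between the mean predicted probability of
    the two sensitive groups; the actual prejudice remover of [99] uses a
    mutual-information-based term instead.
    """
    p = sigmoid(X @ w)
    eps = 1e-12
    log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    dependence = (p[s == 1].mean() - p[s == 0].mean()) ** 2
    return log_loss + eta * dependence + lam * np.dot(w, w)

# Toy data: X includes a bias column; s is the binary sensitive feature.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 3)), np.ones((200, 1))])
s = rng.integers(0, 2, size=200)
y = (X[:, 0] + 0.5 * s + rng.normal(scale=0.5, size=200) > 0).astype(float)

result = minimize(fair_logreg_objective, np.zeros(X.shape[1]), args=(X, y, s))
print("learned weights:", result.x)
```

The hyperparameter eta plays the role of the regulariser weight: larger values trade predictive accuracy for weaker dependence of the predictions on the sensitive feature.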
In [92], a framework for quantifying and reducing discrimination in any supervised learning model was proposed. First, an interpretable criterion for identifying discrimination against any specified sensitive feature was defined, and a method for deriving classifiers that fulfil this criterion was introduced. Using a case study, the authors demonstrated that, according to the defined criterion, the proposed method produces the Bayes optimal non-discriminating classifier, and they justified the use of postprocessing over altering the training process by measuring the loss that results from enforcing the non-discrimination criterion. Finally, the authors identified the limitations of the proposed method, showing that not all dependency structures, and not all other proposed definitions or intuitive notions of fairness, can be captured by the proposed criterion.
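One well-known criterion of this type, given here only as an illustration, requires the prediction $\hat{Y}$ to be independent of the sensitive feature $A$ conditional on the true outcome $Y$:

\[
\Pr[\hat{Y}=1 \mid Y=y,\, A=a] = \Pr[\hat{Y}=1 \mid Y=y,\, A=a'] \quad \text{for } y \in \{0,1\} \text{ and all values } a, a'.
\]

Postprocessing then amounts to choosing (possibly randomised) group-specific thresholds over an existing score so that these rates are equalised with minimal loss.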
Pleiss et al. [97], building on [92], studied the problem of producing calibrated probability scores, the end goal of many machine learning applications, while at the same time ensuring fair decisions across different demographic segments. Through experimentation on a diverse pool of datasets, they demonstrated that probability calibration is only compatible with fairness when fairness is pursued with respect to a single error constraint, and they concluded that maintaining both fairness and calibrated probabilities, although desirable, is often nearly impossible to achieve in practice. For the compatible cases, a simple postprocessing technique was proposed that calibrates the output scores while maintaining fairness by withholding the predictive information of randomly chosen inputs.
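A hypothetical sketch of this kind of postprocessing (not the exact procedure of [97]) is to leave one group's calibrated scores untouched and, for a randomly chosen fraction of the other group's inputs, replace the score with that group's base rate; the replacement remains calibrated in expectation but carries no individual information, shifting the group's generalised error rates:

```python
import numpy as np

def withhold_information(scores, labels, withhold_frac, seed=None):
    """Replace a random fraction of a group's calibrated scores with the
    group's base rate (illustrative sketch of information withholding)."""
    rng = np.random.default_rng(seed)
    adjusted = scores.copy()
    base_rate = labels.mean()
    mask = rng.random(len(scores)) < withhold_frac
    adjusted[mask] = base_rate
    return adjusted

# Example: suppress individual information for 30% of one group's inputs.
rng = np.random.default_rng(1)
group_scores = rng.uniform(size=1000)
# Labels drawn so that the scores are calibrated by construction.
group_labels = (rng.uniform(size=1000) < group_scores).astype(float)
adjusted_scores = withhold_information(group_scores, group_labels,
                                       withhold_frac=0.3, seed=2)
print("mean score before/after:", group_scores.mean(), adjusted_scores.mean())
```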
Celis et al. [98] highlighted that, although recent studies have made efforts to achieve fairness with respect to particular metrics, some important metrics have been ignored, while some of the proposed algorithms lack a solid theoretical foundation. To address these concerns, they developed a meta-classifier with strong theoretical guarantees that can handle multiple fairness constraints with respect to multiple non-disjoint sensitive features, thus enabling the adoption of fairness metrics that could not previously be handled.
In [94], a new metric was introduced for evaluating decision boundary fairness simultaneously in terms of disparate treatment and disparate impact, with respect to one or more sensitive features. Utilising this metric, the authors designed a framework comprising two contrasting formulations: the first optimises for accuracy subject to fairness constraints, while the second optimises for fairness subject to accuracy constraints. The proposed formulations were implemented for logistic regression and support vector machines and evaluated on real-world data, showing that they offer fine-grained control over the tradeoff between the degree of fairness and predictive accuracy.
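As a minimal sketch of the first formulation, assuming the fairness metric is the covariance between the sensitive feature and the signed distance to the decision boundary, logistic regression can be trained for accuracy subject to a bound on that covariance (the variable names and thresholds below are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    p = sigmoid(X @ w)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def boundary_covariance(w, X, s):
    # Covariance between the sensitive feature and the signed distance
    # of each sample to the decision boundary.
    return np.mean((s - s.mean()) * (X @ w))

# Toy data: X includes a bias column; s is a binary sensitive feature.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(300, 3)), np.ones((300, 1))])
s = rng.integers(0, 2, size=300)
y = (X[:, 0] + 0.8 * s + rng.normal(scale=0.5, size=300) > 0).astype(float)

c = 0.05  # illustrative fairness threshold on the covariance
constraints = [
    {"type": "ineq", "fun": lambda w: c - boundary_covariance(w, X, s)},
    {"type": "ineq", "fun": lambda w: c + boundary_covariance(w, X, s)},
]
result = minimize(log_loss, np.zeros(X.shape[1]), args=(X, y),
                  constraints=constraints, method="SLSQP")
print("constrained weights:", result.x)
```

The mirrored formulation would instead minimise the absolute covariance subject to a bound on the loss.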
Following up on their previous work [94], Zafar et al. [93] introduced a novel notion of unfairness, defined through misclassification rates, called disparate mistreatment. Subsequently, they proposed intuitive ways of measuring disparate mistreatment in classifiers that rely on decision boundaries to make decisions. By experimenting on both synthetic and real-world data, they demonstrated how easily the proposed