1. Classification
Classification – Logistic Regression
Agenda
In this session, we will discuss:
• Introduction to classification
• Linear regression vs. Logistic regression
• A brief overview of prediction
• Introduction to Logistic regression
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Classification
Supervised learning…class labels!
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Classification
• Qualitative variables (nominal) take values in an unordered set of classes C, such as eye color ∈ {brown, blue, green}.
• Given a feature vector X and a qualitative response Y taking values in the set C, the classification task is to build a function C(X) that takes as input the feature vector X and predicts its value for Y; i.e., C(X) ∈ C.
• Classifiers can be non-probabilistic or probabilistic.
• Often, we are more interested in estimating the probabilities that X belongs to each category in C (see the sketch below).
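A minimal sketch (not from the slides) contrasting a hard classifier C(X) with per-class probability estimates; the synthetic data and the scikit-learn model are illustrative stand-ins for a real feature vector X and response Y.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data standing in for real features X and labels Y.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

print(clf.predict(X[:3]))        # hard predictions: C(x) is a class label
print(clf.predict_proba(X[:3]))  # estimated Pr(Y = k | X = x) for every class k in C
```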
Example: Credit Card Default
Group-level dispersion vs. income and balance
[Figure: balance (0 to 2500) and income (0 to 60000) for the two default groups (No/Yes).]
Can we use Linear Regression?
Suppose for the Default classification task that we code
Y = 1 if No, 2 if Yes.
Linear vs. Logistic Regression
[Figure: estimated probability of default versus balance, comparing a linear regression fit with a logistic regression fit; the probability axis runs from 0.0 to 1.0.]
Linear vs. Logistic Regression
Sigmoid curve
[Figure: probability of default versus balance (0 to 2500); the logistic regression fit follows an S-shaped (sigmoid) curve between 0 and 1.]
Logistic regression ensures that our estimate for p(X) lies between 0 and 1.
Logistic Regression
Logistic regression uses the form
p(X) = e^(β0 + β1X) / (1 + e^(β0 + β1X)).
It is easy to see that no matter what values β0, β1, or X take, p(X) will have values between 0 and 1.
A bit of rearrangement gives the log odds (logit):
log( p(X) / (1 − p(X)) ) = β0 + β1X.
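The formula above is easy to check numerically; here is a small sketch (with arbitrary illustrative coefficients, not fitted values) verifying that p(X) stays in (0, 1) and that the rearrangement recovers the linear term.

```python
import numpy as np

def p_of_x(x, beta0, beta1):
    eta = beta0 + beta1 * x
    return np.exp(eta) / (1.0 + np.exp(eta))

x = np.linspace(-50, 50, 5)
probs = p_of_x(x, beta0=-1.0, beta1=0.3)   # illustrative coefficients
print(probs)                                # every value is strictly between 0 and 1
print(np.log(probs / (1 - probs)))          # equals -1.0 + 0.3 * x, the log odds
```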
Maximum Likelihood
We use maximum likelihood to estimate the parameters. The likelihood (our "cost function") is
ℓ(β0, β1) = Π_{i: yi = 1} p(xi) · Π_{i': yi' = 0} (1 − p(xi')).
This likelihood gives the probability of the observed zeros and ones in the data. We pick β0 and β1 to maximize the likelihood of the observed data.
Maximum likelihood estimates: differentiate the log-likelihood with respect to the parameters, set the derivatives equal to zero, and solve. Another approach: gradient descent (see the sketch below).
Most statistical packages can fit linear logistic regression models by maximum likelihood.
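As a hedged illustration of the gradient-based alternative mentioned above, here is a minimal sketch that climbs the log-likelihood by gradient ascent on synthetic data; the data, learning rate, and iteration count are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
true_b0, true_b1 = -1.0, 2.0                                   # assumed "true" parameters
y = rng.binomial(1, 1 / (1 + np.exp(-(true_b0 + true_b1 * x))))

b0, b1, lr = 0.0, 0.0, 0.1
for _ in range(10_000):
    p = 1 / (1 + np.exp(-(b0 + b1 * x)))                       # current p(x_i)
    # Gradient of the average log-likelihood sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)] / n
    b0 += lr * np.mean(y - p)
    b1 += lr * np.mean((y - p) * x)

print(b0, b1)   # should be close to the maximum likelihood estimates
```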
Making Predictions
What is our estimated probability of default for someone with a balance of $1000?
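The slide's coefficient table is not reproduced in this text, so the sketch below plugs hypothetical coefficient values into p(X) purely to show the mechanics of the calculation.

```python
import numpy as np

beta0_hat, beta1_hat = -10.0, 0.005       # hypothetical fitted coefficients, not the slide's estimates
balance = 1000.0

eta = beta0_hat + beta1_hat * balance
p_default = np.exp(eta) / (1 + np.exp(eta))
print(p_default)                          # estimated Pr(default = Yes | balance = 1000)
```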
Making Predictions
Let's do it again, using student as the predictor.
Logistic Regression with Several Variables
…analogous to multiple linear regression models? The log odds is now a linear function of several predictors:
log( p(X) / (1 − p(X)) ) = β0 + β1X1 + ⋯ + βpXp.
The coefficient for student is now negative (it was positive before).
Logistic Regressions: Confounding
[Figure: default rate versus credit card balance (0 to 2000) for students and non-students, and boxplots of balance by student status (No/Yes).]
• Students tend to have higher balances than non-students, so their marginal default rate
is higher than for non-students.
• But for each level of balance, students default less than non-students.
• Multiple logistic regression can tease this out (see the sketch below).
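A sketch of how the two fits could be compared in practice. It assumes the Default data are available as a CSV with columns default, student, and balance; the file name and column names are assumptions, not given in the slides.

```python
import pandas as pd
import statsmodels.formula.api as smf

default_df = pd.read_csv("Default.csv")                    # hypothetical file name
default_df["default01"] = (default_df["default"] == "Yes").astype(int)

# Student alone: its coefficient tends to be positive (students default more overall).
m1 = smf.logit("default01 ~ student", data=default_df).fit()

# Student together with balance: the student coefficient can flip sign,
# because at any fixed balance students actually default less.
m2 = smf.logit("default01 ~ student + balance", data=default_df).fit()

print(m1.params)
print(m2.params)
```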
Logistic Regression with more than Two Classes
Logistic regression is easily generalized to more than two classes (multinomial regression). Two major options:
• "Classic": use a baseline class (K − 1 linear functions). Parameter estimates are given relative to the Kth (baseline) class. The choice of baseline is not important.
• Softmax: no baseline class; a linear function for each class (the parameterization implemented in glmnet). See the sketch below.
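A minimal numerical sketch of the softmax option: K linear functions evaluated at a feature vector, exponentiated, and normalized so the class probabilities sum to one. All numbers are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0])                       # a feature vector (illustrative)
B = np.array([[0.5, -0.2],                     # one row of coefficients per class k
              [0.1,  0.3],
              [-0.4, 0.6]])
b0 = np.array([0.0, 1.0, -1.0])                # one intercept per class

scores = b0 + B @ x                            # K linear functions evaluated at x
probs = np.exp(scores) / np.exp(scores).sum()  # softmax: Pr(Y = k | X = x)
print(probs, probs.sum())                      # probabilities are positive and sum to 1
```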
Summary
Here is a quick recap:
• We discussed classification using the example of credit card default.
• We discussed the reason for preferring Logistic regression over Linear regression: linear regression can produce probabilities less than zero or greater than one, so Logistic regression is more appropriate for modeling.
• We talked about making predictions, first using balance and then using student status as the predictor.
• We discussed how Logistic regression is easily generalized to more than two classes (multinomial regression) and ensures that our estimate for p(X) lies between 0 and 1.
Classification – Discriminant Analysis
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Agenda
In this session, we will cover:
• Introduction to discriminant analysis
• Conditional Probabilities
• Introduction to Bayes' theorem
• Linear Discriminant Analysis for different values of p
• Types of errors
• ROC Curve
• Limitations of LDA
Why Discriminant Analysis?
• When the classes are well-separated, parameter estimates for the logistic regression model are unreliable. Linear discriminant analysis does not suffer from this problem.
• If n is small and the distribution of the predictors X is approximately normal in each of the classes, the linear discriminant model is more stable than the logistic regression model.
• Linear discriminant analysis is popular when we have more than two classes because it also provides low-dimensional views of the data.
Discriminant Analysis
Goal:
Model the distribution of X in each of the classes separately. Then, use Bayes' theorem to obtain Pr(Y|X).
When we use normal (Gaussian) distributions for each class, this leads to linear or quadratic discriminant analysis (other distributions can be used as well).
Conditional Probabilities
[Figure: illustration of conditional probability for two events, F and H.]
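The slide's figure is not reproduced here. For reference, the conditional probability of one event F given another event H (the events labelled on the slide) is defined as
P(F | H) = P(F ∩ H) / P(H), provided P(H) > 0.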
Bayes’ Rule
[email protected]
DLZNK464L9
B
A
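The figure is likewise not reproduced. For reference, Bayes' rule for the two events A and B labelled on the slide is
P(A | B) = P(B | A) P(A) / P(B),
which combines the likelihood P(B | A) with the prior P(A) to give the posterior P(A | B); these are the three concepts on the next slide.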
Bayes’ Rule
Concepts:
• Likelihood
- How well does a certain hypothesis explain the data?
• Prior
- What do we believe before seeing any data?
• Posterior
- What do we believe after seeing the data?
Bayes’ Theorem for Classification
Bayes' theorem:
Pr(Y = k|X = x) = πk fk(x) / Σ_{l=1}^{K} πl fl(x),
where fk(x) = Pr(X = x|Y = k) is the density of X in class k, and πk = Pr(Y = k) is the prior probability of class k.
Classify to the Highest Density
Bayes' boundaries give the smallest error.
[Figure: two class densities with priors π1 = .5, π2 = .5 (left) and π1 = .3, π2 = .7 (right), with the Bayes decision boundary in each case.]
Linear Discriminant Analysis when p = 1
Again, we're assuming normal distributions for each class:
fk(x) = (1 / (√(2π) σk)) exp(−(x − µk)² / (2σk²))
• Here µk is the mean, and σk² the variance (in class k). We will assume that all the σk = σ are the same.
• Plugging this into Bayes' formula, we get the following expression for pk(x) = Pr(Y = k|X = x):
pk(x) = [ πk · (1/(√(2π)σ)) exp(−(x − µk)²/(2σ²)) ] / [ Σ_{l=1}^{K} πl · (1/(√(2π)σ)) exp(−(x − µl)²/(2σ²)) ]
Here πk is the prior, the Gaussian factor describes the kth group, and the denominator is the marginal probability of x.
Discriminant Functions (Bayes classifier)
To classify at the value X = x, we need to see which of the pk(x) is largest. Taking logs, and discarding terms that do not depend on k, we see that this is equivalent to assigning x to the class with the largest discriminant score:
δk(x) = x · µk/σ² − µk²/(2σ²) + log(πk)
Note that δk(x) is a linear function of x.
If there are K = 2 classes and π1 = π2 = 0.5, then one can see that the decision boundary is at
x = (µ1 + µ2)/2
(see the numerical sketch below).
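A minimal numerical sketch of the p = 1 discriminant score above; the class means, shared variance, and priors are made-up illustrative values, not estimates from the slides.

```python
import numpy as np

mu = np.array([-1.25, 1.25])       # class means mu_1, mu_2 (illustrative)
sigma = 1.0                        # shared standard deviation
pi = np.array([0.5, 0.5])          # priors pi_1, pi_2

def delta(x):
    # delta_k(x) = x * mu_k / sigma^2 - mu_k^2 / (2 sigma^2) + log(pi_k)
    return x * mu / sigma**2 - mu**2 / (2 * sigma**2) + np.log(pi)

x = 0.3
print(delta(x), "-> class", np.argmax(delta(x)) + 1)
# With equal priors the boundary is at (mu_1 + mu_2) / 2 = 0, so x = 0.3 goes to class 2.
```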
Discriminant Functions (Bayes’ classifier)
[Figure: the two class distributions with the LDA decision boundary.]
LDA estimates:
- Label-specific means
- Shared variance (a weighted average of the sample variances for each class)
Linear Discriminant Analysis when p > 1
[Figure: contours of bivariate Gaussian densities, one panel with Cor(x1, x2) = 0 and Var(x1) = Var(x2), the other with Cor(x1, x2) ≠ 0 and Var(x1) ≠ Var(x2).]
Density (with vector of mean values µ and covariance matrix Σ, shared across labels):
f(x) = (1 / ((2π)^(p/2) |Σ|^(1/2))) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ))
Discriminant function:
δk(x) = xᵀ Σ⁻¹ µk − ½ µkᵀ Σ⁻¹ µk + log πk
Despite its complex form, δk(x) is a linear function of x.
Illustration: p = 2 and K = 3 classes
[Figure: ellipses containing 95% of the probability for each of three classes, with the Bayes decision boundaries (dashed), alongside 20 simulated observations per class in (X1, X2) with the LDA decision boundaries.]
Here π1 = π2 = π3 = 1/3.
The dashed lines are known as the Bayes’ decision boundaries. In this
case, they yield the fewest misclassification errors, among all possible
classifiers.
From δk(x) to Probabilities
Once we have estimates δ̂k(x), we can turn these into estimates for class probabilities:
P̂r(Y = k|X = x) = exp(δ̂k(x)) / Σ_{l=1}^{K} exp(δ̂l(x)).
LDA on Credit Data
[email protected]
DLZNK464L9
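The slide's output table is not reproduced in this text. Below is a sketch of how LDA could be fit to the Default data with scikit-learn and summarized with a confusion matrix; the file name and column names are assumptions, not given in the slides.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

default_df = pd.read_csv("Default.csv")                        # hypothetical file name
X = pd.DataFrame({
    "balance": default_df["balance"],
    "student": (default_df["student"] == "Yes").astype(int),
})
y = (default_df["default"] == "Yes").astype(int)

lda = LinearDiscriminantAnalysis().fit(X, y)
pred = lda.predict(X)                          # classifies to Yes when Pr(Yes | X) >= 0.5
print(confusion_matrix(y, pred))               # rows: true No/Yes, columns: predicted No/Yes
```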
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
A Summary of the Types of Errors (Confusion Matrix)
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Types of Errors
False positive rate: the fraction of true negative examples that are classified as positive (0.2% in this example).
False negative rate: the fraction of true positive examples that are classified as negative (75.7% in this example).
We produced this table by classifying to class Yes if
P̂r(Default = Yes | Balance, Student) ≥ 0.5.
We can change the two error rates by changing the threshold from 0.5 to some other value in [0, 1]:
P̂r(Default = Yes | Balance, Student) ≥ threshold,
and vary the threshold (a small sketch follows).
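A small self-contained sketch of the threshold trade-off, using synthetic labels and probabilities rather than the Default data.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.1, size=1000)                          # 1 = Yes (default), 0 = No
p_hat = 0.1 * y + rng.uniform(0, 0.6, 1000)                  # fake fitted probabilities

for threshold in (0.5, 0.3, 0.1):
    pred = (p_hat >= threshold).astype(int)
    fpr = np.mean(pred[y == 0] == 1)                         # false positive rate
    fnr = np.mean(pred[y == 1] == 0)                         # false negative rate
    print(threshold, round(fpr, 3), round(fnr, 3))           # lower threshold: lower FNR, higher FPR
```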
Varying the threshold
[Figure: overall error rate, false positive rate (the fraction of true negatives classified as positive), and false negative rate as a function of the classification threshold.]
In order to reduce the false negative rate, we may want to reduce the threshold
to 0.1 or less.
ROC Curve
[Figure: ROC curve (true positive rate versus false positive rate); the diagonal line corresponds to no association between predictor(s) and response.]
LDA will also fail if discriminatory information is not in the mean but in the variance of the data.
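A sketch of computing the ROC curve and the area under it with scikit-learn; the labels and scores below are synthetic placeholders for true responses and fitted probabilities (e.g., from predict_proba).

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(2)
y = rng.binomial(1, 0.3, size=500)                         # synthetic labels
p_hat = np.clip(0.5 * y + rng.uniform(0, 0.7, 500), 0, 1)  # synthetic scores

fpr, tpr, thresholds = roc_curve(y, p_hat)                 # one point per threshold
print(roc_auc_score(y, p_hat))                             # 0.5 = no association, 1.0 = perfect
```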
Summary
Here is a quick recap:
• We discussed why we use discriminant analysis when the classes are well-separated and parameter estimates for the logistic regression model are unreliable.
• We reviewed conditional probabilities using an example.
• We worked through Bayes' theorem mathematically and understood how it works.
• We covered Linear Discriminant Analysis for different values of p and K, and saw that the Bayes decision boundaries yield the fewest misclassification errors among all possible classifiers.
• We talked about various types of errors, such as the false positive rate and the true positive rate.
• We looked graphically at varying the classification threshold to reduce the false negative rate, and examined the ROC curve of true positive rate versus false positive rate.
• We discussed the limitations of LDA, such as its failure when the discriminatory information is not in the mean but in the variance of the data.
[email protected]
DLZNK464L9 Extensions of the LDA
Agenda
In this session, we will discuss:
• Other forms of Discriminant Analysis
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Other forms of Discriminant Analysis
Pr(Y = k|X = x) = πk fk(x) / Σ_{l=1}^{K} πl fl(x)
LDA: when the fk(x) are Gaussian densities with the same covariance matrix Σ in each class.
By altering the forms for fk(x), we get different classifiers.
• With Gaussians but different Σk in each class, we get quadratic discriminant analysis (QDA); its discriminant score is given below.
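For reference (the slide itself does not show it), the QDA discriminant score, which is quadratic in x because each class has its own Σk, is
δk(x) = −½ (x − µk)ᵀ Σk⁻¹ (x − µk) − ½ log|Σk| + log πk.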
Quadratic Discriminant Analysis
[Figure: Bayes, LDA, and QDA decision boundaries for two Gaussian classes; in one panel both classes have correlation r = 0.7, in the other the correlations are r = 0.7 and r = −0.7.]
Naïve Bayes
Assumes features are independent in each class.
Useful when p is large, and so multivariate methods like QDA and even LDA break down (curse of dimensionality!).
• Gaussian naïve Bayes assumes each Σk is diagonal, so the class density factors as
fk(x) = Π_{j=1}^{p} fkj(xj),
where each fkj is a one-dimensional Gaussian density (see the sketch below).
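A minimal scikit-learn sketch of Gaussian naïve Bayes on synthetic data: one Gaussian per feature and per class, i.e., a diagonal covariance matrix within each class.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)

nb = GaussianNB().fit(X, y)
print(nb.theta_.shape)           # per-class, per-feature means: (n_classes, n_features)
print(nb.predict_proba(X[:3]))   # class probabilities from the factored densities
```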
[email protected]
DLZNK464L9
Sharing or publishing the contents in part or full is liable for legal action.
Summary
Here is a quick recap:
• We discussed other forms of Discriminant Analysis, such as quadratic discriminant analysis and the naïve Bayes classifier, and examined both graphically and mathematically.
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
[email protected]
DLZNK464L9 KNN
Agenda
In this session, we will discuss:
• Introduction to K-nearest Neighbors (KNN)
• Distance metrics
• Advantages and limitations of KNN
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
KNN
• K-nearest neighbors (KNN)
• Very popular because it is very simple and has excellent empirical performance
• Handles both binary and multi-class data
• Makes no assumptions about the parametric form of the decision boundary:
• A non-parametric method
KNN
• Does not have a training phase – just store the training data and do the computation when it is time to classify.
• To classify a new point, find its K nearest training examples and take a majority vote among these K neighbors (or, for regression, average their responses); see the sketch below.
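A minimal scikit-learn sketch of the procedure described above; the synthetic data and the choice K = 5 are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5 nearest neighbors
knn.fit(X, y)                               # "fitting" only stores the training set
print(knn.predict(X[:5]))                   # majority vote among the 5 neighbors
print(knn.predict_proba(X[:5]))             # neighbor vote fractions
```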
K-nearest Neighbor
What value of k should we use?
• Using only the closest example (1NN) to determine the class is subject to errors due to:
• A single atypical example
• Noise
• Pick k too large and you end up looking at neighbors that are not that close (see the cross-validation sketch below).
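One common way to balance these two failure modes is to pick K by cross-validation. A sketch on synthetic data follows; the data and the candidate grid are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

for k in (1, 3, 5, 11, 25, 51):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(k, round(score, 3))   # pick the K with the best cross-validated accuracy
```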
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Similarity Metrics
Nearest neighbor methods depend on a similarity (or distance) metric.
Ideas?
Euclidean distance.
For a binary instance space, the natural metric is Hamming distance (the number of feature values that differ).
For text, cosine similarity of tf-idf weighted vectors is typically most effective.
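Illustrative computations of the three metrics mentioned above for a pair of feature vectors.

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0, 1.0])
b = np.array([0.0, 0.0, 2.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))                     # Euclidean distance
hamming = np.sum(a != b)                                      # count of differing feature values
cosine_sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity (e.g., for tf-idf vectors)
print(euclidean, hamming, cosine_sim)
```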
Advantages and limitations of KNN
• Good
o No training is necessary.
o No feature selection necessary.
o Scales well with a large number of classes.
▪ Don't need to train n classifiers for n classes.
• Bad
o Classes can influence each other.
▪ Small changes to one class can have a ripple effect.
o Scores can be hard to convert to probabilities.
o Can be more expensive at test time.
o "Model" is all of your training examples, which can be large.
Example: K-nearest neighbors in two dimensions
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
For different values of K
[Figure: KNN decision boundaries for K = 1 and K = 100.]
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
[Figure: KNN decision boundary for K = 10.]
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.
Summary
Here is a quick recap:
• We discussed K-nearest Neighbors, a simple method with excellent empirical performance that handles both binary and multi-class data.
• We briefly covered various distance metrics, such as Euclidean and Hamming distance.
• We discussed the advantages and limitations of KNN: for example, no training is required, but classifying new points at test time can be more expensive.
• We talked about KNN with different values of k and examined the effect on test data graphically.
[email protected]
DLZNK464L9
Proprietary
Thiscontent.
file is ©University
meant forofpersonal
Arizona. All use
Rightsby
Reserved. Unauthorized use or distributiononly.
[email protected] prohibited.
Sharing or publishing the contents in part or full is liable for legal action.