2 3-FeatureRelatedIssues
Week 2: Characterization of Learning Problems
We will discuss the first and third cases briefly and then focus
mostly on the second case.
Case 1: A reasonably well-composed set of features is given, based
on domain-theoretic considerations.
The ZOO dataset feature list:
hair, feathers, eggs, milk, airborne, aquatic, predator, toothed,
backbone, breathes, venomous, fins, legs, tail, domestic, catsize

The features reflected in the Buffalo example of conventional taxonomy:
bilateral symmetry, two body openings, embryonic spinal cord, vertebra,
jaws, # of legs, mammary glands, fur, neocortex, # of middle ear bones,
give birth to live kids, lack of epi-pubic bones, hoofed, middle to large,
chewing, cloven or hoofed

The problem in this case is
- neither a volume problem due to a large, ungraspable set of possible features
- nor a representation problem caused by data-items in non-digital form.
Still, a domain-based sanity check is relevant.
Terminological consistency and clear feature definitions are of key importance.
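A domain-based sanity check on a given feature set can be sketched in a few lines of Python. This is only an illustration: the function name `sanity_check`, the allowed leg counts, the domain rule about milk and backbone, and the example record are all assumptions of this sketch, not part of the lecture material.

```python
# Hypothetical schema for a Zoo-style record: Boolean features plus a
# numeric leg count. The allowed leg values are an assumption of this sketch.
BOOLEAN_FEATURES = [
    "hair", "feathers", "eggs", "milk", "airborne", "aquatic", "predator",
    "toothed", "backbone", "breathes", "venomous", "fins", "tail",
    "domestic", "catsize",
]
ALLOWED_LEGS = {0, 2, 4, 5, 6, 8}


def sanity_check(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name in BOOLEAN_FEATURES:
        if name not in record:
            problems.append(f"missing feature: {name}")
        elif record[name] not in (0, 1):
            problems.append(f"{name} must be 0 or 1, got {record[name]!r}")
    if record.get("legs") not in ALLOWED_LEGS:
        problems.append(f"legs has unexpected value {record.get('legs')!r}")
    # Illustrative domain rule: an animal that gives milk should have a backbone.
    if record.get("milk") == 1 and record.get("backbone") == 0:
        problems.append("milk=1 but backbone=0 contradicts the taxonomy")
    return problems


# Illustrative record for a buffalo-like animal (values chosen for this sketch).
buffalo = {
    "hair": 1, "feathers": 0, "eggs": 0, "milk": 1, "airborne": 0,
    "aquatic": 0, "predator": 0, "toothed": 1, "backbone": 1, "breathes": 1,
    "venomous": 0, "fins": 0, "legs": 4, "tail": 1, "domestic": 0,
    "catsize": 1,
}
print(sanity_check(buffalo))
```

Checks of this kind catch inconsistent encodings early, but they only work if every feature has a clear, agreed definition, which is the point made above.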
Case 3: Data-items are of a non-digital nature, and relevant features need to
be extracted from the data-items in a separate process.
Features can be derived in a variety of ways, ranging
from fully manual, via manual/automatic hybrids, to
fully automated.
Also, organizing and searching data often relies on detecting areas where
objects form groups with similar properties; in high dimensional data,
however, all objects appear to be sparse and dissimilar in many ways,
which prevents common data organization strategies from being efficient.
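The effect described above can be made visible with a small experiment. The sketch below is an illustration under stated assumptions: points are drawn uniformly from the unit hypercube, distances are Euclidean, and `relative_contrast` is a hypothetical helper name. It measures how much farther the farthest point is from a query than the nearest one; when this contrast shrinks, "near" and "far" neighbours become hard to tell apart.

```python
import numpy as np


def relative_contrast(n_points, n_dims, seed=0):
    """(max distance - min distance) / min distance from a random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))  # uniform in the unit hypercube
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# The contrast is expected to shrink as the number of dimensions grows.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, relative_contrast(500, n_dims))
```

With a low contrast, grouping strategies that depend on distance, such as nearest-neighbour search or clustering, lose much of their discriminating power.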
Over-fitting vs Under-fitting
Over-fitting is the production of a model that corresponds too closely or
exactly to a particular data-set, and may therefore fail to fit additional data or
predict future observations reliably. An over-fitted model is a model that
contains more features than can be justified by the data-set.
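A common way to illustrate this is polynomial curve fitting, where each additional power of x acts as an extra feature. The following is a minimal sketch, not a method from the lecture: the sine-shaped target, the noise level, the sample sizes, and the helper names `make_data` and `fit_and_score` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples of an assumed target function sin(pi * x) on [-1, 1]."""
    x = np.sort(rng.uniform(-1, 1, n))
    y = np.sin(np.pi * x) + rng.normal(0, 0.2, n)
    return x, y


x_train, y_train = make_data(12)   # small data-set used for fitting
x_test, y_test = make_data(200)    # additional data used only for evaluation


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse


for degree in (1, 3, 11):
    train_mse, test_mse = fit_and_score(degree)
    print(f"degree {degree:2d}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The degree-11 polynomial has as many coefficients as there are training points, so it can reproduce the training data almost exactly, yet its error on the additional data stays well above its training error. The degree-1 fit sits at the other extreme: it is too simple to follow the curve, which is the under-fitting case named in the heading.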