Integration and Normalization
◻Manually
⬜Using Surveys
⬜Using Observations
⬜Based on contributions from experts
◻Using special computer peripheral equipment
(instruments and/or sensors)
⬜Bar-code readers
⬜Computer-based medical imaging systems
⬜Remote sensing instruments
⬜Atmosphere sensing instruments
◻Web-based systems
⬜Have become a very popular way to collect clients’ feedback via the web
⬜Examples are online polls and questionnaires
Data Source Classifications
⬜Internal data sources
■ Internal sources are data sources that are housed within a company
■ An organization’s internal data are about people, products, services,
and processes
⬜Personal data sources
■ Employees may document their own expertise by creating personal
data. These data are not necessarily just facts, but may include
concepts, thoughts, and opinions. They include, for example,
opinions about what competitors are likely to do, and certain rules and
formulas developed by the users
⬜External data sources
■ There are many sources for external data, ranging from commercial
databases to data collected by sensors and satellites
■ Internet and commercial database sources are also considered external
sources
Data integration
◻ For the same entity, attribute values from different sources may
differ
◻ This may be due to differences in representation, scaling, or
encoding
◻ Example: a measurement attribute may be stored in metric units (meter,
kilogram …) in one system and in British imperial units (pound, inch …) in
another
◻ Hotel prices from different sources may differ not only in currency
but also in the services and taxes they include
◻ Hence, these conflicts must be resolved before integrating data
sources
◻ These conflicts are resolved by converting values to a single unified
unit that is used across all integrated data sources
◻ Careful integration of data from multiple sources helps in reducing
and avoiding redundancy and inconsistency in the resulting data sets
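The unit-conversion step described above can be sketched in Python. This is a minimal illustration, not a prescribed method: the record layout (`id`, `weight`, `unit`) and the function names `to_kilograms` and `normalize` are assumptions made for the example. The only fixed fact used is the standard definition 1 lb = 0.45359237 kg.

```python
# Conversion factors to the unified unit (kilograms).
# 1 lb = 0.45359237 kg by definition of the international pound.
KG_PER_UNIT = {"kg": 1.0, "lb": 0.45359237}


def to_kilograms(value, unit):
    """Convert a weight to kilograms; reject units we cannot resolve."""
    if unit not in KG_PER_UNIT:
        raise ValueError(f"unknown weight unit: {unit!r}")
    return value * KG_PER_UNIT[unit]


def normalize(record):
    """Return a copy of a (hypothetical) record with its weight in kg."""
    return {
        "id": record["id"],
        "weight": to_kilograms(record["weight"], record["unit"]),
        "unit": "kg",
    }


# Two hypothetical sources describing the same entity in different units.
source_a = [{"id": "P1", "weight": 2.0, "unit": "kg"}]
source_b = [{"id": "P1", "weight": 10.0, "unit": "lb"}]

# Convert every record before merging, so all weights are comparable.
integrated = [normalize(r) for r in source_a + source_b]
```

After this step every record carries the same unit, so values for the same entity can be compared directly and remaining differences point to real inconsistencies rather than representation conflicts.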
Duplication