Lecture 2-Data Science
Objectives
• Describe the implications for data and database systems
within the data-to-wisdom continuum.
• Explain database models.
• Describe the purpose, structures, and functions of database
management systems (DBMSs).
• Describe generating, storing, curating, retrieving, and
interpreting data and related issues.
• Explore concepts and issues related to data warehouses, data
marts, data stores, big data, dashboards, and data analytics.
• Explain knowledge discovery in databases (KDD) including data
mining, data analytics, and benchmarking and their
relationship to evidence-based practice and value-based,
patient-centric care.
Information Management
• Information Management: applying management
techniques to the collection, storage, analysis,
dissemination, archiving, and destruction of internal
and external data in order to manage operations,
make decisions, plan projects, develop policies,
plan resources, etc.
Association for Project Management (APM) (2020)
IM Steps/Plan/Process
[Figure: Raw Data]
Data Science
• Data Science: the amalgamation of classical disciplines
such as statistics, data mining, databases, domain knowledge,
and computer systems.
Van der Aalst (2016)
Data Sources
Data Types
1. Structured Data
2. Unstructured Data
Curating
• Servers • Warehouse
Backend Servers/Databases
• Cloud vs. In-house: One of the biggest developments in the recent
past is cloud computing. In a cloud-hosted DBMS the back end is
accessed through the Internet, while in an in-house hosted system
the server that houses the database is on site.
• Distributed vs. Centralized: One of the decisions that needs to be
made is whether the database is going to be distributed or
centralized. A centralized system is one where there is a single,
central computer that hosts a database and the DBMS. Many
hospitals today are examples of this type of system. The hospital is
the “hub” and hosts the system where many users on the network
access this database. A distributed system is one where there are
multiple database files located at different sites. The main difference
between these two options is one of control. In a centralized system,
there is a central control mechanism. Conversely, in a distributed
system there is no centralized control structure.
Database Models
• Relational Database Models: The relational database model is still the
most popular form of DBMS, but non-relational databases (e.g., NoSQL) are
on the rise. In the relational model, tables are related to each other
through a system of keys. Each table has a primary key, which uniquely
identifies each record and lets the system request one record at a time.
Tables can be combined so that the system can generate reports that draw
on all of the tables. The main features of this type of system are tables,
attributes, and keys: attributes are the columns in the tables, and keys
are what allow us to find one record in a table. The functions a relational
DBMS provides include creating, updating, deleting, and querying data,
generally by means of Structured Query Language (SQL) statements. Examples
of widely used RDBMSs include Oracle, MySQL, Microsoft SQL Server, and DB2.
• NoSQL Database Models: NoSQL is an agile system that easily processes
unstructured and semi-structured data. It is cloud-friendly and a new
way of thinking about databases. NoSQL doesn't adhere to the traditional
RDBMS structure, has a rich query language, and is easily scalable
(MongoDB, 2019).
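The role of keys in the relational model can be sketched with Python's built-in sqlite3 module. The patient and visit tables, their columns, and the sample rows are hypothetical and chosen only for illustration. The dictionary at the end is a plain Python stand-in for a document-style record, not the API of any particular NoSQL product.

```python
import sqlite3

# Hypothetical schema for illustration: one patient has many visits.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,   -- primary key: identifies one record
    name       TEXT NOT NULL
);
CREATE TABLE visit (
    visit_id   INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),  -- foreign key
    ward       TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO patient VALUES (?, ?)", [(1, "Ada"), (2, "Omar")])
conn.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [(10, 1, "Cardiology"), (11, 1, "Radiology"), (12, 2, "Cardiology")],
)

# The primary key retrieves exactly one record.
one_patient = conn.execute(
    "SELECT name FROM patient WHERE patient_id = ?", (2,)
).fetchone()

# A join combines the tables through their keys to build a report.
report = conn.execute("""
    SELECT p.name, COUNT(*) AS visits
    FROM patient AS p JOIN visit AS v ON v.patient_id = p.patient_id
    GROUP BY p.patient_id
    ORDER BY p.patient_id
""").fetchall()

# Document-style contrast (assumed shape): related data is nested in one
# record, and individual entries may carry different fields.
patient_doc = {
    "_id": 1,
    "name": "Ada",
    "visits": [{"ward": "Cardiology"}, {"ward": "Radiology", "notes": "free text"}],
}
```

In the relational version the visit rows point back to a patient through patient_id, so the report is assembled at query time. In the document version the visits travel with the patient record, which is one reason such stores handle semi-structured data easily.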
From Servers to End products
DBMS Storing
Curating
• Servers • Warehouse
Data Warehousing
• While a DBMS provides a structure to data, a data warehouse
provides specificity. Many organizations have developed
specific systems to meet their needs: these are data
warehouses.
• Data warehousing is “the process of extracting, integrating,
transforming, and cleansing data and storing it in a
consolidated database” (Mullins, 2013, p. 638).
• The purpose of a data warehouse is to provide a place to store
multiple forms of data in a lightly summarized way.
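The extract, integrate, transform, and cleanse steps in the definition above can be sketched as a very small pipeline. This is a minimal illustration, not a production design: the two source extracts, their field names, and all values are made up, and an in-memory SQLite database stands in for the consolidated warehouse.

```python
import sqlite3

# Hypothetical raw extracts from two source systems (made-up values).
lab_rows = [("p1", " glucose ", "5.4"), ("p2", "Glucose", ""), ("p1", " glucose ", "5.4")]
adm_rows = [{"patient": "P1", "ward": "cardiology"}, {"patient": "P2", "ward": "Radiology"}]

def clean_lab(rows):
    """Transform and cleanse: normalize case and whitespace, drop blanks and duplicates."""
    seen, out = set(), []
    for patient, test, value in rows:
        if not value.strip():
            continue  # cleanse: skip rows with a missing result
        rec = (patient.strip().upper(), test.strip().lower(), float(value))
        if rec not in seen:
            seen.add(rec)
            out.append(rec)
    return out

def clean_admissions(rows):
    """Transform: bring patient IDs and ward names into one consistent format."""
    return [(r["patient"].strip().upper(), r["ward"].strip().title()) for r in rows]

# Store: load the cleaned data into one consolidated database.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
CREATE TABLE lab_result (patient_id TEXT, test TEXT, value REAL);
CREATE TABLE admission  (patient_id TEXT, ward TEXT);
""")
warehouse.executemany("INSERT INTO lab_result VALUES (?, ?, ?)", clean_lab(lab_rows))
warehouse.executemany("INSERT INTO admission VALUES (?, ?)", clean_admissions(adm_rows))

# Integrate: both sources now share one patient_id format and can be queried together.
integrated = warehouse.execute("""
    SELECT a.patient_id, a.ward, l.test, l.value
    FROM admission AS a JOIN lab_result AS l ON l.patient_id = a.patient_id
""").fetchall()
```

Only the P1 row appears in the integrated result, because the P2 lab result was blank and was dropped during cleansing.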
Other Types of Data Repository
• Data Marts: a data mart is a DBMS that serves a single unit of work
and may contain a subset of the data stored in a warehouse. For
instance, a hospital may have a data warehouse where all
information is housed, and a single department may have a
data mart.
• Data Lakes: a data lake is a freer form of DBMS where the structure of
the data is loose and varied including structured, semi-
structured, and unstructured data. Input processing can be in
batch, real-time, or one-time loads.
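The idea of a data mart as a department-level subset of the warehouse can be sketched as follows. The table, the view name, and the values are hypothetical. A SQL view is used here only as a simple stand-in; in practice a data mart is often a separate physical store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical hospital-wide warehouse table (made-up rows).
CREATE TABLE warehouse_encounter (
    encounter_id INTEGER PRIMARY KEY,
    department   TEXT,
    cost         REAL
);
INSERT INTO warehouse_encounter VALUES
    (1, 'Cardiology', 1200.0),
    (2, 'Radiology',   300.0),
    (3, 'Cardiology',  800.0);

-- The "data mart": a department-specific subset of the warehouse.
CREATE VIEW cardiology_mart AS
    SELECT encounter_id, cost
    FROM warehouse_encounter
    WHERE department = 'Cardiology';
""")

mart_rows = conn.execute(
    "SELECT encounter_id, cost FROM cardiology_mart ORDER BY encounter_id"
).fetchall()
```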
Which one Best Fits?
• Network configuration: What type of network will the system
be running on? For example, local area network (LAN), wide
area network (WAN), and wireless local area network (WLAN).
• Type of data being stored: If there will be a lot of medical
images, videos, or sounds, it is important to realize that these
need a lot of space.
• Amount of data: How much data are there? If there is a large
amount of data, a system that allows for faster retrieval from
the system may be necessary.
• Systems interoperability: Are there requirements that the
system interface with another system?
• Budget considerations: How much money is being dedicated
to the database project?
Analytics and Data mining
• Analytics: Once the data has been stored, curated, and
retrieved, it is then the responsibility of the end user to go
through and perform analytics on the data.
• Descriptive Analytics: describing the current status.
• Prescriptive Analytics: "prescribing" solutions, that is, recommending actions.
• Predictive Analytics: forecasting what is likely to happen.