Data Science
© 2024 FTVT Institute All Rights Reserved
PPT 1
Contents
Overview of Data Science
Represented with the help of characters such as alphabets (A-Z, a-z), digits
Information
Is the processed data on which decisions and actions are based
It is data that has been processed into a form that is meaningful to the
recipient
For example, when electronic computers are used, the input data can be recorded on any one of
several types of storage media, such as a hard disk, CD, or flash disk.
Processing
In this step, the input data is changed to produce data in a more useful form.
For example, interest can be calculated on a bank deposit, or a summary of sales for the month
can be calculated from the sales orders.
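The two processing examples above can be sketched in Python. This is a minimal illustration only: the function names, the simple (non-compounding) interest formula, and the sample orders are assumptions made for this sketch, not part of the original slides.

```python
def simple_interest(principal, annual_rate, years):
    """Interest earned on a deposit, assuming simple (non-compounding) interest."""
    return principal * annual_rate * years


def monthly_sales_summary(orders):
    """Add up order amounts per month, keyed by 'YYYY-MM'."""
    totals = {}
    for order in orders:
        month = order["date"][:7]  # first 7 characters of 'YYYY-MM-DD'
        totals[month] = totals.get(month, 0) + order["amount"]
    return totals


# Hypothetical input data (raw sales orders)
orders = [
    {"date": "2024-01-05", "amount": 120},
    {"date": "2024-01-20", "amount": 80},
    {"date": "2024-02-02", "amount": 50},
]

# Processing: the input is changed into a more useful form
interest = simple_interest(1000, 0.05, 2)
summary = monthly_sales_summary(orders)

print(interest)
print(summary)
```

In both cases the raw input (a deposit, a list of orders) is turned into a form that supports a decision, which is what the processing step is for.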
Output
At this stage, the result of the preceding processing step is collected.
The particular form of the output data depends on the use of the data.
• In computer science and computer programming, for instance, a data type is simply an
attribute of data that tells the compiler or interpreter how the programmer intends to use the
data.
1. Data types from a computer programming perspective
Almost all programming languages explicitly include the notion of data type.
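As a small illustration of how a data type tells the interpreter how a value is meant to be used, the Python sketch below inspects the types of a few values and shows the same operator behaving differently for integers and strings. The sample values are assumptions chosen for this example.

```python
# Hypothetical sample values, one per common built-in type
values = [42, 3.14, "data", True]

# Ask the interpreter which type each value has
type_names = [type(v).__name__ for v in values]
print(type_names)

# The same '+' symbol is interpreted according to the operand types
int_sum = 2 + 3          # integer addition
str_concat = "2" + "3"   # string concatenation

print(int_sum)
print(str_concat)
```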
B. Semi-structured, and
Structured Data
Each of these has structured rows and columns that can be sorted.
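A minimal sketch of what "structured rows and columns that can be sorted" means in practice. The table, its column names, and its values are hypothetical and only serve the illustration.

```python
# A tiny structured table: every row has the same columns
rows = [
    {"id": 3, "name": "Chala", "score": 72},
    {"id": 1, "name": "Abebe", "score": 88},
    {"id": 2, "name": "Sara", "score": 95},
]

# Because the structure is fixed, rows can be sorted by any column
by_id = sorted(rows, key=lambda row: row["id"])
by_score_desc = sorted(rows, key=lambda row: row["score"], reverse=True)

print([row["name"] for row in by_id])
print([row["name"] for row in by_score_desc])
```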
Semi-structured Data
Unstructured Data
Unstructured data is information that either does not have a predefined data
model or is not organized in a pre-defined manner.
The Big Data Value Chain identifies the following key high-level activities:
Data Acquisition
It is the process of gathering, filtering, and cleaning data before it is put in
a data warehouse or any other storage solution on which data analysis can
be carried out.
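The gather, filter, and clean steps can be sketched as follows. This is only an illustration under assumed conditions: the "name,age" record format, the validity rules, and the function name `acquire` are hypothetical, and a real pipeline would write the result to a storage solution rather than a list.

```python
# Gathering: raw records as they might arrive from a source (hypothetical data)
raw_records = [" Alice ,34", "bob,", "", "Carol,29 ", "dave,abc"]


def acquire(raw):
    """Filter out invalid records and clean the valid ones."""
    cleaned = []
    for line in raw:
        parts = [part.strip() for part in line.split(",")]
        # Filtering: keep only records with a name and a numeric age
        if len(parts) != 2 or not parts[0] or not parts[1].isdigit():
            continue
        # Cleaning: normalize capitalization and convert the age to an integer
        cleaned.append({"name": parts[0].title(), "age": int(parts[1])})
    return cleaned


records = acquire(raw_records)
print(records)
```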
Data Analysis
It is concerned with making the raw data acquired amenable to use in
decision-making as well as domain-specific usage.
Data analysis involves exploring, transforming, and modeling data with the
goal of highlighting relevant data, synthesizing and extracting useful
hidden information with high potential from a business point of view.
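Exploring, transforming, and modeling can be illustrated with a very small example. The advertising and sales figures below are made up for this sketch, and the least-squares line is just one simple modeling choice among many.

```python
from statistics import mean

# Hypothetical raw data: advertising spend and resulting sales
ad_spend = [1000, 2000, 3000, 4000]
sales = [12, 19, 31, 38]

# Exploring: a basic summary of the data
avg_sales = mean(sales)

# Transforming: express spend in thousands for readability
spend_k = [x / 1000 for x in ad_spend]

# Modeling: fit a straight line sales = slope * spend_k + intercept
mean_x, mean_y = mean(spend_k), mean(sales)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(spend_k, sales))
    / sum((x - mean_x) ** 2 for x in spend_k)
)
intercept = mean_y - slope * mean_x

print(avg_sales, slope, intercept)
```

The fitted slope is the kind of business-relevant information the analysis step aims to surface: an estimate of how much sales change per additional thousand spent.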
Data Curation
It is the active management of data over its life cycle to ensure it
meets the necessary data quality requirements for its effective usage.
A key trend for the curation of big data is the use of community and
crowdsourcing approaches.
Data Storage
It is the persistence and management of data in a scalable way that
satisfies the needs of applications that require fast access to the data.
Data Usage
It covers the data-driven business activities that
need access to data, its analysis, and the tools
needed to integrate the data analysis within the
business activity.
Resource Pooling
High Availability
Easy Scalability
High Availability
Clusters can provide varying levels of fault tolerance and availability guarantees
to prevent hardware or software failures from affecting access to data and
processing.
This becomes increasingly important as we continue to emphasize the
importance of real-time analytics.
Easy Scalability
Clusters make it easy to scale horizontally by adding additional machines to the
group. This means the system can react to changes in resource requirements
without expanding the physical resources on a machine.
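Horizontal scaling can be illustrated with a toy model in which tasks are spread across the machines of a cluster. The round-robin assignment, the node names, and the task count are assumptions for this sketch; real cluster schedulers are considerably more involved.

```python
def assign(tasks, nodes):
    """Distribute tasks across nodes in round-robin order."""
    assignment = {node: [] for node in nodes}
    for i, task in enumerate(tasks):
        assignment[nodes[i % len(nodes)]].append(task)
    return assignment


tasks = list(range(12))  # twelve hypothetical units of work

# Cluster with three machines
before = assign(tasks, ["node-1", "node-2", "node-3"])

# Scale horizontally: add a fourth machine instead of upgrading existing ones
after = assign(tasks, ["node-1", "node-2", "node-3", "node-4"])

max_before = max(len(work) for work in before.values())
max_after = max(len(work) for work in after.values())
print(max_before, max_after)
```

Adding a machine lowers the load on each node without changing the hardware of any single machine.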
Economical
Reliable
Scalable
Flexible