
Chapter 3 New

Chapter 3 discusses database systems and big data, covering basic concepts such as data, information, and knowledge, as well as data handling approaches in organizations. It introduces databases and database management systems (DBMS), detailing their components, characteristics, and classifications based on data models and user support. The chapter emphasizes the importance of data storage, processing, and the relational data model as a foundation for modern DBMSs.

Chapter - 3

Database Systems and Big Data


Topics under discussion
 Basic Concepts
 Data handling approaches in organizations
 Introduction to Database
 Database Management System (DBMS)
 Components of DBMS
 Classification of DBMS
 Database Languages
 Introduction to Big Data
Basic Concepts
 Data
 Raw facts or figures.
 Data represents facts or figures obtained from experiments, surveys,
or observations, used as a basis for making calculations or drawing
conclusions.
 In and of itself, data has no meaning.
 Example:- If I count the number of students attending this class,
that's data.
 It has no meaning until it is placed in a context.
 It is like an event out of context, without a meaningful relation to
other things.
 If we are given a certain data, we can associate it to different
things and give it different meanings.
Basic Concepts
 Information
 Information is data that has been processed and organized.
 It is the result of gathering, processing, manipulating and
organizing data in a way that adds to the knowledge of the
receiver.
 Example
 Let's say I want to buy a car. I can collect a lot of data
about makes of cars, performance ratings, prices and so on.
Once I organize that data and put it in context (cars and the
car market), I have information about cars and the car market.
Until it is placed in that context, the collection of data has no meaning.
Basic Concepts
 Information
 Information is data that has been given a meaning by way of
relational connection.
– This relational connection converts data into information.
 Information is data with context. Therefore, information is context
dependent.
 Example: consider the following data
 15 degrees, and
 it is raining
 The temperature dropped to 15 degrees and then it started
raining.
 It is the cause and effect relationship between the two that
provides information.
 Therefore,
 Information = Data sets + understanding of relationship among data sets
 What we perceive or understand is the relationship between pieces of data,
or between pieces of data and other information.
Basic Concepts
 Knowledge
 Information that has been organized and processed to convey understanding,
learning, and expertise is called knowledge.
 Information becomes knowledge when one is able to understand the
patterns that exist within information and their implication.
 For Example
 I put Birr 100 in my saving account and the bank pays 5% per
annum. At the end of one year, interest = Birr 5 and the balance
(principal plus interest) = Birr 105.
 Understanding this pattern represents knowledge and it enables
us to predict the results it will produce.
 Therefore
– Knowledge = Information + understanding of the pattern.
 There are two types of knowledge
 Formal /Explicit
 Informal /implicit/tacit knowledge
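The savings-account pattern described above can be written out as a short calculation. This is a minimal sketch that assumes simple (non-compounding) interest over one year; the function name `simple_interest` is hypothetical and chosen only for illustration.

```python
def simple_interest(principal: float, annual_rate: float, years: float = 1.0) -> float:
    """Interest earned at a simple (non-compounding) annual rate."""
    return principal * annual_rate * years


principal = 100.0   # Birr deposited
rate = 0.05         # 5% per annum
interest = simple_interest(principal, rate)
balance = principal + interest   # principal plus interest after one year

print(f"Interest: Birr {interest:.2f}, balance: Birr {balance:.2f}")
```

Knowing the rule, rather than only the single result, is what lets the same function be reused for any other deposit or rate.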
Basic Concepts
 Question
 What is wisdom?
 How does wisdom differ from knowledge?
Data Handling Approaches in
organizations
 Day-to-day business processes executed by individuals and
organizations require both present and historical data.
 Therefore, data storage is essential for organizations and
individuals.
 Data supports business functions and aids in business
decision-making.
 Below are the three approaches that organizations use to
store organizational data:
A. Manual Data Handling Approach
B. Traditional File Based Data Handling Approach
C. Database Data Handling Approach
Introduction to Database
 A database is a shared collection of related data used to
support the activities of a particular organization.
 A database can be viewed as a repository of data that is
defined once and then accessed by various users as shown in
the following Figure.
Figure: A shared database (User 1, User 2, User 3; Application 1, Application 2; Database)
 The database is a single, possibly large repository of data
that can be used simultaneously by many departments and users.
 The database is no longer owned by one department but is a shared
corporate resource.
 The database holds not only the organization’s operational data
but also a description of this data (also called metadata).
Characteristics of Database
 Self-describing nature of a database system
 A fundamental characteristic of the database approach is that the
database system contains not only the database itself but also a
complete description of the database (called metadata).
 This information is used by the DBMS software or database users if
needed.
 For example
 Information about table names, types, and purposes.
 Names, data types (e.g., integer, string, date), sizes, and nullability
(whether null values are allowed) of columns/attributes
 Information about primary keys, foreign keys
 Information about how different tables are linked through relationships
(e.g., one-to-many, many-to-many).
 Information about when records were created, modified, or accessed.
Characteristics…
 Support for multiple views of data
 A database supports multiple views of data. A view is a subset of the
database, which is defined and dedicated for particular users of the
system.
 Multiple users in the system might have different views of the system.
Each view might contain only the data of interest to a user or group of
users.
 Sharing of data and multiuser system
 Current database systems are designed for multiple users. That is, they
allow many users to access the same database at the same time. This
access is achieved through features called concurrency control
strategies.
 These strategies ensure that the data accessed are always correct and
that data integrity is maintained.
 The design of modern multiuser database systems is a great improvement
from those in the past which restricted usage to one person at a time.
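The idea of a view as a user-specific subset of the database, described earlier on this slide, can be sketched with Python's built-in `sqlite3` module. The `employee` table, its columns, and the view name `employee_directory` are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary REAL NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Abebe', 'Sales', 9000), (2, 'Sara', 'HR', 12000);
    -- A view for users who may see names and departments but not salaries.
    CREATE VIEW employee_directory AS
        SELECT emp_id, name, dept FROM employee;
""")

# Users of the view query it like a table but only see the columns it exposes.
cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
print(directory_columns, directory_rows)
```

The salary column still exists in the underlying table; it is simply outside this particular view.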
Characteristics of…
 Restriction of unauthorized access
 Not all users of a database system will have the same access
privileges.
 For example, one user might have read-only access (i.e., the
ability to read a file but not make changes), while another
might have read and write privileges, which is the ability to
both read and modify a file.
 For this reason, a database management system should provide a
security subsystem to create and control different types of
user accounts and restrict unauthorized access.
 Backup and recovery facilities
 Backup and recovery are methods that allow you to protect your
data from loss.
 If a computer system fails in the middle of a complex update
process, the recovery subsystem is responsible for making sure
that the database is restored to its original state.
Database Management System (DBMS)
 A DBMS is a software system that enables users to define, create, maintain,
and control access to the database.
 Typically, a DBMS provides the following facilities:
 It allows users to define the database, usually through a Data Definition
Language (DDL).
 It allows users to insert, update, delete, and retrieve data from the
database, usually through a Data Manipulation Language (DML).
 It provides controlled access to the database. For example, it may
provide
‐ a security system, which prevents unauthorized users from accessing the database;
‐ a concurrency control system, which allows shared access to the database;
‐ a recovery control system, which restores the database to a previous
consistent state following a hardware or software failure.
Components of the DBMS Environment
 The database management system can be divided into five
major components:
1. Hardware
2. Software
3. Data
4. Procedures and
5. People
 These components are illustrated in the following figure.

Figure: Components of DBMS Environment


Hardware Component
 The hardware is the actual computer system used for maintaining and
accessing the database.
 Hardware components refer to all of the system’s physical devices.
 For example: computers (PCs, workstations, servers and
supercomputers), storage devices (hard disks, RAM, ROM, ...), networking
devices (switches, hubs, routers, ...), and other devices (input and
output devices).
 One can’t implement or use DBMS without using Hardware components. It can
range from a single personal computer, to a single mainframe, to a
network of computers.
 The particular hardware depends on the organization’s requirements and
the DBMS used. Some DBMSs run only on particular hardware or operating
systems, while others run on a wide variety of hardware and operating
systems.
 When we run any DBMS like Oracle, MySQL, etc. on our PC, then computer
parts like mouse, keyboard, RAM, ROM, hard disks all become part of DBMS
hardware components.
Software component

 Software is a set of instructions used to direct the computer
hardware in its operation.
 The software establishes an easy-to-use interface for users to
control the hardware and to create, store, access and/or update data in
the database.
 All requests made by users for database management are handled and
processed by the DBMS software.
 The software component of a DBMS comprises:
– DBMS software:- Microsoft SQL Server, Oracle, MySQL, etc.
– Operating System:- Microsoft Windows, Linux, UNIX, etc.
– Application Programs and Utility Programs
– Network Software if the DBMS is being used over a network.
Data
 Data is the resource for which the DBMS was designed. The motive
behind the creation of the DBMS was to store and utilize data.
 The database contains both the operational data and the
metadata
 Metadata is data about the data. This is information stored by
the DBMS to better understand the data stored in it.
 For Example:- when we store specific data (let us say, a
person's name) in the database, the DBMS also stores
additional information such as when and where the data was
stored, the size of the data, whether the data is related to or
dependent on other data, its data type, etc. All this additional information
about the actual data (i.e., the person's name) is collectively
called metadata.
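As a small illustration of metadata being stored alongside the data, the sketch below asks SQLite (through Python's built-in `sqlite3` module) to describe a table it holds. The `person` table and its columns are hypothetical; `sqlite_master` and `PRAGMA table_info` are SQLite's own catalog facilities, and other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth_date TEXT)"
)

# Table names are kept in SQLite's catalog table, sqlite_master.
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default_value, pk)
columns = {
    row[1]: {"type": row[2], "not_null": bool(row[3]), "primary_key": bool(row[5])}
    for row in conn.execute("PRAGMA table_info(person)")
}

print(tables)
print(columns)
```

No row of user data was inserted here, yet the database can already report table names, column names, data types, and nullability.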
Procedure
 Procedures are general instructions or guidelines for
the use of the DBMS.
 These instructions include
‐ how to set up the database,
‐ how to install the database,
‐ how to log in and log out of the database,
‐ how to manage the database,
‐ how to take a backup of the database, and
‐ how to generate the report of the database
 The basic purpose of procedures is to guide users through the
operation and management of database systems.
People/ User
 People refers to everyone who designs or accesses the database and
performs any operation such as creating, deleting, accessing or modifying
data in the database with the DBMS.
 The user (People) of the DBMS can be classified into the following types;
‐ Database Designer:- Database designers are responsible for identifying
the data to be stored in the database and for choosing appropriate
structures to represent and store this data.
‐ Database Administrator:- responsible to oversee, control and manage
the database resources (the database itself, the DBMS and other
related software).
‐ Application Developer:- The application programmer determines the
interface on how to retrieve, insert, update and delete data in the
database.
‐ End User:-any person who directly interacts with a DBMS and performs
various database-related operations like inserting, modifying,
retrieving or deleting data using database commands or applications
Classification of DBMS
 Database management systems can be classified based on
several criteria, such as
1. Classification Based on Data Model
» Relational DBMS
» Object oriented DBMS
» Etc.
2. Classification Based on Number of Users Supported
» Single user DBMS
» Multi user DBMS
3. Classification Based on number of sites over which the database is
distributed
» Centralized DBMS
» Distributed DBMS
Classification of DBMS based on Data Model
 Relational DBMS
 In this model, the data is organized into a collection of two-dimensional
inter-related tables, also known as relations.
 Each relation is a collection of columns and rows, where the columns
represent the attributes of an entity and the rows (or tuples) represent
the records.
 A relational database uses SQL for storing, manipulating, as well as
maintaining the data.
 Well-known DBMSs like Oracle, MS SQL Server, DB2 and MySQL support this
model.

Figure: Example of relational data model


Classification of DBMS based on Data Model
 Object-oriented DBMS
 In this model, information is
represented in the form of objects, as used in object-oriented programming.
 Object-oriented databases are different from relational databases, which are
table-oriented.
 Object-oriented database management systems (OODBMS) combine database
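capabilities with object-oriented programming language capabilities.
 Examples of object-oriented DBMSs include ObjectDB, db4o and Versant Object
Database. (MongoDB and Apache Cassandra are NoSQL systems, a document store and a
wide-column store respectively, rather than object-oriented DBMSs.)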
Classification of DBMS based on Number of
users
 Single-user DBMSs
 As the name itself indicates it can support only one user at a
time.
 It is mostly used on a personal computer, where the data
is accessible to a single person.
 The user may design, maintain and write the database programs.
 Multiuser DBMSs
 Multiuser DBMSs, which include the majority of DBMSs, support multiple users
concurrently.
 Data can be both integrated and shared. A database is
integrated when the same information need not be recorded
in two places.
 They need concurrency control and deadlock management techniques.
Classification of DBMS based on Number of Sites

 Centralized Database
 In a centralized database there is a single database file at one location
in the network.
 Multiple users can access this single database via a computer network (LAN,
WAN, etc.)
 This type of database is mainly used by institutions or organizations.
Classification of DBMS based on Number of
Sites
 Distributed Database
 Distributed database is basically a type of database which consists of
multiple databases that are connected with each other and are spread across
different physical locations.
 Communication between databases at different physical locations takes place
over a computer network.
Types of DDB system
 Distributed database systems are classified into two types. These
are,
1. Homogeneous distributed database system
– In a homogeneous distributed database system, all sites have
identical database management system software, are aware of one
another, and agree to cooperate in processing users’ requests.
2. Heterogeneous distributed database system
– In a heterogeneous distributed database, different sites may use
different schemas, and different database-management system
software.
– The sites may not be aware of one another, and they may provide
only limited facilities for cooperation in transaction
processing.
– The differences in schemas are often a major problem for query
processing, while the divergence in software becomes a hindrance
for processing transactions that access multiple sites.
Relational Data Model
 The relational data model was first introduced in 1970 by a computer
scientist and mathematician named Dr. Edgar Frank Codd.
 The Relational Database Management System (RDBMS) has become the
dominant data-processing software in use today.
 In the relational model, all data is logically structured within
relations (tables).
 Each relation has a name and is made up of named attributes (columns)
of data. Each tuple (row) contains one value per attribute.
 A great strength of the relational model is this simple logical
structure.
 Yet, behind this simple structure is a sound theoretical foundation
that is lacking in the first generation of DBMSs (the network and
hierarchical DBMSs).
 Most modern database management systems like MS SQL Server, ORACLE,
and MySQL are based on the relational model.
Terminologies in RDBMS
 Relation
 A relation is a table with columns and rows
 The RDBMS database uses tables to store data.
 A table is a collection of related data entries and contains rows
and columns to store data.
 Each table represents some real-world objects such as person, place,
or event about which information is collected.
 A relation has the following properties:
– Each relation has a unique name by which it is identified in the
database.
– Relation does not contain duplicate tuples.
– The tuples of a relation have no specific order.
– All attributes in a relation are atomic, i.e., each cell of a
relation contains exactly one value.
Terminologies in RDBMS
 Attribute
 An attribute is a named column of a relation.
 A column is a vertical entity in the table which contains all information
associated with a specific field in a table.
 Properties of an Attribute:
– Every attribute of a relation must have a name.
– Null values are permitted for attributes, except those that form the primary key.
– A default value can be specified for an attribute; it is inserted automatically
if no other value is supplied.
– The attributes that uniquely identify each tuple of a relation form the
primary key.
Terminologies in RDBMS
 Tuple
 A tuple is a row of a relation.
 The elements of a relation are the rows or tuples in the table
 Tuples can appear in any order and the relation will still be the same
relation, and therefore convey the same meaning.
 Properties of a row:
 No two tuples are identical to each other in all their entries.
 All tuples of the relation have the same format and the same number of entries.
 The order of the tuple is irrelevant. They are identified by their content, not by
their position.
 Degree
 The degree of a relation is the number of attributes it contains.
 A relation with only one attribute would have degree one and be called a
unary relation or one-tuple. A relation with two attributes is called binary,
one with three attributes is called ternary, and after that the term n-ary is
usually used.
Terminologies in RDBMS

 Cardinality
 The cardinality of a relation is the number of tuples it contains.
 The cardinality changes as tuples are added or deleted.
 Data item/Cell
 The smallest unit of data in the table is the individual data item. It
is stored at the intersection of tuples and attributes.
 Relational Database
 Relational Database is a collection of normalized relations with
distinct relation names.
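The terms degree and cardinality defined above can be checked on a tiny in-memory relation. This is a sketch in plain Python that models a relation as a set of named tuples; the `Student` type and its sample rows are hypothetical.

```python
from typing import NamedTuple


class Student(NamedTuple):
    student_id: int
    name: str
    department: str


# A relation is a set of tuples, so duplicates collapse and order is irrelevant.
student_relation = {
    Student(1001, "Abebe", "CS"),
    Student(1002, "Sara", "IS"),
    Student(1002, "Sara", "IS"),  # duplicate tuple, stored only once
}

degree = len(Student._fields)        # number of attributes (columns)
cardinality = len(student_relation)  # number of tuples (rows)

print(f"degree = {degree}, cardinality = {cardinality}")
```

Adding or removing a tuple changes the cardinality, while the degree changes only if the definition of the relation itself changes.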
Terminologies in RDBMS
 The terminology for the relational model can be quite confusing
 The following table summarizes the different terms used in
relational model.
Terminologies in RDBMS
 The following figure shows examples of terminologies used in
RDBMS
Relational Keys
 Keys are a very important part of the relational database model. They are used
to uniquely identify any record or row of data inside a table and to
establish relationships between tables.
 A Key can be a single attribute or a group of attributes. When a key
consists of more than one attribute, we call it a composite key.
 There are two main types of database keys:
1. Primary Key
 The primary key is the key that is selected to identify tuples uniquely within the relation.
 Since a relation has no duplicate tuples, it is always possible to identify each row
uniquely. This means that a relation always has a primary key.
 In the worst case, the entire set of attributes could serve as the primary key, but
usually some smaller subset is sufficient to distinguish the tuples.
2. Foreign Key
 Foreign keys are used to define relationships between tables and to enforce
referential integrity in a database by ensuring that each value in the foreign key
column is actually a valid entry in the primary key column of another table.
 When an attribute appears in more than one relation, its appearance usually
represents a relationship between tuples of the two relations.
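To make the two kinds of key concrete, the following sketch declares a primary key in each of two tables and a foreign key linking them, then follows the foreign key with a join. It uses Python's built-in `sqlite3` module; the `department` and `student` tables and their sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id TEXT PRIMARY KEY,          -- primary key of department
        name    TEXT NOT NULL
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,    -- primary key of student
        name       TEXT NOT NULL,
        dept_id    TEXT REFERENCES department(dept_id)  -- foreign key
    );
    INSERT INTO department VALUES ('Dep001', 'Computer Science');
    INSERT INTO student VALUES (1001, 'Abebe', 'Dep001');
""")

# The shared attribute dept_id is what relates a student tuple to a department tuple.
joined = conn.execute("""
    SELECT s.name, d.name
    FROM student AS s
    JOIN department AS d ON s.dept_id = d.dept_id
""").fetchall()
print(joined)
```

The attribute `dept_id` appears in both relations, which is exactly the situation the last bullet describes.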
Building Blocks of Relational Data Model

 The building blocks of the relational data model are:


1. Entities: real world physical or logical object
2. Attributes: properties used to describe each Entity or real
world object.
3. Relationship: the association between Entities
4. Constraints: rules that should be obeyed while manipulating
the data.
Entity
 An entity is a person, place, thing, or event about which data
will be collected and stored.
 An entity represents a particular type of object in the real
world. An entity is “distinguishable”; that is, each occurrence
is unique and distinct.
 An entity may be a tangible object, i.e., one that you can touch,
such as a person or a product. An entity may also be intangible,
such as a flight route or a music concert (an event).
 The name given to an entity should always be a singular noun
descriptive of each item to be stored in it.
 Example: student NOT students.
Attribute
 An attribute is a characteristic of an entity.
 For example, a customer entity would be described by attributes such as last
name, first name, phone number, address, etc.
 There are different types of Attributes used in DBMS
1. Simple Attributes:- Simple attributes are those that cannot be further divided into sub-
attributes.
 For example, A student's roll number or the employee identification number.
2. Composite Attributes:- Composite attributes are made up of two or more simple attributes.
 Example 1: a person's address may be a composite attribute that is made up of the
person's street address, city, state, and zip code.
 Example 2: a name can be split into first name, middle name, and last name.
3. Single-valued Attributes:- Single-valued attributes can have only one value.
 Example: The age of a student.
4. Multivalued Attributes:- Multivalued attributes can have more than one value.
 For example, a person may have multiple email addresses or phone numbers.
5. Derived Attributes:- Attributes whose values can be derived from other attributes.
 Example: Age (derived from date of birth), CGPA (derived from course grades).
6. Stored Attributes:- Stored attributes are those whose values are stored directly in the
database rather than computed; derived attributes are calculated from them.
 Example: DOB (date of birth) is a stored attribute.
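The difference between a stored and a derived attribute can be sketched in a few lines of Python. The `Student` class below is hypothetical: it stores only the date of birth and computes age on request, so the two can never disagree.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Student:
    name: str
    date_of_birth: date  # stored attribute

    def age(self, today: Optional[date] = None) -> int:
        """Derived attribute: computed from date_of_birth, never stored."""
        today = today or date.today()
        birthday_passed = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if birthday_passed else 1)


student = Student("Sara", date(2004, 9, 15))
print(student.age())
```

Because age changes every year, storing it would require regular updates, whereas the date of birth is entered once.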
Relationship
 Any association between two entity types is called a
relationship.
 Entities take part in the relationship.
 There are three types of relationships that can exist
between two entities.
1. One-to-One (1:1) Relationship
2. One-to-Many or Many-to-One Relationship
3. Many-to-Many Relationship
Relationship
1. One-to-One Relationship
 A one-to-one relationship exists between two tables when a
single row of the first table can be related to one and only
one row of the second table.
 Similarly, a row of the second table can be related to only one
row of the first table.
 Example:-
 Consider two entities, ‘Person’ (Id, Name, Age, Address) and
‘Passport’ (Passport_id, Passport_no).
 Each person can have only one passport, and each passport belongs to
only one person.
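One common way to enforce the Person/Passport example above is a foreign key that is also declared unique. The sketch below uses Python's built-in `sqlite3` module; the `person_id` column in `passport` and the sample values are assumptions added for illustration, and the Age and Address columns are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE passport (
        passport_id INTEGER PRIMARY KEY,
        passport_no TEXT NOT NULL UNIQUE,
        -- UNIQUE on the foreign key limits each person to one passport.
        person_id   INTEGER NOT NULL UNIQUE REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Abebe');
    INSERT INTO passport VALUES (10, 'EP1234567', 1);
""")


def add_second_passport() -> bool:
    """Try to give person 1 another passport; return True if the DBMS allowed it."""
    try:
        conn.execute("INSERT INTO passport VALUES (11, 'EP7654321', 1)")
        return True
    except sqlite3.IntegrityError:
        return False


print(add_second_passport())
```

Without the `UNIQUE` keyword on `person_id`, the same schema would describe a one-to-many relationship instead.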
Relationship
2. One-to-Many or Many-to-One Relationship
 Such a relationship exists when each record of one table can be related to
one or more records of the other table.
 This is the most common type of relationship.
 A one-to-many relationship can also be described as a many-to-one relationship,
depending on the direction from which we view it.
 Example
 Consider two entity types, ‘User’ and ‘Account’. Each user can
have more than one account, but each account is held by only one
user.
 Each user is associated with many accounts, so viewed from the user side
this is a one-to-many relationship. Viewed from the other side, many accounts
are associated with one user, so it is a many-to-one relationship.
Relationship
3. Many-to-Many Relationship
 A many-to-many relationship exists between two tables if a single record of
the first table can be related to one or more records of the second table and a
single record of the second table can be related to one or more records of the
first table.
 Example:
 Consider two entity types, ‘Customer’ and ‘Product’. Each customer
can buy more than one product, and a product can be bought by many
different customers.
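In a relational database, a many-to-many relationship such as Customer/Product is usually represented with a third table that pairs the two primary keys. The sketch below shows one way to do this with Python's built-in `sqlite3` module; the `purchase` table and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: one row per (customer, product) pair, with a composite key.
    CREATE TABLE purchase (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        PRIMARY KEY (customer_id, product_id)
    );
    INSERT INTO customer VALUES (1, 'Abebe'), (2, 'Sara');
    INSERT INTO product  VALUES (1, 'Laptop'), (2, 'Phone');
    INSERT INTO purchase VALUES (1, 1), (1, 2), (2, 1);
""")

# One customer, many products.
products_of_abebe = [row[0] for row in conn.execute("""
    SELECT p.name FROM purchase AS pu
    JOIN product AS p ON p.product_id = pu.product_id
    WHERE pu.customer_id = 1 ORDER BY p.name
""")]

# One product, many customers.
buyers_of_laptop = [row[0] for row in conn.execute("""
    SELECT c.name FROM purchase AS pu
    JOIN customer AS c ON c.customer_id = pu.customer_id
    WHERE pu.product_id = 1 ORDER BY c.name
""")]

print(products_of_abebe, buyers_of_laptop)
```

The junction table turns one many-to-many relationship into two one-to-many relationships, which the relational model can store directly.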
Constraints
 Constraints are rules that should be obeyed while manipulating the
data.
 This means that constraints allow only a particular kind of data to be
inserted in the database or only some particular kind of operations to
be performed on the data in the database.
 There are mainly 4 types of constraints in DBMS called relational
constraints. They are as follows
A. Domain constraints
B. Entity integrity constraints
C. Referential Integrity Constraints
D. Key constraints
Constraints
 Domain Constraint
 In DBMS, the Domain refers to allowable values of an attribute.
 The domain constraint specifies that the value of the attribute must
be available in the corresponding domain.
 Example: Consider the following Entity.

 Now, in the table above, the tuple with Student ID = 4 and name
= “Ahmed Ali” has marks = A.
 This is not an integer or float value. So, the domain
constraint is violated here.
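A DBMS can reject the violating tuple described above if the domain is declared as part of the table definition. The following is a sketch using Python's built-in `sqlite3` module, where a `CHECK` clause stands in for the domain of the marks attribute; the table layout and the numeric sample value are assumptions, since the original table is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- Domain of marks: numeric values only (integer or float).
        marks      REAL CHECK (typeof(marks) IN ('integer', 'real'))
    )
""")


def try_insert(student_id: int, name: str, marks: object) -> bool:
    """Return True if the row satisfies the domain constraint and was inserted."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", (student_id, name, marks))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, "Sara", 85))         # numeric marks
print(try_insert(4, "Ahmed Ali", "A"))   # letter grade, outside the domain
```

Many other DBMSs enforce the declared column type on their own; SQLite is flexible about types by default, which is why the check is written out explicitly in this sketch.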
Constraints
 Entity Integrity Constraint
 The entity integrity constraint states that primary key value can't be null.
 This is because the primary key value is used to identify individual rows in
relation and if the primary key has a null value, then we can't identify
those rows.
 A table can contain null values in fields other than the primary key.
 Example: Consider the following relation

 Here, the second tuple has null value stored in the Student_ID attribute
which is a Primary Key.
 Hence, the Entity Integrity constraint is violated here.
Constraints
 Key Constraint
 The key constraint is closely related to the entity integrity constraint.
 The key constraint states that the primary key attributes must be unique
and must not contain null values.
 The entity integrity constraint covers only the second part: no attribute of a primary
key may be null.
 Example: Consider the following relation

 In the table above, tuples 3 and 5 both have Student_ID = 1003.
 So, the key constraint is violated here, as the values are not unique.
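Both the key constraint and the entity integrity constraint can be observed with a primary key declaration. This is a minimal sketch using Python's built-in `sqlite3` module; the table and the sample rows are hypothetical, apart from the duplicated Student_ID value 1003 taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out because SQLite, for historical reasons, otherwise
# accepts NULL in most PRIMARY KEY columns.
conn.execute(
    "CREATE TABLE student (student_id TEXT NOT NULL PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO student VALUES ('1003', 'Sara')")


def violation(student_id, name):
    """Return the DBMS error message if the insert is rejected, else None."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (student_id, name))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(violation("1003", "Ahmed"))  # duplicate key: key constraint
print(violation(None, "Hana"))     # null key: entity integrity constraint
print(violation("1004", "Hana"))   # valid row
```

The first two inserts are refused by the DBMS itself, so the rule does not depend on every application remembering to check it.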
Constraints
 Referential Integrity Constraint
 A referential integrity constraint is specified between two tables.
 Under referential integrity, if a foreign key in Table 1 refers to the
primary key of Table 2, then every value of the foreign key in Table 1 must either be null or
match a primary key value in Table 2.
 Example: Consider the following relations.

 Tuple 3 in the “Students” relation violates the referential
integrity constraint, as there is no department with id Dep003.
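The Dep003 situation above can be reproduced with a foreign key declaration. The sketch below uses Python's built-in `sqlite3` module; the column names and department rows are assumptions, since the original tables are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE departments (dept_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE students (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        dept_id    TEXT REFERENCES departments(dept_id)  -- may be NULL
    );
    INSERT INTO departments VALUES
        ('Dep001', 'Computer Science'),
        ('Dep002', 'Information Systems');
""")


def enroll(student_id: int, name: str, dept_id) -> bool:
    """Return True if the row respects referential integrity and was inserted."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?, ?)", (student_id, name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(enroll(1, "Abebe", "Dep001"))  # existing department
print(enroll(2, "Sara", None))       # null foreign key is allowed
print(enroll(3, "Ahmed", "Dep003"))  # no such department
```

Only the third insert is refused, which matches the rule that a foreign key value must be null or present in the referenced table.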
Database Languages
 A database language is a type of programming
language specifically designed for interacting with
and managing databases.
 These languages allow users to perform various
operations such as creating, reading, updating, and
deleting data.
 The most common database languages include:
 Structured Query Language (SQL)- for relational database
management systems.
 NoSQL query languages and APIs- used with non-relational (NoSQL)
databases. NoSQL names a family of database systems rather than a
single language, and each system typically provides its own interface.
 Object Query Language (OQL)- used in object-oriented
databases.
SQL
• SQL (Structured Query Language) is a database language designed
for managing data in relational database management systems
(RDBMS).
• What can SQL do?
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views
Types of SQL Commands
 SQL commands can be grouped into four
categories based on their functionality:
1. Data Definition Language (DDL)
2. Data Manipulation Language (DML)
3. Data Control Language(DCL)
4. Transaction Control Language (TCL)

Data Definition Language (DDL)
 Data Definition Language commands are used to create and alter the
database and database objects in a relational database management system.
• Data Definition Language (DDL) is a set of special commands that allows us to
define and modify the structure and the metadata of the database.
• These commands can be used to create, modify, and delete the database
structures such as schema, tables, indexes, etc.
• These commands are normally not used by an end-user (someone who is accessing
the database via an application).
 The most used DDL commands are
A. Create Command:- used to create database and database objects like
a table, index, view,…
B. Alter Command:- used to modify the definition of the existing
objects
C. Drop Command:- used to remove existing database and other database
objects
D. Truncate Table command:- used to remove all the data from table.
• Note: the drop table and truncate table commands serve different purposes. The drop table
command removes the structure of a table along with its contents, whereas the truncate table
command removes the data inside a table but not the table itself.
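The DDL commands listed above, and the difference between emptying a table and dropping it, can be tried out with Python's built-in `sqlite3` module. The `course` table is hypothetical. SQLite has no TRUNCATE TABLE statement, so this sketch substitutes an unqualified `DELETE`, which is an assumption specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    """List the tables currently defined in the database."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in conn.execute(query)]


# CREATE: define a new table.
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
# ALTER: change the definition of an existing table.
conn.execute("ALTER TABLE course ADD COLUMN credit_hours INTEGER")
conn.execute("INSERT INTO course VALUES (1, 'Database Systems', 3)")

# "Truncate": remove every row but keep the table
# (SQLite has no TRUNCATE TABLE, so DELETE without WHERE is used instead).
conn.execute("DELETE FROM course")
rows_after_truncate = conn.execute("SELECT COUNT(*) FROM course").fetchall()[0][0]
tables_after_truncate = table_names()

# DROP: remove the table definition together with any data.
conn.execute("DROP TABLE course")
tables_after_drop = table_names()

print(rows_after_truncate, tables_after_truncate, tables_after_drop)
```

After the delete, the table still exists and is empty; after the drop, the table itself is gone.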
Data Manipulation Language
• Data Manipulation Language (DML) is a set of special commands
that allows us to access and manipulate data stored in
existing database objects.
• These commands are used to perform certain operations such as
insertion, deletion, updating, and retrieval of the data from
the database.
• The most widely used DML commands include
A. Select Command:- used to retrieve one or more rows from table
B. Insert Command:- used to insert one or more rows into a table
C. Update Command:- used to change/ update existing data in the
database table.
D. Delete Command:- used to remove one or more rows from a table
or view.
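The four DML commands above can be run in sequence against a small table. This is a sketch using Python's built-in `sqlite3` module; the `student` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)"
)

# INSERT: add rows (the ? placeholders keep values separate from the SQL text).
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1001, "Abebe", "CS"), (1002, "Sara", "IS"), (1003, "Ahmed", "CS")],
)

# UPDATE: change existing data.
conn.execute("UPDATE student SET dept = ? WHERE student_id = ?", ("CS", 1002))

# DELETE: remove rows that match a condition.
conn.execute("DELETE FROM student WHERE student_id = ?", (1003,))

# SELECT: retrieve rows.
cs_students = conn.execute(
    "SELECT student_id, name FROM student WHERE dept = 'CS' ORDER BY student_id"
).fetchall()
print(cs_students)
```

None of these statements changes the structure of the table; they only read or change the rows inside it.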
Data Control Language
(DCL)
 Data control language commands are used to control the
access of data stored in database and provide data
security.
 The most widely used DCL commands include
A. Grant Command:- used to give access rights/privileges on
database objects
B. Revoke Command:- used to cancel/take away privileges
that were given on database objects using the grant command

Transaction Control Language
(TCL)
 Transaction Control Language commands are used to manage
changes made by DML commands.
 It allows statements to be grouped together into one
logical transaction.
 The most widely used TCL commands include
A. Begin_transaction command:- marks the starting point of a
transaction.
B. Commit_transaction command:- marks the end of a
successful transaction and makes its changes permanent.
C. Rollback_transaction command:- rolls the transaction back to
its starting point, undoing its changes.

Introduction to Big Data
 Big data refers to datasets that are too large or complex for
traditional data-processing software to handle.
 Big data refers to extremely large and diverse collections of
structured, unstructured, and semi-structured data that
continue to grow exponentially over time.
 These datasets are so huge and complex in volume, velocity,
and variety, that traditional data management systems cannot
store, process, and analyze them.
 Big data describes large and diverse datasets that are huge in
volume and also rapidly grow in size over time. Big data is
used in machine learning, predictive modeling, and other
advanced analytics to solve business problems and make
informed decisions
Characteristics of Big data
 5 V’s of Big Data
5 V’s of Big Data
 Volume
 It refers to the size of Big Data.
 Whether data is considered Big Data depends largely on its volume.
 Example: In the year 2016, the estimated global mobile traffic was 6.2 Exabytes (6.2
billion GB) per month. It was also projected that by the year 2020 there would be almost
40,000 Exabytes of data.
 Velocity
 It refers to the speed at which the data is getting accumulated.
 Big data velocity deals with the speed at which data flows from sources like application
logs, business processes, networks, social media sites, sensors, mobile devices,
etc.
 Example:- In the year 2000, Google was receiving 32.8 million searches per day. As
of 2018, Google was receiving 5.6 billion searches per day!
 Veracity
 It refers to the assurance of quality/integrity/credibility/accuracy of the data.
 Since the data is collected from multiple sources, we need to check the data for
accuracy before using it for business insights.
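The volume and velocity figures quoted on this slide can be put on a more familiar scale with a little arithmetic. The sketch below assumes decimal (SI) units, where 1 exabyte is 10^18 bytes and 1 gigabyte is 10^9 bytes; the function names are hypothetical.

```python
BYTES_PER_GB = 10**9    # decimal (SI) gigabyte
BYTES_PER_EB = 10**18   # decimal (SI) exabyte
SECONDS_PER_DAY = 24 * 60 * 60


def exabytes_to_gigabytes(exabytes: float) -> float:
    """Convert exabytes to gigabytes using decimal units."""
    return exabytes * BYTES_PER_EB / BYTES_PER_GB


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


monthly_traffic_gb = exabytes_to_gigabytes(6.2)   # 2016 mobile traffic per month
searches_per_second_2000 = per_second(32.8e6)     # 32.8 million searches per day
searches_per_second_2018 = per_second(5.6e9)      # 5.6 billion searches per day

print(f"{monthly_traffic_gb:,.0f} GB per month")
print(f"{searches_per_second_2000:,.0f} searches per second in 2000")
print(f"{searches_per_second_2018:,.0f} searches per second in 2018")
```

The conversion confirms that 6.2 exabytes is 6.2 billion gigabytes, and it shows the search rate rising from a few hundred per second to tens of thousands per second.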
5 V’s of Big data
 Value:
 True to its name, Value refers to the actionable insight that can be
derived from big data sets.
 A large volume of data that has no value is of no use to the company unless it is
turned into something useful.
 Data in itself is of no use or importance; it needs to be converted into
information before value can be extracted from it.
 Variety
 It refers to heterogeneous sources.
 Variety is basically the arrival of data from new sources that are both
inside and outside of an enterprise.
 It can be structured, semi-structured and unstructured.
END
