
A7-R5-Databases Technologies

(i) An Overview of the Database Management System


What is a database?
A database is an organized collection of structured information, or data, typically stored electronically
in a computer system. A database is usually controlled by a database management system (DBMS).
Together, the data and the DBMS, along with the applications that are associated with them, are
referred to as a database system, often shortened to just database.
Data within the most common types of databases in operation today is typically modeled in rows and
columns in a series of tables to make processing and data querying efficient. The data can then be
easily accessed, managed, modified, updated, controlled, and organized. Most databases use
structured query language (SQL) for writing and querying data.

Why use a database?
Databases can store very large numbers of records efficiently (they take up little space).
It is very quick and easy to find information.
It is easy to add new data and to edit or delete old data.
Data can be searched easily, e.g. 'find all Ford cars'.
Data can be sorted easily, for example into 'date first registered' order.
Data can be imported into other applications, for example a mail-merge letter to a customer saying that an MOT
test is due.
More than one person can access the same database at the same time - multi-access.
Security may be better than in paper files.

Database system, database management system


The software which is used to manage a database is called a Database Management System
(DBMS). For example, MySQL, Oracle, etc. are popular commercial DBMSs used in different
applications. A DBMS allows users to perform the following tasks:
Data Definition: It helps in creation, modification and removal of definitions that define the
organization of data in database.
Data Updation: It helps in insertion, modification and deletion of the actual data in the database.
Data Retrieval: It helps in retrieval of data from the database which can be used by applications
for various purposes.
User Administration: It helps in registering and monitoring users, enforcing data security,
monitoring performance, maintaining data integrity, dealing with concurrency control and
recovering information corrupted by unexpected failure.
Advantages of DBMS
Compared to a file-based data management system, a Database Management System has
many advantages. Some of these advantages are given below −

Reducing Data Redundancy


File-based data management systems contained multiple files that were stored in many
different locations in a system, or even across multiple systems. Because of this, there were
sometimes multiple copies of the same file, which led to data redundancy.
This is prevented in a database as there is a single database and any change in it is reflected
immediately. Because of this, there is no chance of encountering duplicate data.

Sharing of Data
In a database, the users of the database can share the data among themselves. There are various
levels of authorisation to access the data, and consequently the data can only be shared based on
the correct authorisation protocols being followed.
Many remote users can also access the database simultaneously and share the data between
themselves.

Data Integrity
Data integrity means that the data in the database is accurate and consistent. Data integrity is
very important, as a DBMS may contain multiple databases, all of which hold data
that is visible to multiple users. So it is necessary to ensure that the data is correct and
consistent in all the databases and for all the users.

Data Security
Data security is a vital concept in a database. Only authorised users should be allowed to access
the database, and their identity should be authenticated using a username and password.
Unauthorised users should not be allowed to access the database under any circumstances, as
this violates the integrity constraints.

Privacy
The privacy rule in a database means only authorized users can access a database according
to its privacy constraints. There are levels of database access, and a user can only view the data
they are allowed to. For example – in social networking sites, access constraints are different for
different accounts a user may want to access.

Backup and Recovery


The Database Management System automatically takes care of backup and recovery. Users don't
need to back up data periodically because this is taken care of by the DBMS. Moreover, it also
restores the database to its previous condition after a crash or system failure.

Data Consistency
Data consistency is ensured in a database because there is no data redundancy. All data appears
consistently across the database, and the data is the same for all users viewing the database.
Moreover, any changes made to the database are immediately reflected to all the users and there
is no data inconsistency.

(ii) An Architecture of the Database system


The design of a DBMS depends on its architecture. It can be centralized or decentralized or
hierarchical. The architecture of a DBMS can be seen as either single tier or multi-tier. An n-tier
architecture divides the whole system into related but independent n modules, which can be
independently modified, altered, changed, or replaced.

In 1-tier architecture, the DBMS is the only entity where the user directly sits on the DBMS and
uses it. Any changes done here will directly be done on the DBMS itself. It does not provide
handy tools for end-users. Database designers and programmers normally prefer to use single-
tier architecture.

If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS
can be accessed. Programmers use 2-tier architecture where they access the DBMS by means of
an application. Here the application tier is entirely independent of the database in terms of
operation, design, and programming.

3-tier Architecture
A 3-tier architecture separates its tiers from each other based on the complexity of the users and
how they use the data present in the database. It is the most widely used architecture to design a
DBMS.
Database (Data) Tier − At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this level.

Application (Middle) Tier − At this tier reside the application server and the programs that
access the database. For a user, this application tier presents an abstracted view of the database.
End-users are unaware of any existence of the database beyond the application. At the other
end, the database tier is not aware of any other user beyond the application tier. Hence, the
application layer sits in the middle and acts as a mediator between the end-user and the
database.

User (Presentation) Tier − End-users operate on this tier and they know nothing about any
existence of the database beyond this layer. At this layer, multiple views of the database can be
provided by the application. All views are generated by applications that reside in the
application tier.

Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
Three Levels of Architecture
The ANSI-SPARC database architecture is the basis of most modern databases.
The three levels present in this architecture are the physical level, the conceptual level and the
external level.
The details of these levels are as follows −
Physical Level
This is the lowest level in the three-level architecture. It is also known as the internal level. The
physical level describes how data is actually stored in the database. At the lowest level, this data
is stored on external hard drives in the form of bits, and at a slightly higher level it can be said
that the data is stored in files and folders. The physical level also covers compression and
encryption techniques.
Conceptual Level
The conceptual level is at a higher level than the physical level. It is also known as the logical
level. It describes how the database appears to the users conceptually and the relationships
between various data tables. The conceptual level does not care for how the data in the database
is actually stored.
External Level
This is the highest level in the three level architecture and closest to the user. It is also known as
the view level. The external level only shows the relevant database content to the users in the
form of views and hides the rest of the data. So different users can see the database as a
different view as per their individual requirements.
Logical View, Physical View, Conceptual View,
Logical data independence, Physical Data Independence
What is Data Independence of DBMS?
Data Independence is defined as a property of DBMS that helps you to change the Database
schema at one level of a database system without requiring to change the schema at the next
higher level. Data independence helps you to keep data separated from all programs that make
use of it.
Physical Data Independence
Physical data independence helps you to separate conceptual levels from the internal/physical
levels. It allows you to provide a logical description of the database without the need to specify
physical structures. Compared to Logical Independence, it is easy to achieve physical data
independence.

With physical independence, you can easily change the physical storage structures or devices
without any effect on the conceptual schema. Any change made is absorbed by the mapping
between the conceptual and internal levels. Physical data independence is achieved through the
presence of the internal level of the database and the mapping from the conceptual
level of the database to the internal level.

Examples of changes under Physical Data Independence


Due to physical independence, any of the changes below will not affect the conceptual layer:

Using a new storage device such as a hard drive or magnetic tape
Modifying the file organization technique in the database
Switching to different data structures
Changing the access method
Modifying indexes
Changing compression techniques or hashing algorithms
Changing the location of the database, say from the C drive to the D drive

Logical Data Independence

Logical data independence is the ability to change the conceptual schema without changing:

External views
External APIs or programs

Any change made will be absorbed by the mapping between the external and conceptual levels.
When compared to physical data independence, it is challenging to achieve logical data
independence.

Examples of changes under Logical Data Independence

Due to logical independence, any of the changes below will not affect the external layer:

Adding/modifying/deleting an attribute, entity or relationship, without a rewrite of existing
application programs
Merging two records into one
Breaking an existing record into two or more records
Difference between Physical and Logical Data Independence

Logical Data Independence:
- Mainly concerned with the structure or with changing the data definition.
- Difficult, as retrieving data depends mainly on the logical structure of the data.
- Compared to physical independence, it is difficult to achieve.
- You need to change the application program if new fields are added to or deleted from the database.
- Modification at the logical level is significant whenever the logical structures of the database are changed.
- Concerned with the conceptual schema.
- Example: add/modify/delete an attribute.

Physical Data Independence:
- Mainly concerned with the storage of the data.
- Data remains easy to retrieve.
- Compared to logical independence, it is easy to achieve.
- A change at the physical level usually does not need a change at the application-program level.
- Modifications made at the internal level may or may not be needed to improve the performance of the structure.
- Concerned with the internal schema.
- Example: changes in compression techniques, hashing algorithms, storage devices, etc.

Importance of Data Independence


Helps you to improve the quality of the data
Database system maintenance becomes affordable
Enforcement of standards and improvement in database security
You don't need to alter data structures in application programs
Permits developers to focus on the general structure of the database rather than worrying about
the internal implementation
Helps preserve the integrity (the undamaged, undivided state) of the data
Database inconsistency is vastly reduced
Modifications can easily be made at the physical level when needed to improve the performance
of the system
(iii) Relational Database Management System (RDBMS)
Introduction,
Stands for "Relational Database Management System." An RDBMS is a DBMS designed
specifically for relational databases. Therefore, RDBMSes are a subset of DBMSes.

A relational database refers to a database that stores data in a structured format, using rows and
columns. This makes it easy to locate and access specific values within the database. It is
"relational" because the values within each table are related to each other. Tables may also be
related to other tables. The relational structure makes it possible to run queries across multiple
tables at once.

While a relational database describes the type of database an RDBMS manages, the RDBMS
refers to the database program itself. It is the software that executes queries on the data,
including adding, updating, and searching for values. An RDBMS may also provide a visual
representation of the data. For example, it may display data in a table like a spreadsheet,
allowing you to view and even edit individual values in the table. Some RDBMS programs
allow you to create forms that can streamline entering, editing, and deleting data.

Most well known DBMS applications fall into the RDBMS category. Examples include Oracle
Database, MySQL, Microsoft SQL Server, and IBM DB2. Some of these programs support
non-relational databases, but they are primarily used for relational database management.
RDBMS terminology, relational model, base tables, keys, primary key, foreign key,
RDBMS terminology includes database, table, columns, keys, etc. Let us see them one by one −
Keys help you to identify any row of data in a table. In a real-world application, a table could
contain thousands of records. Moreover, the records could be duplicated. Keys ensure that you
can uniquely identify a table record despite these challenges.
Super Key - A super key is a group of single or multiple keys which identifies rows in a table.
Primary Key - is a column or group of columns in a table that uniquely identify every row in
that table.
Candidate Key - is a set of attributes that uniquely identify tuples in a table. Candidate Key is a
super key with no repeated attributes.
Alternate Key - a candidate key that is not selected as the primary key. A table may have several
candidate keys; one becomes the primary key, and the rest are alternate keys.
Foreign Key - is a column that creates a relationship between two tables. The purpose of
Foreign keys is to maintain data integrity and allow navigation between two different instances
of an entity.
Compound Key - has two or more attributes that allow you to uniquely recognize a specific
record. It is possible that each column may not be unique by itself within the database.
Composite Key - a combination of two or more columns that together uniquely identifies rows
in a table. The combination of columns is unique even though the individual columns need not
be unique by themselves.
Surrogate Key - An artificial key which aims to uniquely identify each record is called a
surrogate key. These kind of key are unique because they are created when you don't have any
natural primary key.
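
As an illustrative sketch of these key types (the students and enrolments tables and all column names here are hypothetical):

CREATE TABLE students
(
    roll_no INT PRIMARY KEY,              -- primary key: uniquely identifies every row
    email VARCHAR(100) UNIQUE NOT NULL    -- candidate key not chosen as primary, i.e. an alternate key
);

CREATE TABLE enrolments
(
    roll_no INT,
    course_code VARCHAR(10),
    PRIMARY KEY (roll_no, course_code),                 -- compound/composite key: two columns together
    FOREIGN KEY (roll_no) REFERENCES students(roll_no)  -- foreign key: links enrolments back to students
);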
constraints,
Constraints enforce limits on the data, or the type of data, that can be inserted into, updated in,
or deleted from a table. The whole purpose of constraints is to maintain data integrity during
updates, deletes and inserts on a table. Several types of constraints can be created in an
RDBMS, as listed below and shown in the sketch after the list.
Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints
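
A minimal sketch showing these constraint types together in MariaDB DDL (the departments and employees tables and their columns are hypothetical):

CREATE TABLE departments
(
    dept_id INT PRIMARY KEY                    -- key constraint: PRIMARY KEY
);

CREATE TABLE employees
(
    emp_id INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,                 -- NOT NULL: a value must always be supplied
    email VARCHAR(100) UNIQUE,                 -- UNIQUE: no two rows may share the same value
    country VARCHAR(30) DEFAULT 'India',       -- DEFAULT: used when no value is given on insert
    salary DECIMAL(10,2) CHECK (salary >= 0),  -- CHECK: restricts the permitted domain of values
    dept_id INT,
    FOREIGN KEY (dept_id) REFERENCES departments(dept_id)  -- key constraint: FOREIGN KEY
);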

Codd Rules
Dr Edgar F. Codd, after his extensive research on the relational model of database systems,
came up with twelve rules of his own which, according to him, a database must obey in order to
be regarded as a true relational database.

These rules can be applied to any database system that manages stored data using only its
relational capabilities. They rest on a foundation rule (Rule 0), which acts as a base for all the
other rules.

Rule 1: Information Rule


The data stored in a database, be it user data or metadata, must be a value of some table
cell. Everything in a database must be stored in a table format.

Rule 2: Guaranteed Access Rule


Every single data element (value) is guaranteed to be accessible logically with a combination of
table-name, primary-key (row value), and attribute-name (column value). No other means, such
as pointers, can be used to access data.

Rule 3: Systematic Treatment of NULL Values


The NULL values in a database must be given a systematic and uniform treatment. This is a
very important rule, because a NULL can be interpreted as one of the following − data is
missing, data is not known, or data is not applicable.

Rule 4: Active Online Catalog


The structure description of the entire database must be stored in an online catalog, known as
data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.

Rule 5: Comprehensive Data Sub-Language Rule


A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be
used directly or by means of some application. If the database allows access to data without any
help of this language, then it is considered as a violation.

Rule 6: View Updating Rule


All the views of a database, which can theoretically be updated, must also be updatable by the
system.

Rule 7: High-Level Insert, Update, and Delete Rule


A database must support high-level insertion, updation, and deletion. This must not be limited
to a single row, that is, it must also support union, intersection and minus operations to yield
sets of data records.

Rule 8: Physical Data Independence


The data stored in a database must be independent of the applications that access the database.
Any change in the physical structure of a database must not have any impact on how the data is
being accessed by external applications.

Rule 9: Logical Data Independence


The logical data in a database must be independent of its user’s view (application). Any change
in logical data must not affect the applications using it. For example, if two tables are merged or
one is split into two different tables, there should be no impact or change on the user
application. This is one of the most difficult rules to apply.

Rule 10: Integrity Independence


A database must be independent of the application that uses it. All its integrity constraints can
be independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.

Rule 11: Distribution Independence


The end-user must not be able to see that the data is distributed over various locations. Users
should always get the impression that the data is located at one site only. This rule has been
regarded as the foundation of distributed database systems.

Rule 12: Non-Subversion Rule


If a system has an interface that provides access to low-level records, then the interface must not
be able to subvert the system and bypass security and integrity constraints.

(iv) Database Design


Database design is the organization of data according to a database model. The designer
determines what data must be stored and how the data elements interrelate. With this
information, they can begin to fit the data to the database model. The database management
system then manages the data accordingly.
Data modelling is the first step in the process of database design. This step is sometimes
considered to be a high-level and abstract design phase, also referred to as conceptual design.
The aim of this phase is to describe:

The data contained in the database (e.g., entities: students, lecturers, courses, subjects)
The relationships between data items (e.g., students are supervised by lecturers; lecturers teach
courses)
The constraints on data (e.g., student number has exactly eight digits; a subject has four or six
units of credit only)
In the second step, the data items, the relationships and the constraints are all expressed using
the concepts provided by the high-level data model. Because these concepts do not include the
implementation details, the result of the data modelling process is a (semi-)formal
representation of the database structure. This result is quite easy to understand, so it is used as a
reference to make sure that all the user’s requirements are met.

The third step is database design. During this step, we might have two sub-steps: one called
database logical design, which defines a database in a data model of a specific DBMS, and
another called database physical design, which defines the internal database storage structure,
file organization or indexing techniques. These two sub-steps are followed by the database
implementation and operations/user-interface building steps.

In the database design phases, data are represented using a certain data model. The data model
is a collection of concepts or notations for describing data, data relationships, data semantics
and data constraints. Most data models also include a set of basic operations for manipulating
data in the database.

Normalization Normal forms-1NF, 2NF, 3NF, BCNF 4NF and 5NF,


Database Normalization is a technique of organizing the data in the database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy(repetition) and
undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step
process that puts data into tabular form, removing duplicated data from the relation tables.

Normalization is used for mainly two purposes,

Eliminating redundant(useless) data.


Ensuring data dependencies make sense i.e data is logically stored.

First Normal Form (1NF)


For a table to be in the First Normal Form, it should follow the following 4 rules:

It should only have single(atomic) valued attributes/columns.


Values stored in a column should be of the same domain
All the columns in a table should have unique names.
And the order in which data is stored, does not matter.

Second Normal Form (2NF)


For a table to be in the Second Normal Form,

It should be in the First Normal form.


And, it should not have partial dependency.
Partial dependency occurs when an attribute in a table depends on only a part of the primary key
and not on the whole key, as in the sketch below.
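
For example (a hedged sketch; the tables and columns are hypothetical): with the composite key (roll_no, course_code), the attribute student_name depends only on roll_no, so the first table below violates 2NF and is decomposed:

-- Violates 2NF: student_name depends on only part of the key
CREATE TABLE enrolment_unnormalized
(
    roll_no INT,
    course_code VARCHAR(10),
    student_name VARCHAR(50),
    grade CHAR(1),
    PRIMARY KEY (roll_no, course_code)
);

-- 2NF decomposition: the partially dependent attribute gets its own table
CREATE TABLE student (roll_no INT PRIMARY KEY, student_name VARCHAR(50));
CREATE TABLE enrolment
(
    roll_no INT,
    course_code VARCHAR(10),
    grade CHAR(1),
    PRIMARY KEY (roll_no, course_code)
);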
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,

It is in the Second Normal form.


And, it doesn't have transitive dependency.
Transitive dependency occurs when a non-prime attribute depends on other non-prime attributes
rather than on the prime attributes or the primary key.

Boyce and Codd Normal Form (BCNF)


Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals
with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF,
following conditions must be satisfied:

R must be in 3rd Normal Form


and, for each functional dependency ( X → Y ), X should be a super Key.

Fourth Normal Form (4NF)


A table is said to be in the Fourth Normal Form when,
It is in the Boyce-Codd Normal Form.
And, it doesn't have Multi-Valued Dependency.
A table is said to have multi-valued dependency, if the following conditions are true,

For a dependency A → B, if for a single value of A multiple values of B exist, then the table
may have a multi-valued dependency.
Also, a table should have at least 3 columns for it to have a multi-valued dependency.
And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B
and C should be independent of each other.
If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.

Fifth normal form (5NF)


A relation is in 5NF if it is in 4NF, does not contain any join dependency, and all joins are
lossless.
5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
5NF is also known as Project-join normal form (PJ/NF).
What is Join Dependency?
If a table can be recreated by joining multiple tables, and each of these tables has a subset of the
attributes of the original table, then the table has a join dependency. It is a generalization of
multivalued dependency.

E-R Diagram.
An Entity–relationship model (ER model) describes the structure of a database with the help of
a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER model is a
design or blueprint of a database that can later be implemented as a database. The main
components of E-R model are: entity set and relationship set.

What is an Entity Relationship Diagram (ER Diagram)?


An ER diagram shows the relationship among entity sets. An entity set is a group of similar
entities, and these entities can have attributes. In terms of a DBMS, an entity is a table or an
attribute of a table in a database, so by showing the relationships among tables and their
attributes, an ER diagram shows the complete logical structure of a database.
Here are the geometric shapes and their meaning in an E-R diagram. We will discuss these
terms in detail in the next section (Components of an ER Diagram) of this guide, so don’t worry
too much about these terms now, just go through them once.

Rectangle: Represents Entity sets.


Ellipses: Attributes
Diamonds: Relationship Set
Lines: They link attributes to Entity Sets and Entity sets to Relationship Set
Double Ellipses: Multivalued Attributes
Dashed Ellipses: Derived Attributes
Double Rectangles: Weak Entity Sets
Double Lines: Total participation of an entity in a relationship set
ER diagram has three main components:
1. Entity
2. Attribute
3. Relationship

1. Entity
An entity is an object or component of data. An entity is represented as rectangle in an ER
diagram.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on the relationship
with other entity is called weak entity. The weak entity is represented by a double rectangle
2. Attribute
An attribute describes the property of an entity. An attribute is represented as Oval in an ER
diagram. There are four types of attributes:

1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, a student roll
number can uniquely identify a student from a set of students. A key attribute is represented by
an oval, the same as other attributes; however, the text of a key attribute is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as composite attribute. For
example, In student entity, the student address is a composite attribute as an address is
composed of other attributes such as pin code, state, country.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented
with double ovals in an ER diagram. For example – a person can have more than one phone
number, so the phone number attribute is multivalued.

4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is
represented by dashed oval in an ER Diagram. For example – Person age is a derived attribute
as it changes over time and can be derived from another attribute (Date of birth).
3. Relationship
A relationship is represented by diamond shape in ER diagram, it shows the relationship among
entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One Relationship


When a single instance of an entity is associated with a single instance of another entity then it
is called one to one relationship. For example, a person has only one passport and a passport is
given to one person.
2. One to Many Relationship
When a single instance of an entity is associated with more than one instance of another entity,
it is called a one to many relationship. For example – a customer can place many orders, but an
order cannot be placed by many customers.
3. Many to One Relationship
When more than one instance of an entity is associated with a single instance of another entity,
it is called a many to one relationship. For example – many students can study in a single
college, but a student cannot study in many colleges at the same time.
4. Many to Many Relationship
When more than one instance of an entity is associated with more than one instance of
another entity, it is called a many to many relationship. For example, a student can be assigned
to many projects and a project can be assigned to many students.
Mapping ER-diagram to database tables.
ER Model, when conceptualized into diagrams, gives a good overview of entity-relationship,
which is easier to understand. ER diagrams can be mapped to relational schema, that is, it is
possible to create relational schema using ER diagram. We cannot import all the ER constraints
into relational model, but an approximate schema can be generated.

There are several processes and algorithms available to convert ER Diagrams into Relational
Schema. Some of them are automated and some of them are manual. We may focus here on the
mapping diagram contents to relational basics.

ER diagrams mainly comprise −

Entity and its attributes


Relationship, which is association among entities.
Mapping Entity
An entity is a real-world object with some attributes.
Mapping Process (Algorithm)
Create table for each entity.
Entity's attributes should become fields of tables with their respective data types.
Declare primary key.
Mapping Relationship
A relationship is an association among entities.
Mapping Process
Create table for a relationship.
Add the primary keys of all participating Entities as fields of table with their respective data
types.
If relationship has any attribute, add each attribute as field of table.
Declare a primary key composing all the primary keys of participating entities.
Declare all foreign key constraints.
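
As a sketch of this mapping (assuming, for illustration, entities STUDENT and COURSE joined by a many-to-many ENROLLS relationship that carries a grade attribute):

-- One table per entity; attributes become fields; declare the primary key
CREATE TABLE student (roll_no INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE course (course_code VARCHAR(10) PRIMARY KEY, title VARCHAR(100));

-- One table for the relationship: the participants' primary keys, the
-- relationship's own attribute, a composite primary key, and foreign keys
CREATE TABLE enrolls
(
    roll_no INT,
    course_code VARCHAR(10),
    grade CHAR(1),
    PRIMARY KEY (roll_no, course_code),
    FOREIGN KEY (roll_no) REFERENCES student(roll_no),
    FOREIGN KEY (course_code) REFERENCES course(course_code)
);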

(v) Maria DB
Introduction to Maria DB,
MariaDB Database
MariaDB is a popular fork of MySQL created by MySQL's original developers. It grew out of
concerns related to MySQL's acquisition by Oracle. It offers support for both small data
processing tasks and enterprise needs. It aims to be a drop-in replacement for MySQL, requiring
only a simple uninstall of MySQL and an install of MariaDB. MariaDB offers the same features
as MySQL, and much more.

Key Features of MariaDB


The important features of MariaDB are −

All of MariaDB is under GPL, LGPL, or BSD.

MariaDB includes a wide selection of storage engines, including high-performance storage


engines, for working with other RDBMS data sources.
MariaDB uses a standard and popular querying language.

MariaDB runs on a number of operating systems and supports a wide variety of programming
languages.

MariaDB offers support for PHP, one of the most popular web development languages.

MariaDB offers Galera cluster technology.

MariaDB also offers many operations and commands unavailable in MySQL, and
eliminates/replaces features impacting performance negatively.
Data Types,
MariaDB data types can be categorized as numeric, date and time, and string values.

Numeric Data Types


The numeric data types supported by MariaDB are as follows −

TINYINT − This data type represents small integers falling within the signed range of -128 to
127, and the unsigned range of 0 to 255.

BOOLEAN − This data type associates a value 0 with “false,” and a value 1 with “true.”

SMALLINT − This data type represents integers within the signed range of -32768 to 32767,
and the unsigned range of 0 to 65535.

MEDIUMINT − This data type represents integers in the signed range of -8388608 to 8388607,
and the unsigned range of 0 to 16777215.
INT (also INTEGER) − This data type represents an integer of normal size. When marked as
unsigned, the range spans 0 to 4294967295. When signed (the default setting), the range spans
-2147483648 to 2147483647. When a column is set to ZEROFILL (an unsigned state), all its
values are prepended with zeros to pad the INT value to M digits.

BIGINT − This data type represents integers within the signed range of
-9223372036854775808 to 9223372036854775807, and the unsigned range of 0 to
18446744073709551615.

DECIMAL( also DEC, NUMERIC, FIXED)− This data type represents precise fixed-point
numbers, with M specifying its digits and D specifying the digits after the decimal. The M value
does not add “-” or the decimal point. If D is set to 0, no decimal or fraction part appears and
the value will be rounded to the nearest DECIMAL on INSERT. The maximum permitted digits
is 65, and the maximum for decimals is 30. Default value for M on omission is 10, and 0 for D
on omission.

FLOAT − This data type represents a small, floating-point number of the value 0 or a number
within the following ranges −

-3.402823466E+38 to -1.175494351E-38

1.175494351E-38 to 3.402823466E+38

DOUBLE (also REAL and DOUBLE PRECISION) − This data type represents normal-size,
floating-point numbers of the value 0 or within the following ranges −

-1.7976931348623157E+308 to -2.2250738585072014E-308

2.2250738585072014E-308 to 1.7976931348623157E+308

BIT − This data type represents bit fields with M specifying the number of bits per value. On
omission of M, the default is 1. Bit values can be applied with “ b’[value]’” in which value
represents bit value in 0s and 1s. Zero-padding occurs automatically from the left for full
length; for example, “10” becomes “0010.”
Date and Time Data Types
The date and time data types supported by MariaDB are as follows −

DATE − This data type represents a date range of “1000-01-01” to “9999-12-31,” and uses the
“YYYY-MM-DD” date format.

TIME − This data type represents a time range of “-838:59:59.999999” to “838:59:59.999999.”

DATETIME − This data type represents the range “1000-01-01 00:00:00.000000” to “9999-12-
31 23:59:59.999999.” It uses the “YYYY-MM-DD HH:MM:SS” format.

TIMESTAMP − This data type represents a timestamp in the “YYYY-MM-DD HH:MM:SS”
format. It mainly finds use in recording the time of database modifications, e.g., insertion or
update.

YEAR − This data type represents a year in 4-digit format. The four-digit format allows values
in the range of 1901 to 2155, and 0000.

String DataTypes
The string type values supported by MariaDB are as follows −

String literals − This data type represents character sequences enclosed by quotes.

CHAR − This data type represents a fixed-length string, right-padded with spaces to the
specified length. M represents the column length in characters, in a range of 0 to 255; its
default value is 1.

VARCHAR − This data type represents a variable-length string, with an M range (maximum
column length) of 0 to 65535.
BINARY − This data type represents binary byte strings, with M as the column length in bytes.

VARBINARY − This data type represents binary byte strings of variable length, with M as
column length.

TINYBLOB − This data type represents a blob column with a maximum length of 255 (2^8 - 1)
bytes. In storage, each uses a one-byte length prefix indicating the byte quantity in the value.

BLOB − This data type represents a blob column with a maximum length of 65,535 (2^16 - 1)
bytes. In storage, each uses a two-byte length prefix indicating the byte quantity in the value.

MEDIUMBLOB − This data type represents a blob column with a maximum length of
16,777,215 (2^24 - 1) bytes. In storage, each uses a three-byte length prefix indicating the byte
quantity in the value.

LONGBLOB − This data type represents a blob column with a maximum length of
4,294,967,295 (2^32 - 1) bytes. In storage, each uses a four-byte length prefix indicating the
byte quantity in the value.

TINYTEXT − This data type represents a text column with a maximum length of 255 (2^8 - 1)
characters. In storage, each uses a one-byte length prefix indicating the byte quantity in the
value.

TEXT − This data type represents a text column with a maximum length of 65,535 (2^16 - 1)
characters. In storage, each uses a two-byte length prefix indicating the byte quantity in the
value.

MEDIUMTEXT − This data type represents a text column with a maximum length of
16,777,215 (2^24 - 1) characters. In storage, each uses a three-byte length prefix indicating the
byte quantity in the value.

LONGTEXT − This data type represents a text column with a maximum length of
4,294,967,295 or 4GB (2^32 - 1) characters. In storage, each uses a four-byte length prefix
indicating the byte quantity in the value.

ENUM − This data type represents a string object having only a single value from a list.

SET − This data type represents a string object having zero or more values from a list, with a
maximum of 64 members. SET values are represented internally as integer values.
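
A short sketch pulling several of these types together in one hypothetical MariaDB table:

CREATE TABLE product
(
    product_id INT UNSIGNED PRIMARY KEY,  -- unsigned INT: 0 to 4294967295
    name VARCHAR(100) NOT NULL,           -- variable-length string
    in_stock BOOLEAN,                     -- 0 is false, 1 is true
    price DECIMAL(8,2),                   -- fixed-point: 8 digits, 2 after the decimal
    weight FLOAT,                         -- floating-point number
    added_on DATETIME,                    -- date and time
    size ENUM('S','M','L'),               -- exactly one value from the list
    notes TEXT                            -- up to 65,535 characters
);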
SQL Commands, Create,
The SQL CREATE DATABASE statement creates a new database, and the SQL CREATE
TABLE statement allows you to create and define a table.

Syntax
The syntax for the CREATE TABLE statement in SQL is:

CREATE TABLE table_name


(
column1 datatype [ NULL | NOT NULL ],
column2 datatype [ NULL | NOT NULL ],
...
);
Let's look at a SQL CREATE TABLE example.

CREATE TABLE suppliers


( supplier_id int NOT NULL,
supplier_name char(50) NOT NULL,
contact_name char(50)
);
insert,
The SQL INSERT statement is used to insert one or more records into a table. There are 2
syntaxes for the INSERT statement, depending on whether you are inserting one record or
multiple records.
The syntax for the INSERT statement when inserting a single record in SQL is:

INSERT INTO table


(column1, column2, ... )
VALUES
(expression1, expression2, ... );
Or the syntax for the INSERT statement when inserting multiple records in SQL is:

INSERT INTO table


(column1, column2, ... )
SELECT expression1, expression2, ...
FROM source_tables
[WHERE conditions];
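
For example, inserting a row into the suppliers table created earlier (the values are illustrative):

INSERT INTO suppliers
(supplier_id, supplier_name, contact_name)
VALUES
(1001, 'Acme Traders', 'R. Sharma');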
update,
The SQL UPDATE statement is used to update existing records in the tables.
The syntax for the UPDATE statement when updating a table in SQL is:

UPDATE table
SET column1 = expression1,
column2 = expression2,
...
[WHERE conditions];
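
Continuing the suppliers example (the values are illustrative):

UPDATE suppliers
SET contact_name = 'S. Verma'
WHERE supplier_id = 1001;   -- only rows matching the WHERE clause are changed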
delete,
The SQL DELETE statement is used to delete one or more records from a table.
Syntax
The syntax for the DELETE statement in SQL is:
DELETE FROM table
[WHERE conditions];
Note
You do not need to list fields in the DELETE statement since you are deleting the entire row
from the table.
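
For example, again using the suppliers table:

DELETE FROM suppliers
WHERE supplier_id = 1001;   -- omit the WHERE clause and every row is deleted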
drop,
The SQL DROP TABLE statement allows you to remove or delete a table from the SQL
database.

Syntax
The syntax for the DROP TABLE statement in SQL is:

DROP TABLE table_name;

alter,
The SQL ALTER TABLE statement is used to add, modify, or drop/delete columns in a table.
The SQL ALTER TABLE statement is also used to rename a table.

Add column in table


Syntax
To add a column in a table, the ALTER TABLE syntax in SQL is:

ALTER TABLE table_name


ADD column_name column_definition;
Example
Let's look at a SQL ALTER TABLE example that adds a column.

For example:
ALTER TABLE supplier
ADD supplier_name char(50);
This SQL ALTER TABLE example will add a column called supplier_name to the supplier
table.

SQL functions (String functions),


SQL string functions are used primarily for string manipulation. The following table details the
important string functions −

Sr.No. Function & Description


1 ASCII()
Returns numeric value of left-most character

2 BIN()
Returns a string representation of the argument

3 BIT_LENGTH()
Returns length of argument in bits

4 CHAR_LENGTH()
Returns number of characters in argument

5 CHAR()
Returns the character for each integer passed

6 CHARACTER_LENGTH()
A synonym for CHAR_LENGTH()

7 CONCAT_WS()
Returns concatenate with separator

8 CONCAT()
Returns concatenated string

9 CONV()
Converts numbers between different number bases

10 ELT()
Returns string at index number

11 EXPORT_SET()
Returns a string such that for every bit set in the value bits, you get an on string and for every
unset bit, you get an off string

12 FIELD()
Returns the index (position) of the first argument in the subsequent arguments

13 FIND_IN_SET()
Returns the index position of the first argument within the second argument

14 FORMAT()
Returns a number formatted to specified number of decimal places

15 HEX()
Returns a string representation of a hex value

16 INSERT()
Inserts a substring at the specified position up to the specified number of characters
17 INSTR()
Returns the index of the first occurrence of substring

18 LCASE()
Synonym for LOWER()

19 LEFT()
Returns the leftmost number of characters as specified

20 LENGTH()
Returns the length of a string in bytes

21 LOAD_FILE()
Loads the named file

22 LOCATE()
Returns the position of the first occurrence of substring

23 LOWER()
Returns the argument in lowercase

24 LPAD()
Returns the string argument, left-padded with the specified string

25 LTRIM()
Removes leading spaces
26 MAKE_SET()
Returns a set of comma-separated strings that have the corresponding bit in bits set

27 MID()
Returns a substring starting from the specified position

28 OCT()
Returns a string representation of the octal argument

29 OCTET_LENGTH()
A synonym for LENGTH()

30 ORD()
If the leftmost character of the argument is a multi-byte character, returns the code for that
character

31 POSITION()
A synonym for LOCATE()

32 QUOTE()
Escapes the argument for use in an SQL statement

33 REGEXP
Pattern matching using regular expressions

34 REPEAT()
Repeats a string the specified number of times

35 REPLACE()
Replaces occurrences of a specified string

36 REVERSE()
Reverses the characters in a string

37 RIGHT()
Returns the specified rightmost number of characters

38 RPAD()
Returns the string argument, right-padded with the specified string

39 RTRIM()
Removes trailing spaces

40 SOUNDEX()
Returns a soundex string

41 SOUNDS LIKE
Compares sounds

42 SPACE()
Returns a string of the specified number of spaces

43 STRCMP()
Compares two strings

44 SUBSTRING_INDEX()
Returns a substring from a string before the specified number of occurrences of the delimiter
45 SUBSTRING(), SUBSTR()
Returns the substring as specified

46 TRIM()
Removes leading and trailing spaces

47 UCASE()
Synonym for UPPER()

48 UNHEX()
Converts each pair of hexadecimal digits to a character

49 UPPER()
Converts to uppercase
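
A few of these functions in action (a sketch; the string literals are arbitrary):

SELECT CONCAT('Maria', 'DB');                      -- 'MariaDB'
SELECT UPPER('dbms'), LOWER('DBMS');               -- 'DBMS', 'dbms'
SELECT LENGTH('hello');                            -- 5 (length in bytes)
SELECT SUBSTRING('database', 1, 4);                -- 'data'
SELECT REPLACE('SQL course', 'course', 'notes');   -- 'SQL notes'
SELECT TRIM('   padded   ');                       -- 'padded'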
Date functions,
SQL Server Date Functions
Function Description
CURRENT_TIMESTAMP Returns the current date and time
DATEADD Adds a time/date interval to a date and then returns the date
DATEDIFF Returns the difference between two dates
DATEFROMPARTS Returns a date from the specified parts (year, month, and day values)
DATENAME Returns a specified part of a date (as string)
DATEPART Returns a specified part of a date (as integer)
DAY Returns the day of the month for a specified date
GETDATE Returns the current database system date and time
GETUTCDATE Returns the current database system UTC date and time
ISDATE Checks an expression and returns 1 if it is a valid date, otherwise 0
MONTH Returns the month part for a specified date (a number from 1 to 12)
SYSDATETIME Returns the date and time of the SQL Server
YEAR Returns the year part for a specified date
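
The table above lists SQL Server's date functions; MariaDB provides equivalents under different names. A small sketch of common MariaDB date functions:

SELECT NOW();                                      -- current date and time
SELECT CURDATE();                                  -- current date
SELECT DATEDIFF('2023-12-31', '2023-01-01');       -- difference in days: 364
SELECT DATE_ADD('2023-01-01', INTERVAL 1 MONTH);   -- '2023-02-01'
SELECT YEAR('2023-06-15'), MONTH('2023-06-15');    -- 2023, 6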

indexing key, primary key, foreign key


Create table countries
(
CountryID int primary key,
CountryName varchar(100) not null
);
Create table cities
(
CityName varchar(100) not null,
CountryID int,
Foreign key(CountryID) references countries(CountryID)
);
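
The heading also mentions indexing: primary keys are indexed automatically, and a secondary index can be added explicitly. A minimal sketch (the index name is illustrative):

Create index idx_cities_countryid on cities (CountryID);  -- speeds up lookups and joins on CountryID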
(vi) Manipulating Data with Maria DB
SQL Statements,
SQL statements are categorized into four different types of statements, which are
DML (DATA MANIPULATION LANGUAGE)
In Data Manipulation Language(DML), we have four different SQL statements, Select, Insert,
Update, and Delete.
DDL (DATA DEFINITION LANGUAGE)
In Data Definition Language (DDL), we have three different SQL statements: Create, Alter and
Drop.
DCL (DATA CONTROL LANGUAGE)
In Data Control Language(DCL), it defines the control over the data in the database. We have
two different commands, which are
1. Grant : GRANT allows the specified user to perform the specified tasks.
Syntax
GRANT privilege_name
ON object_name
TO {user_name |PUBLIC |role_name}
[WITH GRANT OPTION];
2. Revoke : It is used to cancel previously granted or denied permissions.
Syntax
REVOKE privilege_name
ON object_name
FROM {user_name |PUBLIC |role_name}
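
For example (the user 'ravi'@'localhost' and the TestDB.suppliers table are illustrative):

GRANT SELECT, INSERT ON TestDB.suppliers TO 'ravi'@'localhost';
REVOKE INSERT ON TestDB.suppliers FROM 'ravi'@'localhost';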
TCL (TRANSACTION CONTROL LANGUAGE)
In Transaction Control Language (TCL), the commands are used to manage the transactions in
the database. These are used to manage the changes made by DML statements. It also allows
the statements to be grouped together into logical transactions.
COMMIT
Commit command is used to permanently save any transaction into the database.
Syntax
Commit;
ROLLBACK
The Rollback command is used to restore the database to the last committed state. It is also used
with a savepoint to jump back to that savepoint.
Syntax
ROLLBACK TO SAVEPOINT savepoint_name;
SAVEPOINT
SAVEPOINT command is used to temporarily save a transaction so that you can roll back to
that point whenever necessary.
Syntax
SAVEPOINT savepoint_name;
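
A short sketch of these TCL commands working together (the accounts table is hypothetical):

START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE acc_no = 1;
SAVEPOINT after_debit;                 -- a point we can roll back to
UPDATE accounts SET balance = balance + 500 WHERE acc_no = 2;
ROLLBACK TO SAVEPOINT after_debit;     -- undoes only the credit, keeps the debit
COMMIT;                                -- makes the remaining changes permanent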
Select,
The SELECT statement is used to select data from a database.

The data returned is stored in a result table, called the result-set.


SELECT Syntax
SELECT column1, column2, ...
FROM table_name;
Here, column1, column2, ... are the field names of the table you want to select data from. If you
want to select all the fields available in the table, use the following syntax:

SELECT * FROM table_name;
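
For example, using the suppliers table created earlier:

SELECT supplier_name, contact_name FROM suppliers;
SELECT * FROM suppliers;   -- all columns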


like clause,
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column.
There are two wildcards often used in conjunction with the LIKE operator:

% - The percent sign represents zero, one, or multiple characters


_ - The underscore represents a single character
The percent sign and the underscore can also be used in combinations!
Here are some examples showing different LIKE operators with '%' and '_' wildcards:
LIKE Operator Description
WHERE CustomerName LIKE 'a%' Finds any values that start with "a"
WHERE CustomerName LIKE '%a' Finds any values that end with "a"
WHERE CustomerName LIKE '%or%' Finds any values that have "or" in any position
WHERE CustomerName LIKE '_r%' Finds any values that have "r" in the second position
WHERE CustomerName LIKE 'a_%' Finds any values that start with "a" and are at least 2
characters in length
WHERE CustomerName LIKE 'a__%' Finds any values that start with "a" and are at least
3 characters in length
WHERE ContactName LIKE 'a%o' Finds any values that start with "a" and ends with "o"
group by,
The SQL GROUP BY statement is used to arrange identical data into groups. The GROUP BY
statement is used with the SQL SELECT statement.
The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes
the ORDER BY clause.
The GROUP BY statement is used with aggregate functions.
Syntax

SELECT column
FROM table_name
WHERE conditions
GROUP BY column
ORDER BY column
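
For example, grouping a hypothetical employees table by department with the aggregate functions COUNT and AVG:

SELECT dept_id, COUNT(*) AS staff_count, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 0
GROUP BY dept_id
ORDER BY staff_count DESC;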
order by,
The ORDER BY clause sorts the result-set in ascending or descending order.
It sorts the records in ascending order by default. DESC keyword is used to sort the records in
descending order.
Syntax:

SELECT column1, column2


FROM table_name
WHERE condition
ORDER BY column1, column2... ASC|DESC;
Where
ASC: It is used to sort the result set in ascending order by expression.
DESC: It sorts the result set in descending order by expression.
joins-left join, natural join, right join,
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
Here are the different types of the JOINs in SQL:
(INNER) JOIN: Returns records that have matching values in both tables
LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from the
right table
RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records from
the left table
FULL (OUTER) JOIN: Returns all records when there is a match in either left or right table
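
Using the countries and cities tables created earlier, a sketch of two of these joins:

SELECT co.CountryName, ci.CityName
FROM countries co
LEFT JOIN cities ci ON ci.CountryID = co.CountryID;  -- countries without cities still appear, with NULL CityName

SELECT CountryName, CityName
FROM countries NATURAL JOIN cities;  -- natural join: matches on the identically named CountryID column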
union.
The UNION operator is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of columns
The columns must also have similar data types
The columns in each SELECT statement must also be in the same order
UNION Syntax
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
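
For example (assuming two hypothetical tables with a matching customer_name column):

SELECT customer_name FROM customers_2022
UNION                     -- duplicates are removed; UNION ALL would keep them
SELECT customer_name FROM customers_2023;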
Correlated and nested queries.
A Subquery or Inner query or a Nested query is a query within another SQL query and
embedded within the WHERE clause.
A subquery is used to return data that will be used in the main query as a condition to further
restrict the data to be retrieved.
Subqueries can be used with the SELECT, INSERT, UPDATE, and DELETE statements along
with the operators like =, <, >, >=, <=, IN, BETWEEN, etc.
There are a few rules that subqueries must follow −
Subqueries must be enclosed within parentheses.
A subquery can have only one column in the SELECT clause, unless multiple columns are in
the main query for the subquery to compare its selected columns.
An ORDER BY command cannot be used in a subquery, although the main query can use an
ORDER BY. The GROUP BY command can be used to perform the same function as the
ORDER BY in a subquery.
Subqueries that return more than one row can only be used with multiple value operators such
as the IN operator.
The SELECT list cannot include any references to values that evaluate to a BLOB, ARRAY,
CLOB, or NCLOB.
A subquery cannot be immediately enclosed in a set function.
The BETWEEN operator cannot be used with a subquery. However, the BETWEEN operator
can be used within the subquery.
Correlated subquery
A correlated subquery is one way of reading every row in a table and comparing values in each
row against related data. It is used whenever a subquery must return a different result or set of
results for each candidate row considered by the main query. In other words, you can use a
correlated subquery to answer a multipart question whose answer depends on the value in each
row processed by the parent statement.
Nested Subqueries Versus Correlated Subqueries :
With a normal nested subquery, the inner SELECT query runs first and executes once, returning
values to be used by the main query. A correlated subquery, however, executes once for each
candidate row considered by the outer query. In other words, the inner query is driven by the
outer query.
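
A sketch of the difference (the employees table and its columns are hypothetical):

-- Nested subquery: the inner SELECT runs once
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated subquery: the inner SELECT references the outer row (e.dept_id),
-- so it runs once per candidate row
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (SELECT AVG(e2.salary)
                  FROM employees e2
                  WHERE e2.dept_id = e.dept_id);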
Backup and restore commands
Restoring is the process of copying data from a backup and applying logged transactions to the
data. Restore is what you do with backups: take the backup file and turn it back into a database.
The restore operation can be performed either with T-SQL, as shown below, or through the
management GUI.
Method 1 – T-SQL
Syntax
Restore database <Your database name> from disk = '<Backup file location + file name>'
A backup is a copy of the data/database, etc. Backing up an MS SQL Server database is
essential for protecting data. MS SQL Server backups are mainly of three types − full or
database, differential or incremental, and transaction log.
The following command takes a full backup of the database 'TestDB' to the location 'D:\'
with the backup file name 'TestDB_Full.bak':
Backup database TestDB to disk = 'D:\TestDB_Full.bak'
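
The commands above are MS SQL Server T-SQL. For MariaDB itself, backup and restore are usually done from the shell with the mysqldump client (a sketch; the user and database names are illustrative):

mysqldump -u root -p TestDB > TestDB_backup.sql   (logical backup to a SQL script)
mysql -u root -p TestDB < TestDB_backup.sql       (restore by replaying the script)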
(vii) NoSQL Database Technology
Introduction to NoSQL Databases,
NoSQL refers to non-relational database management systems, which differ from traditional
relational database management systems in some significant ways. They are designed for
distributed data stores with very large-scale data storage needs (for example, Google or
Facebook, which collect terabytes of data every day for their users). These types of data stores
may not require a fixed schema, avoid join operations and typically scale horizontally.
Difference between relational and NoSQL databases,
RDBMS
- Structured and organized data
- Structured query language (SQL)
- Data and its relationships are stored in separate tables.
- Data Manipulation Language, Data Definition Language
- Tight Consistency

NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-Value pair storage, Column Store, Document Store, Graph databases
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP Theorem
- Prioritizes high performance, high availability and scalability
- BASE Transaction
NoSQL features,
NoSQL Databases can have a common set of features such as:
Non-relational data model.
Runs well on clusters.
Mostly open-source.
Built for the new generation Web applications.
Is schema-less.
types,
Types of NoSQL Databases:

Key-value pair based
Column-oriented
Graph-based
Document-oriented

Key Value Pair Based


Data is stored in key/value pairs. It is designed in such a way to handle lots of data and heavy
load.

Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be a JSON, BLOB(Binary Large Objects), string, etc.

For example, a key-value pair may contain a key like "Website" associated with a value like
"Guru99".
It is one of the most basic NoSQL database examples. This kind of NoSQL database is used for
collections, dictionaries, associative arrays, etc. Key-value stores help the developer to store
schema-less data. They work best for shopping cart contents.

Redis, Dynamo, Riak are some NoSQL examples of key-value store DataBases. They are all
based on Amazon's Dynamo paper.

Column-based
Column-oriented databases work on columns and are based on BigTable paper by Google.
Every column is treated separately. Values of single column databases are stored contiguously.
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as
the data is readily available in a column.

Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, Library card catalogs,

HBase, Cassandra and Hypertable are NoSQL examples of column-based databases.

Document-Oriented:
A document-oriented NoSQL DB stores and retrieves data as a key-value pair, but the value part
is stored as a document. The document is stored in JSON or XML format. The value is
understood by the DB and can be queried.
The document type is mostly used for CMS systems, blogging platforms, real-time analytics and
e-commerce applications. It should not be used for complex transactions that require multiple
operations, or for queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular
document-oriented DBMS systems.

Graph-Based
A graph-type database stores entities as well as the relations among those entities. An entity is
stored as a node, with relationships as edges. An edge gives a relationship between nodes.
Every node and edge has a unique identifier.
Compared to a relational database, where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast, as they are already captured in the
DB and there is no need to calculate them.

Graph-based databases are mostly used for social networks, logistics and spatial data.

Neo4J, Infinite Graph, OrientDB, FlockDB are some popular graph-based databases
advantages,
Advantages of NoSQL
Can be used as a primary or analytic data source for online applications
Big data capability: handles data velocity, variety, volume and complexity
No single point of failure
Easy replication
No need for a separate caching layer to store data
Provides fast performance and horizontal scalability
Can handle structured, semi-structured, and unstructured data with equal effect
Object-oriented programming which is easy to use and flexible
Does not need a dedicated high-performance server
Supports key developer languages and platforms
Simpler to implement than an RDBMS
Excels at distributed database and multi-data-center operations
Offers a flexible schema design which can easily be altered without downtime or service
disruption
Disadvantages of NoSQL
No standardization rules
Limited query capabilities
RDBMS databases and tools are comparatively more mature
Does not offer traditional database guarantees, such as consistency when multiple transactions
are performed simultaneously
As the volume of data increases, maintaining unique keys becomes difficult
Does not work as well with relational data
The learning curve is steep for new developers
Mostly open-source options, so less popular with enterprises
Architecture of MongoDB, Documents, Collections,
MongoDB is a document-oriented NoSQL database used for high volume data storage. Instead
of using tables and rows as in the traditional relational databases, MongoDB makes use of
collections and documents. Documents consist of key-value pairs which are the basic unit of
data in MongoDB. Collections contain sets of documents and function as the equivalent of
relational database tables. MongoDB is a database which came into light around the mid-
2000s.
Database: In simple words, it can be called the physical container for data. Each of the
databases has its own set of files on the file system with multiple databases existing on a single
MongoDB server.
Collection: A group of database documents can be called a collection. The RDBMS equivalent
to a collection is a table. The entire collection exists within a single database. There are no
schemas when it comes to collections. Inside the collection, various documents can have varied
fields, but mostly the documents within a collection are meant for the same purpose or for
serving the same end goal.
Document: A set of key–value pairs can be designated as a document. Documents are
associated with dynamic schemas. The benefit of having dynamic schemas is that a document in
a single collection does not have to possess the same structure or fields. Also, the common
fields in a collection’s document can have varied types of data.
A record in a MongoDB collection is basically called a document. The document, in turn, will
consist of field name and values.
• Important Features of MongoDB
Queries: It supports ad-hoc queries and document-based queries.
Index Support: Any field in the document can be indexed.
Replication: It supports master–slave replication. MongoDB uses native replication to
maintain multiple copies of data. Preventing database downtime is one of the replica set's
features, as it is self-healing.
Multiple Servers: The database can run over multiple servers. Data is duplicated to foolproof
the system in the case of hardware failure.
Auto-sharding: This process distributes data across multiple physical partitions called shards.
Due to sharding, MongoDB has an automatic load balancing feature.
MapReduce: It supports MapReduce and flexible aggregation tools.
Failure Handling: In MongoDB, it is easy to cope with cases of failure. Large numbers of
replicas give increased protection and data availability against database downtime such as rack
failures, multiple machine failures, data center failures, or even network partitions.
GridFS: Without complicating your stack, any sizes of files can be stored. GridFS feature
divides files into smaller parts and stores them as separate documents.
Schema-less Database: It is a schema-less database written in C++.
Document-oriented Storage: It uses BSON format which is a JSON-like format.
Procedures: Instead of stored procedures, MongoDB lets you use JavaScript functions, which
the database executes directly.
Dynamic Schemas,
Dynamic Schema: MongoDB supports dynamic schemas. In other words, we need not define
the schema before the insertion of data. We can change the schema of the database dynamically.
A dynamic schema supports fluent polymorphism.
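As a minimal sketch (using a hypothetical people collection), two documents with different fields can live in the same collection without any schema change:
> db.people.insert({ name: "Asha", age: 30 })
> db.people.insert({ name: "Ravi", email: "ravi@example.com", skills: ["SQL", "MongoDB"] })
Both inserts succeed; MongoDB places no structural requirement on the documents.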
Mongo Shell,
MongoDB Shell is the quickest way to connect, configure, query, and work with your
MongoDB database.
MongoDB Shell provides a modern command-line experience that includes syntax highlighting,
intelligent autocomplete, contextual help, and clear error messages. These are just some of the
features included in MongoDB Shell that makes working with your MongoDB Databases
easier.
The MongoDB Shell is a standalone product, it’s developed separately from the MongoDB
Server and it’s open-source under the Apache 2 license.
Mongo Server and Client,
MongoDB Compass is a client of the MongoDB server. It is the GUI for MongoDB: visually
explore your data, run ad hoc queries in seconds, interact with your data with full CRUD
functionality, and view and optimize your query performance. Available on Linux, Mac, or
Windows, Compass empowers you to make smarter decisions about indexing, document
validation, and more.
Data Types,
In MongoDB, data representation is done in JSON document format, but here the JSON is binary-
encoded, which is termed BSON. BSON is an extended version of the JSON model that provides
additional data types and ordered fields, and allows efficient encoding and decoding across
diverse languages.
Types:
Integer is a data type used for storing a numerical value, i.e., integers, as you can in other
programming languages. 32-bit or 64-bit integers are supported, depending on the server.
db.TestCollection.insert({"Integer example": 62})
Boolean is implemented for storing Boolean (i.e., true or false) values.
db.TestCollection.insert({"Nationality Indian": true})
Double is implemented for storing floating-point data in MongoDB.
db.TestCollection.insert({"double data type": 3.1415})
Min / Max keys are implemented for comparing a value against the lowest and highest BSON
elements.
String is one of the most frequently implemented data types for storing data.
db.TestCollection.insert({"string data type" : "This is a sample message."})
Arrays are implemented for storing arrays or list type or several values under a single key.
var degrees = ["BCA", "BS", "MCA"]
db.TestCollection.insert({" Array Example" : " Here is an example of array",
" Qualification" : degrees})
Object is implemented for embedded documents.
var embeddedObject={"English" : 94, "ComputerSc." : 96, "Maths" : 80,
"GeneralSc." : 85}
db.TestCollection.insert({"Object data type" : "This is Object",
"Marks" : embeddedObject})
Symbol is similar to a string and is usually reserved for languages having a specific
symbol type.
Null is implemented for storing a Null value.

db.TestCollection.insert({" EmailID ": null})


Date is implemented for storing the current date and time as UNIX-time format.
var date=new Date()
var date2=ISODate()
var month=date2.getMonth()
db.TestCollection.insert({"Date":date, "Date2":date2, "Month":month})
Timestamp stores a 64-bit value, in which the first 32 bits are a time_t value (seconds since the
Unix epoch) and the other 32 bits are an incrementing ordinal for operations within a given second.

Binary data is implemented for storing binary data.


Object ID is implemented for storing the ID of the document.
Regular expression is implemented for storing regular expression.
Code is implemented for storing JavaScript code for your MongoDB document.
Embedded Documents,
MongoDB provides you a cool feature which is known as Embedded or Nested Document.
Embedded or nested documents are documents that contain a document inside another
document. In other words, when a document in a collection contains another document, and that
document contains another sub-document, and so on, such documents are known as
embedded/nested documents.
Relationships in MongoDB represent how various documents are logically related to each other.
Relationships can be modeled via Embedded and Referenced approaches. Such relationships
can be either 1:1, 1:N, N:1 or N:N.
Let us consider the case of storing addresses for users. So, one user can have multiple addresses
making this a 1:N relationship.
Following is the sample document structure of user document −
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"name": "Tom Hanks",
"contact": "987654321",
"dob": "01-01-1991"
}
Following is the sample document structure of address document −
{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}
Modeling Embedded Relationships
In the embedded approach, we will embed the address document inside the user document.
> db.users.insert({
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address": [
{
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
},
{
"building": "170 A, Acropolis Apt",
"pincode": 456789,
"city": "Chicago",
"state": "Illinois"
}
]
})
This approach maintains all the related data in a single document, which makes it easy to
retrieve and maintain. The whole document can be retrieved in a single query such as −
>db.users.findOne({"name":"Tom Benzamin"},{"address":1})
Note that in the above query, db and users are the database and collection respectively.
The drawback is that if the embedded document keeps on growing too much in size, it can
impact the read/write performance.
Modeling Referenced Relationships
This is the approach of designing normalized relationship. In this approach, both the user and
address documents will be maintained separately but the user document will contain a field that
will reference the address document's id field.
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000"),
ObjectId("52ffc4a5d85242602e000001")
]
}
As shown above, the user document contains the array field address_ids which contains
ObjectIds of corresponding addresses. Using these ObjectIds, we can query the address
documents and get address details from there. With this approach, we will need two queries:
first to fetch the address_ids fields from user document and second to fetch these addresses
from address collection.
>var result = db.users.findOne({"name":"Tom Benzamin"},{"address_ids":1})
>var addresses = db.address.find({"_id":{"$in":result["address_ids"]}})
Creating Configuration file for Mongo,
Configuration File
You can configure mongod and mongos instances at startup using a configuration file. The
configuration file contains settings that are equivalent to the mongod and mongos command-
line options. See Configuration File Settings and Command-Line Options Mapping.
Using a configuration file makes managing mongod and mongos options easier, especially for
large-scale deployments. You can also add comments to the configuration file to explain the
server’s settings.
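As an illustration only (paths and values here are placeholders, not a recommended production setup), a minimal YAML configuration file for mongod might look like this:
systemLog:
   destination: file
   path: /var/log/mongodb/mongod.log
   logAppend: true
storage:
   dbPath: /var/lib/mongodb
net:
   port: 27017
   bindIp: 127.0.0.1
The server is then started with mongod --config /etc/mongod.conf (or the short form -f).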
JSON File format for storing documents,
JSON stands for JavaScript Object Notation
JSON is a lightweight data-interchange format
JSON is "self-describing" and easy to understand
JSON is language independent
Since the JSON format is text only, it can easily be sent to and from a server, and used as a data
format by any programming language.
JavaScript has a built in function to convert a string, written in JSON format, into native
JavaScript objects:
JSON.parse()
So, if you receive data from a server, in JSON format, you can use it like any other JavaScript
object.
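A small sketch of the round trip in JavaScript (the object and values here are made up for illustration):
// Parse a JSON string received from a server into a native object.
var text = '{ "name": "Tom", "age": 30 }';
var obj = JSON.parse(text);
obj.name                 // "Tom"
// The reverse direction: serialize an object back to a JSON string.
JSON.stringify(obj)      // '{"name":"Tom","age":30}'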
Inserting and Saving Documents,
To add a document to your database, use the db.<collection>.insert command.
> db.user.insert({name: "Ada Lovelace", age: 205})
WriteResult({ "nInserted" : 1 })
A couple of notes: the “user” in the command refers to the collection that the document is
being inserted into. Collections in MongoDB are like tables in a SQL database, but they are
groups of documents rather than groups of records.
The db.collection.save() method is used to update an existing document or insert a new
document, depending on its document parameter.
Syntax:
db.collection.save()
e.g.
db.invoice.save( { inv_no: "I00001", inv_date: "10/10/2012", ord_qty:200 } );
Batch Insert,
Bulk.insert(<document>)
Adds an insert operation to a bulk operations list.
Bulk.insert() accepts the following parameter:
Parameter Type Description
doc document Document to insert. The size of the document must be less than or equal to the maximum BSON document size.
However, using the bulkWrite() method is easier and simpler, and it can do more than just inserts:

db.students.bulkWrite(
  [
    { insertOne :
      { "document" : { name: "Andrew", major: "Architecture", gpa: 3.2 } }
    },
    { insertOne :
      { "document" : { name: "Terry", major: "Math", gpa: 3.8 } }
    },
    { updateOne :
      {
        filter : { name : "Terry" },
        update : { $set : { gpa : 4.0 } }
      }
    },
    { deleteOne :
      { filter : { name : "Kate" } }
    },
    { replaceOne :
      {
        filter : { name : "Claire" },
        replacement : { name: "Genny", major: "Counseling", gpa: 2.4 }
      }
    }
  ],
  { ordered: false }
);
Insert Validation,
Validation rules are on a per-collection basis.
To specify validation rules when creating a new collection, use db.createCollection() with the
validator option.
To add document validation to an existing collection, use collMod command with the validator
option.
JSON Schema is the recommended means of performing schema validation.
For example, the following example specifies validation rules using JSON schema:
db.createCollection("students", {
validator: {
$jsonSchema: {
bsonType: "object",
required: [ "name", "year", "major", "address" ],
properties: {
name: {
bsonType: "string",
description: "must be a string and is required"
},
year: {
bsonType: "int",
minimum: 2017,
maximum: 3017,
description: "must be an integer in [ 2017, 3017 ] and is required"
},
major: {
enum: [ "Math", "English", "Computer Science", "History", null ],
description: "can only be one of the enum values and is required"
},
gpa: {
bsonType: [ "double" ],
description: "must be a double if the field exists"
},
address: {
bsonType: "object",
required: [ "city" ],
properties: {
street: {
bsonType: "string",
description: "must be a string if the field exists"
},
city: {
bsonType: "string",
description: "must be a string and is required"
}
}
}
}
}
}
})
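With this validator in place, inserts are checked against the schema. A hedged illustration (note NumberInt(), since numbers typed in the shell default to doubles while the schema requires an int):
> db.students.insert({ name: "Alice", year: NumberInt(2019),
   major: "Math", address: { city: "Delhi" } })     // passes validation
> db.students.insert({ name: "Bob", year: NumberInt(2019) })
   // rejected: "major" and "address" are required by the validator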
Removing Documents,
db.collection.deleteMany()
db.collection.deleteOne()
the following example deletes all documents from the inventory collection:
db.inventory.deleteMany({})
Delete All Documents that Match a Condition
The following example removes all documents from the inventory collection where the status
field equals "A":
db.inventory.deleteMany({ status : "A" })
Delete Only One Document that Matches a Condition
db.inventory.deleteOne( { status: "D" } )
Updating Documents,
Update a Single Document
The following example uses the db.collection.updateOne() method on the inventory collection
to update the first document where item equals "paper":

db.inventory.updateOne(
{ item: "paper" },
{
$set: { "size.uom": "cm", status: "P" },
$currentDate: { lastModified: true }
}
)
The update operation:
uses the $set operator to update the value of the size.uom field to "cm" and the value of the
status field to "P",
uses the $currentDate operator to update the value of the lastModified field to the current date.
If lastModified field does not exist, $currentDate will create the field. See $currentDate for
details.
Document Replacement,
To replace the entire content of a document except for the _id field, pass an entirely new
document as the second argument to db.collection.replaceOne().
When replacing a document, the replacement document must consist of only field/value pairs;
i.e. do not include update operators expressions.
The replacement document can have different fields from the original document. In the
replacement document, you can omit the _id field since the _id field is immutable; however, if
you do include the _id field, it must have the same value as the current value.
The following example replaces the first document from the inventory collection where item:
"paper":
db.inventory.replaceOne(
{ item: "paper" },
{ item: "paper", instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 40 } ] }
)
_id Field
Once set, you cannot update the value of the _id field nor can you replace an existing document
with a replacement document that has a different _id field value.
Using Modifiers,
In addition to the MongoDB Query Operators, there are a number of “meta” operators that let
you modify the output or behavior of a query.
Modifiers:
Name Description
$comment Adds a comment to the query to identify queries in the database profiler output.
$explain Forces MongoDB to report on query execution plans. See explain().
$hint Forces MongoDB to use a specific index. See hint().
$max Specifies an exclusive upper limit for the index to use in a query. See max().
$maxTimeMS Specifies a cumulative time limit in milliseconds for processing operations on a cursor. See maxTimeMS().
$min Specifies an inclusive lower limit for the index to use in a query. See min().
$orderby Returns a cursor with documents sorted according to a sort specification. See sort().
$query Wraps a query document.
$returnKey Forces the cursor to only return fields included in the index.
$showDiskLoc Modifies the documents returned to include references to the on-disk location of each document.
$natural A special sort order that orders documents using the order of documents on disk.
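Most of these are exposed in the shell through cursor helper methods. A brief sketch, assuming a users collection that already has an index on age:
db.users.find({ age: { $gt: 25 } }).hint({ age: 1 }).maxTimeMS(1000)  // force the { age: 1 } index, cap run time
db.users.find().sort({ age: -1 })                                     // the helper form of $orderby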
Updating Multiple Documents,
The following example uses the db.collection.updateMany() method on the inventory collection
to update all documents where qty is less than 50:
db.inventory.updateMany(
{ "qty": { $lt: 50 } },
{
$set: { "size.uom": "in", status: "P" },
$currentDate: { lastModified: true }
}
)

Returning Updated Documents,


db.collection.findAndModify(document)
Modifies and returns a single document. By default, the returned document does not include the
modifications made on the update. To return the document with the modifications made on the
update, use the new option. The findAndModify() method is a shell helper around the
findAndModify command.
The findAndModify() method has the following form:
db.collection.findAndModify({
query: <document>,
sort: <document>,
remove: <boolean>,
update: <document or aggregation pipeline>, // Changed in MongoDB 4.2
new: <boolean>,
fields: <document>,
upsert: <boolean>,
bypassDocumentValidation: <boolean>,
writeConcern: <document>,
collation: <document>,
arrayFilters: [ <filterdocument1>, ... ]
});
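For example, a sketch assuming a hypothetical users collection with a score field: increment the score and return the updated document by setting new to true:
db.users.findAndModify({
   query: { name: "Tom Benzamin" },
   update: { $inc: { score: 1 } },
   new: true       // return the document after the update, not before
})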
Introduction to Indexing,
Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB
must perform a collection scan, i.e. scan every document in a collection, to select those
documents that match the query statement. If an appropriate index exists for a query, MongoDB
can use the index to limit the number of documents it must inspect.
Indexes are special data structures [1] that store a small portion of the collection’s data set in an
easy to traverse form. The index stores the value of a specific field or set of fields, ordered by
the value of the field. The ordering of the index entries supports efficient equality matches and
range-based query operations. In addition, MongoDB can return sorted results by using the
ordering in the index.
Fundamentally, indexes in MongoDB are similar to indexes in other database systems.
MongoDB defines indexes at the collection level and supports indexes on any field or sub-field
of the documents in a MongoDB collection.
db.collection.createIndex( <key and index type specification>, <options> )
db.collection.createIndex( { name: -1 } )
Introduction to Compound Indexes, Using Compound Indexes,
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes.
The order of fields listed in a compound index has significance. For instance, if a compound
index consists of { userid: 1, score: -1 }, the index sorts first by userid and then, within each
userid value, sorts by score.
db.collection.createIndex( { <field1>: <type>, <field2>: <type2>, ... } )
The value of the field in the index specification describes the kind of index for that field. For
example, a value of 1 specifies an index that orders items in ascending order. A value of -1
specifies an index that orders items in descending order.
Indexes store references to fields in either ascending (1) or descending (-1) sort order. For
single-field indexes, the sort order of keys doesn’t matter because MongoDB can traverse the
index in either direction. However, for compound indexes, sort order can matter in determining
whether the index can support a sort operation.
Consider a collection events that contains documents with the fields username and date.
Applications can issue queries that return results sorted first by ascending username values and
then by descending (i.e. most recent first) date values, such as:
db.events.find().sort( { username: 1, date: -1 } )
Indexing Objects and Arrays,
To index a field that holds an array value, MongoDB creates an index key for each element in
the array. These multikey indexes support efficient queries against array fields. Multikey
indexes can be constructed over arrays that hold both scalar values [1] (e.g. strings, numbers)
and nested documents.
To create a multikey index, use the db.collection.createIndex() method:
db.coll.createIndex( { <field>: < 1 or -1 > } )
MongoDB automatically creates a multikey index if any indexed field is an array; you do not
need to explicitly specify the multikey type.
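A short sketch, assuming a survey collection whose ratings field holds an array:
db.survey.insert({ item: "ABC", ratings: [ 2, 5, 9 ] })
db.survey.createIndex({ ratings: 1 })    // becomes a multikey index automatically
db.survey.find({ ratings: 2 })           // matches because 2 is an element of the array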
Aggregation Framework,
Aggregation operations process data records and return computed results. Aggregation
operations group values from multiple documents together, and can perform a variety of
operations on the grouped data to return a single result. MongoDB provides three ways to
perform aggregation: the aggregation pipeline, the map-reduce function, and single purpose
aggregation methods.
The aggregation pipeline is a framework for data aggregation modeled on the concept of data
processing pipelines. Documents enter a multi-stage pipeline that transforms the documents into
aggregated results.
Pipeline Operations- $match,
The MongoDB $match operator filters the documents to pass only those documents that match
the specified condition(s) to the next pipeline stage.
db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
First Stage: The $match stage filters the documents by the status field and passes to the next
stage those documents that have status equal to "A".
Second Stage: The $group stage groups the documents by the cust_id field to calculate the sum
of the amount for each unique cust_id.
$project,
Passes along the documents with the requested fields to the next stage in the pipeline. The
specified fields can be existing fields from the input documents or newly computed fields.
The $project stage has the following prototype form:
{ $project: { <specification(s)> } }
The $project takes a document that can specify the inclusion of fields, the suppression of the _id
field, the addition of new fields, and the resetting of the values of existing fields. Alternatively,
you may specify the exclusion of fields.
The $project specifications have the following forms:
The form <field>: <1 or true> specifies the inclusion of a field; the form _id: <0 or false>
specifies the suppression of the _id field.
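For instance, a sketch assuming a hypothetical books collection: keep only title and author and suppress _id:
db.books.aggregate([
   { $project: { title: 1, author: 1, _id: 0 } }
])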
$group,
The MongoDB $group stage groups the documents by some specified expression and outputs a
document for each distinct grouping. An _id field in the output documents contains the distinct
group-by key. The output documents can also contain computed fields that hold the values of
some accumulator expression grouped by the $group's _id field. This operator does not order
its output documents.
Syntax:
{ $group: { _id: <expression>, <field1>: { <accumulator1> : <expression1> }, ... } }
Points to remember:
The _id field is mandatory; an _id value can be specified as null to calculate accumulated
values for all the input documents as a whole.
The rest of the fields eligible to be computed are optional and computed using the
<accumulator> operators.
The _id and the <accumulator> expressions can accept any valid expression.
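As an illustration, reusing the orders collection from the $match example: a null _id accumulates over all input documents:
db.orders.aggregate([
   { $group: { _id: null, avgAmount: { $avg: "$amount" }, count: { $sum: 1 } } }
])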
$unwind,
The MongoDB $unwind stage operator is used to deconstruct an array field from the input
documents to output a document for each element. Every output document is the input
document with the value of the array field replaced by the element.
Syntax:
{ $unwind: <field path> }
Points to remember:
If the value of a field is not an array, db.collection.aggregate() generates an error.
If the specified path for a field does not exist in an input document, the pipeline ignores the
input document and displays no output.
If the array is empty in an input document, the pipeline ignores the input document and
displays no output.
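A brief sketch, assuming an inventory document whose sizes field is an array:
> db.inventory.insert({ item: "ABC", sizes: ["S", "M", "L"] })
> db.inventory.aggregate([ { $unwind: "$sizes" } ])
The output is three documents, identical except that sizes is "S", "M" and "L" respectively.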
$sort,
Sorts all input documents and returns them to the pipeline in sorted order.
The $sort stage has the following prototype form:
{ $sort: { <field1>: <sort order>, <field2>: <sort order> ... } }
$sort takes a document that specifies the field(s) to sort by and the respective sort order. <sort
order> can have one of the following values:
Value Description
1 Sort ascending.
-1 Sort descending.
{ $meta: "textScore" } Sort by the computed textScore metadata in descending order.
db.users.aggregate(
[
{ $sort : { age : -1, posts: 1 } }
]
)
$limit,
The $limit stage in MongoDB is used to specify the maximum number of documents to be
returned. Only one parameter is required: a positive integer giving the number of desired
results.
db.article.aggregate([
{ $limit : 5 }
]);
This operation returns only the first 5 documents passed to it by the pipeline. $limit has no
effect on the content of the documents it passes.
$skip,
Skips over the specified number of documents that pass into the stage and passes the remaining
documents to the next stage in the pipeline.
The $skip stage has the following prototype form:
{ $skip: <positive integer> }
$skip takes a positive integer that specifies the maximum number of documents to skip.
Example
Consider the following example:
db.article.aggregate([
{ $skip : 5 }
]);
This operation skips the first 5 documents passed to it by the pipeline. $skip has no effect on the
content of the documents it passes along the pipeline.
Using Pipelines,
Pipeline
The MongoDB aggregation pipeline consists of stages. Each stage transforms the documents as
they pass through the pipeline. Pipeline stages do not need to produce one output document for
every input document; e.g., some stages may generate new documents or filter out documents.
Pipeline stages can appear multiple times in the pipeline with the exception of $out, $merge,
and $geoNear stages.
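Putting the stages together, a sketch reusing the orders collection from the earlier examples: filter, group, sort, and keep the top three customers:
db.orders.aggregate([
   { $match: { status: "A" } },                                  // keep only "A" orders
   { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },  // total per customer
   { $sort: { total: -1 } },                                     // highest total first
   { $limit: 3 }                                                 // top three results
])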
MongoDB and MapReduce,
As per the MongoDB documentation, Map-reduce is a data processing paradigm for condensing
large volumes of data into useful aggregated results. MongoDB uses mapReduce command for
map-reduce operations. MapReduce is generally used for processing large data sets.
MapReduce Command
Following is the syntax of the basic mapReduce command −
>db.collection.mapReduce(
function() {emit(key,value);}, //map function
function(key,values) {return reduceFunction}, { //reduce function
out: collection,
query: document,
sort: document,
limit: number
}
)
The map-reduce function first queries the collection, then maps the result documents to emit
key-value pairs, which is then reduced based on the keys that have multiple values.
In the above syntax −
map is a javascript function that maps a value with a key and emits a key-value pair
reduce is a javascript function that reduces or groups all the documents having the same key
out specifies the location of the map-reduce query result
query specifies the optional selection criteria for selecting documents
sort specifies the optional sort criteria
limit specifies the optional maximum number of documents to be returned
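A concrete sketch, assuming the orders collection used in earlier examples: total the amount per cust_id for orders with status "A" and write the result to an order_totals collection:
> db.orders.mapReduce(
   function() { emit(this.cust_id, this.amount); },   // map: one key-value pair per order
   function(key, values) {                            // reduce: sum the values per key
      return values.reduce(function(a, b) { return a + b; }, 0);
   },
   {
      query: { status: "A" },
      out: "order_totals"
   }
)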
Aggregation Commands,
Name Description
aggregate Performs aggregation tasks such as group using the aggregation framework.
count Counts the number of documents in a collection or a view.
distinct Displays the distinct values found for a specified key in a collection or a view.
mapReduce Performs map-reduce aggregation for large data sets.
Aggregation Methods
Name Description
db.collection.aggregate() Provides access to the aggregation pipeline.
db.collection.mapReduce() Performs map-reduce aggregation for large data sets.
Introduction to Replication,
Replication is the process of synchronizing data across multiple servers. Replication provides
redundancy and increases data availability with multiple copies of data on different database
servers. Replication protects a database from the loss of a single server. Replication also allows
you to recover from hardware failure and service interruptions. With additional copies of the
data, you can dedicate one to disaster recovery, reporting, or backup.
Why Replication?
To keep your data safe
High (24*7) availability of data
Disaster recovery
No downtime for maintenance (like backups, index rebuilds, compaction)
Read scaling (extra copies to read from)
Replica set is transparent to the application
How Replication Works in MongoDB
MongoDB achieves replication by the use of replica set. A replica set is a group of mongod
instances that host the same data set. In a replica set, one node is the primary node that receives
all write operations. All other instances, such as secondaries, apply operations from the primary
so that they have the same data set. A replica set can have only one primary node.
Replica set is a group of two or more nodes (generally minimum 3 nodes are required).
In a replica set, one node is primary node and remaining nodes are secondary.
All data replicates from primary to secondary node.
At the time of automatic failover or maintenance, an election is held and a new primary node is
elected.
After the recovery of a failed node, it joins the replica set again and works as a secondary node.
configuring a Replica Set,
We will convert a standalone MongoDB instance to a replica set. To convert to a replica set,
following are the steps −
Shutdown already running MongoDB server.
Start the MongoDB server by specifying the --replSet option. Following is the basic syntax of
--replSet −
mongod --port "PORT" --dbpath "YOUR_DB_DATA_PATH" --replSet
"REPLICA_SET_INSTANCE_NAME"
Example
mongod --port 27017 --dbpath "D:\set up\mongodb\data" --replSet rs0
It will start a mongod instance with the name rs0, on port 27017.
Now start the command prompt and connect to this mongod instance.
In Mongo client, issue the command rs.initiate() to initiate a new replica set.
To check the replica set configuration, issue the command rs.conf(). To check the status of
replica set issue the command rs.status().
Member Configuration Options
To add members to replica set, start mongod instances on multiple machines. Now start a
mongo client and issue a command rs.add().
Syntax
The basic syntax of rs.add() command is as follows −
>rs.add(HOST_NAME:PORT)
Example
Suppose your mongod instance name is mongod1.net and it is running on port 27017. To add
this instance to the replica set, issue the command rs.add() in the Mongo client.
>rs.add("mongod1.net:27017")
>
You can add a mongod instance to the replica set only when you are connected to the primary
node. To check whether you are connected to the primary or not, issue the command
db.isMaster() in the mongo client.

What is the difference between replication and sharding?


Replication: The primary server node copies data onto secondary server nodes. This can help
increase data availability and act as a backup in case the primary server fails.

Sharding: Handles horizontal scaling across servers using a shard key. This means that rather
than copying data holistically, sharding copies pieces of the data (or “shards”) across multiple
replica sets. These replica sets work together to utilize all of the data.

Think of it like a pizza. With replication, you are making a copy of a complete pizza pie on
every server. With sharding, you’re sending pizza slices to several different replica sets.
Combined together, you have access to the entire pizza pie.

Replication and sharding can work together to form something called a sharded cluster, where
each shard is replicated in turn to preserve the same high availability.
methods used in mongodb shell
To drop a database, first select it with use databasename. Remember that the mongo shell is
case-sensitive: if your database name is Accme, do not write it as accme; the same goes for
method names. After selecting the database, drop it by typing db.dropDatabase().
To check which database you are currently in, type db.
To create a database, type use databasename and then insert some data into it; if you don't, the
database will not be saved.
To see all available methods visit : https://docs.mongodb.com/manual/reference/method/
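A short example session (Accme is just a placeholder name):
> use Accme                   // switch to (or create) the database
switched to db Accme
> db                          // confirm the current database
Accme
> db.demo.insert({ x: 1 })    // insert something so the database is persisted
> db.dropDatabase()           // drop the current database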

(viii) Selecting the Right Database


Selection of right databases, RDBMS or NoSQL, selection of database based on
performance, data size, type of data, frequency of accessing data, business needs,
type of application.

Notes by – [email protected]
