A7-R5-Databases Technologies (A Level Syllabus Based Notes)
Why database?
Databases can store very large numbers of records efficiently (they take up little space).
It is very quick and easy to find information.
It is easy to add new data and to edit or delete old data.
Data can be searched easily, e.g. 'find all Ford cars'.
Data can be sorted easily, for example into 'date first registered' order.
Data can be imported into other applications, for example a mail-merge letter to a customer saying that an MOT
test is due.
More than one person can access the same database at the same time - multi-access.
Security may be better than in paper files.
Sharing of Data
In a database, the users of the database can share the data among themselves. There are various
levels of authorisation to access the data, and consequently the data can only be shared based on
the correct authorisation protocols being followed.
Many remote users can also access the database simultaneously and share the data between
themselves.
Data Integrity
Data integrity means that the data is accurate and consistent in the database. Data Integrity is
very important as there are multiple databases in a DBMS. All of these databases contain data
that is visible to multiple users. So it is necessary to ensure that the data is correct and
consistent in all the databases and for all the users.
Data Security
Data security is a vital concept in a database. Only authorised users should be allowed to access
the database, and their identity should be authenticated, for example with a username and password.
Unauthorised users should not be allowed to access the database under any circumstances, as such
access violates the security constraints.
Privacy
The privacy rule in a database means that only authorised users can access the database, according
to its privacy constraints. There are levels of database access, and a user can only view the data
they are allowed to see. For example, on social networking sites the access constraints differ for
the different accounts a user may want to view.
Data Consistency
Data consistency is easier to ensure in a database because data redundancy is minimised. All data
appears consistently across the database and is the same for all the users viewing it.
Moreover, any changes made to the database are immediately reflected to all the users, so
no data inconsistency arises.
In 1-tier architecture, the user works directly on the DBMS itself, so any changes are made
directly on the DBMS. It does not provide handy tools for end-users. Database designers and
programmers normally prefer to use single-tier architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS
can be accessed. Programmers use 2-tier architecture where they access the DBMS by means of
an application. Here the application tier is entirely independent of the database in terms of
operation, design, and programming.
3-tier Architecture
A 3-tier architecture separates its tiers from each other based on the complexity of the users and
how they use the data present in the database. It is the most widely used architecture to design a
DBMS.
Database (Data) Tier − At this tier, the database resides along with its query processing
languages. We also have the relations that define the data and their constraints at this level.
Application (Middle) Tier − At this tier reside the application server and the programs that
access the database. For a user, this application tier presents an abstracted view of the database.
End-users are unaware of any existence of the database beyond the application. At the other
end, the database tier is not aware of any other user beyond the application tier. Hence, the
application layer sits in the middle and acts as a mediator between the end-user and the
database.
User (Presentation) Tier − End-users operate on this tier and they know nothing about any
existence of the database beyond this layer. At this layer, multiple views of the database can be
provided by the application. All views are generated by applications that reside in the
application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
Three levels of architecture,
The ANSI-SPARC database architecture is the basis of most of the modern databases.
The three levels present in this architecture are Physical level, Conceptual level and External
level.
The details of these levels are as follows −
Physical Level
This is the lowest level in the three level architecture. It is also known as the internal level. The
physical level describes how data is actually stored in the database. At the lowest level, the data
is stored on storage devices such as hard drives in the form of bits; at a slightly higher level, it
can be described as being stored in files and folders. The physical level also covers compression and
encryption techniques.
Conceptual Level
The conceptual level is at a higher level than the physical level. It is also known as the logical
level. It describes how the database appears to the users conceptually and the relationships
between various data tables. The conceptual level does not care for how the data in the database
is actually stored.
External Level
This is the highest level in the three level architecture and closest to the user. It is also known as
the view level. The external level only shows the relevant database content to the users in the
form of views and hides the rest of the data. So different users can see the database as a
different view as per their individual requirements.
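The external level can be sketched with an SQL view. The example below is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in DBMS; the employee table and the employee_directory view are hypothetical names chosen for this sketch, not part of the syllabus.

```python
import sqlite3

# In-memory database; table and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 52000.0), ("Ravi", "HR", 48000.0)],
)

# External level: a view exposes only the columns one group of users needs
# and hides the rest (here, the salary column).
conn.execute("CREATE VIEW employee_directory AS SELECT name, dept FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY name")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()

print(directory_columns)
print(directory_rows)
```

A different group of users could be given a different view over the same table, which is what "different users see the database as a different view" means in practice.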
Logical View, Physical View, Conceptual View,
Logical data independence, Physical Data Independence
What is Data Independence of DBMS?
Data independence is a property of a DBMS that lets you change the database
schema at one level of a database system without having to change the schema at the next
higher level. Data independence helps you to keep data separated from all the programs that make
use of it.
Physical Data Independence
Physical data independence helps you to separate conceptual levels from the internal/physical
levels. It allows you to provide a logical description of the database without the need to specify
physical structures. Compared to Logical Independence, it is easy to achieve physical data
independence.
With physical data independence, you can change the physical storage structures or devices
without affecting the conceptual schema. Any such change is absorbed by the mapping
between the conceptual and internal levels. Physical data independence is achieved through the
presence of the internal level of the database and the mapping from the conceptual
level of the database to the internal level.
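One way to see physical data independence is to change a storage structure and check that queries are unaffected. The sketch below assumes SQLite (through Python's sqlite3 module) as the DBMS; the car table and the index name are hypothetical. Adding an index changes how the data is organised physically, but the query text and its result stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (reg TEXT PRIMARY KEY, make TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?, ?)",
    [("AB12", "Ford", 2015), ("CD34", "Fiat", 2018), ("EF56", "Ford", 2020)],
)

# The query is written against the conceptual schema only.
query = "SELECT reg FROM car WHERE make = 'Ford' ORDER BY reg"
before = conn.execute(query).fetchall()

# Physical-level change: add an index on the make column.
conn.execute("CREATE INDEX idx_car_make ON car (make)")

# The same query still works and returns the same rows.
after = conn.execute(query).fetchall()
print(before == after)
```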
External views
External API or programs
Any change made will be absorbed by the mapping between external and conceptual levels.
A relational database refers to a database that stores data in a structured format, using rows and
columns. This makes it easy to locate and access specific values within the database. It is
"relational" because the values within each table are related to each other. Tables may also be
related to other tables. The relational structure makes it possible to run queries across multiple
tables at once.
While a relational database describes the type of database an RDBMS manages, the RDBMS
refers to the database program itself. It is the software that executes queries on the data,
including adding, updating, and searching for values. An RDBMS may also provide a visual
representation of the data. For example, it may display data in tables like a spreadsheet,
allowing you to view and even edit individual values in the table. Some RDBMS programs
allow you to create forms that can streamline entering, editing, and deleting data.
Most well known DBMS applications fall into the RDBMS category. Examples include Oracle
Database, MySQL, Microsoft SQL Server, and IBM DB2. Some of these programs support
non-relational databases, but they are primarily used for relational database management.
RDBMS terminology, relational model, base tables, keys, primary key, foreign key,
RDBMS terminology includes Database, Table, Columns, etc. Let us see them one by one −
Keys help you to identify any row of data in a table. In a real-world application, a table could
contain thousands of records. Moreover, the records could be duplicated. Keys ensure that you
can uniquely identify a table record despite these challenges.
Super Key - A super key is a set of one or more attributes (columns) that uniquely identifies rows in a table.
Primary Key - is a column or group of columns in a table that uniquely identify every row in
that table.
Candidate Key - is a set of attributes that uniquely identify tuples in a table. A candidate key is a
minimal super key, that is, a super key with no redundant attributes.
Alternate Key - is a candidate key that was not chosen as the primary key. Like the primary key, it
uniquely identifies every row in that table.
Foreign Key - is a column that creates a relationship between two tables. The purpose of
Foreign keys is to maintain data integrity and allow navigation between two different instances
of an entity.
Compound Key - has two or more attributes that allow you to uniquely recognize a specific
record. It is possible that each column may not be unique by itself within the database.
Composite Key - is a key made up of two or more columns that together uniquely identify each
record. The term is often used interchangeably with compound key.
Surrogate Key - An artificial key which aims to uniquely identify each record is called a
surrogate key. This kind of key is typically created when you don't have any
natural primary key.
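The key types above can be sketched in a small schema. This is a minimal illustration that assumes SQLite through Python's sqlite3 module; the student and enrolment tables and the try_insert helper are hypothetical names. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,   -- surrogate key used as the primary key
        roll_no    TEXT NOT NULL UNIQUE   -- alternate key (a candidate key not chosen as primary)
    )""")
conn.execute("""
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),  -- foreign key
        course     TEXT NOT NULL,
        PRIMARY KEY (student_id, course)  -- composite key: two columns together
    )""")

conn.execute("INSERT INTO student (roll_no) VALUES ('R001')")  # gets student_id 1
conn.execute("INSERT INTO enrolment VALUES (1, 'DBMS')")


def try_insert(sql):
    """Return True if the statement succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_roll = try_insert("INSERT INTO student (roll_no) VALUES ('R001')")
orphan_row = try_insert("INSERT INTO enrolment VALUES (99, 'DBMS')")
duplicate_pair = try_insert("INSERT INTO enrolment VALUES (1, 'DBMS')")
second_course = try_insert("INSERT INTO enrolment VALUES (1, 'Networks')")
print(duplicate_roll, orphan_row, duplicate_pair, second_course)
```

The last insert succeeds because neither column of the composite key has to be unique on its own; only the pair must be.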
constraints,
Constraints enforce limits to the data or type of data that can be inserted/updated/deleted from a
table. The whole purpose of constraints is to maintain the data integrity during an
update/delete/insert into a table. In this tutorial we will learn several types of constraints that
can be created in RDBMS.
Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints
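The first four constraint types in the list can be tried out as follows. This is a minimal sketch that assumes SQLite through Python's sqlite3 module; the account table and the rejected helper are hypothetical. MariaDB accepts the same NOT NULL, UNIQUE, DEFAULT and CHECK keywords, although error reporting differs between systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        acc_no  INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,        -- NOT NULL and UNIQUE
        status  TEXT DEFAULT 'active',       -- DEFAULT
        balance REAL CHECK (balance >= 0)    -- CHECK
    )""")

# status is omitted, so the DEFAULT value is stored.
conn.execute("INSERT INTO account (email, balance) VALUES ('a@example.com', 100)")
default_status = conn.execute("SELECT status FROM account").fetchone()[0]


def rejected(sql):
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


null_email_rejected = rejected("INSERT INTO account (email, balance) VALUES (NULL, 10)")
dup_email_rejected = rejected(
    "INSERT INTO account (email, balance) VALUES ('a@example.com', 10)"
)
negative_rejected = rejected(
    "INSERT INTO account (email, balance) VALUES ('b@example.com', -5)"
)
print(default_status, null_email_rejected, dup_email_rejected, negative_rejected)
```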
Codd Rules
Dr Edgar F. Codd, after his extensive research on the Relational Model of database systems,
came up with twelve rules of his own, which according to him, a database must obey in order to
be regarded as a true relational database.
These rules can be applied on any database system that manages stored data using only its
relational capabilities. This is a foundation rule, which acts as a base for all the other rules.
The data contained in the database (e.g., entities: students, lecturers, courses, subjects)
The relationships between data items (e.g., students are supervised by lecturers; lecturers teach
courses)
The constraints on data (e.g., student number has exactly eight digits; a subject has four or six
units of credit only)
In the second step, the data items, the relationships and the constraints are all expressed using
the concepts provided by the high-level data model. Because these concepts do not include the
implementation details, the result of the data modelling process is a (semi) formal
representation of the database structure. This result is quite easy to understand so it is used as
reference to make sure that all the user’s requirements are met.
The third step is database design. During this step, we might have two sub-steps: one called
database logical design, which defines a database in a data model of a specific DBMS, and
another called database physical design, which defines the internal database storage structure,
file organization or indexing techniques. These two sub-steps are database implementation and
operations/user interfaces building steps.
In the database design phases, data are represented using a certain data model. The data model
is a collection of concepts or notations for describing data, data relationships, data semantics
and data constraints. Most data models also include a set of basic operations for manipulating
data in the database.
For a dependency A →→ B, if for a single value of A multiple values of B exist, then the table
may have a multi-valued dependency.
Also, a table should have at least 3 columns for it to have a multi-valued dependency.
And, for a relation R(A,B,C), if there is a multi-valued dependency between A and B, then B
and C should be independent of each other.
If all these conditions are true for a relation (table), it is said to have a multi-valued dependency.
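The independence condition can be checked mechanically. The sketch below is one possible formulation, not a standard library routine: for a relation R(A,B,C) given as tuples, it tests that for each value of A every B value appears together with every C value. The has_mvd function and the bike data are hypothetical.

```python
from itertools import product


def has_mvd(rows):
    """Check the independence condition for R(A, B, C) given as (a, b, c) tuples."""
    relation = set(rows)
    for a in {row[0] for row in relation}:
        b_values = {b for (x, b, _) in relation if x == a}
        c_values = {c for (x, _, c) in relation if x == a}
        # B and C are independent only if every combination is present for this A.
        for b, c in product(b_values, c_values):
            if (a, b, c) not in relation:
                return False
    return True


# Columns: bike model (A), manufacturing year (B), colour (C).
bikes = [
    ("M1", 2017, "black"),
    ("M1", 2017, "white"),
    ("M1", 2018, "black"),
    ("M1", 2018, "white"),
]

print(has_mvd(bikes))       # all year/colour combinations are present
print(has_mvd(bikes[:3]))   # the (2018, white) combination is missing
```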
E-R Diagram.
An Entity–relationship model (ER model) describes the structure of a database with the help of
a diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER model is a
design or blueprint of a database that can later be implemented as a database. The main
components of E-R model are: entity set and relationship set.
1. Entity
An entity is an object or component of data. An entity is represented as rectangle in an ER
diagram.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on a relationship
with another entity is called a weak entity. A weak entity is represented by a double rectangle.
2. Attribute
An attribute describes the property of an entity. An attribute is represented as Oval in an ER
diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example, student roll
number can uniquely identify a student from a set of students. Key attribute is represented by
oval same as other attributes however the text of key attribute is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as composite attribute. For
example, In student entity, the student address is a composite attribute as an address is
composed of other attributes such as pin code, state, country.
3. Multivalued attribute:
An attribute that can hold multiple values is known as multivalued attribute. It is represented
with double ovals in an ER Diagram. For example – A person can have more than one phone
numbers so the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is
represented by dashed oval in an ER Diagram. For example – Person age is a derived attribute
as it changes over time and can be derived from another attribute (Date of birth).
3. Relationship
A relationship is represented by diamond shape in ER diagram, it shows the relationship among
entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
There are several processes and algorithms available to convert ER Diagrams into Relational
Schema. Some of them are automated and some of them are manual. We may focus here on the
mapping diagram contents to relational basics.
(v) Maria DB
Introduction to Maria DB,
MariaDB Database
MariaDB is a popular fork of MySQL created by MySQL's original developers. It grew out of
concerns related to MySQL's acquisition by Oracle. It offers support for both small data
processing tasks and enterprise needs. It aims to be a drop-in replacement for MySQL requiring
only a simple uninstall of MySQL and an install of MariaDB. MariaDB offers the same features
of MySQL and much more.
MariaDB runs on a number of operating systems and supports a wide variety of programming
languages.
MariaDB offers support for PHP, one of the most popular web development languages.
MariaDB also offers many operations and commands unavailable in MySQL, and
eliminates/replaces features impacting performance negatively.
Data Types,
MariaDB data types can be categorized as numeric, date and time, and string values.
TINYINT − This data type represents small integers falling within the signed range of -128 to
127, and the unsigned range of 0 to 255.
BOOLEAN − This data type associates a value 0 with “false,” and a value 1 with “true.”
SMALLINT − This data type represents integers within the signed range of -32768 to 32767,
and the unsigned range of 0 to 65535.
MEDIUMINT − This data type represents integers in the signed range of -8388608 to 8388607,
and the unsigned range of 0 to 16777215.
INT (also INTEGER) − This data type represents an integer of normal size. When marked as
unsigned, the range spans 0 to 4294967295. When signed (the default setting), the range spans
-2147483648 to 2147483647. When a column is set to ZEROFILL (an unsigned state), all its
values are prepended by zeros to place M digits in the INT value.
BIGINT − This data type represents integers within the signed range of -9223372036854775808
to 9223372036854775807, and the unsigned range of 0 to 18446744073709551615.
DECIMAL( also DEC, NUMERIC, FIXED)− This data type represents precise fixed-point
numbers, with M specifying its digits and D specifying the digits after the decimal. The M value
does not add “-” or the decimal point. If D is set to 0, no decimal or fraction part appears and
the value will be rounded to the nearest DECIMAL on INSERT. The maximum permitted digits
is 65, and the maximum for decimals is 30. Default value for M on omission is 10, and 0 for D
on omission.
FLOAT − This data type represents a small, floating-point number of the value 0 or a number
within the following ranges −
-3.402823466E+38 to -1.175494351E-38
1.175494351E-38 to 3.402823466E+38
DOUBLE (also REAL and DOUBLE PRECISION) − This data type represents normal-size,
floating-point numbers of the value 0 or within the following ranges −
-1.7976931348623157E+308 to -2.2250738585072014E-308
2.2250738585072014E-308 to 1.7976931348623157E+308
BIT − This data type represents bit fields with M specifying the number of bits per value. On
omission of M, the default is 1. Bit values can be written as b'value', in which value
is a string of 0s and 1s. Zero-padding occurs automatically from the left for full
length; for example, “10” becomes “0010” in a 4-bit field.
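The integer ranges listed above follow directly from the storage size of each type. The short sketch below recomputes them; the int_range helper is a hypothetical name, and the byte sizes (1, 2, 3, 4 and 8) are the storage sizes MariaDB documents for these types.

```python
def int_range(num_bytes, signed=True):
    """Return (minimum, maximum) for an integer stored in num_bytes bytes."""
    bits = 8 * num_bytes
    if signed:
        # One bit is used for the sign in two's complement storage.
        return (-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return (0, 2 ** bits - 1)


STORAGE_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

for name, size in STORAGE_BYTES.items():
    print(name, int_range(size), int_range(size, signed=False))
```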
Date and Time Data Types
The date and time data types supported by MariaDB are as follows −
DATE − This data type represents a date range of “1000-01-01” to “9999-12-31,” and uses the
“YYYY-MM-DD” date format.
DATETIME − This data type represents the range “1000-01-01 00:00:00.000000” to
“9999-12-31 23:59:59.999999.” It uses the “YYYY-MM-DD HH:MM:SS” format.
YEAR − This data type represents a year in 4-digit format. The four-digit format allows values
in the range of 1901 to 2155, and 0000.
String DataTypes
The string type values supported by MariaDB are as follows −
String literals − This data type represents character sequences enclosed by quotes.
CHAR − This data type represents a fixed-length string that is right-padded with spaces to the
specified length. M represents the column length in characters, in a range of 0 to 255; its default
value is 1.
VARCHAR − This data type represents a variable-length string, with an M range (maximum
column length) of 0 to 65535.
BINARY − This data type represents binary byte strings, with M as the column length in bytes.
VARBINARY − This data type represents binary byte strings of variable length, with M as
column length.
TINYBLOB − This data type represents a blob column with a maximum length of 255 (2^8 - 1)
bytes. In storage, each uses a one-byte length prefix indicating the byte quantity in the value.
BLOB − This data type represents a blob column with a maximum length of 65,535 (2^16 - 1)
bytes. In storage, each uses a two-byte length prefix indicating the byte quantity in the value.
MEDIUMBLOB − This data type represents a blob column with a maximum length of
16,777,215 (2^24 - 1) bytes. In storage, each uses a three-byte length prefix indicating the byte
quantity in the value.
LONGBLOB − This data type represents a blob column with a maximum length of
4,294,967,295 (2^32 - 1) bytes. In storage, each uses a four-byte length prefix indicating the byte
quantity in the value.
TINYTEXT − This data type represents a text column with a maximum length of 255 (2^8 - 1)
characters. In storage, each uses a one-byte length prefix indicating the byte quantity in the
value.
TEXT − This data type represents a text column with a maximum length of 65,535 (2^16 - 1)
characters. In storage, each uses a two-byte length prefix indicating the byte quantity in the
value.
MEDIUMTEXT − This data type represents a text column with a maximum length of
16,777,215 (2^24 - 1) characters. In storage, each uses a three-byte length prefix indicating the
byte quantity in the value.
LONGTEXT − This data type represents a text column with a maximum length of
4,294,967,295 or 4GB (2^32 - 1) characters. In storage, each uses a four-byte length prefix
indicating the byte quantity in the value.
ENUM − This data type represents a string object having only a single value from a list.
SET − This data type represents a string object having zero or more values from a list, with a
maximum of 64 members. SET values present internally as integer values.
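The maximum lengths of the BLOB and TEXT types follow from the size of their length prefix: a prefix of n bytes can count up to 2^(8n) - 1. A minimal sketch of that arithmetic, with max_length as a hypothetical helper name:

```python
def max_length(prefix_bytes):
    """Largest length that fits in a length prefix of prefix_bytes bytes."""
    return 2 ** (8 * prefix_bytes) - 1


LENGTH_PREFIX_BYTES = {"TINYBLOB": 1, "BLOB": 2, "MEDIUMBLOB": 3, "LONGBLOB": 4}

for name, prefix in LENGTH_PREFIX_BYTES.items():
    print(name, max_length(prefix))
```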
SQL Commands, Create,
Create Database is also used.
The SQL CREATE TABLE statement allows you to create and define a table.
Syntax
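The syntax for the CREATE TABLE statement in SQL is:
CREATE TABLE table_name
(
column1 datatype [ NULL | NOT NULL ],
column2 datatype [ NULL | NOT NULL ],
...
);
update,
The SQL UPDATE statement is used to update existing records in a table.
Syntax
The syntax for the UPDATE statement in SQL is:
UPDATE table
SET column1 = expression1,
column2 = expression2,
...
[WHERE conditions];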
delete,
The SQL DELETE statement is used to delete one or more records from a table.
Syntax
The syntax for the DELETE statement in SQL is:
DELETE FROM table
[WHERE conditions];
Note
You do not need to list fields in the DELETE statement since you are deleting the entire row
from the table.
drop,
The SQL DROP TABLE statement allows you to remove or delete a table from the SQL
database.
Syntax
The syntax for the DROP TABLE statement in SQL is:
DROP TABLE table_name;
alter,
The SQL ALTER TABLE statement is used to add, modify, or drop/delete columns in a table.
The SQL ALTER TABLE statement is also used to rename a table.
For example:
ALTER TABLE supplier
ADD supplier_name char(50);
This SQL ALTER TABLE example will add a column called supplier_name to the supplier
table.
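The statements in this section can be run end to end. The sketch below assumes SQLite through Python's sqlite3 module rather than MariaDB, so minor syntax details may differ; the supplier table follows the ALTER TABLE example above, while the city column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE TABLE: define the table and its columns.
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, city TEXT)")

# ALTER TABLE: add the supplier_name column, as in the example above.
conn.execute("ALTER TABLE supplier ADD supplier_name char(50)")

conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [(1, "Pune", "Acme"), (2, "Delhi", "Globex")],
)

# UPDATE: change a value in rows that match the WHERE condition.
conn.execute("UPDATE supplier SET city = 'Mumbai' WHERE supplier_id = 1")

# DELETE: remove whole rows that match the WHERE condition.
conn.execute("DELETE FROM supplier WHERE supplier_id = 2")

remaining = conn.execute(
    "SELECT supplier_id, city, supplier_name FROM supplier"
).fetchall()

# DROP TABLE: remove the table itself, not just its rows.
conn.execute("DROP TABLE supplier")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(remaining)
print(tables_left)
```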
2  BIN()  Returns a string containing the binary representation of the argument
3  BIT_LENGTH()  Returns length of argument in bits
4  CHAR_LENGTH()  Returns number of characters in argument
5  CHAR()  Returns the character for each integer passed
6  CHARACTER_LENGTH()  A synonym for CHAR_LENGTH()
7  CONCAT_WS()  Returns the arguments concatenated with a separator
8  CONCAT()  Returns concatenated string
9  CONV()  Converts numbers between different number bases
10  ELT()  Returns string at index number
11  EXPORT_SET()  Returns a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string
12  FIELD()  Returns the index (position) of the first argument in the subsequent arguments
13  FIND_IN_SET()  Returns the index position of the first argument within the second argument
14  FORMAT()  Returns a number formatted to specified number of decimal places
15  HEX()  Returns a string representation of a hex value
16  INSERT()  Inserts a substring at the specified position up to the specified number of characters
17  INSTR()  Returns the index of the first occurrence of substring
18  LCASE()  Synonym for LOWER()
19  LEFT()  Returns the leftmost number of characters as specified
20  LENGTH()  Returns the length of a string in bytes
21  LOAD_FILE()  Loads the named file
22  LOCATE()  Returns the position of the first occurrence of substring
23  LOWER()  Returns the argument in lowercase
24  LPAD()  Returns the string argument, left-padded with the specified string
25  LTRIM()  Removes leading spaces
26  MAKE_SET()  Returns a set of comma-separated strings that have the corresponding bit in bits set
27  MID()  Returns a substring starting from the specified position
28  OCT()  Returns a string containing the octal representation of the argument
29  OCTET_LENGTH()  A synonym for LENGTH()
30  ORD()  If the leftmost character of the argument is a multi-byte character, returns the code for that character
31  POSITION()  A synonym for LOCATE()
32  QUOTE()  Escapes the argument for use in an SQL statement
33  REGEXP  Pattern matching using regular expressions
34  REPEAT()  Repeats a string the specified number of times
35  REPLACE()  Replaces occurrences of a specified string
36  REVERSE()  Reverses the characters in a string
37  RIGHT()  Returns the specified rightmost number of characters
38  RPAD()  Appends string the specified number of times
39  RTRIM()  Removes trailing spaces
40  SOUNDEX()  Returns a soundex string
41  SOUNDS LIKE  Compares sounds
42  SPACE()  Returns a string of the specified number of spaces
43  STRCMP()  Compares two strings
44  SUBSTRING_INDEX()  Returns a substring from a string before the specified number of occurrences of the delimiter
45  SUBSTRING(), SUBSTR()  Returns the substring as specified
46  TRIM()  Removes leading and trailing spaces
47  UCASE()  Synonym for UPPER()
48  UNHEX()  Converts each pair of hexadecimal digits to a character
49  UPPER()  Converts to uppercase
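To make a few of the less obvious entries concrete, the sketch below re-implements CONCAT_WS(), LPAD(), SUBSTRING_INDEX() and FIND_IN_SET() as plain Python functions. These are illustrative equivalents written for these notes, not the database's own code; they assume non-empty pad and delimiter strings and use None in place of SQL NULL.

```python
def concat_ws(separator, *values):
    """CONCAT_WS(): join values with a separator, skipping NULL (None) values."""
    return separator.join(str(value) for value in values if value is not None)


def lpad(text, length, pad):
    """LPAD(): left-pad text with pad to the given length; longer text is cut short."""
    if len(text) >= length:
        return text[:length]
    fill = (pad * length)[: length - len(text)]
    return fill + text


def substring_index(text, delimiter, count):
    """SUBSTRING_INDEX(): text before the count-th delimiter (from the right if negative)."""
    parts = text.split(delimiter)
    if count >= 0:
        return delimiter.join(parts[:count])
    return delimiter.join(parts[count:])


def find_in_set(item, csv_list):
    """FIND_IN_SET(): 1-based position of item in a comma-separated list, 0 if absent."""
    items = csv_list.split(",")
    return items.index(item) + 1 if item in items else 0


print(concat_ws(",", "First", None, "Last"))
print(lpad("hi", 4, "??"))
print(substring_index("www.mysql.com", ".", 2))
print(find_in_set("b", "a,b,c,d"))
```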
Date functions,
SQL Server Date Functions
Function Description
CURRENT_TIMESTAMP Returns the current date and time
DATEADD Adds a time/date interval to a date and then returns the date
DATEDIFF Returns the difference between two dates
DATEFROMPARTS Returns a date from the specified parts (year, month, and day values)
DATENAME Returns a specified part of a date (as string)
DATEPART Returns a specified part of a date (as integer)
DAY Returns the day of the month for a specified date
GETDATE Returns the current database system date and time
GETUTCDATE Returns the current database system UTC date and time
ISDATE Checks an expression and returns 1 if it is a valid date, otherwise 0
MONTH Returns the month part for a specified date (a number from 1 to 12)
SYSDATETIME Returns the date and time of the SQL Server
YEAR Returns the year part for a specified date
SELECT column
FROM table_name
WHERE conditions
GROUP BY column
ORDER BY column
order by,
The ORDER BY clause sorts the result-set in ascending or descending order.
It sorts the records in ascending order by default. DESC keyword is used to sort the records in
descending order.
Syntax:
SELECT column
FROM table_name
[WHERE conditions]
ORDER BY column [ASC | DESC];
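ORDER BY and GROUP BY can be tried on a small table. The sketch below assumes SQLite through Python's sqlite3 module; the car table and its rows are hypothetical and echo the 'date first registered' example from the start of these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (make TEXT, first_registered TEXT)")
conn.executemany(
    "INSERT INTO car VALUES (?, ?)",
    [("Ford", "2019-03-01"), ("Fiat", "2017-06-15"), ("Ford", "2015-01-20")],
)

# Ascending order is the default.
oldest_first = conn.execute(
    "SELECT make, first_registered FROM car ORDER BY first_registered"
).fetchall()

# DESC reverses the order.
newest_first = conn.execute(
    "SELECT make, first_registered FROM car ORDER BY first_registered DESC"
).fetchall()

# GROUP BY collapses rows with the same make; ORDER BY then sorts the groups.
per_make = conn.execute(
    "SELECT make, COUNT(*) FROM car GROUP BY make ORDER BY make"
).fetchall()

print(oldest_first)
print(newest_first)
print(per_make)
```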
NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-Value pair storage, Column Store, Document Store, Graph databases
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP Theorem
- Prioritizes high performance, high availability and scalability
- BASE Transaction
NoSQL features,
NoSQL Databases can have a common set of features such as:
Non-relational data model.
Runs well on clusters.
Mostly open-source.
Built for the new generation Web applications.
Is schema-less.
types,
Types of NoSQL Databases:
Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be a JSON, BLOB(Binary Large Objects), string, etc.
For example, a key-value pair may contain a key like "Website" associated with a value like
"Guru99".
It is one of the most basic types of NoSQL database. This kind of NoSQL database is used for
collections, dictionaries, associative arrays, etc. Key-value stores help the developer to store
schema-less data. They work well for data such as shopping cart contents.
Redis, Dynamo, Riak are some NoSQL examples of key-value store databases. Dynamo and Riak follow
the design described in Amazon's Dynamo paper.
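The key-value model can be sketched with a dictionary. The KeyValueStore class below is a hypothetical, in-memory illustration of the idea only; it is not the API of Redis, Dynamo or Riak.

```python
class KeyValueStore:
    """A tiny in-memory key-value store: each unique key maps to one opaque value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing to an existing key replaces its value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("Website", "Guru99")
# The value can be any structure, such as the contents of a shopping cart.
store.put("cart:42", {"items": ["pen", "notebook"]})

print(store.get("Website"))
print(store.get("cart:42"))
```

The store never looks inside the value, which is why no schema is needed and why lookups are possible only by key.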
Column-based
Column-oriented databases work on columns and are based on BigTable paper by Google.
Every column is treated separately. The values of a single column are stored contiguously.
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as
the data is readily available in a column.
Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, and library card catalogs.
HBase, Cassandra, and Hypertable are examples of column-based databases.
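The difference between row-oriented and column-oriented layouts can be sketched with plain Python structures. The order data below is hypothetical; the point is that an aggregate such as SUM only needs to read one contiguous list in the column layout.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "city": "Pune", "amount": 120},
    {"order_id": 2, "city": "Delhi", "amount": 80},
    {"order_id": 3, "city": "Pune", "amount": 200},
]

# Column-oriented layout: the values of each column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregations touch a single column instead of every full record.
total_amount = sum(columns["amount"])
max_amount = max(columns["amount"])

print(columns["amount"])
print(total_amount, max_amount)
```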
Document-Oriented:
Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part
is stored as a document. The document is stored in JSON or XML formats. The value is
understood by the DB and can be queried.
The document type is mostly used for CMS systems, blogging platforms, real-time analytics &
e-commerce applications. It should not be used for complex transactions which require multiple
operations or queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular
document-oriented DBMS systems.
Graph-Based
A graph type database stores entities as well as the relations amongst those entities. The entity is
stored as a node with the relationship as edges. An edge gives a relationship between nodes.
Every node and edge has a unique identifier.
Compared to a relational database where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast as they are already captured in the
DB, and there is no need to calculate them.
Graph-based databases are mostly used for social networks, logistics, and spatial data.
Neo4J, Infinite Graph, OrientDB, FlockDB are some popular graph-based databases.
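The node-and-edge model can be sketched as follows. This is a minimal in-memory illustration with hypothetical data and a hypothetical reachable function; real graph databases such as Neo4J have their own storage and query languages.

```python
from collections import deque

# Nodes and edges each carry a unique identifier.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "dave": {"kind": "person"},
}
edges = [
    ("e1", "alice", "FOLLOWS", "bob"),
    ("e2", "bob", "FOLLOWS", "carol"),
    ("e3", "dave", "FOLLOWS", "alice"),
]

# Relationships are stored with the nodes, so traversal needs no join.
adjacency = {node_id: [] for node_id in nodes}
for _edge_id, source, _label, target in edges:
    adjacency[source].append(target)


def reachable(start):
    """Breadth-first traversal: every node reachable from start, in visit order."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(reachable("alice"))
print(reachable("dave"))
```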
advantages,
Advantages of NoSQL
Can be used as Primary or Analytic Data Source
Big Data Capability
No Single Point of Failure
Easy Replication
No Need for Separate Caching Layer
It provides fast performance and horizontal scalability.
Can handle structured, semi-structured, and unstructured data equally well
Supports object-oriented programming, which is easy to use and flexible
NoSQL databases don't need a dedicated high-performance server
Supports key developer languages and platforms
Simpler to implement than an RDBMS
It can serve as the primary data source for online applications.
Handles big data which manages data velocity, variety, volume, and complexity
Excels at distributed database and multi-data center operations
Eliminates the need for a specific caching layer to store data
Offers a flexible schema design which can easily be altered without downtime or service
disruption
Disadvantages of NoSQL
No standardization rules
Limited query capabilities
RDBMS databases and tools are comparatively mature
It does not offer some traditional database capabilities, like consistency when multiple
transactions are performed simultaneously.
When the volume of data increases, it becomes difficult to maintain unique values for
keys
Doesn't work as well with relational data
The learning curve is steep for new developers
Many options are open source, which makes them less popular with enterprises.
Architecture of MongoDB, Documents, Collections,
MongoDB is a document-oriented NoSQL database used for high volume data storage. Instead
of using tables and rows as in the traditional relational databases, MongoDB makes use of
collections and documents. Documents consist of key-value pairs which are the basic unit of
data in MongoDB. Collections contain sets of documents and function as the equivalent
of relational database tables. MongoDB is a database which came to light around the
mid-2000s.
Database: In simple words, it can be called the physical container for data. Each of the
databases has its own set of files on the file system with multiple databases existing on a single
MongoDB server.
Collection: A group of database documents can be called a collection. The RDBMS equivalent
to a collection is a table. The entire collection exists within a single database. There are no
schemas when it comes to collections. Inside the collection, various documents can have varied
fields, but mostly the documents within a collection are meant for the same purpose or for
serving the same end goal.
Document: A set of key–value pairs is called a document. Documents have dynamic schemas, which
means that documents in a single collection do not need to have the same structure or fields.
Fields that are common to several documents in a collection can also hold different types of data.
A document is the MongoDB equivalent of a record: it consists of field names and their values.
• Important Features of MongoDB
Queries: It supports ad-hoc queries and document-based queries.
Index Support: Any field in the document can be indexed.
Replication: MongoDB maintains multiple copies of data using replica sets, in which one primary
node accepts writes and secondary nodes copy them (older releases also offered Master–Slave
replication). Replica sets help prevent database downtime because, if the primary fails, a
secondary is automatically elected to take over.
Multiple Servers: The database can run over multiple servers. Data is duplicated to protect
the system in the case of hardware failure.
Auto-sharding: This process distributes data across multiple physical partitions called shards.
Due to sharding, MongoDB has an automatic load balancing feature.
MapReduce: It supports MapReduce and flexible aggregation tools.
Failure Handling: MongoDB copes well with failures. Keeping several replicas gives increased
protection and data availability against rack failures, multiple machine failures, data center
failures, and even network partitions.
GridFS: Files of any size can be stored without complicating your stack. The GridFS feature
divides files into smaller parts and stores them as separate documents.
Schema-less Database: It is a schema-less database written in C++.
Document-oriented Storage: It uses BSON format which is a JSON-like format.
Procedures: MongoDB uses JavaScript functions in place of stored procedures.
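As a rough illustration of the GridFS item above, the sketch below splits a file into numbered parts in plain JavaScript. It is not the GridFS implementation: the splitIntoChunks function, the "file-1" id, and the 600 KiB buffer are hypothetical. Only the files_id, n, and data field names and the 255 KiB default chunk size follow GridFS.

```javascript
// Illustrative only: mimics how GridFS divides a file into chunk documents.
// GridFS uses a default chunk size of 255 KiB.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper: returns one chunk document per slice of the file.
function splitIntoChunks(fileId, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n += 1) {
    chunks.push({
      files_id: fileId,                                 // which file the chunk belongs to
      n: n,                                             // position of the chunk in the file
      data: data.subarray(offset, offset + chunkSize)   // the bytes of this part
    });
  }
  return chunks;
}

// A made-up 600 KiB file filled with the byte value 1.
const file = Buffer.alloc(600 * 1024, 1);
const chunks = splitIntoChunks("file-1", file);

console.log("number of chunks:", chunks.length);
console.log("chunk sizes:", chunks.map((c) => c.data.length));
```

Reading the file back amounts to concatenating the data parts in order of n.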
Dynamic Schemas,
Dynamic Schema: MongoDB supports dynamic schemas. In other words, we need not define
the schema before the insertion of data. We can change the schema of the database dynamically.
A dynamic schema supports fluent polymorphism.
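The idea can be sketched as follows. The demoStudents collection name and the sample documents are made up for illustration. The insertMany() call only runs when the snippet is pasted into the MongoDB shell (where db is defined); elsewhere the snippet just inspects the plain objects.

```javascript
// Three documents intended for the same collection, each with a different shape.
const students = [
  { name: "Asha", major: "Math" },
  { name: "Ravi", gpa: 3.4, clubs: ["chess", "robotics"] },
  { name: "Meera", gpa: "A+" } // same field name as above, different type
];

// Inside the MongoDB shell, db exists and the documents can be inserted
// without defining any schema first. Outside the shell this step is skipped.
if (typeof db !== "undefined") {
  db.demoStudents.insertMany(students);
}

// Summarise which fields each document has and what type each value is.
const fieldTypes = students.map((doc) =>
  Object.fromEntries(
    Object.entries(doc).map(([field, value]) => [
      field,
      Array.isArray(value) ? "array" : typeof value
    ])
  )
);

console.log(fieldTypes);
```

No collection definition is needed beforehand, and the gpa field holds a number in one document and a string in another.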
Mongo Shell,
MongoDB Shell (mongosh) is the quickest way to connect to, configure, query, and work with your
MongoDB database.
MongoDB Shell provides a modern command-line experience that includes syntax highlighting,
intelligent autocomplete, contextual help, and clear error messages. These are just some of the
features included in MongoDB Shell that make working with your MongoDB databases
easier.
The MongoDB Shell is a standalone product: it is developed separately from the MongoDB
Server and is open source under the Apache 2 license.
Mongo Server and Client,
MongoDB Compass is a client of the MongoDB server: the server stores the data, and clients such
as Compass or the MongoDB Shell connect to it.
Compass is the GUI for MongoDB. It lets you visually explore your data, run ad hoc queries in
seconds, and interact with your data with full CRUD functionality. You can also view and
optimize your query performance. It is available on Linux, Mac, and Windows, and helps you make
better decisions about indexing, document validation, and more.
Data Types,
In MongoDB, data is represented in JSON document format, but the JSON is binary-encoded;
this encoding is called BSON. BSON extends the JSON model: it provides additional data types
and ordered fields, and it is designed to be efficient to encode and decode in different
languages.
Types:
Integer is used for storing whole-number values, as in other programming languages. Both
32-bit and 64-bit integers are supported, depending on the server.
db.TestCollection.insert({"Integer example": 62})
Boolean is used for storing Boolean (i.e., true or false) values.
db.TestCollection.insert({"Nationality Indian": true})
Double is implemented for storing floating-point data in MongoDB.
db.TestCollection.insert({"double data type": 3.1415})
Min / Max keys are used for comparing a value against the lowest and the highest BSON
elements.
String is one of the most frequently used data types for storing data.
db.TestCollection.insert({"string data type" : "This is a sample message."})
Arrays are used for storing a list of several values under a single key.
var degrees = ["BCA", "BS", "MCA"]
db.TestCollection.insert({"Array Example" : "Here is an example of array",
                          "Qualification" : degrees})
Object is implemented for embedded documents.
var embeddedObject={"English" : 94, "ComputerSc." : 96, "Maths" : 80,
"GeneralSc." : 85}
db.TestCollection.insert({"Object data type" : "This is Object",
"Marks" : embeddedObject})
Symbol is used in the same way as a string and is usually reserved for languages that have a
specific symbol type.
Null is implemented for storing a Null value.
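The db.collection.bulkWrite() method performs several insert, update, delete, and replace
operations in a single call. With { ordered: false }, MongoDB may execute the operations in any
order and continues with the remaining operations if one of them fails.
db.students.bulkWrite(
  [
    { insertOne :
      {
        "document" :
        {
          name: "Andrew", major: "Architecture", gpa: 3.2
        }
      }
    },
    { insertOne :
      {
        "document" :
        {
          name: "Terry", major: "Math", gpa: 3.8
        }
      }
    },
    { updateOne :
      {
        filter : { name : "Terry" },
        update : { $set : { gpa : 4.0 } }
      }
    },
    { deleteOne :
      { filter : { name : "Kate"} }
    },
    { replaceOne :
      {
        filter : { name : "Claire" },
        replacement : { name: "Genny", major: "Counseling", gpa: 2.4 }
      }
    }
  ],
  {ordered: false}
);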
Insert Validation,
Validation rules are on a per-collection basis.
To specify validation rules when creating a new collection, use db.createCollection() with the
validator option.
To add document validation to an existing collection, use the collMod command with the
validator option.
JSON Schema is the recommended means of performing schema validation.
The following example specifies validation rules using JSON Schema:
db.createCollection("students", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: [ "name", "year", "major", "address" ],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        },
        year: {
          bsonType: "int",
          minimum: 2017,
          maximum: 3017,
          description: "must be an integer in [ 2017, 3017 ] and is required"
        },
        major: {
          enum: [ "Math", "English", "Computer Science", "History", null ],
          description: "can only be one of the enum values and is required"
        },
        gpa: {
          bsonType: [ "double" ],
          description: "must be a double if the field exists"
        },
        address: {
          bsonType: "object",
          required: [ "city" ],
          properties: {
            street: {
              bsonType: "string",
              description: "must be a string if the field exists"
            },
            city: {
              bsonType: "string",
              description: "must be a string and is required"
            }
          }
        }
      }
    }
  }
})
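For a collection that already exists, the collMod route mentioned above might look like the sketch below. It assumes a students collection is already present and uses a deliberately small rule (only name is checked) that is made up for illustration. The command is only sent when the snippet runs inside the MongoDB shell, where db is defined.

```javascript
// Command document for collMod: attaches a validator to an existing collection.
const addValidation = {
  collMod: "students", // assumed to exist already
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["name"],
      properties: {
        name: {
          bsonType: "string",
          description: "must be a string and is required"
        }
      }
    }
  }
};

// Inside the MongoDB shell, send the command to the current database.
// Outside the shell, db is undefined and this step is skipped.
if (typeof db !== "undefined") {
  db.runCommand(addValidation);
}

console.log(JSON.stringify(addValidation, null, 2));
```

Once the validator is in place, later inserts and updates that break the rule are rejected.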
Removing Documents,
To remove documents, use one of the following methods:
db.collection.deleteMany()
db.collection.deleteOne()
Delete All Documents
The following example deletes all documents from the inventory collection:
db.inventory.deleteMany({})
Delete All Documents that Match a Condition
The following example removes all documents from the inventory collection where the status
field equals "A":
db.inventory.deleteMany({ status : "A" })
Delete Only One Document that Matches a Condition
The following example deletes the first document where the status field equals "D":
db.inventory.deleteOne( { status: "D" } )
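The difference between the two methods can be imitated on a plain JavaScript array. This is an emulation for illustration only, not the MongoDB API: the deleteMany and deleteOne functions below and the sample inventory are hypothetical, and a function stands in for the filter document.

```javascript
// Emulation: remove every document that matches the condition.
function deleteMany(docs, matches) {
  const kept = docs.filter((doc) => !matches(doc));
  return { docs: kept, deletedCount: docs.length - kept.length };
}

// Emulation: remove only the first document that matches the condition.
function deleteOne(docs, matches) {
  const index = docs.findIndex(matches);
  if (index === -1) {
    return { docs: docs.slice(), deletedCount: 0 };
  }
  return { docs: docs.filter((_, i) => i !== index), deletedCount: 1 };
}

// Made-up inventory with two "A" documents and two "D" documents.
const inventory = [
  { item: "journal", status: "A" },
  { item: "notebook", status: "A" },
  { item: "paper", status: "D" },
  { item: "planner", status: "D" }
];

const afterMany = deleteMany(inventory, (doc) => doc.status === "A");
const afterOne = deleteOne(afterMany.docs, (doc) => doc.status === "D");

console.log(afterMany.deletedCount, afterOne.deletedCount);
console.log(afterOne.docs);
```

Both "A" documents are removed by the first call, while the second call removes only the first "D" document and leaves the other in place.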
Updating Documents,
Update a Single Document
The following example uses the db.collection.updateOne() method on the inventory collection
to update the first document where item equals "paper":
db.inventory.updateOne(
  { item: "paper" },
  {
    $set: { "size.uom": "cm", status: "P" },
    $currentDate: { lastModified: true }
  }
)
The update operation:
uses the $set operator to update the value of the size.uom field to "cm" and the value of the
status field to "P",
uses the $currentDate operator to update the value of the lastModified field to the current date.
If the lastModified field does not exist, $currentDate will create it.
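To see what the dotted name "size.uom" means, the effect of $set can be imitated on a plain JavaScript object. The applySet function below is a hypothetical, simplified emulation, not how MongoDB implements updates, and it assumes every intermediate value on the path is an embedded document.

```javascript
// Simplified emulation of $set: each key is a dotted path into the document.
function applySet(doc, changes) {
  // Work on a copy so the original document is left untouched.
  const result = JSON.parse(JSON.stringify(doc));
  for (const [path, value] of Object.entries(changes)) {
    const keys = path.split(".");
    let target = result;
    // Walk down to the embedded document that holds the last key.
    for (const key of keys.slice(0, -1)) {
      if (target[key] === undefined) {
        target[key] = {}; // create missing embedded documents on the way
      }
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return result;
}

// Made-up document resembling one from the inventory collection.
const before = { item: "paper", size: { h: 8.5, w: 11, uom: "in" }, status: "D" };
const after = applySet(before, { "size.uom": "cm", status: "P" });

console.log(after);
```

Only the named fields change: size.h and size.w keep their values, while size.uom and status are overwritten.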
Document Replacement,
To replace the entire content of a document except for the _id field, pass an entirely new
document as the second argument to db.collection.replaceOne().
When replacing a document, the replacement document must consist of only field/value pairs;
i.e. it must not include update operator expressions.
The replacement document can have different fields from the original document. In the
replacement document, you can omit the _id field since the _id field is immutable; however, if
you do include the _id field, it must have the same value as the current value.
The following example replaces the first document from the inventory collection where item:
"paper":
db.inventory.replaceOne(
  { item: "paper" },
  { item: "paper",
    instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 40 } ] }
)
_id Field
Once set, you cannot update the value of the _id field nor can you replace an existing document
with a replacement document that has a different _id field value.
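The rules above can be imitated in plain JavaScript. The replaceDocument function and the stored document are hypothetical and only emulate the behaviour described here; they are not part of MongoDB.

```javascript
// Emulation of replacement: all fields are swapped out, but _id is kept.
function replaceDocument(original, replacement) {
  // A replacement may repeat the _id, but only with the same value.
  if ("_id" in replacement && replacement._id !== original._id) {
    throw new Error("the _id field is immutable");
  }
  return { _id: original._id, ...replacement };
}

// Made-up stored document.
const stored = { _id: 1, item: "paper", qty: 100, status: "D" };

const replaced = replaceDocument(stored, {
  item: "paper",
  instock: [ { warehouse: "A", qty: 60 }, { warehouse: "B", qty: 40 } ]
});

console.log(replaced);
```

Unlike an update with $set, fields that are missing from the replacement (qty and status here) are gone afterwards, while _id stays the same.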
Using Modifiers,
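In addition to the MongoDB Query Operators, there are a number of “meta” operators that let
you modify the output or behavior of a query. In the shell, these meta operators are deprecated
in favour of the equivalent cursor methods, such as sort() and hint().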
Modifiers:
Name           Description
$comment       Adds a comment to the query to identify queries in the database profiler output.
$explain       Forces MongoDB to report on query execution plans. See explain().
$hint          Forces MongoDB to use a specific index. See hint().
$max           Specifies an exclusive upper limit for the index to use in a query. See max().
$maxTimeMS     Specifies a cumulative time limit in milliseconds for processing operations on a cursor. See maxTimeMS().
$min           Specifies an inclusive lower limit for the index to use in a query. See min().
$orderby       Returns a cursor with documents sorted according to a sort specification. See sort().
$query         Wraps a query document.
$returnKey     Forces the cursor to only return fields included in the index.
$showDiskLoc   Modifies the documents returned to include references to the on-disk location of each document.
$natural       A special sort order that orders documents using the order of documents on disk.
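Several of the modifiers above point to a cursor method (sort(), maxTimeMS(), and so on). A minimal sketch of the cursor-method form is shown below. The filter, the sort order, the 500 ms limit, and the comment text are made-up values, and the query only runs when the snippet is pasted into the MongoDB shell, where db is defined.

```javascript
// Made-up query: documents with status "A", largest qty first.
const filter = { status: "A" };
const sortSpec = { qty: -1 };

// Inside the MongoDB shell, chain cursor methods onto find().
// Outside the shell, db is undefined and this step is skipped.
if (typeof db !== "undefined") {
  const cursor = db.inventory
    .find(filter)
    .sort(sortSpec)                    // counterpart of $orderby
    .maxTimeMS(500)                    // counterpart of $maxTimeMS
    .comment("status A, by qty desc"); // counterpart of $comment
  cursor.forEach((doc) => printjson(doc));
}

console.log("filter:", JSON.stringify(filter), "sort:", JSON.stringify(sortSpec));
```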
Updating Multiple Documents,
The following example uses the db.collection.updateMany() method on the inventory collection
to update all documents where qty is less than 50:
db.inventory.updateMany(
  { "qty": { $lt: 50 } },
  {
    $set: { "size.uom": "in", status: "P" },
    $currentDate: { lastModified: true }
  }
)
Sharding: Handles horizontal scaling across servers using a shard key. This means that rather
than copying the whole data set, sharding distributes pieces of the data (or “shards”) across
multiple replica sets. Together, these replica sets hold all of the data.
Think of it like a pizza. With replication, you are making a copy of a complete pizza pie on
every server. With sharding, you’re sending pizza slices to several different replica sets.
Combined together, you have access to the entire pizza pie.
Replication and sharding can work together to form something called a sharded cluster, where
each shard is replicated in turn to preserve the same high availability.
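The pizza-slice idea can be sketched in plain JavaScript as below. This is only an illustration of routing by a shard key: the hash function (FNV-1a), the customerId field, and the three in-memory "shards" are assumptions chosen for the example, and MongoDB's own hashing and chunk management work differently.

```javascript
// 32-bit FNV-1a hash of a string (an assumed stand-in for a real hash).
function hashKey(key) {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h;
}

// Pick a shard for a document from the value of its shard key.
function routeToShard(shardKeyValue, shardCount) {
  return hashKey(String(shardKeyValue)) % shardCount;
}

// Three empty shards; in a real cluster each would be a replica set.
const shards = [[], [], []];

// Made-up documents with customerId as the shard key.
const orders = [];
for (let i = 1; i <= 12; i++) {
  orders.push({ customerId: "cust-" + i, total: i * 10 });
}

// Each document is stored on exactly one shard.
for (const order of orders) {
  shards[routeToShard(order.customerId, shards.length)].push(order);
}

console.log(shards.map((s) => s.length));
```

Each shard holds only some of the documents, and all shards together hold the complete set. The same shard key value is always routed to the same shard.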
Methods used in the MongoDB Shell
To drop a database, first select it by typing use databasename. Remember that the shell is
case-sensitive: if your database is named Accme, do not write its name as accme. The same
goes for function names. After selecting the database, type db.dropDatabase() to drop it.
To check which database you are currently in, type db
To create a database, type use databasename
Then put some data in it; if you do not, the database will not be saved.
To see all available methods visit: https://docs.mongodb.com/manual/reference/method/
Notes by – [email protected]