Topic 1 Introduction PDF
INTRODUCTION TO DATABASES
A data type is an attribute of a piece of data that tells a computer how the
data is stored and how it can be used. You can also think of data types as
categories that programs combine in order to carry out certain
functions. Most programming languages, including C++ and Java, share the same
basic data types.
1. Integer
Integer data types represent whole numbers in programming. An
integer's value moves from one whole number to the next without any
fractional values in between. The range of values depends on the number of
bits used to store the integer, and signed integer types also allow negative
values. Some integer
examples include:
425
65
9
2. Character
In coding, characters are letters, digits, punctuation marks and other symbols.
Programmers might represent these data types as CHAR (fixed length) or
VARCHAR (variable length), and they can be single characters or a string of
characters. A character field often defaults to 1 octet (an 8-bit unit of
digital information) but can increase to about 65,000 octets.
Float: A data type that typically holds about seven significant decimal digits.
Double: A data type that typically holds about 15 significant decimal digits.
The floating-point double type can provide more accurate values, but it also
requires more memory to store.
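The difference in precision can be seen with a minimal Python sketch. It assumes the common 32-bit float and 64-bit double layouts (which Python's struct module writes for the "f" and "d" format codes); the helper names are made up for this illustration.

```python
import struct

def round_trip_float32(value: float) -> float:
    """Store the value as a 32-bit float, then read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

def round_trip_float64(value: float) -> float:
    """Store the value as a 64-bit double, then read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]

x = 0.123456789012345

as_float = round_trip_float32(x)    # loses digits after roughly the 7th
as_double = round_trip_float64(x)   # keeps the value unchanged

float_error = abs(as_float - x)
double_error = abs(as_double - x)

print(f"float : {as_float:.15f}  (4 bytes)")
print(f"double: {as_double:.15f}  (8 bytes)")
```

The float copy differs from the original after about seven digits, while the double copy is identical, at the cost of twice the storage.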
5. Long
Long data types are often 32- or 64-bit integers in code. A signed 64-bit long
can represent integers of up to 19 digits in either direction, positive or
negative. In Visual Basic, programmers can append an ampersand to indicate that
a variable or literal is a Long.
Long data types are whole numbers, both positive and negative, that can have
many place values. Examples include:
-398,741,129,664,271
9,000,000,125,356,546
6. Short
Similar to the long data type, a short is an integer type. Shorts hold
whole numbers, and they can be positive or negative.
A short is typically a 16-bit integer, so a signed short ranges from -32,768
to 32,767.
Short data types can hold values of several digits, but their range is always
smaller than that of long data. Examples include:
-27,400
5,428
17
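As a rough check of which integer type a value needs, the sketch below computes the range of a signed integer for a given number of bits. The 16-, 32- and 64-bit widths are assumptions (typical for short, int and long in languages such as Java), and the function names are hypothetical.

```python
from typing import Tuple

def signed_range(bits: int) -> Tuple[int, int]:
    """Smallest and largest value of a signed two's-complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

def fits(value: int, bits: int) -> bool:
    """True if the value can be stored in a signed integer of this width."""
    low, high = signed_range(bits)
    return low <= value <= high

for bits in (16, 32, 64):
    print(bits, signed_range(bits))

print(fits(-27_400, 16))                    # one of the short examples
print(fits(-398_741_129_664_271, 32))       # one of the long examples
print(fits(9_000_000_125_356_546, 64))      # the other long example
```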
7. String
A string data type is a sequence of characters that can be either constant or
variable. How strings are written and handled depends on the programming
language. Strings can
include both upper and lowercase letters, numbers and punctuation.
8. Boolean
Boolean data is what programmers use to show logic in code. It's typically one
of two values—true or false—intended to clarify conditional statements. These
can be responses to "if/when" scenarios, where code indicates if a user
performs a certain action. When this happens, the Boolean data directs the
program's response, which determines the next code in the sequence.
Boolean data can help guide the logic in code. Here are some examples of
how you might declare it:
bool baseballIsBest = false;
bool footballIsBest = true;
Depending on the program, the code may direct the end-user to different
screens based on their selection.
9. Nothing / Null
The nothing data type shows that a variable has no value. This might indicate
that data is missing, that the programmer has not initialized the variable, or
that there were values that defy the intended logic. A type that can hold this
marker is called a "nullable type."
Nothing means a variable has no value, which is different from holding
the digit 0. This is often "Null," "NaN" or "Nothing" in code. An
example of this in Visual Basic is:
Dim x As Object = Nothing
Console.WriteLine(x Is Nothing)
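The same distinction between "no value" and the number 0 can be sketched in Python, where the missing-value marker is None. The describe function is a hypothetical helper written only for this illustration.

```python
import math
from typing import Optional

def describe(value: Optional[float]) -> str:
    """Tell apart a missing value, an undefined number, and a real number."""
    if value is None:
        return "no value"
    if math.isnan(value):
        return "not a number"
    return f"value {value}"

print(describe(None))          # missing: nothing was stored
print(describe(0))             # zero is a real value, not "nothing"
print(describe(float("nan")))  # result of an undefined calculation
```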
10. Void
Similar to the nothing type, the void type indicates the absence of a value.
A void return type tells the reader that a function does not return a result,
and in C a parameter list of (void) states that the function takes no
arguments. Programmers might use or encounter the void data type in early
system testing when there are no responses programmed yet for future steps.
In the following C declaration, void indicates that the function takes no
arguments:
int function_name (void)
Transactional Data
This type of data describes your core business activities. If you are a trading
company, this may include the data of your purchasing and selling activities.
If you are a manufacturing company, this will be your production activities data.
If you are a ride-hailing or cab company, this will be the trip data. In very
basic organizational operations, the data related to hiring and firing
employees can also be classified as transactional data. As a result, this kind of
data has a very large volume in comparison with the other types and is usually
created, stored, and maintained within an operational application such as an
ERP system.
Master Data
It consists of the key information that makes up the transactional data. For
example, the trip data in a cab company may contain driver, passenger, route,
and fare data. The driver, passenger, location, and basic fare data are the
master data. The driver data may consist of the name of the driver and all of
the associated information. So does the passenger data. Together, they make up
the transactional data.
Master data usually contains places (addresses, postal codes, cities,
countries), parties (customers, suppliers, employees) and things (products,
assets, items, etc.). It is application-specific, meaning that its uses are
specific to the application and the business process related to it, e.g. the
employee master data is created, stored, and maintained within the HR
application.
Reference Data
Reference data is a subset of master data. It is usually standardized data that
is governed by a certain codification (e.g. the list of countries is governed
by ISO 3166-1). There’s an easy way to differentiate reference data from master
data. Always remember that reference data is far less volatile than master
data. Let’s go back to our cab company. Tomorrow, the day after tomorrow, or
next week, the list of drivers may change whenever a new person is onboarded or
dismissed. But the list of countries will remain the same even two decades from
now, unless some territory declares its independence.
Reporting Data
It’s aggregated data compiled for the purpose of analytics and reporting. This
data consists of transactional, master, and reference data. For example: trip
data (transaction + master) on the 13th day of July in the Greater London
region (reference). Reporting data is very strategic and is usually produced
as an input to the decision-making process.
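To make the four categories concrete, here is a small sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are invented for the cab-company example and are not taken from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE region (code TEXT PRIMARY KEY, name TEXT);   -- reference data
CREATE TABLE driver (id INTEGER PRIMARY KEY, name TEXT);  -- master data
CREATE TABLE trip (                                       -- transactional data
    id INTEGER PRIMARY KEY,
    driver_id INTEGER REFERENCES driver(id),
    region_code TEXT REFERENCES region(code),
    trip_date TEXT,
    fare REAL
);
INSERT INTO region VALUES ('LON', 'Greater London'), ('MAN', 'Greater Manchester');
INSERT INTO driver VALUES (1, 'Asha'), (2, 'Ben');
INSERT INTO trip VALUES
    (1, 1, 'LON', '2023-07-13', 12.5),
    (2, 2, 'LON', '2023-07-13', 7.5),
    (3, 1, 'MAN', '2023-07-13', 20.0),
    (4, 2, 'LON', '2023-07-14', 9.0);
""")

# Reporting data: trips aggregated for one day in one region.
report = conn.execute("""
    SELECT r.name, COUNT(*) AS trips, SUM(t.fare) AS total_fare
    FROM trip AS t JOIN region AS r ON r.code = t.region_code
    WHERE t.trip_date = ? AND r.name = ?
    GROUP BY r.name
""", ("2023-07-13", "Greater London")).fetchall()

print(report)
```

The report row is derived entirely from the other three kinds of data, which is why reporting data is usually regenerated rather than maintained by hand.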
Metadata
It’s data about data, such as the column names and types of a table or the
author and creation date of a file. Sounds confusing? Indeed. It’s the type of
data that made me dizzy when I first entered the data management field.
Thankfully, this beautiful picture makes it easy for me to comprehend what
metadata actually is.
Flat rectangular files, or tabular data, are a classical and still widely used
data structure which can be read by statistical and spreadsheet programs.
A flat file database is a database that stores information in a single file or
table. In a text file, every line contains one record, and fields either have a
fixed length or are separated by commas, whitespace, tabs or another delimiter
character. In a flat file database there is no structural relationship among
the records, and the database cannot contain multiple tables.
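A comma-separated flat file of this kind can be read with Python's standard csv module. In this sketch the file content is held in memory and the column names (roll_no, name, marks) are invented for illustration.

```python
import csv
import io

# One record per line, fields separated by commas, first line names the fields.
flat_file = io.StringIO(
    "roll_no,name,marks\n"
    "1,Amina,82\n"
    "2,Brian,74\n"
    "3,Chen,91\n"
)

records = list(csv.DictReader(flat_file))

# Every query is a manual scan over all records; there is no index or join.
high_scorers = [r["name"] for r in records if int(r["marks"]) > 80]

print(records[0])
print(high_scorers)
```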
Advantages:
• A flat file database is best suited to small databases.
• It is easy to understand and implement. Fewer skills are required to
handle a flat file database.
• Less hardware and software expertise is required to maintain a flat file
database.
Disadvantages:
• A flat file may contain fields which duplicate data, because nothing in a
flat file automatically prevents redundancy.
• If one record is to be deleted from the flat file database, then all the
related information in other fields has to be deleted manually, which makes
data manipulation inefficient.
• A flat file database wastes storage space by requiring it to keep
fields for items that logically cannot apply to a record.
• Retrieving information is very time consuming in a large database.
In a hierarchical database, the entity type is the main table, rows of a table
represent the records and columns represent the attributes.
Advantages:
Disadvantages:
The hierarchical database model lacks flexibility. If a new relationship is to
be established between two entities, then a new and possibly redundant
database structure has to be built.
Maintenance of data is inefficient in a hierarchical model. Any change in
the relationships may require manual reorganization of the data.
This model is also inefficient for non-hierarchical access.
Advantages:
• The network database model makes data access easy and
efficient, as an application can access the owner record and all the
member records within a set.
• This model is conceptually easy to design.
• This model ensures data integrity because no member can exist without
an owner. So the user must make an owner entry first and then the member
records.
• The network model also ensures data independence because the
application works independently of the data.
Disadvantages:
• The model lacks structural independence, which means that to make any
change in the database structure, the application program must also be
modified before accessing the data.
• A user-friendly database management system cannot be established via the
network model.
The relational database model was proposed by E. F. Codd. After the hierarchical
and network models, the birth of this model was a huge step ahead. It allows
entities to be related through a common attribute. So in order to relate two
tables (entities), they simply need to have a common attribute. In the tables
there are primary keys and foreign keys. A foreign key in one table refers to
the primary key of another table. This property makes the model extremely
flexible.
Thus, using a relational database, ample information can be stored in small
tables. Accessing data is also very efficient. The user only has to enter
a query, and the application provides the user with the requested information.
Relational databases are defined and queried using a computer language,
Structured Query Language (SQL). This language forms the basis of most
relational database applications available today, from Access to Oracle.
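The idea of relating two tables through a common attribute can be sketched with Python's built-in sqlite3 module. The student and loan tables and their contents are hypothetical; roll_no is the attribute the two tables share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE loan (
    loan_id INTEGER PRIMARY KEY,
    roll_no INTEGER NOT NULL REFERENCES student(roll_no),  -- common attribute
    book TEXT NOT NULL
);
INSERT INTO student VALUES (1, 'Amina'), (2, 'Brian');
INSERT INTO loan VALUES (10, 1, 'Databases'), (11, 1, 'Networks'), (12, 2, 'Algebra');
""")

# The query joins the two small tables on the shared roll_no column.
rows = conn.execute("""
    SELECT s.name, l.book
    FROM student AS s JOIN loan AS l ON l.roll_no = s.roll_no
    ORDER BY l.loan_id
""").fetchall()

for name, book in rows:
    print(name, "borrowed", book)
```

Each student's name is stored once, and the join reassembles the combined view on demand.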
Advantages:
• A relational database supports mathematical set operations such as union,
intersection, difference and Cartesian product. It also supports select,
project, relational join and division operations.
• A relational database uses normalization, which helps to achieve
data independence more easily.
• Security control can also be implemented more effectively by imposing
an authorization control on the sensitive attributes present in a table.
• A relational database uses a language which is easy and human readable.
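The set operations named in the first advantage map onto SQL keywords. The sketch below runs them in SQLite through Python's sqlite3 module; the two membership tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE library_member (name TEXT);
CREATE TABLE sports_member (name TEXT);
INSERT INTO library_member VALUES ('Amina'), ('Brian'), ('Chen');
INSERT INTO sports_member VALUES ('Brian'), ('Chen'), ('Dara');
""")

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

both = "SELECT name FROM library_member {op} SELECT name FROM sports_member ORDER BY name"

union = names(both.format(op="UNION"))             # in either table
intersection = names(both.format(op="INTERSECT"))  # in both tables
difference = names(both.format(op="EXCEPT"))       # library only

# Cartesian product: every library row paired with every sports row.
product_size = conn.execute(
    "SELECT COUNT(*) FROM library_member CROSS JOIN sports_member"
).fetchone()[0]

print(union, intersection, difference, product_size)
```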
Disadvantages:
• The response to a query becomes time-consuming and inefficient if the
number of tables between which the relationships are established
increases.
Advantages:
• If there are complex (many-to-many) relationships between the entities,
the object-oriented database handles them much faster than any of the
above discussed database models.
• Navigation through the data is much easier.
• Objects do not require assembly or disassembly hence saving the coding
and execution time.
Disadvantages:
• Lower efficiency level when data or relationships are simple.
• Data is accessible only through a specific language or a particular API,
which is not the case in relational databases.
Advantages:
• Data remains encapsulated in an object-relational database.
• The concepts of inheritance and polymorphism can also be implemented in
this database.
Disadvantages:
• An object-relational database is complex.
• Proponents of the relational approach believe the simplicity and purity of
the relational model are lost.
• It is costly as well.
• Web enabled database (1990s – present):
• A web enabled database is, simply put, a database with a web-based
interface.
Disadvantages:
• The main disadvantage is that exposure to the web makes it more open to
attack.
• Web enabled databases support the full range of DB operations, but in
order to make them easy to use, they must be “dumbed down”.
WHAT IS A DATABASE?
Databases are used for storing, maintaining and accessing any sort of data.
They collect information on people, places or things. That information is
gathered in one place so that it can be observed and analyzed. Databases can
be thought of as an organized collection of information.
TYPES OF DATABASES
Distributed databases:
A distributed database is a type of database that combines data from a
common database with information captured by local computers. In this type of
database system, the data is not in one place but is distributed across various
sites of an organization.
Relational databases:
This type of database defines database relationships in the form of tables. It
is also called a relational DBMS (RDBMS), which is the most popular DBMS type
on the market. Examples of RDBMS systems include MySQL, Oracle, and
Microsoft SQL Server.
Object-oriented databases:
This type of database supports the storage of all data types. The
data is stored in the form of objects. The objects held in the database
have attributes and methods that define what to do with the data. PostgreSQL
is an example of an object-relational DBMS.
Centralized database:
The data is stored at a centralized location, and users from different
backgrounds can access it. This type of database stores application procedures
that help users access the data even from a remote location.
Open-source databases:
This kind of database stores information related to operations. It is mainly
used in fields such as marketing, employee relations and customer service.
Cloud databases:
A cloud database is a database which is optimized or built for a virtualized
environment. A cloud database has many advantages, one of which is that you pay
only for the storage capacity and bandwidth you use. It also offers scalability
on demand, along with high availability.
Data warehouses:
The purpose of a data warehouse is to provide a single version of the truth for
a company's decision making and forecasting. A data warehouse is an information
system that contains historical and cumulative data from single or multiple
sources. The data warehouse concept simplifies the reporting and analysis
process of the organization.
NoSQL databases:
A NoSQL database is used for large sets of distributed data. Some big
data performance problems are not handled effectively by relational
databases. This type of database is very efficient in analyzing large
unstructured data.
Graph databases:
A graph-oriented database uses graph theory to store, map, and query
relationships. These kinds of databases are mostly used for
analyzing interconnections. For example, an organization can use a graph
database to mine data about customers from social media.
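The kind of interconnection query a graph database answers can be imitated in plain Python. This is only a sketch of the idea, not how any particular graph database stores data; the follows graph and the reachable function are hypothetical.

```python
from collections import deque

# Each person maps to the people they follow (edges of the graph).
follows = {
    "ana": ["ben", "cy"],
    "ben": ["dee"],
    "cy": ["dee"],
    "dee": [],
}

def reachable(graph, start):
    """Breadth-first search: everyone connected to start, nearest first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order

print(reachable(follows, "ana"))
```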
OLTP databases:
OLTP (online transaction processing) is another database type, one that is able
to perform fast query processing and maintain data integrity in multi-access
environments.
Personal database:
A personal database stores data on personal computers; it is small and easily
manageable. The data is mostly used by the same
department of the company and is accessed by a small group of people.
Multimodal database:
The multimodal database is a type of data processing platform that supports
multiple data models, which define how the knowledge and information in
a database should be organized and arranged.
Document/JSON database:
In a document-oriented database, the data is kept in document collections,
usually in XML, JSON or BSON format. One record can store as much
data as you want, in any data type (or types) you prefer.
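A document collection can be pictured as a list of JSON documents whose fields need not match. The sketch below uses Python's json module; the documents and the find helper are hypothetical and do not follow the API of any particular document database.

```python
import json

# Two documents in the same collection with different fields.
collection = [
    json.loads('{"_id": 1, "name": "Amina", "tags": ["vip"], "address": {"city": "London"}}'),
    json.loads('{"_id": 2, "name": "Brian", "phone": "555-0100"}'),
]

def find(docs, **criteria):
    """Return the documents whose top-level fields equal all the criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

matches = find(collection, name="Brian")

# A field missing from a document simply yields None instead of an error.
cities = [d.get("address", {}).get("city") for d in collection]

print(matches)
print(cities)
```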
Hierarchical:
This type of DBMS employs the “parent-child” relationship for storing data. Its
structure is like a tree, with nodes representing records and branches
representing fields. The Windows Registry used in Windows XP is an example of a
hierarchical database.
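The parent-child structure can be sketched as nested Python dictionaries, where each record is reached only through its parent. The record names are invented for illustration.

```python
# Each key is a record; its value holds that record's children.
tree = {
    "Company": {
        "Sales": {"Orders": {}, "Customers": {}},
        "HR": {"Employees": {}},
    }
}

def paths(node, prefix=""):
    """List every record as a path from the root, parents before children."""
    result = []
    for name, children in node.items():
        path = f"{prefix}/{name}" if prefix else name
        result.append(path)
        result.extend(paths(children, path))
    return result

all_paths = paths(tree)
for path in all_paths:
    print(path)
```

Every child has exactly one parent here, so reaching "Employees" always means walking down from "Company" through "HR".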
Network DBMS:
This type of DBMS supports many-to-many relations. It usually results in
complex database structures. RDM Server is an example of a database
management system that implements the network model.
Architecture of DBMS
1. Query Processor :
It interprets the requests (queries) received from the end user via an
application program into instructions. It also executes the user request which
is received from the DML compiler.
DML Compiler –
It translates the DML statements into low-level instructions (machine
language) so that they can be executed.
DDL Interpreter –
It processes the DDL statements into a set of tables containing metadata
(data about data).
Query Optimizer –
It chooses an efficient way to carry out the instructions generated by the
DML compiler before they are executed.
2. Storage Manager :
Storage Manager is a program that provides an interface between the data
stored in the database and the queries received. It is also known as
Database Control System. It maintains the consistency and integrity of the
database by applying the constraints and executes the DCL statements. It is
responsible for updating, storing, deleting, and retrieving data in the
database.
• Authorization Manager –
It ensures role-based access control, i.e., it checks whether the
particular person is privileged to perform the requested operation or
not.
• Integrity Manager –
It checks the integrity constraints when the database is modified.
• Transaction Manager –
It controls concurrent access by scheduling the operations of the
transactions it receives. Thus, it ensures that
the database remains in a consistent state before and after the
execution of a transaction.
• File Manager –
It manages the file space and the data structure used to represent
information in the database.
• Buffer Manager –
It is responsible for cache memory and the transfer of data between
the secondary storage and main memory.
• Data Files –
It stores the data.
• Data Dictionary –
It contains information about the structure of every database
object. It is the repository of information that governs the metadata.
• Indices –
They provide faster retrieval of data items.
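The effect of an index can be observed in SQLite through Python's sqlite3 module. This is a sketch under the assumption that SQLite is a fair stand-in for a DBMS in general; the table, the index name idx_student_name and the query_plan helper are invented for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?)",
    [(i, f"student{i}") for i in range(1, 1001)],
)

def query_plan(sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

lookup = "SELECT roll_no FROM student WHERE name = ?"

# Without an index on name, SQLite has to scan the whole table.
plan_without_index = query_plan(lookup, ("student500",))

conn.execute("CREATE INDEX idx_student_name ON student (name)")

# With the index, the plan names the index it searches instead.
plan_with_index = query_plan(lookup, ("student500",))

print(plan_without_index)
print(plan_with_index)
```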
1. Duplicate Data
Data is stored more than once in different files, which means duplicate data
may occur across all these files. Since the files are independent of each
other, it is very difficult to prevent this duplication, and once it is
discovered it takes time and effort to resolve.
2. Inconsistency
In a file processing system, various copies of the same data may contain
different values. Data is not consistent in this system: if a data item needs
to be changed, then all the files containing that data need to be modified.
Otherwise there is a risk of outdated values.
For Example: If you change a student's name in the library, the name should
also be changed in all the departments related to the student.
3. Accessing Anomalies
Accessing anomalies means that it is not easy to access data in a desired or
efficient way. It makes supervision of a department very difficult. If a user
wants information in a specific form, then a program has to be written for it.
For Example: Let’s say the admin of the college wants student information such
as name, father's name, roll number, marks and class; a program is written for
that. But if the admin then wants the records of those students whose marks are
above 80 percent, a different program has to be created.
For Example: The maximum marks of the student can never be more than
100.
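A DBMS can enforce a rule like the one in this example itself, which a plain file cannot do. The sketch below uses a CHECK constraint in SQLite through Python's sqlite3 module; the result table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE result (
        roll_no INTEGER PRIMARY KEY,
        marks INTEGER NOT NULL CHECK (marks BETWEEN 0 AND 100)
    )
""")

conn.execute("INSERT INTO result VALUES (1, 95)")   # accepted

try:
    conn.execute("INSERT INTO result VALUES (2, 120)")  # breaks the rule
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT roll_no, marks FROM result").fetchall()
print(rejected, stored)
```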
For Example: If a student can access his data in the college library, then he
can easily change the issue date of his books. He can also change his fine
details to zero.
6. Atomicity Problem
Atomicity is required to keep data values safe: it means that information is
either entered completely or not entered at all. Any system may fail at any
time, and when that happens the data should be left in a consistent state.
For Example: Suppose you are buying a railway ticket and you are in the middle
of the payment when your internet connection suddenly drops. You may or may not
have paid for the ticket. If you have paid, your ticket will be booked, and if
not, you will not be charged anything. That is a consistent state: you have
either paid and received the ticket, or neither.
This atomicity is not provided by a File Processing System.
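The ticket example can be sketched with a database transaction in SQLite, using Python's sqlite3 module. The account and ticket tables, the buy_ticket function and the simulated failure are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE ticket (id INTEGER PRIMARY KEY, owner TEXT NOT NULL);
INSERT INTO account VALUES ('passenger', 100);
""")
conn.commit()

def buy_ticket(price, fail_midway=False):
    """Charge the passenger and issue a ticket as one all-or-nothing step."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = 'passenger'",
                (price,),
            )
            if fail_midway:
                raise ConnectionError("simulated network failure")
            conn.execute("INSERT INTO ticket (owner) VALUES ('passenger')")
        return True
    except ConnectionError:
        return False

def state():
    """Return (balance, number of tickets) for the passenger."""
    balance = conn.execute("SELECT balance FROM account").fetchone()[0]
    tickets = conn.execute("SELECT COUNT(*) FROM ticket").fetchone()[0]
    return balance, tickets

failed_purchase = buy_ticket(30, fail_midway=True)
after_failure = state()      # payment undone, no ticket issued

good_purchase = buy_ticket(30)
after_success = state()      # payment taken and ticket issued together

print(after_failure, after_success)
```

After the simulated failure the deduction is rolled back, so the passenger is never charged without receiving a ticket.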
8. Data Isolation
Data is isolated in a File Processing System because it is stored in different
files. These files can be in different formats. If you want to extract data
from two files, you need to know which part of each file is needed and how they
are related to each other.
Data Modification: A DBMS allows users to insert, update and delete data
in the tables. These tables contain rows and columns, where a row
represents a record of data and a column represents an attribute of the records.
Data Retrieval: A DBMS allows users to fetch data from the database.
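These operations correspond to the SQL statements INSERT, UPDATE, DELETE and SELECT. A minimal sketch with Python's sqlite3 module follows; the student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)"
)

# Insert: each tuple becomes one row (record).
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Amina", 82), (2, "Brian", 74), (3, "Chen", 91)],
)

# Update: change one attribute of one record.
conn.execute("UPDATE student SET marks = ? WHERE roll_no = ?", (78, 2))

# Delete: remove one record.
conn.execute("DELETE FROM student WHERE roll_no = ?", (3,))

# Retrieval: fetch what is left.
remaining = conn.execute(
    "SELECT roll_no, name, marks FROM student ORDER BY roll_no"
).fetchall()

print(remaining)
```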
Characteristics of DBMS
• Stores the data in such a way so that the relation between data is
still maintained in the database.
• Allows fast retrieval.
• It can handle multiple users accessing the database at the same time.
• It maintains data integrity by following ACID properties of the
database.
• It provides data security by managing user access.
• DBMS allows automatic backup of database to handle accidental
corruption or deletion of data.
• It allows scaling of database as per the need.
• It allows data rollback and redo in case of a data operation
failure.
ADVANTAGES OF DBMS
Avoidance of inconsistency.
A DBMS controls data redundancy and thereby data consistency. Data
consistency means that when you update a data item, the change does not have
to be repeated in several files.
In a DBMS, data is stored in a single database, so it is more consistent
than in file processing systems.
Shared data
Data can be shared between authorized users of the database in a DBMS. Each
user has their own access rights. The admin has complete
access to the database and has the right to grant other users access to it.
Enforcement of standards
Because a DBMS has central control of the database, a DBA can ensure that all
the applications follow standards such as data formats and documentation
standards. These standards help in data migration and in interchanging
data.
Tunability
Tuning means adjusting something to get better performance. The same applies
to a DBMS, which provides tunability to improve performance: the DBA adjusts
the database to get effective results.
DISADVANTAGES OF DBMS
Complexity
The provision of the functionality that is expected of a good DBMS makes the
DBMS an extremely complex piece of software. Database designers,
developers, database administrators and end-users must understand this
functionality to take full advantage of it.
Failure to understand the system can lead to bad design decisions, which
can have serious consequences for an organization.
Size
The functionality of a DBMS makes it a large piece of software which
occupies a considerable amount of disk space.
Performance
A DBMS is written to serve many applications rather than just one, so a given
application may not run as fast as desired.
Cost of DBMS
The cost of DBMS varies significantly depending on the environment and
functionality provided. There is also the recurrent annual maintenance cost.
DBMS applications
I have mentioned only a few applications; the list is practically endless, as
almost every field where data needs to be managed uses a DBMS
nowadays. The traditional file system is used only where the data size is
very small.
Database users are categorized based on their interaction with the
database.
There are seven types of database users in a DBMS.
3. System Analyst :
System Analyst is a user who analyzes the requirements of parametric
end users. They check whether all the requirements of end users are
satisfied.
4. Sophisticated Users :
Sophisticated users can be engineers, scientists or business analysts who
are familiar with the database. They can develop their own database
applications according to their requirements. They don’t write program
code, but they interact with the database by writing SQL queries directly
through the query processor.
6. Application Programmers:
Application Programmers are the back-end programmers who write the
code for the application programs. They are computer professionals.
These programs could be written in programming languages such as
Visual Basic, Developer, C, FORTRAN, COBOL etc.