
Database Administration and Data Mining

This document provides an overview of database administration and data mining. It discusses key concepts like:
- The database management system (DBMS) is software that manages data storage and retrieval from a database. It allows for defining, storing, and manipulating data.
- Characteristics of a DBMS include modeling real world entities, self-explanatory metadata, data integrity, security, relational structures, and supporting multiple views.
- Objectives of a DBMS are to provide convenient and effective methods for defining, storing, retrieving, and protecting database information, while ensuring data availability, shareability, and integrity.

Uploaded by

Shaan Dongre

Deepak Dhiwar TY-BBA Notes 22-23

Q-1 Explain the database management system and the characteristics of a DBMS with a suitable figure.

A-1 Introduction: Database Management System (DBMS). Data is one of the most important
business assets. In any organization, data is the basic resource needed to run the organization.

Data is only useful when it is suitably managed. When data is properly managed, it transforms into
information which can be utilized in making business decisions.

An organization can only survive when it takes optimized decisions in a highly competitive
market. Optimal business decisions can be made only if the organization is able to effectively
collect, organize, analyze and interpret data and get proper information.

Now we can understand the importance of data management in all business operations. Data
is effectively managed in a database by using a database management system.

The database management system is a software package which effectively manages the database.

The database is created and maintained by an integrated set of programs termed as the Database
Management System (DBMS).

Conveniently and effectively defining, storing, retrieving and manipulating the data contained in
the database, is the major aim of the DBMS.

A Database Management System (DBMS) is a software tool used to perform various types of
operations on data in a database and creates a convenient and effective environment which helps
the users to access data easily.

Data: A collection of real facts that can be recorded and that have implicit
meaning. Customer_name, item_price, balance, etc. can be considered data.
A Database: A shared collection of related data. By data we mean known facts that can be
recorded and that have implicit meaning.

For example, consider the names, telephone numbers and addresses of the people that
you know. This data may be recorded in an indexed address book or stored on a diskette using a
personal computer. It is a collection of related data with an implicit meaning and hence it is a
database.

A Database Management System (DBMS) is software that helps in the effective and convenient use of
a database by utilizing and maintaining huge collections of data. Put another way, it is a
computer-based system or program to record and maintain information or data. A database can be
viewed as a repository of data that is defined once and then accessed by various users as shown.
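
The address-book example above can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only an illustration, not part of the original notes: the table name contacts, the helper find_phone and the sample names and numbers are all made up.

```python
import sqlite3

# Hypothetical address book: the table name, columns and rows are made up
# purely to illustrate "a collection of related data".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Asha", "555-0101", "12 Lake Road"),
        ("Ravi", "555-0102", "7 Hill Street"),
    ],
)
conn.commit()


def find_phone(connection, name):
    """Return the phone number stored for a contact, or None if absent."""
    row = connection.execute(
        "SELECT phone FROM contacts WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


print(find_phone(conn, "Ravi"))
```

The data is defined once (CREATE TABLE) and can then be queried by any program or user that connects to the database, which is the idea the paragraph above describes.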

Characteristics of a DBMS

1. Real-world entity: One of the most important and easy-to-understand characteristics of a
DBMS is that it is realistic.

A DBMS is designed so that it can cater to the needs of huge
business organizations and can store large amounts of data with efficient operations on them.
A database can store things like the cost of vegetables or the cost of different brands of bread or
milk. The entities in the database look like real-world entities (please have a look
at the table for student database shown below).
For example, we can have a Database Management System for a school or a big MNC,
and the data is stored in the form of real-world entities. Any student that is stored in a
student database is like a real-world student (object/entity) and has properties (commonly
known as attributes in DBMS terms) like his/her name, gender, age, roll number, etc.
2. Self-Explaining Nature: A DBMS contains a database and, along with it,
metadata about that database.

Metadata is data about data. For example, in a DBMS for a particular school, the
total number of rows in a table, the name of each column of the
table, and all such information about the data is metadata.

So, the combination of this database and metadata leaves no questions in anyone’s mind
as the DBMS becomes self-explanatory. This is because the database has all the
information in a structured format and if anyone has any doubts or questions regarding
how the database is designed, they may investigate the metadata.

3. Integrity: Integrity means that the data which comes into the database should be correct
as well as consistent. Let us see an example to understand the meaning of correct and
consistent data with respect to a DBMS.

For instance, let us say that there is an XYZ Bank, and it has its own database of all the
customers of the bank. If the details entered for an XYZ Bank account carry an
account number that does not belong to that bank, the data is incorrect. If a person
has changed his/her address in the savings account of the bank but the details of the
current account at the same bank still show the old address, this is data
inconsistency.

A DBMS enforces rules so that the data entered into the database is both correct and consistent.
Apart from that, integrating changes is very easy in a database.

For instance, if the bank earlier had a minimum account balance of
0 and has now changed it to 2000 Rupees, the DBMS can integrate this
change very easily, and all the accounts that do not have a minimum balance of 2000 will
be detected and notified accordingly.

3. Security: The database should be accessible to the users in a limited way.

The access to make changes to a database by the user should be limited and the users
must not be given complete access to the entire database.

Unauthorized users should not be allowed to access the database.

Authentication: The DBMS authenticates its users, and this directly determines the
extent to which each user can access the database. Authentication means the process of
logging in a user with only the rights that he/she has been authorized to have. For
instance, in any organization, the admin has access to make changes to the database of
the organization, as some new employee might have joined the organization or someone
might have left it. However, the employees have access only to their personal profile and
can make changes only to it. They cannot access the data of any other employee or
of the organization.
These kinds of security measures are available and built very strongly in a DBMS.

4. Relational databases: Relational databases were first introduced in the 1970s. In this type
of database, data is stored in tables. Each record is a row, and each row contains fields called
attributes. Each attribute represents one piece of information about a particular object. For
example, if you want to keep track of your personal details, you might need three attributes,
namely name, address and phone number. All of these attributes together form a single row in a table.

This means that every time we add new information to our database, we insert
a new row into the appropriate table. Related tables are linked through common (key) attributes,
so each fact needs to be stored only once; otherwise we may end up
having duplicate entries in our database. In this way relational databases allow us to organize data
using relations between objects.

5. Supports Multiple Views of Data: A view is a subset of the database that
is defined for specific users of the system, with each view containing the data that is of
interest to a specific user or group of users. This means that there can be multiple views
on the system.
6. Sharing of Data and Multiple User System: A multi-user database system allows
multiple users to access the database at the same time. To achieve this, a multi-user DBMS
should have concurrency control strategies implemented.
7. Control Data Redundancy: Ideally, under the database approach each data item should be
stored in only one location in the database. In practice some redundancy still occurs, but it is
controlled and kept to a minimum to improve system performance.
8. Data Sharing: As data is integrated across an organization, it becomes possible to
produce more information from a given amount of data.
9. Enforcing integrity constraints: The DBMS must provide the capability to define and
enforce constraints such as data type and data uniqueness.
10. Restricting Unauthorized Access: Not all users of the system have the same access
rights. The DBMS provides a security subsystem for creating and
controlling user accounts.
11. Transaction Processing: The DBMS should have concurrency control subsystems to
ensure that simultaneous updating by several users is done in a controlled manner.
Consistency and validity must be maintained in the updates.
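
As a rough illustration of integrity constraints (data type, uniqueness and the bank's minimum-balance rule mentioned earlier), the sketch below uses Python's built-in sqlite3 module. The accounts table, the try_insert helper and the sample rows are assumptions made for this example only.

```python
import sqlite3

# Hypothetical accounts table. The CHECK clause encodes the minimum balance
# of 2000 from the bank example; PRIMARY KEY enforces unique account numbers.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE accounts (
        account_no TEXT PRIMARY KEY,
        holder     TEXT NOT NULL,
        balance    INTEGER NOT NULL CHECK (balance >= 2000)
    )
    """
)


def try_insert(connection, account_no, holder, balance):
    """Insert one account; return False if a constraint rejects the row."""
    try:
        with connection:  # commits on success, rolls back on an exception
            connection.execute(
                "INSERT INTO accounts VALUES (?, ?, ?)",
                (account_no, holder, balance),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert(conn, "A-001", "Asha", 5000)    # satisfies every constraint
low = try_insert(conn, "A-002", "Ravi", 500)    # below the minimum balance
dup = try_insert(conn, "A-001", "Meera", 9000)  # duplicate account number
print(ok, low, dup)
```

The point of the sketch is that the rules live in the database definition, so every application that inserts data is checked in the same way.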

Q-2 What are the objectives of a DBMS?

A-2 The objective of a DBMS is to provide convenient and effective methods of defining, storing
and retrieving the information contained in the database.

In addition, the DBMS must provide for the safety of the information stored. It should protect the
data from system crashes and from attempts at unauthorized access. If the data are to be shared among
several users, the system must avoid possible anomalous results.

1. Availability: Data must be available for applications and queries. Data availability refers to
the fact that the data are made available to a wide variety of users, in a meaningful format,
at reasonable cost, with ease of access, when and where required.
2. Shareability: Data items prepared by one application must be available to all other
applications or queries. No data items are exclusive to an application.

3. Data Integrity: Integrity is a critical aspect to the design, implementation and uses of any
system which stores, processes, or retrieves data. Data integrity refers to the correctness of
the data in the database. In other words, how reliable is the data available in the database.
Integrity also means your data is authentic, accurate and consistent.
4. Data Security: Refers to protective digital privacy measures that are applied to prevent
unauthorized access to computers, websites, databases or parts thereof. Data security
means that only authorized users can access the data, and it can be
enforced by passwords. If two separate users are accessing particular data at the same
time, the DBMS must not allow them to make conflicting changes. Data security protects
data from corruption. It also refers to the collective measures used to protect and secure a
database and its management software from illegal use and malicious threats and attacks.
Database security professionals employ a number of practices to assure data integrity,
including data encryption, which locks data by encryption; data backup, which stores a
copy of data in an alternative location; access controls, including assignment of read/write
privileges; and input validation, which prevents incorrect data entry. Data validation then
certifies that transmissions are uncorrupted.

5. Data Independence: One of the main objectives of a DBMS is to facilitate sharing of a
database by current and future applications. The DBMS should not be tailored to a specific
platform; one should be able to run the DBMS on any platform. The DBMS must also ensure data
independence for application programs. For example, if you want to change or
upgrade the storage system itself, such as replacing a hard drive with an SSD, it should not have
any effect on the logical data or schemas.

6. Evolvability: The database must evolve as application usage and query needs
evolve.
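
A small sketch of data independence, assuming Python's standard sqlite3 module and a made-up students table: adding an index changes how the data is physically accessed, yet the query text and its result stay the same. The table, index and column names are hypothetical.

```python
import sqlite3

# Hypothetical student table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 21), (3, "Meera", 20)],
)

# The application's query never mentions how the rows are stored.
QUERY = "SELECT name FROM students WHERE age = ? ORDER BY roll_no"

before = conn.execute(QUERY, (20,)).fetchall()

# A physical-level change: add an index on the age column.
conn.execute("CREATE INDEX idx_students_age ON students (age)")

after = conn.execute(QUERY, (20,)).fetchall()
print(before == after)
```

Replacing a hard drive with an SSD is a change of the same kind at a lower level: the application program should not need to be rewritten.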

Q-3 Explain the three-tier architecture of a DBMS.

A-3 A database can be viewed at three levels of abstraction, and this is known as the three-level
architecture. So, there are three independent levels by which a database is organized.

1. External View: The highest level of abstraction is the external view. This view contains
the concerns of the users or application programs. For one conceptual or global view, various
users' views can exist.
An external schema, or subschema, is used to describe each external view. These schemas have:
● Logical records definition,

● relationships in the external view, and

● methods of deriving the objects in the external view from the objects in the
conceptual view.
The objects comprise:

● Entities,

● Attributes, and

● Relationships.

2. Conceptual view: At this level of abstraction, the database entities and the relationships
among them are considered. Conceptual views are described using the conceptual schema.

It describes all the records and relationships included in the conceptual view. Therefore,
every database has only one conceptual schema.
The conceptual schema also includes the method of deriving the objects in the conceptual view
from the objects in the internal view.

The description at the conceptual level is independent of its physical
representation. The checks that preserve integrity and consistency are
also included in this schema.
3. Internal view: This is the lowest level of abstraction and the one nearest to the physical
storage method. It shows:
● how data is stored and which data structures are used, and

● which access methods are used by the database.

It is described using the internal schema. The internal schema includes the definition of the
stored records and also specifies the following:
● the methods of representing the data fields, and

● the access aids used.
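
The relation between the conceptual and external levels can be sketched with Python's sqlite3 module. In this hypothetical example one employees table stands for the conceptual schema, and two views (directory and payroll) stand for external schemas defined for different user groups. All names and rows are assumptions; the internal level is left to SQLite itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one definition of the whole employees entity.
conn.execute(
    "CREATE TABLE employees ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 40000), (2, "Ravi", "Accounts", 45000)],
)

# External level: each user group sees only the part that concerns it.
conn.execute("CREATE VIEW directory AS SELECT name, department FROM employees")
conn.execute("CREATE VIEW payroll AS SELECT emp_id, salary FROM employees")


def columns_of(connection, relation):
    """Return the column names visible through a table or view."""
    cursor = connection.execute(f"SELECT * FROM {relation}")
    return [col[0] for col in cursor.description]


print(columns_of(conn, "directory"))
print(columns_of(conn, "payroll"))
```

Both views are derived from the same conceptual table, so there is still only one conceptual schema however many external views are defined.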

Q-4 With the help of a diagram, describe the overall structure of a DBMS.


Database systems are partitioned into modules for different functions. Some functions
(e.g., file systems) may be provided by the operating system.

Components include:
o File manager manages allocation of disk space and data structures used to
represent information on disk.
o Database manager: The interface between low-level data and application
programs and queries.
o Query processor translates statements in a query language into low-level
instructions the database manager understands. (May also attempt to find an
equivalent but more efficient form.)
o DML precompiler converts DML statements embedded in an application program
to normal procedure calls in a host language. The precompiler interacts with the
query processor.
o DDL compiler converts DDL statements to a set of tables containing metadata
stored in a data dictionary.

In addition, several data structures are required for physical system implementation:

o Data files: store the database itself.
o Data dictionary: stores information about the structure of the database. It is
used heavily. Great emphasis should be placed on developing a good design and
efficient implementation of the dictionary.
o Indices: provide fast access to data items holding particular values.
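
As an illustration of how DDL statements end up as metadata in a data dictionary, the sketch below uses SQLite through Python's sqlite3 module. SQLite keeps its catalog in a table called sqlite_master and reports column details through PRAGMA table_info. The items table and its index are made-up names for this example, and other DBMS products organize their dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements: these define structure, they do not store any item data.
conn.execute(
    "CREATE TABLE items ("
    "item_id INTEGER PRIMARY KEY, item_name TEXT, item_price REAL)"
)
conn.execute("CREATE INDEX idx_items_name ON items (item_name)")

# The catalog now records every object created by the DDL above.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Column-level metadata: each row is (cid, name, type, notnull, default, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(items)")]

print(catalog)
print(columns)
```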

Q-5 What are the objectives and purpose of a DBMS?

A-5 Objectives of Database Management System

Database management system has the following objectives:

1. Elimination of redundant data
2. Making data access easy for the user
3. Providing a source of mass storage of data
4. Allowing multiple users to access the database at the same time
5. Providing appropriate and quick response to user queries.

Purpose of Database systems


The purpose of database systems is to make the database user-friendly and its operations easy.
Users can easily insert, update, and delete data. The main purpose is to have more control of the data.
Database systems are designed to address the following issues:

▪ data redundancy and inconsistency,

▪ difficulty in accessing data,

▪  data isolation,

▪ atomicity of updates,

▪ concurrent access,
▪ security problems, and

▪ supports multiple views of data.

Avoid data redundancy and inconsistency:

If there are multiple copies of the same data, a database system avoids them by maintaining the
data in a single repository. Keeping one copy also keeps the database consistent.
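
A minimal sketch of how a single repository avoids inconsistency, using Python's sqlite3 module and the bank-address situation as a model: the address is stored once in a hypothetical customers table, and both of the customer's accounts refer to it. Table names, account numbers and addresses are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The address lives in exactly one place: the customers table.
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE accounts ("
    "account_no TEXT PRIMARY KEY, kind TEXT, "
    "customer_id INTEGER REFERENCES customers(customer_id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'Old Street 1')")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [("S-1", "savings", 1), ("C-1", "current", 1)],
)

# One update is enough, because no account keeps its own copy of the address.
conn.execute("UPDATE customers SET address = 'New Road 9' WHERE customer_id = 1")

rows = conn.execute(
    "SELECT a.kind, c.address FROM accounts a "
    "JOIN customers c ON c.customer_id = a.customer_id ORDER BY a.kind"
).fetchall()
print(rows)
```

Because neither account stores the address itself, the savings and current accounts cannot disagree about it.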

Difficulty in accessing data:

A database system makes it easy to access data. Users retrieve the data they need from the
database through queries.

Data isolation:

In file-based systems, data is scattered across separate files that may be in different formats,
which makes it hard to retrieve. A database system keeps related data together in one database.

Atomicity of updates:

If a failure such as a power cut occurs midway through an update, the database could be left
with a partial change. Atomicity ensures that an update is either applied completely or not at all.

Concurrent access:

Multiple users can access the database at the same time.

Security problems:

Database systems restrict access, so the data is less vulnerable.

Supports multiple views of data:

A database system can support multiple views of data, giving each user the view they need. Only
database admins can have a complete view of the database. End-users should not be given the view
that developers have.
Advantages of Database System

1. Data Abstraction: Data abstraction means hiding the complexity of data from the basic
users. The DBMS hides the data that is not useful for the users and shows
only the data that is useful for them.
2. Controlling Data Redundancy: Data redundancy means having multiple copies of the
same data. A DBMS controls data redundancy and integrates all data into a single
database file. Controlling data redundancy also helps to save storage space and
increases retrieval and update speed.
3. Minimized data inconsistency: Data inconsistency means that different files may
contain different information about a particular object or person. If the DBMS has reduced
the data redundancy, then the database system leads to better data consistency. Each
data item appears only once (no redundancy), so the updated values are immediately
available to all users.
4. Data Manipulation Easily: In a DBMS, data can be manipulated easily because the data is
centralized. Once the data structure is defined, we can easily change the data through
insertion, modification, or deletion.
5. Data Sharing: Data can be shared easily by multiple applications in a
centralized DBMS. Applications can be developed without having to create any new
stored files. The DBMS helps to develop a friendly environment where end users can
access and manage data.
6. Data Security: Data is very important for any business organization. The more users
access the data, the higher the risk of data security breaches. If someone stole business data,
it would be very bad for the business. So, a company will never want any outsider to
come and access the company's data. Business organizations invest plenty of
time, effort, and money to ensure that their data is used only by authorized users.
A DBMS provides data security, which means protecting your precious data from unauthorized
access; data can be accessed only by authorized users of the organization. A database can
be accessed only with proper authentication, usually by verifying a login and password.
7. Support Multiuser Views: Multiple users can view the data at the same time. Using the
database, many users can access the data at the same time, which increases working
speed. A DBMS gives its multiple authorized users the ability to access the same
database from different locations, in different ways, to complete their different tasks.
8. Concurrent Access: Several users can access the database concurrently.
9. Helps for Decision Making: Better organized data and improved data access give us
better-quality information, which helps in making better decisions.

Disadvantage of Database System

1. Cost of Hardware and Software: In order to run the DBMS software, we need a
high-speed processor and a large memory size, which means expensive
hardware is needed.
2. Cost of Data Conversion: When a computer file-based system is replaced with a
database system, the data stored in the data files must be converted into database
files. Converting the data of data files into a database is a difficult and
time-consuming process.
3. Cost of Staff Training: DBMS are often complex systems, so training is required for
the users to use the DBMS. The organization has to pay a large amount for the
training of workers to run the database management system.
4. Problem Associated with the Centralization: When data is centralized and
accessed by every user, it is harder to secure. So, data can be lost
or damaged.
5. Complexity of Backup and Recovery: In a multiuser database system, taking or loading a
backup while the system is in operation is complex, and it
may affect the users who are working with the database.
6. Confidentiality, Privacy and Security: When data is accessible from a remote
location (that is, when the database system is centralized), the possibility of misuse of data
increases compared to a conventional system.
7. Data Quality: Suitable and sufficient controls are required over the users who
are updating data, in order to control the data quality. Direct access to data by various
users creates many opportunities for users to damage the data. So, if suitable
controls are not available, the data may be compromised.
Unit-2 Database Administration

Introduction
Purpose of Database Administration
Concept of Database Administration
Transaction Management, Properties of Transaction (ACID properties)

Q-1 What is database administration, and what is the purpose of database administration?
A-1 Database administration consists of everything required to manage your database and make it
available as needed.

The database administrator (DBA) is the person who manages, backs up and ensures the
availability of the data produced and consumed by today's organizations via their IT systems.

For example, he or she is the person who ensures that bank tellers have easy, fast access to
your information, and can quickly access your bank balance and transaction history. In this
example, the DBA is a system or application database administrator, a general DBA role
responsible for most aspects of the organization's databases.

Data administrator (DA) is a person who is responsible for processing data into a convenient data
model. The person is in charge of figuring out which data is relevant to be stored in the database.

Data administrator is less of a technical role and more of a business role with some technical
knowledge.

Purpose of database administration

In small organizations, the database administrator or the DBA is responsible for undertaking data
administration tasks.

The core responsibility of DBA includes administration of the database, administration of the
database management system or DBMS and the administration of the database environment.

The purpose of database administration is as follows:


1. Administration of the Database: The DBA performs the following activities in relation
to administering database or a series of databases:

● Physical design: The DBA is concerned with the physical design and
implementation of the database and not the conceptual and logical design of
the database systems.
● Data standards and documentation: The DBA ensures that the physical data
is documented in a standard manner so that multiple applications and users
can easily and effectively access it.
● Monitoring data usage and tuning database structures: The DBA also
performs the task of monitoring the live workload running against a database and
modifying schemas and structures to increase the performance of the database
systems.
● Data archiving: The DBA has to implement a strategy to archive
the “dead” data.
● Data Backup and Recovery: It is the DBA who establishes and implements the
procedures to back up data and to recover it fully in the event of hardware or
software failure.

2. Administration of the DBMS: In relation to administering a DBMS, a DBA must
perform the following key tasks:

● Installation: The DBA is responsible for installing the DBMS and its components.

● Configuration control: The DBA enforces policies and procedures for managing
updates and changes to the software of the database system.
● Monitoring DBMS usage and tuning DBMS: The DBA monitors the live workload running
against the DBMS and tunes the elements of the DBMS to enhance the
performance of the system.

3. Administration of the Database Environment: With respect to administering the
database environment, the DBA must monitor and control access to the database and
DBMS by users and application systems, and must perform activities including:

● Data control: The DBA must establish user groups, assign passwords, grant access to
DBMS facilities and grant access to databases to the users and application
systems.
● Impact assessment: The DBA assesses the impact of any changes in the use of the
database systems.
● Privacy, Security, and Integrity: The DBA must ensure that the strategies laid down
for data integrity, security, and privacy are adhered to at the physical level.
● Training: The DBA must manage the education and training of users with respect to
the principles and policies of database use.
Q-2 What is the concept of Database Administration?


A-2 A database administrator's (DBA's) primary job is to ensure that data is available, protected
from loss and corruption, and easily accessible as needed. Below are some of the chief
responsibilities that make up the day-to-day work of a DBA.
1. Software Installation and Maintenance: A DBA often collaborates on the initial
installation and configuration of a new Oracle, SQL Server, etc. database. The system
administrator sets up the hardware and deploys the operating system for the database server;
then the DBA installs the database software and configures it for use. As updates and
patches are required, the DBA handles this ongoing maintenance. And if a new server is
needed, the DBA handles the transfer of data from the existing system to the new
platform.
2. Data extraction, Transformation and Loading: Known as ETL, data extraction,
transformation and loading refers to efficiently importing large volumes of data that have
been extracted from multiple systems into the data warehouse environment. This
external data is cleaned up and transformed to fit the desired format so that it can be
imported into a central repository.
3. Specialized Data Handling: Today's databases can be massive and may contain
unstructured data types such as images, documents, or sound and video files. Managing a
very large database (VLDB) may require higher-level skills and additional monitoring
and tuning to maintain efficiency.
4. Database Backup and Recovery: DBAs create backup and recovery plans and
procedures based on industry best practices, then make sure that the necessary steps are
followed. Backups cost time and money, so the DBA may have to persuade management
to take the necessary precautions to preserve data.
5. System admins or other personnel may actually create the backups, but it is the DBA's
responsibility to make sure that everything is done on schedule. In case of a server failure
or other form of data loss, the DBA will use existing backups to restore lost information to
the system. Different types of failures may require different recovery strategies, and the
DBA must be prepared for any possibility. As technology changes, it is becoming ever
more typical for a DBA to back up databases to the cloud: Oracle Cloud for Oracle
databases and MS Azure for SQL Server.
6. Security: A DBA needs to know the potential weaknesses of the database software and the
company's overall system and work to minimize risks. No system is 100% immune to
attacks, but implementing best practices can minimize risks. In the case of a security breach
or irregularity, the DBA can consult audit logs to see who has done what to the data.
Audit trails are also important when working with regulated data.
Setting up employee access is an important aspect of database security. DBAs control
who has access and what type of access they are allowed. For instance, a user may have
permission to see only certain pieces of information, or they might be denied the ability
to make changes to the system.
7. Capacity Planning: The DBA needs to know how large the database currently is and how
fast it is growing in order to make predictions about future needs. Storage refers to how
much room the database takes up in server and backup space. Capacity refers to usage
level. If the company is growing quickly and adding many new users, the DBA will have
to create the capacity to handle the extra workload.
8. Performance Monitoring: Monitoring databases for performance issues is a part of the
ongoing system maintenance a DBA performs. If some part of the system is slowing down
processing, the DBA may need to make configuration changes to the software or add
additional hardware capacity. Many types of monitoring tools are available, and part of
the DBA's job is to understand what they need to track to improve the system. Third-party
organizations can be ideal for outsourcing this aspect, but make sure they offer modern
DBA support.
9. Database Tuning: Performance monitoring shows where the database should be tweaked
to operate as efficiently as possible. The physical configuration, the way the database is
indexed and how queries are handled can all have a dramatic effect on database
performance. With effective monitoring, it is possible to proactively tune a system based
on application and usage instead of waiting until a problem develops.
10. Troubleshooting: DBAs are on call for troubleshooting in case of any problems.
Whether they need to quickly restore lost data or correct an issue to minimize damage, a
DBA needs to quickly understand and respond to problems when they occur.
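
The backup and recovery responsibility described above can be sketched on a very small scale with Python's sqlite3 module, whose Connection.backup method copies one database into another. This is only an illustration under simple assumptions: both databases are in memory, and the orders table and its rows are made up. A production DBA would use the backup tools of the specific DBMS.

```python
import sqlite3

# A "live" database with some hypothetical order data.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT)")
live.execute("INSERT INTO orders (item) VALUES ('bread'), ('milk')")
live.commit()

# Take a backup: copy the whole live database into a second database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate data loss on the live system.
live.execute("DELETE FROM orders")
live.commit()
lost = live.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Recovery: copy the backup over the damaged live database.
backup.backup(live)
restored = live.execute("SELECT item FROM orders ORDER BY order_id").fetchall()

print(lost, restored)
```

Anything written after the backup was taken would not be recovered by this restore, which is one reason backups have to run on a schedule.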

Q-3 What is Transaction Management.

A-3 Introduction: Transactions are collections of operation that forms a single logic unit of
work put stop in case of system failure, a transaction makes sure that after recovery the data will
be in a consistent state.

The transaction represents any real-world event of an organization where concurrency control is
required.

Transaction processing system execute database transactions using large number of databases
and hundreds of concurrent users.

Every transaction is delimited by statements or function calls of the form begin transaction and
end transaction.

The reliability of a DBMS is linked to the reliability of the computer system, so some solution must
be there to deal with computer system failures.

The recovery system, the main component of the transaction management/processing unit, deals with
such failures. It makes the database fault tolerant.

A number of transactions can execute at the same time and access the same database.
Such concurrent access to the database may result in an inconsistent state of the database.
The concurrency control/management unit of transaction management preserves the consistency of
the database in case of concurrent accesses.
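
The risk described above can be illustrated with a minimal sketch. It interleaves two transactions by hand on a single data item, with no concurrency control at all, and compares the result with a serial execution. The dictionary `db`, the starting value 100 and the amounts 30 and 50 are hypothetical values chosen only for illustration.

```python
def lost_update_demo():
    """Interleave two transactions by hand, with no concurrency control."""
    db = {"A": 100}          # hypothetical database with one item A
    t1_local = db["A"]       # T1: Read(A)  -> 100
    t2_local = db["A"]       # T2: Read(A)  -> 100 (reads before T1 writes)
    t1_local -= 30           # T1: A = A - 30
    t2_local += 50           # T2: A = A + 50
    db["A"] = t1_local       # T1: Write(A) -> 70
    db["A"] = t2_local       # T2: Write(A) -> 150, T1's update is lost
    return db["A"]


def serial_demo():
    """Run the same two transactions one after the other."""
    db = {"A": 100}
    a = db["A"]              # T1 runs to completion first
    db["A"] = a - 30
    a = db["A"]              # T2 then starts from T1's result
    db["A"] = a + 50
    return db["A"]


print(lost_update_demo())    # 150
print(serial_demo())         # 120
```

The interleaved run ends with 150 instead of the 120 that any serial order would produce. This kind of anomaly is what the concurrency control unit is there to prevent.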

Concept of Transaction Processing

1. Program Execution: A transaction is the execution of a program that accesses or
changes the contents of a database.
2. Information Dividing Processing: Transaction processing means dividing information
processing up into individual, indivisible operations, called transactions.
3. Maintain Database Integrity: Transaction processing is designed to maintain database
integrity (the consistency of related data items) in a known, consistent state.
4. A Whole Transaction: A transaction, a typical example of which would be a customer's
order, consists of a series of events (accepting the order, allocating stock and so forth) that
are treated as a whole.
A transaction is the logical unit of work on the database that is either completed in its
entirety (COMMIT) or not done at all. In the latter case the transaction has to
clean up its own mess, known as ROLLBACK. A transaction could be an entire
program, a portion of a program or a single command.
5. Database Access: A transaction contains single or multiple database access operations
like insertion, deletion, modification or retrieval.
6. A Single Application Program: A single application program with several transaction
boundaries may contain more than one transaction.
7. A Read-Only Transaction: A read-only transaction is one in which the database
operations only retrieve data and do not update it.
8. READ and WRITE: A transaction is a sequence of read and write actions grouped
together to form a database access.
9. Large Databases and Concurrent Users: Transaction processing systems are systems
that have large databases and many concurrent users who execute database
transactions. Examples are reservation systems, banking systems, credit card processing
systems, stock market systems and supermarket checkout systems. These systems need
fast response times and high availability to serve hundreds of concurrent users.
10. Alter State: A transaction that changes the contents of a database must alter the state of the
database from one consistent state to another. A consistent database state maintains data
integrity.

Operations in Transaction

The main operations in a transaction are:

1. Read operation
2. Write operation

1. Read Operation: A read operation reads the data from the database and then stores it in the
buffer in main memory.
For example, the Read(A) instruction will read the value of A from the database and will
store it in the buffer in main memory.

2. Write Operation: A write operation writes the updated data value back to the database
from the buffer.
For example, Write(A) will write the updated value of A from the buffer to the
database.

Simple Transaction Examples:


1. Read your account balance.
2. Deduct the amount from your balance.
3. Write the remaining balance to your account.
4. Read your friend's account balance.
5. Add the amount to his account balance.
6. Write the new updated balance to his account.
This whole set of operations can be called a transaction. Although the example above shows read, write and
update operations, the operations of a transaction can be expressed as reads and
writes.

In DBMS, we write the above 6 steps transaction like this: let's say your account is A and your
friend's account is B, you are transferring 10,000 from A to B, the steps of the transactions are:

1. R(A);
2. A= A - 10000;
3. W(A);
4. R(B);
5. B= B + 10000;
6. W(B);

In the above transaction, R refers to the read operation and W refers to the write operation.
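
The six steps above can be sketched as a toy model in which reads copy a value from the database into a main-memory buffer and writes copy it back. The class `SimpleDB`, its `disk` and `buffer` attributes, and the starting balances are assumptions made for illustration. They are not part of any real DBMS.

```python
class SimpleDB:
    """Toy model: 'disk' stands for the database, 'buffer' for main memory."""

    def __init__(self, data):
        self.disk = dict(data)
        self.buffer = {}

    def read(self, item):
        # R(X): copy the value of X from the database into the buffer
        self.buffer[item] = self.disk[item]
        return self.buffer[item]

    def write(self, item, value):
        # W(X): place the updated value in the buffer and write it to the database
        self.buffer[item] = value
        self.disk[item] = value


def transfer(db, amount):
    a = db.read("A")         # 1. R(A)
    a = a - amount           # 2. A = A - amount
    db.write("A", a)         # 3. W(A)
    b = db.read("B")         # 4. R(B)
    b = b + amount           # 5. B = B + amount
    db.write("B", b)         # 6. W(B)


db = SimpleDB({"A": 50000, "B": 20000})   # hypothetical opening balances
transfer(db, 10000)
print(db.disk)               # {'A': 40000, 'B': 30000}
```

The total of the two balances is the same before and after the transfer, which is the consistency the transaction is expected to preserve.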

Transaction States

There are different states that a transaction can have. These include:
1. Active: This is the initial state of a transaction. A transaction enters it when it begins and
remains in this state while it is running.
2. Abort: A transaction that cannot continue in the active (running) state is aborted. When a transaction is
aborted, the system must return to the state it was in when the transaction began, and any
changes caused by the aborted transaction must be rolled back.
3. Commit: When a transaction is permanently completed, it is said to be in the commit state. Once the
transaction has been committed, any changes caused by the transaction cannot be
undone.
4. Partially Committed: A transaction is partially committed when all statements in the
transaction have been executed.
5. Failed: When a transaction encounters some abnormal or error condition and cannot
continue, it is said to be in a failed state.
Q-4 Describe ACID properties in detail.

A-4 To ensure the integrity of data, the database system maintains the following properties of
transactions.
1. Atomicity: By this, we mean that either the entire transaction takes place at once or it doesn't
happen at all. There is no midway, i.e., transactions do not occur partially. Each transaction
is considered as one unit and either runs to completion or is not executed at all. It
involves the following two operations:
A. Abort: If a transaction aborts, changes made to the database are not visible.
B. Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the 'All or Nothing Rule'.

2. Consistency: The consistency property of a transaction implies that if the database was
in a consistent state before the start of a transaction, then on termination of the transaction, the
database will also be in a consistent state. In other words, the database
works like a state machine: it must ensure that all data is consistent at all times with
all rules.

3. Isolation: The isolation property of a transaction indicates that actions performed by a
transaction will be hidden from outside the transaction until the transaction terminates.
Thus, each transaction is unaware of other transactions executing concurrently in the
system. In other words, queries and transactions always run as of a point in time. You can
query data while many other users are changing data, and you will not see their changes
and they will not see each other's changes.

4. Durability: The durability property of a transaction ensures that once a transaction completes
successfully (commits), the changes it has made to the database persist, even if there are
system failures. Write-ahead logs provide data durability until data is eventually
written into permanent data and index files.
These four properties are often called the ACID (Atomicity, Consistency, Isolation, Durability)
properties of a transaction. The ACID properties are intended to guarantee valid database
transactions, even if there are network errors, disruptions, hardware failures, etc. For this
reason, ACID-compliant databases are important for organizations in many different types of
industries, especially those that conduct monetary transactions, handle time-sensitive data, or
manage/monitor data in manufacturing, transportation or energy production.

Example for Atomicity:
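
As a hedged illustration of atomicity, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name `account`, the balances, and the `CHECK (balance >= 0)` rule are assumptions chosen for this example. The transfer credits the receiver first, so an overdraft is detected after one update has already been made, and the rollback then undoes that partial change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"   # rule: no negative balances
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 5000), ("B", 2000)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing unit."""
    try:
        # Credit first, so that a failure can occur in the middle of the transaction.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        conn.commit()        # COMMIT: both updates become permanent together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()      # ROLLBACK: the credit to dst is undone as well
        return False


failed = transfer(conn, "A", "B", 10000)    # A only has 5000, so this must fail
after_failure = dict(conn.execute("SELECT name, balance FROM account"))
print(failed, after_failure)                # False {'A': 5000, 'B': 2000}

succeeded = transfer(conn, "A", "B", 1000)
after_success = dict(conn.execute("SELECT name, balance FROM account"))
print(succeeded, after_success)             # True {'A': 4000, 'B': 3000}
```

After the failed transfer neither balance has changed, even though the credit to B had already been executed inside the transaction.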

Q-5 What are the States of Transaction?

1. Active State –
When the instructions of the transaction are running, the transaction is in the active
state. If all the ‘read and write’ operations are performed without any error, it
goes to the “partially committed state”; if any instruction fails, it goes to the “failed
state”.

2. Partially Committed –
After completion of all the read and write operations, the changes are made in main
memory or the local buffer. If the changes are made permanent on the database, the
state will change to the “committed state”, and in case of failure it will go to the “failed
state”.

3. Failed State –
A transaction goes to the “failed state” when any of its instructions fails, or when a failure
occurs while making a permanent change of data on the database.

4. Aborted State –
After any type of failure the transaction goes from the “failed state” to the “aborted
state”. Since in the previous states the changes were only made to the local buffer or main
memory, these changes are deleted or rolled back.

5. Committed State –
It is the state in which the changes are made permanent on the database and the
transaction is complete and therefore moves to the “terminated state”.

6. Terminated State –
If there isn’t any roll-back or the transaction comes from the “committed state”, then
the system is consistent and ready for a new transaction, and the old transaction is
terminated.
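
The state changes described above can be sketched as a small state machine. The `State` enum and the `TRANSITIONS` table are illustrative names. The step from aborted to terminated is an assumption that follows the usual textbook diagram, since the notes do not spell it out.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    FAILED = "failed"
    ABORTED = "aborted"
    COMMITTED = "committed"
    TERMINATED = "terminated"


# Allowed moves between states, following the six descriptions above.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.ABORTED: {State.TERMINATED},      # assumption: aborted work ends here
    State.COMMITTED: {State.TERMINATED},
    State.TERMINATED: set(),
}


def is_valid_path(path):
    """Return True if path starts in ACTIVE and uses only allowed moves."""
    if not path or path[0] is not State.ACTIVE:
        return False
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))


success = [State.ACTIVE, State.PARTIALLY_COMMITTED, State.COMMITTED, State.TERMINATED]
failure = [State.ACTIVE, State.FAILED, State.ABORTED, State.TERMINATED]
illegal = [State.ACTIVE, State.COMMITTED]   # skips the partially committed state

print(is_valid_path(success), is_valid_path(failure), is_valid_path(illegal))
# True True False
```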

Unit 3 Data Warehousing



Q-1 What is Data Warehousing? And explain the purpose of Data Warehousing.
A-1 Introduction: A database contains information organized in columns, rows, and
tables that is periodically indexed to make relevant information easier to access.
Many enterprises and organizations create and manage databases using a database management
system. Special DBMS software can be used to create and store product inventory and
customer information.
Organizations most often use a database for online transaction processing (OLTP).
A database is built to store current transactions and enable fast access to specific transactions
for ongoing business processes.
The term “Data Warehouse” (DW) was first coined by Bill Inmon in 1990. According to Inmon,
a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data.
This data helps analysts to take informed decisions in an organization.
According to Ralph Kimball, a data warehouse is a copy of transaction data specifically structured
for queries and analysis.
Therefore, a data warehouse can be thought of as a collection of prepacked or summarized data
which is developed as per particular business rules. It is also designed in such a way that it helps
in taking management decisions.
The following types of information are stored in a data warehouse:
1. Business Information: It stores business information which has been sourced from
throughout the organization, encompassing all aspects of the company's products,
processes and customers.
2. Standard Reports and Queries: Most of the users of the data warehouse require a set of
standard reports and queries. Thus, it becomes very useful when these standard reports
are produced periodically and automatically. In such a case, the user just views the report
whenever required instead of taking the time to run the report every time. This saves time
as well as increases manpower productivity.
Purpose of Data Warehousing
The main purpose of a data warehouse is to provide companies with accessibility to their data. It also
supports the analysis of data to derive accurate business visions and forecasting models.
A data warehouse is an important business intelligence tool that allows organizations to:
1. Ensure Consistency: Data warehouses are used to apply a uniform format to all data to
make it easier for business decision makers to analyze the data and share insights on
the data. This also reduces the risk of error in interpretation and improves overall data
accuracy.
2. Make Better Business Decisions: Successful business leaders develop data-driven
strategies and rarely make decisions without consulting the facts. Data warehousing improves the
accessibility and efficiency of data and is helpful in making better company decisions.
3. Improve Bottom Line: Data warehouse platforms allow businesses to access the historical activity of an
organization to learn which initiatives have been successful or unsuccessful in the
past. This enables executives to adjust their strategies, decrease cost and maximize
the efficiency of the organization.
Q-2 Explain Data Warehousing Concepts.
A-2 The basic concept of a data warehouse is to enable a company to make better decisions and
forecast business outcomes.
A data warehouse is an information system that contains historical and cumulative data from a
single source or multiple sources, and it helps to simplify the reporting and analysis process of an
organization.
The different concepts of Data Warehousing are:
1. Dimensional Data Model: It is commonly used in data warehouse systems and is
designed to read, summarize and analyze numeric information. This model helps to arrange
data in a manner that makes it easy to retrieve the information.
The two schemas are:
i. Star Schema: This is easy to design, with the center of the star consisting of a fact
table and the points of the star as dimension tables. The fact tables are in third
normal form and the dimension tables are denormalized.
ii. Snowflake Schema: This is an extension of the star schema where each dimension is
normalized and connected to more dimension tables.

2. Conceptual Data Model: This model identifies the highest-level relationships between
the several entities. This describes the database at a very high level and is useful to
understand the requirements of the database.
This model is used in the requirement gathering process before the database designer
starts making a specific database.

Features of Conceptual Data Model Include:

i. Necessary entities and relationships among them.


ii. No attribute is defined
iii. No primary key is defined

3. Logical Data Model: This model describes the data in as much detail as possible,
irrespective of how it is physically implemented in the database.

Characteristics of a logical data model include:

i. All entities and relationships among them.


ii. All attributes of each entity are defined
iii. The primary key for every entity is defined.
iv. Foreign keys are defined
v. Normalization occurs at this level.

4. Physical Data Model: This model represents how the model will be built in the database. This
model shows all table structures, including column name, column data type, column
constraints, primary keys, foreign keys and relationships between tables.

Characteristics of physical data model include:

i. Specification of all tables.


ii. Foreign keys which specify relationships between tables.
iii. Denormalization may occur based on user needs.
iv. Physical consideration may cause this model to be quite different from the logical
data model.
v. This model will be different for different relational database management
systems. For example, the data type for a column may be different for MySQL and SQL
Server.
5. OLAP (Online Analytical Processing): OLAP is a tool that enables users to easily
extract and query data in order to analyze it. OLAP business intelligence queries often
help in trend analysis, financial reporting, sales forecasting, budgeting, and many others.
For example, a user can request data analysis in the form of a spreadsheet illustrating all
the company's products sold in India in the month of July, compare revenue figures
with the same products in September, and then see a comparison chart of product sales in
India in similar time periods.
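
The star schema and the OLAP example above can be sketched together using Python's built-in `sqlite3` module. All table names (`fact_sales`, `dim_product`, `dim_region`, `dim_date`), products and revenue figures are hypothetical and exist only to make the query runnable. A real warehouse would hold far more data and usually sit behind a dedicated OLAP tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_region  (region_id  INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, month TEXT);

-- Fact table: the center of the star, holding the numeric measure
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    region_id  INTEGER REFERENCES dim_region(region_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    revenue    INTEGER
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
INSERT INTO dim_region  VALUES (1, 'India'), (2, 'Nepal');
INSERT INTO dim_date    VALUES (1, 'July'), (2, 'September');
INSERT INTO fact_sales  VALUES
    (1, 1, 1, 100), (1, 1, 2, 150),
    (2, 1, 1, 200), (2, 1, 2, 180),
    (1, 2, 1, 999);
""")

# Revenue per product in India, July compared with September.
query = """
SELECT p.name,
       SUM(CASE WHEN d.month = 'July'      THEN f.revenue ELSE 0 END) AS july,
       SUM(CASE WHEN d.month = 'September' THEN f.revenue ELSE 0 END) AS september
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_region  r ON r.region_id  = f.region_id
JOIN dim_date    d ON d.date_id    = f.date_id
WHERE r.country = 'India'
GROUP BY p.name
ORDER BY p.name
"""
rows = conn.execute(query).fetchall()
print(rows)   # [('Notebook', 200, 180), ('Pen', 100, 150)]
```

The query joins the fact table to each dimension once, which is the typical access pattern a star schema is designed for.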

Q-3 Explain the Need of Data Warehousing.


A-3 Data warehousing technology is becoming essential for effective business intelligence as it enables easy
organization and maintenance of a large amount of data, in addition to fast retrieval and analysis
in the manner and depth required from time to time.
The following points show the need for and importance of data warehousing:
a. A data warehouse helps business users to access critical data from several sources all in one
place.
b. It provides consistent information on various cross-functional activities.
c. It helps you to integrate many sources of data to reduce stress on the production system.
d. It helps users to reduce total turnaround time for analysis and reporting.
e. It helps users to access and retrieve data from different sources in a single place, so it saves
users time in retrieving data from multiple sources. You can also access data
from the cloud easily.
f. It allows you to store a large amount of historical data to analyze different periods and
trends to make future predictions.
g. It enhances the value of operational business applications and customer relationship
management systems.
h. It separates analytical processing from transactional databases, improving the
performance of both systems.
i. It provides more accurate reports.
Short Note on OLAP
Online analytical processing (OLAP) is a category of software technology that enables analysts,
managers and executives to gain insight into data through fast, consistent, interactive access to a
wide variety of possible views of information that has been transformed from raw data to reflect
the real dimensionality of the enterprise as understood by the user.
For example, a user can request data analysis in the form of a spreadsheet illustrating all of the
company's products sold in India in the month of July, compare revenue figures with the same
products in September and then see a comparison chart of product sales in India in similar
time periods.
Unit 4 Data analytics and Data mining

Q-1 What is Cloud Computing?


A-1 Cloud computing is the on-demand delivery of computing services, from applications to storage
and processing power, typically over the Internet and on a pay-as-you-go basis.
Instead of buying, owning, and maintaining physical data centers and servers, you can access
technology services, such as computing power, storage and databases, on an as-needed basis
from a cloud provider like Amazon Web Services (AWS).
Cloud computing uses servers that provide a large number of different types of service: they can
be used to store and manage data, run applications, and deliver content. They can deliver video
streaming, they can provide web email service, they can run office or general productivity
software.
Instead of accessing files, data and programs on a local personal computer, they are accessed
over the network and stored or run on the cloud system.
The word cloud is used to refer to the Internet. Thus, it means Internet-based computing where
different services like servers, storage and applications are made available to an organization's
devices via the Internet.
Cloud computing is a computing model through which a large number of systems are connected
by public or private networks which gives a dynamically scalable infrastructure for data,
application and file storage.
Organizations of every type, size, and industry are using the cloud for a wide variety of use
cases, such as data backup, disaster recovery, e-mail, virtual desktops, software development
and testing, big data analytics, and customer-facing web applications.
For example, healthcare companies are using the cloud to develop more personalized treatments
for patients.
Financial services companies are using the cloud to power real time fraud detection and
prevention.
Video game makers are using the cloud to deliver online games to millions of players around the
world.
Whenever you travel by bus or train, you take a ticket for your destination and stay
in your seat till you reach your destination. Likewise, other passengers also take a ticket and
travel in the same bus with you, and it hardly bothers you where they go. When your stop comes,
you get off the bus, thanking the driver. Cloud computing is just like that bus, carrying data and
information for different users and allowing them to use its service at minimal cost.

Purpose of Cloud Computing


1. Large Service: Cloud computing offers a larger range of services than other options.
2. Subject Demand: When demand for the same subject increases, the service is not
affected and the servers do not crash.
3. Saves Money or Lowers IT Cost: Cloud computing lets you offload some or most of the costs and effort of
purchasing, installing, configuring, and managing your own on-premises infrastructure.
4. Improves Agility: The cloud gives you easy access to a broad range of technologies so that
you can innovate faster and build nearly anything that you can imagine. You can
quickly spin up resources as you need them, from infrastructure services, such as
compute, storage, and databases, to Internet of Things, machine learning, data lakes
and analytics, and much more.
5. Cost Saving: The cloud allows you to trade fixed expenses (such as data centers and
physical servers) for variable expenses, and only pay for IT as you consume it. Plus, the
variable expenses are much lower than what you would pay to do it yourself because of
the economies of scale.
6. Deploy Globally in Minutes: With the cloud, you can expand to new geographic regions
and deploy globally in minutes. For example, AWS has infrastructure all over the world, so
you can deploy your application in multiple physical locations with just a few clicks.
Putting applications in closer proximity to end users reduces latency and improves their
experience.
Cloud Computing Concepts.
Cloud concept includes:
a. Cloud computing service models
b. Cloud computing deployment models
The services offered by cloud computing can be divided into three major classes:
1. IaaS (Infrastructure as a Service): The basic storage and computing facilities are
provided by IaaS in the form of a standardized service over the network.

Different services such as servers, storage systems, networking equipment, data
center space and so on are combined for managing the workloads.

On this infrastructure, personalized software can be developed by the users. Examples
are Amazon, GoGrid, 3Tera and so on.

IaaS can be seen as physical server space which can be rented, and it is maintained
in the data center of the vendor.
Any sort of legal software can be uploaded to this server by the customer, and his
staff and clients can then be given access as he chooses.

Different computing resources and storage are provided by IaaS, which can be utilized by
developers and IT firms for delivering business solutions.

2. PaaS (Platform as a Service): In this type of service the software layer or development
environment is combined and offered in the form of a service, and it can be used as the
foundation for other, higher-level services.

The customers are provided with the option of creating their own applications, which can
be run on the infrastructure of the service provider.

In order to address the manageability and scalability requirements, a predefined
combination of OS and application server, like the LAMP platform (Linux, Apache, MySQL
and PHP), restricted J2EE, Ruby etc., is provided by the PaaS service provider.

Typical examples of these are Google's App Engine, Force.com, etc. Platform as a
service is the place where different operating systems, for example Windows, Android,
BSD, iOS, Linux, Mac and IBM, are installed on the cloud instead of being installed on
physical hardware.

Standard remote services are provided by PaaS, which can be used by developers
to create applications on the computing infrastructure.
It can consist of different developer tools which are provided as a service and can be
utilized to create services, data access and data services, or billing services.

3. SaaS (Software as a Service): This is the model where the entire application is provided
to the customers in the form of a service on demand.

On the cloud a single service instance is executed, and different users can be served.
For customers, there is no requirement of any type of initial investment in servers or
software licenses. On the other hand, the cost is reduced for the provider as there is only
one application which has to be hosted and maintained.

Companies such as Google, Salesforce, Microsoft, Zoho, etc. are SaaS providers.
In the case of SaaS, hosting of the software is managed by the service provider, and thus
there is no requirement of installing it, managing it or buying the hardware. The customer can
directly connect and start using it.

Typical examples of SaaS are customer relationship management (CRM) as a service,
payroll software, logistics software, order management software or any sort of software
which can be hosted over the Internet rather than installed physically on the computer
of the user.
Organizations are moving towards cloud computing, and they often begin
with remote delivery of emails and online backup of business information.

Cloud Computing Deployment Models

There are three main deployment models in the cloud environment.

1. Public Cloud: Third parties own and manage
these clouds.
A greater economy of scale at a lower price is offered to the users, because infrastructure
costs are spread across many users, which lowers the cost for every individual
client with the help of the pay-as-you-go model.

The same infrastructure pool, with limited configuration options, security protections and
availability variances, is shared by every client. The cloud provider manages and supports
these.

The main benefit of the public cloud is that these clouds may be larger
than enterprise clouds, and thus seamless scaling is provided
on demand.

2. Private Cloud: These are created for a single organization only. The main
focus remains on data safety and improved control, which are not provided in
public clouds. Private clouds are mainly of two types:

i. On-Premises Private Cloud: These are also termed internal clouds, and the hosting is
done within the data center owned by the enterprise. Greater standardization and
protection are provided by this model, but in terms of size and scalability it has some
limitations.
For the physical resources, the IT department will incur some capital and operational
costs.

ii. Externally Hosted Private Cloud: The hosting in this sort of private cloud is done
by an external cloud provider, which supplies an exclusive cloud environment
with full privacy. This kind of cloud is mainly preferred by those
organizations which do not prefer public clouds due to physical resource sharing.

3. Hybrid Cloud: This is mainly the combination of both public and private cloud
models. Third-party cloud providers can be used by hybrid cloud service providers
fully or partially, thereby offering greater flexibility.
This kind of environment can provide on-demand, externally provisioned scale. Any
sort of increase in the workload can be managed effectively by combining a private
cloud with public cloud resources.

Advantages of cloud computing


1. Cost Reduction: Cloud computing is seen as an incremental investment;
companies can save money in the long term by obtaining resources as they are needed.
2. Storage Increase: Instead of purchasing large amounts of storage before
the need, organizations can increase storage incrementally, requesting
additional disk space from the service provider when the need is recognized.
3. Resource Pooling: In the IT industry, this feature is also known as
multi-tenancy, where many users/clients share varied types and levels of
resources.
4. Highly Automated: As the software and hardware requirements are
hosted by a cloud provider, IT departments no longer have to worry
about keeping things up to date and available.
5. Greater Mobility: Once the information is stored in the cloud, access is
quite simple. Users just need an Internet connection, regardless of where they
are located.
6. Change the IT Focus: Once the responsibility for the computing
environment has essentially shifted to the cloud provider, the IT
department can focus more on the organization's needs and the
development of strategic applications and tactics, and not on day-to-day
operational needs.
7. Towards Green IT: By releasing physical space, virtualization of
applications and services contributes to a reduction in equipment as well
as in the need for air conditioning; consequently, less energy is used.
8. Keeping Things Updated: Similar to the change in IT focus, this benefit
arises because cloud service providers are responsible for
monitoring and maintaining the most recent tools and techniques
for their customers.
9. Quick Elasticity: This characteristic has to do with the fundamental
aspects of cloud flexibility and elasticity. For example, web shops carry
a standard number of transactions during the year, but need to
increase capacity at Christmas time. And of course, these stores do not want to pay
for that peak capacity during the rest of the year.
10. Measured Service: It means services are monitored, controlled and
reported. This feature allows a pay-per-use service model.
It has similarities with telephone service packages, where
you pay a standard subscription for basic levels and pay extra for
additional services, without changing the contract.
11. Offsite Backups: One of the advantages of cloud services is that the data
can be held off site and will be backed up by the cloud service provider.
This enables the cloud to provide secure offsite data storage. It is
essential to ensure that backups are made of your data and that they are
held off site so that data can be recovered in the case of major disasters.
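
The web shop example under Quick Elasticity can be made concrete with a small arithmetic sketch. The demand figures, the unit price of 2, and the assumption that owned and rented capacity cost the same per unit are all hypothetical. Real prices differ, and as noted under the limitations, renting can work out more expensive in some cases.

```python
def fixed_capacity_cost(monthly_demand, cost_per_unit):
    """Cost of owning enough capacity for the peak month, all year round."""
    return max(monthly_demand) * cost_per_unit * len(monthly_demand)


def pay_per_use_cost(monthly_demand, cost_per_unit):
    """Cost when capacity follows demand and only actual use is paid for."""
    return sum(monthly_demand) * cost_per_unit


# Hypothetical shop: 100 units of capacity in a normal month, 400 in December.
demand = [100] * 11 + [400]

fixed = fixed_capacity_cost(demand, cost_per_unit=2)    # 400 * 2 * 12 = 9600
elastic = pay_per_use_cost(demand, cost_per_unit=2)     # 1500 * 2     = 3000
print(fixed, elastic)   # 9600 3000
```

Under these assumptions, sizing for the December peak all year costs more than three times as much as paying only for what is used each month.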
Disadvantages or Limitations of Cloud Computing
1. Renting can be more expensive over a Long Term: Although there can be economies of
scale, renting a service can sometimes be more expensive over the longer term.
Costs need to be carefully viewed and balanced against buying, and the other advantages
of using the cloud.
2. Storage of Sensitive Data: Some companies have concerns about the security of their data.
When held on the cloud, data can be exposed to data breaches, and this may mean that
cloud services are not suitable for some or all of their operations.
3. Migrating data to the cloud may not be Easy: It is normally easy to set up and start using
a new cloud application. However, migrating existing data or applications to the cloud
can be quite involved and it may be more expensive than it is anticipated.
4. Internet Connection is Required: To use public cloud capabilities a good Internet
connection is required. If the connection goes down as sometimes happens, the cloud
facilities will not be available.
5. Location and Data Privacy: Where is the data stored? How is the data stored? Does the
provider have adequate security for data in the places where it is stored?
