Lecture Notes CSC3170 2025 Part 1
Relational Model
▪ The relational model is widely recognized as one of
the great technical achievements of the 20th century
▪ Essentially all databases in use or under
development today are based on Codd’s ideas
• Whenever anyone uses an ATM, purchases an airline ticket, or
uses a credit card, they are effectively relying on Codd’s
invention
▪ The relational model was the very first abstract
database model to be defined
• Codd not only invented the relational model in
particular, he actually invented the data model
concept in general
Transaction Processing
▪ Plays a major role in IBM’s relational System R,
creating a unified approach to the problems of
concurrency control and crash recovery
▪ Defines the transaction as a unit of work that must
leave the database in a consistent state whether or
not the transaction succeeds
▪ Develops techniques that allowed concurrent
execution of many transactions, as well as restart
after crashes, while maintaining the consistency of
the database
Relational Technology
▪ Contributes to the refinement and spread of
database management technology
▪ After reading Codd’s seminal papers on the
relational model, Stonebraker started work with a
colleague, Eugene Wong, to develop an efficient
and practical implementation
▪ The result was INGRES (Interactive Graphic and
Retrieval System), and a prototype of INGRES was
working by 1974
▪ INGRES and System R together helped to turn
relational systems from a laboratory curiosity into
the default choice for even the most demanding
data processing applications
Based on Database System Concepts - 7th Edition 1.6
Structured and Unstructured Textual Information
▪ A Database Management System (DBMS) is a complex software system
whose task is to manage a large, complex collection of primarily structured
information
▪ Structured information is organized in discrete units (entities)
▪ Entities of the same type are organized in some predefined ways:
• They have the same number of attributes
• Each attribute has the same predefined format, e.g. the number of
bytes required to store it
▪ Unstructured information refers to data that
• Does not have a rigid format
• Does not distinguish information into specific items
• Usually free text
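The contrast above can be sketched in a few lines of Python. This is only an illustration: the "student" entity, its three attributes, and the byte widths are assumptions chosen for the example, not something defined in these notes. Every structured record has the same attributes and the same size in bytes, while the free-text note has no fixed format at all.

```python
import struct

# Hypothetical fixed format for a "student" entity (an assumption for this sketch):
# 4-byte integer ID, 20-byte name, 8-byte floating-point GPA.
# "<" selects little-endian layout with no padding between fields.
RECORD = struct.Struct("<i20sd")

def pack_student(student_id, name, gpa):
    # struct pads the name with zero bytes up to 20 bytes (longer input is truncated)
    return RECORD.pack(student_id, name.encode("utf-8"), gpa)

def unpack_student(raw):
    student_id, name, gpa = RECORD.unpack(raw)
    return student_id, name.rstrip(b"\x00").decode("utf-8"), gpa

# Structured: every record has the same attributes and the same length.
records = [pack_student(1, "Alice", 3.7), pack_student(2, "Bob", 3.2)]

# Unstructured: free text with no rigid format and no separate items.
note = "Bob joined late this term and is catching up on the first assignments."
```

Because every record occupies exactly `RECORD.size` bytes, the i-th record in a file can be located by simple arithmetic, which is one practical benefit of a predefined format.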
Database Systems
▪ Enterprise Information
• Sales: customers, products, purchases
• Accounting: payments, receipts, assets
• Human Resources: information about employees, salaries, payroll
taxes
▪ Manufacturing: management of production, inventory, orders, supply
chain.
▪ Banking and finance
• customer information, accounts, loans, and banking transactions.
• Credit card transactions
• Finance: sales and purchases of financial instruments (e.g., stocks
and bonds; storing real-time market data)
▪ Universities: registration, grades
In the early days, database applications were built directly on top of file
systems, which led to difficulties in the following areas:
▪ Atomicity of updates
• Failures may leave the database in an inconsistent state with only
part of an update carried out
• Example: Transfer of funds from one account to another should either
complete or not happen at all
▪ Concurrent access by multiple users
• Concurrent access needed for performance
• Uncontrolled concurrent accesses can lead to inconsistencies
▪ Example: Two people reading a balance (say $100) and updating
it by withdrawing money (say $50 each) at the same time
▪ Security problems
• Hard to provide user access to some, but not all, data
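The first two problems in this list can be made concrete with a small sketch. It uses Python's standard `sqlite3` module purely as an example of a system that provides transactions; the `account` table, the helper names, and the amounts are assumptions made for illustration. `transfer` shows an all-or-nothing update, and `lost_update` replays the interleaving from the withdrawal example without any concurrency control.

```python
import sqlite3

def make_bank():
    # In-memory database with a hypothetical account table; balances may not go negative.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move funds so that either both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back too.
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

def lost_update(balance, withdrawals):
    """Replay uncontrolled concurrent withdrawals: all users read before any writes."""
    reads = [balance for _ in withdrawals]
    for seen, amount in zip(reads, withdrawals):
        balance = seen - amount  # each write overwrites the previous one
    return balance
```

With a starting balance of 100, `lost_update(100, [50, 50])` returns 50 even though 100 was withdrawn in total, which is the inconsistency described above. A failed `transfer` leaves both balances exactly as they were.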
▪ A database system is partitioned into modules that deal with each of the
responsibilities of the overall system
▪ The functional components of a database system can be divided into
• The storage manager component
• The query processor component
• The transaction management component
▪ Centralized databases
• A few cores, shared memory
▪ Client-server
• One server machine executes work on behalf of multiple client
machines
▪ Parallel databases
• Many cores, shared memory
• Shared disk
• Shared nothing
▪ Distributed databases
• Geographical distribution
• Schema/data heterogeneity
A person who has central control over the system is called a database
administrator (DBA), whose functions are:
▪ Schema definition
▪ Storage structure and access-method definition
▪ Schema and physical-organization modification
▪ Granting of authorization for data access
▪ Routine maintenance
• Periodically backing up the database
• Ensuring that enough free disk space is available for normal
operations, and upgrading disk space as required
• Monitoring jobs running on the database and ensuring that
performance is not degraded by very expensive tasks submitted by
some users
Term      | Antony and Cleopatra | Julius Caesar | The Tempest | Hamlet | Othello | Macbeth
Antony    | 157                  | 73            | 0           | 0      | 0       | 0
Brutus    | 4                    | 157           | 0           | 1      | 0       | 0
Caesar    | 232                  | 227           | 0           | 2      | 1       | 1
Calpurnia | 0                    | 10            | 0           | 0      | 0       | 0
Cleopatra | 57                   | 0             | 0           | 0      | 0       | 0
mercy     | 2                    | 0             | 3           | 5      | 5       | 1
worser    | 2                    | 0             | 1           | 1      | 1       | 0
tf-idf Weighting
Each document is now represented by a real-valued vector of tf-idf weights in R^{|V|}. Here
|V| = 7, the number of terms.
Term      | Antony and Cleopatra | Julius Caesar | The Tempest | Hamlet | Othello | Macbeth
Antony    | 5.25                 | 3.18          | 0           | 0      | 0       | 0.35
Brutus    | 1.21                 | 6.1           | 0           | 1      | 0       | 0
Caesar    | 8.59                 | 2.54          | 0           | 1.51   | 0.25    | 0
Calpurnia | 0                    | 1.54          | 0           | 0      | 0       | 0
Cleopatra | 2.85                 | 0             | 0           | 0      | 0       | 0
mercy     | 1.51                 | 0             | 1.9         | 0.12   | 5.25    | 0.88
worser    | 1.37                 | 0             | 0.11        | 4.15   | 0.25    | 1.95
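As a rough illustration of how such document vectors can be built, the sketch below turns raw term counts into tf-idf vectors. The weighting formula is not stated in these notes, so the code assumes one common variant, w = (1 + log10 tf) × log10(N / df), with N taken as the six plays listed here. The names `PLAYS`, `COUNTS`, and `tf_idf_vectors` are made up for this example, and the resulting numbers are not expected to reproduce the table above.

```python
import math

PLAYS = ["Antony and Cleopatra", "Julius Caesar", "The Tempest",
         "Hamlet", "Othello", "Macbeth"]

# Raw term counts per play, in the same column order as PLAYS.
COUNTS = {
    "Antony":    [157, 73, 0, 0, 0, 0],
    "Brutus":    [4, 157, 0, 1, 0, 0],
    "Caesar":    [232, 227, 0, 2, 1, 1],
    "Calpurnia": [0, 10, 0, 0, 0, 0],
    "Cleopatra": [57, 0, 0, 0, 0, 0],
    "mercy":     [2, 0, 3, 5, 5, 1],
    "worser":    [2, 0, 1, 1, 1, 0],
}

def tf_idf_vectors(counts, n_docs):
    """Return one vector per document, with one tf-idf weight per term."""
    weights = {}
    for term, row in counts.items():
        df = sum(1 for tf in row if tf > 0)            # documents containing the term
        idf = math.log10(n_docs / df) if df else 0.0   # assumed idf variant
        weights[term] = [
            (1 + math.log10(tf)) * idf if tf > 0 else 0.0  # assumed log-scaled tf
            for tf in row
        ]
    # Transpose from term rows to document vectors of length |V|.
    return [[weights[term][d] for term in counts] for d in range(n_docs)]

vectors = tf_idf_vectors(COUNTS, len(PLAYS))
```

Each entry of `vectors` has length |V| = 7, one weight per term, in the order the terms appear in `COUNTS`.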
End of Part 1