lecture012
Lecture 01
Course Logistics
●
Instructor: George Kuan
– In-person MW 10:10AM - 11:40AM CDM 224
– Async
●
Lectures: In-person and online video archives
●
D2L
●
Office hours: Mon & Wed 9:20-10:10am CDM 224, by appointment on
Zoom
●
Email: [email protected] (prefix subject with [CSC355])
ORACLE Resources
●
https://academy.oracle.com/
●
https://livesql.oracle.com
Course Policies
●
Prerequisites:
– CSC 301 Data Structures II, or
– CSC 393 Data Structures in C++
●
Late policy:
– Late day bank of 2 days. No graded items will be
accepted beyond this provision.
Assessment
●
Grading:
– Class Participation 5%
– Takehome Labs 25%
– Quizzes 10%
– Midterm Exam 25%
– Final Exam 35%
●
Paper & pen Midterm and Final are given in person (for the async section, please schedule ahead of time:
either in person on the Loop campus or as a proctored exam at a testing center)
●
2 quizzes will be given in person for the in-person section and online for the async section.
●
There will be no last-minute makeup allowances for quizzes or exams without a documented
exception.
Academic Integrity
●
DePaul University is a learning community that promotes the intellectual development of each individual within the community. The
University believes that all members of the community are responsible for adherence to these standards for academic honesty, and that all
violations of academic integrity are detrimental to the intellectual development of individuals within the community and to the community at
large.
●
Violations of academic integrity include, but are not limited to:
– Cheating
– Plagiarism
– Academic Misconduct
– Complicity
– Noncompliance
Databases are Everywhere
●
Amazon
●
Instagram
●
Southwest
●
Chase
●
CampusConnect
What is data?
●
Factual information (such as measurements or
statistics) used as a basis for reasoning,
discussion, or calculation.
●
Information in digital form that can be
transmitted or processed
Source: merriam-webster.com
What is data? - Examples
●
Color, year, make, model, price, etc.
●
Name, last name, DOB, age, SSN
●
GPA, program, college
●
Glucose, Globulin, Calcium, Leukocytes, HbA1c
What is a database?
●
A database is an organized collection of
logically related data that are typically…
– Persistent: are stored on stable medium
– Shared: have multiple uses and interested users
– Interrelated: form a bigger picture
Database Management Systems (DBMS)
●
A collection of software components that lets
you:
– create
– maintain (modify, keep available)
– control access to
a database
(R)DBMS
●
Examples: Oracle, MS SQL Server, PostgreSQL, MySQL,
MariaDB
●
How to work with DBMS:
– Directly: via CLI
– GUI: SQLDeveloper, TOAD, Management Studio
– Programmatically using an API or library: Active Record and Data
Mapper libraries such as Django, Rails ActiveRecord, JDO
●
The database and DBMS together comprise a database system
Why use a database system?
●
Early data processing systems used files of data in plain
text form
●
Problem: program-data dependence led to
– limited data sharing
– duplication of data
– increased time for development and maintenance
●
We are looking for Data Independence...
Data Independence
●
Data is separated from programs and
applications
●
Changes to how the data is stored or structured should
not affect the programs that use it
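A minimal sketch of data independence, assuming Python's built-in sqlite3 module as a stand-in for a full DBMS. The student table and the top_students function are hypothetical names chosen for illustration. The application asks for columns by name, so adding a column to the table does not require changing the application code.

```python
import sqlite3

# In-memory SQLite database standing in for a real DBMS (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("INSERT INTO student (id, name, gpa) VALUES (1, 'Ada', 3.9)")


def top_students(connection, min_gpa):
    """Application code: requests columns by name, not by position in a file."""
    rows = connection.execute(
        "SELECT name, gpa FROM student WHERE gpa >= ? ORDER BY name", (min_gpa,)
    )
    return rows.fetchall()


before = top_students(conn, 3.5)

# The structure of the stored data changes: a new column is added.
conn.execute("ALTER TABLE student ADD COLUMN program TEXT")
conn.execute(
    "INSERT INTO student (id, name, gpa, program) VALUES (2, 'Grace', 3.7, 'CS')"
)

# The same, unmodified function still works after the schema change.
after = top_students(conn, 3.5)
print(before)
print(after)
```

With a plain-text file in a fixed layout, the same change would typically force every program that parses the file to be updated.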
Why use a database system?
●
A database system uses a “single repository” of
data accessed by multiple users
– Contains information on the structure of the data
– Allows sharing of and concurrent access to data
– Supports different views of the data
●
What are the benefits?
Benefits
●
Program-data independence
●
Controlled data redundancy
●
Controlled access to data
●
Support for multiple user interfaces
●
More efficient query processing
●
Faster application development
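One way to picture "different views of the data" and controlled access is a SQL view that exposes only some columns. This is a sketch using Python's built-in sqlite3 module as a stand-in DBMS; the person table and person_public view are hypothetical. SQLite has no user accounts, so the access-control half is only suggested here: in a multi-user DBMS such as Oracle, privileges on the view can be granted separately from privileges on the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, dob TEXT, ssn TEXT)"
)
conn.execute(
    "INSERT INTO person VALUES (1, 'Ada Lovelace', '1815-12-10', '000-00-0000')"
)

# The view shows the same stored data, minus the sensitive columns.
conn.execute("CREATE VIEW person_public AS SELECT id, name FROM person")

cursor = conn.execute("SELECT * FROM person_public")
public_columns = [col[0] for col in cursor.description]
public_rows = cursor.fetchall()
print(public_columns, public_rows)
```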
Database Models
●
Older Models:
– File Systems, Hierarchical, Network
– Drawbacks…
●
The Relational Model (Codd 1969)
●
Newer models:
– Semi-structured, object-relational, NoSQL
– Graph databases (Neo4J, Amazon Neptune)
– Vector databases to store embeddings for ML
File Systems
●
Data stored in simple text files, each one
possibly having a different fixed organization of
its data
●
High level of program-data dependence
●
Difficult to share data
●
Not practical to optimize queries
Hierarchical/Network Models
Hierarchical/Network Models (2)
●
Hierarchical Model: Data arranged in “parent-child”
relationships
●
Network Model: Can represent more general
relationships among types of data
●
Both models have similar weaknesses:
– Applications must navigate relationships explicitly
– DBMS cannot rearrange data to optimize queries
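To illustrate what "navigate relationships explicitly" means, here is a sketch that models a hierarchical layout with nested Python lists and dictionaries. The department, course, and student data are made up for illustration, and no real hierarchical DBMS API is being shown. The point is that the application itself must walk each parent-child link to answer a question.

```python
# Hypothetical hierarchical layout: department -> courses -> enrolled students.
departments = [
    {
        "name": "CS",
        "courses": [
            {"code": "CSC355", "students": ["Ada", "Grace"]},
            {"code": "CSC301", "students": ["Ada"]},
        ],
    },
    {
        "name": "Math",
        "courses": [
            {"code": "MAT140", "students": ["Grace"]},
        ],
    },
]


def courses_for_student(tree, student):
    """Walk every parent-child link by hand to answer one question."""
    found = []
    for dept in tree:                      # level 1: departments
        for course in dept["courses"]:     # level 2: courses under a department
            if student in course["students"]:
                found.append(course["code"])
    return found


ada_courses = courses_for_student(departments, "Ada")
print(ada_courses)
```

Because the traversal order is written into the application, the storage layout cannot be reorganized without rewriting functions like this one.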
Relational Model
Relational Model (2)
●
First model to separate the logical structure of the
database from its physical implementation
●
Data are divided into two-dimensional tables called
relations
●
Tables are linked by shared columns of data
●
Rules exist for dividing data among tables
●
A standardized query language exists (SQL [Chamberlin
and Boyce 1970s])
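A small sketch of two tables linked by a shared column and queried with SQL. It assumes Python's built-in sqlite3 module rather than the Oracle server used in this course, and the student and enrollment tables are hypothetical. The query states what is wanted; the DBMS decides how to match the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course TEXT
    );
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO enrollment VALUES (1, 'CSC355'), (2, 'CSC355'), (1, 'CSC301');
    """
)

# The shared student_id column links the two tables.
rows = conn.execute(
    """
    SELECT s.name, e.course
    FROM student s
    JOIN enrollment e ON s.student_id = e.student_id
    ORDER BY s.name, e.course
    """
).fetchall()
print(rows)
```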
Relational Model (3)
Newer Models
●
Semi-structured databases: Store collections of data in XML files
●
Object-relational databases: Add support for structured data types to relational databases
●
Document databases: Have a less restrictive structure, typically without a fixed schema
●
Data warehouses: Integrate multiple sources of data, possibly from different models
●
Graph databases
●
Wide-column store: Name and format of columns can vary from row to row, like a 2D key-value store (e.g., Google Bigtable, Apache
Cassandra, DataStax, Azure Tables)
●
Document store: (e.g., CouchDB, Microsoft Cosmos, ElasticSearch, MongoDB, Apache Solr)
●
Key-value store: Opaque collection of rows where each row may have different fields (RocksDB, Amazon DynamoDB, Memcached, Redis)
●
Vector database/store: Uses Approximate Nearest Neighbors to search a collection of vectors (fixed-length lists of numbers). Used for LLM
retrieval-augmented generation (RAG). (e.g., Pinecone, Qdrant, Postgres with pgvector)
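The idea behind vector search can be sketched in a few lines of plain Python. Note the substitution: this example does an exact, brute-force nearest-neighbor search with cosine similarity, whereas real vector databases use approximate indexes to avoid comparing against every vector. The document names and three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: each document is a fixed-length list of numbers.
documents = {
    "sql_intro": [0.9, 0.1, 0.0],
    "cooking": [0.0, 0.2, 0.9],
    "db_design": [0.8, 0.3, 0.1],
}


def nearest(query, store, k=2):
    """Exact search: score every stored vector, keep the k most similar."""
    ranked = sorted(
        store,
        key=lambda name: cosine_similarity(query, store[name]),
        reverse=True,
    )
    return ranked[:k]


top_two = nearest([1.0, 0.2, 0.0], documents)
print(top_two)
```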
Components of a DBMS
User Interactions with DBMS
●
Database Definition:
– database schema
– links between tables
– constraints
●
Query Processing: Request retrieval or modification of data
(“Queries”/“actions”)
●
Transaction Processing: Execute sets of operations that must be
executed as a unit (“transactions”)
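The three kinds of interaction above can be sketched together, assuming Python's built-in sqlite3 module as a stand-in DBMS. The account table and the transfer function are hypothetical. The CREATE TABLE statement is database definition (schema plus a constraint), the UPDATE and SELECT statements are queries, and the commit/rollback pair makes the two updates behave as one unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Database definition: schema plus a constraint (no negative balances).
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(connection, src, dst, amount):
    """Move money as one unit: both updates happen, or neither does."""
    try:
        connection.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        connection.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        connection.commit()
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed: undo the credit that already ran.
        connection.rollback()
        return False


ok = transfer(conn, 1, 2, 30)         # enough funds: committed
rejected = transfer(conn, 1, 2, 500)  # would overdraw: rolled back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, rejected, balances)
```

In the failed transfer the credit to account 2 runs first and succeeds, so without the rollback the database would be left with money created from nothing.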
Higher-Level User Interactions with DBMS
●
Database Object-Relational Models (ORMs)
– ActiveRecord (Rails and Django): Insert, Update,
Delete, and properties = columns, tries to be
transparent
– Data mapper (Java Hibernate, SQLAlchemy, Apple
Core Data): CRUD, tries to make domain and database
independent, possibly more performant
●
LangChain Q/A
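The difference between the Active Record and Data Mapper styles listed above can be sketched without any ORM library. This is a hand-rolled illustration on top of Python's built-in sqlite3 module, not the API of Rails, Django, Hibernate, or SQLAlchemy; CarRecord, Car, and CarMapper are hypothetical names.

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (id INTEGER PRIMARY KEY, make TEXT, year INTEGER)")


class CarRecord:
    """Active Record style: properties mirror columns, and the object saves itself."""

    def __init__(self, make, year):
        self.id = None
        self.make = make
        self.year = year

    def save(self, connection):
        cur = connection.execute(
            "INSERT INTO car (make, year) VALUES (?, ?)", (self.make, self.year)
        )
        self.id = cur.lastrowid


@dataclass
class Car:
    """Data Mapper style: a plain domain object with no database knowledge."""

    make: str
    year: int


class CarMapper:
    """All SQL lives here, separate from the domain class."""

    def __init__(self, connection):
        self.connection = connection

    def insert(self, car):
        cur = self.connection.execute(
            "INSERT INTO car (make, year) VALUES (?, ?)", (car.make, car.year)
        )
        return cur.lastrowid

    def find_by_make(self, make):
        rows = self.connection.execute(
            "SELECT make, year FROM car WHERE make = ? ORDER BY year", (make,)
        )
        return [Car(*row) for row in rows]


record = CarRecord("Toyota", 2020)
record.save(conn)                      # the object persists itself

mapper = CarMapper(conn)
mapper.insert(Car("Toyota", 2022))     # a separate mapper persists the object
toyotas = mapper.find_by_make("Toyota")
print(record.id, toyotas)
```

In the second style the Car class could be reused or tested with no database at all, which is the independence of domain and database that the Data Mapper pattern aims for.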
Approximate Course Schedule
●
Week 1: Introduction and Relational Model
●
Weeks 2-5: SQL DDL, Queries, Transactions
●
Weeks 6-7: Relational Database Design
●
Weeks 8-9: Constraints and Triggers, Database
Programming, Views
●
Week 10: Additional Topics, Course Review
SQLDeveloper
●
Front-end connection to a server running
Oracle DBMS
●
SQL commands can be run individually or
collected in script files
●
Can be downloaded free from Oracle
Setting Up a Connection
●
To set up a new connection to acadoradbprd01:
– Connection Name: YOURNAME355
– Username: your campusconnect username
– Password: cdm####### (your 7-digit Student ID)
– Hostname: acadoradbprd01.dpu.depaul.edu
– Port: 1521
– SID: ACADPRD0
– Test, then Connect
●
Double-click to Open an existing connection
●
Disconnect (and commit) when you’re done!
Running SQL Commands
●
Single SQL Command:
– Type command, then Execute (Ctrl-Enter)
●
To change your password: ALTER USER username IDENTIFIED BY
newpassword;
●
Script (SQL commands stored in a file):
– Type @ followed by full path to script file, then Run Script (F5)
●
Output will appear in bottom window under Query Result or
Script Output
Browsing Database Tables
●
Left window shows current Tables, click on + to expand
list
●
Right-click on Tables and choose Refresh to see
changes (can also Commit changes)
●
Click on a table to view it in the center window (may
need to Refresh view also)
– COLUMNS shows schema
– DATA shows contents
Saving SQLDeveloper Output
●
Three ways:
– Click on Save icon to save contents of Script Output
window to a file
– Highlight and then copy and paste contents of
Script Output window to a file
– Take and save screenshot of SQLDeveloper display