DS4001 Databases (7.5 credits)
Lecture 1 - Introduction
Yuantao Fan
[email protected]
Halmstad University
Why Study Databases (DB)?
• Databases are everywhere -> Data is everywhere
– World wide web (www)
– Education: Student, Staff, Course, Grade…
– Research: Sensor data, Demographic data…
– Industry: Logistics, Production, Sales, Inventories etc.
– Finance: banking, stock market etc.
What is a Database?
• Data
– Facts (text, numbers)
– Pictures/videos: numbers (pixel values) arranged spatially
– One of the most critical assets of many businesses
– Needs to be secure, and the operation should be cost-efficient
• Database – A repository/collection of data that can be accessed via a digital platform
– Structured – data is stored efficiently
– Persistent – data will not be lost without deliberate action
– Mutable – data can be added/modified/deleted
• Different kinds of databases store data in different forms
• Databases are the core component of most computer applications.
Database Management System (DBMS)
• Database
– a repository/collection of data managed
by a specialized software called DBMS
• A DBMS is software that allows
applications to store and analyze
information in a database.
• A general-purpose DBMS is designed
to allow the definition, creation,
querying, update, and administration
of databases.
• Database, Database Server, Database
System, Data Server, DBMS – often
used interchangeably
Database examples
• Create a database for a digital music store
– Keeping track of various artists and their albums
– Storing information about artists
– Release dates for the albums
– …
• Build an onboard data logging device for heavy-duty
vehicles
– Keeping track of sensor data (mileage, fuel
temperature, load, engine torque etc.) via the CAN
network
– Storing data for diagnosis or prognosis purposes
–…
Database examples
• Create a database for a company,
storing employees and their
information
– Storing position, department,
salary etc.
Why not use a file system?
• Flat File Strawman
– General/simple process for storing data (with relations)
– Store data in comma-separated value (CSV) files
• Use separate file per entity
• The application has to parse the file each time it accesses the content
– Disadvantages
• Value errors
• Different platforms
• Access from different users
• Very inefficient and “cumbersome” to work with
• …
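The flat-file strawman above can be sketched as follows. This is a minimal illustration, assuming a hypothetical artist file whose name, columns and contents are made up for the example. It shows that the application has to scan and parse the whole file for every lookup, and that nothing stops an invalid value from being written.

```python
import csv
import io

# Hypothetical flat file: one CSV "file" per entity (here: artists).
# A string stands in for a file on disk so the sketch is self-contained.
ARTISTS_CSV = """name,year,country
Artist A,1972,Sweden
Artist B,nineteen86,Sweden
"""

def find_artist_year(raw_csv, name):
    """Scan the whole file to find one artist's founding year."""
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if row["name"] == name:
            # The file stores only text; the application must convert
            # and validate every value itself.
            return int(row["year"])
    return None

year_a = find_artist_year(ARTISTS_CSV, "Artist A")

try:
    find_artist_year(ARTISTS_CSV, "Artist B")
    bad_value_detected = False
except ValueError:
    # "nineteen86" was stored without complaint and only fails on read.
    bad_value_detected = True
```

The value error surfaces only when some application happens to read the row, which is one reason the file-based approach is cumbersome in practice.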
DBMS
• Early DBMSs
– Database applications were difficult to develop and maintain
– Development was largely query/application specific
• Need to know very concrete usage
• Modern DBMSs
– Efficient for accessing, and processing, huge amounts of data
– Handle persistent data; guarantee constraints on data
– Handle concurrent access to data
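As a small illustration of "guarantee constraints on data", the sketch below uses SQLite through Python's standard `sqlite3` module. SQLite, the `employee` table and its columns are assumptions chosen for the example, not something the slides prescribe. An in-memory database is used, so persistence is not demonstrated here.

```python
import sqlite3

# In-memory SQLite database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Hypothetical table: the DBMS, not the application, enforces the rules.
conn.execute("""
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary REAL NOT NULL CHECK (salary >= 0)
    )
""")

conn.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Alice", 42000))

try:
    # Violates the CHECK constraint, so the DBMS refuses the row.
    conn.execute("INSERT INTO employee (name, salary) VALUES (?, ?)", ("Bob", -1))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Only the valid row is stored; the invalid one is rejected at write time rather than discovered later on read.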
Relational databases
• Data stored in tabular form – columns and rows
• Columns contain item properties, e.g. Last Name, First Name, etc.
• (Can also be viewed as mathematical relations)
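A minimal sketch of the tabular view, again assuming SQLite via Python's `sqlite3` module and a hypothetical `person` table. Columns carry the item properties, each row is one tuple, and the set of rows can be read as a mathematical relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with two columns (item properties).
conn.execute("CREATE TABLE person (last_name TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO person (last_name, first_name) VALUES (?, ?)",
    [("Codd", "Edgar"), ("Widom", "Jennifer")],
)

cursor = conn.execute("SELECT last_name, first_name FROM person")
columns = [col[0] for col in cursor.description]  # column names
rows = cursor.fetchall()                          # one tuple per row

# Viewed as a relation: an unordered set of tuples.
relation = set(rows)
```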
Relational Model (RM)
• An approach to managing data
– using a structure and language consistent with first-order predicate logic
– Data is represented in terms of tuples, grouped in relations
– Codd, Edgar F. "A relational model of data for large shared data banks." Communications of the ACM 13.6 (1970): 377-387
• Relational algebra
– Projection, union, intersection, difference, product, joins etc…
• Idea
– Store database in simple data structure
– Access data through high-level language
• Data models
– A data model is a collection of concepts for describing the data in a database
– A schema is a description of a particular collection of data, using a given data model
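Some of the relational algebra operators listed above can be sketched in plain Python. This is an illustration only: relations are modelled as lists of dictionaries, the `artists` and `albums` data are made up, and the helper names `select`, `project` and `natural_join` are hypothetical rather than part of any library.

```python
# Relations modelled as lists of dicts (attribute name -> value).
artists = [
    {"artist_id": 1, "name": "Artist A"},
    {"artist_id": 2, "name": "Artist B"},
]
albums = [
    {"album_id": 10, "artist_id": 1, "title": "First"},
    {"album_id": 11, "artist_id": 1, "title": "Second"},
    {"album_id": 12, "artist_id": 2, "title": "Debut"},
]

def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Projection: keep only the given attributes, dropping duplicates."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attributes."""
    result = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] for a in shared):
                result.append({**l, **r})
    return result

joined = natural_join(artists, albums)
titles_by_a = project(select(joined, lambda t: t["name"] == "Artist A"), ["title"])
album_artist_ids = project(albums, ["artist_id"])
```

A real DBMS evaluates such expressions far more efficiently; the point here is only that each operator takes relations in and gives a relation back, so operators can be composed.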
Structured Query Language (SQL)
• SQL is a domain-specific and standardized language for manipulating relational
databases
– Common language supported by many different DBMSs
– Create tables, manipulate content and query information in databases
– Very popular
– Commonly pronounced ‘sequel’
• Relational languages
– Data Definition Language
– Data Control Language
– Data Manipulation Language
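The difference between data definition and data manipulation can be sketched with SQLite through Python's `sqlite3` module. The choice of SQLite and the `artist` and `album` tables are assumptions for illustration. Data control statements such as GRANT and REVOKE are not shown, since SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data Definition Language (DDL): define the structure of the tables.
conn.executescript("""
    CREATE TABLE artist (
        artist_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE album (
        album_id  INTEGER PRIMARY KEY,
        artist_id INTEGER NOT NULL REFERENCES artist(artist_id),
        title     TEXT NOT NULL
    );
""")

# Data Manipulation Language (DML): add, change and remove rows.
conn.execute("INSERT INTO artist (artist_id, name) VALUES (1, 'Artist A')")
conn.executemany(
    "INSERT INTO album (artist_id, title) VALUES (?, ?)",
    [(1, "First"), (1, "Second")],
)
conn.execute("UPDATE album SET title = 'Second (Remastered)' WHERE title = 'Second'")
conn.execute("DELETE FROM album WHERE title = 'First'")

# Querying (often grouped with DML): combine both tables with a join.
result = conn.execute("""
    SELECT artist.name, album.title
    FROM album JOIN artist ON album.artist_id = artist.artist_id
""").fetchall()
```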
Why Structured Query Language (SQL) for Data Science?
• One of the essential skills in data science
• Advantages
– Gives you a good understanding of relational databases
– Many applications rely on Databases
– Boost your professional profile
• Applications
– Big data
– Table with a few rows
– Small start-ups
– Big Database
– Mobile phone
– Vehicles
Other Database models
NoSQL-databases
• Key-value pairs (Oracle NoSQL, Riak …)
• Hierarchical models
• Column Family
• Document (XML, JSON)
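To give a feel for the shape of the data in two of these models, the sketch below uses only a Python dictionary and the standard `json` module. It does not show how Riak, Oracle NoSQL or any other product works; the keys and records are made up for illustration.

```python
import json

# Key-value model: a value is stored and fetched by its key only.
kv_store = {}
kv_store["artist:1"] = "Artist A"

# Document model: a self-describing, nested record, here serialized as JSON.
document = {
    "artist_id": 1,
    "name": "Artist A",
    "albums": [{"title": "First"}, {"title": "Second"}],
}
serialized = json.dumps(document)
restored = json.loads(serialized)

# Nested data is reached by walking the document, not by joining tables.
album_titles = [album["title"] for album in restored["albums"]]
```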
Course Objectives
• Learn how to query databases using SQL
– Data science; machine learning workflow
• Design a database
– ER-diagram
– Schema
[Figure: Domain description -> ER-diagram (high-level design) -> Schema -> SQL code (Relational DBMS)]
Course Content
• Week 1
– Introduction to the course
– Basic SQL statements 1
– Lab 0 – setting up the environment
• Week 2
– Basic SQL statements 2
– DDL and DML exercise
– Lab 1 – Exercise on basic SQL statements
• Week 3
– Designing databases 1
– Designing databases 2, Project introduction
– Lab 2 – ER and designing database
• Week 4
– Doing more with SQL
• Functions, sub-queries, multi-tables
– Project idea hand-in
– Lab 3 – More advanced SQL statements
• Week 5
– Database & Query applications
• Accessing database with Python
– Project Feedback
– Lab 4 – query within Python & machine learning workflow
• Week 6
– A few relevant topics, e.g. relational algebra
– Finalizing the project
• Week 7
– Project Presentation & Discussion
– Summary
• Week 8
– Revision of the course
• Week 9
– Written Examination
– Project Hand-in
Course Schedule
• Each week
– 2 lectures (on campus; Zoom on special occasions)
• The first lecture introduces core content
• The second lecture focuses on practice and exercises
– 1 Q&A / lab / project session
• Supervision on lab assignment
• 7.5 credits; 20 hours per week; 9 weeks (including examination)
– 6 hours in the classroom
– Suggestion: 6 to 10 hours on lab assignments (and the project); the rest on reading, repetition, exploration, summarizing and more
Course Materials
• Main course materials (Lecture slides, Labs, Projects)
– Blackboard, bb.hh.se
• Lecture slides
• Lab assignments
• Project assignment
– Please check if you have access to the course(s)
• If not, send me an email: [email protected]
• Textbooks
– Elmasri, R., et al. Fundamentals of Database Systems, 7th ed. 2016.
– Garcia-Molina, H., Ullman, J. D. & Widom, J. Database Systems: The Complete Book, 2nd ed. Pearson Education, 2013.
Grading Criteria
• Project Assignment
– 4 Labs
– 1 Project
• Introduction in week 3
• Project idea hand-in in week 4; final hand-in in the last week
• Written examination
– Most likely on campus