Exercise Loans With Solutions
warehousing
Data Management
A.Y. 2022/23
Maurizio Lenzerini
Conceptual Schema of the operational data
Logical schema of the operational data
Person(SSN)
foreign key Person[SSN] ⊆ HasIncome[person]
HasIncome(person,date,qty,incomeClass)
foreign key HasIncome[person] ⊆ Person[SSN]
HasChild(parent,child)
foreign key HasChild[parent] ⊆ Person[SSN]
foreign key HasChild[child] ⊆ Person[SSN]
Loan(borrower,branchcode,bankcode,startdate,enddate,amount,rate,status,category,purpose)
foreign key Loan[borrower] ⊆ Person[SSN]
foreign key Loan[branchcode,bankcode] ⊆ Branch[code,bank]
Branch(code,bank,city)
foreign key Branch[bank] ⊆ Bank[code]
Bank(code,name)
HasDiscount(borrower,branchcode,bankcode,startdate,dcode)
foreign key HasDiscount[borrower,branchcode,bankcode,startdate] ⊆ Loan[borrower,branchcode,bankcode,startdate]
foreign key HasDiscount[dcode] ⊆ DiscountDecision[code]
DiscountDecision(code,date,type,amount)
Requirements for data warehousing
For every loan we are interested in the starting date, the end date, the borrower (with SSN,
number of children, income at the starting date, class of income), the branch of the bank where
the loan was issued (with name and city of the branch), the category (e.g., fixed rate or one of
different types of variable rate), the purpose of the loan (e.g., to buy a car, a house, a personal
loan, ...), the discount type (if it applies) and the status (fully repaid, defaulted, ...). Also, for
every loan the following values are of interest: the amount, the interest rate and the amount of
discount (if it applies).
Typical questions for starting OLAP sessions on the data warehouse by business analysts:
• Give the average interest rate, per loan category and branch in the various months of the start
date.
• For all branches, give the minimum, maximum interest rate per loan purpose and per class of
income of the borrower.
• Give the number of loans and the average duration (in years) per branch and per number of
children of the borrower.
• Give the percentage of defaulted loans per month of the end year and per city of the branch
where the loan was issued.
What to do in the exercise
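[Figure: fact schema for the fact “Loan”. Measures: amount, interestRate, discount*. Dimensions shown: borrower (with income and incomeClass), branch (with branchCity), start and end (each with date, month, year).]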
Note:
Some facts may have no value for “discount” (marked with *).
Integrity constraints:
• “borrower”, “branch” and “start” form an identifier for “Loan”
• For every fact F of type Loan, the fact F has a value for “discount” if and only if it has a value for the dimension
“discountType”.
Star schema
Fact table:
Loan(borrower,branch,start,end,status,category,purpose,discountType,amount,interestRate,discount*)
Dimension tables:
Borrower(keyBorrower,SSN,numChildren,income,incomeClass)
Branch(keyBranch,branchcode,bankcode,branchCity)
Date(keyDate,date,month,year)
Status(keyStatus,status)
Category(keyCategory,category)
Purpose(keyProperties,purpose)
DiscountType(keyDiscountType,DiscountType)
Integrity constraints:
• “borrower”, “branch” and “start” form a key for “Loan”
• For every tuple t of Loan, t.discount is NULL if and only if t.discountType is NULL
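The star schema and its two integrity constraints can be sketched as SQLite DDL, wrapped in Python's standard sqlite3 module so that it can be run. This is only an illustration: the column types, the REFERENCES clauses and the helper `violates` are assumptions, while table and column names follow the schema above. “Date” and “end” are quoted because END is an SQL keyword.

```python
import sqlite3

# Star schema as SQLite DDL. Types and REFERENCES clauses are assumptions;
# names are taken from the star schema.
STAR_DDL = """
CREATE TABLE Borrower (keyBorrower INTEGER PRIMARY KEY, SSN TEXT,
                       numChildren INTEGER, income REAL, incomeClass TEXT);
CREATE TABLE Branch (keyBranch INTEGER PRIMARY KEY, branchcode TEXT,
                     bankcode TEXT, branchCity TEXT);
CREATE TABLE "Date" (keyDate INTEGER PRIMARY KEY, date TEXT, month TEXT, year INTEGER);
CREATE TABLE Status (keyStatus INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE Category (keyCategory INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Purpose (keyProperties INTEGER PRIMARY KEY, purpose TEXT);
CREATE TABLE DiscountType (keyDiscountType INTEGER PRIMARY KEY, DiscountType TEXT);
CREATE TABLE Loan (
    borrower     INTEGER NOT NULL REFERENCES Borrower(keyBorrower),
    branch       INTEGER NOT NULL REFERENCES Branch(keyBranch),
    start        INTEGER NOT NULL REFERENCES "Date"(keyDate),
    "end"        INTEGER REFERENCES "Date"(keyDate),
    status       INTEGER REFERENCES Status(keyStatus),
    category     INTEGER REFERENCES Category(keyCategory),
    purpose      INTEGER REFERENCES Purpose(keyProperties),
    discountType INTEGER REFERENCES DiscountType(keyDiscountType),
    amount       REAL,
    interestRate REAL,
    discount     REAL,
    -- first constraint: borrower, branch and start form a key
    PRIMARY KEY (borrower, branch, start),
    -- second constraint: discount is NULL if and only if discountType is NULL
    CHECK ((discount IS NULL) = (discountType IS NULL))
);
"""


def create_star_schema(conn):
    """Create all star-schema tables on the given connection."""
    conn.executescript(STAR_DDL)


def violates(conn, row):
    """Hypothetical helper: True if inserting `row` into Loan breaks a constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Loan VALUES (?,?,?,?,?,?,?,?,?,?,?)", row)
        return False
    except sqlite3.IntegrityError:
        return True
```

SQLite does not enforce the REFERENCES clauses unless `PRAGMA foreign_keys = ON` is issued on the connection; the key and the CHECK constraint are always enforced.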
Example of ETL query
We consider the ETL process (in this case based on a single query) that loads the data
into the dimension table “Borrower”. Here is the SQL code corresponding to this
process:
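The SQL itself is not included in this text. What follows is a minimal sketch of one possible such query, not the official solution, wrapped in Python's standard sqlite3 module so that it can be run. It rests on several assumptions: dates are stored as ISO strings (so that string comparison orders them correctly), the “income at the starting date” is the most recent HasIncome record dated on or before the loan's start date, the number of children is counted from the current content of HasChild, and keyBorrower is a surrogate key generated by the database. The primary keys of the operational tables and the function name `load_borrower` are also assumptions.

```python
import sqlite3

# Simplified operational tables (names from the logical schema; column types
# and primary keys are assumptions) plus the target dimension table.
DDL = """
CREATE TABLE Person (SSN TEXT PRIMARY KEY);
CREATE TABLE HasIncome (person TEXT, date TEXT, qty REAL, incomeClass TEXT,
                        PRIMARY KEY (person, date));
CREATE TABLE HasChild (parent TEXT, child TEXT, PRIMARY KEY (parent, child));
CREATE TABLE Loan (borrower TEXT, branchcode TEXT, bankcode TEXT, startdate TEXT,
                   enddate TEXT, amount REAL, rate REAL, status TEXT,
                   category TEXT, purpose TEXT,
                   PRIMARY KEY (borrower, branchcode, bankcode, startdate));
CREATE TABLE Borrower (keyBorrower INTEGER PRIMARY KEY,  -- surrogate key
                       SSN TEXT, numChildren INTEGER, income REAL, incomeClass TEXT);
"""

# One Borrower row per distinct (SSN, numChildren, income, incomeClass)
# combination observed at the start date of some loan.
ETL_BORROWER = """
INSERT INTO Borrower (SSN, numChildren, income, incomeClass)
SELECT DISTINCT l.borrower,
       (SELECT COUNT(*) FROM HasChild c WHERE c.parent = l.borrower),
       i.qty,
       i.incomeClass
FROM Loan l
JOIN HasIncome i ON i.person = l.borrower
WHERE i.date = (SELECT MAX(i2.date)
                FROM HasIncome i2
                WHERE i2.person = l.borrower AND i2.date <= l.startdate);
"""


def load_borrower(conn):
    """Run the ETL query that fills the Borrower dimension table."""
    with conn:
        conn.execute(ETL_BORROWER)
```

With this design a person whose income changes between two loans appears in two rows of Borrower, each with its own surrogate key, which is why SSN alone cannot serve as the key of the dimension table.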
We consider the first query: give the average interest rate per loan category and
branch, in the various months of the start date.
Here is the SQL code corresponding to this query:
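The SQL itself is not included in this text. What follows is a minimal sketch of one possible formulation over the star schema, not the official solution, wrapped in Python's standard sqlite3 module so that it can be run. Only the tables and columns the query touches are declared; the column types, the assumption that “month” values already identify the year (for example '2022-01'), and the function name `avg_rate_by_category_branch_month` are not from the original.

```python
import sqlite3

# Only the part of the star schema used by the query (types are assumptions).
DDL = """
CREATE TABLE Branch (keyBranch INTEGER PRIMARY KEY, branchcode TEXT,
                     bankcode TEXT, branchCity TEXT);
CREATE TABLE "Date" (keyDate INTEGER PRIMARY KEY, date TEXT, month TEXT, year INTEGER);
CREATE TABLE Category (keyCategory INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE Loan (borrower INTEGER, branch INTEGER, start INTEGER,
                   category INTEGER, interestRate REAL,
                   PRIMARY KEY (borrower, branch, start));
"""

# Join the fact table with the three dimension tables involved, then
# aggregate the interestRate measure per category, branch and start month.
OLAP_QUERY_1 = """
SELECT c.category, b.branchcode, b.bankcode, d.month,
       AVG(l.interestRate) AS avgRate
FROM Loan l
JOIN Category c ON l.category = c.keyCategory
JOIN Branch   b ON l.branch   = b.keyBranch
JOIN "Date"   d ON l.start    = d.keyDate
GROUP BY c.category, b.keyBranch, b.branchcode, b.bankcode, d.month
ORDER BY c.category, b.branchcode, b.bankcode, d.month;
"""


def avg_rate_by_category_branch_month(conn):
    """Return (category, branchcode, bankcode, month, avgRate) tuples."""
    return conn.execute(OLAP_QUERY_1).fetchall()
```

If “month” held only the month number, the grouping would also need d.year to keep the same month of different years apart.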