CCS341 Data Warehouse Lab Experiments

This document provides information about the data warehousing lab experiments at PSV College of Engineering and Technology including the vision, mission and program details of the institution and computer science department. It outlines the practical exercises and course outcomes for the data warehousing subject.

Uploaded by

karthika murugan

CCS341-DATA Warehouse LAB Experiments

data warehousing lab (Anna University)

Downloaded by karthika murugan ([email protected])
P.S.V. COLLEGE OF ENGINEERING AND TECHNOLOGY

(Approved by AICTE, New Delhi and Affiliated to Anna University, Chennai)
Accredited by the NAAC with ‘A’ Grade
(Inclusion Under Section 2 (f) & 12 (B) of the UGC Act, 1956)
(An ISO 9001: 2015 Certified Institution)
Bangalore - Chennai Highway, (NH-46),
Mittapalli, Balinayanapalli Post, Krishnagiri - 635 108.

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

(THIRD YEAR VI SEM)

LABORATORY SUBJECT CODE/ NAME : CCS341/DATA WAREHOUSING

STAFF NAME: B.NEELU (ASSISTANT PROFESSOR)

Vision and Mission of the Institute

Vision:
To be recognized at national level for quality technical education with ethics
supported by research leading to produce innovative, entrepreneurial, and successful
engineers.
Mission:
M1: To provide state of the art education with strong Engineering basics and managerial
skills.
M2: To develop students with good Engineering skills for designing and developing
solutions to cater the need of industries and society
M3: To develop the institute as a Hub, working constantly in chase of brilliance in
Engineering education, Research and technology transfer to the Industries and society at a
large
M4: To inculcate qualities required for becoming a good entrepreneur

Vision and Mission of the Department

Vision:
To provide an amiable atmosphere to widen and bloom as strong professionals,
socially mindful and globally accountable personalities

Mission:
M1: To expose high-class wisdom to the students by providing lively learning
atmosphere to enlarge practical and headship talent to shine as a resourceful expert
M2: To inculcate life-long learning skill that permits the students to adapt and response
to the revolution in technology in the global market
M3: To develop student community with professional ethics to start pioneering research
and development in the thrust areas
M4: To make the students to learn the emerging technologies and behaving ethically in
their professional life.

Program Educational Objectives (PEOs)

PEO1:
Apply their technical competence in computer science to solve real world problems,
with technical and people leadership.

PEO2:
Conduct cutting edge research and develop solutions on problems of social relevance.
PEO3:
Work in a business environment, exhibiting team skills, work ethics, adaptability and
lifelong learning

Program Specific Outcomes (PSOs)

PSO1:
Exhibit design and programming skills to build and automate business solutions
using cutting edge technologies.
PSO2:
Strong theoretical foundation leading to excellence and excitement towards research,
to provide elegant solutions to complex problems.
PSO3:
Ability to work effectively with various engineering fields as a team to design, build
and develop system applications.

Program Outcomes (POs)

Engineering Graduates will be able to:

1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem Analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
3. Design / Development of Solutions: Design solutions for complex engineering
problems and design system components or processes that meet the specified needs with
appropriate consideration for the public health and safety, and the cultural, societal, and
environmental considerations.
4. Conduct Investigations of Complex Problems: Use research-based knowledge and
research methods including design of experiments, analysis and interpretation of data,
and synthesis of the information to provide valid conclusions.
5. Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
engineering activities with an understanding of the limitations.
6. The Engineer and Society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent responsibilities
relevant to the professional engineering practice.
7. Environment and Sustainability: Understand the impact of the professional
engineering solutions in societal and environmental contexts, and demonstrate the
knowledge of, and need for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities
and norms of the engineering practice.

9. Individual and Team Work: Function effectively as an individual, and as a member or
leader in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and
write effective reports and design documentation, make effective presentations, and give
and receive clear instructions.
11. Project Management and Finance: Demonstrate knowledge and understanding of the
Engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and in multidisciplinary environments.
12. Life - Long Learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change.

DATA WAREHOUSING (CCS341)

PRACTICAL EXERCISES:
1. Data exploration and integration with WEKA
2. Apply WEKA tool for data validation
3. Plan the architecture for real time application
4. Write the query for schema definition
5. Design data warehouse for real time applications
6. Analyse the dimensional modeling
7. Case study using OLAP
8. Case study using OLTP
9. Implementation of warehouse testing

COURSE OUTCOMES:
CO1: Design data warehouse architecture for various Problems
CO2: Apply the OLAP Technology
CO3: Analyse the partitioning strategy
CO4: Critically analyze the differentiation of various schema for given problem
CO5: Frame roles of process manager & system manager

EXP 1. Data exploration and integration with WEKA

AIM:
To explore data and integrate it with WEKA.
ALGORITHM AND EXPLORATION:
1. Download and install Weka. You can find it here:
http://www.cs.waikato.ac.nz/ml/weka/downloading.html
2. Open the Weka tool and select the Explorer option.
3. A new window opens with several tabs (Preprocess, Associate, etc.).
4. In the Preprocess tab, click the "Open file" option.
5. Go to C:\Program Files\Weka-3-6\data to find the bundled .arff datasets.
Click on any dataset to load it; the data is then displayed in the Preprocess tab.

Load each dataset and observe the following.

Here the IRIS.arff dataset is used as the sample for all of the observations below.

i. List the attribute names and their types

The loaded dataset (IRIS.arff) has 5 attributes with the following data types:
sepallength – Numeric
sepalwidth – Numeric
petallength – Numeric
petalwidth – Numeric
class – Nominal
ii. Number of records in each dataset

There are 150 records (instances) in total in the dataset (IRIS.arff).

iii. Identify the class attribute (if any)

There is one class attribute which consists of 3 labels.

They are: 1. Iris-setosa
2. Iris-versicolor
3. Iris-virginica

iv. Plot Histogram

v. Determine the number of records for each class.

The class attribute covers all 150 records and has 3 labels, distributed as follows:
1. Iris-setosa – 50 records
2. Iris-versicolor – 50 records
3. Iris-virginica – 50 records

vi. Visualize the data in various dimensions

RESULT:
Thus the data exploration and integration with WEKA was executed successfully.
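
The observations in items i, ii, iii and v can also be reproduced outside the Weka GUI. The following is a minimal sketch in Python, assuming a simple dense ARFF file; the inline sample (four rows, not the full 150-record dataset) and the helper name parse_arff are illustrative assumptions, not part of Weka.

```python
from collections import Counter

# Tiny inline stand-in for iris.arff (hypothetical subset of rows, for illustration only)
SAMPLE_ARFF = """\
@relation iris-sample
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def parse_arff(text):
    """Return (attributes, rows) for a simple dense ARFF document."""
    attributes = []  # list of (name, type) pairs
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, type_spec = line.split(None, 2)
            # A {a,b,c} type specification declares a nominal attribute
            kind = "nominal" if type_spec.startswith("{") else type_spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(SAMPLE_ARFF)
# Assumption: the class attribute is the last column, as in the iris dataset
class_counts = Counter(row[-1] for row in rows)

for name, kind in attributes:
    print(f"{name}: {kind}")
print("records:", len(rows))
print("records per class:", dict(class_counts))
```

Run against the full dataset, the same counts should match what the Preprocess tab reports (150 records, 50 per class).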

EXP 2. Apply WEKA tool for Data Validation

AIM:
To Apply WEKA tool for Data Validation
STEPS:
1. Load the dataset (Iris-2D.arff) into the Weka tool.
2. Go to the Classify tab; the classifier chooser on the left lists the different
classification algorithms, including those under the rules section.
3. From these, select the JRip (if-then rule) algorithm and click Start with the
"Use training set" test option enabled.
4. Weka then reports the detailed accuracy by class, consisting of the TP rate, FP rate,
precision, recall and F-measure values, together with the confusion matrix.
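
To make the per-class figures in step 4 concrete, the sketch below computes a confusion matrix and the TP rate, FP rate, precision, recall and F-measure for one class label. It is an illustration in Python, not Weka's own code; the six actual/predicted labels are made-up sample data and the function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def class_metrics(actual, predicted, label):
    """Treat `label` as the positive class and compute its per-class metrics."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)
    fp = sum(1 for a, p in pairs if a != label and p == label)
    fn = sum(1 for a, p in pairs if a == label and p != label)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the same as the TP rate
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "tp_rate": recall,
        "fp_rate": fp_rate,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up labels for illustration: one setosa record is misclassified as versicolor
actual = ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 2 + ["Iris-virginica"]
predicted = [
    "Iris-setosa", "Iris-setosa", "Iris-versicolor",
    "Iris-versicolor", "Iris-versicolor", "Iris-virginica",
]

matrix = confusion_matrix(actual, predicted)
setosa = class_metrics(actual, predicted, "Iris-setosa")
versicolor = class_metrics(actual, predicted, "Iris-versicolor")
print(setosa)
print(versicolor)
```

A record that is misclassified lowers the recall of its true class and the precision of the class it was assigned to, which is why both values are reported per class.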

Using Cross-Validation Strategy with 10 folds:

Here, we enabled cross-validation test option with 10 folds & clicked start button as
represented below.

Using Cross-Validation Strategy with 20 folds:

Here, we enabled cross-validation test option with 20 folds & clicked start button as
represented below.

Comparing the two cross-validation runs, the 20-fold run classified 97.3% of the instances
correctly, whereas the 10-fold run classified 94.6% correctly, so the error rate was lower
with 20 folds in this experiment.
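
The idea behind k-fold cross-validation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the folds are contiguous and unshuffled (Weka additionally randomizes and stratifies them), the classifier is a one-feature nearest-centroid stand-in rather than JRip, and the eight data points are invented.

```python
def k_fold_indices(n_records, k):
    """Split indices 0..n_records-1 into k disjoint, nearly equal test folds."""
    sizes = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_centroids(xs, ys):
    """Stand-in learner: remember the mean feature value of each class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(model[y] - x))


def cross_validate(records, labels, k):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for test_idx in k_fold_indices(len(records), k):
        held_out = set(test_idx)
        train_x = [records[i] for i in range(len(records)) if i not in held_out]
        train_y = [labels[i] for i in range(len(records)) if i not in held_out]
        model = train_centroids(train_x, train_y)
        correct += sum(predict_centroid(model, records[i]) == labels[i] for i in test_idx)
    return correct / len(records)


# Invented, well-separated data; classes alternate so every fold sees both
records = [1.0, 5.0, 1.2, 5.2, 0.9, 4.9, 1.1, 5.1]
labels = ["low", "high", "low", "high", "low", "high", "low", "high"]

folds = k_fold_indices(len(records), 4)
accuracy = cross_validate(records, labels, 4)
print(folds)
print(accuracy)
```

Every record is used for testing exactly once, so the pooled accuracy is an estimate over the whole dataset rather than over a single held-out split.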

RESULT: Thus the WEKA tool was applied for data validation successfully.

EXP 3. Plan the architecture for real time application

AIM:
To plan the architecture for a real-time application that uses Weka.

Planning this architecture requires weighing several factors. Weka is a popular machine
learning toolkit that provides various algorithms for data mining and predictive modelling.

Here are the steps to plan the architecture:

1. Define the problem: Clearly understand the problem you are trying to solve with
your real-time application. Identify the specific tasks and goals you want to achieve
using Weka.
2. Data collection and preprocessing: Determine the data sources and collect the
required data for your application. Preprocess the data to clean, transform, and
prepare it for analysis using Weka. This may involve tasks like data cleaning, feature
selection, normalization, and handling missing values.
3. Choose the appropriate Weka algorithms: Weka offers a wide range of machine
learning algorithms. Select the algorithms that are suitable for your problem and data.
Consider factors like the type of task (classification, regression, clustering), the size
of the dataset, and the computational requirements.
4. Real-time data streaming: If your application requires real-time data processing, you
need to set up a mechanism to stream the data continuously. This can be done using
technologies like Apache Kafka, Apache Flink, or Apache Storm. Ensure that the data
streaming infrastructure is integrated with Weka for seamless processing.
5. Model training and evaluation: Train the selected Weka algorithms on your training
dataset. Evaluate the performance of the models using appropriate evaluation metrics
like accuracy, precision, recall, or F1-score. Fine-tune the models if necessary.
6. Integration and deployment: Integrate the trained models into your real-time
application. This may involve developing APIs or microservices to expose the
models' functionality. Ensure that the application can handle real-time requests and
provide predictions or insights in a timely manner.
7. Monitoring and maintenance: Set up monitoring mechanisms to track the
performance of your real-time application. Monitor the accuracy and performance of
the models over time. Update the models periodically to adapt to changing data
patterns or to improve performance.
Remember to document your architecture design and implementation decisions for future
reference. Regularly review and update your architecture as your application evolves and new
requirements arise.
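
Step 7 (monitoring and maintenance) can be sketched as a sliding-window accuracy monitor that flags when a deployed model should be retrained. This is a minimal illustration in Python under stated assumptions: the class RollingAccuracyMonitor, the threshold_model stand-in and the five-item stream are all hypothetical, and a real deployment would call a trained Weka model instead.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track prediction accuracy over the most recent `window_size` records."""

    def __init__(self, window_size=100, alert_below=0.8):
        self.outcomes = deque(maxlen=window_size)  # True where prediction was right
        self.alert_below = alert_below

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    @property
    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        # Only judge the model once a full window of outcomes is available
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy < self.alert_below


def threshold_model(reading):
    """Hypothetical stand-in for a trained model: label a temperature reading."""
    return "hot" if reading >= 30.0 else "normal"


# Invented stream of (reading, true label) pairs arriving one at a time
stream = [(25.0, "normal"), (31.0, "hot"), (29.5, "hot"), (28.0, "hot"), (33.0, "hot")]

monitor = RollingAccuracyMonitor(window_size=4, alert_below=0.75)
for reading, actual in stream:
    monitor.record(threshold_model(reading), actual)

print("rolling accuracy:", monitor.accuracy)
print("retrain:", monitor.needs_retraining())
```

Using a bounded window means old outcomes drop out automatically, so the monitor reacts to recent changes in the data rather than to the model's lifetime average.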

RESULT: Thus the architecture for a real-time application was planned.

EXP 4. Write the query for schema definition

AIM:
To Write the query for schema definition

ALGORITHM:
1. Create a new database
2. Switch to the newly created database
3. Define the schema for each table
4. Define relationships between tables (if needed)
5. Execute the schema definition queries

PROGRAM:

-- Create a new database named "library"

CREATE DATABASE library;

-- Switch to the "library" database

USE library;

-- Define the schema for the "books" table

CREATE TABLE books (
    book_id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    author VARCHAR(100) NOT NULL,
    publication_year INT,
    isbn VARCHAR(20),
    available BOOLEAN DEFAULT TRUE
);

-- Define the schema for the "members" table

CREATE TABLE members (
    member_id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE,
    phone_number VARCHAR(20),
    address VARCHAR(255)
);

-- Define the schema for the "checkouts" table

CREATE TABLE checkouts (
    checkout_id INT AUTO_INCREMENT PRIMARY KEY,
    book_id INT NOT NULL,
    member_id INT NOT NULL,
    checkout_date DATE NOT NULL,
    return_date DATE,
    -- Each checkout must refer to an existing book and an existing member
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    FOREIGN KEY (member_id) REFERENCES members(member_id)
);

OUTPUT:

Database 'library' created.

Database changed to 'library'.

Table 'books' created successfully.

Table 'members' created successfully.

Table 'checkouts' created successfully

RESULT:
Thus Schema Definition was written and executed Successfully.
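
To check the schema without a MySQL server, it can be approximated with Python's built-in sqlite3 module. The sketch below is an adaptation, not the program above: SQLite has no CREATE DATABASE or USE, and the column types and AUTOINCREMENT syntax are changed to their SQLite equivalents. The inserted sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# SQLite adaptation of the library schema (types and AUTOINCREMENT differ from MySQL)
conn.executescript("""
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY AUTOINCREMENT,
    title TEXT NOT NULL,
    author TEXT NOT NULL,
    publication_year INTEGER,
    isbn TEXT,
    available INTEGER DEFAULT 1
);
CREATE TABLE members (
    member_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT UNIQUE,
    phone_number TEXT,
    address TEXT
);
CREATE TABLE checkouts (
    checkout_id INTEGER PRIMARY KEY AUTOINCREMENT,
    book_id INTEGER NOT NULL,
    member_id INTEGER NOT NULL,
    checkout_date TEXT NOT NULL,
    return_date TEXT,
    FOREIGN KEY (book_id) REFERENCES books(book_id),
    FOREIGN KEY (member_id) REFERENCES members(member_id)
);
""")

# Invented sample rows
conn.execute("INSERT INTO books (title, author) VALUES (?, ?)", ("Sample Title", "Sample Author"))
conn.execute("INSERT INTO members (name, email) VALUES (?, ?)", ("Sample Member", "member@example.com"))
conn.execute(
    "INSERT INTO checkouts (book_id, member_id, checkout_date) VALUES (1, 1, '2024-01-01')"
)

# A checkout that points at a non-existent book must be rejected
try:
    conn.execute(
        "INSERT INTO checkouts (book_id, member_id, checkout_date) VALUES (999, 1, '2024-01-02')"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
]
print(tables)
print("foreign keys enforced:", fk_enforced)
```

The rejected insert shows what step 4 of the algorithm (defining relationships) buys in practice: the database itself refuses checkouts for books or members that do not exist.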

EXP 5. Design data warehouse for real time applications

AIM:
To design a data warehouse for real-time applications.

ALGORITHM AND PROGRAM:

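1. Data Sources and Integration:

-- Example: Creating a Snowpipe for real-time data ingestion from an external stage.
-- The target table temperature_data (see step 2) must exist before the pipe is created.
-- Positional columns ($1, $2, $3) apply to delimited files, so the file format is CSV;
-- JSON files would need $1:<field> path expressions instead.
CREATE PIPE snowpipe_real_time
    AUTO_INGEST = TRUE
AS
    COPY INTO temperature_data
    FROM (SELECT $1::timestamp, $2::int, $3::float FROM @real_time_stage)
    FILE_FORMAT = (TYPE = 'CSV');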

2. Data Storage and Modeling:

-- Example: Creating tables for storing real-time data
CREATE TABLE temperature_data (
    timestamp TIMESTAMP,
    sensor_id INT,
    temperature FLOAT
);

3. Data Governance and Security:

-- Example: Creating roles and granting privileges
-- (USAGE on the containing schema is also required before the role can query the table)
CREATE ROLE analyst_role;
GRANT USAGE ON DATABASE my_database TO ROLE analyst_role;
GRANT SELECT ON TABLE temperature_data TO ROLE analyst_role;

4. Monitoring and Performance Optimization:

-- Example: Monitoring query performance using Snowflake's query history
SELECT *
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_USER(USER_NAME => 'ANALYST_USER'));

5. Deployment and Testing:

- Deployment would involve setting up Snowflake accounts, databases, and resources,
which are typically done through the Snowflake web interface or via Snowflake's APIs.
Testing would involve validating the data ingestion process, querying data, and ensuring
proper access controls.

6. Training and Documentation:

- Training sessions and documentation would cover topics such as Snowflake SQL syntax,
data modeling best practices, and security principles.

7. Iterative Improvement and Maintenance:

- This would involve ongoing monitoring of system performance, optimizing queries and
data models as needed, and iterating on the data warehouse design based on user feedback
and evolving business requirements.
OUTPUT:

+---------------------+-----------+-------------+
| timestamp           | sensor_id | temperature |
|---------------------+-----------+-------------|
| 2024-02-06 10:00:00 | 1         | 25.5        |
| 2024-02-06 10:01:00 | 2         | 26.3        |
| 2024-02-06 10:02:00 | 1         | 24.8        |
| 2024-02-06 10:02:30 | 3         | 27.1        |
| 2024-02-06 10:03:00 | 2         | 26.7        |
| 2024-02-06 10:04:00 | 1         | 25.2        |
+---------------------+-----------+-------------+

RESULT:
Thus the data warehouse for a real-time application was designed.
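
The ingestion flow can be tried locally without a Snowflake account. The sketch below is a stand-in, not Snowflake code: it uses Python's sqlite3 module to load the six readings from the OUTPUT table in two micro-batches and then summarizes them per sensor. The split into two batches and the helper name ingest are assumptions made for illustration.

```python
import sqlite3

# The six readings from the OUTPUT table, split (by assumption) into two arriving batches
batches = [
    [
        ("2024-02-06 10:00:00", 1, 25.5),
        ("2024-02-06 10:01:00", 2, 26.3),
        ("2024-02-06 10:02:00", 1, 24.8),
    ],
    [
        ("2024-02-06 10:02:30", 3, 27.1),
        ("2024-02-06 10:03:00", 2, 26.7),
        ("2024-02-06 10:04:00", 1, 25.2),
    ],
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temperature_data (timestamp TEXT, sensor_id INTEGER, temperature REAL)"
)


def ingest(batch):
    """Load one micro-batch atomically: either every row is stored or none is."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany("INSERT INTO temperature_data VALUES (?, ?, ?)", batch)


for batch in batches:
    ingest(batch)

total_rows = conn.execute("SELECT COUNT(*) FROM temperature_data").fetchone()[0]
summary = conn.execute(
    "SELECT sensor_id, COUNT(*), ROUND(AVG(temperature), 2) "
    "FROM temperature_data GROUP BY sensor_id ORDER BY sensor_id"
).fetchall()

print("rows loaded:", total_rows)
for sensor_id, readings, avg_temp in summary:
    print(sensor_id, readings, avg_temp)
```

Loading each batch in its own transaction mirrors how a continuous-ingestion pipe commits files independently, so a bad batch does not leave a partially loaded table.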

EXP 6. Analyse the dimensional modeling

AIM:
To analyse dimensional modeling.

ALGORITHM:
1. Identify the business process
2. Identify dimensions and facts
3. Design the dimensional model
4. Define relationships
5. Optimize for query performance

PROGRAM:
1. Sales Fact Table:

CREATE TABLE SalesFact (
    SaleID INT PRIMARY KEY,
    DateID INT,        -- key into DateDim
    ProductID INT,     -- key into ProductDim
    QuantitySold INT,
    AmountSold DECIMAL(10, 2)
);

2. Date Dimension:

CREATE TABLE DateDim (
    DateID INT PRIMARY KEY,
    CalendarDate DATE,
    Day INT,
    Month INT,
    Year INT
);

-- Populate Date Dimension (sample data)

INSERT INTO DateDim (DateID, CalendarDate, Day, Month, Year)
VALUES
    (1, '2024-01-01', 1, 1, 2024),
    (2, '2024-01-02', 2, 1, 2024);
-- Add more dates as needed

3. Product Dimension:

CREATE TABLE ProductDim (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(255),
    Category VARCHAR(50)
    -- Additional attributes as needed
);

-- Populate Product Dimension (sample data)

INSERT INTO ProductDim (ProductID, ProductName, Category)
VALUES
    (101, 'Product A', 'Electronics'),
    (102, 'Product B', 'Clothing');
-- Add more products as needed

4. Query to retrieve sales with date and product details:

SELECT
    s.SaleID,
    d.CalendarDate,
    p.ProductName,
    s.QuantitySold,
    s.AmountSold
FROM
    SalesFact s
    JOIN DateDim d ON s.DateID = d.DateID
    JOIN ProductDim p ON s.ProductID = p.ProductID;

This query retrieves sales information along with the corresponding date and product details
by joining the fact table to its dimension tables. The program above does not insert any rows
into SalesFact, so the output below assumes three sample sales have already been loaded.

OUTPUT:

| SaleID | CalendarDate | ProductName | QuantitySold | AmountSold |
|--------|--------------|-------------|--------------|------------|
| 1      | 2024-01-01   | Product A   | 10           | 100.00     |
| 2      | 2024-01-02   | Product B   | 5            | 50.00      |
| 3      | 2024-01-02   | Product A   | 8            | 80.00      |

RESULT:
Thus the dimensional modeling was analysed successfully.
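
The star schema can be exercised end to end with Python's sqlite3 module. The sketch below is an illustration, not the MySQL program above: it loads the dimension rows from the program and three fact rows inferred from the OUTPUT table, then rolls sales up by product category and by calendar date. The Day, Month and Year columns are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DateDim (DateID INTEGER PRIMARY KEY, CalendarDate TEXT);
CREATE TABLE ProductDim (ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT);
CREATE TABLE SalesFact (
    SaleID INTEGER PRIMARY KEY,
    DateID INTEGER REFERENCES DateDim(DateID),
    ProductID INTEGER REFERENCES ProductDim(ProductID),
    QuantitySold INTEGER,
    AmountSold REAL
);
""")

conn.executemany("INSERT INTO DateDim VALUES (?, ?)", [(1, "2024-01-01"), (2, "2024-01-02")])
conn.executemany(
    "INSERT INTO ProductDim VALUES (?, ?, ?)",
    [(101, "Product A", "Electronics"), (102, "Product B", "Clothing")],
)
# Fact rows inferred from the OUTPUT table (SaleID, DateID, ProductID, quantity, amount)
conn.executemany(
    "INSERT INTO SalesFact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 101, 10, 100.00), (2, 2, 102, 5, 50.00), (3, 2, 101, 8, 80.00)],
)

# Roll up along the product dimension
by_category = conn.execute("""
    SELECT p.Category, SUM(s.QuantitySold), SUM(s.AmountSold)
    FROM SalesFact s
    JOIN ProductDim p ON s.ProductID = p.ProductID
    GROUP BY p.Category
    ORDER BY p.Category
""").fetchall()

# Roll up along the date dimension
by_date = conn.execute("""
    SELECT d.CalendarDate, SUM(s.AmountSold)
    FROM SalesFact s
    JOIN DateDim d ON s.DateID = d.DateID
    GROUP BY d.CalendarDate
    ORDER BY d.CalendarDate
""").fetchall()

print(by_category)
print(by_date)
```

The same fact table answers both questions with one join each, which is the main practical benefit of keeping measures in the fact table and descriptive attributes in the dimensions.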

EXP 7. Case study using OLAP

AIM:
To carry out a case study using OLAP.
Introduction:
In this case study, we will explore how Online Analytical Processing (OLAP)
technology was implemented in a retail data warehousing environment to improve data
analysis capabilities and support decision-making processes. The case study will focus on a
fictional retail company, XYZ Retail, and the challenges they faced in managing and
analyzing their vast amounts of transactional data.

Background:
XYZ Retail is a large chain of stores with locations across the country. The company
has been experiencing rapid growth in recent years, leading to an increase in the volume of
data generated from sales transactions, inventory management, customer interactions, and
other operational activities. The existing data management system was struggling to keep up
with the demand for timely and accurate data analysis, hindering the company's ability to
make informed business decisions.

Challenges:
1. Lack of real-time data analysis: The existing data warehouse system was unable to provide
real-time insights into sales trends, inventory levels, and customer preferences.
2. Limited scalability: The data warehouse infrastructure was reaching its limits in terms of
storage capacity and processing power, making it difficult to handle the growing volume of
data.
3. Complex data relationships: The data stored in the warehouse was highly normalized,
making it challenging to perform complex queries and analyze data across multiple
dimensions.

Solution:
To address these challenges, XYZ Retail decided to implement an OLAP solution as
part of their data warehousing strategy. OLAP technology allows for multidimensional
analysis of data, enabling users to easily slice and dice information across various dimensions
such as time, product categories, geographic regions, and customer segments.

Implementation:
1. Data modeling: The data warehouse was redesigned using a star schema model, which
simplifies data relationships and facilitates OLAP cube creation.

2. OLAP cube creation: OLAP cubes were created to store pre-aggregated data for faster
query performance. The cubes were designed to support various dimensions and measures
relevant to the retail business.
3. Reporting and analysis: Business users were trained on how to use OLAP tools to create
ad-hoc reports, perform trend analysis, and drill down into detailed data.
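
The cube idea in step 2 can be illustrated with a small Python sketch: sales are pre-aggregated for every combination of dimensions, so slice, dice and roll-up queries become simple lookups. The dimension names, the five fact rows and the function build_cube are hypothetical and stand in for XYZ Retail's data; production OLAP engines use far more compact storage.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("year", "category", "region")

# Hypothetical fact rows for the fictional XYZ Retail: (year, category, region, sales)
facts = [
    (2023, "Electronics", "North", 120.0),
    (2023, "Clothing", "North", 80.0),
    (2023, "Electronics", "South", 100.0),
    (2024, "Electronics", "North", 150.0),
    (2024, "Clothing", "South", 60.0),
]


def build_cube(rows):
    """Pre-aggregate sales for every subset of dimensions (every cuboid).

    A dimension that is aggregated away is marked with the placeholder "ALL".
    """
    cube = defaultdict(float)
    n = len(DIMENSIONS)
    for *dims, sales in rows:
        for r in range(n + 1):
            for kept in combinations(range(n), r):
                key = tuple(dims[i] if i in kept else "ALL" for i in range(n))
                cube[key] += sales
    return dict(cube)


cube = build_cube(facts)

grand_total = cube[("ALL", "ALL", "ALL")]            # roll-up over everything
sales_2023 = cube[(2023, "ALL", "ALL")]              # slice on year = 2023
electronics = cube[("ALL", "Electronics", "ALL")]    # roll-up to category level
north_2024_electronics = cube[(2024, "Electronics", "North")]  # one fully specified cell

print(grand_total, sales_2023, electronics, north_2024_electronics)
```

Each query is a dictionary lookup instead of a scan over the fact rows, which is the trade the case study describes: extra storage for pre-aggregated data in exchange for faster analysis.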

Results:
1. Improved data analysis: With OLAP technology in place, XYZ Retail was able to perform
complex analyses on sales data, identify trends, and make informed decisions based on real-
time insights.
2. Faster query performance: OLAP cubes enabled faster query performance compared to
traditional relational databases, allowing users to retrieve data more efficiently.
3. Enhanced decision-making: The ability to analyze data across multiple dimensions helped
XYZ Retail gain a deeper understanding of their business operations and customer behavior,
leading to more strategic decision-making.

Conclusion:
By leveraging OLAP technology in their data warehousing environment, XYZ Retail
was able to overcome the challenges of managing and analyzing vast amounts of data. The
implementation of OLAP not only improved data analysis capabilities but also empowered
business users to make informed decisions based on real-time insights. This case study
demonstrates the value of OLAP in enhancing data analysis and decision-making processes in
a retail environment.

RESULT:
Thus the case study using OLAP was done successfully.

EXP 8. Case study using OLTP

AIM:
To carry out a case study using OLTP.

Introduction:
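The syllabus topic here is OLTP (Online Transaction Processing), the short, concurrent
read/write transactions of operational systems whose data feeds the warehouse. This case study
explores an architecture that it names the Operational Data Layer Pattern, abbreviated OTLP
throughout, and its implementation in a data warehousing environment to improve data
integration, processing, and analytics capabilities. The case study focuses on a fictional
company, Tech Solutions Inc., and how they leveraged OTLP to enhance their data warehousing
operations.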

Background:
Tech Solutions Inc. is a technology consulting firm that provides IT solutions to various
clients. The company collects a vast amount of data from different sources, including
customer interactions, sales transactions, and operational activities. The existing data
warehouse infrastructure was struggling to handle the growing volume of data and provide
real-time insights for decision-making.

Challenges:
1. Data silos: Data from different sources were stored in separate silos, making it difficult to
integrate and analyze data effectively.
2. Real-time data processing: The existing data warehouse was not capable of processing
real-time data streams, leading to delays in data analysis and decision-making.
3. Scalability: The data warehouse infrastructure was reaching its limits in terms of storage
capacity and processing power, hindering the company's ability to scale with the growing
data volume.

Solution:
To address these challenges, Tech Solutions Inc. decided to implement the OTLP pattern in
their data warehousing environment. OTLP combines elements of both Operational Data
Store (ODS) and Traditional Data Warehouse (TDW) architectures to enable real-time data
processing, data integration, and analytical capabilities.

Implementation:
1. Data integration: Tech Solutions Inc. integrated data from various sources into the
operational data layer, where data transformations and cleansing processes were applied.
2. Real-time processing: The OTLP architecture allowed for real-time data processing,
enabling the company to analyze streaming data and generate insights in near real-time.

3. Analytics and reporting: Business users were provided with self-service analytics tools to
create ad-hoc reports, perform trend analysis, and gain actionable insights from the integrated
data.
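
The integration and cleansing step (step 1) can be sketched in Python as follows. This is an illustration under stated assumptions: the two "silos" (a CRM export and a sales export), their field names, the sample records and the choice of the email address as the matching key are all invented for the fictional Tech Solutions Inc.

```python
def clean_email(raw):
    """Cleansing rule: trim whitespace and lower-case so the same address matches."""
    return raw.strip().lower()


# Invented records from two separate source systems
crm_records = [
    {"email": " Alice@Example.com ", "name": "Alice"},
    {"email": "bob@example.com", "name": "Bob"},
]
sales_records = [
    {"email": "alice@example.com", "amount": 120.0},
    {"email": "ALICE@example.com", "amount": 30.0},
    {"email": "carol@example.com", "amount": 75.0},
]


def integrate(crm, sales):
    """Merge both sources into one unified record per customer."""
    unified = {}
    for rec in crm:
        key = clean_email(rec["email"])
        unified[key] = {"email": key, "name": rec["name"], "total_sales": 0.0}
    for rec in sales:
        key = clean_email(rec["email"])
        # A customer seen only in the sales system still gets a unified record
        entry = unified.setdefault(key, {"email": key, "name": None, "total_sales": 0.0})
        entry["total_sales"] += rec["amount"]
    return unified


unified = integrate(crm_records, sales_records)
for record in unified.values():
    print(record)
```

Without the cleansing rule, the three spellings of Alice's address would be treated as different customers, which is exactly the kind of silo mismatch the case study describes.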

Results:
1. Improved data integration: The OTLP architecture facilitated seamless integration of data
from multiple sources, breaking down data silos and enabling a unified view of the
company's operations.
2. Real-time analytics: With OTLP in place, Tech Solutions Inc. was able to analyze
streaming data in real-time, allowing for faster decision-making and response to market
trends.
3. Scalability: The OTLP architecture provided scalability to handle the growing volume of
data, ensuring that the company's data warehousing operations could support future growth.

Conclusion:
By implementing the Operational Data Layer Pattern (OTLP) in their data warehousing
environment, Tech Solutions Inc. was able to overcome the challenges of data silos, real-time
data processing, and scalability. The adoption of OTLP not only improved data integration
and analytics capabilities but also empowered business users to make informed decisions
based on real-time insights. This case study highlights the benefits of leveraging OTLP in
enhancing data warehousing operations for improved business outcomes.

RESULT:
Thus the case study using OLTP was done successfully.

EXP 9. Implementation of warehouse testing

AIM:
To implement warehouse testing.

Steps with program:

1. Install necessary libraries:
pip install pytest pandas

2. Create a Python script for data transformation and loading:

# data_transformation.py
import pandas as pd


def transform_data(input_data):
    # Perform data transformation logic here (this sample doubles every value)
    transformed_data = input_data.apply(lambda x: x * 2)
    return transformed_data


def load_data(transformed_data):
    # Load transformed data into the operational data layer (here, a CSV file)
    transformed_data.to_csv('transformed_data.csv', index=False)

3. Create test cases using pytest:

# test_data_integration.py
import pandas as pd
import data_transformation


def test_transform_data():
    # The transformation should double every value in every column
    input_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    expected_output = pd.DataFrame({'A': [2, 4, 6], 'B': [8, 10, 12]})
    transformed_data = data_transformation.transform_data(input_data)
    assert transformed_data.equals(expected_output)


def test_load_data():
    # Round-trip check: data written to the CSV file must read back unchanged
    input_data = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
    data_transformation.load_data(input_data)
    loaded_data = pd.read_csv('transformed_data.csv')
    assert input_data.equals(loaded_data)

4. Run the tests using pytest:

pytest test_data_integration.py

5. Analyze the test results to ensure that the data transformation and loading processes are
functioning correctly in the operational data layer.

By implementing automated tests for data integration processes in the data warehousing
environment, you can ensure the accuracy and reliability of the data transformation and
loading operations. This approach helps in identifying any issues or discrepancies early on in
the development cycle, leading to a more robust and efficient data warehousing system.
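
Beyond testing the transformation code, warehouse testing usually also checks the loaded data itself. The sketch below shows three such checks in plain Python, offered as a complement to the pytest example rather than part of it: row-count reconciliation between source and target, a not-null check and a uniqueness check. The function names and the sample rows are hypothetical.

```python
def row_counts_match(source_rows, target_rows):
    """Reconciliation: every source row should have arrived in the target."""
    return len(source_rows) == len(target_rows)


def find_nulls(rows, column):
    """Return the positions of rows whose `column` is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def find_duplicates(rows, key):
    """Return key values that occur more than once (each repeat is listed)."""
    seen, duplicates = set(), []
    for row in rows:
        value = row[key]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates


# Invented source extract and (deliberately faulty) loaded target
source = [
    {"sale_id": 1, "amount": 100.0},
    {"sale_id": 2, "amount": 50.0},
    {"sale_id": 3, "amount": 80.0},
]
target = [
    {"sale_id": 1, "amount": 100.0},
    {"sale_id": 2, "amount": None},   # amount lost during loading
    {"sale_id": 2, "amount": 50.0},   # duplicate key
]

counts_ok = row_counts_match(source, target)
null_positions = find_nulls(target, "amount")
duplicate_keys = find_duplicates(target, "sale_id")

print("row counts match:", counts_ok)
print("null amounts at rows:", null_positions)
print("duplicate sale_id values:", duplicate_keys)
```

Note that the row counts match here even though the load is wrong, which is why a count check alone is not sufficient and is normally combined with null and uniqueness checks.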

OUTPUT:

RESULT:
Thus the implementation of warehouse testing was done successfully.

No ratings yet
18CS42 Design and Analysis of Algorithms
16 pages
Cloud Computing QB
No ratings yet
Cloud Computing QB
3 pages
Problem Solving and Python Programming L T P C
No ratings yet
Problem Solving and Python Programming L T P C
1 page
Optimizing Hadoop for MapReduce
From Everand
Optimizing Hadoop for MapReduce
Khaled Tannir
No ratings yet
DW lab manual (4)
No ratings yet
DW lab manual (4)
39 pages
Week 2 CISSP Study Group
No ratings yet
Week 2 CISSP Study Group
75 pages
Data Science With R Workflow: Click The Links For Documentation
No ratings yet
Data Science With R Workflow: Click The Links For Documentation
3 pages
DBMS Syallabus
No ratings yet
DBMS Syallabus
1 page
Titant: Online Real-Time Transaction Fraud Detection in Ant Financial
No ratings yet
Titant: Online Real-Time Transaction Fraud Detection in Ant Financial
12 pages
NodeJs Notes
No ratings yet
NodeJs Notes
28 pages
MySQL Partitioning
100% (12)
MySQL Partitioning
43 pages
Blind XPath Injection
No ratings yet
Blind XPath Injection
11 pages
Primary vs. Secondary Advantages and Disadvantages of Secondary Data Classification of Secondary Data
No ratings yet
Primary vs. Secondary Advantages and Disadvantages of Secondary Data Classification of Secondary Data
9 pages
Data Science Using R
No ratings yet
Data Science Using R
74 pages
BBA Syllabus 1st, 2nd, 3rd & 4th Sem
50% (2)
BBA Syllabus 1st, 2nd, 3rd & 4th Sem
46 pages
CA4 KQSProg Guide Rev G
No ratings yet
CA4 KQSProg Guide Rev G
29 pages
Oracle Database 12c & Oracle Database 12C Rac On Ibm Aix: Tips and Considerations
No ratings yet
Oracle Database 12c & Oracle Database 12C Rac On Ibm Aix: Tips and Considerations
31 pages
"Online Bus Reservation System": A Project Report ON
No ratings yet
"Online Bus Reservation System": A Project Report ON
46 pages
Documentation & Data Dictionary - IMDb and Box Office Mojo
No ratings yet
Documentation & Data Dictionary - IMDb and Box Office Mojo
45 pages
Chapter 4 - Relational Database (Part 1)
100% (1)
Chapter 4 - Relational Database (Part 1)
11 pages
MySQL Tutorial
No ratings yet
MySQL Tutorial
176 pages
Lotus Domino and Visual Basic PDF
100% (1)
Lotus Domino and Visual Basic PDF
164 pages
Unit-1 Transparency in DDBMS
No ratings yet
Unit-1 Transparency in DDBMS
15 pages
Master Thesis Lab Inventory System
No ratings yet
Master Thesis Lab Inventory System
92 pages
Geospatial Information Systems For Supply Chain
No ratings yet
Geospatial Information Systems For Supply Chain
17 pages
Домашнее задание 2
No ratings yet
Домашнее задание 2
11 pages
DSCI 320 Test 1 - Practice Tests Chs 1-8
80% (5)
DSCI 320 Test 1 - Practice Tests Chs 1-8
14 pages
Smart Map IJCSEI 19
No ratings yet
Smart Map IJCSEI 19
21 pages
Sugar Setup Wizard Confirm Settings
No ratings yet
Sugar Setup Wizard Confirm Settings
3 pages
Larsa4d Usermanual
No ratings yet
Larsa4d Usermanual
330 pages
CV-RPA Solution Architect-AA
No ratings yet
CV-RPA Solution Architect-AA
5 pages
Databricks Associate Data Engineer Notes
No ratings yet
Databricks Associate Data Engineer Notes
39 pages
HP SM DocumentEngine
No ratings yet
HP SM DocumentEngine
76 pages
Dice Resume CV Karthik S
No ratings yet
Dice Resume CV Karthik S
4 pages

CCS341-DATA Warehouse LAB Experiments

data warehousing lab (Anna University)

P.S.V. COLLEGE OF ENGINEERING AND TECHNOLOGY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

(THIRD YEAR VI SEM)

LABORATORY SUBJECT CODE/ NAME : CCS341/DATA WAREHOUSING

Vision and Mission of the Institute

Vision and Mission of the Department

Program Educational Objectives (PEOs)

Program Specific Outcomes (PSOs)

Program Outcomes (POs)

Engineering Graduates will be able to:

1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering

9. Individual and Team Work: Function effectively as an individual, and as a member or

EXP 1. Data exploration and integration with WEKA

Load each dataset and observe the following:

i. List the attribute names and their types

There are a total of 150 records (instances) in the dataset (IRIS.arff).

iii. Identify the class attribute (if any)

There is one class attribute which consists of 3 labels.

iv. Plot Histogram

v. Determine the number of records for each class.

vi. Visualize the data in various dimensions
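The observations above (attribute names and types, number of instances, records per class) can also be checked outside the WEKA Explorer. The following is a minimal sketch, not the lab's prescribed method: it uses a small hand-written parser that assumes a simple ARFF file with no quoted values or sparse rows, and a six-row sample in the layout of IRIS.arff rather than the full 150-instance file.

```python
from collections import Counter

# Six-row sample in ARFF layout, used for illustration only;
# the IRIS.arff dataset described in the text has 150 instances.
SAMPLE_ARFF = """\
@relation iris
@attribute sepallength numeric
@attribute sepalwidth numeric
@attribute petallength numeric
@attribute petalwidth numeric
@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}
@data
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def parse_arff(text):
    """Return (attributes, rows) for a simple ARFF string.

    Assumes no quoted values and no sparse rows.
    """
    attributes = []  # list of (name, type) pairs
    rows = []        # list of value lists, one per instance
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        lower = line.lower()
        if in_data:
            rows.append(line.split(","))
        elif lower.startswith("@attribute"):
            _, name, attr_type = line.split(None, 2)
            attributes.append((name, attr_type))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_arff(SAMPLE_ARFF)

# Treat the last attribute as the class, as in IRIS.arff.
class_counts = Counter(row[-1] for row in rows)

for name, attr_type in attributes:
    print(f"{name}: {attr_type}")
print("instances:", len(rows))
print("records per class:", dict(class_counts))
```

To run it against the real dataset, replace `SAMPLE_ARFF` with the contents of the file, for example `open("IRIS.arff").read()` (the path is an assumption).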

EXP 2. Apply WEKA tool for Data Validation

Using Cross-Validation Strategy with 10 folds:

Using Cross-Validation Strategy with 20 folds:
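To make the difference between 10 and 20 folds concrete, the sketch below shows how k-fold cross-validation partitions a 150-instance dataset into training and test sets. It is an illustration only: the function name `k_fold_indices` and the fixed seed are assumptions, and this plain split does not stratify by class, so it is not a reproduction of what WEKA does internally.

```python
import random


def k_fold_indices(n_instances, k, seed=1):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set is every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for k in (10, 20):
    sizes = [len(test) for _, test in k_fold_indices(150, k)]
    print(f"{k} folds -> test fold sizes {sizes}")
```

With more folds, each model is trained on a larger share of the data and tested on a smaller held-out set, at the cost of training more models.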

EXP 3. Plan the architecture for a real-time application

Here are the steps to plan the architecture:

RESULT: Thus, the architecture for a real-time application was planned.

EXP 4. Write the query for schema definition

-- Create a new database named "library"

-- Switch to the "library" database

-- Define the schema for the "books" table

-- Define the schema for the "members" table

-- Define the schema for the "checkouts" table

member_id INT NOT NULL,

Database 'library' created.

Database changed to 'library'.

Table 'books' created successfully.

Table 'members' created successfully.

Table 'checkouts' created successfully.
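The schema steps above can be tried without a database server. The sketch below is one possible version, not the manual's exact script: it uses Python's built-in `sqlite3` module with an in-memory database in place of the "library" database, so the create-database and switch-database steps have no equivalent here. Apart from `member_id`, every column name is an assumption chosen for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the "library" database.
# SQLite has no CREATE DATABASE / USE statements, so those steps are skipped.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Column names other than member_id are assumptions for illustration.
conn.executescript("""
-- Define the schema for the "books" table
CREATE TABLE books (
    book_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    author  TEXT NOT NULL
);

-- Define the schema for the "members" table
CREATE TABLE members (
    member_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);

-- Define the schema for the "checkouts" table
CREATE TABLE checkouts (
    checkout_id   INTEGER PRIMARY KEY,
    book_id       INT NOT NULL,
    member_id     INT NOT NULL,
    checkout_date TEXT NOT NULL,
    FOREIGN KEY (book_id) REFERENCES books (book_id),
    FOREIGN KEY (member_id) REFERENCES members (member_id)
);
""")

# List the tables that now exist to confirm the schema was created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The two foreign keys make `checkouts` the linking table between `books` and `members`, which is why it is defined last.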

EXP 5. Design a data warehouse for real-time applications

ALGORITHM AND PROGRAM:

2. *Data Storage and Modeling*:

3. *Data Governance and Security*:

4. *Monitoring and Performance Optimization*:

5. *Deployment and Testing*:

6. *Training and Documentation*:

7. *Iterative Improvement and Maintenance*:

2. Data Storage and Modeling:

3. Data Governance and Security:

4. Monitoring and Performance Optimization:

5. Deployment and Testing:

6. Training and Documentation:

7. Iterative Improvement and Maintenance:

4. Query to retrieve sales with date and product details: