
JANSONS INSTITUTE OF TECHNOLOGY

Karumathampatti, Coimbatore

Department of Computer Science and Engineering

Laboratory Record

CCS341 - DATA WAREHOUSING

Name :
Register No. :
Branch : B.E. – Computer Science and Engineering
Semester : V
Academic Year : 2023-2024

JANSONS INSTITUTE OF TECHNOLOGY
Karumathampatti, Coimbatore

Department of Computer Science and Engineering

Certified that this is the bonafide record of laboratory work done by


_________________________________________________________
in the course CCS341 - DATA WAREHOUSING during the academic year
2023-2024.

Faculty In-charge Head of Department

The record is submitted for III / V (Year/Semester) B.E. Practical Examination


of Anna University conducted on___________________

Internal Examiner External Examiner

INDEX

EXP NO   DATE   NAME OF THE EXPERIMENT                               PAGE NO   SIGN

1               Data Exploration and Integration with WEKA           4
2               Applying WEKA Tool for Data Validation               11
3               Plan the Architecture for a Real-Time Application    14
4               Write the Query for Schema Definition                18
5               Design a Data Warehouse for Real-Time Applications   20
6               Analyze the Dimensional Modeling                     24
7               Case Study using OLAP                                27
8               Case Study using OLTP                                33
9               Implementation of Warehouse Testing                  39
10 (a)          Conversion of Text File to ARFF File                 42
10 (b)          Conversion of ARFF File to Text File                 45

EX.NO:1 DATE:

Data exploration and integration with WEKA

AIM: To investigate data exploration and integration with WEKA.

Introduction :
Weka (pronounced to rhyme with Mecca) is a workbench that contains a collection of
visualization tools and algorithms for data analysis and predictive modeling, together with
graphical user interfaces for easy access to these functions. The original non-Java version of
Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other
programming languages, plus data preprocessing utilities in C, and a Makefile-based system for
running machine learning experiments. This original version was primarily designed as a tool for
analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3),
for which development started in 1997, is now used in many different application areas, in
particular for educational purposes and research. Advantages of Weka include:
▪ Free availability under the GNU General Public License.
▪ Portability, since it is fully implemented in the Java programming language and thus runs
on almost any modern computing platform
▪ A comprehensive collection of data preprocessing and modeling techniques
▪ Ease of use due to its graphical user interfaces

Description:
Open the program. Once the program has been loaded on the user’s machine, it is opened by
navigating to the program’s start option, which will depend on the user’s operating system.
Figure 1.1 is an example of the initial opening screen on a computer.
There are four options available on this initial screen.

Fig: 1.1 Weka GUI

1. Explorer - the graphical interface used to conduct experimentation on raw data. After
clicking the Explorer button, the Weka Explorer interface appears.

Fig: 1.2 Pre-processor

Inside the Weka Explorer window there are six tabs:
1. Preprocess - used to choose the data file to be used by the application.
Open File - allows the user to select files residing on the local machine or recorded medium.
Open URL - provides a mechanism to locate a file or data source from a different location
specified by the user.
Open Database - allows the user to retrieve files or data from a database source provided by the user.

2. Classify - used to test and train different learning schemes on the preprocessed data file under
experimentation.

Fig: 1.3 Choosing ZeroR from the Classify tab

Again, there are several options to be selected inside the Classify tab. The Test options panel gives
the user the choice of four different test modes for the data set:
1. Use training set
2. Supplied test set
3. Cross-validation
4. Percentage split

3. Cluster - used to apply different tools that identify clusters within the data file.
The Cluster tab opens the process that is used to identify commonalities or clusters of occurrences
within the data set and produce information for the user to analyze.

4. Association - used to apply different rules to the data file that identify associations within the
data. The Associate tab opens a window to select the options for associations within the dataset.

5. Select attributes - used to apply different rules to reveal changes based on a selected attribute's
inclusion in or exclusion from the experiment.

6. Visualize - used to see what the various manipulations produced on the data set in a 2D format,
as scatter plot and bar graph output.

2. Experimenter - this option allows users to conduct different experimental variations on data

sets and perform statistical manipulation. The Weka Experiment Environment enables the user to
create, run, modify, and analyze experiments in a more convenient manner than is possible when
processing the schemes individually. For example, the user can create an experiment that runs
several schemes against a series of datasets and then analyze the results to determine if one of the
schemes is (statistically) better than the other schemes.

Fig: 1.6 Weka experiment

Results destination: ARFF file, CSV file, JDBC database.


Experiment type: Cross-validation (default), Train/Test Percentage Split (data randomized).
Iteration control: Number of repetitions, Data sets first/Algorithms first.
Algorithms: filters

3. Knowledge Flow - basically the same functionality as the Explorer, with a drag-and-drop
interface. The advantage of this option is that it supports incremental learning from
previous results.
4. Simple CLI - provides users the ability to execute Weka commands from a terminal
window, without using the graphical interface.
b. Explore the default datasets in the Weka tool.

Click the “Open file…” button to open a data set and double-click on the “data”
directory. Weka provides a number of small, common machine learning datasets that
you can use to practice on. Select the “iris.arff” file to load the Iris dataset.

Fig: 1.7 Different Data Sets in weka

Result:
Thus, data exploration and integration with WEKA were studied.

EX.NO:2 DATE:

Applying Weka Tool for Data Validation

AIM:
To use the Weka tool for data validation. Students will learn how to load datasets into
Weka, explore data characteristics, handle missing values, detect outliers, and ensure data
quality through various preprocessing techniques.

Algorithm:
Load Dataset Algorithm:
● Open Weka.
● Navigate to the "Explorer" tab.
● Click on the "Open file" button and select a dataset (e.g., from the Weka
datasets or upload a CSV file).
Missing Values Handling Algorithm:
● Explore the dataset using the "Preprocess" panel in the "Explorer" tab.
● Identify missing values using the "Edit" option or utilize filters for missing
value identification.
● Choose an imputation strategy (e.g., mean, median) and apply it using Weka's
preprocessing tools.
Outlier Detection Algorithm:
● Visualize potential outliers using the "Visualize" panel in the "Explorer" tab.
● Utilize scatter plots or box plots to identify outliers.
● Apply filters or transformations to handle outliers using Weka's preprocessing
capabilities.
Data Quality Assurance Algorithm:
● Evaluate data quality using summary statistics and visualizations.
● Apply filters for data quality assurance, such as removing instances with
inconsistent data.
● Examine the impact of filters on data consistency, accuracy, and reliability.
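The same checks can also be prototyped outside the Weka GUI. Below is a minimal pandas sketch, assuming the dataset has been exported from Weka as a CSV file named dataset.csv and contains a numeric column called age (both names are hypothetical):

import pandas as pd

# Load the CSV export of the dataset (hypothetical file name)
df = pd.read_csv("dataset.csv")

# 1. Missing-value identification and mean imputation
print(df.isnull().sum())                        # count of missing values per column
df["age"] = df["age"].fillna(df["age"].mean())  # impute a numeric column with its mean

# 2. Simple outlier detection with the IQR rule
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
print(f"Potential outliers: {len(outliers)} rows")

# 3. Basic data-quality summary
print(df.describe())
print(f"Duplicate rows: {df.duplicated().sum()}")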

Procedure:
Introduction to Weka:
● Install Weka on your system following the provided guidelines.
● Launch Weka and explore the user interface.

Loading Datasets:
● Load a sample dataset into Weka.
● Examine the dataset details using the "Explorer" tab.

Handling Missing Values:
● Identify and visualize missing values.
● Apply imputation techniques to handle missing values.

Outlier Detection:
● Visualize potential outliers.
● Apply outlier handling techniques using Weka's filters.

Data Quality Assurance:


● Evaluate data quality using summary statistics.
● Apply filters for data quality assurance.

Documentation and Analysis:


● Document each step taken during data validation using Weka.
● Include screenshots or outputs from Weka to support documentation.
● Discuss any challenges encountered and solutions devised.
● Reflect on the importance of data validation in machine learning and data analysis.

Output:

Result:
Thus, the program for applying the Weka tool for data validation is implemented successfully
and the output is verified.

EX.NO:3 DATE:

Planning the Architecture of a Real-Time Application

AIM:

To guide the process of planning the architecture for a real-time application in data
warehousing.

Procedure:

Requirements Definition Algorithm:


● Identify the goals of the real-time application.
● Determine the types of data to be processed and the analytics to be performed.
● Specify the data sources and their characteristics.
Architecture Design Algorithm:
● Define the flow of data from source to destination.
● Integrate Weka into the architecture for preprocessing tasks.
● Specify the role of Python in real-time analytics.
● Design the architecture for scalability and optimal performance.
Implementation Steps Algorithm:
● Use Weka for data preprocessing:
● Load the data.
● Handle missing values using Weka's imputation techniques.
● Detect outliers and normalize data.
● Implement real-time analytics with Python:
● Set up data streaming using libraries like Kafka or RabbitMQ.
● Perform feature engineering and data analysis using Pandas and
NumPy.
● Deploy machine learning models for real-time predictions.
Documentation and Best Practices Algorithm:
● Document the entire architecture, including Weka configurations and Python
code.
● Emphasize best practices such as data security, error handling, and
performance optimization.

Code:

1. Data Preprocessing with Weka:


# Step 1: Load the data into Weka
java -cp weka.jar weka.gui.explorer.Explorer

# Step 2: Use Weka's filters for handling missing values
java -cp weka.jar weka.filters.unsupervised.attribute.ReplaceMissingValues -i input.arff -o output.arff

# Step 3: Apply filters for outlier detection and normalization
java -cp weka.jar weka.filters.unsupervised.attribute.Standardize -i input.arff -o normalized_output.arff

2. Real-time Analytics with Python:
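The original record shows this part only as screenshots. For reference, here is a minimal sketch of the streaming and analysis steps described in the algorithm, assuming the kafka-python package, a local broker, and a hypothetical 'orders' topic carrying JSON messages with quantity and unit_price fields (all names are assumptions):

import json

import pandas as pd
from kafka import KafkaConsumer  # assumes the kafka-python package and a broker on localhost

# Consume real-time order events from a hypothetical 'orders' topic
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

buffer = []
for message in consumer:
    buffer.append(message.value)
    if len(buffer) >= 100:                                     # process in small batches
        df = pd.DataFrame(buffer)

        # Simple feature engineering and analysis with pandas
        df["order_value"] = df["quantity"] * df["unit_price"]  # hypothetical columns
        print(df.groupby("product_id")["order_value"].sum())

        buffer.clear()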

Output:

Step 1: Load the Data into Weka


● Weka Explorer Interface:
● Loaded dataset: example_dataset.arff

Step 2: Handle Missing Values with Weka


● Command Line Output:

java -cp weka.jar weka.filters.unsupervised.attribute.ReplaceMissingValues -i example_dataset.arff -o example_dataset_no_missing.arff

● Successfully replaced missing values in the dataset.

Step 3: Apply Filters for Outlier Detection and Normalization


● Command Line Output:

java -cp weka.jar weka.filters.unsupervised.attribute.Standardize -i example_dataset_no_missing.arff -o example_dataset_normalized.arff

● Outliers detected and normalized dataset saved as example_dataset_normalized.arff.

Python Sample Output:

Step 1: Set Up Data Streaming with Kafka


● Command Line Output:

● Producer script running, streaming real-time data to Kafka topic.

Step 2: Perform Feature Engineering and Data Analysis in Python


● Python Code Output:

Step 3: Deploy Machine Learning Models with Flask


● Command Line Output

● Flask app running, ready to serve machine learning models.
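The Flask application itself is not reproduced in the record. A minimal sketch of such a prediction endpoint is shown below (the model file name, route, and a scikit-learn style model are assumptions):

import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Load a previously trained model (hypothetical file produced during training)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]        # e.g. {"features": [35, 40000]}
    prediction = model.predict([features]).tolist()  # real-time prediction
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=5000)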

Result:
Thus the program has been completed successfully and the output is verified.

EX.NO:4 DATE:

Write the query for schema definition

Aim:
Create a schema definition query for a simple database that stores information about
books, including their title, author, publication year, and genre.

Algorithm:

Step 1: Identify the entities: In this case, you have a "Book" entity.
Step 2: Define attributes: Determine the attributes each book should have, such as title,
author, publication year, and genre.
Step 3: Determine data types: Choose appropriate data types for each attribute
(e.g., VARCHAR for title and author, INTEGER for publication year).
Step 4: Set primary key: Decide on a primary key for the table (e.g., book_id).
Step 5: Establish relationships (if applicable): If there are multiple tables, define
relationships between them.
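For reference, a minimal sketch of the schema-definition query described by these steps is given below. It uses Python's built-in sqlite3 module so it runs self-contained; in MySQL the same definition would typically use VARCHAR and INT column types.

import sqlite3

# Create (or open) a small database file and define the Books table
conn = sqlite3.connect("books.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS Books (
        book_id          INTEGER PRIMARY KEY,
        title            TEXT    NOT NULL,
        author           TEXT    NOT NULL,
        publication_year INTEGER,
        genre            TEXT
    )
""")
conn.commit()
conn.close()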

Program:
# generate_arff.py

import random

# Function to generate a random book title
def generate_title():
    adjectives = ["The", "A", "My", "Your", "His", "Her"]
    nouns = ["Catcher", "Mockingbird", "1984", "Pride", "Prejudice", "Hobbit"]
    return f"{random.choice(adjectives)} {random.choice(nouns)}"

# Function to generate a random author name
def generate_author():
    first_names = ["John", "Jane", "Robert", "Emily", "William", "Mary"]
    last_names = ["Smith", "Johnson", "Williams", "Jones", "Brown", "Davis"]
    return f"{random.choice(first_names)} {random.choice(last_names)}"

# Function to generate a random publication year
def generate_publication_year():
    return random.randint(1800, 2022)

# Function to generate a random genre
def generate_genre():
    genres = ["Fiction", "Dystopian", "Romance", "Mystery", "Fantasy", "Science Fiction"]
    return random.choice(genres)

# Generate the ARFF header
arff_content = '''@relation Books

@attribute title string
@attribute author string
@attribute publication_year numeric
@attribute genre string

@data
'''

# Generate 50 data entries
for _ in range(50):
    arff_content += f"'{generate_title()}','{generate_author()}',{generate_publication_year()},'{generate_genre()}'\n"

# Write content to ARFF file
with open('books_dataset.arff', 'w') as file:
    file.write(arff_content)

print("ARFF file generated successfully: books_dataset.arff")

To run the above program:


Step 1: Save it as generate_arff.py.
Step 2: Open a terminal or command prompt.
Step 3: Navigate to the directory where the script is saved.
Step 4: Run python generate_arff.py

Sample output:
In bash
ARFF file generated successfully: books_dataset.arff

Result:
Once you execute the query, the result would be the creation of a table in your database
named "Books" with the defined structure. You can then use this table to store
information about books in your database.

EX.NO:5 DATE:

Design a data warehouse for real-time applications

Aim:

Design a data warehouse for real-time applications, focusing on building a star


schema and implementing a basic OLAP cube.

Algorithm:

Step 1: Data Modeling:


○ Identify key entities (e.g., products, customers, orders).
○ Define dimensions and fact tables in a star schema.
Step 2: ETL Process:
○ Extract real-time data from the source (e.g., incoming orders).
○ Transform and clean the data as needed.
○ Load the data into the data warehouse.
Step 3: OLAP Implementation:
○ Aggregate sales data based on dimensions (e.g., product, customer).
○ Build an OLAP cube using aggregated data.
Step 4: Querying the Data Warehouse:
○ Formulate queries based on business requirements.
○ Execute queries against the data warehouse to extract insights.
Step 5: Optimization:
○ Analyze query patterns to identify indexing needs.
○ Create indexes on key columns for faster querying.
Step 6: Display Sample Output:
○ Execute a sample query to retrieve a subset of data.
○ Display the results to showcase the data warehouse's capabilities.
Step 7: Conclusion and Further Steps:
○ Summarize key learnings from each step in the data warehouse design.
○ Suggest further steps, such as advanced ETL processes or additional data
source integration.
Step 8: Cleanup and Closure:
○ Close any open database connections.
○ Clean up temporary files or resources used in the lab.

Program:

Step 1: Data Modeling:

import pandas as pd

# Create sample dataframes for products, customers, and orders
products_df = pd.DataFrame({'product_id': [1, 2, 3], 'product_name': ['A', 'B', 'C']})
customers_df = pd.DataFrame({'customer_id': [101, 102, 103], 'customer_name': ['John', 'Alice', 'Bob']})
orders_df = pd.DataFrame({'order_id': [201, 202], 'product_id': [1, 2], 'customer_id': [101, 102]})

# Merge dataframes to create the star schema
star_schema = pd.merge(orders_df, products_df, on='product_id').merge(customers_df, on='customer_id')
print(star_schema)

Step 2: ETL Process
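The ETL code appears in the record only as a screenshot. A minimal pandas sketch of the idea is shown below; it reuses the dataframes from Step 1, assumes a hypothetical sales amount per order, and produces the aggregated sales_data used in Step 3.

# Extract: in a real system this would read from the live order stream
new_orders = pd.DataFrame({
    'order_id': [203, 204],
    'product_id': [3, 1],
    'customer_id': [103, 101],
    'sales': [250.0, 120.0],     # hypothetical sales amount per order
})

# Transform: join with the dimension tables and drop incomplete rows
fact_sales = (new_orders
              .merge(products_df, on='product_id')
              .merge(customers_df, on='customer_id')
              .dropna())

# Load: aggregate into the warehouse-side sales table
sales_data = fact_sales.groupby(['product_name', 'customer_name'], as_index=False)['sales'].sum()
print(sales_data)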

Step 3: OLAP Implementation


from pycubelib import PyCube
# Assuming we have aggregated sales data
sales_data = aggregate_sales_data()

# Build OLAP cube


cube = PyCube.build_cube(sales_data, dimensions=['product', 'customer'], measures=['sales'])

Step 4: Querying the Data Warehouse


# Assuming 'cube' is the OLAP cube built in Step 3
# Example query to get total sales per product and customer
query_result = cube.query("SELECT product, customer, SUM(sales) FROM cube GROUP BY product, customer")

# Displaying the sample output
print("Product\t\tCustomer\tTotal Sales")
print("-----------------------------------")
for row in query_result:
    print(f"{row['product']}\t\t{row['customer']}\t\t{row['SUM(sales)']}")

Sample output:

Step 1:

Step 2:

Step 3:

Step 4:

Result:

Thus, the data warehouse for the real-time application is designed successfully.

EX.NO:6 DATE:

Analyze the dimensional Modeling

AIM:

To design multi-dimensional data models - Star, Snowflake and Fact Constellation


schemas for sales enterprise data

Procedure:
A schema is a logical description of the entire database. It includes the name and description
of records of all record types, including all associated data items and aggregates. Much like
a database, a data warehouse also requires a schema to be maintained. A database uses the relational
model, while a data warehouse uses the Star, Snowflake, or Fact Constellation schema.
Star Schema

• Each dimension in a star schema is represented with only one-dimension table.


• This dimension table contains the set of attributes.
• The following diagram shows the sales data of a company with respect to the four
dimensions, namely time, item, branch, and location.

• There is a fact table at the center. It contains the keys to each of the four dimensions.
• The fact table also contains the attributes, namely dollars sold and units sold.
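A minimal DDL sketch of this star schema is given below (sqlite3 syntax for a self-contained run; the column lists are assumed for illustration). The fact table holds the four dimension keys plus the measures dollars_sold and units_sold.

import sqlite3

conn = sqlite3.connect("sales_dw.db")
conn.executescript("""
    CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY, day TEXT, month TEXT, quarter TEXT, year INTEGER);
    CREATE TABLE dim_item     (item_key     INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT, supplier_type TEXT);
    CREATE TABLE dim_branch   (branch_key   INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
    CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT, province TEXT, country TEXT);

    CREATE TABLE fact_sales (
        time_key     INTEGER REFERENCES dim_time(time_key),
        item_key     INTEGER REFERENCES dim_item(item_key),
        branch_key   INTEGER REFERENCES dim_branch(branch_key),
        location_key INTEGER REFERENCES dim_location(location_key),
        dollars_sold REAL,
        units_sold   INTEGER
    );
""")
conn.close()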

Snowflake Schema

• Some dimension tables in the Snowflake schema are normalized.


• The normalization splits up the data into additional tables.
• Unlike the star schema, the dimension tables in a snowflake schema are normalized.
For example, the item dimension table in the star schema is normalized and split into
two dimension tables, namely the item and supplier tables.

• Now the item dimension table contains the attributes item_key, item_name, type,
brand, and supplier_key.
• The supplier key is linked to the supplier dimension table. The supplier dimension
table contains the attributes supplier_key and supplier_type.

Fact Constellation Schema

• A fact constellation has multiple fact tables. It is also known as galaxy schema.
• The following diagram shows two fact tables, namely sales and shipping.

• The sales fact table is the same as that in the star schema.
• The shipping fact table has the five dimensions, namely item_key, time_key,
shipper_key, from_location, to_location.
• The shipping fact table also contains two measures, namely dollars sold and units
sold.
• It is also possible to share dimension tables between fact tables. For example, time,
item, and location dimension tables are shared between the sales and shipping fact
table.

Result:
Star, Snowflake and Fact Constellation schemas for sales enterprise data have been
designed using the DBDesigner tool.

EX.NO:7 DATE:

Case study using OLAP

AIM:

Develop a comprehensive case study utilizing OLAP technology to analyze and showcase
its effectiveness in data management and decision-making processes.

Procedure:

1. OLAP Operations:
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP
operations on multidimensional data. Here is the list of OLAP operations:
a. Roll-up (Drill-up)
b. Drill-down
c. Slice and dice
d. Pivot (rotate)
2. Roll-up (Drill-up):
a. Roll-up performs aggregation on a data cube in any of the following ways:
b. By climbing up a concept hierarchy for a dimension
c. By dimension reduction
d. In the example, roll-up is performed by climbing up a concept hierarchy for the dimension
location.
e. Initially the concept hierarchy was "street < city < province < country".
f. On rolling up, the data is aggregated by ascending the location hierarchy from
the level of city to the level of country.
g. The data is grouped into countries rather than cities.
h. When roll-up is performed, one or more dimensions from the data cube are
removed.
3. Drill-down:
a. Drill-down is the reverse operation of roll-up. It is performed by either of the
following ways:
b. By stepping down a concept hierarchy for a dimension
c. By introducing a new dimension
d. In the example, drill-down is performed by stepping down a concept hierarchy for the
dimension time.
e. Initially the concept hierarchy was "day < month < quarter < year".
f. On drilling down, the time dimension is descended from the level of quarter
to the level of month.
g. When drill-down is performed, one or more dimensions are added to the data cube.
h. It navigates the data from less detailed data to highly detailed data.
4. Slice:
a. The slice operation selects one particular dimension from a given cube and provides a
new sub-cube.
5. Dice:
a. Dice selects two or more dimensions from a given cube and provides a new
sub-cube.
6. Pivot (rotate):
a. The pivot operation is also known as rotation. It rotates the data axes in view in order to
provide an alternative presentation of data.

Now, we practically implement all these OLAP operations using Microsoft Excel.

Procedure for OLAP Operations:
1. Open Microsoft Excel, go to the Data tab at the top and click on "Existing Connections".
2. The Existing Connections window opens; click the "Browse for more" option to import a
.cub extension file for performing OLAP operations. For the sample, the music.cub file is used.
i. music.cub file.

3. As shown in the above window, select "PivotTable Report" and click "OK".
4. We now have all the music.cub data for analyzing the different OLAP operations. First, we
perform the drill-down operation as shown below.

In the above window, we selected the year '2008' in the 'Electronic' category; the Drill-Down
option is then automatically enabled in the top navigation options. Clicking the 'Drill-Down'
option displays the window shown below.

5. Now we perform the roll-up (drill-up) operation. In the above window, we selected the month
of January; the Drill-up option is then automatically enabled at the top. Clicking the Drill-up
option displays the window shown below.

6. The next OLAP operation, slicing, is performed by inserting a slicer, as shown in the top
navigation options.

While inserting slicers for the slicing operation, we select only 2 dimensions (e.g.
CategoryName and Year) with one measure (e.g. Sum of Sales). After inserting a slicer and
adding a filter (CategoryName: AVANT ROCK and BIG BAND; Year: 2009 and 2010), we get
the table shown below.

7. The dicing operation is similar to slicing. Here we select 3 dimensions (CategoryName, Year,
RegionCode) and 2 measures (Sum of Quantity, Sum of Sales) through the 'Insert Slicer'
option, and then add a filter for CategoryName, Year and RegionCode as shown below.

8. Finally, the Pivot (rotate) OLAP operation is performed by swapping the rows (Order Date -
Year) and columns (Values - Sum of Quantity and Sum of Sales) through the navigation bar at
the bottom right, as shown below.

After swapping (rotating), we get the result represented below, with a pie chart for the
'Classical' category and year-wise data.
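The operations performed above in Excel can also be sketched in pandas for quick verification. The following is a minimal illustration on hypothetical sales data, with column names chosen to mirror the music.cub fields:

import pandas as pd

# Hypothetical detail-level sales data (mirrors CategoryName / Year / RegionCode / Sales)
sales = pd.DataFrame({
    'CategoryName': ['Electronic', 'Electronic', 'Classical', 'Classical'],
    'Year':         [2008, 2009, 2009, 2010],
    'RegionCode':   ['N', 'S', 'N', 'S'],
    'Quantity':     [10, 15, 7, 12],
    'Sales':        [200.0, 300.0, 140.0, 240.0],
})

# Roll-up: aggregate away the RegionCode dimension
rollup = sales.groupby(['CategoryName', 'Year'])[['Quantity', 'Sales']].sum()

# Slice: fix one dimension (Year == 2009)
slice_2009 = sales[sales['Year'] == 2009]

# Dice: fix several dimensions at once
dice = sales[(sales['CategoryName'] == 'Classical') & (sales['Year'].isin([2009, 2010]))]

# Pivot (rotate): swap rows and columns of the presentation
pivot = sales.pivot_table(index='Year', columns='CategoryName', values='Sales', aggfunc='sum')

print(rollup, slice_2009, dice, pivot, sep='\n\n')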

Result:
Thus the case study using OLAP has been completed successfully.

EX.NO:8 DATE:

Case study using OLTP

AIM:

The primary objective of this lab practical is to provide students with a comprehensive
understanding of Online Transaction Processing (OLTP) through a practical case study. By
the end of this exercise, students should be adept at designing OLTP databases, managing
transactions, handling concurrency, ensuring data integrity, and optimizing performance.

Procedure:
In this scenario, we will focus on the implementation of OLTP in the context of a small-scale
e-commerce platform. The platform needs to manage customer information, product
inventory, and order processing efficiently. Students will go through the process of setting up
the database, simulating transactions, addressing concurrency issues, exploring isolation
levels, maintaining data integrity, and optimizing system performance.

Steps:

1. Database Setup:

1.1 Create a new database:


Begin by creating a new database named eCommerceDB using your preferred relational
database management system (RDBMS). For instance, in MySQL, you would execute:

1.2 Design and implement tables:


Design and implement the necessary tables for managing customer information (Customers),
product inventory (Products), and order details (Orders). Consider relationships between
tables, ensuring data normalization and integrity.
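The DDL itself appears in the record only as screenshots. Below is a self-contained sketch of the three tables using sqlite3, so it runs without a server; in MySQL you would first run CREATE DATABASE eCommerceDB and use VARCHAR/INT/DECIMAL column types. The column names are assumptions for illustration.

import sqlite3

conn = sqlite3.connect("eCommerceDB.sqlite")   # stands in for CREATE DATABASE eCommerceDB
conn.executescript("""
    CREATE TABLE Customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT UNIQUE
    );

    CREATE TABLE Products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      REAL NOT NULL,
        stock      INTEGER NOT NULL
    );

    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES Customers(customer_id),
        product_id  INTEGER NOT NULL REFERENCES Products(product_id),
        quantity    INTEGER NOT NULL,
        order_date  TEXT DEFAULT CURRENT_TIMESTAMP
    );
""")
conn.commit()
conn.close()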

2. Transaction Simulation:

2.1 Insert sample data:


Simulate customer transactions by inserting sample data into the Customers and Products
tables.

2.2 Simulate customer orders:

Insert orders into the Orders table, ensuring each order involves multiple products and links
to the respective customers.

3. Concurrency Control :

3.1 Implementing concurrency control:


Explore and implement mechanisms for handling concurrent transactions. Discuss techniques
such as locking, isolation levels, and transactions.

3.2 Simulate concurrent transactions:


Create scenarios where multiple users place orders simultaneously. Observe and document
any issues related to concurrency, such as deadlocks.

4. Isolation Levels:

4.1 Set different isolation levels:


Explore and set different isolation levels for transactions (e.g., Read Committed, Repeatable
Read).

4.2 Observe and document behavior:


Execute transactions at each isolation level and observe the differences in behavior.

Document the advantages and disadvantages of each isolation level.
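As an illustration of step 4.1, the isolation level can be set per session before running a transaction. A minimal sketch using the mysql-connector-python driver is shown below; the connection parameters are placeholders and the Products/stock column names follow the assumed schema sketched in the database-setup section above.

import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="eCommerceDB"
)
cur = conn.cursor()

# Choose the isolation level for this session (e.g. READ COMMITTED or REPEATABLE READ)
cur.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")

# With autocommit left off (the connector default), the next statement opens a transaction
cur.execute("SELECT stock FROM Products WHERE product_id = %s", (1,))
print(cur.fetchone())
conn.commit()

cur.close()
conn.close()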

5. Data Integrity :

5.1 Implement constraints:


Ensure data integrity by implementing constraints such as primary keys, unique constraints,
and foreign keys.

5.2 Violate constraints intentionally:


Attempt to violate these constraints intentionally and observe the system's response.
Document how the system handles constraint violations.

6. Performance Testing:

6.1 Measure System Performance:


In this section, students will simulate a substantial number of transactions to gauge the
system's performance. This involves executing a series of transactions and monitoring key
performance metrics, such as response time and resource utilization. For example:

6.2 Optimize Database Schema or Queries:

After measuring the initial system performance, students are expected to optimize the
database schema or queries to enhance efficiency. This could involve indexing,
denormalization, or query optimization strategies. For instance:

Subsequently, students should rerun the performance tests, comparing the results before and
after optimization.

7. Documentation:

7.1 Comprehensive Documentation:


This section emphasizes the importance of providing detailed documentation for the entire
lab. Students are required to compile a report covering various aspects of the case study. The
documentation should include:

● Database Schema Description: Offer an overview of the designed database schema,


explaining the purpose of each table, relationships between tables, and any constraints
applied.
● Transaction Simulation: Describe the steps taken to simulate customer transactions.
Include examples of SQL queries used to insert sample data into the Customers,
Products, and Orders tables.
● Concurrency Control: Detail the mechanisms implemented to handle concurrent
transactions. If any issues or deadlocks occurred during the simulation, document
them along with the solutions.
● Isolation Levels: Discuss the exploration and implementation of different isolation
levels. Provide insights into how each isolation level affected transaction behavior.
● Data Integrity: Explain how constraints were implemented to ensure data integrity.
Document intentional attempts to violate these constraints and the system's response.

● Performance Testing: Present the methodology for performance testing, including
the SQL scripts used to simulate a large number of transactions. Include any
optimizations made to enhance system performance.
● Challenges Faced: Acknowledge any challenges encountered during the lab and
describe how they were addressed. This could include difficulties in implementing
certain features, managing concurrency, or optimizing performance.
● Conclusion: Summarize the key findings and lessons learned from the lab. Reflect on
the significance of OLTP concepts in the context of the case study.

Result:

Thus the case study using OLTP has been completed successfully.

EX.NO: 9 DATE:

Implementation of warehouse testing

AIM:

To apply the Naive Bayes classification for testing the given dataset.

Algorithm:

1. Open the weka tool.


2. Download a dataset from the UCI repository.
3. Apply replace missing values.
4. Apply normalize filter.
5. Click the Classification Tab.
6. Apply Naive Bayes classification.
7. Find the Classified Value.
8. Note the output.

Bayes’ Theorem In the Classification Context:

X is a data tuple; in Bayesian terms it is considered the "evidence". H is some hypothesis
that X belongs to a specified class C. P(H|X) is the posterior probability of H conditioned
on X; by Bayes' theorem, P(H|X) = P(X|H) P(H) / P(X).

Example: predict whether a customer will buy a computer or not. Customers are
described by two attributes: age and income. X is a 35-year-old customer with an
income of 40K. H is the hypothesis that the customer will buy a computer. P(H|X)
reflects the probability that customer X will buy a computer given that we know the
customer's age and income.
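For intuition, the same buy-a-computer example can be reproduced in a few lines with scikit-learn's Gaussian Naive Bayes (the lab itself uses Weka's NaiveBayes classifier; the training data below is invented purely for illustration):

from sklearn.naive_bayes import GaussianNB

# Invented training data: [age, income in thousands] -> buys_computer (1 = yes, 0 = no)
X = [[25, 30], [30, 55], [45, 60], [50, 20], [35, 42], [60, 25]]
y = [1, 1, 1, 0, 1, 0]

model = GaussianNB()
model.fit(X, y)

# P(H|X) for the 35-year-old customer with an income of 40K
print(model.predict([[35, 40]]))          # predicted class
print(model.predict_proba([[35, 40]]))    # posterior probabilities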

Input Data:

Output Data:

Result:
Thus the Naive Bayes classification for testing the given dataset is implemented.

EX.NO:10 (a) DATE:

CONVERSION OF TEXT FILE INTO ARFF FILE

Aim:
To convert a text file to ARFF (Attribute-Relation File Format) using the Weka 3.8.2 tool.

Objectives:

Most of the data collected from public forums is in text format, which cannot be read
directly by the Weka tool. Since Weka (a data mining tool) recognizes data in the ARFF
format only, we have to convert the text file into an ARFF file.

Algorithm:

1. Download any data set from UCI data repository.


2. Open the same data file in Excel. It will ask for the delimiter (which produces the columns).
3. Add one row at the top of the data.
4. Enter header for each column.
5. Save file as .CSV (Comma Separated Values) format.
6. Open Weka tool and open the CSV file.
7. Save it as ARFF format.
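The steps above rely on Excel and the Weka GUI. The same conversion can also be scripted; below is a minimal sketch that declares every column as a string attribute for simplicity (file names are placeholders):

import csv

# Read the delimited text file (placeholder name); the first row is the header
with open('dataset.txt', newline='') as src:
    rows = list(csv.reader(src))
header, data = rows[0], rows[1:]

# Write an ARFF file, declaring every attribute as string for simplicity
with open('dataset.arff', 'w') as dst:
    dst.write('@relation dataset\n\n')
    for name in header:
        dst.write(f'@attribute {name} string\n')
    dst.write('\n@data\n')
    for row in data:
        dst.write(','.join(f"'{value}'" for value in row) + '\n')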

Output:

Data Text File:

Data ARFF File:

Result:

Thus, conversion of a text file to ARFF (Attribute-Relation File Format) using the Weka 3.8.2 tool
is implemented.

EX.NO:10 (b) DATE:

CONVERSION OF ARFF TO TEXT FILE

Aim:
To convert ARFF (Attribute-Relation File Format) into text file.

Objectives:

Since the data in the Weka tool is in the ARFF file format, we have to convert the
ARFF file to text format for further processing.

Algorithm:

1. Open any ARFF file in Weka tool.


2. Save the file as CSV format.
3. Open the CSV file in MS-EXCEL.
4. Remove some rows and add corresponding header to the data.
5. Save it as a text file with the desired delimiter.
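Equivalently, the conversion can be scripted without Excel. Below is a minimal sketch that keeps only the data rows and writes them as comma-separated text, assuming simple (unquoted) attribute names and placeholder file names:

# Convert an ARFF file to a comma-separated text file
with open('dataset.arff') as src, open('dataset.txt', 'w') as dst:
    header = []
    in_data = False
    for line in src:
        line = line.strip()
        if not line or line.startswith('%'):         # skip blanks and ARFF comments
            continue
        if line.lower().startswith('@attribute'):
            header.append(line.split()[1])            # attribute name (assumed unquoted)
        elif line.lower().startswith('@data'):
            dst.write(','.join(header) + '\n')        # header row for the text file
            in_data = True
        elif in_data:
            dst.write(line + '\n')                    # data rows are already comma-separated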

Data ARFF File:

Data Text File:

Result:

Thus conversion of ARFF (Attribute-Relation File Format) into text file is


implemented.

