Data Partitioning
Frans Coenen
Department of Computer Science, The University of Liverpool
Liverpool, L69 3BX, UK
[email protected]
Abstract
This paper explores and demonstrates (by experiment) the capabilities of a Multi-Agent
Data Mining (MADM) system in the context of parallel and distributed Data Mining
(DM). The exploration is conducted by considering a specific parallel/distributed DM
scenario, namely data partitioning to achieve parallel/distributed ARM. To facilitate the
partitioning a compressed set enumeration tree data structure (the T-tree) is used together
with an associated ARM algorithm (Apriori-T). The aim of the scenario is to demonstrate
that the MADM vision is capable of exploiting the benefits of parallel computing;
particularly parallel query processing and parallel data accessing. In addition the
approach described offers significant advantages with respect to computational efficiency
when compared to alternative mechanisms for (a) dividing the input data between
processors (agents) and (b) achieving distributed/parallel ARM.
Introduction
A common feature of most DM tasks is that they are resource intensive and operate on
large sets of data. Data sources measured in gigabytes or terabytes are quite common in
DM. This has created a demand for fast DM algorithms that can mine very large databases in a
reasonable amount of time. However, despite the many algorithmic improvements
proposed for serial algorithms, the size and dimensionality of many databases make
mining them on a single processor too slow to be practical. There is therefore a growing
need to develop efficient parallel DM algorithms that can run on distributed systems.
There are several ways in which data distribution can occur, and these require different
approaches to model construction, including:
• Horizontal Data Distribution. The most straightforward form of distribution is
horizontal partitioning, in which different records are collected at different sites, but each
record contains all of the attributes for the object it describes. This is the most common
and natural way in which data may be distributed. For example, a multinational company
deals with customers in several countries, collecting data about different customers in
each country. It may want to understand its customers worldwide in order to construct a
global advertising campaign.
• Vertical Data Distribution. The second form of distribution is vertical partitioning, in
which different attributes of the same set of records are collected at different sites. Each
site collects the values of one or more attributes for each record and so, in a sense, each
site has a different view of the data. For example, a credit-card company may collect data
about transactions by the same customer in different countries and may want to treat the
transactions in different countries as different aspects of the customer's total card usage.
Vertically partitioned data is still rare, but it is becoming more common and important
[85].
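The two forms of partitioning can be illustrated with a small sketch. The record layout, attribute names, and data below are invented for illustration and are not taken from the scenario datasets:

```python
# Illustrative sketch of horizontal vs. vertical partitioning of a
# small relational dataset. Attribute names and values are invented.

records = [
    # (customer_id, country, spend)
    (1, "UK", 120.0),
    (2, "UK", 75.5),
    (3, "FR", 200.0),
    (4, "FR", 10.0),
]

def horizontal_partition(rows, n_sites):
    """Each site holds complete records (all attributes) for a
    subset of the rows."""
    return [rows[i::n_sites] for i in range(n_sites)]

def vertical_partition(rows, attr_groups):
    """Each site holds the values of a subset of the attributes for
    every row, plus the row index so records can be re-joined."""
    return [
        [(i,) + tuple(row[a] for a in attrs) for i, row in enumerate(rows)]
        for attrs in attr_groups
    ]

h = horizontal_partition(records, 2)
# Site 0 sees attributes (id, country); site 1 sees (id, spend).
v = vertical_partition(records, [(0, 1), (0, 2)])
```

Horizontal partitions can be mined independently and their counts summed; vertical partitions need the row index (or a shared key) so that the per-site views can be related back to the same records.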
This paper also addresses a second generic MADM scenario, that of distributed/parallel
DM. This scenario assumes an end user who owns a large data set and wishes to obtain
DM results but lacks the required resources (i.e. processors and memory). The data set is
partitioned into horizontal or vertical partitions that can be distributed among a number of
processors (agents) and independently processed, to identify local itemsets, on each
processor.
In the exploration of the applicability of MADM to parallel/distributed ARM, two
parallel ARM approaches, both based on the Apriori algorithm, are described and their
performance evaluated as indicated above. Recall that DATA-HS makes use of a
horizontal partitioning of the data. The data is apportioned amongst the processors (agents),
typically by horizontally segmenting the dataset into sets of records.
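The DATA-HS idea (segment the records horizontally, mine each segment locally, combine the results) can be sketched as follows. This is a minimal count-and-sum illustration; the actual task builds and collates local T-trees rather than flat count tables, and the function names are assumptions:

```python
# Sketch of horizontal segmentation for ARM: each worker counts
# itemset supports on its own segment of records, and the task agent
# sums the local counts to obtain the global supports.
from collections import Counter
from itertools import combinations

transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c"},
]

def segment(data, n):
    """Horizontal segmentation: contiguous blocks of records."""
    size = (len(data) + n - 1) // n
    return [data[i * size:(i + 1) * size] for i in range(n)]

def local_counts(seg, k):
    """Support counts for all k-itemsets occurring in one segment."""
    counts = Counter()
    for t in seg:
        for itemset in combinations(sorted(t), k):
            counts[itemset] += 1
    return counts

def global_frequent(data, n_workers, k, min_support):
    total = Counter()
    for seg in segment(data, n_workers):  # each call stands in for one worker
        total += local_counts(seg, k)
    return {i: c for i, c in total.items() if c >= min_support}

freq = global_frequent(transactions, 3, 2, 3)
```

Because support counts are additive over disjoint record sets, summing the local counts yields exactly the global supports.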
DATA-VP makes use of a vertical partitioning approach to distributing the input dataset
over the available number of DM (worker) agents. To facilitate the vertical data
partitioning the tree data structure, described in Paper 5, Section 5.3, is again used
together with the Apriori-T ARM algorithm [31]. Using both approaches each partition
can be mined in isolation, while at the same time taking into account the possibility of the
existence of frequent itemsets dispersed across two or more partitions. In the first
approach, DATA-HS, the scenario complements the meta ARM scenario described in the
previous paper.
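A minimal sketch of the vertical idea, assuming each worker is allocated a contiguous range of items and is responsible for every candidate itemset whose smallest item falls in its range. This is a simplified stand-in for handing each worker a set of T-tree branches, not the Apriori-T implementation:

```python
# Sketch of vertical partitioning for ARM: the ordered item list is
# split into ranges, and each worker counts every candidate itemset
# whose first (smallest) item lies in its range. Each itemset is
# therefore counted at exactly one worker.
from collections import Counter
from itertools import combinations

transactions = [
    {"a", "b", "c", "d"}, {"a", "c"}, {"b", "d"}, {"a", "b", "d"},
]
items = sorted({i for t in transactions for i in t})  # ["a","b","c","d"]

def item_ranges(all_items, n_workers):
    """Split the ordered item list into contiguous ranges."""
    size = (len(all_items) + n_workers - 1) // n_workers
    return [set(all_items[i * size:(i + 1) * size]) for i in range(n_workers)]

def worker_counts(my_items, data, k):
    """Count k-itemsets whose smallest item belongs to this worker."""
    counts = Counter()
    for t in data:
        for itemset in combinations(sorted(t), k):
            if itemset[0] in my_items:
                counts[itemset] += 1
    return counts

ranges = item_ranges(items, 2)   # worker 0: {a, b}, worker 1: {c, d}
merged = Counter()
for r in ranges:                 # each iteration stands in for one worker
    merged += worker_counts(r, transactions, 2)
```

Since every itemset has exactly one smallest item, the workers' result sets are disjoint and can be concatenated without any reconciliation step.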
The rest of this paper is organized as follows: In Section 6.1 some background on data
distribution and the motivation for the scenario is described. Data partitioning is
introduced in Section 6.2. Data partitioning may be achieved in either a horizontal or
vertical manner. In Section 6.3 the parallel/distributed ARM task using the Data Horizontal
Segmentation (DATA-HS) algorithm is described. Before describing the vertical
approach, the Apriori-T algorithm is briefly introduced in Section 6.4.
The parallel/distributed task with Data Vertical Partitioning (DATA-VP) algorithm
(which is founded on Apriori-T) is then described in Section 6.5. The DATA-VP MADM
task architecture and network configuration is presented in Section 6.6. Experimentation
and Analysis, comparing the operation of DATA-HS and DATA-VP, is then presented in
Section 6.7. Discussion of how this scenario addresses the goal of this paper is presented
in Section 6.8. Finally a summary is given in Section 6.9.
The DATA-VP task architecture shown in Figure 6.2 assumes the availability of at least
one worker (DM agent), preferably more. Figure 6.2 shows the assumed distribution of
agents and shared data across the network. The figure also shows the house-keeping
JADE agents (AMS and DF) through which agents find each other.
Messaging
Parallel/distributed ARM tends to entail a substantial exchange of messages as the task
proceeds. Messaging represents a significant computational overhead, in some cases
outweighing any other advantage gained. The number of messages sent and the
size of each message are typically the significant factors affecting performance. It is
therefore expedient, in the context of the techniques described here, to minimize the
number of messages that are required to be sent as well as their size.
Figure 6.2: Parallel/Distributed ARM Model for DATA-VP Task Architecture

The technique described here is a One-to-Many approach, where only the task
agent can send/receive messages to/from DM agents. This involves fewer operations, although the
significance of this advantage decreases as the number of agents used increases.
To evaluate the two approaches, in the context of the EMADS vision, a number of
experiments were conducted. These are described and analysed in this section.
The experiments presented here used up to six data partitions and two artificial datasets:
(i) T20.D100K.N250.num, and (ii) T20.D500K.N500.num, where T = 20 (average
number of items per transaction), D = 100K or D = 500K (number of transactions), and
N = 250 or N = 500 (number of items) respectively. The datasets were generated using the
IBM Quest generator used in Agrawal and Srikant [2].
As noted above the most significant overhead of any parallel/distributed system is the
number and size of messages sent and received between agents. For the DATA-VP
EMADS approach, the number of messages sent is independent of the number of levels
in the T-tree; communication takes place only at the end of the tree construction, when
DATA-VP passes entire pruned local (sub) T-tree branches.
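This "communicate once, at the end" pattern can be sketched as follows, using a nested dictionary as a simplified stand-in for a compressed set-enumeration (T-) tree; the structure and names are assumptions for illustration:

```python
# Sketch of the end-of-run communication pattern: a worker builds a
# local tree of itemset counts, prunes it against the support
# threshold, and ships the whole pruned branch back in one message.
# The nested dict (item -> (count, child branch)) is a simplified
# stand-in for a T-tree branch.

def prune(branch, min_support):
    """Keep only nodes whose count meets the threshold; pruning a
    node removes its whole subtree (the Apriori closure property)."""
    return {
        item: (count, prune(children, min_support))
        for item, (count, children) in branch.items()
        if count >= min_support
    }

# A worker's local branch before pruning.
local_branch = {
    "a": (4, {"b": (3, {"c": (1, {})}), "c": (2, {})}),
    "b": (1, {}),
}

message = prune(local_branch, 2)  # the single end-of-run message
```

Shipping the pruned branch whole keeps the number of messages independent of the number of tree levels, at the cost of a potentially large single message.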
Figure 6.3: Average Execution Time for Dataset T20.D100K.N250.num, (a) Number of Data Partitions, (b) Support Threshold

Figure 6.4: Average Execution Time for Dataset T20.D500K.N500.num, (a) Number of Data Partitions, (b) Support Threshold
Therefore, DATA-VP has a clear advantage in terms of the number of messages sent.
Figure 6.3 and Figure 6.4 show the effect of increasing the number of data partitions with
respect to a range of support thresholds. As shown in Figure 6.3 the DATA-VP algorithm
shows better performance compared to the DATA-HS algorithm. This is largely due to
the smaller size of the dataset and the T-tree data structure which: (i) facilitates vertical
distribution of the input dataset, and (ii) readily lends itself to parallelization/distribution.
However, when the data size is increased, as in the second experiment, and further DM
(worker) agents are added (increasing the number of data partitions), the results shown
in Figure 6.4 indicate that the growing message-size overhead outweighs any gain
from using additional agents, so that parallelization/distribution becomes counter-productive.
In this case DATA-HS benefited more from the addition of further
data agents than the DATA-VP approach.
Discussion
MADM can be viewed as an effective distributed and parallel environment where the
constituent agents function autonomously and (occasionally) exchange information with
each other. EMADS is designed with asynchronous, distributed communication protocols
that enable the participating agents to operate independently and collaborate with other
peer agents as necessary, thus eliminating centralized control and synchronization
barriers.
Distributed and parallel DM can improve both efficiency and scalability: first, by
executing the DM processes in parallel, improving run-time efficiency; and second, by
applying the DM processes to smaller subsets of data that are properly partitioned and
distributed so as to fit in main memory (a data reduction technique).
The scenario, described in this paper, demonstrated that MADM provides suitable
mechanisms for exploiting the benefits of parallel computing; particularly parallel data
processing. The scenario also demonstrated that MADM is suitable for re-usability and
illustrated how it is supported by re-employing the meta ARM task agent, described in
the previous paper, with the DATA-HS task.
Conclusion
In this paper a MADM method for parallel/distributed ARM has been described so as to
explore the MADM issues of scalability and re-usability. Scalability is explored by
parallel processing of the data and re-usability is explored by reemploying the meta ARM
task agent with the DATA-HS task.
The solution to the scenario considered in this paper made use of a vertical data
partitioning or a horizontal data segmentation technique to distribute the input data
amongst a number of agents. In the horizontal data segmentation (DATA-HS) method,
the dataset was simply divided into segments each comprising an equal number of
records. Each segment was then assigned to a data agent, which allowed the meta
ARM task to be reused when employed on EMADS. Each DM agent then used its local data agent to
generate a complete local T-tree for its allocated segment. Finally, the local T-trees were
collated into a single tree which contained the overall frequent itemsets. The proposed
vertical partitioning (DATA-VP) was facilitated by the T-tree data structure, and an
associated mining algorithm (Apriori-T), that allowed for computationally effective
parallel/distributed ARM when employed on EMADS.
The reported experimental results showed that the data partitioning methods described
are extremely effective in limiting the maximal memory requirements of the algorithm,
while their execution times scale only slowly and linearly with increasing data
dimensions. Their overall performance, in both execution time and especially memory
requirements, represents a significant improvement.