STING - Statistical Information Grid in Data Mining
Last Updated: 05 Apr, 2022
STING is a grid-based clustering technique. In STING, the spatial area is divided recursively into a hierarchy of rectangular cells: each cell at one level is partitioned into several smaller cells at the next level. Statistical measures are then collected for each cell, which helps answer queries quickly without scanning the individual data points.
Grid-Based Method in Data Mining:
In grid-based methods, the instance space is divided into a grid structure. Clustering techniques are then applied using the cells of the grid, instead of individual data points, as the base units. The biggest advantage of this approach is faster processing: the running time typically depends on the number of cells rather than on the number of data points.
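As a minimal sketch of using cells as the base units, the snippet below bins 2-D points into a uniform grid. The function name `assign_to_grid`, the `cell_size` parameter, and the sample points are illustrative assumptions, not part of any particular library.

```python
from collections import defaultdict


def assign_to_grid(points, cell_size):
    """Group 2-D points by the integer index of the grid cell that contains them."""
    cells = defaultdict(list)
    for x, y in points:
        # Floor division maps a coordinate to the index of its cell.
        key = (int(x // cell_size), int(y // cell_size))
        cells[key].append((x, y))
    return dict(cells)


# Hypothetical sample data, for illustration only.
points = [(0.2, 0.3), (0.4, 0.1), (1.5, 1.7), (1.6, 1.9), (3.2, 0.4)]
grid = assign_to_grid(points, cell_size=1.0)

for cell, members in sorted(grid.items()):
    print(cell, len(members))
```

Later processing can then work on the three occupied cells instead of the five individual points.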
Statistical Information Grid(STING):
STING is a grid-based clustering technique. It uses a multidimensional grid data structure that quantizes space into a finite number of cells. Instead of focusing on the data points themselves, it focuses on the value space surrounding them.
In STING, the spatial area is divided into rectangular cells at several levels of resolution, forming a hierarchy. Each high-level cell is partitioned into several cells at the next lower level.
In STING, statistical information about the attributes in each cell, such as the mean, maximum, and minimum values, is precomputed and stored as statistical parameters. These statistical parameters are useful for query processing and other data analysis tasks.
The statistical parameters of higher-level cells can easily be computed from the parameters of the lower-level cells.
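The bottom-up computation can be sketched as follows. This is an illustrative example, assuming that each cell also stores a point count so that the parent mean can be formed as a count-weighted average. The names `CellStats`, `stats_from_values`, and `merge_children` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellStats:
    count: int
    mean: float
    minimum: float
    maximum: float


def stats_from_values(values):
    """Compute the parameters of a bottom-level cell directly from its attribute values."""
    if not values:
        return CellStats(0, 0.0, float("inf"), float("-inf"))
    return CellStats(len(values), sum(values) / len(values), min(values), max(values))


def merge_children(children):
    """Derive a parent cell's parameters from its children, without revisiting the raw data."""
    non_empty = [c for c in children if c.count > 0]
    total = sum(c.count for c in non_empty)
    if total == 0:
        return CellStats(0, 0.0, float("inf"), float("-inf"))
    # The parent mean is the count-weighted average of the child means.
    mean = sum(c.count * c.mean for c in non_empty) / total
    return CellStats(
        total,
        mean,
        min(c.minimum for c in non_empty),
        max(c.maximum for c in non_empty),
    )


# Four child cells of one parent; two of them are empty.
children = [
    stats_from_values([1.0, 3.0]),
    stats_from_values([5.0]),
    stats_from_values([]),
    stats_from_values([]),
]
parent = merge_children(children)
print(parent)
```

Because the parent is built only from the stored child parameters, the raw data needs to be scanned just once, when the bottom level is populated.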
How STING Works:
Step 1: Determine a layer to begin with.
Step 2: For each cell of this layer, calculate the confidence interval (or estimated range) of the probability that this cell is relevant to the query.
Step 3: From the interval calculated above, label the cell as relevant or not relevant.
Step 4: If this layer is the bottom layer, go to Step 6; otherwise, go to Step 5.
Step 5: Go down the hierarchy structure by one level. Go to Step 2 for those cells that form the relevant cells of the higher-level layer.
Step 6: If the specification of the query is met, go to Step 8; otherwise, go to Step 7.
Step 7: Retrieve the data that fall into the relevant cells and do further processing. Return the result that meets the requirement of the query. Go to Step 9.
Step 8: Find the regions of relevant cells. Return those regions that meet the requirement of the query. Go to Step 9.
Step 9: Stop.
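The top-down descent in Steps 1 to 5 can be sketched as below. This is a simplified illustration, not the full method: it replaces the confidence-interval test of Steps 2 and 3 with a plain count threshold, assumes points lie in the unit square, and assumes each cell splits into 2 x 2 children. The names `build_counts` and `query_dense_cells` are hypothetical.

```python
def build_counts(points, levels):
    """counts[l][(i, j)] is the number of points in cell (i, j) of level l.

    Level 0 is a single cell covering [0, 1) x [0, 1); every level below
    splits each cell into 2 x 2 children (an assumption of this sketch).
    """
    bottom = levels - 1
    n = 2 ** bottom
    counts = [dict() for _ in range(levels)]
    for x, y in points:
        i, j = min(int(x * n), n - 1), min(int(y * n), n - 1)
        counts[bottom][(i, j)] = counts[bottom].get((i, j), 0) + 1
    # Parent counts are derived from child counts, level by level.
    for level in range(bottom, 0, -1):
        for (i, j), c in counts[level].items():
            parent = (i // 2, j // 2)
            counts[level - 1][parent] = counts[level - 1].get(parent, 0) + c
    return counts


def query_dense_cells(counts, min_count, start_level=0):
    """Return bottom-level cells holding at least min_count points."""
    bottom = len(counts) - 1
    candidates = list(counts[start_level])  # Step 1: choose a starting layer
    for level in range(start_level, bottom + 1):
        # Steps 2-3 (simplified): a cell is relevant if its count reaches the threshold.
        relevant = [c for c in candidates if counts[level].get(c, 0) >= min_count]
        if level == bottom:  # Step 4: bottom layer reached
            return sorted(relevant)
        # Step 5: descend only into the children of relevant cells.
        candidates = [
            (2 * i + di, 2 * j + dj)
            for (i, j) in relevant
            for di in (0, 1)
            for dj in (0, 1)
        ]
    return []


# Hypothetical sample data, for illustration only.
points = [(0.1, 0.1), (0.12, 0.15), (0.2, 0.05), (0.9, 0.9), (0.6, 0.2)]
counts = build_counts(points, levels=3)
dense = query_dense_cells(counts, min_count=2)
print(dense)
```

The count threshold is a safe pruning rule here because a parent with fewer than `min_count` points cannot contain a child that reaches the threshold, so irrelevant branches are never visited.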
Advantages:
- The grid-based computation is query-independent: the statistics stored in each cell summarize the data in that cell and do not depend on any particular query.
- The grid structure facilitates parallel processing and incremental updates.
Disadvantage:
- The main disadvantage of STING is that all cluster boundaries are either horizontal or vertical, so diagonal boundaries cannot be detected.
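A small illustration of this limitation, under the assumption of a uniform 4 x 4 grid with unit cells: points lying on a diagonal line can only be reported as a staircase of square cells. The helper `occupied_cells` and the sample points are hypothetical.

```python
def occupied_cells(points, cell_size):
    """Return the set of grid cells that contain at least one point."""
    return {(int(x // cell_size), int(y // cell_size)) for x, y in points}


# Points on the diagonal y = x (illustrative data only).
diagonal = [(0.25 + 0.5 * k, 0.25 + 0.5 * k) for k in range(8)]
cells = occupied_cells(diagonal, cell_size=1.0)

# Draw the grid with the top row first; '#' marks an occupied cell.
rows = []
for j in range(3, -1, -1):
    rows.append("".join("#" if (i, j) in cells else "." for i in range(4)))
picture = "\n".join(rows)
print(picture)
```

The reported region is a union of axis-aligned squares, so the diagonal shape of the data is only approximated by horizontal and vertical edges.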