Data Mining: Data Attributes and Quality
Last Updated: 06 May, 2023
Prerequisite – Data Mining
Data: A dataset is a collection of data objects and their attributes.
- An attribute is a property or characteristic of an object, for example a person’s hair colour or the humidity of the air.
- A set of attributes describes an object. An object is also referred to as a record, instance, or entity.
Different types of attributes or data types:
In data mining, understanding the different types of attributes or data types is essential as it helps to determine the appropriate data analysis techniques to use. The following are the different types of data:
1]Nominal Data:
This type of data is also referred to as categorical data. Nominal data is qualitative: its values are labels for categories, so they can be tested for equality but cannot be meaningfully measured or compared numerically. There is no inherent order or hierarchy among the categories. Examples of nominal data include gender, race, religion, and occupation. Nominal data is used in data mining for classification and clustering tasks.
2]Ordinal Data:
This type of data is also categorical, but with an inherent order or hierarchy. Ordinal data represents qualitative data that can be ranked in a particular order. For instance, education level can be ranked from primary to tertiary, and social status can be ranked from low to high. In ordinal data, the order of the values is meaningful but the distance between them is not defined. This means that it is not possible to say that the difference between high and medium social status is the same as the difference between medium and low social status. Ordinal data is used in data mining for ranking and classification tasks.
3]Binary Data:
This type of data has only two possible values, often represented as 0 or 1. Binary data is commonly used in classification tasks, where the target variable has only two possible outcomes. Examples of binary data include yes/no, true/false, and pass/fail. Binary data is used in data mining for classification and association rule mining tasks.
4]Interval Data:
This type of data represents quantitative data with equal intervals between consecutive values. Interval data has no true zero point, so differences between values are meaningful but ratios are not: 20 °C is not "twice as hot" as 10 °C. Examples of interval data include temperature in Celsius or Fahrenheit, IQ scores, and calendar dates. Interval data is used in data mining for clustering and prediction tasks.
5]Ratio Data:
This type of data is similar to interval data, but with an absolute zero point. Because zero means "none of the quantity", ratios of two values are meaningful, so a statement such as "twice as heavy" makes sense. Examples of ratio data include height, weight, and income. Ratio data is used in data mining for prediction and association rule mining tasks.
6]Text Data:
This type of data represents unstructured data in the form of text. Text data can be found in social media posts, customer reviews, and news articles. Text data is used in data mining for sentiment analysis, text classification, and topic modeling tasks.
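The distinctions between the attribute types above can be illustrated with a short sketch. It uses only the Python standard library; the sample values and names such as `Education` and `to_kelvin` are made up for illustration and are not part of any particular data mining library.

```python
from enum import IntEnum

# Nominal: categories with no order. Only equality and counting are meaningful.
occupations = ["engineer", "teacher", "engineer", "nurse"]
occupation_counts = {}
for value in occupations:
    occupation_counts[value] = occupation_counts.get(value, 0) + 1

# Ordinal: categories with a rank. Comparisons are meaningful, but the numeric
# codes below only encode order, not a distance between levels.
class Education(IntEnum):
    PRIMARY = 1
    SECONDARY = 2
    TERTIARY = 3

highest = max([Education.PRIMARY, Education.TERTIARY, Education.SECONDARY])

# Interval: differences are meaningful, ratios are not, because zero is arbitrary.
celsius_a, celsius_b = 10.0, 20.0
celsius_diff = celsius_b - celsius_a  # a 10-degree difference

def to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

ratio_celsius = celsius_b / celsius_a                       # 2.0, but not meaningful
ratio_kelvin = to_kelvin(celsius_b) / to_kelvin(celsius_a)  # about 1.035

# Ratio: a true zero makes ratios meaningful.
weight_a_kg, weight_b_kg = 40.0, 80.0
weight_ratio = weight_b_kg / weight_a_kg  # b is twice as heavy as a
```

The Celsius and Kelvin ratios differ even though they describe the same two temperatures, which is why ratios on an interval scale should not be interpreted.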
Data Quality: Why do we preprocess the data?
Data preprocessing is an essential step in data mining and machine learning as it helps to ensure the quality of data used for analysis. There are several factors that are used for data quality assessment, including:
1.Incompleteness:
This refers to missing data or information in the dataset. Missing data can result from various factors, such as errors during data entry or data loss during transmission. Preprocessing techniques, such as imputation, can be used to fill in missing values to ensure the completeness of the dataset.
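As a minimal sketch of imputation, the example below fills missing values with the mean of the observed values. Mean imputation is only one of several possible strategies, and it assumes missing entries are represented as `None`; the function name `impute_mean` and the sample data are illustrative.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [25, None, 35, 30, None]
imputed_ages = impute_mean(ages)  # missing entries become 30, the mean of 25, 35, 30
```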
2.Inconsistency:
This refers to conflicting or contradictory data in the dataset. Inconsistent data can result from errors in data entry, data integration, or data storage. Preprocessing techniques, such as data cleaning and data integration, can be used to detect and resolve inconsistencies in the dataset.
3.Noise:
This refers to random error or variance in a measured attribute, as well as irrelevant or meaningless values in the dataset. Noise can result from faulty collection instruments or from errors during data collection or data entry. Preprocessing techniques, such as data smoothing and outlier detection, can be used to reduce noise in the dataset.
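One simple form of data smoothing is a moving average, sketched below under the assumption that the values are an ordered numeric sequence. The window size and the handling of the edges (using whatever part of the window fits) are illustrative choices, not a prescribed method.

```python
def moving_average(values, window=3):
    """Centered moving average; near the edges, average the part of the window that fits."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

readings = [10, 12, 11, 13, 12]
smoothed_readings = moving_average(readings)
```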
4.Outliers:
Outliers are data points that are significantly different from the other data points in the dataset. Outliers can result from errors in data collection, data entry, or data transmission, but they can also be genuine, unusual observations. Preprocessing techniques, such as outlier detection, can be used to identify outliers, which can then be corrected, removed, or kept depending on whether they are errors.
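A common rule of thumb for outlier detection flags values that lie more than 1.5 times the interquartile range (IQR) outside the quartiles. The sketch below assumes a single numeric attribute; the threshold `k=1.5` and the sample incomes are illustrative, and the quartiles come from `statistics.quantiles` with its default method (Python 3.8+).

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

incomes = [30, 32, 35, 36, 38, 40, 41, 250]  # hypothetical incomes, in thousands
flagged = iqr_outliers(incomes)
```

Flagged values are candidates for review; whether to drop them depends on whether they turn out to be errors.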
5.Redundancy:
Redundancy refers to the presence of duplicate or overlapping data in the dataset. Redundant data can result from data integration or data storage. Preprocessing techniques, such as data deduplication, can be used to remove redundant data from the dataset.
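Deduplication of exact duplicate records can be sketched as follows. The example assumes each record is a dictionary with hashable values and treats two records as duplicates only when every field matches; near-duplicate detection would need additional logic. The `customers` data is made up.

```python
def deduplicate(records):
    """Drop exact duplicate records, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for record in records:
        # A dict is not hashable, so build a hashable key from its sorted items.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

customers = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ben"},
    {"id": 1, "name": "Asha"},  # exact duplicate of the first record
]
unique_customers = deduplicate(customers)
```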
6.Data format:
This refers to the structure and format of the data in the dataset. Data may be in different formats, such as text, numerical, or categorical, and the same attribute may be recorded on different scales or in different units across sources. Preprocessing techniques, such as data transformation and normalization, can be used to convert data into a consistent format for analysis.
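As one example of normalization, min-max scaling maps a numeric attribute onto the range [0, 1]. This is a minimal sketch: the function name and the sample heights are illustrative, and returning zeros for a constant attribute is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: no spread to rescale, so map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

heights_cm = [150, 160, 170, 180, 200]
normalized_heights = min_max_normalize(heights_cm)
```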