How to Create ARFF File in Weka Tool
Last Updated :
30 Dec, 2022
In this article, we will learn about ARFF (Attribute-Relation File Format) files and how to create one.
As the name suggests, an ARFF file describes a list of instances sharing a set of attributes. These files are supported by the WEKA machine learning tool and are used for operations such as data preprocessing and data cleaning.
Structure of the file
An ARFF file contains two sections:
- Header Section
- Data Section
All keywords in an ARFF file start with the @ symbol.
1. Header Section
This section contains information about the dataset: the name of the relation, the columns, and the type of each column. The header section has two parts: the relation (table) part and the attribute part.
@relation: used to give the table name
@attribute: used to give a column name and its type
Data types:
nominal: a fixed set of allowed values, listed inside curly brackets (like constants)
string: accepts arbitrary text values
numeric: used to store numbers (integer or real)
date: used to store dates, optionally followed by a date format
Syntax:
@relation tablename
@attribute column_name type
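Example:
@relation "employee"
@attribute id numeric
@attribute f_name string
@attribute l_name string
@attribute contact_num numeric
@attribute dept {HR,IT,MANAGEMENT,MAINTENANCE}
@attribute DOB date dd-MM-yyyy
@attribute city string
Here the dept column has a nominal data type, so it can only accept one of the values listed inside the curly brackets. The date format uses Java SimpleDateFormat patterns, in which uppercase MM stands for the month (lowercase mm would mean minutes).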
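The header syntax above can also be generated programmatically. The following is a minimal Python sketch, not part of Weka: the helper name arff_header and the way types are passed in (a plain string, a list for nominal values, or a ("date", format) tuple) are assumptions made for illustration.

```python
def arff_header(relation, attributes):
    """Return ARFF header text for a relation name and (name, type) pairs.

    A type is a string such as "numeric" or "string", a list of allowed
    values (rendered as a nominal type), or a ("date", format) tuple.
    """
    lines = [f'@relation "{relation}"']
    for name, kind in attributes:
        if isinstance(kind, list):
            # Nominal type: allowed values inside curly brackets.
            spec = "{" + ",".join(kind) + "}"
        elif isinstance(kind, tuple):
            # Date type followed by its format pattern.
            spec = f"{kind[0]} {kind[1]}"
        else:
            spec = kind
        lines.append(f"@attribute {name} {spec}")
    return "\n".join(lines)


header = arff_header(
    "employee",
    [
        ("id", "numeric"),
        ("f_name", "string"),
        ("l_name", "string"),
        ("contact_num", "numeric"),
        ("dept", ["HR", "IT", "MANAGEMENT", "MAINTENANCE"]),
        ("DOB", ("date", "dd-MM-yyyy")),
        ("city", "string"),
    ],
)
print(header)
```

Printing header produces one @relation line followed by one @attribute line per column, in the order given.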
2. Data Section
The data section holds the records (entries) for the columns declared in the header. The values in each record must appear in the same order as the attributes in the header section.
The data section starts with @data and must come after the header section. Each record is written on its own line.
@data: used to start the data section
%: used to start a comment line in the file
Syntax:
@data
<record1>
<record2>
.
.
<record N>
Every record must match the attributes defined in the header section in number, order, and type.
Example:
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
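Before loading a file into Weka, it can help to check that each record matches the header. The sketch below is a hypothetical checker for this particular employee layout (seven values: id, f_name, l_name, contact_num, dept, DOB, city). The function name check_record and the hard-coded rules are assumptions for illustration, not something Weka provides.

```python
import csv
from datetime import datetime

# Allowed values for the nominal dept attribute (copied from the header).
DEPT_VALUES = {"HR", "IT", "MANAGEMENT", "MAINTENANCE"}
MISSING = "?"  # ARFF marker for a missing value


def check_record(line):
    """Return a list of problems found in one employee record (empty if none)."""
    fields = [f.strip() for f in next(csv.reader([line]))]
    if len(fields) != 7:
        return [f"expected 7 values, found {len(fields)}"]

    emp_id, f_name, l_name, contact_num, dept, dob, city = fields
    problems = []

    # Numeric attributes must parse as numbers unless they are missing.
    for name, value in (("id", emp_id), ("contact_num", contact_num)):
        if value != MISSING:
            try:
                float(value)
            except ValueError:
                problems.append(f"{name} is not numeric: {value!r}")

    # Nominal attribute: only the declared values are accepted.
    if dept != MISSING and dept not in DEPT_VALUES:
        problems.append(f"dept is not an allowed value: {dept!r}")

    # Date attribute: day-month-year, matching the header's date format.
    if dob != MISSING:
        try:
            datetime.strptime(dob, "%d-%m-%Y")
        except ValueError:
            problems.append(f"DOB is not a day-month-year date: {dob!r}")

    return problems


records = [
    "1,naman,N,1234556678,IT,02-08-2000,rjt",
    "4,?,?,5234556678,IT,03-05-2000,amd",
    "5,ravi,K,12345,SALES,01-01-2000,amd",  # SALES is not a declared dept
]
for record in records:
    print(record, "->", check_record(record) or "ok")
```

String attributes are not checked here because any text is acceptable for them.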
The entire file would look like this:
emp.arff file:
@relation "employee"
@attribute id numeric
@attribute f_name string
@attribute l_name string
@attribute contact_num numeric
@attribute dept {HR,IT,MANAGEMENT,MAINTENANCE}
@attribute DOB date dd-MM-yyyy
@attribute city string
@data
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
Values are separated by commas (,), and a question mark (?) represents an empty or missing value for a particular column.
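To illustrate how the two sections, the comma separator, and the ? marker fit together, here is a simplified Python reader. It is a sketch that handles only the features used in this article (it does not cover sparse records or multi-word quoted attribute names), and the function name parse_arff is an assumption for illustration.

```python
import csv

ARFF_TEXT = """\
% employee records
@relation "employee"
@attribute id numeric
@attribute f_name string
@attribute l_name string
@attribute contact_num numeric
@attribute dept {HR,IT,MANAGEMENT,MAINTENANCE}
@attribute DOB date dd-MM-yyyy
@attribute city string
@data
1,naman,N,1234556678,IT,02-08-2000,rjt
2,yash,M,1234556679,HR,04-05-2001,amd
3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr
4,?,?,5234556678,IT,03-05-2000,amd
"""


def parse_arff(text):
    """Split simple ARFF text into (relation, attribute names, rows)."""
    relation, names, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if in_data:
            values = next(csv.reader([line]))
            # A lone ? marks a missing value; represent it as None.
            rows.append([None if v.strip() == "?" else v.strip() for v in values])
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()  # ARFF keywords are case-insensitive
        if keyword == "@relation":
            relation = rest.strip().strip('"')
        elif keyword == "@attribute":
            names.append(rest.split()[0])
        elif keyword == "@data":
            in_data = True
    return relation, names, rows


relation, names, rows = parse_arff(ARFF_TEXT)
print(relation, names)
for row in rows:
    print(row)
```

The missing first and last names in the fourth record come back as None, which corresponds to the empty cells Weka shows for missing values.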
How to Create and Open an ARFF File
You need to have the Weka tool installed on your machine (see How to install Weka).
Step 1: Open any text editor and paste the above code.
Step 2: Save the file with the .arff extension, for example emp.arff.
Step 3: Open the Weka tool.
Step 4: Click on Explorer.
Step 5: Click on Open file, locate the ARFF file on disk, then click Open.
Step 6: The file is now loaded. Click on Edit in the Preprocess tab.
Step 7: The dataset is displayed in a table view.
So this is how you can work with an ARFF file. With the Weka tool, various operations can be performed on the loaded dataset. Missing values are shown as empty cells.
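Steps 1 and 2 can also be scripted instead of using a text editor. The following minimal Python sketch writes the example file to a temporary directory. The helper name save_arff and the choice of a temporary directory are assumptions for illustration; any writable folder works.

```python
import tempfile
from pathlib import Path

ARFF_LINES = [
    '@relation "employee"',
    "@attribute id numeric",
    "@attribute f_name string",
    "@attribute l_name string",
    "@attribute contact_num numeric",
    "@attribute dept {HR,IT,MANAGEMENT,MAINTENANCE}",
    "@attribute DOB date dd-MM-yyyy",
    "@attribute city string",
    "@data",
    "1,naman,N,1234556678,IT,02-08-2000,rjt",
    "2,yash,M,1234556679,HR,04-05-2001,amd",
    "3,kishan,G,1214556678,MANAGEMENT,02-11-2001,pbr",
    "4,?,?,5234556678,IT,03-05-2000,amd",
]


def save_arff(directory, filename="emp.arff"):
    """Write the example dataset to directory/filename and return the path."""
    path = Path(directory) / filename
    # One record per line, with a trailing newline at the end of the file.
    path.write_text("\n".join(ARFF_LINES) + "\n", encoding="utf-8")
    return path


out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = save_arff(out_dir)
print(f"Saved {saved}; open it with Open file in the Weka Explorer")
```

The printed path is the file to select in the Open file dialog of the Preprocess tab.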