How can Tensorflow be used with Estimators to split the iris dataset?
Last Updated: 17 Apr, 2023
TensorFlow is an open-source machine learning framework that has become enormously popular in recent years. It is widely used for building and training deep neural networks, as well as for implementing other machine learning algorithms. Estimators are a high-level TensorFlow API that simplifies building, training, evaluating, and deploying machine learning models: they provide a simple interface for working with pre-built or custom models while abstracting away many of TensorFlow's low-level details.
The Iris dataset is a popular machine learning dataset containing measurements of iris flowers, such as the length and width of their petals and sepals. The task is to classify each iris into one of three species based on these measurements. It is often used as a reference dataset in machine learning research and is an excellent dataset for exploring TensorFlow and Estimators.
In this article, we load the dataset directly from the UCI Machine Learning Repository, so an active internet connection is required to run the code.
Before we begin, make sure you have TensorFlow, scikit-learn, and pandas installed on your system. You can install them using pip:
pip install tensorflow
pip install scikit-learn
pip install pandas
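Note that the tf.estimator API is deprecated and has been removed from recent TensorFlow releases (2.16 and later), so this walkthrough assumes an older TensorFlow 2.x build that still bundles it. If necessary, you can pin the version when installing:
pip install "tensorflow<2.16"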
Importing the necessary libraries
Python3
import tensorflow as tf
import pandas as pd
Load the iris dataset
Next, let's load the iris dataset into a pandas DataFrame:
Python3
iris_data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
iris_data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
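Optionally, a quick sanity check confirms the data loaded correctly and that each species appears 50 times:
Python3
print(iris_data.head())
print(iris_data['species'].value_counts())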
Split the dataset into training and testing sets
The iris dataset contains 150 samples, 50 from each of the three species. Because DNNClassifier expects integer class labels, we first map the string species names to integers, then split the dataset into training and testing sets using the train_test_split function from the scikit-learn library:
Python3
from sklearn.model_selection import train_test_split

# DNNClassifier expects integer class labels, so map the species names first
label_map = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
iris_data['label'] = iris_data['species'].map(label_map)

train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'], test_size=0.2)
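Optionally, train_test_split's standard stratify and random_state arguments keep the 50/50/50 class balance intact in both splits and make the split reproducible:
Python3
train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'], test_size=0.2,
    stratify=iris_data['label'],  # preserve the class ratio in both splits
    random_state=42)              # fixed seed for reproducibility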
Define the feature columns
Now that the dataset is split, let's define the feature columns using the tf.feature_column API. Feature columns map raw input data to a format a TensorFlow model can consume. Here, we define four numeric feature columns, one for each input feature in the iris dataset:
Python3
feature_columns = [
    tf.feature_column.numeric_column('sepal_length'),
    tf.feature_column.numeric_column('sepal_width'),
    tf.feature_column.numeric_column('petal_length'),
    tf.feature_column.numeric_column('petal_width')
]
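Since the column keys must match the DataFrame column names exactly, an equivalent and less repetitive way to build the same list is to derive it from the training DataFrame:
Python3
# One numeric column per feature in the training DataFrame
feature_columns = [tf.feature_column.numeric_column(name)
                   for name in train_data.columns]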
TensorFlow Estimator
Next, let's create an Estimator object using the DNNClassifier class. This will allow us to create a deep neural network that can classify the iris flowers based on the input features:
Python3
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='model'
)
In this case, we are creating a neural network with two hidden layers, each with 10 nodes. The n_classes parameter is set to 3, since there are three possible classes in the iris dataset. The model_dir parameter specifies the directory where the TensorFlow model will be saved.
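As a side note, DNNClassifier accepts a few other constructor arguments beyond these. The sketch below, using the optimizer and dropout parameters from the standard Estimator API, shows how one might swap in the Adam optimizer and add dropout regularization (the model_adam directory name is just an illustrative choice to avoid clobbering the earlier checkpoints):
Python3
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    optimizer='Adam',   # default is 'Adagrad'
    dropout=0.1,        # drop 10% of activations during training
    model_dir='model_adam'
)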
Define the input functions
Now, let's define the input functions that will feed data into the Estimator: one for the training data and one for the testing data. In TensorFlow 2.x, these pandas input utilities live under the tf.compat.v1 compatibility module:
Python3
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    shuffle=True
)
test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=test_data,
    y=test_labels,
    batch_size=32,
    shuffle=False
)
The pandas_input_fn function is used to create input functions from pandas DataFrames. The batch_size parameter specifies the number of samples that will be fed to the model at once. The shuffle parameter is set to True for the training input function, which will shuffle the training data before feeding it to the model.
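One subtlety worth knowing: pandas_input_fn defaults to num_epochs=1, so the input function is exhausted after a single pass over the data even if a larger steps value is requested during training. With 120 training samples and a batch size of 32, one epoch is only 4 steps, which likely explains why the sample output below reports a global_step of just 8 rather than 1000. To let the steps argument actually control the training length, pass num_epochs=None so the data repeats indefinitely:
Python3
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    num_epochs=None,  # repeat the data indefinitely; train() stops at `steps`
    shuffle=True
)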
Train the Estimator
Now that we have defined the Estimator and the input functions, we can train the model using the train method:
Python3
estimator.train(input_fn=train_input_fn, steps=1000)
The train method trains the model using the specified input function for the specified number of steps.
Evaluation
Finally, we can evaluate the performance of the model on the testing data using the evaluate method:
Python3
eval_result = estimator.evaluate(input_fn=test_input_fn)
print(eval_result)
Output:
{'accuracy': 0.33333334, 'average_loss': 1.6798068, 'loss': 1.6798068, 'global_step': 8}
The evaluate method returns a dictionary of performance metrics, such as accuracy and average loss. We can print the evaluation results to see how well the model performs on the testing data.
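Individual metrics can be read directly out of the returned dictionary, for example:
Python3
print('Test accuracy: {:.1%}'.format(eval_result['accuracy']))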
Complete code:
Python3
import tensorflow as tf
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris_data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
    header=None)
iris_data.columns = ['sepal_length', 'sepal_width',
                     'petal_length', 'petal_width', 'species']

# Map string labels to integers
label_map = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
iris_data['label'] = iris_data['species'].map(label_map)

# Split the dataset into training and testing sets
train_data, test_data, train_labels, test_labels = train_test_split(
    iris_data[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']],
    iris_data['label'], test_size=0.2)

# Define the feature columns
feature_columns = [
    tf.feature_column.numeric_column('sepal_length'),
    tf.feature_column.numeric_column('sepal_width'),
    tf.feature_column.numeric_column('petal_length'),
    tf.feature_column.numeric_column('petal_width')
]

# Define the Estimator
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 10],
    n_classes=3,
    model_dir='model'
)

# Define the input functions
train_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=train_data,
    y=train_labels,
    batch_size=32,
    shuffle=True
)
test_input_fn = tf.compat.v1.estimator.inputs.pandas_input_fn(
    x=test_data,
    y=test_labels,
    batch_size=32,
    shuffle=False
)

# Train the model
estimator.train(input_fn=train_input_fn, steps=1000)

# Evaluate the model
eval_result = estimator.evaluate(input_fn=test_input_fn)
print(eval_result)
Output:
{'accuracy': 0.33333334, 'average_loss': 1.6798068, 'loss': 1.6798068, 'global_step': 8}