Training, validation, and testing sets are three essential components in building reliable machine learning models. The training set teaches the model patterns, the validation set is used to tune hyperparameters and detect overfitting, and the testing set evaluates how well the model performs on completely unseen data.
- Training: used to learn patterns
- Validation: used to tune and optimize the model
- Testing: used to measure final performance and generalization

1. Training Set
The training set is the portion of the dataset used to fit the machine learning model. During training, the algorithm learns patterns, relationships and parameters (such as weights in neural networks or coefficients in regression models) directly from this data. The model repeatedly adjusts itself to minimize the training error using optimization techniques like gradient descent.
- The model has full access to the labels and features in this dataset.
- Learning happens exclusively on the training set.
- Performance on training data does not reflect real-world performance.
Use Cases
- Learning decision boundaries in classification problems
- Estimating parameters in regression models
- Training deep learning models with backpropagation
Advantages
- Enables the model to learn complex patterns from data
- Forms the foundation of the predictive capability of the model
Limitations
- High accuracy on training data may indicate overfitting
- Cannot be used to evaluate generalization performance
Example:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the Titanic dataset from OpenML
titanic = fetch_openml(name="titanic", version=1, as_frame=True)
data = titanic.frame

# Keep a few informative columns and handle missing values
data = data[["pclass", "sex", "age", "fare", "survived"]].copy()
data["sex"] = data["sex"].map({"male": 0, "female": 1})
data["age"] = data["age"].fillna(data["age"].median())
data["fare"] = data["fare"].fillna(data["fare"].median())

# Separate features from the target label
X = data.drop("survived", axis=1)
y = data["survived"]

# Hold out 30% of the data for validation and testing later
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the model on the training set only
model = LogisticRegression(max_iter=300)
model.fit(X_train, y_train)
Output:
LogisticRegression(max_iter=300)
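The two limitations above can be seen directly in code. Below is a minimal sketch on synthetic data (not the Titanic frame used in this article): an unrestricted decision tree essentially memorizes its training split, so its near-perfect training accuracy says little about performance on held-out data.

```python
# Synthetic illustration: high training accuracy can mask overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3, random_state=0)

# Unrestricted depth lets the tree memorize the training split
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_tr, y_tr)

print("Train accuracy:   ", tree.score(X_tr, y_tr))  # ~1.0 on seen data
print("Held-out accuracy:", tree.score(X_ho, y_ho))  # lower: the honest estimate
```

The gap between the two numbers is exactly why training accuracy cannot be used to judge generalization.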
2. Validation Set
The validation set is a separate subset of data used to tune model hyperparameters and make design decisions during training. Unlike the training set, it is not used to update model weights directly. Instead, it provides an unbiased estimate of model performance during development.
- Used during model selection and optimization
- Helps detect overfitting early
- Often involved in techniques like early stopping
Use Cases
- Selecting optimal hyperparameters (e.g., learning rate, depth of trees)
- Comparing multiple models or architectures
- Deciding when to stop training in neural networks
Advantages
- Prevents the model from being overly tailored to training data
- Improves generalization by guiding model tuning
Limitations
- Repeated tuning on validation data can still cause validation overfitting
- Reduces the amount of data available for training
Example:
# Split the held-out 30% in half: 15% validation, 15% test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)
validation_accuracy = model.score(X_val, y_val)
print("Validation Accuracy:", validation_accuracy)
Output:
Validation Accuracy: 0.8010204081632653
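Hyperparameter tuning is the validation set's main job, so here is a self-contained sketch of that workflow on synthetic data (the candidate values of C, LogisticRegression's inverse regularization strength, are illustrative): fit one model per candidate on the training split, score each on the validation split, and report the chosen model on the test split only once.

```python
# Validation-driven model selection: the validation set picks the
# hyperparameter, the test set reports the final score once.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    candidate = LogisticRegression(C=C, max_iter=500).fit(X_train, y_train)
    score = candidate.score(X_val, y_val)  # validation guides the choice
    if score > best_score:
        best_C, best_score = C, score

# Refit with the chosen hyperparameter, then evaluate once on the test set
final = LogisticRegression(C=best_C, max_iter=500).fit(X_train, y_train)
print("Best C:", best_C, "| test accuracy:", final.score(X_test, y_test))
```

Note that the test split never influences which C is chosen; it only reports the final number.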
3. Testing Set
The testing set is a completely independent subset used to evaluate the final model’s performance after all training and tuning are complete. It simulates how the model will perform on unseen, real-world data and provides the most reliable estimate of generalization.
- Used only once (or very sparingly)
- No model decisions should be based on test results
- Represents real deployment conditions
Use Cases
- Reporting final accuracy, precision, recall, RMSE, etc.
- Comparing different finalized models
- Validating production readiness of a model
Advantages
- Provides an unbiased and realistic performance estimate
- Confirms whether the model generalizes well
Limitations
- Cannot improve the model once evaluated
- Small test sets may give unstable performance estimates
Example:
test_accuracy = model.score(X_test, y_test)
print("Test Accuracy:", test_accuracy)
Output:
Test Accuracy: 0.817258883248731
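The instability of small test sets can be quantified with a simple back-of-the-envelope formula: treating each test prediction as a Bernoulli trial, the standard error of a measured accuracy p on n examples is sqrt(p * (1 - p) / n). The sketch below uses illustrative numbers (accuracy 0.80, a few test-set sizes) to show how quickly the uncertainty grows as n shrinks.

```python
# Why small test sets give unstable estimates: the binomial standard
# error of accuracy shrinks only as 1/sqrt(n).
import math

def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of an accuracy estimate p measured on n examples."""
    return math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(f"n={n:5d}  acc=0.80 +/- {accuracy_stderr(0.80, n):.3f}")
# n=  100  acc=0.80 +/- 0.040
# n=  400  acc=0.80 +/- 0.020
# n= 1600  acc=0.80 +/- 0.010
```

Quadrupling the test set only halves the uncertainty, which is why headline accuracies from very small test sets should be read with care.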
Comparison
The table below summarizes how the three sets differ:
| Aspect | Training Set | Validation Set | Testing Set |
|---|---|---|---|
| Core Objective | Learn patterns and fit model parameters | Tune hyperparameters and select the best model | Evaluate final model performance |
| Role in Learning | Directly involved in model learning | Indirect role (guides learning decisions) | No role in learning |
| Parameter Updates | Model weights are updated repeatedly | Weights remain unchanged | Weights remain unchanged |
| Hyperparameter Influence | Does not guide hyperparameter selection | Actively used for hyperparameter tuning | Must not influence hyperparameters |
| Typical Usage Frequency | Used many times across epochs/iterations | Used multiple times during experimentation | Ideally used only once |
| Performance Interpretation | Indicates how well the model fits seen data | Indicates whether the model is overfitting or underfitting | Indicates real-world generalization |
| Risk if Overused | Severe overfitting to training data | Validation leakage and biased tuning | Inflated and unreliable performance estimates |
| Relationship to Deployment | Indirect, supports model creation | Indirect, supports model refinement | Direct indicator of deployment readiness |
| Typical Dataset Proportion | Largest portion of the dataset | Medium-sized portion | Smallest portion |
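The proportions in the last row of the table (largest/medium/smallest) can be produced with two chained train_test_split calls, as done earlier in this article. A common choice, used here purely for illustration, is 70% training, 15% validation, and 15% testing:

```python
# A 70/15/15 split via two chained train_test_split calls:
# first carve off 30%, then halve that 30% into validation and test.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```

The exact ratios depend on dataset size: with very large datasets the validation and test fractions can be much smaller, while with scarce data, cross-validation is often preferred over a fixed validation split.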