Training vs Testing vs Validation Sets

Last Updated : 6 Jan, 2026

Training, validation and testing sets are three essential components of building reliable machine learning models. The training set teaches the model patterns, the validation set helps fine-tune hyperparameters and prevent overfitting, and the testing set evaluates how well the model performs on completely unseen data.

  • Training: used to learn patterns
  • Validation: used to tune and optimize the model
  • Testing: used to measure final performance and generalization
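The three-way split above can be sketched with two calls to scikit-learn's `train_test_split` (the array shapes and the 60/20/20 ratio here are illustrative choices, not fixed rules):

```python
# Minimal sketch of a 60/20/20 train/validation/test split.
from sklearn.model_selection import train_test_split
import numpy as np

X = np.arange(100).reshape(50, 2)   # 50 samples, 2 features (toy data)
y = np.arange(50) % 2               # binary labels

# First split off 40% as a temporary hold-out set.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=0
)
# Then split the hold-out half-and-half into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=0
)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10
```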

1. Training Set

The training set is the portion of the dataset used to fit the machine learning model. During training, the algorithm learns patterns, relationships and parameters (such as weights in neural networks or coefficients in regression models) directly from this data. The model repeatedly adjusts itself to minimize the training error using optimization techniques like gradient descent.

  • The model has full access to the labels and features in this dataset.
  • Learning happens exclusively on the training set.
  • Performance on training data does not reflect real-world performance.

Use Cases

  • Learning decision boundaries in classification problems
  • Estimating parameters in regression models
  • Training deep learning models with backpropagation

Advantages

  • Enables the model to learn complex patterns from data
  • Forms the foundation of the predictive capability of the model

Limitations

  • High accuracy on training data may indicate overfitting
  • Cannot be used to evaluate generalization performance
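The first limitation can be seen directly: an unconstrained decision tree fit on noisy synthetic data scores perfectly on its own training set while doing noticeably worse on held-out data (the dataset and model here are illustrative choices, not part of the article's Titanic example):

```python
# Sketch: perfect training accuracy can mask overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly flips 20% of labels, simulating label noise
X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

tree = DecisionTreeClassifier(random_state=0)  # unlimited depth: memorizes noise
tree.fit(X_train, y_train)

print("Train accuracy:", tree.score(X_train, y_train))  # 1.0 (memorized)
print("Test accuracy:", tree.score(X_test, y_test))     # noticeably lower
```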

Example:

Python
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Load the Titanic dataset from OpenML as a pandas DataFrame
titanic = fetch_openml(name="titanic", version=1, as_frame=True)
data = titanic.frame

# Keep a small set of features, encode sex numerically and impute missing values
data = data[["pclass", "sex", "age", "fare", "survived"]].copy()
data["sex"] = data["sex"].map({"male": 0, "female": 1})
data["age"] = data["age"].fillna(data["age"].median())
data["fare"] = data["fare"].fillna(data["fare"].median())

X = data.drop("survived", axis=1)
y = data["survived"]

# Hold out 30% of the data; it is split into validation and test sets below
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the model on the training set only
model = LogisticRegression(max_iter=300)
model.fit(X_train, y_train)


2. Validation Set

The validation set is a separate subset of data used to tune model hyperparameters and make design decisions during training. Unlike the training set, it is not used to update model weights directly. Instead, it provides an unbiased estimate of model performance during development.

  • Used during model selection and optimization
  • Helps detect overfitting early
  • Often involved in techniques like early stopping

Use Cases

  • Selecting optimal hyperparameters (e.g., learning rate, depth of trees)
  • Comparing multiple models or architectures
  • Deciding when to stop training in neural networks

Advantages

  • Prevents the model from being overly tailored to training data
  • Improves generalization by guiding model tuning

Limitations

  • Repeated tuning on validation data can still cause validation overfitting
  • Reduces the amount of data available for training

Example:

Python
# Split the 30% hold-out evenly into validation (15%) and test (15%) sets
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

# Evaluate on the validation set while tuning is still allowed
validation_accuracy = model.score(X_val, y_val)
print("Validation Accuracy:", validation_accuracy)

Output:

Validation Accuracy: 0.8010204081632653
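Validation-driven tuning can be sketched as follows: try several values of LogisticRegression's regularization strength `C`, keep the one that scores best on the validation set, and only then touch the test set. Synthetic data is used here so the snippet runs standalone; in the article's flow you would reuse the Titanic splits above. The candidate `C` values are illustrative:

```python
# Sketch: pick a hyperparameter using the validation set, not the test set.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=0
)

best_C, best_val = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    val_acc = clf.score(X_val, y_val)   # validation accuracy guides the choice
    if val_acc > best_val:
        best_C, best_val = C, val_acc

# Refit with the chosen C; the test set is consulted only once, at the end
final = LogisticRegression(C=best_C, max_iter=1000).fit(X_train, y_train)
print("Best C:", best_C, "Validation Accuracy:", best_val)
print("Test Accuracy:", final.score(X_test, y_test))
```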

3. Testing Set

The testing set is a completely independent subset used to evaluate the final model’s performance after all training and tuning are complete. It simulates how the model will perform on unseen, real-world data and provides the most reliable estimate of generalization.

  • Used only once (or very sparingly)
  • No model decisions should be based on test results
  • Represents real deployment conditions

Use Cases

  • Reporting final accuracy, precision, recall, RMSE, etc.
  • Comparing different finalized models
  • Validating production readiness of a model

Advantages

  • Provides an unbiased and realistic performance estimate
  • Confirms whether the model generalizes well

Limitations

  • Cannot improve the model once evaluated
  • Small test sets may give unstable performance estimates

Example:

Python
# Final, one-time evaluation on the untouched test set
test_accuracy = model.score(X_test, y_test)
print("Test Accuracy:", test_accuracy)

Output:

Test Accuracy: 0.817258883248731

Comparison

Let's compare them:

| Aspect | Training Set | Validation Set | Testing Set |
|---|---|---|---|
| Core Objective | Learn patterns and fit model parameters | Tune hyperparameters and select the best model | Evaluate final model performance |
| Role in Learning | Directly involved in model learning | Indirect role (guides learning decisions) | No role in learning |
| Parameter Updates | Model weights are updated repeatedly | Weights remain unchanged | Weights remain unchanged |
| Hyperparameter Influence | Does not guide hyperparameter selection | Actively used for hyperparameter tuning | Must not influence hyperparameters |
| Typical Usage Frequency | Used many times across epochs/iterations | Used multiple times during experimentation | Ideally used only once |
| Performance Interpretation | Indicates how well the model fits seen data | Indicates whether the model is overfitting or underfitting | Indicates real-world generalization |
| Risk if Overused | Severe overfitting to training data | Validation leakage and biased tuning | Inflated and unreliable performance estimates |
| Relationship to Deployment | Indirect, supports model creation | Indirect, supports model refinement | Direct indicator of deployment readiness |
| Typical Dataset Proportion | Largest portion of the dataset | Medium-sized portion | Smallest portion |