Training, validation, and testing sets are three essential components in building reliable machine learning models. The training set teaches the model patterns, the validation set is used to tune hyperparameters and detect overfitting, and the testing set evaluates how well the model performs on completely unseen data.
- Training: used to learn patterns
- Validation: used to tune and optimize the model
- Testing: used to measure final performance and generalization

1. Training Set
The training set is the portion of the dataset used to fit the machine learning model. During training, the algorithm learns patterns, relationships and parameters (such as weights in neural networks or coefficients in regression models) directly from this data. The model repeatedly adjusts itself to minimize the training error using optimization techniques like gradient descent.
- The model has full access to the labels and features in this dataset.
- Learning happens exclusively on the training set.
- Performance on training data does not reflect real-world performance.
Use Cases
- Learning decision boundaries in classification problems
- Estimating parameters in regression models
- Training deep learning models with backpropagation
Advantages
- Enables the model to learn complex patterns from data
- Forms the foundation of the predictive capability of the model
Limitations
- High accuracy on training data may indicate overfitting
- Cannot be used to evaluate generalization performance
Example:
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load the Titanic dataset from OpenML
titanic = fetch_openml(name="titanic", version=1, as_frame=True)
data = titanic.frame

# Keep a few informative columns and handle missing values
data = data[["pclass", "sex", "age", "fare", "survived"]].copy()
data["sex"] = data["sex"].map({"male": 0, "female": 1})
data["age"] = data["age"].fillna(data["age"].median())
data["fare"] = data["fare"].fillna(data["fare"].median())

# Separate features from the target label
X = data.drop("survived", axis=1)
y = data["survived"]

# Hold out 30% of the data for validation and testing later
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the model on the training set only
model = LogisticRegression(max_iter=300)
model.fit(X_train, y_train)
Output:
LogisticRegression(max_iter=300)
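The two limitations above can be seen directly in code. Below is a minimal sketch on synthetic data (not the Titanic frame used in this article): an unrestricted decision tree essentially memorizes its training split, so its near-perfect training accuracy says little about performance on held-out data.

```python
# Synthetic illustration: high training accuracy can mask overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3, random_state=0)

# Unrestricted depth lets the tree memorize the training split
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_tr, y_tr)

print("Train accuracy:   ", tree.score(X_tr, y_tr))  # ~1.0 on seen data
print("Held-out accuracy:", tree.score(X_ho, y_ho))  # lower: the honest estimate
```

The gap between the two numbers is exactly why training accuracy cannot be used to judge generalization.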
2. Validation Set
The validation set is a separate subset of data used to tune model hyperparameters and make design decisions during training. Unlike the training set, it is not used to update model weights directly. Instead, it provides an unbiased estimate of model performance during development.
- Used during model selection and optimization
- Helps detect overfitting early
- Often involved in techniques like early stopping
Use Cases
- Selecting optimal hyperparameters (e.g., learning rate, depth of trees)
- Comparing multiple models or architectures
- Deciding when to stop training in neural networks
Advantages
- Prevents the model from being overly tailored to training data
- Improves generalization by guiding model tuning
Limitations
- Repeated tuning on validation data can still cause validation overfitting
- Reduces the amount of data available for training
Example:
# Split the held-out 30% in half: 15% validation, 15% test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)
validation_accuracy = model.score(X_val, y_val)
print("Validation Accuracy:", validation_accuracy)
Output:
Validation Accuracy: 0.8010204081632653
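Hyperparameter tuning is the validation set's main job, so here is a self-contained sketch of that workflow on synthetic data (the candidate values of C, LogisticRegression's inverse regularization strength, are illustrative): fit one model per candidate on the training split, score each on the validation split, and report the chosen model on the test split only once.

```python
# Validation-driven model selection: the validation set picks the
# hyperparameter, the test set reports the final score once.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

best_C, best_score = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:
    candidate = LogisticRegression(C=C, max_iter=500).fit(X_train, y_train)
    score = candidate.score(X_val, y_val)  # validation guides the choice
    if score > best_score:
        best_C, best_score = C, score

# Refit with the chosen hyperparameter, then evaluate once on the test set
final = LogisticRegression(C=best_C, max_iter=500).fit(X_train, y_train)
print("Best C:", best_C, "| test accuracy:", final.score(X_test, y_test))
```

Note that the test split never influences which C is chosen; it only reports the final number.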
3. Testing Set
The testing set is a completely independent subset used to evaluate the final model’s performance after all training and tuning are complete. It simulates how the model will perform on unseen, real-world data and provides the most reliable estimate of generalization.
- Used only once (or very sparingly)
- No model decisions should be based on test results
- Represents real deployment conditions
Use Cases
- Reporting final accuracy, precision, recall, RMSE, etc.
- Comparing different finalized models
- Validating production readiness of a model
Advantages
- Provides an unbiased and realistic performance estimate
- Confirms whether the model generalizes well
Limitations
- Cannot improve the model once evaluated
- Small test sets may give unstable performance estimates
Example:
test_accuracy = model.score(X_test, y_test)
print("Test Accuracy:", test_accuracy)
Output:
Test Accuracy: 0.817258883248731
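The instability of small test sets can be quantified with a simple back-of-the-envelope formula: treating each test prediction as a Bernoulli trial, the standard error of a measured accuracy p on n examples is sqrt(p * (1 - p) / n). The sketch below uses illustrative numbers (accuracy 0.80, a few test-set sizes) to show how quickly the uncertainty grows as n shrinks.

```python
# Why small test sets give unstable estimates: the binomial standard
# error of accuracy shrinks only as 1/sqrt(n).
import math

def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of an accuracy estimate p measured on n examples."""
    return math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(f"n={n:5d}  acc=0.80 +/- {accuracy_stderr(0.80, n):.3f}")
# n=  100  acc=0.80 +/- 0.040
# n=  400  acc=0.80 +/- 0.020
# n= 1600  acc=0.80 +/- 0.010
```

Quadrupling the test set only halves the uncertainty, which is why headline accuracies from very small test sets should be read with care.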
Comparison
The table below summarizes how the three sets differ:
| Aspect | Training Set | Validation Set | Testing Set |
|---|---|---|---|
| Core Objective | Learn patterns and fit model parameters | Tune hyperparameters and select the best model | Evaluate final model performance |
| Role in Learning | Directly involved in model learning | Indirect role (guides learning decisions) | No role in learning |
| Parameter Updates | Model weights are updated repeatedly | Weights remain unchanged | Weights remain unchanged |
| Hyperparameter Influence | Does not guide hyperparameter selection | Actively used for hyperparameter tuning | Must not influence hyperparameters |
| Typical Usage Frequency | Used many times across epochs/iterations | Used multiple times during experimentation | Ideally used only once |
| Performance Interpretation | Indicates how well the model fits seen data | Indicates whether the model is overfitting or underfitting | Indicates real-world generalization |
| Risk if Overused | Severe overfitting to training data | Validation leakage and biased tuning | Inflated and unreliable performance estimates |
| Relationship to Deployment | Indirect, supports model creation | Indirect, supports model refinement | Direct indicator of deployment readiness |
| Typical Dataset Proportion | Largest portion of the dataset | Medium-sized portion | Smallest portion |
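The proportions in the last row of the table (largest/medium/smallest) can be produced with two chained train_test_split calls, as done earlier in this article. A common choice, used here purely for illustration, is 70% training, 15% validation, and 15% testing:

```python
# A 70/15/15 split via two chained train_test_split calls:
# first carve off 30%, then halve that 30% into validation and test.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```

The exact ratios depend on dataset size: with very large datasets the validation and test fractions can be much smaller, while with scarce data, cross-validation is often preferred over a fixed validation split.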