Early Stopping in PyTorch

License: CC 4.0 BY-SA

Tags: #machine learning #deep learning #computer vision #pytorch

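Early stopping is an effective strategy for preventing overfitting in deep learning models: it monitors the loss on the validation set during training to decide when to stop. In PyTorch, it can be implemented by defining an early-stopping object and checking the validation loss after every epoch. When the validation loss has not decreased for a preset number of epochs (the patience), training stops, which usually gives a model that generalizes better.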

Early stopping is a common trick in deep learning training. Combined with cross-validation, it helps keep the model from overfitting.

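Early stopping is widely used and in many cases works better than explicit regularization methods. The idea is to evaluate the model on a validation set during training and to stop once its validation performance starts to degrade, which avoids the overfitting that further training would cause. The main steps are:
1. Split the original training data into a training set and a validation set.
2. Train only on the training set, and compute the model's error on the validation set at the end of every epoch.
3. Stop training when a stopping criterion is met, for example: the validation error has stopped improving, the weight updates fall below a threshold, the prediction error rate falls below a threshold, or a maximum number of iterations is reached.
4. Use the parameters saved at an earlier iteration, the one with the best validation error, as the final model parameters.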
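
The patience-based form of steps 2 to 4 can be sketched without any framework. The function below is a minimal illustration, not part of the original post: the name `early_stop_epoch` and its return convention are assumptions made for this example. It takes the per-epoch validation losses and reports the epoch at which training would stop and the epoch whose parameters would be kept.

```python
def early_stop_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Hypothetical helper for illustration: training stops after `patience`
    consecutive epochs without a new best validation loss.
    """
    best_loss = float("inf")
    best_epoch = None
    bad_epochs = 0  # consecutive epochs without improvement

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: this is the epoch whose parameters would be checkpointed.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch

    # The criterion never fired: training ran for all available epochs.
    return len(val_losses), best_epoch


stop, best = early_stop_epoch([0.9, 0.7, 0.6, 0.65, 0.62, 0.7, 0.5], patience=3)
print(stop, best)
```

With a patience of 3, the loop stops at epoch 6 and keeps the parameters from epoch 3. It never sees the lower loss at epoch 7, which shows the cost of choosing a small patience.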

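On a typical loss curve, the validation error starts to rise after a certain epoch while the training error keeps falling. This is the point where the model begins to overfit, so training should be stopped early. Early stopping is essentially a trade-off between training time and generalization error, and different stopping criteria lead to different results.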
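
To make the effect of the stopping criterion concrete, here is a small sketch of a threshold-based alternative to patience: stop as soon as the validation loss exceeds the best value seen so far by more than a given percentage. The function name `stop_epoch_relative` and the parameter `max_increase_pct` are assumptions introduced for this example.

```python
def stop_epoch_relative(val_losses, max_increase_pct):
    """Return the epoch at which a relative-increase criterion stops training.

    Hypothetical helper for illustration: stop at the first epoch whose
    validation loss is more than `max_increase_pct` percent above the best
    loss seen so far. If that never happens, return the last epoch.
    """
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        best_loss = min(best_loss, loss)
        increase_pct = 100.0 * (loss / best_loss - 1.0)
        if increase_pct > max_increase_pct:
            return epoch
    return len(val_losses)


losses = [1.0, 0.8, 0.82, 0.9, 0.7]
print(stop_epoch_relative(losses, max_increase_pct=5))   # strict threshold
print(stop_epoch_relative(losses, max_increase_pct=20))  # tolerant threshold
```

On this curve the strict threshold stops at epoch 4, while the tolerant one runs through all 5 epochs and reaches the lower loss at the end. A stricter criterion saves training time but may stop before a later improvement.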

Implementing early stopping in PyTorch

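# Train the model using early stopping
# Note: train_loader, valid_loader, optimizer, criterion and the EarlyStopping
# class are assumed to be defined at module level before this function is called.
import numpy as np
import torch

def train_model(model, batch_size, patience, n_epochs):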
    
    # to track the training loss as the model trains
    train_losses = []
    # to track the validation loss as the model trains
    valid_losses = []
    # to track the average training loss per epoch as the model trains
    avg_train_losses = []
    # to track the average validation loss per epoch as the model trains
    avg_valid_losses = [] 
    
    # initialize the early_stopping object
    early_stopping = EarlyStopping(patience=patience, verbose=True)
    
    for epoch in range(1, n_epochs + 1):
 
        ###################
        # train the model #
        ###################
        model.train() # prep model for training
        for batch, (data, target) in enumerate(train_loader, 1):
            # clear the gradients of all optimized variables
            optimizer.zero_grad()
            # forward pass: compute predicted outputs by passing inputs to the model
            output = model(data)
            # calculate the loss
            loss = criterion(output, target)
            # backward pass: compute gradient of the loss with respect to model parameters
            loss.backward()
            # perform a single optimization step (parameter update)
            optimizer.step()
            # record training loss
            train_losses.append(loss.item())
 
        ######################    
        # validate the model #
        ######################
        model.eval() # prep model for evaluation
        with torch.no_grad(): # gradients are not needed during validation
            for data, target in valid_loader:
                # forward pass: compute predicted outputs by passing inputs to the model
                output = model(data)
                # calculate the loss
                loss = criterion(output, target)
                # record validation loss
                valid_losses.append(loss.item())
 
        # print training/validation statistics 
        # calculate average loss over an epoch
        train_loss = np.average(train_losses)
        valid_loss = np.average(valid_losses)
        avg_train_losses.append(train_loss)
        avg_valid_losses.append(valid_loss)
        
        epoch_len = len(str(n_epochs))
        
        print_msg = (f'[{epoch:>{epoch_len}}/{n_epochs:>{epoch_len}}] ' +
                     f'train_loss: {train_loss:.5f} ' +
                     f'valid_loss: {valid_loss:.5f}')
        
        print(print_msg)
        
        # clear lists to track next epoch
        train_losses = []
        valid_losses = []
        
        # early_stopping needs the validation loss to check if it has decreased,
        # and if it has, it will make a checkpoint of the current model
        early_stopping(valid_loss, model)
        
        if early_stopping.early_stop:
            print("Early stopping")
            break
        
    # load the last checkpoint with the best model
    model.load_state_dict(torch.load('checkpoint.pt'))
 
    return model, avg_train_losses, avg_valid_losses
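
The training function relies on an `EarlyStopping` class that this post does not define. The following is a minimal sketch of what such a class could look like, inferred only from how it is used above: it is called with `(valid_loss, model)`, exposes `early_stop`, and writes `checkpoint.pt`. The `delta`, `path` and `save_fn` parameters are assumptions added for this example. `save_fn` is a hypothetical hook that lets the bookkeeping be exercised without PyTorch or disk access.

```python
class EarlyStopping:
    """Signal a stop when the validation loss has not improved for `patience` epochs.

    A sketch under stated assumptions, not the original author's class.
    """

    def __init__(self, patience=7, verbose=False, delta=0.0,
                 path="checkpoint.pt", save_fn=None):
        self.patience = patience    # epochs to wait after the last improvement
        self.verbose = verbose      # print a message on every call
        self.delta = delta          # minimum decrease that counts as improvement
        self.path = path            # where the best model is checkpointed
        self.save_fn = save_fn if save_fn is not None else self._torch_save
        self.best_loss = float("inf")
        self.counter = 0            # epochs since the last improvement
        self.early_stop = False

    @staticmethod
    def _torch_save(model, path):
        # Imported lazily so the counting logic itself does not require PyTorch.
        import torch
        torch.save(model.state_dict(), path)

    def __call__(self, val_loss, model):
        if val_loss < self.best_loss - self.delta:
            if self.verbose:
                print(f"Validation loss decreased "
                      f"({self.best_loss:.6f} -> {val_loss:.6f}); saving model")
            self.best_loss = val_loss
            self.counter = 0
            self.save_fn(model, self.path)   # checkpoint the best model so far
        else:
            self.counter += 1
            if self.verbose:
                print(f"EarlyStopping counter: {self.counter} out of {self.patience}")
            if self.counter >= self.patience:
                self.early_stop = True


# Usage with a stand-in saver, so no model or file is needed for the demo.
saved_paths = []
stopper = EarlyStopping(patience=2, save_fn=lambda model, path: saved_paths.append(path))
for loss in [1.0, 0.8, 0.9, 0.85]:
    stopper(loss, model=None)
print(stopper.early_stop, stopper.best_loss, len(saved_paths))
```

Because a checkpoint is written only when the loss improves, the file loaded at the end of `train_model` holds the best model rather than the last one trained.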