2025-06-26 16:12:03,164 - mmdet - INFO - Saving checkpoint at 12 epochs [ ] 0/81, elapsed: 0s, ETA:/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.) return torch.floor_divide(self, other) [ ] 1/81, 0.3 task/s, elapsed: 3s, ETA: 273s/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/bbox/coders/fut_nms_free_coder.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). self.post_center_range = torch.tensor( /media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/bbox/coders/map_nms_free_coder.py:82: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). self.post_center_range = torch.tensor( [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 81/81, 4.5 task/s, elapsed: 18s, ETA: 0s Traceback (most recent call last): File "tools/train.py", line 266, in <module> main() File "tools/train.py", line 255, in main custom_train_model( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/VAD/apis/train.py", line 21, in custom_train_model custom_train_detector( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/VAD/apis/mmdet_train.py", line 194, in custom_train_detector runner.run(data_loaders, cfg.workflow) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run epoch_runner(data_loaders[i], **kwargs) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train self.call_hook('after_train_epoch') File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook getattr(hook, fn_name)(self) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch self._do_evaluate(runner) File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/evaluation/eval_hooks.py", line 88, in _do_evaluate key_score = self.evaluate(runner, results) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 361, in evaluate eval_res = self.dataloader.dataset.evaluate( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/datasets/nuscenes_vad_dataset.py", line 1781, in evaluate all_metric_dict[key] += results[i]['metric_results'][key] KeyError: 0 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2522198) of binary: /home/wangbaihui/anaconda3/envs/vad/bin/python /home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning: ********************************************************************** CHILD PROCESS FAILED WITH NO ERROR_FILE ********************************************************************** CHILD PROCESS FAILED WITH NO ERROR_FILE Child process 2522198 (local_rank 0) FAILED (exitcode 1) Error msg: Process failed with exitcode 1 Without writing an error file to <N/A>. While this DOES NOT affect the correctness of your application, no trace information about the error will be available for inspection. Consider decorating your top level entrypoint function with torch.distributed.elastic.multiprocessing.errors.record. Example: from torch.distributed.elastic.multiprocessing.errors import record @record def trainer_main(args): # do train ********************************************************************** warnings.warn(_no_error_file_warning_msg(rank, failure)) Traceback (most recent call last): File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module> main() File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper return f(*args, **kwargs) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main run(args) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run elastic_launch( File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: *************************************** tools/train.py FAILED ======================================= Root Cause: [0]: time: 2025-06-26_16:12:28 rank: 0 (local_rank: 0) exitcode: 1 (pid: 2522198) error_file: <N/A> msg: "Process failed with exitcode 1" ======================================= Other Failures: <NO_OTHER_FAILURES> ***********2025-06-26 16:12:03,164 - mmdet - INFO - Saving checkpoint at 12 epochs [ ] 0/81, elapsed: 0s, ETA:/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.) return torch.floor_divide(self, other) [ ] 1/81, 0.3 task/s, elapsed: 3s, ETA: 273s/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/bbox/coders/fut_nms_free_coder.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). self.post_center_range = torch.tensor( /media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/bbox/coders/map_nms_free_coder.py:82: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). self.post_center_range = torch.tensor( [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 81/81, 4.5 task/s, elapsed: 18s, ETA: 0s Traceback (most recent call last): File "tools/train.py", line 266, in <module> main() File "tools/train.py", line 255, in main custom_train_model( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/VAD/apis/train.py", line 21, in custom_train_model custom_train_detector( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/VAD/apis/mmdet_train.py", line 194, in custom_train_detector runner.run(data_loaders, cfg.workflow) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run epoch_runner(data_loaders[i], **kwargs) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train self.call_hook('after_train_epoch') File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook getattr(hook, fn_name)(self) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch self._do_evaluate(runner) File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/evaluation/eval_hooks.py", line 88, in _do_evaluate key_score = self.evaluate(runner, results) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 361, in evaluate eval_res = self.dataloader.dataset.evaluate( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/datasets/nuscenes_vad_dataset.py", line 1781, in evaluate all_metric_dict[key] += results[i]['metric_results'][key] KeyError: 0 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2522198) of binary: /home/wangbaihui/anaconda3/envs/vad/bin/python /home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py:367: UserWarning: ********************************************************************** CHILD PROCESS FAILED WITH NO ERROR_FILE ********************************************************************** CHILD PROCESS FAILED WITH NO ERROR_FILE Child process 2522198 (local_rank 0) FAILED (exitcode 1) Error msg: Process failed with exitcode 1 Without writing an error file to <N/A>. While this DOES NOT affect the correctness of your application, no trace information about the error will be available for inspection. Consider decorating your top level entrypoint function with torch.distributed.elastic.multiprocessing.errors.record. Example: from torch.distributed.elastic.multiprocessing.errors import record @record def trainer_main(args): # do train ********************************************************************** warnings.warn(_no_error_file_warning_msg(rank, failure)) Traceback (most recent call last): File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module> main() File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 361, in wrapper return f(*args, **kwargs) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main run(args) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run elastic_launch( File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in __call__ return launch_agent(self._config, self._entrypoint, list(args)) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: *************************************** tools/train.py FAILED ======================================= Root Cause: [0]: time: 2025-06-26_16:12:28 rank: 0 (local_rank: 0) exitcode: 1 (pid: 2522198) error_file: <N/A> msg: "Process failed with exitcode 1" ======================================= Other Failures: <NO_OTHER_FAILURES> ***********

25-06-26 14:46:11,967 - mmdet - INFO - Saving checkpoint at 48 epochs [ ] 0/81, elapsed: 0s, ETA:/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.) return torch.floor_divide(self, other) [ ] 1/81, 0.3 task/s, elapsed: 4s, ETA: 303s/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/bbox/coders/fut_nms_free_coder.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). self.post_center_range = torch.tensor( /media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/bbox/coders/map_nms_free_coder.py:82: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor). self.post_center_range = torch.tensor( [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 81/81, 3.4 task/s, elapsed: 24s, ETA: 0s Traceback (most recent call last): File "tools/train.py", line 266, in <module> main() File "tools/train.py", line 255, in main custom_train_model( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/VAD/apis/train.py", line 21, in custom_train_model custom_train_detector( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/VAD/apis/mmdet_train.py", line 194, in custom_train_detector runner.run(data_loaders, cfg.workflow) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run epoch_runner(data_loaders[i], kwargs) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train self.call_hook('after_train_epoch') File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook getattr(hook, fn_name)(self) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 267, in after_train_epoch self._do_evaluate(runner) File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/core/evaluation/eval_hooks.py", line 88, in _do_evaluate key_score = self.evaluate(runner, results) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/mmcv/runner/hooks/evaluation.py", line 361, in evaluate eval_res = self.dataloader.dataset.evaluate( File "/media/wangbaihui/1ecf654b-afad-4dab-af7b-e34b00dda87a/mmdetection3d/VAD/projects/mmdet3d_plugin/datasets/nuscenes_vad_dataset.py", line 1781, in evaluate all_metric_dict[key] += results[i]['metric_results'][key] KeyError: 0 ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2469743) of binary: /home/wangbaihui/anaconda3/envs/vad/bin/python /home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py:367: UserWarning: CHILD PROCESS FAILED WITH NO ERROR_FILE CHILD PROCESS FAILED WITH NO ERROR_FILE Child process 2469743 (local_rank 0) FAILED (exitcode 1) Error msg: Process failed with exitcode 1 Without writing an error file to <N/A>. While this DOES NOT affect the correctness of your application, no trace information about the error will be available for inspection. Consider decorating your top level entrypoint function with torch.distributed.elastic.multiprocessing.errors.record. Example: from torch.distributed.elastic.multiprocessing.errors import record @record def trainer_main(args): # do train **** warnings.warn(_no_error_file_warning_msg(rank, failure)) Traceback (most recent call last): File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 702, in <module> main() File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 361, in wrapper return f(*args, kwargs) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 698, in main run(args) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/run.py", line 689, in run elastic_launch( File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 116, in call return launch_agent(self._config, self._entrypoint, list(args)) File "/home/wangbaihui/anaconda3/envs/vad/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 244, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError: ************************************* tools/train.py FAILED ======================================= Root Cause: [0]: time: 2025-06-26_14:46:41 rank: 0 (local_rank: 0) exitcode: 1 (pid: 2469743) error_file: <N/A> msg: "Process failed with exitcode 1" ======================================= Other Failures: <NO_OTHER_FAILURES>

我们正在使用mmdetection3d进行训练，在评估阶段遇到了KeyError:0的错误。根据引用[3]中的经验，这种错误通常与评估配置有关。在mmdetection中，评估器（evaluator）的配置决定了计算哪些指标（如bbox、segm等）。...

from data import * from utils.augmentations import SSDAugmentation, BaseTransform from utils.functions import MovingAverage, SavePath from utils.logger import Log from utils import timer from layers.modules import MultiBoxLoss from yolact import Yolact import os import sys import time import math, random from pathlib import Path import torch from torch.autograd import Variable import torch.nn as nn import torch.optim as optim import torch.backends.cudnn as cudnn import torch.nn.init as init import torch.utils.data as data import numpy as np import argparse import datetime # Oof import eval as eval_script def str2bool(v): return v.lower() in ("yes", "true", "t", "1") parser = argparse.ArgumentParser( description='Yolact Training Script') parser.add_argument('--batch_size', default=2, type=int, help='Batch size for training') parser.add_argument('--resume', default=None, type=str, help='Checkpoint state_dict file to resume training from. If this is "interrupt"'\ ', the model will resume training from the interrupt file.') parser.add_argument('--start_iter', default=-1, type=int, help='Resume training at this iter. If this is -1, the iteration will be'\ 'determined from the file name.') parser.add_argument('--num_workers', default=0, type=int, help='Number of workers used in dataloading') parser.add_argument('--cuda', default=True, type=str2bool, help='Use CUDA to train model') parser.add_argument('--lr', '--learning_rate', default=None, type=float, help='Initial learning rate. Leave as None to read this from the config.') parser.add_argument('--momentum', default=None, type=float, help='Momentum for SGD. Leave as None to read this from the config.') parser.add_argument('--decay', '--weight_decay', default=None, type=float, help='Weight decay for SGD. Leave as None to read this from the config.') parser.add_argument('--gamma', default=None, type=float, help='For each lr step, what to multiply the lr by. Leave as None to read this from the config.') parser.add_argument('--save_folder', default='weights/', help='Directory for saving checkpoint models.') parser.add_argument('--log_folder', default='logs/', help='Directory for saving logs.') parser.add_argument('--config', default=None, help='The config object to use.') parser.add_argument('--save_interval', default=10000, type=int, help='The number of iterations between saving the model.') parser.add_argument('--validation_size', default=5000, type=int, help='The number of images to use for validation.') parser.add_argument('--validation_epoch', default=2, type=int, help='Output validation information every n iterations. If -1, do no validation.') parser.add_argument('--keep_latest', dest='keep_latest', action='store_true', help='Only keep the latest checkpoint instead of each one.') parser.add_argument('--keep_latest_interval', default=100000, type=int, help='When --keep_latest is on, don\'t delete the latest file at these intervals. This should be a multiple of save_interval or 0.') parser.add_argument('--dataset', default=None, type=str, help='If specified, override the dataset specified in the config with this one (example: coco2017_dataset).') parser.add_argument('--no_log', dest='log', action='store_false', help='Don\'t log per iteration information into log_folder.') parser.add_argument('--log_gpu', dest='log_gpu', action='store_true', help='Include GPU information in the logs. Nvidia-smi tends to be slow, so set this with caution.') parser.add_argument('--no_interrupt', dest='interrupt', action='store_false', help='Don\'t save an interrupt when KeyboardInterrupt is caught.') parser.add_argument('--batch_alloc', default=None, type=str, help='If using multiple GPUS, you can set this to be a comma separated list detailing which GPUs should get what local batch size (It should add up to your total batch size).') parser.add_argument('--no_autoscale', dest='autoscale', action='store_false', help='YOLACT will automatically scale the lr and the number of iterations depending on the batch size. Set this if you want to disable that.') parser.set_defaults(keep_latest=False, log=True, log_gpu=False, interrupt=True, autoscale=True) args = parser.parse_args() if args.config is not None: set_cfg(args.config) if args.dataset is not None: set_dataset(args.dataset) if args.autoscale and args.batch_size != 8: factor = args.batch_size / 8 if name == 'main': print('Scaling parameters by %.2f to account for a batch size of %d.' % (factor, args.batch_size)) cfg.lr = factor cfg.max_iter //= factor cfg.lr_steps = [x // factor for x in cfg.lr_steps] # Update training parameters from the config if necessary def replace(name): if getattr(args, name) == None: setattr(args, name, getattr(cfg, name)) replace('lr') replace('decay') replace('gamma') replace('momentum') # This is managed by set_lr cur_lr = args.lr if torch.cuda.device_count() == 0: print('No GPUs detected. Exiting...') exit(-1) if args.batch_size // torch.cuda.device_count() < 6: if name == 'main': print('Per-GPU batch size is less than the recommended limit for batch norm. Disabling batch norm.') cfg.freeze_bn = True loss_types = ['B', 'C', 'M', 'P', 'D', 'E', 'S', 'I'] if torch.cuda.is_available(): if args.cuda: torch.set_default_tensor_type('torch.cuda.FloatTensor') if not args.cuda: print("WARNING: It looks like you have a CUDA device, but aren't " + "using CUDA.\nRun with --cuda for optimal training speed.") torch.set_default_tensor_type('torch.FloatTensor') else: torch.set_default_tensor_type('torch.FloatTensor') class NetLoss(nn.Module): """ A wrapper for running the network and computing the loss This is so we can more efficiently use DataParallel. """ def init(self, net:Yolact, criterion:MultiBoxLoss): super().init() self.net = net self.criterion = criterion def forward(self, images, targets, masks, num_crowds): preds = self.net(images) losses = self.criterion(self.net, preds, targets, masks, num_crowds) return losses class CustomDataParallel(nn.DataParallel): """ This is a custom version of DataParallel that works better with our training data. It should also be faster than the general case. """ def scatter(self, inputs, kwargs, device_ids): # More like scatter and data prep at the same time. The point is we prep the data in such a way # that no scatter is necessary, and there's no need to shuffle stuff around different GPUs. devices = ['cuda:' + str(x) for x in device_ids] splits = prepare_data(inputs[0], devices, allocation=args.batch_alloc) return [[split[device_idx] for split in splits] for device_idx in range(len(devices))], \ [kwargs] len(devices) def gather(self, outputs, output_device): out = {} for k in outputs[0]: out[k] = torch.stack([output[k].to(output_device) for output in outputs]) return out def train(): if not os.path.exists(args.save_folder): os.mkdir(args.save_folder) dataset = COCODetection(image_path=cfg.dataset.train_images, info_file=cfg.dataset.train_info, transform=SSDAugmentation(MEANS)) if args.validation_epoch > 0: setup_eval() val_dataset = COCODetection(image_path=cfg.dataset.valid_images, info_file=cfg.dataset.valid_info, transform=BaseTransform(MEANS)) # Parallel wraps the underlying module, but when saving and loading we don't want that yolact_net = Yolact() net = yolact_net net.train() if args.log: log = Log(cfg.name, args.log_folder, dict(args._get_kwargs()), overwrite=(args.resume is None), log_gpu_stats=args.log_gpu) # I don't use the timer during training (I use a different timing method). # Apparently there's a race condition with multiple GPUs, so disable it just to be safe. timer.disable_all() # Both of these can set args.resume to None, so do them before the check if args.resume == 'interrupt': args.resume = SavePath.get_interrupt(args.save_folder) elif args.resume == 'latest': args.resume = SavePath.get_latest(args.save_folder, cfg.name) if args.resume is not None: print('Resuming training, loading {}...'.format(args.resume)) yolact_net.load_weights(args.resume) if args.start_iter == -1: args.start_iter = SavePath.from_str(args.resume).iteration else: print('Initializing weights...') yolact_net.init_weights(backbone_path=args.save_folder + cfg.backbone.path) optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=args.momentum, weight_decay=args.decay) criterion = MultiBoxLoss(num_classes=cfg.num_classes, pos_threshold=cfg.positive_iou_threshold, neg_threshold=cfg.negative_iou_threshold, negpos_ratio=cfg.ohem_negpos_ratio) if args.batch_alloc is not None: args.batch_alloc = [int(x) for x in args.batch_alloc.split(',')] if sum(args.batch_alloc) != args.batch_size: print('Error: Batch allocation (%s) does not sum to batch size (%s).' % (args.batch_alloc, args.batch_size)) exit(-1) net = CustomDataParallel(NetLoss(net, criterion)) if args.cuda: net = net.cuda() # Initialize everything if not cfg.freeze_bn: yolact_net.freeze_bn() # Freeze bn so we don't kill our means yolact_net(torch.zeros(1, 3, cfg.max_size, cfg.max_size).cuda()) if not cfg.freeze_bn: yolact_net.freeze_bn(True) # loss counters loc_loss = 0 conf_loss = 0 iteration = max(args.start_iter, 0) last_time = time.time() epoch_size = len(dataset)+1 // args.batch_size num_epochs = math.ceil(cfg.max_iter / epoch_size) # Which learning rate adjustment step are we on? lr' = lr * gamma ^ step_index step_index = 0 data_loader = data.DataLoader(dataset, args.batch_size, num_workers=args.num_workers, shuffle=True, collate_fn=detection_collate, pin_memory=True) save_path = lambda epoch, iteration: SavePath(cfg.name, epoch, iteration).get_path(root=args.save_folder) time_avg = MovingAverage() global loss_types # Forms the print order loss_avgs = { k: MovingAverage(100) for k in loss_types } print('Begin training!') print() # try-except so you can use ctrl+c to save early and stop training try: for epoch in range(num_epochs): # Resume from start_iter if (epoch+1)epoch_size < iteration: continue for datum in data_loader: # Stop if we've reached an epoch if we're resuming from start_iter if iteration == (epoch+1)epoch_size: break # Stop at the configured number of iterations even if mid-epoch if iteration == cfg.max_iter: break # Change a config setting if we've reached the specified iteration changed = False for change in cfg.delayed_settings: if iteration >= change[0]: changed = True cfg.replace(change[1]) # Reset the loss averages because things might have changed for avg in loss_avgs: avg.reset() # If a config setting was changed, remove it from the list so we don't keep checking if changed: cfg.delayed_settings = [x for x in cfg.delayed_settings if x[0] > iteration] # Warm up by linearly interpolating the learning rate from some smaller value if cfg.lr_warmup_until > 0 and iteration <= cfg.lr_warmup_until: set_lr(optimizer, (args.lr - cfg.lr_warmup_init) * (iteration / cfg.lr_warmup_until) + cfg.lr_warmup_init) # Adjust the learning rate at the given iterations, but also if we resume from past that iteration while step_index < len(cfg.lr_steps) and iteration >= cfg.lr_steps[step_index]: step_index += 1 set_lr(optimizer, args.lr * (args.gamma ** step_index)) # Zero the grad to get ready to compute gradients optimizer.zero_grad() # Forward Pass + Compute loss at the same time (see CustomDataParallel and NetLoss) losses = net(datum) losses = { k: (v).mean() for k,v in losses.items() } # Mean here because Dataparallel loss = sum([losses[k] for k in losses]) # no_inf_mean removes some components from the loss, so make sure to backward through all of it # all_loss = sum([v.mean() for v in losses.values()]) # Backprop loss.backward() # Do this to free up vram even if loss is not finite if torch.isfinite(loss).item(): optimizer.step() # Add the loss to the moving average for bookkeeping for k in losses: loss_avgs[k].add(losses[k].item()) cur_time = time.time() elapsed = cur_time - last_time last_time = cur_time # Exclude graph setup from the timing information if iteration != args.start_iter: time_avg.add(elapsed) if iteration % 10 == 0: eta_str = str(datetime.timedelta(seconds=(cfg.max_iter-iteration) * time_avg.get_avg())).split('.')[0] total = sum([loss_avgs[k].get_avg() for k in losses]) loss_labels = sum([[k, loss_avgs[k].get_avg()] for k in loss_types if k in losses], []) print(('[%3d] %7d ||' + (' %s: %.3f |' * len(losses)) + ' T: %.3f || ETA: %s || timer: %.3f') % tuple([epoch, iteration] + loss_labels + [total, eta_str, elapsed]), flush=True) if args.log: precision = 5 loss_info = {k: round(losses[k].item(), precision) for k in losses} loss_info['T'] = round(loss.item(), precision) if args.log_gpu: log.log_gpu_stats = (iteration % 10 == 0) # nvidia-smi is sloooow log.log('train', loss=loss_info, epoch=epoch, iter=iteration, lr=round(cur_lr, 10), elapsed=elapsed) log.log_gpu_stats = args.log_gpu iteration += 1 if iteration % args.save_interval == 0 and iteration != args.start_iter: if args.keep_latest: latest = SavePath.get_latest(args.save_folder, cfg.name) print('Saving state, iter:', iteration) yolact_net.save_weights(save_path(epoch, iteration)) if args.keep_latest and latest is not None: if args.keep_latest_interval <= 0 or iteration % args.keep_latest_interval != args.save_interval: print('Deleting old save...') os.remove(latest) # This is done per epoch if args.validation_epoch > 0: if epoch % args.validation_epoch == 0 and epoch > 0: compute_validation_map(epoch, iteration, yolact_net, val_dataset, log if args.log else None) # Compute validation mAP after training is finished compute_validation_map(epoch, iteration, yolact_net, val_dataset, log if args.log else None) except KeyboardInterrupt: if args.interrupt: print('Stopping early. Saving network...') # Delete previous copy of the interrupted network so we don't spam the weights folder SavePath.remove_interrupt(args.save_folder) yolact_net.save_weights(save_path(epoch, repr(iteration) + '_interrupt')) exit() yolact_net.save_weights(save_path(epoch, iteration)) def set_lr(optimizer, new_lr): for param_group in optimizer.param_groups: param_group['lr'] = new_lr global cur_lr cur_lr = new_lr def gradinator(x): x.requires_grad = False return x def prepare_data(datum, devices:list=None, allocation:list=None): with torch.no_grad(): if devices is None: devices = ['cuda:0'] if args.cuda else ['cpu'] if allocation is None: allocation = [args.batch_size // len(devices)] * (len(devices) - 1) allocation.append(args.batch_size - sum(allocation)) # The rest might need more/less images, (targets, masks, num_crowds) = datum cur_idx = 0 for device, alloc in zip(devices, allocation): for _ in range(alloc): images[cur_idx] = gradinator(images[cur_idx].to(device)) targets[cur_idx] = gradinator(targets[cur_idx].to(device)) masks[cur_idx] = gradinator(masks[cur_idx].to(device)) cur_idx += 1 if cfg.preserve_aspect_ratio: # Choose a random size from the batch _, h, w = images[random.randint(0, len(images)-1)].size() for idx, (image, target, mask, num_crowd) in enumerate(zip(images, targets, masks, num_crowds)): images[idx], targets[idx], masks[idx], num_crowds[idx] \ = enforce_size(image, target, mask, num_crowd, w, h) cur_idx = 0 split_images, split_targets, split_masks, split_numcrowds \ = [[None for alloc in allocation] for _ in range(4)] for device_idx, alloc in enumerate(allocation): split_images[device_idx] = torch.stack(images[cur_idx:cur_idx+alloc], dim=0) split_targets[device_idx] = targets[cur_idx:cur_idx+alloc] split_masks[device_idx] = masks[cur_idx:cur_idx+alloc] split_numcrowds[device_idx] = num_crowds[cur_idx:cur_idx+alloc] cur_idx += alloc return split_images, split_targets, split_masks, split_numcrowds def no_inf_mean(x:torch.Tensor): """ Computes the mean of a vector, throwing out all inf values. If there are no non-inf values, this will return inf (i.e., just the normal mean). """ no_inf = [a for a in x if torch.isfinite(a)] if len(no_inf) > 0: return sum(no_inf) / len(no_inf) else: return x.mean() def compute_validation_loss(net, data_loader, criterion): global loss_types with torch.no_grad(): losses = {} # Don't switch to eval mode because we want to get losses iterations = 0 for datum in data_loader: images, targets, masks, num_crowds = prepare_data(datum) out = net(images) wrapper = ScatterWrapper(targets, masks, num_crowds) _losses = criterion(out, wrapper, wrapper.make_mask()) for k, v in _losses.items(): v = v.mean().item() if k in losses: losses[k] += v else: losses[k] = v iterations += 1 if args.validation_size <= iterations * args.batch_size: break for k in losses: losses[k] /= iterations loss_labels = sum([[k, losses[k]] for k in loss_types if k in losses], []) print(('Validation ||' + (' %s: %.3f |' * len(losses)) + ')') % tuple(loss_labels), flush=True) def compute_validation_map(epoch, iteration, yolact_net, dataset, log:Log=None): with torch.no_grad(): yolact_net.eval() start = time.time() print() print("Computing validation mAP (this may take a while)...", flush=True) val_info = eval_script.evaluate(yolact_net, dataset, train_mode=True) end = time.time() if log is not None: log.log('val', val_info, elapsed=(end - start), epoch=epoch, iter=iteration) yolact_net.train() def setup_eval(): eval_script.parse_args(['--no_bar', '--max_images='+str(args.validation_size)]) if name == 'main': train() 上述代码怎么插进def count_parameters(model): return sum(p.numel() for p in model.parameters() if p.requires_grad) # 在模型初始化后调用 total_params = count_parameters(yolact_model) print(f"Params: {total_params/1e6:.3f}M") # 转换为百万单位 pip install thop # 先安装依赖 from thop import profile # 在模型初始化后添加 input = torch.randn(1, 3, cfg.img_size, cfg.img_size).to(device) flops, _ = profile(yolact_model, inputs=(input,)) print(f"GFLOPs: {flops/1e9:.2f}G") # 转换为十亿单位 import time # 在推理循环前添加 total_time = 0 num_frames = 100 # 测试帧数 # 预热 for _ in range(10): yolact_model(torch.randn(1,3,550,550).to(device)) # 正式测试 for i in range(num_frames): start = time.perf_counter() # 原推理代码 with torch.no_grad(): preds = yolact_model(input_image) # 后处理代码 # ... total_time += time.perf_counter() - start fps = num_frames / total_time print(f"FPS: {fps:.2f}") def calculate_iou(box1, box2): # box格式: [x1, y1, x2, y2] inter_x1 = max(box1[0], box2[0]) inter_y1 = max(box1[1], box2[1]) inter_x2 = min(box1[2], box2[2]) inter_y2 = min(box1[3], box2[3]) inter_area = max(0, inter_x2 - inter_x1) * max(0, inter_y2 - inter_y1) union_area = (box1[2]-box1[0])(box1[3]-box1[1]) + \ (box2[2]-box2[0])(box2[3]-box2[1]) - inter_area return inter_area / union_area # 在检测结果与GT匹配时调用 for pred_box, gt_box in zip(predictions, ground_truths): iou = calculate_iou(pred_box, gt_box) # 存储或输出iou def print_metrics(params, gflops, fps, iou=None): print(f"Model Metrics:") print(f"├── Parameters: {params:.3f}M") print(f"├── GFLOPs: {gflops:.2f}G") print(f"├── FPS: {fps:.2f}") if iou is not None: print(f"└── mIoU: {iou:.4f}") # 在评估流程结束时调用

print(f"└── mIoU: {val_info.get('mIoU', 0):.4f}") # 记录所有指标 if log is not None: metrics = { 'params': total_params, 'gflops': flops/1e9, 'fps': fps, 'mIoU': val_info.get('mIoU', 0) } ...

import torch from thop import profile # 1. 创建原始模型（未编译） model = YourModel().eval() # 确保在评估模式 # 2. 计算 FLOPs（thop 可安全添加 total_ops） input_tensor = torch.randn(1, 3, 224, 224) flops, params = profile(model, inputs=(input_tensor,)) # 3. 打印结果后编译模型 print(f"FLOPs: {flops/1e9:.2f} G, Params: {params/1e6:.2f} M") scripted_model = torch.jit.script(model) # 安全编译这个代码放在以下代码哪个地方 from data import * from utils.augmentations import SSDAugmentation, BaseTransform from utils.functions import MovingAverage, SavePath from utils.logger import Log from utils import timer from layers.modules import MultiBoxLoss from yolact import Yolact from thop import profile import os import sys import time import math, random from pathlib import Path import torch from torch.autograd import Variable import torch.nn as nn import torch.optim as optim import torch.backends.cudnn as cudnn import torch.nn.init as init import torch.utils.data as data import numpy as np import argparse import datetime # Oof import eval as eval_script def str2bool(v): return v.lower() in ("yes", "true", "t", "1") parser = argparse.ArgumentParser( description='Yolact Training Script') parser.add_argument('--batch_size', default=2, type=int, help='Batch size for training') parser.add_argument('--resume', default=None, type=str, help='Checkpoint state_dict file to resume training from. If this is "interrupt"'\ ', the model will resume training from the interrupt file.') parser.add_argument('--start_iter', default=-1, type=int, help='Resume training at this iter. If this is -1, the iteration will be'\ 'determined from the file name.') parser.add_argument('--num_workers', default=0, type=int, help='Number of workers used in dataloading') parser.add_argument('--cuda', default=True, type=str2bool, help='Use CUDA to train model') parser.add_argument('--lr', '--learning_rate', default=None, type=float, help='Initial learning rate. Leave as None to read this from the config.') parser.add_argument('--momentum', default=None, type=float, help='Momentum for SGD. Leave as None to read this from the config.') parser.add_argument('--decay', '--weight_decay', default=None, type=float, help='Weight decay for SGD. Leave as None to read this from the config.') parser.add_argument('--gamma', default=None, type=float, help='For each lr step, what to multiply the lr by. Leave as None to read this from the config.') parser.add_argument('--save_folder', default='weights/', help='Directory for saving checkpoint models.') parser.add_argument('--log_folder', default='logs/', help='Directory for saving logs.') parser.add_argument('--config', default=None, help='The config object to use.') parser.add_argument('--save_interval', default=10000, type=int, help='The number of iterations between saving the model.') parser.add_argument('--validation_size', default=5000, type=int, help='The number of images to use for validation.') parser.add_argument('--validation_epoch', default=2, type=int, help='Output validation information every n iterations. If -1, do no validation.') parser.add_argument('--keep_latest', dest='keep_latest', action='store_true', help='Only keep the latest checkpoint instead of each one.') parser.add_argument('--keep_latest_interval', default=100000, type=int, help='When --keep_latest is on, don\'t delete the latest file at these intervals. This should be a multiple of save_interval or 0.') parser.add_argument('--dataset', default=None, type=str, help='If specified, override the dataset specified in the config with this one (example: coco2017_dataset).') parser.add_argument('--no_log', dest='log', action='store_false', help='Don\'t log per iteration information into log_folder.') parser.add_argument('--log_gpu', dest='log_gpu', action='store_true', help='Include GPU information in the logs. Nvidia-smi tends to be slow, so set this with caution.') parser.add_argument('--no_interrupt', dest='interrupt', action='store_false', help='Don\'t save an interrupt when KeyboardInterrupt is caught.') parser.add_argument('--batch_alloc', default=None, type=str, help='If using multiple GPUS, you can set this to be a comma separated list detailing which GPUs should get what local batch size (It should add up to your total batch size).') parser.add_argument('--no_autoscale', dest='autoscale', action='store_false', help='YOLACT will automatically scale the lr and the number of iterations depending on the batch size. Set this if you want to disable that.') parser.set_defaults(keep_latest=False, log=True, log_gpu=False, interrupt=True, autoscale=True) args = parser.parse_args() if args.config is not None: set_cfg(args.config) if args.dataset is not None: set_dataset(args.dataset) if args.autoscale and args.batch_size != 8: factor = args.batch_size / 8 if name == 'main': print('Scaling parameters by %.2f to account for a batch size of %d.' % (factor, args.batch_size)) cfg.lr = factor cfg.max_iter //= factor cfg.lr_steps = [x // factor for x in cfg.lr_steps] # Update training parameters from the config if necessary def replace(name): if getattr(args, name) == None: setattr(args, name, getattr(cfg, name)) replace('lr') replace('decay') replace('gamma') replace('momentum') # This is managed by set_lr cur_lr = args.lr if torch.cuda.device_count() == 0: print('No GPUs detected. Exiting...') exit(-1) if args.batch_size // torch.cuda.device_count() < 6: if name == 'main': print('Per-GPU batch size is less than the recommended limit for batch norm. Disabling batch norm.') cfg.freeze_bn = True loss_types = ['B', 'C', 'M', 'P', 'D', 'E', 'S', 'I'] if torch.cuda.is_available(): if args.cuda: torch.set_default_tensor_type('torch.cuda.FloatTensor') if not args.cuda: print("WARNING: It looks like you have a CUDA device, but aren't " + "using CUDA.\nRun with --cuda for optimal training speed.") torch.set_default_tensor_type('torch.FloatTensor') else: torch.set_default_tensor_type('torch.FloatTensor') class NetLoss(nn.Module): """ A wrapper for running the network and computing the loss This is so we can more efficiently use DataParallel. """ def init(self, net:Yolact, criterion:MultiBoxLoss): super().init() self.net = net self.criterion = criterion def forward(self, images, targets, masks, num_crowds): preds = self.net(images) losses = self.criterion(self.net, preds, targets, masks, num_crowds) return losses class CustomDataParallel(nn.DataParallel): """ This is a custom version of DataParallel that works better with our training data. It should also be faster than the general case. """ def scatter(self, inputs, kwargs, device_ids): # More like scatter and data prep at the same time. The point is we prep the data in such a way # that no scatter is necessary, and there's no need to shuffle stuff around different GPUs. devices = ['cuda:' + str(x) for x in device_ids] splits = prepare_data(inputs[0], devices, allocation=args.batch_alloc) return [[split[device_idx] for split in splits] for device_idx in range(len(devices))], \ [kwargs] len(devices) def gather(self, outputs, output_device): out = {} for k in outputs[0]: out[k] = torch.stack([output[k].to(output_device) for output in outputs]) return out def train(): if not os.path.exists(args.save_folder): os.mkdir(args.save_folder) dataset = COCODetection(image_path=cfg.dataset.train_images, info_file=cfg.dataset.train_info, transform=SSDAugmentation(MEANS)) if args.validation_epoch > 0: setup_eval() val_dataset = COCODetection(image_path=cfg.dataset.valid_images, info_file=cfg.dataset.valid_info, transform=BaseTransform(MEANS)) # Parallel wraps the underlying module, but when saving and loading we don't want that yolact_net = Yolact() net = yolact_net net.train() # 添加参数量计算 def count_parameters(model): return sum(p.numel() for p in model.parameters() if p.requires_grad) total_params = count_parameters(yolact_net) print(f"Model Parameters: {total_params / 1e6:.3f}M") # 转换为百万单位 # 在模型初始化后添加 input_size = (1, 3, cfg.max_size, cfg.max_size) dummy_input = torch.zeros(input_size).to("cuda" if args.cuda else "cpu") flops, _ = profile(yolact_net, inputs=(dummy_input,), verbose=False) print(f"GFLOPs: {flops / 1e9:.2f}G") if args.log: log = Log(cfg.name, args.log_folder, dict(args._get_kwargs()), overwrite=(args.resume is None), log_gpu_stats=args.log_gpu) # I don't use the timer during training (I use a different timing method). # Apparently there's a race condition with multiple GPUs, so disable it just to be safe. timer.disable_all() # Both of these can set args.resume to None, so do them before the check if args.resume == 'interrupt': args.resume = SavePath.get_interrupt(args.save_folder) elif args.resume == 'latest': args.resume = SavePath.get_latest(args.save_folder, cfg.name) if args.resume is not None: print('Resuming training, loading {}...'.format(args.resume)) yolact_net.load_weights(args.resume) if args.start_iter == -1: args.start_iter = SavePath.from_str(args.resume).iteration else: print('Initializing weights...') yolact_net.init_weights(backbone_path=args.save_folder + cfg.backbone.path) optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=args.momentum, weight_decay=args.decay) criterion = MultiBoxLoss(num_classes=cfg.num_classes, pos_threshold=cfg.positive_iou_threshold, neg_threshold=cfg.negative_iou_threshold, negpos_ratio=cfg.ohem_negpos_ratio) if args.batch_alloc is not None: args.batch_alloc = [int(x) for x in args.batch_alloc.split(',')] if sum(args.batch_alloc) != args.batch_size: print('Error: Batch allocation (%s) does not sum to batch size (%s).' % (args.batch_alloc, args.batch_size)) exit(-1) net = CustomDataParallel(NetLoss(net, criterion)) if args.cuda: net = net.cuda() # Initialize everything if not cfg.freeze_bn: yolact_net.freeze_bn() # Freeze bn so we don't kill our means yolact_net(torch.zeros(1, 3, cfg.max_size, cfg.max_size).cuda()) if not cfg.freeze_bn: yolact_net.freeze_bn(True) # loss counters loc_loss = 0 conf_loss = 0 iteration = max(args.start_iter, 0) last_time = time.time() epoch_size = len(dataset)+1 // args.batch_size num_epochs = math.ceil(cfg.max_iter / epoch_size) # Which learning rate adjustment step are we on? lr' = lr gamma ^ step_index step_index = 0 data_loader = data.DataLoader(dataset, args.batch_size, num_workers=args.num_workers, shuffle=True, collate_fn=detection_collate, pin_memory=True) save_path = lambda epoch, iteration: SavePath(cfg.name, epoch, iteration).get_path(root=args.save_folder) time_avg = MovingAverage() global loss_types # Forms the print order loss_avgs = { k: MovingAverage(100) for k in loss_types } print('Begin training!') print() # try-except so you can use ctrl+c to save early and stop training try: for epoch in range(num_epochs): # Resume from start_iter if (epoch+1)epoch_size < iteration: continue for datum in data_loader: # Stop if we've reached an epoch if we're resuming from start_iter if iteration == (epoch+1)epoch_size: break # Stop at the configured number of iterations even if mid-epoch if iteration == cfg.max_iter: break # Change a config setting if we've reached the specified iteration changed = False for change in cfg.delayed_settings: if iteration >= change[0]: changed = True cfg.replace(change[1]) # Reset the loss averages because things might have changed for avg in loss_avgs: avg.reset() # If a config setting was changed, remove it from the list so we don't keep checking if changed: cfg.delayed_settings = [x for x in cfg.delayed_settings if x[0] > iteration] # Warm up by linearly interpolating the learning rate from some smaller value if cfg.lr_warmup_until > 0 and iteration <= cfg.lr_warmup_until: set_lr(optimizer, (args.lr - cfg.lr_warmup_init) * (iteration / cfg.lr_warmup_until) + cfg.lr_warmup_init) # Adjust the learning rate at the given iterations, but also if we resume from past that iteration while step_index < len(cfg.lr_steps) and iteration >= cfg.lr_steps[step_index]: step_index += 1 set_lr(optimizer, args.lr * (args.gamma ** step_index)) # Zero the grad to get ready to compute gradients optimizer.zero_grad() # Forward Pass + Compute loss at the same time (see CustomDataParallel and NetLoss) losses = net(datum) losses = { k: (v).mean() for k,v in losses.items() } # Mean here because Dataparallel loss = sum([losses[k] for k in losses]) # no_inf_mean removes some components from the loss, so make sure to backward through all of it # all_loss = sum([v.mean() for v in losses.values()]) # Backprop loss.backward() # Do this to free up vram even if loss is not finite if torch.isfinite(loss).item(): optimizer.step() # Add the loss to the moving average for bookkeeping for k in losses: loss_avgs[k].add(losses[k].item()) cur_time = time.time() elapsed = cur_time - last_time last_time = cur_time # Exclude graph setup from the timing information if iteration != args.start_iter: time_avg.add(elapsed) if iteration % 10 == 0: eta_str = str(datetime.timedelta(seconds=(cfg.max_iter-iteration) * time_avg.get_avg())).split('.')[0] total = sum([loss_avgs[k].get_avg() for k in losses]) loss_labels = sum([[k, loss_avgs[k].get_avg()] for k in loss_types if k in losses], []) print(('[%3d] %7d ||' + (' %s: %.3f |' * len(losses)) + ' T: %.3f || ETA: %s || timer: %.3f') % tuple([epoch, iteration] + loss_labels + [total, eta_str, elapsed]), flush=True) if args.log: precision = 5 loss_info = {k: round(losses[k].item(), precision) for k in losses} loss_info['T'] = round(loss.item(), precision) if args.log_gpu: log.log_gpu_stats = (iteration % 10 == 0) # nvidia-smi is sloooow log.log('train', loss=loss_info, epoch=epoch, iter=iteration, lr=round(cur_lr, 10), elapsed=elapsed) log.log_gpu_stats = args.log_gpu iteration += 1 if iteration % args.save_interval == 0 and iteration != args.start_iter: if args.keep_latest: latest = SavePath.get_latest(args.save_folder, cfg.name) print('Saving state, iter:', iteration) yolact_net.save_weights(save_path(epoch, iteration)) if args.keep_latest and latest is not None: if args.keep_latest_interval <= 0 or iteration % args.keep_latest_interval != args.save_interval: print('Deleting old save...') os.remove(latest) # This is done per epoch if args.validation_epoch > 0: if epoch % args.validation_epoch == 0 and epoch > 0: compute_validation_map(epoch, iteration, yolact_net, val_dataset, log if args.log else None) # Compute validation mAP after training is finished compute_validation_map(epoch, iteration, yolact_net, val_dataset, log if args.log else None) except KeyboardInterrupt: if args.interrupt: print('Stopping early. Saving network...') # Delete previous copy of the interrupted network so we don't spam the weights folder SavePath.remove_interrupt(args.save_folder) yolact_net.save_weights(save_path(epoch, repr(iteration) + '_interrupt')) exit() yolact_net.save_weights(save_path(epoch, iteration)) def set_lr(optimizer, new_lr): for param_group in optimizer.param_groups: param_group['lr'] = new_lr global cur_lr cur_lr = new_lr def gradinator(x): x.requires_grad = False return x def prepare_data(datum, devices:list=None, allocation:list=None): with torch.no_grad(): if devices is None: devices = ['cuda:0'] if args.cuda else ['cpu'] if allocation is None: allocation = [args.batch_size // len(devices)] * (len(devices) - 1) allocation.append(args.batch_size - sum(allocation)) # The rest might need more/less images, (targets, masks, num_crowds) = datum cur_idx = 0 for device, alloc in zip(devices, allocation): for _ in range(alloc): images[cur_idx] = gradinator(images[cur_idx].to(device)) targets[cur_idx] = gradinator(targets[cur_idx].to(device)) masks[cur_idx] = gradinator(masks[cur_idx].to(device)) cur_idx += 1 if cfg.preserve_aspect_ratio: # Choose a random size from the batch _, h, w = images[random.randint(0, len(images)-1)].size() for idx, (image, target, mask, num_crowd) in enumerate(zip(images, targets, masks, num_crowds)): images[idx], targets[idx], masks[idx], num_crowds[idx] \ = enforce_size(image, target, mask, num_crowd, w, h) cur_idx = 0 split_images, split_targets, split_masks, split_numcrowds \ = [[None for alloc in allocation] for _ in range(4)] for device_idx, alloc in enumerate(allocation): split_images[device_idx] = torch.stack(images[cur_idx:cur_idx+alloc], dim=0) split_targets[device_idx] = targets[cur_idx:cur_idx+alloc] split_masks[device_idx] = masks[cur_idx:cur_idx+alloc] split_numcrowds[device_idx] = num_crowds[cur_idx:cur_idx+alloc] cur_idx += alloc return split_images, split_targets, split_masks, split_numcrowds def no_inf_mean(x:torch.Tensor): """ Computes the mean of a vector, throwing out all inf values. If there are no non-inf values, this will return inf (i.e., just the normal mean). """ no_inf = [a for a in x if torch.isfinite(a)] if len(no_inf) > 0: return sum(no_inf) / len(no_inf) else: return x.mean() def compute_validation_loss(net, data_loader, criterion): global loss_types with torch.no_grad(): losses = {} # Don't switch to eval mode because we want to get losses iterations = 0 for datum in data_loader: images, targets, masks, num_crowds = prepare_data(datum) out = net(images) wrapper = ScatterWrapper(targets, masks, num_crowds) _losses = criterion(out, wrapper, wrapper.make_mask()) for k, v in _losses.items(): v = v.mean().item() if k in losses: losses[k] += v else: losses[k] = v iterations += 1 if args.validation_size <= iterations * args.batch_size: break for k in losses: losses[k] /= iterations loss_labels = sum([[k, losses[k]] for k in loss_types if k in losses], []) print(('Validation ||' + (' %s: %.3f |' * len(losses)) + ')') % tuple(loss_labels), flush=True) # 修改 compute_validation_map 函数 def compute_validation_map(epoch, iteration, yolact_net, dataset, log: Log = None): with torch.no_grad(): yolact_net.eval() # 添加 FPS 计算 num_test_frames = 100 total_time = 0 # 预热 GPU for _ in range(10): _ = yolact_net(torch.zeros(1, 3, cfg.max_size, cfg.max_size).cuda()) # 正式测试 for i in range(num_test_frames): img, _ = dataset[i] img = img.unsqueeze(0).cuda() start_time = time.perf_counter() preds = yolact_net(img) torch.cuda.synchronize() # 确保 CUDA 操作完成 total_time += time.perf_counter() - start_time fps = num_test_frames / total_time print(f"FPS: {fps:.2f}") # 原有验证代码 print("\nComputing validation mAP...") val_info = eval_script.evaluate(yolact_net, dataset, train_mode=True) # 记录 FPS if log is not None: log.log('val', {'fps': fps}, epoch=epoch, iter=iteration) yolact_net.train() return fps # 在 compute_validation_map 函数中 print(f"\nValidation Metrics @ iter {iteration}:") print(f"├── Params: {total_params / 1e6:.3f}M") print(f"├── GFLOPs: {flops / 1e9:.2f}G") print(f"├── FPS: {fps:.2f}") print(f"└── mIoU: {val_info.get('mIoU', 0):.4f}") # 记录所有指标 if log is not None: metrics = { 'params': total_params, 'gflops': flops / 1e9, 'fps': fps, 'mIoU': val_info.get('mIoU', 0) } log.log('metrics', metrics, epoch=epoch, iter=iteration) def setup_eval(): eval_script.parse_args(['--no_bar', '--max_images='+str(args.validation_size)]) if name == 'main': train()

我们正在修改YOLACT的训练脚本(train.py)，目的是在模型初始化后添加FLOPs和参数量的计算。根据用户提供的错误信息，问题发生在使用thop.profile计算FLOPs时，因为模型已经被ScriptModule化（可能由于某些原因已经被...

解释一下以下代码：def train_model(model, criterion, optimizer, scheduler, num_epochs=25): since = time.time() # 返回当前时间的时间戳 # deepcopy为深拷贝,即创建一个新的对象，完全复制原始对象及其所有子对象。 # model.state_dict是PyTorch中的一个方法，返回一个字典对象，将模型的所有参数映射到它们的张量值 best_model_wts = copy.deepcopy(model.state_dict()) best_acc = 0.0 # 显示当前的训练进度 for epoch in range(num_epochs): print('Epoch {}/{}'.format(epoch, num_epochs - 1)) # 输出训练进度 print('-' * 10) # 每个epoch都有一个训练和验证阶段 for phase in ['train', 'val']: if phase == 'train': # 调用Optimizer的step函数使它所有参数更新 optimizer.step() # PyTorch中用于更新优化器的学习率 scheduler.step() model.train() # 设置模型的工作模式：训练模式，模型会更新参数 else: model.eval() # 设置模型的工作模式：评估模式，模型不会更新参数 running_loss = 0.0 # 用于记录当前epoch的累计损失 running_corrects = 0 # 用于记录当前正确分类的数量 # 迭代数据 for inputs, labels in dataloaders[phase]: # GPU加速,inputs、labels用于将输入数据和标签数据移动到指定的设备上，例如CPU或GPU inputs = inputs.to(device) labels = labels.to(device) # 清空梯度，在每次优化前都需要进行此操作 optimizer.zero_grad() # torch.set_grad_enabled在训练模式下启用梯度计算，而在评估模式下禁用梯度计算 with torch.set_grad_enabled(phase == 'train'): outputs = model(inputs) _, preds = torch.max(outputs, 1) # 返回输入张量在指定维度上的最大值及其对应的索引 loss = criterion(outputs, labels) # 后向 + 仅在训练阶段进行优化 if phase == 'train': # 反向传播：根据模型的参数计算loss的梯度 loss.backward() # 调用Optimizer的step函数使它所有参数更新 optimizer.step() running_loss += loss.item() * inputs.size(0) # 获取损失值和输入数据的数量 running_corrects += torch.sum(preds == labels.data) # 记录当前epoch中正确分类的数量 # dataset_sizes[phase]：该变量用于获取数据集在当前阶段中的大小 epoch_loss = running_loss / dataset_sizes[phase] #当前epoch的平均损失 epoch_acc = running_corrects.double() / dataset_sizes[phase] # 当前epoch的准确率 # 打印当前epoch的损失和准确率，以实时监控模型的训练状态 print('{} Loss: {:.4f} Acc: {:.4f}'.format( phase, epoch_loss, epoch_acc)) # 前阶段的名称、当前epoch的平均损失、当前epoch的准确率 if phase == 'val' and epoch_acc > best_acc: #更新最佳模型参数 best_acc = epoch_acc best_model_wts = copy.deepcopy(model.state_dict()) print() time_elapsed = time.time() - since # 输出整个训练过程的时间 print('Training complete in {:.0f}m {:.0f}s'.format( time_elapsed // 60, time_elapsed % 60)) # 用于输出最佳验证准确率,best_acc是训练过程中被更新为模型在验证集上的最佳准确率 print('Best val Acc: {:4f}'.format(best_acc)) # 用于加载模型的状态字典，即将最佳模型的参数设置为给定的参数字典 model.load_state_dict(best_model_wts) return model

模型参数保存（Model Checkpoint Saving）下面将逐一解释这些部分。 ### 1. 训练循环（Training Loop）训练循环是模型训练的核心，通常在每个epoch中遍历整个训练数据集。主要步骤包括： - 将模型设置为...

基于Django的酒店预订信息管理系统

酒店预订信息管理系统是基于Django框架开发的，用于提升酒店业和旅游业的效率和用户体验。系统采用Django的MTV架构模式，实现模块化开发，便于维护和团队协作。系统包含用户注册登录、酒店信息展示、在线预订、订单管理、用户评论反馈等核心功能。用户注册登录模块支持信息的新增、编辑、查询和删除，并采用密码加密技术保障数据安全。酒店信息展示模块通过数据库存储和管理酒店的详细信息，如房间类型、价格、设施和图片。在线预订模块设计预订流程，包括房间选择、时间设定和支付方式，确保用户便捷完成预订。订单管理模块供酒店管理者查看和处理订单，具备权限管理功能。用户评论反馈模块为用户和酒店提供互动平台，用户可评价服务，酒店可回复反馈。Django框架提供数据库支持、用户认证系统、内容管理系统等组件，提高开发效率和系统稳定性。系统可部署在多种环境中，支持传统服务器和Docker容器化部署。该系统为酒店提供全面的数字化解决方案，提升管理效率和用户体验，是酒店业数字化转型的重要工具。资源来源于网络分享，仅用于学习交流使用，请勿用于商业，如有侵权请联系我删除！

Selenium实现自动登录脚本.doc

滑动窗口算法模板及刷题应用场景详解.doc

美格SMR815系列资料.rar

美格SMR815系列资料

TDA3000+EOG招标参数_CN_2012-2H.doc

Vue项目中封装通用Dialog弹窗组件.doc

系统集成智能化工程专业技术方案.doc

Redis持久化机制RDB与AOF全解析.doc

实现多个设备同时充电的示例效果.zip

1.版本：matlab2014a/2019b/2024b 2.附赠案例数据可直接运行。 3.代码特点：参数化编程、参数可方便更改、代码编程思路清晰、注释明细。 4.适用对象：计算机，电子信息工程、数学等专业的大学生课程设计、期末大作业和毕业设计。

【Python编程】基于Python的网络爬虫技术详解：数据采集与处理的基本流程与实践指南

内容概要：本文介绍了Python爬虫的基础知识、常见步骤及注意事项。首先概述了Python爬虫的概念，即一种用Python编写的程序，可自动从网络获取并提取网页信息，广泛应用于数据采集、信息监测等领域。接着详细阐述了爬虫的五个常见步骤：确定目标与分析网页，明确所需数据及其呈现形式；发送请求，利用urllib或requests库获取网页内容；解析网页内容，借助BeautifulSoup或lxml库提取有用信息；提取数据，依据网页结构定位并获取目标数据；数据存储，将提取的数据保存为CSV、JSON或存入数据库。最后强调了使用爬虫时应注意遵守法律法规、遵循网站规则、避免过度请求等事项。; 适合人群：对Python爬虫感兴趣的初学者，包括但不限于程序员、数据分析师等。; 使用场景及目标：①学习Python爬虫的基本概念与工作原理；②掌握构建简单爬虫程序的技能，能够独立完成从确定目标到数据存储的全过程。; 阅读建议：由于涉及到具体的代码示例，在阅读过程中应结合实际操作练习，同时注意遵守文中提到的法律法规与道德规范，确保爬虫活动合法合规。

2025年人工智能AI指数报告.pdf

重新构建的Dao Jet数据库引擎

资源下载链接为： https://pan.quark.cn/s/9e7ef05254f8 DAO Jet 数据库引擎是微软早期开发的用于操作 Access 数据库的组件，与 Jet 引擎紧密相连。Jet 引擎是 Windows 操作系统内置的数据库管理系统，主要支持 Access 数据库及其他使用 Jet 数据存储格式的应用程序。DAO（数据访问对象）是与 Jet 引擎搭配的数据访问接口，能让程序员利用对象和方法操作数据库。当出现“无法初始化 DAO/Jet 数据库引擎”的问题时，通常是由于 DAO 或 Jet 引擎组件损坏、丢失或版本不兼容引起的，这可能是系统更新、病毒入侵、卸载不当或其他软件冲突所致。解决办法之一是从 VC6 光盘中提取相关文件重新安装 DAO Jet 数据库引擎。DAO 接口包含多种类和接口，如 Database、Recordset、Field 等，开发者可借助这些对象执行 SQL 查询、创建和修改表、索引及查询等操作。与 ADO（ActiveX 数据对象）相比，DAO 功能稍弱，但因更贴近底层的 Jet 引擎，在处理 Access 特定特性时可能更便捷。安装 DAO Jet 数据库引擎一般包括以下步骤：下载包含所有必要 DAO 组件的 DAO-REDIST 文件；运行安装程序，按向导提示操作；安装时系统会自动检测并修复 DAO 组件问题或安装缺失组件；安装完成后，可能需重启计算机使更改生效；之后可通过编写简单 VBA 代码测试数据库连接，验证 DAO 是否正常工作。需注意，DAO Jet 数据库引擎主要适用于旧系统和应用程序，新版本的 Microsoft Office 和 Windows 操作系统不再推荐使用 DAO，而是更倾向于采用 ADO 接口以及 SQL Server Express 等更安全、更强大的数据库管理系统。不过，对于依赖 DAO 技

【AI编程与智能应用开发】基于扣子平台COZE的编程学习助手智能体构建与实战案例分析

内容概要：文章详细介绍了字节跳动推出的智能体开发平台——扣子（Coze），重点讲解了如何在该平台上构建一个“编程学习助手”智能体。此智能体能够支持自然语言编程提问、展示代码示例、集成 GitHub 插件查询开源项目、支持语义上下文记忆，并可部署至微信、网页等多个端口。通过图形化流程设计、系统提示词引导、插件功能扩展以及上下文记忆等功能，Coze 平台大大降低了 AI 应用开发的门槛，使得非程序员也能创建实用的智能产品。此外，Coze 还提供了便捷的跨平台部署方式，增强了智能体的实用性和分发便利性。; 适合人群：对 AI 编程和智能应用开发感兴趣的初学者及个人开发者，特别是那些希望快速上手但缺乏传统编程经验的人群。; 使用场景及目标：①利用图形化界面和提示词创建智能体，降低开发门槛；②构建具备自然语言处理能力的编程学习助手，提高编程学习效率；③通过集成 GitHub 插件，查询和推荐开源项目资源；④实现智能体的多平台部署，扩大应用范围。; 其他说明：Coze 平台不仅适用于个人开发者进行实验性项目，还被广泛应用于企业客服、教育、运营自动化等领域。随着插件生态和大模型技术的发展，Coze 将在未来继续拓展其应用场景和开发深度。

看一下修改时间：Image$$VECTOR$$Base

Image\$\$VECTOR\$\$Base Image$$VECTOR$$Base 客服测试

2025年我国制造业数字化转型发展形势展望.pdf

相关推荐

dlr-checkpoint-navigation-controller-ios:底特律实验室食谱 - 检查点导航控制器

MobileNetV3-SSD-Compact-Version:MobileNetV3 SSD的简洁版本

uPIT-for-speech-separation:语音级别的PIT实验进行语音分离

基于Django的酒店预订信息管理系统

Selenium实现自动登录脚本.doc

滑动窗口算法模板及刷题应用场景详解.doc

美格SMR815系列资料.rar

TDA3000+EOG招标参数_CN_2012-2H.doc

Vue项目中封装通用Dialog弹窗组件.doc

系统集成智能化工程专业技术方案.doc

Redis持久化机制RDB与AOF全解析.doc

实现多个设备同时充电的示例效果.zip

【Python编程】基于Python的网络爬虫技术详解：数据采集与处理的基本流程与实践指南

2025年人工智能AI指数报告.pdf

重新构建的Dao Jet数据库引擎

【AI编程与智能应用开发】基于扣子平台COZE的编程学习助手智能体构建与实战案例分析

看一下修改时间：Image$$VECTOR$$Base

2025年我国制造业数字化转型发展形势展望.pdf

大家在看

ray-optics:光学系统的几何光线追踪

修复Windows 10&11 因更新造成的IE11 无法使用

参考资料-Boost_PFC电路中开关器件的损耗分析与计算.zip

3DSlicer 5.2带中文包-稳定版

KGM转MP3或者FLAC_kgma_kgma格式_FLAC_kgma转换器_kgm转换成flac_亲测完美转换！保证可用。

最新推荐

PKID查壳工具最新版发布，轻松识别安卓安装包加壳

【PDF.js问题诊断手册】：快速定位与解决常见问题

grep -Ei

一键关闭系统更新的工具介绍

【PC3000高级应用】：复杂故障快速诊断与解决

我的数据是names(data) [1] "分组" "排序" "PACU" "V1" "V2" "V3" ；后四个数据分别代表四个时间点的测量值，请你用ART模型进行规范检验，我需要得到分组（即不同处理方法）的差异

鼎捷易飞ERPV9.0委外进货单批量导入解决方案

硬盘数据恢复秘籍：PC3000流程全面解读

MRAM在PLC中起到的作用

局域网内FTP服务安装与搭建指南