Information
One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
My own task or dataset (give details below)
Reproduction
Essentially, I need this workaround to load resume states for the Prodigy optimizer:
if self.optimizer is not None and self.config.optimizer == "prodigy":
    # fix the device assignment for the prodigy optimizer parameters
    groups = (
        self.optimizer.param_groups
        if self.optimizer.optimizer.split_groups
        else self.optimizer.param_groups[:1]
    )
    for group in groups:
        p = group["params"][0]
        group["running_d_numerator"] = group["running_d_numerator"].to(p.device)
        group["running_d_denom"] = group["running_d_denom"].to(p.device)
Otherwise we get errors during training on calculations involving d, because it is on the CPU while everything else is on cuda:0.
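The workaround above can be generalized: after restoring a checkpoint, walk each param group and move any tensor-valued entries (such as Prodigy's running_d_numerator / running_d_denom) onto the device of that group's parameters. A minimal sketch, using a plain SGD optimizer with a hypothetical CPU tensor stashed in its param group to stand in for Prodigy's state:

```python
import torch

def move_group_tensors_to_param_device(optimizer):
    """Move tensor-valued param-group entries onto the device of that
    group's parameters (hypothetical helper, not an Accelerate API)."""
    for group in optimizer.param_groups:
        if not group["params"]:
            continue
        device = group["params"][0].device
        for key, value in group.items():
            if key != "params" and torch.is_tensor(value):
                group[key] = value.to(device)

# Mimic the resume situation: a CPU tensor left in the param group
# after loading, standing in for Prodigy's running_d_numerator.
params = [torch.nn.Parameter(torch.zeros(2))]
opt = torch.optim.SGD(params, lr=0.1)
opt.param_groups[0]["running_d_numerator"] = torch.zeros(1)
move_group_tensors_to_param_device(opt)
```

After the call, every tensor entry in the group lives on the same device as the parameters, so mixed-device arithmetic during the optimizer step no longer fails.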
Expected behavior
I would expect optimizer state components to be moved to their correct device at load time.
System Info
- Accelerate version: 1.2.1
- accelerate bash location: /home/bghira/src/SimpleTuner/.venv/bin/accelerate
- Accelerate default config: Not found