loading the prodigy optimizer does not move custom parameters to the accelerator #3372

Open

bghira opened this issue Jan 29, 2025 · 0 comments
bghira commented Jan 29, 2025

System Info

  • Accelerate version: 1.2.1
  • Platform: Linux-6.12.9-gentoo-x86_64-x86_64-AMD_Ryzen_7_7800X3D_8-Core_Processor-with-glibc2.40
  • accelerate bash location: /home/bghira/src/SimpleTuner/.venv/bin/accelerate
  • Python version: 3.11.11
  • Numpy version: 2.2.0
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • PyTorch MLU available: False
  • PyTorch MUSA available: False
  • System RAM: 124.91 GB
  • GPU type: NVIDIA GeForce RTX 4090
  • Accelerate default config:
    Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

Essentially, I need this workaround to load resume states for the Prodigy optimizer:

    if self.optimizer is not None and self.config.optimizer == "prodigy":
        # fix the device assignment for the prodigy optimizer's per-group tensors
        groups = (
            self.optimizer.param_groups
            if self.optimizer.optimizer.split_groups
            else self.optimizer.param_groups[:1]
        )
        for group in groups:
            p = group["params"][0]
            group["running_d_numerator"] = group["running_d_numerator"].to(p.device)
            group["running_d_denom"] = group["running_d_denom"].to(p.device)

Otherwise, we get errors during training whenever the d estimate is used in calculations, because those tensors are still on the CPU while everything else is on cuda:0.

Expected behavior

I would expect optimizer state components to be moved to their correct device at load time.
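
For reference, here is a minimal sketch of the more general behavior I'd expect after loading a checkpoint. This is an assumption about what a fix could look like, not Accelerate's actual load path, and the helper name `move_optimizer_tensors_to_param_device` is hypothetical; it just walks the standard torch optimizer structure and relocates any tensors found in param_groups or per-parameter state onto the device of the corresponding parameters.

    import torch

    def move_optimizer_tensors_to_param_device(optimizer: torch.optim.Optimizer) -> None:
        """Hypothetical helper: move per-group and per-parameter optimizer
        tensors onto the device of the parameters they belong to."""
        for group in optimizer.param_groups:
            params = group.get("params", [])
            if not params:
                continue
            device = params[0].device
            # Per-group tensor hyper-state (e.g. Prodigy's running_d_* buffers).
            for key, value in group.items():
                if key != "params" and torch.is_tensor(value):
                    group[key] = value.to(device)
        # Per-parameter state (exp_avg, exp_avg_sq, etc.).
        for param, state in optimizer.state.items():
            for key, value in state.items():
                if torch.is_tensor(value):
                    state[key] = value.to(param.device)

Something along these lines, run as part of load_state, would remove the need for optimizer-specific workarounds like the one above.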
