Getting gradient of loss during inference #3371
I am fine-tuning Llama 2 using accelerate + DeepSpeed ZeRO-3. During evaluation, which runs after every checkpoint step, I need to compute the gradient of the loss w.r.t. certain input ids. As I understand it, the embedding matrix is sharded under ZeRO-3, and when I try to read the gradient I get an error saying that grad is set to None. Is there a cleaner way to do this using the accelerate APIs?

My code:
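A minimal sketch of this kind of computation on a single GPU, differentiating w.r.t. the input embeddings rather than the discrete ids (ids are not differentiable); the model name and prompt below are placeholders, not the original setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model and prompt, for illustration only.
model_name = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

batch = tok("an evaluation example", return_tensors="pt")
input_ids = batch["input_ids"]

# Input ids are discrete, so take the gradient w.r.t. their embeddings instead.
inputs_embeds = model.get_input_embeddings()(input_ids).detach().requires_grad_(True)

# Evaluation loops usually run under torch.no_grad(); autograd must be on here.
with torch.enable_grad():
    out = model(inputs_embeds=inputs_embeds, labels=input_ids)
    out.loss.backward()

grads = inputs_embeds.grad  # shape: (batch, seq_len, hidden_size)
```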
Comments

For my understanding, did you confirm that the gradient being …

@BenjaminBossan

But with the changes for the distributed setup it fails:

The loss in this case is an empty tensor.

Logits in the single-GPU setup (without accelerate + DeepSpeed) are a tensor with a grad_fn, but with multiple GPUs there is no grad_fn.
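On grad being set to None: under ZeRO-3, parameter gradients are partitioned across ranks, so reading `param.grad` directly returns None by design. DeepSpeed provides `deepspeed.utils.safe_get_full_grad` to reassemble the full gradient. A sketch, assuming `model` is the underlying transformers module (e.g. the engine's `.module`):

```python
from deepspeed.utils import safe_get_full_grad

# ZeRO-3 partitions both parameters and their gradients across ranks,
# so .grad is None; this helper gathers the full gradient instead.
embedding_weight = model.get_input_embeddings().weight
full_grad = safe_get_full_grad(embedding_weight)  # valid only after a backward pass
```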
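On the missing grad_fn: any forward pass run inside `torch.no_grad()` (which Trainer-style evaluation loops typically use) produces outputs with no grad_fn, on one GPU or many, so it is worth ruling that out first. A self-contained demonstration with a stand-in module:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)   # stand-in for the language model
x = torch.randn(1, 4)

with torch.no_grad():
    y = model(x)
print(y.grad_fn)          # None: no_grad records no autograd graph

with torch.enable_grad():
    y = model(x)
print(y.grad_fn)          # e.g. <AddmmBackward0 ...>: graph is recorded
```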