Skip to content

Conversation

@ShriyaRishab
Copy link
Contributor

Upstreaming @erhoo82's branch: tp_overlap_patch

@ShriyaRishab
Copy link
Contributor Author

@ptrendx @ksivaman please review, thanks.

Copy link
Collaborator

@timmoon10 timmoon10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the only functional change is to change some memory accesses to use atomicAdd_system. We also add some more information to an error message, which requires plumbing in layer_name throughout the whole PyTorch infrastructure. If this debugging info is not essential, I feel the extra messiness is not worth it. If it is absolutely needed, then at the very least we should change layer_name to something less awkward and confusing, maybe ub_layer_id or ub_layer_label.

@ShriyaRishab ShriyaRishab force-pushed the shriya/tp_overlap_patch branch from b26cdba to aaced36 Compare May 9, 2023 22:58
@ShriyaRishab
Copy link
Contributor Author

Agreed, removed the logging commit from this PR

Copy link
Collaborator

@timmoon10 timmoon10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ptrendx
Copy link
Member

ptrendx commented May 9, 2023

@ShriyaPalsamudram please sign your commit.

@layalir layalir force-pushed the shriya/tp_overlap_patch branch from df43bf0 to d178cea Compare May 10, 2023 01:23
@ptrendx ptrendx merged commit 8e5f00f into NVIDIA:release_v0.8 May 10, 2023
ptrendx pushed a commit that referenced this pull request May 10, 2023
userbuffer pushsend/recv fix with atomicAdd_system

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
nzmora-nvidia pushed a commit to nzmora-nvidia/TransformerEngine that referenced this pull request May 10, 2023
userbuffer pushsend/recv fix with atomicAdd_system

Signed-off-by: Sangkug Lym <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants