Skip to content

Support #[global_allocator] without the allocator shim#86844

Merged
bors merged 14 commits intorust-lang:masterfrom
bjorn3:global_alloc_improvements
May 25, 2023
Merged

Support #[global_allocator] without the allocator shim#86844
bors merged 14 commits intorust-lang:masterfrom
bjorn3:global_alloc_improvements

Conversation

@bjorn3
Copy link
Member

@bjorn3 bjorn3 commented Jul 3, 2021

This makes it possible to use liballoc/libstd in combination with --emit obj if you use #[global_allocator]. This is what rust-for-linux uses right now and systemd may use in the future. Currently they have to depend on the exact implementation of the allocator shim to create one themself as --emit obj doesn't create an allocator shim.

Note that currently the allocator shim also defines the oom error handler, which is normally required too. Once #![feature(default_alloc_error_handler)] becomes the only option, this can be avoided. In addition when using only fallible allocator methods and either --cfg no_global_oom_handling for liballoc (like rust-for-linux) or --gc-sections no references to the oom error handler will exist.

To avoid this feature being insta-stable, you will have to define __rust_no_alloc_shim_is_unstable to avoid linker errors.

(Labeling this with both T-compiler and T-lang as it originally involved both an implementation detail and had an insta-stable user facing change. As noted above, the __rust_no_alloc_shim_is_unstable symbol requirement should prevent unintended dependence on this unstable feature.)

@bjorn3 bjorn3 added A-linkage Area: linking into static, shared libraries and binaries A-allocators Area: Custom and system allocators T-lang Relevant to the language team T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 3, 2021
@rust-highfive
Copy link
Contributor

Some changes occured to rustc_codegen_cranelift

cc @bjorn3

@rust-highfive
Copy link
Contributor

r? @jackh726

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 3, 2021
@rust-log-analyzer

This comment has been minimized.

@bjorn3 bjorn3 force-pushed the global_alloc_improvements branch from ba27e9f to 51f054c Compare July 3, 2021 16:11
@rust-log-analyzer

This comment has been minimized.

@bjorn3 bjorn3 force-pushed the global_alloc_improvements branch from e4a996a to b2b1a59 Compare July 3, 2021 16:40
@jyn514
Copy link
Member

jyn514 commented Jul 3, 2021

Hmm, r? @scottmcm maybe?

@camelid camelid added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 23, 2021
@bors
Copy link
Collaborator

bors commented Aug 6, 2021

☔ The latest upstream changes (presumably #87822) made this pull request unmergeable. Please resolve the merge conflicts.

@bjorn3 bjorn3 force-pushed the global_alloc_improvements branch from 355b470 to 4b56c58 Compare August 7, 2021 09:38
@bors
Copy link
Collaborator

bors commented Aug 7, 2021

☔ The latest upstream changes (presumably #87743) made this pull request unmergeable. Please resolve the merge conflicts.

@bjorn3 bjorn3 force-pushed the global_alloc_improvements branch from 4b56c58 to 74b52ae Compare August 8, 2021 09:47
@nbdd0121
Copy link
Member

nbdd0121 commented Aug 8, 2021

It took me a few while to understand this, so essentially instead of generating __rust_alloc that forwards to __rg_alloc or __rdl_alloc based on allocator kind, this PR makes #[global_allocator] generate __rust_alloc directly, while in case global allocator is absent, generate a shim that forwards __rust_alloc to __rdl_alloc.

Would it make sense to (in addition to this PR), just generate a

#[global_allocator]
static GLOBAL: System = System;

at somewhere in higher level when global allocator is absent, instead of having the logic duplicated in codegen? This would allow __rdl_alloc etc to be removed completely (and is more consistent with what document describes in https://doc.rust-lang.org/std/alloc/struct.System.html).

@bjorn3
Copy link
Member Author

bjorn3 commented Aug 8, 2021

Would it make sense to just generate a

#[global_allocator]
static GLOBAL: System = System;

at somewhere in higher level instead of having the logic duplicated in codegen? This would allow __rdl_alloc etc to be removed completely (and is more consistent with what document describes in https://doc.rust-lang.org/std/alloc/struct.System.html).

You can use say --crate-type cdylib --crate-type lib, in which case the cdylib would need the allocator shim, but the lib must not get the allocator shim. Both use the same object files, except for the allocator shim object file that only the bin gets.

@wesleywiser
Copy link
Member

I posted a message in the wg-allocators Zulip stream to see if anyone there has opinions on this PR before it's merged.

@bjorn3
Copy link
Member Author

bjorn3 commented May 11, 2023

Rebased and just like with #106560 I disabled the test on MSVC for now.

@rust-log-analyzer

This comment has been minimized.

@pnkfelix
Copy link
Contributor

@bors r+

@bors
Copy link
Collaborator

bors commented May 25, 2023

📌 Commit 33d9b58 has been approved by pnkfelix

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented May 25, 2023

⌛ Testing commit 33d9b58 with merge a2b1646...

@bors
Copy link
Collaborator

bors commented May 25, 2023

☀️ Test successful - checks-actions
Approved by: pnkfelix
Pushing a2b1646 to master...

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (a2b1646): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.2% [0.2%, 0.2%] 1
Regressions ❌
(secondary)
0.5% [0.3%, 0.7%] 10
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.2% [0.2%, 0.2%] 1

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
3.4% [3.4%, 3.4%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.3% [-1.3%, -1.3%] 1
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.4%] 44
Regressions ❌
(secondary)
0.2% [0.0%, 0.4%] 25
Improvements ✅
(primary)
-0.1% [-0.1%, -0.1%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.1% [-0.1%, 0.4%] 46

Bootstrap: 647.061s -> 648.599s (0.24%)

// Make sure we don't accidentally allow omitting the allocator shim in
// stable code until it is actually stabilized.
#[cfg(not(bootstrap))]
core::ptr::read_volatile(&__rust_no_alloc_shim_is_unstable);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did expect the perf regression to come from this change. It is a single extra instruction on the allocation path to ensure __rust_no_alloc_shim_is_unstable must be defined if no allocator shim is linked in. The only way I can think of that guarantees that it isn't possible to link without defining __rust_no_alloc_shim_is_unstable would be to put __rust_alloc and an item referencing __rust_no_alloc_shim_is_unstable in the same COMDAT group, but this isn't possible for all object file formats and rust doesn't have a way to do this without global_asm!().

Copy link
Contributor

@RReverser RReverser Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that this symbol is now present in every single allocation path, especially where it enlarges the output binary for platforms like Wasm.

Is it instead possible to introduce a new function __rust_no_alloc_shim_is_unstable that would simply forward to __rust_alloc?

That way you still get the desired linker error if it's not declared, but it won't take any more space and won't cause as much of a performance hit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would prevent LLVM from optimizing allocations away I think as __rust_no_alloc_shim_is_unstable is not considered to be an allocator function by LLVM, while __rust_alloc is. That said, I just opened a draft PR which will enable doing away with the allocator shim entirely in the future, after which I hope it will be much easier to request stabilization of support for not using the allocator shim and thus remove this symbol entirely.

@pnkfelix
Copy link
Contributor

@bjorn3 when you say:

I did expect the perf regression to come from this change.

are you referring to the increase in binary object file size?

@bjorn3
Copy link
Member Author

bjorn3 commented May 30, 2023

I'm referring to the instruction count regression. I hadn't noticed the binary object file size regression.

@pnkfelix
Copy link
Contributor

  • The 0.2% hit to primary benchmark serde_derive check-incr_unchanged is easily justified by the feature addition here.
  • The more interesting question is 44 primary benchmarks saw a regression to their binary size. However, the only one of those of note, in my opinion, is ripgrep, which suffered a 0.43% increase to binary size on various opt scenarios.
  • marking as triaged.

@rustbot label: perf-regression-triaged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-allocators Area: Custom and system allocators A-linkage Area: linking into static, shared libraries and binaries A-rust-for-linux Relevant for the Rust-for-Linux project merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team

Projects

None yet

Development

Successfully merging this pull request may close these issues.