Support #[global_allocator] without the allocator shim #86844
bors merged 14 commits into rust-lang:master
|
Some changes occurred to rustc_codegen_cranelift cc @bjorn3
|
r? @jackh726 (rust-highfive has picked a reviewer for you, use r? to override) |
ba27e9f to 51f054c
e4a996a to b2b1a59
|
Hmm, r? @scottmcm maybe? |
|
☔ The latest upstream changes (presumably #87822) made this pull request unmergeable. Please resolve the merge conflicts. |
355b470 to 4b56c58
|
☔ The latest upstream changes (presumably #87743) made this pull request unmergeable. Please resolve the merge conflicts. |
4b56c58 to 74b52ae
|
It took me a while to understand this, so essentially instead of generating [...]
Would it make sense (in addition to this PR) to just generate
#[global_allocator]
static GLOBAL: System = System;
somewhere at a higher level when a global allocator is absent, instead of having the logic duplicated in codegen? This would allow [...]
You can use say |
|
I posted a message in the |
Co-authored-by: Ralf Jung <post@ralfj.de>
I can't figure out how to link with the MSVC toolchain
|
Rebased and just like with #106560 I disabled the test on MSVC for now. |
|
@bors r+ |
|
☀️ Test successful - checks-actions |
|
Finished benchmarking commit (a2b1646): comparison URL.
Overall result: ❌ regressions - ACTION NEEDED
Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This benchmark run did not return any relevant results for this metric.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 647.061s -> 648.599s (0.24%)
    // Make sure we don't accidentally allow omitting the allocator shim in
    // stable code until it is actually stabilized.
    #[cfg(not(bootstrap))]
    core::ptr::read_volatile(&__rust_no_alloc_shim_is_unstable);
I did expect the perf regression to come from this change. It is a single extra instruction on the allocation path, there to ensure that __rust_no_alloc_shim_is_unstable must be defined if no allocator shim is linked in. The only way I can think of to guarantee that it isn't possible to link without defining __rust_no_alloc_shim_is_unstable would be to put __rust_alloc and an item referencing __rust_no_alloc_shim_is_unstable in the same COMDAT group, but this isn't possible for all object file formats, and Rust doesn't have a way to do this without global_asm!().
It's unfortunate that this symbol is now present in every single allocation path, especially where it enlarges the output binary for platforms like Wasm.
Is it instead possible to introduce a new function __rust_no_alloc_shim_is_unstable that would simply forward to __rust_alloc?
That way you still get the desired linker error if it's not declared, but it won't take any more space and won't cause as much of a performance hit.
I think that would prevent LLVM from optimizing allocations away, as __rust_no_alloc_shim_is_unstable is not considered to be an allocator function by LLVM, while __rust_alloc is. That said, I just opened a draft PR which will make it possible to do away with the allocator shim entirely in the future, after which I hope it will be much easier to request stabilization of support for not using the allocator shim and thus remove this symbol entirely.
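For readers following this thread, here is a minimal sketch of the general technique being discussed: a volatile read of an externally declared static, so that the final link only succeeds if something defines that symbol. The names DEMO_NO_SHIM_MARKER, embedder and touch_marker are hypothetical stand-ins, chosen so the sketch does not clash with the real __rust_no_alloc_shim_is_unstable symbol; this is not the code from the PR. It assumes Rust 1.82 or later for the `unsafe extern` and `#[unsafe(no_mangle)]` syntax.

```rust
use core::ptr;

// Stands in for the crate that opts in by defining the marker symbol.
// `DEMO_NO_SHIM_MARKER` is a hypothetical name used only in this sketch.
mod embedder {
    #[unsafe(no_mangle)]
    pub static DEMO_NO_SHIM_MARKER: u8 = 0;
}

// Stands in for the allocation path: the symbol is only declared here,
// so it has to be resolved by the linker.
unsafe extern "C" {
    static DEMO_NO_SHIM_MARKER: u8;
}

/// Performs a volatile read of the marker. Because the read is volatile,
/// the reference to the symbol is not optimized out, and linking fails
/// if no definition of the symbol is provided anywhere.
pub fn touch_marker() -> u8 {
    // SAFETY: the marker is an immutable, initialized `u8` defined in `embedder`.
    unsafe { ptr::read_volatile(&DEMO_NO_SHIM_MARKER) }
}
```

Deleting the `embedder` module should turn the build into an undefined-symbol link error, which is the effect the marker is meant to have. The cost is the one discussed above: every call to `touch_marker` performs one extra load.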
|
@bjorn3 when you say:
are you referring to the increase in binary object file size? |
|
I'm referring to the instruction count regression. I hadn't noticed the binary object file size regression. |
@rustbot label: perf-regression-triaged |
This makes it possible to use liballoc/libstd in combination with --emit obj if you use #[global_allocator]. This is what rust-for-linux uses right now and systemd may use in the future. Currently they have to depend on the exact implementation of the allocator shim to create one themselves, as --emit obj doesn't create an allocator shim.
Note that currently the allocator shim also defines the oom error handler, which is normally required too. Once #![feature(default_alloc_error_handler)] becomes the only option, this can be avoided. In addition, when using only fallible allocator methods and either --cfg no_global_oom_handling for liballoc (like rust-for-linux) or --gc-sections, no references to the oom error handler will exist.
To avoid this feature being insta-stable, you will have to define __rust_no_alloc_shim_is_unstable to avoid linker errors.
(Labeling this with both T-compiler and T-lang as it originally involved both an implementation detail and had an insta-stable user facing change. As noted above, the __rust_no_alloc_shim_is_unstable symbol requirement should prevent unintended dependence on this unstable feature.)
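As orientation for the description above, here is a minimal sketch of the kind of code it has in mind: a crate that supplies its own #[global_allocator] and only uses fallible allocation methods. CountingAlloc, ALLOC_CALLS, allocations_during and try_with_capacity are hypothetical names made up for this sketch. It uses std for brevity and does not show the --emit obj build or the marker symbol.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::TryReserveError;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator that forwards to the system allocator
/// and counts how many times `alloc` is called.
struct CountingAlloc;

static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the caller upholds the `GlobalAlloc::alloc` contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with the same layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Registers the allocator used by `Vec`, `Box`, and friends in this program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Returns how many `alloc` calls were counted while `f` ran.
/// Other threads may allocate concurrently, so treat this as a lower bound.
fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    f();
    ALLOC_CALLS.load(Ordering::Relaxed) - before
}

/// Fallible allocation: reports failure as an error instead of
/// going through the oom error handler.
fn try_with_capacity(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?;
    Ok(buf)
}
```

A caller would write something like `let buf = try_with_capacity(4096)?;` and handle the error itself. Sticking to methods of this shape is what keeps references to the oom error handler out of the calling code.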