Remove the Arc rt::init allocation for thread info #123550
bors merged 1 commit into rust-lang:master
|
r? @Nilstrieb rustbot has assigned @Nilstrieb. Use |
|
I just checked and |
d5a081b to d5b8b00
|
@bors try @rust-timer queue |
Remove last rt::init allocation for thread info

Removes the last allocation pre-main by just not storing anything in std::thread::Thread for the main thread.

- The thread name can just be a hard coded literal, as was done in rust-lang#123433.
- The ThreadId is always the `1` value, so `ThreadId::new` now starts at `2` and can fabricate the `1` value when needed.
- Storing Parker in a static that is initialized once at startup. This uses SyncUnsafeCell and MaybeUninit as this is quite performance critical and we don't need synchronization or to store a tag value and possibly leave in a panic.

This currently does not have a regression test to prevent future changes from re-adding allocations pre-main as I'm [having trouble](GnomedDev@6f7be53) implementing it, but if wanted I can draft this PR until that test is ready.
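For readers who want to see the shape of the idea, here is a minimal sketch of the two tricks described above: an ID counter that starts at `2` so that `1` can be fabricated for the main thread, and a static slot that is written once at startup. Every name here (`SketchThreadId`, `SketchParker`, `MainSlot`, `init_main`, `main_parker`) is hypothetical and not taken from the PR. As far as I know `SyncUnsafeCell` is not available on stable Rust, so the sketch swaps in a small hand-written wrapper around `UnsafeCell` instead.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for `ThreadId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SketchThreadId(u64);

impl SketchThreadId {
    /// The main thread's ID is a constant that can be fabricated on demand,
    /// so it never needs to be stored anywhere.
    pub const MAIN: SketchThreadId = SketchThreadId(1);

    /// The counter starts at 2, so `1` is never handed out by `new`.
    pub fn new() -> SketchThreadId {
        static NEXT: AtomicU64 = AtomicU64::new(2);
        SketchThreadId(NEXT.fetch_add(1, Ordering::Relaxed))
    }
}

/// Hypothetical stand-in for `Parker`.
pub struct SketchParker {
    pub notified: bool,
}

/// Hand-written replacement for `SyncUnsafeCell` (an assumption of this sketch).
struct MainSlot(UnsafeCell<MaybeUninit<SketchParker>>);

// SAFETY (assumed protocol): the slot is written exactly once by `init_main`
// before any other thread can observe it, and is only read afterwards.
unsafe impl Sync for MainSlot {}

static MAIN_PARKER: MainSlot = MainSlot(UnsafeCell::new(MaybeUninit::uninit()));

/// # Safety
/// Must be called exactly once, before any call to `main_parker`,
/// and with no concurrent access to the slot.
pub unsafe fn init_main() {
    unsafe {
        (*MAIN_PARKER.0.get()).write(SketchParker { notified: false });
    }
}

/// # Safety
/// `init_main` must have completed before this is called.
pub unsafe fn main_parker() -> &'static SketchParker {
    unsafe { (*MAIN_PARKER.0.get()).assume_init_ref() }
}
```

The trade-off is the one discussed in the review: there is no tag and no synchronization, so correctness rests entirely on the "initialize once at startup" contract stated in the safety comments.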
|
☀️ Try build successful - checks-actions |
|
Finished benchmarking commit (666bbff): comparison URL.
Overall result: ❌ regressions - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 666.761s -> 666.789s (0.00%) |
Noratrieb left a comment
I don't love the increased complexity and unsafety. If you have some good justification for why this is important, that would be great, but I'm inclined to accept it even without that; it certainly feels good to have this property.
|
The increased complexity is a bit sad, but this is already a complex and unsafe process to initialise the basics for the runtime, so I felt that the increased performance and decreased compile time was worth a small amount of well documented unsafety. |
|
Hmm, looking at the actual perf run, it seems quite negative which is certainly unexpected. How is this commonly debugged, as I don't want to go off vibes? |
|
Run the cachegrind command to see where in the compiler the diff occurs. Though FWIW, I would expect these results to be noise and wouldn't chase them further myself - I'd just treat it as "makes no difference". |
so yeah, no real decreased compile time. as for increased performance, I doubt that this will be measurable, maybe |
|
Okay, I don't have a benchmark (I never have a benchmark). Would you like me to rewrite this using OnceLock, just to see if that perf run is also neutral? |
778330b to ab8eba1
|
Sorted the existing review comments, just waiting on a reply to my last comment. |
|
☔ The latest upstream changes (presumably #123913) made this pull request unmergeable. Please resolve the merge conflicts. |
ab8eba1 to 2c45b39
|
Okay, @Nilstrieb, I've spent the last week trying different ways to make this less unsafe and complex, but it doesn't seem possible with the "Parker must be initialized in place" requirement. I cannot initialize a OnceLock or an Option in-place without increasing complexity significantly, so this seems like the least complex (and most performant) way to do this. |
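To illustrate what an "initialized in place" requirement means in practice, here is a minimal sketch. It assumes a constructor that writes through a raw pointer to the value's final address; the type `InPlaceParker`, its `new_in_place` signature, and the helper `init_slot` are all assumptions made for this example, not the code in the PR. `MaybeUninit::as_mut_ptr` can hand out that final address before any value exists, whereas `Option` and `OnceLock` are normally filled by building a value first and then moving it in.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a parker that must be constructed at its final address.
pub struct InPlaceParker {
    state: u32,
}

impl InPlaceParker {
    /// Constructs the value directly at `slot` instead of returning it by value.
    ///
    /// # Safety
    /// `slot` must be valid for writes and properly aligned.
    pub unsafe fn new_in_place(slot: *mut InPlaceParker) {
        unsafe { slot.write(InPlaceParker { state: 0 }) }
    }

    pub fn state(&self) -> u32 {
        self.state
    }
}

/// Initializes `slot` in place and returns a reference to the now-valid value.
pub fn init_slot(slot: &mut MaybeUninit<InPlaceParker>) -> &mut InPlaceParker {
    // SAFETY: `as_mut_ptr` yields a valid, aligned pointer to the slot's storage,
    // and `new_in_place` fully initializes it before `assume_init_mut` is called.
    unsafe {
        InPlaceParker::new_in_place(slot.as_mut_ptr());
        slot.assume_init_mut()
    }
}
```

With this shape the value never exists anywhere other than its final storage, which is the property the comment above says is hard to get from `OnceLock` or `Option` without extra machinery.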
00c2df1 to 0747f28
Noratrieb left a comment
thanks! i can't think of a way to make this nicer, it's good enough now i think
/// The internal representation of a `Thread` handle
struct Inner {
    name: ThreadName, // Guaranteed to be UTF-8
static MAIN_THREAD_INFO: SyncUnsafeCell<(MaybeUninit<ThreadId>, MaybeUninit<Parker>)> =
we could now move this into new_main, but I really don't want to continue letting you suffer with this so we'll leave it as-is
|
@bors r+ |
|
The job Click to see the possible cause of the failure (guessed by this bot) |
|
💔 Test failed - checks-actions |
|
no @bors retry |
|
"PLEASE submit a bug report to" wow, lld crashed. |
|
The job Click to see the possible cause of the failure (guessed by this bot) |
|
💔 Test failed - checks-actions |
|
please stop dying |
|
Previous:
2024-10-24T03:22:49.8907156Z error: linking with `rust-lld` failed: exit code: 0xc0000005
2024-10-24T03:22:49.8907587Z |
2024-10-24T03:22:49.8909276Z = note: "rust-lld" "-flavor" "gnu" "C:\\a\\_temp\\msys64\\tmp\\rustc2fOg1x\\symbols.o" "compiled.avr_rjmp_offsets.ab053966543a1f9f-cgu.0.rcgu.o" "--as-needed" "-Bdynamic" "-z" "noexecstack" "-L" "C:\\a\\rust\\rust\\build\\x86_64-pc-windows-gnu\\test\\run-make\\avr-rjmp-offset\\rmake_out" "-o" "compiled" "--gc-sections" "--entry=main"
2024-10-24T03:22:49.8916624Z = note: PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
2024-10-24T03:22:49.8918113Z Exception Code: 0xC0000005
2024-10-24T03:22:49.8919000Z 0x00007FFFE67B784C, C:\Windows\SYSTEM32\ntdll.dll(0x00007FFFE67A0000) + 0x1784C byte(s), RtlEnterCriticalSection() + 0x3CC byte(s)
2024-10-24T03:22:49.8920295Z 0x00007FFFE67BB550, C:\Windows\SYSTEM32\ntdll.dll(0x00007FFFE67A0000) + 0x1B550 byte(s), RtlGetCurrentServiceSessionId() + 0xBF0 byte(s)
2024-10-24T03:22:49.8921662Z 0x00007FFFE67BA8C1, C:\Windows\SYSTEM32\ntdll.dll(0x00007FFFE67A0000) + 0x1A8C1 byte(s), RtlFreeHeap() + 0x51 byte(s)
2024-10-24T03:22:49.8922774Z 0x00007FFFE48AC69C, C:\Windows\System32\msvcrt.dll(0x00007FFFE4890000) + 0x1C69C byte(s), free() + 0x1C byte(s)
2024-10-24T03:22:49.8924039Z 0x00007FFFE1F24AF1, C:\a\rust\rust\mingw64\bin\libwinpthread-1.dll(0x00007FFFE1F20000) + 0x4AF1 byte(s), pthread_tls_init() + 0x701 byte(s)
2024-10-24T03:22:49.8925438Z 0x00007FFFE1F24C15, C:\a\rust\rust\mingw64\bin\libwinpthread-1.dll(0x00007FFFE1F20000) + 0x4C15 byte(s), pthread_create_wrapper() + 0xC5 byte(s)
2024-10-24T03:22:49.8926760Z 0x00007FFFE48CDFD4, C:\Windows\System32\msvcrt.dll(0x00007FFFE4890000) + 0x3DFD4 byte(s), _beginthreadex() + 0x134 byte(s)
2024-10-24T03:22:49.8927919Z 0x00007FFFE48CE0AC, C:\Windows\System32\msvcrt.dll(0x00007FFFE4890000) + 0x3E0AC byte(s), _endthreadex() + 0xAC byte(s)
2024-10-24T03:22:49.8929103Z 0x00007FFFE4E94CB0, C:\Windows\System32\KERNEL32.DLL(0x00007FFFE4E80000) + 0x14CB0 byte(s), BaseThreadInitThunk() + 0x10 byte(s)
2024-10-24T03:22:49.8930300Z 0x00007FFFE681ECDB, C:\Windows\SYSTEM32\ntdll.dll(0x00007FFFE67A0000) + 0x7ECDB byte(s), RtlUserThreadStart() + 0x2B byte(s)
2024-10-24T03:22:49.8931030Z
2024-10-24T03:22:49.8931178Z
2024-10-24T03:22:49.8931367Z error: aborting due to 1 previous error
2024-10-24T03:22:49.8931819Z ------------------------------------------
2024-10-24T03:22:49.8932087Z
2024-10-24T03:22:49.8932093Z
2024-10-24T03:22:49.8932099Z
2024-10-24T03:22:49.8932191Z failures:
2024-10-24T03:22:49.8932535Z [run-make] tests\run-make\avr-rjmp-offset
This looks like legitimate error: Weird. |
|
☀️ Test successful - checks-actions |
|
Finished benchmarking commit (f61306d): comparison URL.
Overall result: ❌ regressions - no action needed
@rustbot label: -perf-regression
Instruction count: This is the most reliable metric that we have; it was used to determine the overall result at the top of this comment. However, even this metric can sometimes exhibit noise.
Max RSS (memory usage): Results (primary -0.3%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: Results (secondary -2.1%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: Results (primary 0.0%, secondary 0.0%). This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 780.805s -> 780.742s (-0.01%) |
#[derive(Clone)]
enum Inner {
    /// Represents the main thread. May only be constructed by Thread::new_main.
    Main(&'static (ThreadId, Parker)),
Given that the main thread is a static reference, why not just have this be an Option<Pin<Arc<OtherInner>>>? Every None match can refer to the static.
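A rough sketch of what that suggestion could look like, purely as an illustration: `SketchThread`, `SketchId`, `OtherInner`'s fields, and the `MAIN_ID` static are hypothetical names and layouts, not the actual `std` definitions. The point is only that `None` can stand for the main thread and every `None` arm reads from static data.

```rust
use std::pin::Pin;
use std::sync::Arc;

/// Hypothetical stand-in for `ThreadId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SketchId(u64);

/// Hypothetical per-thread data for every thread except the main one.
pub struct OtherInner {
    id: SketchId,
    name: Option<String>,
}

/// Static data for the main thread; nothing is allocated for it.
static MAIN_ID: SketchId = SketchId(1);

/// `None` means "the main thread"; `Some` holds the heap-allocated data.
#[derive(Clone)]
pub struct SketchThread {
    inner: Option<Pin<Arc<OtherInner>>>,
}

impl SketchThread {
    /// Handle for the main thread: no allocation, just `None`.
    pub fn new_main() -> Self {
        SketchThread { inner: None }
    }

    /// Handle for any other thread: data lives behind a pinned `Arc`.
    pub fn new_other(id: u64, name: Option<String>) -> Self {
        SketchThread {
            inner: Some(Arc::pin(OtherInner { id: SketchId(id), name })),
        }
    }

    pub fn id(&self) -> SketchId {
        match &self.inner {
            Some(inner) => inner.id,
            // Every `None` match refers to the static.
            None => MAIN_ID,
        }
    }

    pub fn name(&self) -> Option<&str> {
        match &self.inner {
            Some(inner) => inner.name.as_deref(),
            // Hard-coded literal for the main thread's name.
            None => Some("main"),
        }
    }
}
```

Compared with a two-variant enum holding a `&'static` reference, this keeps the handle pointer-sized through the null-pointer optimization on `Option<Pin<Arc<_>>>`, at the cost of making the "main" case implicit rather than named.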
Removes an allocation pre-main by just not storing anything in std::thread::Thread for the main thread.