<atomic>: Improve ARM64 performance #3399
"Re-implement std::atomic acquire/release/seqcst load/store using __load_acquire/__stlr"
… by the updated header.
It's not like we've recently had to service a slew of regressions and ABI breaks. I'm not at all nervous. 😅
If you later use the official 17.6 release and pass /arch:armv8.3, you will get LDAPR instead of LDAR emitted for acquire loads.
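For context, the acquire loads discussed here are the consumer side of ordinary message passing. The sketch below is portable C++ and does not call the MSVC intrinsics; whether the acquire load is backed by LDAR or LDAPR is left to the compiler and target flags. The helper name message_passing_once is purely illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one round of release/acquire message passing.
// Returns the payload value the consumer observed.
int message_passing_once() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int observed = 0;

    std::thread producer([&] {
        payload = 42;                                  // plain write
        ready.store(true, std::memory_order_release);  // release store publishes it
    });

    std::thread consumer([&] {
        // Acquire load: the toolchain picks the instruction (e.g. LDAR, or
        // LDAPR when targeting ARMv8.3 as described above).
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        // The acquire load synchronizes with the release store, so the
        // earlier plain write to payload must be visible here.
        assert(payload == 42);
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;  // safe to read: join() orders the consumer's write before this
}
```

Calling message_passing_once() always yields 42; a relaxed load in the consumer would not carry that guarantee.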
Am I reading the code wrong, or did
Can you explain why this needs
STLR provides seq_cst when the load side does LDAR, but we have already shipped callers doing LD+DMB ISH. The extra barrier can be removed when we break ABI.
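The store-buffering (Dekker) litmus test shows the property that extra barrier protects. Under seq_cst, the outcome where both threads read 0 is forbidden. As I read the comment above, STLR is only guaranteed to be ordered before a later LDAR in the same thread; a previously shipped load that is a plain LD followed by DMB ISH could be satisfied before the preceding STLR is visible, which would permit exactly that outcome, hence the trailing barrier. The sketch is portable C++ (no MSVC intrinsics), and store_buffering_once is a hypothetical name. Repeated runs can only fail to find a violation; they cannot prove the mapping correct.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one run of the store-buffering litmus test.
// Returns true if the outcome is one that seq_cst permits.
bool store_buffering_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = -1;
    int r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);  // store, then load the other variable
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();

    assert(r1 != -1 && r2 != -1);  // both threads ran to completion
    // A single total order over all seq_cst operations means at least one
    // load must see the other thread's store: (0, 0) is forbidden.
    return !(r1 == 0 && r2 == 0);
}
```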
I'm reading that __load_acquire can be LDAPR (presumably if the compiler is asked to compile for a sufficiently recent ARM version), and I'm reading elsewhere that that's too weak for seq_cst. Is the barrier maybe guarding against that? Also the
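The question matches the commonly published C/C++11-to-AArch64 mapping: LDAPR (RCpc) is sufficient for acquire loads, but seq_cst loads need LDAR. A small lookup function makes the distinction concrete. The enums, the helper aarch64_sequence, and the has_rcpc flag (standing in for targeting ARMv8.3, e.g. /arch:armv8.3) are hypothetical; this is the textbook mapping, not MSVC's exact codegen, which according to this thread also keeps a DMB ISH after STLR for seq_cst stores.

```cpp
#include <cassert>
#include <string>

enum class Op { Load, Store };
enum class Order { Relaxed, Acquire, Release, SeqCst };

// Hypothetical helper: instruction used for a plain atomic load/store under
// the commonly published C/C++11 -> AArch64 mapping.
std::string aarch64_sequence(Op op, Order order, bool has_rcpc) {
    if (op == Op::Load) {
        assert(order != Order::Release && "release is not a valid load order");
        switch (order) {
        case Order::Relaxed:
            return "LDR";
        case Order::Acquire:
            // RCpc load-acquire is enough for acquire semantics.
            return has_rcpc ? "LDAPR" : "LDAR";
        default:
            // seq_cst: LDAPR may be satisfied before an earlier STLR to a
            // different address, breaking store->load order between seq_cst
            // operations. LDAR may not, so it is always used here.
            return "LDAR";
        }
    }
    assert(order != Order::Acquire && "acquire is not a valid store order");
    // Release and seq_cst stores both map to STLR in this mapping.
    return order == Order::Relaxed ? "STR" : "STLR";
}
```

For example, aarch64_sequence(Op::Load, Order::Acquire, true) gives "LDAPR", while aarch64_sequence(Op::Load, Order::SeqCst, true) still gives "LDAR".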
This mirrors Ben Niu's internal MSVC-PR-449792 "Re-implement std::atomic acquire/release/seqcst load/store using __load_acquire/__stlr" as of iteration 15. Note that this PR targets the internal branch prod/be, so there will be temporary divergence between GitHub and our usual branch prod/fe.
This relies on new compiler intrinsics, so it won't be active on GitHub until the necessary compiler and VCRuntime changes ship in a public Preview. Ben's benchmarking indicates massive performance improvements for load-acquire and store-release (around 14.1x to 23.8x speedups on officially supported chips; yes, times, not percent) and significant improvements for sequentially consistent stores (1.58x speedups).
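The PR does not describe the benchmark methodology. As a rough way to compare load costs on your own hardware, a single-threaded timing loop like the following could be used. The function ns_per_load is hypothetical, measures only uncontended instruction cost, and should not be expected to reproduce the quoted speedups unless it is built with a compiler and headers that include these changes.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical micro-benchmark: average nanoseconds per atomic load with the
// given order. `order` must be a valid load order (not release or acq_rel).
// `checksum` receives the sum of all loaded values so the loop is not elided.
double ns_per_load(std::memory_order order, std::uint64_t iterations,
                   std::uint64_t& checksum) {
    assert(iterations > 0);
    std::atomic<std::uint64_t> value{1};
    std::uint64_t sum = 0;

    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sum += value.load(order);  // the operation being timed
    }
    auto stop = std::chrono::steady_clock::now();

    checksum = sum;  // equals `iterations`, since every load reads 1
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Comparing ns_per_load(std::memory_order_acquire, ...) against ns_per_load(std::memory_order_relaxed, ...) on the same machine gives a rough cost ratio; stores could be timed the same way.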
Ben has sworn a solemn oath on a basket of fluffy kittens that this does not break bincompat. 🧺 😻
Fixes #83.
☢️ 🦾