Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend memcpy, memmove and memcmp optimizations #2158

Merged
merged 17 commits into from
Sep 25, 2021

Conversation

AdamBucior
Copy link
Contributor

Fixes #431.

Product code changes

  • Extended memcpy and memmove optimizations to not require is_trivially_copyable and is_trivial but just is_trivially_copy_constructible, is_trivially_move_constructible, is_trivially_copy_assignable or is_trivially_move_assignable depending on operation.
  • Renamed _Ptr_copy_cat and _Ptr_move_cat to _Iter_copy_cat and _Iter_move_cat because they no longer work only on pointers.
  • Fixed move_iterator handling in memcpy/memmove optimizations by specializing _To_address for it so that it doesn't warn about deprecated move_iterator::operator->.
  • Extended memcpy, memmove and memcmp optimization to handle pointers to pointer-interconvertible types.
  • Special cased nullptr_t in _Is_all_bits_zero.
  • Refactored _Lex_compare_memcmp_classify and extended it to bools.
  • Extended memcmp in lexicographical_compare_three_way to handle comparators like strong_order, weak_order and partial_order.
  • Extended memcmp optimization to handle non-transparent functors (ex. to enable optimization in equal when using int, unsigned int and equal_to<int>. This also enables memcmp optimization with enums when using functor that converts them to their underlying type).
  • Disabled memcpy and memmove optimization for volatile.
  • Enabled memcpy optimization for const destination in (ranges::)unitialized_(copy|move)(_n) algorithms (see <memory>: uninitialized_meow should handle voidify's wording #1780).

Test changes

  • Removed VSO_0180469_ptr_cat in favor of more detailed GH_000431_iter_copy_move_cat, GH_000431_equal_memcmp_is_safe and GH_000431_lex_compare_memcmp_classify.
  • Inspired by VSO_0180469_fill_family added GH_000431_copy_move_family, GH_000431_equal_family and GH_000431_lex_compare_family to test that all of the optimizations work well (also the Some StdLib algorithms fail /std:c++latest compilation with custom contiguous iterators (Visual Studio 2019 16.8) #1523 test from VSO_0180469_ptr_cat was moved to GH_000431_equal_family).
  • Fix P0202R3_constexpr_algorithm_and_exchange to define all necessary typedefs in output_pointer.

@AdamBucior AdamBucior requested a review from a team as a code owner August 25, 2021 12:30
@AdamBucior AdamBucior changed the title Extend memcpy, memmove and memcmp optimizations. Extend memcpy, memmove and memcmp optimizations Aug 25, 2021
stl/inc/xutility Outdated Show resolved Hide resolved
@AdamBucior
Copy link
Contributor Author

WOW, how is it possible that is_trivially_constructible_v<float, int> is true? That seems ridiculous. What's the real point of this trait?

Comment on lines +158 to +163
tests\GH_000431_copy_move_family
tests\GH_000431_equal_family
tests\GH_000431_equal_memcmp_is_safe
tests\GH_000431_iter_copy_move_cat
tests\GH_000431_lex_compare_family
tests\GH_000431_lex_compare_memcmp_classify
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While having separate tests is more modular, and gives a better idea what failed when a test fails, they contribute to test execution time. Not sure what is more important.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having them separate is better for development because you only run one of them and not all of them which would take much more time. Also I already had problems with GH_000431_lex_compare_memcmp_classify test failing on x86 due to compiler running out of memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with both of you! 😸 There is definitely a test throughput concern with having lots of small test directories, but in this case I believe that it's reasonable to have separate directories for the reasons that @AdamBucior stated.

@StephanTLavavej StephanTLavavej added the performance Must go faster label Aug 25, 2021
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - looking good so far, I've partially reviewed the product code (leaving a note to myself where I paused) in a video review. Will continue reviewing this later. 😸

stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
Co-authored-by: Stephan T. Lavavej <stl@nuwen.net>
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - reviewed this in part 2 of the video review. Looked at the rest of <xutility>, still need to review the test code.

stl/inc/xutility Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
stl/inc/xutility Outdated Show resolved Hide resolved
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished reviewing the tests! Looks good, should need just one more round of minor changes. Thanks again for this major PR.

tests/std/tests/GH_000431_copy_move_family/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/GH_000431_copy_move_family/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/GH_000431_copy_move_family/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/GH_000431_copy_move_family/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/GH_000431_lex_compare_family/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/GH_000431_lex_compare_family/test.cpp Outdated Show resolved Hide resolved
tests/std/tests/GH_000431_lex_compare_family/test.cpp Outdated Show resolved Hide resolved
@StephanTLavavej StephanTLavavej removed their assignment Sep 21, 2021
@StephanTLavavej StephanTLavavej self-assigned this Sep 21, 2021
@StephanTLavavej
Copy link
Member

Thanks! I pushed two small commits, one to use if constexpr (promoted) for all compilers as Clang was correctly enforcing the Standard rule, and another to use dual-range equal() when a predicate is being passed.

@StephanTLavavej
Copy link
Member

I'm mirroring this to an MSVC-internal PR. Please notify me if any further changes are pushed.

@CaseyCarter CaseyCarter self-requested a review September 23, 2021 18:32
@StephanTLavavej
Copy link
Member

I pushed workarounds for VSO-1409786 "/clr:pure BE assertion !tupLength || TU_SIZE(tup) == TU_IVAL(tupLength) in src\vctools\Compiler\Utc\src\p2\tuple.c line 6189" and VSO-1409862 "/clr and /clr:pure emit fatal error C1060: compiler is out of heap space for a not-very-pathological test".

@CaseyCarter CaseyCarter removed their assignment Sep 24, 2021
@StephanTLavavej
Copy link
Member

Thanks to @joemmett, I've pushed a mechanical refactoring of the GH_000431_equal_memcmp_is_safe test that dramatically improves compiler memory consumption. This performs STATIC_ASSERTs at every level, guards the lambda with _HAS_CXX17 as we need constexpr lambdas now, and (importantly) removes the top-level function, so the top-level STATIC_ASSERTs are at namespace scope. This allows the compiler to free more memory between each STATIC_ASSERT. (Jonathan has also fixed the compiler to improve its behavior, but this refactoring is helpful both before and after his fix.) For x86 native, this improves compiler memory consumption as reported by /d1reportMemorySummary peak working set size from 807.7 MB to 283.3 MB.

Jonathan also confirmed that /clr:pure still needs to emit dynamic initializers for 150,000 variable template specializations (because /clr:pure lacks static initialization), which is why compilation still takes forever - so I am skipping it with the impure_matrix.lst. This is a perma-workaround so it isn't marked TRANSITION.

@StephanTLavavej StephanTLavavej merged commit ed318b8 into microsoft:main Sep 25, 2021
@StephanTLavavej
Copy link
Member

Thanks for this major optimization overhaul! 🚀 😻 ⚡

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

<xutility>: copy and fill family could engage memcpy / memset in more cases
5 participants
  翻译: