Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't use <intrin0.h> with Clang #3285

Merged
merged 2 commits into from
Dec 15, 2022
Merged

Conversation

CaseyCarter
Copy link
Member

Addresses the problem mentioned in #2520 (comment). We've had several issues with Clang and intrin0.h in the past, hopefully we can fix them "forever" by avoiding intrin0.h when compiling with Clang.

The existence of intrin0.h is purely an optimization in that it allows us to avoid including the entirety of intrin.h. We could support this optimization with Clang in the future by adding an intrin0.h to Clang's set of override headers. If we were to do so we should synchronize with libc++ on the proper set of intrinsics to move into intrin0.h.

@CaseyCarter CaseyCarter added the bug Something isn't working label Dec 13, 2022
@CaseyCarter CaseyCarter requested a review from a team as a code owner December 13, 2022 20:17
@StephanTLavavej
Copy link
Member

Looks good, I've pushed a tiny change to remove fine-grained header inclusion since we're using the coarse-grained header in src/vector_algorithms.cpp. FYI @strega-nil-ms as you already approved.

There is one remaining occurrence of us including something smaller than intrin.h:

STL/stl/inc/complex

Lines 25 to 29 in 4483e87

#elif defined(__clang__) // ^^^ defined(_M_ARM64) || defined(_M_ARM64EC) ^^^
// TRANSITION, not using x86/x64 FMA intrinsics for Clang yet
#elif defined(_M_IX86) || defined(_M_X64)
#define _FMP_USING_X86_X64_INTRINSICS
#include <emmintrin.h>

This currently excludes Clang and is special, so leaving it as-is is totally fine.

@CaseyCarter
Copy link
Member Author

There is one remaining occurrence of us including something smaller than intrin.h:

[code that includes emmintrin.h]

This would be fine: Clang overrides many (all?) of the Intel intrinsic headers. intrin0.h is a problem only because it's a header that we "made up" which Clang doesn't override.

@StephanTLavavej StephanTLavavej self-assigned this Dec 15, 2022
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 8ddf4da into microsoft:main Dec 15, 2022
@StephanTLavavej
Copy link
Member

Thanks for noticing the repeated problems here and finding a robust way to avoid them forever! 🛠️ 🐈 🎉

@CaseyCarter CaseyCarter deleted the intrin branch December 16, 2022 01:30
@Kojoley
Copy link
Contributor

Kojoley commented Apr 7, 2023

This quite dramatically increased inclusion cost for a lot of headers:

msvc clang
was
clang
now
slow
down
header
0.068 0.082 0.369 351% <limits>
0.093 0.101 0.387 284% <atomic>
0.097 0.103 0.389 277% <latch>
0.102 0.110 0.396 259% <bit>
0.111 0.126 0.410 226% <compare>
0.119 0.133 0.424 219% <coroutine>
0.119 0.134 0.419 213% <utility>
0.120 0.134 0.419 212% <typeindex>
0.141 0.155 0.448 189% <tuple>
0.179 0.206 0.491 139% <array>
0.178 0.207 0.493 138% <span>
0.180 0.207 0.491 137% <numeric>
0.185 0.215 0.499 132% <iterator>
0.195 0.222 0.511 131% <semaphore>
0.191 0.219 0.504 130% <expected>
0.193 0.229 0.513 124% <cmath>
0.213 0.243 0.528 117% <scoped_allocator>
0.217 0.245 0.530 116% <any>
0.222 0.250 0.535 114% <deque>
0.219 0.249 0.532 114% <forward_list>
0.221 0.250 0.533 113% <optional>
0.231 0.264 0.560 112% <list>
0.226 0.258 0.539 109% <set>
0.229 0.262 0.546 109% <vector>
0.229 0.260 0.541 108% <map>
0.245 0.283 0.576 104% <barrier>
0.243 0.278 0.563 102% <stop_token>
0.255 0.301 0.595 97% <valarray>
0.258 0.299 0.582 95% <charconv>
0.266 0.307 0.589 92% <memory>
0.271 0.319 0.602 89% <unordered_set>
0.287 0.328 0.620 89% <variant>
0.272 0.319 0.603 89% <unordered_map>
0.300 0.328 0.613 87% <string_view>
0.302 0.330 0.614 86% <stdexcept>
0.302 0.332 0.618 86% <bitset>
0.296 0.343 0.626 83% <functional>
0.300 0.345 0.629 82% <algorithm>
0.319 0.351 0.634 81% <string>
0.321 0.358 0.643 80% <thread>
0.342 0.371 0.655 77% <system_error>
0.359 0.397 0.685 72% <stacktrace>
0.438 0.480 0.774 61% <streambuf>
0.444 0.476 0.761 60% <shared_mutex>
0.434 0.471 0.753 60% <mutex>
0.441 0.476 0.759 59% <condition_variable>
0.459 0.511 0.798 56% <ios>
0.416 0.506 0.789 56% <ranges>
0.424 0.518 0.799 54% <stack>
0.505 0.528 0.811 54% <memory_resource>
0.481 0.541 0.826 52% <codecvt>
0.511 0.539 0.820 52% <locale>
0.533 0.649 0.939 45% <queue>
0.654 0.764 1.056 38% <random>
0.666 0.762 1.048 38% <format>
0.692 0.787 1.075 37% <fstream>
0.691 0.788 1.075 37% <syncstream>
0.724 0.824 1.121 36% <iomanip>
0.689 0.787 1.070 36% <istream>
0.658 0.794 1.079 36% <regex>
0.693 0.790 1.073 36% <spanstream>
0.714 0.803 1.089 36% <sstream>
0.690 0.787 1.066 35% <iostream>
0.731 0.833 1.125 35% <strstream>
0.766 0.866 1.166 35% <complex>
0.684 0.827 1.112 35% <ostream>
0.807 0.853 1.143 34% <print>
0.725 0.870 1.152 32% <execution>
1.117 1.238 1.520 23% <chrono>
1.388 1.512 1.795 19% <filesystem>
2.528 2.914 2.868 -2% <__msvc_all_public_headers.hpp>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
  翻译: