GCC Lands AVX-512 Fully-Masked Vectorization - Phoronix
Written by Michael Larabel in GNU on 19 June 2023 at 06:30 AM EDT.
After investigating performance inefficiencies in the generated x264 video encoder binary, SUSE engineers have implemented AVX-512 fully-masked vectorization support in the GCC 14 development code.
Back in January, SUSE compiler engineer Jan Hubicka opened a bug report about the x264 benchmark, whose averaging loop was not being optimized well for AVX-512:
"x264 benchmark has a loop averaging two unsigned char arrays that is executed with relatively low trip counts that does not play well with our vectorized code. For AVX512 most time is spent in unvectorized variant since the average number of iterations is too small to reach the vector code.
...
For sizes 12-16 128bit vectorization wins, 20-28 behaves funily. However avx512 vectorization is a huge loss for all sizes up to 31 bytes. aocc seems to win for 16 bytes.
...
One issue is that we at most perform one epilogue loop vectorization, so with AVX512 we vectorize the epilogue with AVX2 but its epilogue remains unvectorized. With AVX512 we'd want to use a fully masked epilogue using AVX512 instead.
I started working on fully masked vectorization support for AVX512 but got distracted."
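To make the bug report concrete, here is a minimal C sketch of the kind of loop it describes: a rounding average of two unsigned char arrays, plus a small helper that models how a short loop's iterations might be split between a 64-byte AVX-512 main loop, a single 32-byte AVX2 vectorized epilogue, and the scalar remainder. The names `avg_u8`, `trip_split`, and `split_trips`, the rounding form of the average, and the 64/32-lane widths are illustrative assumptions, not code taken from x264 or GCC, and the model ignores the vectorizer's cost thresholds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the averaging loop: rounding average of
   two unsigned char arrays. */
void avg_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
}

/* Simplified model: where do n one-byte iterations run when there is a
   64-lane main loop, one 32-lane vectorized epilogue, and a scalar tail? */
struct trip_split {
    size_t main_iters;     /* full 64-lane vector iterations */
    size_t epilogue_iters; /* full 32-lane vector iterations */
    size_t scalar_iters;   /* leftover iterations executed one by one */
};

struct trip_split split_trips(size_t n)
{
    struct trip_split s;
    s.main_iters = n / 64;
    n %= 64;
    s.epilogue_iters = n / 32;
    s.scalar_iters = n % 32;
    return s;
}
```

Under this model, a call such as `split_trips(31)` reports no vector iterations at all and 31 scalar ones, which is the situation the report describes for trip counts of up to 31 bytes.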
Fast forward nearly six months, and SUSE compiler engineer Richard Biener has landed an initial implementation of AVX-512 fully-masked vectorization in the GNU Compiler Collection codebase to help the x264 test case and other loops that do not fill a whole vector:
"This implements fully masked vectorization or a masked epilog for avx512 style masks which single themselves out by representing each lane with a single bit and by using integer modes for the mask (both is much like gcn).
avx512 is also special in that it doesn't have any instruction to compute the mask from a scalar iv like sve has with while_ult. Instead the masks are produced by vector compares and the loop control retains the scalar iv (mainly to avoid dependences on mask generation, a suitable mask test instruction is available).
like rvv code generation prefers a decrementing iv though ivopts messes things up in some cases removing that iv to eliminate it with an incrementing one used for address generation.
one of the motivating testcases is from pr108410 which in turn is extracted from x264 where large size vectorization shows issues with small trip loops. Execution time there improves compared to classic avx512 with avx2 epilogues for the cases of less than 32 iterations."
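The scheme in the commit message can be approximated with a portable scalar model. The sketch below is not what GCC emits and uses no AVX-512 intrinsics. It only mimics the described structure: a mask with one bit per lane held in a plain integer, produced by comparing lane indices against the remaining trip count, while a decrementing scalar induction variable drives loop control. The names `lane_mask` and `avg_u8_masked_model` and the 64-lane width are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 64 /* bytes per 512-bit vector; an assumption of this model */

/* Emulates the vector compare "lane index < remaining": one bit per lane,
   stored in an ordinary 64-bit integer. */
uint64_t lane_mask(size_t remaining)
{
    uint64_t mask = 0;
    for (unsigned lane = 0; lane < LANES; lane++)
        if (lane < remaining)
            mask |= (uint64_t)1 << lane;
    return mask;
}

/* Scalar model of a fully masked averaging loop: every iteration handles
   up to LANES elements, and inactive lanes touch no memory. */
void avg_u8_masked_model(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                         size_t n)
{
    size_t remaining = n; /* decrementing scalar IV used for loop control */
    size_t base = 0;      /* incrementing offset used for addressing */

    while (remaining > 0) {
        uint64_t mask = lane_mask(remaining);

        for (unsigned lane = 0; lane < LANES; lane++) {
            if (mask & ((uint64_t)1 << lane)) {
                size_t i = base + lane;
                dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
            }
        }

        size_t step = remaining < LANES ? remaining : LANES;
        remaining -= step;
        base += step;
    }
}
```

In this model a 31-element input is handled by a single masked iteration with the low 31 mask bits set, rather than by 31 scalar iterations, which is the kind of short-trip-count case the commit message targets.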
The AVX-512 fully masked vectorization support landed this morning in GCC 14 Git via this commit.
Michael Larabel is the principal author of Phoronix.com and founded the site in 2004 with a focus on enriching the Linux hardware experience. Michael has written more than 20,000 articles covering the state of Linux hardware support, Linux performance, graphics drivers, and other topics. Michael is also the lead developer of the Phoronix Test Suite, Phoromatic, and OpenBenchmarking.org automated benchmarking software. He can be followed via Twitter, LinkedIn, or contacted via MichaelLarabel.com.