Hello,
More of my philosophy about AVX-512, about Zen 4, and about technology, and more of my thoughts.
I am a white Arab from Morocco, and I think I am smart since I have also invented many scalable algorithms and other algorithms.
"AVX-512 implementation in Zen 4 is unexpectedly good, despite relying on the double pumping of 256-bit units. Most of the operations are fast (they don’t have bad latencies), and the bandwidth in terms of 512-bit instructions processed per cycle is also good in the context of using 256-bit units. AMD has particularly fast Conflict Detection operations and Mask Registry handling, which has significantly higher performance than Intel’s implementation. Integer 64×64 multiplication (vpmullq) is extremely fast too."
Read more here:
https://www.hwcooling.net/en/how-good-is-amds-avx-512-does-it-improve-zen-4-performance/
According to the following benchmark results, OpenBLAS is about 2.85 times faster than the default BLAS, and MKL is about 3.25 times faster than the default BLAS.
Read more here:
https://csantill.github.io/RPerformanceWBLAS/
So I think OpenBLAS is good, and you can download it from the following web link:
https://www.openblas.net/
I have also just looked at the following article about the benchmark of the Intel Xeon Scalable Processor vs. the Nvidia V100 GPU:
https://www.xcelerit.com/computing-benchmarks/insights/benchmarks-intel-xeon-scalable-processor-vs-nvidia-v100-gpu/
So I think that the main problem of the Intel Xeons in the above benchmark is the memory bandwidth. I think that the number of GFLOPs of the Intel Xeons in the above benchmark is the result of multiplying the frequency of the CPU by the number of cores and by 2 x 8 for FMA, I mean fused multiply-add (FMA) instructions for floating-point scalar and SIMD operations, that is, 16 floating-point operations per cycle per core, and it gives a result of 2,240 GFLOPs.
So if you want a powerful computer that also has good memory bandwidth, I advise you to use a new two-socket motherboard for new Intel Xeon processors that support a memory bandwidth of about 5.2 GT/s for DDR5 x 8 bytes per channel x 12 channels for one socket. That equals 499.2 GB per second for one socket, or 998.4 GB per second for two sockets, which is about the same as the memory bandwidth of the Nvidia V100 PCIe (Volta) in the above benchmark, so this will solve the memory bandwidth problem.
And of course the two-socket motherboard with two new 64-core Intel Xeons at 3.4 GHz will give you around 6963 GFLOPs (2 x 64 x 3.4 x 16), about the same as the Nvidia V100 PCIe (Volta).
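Here is a minimal Python sketch of the arithmetic above, so that you can redo it for other configurations. The function names and parameters are my own and are only for illustration, and it assumes 16 floating-point operations per cycle per core (2 x 8 with FMA) as in the calculation above. These are theoretical peak numbers, not measured results.

```python
def peak_gflops(sockets, cores_per_socket, ghz, flops_per_cycle=16):
    """Theoretical peak GFLOPs: sockets x cores x GHz x FLOPs per cycle per core.

    flops_per_cycle=16 is the 2 x 8 FMA assumption used in the post.
    """
    return sockets * cores_per_socket * ghz * flops_per_cycle


def peak_bandwidth_gbs(sockets, channels, gts, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s:
    sockets x channels x GT/s x bytes per transfer per channel.
    """
    return sockets * channels * gts * bytes_per_transfer


# Two sockets, 64 cores each, 3.4 GHz
print(round(peak_gflops(2, 64, 3.4), 1))         # 6963.2

# DDR5 at 5.2 GT/s, 12 channels, 8 bytes per channel
print(round(peak_bandwidth_gbs(1, 12, 5.2), 1))  # 499.2 (one socket)
print(round(peak_bandwidth_gbs(2, 12, 5.2), 1))  # 998.4 (two sockets)
```

Real workloads will usually reach less than these peak values, since they depend on the sustained clock frequency under AVX-512 load and on how well the code uses the memory channels.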
Thank you,
Amine Moulay Ramdane.