MI210 has xnack disabled by default. Compiling with :xnack+ produces
GPU kernels that cannot run on MI210.
Two targets, gfx90a_xnack and gfx90a_noxnack, are introduced to replace
gfx90a, for the following reason:
Upstream usually ships two versions: gfx90a:xnack- and gfx90a:xnack+.
Although a single gfx90a target should have maximum compatibility,
according to [1,2], compiling with xnack+/xnack- may give better
performance on xnack-enabled/disabled GPUs, respectively. Therefore we
ship both targets, in line with upstream. gfx900 is also appended with
:xnack- to align with the upstream default.
However, some ROCm libraries, such as sci-libs/miopen and
sci-libs/rocFFT, do not need the xnack feature flag, so a
"--no-xnack-flag" option is introduced to stop appending the xnack
feature. This option is demonstrated in the second example ebuild.
Update the description file profiles/desc/amdgpu_targets.desc
accordingly.
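As a rough illustration of the mapping described above (not the code in
rocm.eclass), the following sketch uses a hypothetical helper named
sketch_amdgpu_flags. It assumes that targets other than gfx900 and the
two gfx90a variants pass through unchanged, and that flags are joined
with ";" as the example ebuild's ${amdgpu_flags//;/,} suggests. It does
not deduplicate entries.

```shell
# Hypothetical illustration only; NOT the implementation in rocm.eclass.
# Maps target names to compiler flags as described in this commit message.
sketch_amdgpu_flags() {
	local no_xnack=0 target flag out=""
	if [ "${1-}" = "--no-xnack-flag" ]; then
		no_xnack=1
		shift
	fi
	for target in "$@"; do
		case ${target} in
			gfx90a_xnack)   flag="gfx90a:xnack+" ;;
			gfx90a_noxnack) flag="gfx90a:xnack-" ;;
			gfx900)         flag="gfx900:xnack-" ;;
			*)              flag="${target}" ;;  # assumption: passed through
		esac
		# With --no-xnack-flag, drop a trailing ":xnack+" or ":xnack-".
		if [ "${no_xnack}" -eq 1 ]; then
			flag=${flag%:xnack[+-]}
		fi
		# Join entries with ";", without a leading separator.
		out="${out:+${out};}${flag}"
	done
	echo "${out}"
}
```

For example, "sketch_amdgpu_flags gfx900 gfx90a_xnack" prints
"gfx900:xnack-;gfx90a:xnack+", while adding --no-xnack-flag as the first
argument prints "gfx900;gfx90a".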
GitHub PR: https://github.com/gentoo/gentoo/pull/33861
[1] https://llvm.org/docs/AMDGPUUsage.html#target-features
[2] https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#compiling-hip-kernels-for-specific-xnack-modes
Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
---
eclass/rocm.eclass | 31 ++++++++++++++++++++++---------
profiles/desc/amdgpu_targets.desc | 5 ++++-
2 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/eclass/rocm.eclass b/eclass/rocm.eclass
index b78dfea1cc31..3209c003e323 100644
--- a/eclass/rocm.eclass
+++ b/eclass/rocm.eclass
@@ -60,7 +60,8 @@
#
# Examples for packages depend on ROCm libraries -- a package which depends on
# rocBLAS, uses comma separated ${HCC_AMDGPU_TARGET} to determine GPU
-# architectures, and requires ROCm version >=5.1
+# architectures which does not accept :xnack feature flag, and requires ROCm
+# version >=5.1
# @CODE
# ROCM_VERSION=5.1
# inherit rocm
@@ -71,7 +72,7 @@
#
# src_configure() {
# if use rocm; then
-# local amdgpu_flags=$(get_amdgpu_flags)
+# local amdgpu_flags=$(get_amdgpu_flags --no-xnack-flag)
# export HCC_AMDGPU_TARGET=${amdgpu_flags//;/,}
# fi
# default
@@ -143,7 +144,7 @@ _rocm_set_globals() {
gfx