Abstract

The Lawrence Livermore National Laboratory (LLNL) will soon have in place the El Capitan exascale supercomputer, based on Advanced Micro Devices (AMD) graphics processing units (GPUs). As part of a multiyear effort under the National Nuclear Security Administration (NNSA) Advanced Simulation and Computing (ASC) program, we have been developing MARBL, a next-generation, performance-portable multiphysics application based on high-order finite elements. In previous years, we successfully ported the Arbitrary Lagrangian–Eulerian (ALE), multimaterial, compressible flow capabilities of MARBL to NVIDIA GPUs, as described in Vargas et al. (2022, “Matrix-Free Approaches for GPU Acceleration of a High-Order Finite Element Hydrodynamics Application Using MFEM, Umpire, and RAJA,” Int. J. High Perform. Comput. Appl., 36(4), pp. 492–509). In this paper, we describe our ongoing effort to extend MARBL's GPU capabilities with additional physics, including multigroup radiation diffusion and thermonuclear burn for high-energy-density physics (HEDP) and fusion modeling. We also describe how our portability abstraction approach, based on the RAJA Portability Suite and the MFEM finite element discretization library, has enabled us to achieve high performance on AMD-based GPUs with minimal hardware-specific porting effort. Throughout this work, we highlight the numerical and algorithmic developments that were required to achieve GPU performance.

References

1. Vargas, A., Stitt, T. M., Weiss, K., Tomov, V. Z., Camier, J.-S., Kolev, T., and Rieben, R. N., 2022, “Matrix-Free Approaches for GPU Acceleration of a High-Order Finite Element Hydrodynamics Application Using MFEM, Umpire, and RAJA,” Int. J. High Perform. Comput. Appl., 36(4), pp. 492–509. doi:10.1177/10943420221100262
2. Gittings, M., Weaver, R., Clover, M., Betlach, T., Byrne, N., Coker, R., Dendy, E., et al., 2008, “The RAGE Radiation-Hydrodynamic Code,” Comput. Sci. Discov., 1(1), p. 015005. doi:10.1088/1749-4699/1/1/015005
3. Dubey, A., Antypas, K., Ganapathy, M. K., Reid, L. B., Riley, K., Sheeler, D., Siegel, A., and Weide, K., 2009, “Extensible Component-Based Architecture for FLASH, a Massively Parallel, Multiphysics Simulation Code,” Parallel Comput., 35(10–11), pp. 512–522. doi:10.1016/j.parco.2009.08.001
4. van der Holst, B., Tóth, G., Sokolov, I. V., Powell, K. G., Holloway, J. P., Myra, E. S., Stout, Q., et al., 2011, “CRASH: A Block-Adaptive-Mesh Code for Radiative Shock Hydrodynamics – Implementation and Verification,” Astrophys. J. Suppl. Ser., 194(2), p. 23. doi:10.1088/0067-0049/194/2/23
5. Larsen, J. T., and Lane, S. M., 1994, “HYADES – a Plasma Hydrodynamics Code for Dense Plasma Studies,” J. Quant. Spectrosc. Radiat. Transfer, 51(1–2), pp. 179–186. doi:10.1016/0022-4073(94)90078-7
6. Smith, A., and James, N., 2022, “AMD Instinct™ MI200 Series Accelerator and Node Architectures,” IEEE Hot Chips 34 Symposium (HCS), IEEE Computer Society, Cupertino, CA, Aug. 21–23, pp. 1–23. doi:10.1109/HCS55958.2022.9895477
7. NVIDIA, 2022, “H100 Tensor Core GPU Architecture Overview,” accessed Jan. 24, 2024, https://resources.nvidia.com/en-us-tensor-core
8. Anderson, R., Black, A., Busby, L., Blakeley, B., Bleile, R., Camier, J.-S., Ciurej, J., et al., 2020, “The Multiphysics on Advanced Platforms Project,” Report No. LLNL-TR-815869.
9. Deakin, T., McIntosh-Smith, S., Price, J., Poenaru, A., Atkinson, P., Popa, C., and Salmon, J., 2019, “Performance Portability Across Diverse Computer Architectures,” IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Denver, CO, Nov. 22, pp. 1–13. doi:10.1109/P3HPC49587.2019.00006
10. Beckingsale, D., Burmark, J., Hornung, R., Jones, H., Killian, W., Kunen, A., Pearce, O., et al., 2019, “RAJA: Portable Performance for Large-Scale Scientific Applications,” IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), IEEE, Denver, CO, Nov. 22, pp. 71–81. doi:10.1109/P3HPC49587.2019.00012
11. Beckingsale, D. A., McFadden, M. J., Dahm, J. P. S., Pankajakshan, R., and Hornung, R. D., 2020, “Umpire: Application-Focused Management and Coordination of Complex Hierarchical Memory,” IBM J. Res. Develop., 64(3/4), pp. 00:1–00:10. doi:10.1147/JRD.2019.2954403
12. Anderson, R., Andrej, J., Barker, A., Bramwell, J., Camier, J.-S., Cerveny, J., Dobrev, V., et al., 2021, “MFEM: A Modular Finite Element Methods Library,” Comput. Math. Appl., 81, pp. 42–74. doi:10.1016/j.camwa.2020.06.009
13. Boehme, D., Aschwanden, P., Pearce, O., Weiss, K., and LeGendre, M., 2021, “Ubiquitous Performance Analysis,” Proceedings ISC High Performance, B. L. Chamberlain, A.-L. Varbanescu, H. Ltaief, and P. Luszczek, eds., ISC-HPC '21, Springer International Publishing, Virtual, June 24–July 2, pp. 431–449.
14. Boehme, D., Gamblin, T., Beckingsale, D., Bremer, P.-T., Gimenez, A., LeGendre, M., Pearce, O., and Schulz, M., 2016, “Caliper: Performance Introspection for HPC Software Stacks,” Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC '16), IEEE Computer Society, LLNL-CONF-699263, Salt Lake City, UT, Nov. 13–16, pp. 47:1–47:11. doi:10.1109/SC.2016.46
15. Bhatele, A., Brink, S., and Gamblin, T., 2019, “Hatchet: Pruning the Overgrowth in Parallel Profiles,” Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19), Association for Computing Machinery, Denver, CO, Nov. 17–22, pp. 20:1–20:21. doi:10.1145/3295500.3356219
16. “BLT: A Streamlined CMake Build System Foundation for Developing HPC Software,” Lawrence Livermore National Laboratory GitHub, accessed July 2023, https://github.com/llNL/blt
17. McCalpin, J. D., 1995, “Memory Bandwidth and Machine Balance in Current High Performance Computers,” Computer Society Technical Committee on Computer Architecture Newsletter, pp. 19–25. https://www.researchgate.net/publication/51992086_Memory_bandwidth_and_machine_balance_in_high_performance_computers/citation/download
18. Humphrey, J. R., Price, D. K., Spagnoli, K. E., Paolini, A. L., and Kelmelis, E. J., 2010, “CULA: Hybrid GPU Accelerated Linear Algebra Routines,” SPIE Proc. 7705, pp. 9–15.
19. Tomov, S., Nath, R., Ltaief, H., and Dongarra, J., 2010, “Dense Linear Algebra Solvers for Multicore With GPU Accelerators,” IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), IEEE, Atlanta, GA, Apr. 19–23, pp. 1–8. doi:10.1109/IPDPSW.2010.5470941
20. Rupp, K., Rudolf, F., and Weinbub, J., 2010, “ViennaCL: A High Level Linear Algebra Library for GPUs and Multi-Core CPUs,” International Workshop on GPUs and Scientific Applications, Vienna, Austria, Sept. 11, pp. 51–56. https://www.iue.tuwien.ac.at/pdf/ib_2010/hashed_links/3w__rSdl.B8Y_us.pdf
21. Tomov, S., Dongarra, J., and Baboulin, M., 2010, “Towards Dense Linear Algebra for Hybrid GPU Accelerated Manycore Systems,” Parallel Comput., 36(5–6), pp. 232–240. doi:10.1016/j.parco.2009.12.005
22. Dongarra, J., Gates, M., Haidar, A., Kurzak, J., Luszczek, P., Tomov, S., and Yamazaki, I., 2014, “Accelerating Numerical Dense Linear Algebra Calculations With GPUs,” Numerical Computations with GPUs, Kindratenko, V., ed., Springer, Cham, pp. 1–26. doi:10.1007/978-3-319-06548-9_1
23. Haidar, A., Dong, T., Tomov, S., Luszczek, P., and Dongarra, J., 2015, “Framework for Batched and GPU-Resident Factorization Algorithms to Block Householder Transformations,” ISC High Performance, Springer, Frankfurt, Germany, July 12–16, pp. 31–47.
24. Abdelfattah, A., Baboulin, M., Dobrev, V., Dongarra, J., Earl, C., Falcou, J., Haidar, A., et al., 2016, “High-Performance Tensor Contractions for GPUs,” Report No. UT-EECS-16-738, 01–2016.
25. Anzt, H., Cojean, T., Flegar, G., Göbel, F., Grützmacher, T., Nayak, P., Ribizel, T., Tsai, Y., and Quintana-Ortí, E. S., 2022, “Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing,” ACM Trans. Math. Software, 48(1), pp. 1–33. doi:10.1145/3480935
26. Anderson, R. W., Dobrev, V. A., Kolev, T. V., and Rieben, R. N., 2015, “Monotonicity in High-Order Curvilinear Finite Element Arbitrary Lagrangian–Eulerian Remap,” Int. J. Numer. Methods Fluids, 77(5), pp. 249–273. doi:10.1002/fld.3965
27. Tipton, R., 1994, “CALE93: The Eulerian Interface Advection Scheme in CALE,” Technical Report.
28. Youngs, D. L., 1982, “Time-Dependent Multi-Material Flow With Large Fluid Distortion,” Numer. Methods Fluid Dynamics, 24, pp. 273–285.
29. Ahn, H. T., and Shashkov, M., 2009, “Adaptive Moment-of-Fluid Method,” J. Comput. Phys., 228(8), pp. 2792–2821. doi:10.1016/j.jcp.2008.12.031
30. Liu, J., and Yao, J., 2021, “Non-Diffusive Volume Advection With a High Order Interface Reconstruction Method,” Lawrence Livermore National Laboratory, Report No. LLNL-TR-825766.
31. Hajduk, H., Kuzmin, D., Kolev, T., and Abgrall, R., 2020, “Matrix-Free Subcell Residual Distribution for Bernstein Finite Element Discretizations of Linear Advection Equations,” Comput. Methods Appl. Mech. Eng., 359, p. 112658. doi:10.1016/j.cma.2019.112658
32. Anderson, R., Dobrev, V., Kolev, T., Kuzmin, D., de Luna, M. Q., Rieben, R., and Tomov, V., 2017, “High-Order Local Maximum Principle Preserving (MPP) Discontinuous Galerkin Finite Element Method for the Transport Equation,” J. Comput. Phys., 334, pp. 102–124. doi:10.1016/j.jcp.2016.12.031
33. Dobrev, V. A., Kolev, T. V., Rieben, R. N., and Tomov, V. Z., 2016, “Multi-Material Closure Model for High-Order Finite Element Lagrangian Hydrodynamics,” Int. J. Numer. Methods Fluids, 82(10), pp. 689–706. doi:10.1002/fld.4236
34. Warshaw, S. I., 2001, “The TDF System for Thermonuclear Plasma Reaction Rates, Mean Energies and Two-Body Final State Particle Spectra,” Lawrence Livermore National Laboratory, Report No. UCRL-ID-144510.
35. Brysk, H., 1974, “Electron-Ion Equilibration in a Partially Degenerate Plasma,” Plasma Phys., 16(10), pp. 927–932. doi:10.1088/0032-1028/16/10/005
36. Kolev, T. V., and Vassilevski, P. S., 2012, “Parallel Auxiliary Space AMG Solver for H(Div) Problems,” SIAM J. Sci. Comput., 34(6), pp. A3079–A3098. doi:10.1137/110859361
37. Pazner, W., Kolev, T., and Vassilevski, P., 2023, “Matrix-Free GPU-Accelerated Saddle-Point Solvers for High-Order Problems in H(Div),” https://arxiv.org/abs/2304.12387
38. Dobrev, V., Kolev, T., Lee, C. S., Tomov, V., and Vassilevski, P. S., 2019, “Algebraic Hybridization and Static Condensation With Application to Scalable H(Div) Preconditioning,” SIAM J. Sci. Comput., 41(3), pp. B425–B447. doi:10.1137/17M1132562
39. Melenk, J., 2002, “On Condition Numbers in hp-FEM With Gauss-Lobatto-Based Shape Functions,” J. Comput. Appl. Math., 139(1), pp. 21–48. doi:10.1016/S0377-0427(01)00391-0
40. Brunner, T. A., 2006, “Development of a Grey Nonlinear Thermal Radiation Diffusion Verification Problem,” Report No. SAND2006-4030C.
41. Walters, W., 2008, “A Brief History of Shaped Charges,” 24th International Symposium on Ballistics, Vol. 1, New Orleans, LA, Sept. 22–26, pp. 3–10.
42. Hurricane, O. A., Hansen, J. F., Robey, H. F., Remington, B. A., Bono, M. J., Harding, E. C., Drake, R. P., and Kuranz, C. C., 2009, “A High Energy Density Shock Driven Kelvin-Helmholtz Shear Layer Experiment,” Phys. Plasmas, 16(5), p. 056305. doi:10.1063/1.3096790
43. Pazner, W., Kolev, T., and Dohrmann, C. R., 2023, “Low-Order Preconditioning for the High-Order Finite Element de Rham Complex,” SIAM J. Sci. Comput., 45(2), pp. A675–A702. doi:10.1137/22M1486534