Graphics processing unit (GPU) computation has seen extensive growth in recent years due to advancements in both the hardware and the software stack. This has led to an increase in the use of GPUs as accelerators across a broad spectrum of applications. This work deals with the use of general-purpose GPUs for performing computational fluid dynamics (CFD) computations. The paper discusses strategies and findings on porting a large multifunctional CFD code to the GPU architecture. Within this framework, the most compute-intensive segment of the software, the BiCGStab linear solver using additive Schwarz block preconditioners with point Jacobi iterative smoothing, is optimized for the GPU platform using various techniques in CUDA Fortran. Representative turbulent channel and pipe flows are investigated for validation and benchmarking purposes. Both single and double precision calculations are highlighted. For a modest single-block grid of 64 × 64 × 64, the turbulent channel flow computations showed a speedup of about eightfold in double precision and more than 13-fold in single precision on the NVIDIA Tesla GPU over a serial run on an Intel central processing unit (CPU). For the pipe flow consisting of 1.78 × 10⁶ grid cells distributed over 36 mesh blocks, the gains were more modest at 4.5-fold and 6.5-fold for double and single precision, respectively.
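
Because the abstract names the kernel that dominates the runtime, a minimal CUDA Fortran sketch of one point Jacobi smoothing sweep is given below for orientation. It assumes a seven-point stencil with per-cell coefficients (cp for the diagonal; cw, ce, cs, cn, cb, ct for the west/east/south/north/bottom/top neighbors); the names, array layout, and launch configuration are hypothetical and do not reproduce the solver implementation described in the paper.

```fortran
module jacobi_smoother
  use cudafor
  implicit none
contains
  ! One point-Jacobi sweep, xnew = D^{-1} (b - (A - D) xold), for a
  ! seven-point stencil on a single structured mesh block.
  ! Coefficient arrays are hypothetical: cp is the diagonal,
  ! cw/ce/cs/cn/cb/ct are the off-diagonal neighbor coefficients.
  attributes(global) subroutine jacobi_sweep(xnew, xold, b, cp, cw, ce, cs, cn, cb, ct, nx, ny, nz)
    integer, value :: nx, ny, nz
    real(8), intent(in)  :: xold(nx,ny,nz), b(nx,ny,nz)
    real(8), intent(in)  :: cp(nx,ny,nz), cw(nx,ny,nz), ce(nx,ny,nz)
    real(8), intent(in)  :: cs(nx,ny,nz), cn(nx,ny,nz), cb(nx,ny,nz), ct(nx,ny,nz)
    real(8), intent(out) :: xnew(nx,ny,nz)
    integer :: i, j, k
    real(8) :: r

    ! One thread per interior cell; boundary and halo layers are assumed
    ! to be handled elsewhere (e.g., during Schwarz block exchanges).
    i = (blockIdx%x - 1)*blockDim%x + threadIdx%x
    j = (blockIdx%y - 1)*blockDim%y + threadIdx%y
    k = (blockIdx%z - 1)*blockDim%z + threadIdx%z

    if (i > 1 .and. i < nx .and. j > 1 .and. j < ny .and. &
        k > 1 .and. k < nz) then
      r = b(i,j,k)                                            &
        - cw(i,j,k)*xold(i-1,j,k) - ce(i,j,k)*xold(i+1,j,k)   &
        - cs(i,j,k)*xold(i,j-1,k) - cn(i,j,k)*xold(i,j+1,k)   &
        - cb(i,j,k)*xold(i,j,k-1) - ct(i,j,k)*xold(i,j,k+1)
      xnew(i,j,k) = r / cp(i,j,k)
    end if
  end subroutine jacobi_sweep
end module jacobi_smoother
```

On the host, such a kernel would be launched over device-resident copies of the arrays with a 3D thread block, for example call jacobi_sweep<<<grid, dim3(8,8,8)>>>(xnew_d, xold_d, b_d, ...), with the grid sized to cover the block interior; a single precision variant would differ only in the declared kind of the arrays.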
