This paper discusses the issues involved in using graphics processing units (GPUs) to compute fluid flows. GPUs, used primarily for processing graphics functions in a computer, are massively parallel multicore processors that can also perform scientific computations in a data-parallel mode. In the past ten years, GPUs have become quite powerful and have challenged central processing units (CPUs) in their price and performance characteristics. However, to benefit fully from the GPUs' performance, the numerical algorithms must be data parallel and must converge rapidly. In addition, the hardware features of GPUs require that memory access be managed carefully to avoid the penalty of high memory latency. Fully explicit algorithms for the Euler and Navier–Stokes equations and the lattice Boltzmann method for mesoscopic flows have been widely implemented on GPUs, with significant speed-up over a scalar algorithm. However, more complex algorithms with implicit formulations and unstructured grids require innovative thinking in data access and management. This article reviews the literature on linear solvers and computational fluid dynamics (CFD) algorithms on GPUs, including the author's own research on simulations of fluid flows using GPUs.
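The contrast between fully explicit and implicit formulations can be made concrete with a small sketch. The example below is not taken from the article: the 1D heat equation model problem, the function names `explicit_step` and `gauss_seidel_sweep`, and the parameter `r` are assumptions chosen purely for illustration, and plain Python stands in for GPU code. It shows that an explicit update reads only old values, so every grid point could be handled by an independent thread, whereas a straightforward in-place Gauss-Seidel sweep for the implicit step carries a serial dependence from one point to the next.

```python
# Illustrative sketch only: the model problem and all names are assumptions,
# not the article's code.

def explicit_step(u, r):
    """One explicit step of u_t = alpha*u_xx; r = alpha*dt/dx**2 (needs r <= 0.5)."""
    new = list(u)  # end points are kept as fixed boundary values
    for i in range(1, len(u) - 1):
        # Reads only the old array u, so every i is independent and could be
        # assigned to its own GPU thread.
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def gauss_seidel_sweep(x, b, r):
    """One in-place sweep for the implicit step (1+2r)x_i - r(x_{i-1}+x_{i+1}) = b_i."""
    for i in range(1, len(x) - 1):
        # x[i-1] was already overwritten in this sweep: a serial dependence
        # that must be reordered (e.g. red-black coloring) to run in parallel.
        x[i] = (b[i] + r * (x[i - 1] + x[i + 1])) / (1.0 + 2.0 * r)
    return x


if __name__ == "__main__":
    u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(explicit_step(u0, 0.25))
    x = list(u0)
    for _ in range(50):
        gauss_seidel_sweep(x, u0, 0.25)
    print(x)
```

On a GPU, the loop body of `explicit_step` would typically become the kernel executed by one thread per grid point, while the implicit sweep needs a reordering or a different solver before it exposes comparable parallelism.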
