## Abstract

Practical engineering designs typically involve many load cases. For topology optimization with many deterministic load cases, a large number of linear systems of equations must be solved at each optimization step, leading to an enormous computational cost. To address this challenge, we propose a mirror descent stochastic approximation (MD-SA) framework with various step size strategies to solve topology optimization problems with many load cases. We reformulate the deterministic objective function and gradient into stochastic ones through randomization, derive the MD-SA update, and develop algorithmic strategies. The proposed MD-SA algorithm requires only low accuracy in the stochastic gradient and thus uses only a single sample per optimization step (i.e., the sample size is always one). As a result, we reduce the number of linear systems to solve per step from hundreds to one, which drastically reduces the total computational cost, while maintaining a similar design quality. For example, for one of the design problems, the total number of linear systems to solve and wall clock time are reduced by factors of 223 and 22, respectively.

## 1 Introduction

Mechanics-driven topology optimization is a powerful tool that can generate materials and structures with unconventional and significantly improved performance in many engineering applications [1–12]. Real-world engineering designs, such as automobiles, high-rise buildings, and airplanes, involve numerous load scenarios. For instance, the challenge problem proposed at the 2020 topology optimization roundtable is to design a suspension upright of a race car subjected to 13 load cases, as shown in Fig. 1. A detailed solution of this challenge problem is provided by SIMULIA from Dassault Systèmes [13]. Thus, it is important to account for many load cases in structural and mechanical topology optimization.

Fig. 1

In the literature, several formulations incorporating many load cases have been explored [14–16]. A popular one is the weighted-sum formulation, which minimizes a weighted sum of the objective functions from all load cases. Another common approach is the min-max formulation, which minimizes the maximum objective function among all load cases [16,17]. For all of these formulations, a major challenge is the computational cost, because the state equation must be solved for each load case. For large-scale problems with many load cases (e.g., hundreds or more), this leads to overwhelming computational costs.

To address this challenge, Zhang et al. [18] propose a randomized algorithm that drastically reduces the computational cost of topology optimization with many load cases. The paper introduces randomization techniques to rewrite the deterministic optimization formulation into an equivalent stochastic one. A sample average approximation (SAA)-based approach is then employed in Ref. [18] to estimate the corresponding gradients. Widely used in various contexts [19–21], the basic idea of the SAA approach is to generate a random sample ξ1, …, ξn of size n of the involved random vector ξ and to approximate the original problem by the corresponding sample average problem. Different from the conventional SAA, which fixes the random sample throughout the optimization process, the SAA-based algorithm in Ref. [18] re-selects random samples at every optimization iteration.

The obtained stochastic gradient has to be combined with an optimization algorithm; for example, the optimality criteria (OC) method is used to update the vector of design variables in Ref. [18]. Because such deterministic update schemes require an adequate level of accuracy in the estimated problem (the objective function and its gradient), a few samples (e.g., five or six) are typically used in the SAA approach. For the randomized formulation of topology optimization considering many load cases, n = 6 is typically sufficient to estimate the gradients when the SAA-based approach is used. This leads to a greatly reduced computational cost compared with the standard deterministic approach.

Unlike the SAA method, the stochastic approximation (SA) method is an algorithm designed to solve stochastic optimization problems, whose objective functions are given in terms of expectations. The SA approach does not require the estimated gradient to be accurate, and it requires only one sample to evaluate the estimated gradient at each optimization iteration [22]. Because of this attractive property, the SA method and its variants have been widely adopted in many modern machine learning methods [23,24]. The performance of the classical SA approach, however, is highly sensitive to the choice of step size. To address this step size dependence, a robust SA approach is proposed in Ref. [25], in which “robustness” with respect to the step size is achieved by averaging the iterates (vector of design variables) over a certain window size. The mirror descent (MD) approach originates from the work of Nemirovski and Yudin [26], and the mirror descent SA (MD-SA) approach can be found in Ref. [22] and is a generalization of the robust SA approach (which is intrinsically linked to the Euclidean space). Through a general definition of a distance-generating function [22], the MD-SA allows for the adjustment of the algorithm to the underlying geometry of the problem.

In this paper, we propose a tailored MD-SA algorithm to efficiently solve randomized topology optimization problems under many load cases for compliance minimization. Tailoring the method to the geometry of the underlying feasible design space, we introduce an entropic ℓ1 norm MD-SA algorithm with the entropy function as the distance-generating function. This algorithm requires only low accuracy in the stochastic gradient; thus, we are able to use a single sample (n = 1) regardless of the number of load cases considered. To obtain effective step sizes and updates, we propose several algorithmic strategies, including a step size policy with re-calibration, iterate averaging, and a damping scheme, to improve the performance and convergence of the proposed algorithm. We also demonstrate, both theoretically and numerically, that compared with the robust SA approach based on the Euclidean norm (the commonly used method), the proposed entropic ℓ1 norm MD-SA algorithm exhibits robustness in step size selection and better performance for various problem sizes, particularly large-scale problems. We further compare the performance of an SAA variant [18] (with OC update) with the proposed SA algorithm described in this paper. Compared with the SAA-based algorithm, the proposed algorithm performs high-quality updates with less accurate gradient estimates and, as a result, requires fewer linear system solves (i.e., one system per step versus six). We also demonstrate that, when adopted in conjunction with an iterative solver, the proposed MD-SA algorithm allows for a significantly higher convergence tolerance in solving the state equation without affecting the performance of the optimization, which typically cannot be achieved with deterministic algorithms.

The remainder of this paper is organized as follows. Section 2 briefly reviews the deterministic and equivalent randomized formulations for topology optimization with many load cases. Section 3 reviews the general MD-SA framework and presents an entropic version of the MD-SA using the ℓ1 norm. Section 4 proposes an entropic ℓ1 norm MD-SA algorithm tailored to the randomized topology optimization formulation. Section 5 introduces the step size strategies, a damping scheme, and an iterative solver, and discusses the algorithmic parameters. Section 6 presents numerical examples in two and three dimensions, highlighting the efficiency and effectiveness of the proposed algorithm, and Sec. 7 provides concluding remarks.

## 2 Deterministic and Randomized Topology Optimization With Many Load Cases

In this section, we briefly review the deterministic formulation of topology optimization under many load cases and its equivalent randomized form. Throughout this work, we use compliance minimization with a weighted-sum formulation, and we focus on the popular density-based method.

### 2.1 Deterministic Density-Based Topology Optimization With Many Load Cases.

For a given finite element mesh containing M elements and N nodes in d dimensions and assuming a total of m design load cases $f_i \in \mathbb{R}^{dN}$, $i = 1, \dots, m$, we denote by $\alpha_i$ (with $\alpha_i > 0$, $\sum_{i=1}^{m} \alpha_i = 1$) and $u_i \in \mathbb{R}^{dN}$ the weight and displacement vector associated with the ith load case $f_i$, respectively. The standard weighted-sum topology optimization formulation for minimum end-compliance problems using the density-based method takes the form [16,18],
$\min_{x}\ \Bigl\{ C(x) = \sum_{i=1}^{m} \alpha_i f_i^{T} u_i(x) \Bigr\} \quad \text{s.t.} \quad \sum_{e=1}^{M} \frac{v^{(e)} \bar{x}^{(e)}}{|\Omega|} - V_f = 0, \quad 0 \le x^{(e)} \le 1, \ e = 1, \dots, M, \quad \text{with} \quad u_i(x) = K(E(\bar{x}))^{-1} f_i, \ i = 1, \dots, m, \quad E^{(e)} = E_{\min} + [\bar{x}^{(e)}]^{p} (E_0 - E_{\min}), \ e = 1, \dots, M$
(1)
In the above optimization problem, the objective function C(·) is the weighted compliance, $x \in \mathbb{R}^{M}$ is the vector of design variables, $x^{(e)}$ is the design variable of element e, $K \in \mathbb{R}^{dN \times dN}$ is the global stiffness matrix, and dN is the number of degrees-of-freedom (DOF). To avoid checkerboard instability, we apply a density filter on the vector of design variables to obtain the vector of filtered densities $\bar{x}$ as $\bar{x} = Hx$, where H is the filter matrix [27]. Furthermore, the design problem is subject to a global volume constraint, with the volume (area) of an element e denoted by $v^{(e)}$, the prescribed volume fraction denoted by $V_f$, and Ω and |Ω| being the problem domain and the area/volume of the domain, respectively. To simplify the form of the volume constraint, we introduce $\bar{v}^{(e)} = \sum_{j=1}^{M} H_{e,j} v^{(j)}$ and rewrite the volume constraint as
$\sum_{e=1}^{M} \frac{\bar{v}^{(e)} x^{(e)}}{|\Omega|} - V_f = 0$
(2)
Notice that $\bar{v}^{(e)}$ is a constant that does not change throughout the entire optimization process. We use the solid isotropic material with penalization (SIMP) approach [28,29], in which $E_{\min}$ and $E_0$ are the elastic moduli of the ersatz and solid materials, respectively, and p is a penalization parameter.
The gradient (sensitivity) $\nabla C_{x}^{(e)} = \partial C / \partial x^{(e)}$ of the objective function is the weighted sum of the sensitivities from the individual load cases:
$\nabla C_{x}^{(e)} = -\sum_{i=1}^{m} \alpha_i\, u_i^{T} \frac{\partial K}{\partial x^{(e)}} u_i$
(3)
We remark that the optimization problem in formulation (1) is convex when p = 1 [30]. With p > 1, the formulation becomes non-convex, and there is no guarantee that the optimization algorithm converges to the global minimum.

### 2.2 Randomized Topology Optimization With Many Load Cases.

Here, we introduce a randomized topology optimization formulation under many load cases that is equivalent to the deterministic density-based formulation (1). We first briefly review the Hutchinson trace estimator [31], a popular stochastic sampling technique for estimating the trace of a matrix. For alternative trace estimators in the literature, the interested reader is referred to Refs. [20,32] and the references therein.

For a given matrix $A \in \mathbb{R}^{q \times q}$, the Hutchinson trace estimator uses a random vector $\xi \in \mathbb{R}^{q}$ whose entries are independent and identically distributed (i.i.d.) and follow the Rademacher distribution (±1, each with probability 1/2 [33]). It can be shown that the random vector ξ has the following properties:
$\mathbb{E}(\xi) = 0 \quad \text{and} \quad \mathbb{E}(\xi \xi^{T}) = I_q$
(4)
where Iq is the q × q identity matrix. It follows then that
$\mathbb{E}(\xi^{T} A \xi) = \operatorname{trace}(A)$
(5)
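As a quick sanity check, identity (5) can be verified numerically. The following sketch (a hypothetical NumPy snippet, not from the paper) averages ξᵀAξ over Rademacher samples and compares the result to the exact trace:

```python
import numpy as np

rng = np.random.default_rng(0)
q = 8
B = rng.standard_normal((q, q))
A = B @ B.T  # arbitrary symmetric test matrix

def hutchinson_trace(A, n_samples, rng):
    """Estimate trace(A) by averaging xi^T A xi over Rademacher vectors."""
    q = A.shape[0]
    total = 0.0
    for _ in range(n_samples):
        xi = rng.choice([-1.0, 1.0], size=q)  # i.i.d. +/-1, each w.p. 1/2
        total += xi @ A @ xi
    return total / n_samples

est = hutchinson_trace(A, 20000, rng)
print(est, np.trace(A))
```

With many samples the estimate concentrates around trace(A); a single sample is unbiased but noisy, which is precisely the regime the MD-SA update introduced later tolerates.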
Using this concept, we introduce an equivalent randomized topology optimization formulation for the deterministic topology optimization formulation under many load cases. In the framework of discretized parameter estimation problems with PDE constraints, a randomized algorithm based on Eq. (5) was used in Ref. [19].
Consider the standard topology optimization formulation in Eq. (1) with m load cases. We introduce the weighted load and displacement matrices $F, U \in \mathbb{R}^{dN \times m}$ defined as
$F = \bigl[ \sqrt{\alpha_1}\, f_1, \dots, \sqrt{\alpha_m}\, f_m \bigr] \quad \text{and} \quad U = \bigl[ \sqrt{\alpha_1}\, u_1, \dots, \sqrt{\alpha_m}\, u_m \bigr]$
respectively. With these weighted matrices, we can rewrite the equilibrium equations as U = K−1F, and consequently, the end-compliance and its sensitivities as
$C(x) = \sum_{i=1}^{m} \alpha_i f_i^{T} u_i = \operatorname{trace}(F^{T} U) = \operatorname{trace}(F^{T} K^{-1} F)$
(6)
$\nabla C_{x}^{(e)} = -\sum_{i=1}^{m} \alpha_i\, u_i^{T} \frac{\partial K}{\partial x^{(e)}} u_i = -\operatorname{trace}\Bigl( U^{T} \frac{\partial K}{\partial x^{(e)}} U \Bigr) = -\operatorname{trace}\Bigl( F^{T} K^{-1} \frac{\partial K}{\partial x^{(e)}} K^{-1} F \Bigr)$
(7)
Furthermore, by introducing the random vector ξ and applying the Hutchinson trace estimator, we define the randomized topology optimization for the density-based method under m load cases as
$\min_{x}\ \Bigl\{ C(x) = \mathbb{E}\bigl[ (F\xi)^{T} K(E(\bar{x}))^{-1} (F\xi) \bigr] \Bigr\} \quad \text{s.t.} \quad \sum_{e=1}^{M} \frac{\bar{v}^{(e)} x^{(e)}}{|\Omega|} - V_f = 0, \quad 0 \le x^{(e)} \le 1, \ e = 1, \dots, M, \quad \text{with} \quad E^{(e)} = E_{\min} + [\bar{x}^{(e)}]^{p} (E_0 - E_{\min}), \ e = 1, \dots, M$
(8)
Again, $\bar{x} = Hx$. Accordingly, the sensitivity of the objective function in the above randomized formulation is also given in expectation form as
$\nabla C_{x}^{(e)} = -\operatorname{trace}\Bigl( F^{T} K^{-1} \frac{\partial K}{\partial x^{(e)}} K^{-1} F \Bigr) = -\mathbb{E}\Bigl[ (F\xi)^{T} K^{-1} \frac{\partial K}{\partial x^{(e)}} K^{-1} (F\xi) \Bigr]$
(9)
We remark that, because this randomized formulation (8) is equivalent to the standard one (1), it is convex when p = 1 and non-convex when p > 1, where p is the penalty parameter in the SIMP approach.
We also remark that, in recent work [18], a variant of the SAA algorithm is proposed where the compliance and its gradient in Eq. (8) are estimated as the sample average over n sampled load cases. For the density-based approach, the estimated compliance and its gradient are
$\hat{C}^{SAA}(x) = \frac{1}{n} \sum_{k=1}^{n} (F\xi_k)^{T} K(x)^{-1} (F\xi_k)$
(10)
and
$\bigl( \nabla \hat{C}_{x}^{SAA} \bigr)^{(e)} = -\frac{1}{n} \sum_{k=1}^{n} (F\xi_k)^{T} K^{-1} \frac{\partial K}{\partial x^{(e)}} K^{-1} (F\xi_k)$
(11)
A standard deterministic optimization algorithm (i.e., the OC method) is then used to compute the design variable updates based on the estimated gradient. Thus, a sufficiently accurate gradient estimation is needed, and it is shown in Ref. [18] that a sample size of n ≈ 6 provides this accuracy in the considered applications. By doing so, the number of linear system solves at each optimization iteration is reduced from m to n. In the present paper, we take a conceptually different approach. Instead of estimating the gradient using a few samples and then performing an optimization update using the standard algorithm, we treat the randomized formulation (8) as a stochastic optimization problem and use SA algorithms to compute a design variable update. As a result, as shown in Sec. 4, the accuracy requirement on the gradient estimation is relaxed, and only one sample (n = 1) is needed at each optimization step.
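To illustrate the accuracy trade-off behind these sample-size choices, the following sketch (hypothetical; a random symmetric matrix stands in for FᵀK⁻¹F) compares the spread of the single-sample estimator (n = 1) with the n = 6 sample average:

```python
import numpy as np

rng = np.random.default_rng(2)
q = 6
B = rng.standard_normal((q, q))
A = B @ B.T  # stand-in for F^T K^{-1} F (symmetric)

def saa_estimate(n):
    """Sample average of xi^T A xi over n Rademacher vectors, as in Eq. (10)."""
    xis = rng.choice([-1.0, 1.0], size=(n, q))
    return np.mean([xi @ A @ xi for xi in xis])

reps = 2000
std_n1 = np.std([saa_estimate(1) for _ in range(reps)])
std_n6 = np.std([saa_estimate(6) for _ in range(reps)])
print(std_n1, std_n6)
```

The n = 6 average is roughly √6 ≈ 2.4 times tighter than the single sample; the SA viewpoint accepts the noisier n = 1 estimate and compensates through the update rule and step sizes instead.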
For the remainder of this paper, we use the following notation. For a given vector $x \in \mathbb{R}^{n}$, we denote by $\|x\|_p$ its ℓp norm. In particular, $\|x\|_2 = \sqrt{x^{T} x}$ denotes the Euclidean norm, and $\|x\|_{\infty} = \max\{ |x^{(1)}|, \dots, |x^{(n)}| \}$ denotes the max norm (infinity norm). In addition, we denote by $\Pi_X$ the metric projection operator onto a closed convex set $X \subset \mathbb{R}^{n}$,
$\Pi_X(x) = \operatorname{arg\,min}_{y \in X} \|x - y\|_2$
(12)

## 3 Mirror Descent Stochastic Approximation

This section presents the mirror descent stochastic approximation approach for a general stochastic optimization problem. We first introduce the general framework of the MD-SA. We then restrict our attention to one version of the MD-SA algorithms, the entropic MD-SA with the ℓ1 norm, which is well suited to our problem. Another version, the Euclidean MD-SA with the ℓ2 norm, and a comparison between the two algorithms are provided in the Appendices and in Example 1.

### 3.1 General Framework.

As a direct descendant of the stochastic mirror descent method [26], the mirror descent SA algorithm, developed in Ref. [22], is an effective algorithm to solve stochastic optimization problems of the following form:
$\min_{x \in X} \bigl\{ \phi(x) = \mathbb{E}[\Phi(x, \xi)] \bigr\}$
(13)
Here, the feasible set $X \subset \mathbb{R}^{n}$ is assumed to be a nonempty bounded closed convex set, and ξ is a random vector with a given probability distribution. Supposing that the differentiation and expectation operators in the objective function of Eq. (13) commute (which holds under mild regularity conditions), we can write the gradient of ϕ(x) as
$\nabla \phi(x) = \mathbb{E}[G(x, \xi)]$
(14)
where $G(x, \xi) = \nabla_x \Phi(x, \xi)$. Note that if Φ(x, ξ) is convex in x, then the expected value function ϕ(x) is also convex. In that case, formula (14) also holds for the corresponding subgradients.

Note that the gradient of the objective function lies in the dual of the feasible space of the design variables. The key idea of the MD-SA algorithm is to obtain the updates of the vector of design variables by performing the gradient step in the dual space. At each optimization step, the mirror descent SA uses a single realization of the random vector ξ, independent of the previous iterations. The stochastic gradient G(x, ξ) at the current vector of design variables is computed, and the algorithm takes a step in the descent direction of the stochastic gradient in the dual space. Finally, the update of the vector of design variables is obtained by “mirroring” the result back into the primal space (i.e., the feasible space of the optimization problem). The mirror descent SA algorithm is a generalization of the robust SA algorithm. For further details, the reader is referred to Ref. [22].

We now describe the general framework of the MD-SA. For any norm ‖·‖, its dual norm is defined as $\|x\|_{*} = \sup_{\|y\| \le 1} y^{T} x$. A function $\omega : X \to \mathbb{R}$ is said to be a distance-generating function with respect to a norm ‖·‖ for modulus α > 0 if ω(·) is convex and continuous on X, continuously differentiable on the relative interior of X, and
$(x' - x)^{T} \bigl( \nabla \omega(x') - \nabla \omega(x) \bigr) \ge \alpha \|x' - x\|^{2}, \quad \text{for all } x, x' \in X$
(15)
Note that Eq. (15) implies that ω is strongly convex. The corresponding prox-function V(·, ·) is defined as follows:
$V(x, z) = \omega(z) - \bigl[ \omega(x) + \nabla \omega(x)^{T} (z - x) \bigr]$
(16)
Here, z is a point in the feasible space and V(x, ·) is nonnegative and strongly convex with respect to the chosen norm ‖·‖. The function V(·, ·) is also known as the Bregman distance generated by ω.
Based on the prox-function V(x, z), the prox-mapping $P_x : \mathbb{R}^{n} \to X$ is defined as follows:
$P_x(y) = \operatorname{arg\,min}_{z \in X} \bigl\{ y^{T} (z - x) + V(x, z) \bigr\}$
(17)
Because of the strong convexity of ω, the minimizer z*(x, y) = Px(y) is unique. Note that V(x, z) serves as a regularizer when mapping the point y (representing the gradient information in the dual space) to the primal space to obtain z, where x is the current vector of design variables. One example of a distance-generating function is $\omega(x) = \frac{1}{2} x^{T} x$ (see Appendix A). An important advantage of the MD algorithm is that the distance-generating function can be adjusted to the geometry of the set X (cf. Ref. [22]). The prox-mapping in this paper is defined in terms of the feasible set of the considered optimization problem (8).
Having introduced the concept of the prox-function and prox-mapping, the general update of the MD-SA at the current iterate xk is
$x_{k+1} = P_{x_k}\bigl( \gamma_k G(x_k, \xi_k) \bigr)$
(18)
where γk is a chosen step size and ξk is a sample of the random vector ξ. A convergence analysis and convergence bounds for the so-defined MD-SA are provided in Ref. [22] for convex stochastic optimization problems. Our numerical studies, provided in Sec. 6, indicate that the MD-SA method works for non-convex problems (i.e., Eq. (8) with p > 1) as well.

The recurrence (18) provides a general iteration framework for the MD-SA. By selecting the norm ‖·‖ and the distance-generating function ω, different design update schemes can be obtained. Appropriate choices of the norm and distance-generating function for a given application can substantially improve convergence by adapting the update to the geometry of the specific constraints. In the following section, the MD-SA algorithm with the ℓ1 norm is presented.

### 3.2 Entropic MD-SA With ℓ1 Norm.

The MD-SA method with the ℓ1 norm has been studied in the context of linear programming and online learning, e.g., Ref. [34]. The dual of the ℓ1 norm is the infinity norm ‖·‖∞. A distance-generating function for the ℓ1 norm MD-SA is the entropy function given by (cf. Ref. [22])
$\omega(x) = \sum_{e=1}^{M} x^{(e)} \ln\bigl( x^{(e)} \bigr)$
(19)
which leads to the entropic ℓ1 norm MD-SA. Based on Eq. (19), the prox-function V(·, ·) can be written as
$V(x, z) = \sum_{e=1}^{M} \Bigl[ z^{(e)} \ln\Bigl( \frac{z^{(e)}}{x^{(e)}} \Bigr) + x^{(e)} - z^{(e)} \Bigr]$
(20)
and the corresponding prox-mapping
$P_x(y) = \operatorname{arg\,min}_{z \in X} \Bigl\{ \sum_{e=1}^{M} \Bigl[ \bigl( y^{(e)} - 1 \bigr) \bigl( z^{(e)} - x^{(e)} \bigr) + z^{(e)} \ln\Bigl( \frac{z^{(e)}}{x^{(e)}} \Bigr) \Bigr] \Bigr\}$
(21)
When the feasible set is the standard simplex,
$X = \Bigl\{ x \in \mathbb{R}^{M} : \sum_{e=1}^{M} x^{(e)} = 1, \ x^{(e)} \ge 0, \ e = 1, \dots, M \Bigr\}$
(22)
a closed-form expression of the prox-mapping is
$\bigl[ P_x(y) \bigr]^{(e)} = \frac{x^{(e)} e^{-y^{(e)}}}{\sum_{j=1}^{M} x^{(j)} e^{-y^{(j)}}}, \quad e = 1, \dots, M$
(23)
Thus, the update $x_{k+1}^{(e)} = \bigl[ P_{x_k}(\gamma_k G(x_k, \xi_k)) \bigr]^{(e)}$ for each component in the entropic ℓ1 norm MD-SA can be written as
$x_{k+1}^{(e)} = \frac{x_k^{(e)}\, e^{-\gamma_k G^{(e)}(x_k, \xi_k)}}{\sum_{j=1}^{M} x_k^{(j)}\, e^{-\gamma_k G^{(j)}(x_k, \xi_k)}}, \quad e = 1, \dots, M$
(24)
An important question for the MD-SA method is the choice of the step sizes γk; we will introduce the strategies for step size selection in Sec. 5.1.
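As a minimal illustration (a hypothetical NumPy sketch, not the paper's implementation), the update (24) is a multiplicative-weights step that keeps the iterate on the standard simplex while shifting mass toward components with smaller gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 5
x = np.full(M, 1.0 / M)  # start at the center of the simplex

def entropic_md_step(x, g, gamma):
    """One entropic l1-norm MD-SA step on the standard simplex, Eq. (24)."""
    w = x * np.exp(-gamma * g)
    return w / w.sum()  # renormalization keeps sum(x) = 1 and x >= 0

# Toy stochastic problem: minimize E[(c + noise)^T x] over the simplex.
c = np.array([5.0, 1.0, 3.0, 4.0, 2.0])
for k in range(200):
    g = c + 0.1 * rng.standard_normal(M)  # single noisy gradient sample
    x = entropic_md_step(x, g, gamma=0.1)
print(x)
```

After a few hundred noisy steps, nearly all the mass sits on the component with the smallest cost, and every iterate is exactly feasible; no projection step is ever needed.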

## 4 Randomized Topology Optimization With An Entropic ℓ1 Norm MD-SA Update Algorithm

In this section, we propose an MD-SA algorithm tailored to efficiently solve the randomized topology optimization formulation (8). Because it is tailored to the geometry of the feasible design space, we use the entropic ℓ1 norm MD-SA algorithm described in Sec. 3.2. Furthermore, in the derived update algorithm, we introduce a move limit for each design variable at each optimization step, which allows us to effectively incorporate the damping scheme described in Sec. 5.2.

According to Eq. (8), the feasible set for the design variable vector x is given by
$X_x = \Bigl\{ x \in \mathbb{R}^{M} : \sum_{e=1}^{M} \frac{\bar{v}^{(e)} x^{(e)}}{|\Omega|} - V_f = 0, \ 0 \le x^{(e)} \le 1, \ e = 1, \dots, M \Bigr\}$
(25)
At iteration k with design variable vector $x_k$, the compliance estimator $\hat{C}^{SA}$ takes the form
$\hat{C}^{SA}(x_k) = (F\xi_k)^{T} K(x_k)^{-1} (F\xi_k)$
(26)
and the stochastic gradient G(xk, ξk), using a single sample ξk, is given in component form as
$G^{(e)}(x_k, \xi_k) = \bigl( \nabla \hat{C}_{x}^{SA} \bigr)^{(e)} = -(F\xi_k)^{T} K^{-1} \frac{\partial K}{\partial x^{(e)}} K^{-1} (F\xi_k)$
(27)
Here, we map the vector of design variables to a scaled feasible set (which is the standard simplex), compute the updated design variable vector using the proposed entropic ℓ1 norm MD-SA update, and then map the updated design variable vector back to the original feasible set. Following this procedure, we introduce $\tilde{x}$ as the scaled design variable vector, whose eth component is given by $\tilde{x}^{(e)} = \bar{v}^{(e)} x^{(e)} / (|\Omega| V_f)$. The scaled feasible set $\tilde{X}_x$ for $\tilde{x}$ is defined as
$\tilde{X}_x = \Bigl\{ \tilde{x} \in \mathbb{R}^{M} : \sum_{e=1}^{M} \tilde{x}^{(e)} = 1, \ 0 \le \tilde{x}^{(e)} \le \frac{\bar{v}^{(e)}}{|\Omega| V_f}, \ e = 1, \dots, M \Bigr\}$
(28)
Accordingly, the stochastic gradient with respect to $\tilde{x}$, denoted by $\tilde{G}(\tilde{x}_k, \xi_k)$, is also scaled in component form as
$\tilde{G}^{(e)}(\tilde{x}_k, \xi_k) = \frac{\partial x^{(e)}}{\partial \tilde{x}^{(e)}}\, G^{(e)}(x_k, \xi_k) = \frac{|\Omega| V_f}{\bar{v}^{(e)}}\, G^{(e)}(x_k, \xi_k)$
(29)
Given the scaled feasible set and scaled design variable vector, the prox-mapping generated by the entropy function (19) takes the form
$P_{\tilde{x}}(\tilde{y}) = \operatorname{arg\,min}_{\tilde{z} \in \tilde{X}_x} \Bigl\{ \sum_{e=1}^{M} \Bigl[ \bigl( \tilde{y}^{(e)} - 1 \bigr) \bigl( \tilde{z}^{(e)} - \tilde{x}^{(e)} \bigr) + \tilde{z}^{(e)} \ln\Bigl( \frac{\tilde{z}^{(e)}}{\tilde{x}^{(e)}} \Bigr) \Bigr] \Bigr\}$
(30)
and the entropic ℓ1 norm MD-SA update is then given by
$\tilde{x}_{k+1} = P_{\tilde{x}_k}\bigl( \gamma_k \tilde{G}(\tilde{x}_k, \xi_k) \bigr)$
(31)
Comparing the scaled set (28) to the standard simplex (22), we notice that $\tilde{X}_x$ includes an additional upper bound for each design variable, $\tilde{x}^{(e)} \le \bar{v}^{(e)} / (|\Omega| V_f)$. Therefore, the closed-form expression (24) cannot be applied directly, because the updated variables could violate their upper bounds. Thus, an iterative formula is derived here to solve for $\tilde{x}_{k+1}$ in the constrained optimization problem (30) and (31). To this end, we first introduce a Lagrange multiplier $\tilde{\lambda}$ for the equality constraint in the definition of $\tilde{X}_x$ and form the Lagrangian function
$L(\tilde{z}, \tilde{\lambda}) = \sum_{e=1}^{M} \Bigl[ \bigl( \tilde{y}^{(e)} - 1 \bigr) \bigl( \tilde{z}^{(e)} - \tilde{x}^{(e)} \bigr) + \tilde{z}^{(e)} \ln\Bigl( \frac{\tilde{z}^{(e)}}{\tilde{x}^{(e)}} \Bigr) \Bigr] + \tilde{\lambda} \Bigl( \sum_{e=1}^{M} \tilde{z}^{(e)} - 1 \Bigr)$
(32)
where we assume $\tilde{y}$ and $\tilde{x}$ are given vectors in the optimization. The optimality condition of Eq. (32) with respect to $\tilde{z}^{(e)}$ states that
$\frac{\partial L}{\partial \tilde{z}^{(e)}}(\tilde{z}, \tilde{\lambda}) = \tilde{y}^{(e)} + \ln\Bigl( \frac{\tilde{z}^{(e)}}{\tilde{x}^{(e)}} \Bigr) + \tilde{\lambda} = 0$
(33)
which gives
$\tilde{z}^{(e)}(\tilde{\lambda}) = \tilde{x}^{(e)}\, e^{-\tilde{y}^{(e)}}\, e^{-\tilde{\lambda}}$
(34)
By introducing $\tilde{\mu} = e^{-\tilde{\lambda}}$ (notice that $\tilde{\mu} > 0$), we can simplify the above expression as
$\tilde{z}^{(e)}(\tilde{\mu}) = \tilde{\mu}\, \tilde{x}^{(e)}\, e^{-\tilde{y}^{(e)}}$
(35)
and incorporating the upper bound from Eq. (28) into Eq. (35) gives
$\tilde{z}^{(e)}(\tilde{\mu}) = \min\Bigl( \frac{\bar{v}^{(e)}}{|\Omega| V_f},\ \tilde{\mu}\, \tilde{x}^{(e)}\, e^{-\tilde{y}^{(e)}} \Bigr)$
(36)
The stationarity condition of Eq. (32) with respect to $\tilde{\mu}$ yields
$\frac{\partial L}{\partial \tilde{\mu}}\bigl( \tilde{z}(\tilde{\mu}) \bigr) = \frac{\partial L}{\partial \tilde{\lambda}}\bigl( \tilde{z}(\tilde{\mu}) \bigr) \frac{\partial \tilde{\lambda}}{\partial \tilde{\mu}} = 0 \;\Longrightarrow\; \sum_{e=1}^{M} \tilde{z}^{(e)}(\tilde{\mu}) - 1 = 0$
(37)
Observe that the left-hand side of Eq. (37) is monotonic with respect to $\tilde{\mu}$, so the equation can be solved by various methods, e.g., the bisection method used in this work. Denote by $\tilde{\mu}_k^*$ the solution of Eq. (37) at step k. By plugging in $\tilde{\mu}_k^*$, the step size $\gamma_k$, and the scaled stochastic gradient (29), the updated (scaled) design variable vector $\tilde{x}_{k+1}$ takes the form,
$\tilde{x}_{k+1}^{(e)} = \tilde{z}^{(e)}(\tilde{\mu}_k^*) = \min\Bigl[ \frac{\bar{v}^{(e)}}{|\Omega| V_f},\ \tilde{\mu}_k^*\, \tilde{x}_k^{(e)}\, e^{-\gamma_k \tilde{G}^{(e)}(\tilde{x}_k, \xi_k)} \Bigr]$
(38)
Once $\tilde{x}_{k+1}$ is obtained, the last step in the proposed entropic ℓ1 norm MD-SA algorithm is to map it back to $x_{k+1}$ in the original feasible set as
$x_{k+1}^{(e)} = \frac{|\Omega| V_f}{\bar{v}^{(e)}}\, \tilde{x}_{k+1}^{(e)} = \min\Bigl[ 1,\ \tilde{\mu}_k^*\, x_k^{(e)}\, e^{-\gamma_k \tilde{G}^{(e)}(\tilde{x}_k, \xi_k)} \Bigr]$
(39)

The update formula (39) is the final expression of the entropic ℓ1 norm MD-SA algorithm for the randomized topology optimization problem. Note that the gradient information in (39) is accessed through an estimate, G(xk, ξk), using a single sample ξk. The entropic ℓ1 norm MD-SA algorithm (39) has advantages over standard gradient-based algorithms (e.g., OC and MMA) for stochastic optimization problems. Using a standard gradient-based update algorithm for a stochastic optimization problem requires moderate accuracy of the estimated gradient, which increases the required sample size. The MD-SA update in Eq. (39), on the other hand, requires only low accuracy in the stochastic gradient and thus only a single sample, meaning the sample size is always one. If the required gradient information is computationally expensive, the entropic ℓ1 norm MD-SA algorithm is likely to be preferable to standard gradient-based algorithms.

Furthermore, for the structural optimization problems solved by the entropic ℓ1 norm MD-SA algorithm in this paper, we propose a step size strategy with a damping scheme that creates a decaying step size sequence [18], which fits well with the structural optimization framework and stochastic algorithms. The proposed step size strategy and damping scheme are discussed in detail in Sec. 5. To incorporate the damping scheme, we introduce an additional move limit, denoted by move ∈ (0, 1], on the vector of design variables in the entropic ℓ1 norm MD-SA update. At iteration k with design variable vector xk, the move limit modifies the lower bound (LBk) and upper bound (UBk) of the update of the vector of design variables as follows:
$LB_k^{(e)} = \max\bigl\{ x_k^{(e)} - move,\ 0 \bigr\} \quad \text{and} \quad UB_k^{(e)} = \min\bigl\{ x_k^{(e)} + move,\ 1 \bigr\}$
(40)
Incorporating the modified LBk and UBk, Eq. (39) is changed to
$x_{k+1}^{(e)} = \max\Bigl\{ LB_k^{(e)},\ \min\Bigl[ UB_k^{(e)},\ \tilde{\mu}_k^*\, x_k^{(e)}\, e^{-\gamma_k \tilde{G}^{(e)}(\tilde{x}_k, \xi_k)} \Bigr] \Bigr\}$
(41)

We note that the specific form of the update (39) is based on the feasible space defined by a simplex and on the fact that the linear volume constraint is always active; the latter is always the case for compliance minimization problems. No specific form of the objective function is assumed. Thus, the update formula applies to other objective functions expressed as the expectation of a random function, with linear (in terms of the design variables) equality and active inequality constraints. For other constraint functions, one needs to adjust the projection (e.g., the norm and distance-generating function) of the computed stochastic gradients to the geometry of the feasible space, which is determined by the specific constraint functions. For a detailed discussion, we refer to Refs. [22] and [35].
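The bisection solve for μ̃ and the capped update (38) can be sketched as follows (a hypothetical NumPy snippet working directly in the scaled variables; feasibility requires the upper bounds to sum to at least one):

```python
import numpy as np

def capped_entropic_update(x_tld, g_tld, gamma, ub):
    """Entropic MD-SA step with elementwise upper bounds: solve Eq. (37)
    for mu by bisection, then apply the capped update (38)."""
    w = x_tld * np.exp(-gamma * g_tld)

    def mass(mu):
        return np.minimum(ub, mu * w).sum()

    lo, hi = 0.0, 1.0
    while mass(hi) < 1.0:   # bracket the root; needs ub.sum() >= 1
        hi *= 2.0
    for _ in range(60):     # bisection on the monotone function mass(mu) - 1
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mass(mid) < 1.0 else (lo, mid)
    return np.minimum(ub, 0.5 * (lo + hi) * w)

rng = np.random.default_rng(3)
M = 6
x_tld = np.full(M, 1.0 / M)          # scaled variables on the simplex
ub = np.full(M, 0.3)                 # caps v_bar/(|Omega| V_f); sum = 1.8 > 1
x_new = capped_entropic_update(x_tld, rng.standard_normal(M), 0.5, ub)
print(x_new)
```

The returned vector stays on the simplex and respects every upper bound; mapping back to the physical variables and clipping with the move limits then reproduces the update (41).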

## 5 Algorithmic Strategies for Randomized Optimization

The choice of step size strategy is essential for SA algorithms. In this section, we propose a step size strategy with a damping scheme. The main idea is to calculate and re-calibrate step sizes in different stages of the optimization and to reduce the move limit when the average progress per step drops below a tolerance. Furthermore, to improve the performance of the proposed algorithm, we introduce a technique that averages the history of design variable updates and a re-calibration strategy for step size. Finally, we review an iterative solver with recycling for solving the large linear systems to further reduce the computational cost.

### 5.1 Step Size Strategy.

In SA algorithms, a key step is to determine the step size γk of the update (39). Thus, this subsection presents a formula to determine the step size γk. In general, there are two types of step size strategies, constant and varying [22]. The constant step size policy assumes a priori a total number of iterations, the step size is constant throughout the optimization, and the algorithm always ends at the prescribed maximum step. For the varying step size policy, the step size is a function of the iteration number and it decreases over the optimization process.

In Ref. [22], a constant step size is discussed for convex objective functions. The main idea is to calculate a step size that minimizes the error estimate of the stochastic optimization solutions. A formula for the constant step size policy is given as
$\gamma_k = \theta\, \frac{\sqrt{2}\, D_{\omega,X}}{B_* \sqrt{N_{\max}}}, \quad k = 1, \dots, N_{\max}$
(42)
where θ > 0 is a scaling parameter, $N_{\max}$ is the number of allowable iterations, $B_*$ is an estimate of an upper bound on the stochastic gradient norm, i.e.,
$B_*^{2} \ge \mathbb{E}\bigl[ \| G(x, \xi) \|_{\infty}^{2} \bigr]$
(43)
and Dω,X is the ω-diameter of X defined as
$D_{\omega,X} = \Bigl[ \max_{z \in X} \omega(z) - \min_{z \in X} \omega(z) \Bigr]^{1/2}$
(44)
We adopt Eq. (42) to calculate the step size. In our case, $D_{\omega,X} = \sqrt{\log M}$, and we take the scaling parameter θ = 1. Because the bound $B_*$ for the stochastic gradient is unknown, we use the norm of the estimated gradient calculated as the sample average of a random sample of size n = 6 at the first optimization step, namely, $B_* = \bigl\| \frac{1}{n} \sum_{i=1}^{n} G(x_0, \xi_i) \bigr\|_{\infty}$, where $x_0$ is the initial guess for the vector of design variables. This gives the step size
$\gamma_k = \frac{\sqrt{2 \log M}}{B_* \sqrt{N_{\max}}}, \quad k = 1, \dots, N_{\max}$
(45)
which is used in the entropic ℓ1 norm MD-SA.
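For concreteness, with hypothetical values M = 10,000 design variables, Nmax = 500 allowable iterations, and a gradient bound estimate B* = 50 (all illustrative, not taken from the paper), Eq. (45) evaluates to:

```python
import math

# Hypothetical problem sizes (illustrative only)
M, N_max, B_star = 10_000, 500, 50.0

# Eq. (45): gamma = sqrt(2 log M) / (B_star * sqrt(N_max))
gamma = math.sqrt(2.0 * math.log(M)) / (B_star * math.sqrt(N_max))
print(gamma)  # roughly 3.8e-3
```

Note that $D_{\omega,X} = \sqrt{\log M}$ grows only logarithmically with the number of design variables, which is one reason the entropic setup scales well to large problems.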

### 5.2 Proposed Damping Scheme.

Using the constant step size policy (45), the optimization terminates at the prescribed total number of steps, Nmax. To promote convergence before the maximum number of iterations is reached, we incorporate a damping scheme based on the one introduced in Ref. [18], which gradually reduces the move limit of the updates. The advantage of using the damping scheme is that it allows us to monitor the progress of the update throughout the optimization and damp the update appropriately.

Inspired by simulated annealing [36,37], the damping scheme proposed in Ref. [18] evaluates the average progress per step and reduces the move limit whenever the average progress drops below a tolerance. The effective step ratio Rk for iteration k is defined as
$R_k = \frac{1}{N_D}\, \frac{\| x_k - x_{k - N_D + 1} \|}{\| x_k - x_{k-1} \|}$
(46)
where $N_D$ is the damping window size (see below). This effective step ratio serves as an indicator of the optimizer's status: if the ratio is relatively large, the optimizer is making progress; if the ratio is relatively small (typically smaller than 0.1), the steps are not effective.

The damping scheme works as follows: once $R_k$ drops below a prescribed tolerance, i.e., $R_k < \tau_D$, we reduce the allowable move limit move in Eq. (40) by a prescribed scale factor κ, namely, move = move/κ. To avoid damping the update too early because of insufficient history data, we do not damp the update during the first $N_D$ optimization steps, where $N_D$ is a chosen parameter. Because the damping scheme evaluates the average progress of the optimizer, the window size $N_D$ needs to be large enough to yield conservative damping and smooth convergence. If $N_D$ is too large, however, the damping scheme adapts more slowly, which slows down convergence. In practice, we have found $N_D$ = 100 to be sufficient for problems containing more than one thousand design variables.
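The ratio (46) can be computed directly from the iterate history. The sketch below (hypothetical) contrasts steady directed motion, which gives a ratio near one, with a random walk, i.e., stalled progress, which gives a small ratio:

```python
import numpy as np

def effective_step_ratio(history, ND):
    """Eq. (46): net progress over the last ND iterates relative to the
    most recent step, averaged per step."""
    xk, xk_prev, x_start = history[-1], history[-2], history[-ND]
    return np.linalg.norm(xk - x_start) / (ND * np.linalg.norm(xk - xk_prev))

rng = np.random.default_rng(4)
dim, ND, step = 50, 100, 0.01
d = rng.standard_normal(dim)

directed = [k * step * d for k in range(ND)]                    # steady progress
walk = list(np.cumsum(step * rng.standard_normal((ND, dim)), axis=0))  # stalled

r_directed = effective_step_ratio(directed, ND)
r_walk = effective_step_ratio(walk, ND)
print(r_directed, r_walk)
```

When this ratio drops below the tolerance τD, the move limit is divided by the scale factor κ, as described above.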

### 5.3 Averaging the Iterates.

The classical SA and MD-SA algorithms, going back to Ref. [38], are sensitive to the step size and may perform poorly in practice. To make the SA (and MD-SA) algorithm more robust, it was suggested in Refs. [39,40] to use larger step sizes and to average the resulting iterates, with suitable weights, over a chosen window (even earlier, this was discussed in Ref. [26]). If we denote the window size by $N_A$, then the current step k is the end of the window and step $j = k - N_A + 1$ is the start of the window. The weighted averaged design variable vector at step k, denoted $\hat{x}_k^j$, is calculated as follows:
$\hat{x}_k^j = \sum_{t=j}^{k} v_t x_t \quad \text{with} \quad v_t = \frac{\gamma_t}{\sum_{s=j}^{k} \gamma_s}$
(47)
where $\gamma_t$ is the step size in iteration $t$ and $v_t$ is the associated weight. In the case of the constant step size strategy, we have $\hat{x}_k^j = (1/N_A)\sum_{t=j}^{k} x_t$.

In this algorithm, the weighted averaged design variable vector $\hat{x}_k^j$ represents the current solution. As discussed in Ref. [22], the weighted averaged design variable vector converges to an optimal solution for convex stochastic optimization problems; thus, we use it to determine the current estimate of the optimal objective value and the optimized structure. The design variable vector $x_k$, on the other hand, is used to compute the stochastic gradient and to perform the design variable updates, as suggested in Ref. [22]. Because the averaging window size $N_A$ directly determines the current solution, it needs to be smaller than $N_D$: a larger $N_A$ averages in older iterates and slows progress during optimization. In practice, we have found $N_A = 50$ to be sufficient for problems with more than 1000 design variables.
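The weighted average of Eq. (47) can be sketched as follows (a NumPy illustration with hypothetical names; in the constant step size case it reduces to a plain mean over the window):

```python
import numpy as np

def averaged_iterate(xs, gammas, N_A):
    """Step-size-weighted average of the last N_A iterates, Eq. (47).

    xs     : list of iterates [x_0, ..., x_k]
    gammas : matching step sizes [gamma_0, ..., gamma_k]
    """
    window_x = np.asarray(xs[-N_A:])
    window_g = np.asarray(gammas[-N_A:], dtype=float)
    v = window_g / window_g.sum()  # weights v_t = gamma_t / sum of gamma_t
    return v @ window_x            # sum over t of v_t * x_t
```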

### 5.4 Re-calibration of Step Size.

Because the (constant) step size is evaluated from the initial guess at the start of the algorithm and kept constant thereafter, it may perform poorly in later iterations. Therefore, we propose a step size re-calibration scheme that re-evaluates the step size periodically during the optimization procedure. The prescribed parameter calibration defines the number of times we perform the re-calibration. In each re-calibration, we take the final $\hat{x}_k$ (after the optimization has converged) as the new initial guess $x_0$ and restart the optimization process, re-calculating the step size according to Eq. (45) with an updated $B^*$ and with $x_0$ replaced by $\hat{x}_k$.
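The restart logic can be sketched as below, where `run_md_sa` and `compute_step_size` are hypothetical stand-ins for Algorithm 1 and the step size rule of Eq. (45):

```python
def optimize_with_recalibration(x0, run_md_sa, compute_step_size, calibration=1):
    """Run MD-SA, then restart it `calibration` times, each time
    re-evaluating the step size at the latest averaged design."""
    x = x0
    for _ in range(calibration + 1):
        gamma = compute_step_size(x)  # re-calibrate the step size at x
        x = run_md_sa(x, gamma)       # run MD-SA to convergence from x
    return x
```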

### 5.5 Iterative Solver: MINRES With Recycling.

Although the number of linear systems to solve is drastically reduced by the MD-SA method, we still have a long sequence of large linear systems to solve, with the systems changing only modestly from one optimization step to the next. Hence, we use a preconditioned iterative method with recycling, the recycling MINRES (RMINRES) method [41–43], which was derived from the MINRES method [44,45]. RMINRES uses approximate invariant subspaces computed during previous linear solves to accelerate the convergence for subsequent linear systems. Recycling an approximate invariant subspace effectively removes the corresponding eigenvalues from the spectrum of the matrix over the resulting Krylov space [42,46]. In particular, if we remove the largest or smallest eigenvalues of a symmetric positive definite matrix in this fashion, we improve the theoretical convergence bounds of the MINRES method [45]. The most common choice, also used here, is to effectively remove the smallest eigenvalues by approximating the corresponding invariant subspace. Compared with MINRES or with the conjugate gradient (CG) method [47,48], this generally leads to substantially faster convergence [41–43,49].

We outline the main steps of the RMINRES algorithm for solving one linear system $Ku = f$ within a sequence of linear systems. For details, including a matlab® code, see Refs. [42,43].

Let the columns of the matrix $W \in \mathbb{R}^{dN \times k}$ span the approximate invariant subspace that we recycle from previous linear systems, and assume that $W$ has been computed such that $Y = KW$ has orthonormal columns, that is, $Y^T Y = I$ (the identity matrix). Given an initial guess, $\tilde{u}_0$, and its residual, $\tilde{r}_0 = f - K\tilde{u}_0$, we compute $u_0 = \tilde{u}_0 + W Y^T \tilde{r}_0$ and the updated residual $r_0 = \tilde{r}_0 - Y Y^T \tilde{r}_0$, set $v_1 = r_0/\|r_0\|_2$, and carry out a Lanczos iteration with additional orthogonalization against $Y$,
$v_2 t_{2,1} = K v_1 - Y b_1 - v_1 t_{1,1}, \qquad v_{j+1} t_{j+1,j} = K v_j - Y b_j - v_j t_{j,j} - v_{j-1} t_{j-1,j}$
(48)
where $b_j = Y^T K v_j$, $t_{j,j} = v_j^T K v_j$, and $t_{j-1,j} = t_{j,j-1} = \|K v_{j-1} - Y b_{j-1} - v_{j-1} t_{j-1,j-1} - v_{j-2} t_{j-2,j-1}\|_2$ was computed in the previous iteration. At the $m$th iteration, with $V_{m+1} = [v_1 \; v_2 \; \ldots \; v_{m+1}]$ and $V_{m+1}^T V_{m+1} = I_{m+1}$, $B_m = [b_1 \; b_2 \; \ldots \; b_m]$, and $\underline{T}_m \in \mathbb{R}^{(m+1) \times m}$ a tridiagonal matrix with the $t_{i,j}$ above as coefficients and its leading $m \times m$ part symmetric, we have
$K V_m = Y B_m + V_{m+1} \underline{T}_m$
(49)
Minimizing the residual for a solution of the form
$u_m = u_0 + W z + V_m y$
(50)
gives
$r_m = f - K u_0 - K W z - K V_m y = r_0 - Y z - (Y B_m + V_{m+1} \underline{T}_m) y$
(51)
$= V_{m+1}(e_1 \|r_0\|_2 - \underline{T}_m y) - Y(z + B_m y)$
(52)
Since the matrix $[Y \; V_{m+1}]$ has orthonormal columns, minimizing $\|r_m\|_2$ involves (i) solving for $y$ in exactly the same way as in standard MINRES, which can be done efficiently with a short-term recurrence, (ii) choosing $z = -B_m y$, and (iii) updating $u_m$ according to Eq. (50) at the end of the linear solve.

The implementation details, including updating the matrix $W$ that defines the recycle space for subsequent linear systems, are outside the scope of this paper; for this we refer to Refs. [42,43].
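The setup step of the recycled solve described above, orthonormalizing $Y = KW$ and projecting the initial guess so that the starting residual is orthogonal to the recycle space, can be sketched as follows (a NumPy illustration; the names are ours):

```python
import numpy as np

def project_initial_guess(K, W, u_tilde, f):
    """Given a recycle space W, form Y = K W with orthonormal columns, then
    compute u0 = u~0 + W Y^T r~0 and r0 = r~0 - Y Y^T r~0 as described above."""
    Y, R = np.linalg.qr(K @ W)   # orthonormalize the columns of K W
    W = W @ np.linalg.inv(R)     # rescale W so that Y = K W still holds
    r_tilde = f - K @ u_tilde
    u0 = u_tilde + W @ (Y.T @ r_tilde)
    r0 = r_tilde - Y @ (Y.T @ r_tilde)
    return u0, r0
```

By construction $r_0 = f - K u_0$ and $\|r_0\|_2 \le \|\tilde{r}_0\|_2$, so the Lanczos iteration starts from an already improved residual.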

In addition, as the MD-SA algorithm using a single sample will only provide gradient estimates with low accuracy, there is no need to solve the linear systems very accurately. Hence, we can reduce the computational cost further by using a low relative convergence tolerance. We provide results of these experiments for a 3D design in Sec. 6.3.

### 5.6 Entropic ℓ1 Norm MD-SA Algorithm.

The proposed entropic ℓ1 norm MD-SA algorithm, including algorithmic parameters, for randomized topology optimization is summarized in Algorithm 1.

#### Randomized topology optimization using entropic ℓ1 norm MD-SA

Algorithm 1

1: Initialize: $x_0$, $N_{max}$, $\tau_{opt}$, $\tau_D$, $N_D$, move, calibration

2: for $k = 0, 1, \ldots, N_{max}$ do

3:  if $k = 0$ then

4:   Evaluate $\gamma_k$ based on Eq. (45)

5:  end if

6:  Select $\xi_k$ and evaluate $G(x_k, \xi_k)$

7:  Compute $x_{k+1}$ based on Eq. (42)

8:  Calculate $\hat{x}_{k+1}$ based on Eq. (47)

9:  Evaluate the effective step ratio $R_k$

10:  if $R_k < \tau_D$ and $k > N_D$ then

11:   move = move/$\kappa$

12:  end if

13:  if $\|\hat{x}_{k+1} - \hat{x}_k\|_\infty < \tau_{opt}$ then

14:   quit

15:  end if

16: end for

17: if calibration > 0 then

18:  calibration = calibration − 1

19:  $x_0 \leftarrow \hat{x}_{k+1}$ and go to step 2

20: else

21:  Evaluate the final compliance $C(\hat{x}_{k+1})$ and plot the final topology

22: end if
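Since Eqs. (42) and (45) are not reproduced in this section, the following sketch uses the textbook entropic mirror descent update (multiplicative weights rescaled to a fixed material volume) as an illustrative stand-in for the update in step 7; move limits, damping, averaging, and re-calibration are omitted:

```python
import numpy as np

def entropic_md_update(x, grad, gamma, volume):
    """Illustrative entropic mirror descent step: multiplicative update
    followed by rescaling so the total material volume stays fixed."""
    x_new = x * np.exp(-gamma * grad)
    return x_new * (volume / x_new.sum())

def md_sa(x0, stoch_grad, gamma, n_steps, volume, n_loads, rng):
    """Bare single-sample MD-SA loop in the spirit of Algorithm 1:
    one random load case xi is drawn per step to form G(x_k, xi_k)."""
    x = x0.copy()
    for _ in range(n_steps):
        xi = rng.integers(0, n_loads)   # select one sample load case
        x = entropic_md_update(x, stoch_grad(x, xi), gamma, volume)
    return x
```

Note the single linear solve hidden inside `stoch_grad`: evaluating $G(x_k, \xi_k)$ requires the state solution for only the one sampled load case.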

## 6 Numerical Examples

In this section, we present several numerical examples in both two and three dimensions to demonstrate the effectiveness and computational efficiency of the proposed entropic ℓ1 norm MD-SA algorithm for topology optimization.

The first example compares the entropic ℓ1 norm MD-SA with the ℓ2 norm MD-SA (see Appendix A). The second example compares the entropic ℓ1 norm MD-SA and the randomized algorithm proposed in Ref. [18], which is a variant of the SAA algorithm, coupled with a common optimization update algorithm (i.e., OC). In the last example, we use a 3D problem to demonstrate the use of an iterative method with a relatively low convergence tolerance for the entropic ℓ1 norm MD-SA algorithm to further reduce the computational cost. In all examples, we also compare the results from these stochastic algorithms with those from the standard deterministic algorithm. The examples are summarized in Table 1.

Table 1

Brief description of the numerical examples

| Example | Dimension | Description | Feature |
|---|---|---|---|
| 1 | 2D | Disk design with 200 load cases | Comparison between deterministic, entropic ℓ1 norm MD-SA, and ℓ2 norm MD-SA algorithms |
| 2 | 2D | Bracing design with 102 load cases | Comparison among standard deterministic formulation with OC, entropic ℓ1 norm MD-SA algorithm, and SAA-based randomized algorithm with OC |
| 3 | 3D | Bridge design with 441 load cases | Combination of entropic ℓ1 norm MD-SA with RMINRES iterative solver |

To quantify the computational cost of the standard and stochastic optimization algorithms, we define the total number of linear systems of equations to solve in the optimization process as $N_{solve} = m \times N_{step}$ for the standard deterministic approach and $N_{solve} = n \times N_{step}$ for the stochastic algorithms, where $N_{step}$ is the number of optimization steps. This is a measure of the computational efficiency of the optimization formulation. For large 3D systems, where iterative solvers are required, $N_{solve}$ is a good proxy for computational cost. In addition, we include the wall clock time of the entire optimization process for comparison. All the examples are performed on a machine with an Intel(R) Xeon(R) CPU E5-1660 v3, 3.00 GHz processor and 256 GB of RAM, running matlab R2018b. The optimization is considered converged if the current step size (bounded by the move limit) is below a prescribed tolerance $\tau_{opt}$, that is, $\|x_k - x_{k-1}\|_\infty < \tau_{opt}$.
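As a concrete check of this bookkeeping (the numbers are taken from Example 1, Table 2; the function name is ours):

```python
def n_solve(loads_per_step, n_steps):
    """Total linear solves over the optimization: m per step for the
    deterministic algorithm, the sample size n per step for the stochastic ones."""
    return loads_per_step * n_steps

# Deterministic: m = 200 load cases, N_step = 400 -> 80,000 solves.
# Entropic l1 norm MD-SA: n = 1 sample, N_step = 358 (avg.) -> 358 solves.
ratio = n_solve(200, 400) / n_solve(1, 358)  # roughly the 223x reduction cited
```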

We have incorporated the proposed MD-SA algorithm into the computer program polytop [50]. All the problems are initialized as follows. The initial guess is taken as $x_0(e) = V_f$ for every element $e$. The convergence tolerance is $\tau_{opt} = 10^{-2}$, and the initial move limit is chosen as move = 0.1. The maximum number of optimization steps for the standard deterministic approach is $N_{max} = 400$.

The entropic ℓ1 norm MD-SA algorithm uses the following parameters. The sample size is $n = 1$, and the window size for the weighted averaging of iterates is $N_A = 50$. For the step size calculation, $\theta = 1$ and $N_{max} = 400$; the initial move limit is move = 0.1; the calibration parameter is calibration = 1; and the sample size used to estimate the initial gradient for the step size calculation is 6. In the damping scheme, the window size is $N_D = 100$, the step size reduction factor is $\kappa = 2$, and the damping tolerance is $\tau_D = 0.05$. For the ℓ2 norm MD-SA algorithm, $\theta = 50$ is used for the step size calculation; all other parameters are the same as for the entropic ℓ1 norm MD-SA algorithm. For the SAA-based randomized algorithm, the sample size is $n = 6$; the other parameters are move = 0.1, $N_D = 100$, $\kappa = 2$, and $\tau_D = 0.05$.

Let $x^*$ and $x_{SA}^*$ represent the optimal solutions obtained from the standard formulation in (1) and from the MD-SA algorithms, respectively. For the stochastic algorithms, we present the true values of the objective function $C(x_{SA}^*)$ at the solution $x_{SA}^*$ (instead of its estimator $\hat{C}_S(x_{SA}^*)$) and compare them with those obtained from the standard algorithm, $C(x^*)$, to evaluate the quality of the solutions. The relative difference is defined as $\Delta C = (C(x_{SA}^*) - C(x^*))/C(x^*)$. To distinguish between the stochastic algorithms, we use $x_{L1\text{-}SA}^*$, $x_{L2\text{-}SA}^*$, and $x_{SAA}^*$ to denote the optimal solutions from the entropic ℓ1 norm MD-SA algorithm, the ℓ2 norm MD-SA algorithm, and the SAA-based randomized algorithm, respectively.

### 6.1 Example 1: 2D Disk With 200 Load Cases.

In this example, we demonstrate the computational efficiency and effectiveness of two MD-SA algorithms, the entropic ℓ1 norm and the ℓ2 norm (Euclidean) MD-SA, for various problem sizes. We show that, as the problem size increases, the entropic ℓ1 norm MD-SA performs better than the ℓ2 norm MD-SA.

Here, we consider a 2D disk whose design domain, boundary conditions, and passive zone (non-designable solid) are shown in Fig. 2(a). The domain is discretized with continuum polygonal elements [50]. We enforce symmetry of the density distribution about both the horizontal and vertical axes. A total of 200 linearly independent and equally weighted load cases are applied at the outer circle of the domain (Fig. 2(b)). For this design example, the active volume fraction (excluding the passive zone) is $V_f = 0.25$, the radius of the density filter is chosen as 0.05, and the penalization factor $p$ is taken to be 3.0. Moreover, to demonstrate the difference between designs considering a combined single load case and multiple load cases, we include the case in which all 200 loads are applied simultaneously (i.e., one deterministic load case with all 200 loads acting together). The corresponding optimized topology is shown in Fig. 3(a), which is distinct from all the other designs, which consider 200 load cases (Figs. 3(b)–3(d)).

Fig. 2
Fig. 2
Fig. 3
Fig. 3

To investigate the performance and computational efficiency of the entropic ℓ1 norm MD-SA (n = 1) and the ℓ2 norm MD-SA (n = 1) algorithms versus the standard deterministic approach (m = 200), we consider four problem sizes: M = 10,000; 40,000; 70,000; and 100,000. The optimized topologies (from representative trials) obtained by these three methods (the standard deterministic approach, the entropic ℓ1 norm MD-SA, and the ℓ2 norm MD-SA) for problem size M = 40,000 are shown in Figs. 3(b)–3(d). For this problem size, the entropic ℓ1 norm MD-SA algorithm leads to a topology with a (true) compliance value similar to that of the standard deterministic formulation. The ℓ2 norm MD-SA algorithm, on the other hand, produces a final topology with a slightly higher (true) compliance value.

To evaluate the stability of the stochastic algorithms, we run each of the MD-SA algorithms (the entropic ℓ1 norm MD-SA and the ℓ2 norm MD-SA) 50 times and present the statistics of the results in Table 2 (this is not needed in practice). “Mean” and “Dev” stand for the mean and the standard deviation, over 50 trials, of the objective function value evaluated at the obtained solutions. Note that one trial is one run of the numerical experiment. For the standard deterministic approach, we present two sets of results in Table 2. The first set is obtained if we stop the optimization when $\|x_k - x_{k-1}\|_\infty < \tau_{opt} = 10^{-2}$ or when the number of optimization steps reaches the given maximum, $N_{max} = 400$; this strategy is also used in Examples 2 and 3. The second set is obtained if we terminate the optimization only when $\tau_{opt} = 10^{-2}$ is reached. In the comparison with the other algorithms, we mainly use the results from the first strategy, because we also use $N_{max} = 400$ for the MD-SA algorithms.

Table 2

Results for Example 1, disk design with 200 load cases

| Algorithm | M | C | Mean (C) | ΔC (avg.) | Dev (C) | Nstep (avg.) | Nsolve (avg.) | Time (avg.) (s) | Time/Nstep (s) |
|---|---|---|---|---|---|---|---|---|---|
| Std. deterministic, m = 200 | 10,000 | 10.39 | | | | 228 | 45,600 | 1401 | 6.2 |
| | 40,000 | 10.22 | | | | 400 (max) | 80,000 | 10,425 | 26.1 |
| | 70,000 | 10.18 | | | | 400 (max) | 80,000 | 18,520 | 46.3 |
| | 100,000 | 10.14 | | | | 400 (max) | 80,000 | 26,888 | 67.2 |
| Std. deterministic (reach τopt), m = 200 | 10,000 | 10.39 | | | | 228 | 45,600 | 1401 | 6.2 |
| | 40,000 | 10.17 | | | | 821 | 164,200 | 21,397 | 26.1 |
| | 70,000 | 10.11 | | | | 1056 | 211,200 | 48,893 | 46.3 |
| | 100,000 | 10.07 | | | | 1449 | 289,800 | 97,402 | 67.2 |
| Entropic ℓ1 norm MD-SA, n = 1 | 10,000 | | 9.60 | −7.60% | 0.21 | 349 | 349 | 91 | 0.26 |
| | 40,000 | | 10.11 | −1.06% | 0.20 | 351 | 351 | 426 | 1.21 |
| | 70,000 | | 10.20 | 0.14% | 0.22 | 357 | 357 | 815 | 2.29 |
| | 100,000 | | 10.28 | 1.41% | 0.22 | 358 | 358 | 1194 | 3.40 |
| ℓ2 norm MD-SA, n = 1 | 10,000 | | 9.96 | −4.10% | 0.27 | 330 | 330 | 95 | 0.29 |
| | 40,000 | | 10.44 | 2.10% | 0.23 | 344 | 344 | 471 | 1.37 |
| | 70,000 | | 10.59 | 3.96% | 0.26 | 350 | 350 | 888 | 2.54 |
| | 100,000 | | 10.63 | 4.90% | 0.32 | 355 | 355 | 1298 | 3.65 |

Note: Results are averaged over 50 trials.

Figures 4 and 5 depict the final (true) compliances versus the corresponding number of design variables for the entropic ℓ1 norm MD-SA (over 50 trials) and the ℓ2 norm MD-SA (over 50 trials) algorithms, respectively. In these figures, the final compliances from the standard deterministic algorithm with 200 load cases are included for comparison. Representative optimized topologies obtained from the entropic ℓ1 norm and ℓ2 norm MD-SA algorithms are provided as well.

Fig. 4
Fig. 4
Fig. 5
Fig. 5

The entropic ℓ1 norm MD-SA algorithm, while offering solutions similar (in terms of the objective function values) to the standard approach (−7.60%, −1.06%, 0.14%, and 1.41% relative differences for M = 10,000, M = 40,000, M = 70,000, and M = 100,000, respectively), drastically reduces the computational cost for all the problem sizes considered. As shown in Table 2, for the largest problem (M = 100,000), the number of linear systems to solve ($N_{solve}$) is on average 223 times smaller than for the standard deterministic algorithm (i.e., 358 solves versus 80,000 solves), and the total wall clock time is 22 times lower (i.e., 20 min versus 7.5 h). The ℓ2 norm MD-SA algorithm provides solutions comparable to the standard approach for the smallest problem size (M = 10,000), as shown in Fig. 5. However, as the problem size increases, the performance of the ℓ2 norm MD-SA algorithm deteriorates.

Moreover, the comparison between the two MD-SA algorithms suggests that the entropic ℓ1 norm algorithm is more accurate than the ℓ2 norm algorithm for larger problems. This observation confirms the theoretical comparison of error estimates between the two MD-SA algorithms in Appendix B. It should also be noted that, while the numbers of optimization steps are similar for both MD-SA algorithms, the wall clock time of the entropic ℓ1 norm MD-SA algorithm is on average 8% less than that of the ℓ2 norm one for all the problem sizes considered. The slight difference in wall clock time comes from the different ways of computing the projections (Eq. (30) versus Eq. (A3)) in the two MD-SA algorithms.

### 6.2 Example 2: Two-Dimensional Bracing Design With 102 Load Cases.

In this example, we compare the proposed entropic ℓ1 norm MD-SA algorithm (n = 1) in Eq. (41) with the standard deterministic algorithm (m = 102) in Eq. (1) with OC update and with the randomized algorithm (n = 6) proposed by Zhang et al. [18] (which uses a variant of the SAA technique with OC update). The design domain, boundary conditions, and passive zone (non-designable solid region) are shown in Fig. 6(a). A total of 102 linearly independent and equally weighted load cases are applied on the two sides of the box (Fig. 6(b)). The problem domain is modeled using 153,600 continuum quadrilateral (Q4) elements, which gives 309,442 DOFs. For this example, the active volume fraction (excluding the passive zone) is $V_f = 0.25$, the radius of the linear density filter is 3, and the penalization factor $p$ is taken to be 3.0.

Fig. 6
Fig. 6

First, we perform topology optimization using the standard deterministic formulation (1) with the OC update scheme. The final topology, shown in Fig. 7(c), has an objective function value of $C(x^*) = 50.65$. It is obtained with 400 optimization steps (the maximum number of optimization steps), and in each optimization step we solve 102 linear systems (corresponding to the 102 load cases), which leads to a total of $N_{solve} = 40,800$ solves.

Fig. 7
Fig. 7

Next, we use the entropic ℓ1 norm MD-SA algorithm with the update proposed in Eq. (41), which uses a single sample to estimate the gradient (reselected every iteration), and the SAA-based randomized algorithm with the OC update, which uses six samples to estimate the gradient (reselected every iteration). Because the final topology from the standard algorithm is y-symmetric, we enforce symmetry in both cases by enforcing symmetry of the density distribution with respect to the y-axis. As in the first example, we run each of them 50 times; the resulting statistics are shown in Table 3. “Mean” and “Dev” are the mean and the standard deviation, over 50 trials, of the objective function evaluated at the obtained solutions. The optimized topologies (from representative trials) obtained by the entropic ℓ1 norm MD-SA and the SAA-based randomized algorithms are shown in Figs. 7(a) and 7(b), respectively.

Table 3

Results for Example 2, bracing design with 102 load cases

| Algorithm | C(x*) | Mean (C) | ΔC (avg.) | Dev (C) | Nstep (avg.) | Nsolve (avg.) | Time (avg.) (s) | Time/Nstep (avg.) (s) |
|---|---|---|---|---|---|---|---|---|
| Std. deterministic, m = 102 | 50.65 | | | | 400 (max) | 40,800 | 9480 | 23.70 |
| Entropic ℓ1 norm MD-SA, n = 1 | | 50.02 | −1.20% | 0.50 | 372 | 372 | 671 | 1.80 |
| SAA-based [18], n = 6 | | 51.60 | 1.92% | 0.87 | 255 | 1530 | 739 | 2.90 |

Note: Results are averaged over 50 trials. “Dev” represents standard deviation.

For each method, the convergence history of the objective function for a representative trial is shown in Fig. 7(d). For the stochastic cases, the objective function estimators ($\hat{C}_{SA}(x)$ and $\hat{C}_{SAA}(x)$) are plotted during the optimization, while the true values of the objective function ($C(x_{L1\text{-}SA}^*)$ and $C(x_{SAA}^*)$) are evaluated only at the end of the optimization (indicated with markers). Notice that, because only one sample load case is used to estimate the compliance in the entropic ℓ1 norm MD-SA algorithm rather than the six sample load cases in the SAA-based one, the history of the estimator $\hat{C}_{SA}(x)$ is more oscillatory than that of $\hat{C}_{SAA}(x)$. However, in terms of the true objective function values obtained at the end of the optimization, the one from the entropic ℓ1 norm MD-SA algorithm is smaller than the one from the SAA-based algorithm.

To quantify the accuracy and efficiency of the stochastic algorithms over multiple trials, Fig. 8 plots the final (true) compliance of the entropic ℓ1 norm MD-SA algorithm (averaged over 50 trials), the SAA-based algorithm (averaged over 50 trials), and the standard deterministic algorithm versus the corresponding total number of linear system solves. In terms of accuracy, the entropic ℓ1 norm MD-SA algorithm provides a topology similar to the one obtained from the standard deterministic formulation with OC update and yields a slightly smaller mean compliance value (−1.24% relative difference). The SAA-based approach with OC update also leads to a design similar to the one obtained from the standard deterministic formulation, but with a slightly larger mean compliance value (+1.88% relative difference). In terms of efficiency, both stochastic algorithms use substantially fewer linear system solves and less wall clock time than the standard deterministic one. For the entropic ℓ1 norm MD-SA algorithm, the number of linear systems to solve is on average (over 50 trials) 110 times smaller than for the standard algorithm (i.e., 372 solves versus 40,800 solves), and the total wall clock time is 14.1 times shorter (i.e., 11.2 min versus 2.6 h); the convergence of the entropic ℓ1 norm MD-SA algorithm and that of the standard one are comparable. For the SAA-based algorithm, the number of linear systems to solve is on average 27 times smaller than for the standard algorithm (1530 solves versus 40,800 solves), and the total wall clock time is 12.8 times shorter (12.3 min versus 2.6 h); moreover, the convergence of the SAA-based algorithm is more rapid than that of the standard algorithm. Comparing the two stochastic algorithms, the entropic ℓ1 norm MD-SA leads on average to a 3.06% lower compliance and 9.2% less wall clock time than the SAA-based one, because the MD-SA algorithm solves fewer linear systems (372 solves versus 1530 solves).
The standard deviations of the two methods indicate that the entropic ℓ1 norm MD-SA gives more consistent solutions than the SAA-based algorithm. In summary, among all the algorithms in this example, the entropic ℓ1 norm MD-SA offers the lowest (averaged) compliance while being the most computationally efficient: it takes the least amount of time and the smallest number of solves.

Fig. 8
Fig. 8

### 6.3 Example 3: Three-Dimensional Bridge Design With the RMINRES Iterative Solver.

In the third example, we use a three-dimensional (3D) bridge design to demonstrate the performance of the proposed entropic ℓ1 norm MD-SA algorithm for 3D problems. We also combine the entropic ℓ1 norm MD-SA algorithm with a fast iterative solver for linear systems to further reduce the computational cost of topology optimization with many load cases. Specifically, we use the RMINRES iterative solver [42,43] presented in Sec. 5.5, which recycles selected subspaces, with an incomplete Cholesky preconditioner. For the RMINRES parameters, we choose the number of vectors to be recycled as 10, the number of Lanczos vectors kept per cycle as 100, and the maximum number of iterations as 1000.

The bridge design problem is set up as follows. The design domain, load cases, and boundary conditions are shown in Fig. 9(a). A total of 1764 equally weighted load cases are applied to the bridge deck, which is taken to be a non-designable solid layer. Due to the symmetry of the design problem, we optimize a quarter of the domain with m = 441 load cases, as shown in Fig. 9(b). A total of 248,832 brick elements are used to discretize the quarter domain, resulting in 793,875 DOFs. For this design example, the volume fraction is chosen as $V_f = 0.08$ (excluding the passive zone), the radius of the density filter is 6, and the penalization factor is taken to be $p = 3$.

Fig. 9
Fig. 9

We perform topology optimization using the standard deterministic formulation (1). The final topology obtained is shown in Fig. 10(a), which has an objective function value of C(x*) = 13.56. This final topology is obtained with 400 optimization steps (maximum number of optimization steps), and in each optimization step, matlab’s sparse direct solver is used to solve the linear systems with 441 right-hand sides. Note that matlab’s sparse direct solver uses compiled code, while our iterative solver is run as interpreted code (so it runs much more slowly).

Fig. 10
Fig. 10

We then compare the designs obtained using the entropic ℓ1 norm MD-SA algorithm with n = 1. Three cases are considered: one using a sparse direct solver, and the other two using the RMINRES iterative solver. For the latter two cases, we test two convergence tolerance values, $\tau_{iter} = 10^{-8}$ and $\tau_{iter} = 10^{-4}$. For each case, we perform the optimization 10 times and present the results in Table 4. In addition, the optimized structures for the three cases (each obtained from one representative trial) are shown in Figs. 10(b)–10(d).

Table 4

Results for Example 3, 3D density-based bridge design

| Algorithm | τiter | C | Mean (C) | ΔC | Nstep | Time (s) | Timesolve (s) | Time/Nstep (s) |
|---|---|---|---|---|---|---|---|---|
| Deterministic & direct solver | | 13.56 | | | 400 (max) | 156,988 | 142,694 | 392 |
| MD-SA & direct solver | | | 14.14 | 4.25% | 293 | 73,269 | 71,156 | 250 |
| MD-SA & iterative solver | $10^{-8}$ | | 14.17 | 4.49% | 278 | 29,254 | 26,461 | 105 |
| MD-SA & iterative solver | $10^{-4}$ | | 14.08 | 3.80% | 278 | 20,246 | 17,350 | 72 |

Note: Results are averaged over 10 trials. Timesolve represents the wall clock time of solving the linear systems.

First, we observe that all three cases using the entropic ℓ1 norm MD-SA algorithm with n = 1 produce designs similar to the standard deterministic formulation in terms of the final topology, with on average 4.2% higher true compliance values. However, the entropic ℓ1 norm MD-SA algorithm yields the final design in a greatly reduced wall clock time compared with the standard algorithm. For example, with the RMINRES iterative solver and a tolerance of $\tau_{iter} = 10^{-4}$, the total wall clock time is 87.1% lower than with the standard formulation (i.e., 5.6 h versus 43.6 h).

Comparing the three cases using the entropic ℓ1 norm MD-SA algorithm, with the RMINRES iterative solver we obtain almost identical designs and compliances as with the sparse direct solver. However, the algorithm with the RMINRES iterative solver is more efficient. In terms of wall clock time, the RMINRES runs are 72% and 60% faster than the sparse direct solver for the tolerance values $\tau_{iter} = 10^{-4}$ and $\tau_{iter} = 10^{-8}$, respectively.

Finally, based on the performance comparison between the RMINRES iterative solver with the two tolerance values, we highlight that the single sample in the entropic ℓ1 norm MD-SA leads to a relatively inaccurate gradient estimate, and hence there is no need to solve the linear systems very accurately. This is an advantage of the entropic ℓ1 norm MD-SA algorithm that we exploit with low-accuracy iterative solves. Together, the proposed entropic ℓ1 norm MD-SA algorithm and the RMINRES iterative solver with a large convergence tolerance achieve a further reduction in computational time in addition to reducing the number of linear systems to solve.

## 7 Concluding Remarks and Perspective

In this paper, we propose a tailored stochastic approximation algorithm to solve mechanics-driven topology optimization problems with many load cases, based on a compliance minimization formulation, at a drastically reduced computational cost. We first reformulate the deterministic topology optimization problem with many load cases into a stochastic one whose objective function takes the form of the expectation of a random variable. We propose a tailored MD-SA algorithm (see Algorithm 1), which adopts an entropic distance-generating function using the ℓ1 norm to mimic the underlying geometry of the feasible design space (i.e., a design space with a linear volume constraint). Unlike commonly used optimization algorithms, the proposed entropic ℓ1 norm MD-SA algorithm requires only low accuracy in the stochastic gradient, which enables the use of a single sample per optimization step (i.e., the sample size is always one) to estimate the gradient. With the proposed MD-SA algorithm, we reduce the number of linear systems to solve per iteration from hundreds to one. To the authors’ knowledge, this is the first work in the literature that tailors SA algorithms to solve deterministic topology optimization problems.

To improve the performance and convergence of the proposed algorithm, we propose several algorithmic strategies for obtaining effective step sizes and updates, including the step size policy and re-calibration, iterate averaging, and a damping scheme inspired by simulated annealing. The main idea is to calculate and re-calibrate step sizes adaptively at different stages of the optimization. We also adopt an iterative solver with recycling for solving the large linear systems; this allows the MD-SA algorithm to use a significantly higher convergence tolerance when solving large state equations, further reducing the computational cost without affecting the performance of the design update. With the proposed algorithmic strategies, the convergence rate of the proposed MD-SA algorithm is shown to be comparable to that of the standard algorithm.

Through numerical examples, we demonstrate the effectiveness, efficiency, and potential of the proposed entropic 1 norm MD-SA algorithm. Compared to the standard deterministic approach, the entropic 1 norm MD-SA algorithm allows to solve large-scale topology optimization problems with hundreds of load cases at a drastically reduced computational cost while obtaining similar design quality. For instance, in example 1, compared with the standard deterministic algorithm (m = 200), the number of linear system solves is 223 times fewer (i.e., 358 solves versus. 80,000 solves) and the average wall clock time is 22 times shorter (i.e., 20 minutes versus 7.5 hours) with the proposed entropic 1 norm MD-SA algorithm (n = 1). In addition, the proposed entropic 1 norm MD-SA (n = 1) is shown to outperform both the 2 norm MD-SA (n = 1) and the SAA-based algorithm [18] (n = 6) with OC update in terms of both efficiency and effectiveness. The first example investigates the performance of two MD-SA algorithms, namely the entropic 1 norm and the 2 norm, in a design problem with various problem sizes (M = 10, 000, M = 40, 000, M = 70, 000, and M = 100, 000). The entropic 1 norm algorithm is found to be more accurate than the 2 norm one for larger problems, which confirms the theoretical comparison of error estimates between the two MD-SA algorithms (see Appendix  B). In addition, the wall clock time of the entropic 1 norm MD-SA algorithm on average is 9.21% less than that of the 2 norm algorithm for all the problem sizes investigated. The second example shows that the entropic 1 norm MD-SA leads to a $3.06%$ lower (averaged) compliance and $9.2%$ lower (averaged) wall clock time than the SAA-based algorithm with OC, because the MD-SA algorithm solves fewer linear systems (i.e., one solve per step versus six solves per step). 
In the third example, the entropic ℓ1 norm MD-SA algorithm is employed in conjunction with the RMINRES iterative solver for a 3D bridge design under 1764 load cases (441 load cases for a quarter of the domain). We demonstrate that, compared with the deterministic optimization algorithm, the proposed entropic ℓ1 norm MD-SA algorithm can produce good-quality designs even when working with an iterative solver with a relaxed tolerance (i.e., $10^{-4}$ instead of the more common $10^{-8}$), which is especially desirable for large-scale problems because it further reduces the computational time.
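The randomization that makes a single solve per step possible can be illustrated with a minimal Hutchinson-type sketch. This is an illustration on hypothetical data, not the paper's implementation: `K` stands in for the (symmetric positive definite) global stiffness matrix and the columns of `F` for the load vectors pre-scaled by the square roots of the weights αi, so the weighted compliance equals the trace of $F^{T}K^{-1}F$ and a single Rademacher-combined load yields an unbiased one-solve estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small stand-ins: K plays the role of the SPD global
# stiffness matrix; columns of F are weight-scaled load vectors.
M, m = 50, 20
A = rng.standard_normal((M, M))
K = A @ A.T + M * np.eye(M)      # symmetric positive definite
F = rng.standard_normal((M, m))

# Deterministic weighted compliance: one linear solve per load case.
C_exact = sum(f @ np.linalg.solve(K, f) for f in F.T)

# Stochastic estimate: ONE solve with a Rademacher-combined load.
def compliance_sample(rng):
    xi = rng.choice([-1.0, 1.0], size=m)   # Rademacher vector
    g = F @ xi                             # single combined load
    return g @ np.linalg.solve(K, g)       # unbiased estimate of C_exact

# Averaging many single-solve samples recovers the exact value.
est = np.mean([compliance_sample(rng) for _ in range(5000)])
print(C_exact, est)  # the two values agree closely
```

The MD-SA algorithm exploits exactly this property: because the stochastic gradient only needs to be unbiased, one sample (one solve) per optimization step suffices, and the averaging happens implicitly across the optimization steps.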

Future research directions include studying the optimal choice of parameters with respect to both the overall computational cost and the quality of the design. Another important direction is to tailor the proposed algorithm to other objective functions and constraints, and to apply it to practical and complex design examples.

## Acknowledgment

The authors would like to acknowledge Glaucio H. Paulino for his help, insightful discussions, and suggestions. The work by X. S. Zhang was supported in part by the start-up fund from the University of Illinois. The work by E. de Sturler was supported in part by the grant NSF DMS 1720305. The work by A. Shapiro was supported in part by the grant NSF 1633196. The information provided in this paper is the sole opinion of the authors and does not necessarily reflect the view of the sponsoring agencies.

## Nomenclature

• d = number of dimensions
• m = number of load cases
• n = sample size
• p = penalization parameter in the density-based method
• C = weighted compliance (objective function)
• M = number of elements in the mesh
• N = number of nodes in the mesh
• F = weighted external force matrix
• H = filter matrix for density
• K = global stiffness matrix
• U = weighted displacement matrix
• $\tilde{a}$ = scaled cross-sectional area
• $\bar{x}$ = filtered density vector
• $\tilde{x}$ = scaled density vector
• fi = external force vector for the ith load case
• ui = displacement vector for the ith load case
• E0 = Young's modulus of the solid material
• Emin = Young's modulus of the Ersatz material
• NA = window size to obtain the weighted averaged density $\hat{x}$
• ND = sample window size in the damping scheme
• Nmax = maximum number of steps in the optimization
• Nstep = total number of optimization steps to obtain the final topology
• Nsolve = total number of linear system solves in the optimization process
• Rmin = filter radius in the density-based method
• Rk = effective step ratio in the damping scheme evaluated at optimization step k
• Vf = prescribed allowable volume fraction in the density-based method
• Vmax = prescribed maximum volume of the design
• Xx = feasible set of the design variable x in the density-based approach
• $\tilde{X}_{x}$ = feasible set of the scaled design variable $\tilde{x}$ in the density-based approach
• x* = optimal solution obtained by the standard algorithm in the density-based method
• x(e) = density of element e (the eth design variable in the density-based method)
• v(e) = volume of element e
• $\hat{C}_{SA}$ = expected objective function by the SA algorithm
• $\hat{C}_{SAA}$ = expected objective function by the SAA variant algorithm [18]
• $x^{*}_{SAA}$ = optimal solution obtained by the SAA variant algorithm [18] in the density-based method
• $\hat{x}^{*}_{\ell_{1}\text{-SA}}$ = optimal solution obtained by the entropic ℓ1 norm MD-SA algorithm in the density-based method with weighted averaged iterates
• $\hat{x}^{*}_{\ell_{2}\text{-SA}}$ = optimal solution obtained by the ℓ2 norm MD-SA algorithm in the density-based method with weighted averaged iterates
• calibration = number of times the step size re-calibration process is performed
• move = move limit in the OC update
• G(xk, ξk) = stochastic gradient at optimization step k
• $\tilde{G}(\tilde{x}_{k},\xi_{k})$ = stochastic gradient with respect to $\tilde{x}$ at optimization step k
• Px(·) = prox-mapping defined by V(·, ·)
• V(·, ·) = prox-function generated by ω(x)
• αi = weighting factor for the ith load case
• γk = step size at optimization step k
• κ = scale factor in the damping scheme
• ξ = random vector with entries drawn from the Rademacher distribution
• τD = tolerance for the damping scheme
• τopt = convergence tolerance for the optimization process
• τiter = convergence tolerance for the RMINRES iterative solver
• ω(x) = distance-generating function

### Appendix A: Euclidean MD-SA With ℓ2 Norm

Here, we present the MD-SA algorithm with the ℓ2 norm (also known as the Euclidean SA). In this variant, we use the Euclidean (ℓ2) norm and define the distance-generating function as $\omega(x)=\frac{1}{2}x^{T}x$. In this case, the prox-function V(·, ·) becomes
$V(x,z)=\frac{1}{2}z^{T}z-\left[\frac{1}{2}x^{T}x+x^{T}(z-x)\right]=\frac{1}{2}\|z-x\|_{2}^{2}$
(A1)
and the prox-mapping becomes $P_{x}(y)=\Pi_{X}(x-y)$; see Eq. (12). As a result, we obtain the following update formula (cf. Eq. (18)) for the MD-SA algorithm with the ℓ2 norm,
$x_{k+1}=\Pi_{X}\left(x_{k}-\gamma_{k}G(x_{k},\xi_{k})\right)$
(A2)
That is, using the Euclidean norm, the MD-SA is equivalent to the robust SA algorithm.
When applied to the randomized topology optimization formulation, the feasible set is given by
$X_{x}=\left\{x\in\mathbb{R}^{M}:\ \sum_{e=1}^{M}\frac{v^{(e)}x^{(e)}}{|\Omega|}-V_{f}=0,\ 0\le x^{(e)}\le 1,\ e=1,\dots,M\right\}$
(A3)
The ℓ2 norm MD-SA framework gives the update formula
$x_{k+1}=\Pi_{X_{x}}\left(x_{k}-\gamma_{k}G(x_{k},\xi_{k})\right)$
(A4)
We note that, in the above update formula, a subproblem must be solved at every step to compute the corresponding Euclidean projection.
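For illustration, assuming uniform element volumes (so the volume constraint becomes a constraint on the mean density), this projection subproblem reduces to clipping a shifted copy of the input, with the shift located by bisection on the volume multiplier; a minimal sketch:

```python
import numpy as np

def project_onto_Xx(y, Vf, tol=1e-10, max_iter=200):
    """Euclidean projection of y onto {x : mean(x) = Vf, 0 <= x_e <= 1},
    i.e., the set X_x for a uniform mesh. From the KKT conditions, the
    projection is x_e = clip(y_e - lmb, 0, 1) for a scalar multiplier
    lmb, which we locate by bisection."""
    lo = y.min() - 1.0   # all entries clip to 1 here: mean = 1 >= Vf
    hi = y.max()         # all entries clip to 0 here: mean = 0 <= Vf
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.clip(y - mid, 0.0, 1.0).mean() > Vf:
            lo = mid     # mean too large -> shift further down
        else:
            hi = mid
        if hi - lo < tol:
            break
    return np.clip(y - 0.5 * (lo + hi), 0.0, 1.0)

rng = np.random.default_rng(1)
x = project_onto_Xx(rng.standard_normal(1000), Vf=0.4)
print(round(x.mean(), 6))  # -> 0.4 (up to the bisection tolerance)
```

For nonuniform element volumes the same bisection applies, with the shift proportional to $v^{(e)}$ and the weighted mean $\sum_{e}v^{(e)}x^{(e)}/|\Omega|$ in place of `mean`.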
The ℓ2 norm MD-SA adopts the same algorithmic framework as the entropic ℓ1 norm MD-SA described in Sec. 5, except for a different formula for $\gamma_{k}$. A general formula for the ℓ2 norm MD-SA is suggested in Ref. [22], which takes the form
$\gamma_{k}=\frac{\theta D_{\omega,X}}{B\sqrt{N_{\max}}},\quad k=1,\dots,N_{\max}$
(A5)
where
$B^{2}\ge \mathbb{E}\left[\|G(x,\xi)\|_{2}^{2}\right]$
(A6)
is a bound on the 2 norm squared of the stochastic gradient. Similar to the entropic 1 norm MD-SA case, the above formula is used in this paper to determine the step size for the 2 norm MD-SA. In our case, $Dω,X=M$ and the scaling parameter θ is taken to be 50. Because the bound B for the stochastic gradient is unknown, we estimate B using a few samples as $B=1/n∥∑i=1nG(x0,ξi)∥2$, which is the sample-averaged 2 norm of the estimated gradient at the initial step. This gives the following formula for the constant step size γk for the 2 norm MD-SA,
$\gamma_{k}=\frac{50\sqrt{M}}{B\sqrt{N_{\max}}},\quad k=1,\dots,N_{\max}$
(A7)
As a final remark, unlike the entropic ℓ1 norm MD-SA, the performance of the ℓ2 norm MD-SA is much more sensitive to the step size, especially to the scaling parameter θ.
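Condensed into code, the step-size rule above reads as follows; the callable `grad` and the noisy quadratic used to exercise it are hypothetical placeholders for the stochastic compliance gradient:

```python
import numpy as np

def l2_sa_step_size(grad, x0, N_max, theta=50.0, n=10, seed=0):
    """Constant step size for the l2 norm MD-SA:
    gamma = theta * sqrt(M) / (B * sqrt(N_max)), with B taken as the
    l2 norm of an n-sample average of the stochastic gradient at x0."""
    rng = np.random.default_rng(seed)
    g_avg = np.mean([grad(x0, rng) for _ in range(n)], axis=0)
    B = np.linalg.norm(g_avg)   # sample-based estimate of the gradient bound
    M = x0.size                 # number of design variables
    return theta * np.sqrt(M) / (B * np.sqrt(N_max))

# Toy usage: gradient of a quadratic plus Gaussian noise.
M = 100
grad = lambda x, rng: 2.0 * x + rng.standard_normal(x.size)
gamma = l2_sa_step_size(grad, x0=np.full(M, 0.5), N_max=400)
print(gamma)
```

As noted above, the ℓ2 variant is sensitive to θ, which is why the step size is re-calibrated during the optimization rather than fixed once.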

In Algorithm 2, we summarize the ℓ2 norm MD-SA algorithm for the randomized topology optimization considered in this work.

### Appendix B: Comparison of Entropic ℓ1 Norm MD-SA and ℓ2 Norm MD-SA

Here, we provide a comparison of the error estimates of the entropic ℓ1 norm MD-SA and the ℓ2 norm MD-SA algorithms for a generic convex function f [22]. The error estimate for the ℓ2 norm MD-SA is as follows:
$\mathbb{E}\left[f(\bar{x}_{N_{step}})-f(x^{*})\right]\le O(1)\,\max\{\theta,\theta^{-1}\}\,B\,N_{step}^{-1/2}$
(B1)
where $\bar{x}_{N_{step}}$ is the weighted averaged design variable vector after $N_{step}$ steps of the corresponding MD-SA algorithm, $x^{*}$ is the optimal design variable vector, O(1) is a generic constant, θ is the parameter chosen to determine the step size γ in Eq. (A5), and B is a bound on the ℓ2 norm of the stochastic gradient,
$B^{2}\ge \mathbb{E}\left[\|G(x,\xi)\|_{2}^{2}\right]$
(B2)
The error estimate for the entropic ℓ1 norm MD-SA is as follows:
$\mathbb{E}\left[f(\bar{x}_{N_{step}})-f(x^{*})\right]\le O(1)\,\max\{\theta,\theta^{-1}\}\,\sqrt{\ln M}\,B_{*}\,N_{step}^{-1/2}$
(B3)
where M is the number of design variables and $B_{*}$ is a bound on the ℓ∞ norm of the stochastic gradient,
$B_{*}^{2}\ge \mathbb{E}\left[\|G(x,\xi)\|_{\infty}^{2}\right]$
(B4)
Notice that the following relation holds for every optimization step,
$\frac{1}{\sqrt{\ln M}}\le\frac{B}{B_{*}\sqrt{\ln M}}\le\sqrt{\frac{M}{\ln M}}$
(B5)
and
$1\le\frac{B}{B_{*}}\le\sqrt{M}$
(B6)
According to Eqs. (B1), (B3), and (B6), the entropic ℓ1 norm MD-SA can be more accurate (in terms of the estimated error) than the ℓ2 norm MD-SA for large M. Therefore, the MD-SA with the ℓ1 norm is preferred for problems of larger dimension. In Sec. 6.1, the numerical comparison between the entropic ℓ1 and ℓ2 norm MD-SA algorithms shows that the entropic ℓ1 norm MD-SA is more accurate than the ℓ2 norm MD-SA for large problem sizes. This observation agrees with the theoretical comparison of the error estimates of the two algorithms. An important benefit of the ℓ1 norm MD-SA algorithm over the ℓ2 norm MD-SA (Euclidean SA) is the possibility of reducing the constant factor by adapting the norm and the distance-generating function to the geometry of the problem [22].
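The scaling behind this comparison is easy to check numerically. Assuming, purely for illustration, a dense stochastic gradient with i.i.d. Gaussian components, the ℓ2 norm B grows like $\sqrt{M}$ while the ℓ∞ norm $B_{*}$ grows only like $\sqrt{2\ln M}$, so the middle ratio in Eq. (B5) increases with the problem size:

```python
import numpy as np

rng = np.random.default_rng(0)
ratios = []
for M in (10_000, 40_000, 70_000, 100_000):  # problem sizes from example 1
    g = rng.standard_normal(M)               # stand-in stochastic gradient
    B = np.linalg.norm(g)                    # l2 norm, grows like sqrt(M)
    B_star = np.abs(g).max()                 # inf norm, grows like sqrt(2 ln M)
    ratios.append(B / (B_star * np.sqrt(np.log(M))))
print([round(r, 1) for r in ratios])  # ratio grows with M
```

The larger this ratio, the smaller the entropic ℓ1 bound (B3) relative to the Euclidean bound (B1), consistent with the numerical results of Sec. 6.1.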

## References

1. Bendsøe, M. P., Guedes, J. M., Haber, R. B., Pedersen, P., and Taylor, J. E., 1994, "An Analytical Model to Predict Optimal Material Properties in the Context of Optimal Structural Design," ASME J. Appl. Mech., 61(4), pp. 930–937. 10.1115/1.2901581
2. Sigmund, O., and Torquato, S., 1997, "Design of Materials With Extreme Thermal Expansion Using a Three-Phase Topology Optimization Method," J. Mech. Phys. Solids, 45(6), pp. 1037–1067. 10.1016/S0022-5096(96)00114-7
3. Wang, F., Sigmund, O., and Jensen, J. S., 2014, "Design of Materials With Prescribed Nonlinear Properties," J. Mech. Phys. Solids, 69, pp. 156–174. 10.1016/j.jmps.2014.05.003
4. Clausen, A., Wang, F., Jensen, J. S., Sigmund, O., and Lewis, J. A., 2015, "Topology Optimized Architectures With Programmable Poisson's Ratio Over Large Deformations," Adv. Mater., 27(37), pp. 5523–5527.
5. Ma, Z.-D., Kikuchi, N., Pierre, C., and Raju, B., 2005, "Multidomain Topology Optimization for Structural and Material Designs," ASME J. Appl. Mech., 73(4), pp. 565–573.
6. Aage, N., Andreassen, E., Lazarov, B. S., and Sigmund, O., 2017, "Giga-Voxel Computational Morphogenesis for Structural Design," Nature, 550(7674), pp. 84–86. 10.1038/nature23911
7. Liu, C., Du, Z., Zhang, W., Zhu, Y., and Guo, X., 2017, "Additive Manufacturing-Oriented Design of Graded Lattice Structures Through Explicit Topology Optimization," ASME J. Appl. Mech., 84(8), p. 081008. 10.1115/1.4036941
8. Zhang, X. S., Paulino, G. H., and Ramos, A. S., Jr., 2018, "Multi-Material Topology Optimization With Multiple Volume Constraints: A Ground Structure Approach Involving Material Nonlinearity," Struct. Multidisc. Optim., 57, pp. 161–182. 10.1007/s00158-017-1768-3
9. Sutradhar, A., Paulino, G., Miller, M., and Nguyen, T., 2010, "Topological Optimization for Designing Patient-Specific Large Craniofacial Segmental Bone Replacements," Proc. Natl. Acad. Sci. U.S.A., 107(30), pp. 13222–13227. 10.1073/pnas.1001208107
10. Nanthakumar, S., Lahmer, T., Zhuang, X., Park, H. S., and Rabczuk, T., 2016, "Topology Optimization of Piezoelectric Nanostructures," J. Mech. Phys. Solids, 94, pp. 316–335. 10.1016/j.jmps.2016.03.027
11. Yun, K.-S., and Youn, S.-K., 2018, "Microstructural Topology Optimization of Viscoelastic Materials of Damped Structures Subjected to Dynamic Loads," Int. J. Solids Struct., 147, pp. 67–79. 10.1016/j.ijsolstr.2018.04.022
12. Chen, W., and Huang, X., 2019, "Topological Design of 3D Chiral Metamaterials Based on Couple-Stress Homogenization," J. Mech. Phys. Solids, 131, pp. 372–386. 10.1016/j.jmps.2019.07.014
13. Aguiló, M., Blacker, T., and Paulino, G., 2019, USACM Thematic Conference Topology Optimization Roundtable: Challenge Problem Competition, SIMULIA from Dassault Systèmes, http://paulino.ce.gatech.edu/TopOpt%20Workshop%20Website/challenge.php, Accessed March 10, 2019.
14. Diaz, A. R., and Bendsøe, M. P., 1992, "Shape Optimization of Structures for Multiple Loading Conditions Using a Homogenization Method," Struct. Optim., 4(1), pp. 17–22. 10.1007/BF01894077
15. Bendsøe, M. P., Ben-Tal, A., and Zowe, J., 1994, "Optimization Methods for Truss Geometry and Topology Design," Struct. Optim., 7(3), pp. 141–159. 10.1007/BF01742459
16. Bendsøe, M. P., and Sigmund, O., 2003, Topology Optimization: Theory, Methods, and Applications, Springer, Berlin, Germany.
17. Achtziger, W., 1998, "Multiple-Load Truss Topology and Sizing Optimization: Some Properties of Minimax Compliance," J. Optim. Theory Appl., 98(2), pp. 255–280. 10.1023/A:1022637216104
18. Zhang, X. S., de Sturler, E., and Paulino, G. H., 2017, "Stochastic Sampling for Deterministic Structural Topology Optimization With Many Load Cases: Density-Based and Ground Structure Approaches," Comput. Methods Appl. Mech. Eng., 325, pp. 463–487. 10.1016/j.cma.2017.06.035
19. Haber, E., Chung, M., and Herrmann, F., 2012, "An Effective Method for Parameter Estimation With PDE Constraints With Multiple Right-Hand Sides," SIAM J. Optim., 22(3), pp. 739–757. 10.1137/11081126X
20. Roosta-Khorasani, F., and Ascher, U., 2015, "Improved Bounds on Sample Size for Implicit Matrix Trace Estimators," Found. Comput. Math., 15(5), pp. 1187–1212. 10.1007/s10208-014-9220-1
21. Neelamani, R., Krohn, C. E., Krebs, J. R., Romberg, J. K., Deffenbaugh, M., and Anderson, J. E., 2010, "Efficient Seismic Forward Modeling Using Simultaneous Random Sources and Sparsity," Geophysics, 75(6), pp. WB15–WB27. 10.1190/1.3509470
22. Nemirovski, A., Juditsky, A., Lan, G., and Shapiro, A., 2009, "Robust Stochastic Approximation Approach to Stochastic Programming," SIAM J. Optim., 19(4), pp. 1574–1609. 10.1137/070704277
23. Bottou, L., Curtis, F. E., and Nocedal, J., 2018, "Optimization Methods for Large-Scale Machine Learning," SIAM Rev., 60(2), pp. 223–311. 10.1137/16M1080173
24. Goodfellow, I., Bengio, Y., and Courville, A., 2016, Deep Learning, MIT Press, Cambridge, MA.
25. Nemirovski, A., and Yudin, D., 1978, "On Cezari's Convergence of the Steepest Descent Method for Approximating Saddle Point of Convex-Concave Functions," 239, pp. 1056–1059.
26. Nemirovski, A., and Yudin, D., 1983, Problem Complexity and Method Efficiency in Optimization, John Wiley, New York.
27. Bourdin, B., 2001, "Filters in Topology Optimization," Int. J. Numer. Methods Eng., 50(9), pp. 2143–2158. 10.1002/nme.116
28. Bendsøe, M. P., 1989, "Optimal Shape Design As a Material Distribution Problem," Struct. Optim., 1(4), pp. 193–202. 10.1007/BF01650949
29. Zhou, M., and Rozvany, G., 1991, "The COC Algorithm, Part II: Topological, Geometrical and Generalized Shape Optimization," Comput. Methods Appl. Mech. Eng., 89(1–3), pp. 309–336. 10.1016/0045-7825(91)90046-9
30. Petersson, J., 1999, "A Finite Element Analysis of Optimal Variable Thickness Sheets," SIAM J. Numer. Anal., 36(6), pp. 1759–1778. 10.1137/S0036142996313968
31. Hutchinson, M., 1989, "A Stochastic Estimator of the Trace of the Influence Matrix for Laplacian Smoothing Splines," Commun. Stat. Simul. Comput., 18(3), pp. 1059–1076. 10.1080/03610918908812806
32. Avron, H., and Toledo, S., 2011, "Randomized Algorithms for Estimating the Trace of An Implicit Symmetric Positive Semi-Definite Matrix," J. ACM, 58(2), pp. 1–34. 10.1145/1944345.1944349
33. Dilworth, S. J., and Montgomery-Smith, S. J., 1993, "The Distribution of Vector-Valued Rademacher Series," Ann. Probab., 21(4), pp. 2046–2052.
34. Shalev-Shwartz, S., 2011, "Online Learning and Online Convex Optimization," Found. Trends Mach. Learn., 4(2), pp. 107–194. 10.1561/2200000018
35. Shapiro, A., Dentcheva, D., and Ruszczyński, A., 2009, Lectures on Stochastic Programming: Modeling and Theory, SIAM, Philadelphia, PA.
36. Kirkpatrick, S., Gelatt, C. D., Jr., and Vecchi, M. P., 1983, "Optimization by Simulated Annealing," Science, 220(4598), pp. 671–680. 10.1126/science.220.4598.671
37. Salamon, P., Sibani, P., and Frost, R., 2002, Facts, Conjectures, and Improvements for Simulated Annealing, SIAM, Philadelphia, PA.
38. Robbins, H., and Monro, S., 1951, "A Stochastic Approximation Method," Ann. Math. Stat., 22(3), pp. 400–407. 10.1214/aoms/1177729586
39. Polyak, B. T., 1990, "New Stochastic Approximation Type Procedure," Avtomat. i Telemekh., 7, pp. 98–107.
40. Polyak, B. T., and Juditsky, A. B., 1992, "Acceleration of Stochastic Approximation by Averaging," SIAM J. Control Optim., 30(4), pp. 838–855. 10.1137/0330046
41. Kilmer, M., and de Sturler, E., 2006, "Recycling Subspace Information for Diffuse Optical Tomography," SIAM J. Sci. Comput., 27(6), pp. 2140–2166. 10.1137/040610271
42. Wang, S., de Sturler, E., and Paulino, G. H., 2007, "Large-Scale Topology Optimization Using Preconditioned Krylov Subspace Methods With Recycling," Int. J. Numer. Methods Eng., 69(12), pp. 2441–2468. 10.1002/nme.1798
43. Mello, L. A. M., de Sturler, E., Paulino, G. H., and Silva, E. C. N., 2010, "Recycling Krylov Subspaces for Efficient Large-Scale Electrical Impedance Tomography," Comput. Methods Appl. Mech. Eng., 199(49–52), pp. 3101–3110. 10.1016/j.cma.2010.06.001
44. Paige, C. C., and Saunders, M. A., 1975, "Solutions of Sparse Indefinite Systems of Linear Equations," SIAM J. Numer. Anal., 12(4), pp. 617–629. 10.1137/0712047
45. Saad, Y., 2003, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM, Philadelphia, PA.
46. Eiermann, M., Ernst, O. G., and Schneider, O., 2000, "Analysis of Acceleration Strategies for Restarted Minimal Residual Methods," J. Comput. Appl. Math., 123(1–2), pp. 261–292. 10.1016/S0377-0427(00)00398-8
47. Hestenes, M. R., and Stiefel, E., 1952, "Methods of Conjugate Gradients for Solving Linear Systems," J. Res. Natl. Bur. Stand., 49(6), pp. 409–436. 10.6028/jres.049.044
48. Meijerink, J., and Van der Vorst, H., 1977, "An Iterative Solution Method for Linear Equation Systems of Which the Coefficient Matrix is a Symmetric M-Matrix," Math. Comput., 31(137), pp. 148–162.
49. Gaul, A., and Schlömer, N., 2015, "Preconditioned Recycling Krylov Subspace Methods for Self-Adjoint Problems," Electron. Trans. Numer. Anal., 44, pp. 522–547.
50. Talischi, C., Paulino, G. H., Pereira, A., and Menezes, I. F. M., 2012, "PolyTop: A Matlab Implementation of a General Topology Optimization Framework Using Unstructured Polygonal Finite Element Meshes," Struct. Multidisc. Optim., 45(3), pp. 329–357. 10.1007/s00158-011-0696-x