Abstract

In the last two decades, increased need for high-fidelity simulations of the time evolution and propagation of forces in granular media has spurred a renewed interest in the discrete element method (DEM) modeling of frictional contact. Force penalty methods, while economic and widely accessible, introduce artificial stiffness, requiring small time steps to retain numerical stability. Optimization-based methods, which enforce contacts geometrically through complementarity constraints leading to a differential variational inequality problem (DVI), allow for the use of larger time steps at the expense of solving a nonlinear complementarity problem (NCP) each time-step. We review the latest efforts to produce solvers for this NCP, focusing on its relaxation to a cone complementarity problem (CCP) and solution via an equivalent quadratic optimization problem with conic constraints. We distinguish between first-order methods, which use only gradient information and are thus linearly convergent and second-order methods, which rely on a Newton type step to gain quadratic convergence and are typically more robust and problem-independent. However, they require the approximate solution of large sparse linear systems, thus losing their competitive advantages in large scale problems due to computational cost. In this work, we propose a novel acceleration for the solution of Newton step linear systems in second-order methods using low-rank compression based fast direct solvers, leveraging on recent direct solver techniques for structured linear systems arising from differential and integral equations. We employ the quantized tensor train (QTT) decomposition to produce efficient approximate representations of the system matrix and its inverse. This provides a versatile and robust framework to accelerate its solution using this inverse in a direct or a preconditioned iterative method. We demonstrate compressibility of the Newton step matrices in primal dual interior point (PDIP) methods as applied to the multibody dynamics problem. Using a number of numerical tests, we demonstrate that this approach displays sublinear scaling of precomputation costs, may be efficiently updated across Newton iterations as well as across simulation time steps, and leads to a fast, optimal complexity solution of the Newton step. This allows our method to gain an order of magnitude speedups over state-of-the-art preconditioning techniques for moderate to large-scale systems, hence mitigating the computational bottleneck of second-order methods.

1 Introduction

The discrete element method (DEM) [1] is one of the most widely used approaches to simulate the multibody dynamics such as in granular materials. This method considers the granular medium as a collection of discrete particles; each responding to body forces such as gravity, inertia or drag, as well as repulsive or dissipative forces caused by contact. Materials in granular form are omnipresent in industry and understanding their dynamics is crucial for a broad range of application fields, including terramechanics, active media, additive manufacturing, nanoparticle self-assembly, avalanche dynamics, composite materials, pyroclastic flows, etc.

Distinct instances of the DEM are defined essentially by their modeling of contact forces, and thus in how collisions are handled. We distinguish two main types of discrete element methods in the literature: force penalty methods (DEM-P) and complementarity formulations (DEM-C). Penalty methods introduce one or multiple layers of spring-like forces between objects in contact, and may introduce additional fields to represent friction. These methods are widely used due to being computationally inexpensive and easy to implement. However, in many cases they introduce high stiffness, requiring extremely small time steps to retain stability during collisions. An in-depth and informative comparison between the two can be found in Ref. [2].

DEM-C methods such as in Refs. [3] and [4] enforce contacts using complementarity constraints, leading to a differential variational inequality problem upon discretization. This allows for the use of larger time steps in its integration, as contacts are enforced geometrically. Given a time-stepping scheme for this problem, a nonlinear complementarity problem (NCP) must be solved to compute the corresponding contact forces at each time-step. One common way to solve this optimization problem is via a linearization of the constraints, producing a linear complementarity problem [5,6]. An alternative relaxation method produces an equivalent, convex cone complementarity problem (CCP). Efficient quadratic cone programming techniques such as those in Refs. [711], and [4] have been proposed to solve the CCP.

In Ref. [12], the authors performed a comparison of these quadratic programming techniques focused on determining which class of methods performed best: second-order optimization methods, i.e., those that use Hessian information (Interior Point), or first-order methods, i.e., methods using only gradient information (Jacobi, Gauss-Seidel, Projected Gradient Descent). Second-order methods display quadratic convergence in a neighborhood of the solution, and thus their convergence is much faster and more problem-independent than that of linearly convergent first-order methods, requiring at least 1 to 2 orders of magnitude less iterations to reach a desired accuracy across experiments in this work. However, each iteration in second-order methods requires the solution of a (generally sparse) linear system, which can become costly as the dimensions of the many-body problem increase. For large problems, the added computational effort ultimately eclipses the gains obtained from the reduction in iteration counts.

The preferred solution method for large sparse systems is often a Krylov subspace iterative method. The iteration counts, and thus the performance of these methods are known to be directly affected by the eigenspectrum of the associated matrices. Preconditioning techniques can be used to cluster the eigenvalues away from zero and drastically reduce iteration counts; we refer the reader to Ref. [13] for a general review. Generic sparse preconditioners are most typically based on incomplete or sparsified factorizations, such as the well-known Incomplete LU and Cholesky methods [14]. In Ref. [12], a fast, parallel SaP (split and paralellize) [15] preconditioner was used to accelerate the primal dual interior point (PDIP) method, garnering reductions in iteration counts and execution times.

Most general-purpose preconditioners suffer a trade-off between precomputation costs and the resulting reduction in iteration counts, and their performance is often problem-dependent. Moreover, in the context of optimization problems such as the ones we are interested in, system matrices change every iteration, and it is often impossible or expensive to update the associated preconditioners. Finally, improving their scaling with the number of degrees-of-freedom and the level of sparsity is extremely important, as it is central to remaining competitive in large-scale problems.

In the last decades, a continued effort has been made to produce direct solvers for structured linear systems arising from differential and integral equations. These solvers entirely side-step the challenges related to convergence speed of iterative solvers. They can also lead to dramatic improvements in speed, in particular in situations where a large number of linear systems with coefficient matrices that stay fixed or can be updated via low rank modifications. Additionally, they also provide a methodology to produce robust preconditioners: low accuracy direct solvers may be used in conjunction with iterative refinement or the Krylov subspace method of choice; the use of an approximate low accuracy inverse requires less memory and precomputation time than a direct solver, at the expense of a slight increase in the number of iterations. These features have led to the adoption of these solvers in other areas of scientific computing and statistics.

In Ref. [16], an effective and memory-efficient solver based on the quantized tensor train decomposition (QTT) was presented. By recasting system matrices as tensors, the tensor train (TT) compression and inversion routines were used to produce direct solvers and robust preconditioners for integral equations in complex geometries in three dimensions. Key properties of this solver that differentiate it from other hierarchical matrix approaches feature sublinear computational costs and memory requirements with problem size N, as well as techniques to produce economic updates for a matrix and its inverse across time-steps, even when matrix size changes.

The main goal of this work is to employ a QTT-based approach to provide a radical speed up to second-order optimization methods. We demonstrate its application to interior point methods such as the PDIP in the context of many-body dynamics problems; however, we expect our discussion to apply with little to no modification to a general class of Newton and quasi-Newton type methods. We first study compressibilty of the system matrices and their inverses in TT format for a range of target accuracies. We then demonstrate how factorization re-use can provide significant speed-ups to precomputation costs, reducing costs by orders of magnitude. For three validation tests of common soil mechanics phenomena—sedimentation, blade drafting and direct shear experiments—we show that a TT-based preconditioner displays efficient and robust performance for problems with > 104 bodies, garnering up to an order of magnitude speed-up and greatly improved iteration counts when compared with state-of-the-art ILU-preconditioned methods. We also confirm an extremely significant gain in scaling of precomputation costs: precomputation for the TT preconditioner is sublinear (for all practical purposes, constant) as the number of collisions and matrix size increase.

The QTT decomposition is one of several approaches for approximate solution of linear systems based in hierarchical compression. In the context of sparse structured systems and related factorizations, work has been done for a number of hierarchical matrix formats: hierarchically semi separable [1723], Hmatrices: [2426], and fast multipole methods [27,28]. Producing efficient, global factorization updates and dealing with high storage costs is an ongoing challenge in these alternate formats.

2 Mathematical Preliminaries: Dynamics of Rigid Bodies in the Presence of Friction

2.1 Notation and Glossary.

In order to ensure clarity of exposition, we include list of notation and terminology employed throughout this work. The tensor train acceleration proposed involves the manipulation of vectors, matrices and tensors of different dimensions, as well as the associated indexing and operators (Table 1).

We include a glossary and notation for important quantities and parameters used throughout the text in Table 2.

Table 1

Notation guide for mathematical objects, indices and operators

ObjectsNotationDescription
b, q(⋅)Scalars, scalar valued functions (default math)
γ,f(·)Vectors, vector valued functions (lowercase bold math)
A,K(·)Matrices, matrix valued functions (uppercase math serif)
b,A,F(·)Tensors, tensor valued functions (chancery calligraphic)
ATensor Train decomposition (uppercase typewriter)
TTree data structure (uppercase calligraphic)
Indexing
γiith vector linked to ith object (e.g., rigid body, contact pair)
γ(i)ith entry of vector γ
A(i1,i2,,id)(i1,i2,,id) entry of tensor A
i=i1i2id¯“flattened” (vectorized) multi-index by lexicographical order
Operators
vec(A)Vectorize tensor A according to flattened index i=i1i2id¯
reshape(·,δ)Reshape array or index given compatible dimension vector δ
permute(A,p)Permute tensor dimensions given permutation vector p
diag(λ)Diagonal matrix generated by vector λ
ObjectsNotationDescription
b, q(⋅)Scalars, scalar valued functions (default math)
γ,f(·)Vectors, vector valued functions (lowercase bold math)
A,K(·)Matrices, matrix valued functions (uppercase math serif)
b,A,F(·)Tensors, tensor valued functions (chancery calligraphic)
ATensor Train decomposition (uppercase typewriter)
TTree data structure (uppercase calligraphic)
Indexing
γiith vector linked to ith object (e.g., rigid body, contact pair)
γ(i)ith entry of vector γ
A(i1,i2,,id)(i1,i2,,id) entry of tensor A
i=i1i2id¯“flattened” (vectorized) multi-index by lexicographical order
Operators
vec(A)Vectorize tensor A according to flattened index i=i1i2id¯
reshape(·,δ)Reshape array or index given compatible dimension vector δ
permute(A,p)Permute tensor dimensions given permutation vector p
diag(λ)Diagonal matrix generated by vector λ

Note: Font type and specifications employed to distinguish classes of mathematical objects is indicated with emphasized text. Standard notation is used for matrix indices and operations (e.g., transpose, inverse).

Table 2

Glossary of commonly used terms

MNumber of rigid particles in contact dynamics system.
Biith rigid body with location in generalized coordinates qi.
Fi,TiApplied force and torque vectors on Bi.
ui,ωiTranslational and rotational velocities of body Bi.
viRigid body velocity vector vi=(ui,ωi).
NcNumber of contacts, with locations ci in 3
Φi(q)Signed distance function for contact ci.
γiContact impulse vector γi=(γi,n,γi,1,γi,2)
μiSliding contact friction coefficient
dTensor or tensor train decomposition dimension
rkkth tensor train rank; r denotes maximum rank.
εAccuracy parameter; subindices used for different methods.
MNumber of rigid particles in contact dynamics system.
Biith rigid body with location in generalized coordinates qi.
Fi,TiApplied force and torque vectors on Bi.
ui,ωiTranslational and rotational velocities of body Bi.
viRigid body velocity vector vi=(ui,ωi).
NcNumber of contacts, with locations ci in 3
Φi(q)Signed distance function for contact ci.
γiContact impulse vector γi=(γi,n,γi,1,γi,2)
μiSliding contact friction coefficient
dTensor or tensor train decomposition dimension
rkkth tensor train rank; r denotes maximum rank.
εAccuracy parameter; subindices used for different methods.

2.2 Problem Formulation.

We consider a granular material comprised of M rigid particles Bi in 3; the position of each body is uniquely described by the coordinates xi for its center of mass and the rotation Qi of a reference frame fixed to the body, represented by a unimodular quaternion. Quaternion representation of rotations in three dimensions is usually favored due to being compact and numerically stable; since we require the four component quaternions to be length one, it introduces only three independent degrees-of-freedom per particle. Let qi=(xi,Qi) then be the six generalized coordinates for the ith body, and q=(q1,q2,,qM)6M.

Each particle's motion can thus be understood as a translation of its center, with velocity ui and a rotation of its frame, with angular velocity ωi. Applying Newton's second law, we relate the corresponding accelerations to the total force Fi and torque Ti applied to it. The general equations of motion are then given by
(1)
(2)

where L(q) is a linear operator that relates velocities to the rate of change in generalized coordinates, M is the mass matrix and v,fB,fC6M contain each body's translational and rotational velocities, the total body forces and torques applied to them and the reaction forces and torques due to contact dynamics, respectively.

2.3 Complementarity Contact Model.

We impose the following contact constraints: no two bodies should penetrate, and if there is contact, a normal force and a tangential frictional force act at the interface. Consider two bodies Bi1 and Bi2, and let Φi(q) be an unsigned distance function (also known as a gap function) for the pair i=(i1,i2) satisfying Φi(q)>0 if the two bodies are separated, Φi(q)=0 if they are touching, Φi(q)<0 otherwise.

If the pair of bodies touch (Φi(q)=0), let n,t1,t2 be unit normal and tangential vectors at the point of contact. Contact forces fN=γi,nn and fT=γi,1t1+γi,2t2 are then applied to each body in opposite directions. The complementarity constraint for the normal force (3) prevents penetration, enforcing that bodies move away from each other at contact. The Coulomb friction model ties the magnitudes of the normal and tangential forces. Using a maximum dissipation principle, the friction force is posed as the solution to an optimization problem
(3)
(4)

where μ is the static friction coefficient. The feasible set of forces f=fN+fT for the minimization problem in Eq. (4) are known as a friction coneϒi={(γi,n,γi,1,γi,2)|||(γi,1,γi,2)||μγi,n}. The complementarity condition in Eq. (3) is typically abbreviated using the notation 0γi,nΦi(q)0. We note that an alternate complementarity formulation for Eq. (3) allowing for partially elastic collisions can be obtained by replacing Φi with the normal velocity as discussed in Refs. [5] and [29].

Since all forces are zero in the absence of contact, we wish to incorporate contact constraints only for pairs of objects approaching collision (e.g., within the next timestep). For this purpose, we define the set A of pairs of bodies separated by a distance smaller than a threshold δ > 0
(5)

For a system with Nc=|A(q,δ)| contacts, this adds a number of constraints to the equations of motion proportional to Nc. We note that for dense granular flows, Nc itself scales as O(M) for M bodies.

Now, let Di=[Di,nDi,1Di,2]6M×3 map the multipliers γi,n,γi,1,γi,2 to the forces and torques applied to bodies Bi1 and Bi2 at contact. Then, the equations of motion result in the differential variational inequality
(6)
(7)
(8)
(9)
A semi-implicit, first-order integration scheme is then used to advance this system in time. Given position qk and velocity vk at a given time-step tk and step size Δt, velocity vk+1 and contact forces are solved via a NCP. The new velocity is used to evolve the position in time.
(10)
(11)
(12)
(13)

We note that Eq. (12) is obtained from Eq. (8) via a linearization, dividing by Δt (which does not affect complementarity, but is numerically desirable). This makes the future velocity vk+1 the sole variable needed to enforce the complementarity condition. We also note that in this discretization, (γi,n,γi,1,γi,2) constitute contact impulses, i.e., force magnitudes multiplied by the step length Δt.

2.4 Solving the Optimization Problem.

A relaxation over the complementarity constraint (12) can be introduced [3], turning the problem into a convex, second-order CCP
(14)
The solution of this relaxed problem approaches the solution of the NCP as the step-size Δt goes to zero. Additionally, the CCP is equivalent to the Karush Kuhn Tucker (KKT) first-order optimality conditions for a quadratic optimization problem with conic constraints. We define a contact transformation matrix D=[D1;D2;DNc]6M×3Nc, a matrix N and vector r
(15)
(16)
where b=[b1;b2;bNc],bi=[Φik/Δt;00]3 and k=Mvk+Δtfk. We note that N is a 3Nc×3Nc symmetric positive semidefinite matrix and typically sparse. The aforementioned quadratic program is then given by
(17)
(18)

While this relaxed problem may introduce artifacts when step size Δt, sliding velocity or friction are large, it enables the use of a wide range of quadratic programming methods for its solution:

  • Projected Jacobi and Gauss–Seidel methods [4].

  • Projected gradient descent methods like accelerated projected gradient descent [10], Barzilai and Borwein [30] and the Kucera and Preconditioned spectral projected gradient with fallback methods in Ref. [8].

  • Krylov subspace methods: Gradient projected minimum residual in Ref. [8].

  • PDIP [7,11,3134].

  • Symmetric cone interior point [9].

Projected Jacobi and Gauss–Seidel methods, while requiring only fast matrix applies of N, have slow, linear convergence, often requiring thousands of iterations to obtain a significant reduction for the objective function. The projected gradient descent and Krylov subspace methods have since been proposed, providing considerable iteration count reductions while retaining cost-efficiency per time-step. However, they remain linearly convergent, and they require an increasing number of iterations as problem size increases for a variety of problems of interest.

Interior point methods are often based on a modified Newton or quasi-Newton step, displaying quadratic convergence near minima and iteration counts which are less dependent on problem size. However, they require the approximate solution of large linear systems in order to produce the Newton step, thus losing this competitive advantage due to computational cost.

2.5 Overview of Interior Point Methods.

Interior point methods, also known as barrier methods, are a class of algorithms tailored to solve constrained convex optimization problems [35]. That is
(19)
(20)
for fiC2(n) and convex. They proceed by transforming this problem into an unconstrained minimization problem, encoding the feasible set defined by the constraints using a barrier function, e.g., the logarithmic barrier Bτ(z)=(1/τ)log(z). They then pose the unconstrained convex problem
(21)

The parameter τ controls the strength of the barrier, and as τ,Bτ(z)I(z) with I(z) the indicator function over (–, 0], and the problem in Eq. (21) becomes equivalent to the original constrained program defined by Eqs. (19) and (20). Given the set of problems defined by Eq. (21), interior point methods proceed by following along the corresponding central path{γ*(τ):τ>0} of optima, which are located inside of the feasible set, and converge to the solution of the original problem as τ.

2.4.1 Primal-Dual Interior Point Methods.

Primal-dual methods proceed by defining a path of solutions (γ*,λ*). These can be obtained by considering the KKT conditions of Eq. (21)
(22)
(23)
We then define the lagrange multipliers λi=(1/τfi(γ)), setting up the following conditions:
(24)
(25)
(26)
(27)

We note that, as τ, Eq. (26) becomes a strict complementarity condition.

2.4.2 The PDIP Step.

The PDIP step is then obtained by applying Newton's algorithm to solve these KKT conditions. Defining the residual of the system given by Eqs. (26) and (27)rτ(γ,λ) as
(28)
the Newton step is then defined by the solution of the linear system given by
(29)
this is typically coupled with a backtracking line search, and a strategy to increase τ until sufficient convergence to the solution of the original problem is satisfied.

2.4.3 Application to the CCP.

For the equivalent quadratic program with conic constraints in Eqs. (17) and (18), we define f0(γ)=q(γ)=(1/2)γTNγ+rTγ and 2Nc inequality constraints (Eq. (18)) given by fi(γ) as
(30)
the Newton equations that define the PDIP step thus require the solution of a linear system of the form
(31)

where M̂ is a diagonal matrix defined by M̂=i=12Ncλi2fi(γ)=diag(m̂), with m̂=[μ12λ1,λ1,λ1,,μ12λNc,λNc,λNc]. B,C are banded rectangular matrices, and E is diagonal. A typical sparsity pattern for a multibody dynamics problem is shown in Fig. 1.

Fig. 1
Sparsity structure of PDIP Newton system matrix for a general multibody dynamics problem
Fig. 1
Sparsity structure of PDIP Newton system matrix for a general multibody dynamics problem
Close modal
In order to find the Newton step, we must then solve this sparse linear system. We may either proceed directly, or by eliminating Δλ, a reduced, symmetric positive definite Schur complement matrix S of size 3Nc × 3Nc for Δγ can be obtained
(32)

We note that working with this Schur complement matrix S is widely preferred, as it is symmetric positive definite and smaller in size. We additionally observe that its sparsity pattern across all PDIP iterations is always that of matrix N. That is due to the fact that M̂ is diagonal and BE1C is block-diagonal (3 × 3 blocks). Finally, in all of our experiments, S was relatively more compressible in the Tensor train format. For these reasons, our discussion and acceleration efforts below center around solving the associated Schur complement system.

Most state-of-the-art methods for the direct solution and preconditioning of this system, the Tensor train decomposition included, will attempt to exploit sparsity structure in S. We recall that N is the Hessian of the quadratic q(γ) given by DTM1D (Eq. (15)), with D6M×3Nc contact transformation matrix and M diagonal mass matrix. In order to understand the sparsity pattern of N, we partition it into 3 × 3 blocks corresponding to each contact. The (, i)th block of D is nonzero if the ith contact involves body B. As a result, the (i, j)th block of N is nonzero if the ith and jth contacts share a body in common.

Let us consider a toy contact problem involving three spherical objects lying on a flat surface as shown in Fig. 2(a). Given the network of bodies at contact in Fig. 2(b), the block sparsity pattern in N corresponds to the edge-adjacency graph in Fig. 2(c); two edge nodes (indexed by body pairs) are connected if they share a body in common. The resulting pattern for this matrix (and thus for S) for this toy problem is shown in Fig. 3.

Fig. 2
Collision graphs example: (a) Configuration of three spherical bodies lying on a flat surface. Centers of mass are depicted as red circles, collision points as white squares, (b) graph connecting bodies that are in contact, and (c) graph connecting contacts sharing a body in common. This graph reveals the block-sparsity pattern for N and S.
Fig. 2
Collision graphs example: (a) Configuration of three spherical bodies lying on a flat surface. Centers of mass are depicted as red circles, collision points as white squares, (b) graph connecting bodies that are in contact, and (c) graph connecting contacts sharing a body in common. This graph reveals the block-sparsity pattern for N and S.
Close modal
Fig. 3
TT decomposition for a matrix with d = 3. Each level of refinement for the matrix block hierarchy corresponds to one tensor dimension and TT core. Columns of the corresponding unfolding matrix are obtained by vectorizing matrix blocks (in red if nonzero, white otherwise). The example illustrated above uses an interpolatory low rank decomposition, producing a uniform subsampling of tree nodes. The TT decomposition process thus consists of finding a hierarchical basis of matrix block entries. We show the main steps in algorithm 1: (i) computing a low rank decomposition UkVk for the corresponding unfolding matrix Mk, (ii) taking Uk (in green) as the kth TT core Gk, and (iii) interpreting the right factor Vk as a matrix in level k + 1 to form Mk+1.
Fig. 3
TT decomposition for a matrix with d = 3. Each level of refinement for the matrix block hierarchy corresponds to one tensor dimension and TT core. Columns of the corresponding unfolding matrix are obtained by vectorizing matrix blocks (in red if nonzero, white otherwise). The example illustrated above uses an interpolatory low rank decomposition, producing a uniform subsampling of tree nodes. The TT decomposition process thus consists of finding a hierarchical basis of matrix block entries. We show the main steps in algorithm 1: (i) computing a low rank decomposition UkVk for the corresponding unfolding matrix Mk, (ii) taking Uk (in green) as the kth TT core Gk, and (iii) interpreting the right factor Vk as a matrix in level k + 1 to form Mk+1.
Close modal

3 The Tensor Train Solver

In the context of multibody dynamics, we know the system matrices for the Newton step in Eq. (31) to be sparse and highly structured, and dependent on the iterates (γ,λ) as the PDIP iteration proceeds to the solution of problem (17). Further, from one time-step to the next, we expect the matrix required to obtain the Newton step to change in size as the active set of constraints (corresponding to pairs of objects in contact) evolves. It is because of these changes both within and between timesteps that producing an efficient and robust solution technique for the Newton system remains challenging.

Among currently available hierarchical compression techniques, the TT decomposition features compression and inversion algorithms that are applicable to a large set of structured matrices and that lend themselves to inexpensive global updates. In addition, they have shown to achieve sublinear precomputation times. We thus propose to use it as a framework for direct solution and preconditioning of iterative solvers for the linear systems in each PDIP iteration.

In this section, we give a cursory description of the QTT decomposition as a method to efficiently compress, invert and perform fast arithmetic with approximate representations of structured matrices. We then present a general discussion of its application to solving the linear systems associated with the PDIP for the CCP. This constitutes, to our knowledge, the first application of hierarchical compression solvers to the acceleration of second-order optimization methods. Although it is beyond the scope of this work, we expect the techniques laid out in this section to be readily applicable to a more general class of interior point and other Newton and quasi-Newton based methods for smooth convex problems.

3.1 The Tensor Train Decomposition.

The TT decomposition provides a powerful tensor compression technique [36] via low rank representation akin to that of a generalized singular value decomposition. We will focus on its application in the approximation of tensorized vectors and matrices given a hierarchical subdivision of their indices, known as QTT. Matrices in this setting are further interpreted as tensorized operators acting on such tensorized vectors. We outline how this interpretation allows us to effectively employ the TT as a tool for hierarchical compression and inversion of structured matrices.

3.1.1 Motivating Example.

Suppose we wish to compress the three-tensor A(i1,i2,i3) obtained from sampling the function f(x,y,z)=sin(x+y+z) on a uniform grid with n3 points (xi1,yi2,zi3) in [0,1]3. We can use addition formulas to decompose f(x,y,z) as a sum of separable functions
(33)
(34)

Evaluation of Eqs. (33) and (34) on the tensor A is depicted in Fig. 4. The first step (33) produces a rank 2 decomposition A(i1,i2,i3)=G1(i1,α1)V1(α1,i2,i3). In Eq. (34) we then decompose V1 to separate dependency of y and z; each of the two terms results in a rank 2 decomposition V1(α1,i2,i3)=G2(α1,i2,α2)G3(α3,i3). We note that each term Gk (depicted in solid yellow, blue and red in Fig. 4) depends on one dimension ik of the original tensor A, and storage has been reduced from n3 to 2n + 4n + 2n = 8n.

Fig. 4
TT decomposition example: Compression of three tensor from samples of f(x,y,z)=sin(x+y+z) through a chain of two rank 2 matrix decompositions corresponding to angle addition formulas in Eqs. (33) and (34). TT cores Gi are obtained through a chain of low rank matrix decompositions of unfolding matrices.
Fig. 4
TT decomposition example: Compression of three tensor from samples of f(x,y,z)=sin(x+y+z) through a chain of two rank 2 matrix decompositions corresponding to angle addition formulas in Eqs. (33) and (34). TT cores Gi are obtained through a chain of low rank matrix decompositions of unfolding matrices.
Close modal

What we have presented in this example is an exact tensor train decomposition of A. We present the analogous definition for the TT decomposition of a d dimensional tensor.

Definition 1. For a d-dimensional tensorA(i1,i2,,id),iknk, sampled atN=k=1dnkpoints, a TT decomposition is of the form
(35)
where, each two- or three-dimensionalGkis known as a tensor core. The ranges of auxiliary indicesαk=1,,rkdetermine the number of terms in the decomposition. We refer to rk as the kth TT-rank, analogous to matrix numerical rank.

In order to understand the chain of “low rank” decompositions in the tensor train format in terms of matrix low rank decompositions, we introduce auxiliary objects known as unfolding matrices.

Definition 2. For a tensor of dimension d, the kth unfolding matrix is defined as
(36)
wherepk=i1ik¯andqk=ik+1id¯are two flattened indices. Using Matlab's notation
(37)
The ranks of the TT decomposition are thus the ranks of unfolding matrices. For instance, if we consider the first unfolding matrix A1 of tensor A in our example, its rows will depend on xi1, and its columns on yi2,zi3. Equation (33) clearly implies that A1 is exactly of column rank 2, and the resulting matrix decomposition
(38)
is interchangeable with the first rank 2 decomposition of A, requiring only to merge indices i2, i3 in V1. We may similarly show that A2 is of rank 2 (by applying addition formulas for (x+y) and z); however, the second rank 2 approximation in the chain is obtained by decomposing an unfolding matrix of V1
(39)

As is the case with low rank matrix decompositions, most often a tensor A of interest will be approximately of low rank, in the sense that given a target accuracy ε, a TT decomposition with low TT ranks A may be found such that ||AA||F<ε. This decomposition can be obtained by a sequence of low-rank approximations to Ak. A generic algorithm proceeds as in Algorithm 1.

Due to the cost of successive low rank factorizations, implementing Algorithm 1 will predictably lead to relatively high computational cost, O(N) or higher, which is exponential in the tensor dimension d. Instead, we employ TT rank revealing strategies based on the multipass alternating minimal energy (AMEN) cross algorithm of Ref. [36], in which a low-TT-rank approximation is initially computed with fixed ranks and is improved upon by a series of passes through all cores.

3.1.2 TT Compression Update.

We note that, since these algorithms obtain the TT approximation based on an iterative, greedy rank detection procedure, they allow for inexpensive updates to a TT compressed representation of a tensor A. If we wish to produce the TT representation of Ã=A+E, the AMEN Cross algorithm may be started using the TT representation of A as an initial guess. If E has small TT ranks, this provides a significant speed-up for this compression algorithm, which often converges in a few iterations to the updated representation.

3.1.3 Computational Complexity and Memory Requirements.

Because AMEN cross and related TT rank revealing approaches proceed by enriching low-TT-rank approximations, all computations are performed on matrices of size rk1nk×rk or less. In Ref. [16], it is shown that the complexity for this algorithm is thus bounded by O(r3d) or equivalently O(r3logN), where r=max(rk) is the maximal TT-rank that may be a function of sample size N and accuracy ε. For a large number of structured matrices as well as their inverses, r typically stays constant or grows logarithmically with N [16,3739]. The overall complexity of computations is then sublinear in N.

3.2 Tensor Train for Hierarchically Structured Matrices.

Consider a block-sparse, structured matrix S such as the ones obtained from the Newton step system in PDIP applied to contact dynamics. Following the motivating example presented above, our objective is to tensorize matrix S in such a way that it is highly compressible in the TT format (i.e., the resulting TT ranks for a given accuracy ε are reliably small).

For this purpose, we propose a tensorization process obtained from a hierarchical partition of the unknowns corresponding to rows and columns of S based on their location in space. These kind of partitions, usually encoded with tree data structures, have been extensively used in the last two decades to produce multilevel low rank factorizations of structured matrices and associated fast direct solvers. Following Ref. [16], we show that the Tensor train decomposition applied to this tensorized matrix yields an effective hierarchical low rank factorization technique.

3.2.1 Contact Problem Example.

In order to demonstrate how this tensorization procedure works, we again consider the small contact problem involving three spherical objects lying on a flat surface presented in Fig. 2. As discussed in Sec. 2.4, the unknowns of the Schur complement matrix S correspond to one normal and two tangent impulses γi=(γi,n,γi,1,γi,2) for each contact point located in space at ci.

S can thus be naturally partitioned as a Nc×Nc block-sparse matrix, with the (i, j)th 3 × 3 block encoding the relationship between the change in impulse Δγj and the resulting residual rγ,i. We may additionally identify these 3 × 3 blocks as samples from a bivariate function of impulses at each contact point S(i,j)=K(γi,γj); we recall that for interior point methods, K is derived from the augmented Hessian of f(γ).

In order to tensorize this matrix, we first subdivide the physical domain along each coordinate direction (two subdivisions in Figs. 5(b) and 5(c)) and upon each subdivision, we assign a binary index ik{0,1} to contact points on each side. In this example, each contact is identified with the leaf node of a binary tree T of depth 2 depicted in Fig. 5(d). A final index i1{0,1,2} is required for each of the three coordinates of the impulse vector γi. We note that indices are numbered from local to global, matching the order they would appear when tensorizing a column vector. Replacing a single index i with the multiindex (i1,i2,i3) induces a natural tensorization for any vector of samples in our 12 unknowns (γ1,n,γ1,1,,γ4,n,γ4,1,γ4,2). Unknowns are ordered according to a depth-first (e.g., morton ordering) traversal of the tree, and then the vector is reshaped as a 3 × 2 × 2 tensor.

Fig. 5
Toy contact problem: (a) Layout of three rigid spheres lying on a flat surface; the four contact points are labeled c1,c2,c3,c4, (b) First level of hierarchical bisection, corresponding to tensor index i3∈{0,1}, (c) second level of bisection, corresponding to tensor index i2∈{0,1}, (d) binary tree structure of depth 2. Each leaf node requires the final tensor index i1∈{0,1,2} to distinguish each coordinate of γi=(γi,n,γi,1,γi,2).
Fig. 5
Toy contact problem: (a) Layout of three rigid spheres lying on a flat surface; the four contact points are labeled c1,c2,c3,c4, (b) First level of hierarchical bisection, corresponding to tensor index i3∈{0,1}, (c) second level of bisection, corresponding to tensor index i2∈{0,1}, (d) binary tree structure of depth 2. Each leaf node requires the final tensor index i1∈{0,1,2} to distinguish each coordinate of γi=(γi,n,γi,1,γi,2).
Close modal

The Schur matrix S in our example is 12 × 12, with rows and columns dependent on this very set of unknowns; it is depicted in the left-most matrix in Fig. 3. At first glance, it might be tempting to split row and column indices in the order they appear, obtaining tensor of dimension 6 with entries S(i1,i2,i3,j1,j2,j3); this, however, will invariably produce suboptimal TT ranks. Instead, let TΠ denote the product tree T×T, corresponding to hierarchical subdivision of both row and column indices. The first bisection produces 4 matrix 6 × 6 blocks (Fig. 3, center), indexed by a block index b3=i3j3¯. The second, 16 3 × 3 blocks indexed by (b2,b3)=(i2j2¯,i3j3¯) (Fig. 3, left). Finally, entries on each individual block are identified with the local index b1=i1j1¯. This corresponds to ordering matrix entries according to the multiindex (i1j1¯,i2j2¯,i3j3¯) (depth-first traversal on TΠ) and reshaping the matrix as a 9 × 4 × 4 tensor.

3.2.2 General Matrix Tensorization Process and TT Decomposition.

Given a set of Nc contact points with locations ci3, we set a box containing all of them as the root of the hierarchy and proceed to subdivide all nodes along each coordinate direction until each leaf node has at most m points. As was the case for our toy problem, each contact point ci may now be encoded by a set of d indices: a local index i10,1,,3m1 for the 3 m unknowns and d − 1 binary indices (i2,i3,,id) corresponding to its location on the tree T at each level. This process is shown on line 1 of Algorithm 2. We note that the binary encoding of points in n dimensional space induced by this coordinate partition induces a so-called Morton ordering (also known as Z-curve order). For efficient algorithms to construct and update such partitions, we refer the reader to Ref. [40].

In general, the tree T will be a uniform binary tree of depth d − 1 with empty nodes corresponding to regions in space with no contacts, and nonempty leaf nodes will contain different numbers of contacts (unless m =1). Equivalently, every unknown will have an associated multiindex (i1,,id), but not all multiindices will correspond to a valid unknown. In order to apply TT compression algorithms, we will thus work on an augmented matrix Ŝ of size 2d1n1×2d1n1 (with n1 = 3 m) such that Ŝ(i,j)=S(i,j) if both i and j correspond to valid unknowns, and Ŝ(i,j)=δi,j otherwise. In other words, Ŝ is a permutation of a 2 × 2 block diagonal matrix with blocks S and identity matrix I corresponding to the original and dummy variables, respectively. This corresponds to the function call in line 2 of Algorithm 2.

Finally, we must apply the Tensor train decomposition algorithm of our choice to compress a tensorized version of matrix Ŝ (lines 3–6 of Algorithm 2). Following our discussion for the contact problem example, we propose to order the entries according to a hierarchical bisection of both row and column dimensions of matrix blocks (splitting each into four children block at each step), corresponding to the product tree TΠ, and block indices (b1,,bd)=(i1j1¯,,idjd¯). Lines 3–5 of Algorithm 2 show an explicit way to obtain this d dimensional tensor by reshaping arrays and permuting their dimensions; however, the TT AMEN Cross compression algorithm used in this work will not in general form this array, and instead all we require is a way to evaluate the desired tensor S given a set of block indices
(40)

and use this routine to obtain an approximate TT factorization S via a greedy, rank-revealing process. We note that each core of S,Gk(αk1,ikjk¯,αk) depends only on the pair of row and column indices at the corresponding level of the hierarchy. When performing matrix arithmetic, such as matrix-vector product or inversion, TT cores are often reshaped as nk × nk matrices parametrized by αk1 and αk.

In Fig. 3, we illustrate the application of Algorithm 1 to produce the TT decomposition for the Schur complement in our toy contact problem, corresponding to line 6 of Algorithm 2. This method produces a TT decomposition via a sequence of low rank decompositions of unfolding matrices of the tensorized S. Because of the way we have indexed S, the columns of the first unfolding matrix S1 are obtained by vectorizing each of the 16 leaf blocks in TΠ. If we apply a row interpolatory decomposition to obtain the low rank factorization G1V1, this step of the algorithm finds a low rank interpolation basis of block entries. The algorithm then proceeds by merging children nodes, forming the next unfolding matrix and applying low rank decompositions in an upward pass through the matrix block tree.

3.2.3 TT Compressibility of S.

As evidenced by the experimental results in Sec. 4, following this procedure reliably yields approximate TT decompositions for S with low TT ranks. This is, first, because it exploits sparsity at different levels of resolution, corresponding to different levels of the tree TΠ (unfolding matrix columns are zero if the corresponding block is zero). As discussed in Sec. 2.4, block-sparsity of S is tied to a network of contact points; a block of S is nonzero if they share a body in common. Unless bodies are substantially different in size and diameter, this network is highly localized: two contacts sharing a body will generally be nearby in space. We note that an interesting avenue of future research might consider tensorizing S following graph partition techniques based on the adjacency graph in Fig. 2(c).

Second, the TT automatically exploits smoothness and symmetries of the bivariate function K(γ1,γ2). The resulting TT decomposition and rank bounds may thus be analyzed in terms of the approximation of K by a sum of separable functions of the form g1(x1)g2(x2)gd(xd) provided by this decomposition, as has been done for integral and differential operators and their inverses.

3.2.4 Fast Arithmetic and TT Direct Solvers.

Fast methods for TT matrix arithmetic are available for a number of operations, including inversion and matrix-vector and matrix-matrix products. The TT solvers in this work find an approximate TT structure for the inverse and employ the corresponding TT matrix-vector product γ=S1b. This mat-vec algorithm proceeds by contracting one dimension of the tensorized matrix S– 1 at a time, applying the kth TT core. Its complexity can be shown to be O(r2NlogN).

TT inversion methods compute an approximate TT decomposition of S1 given the TT decomposition of S. In these algorithms, matrix equations for each tensor core of the inverse is solved iteratively. Following an alternating least squares algorithm, given an initial guess for the inverse in TT form, it proceeds by iteratively cycling through the cores (freezing all cores but one) and solving a linear system to update the kth core of Stext--1. TT ranks of the inverse are not known a priori (and are distinct to those for S), and so strategies to increase core ranks are needed to ensure convergence of the alternating least squares procedure to an accurate inverse representation. Further details about variants of this approach can be found in Refs. [4143]. The complexity of this algorithm for a maximum TT rank r for both S and S−1 is bound by O(r4logN), as shown in Ref. [16].

3.2.5 Inverse Compression Update.

We note that, since they share the same iterative, greedy rank detection structure with the AMEN Cross compression algorithm, TT direct solver routines may also be significantly sped up using an approximate initial solution. In the context of the Newton step in interior point methods, we expect both forward and inverse operators to be obtainable via low TT rank updates.

3.3 Tensor Train Newton System Solver.

We now discuss how to adapt the tensor train decomposition framework to accelerate computation of the Newton steps within the PDIP method applied to the CCP. We first show how at a given timestep, the tensor train provides an easy-to-update approximate inverse for the Newton system's Schur complement matrix. We then describe a procedure to hot-start the TT compression and inversion algorithms reusing information from the previous timestep, even when the corresponding set of contacts (and thus, matrix size and structure) change. We summarize the entire PDIP step acceleration approach in Algorithm 3.

3.3.1 Accelerated Newton Step Solve.

At a given timestep t, the kth PDIP iteration involves the solution of the Schur complement system, for 3Nc × 3Nc matrix Sk=N+M̂kBkEk1Ck. In order to accelerate the Newton system solve, we construct approximate TT decompositions Sk and Sk1; we employ the AMEN cross and solve algorithms given user prescribed target accuracy and maximum TT rank parameters (lines 3–8, 14, and 15 in Algorithm 3). We then have the choice to employ Sk1 as a direct solver or couple it with an iterative procedure (e.g., preconditioned Krylov method) if a more accurate solve is needed. For the numerical experiments presented in Sec. 4, we show the latter approach to have the best performance in terms of reducing overall solution costs.

We recall from Sec. 3.1 that one of the features of the AMEN Cross TT compression algorithm is the ability to produce inexpensive updates. Going from iteration k to k +1, we consider Sk+1 as a perturbation
(41)

where Lk is block-diagonal. While Lk is generally of matrix rank O(Nc), across all experiments in Sec. 4 we observe it to be of approximate low TT rank, and the same is found for Sk+11 as a perturbation of Sk1. This fact allows us to use the TT decompositions Sk,Sk1 to hot-start the corresponding AMEN compression and inversion algorithms, reducing precomputation times considerably (∼ 10 × faster in our experiments).

3.3.2 Re-Using Information Across Timesteps.

Employing information from the solution of the CCP at a timestep t to hot-start the PDIP iteration at the next timestep t + Δ t is notoriously hard; even if it can be used to produce a feasible point that is close to the optimum (which is nontrivial due to changes in the set of contacts), efforts by the PDIP algorithm to preserve centrality might cause it to take small steps and waste time “returning” to the central path. For this reason, we initialize each PDIP iteration by making the tangential force impulses γi,1,γi,2 equal to zero, and the normal force γi,n equal to either a constant preset value (e.g., 1) or to the value computed in the previous timestep if the corresponding contact persists across timesteps. This ensures that our initial value is feasible and lies safely inside the friction cone.

Since pairs of bodies may phase in and out of contact, the set of contacts considered at each timestep (and thus the set of unknowns) as well as the corresponding sparsity pattern both change as the system evolves in time. In Fig. 6 we show an example based on our toy contact problem. While the matrix size for this problem remains the same (since one contact is added and one is removed), we note that the sparsity structure is different due to a change in graph connectivity. However, as long as Δt and relative velocities are sufficiently small, it is likely that the set of persistent contacts (and the corresponding submatrix of S) from one timestep to the next will be relatively large. In all validation experiments in Sec. 4, in fact, well over 90% of contacts persist once objects have sedimented.

Fig. 6
Change in contact set and matrix entries: From time t to t + Δt, we depict a change in the configuration of the three bodies in Fig. 2 as the system evolves. Above each configuration, we depict the corresponding block-sparsity pattern for the initial Schur complement matrices. Persistent matrix entries are depicted in red, entries being removed in orange and entries being introduced in green.
Fig. 6
Change in contact set and matrix entries: From time t to t + Δt, we depict a change in the configuration of the three bodies in Fig. 2 as the system evolves. Above each configuration, we depict the corresponding block-sparsity pattern for the initial Schur complement matrices. Persistent matrix entries are depicted in red, entries being removed in orange and entries being introduced in green.
Close modal

Our goal is then to re-use the TT decompositions for the initial Schur complement matrix S0 from one timestep to the next. From one timestep to the next, contact locations evolve, new active contacts may be introduced and inactive contacts need to be eliminated. We recall from Sec. 3.2 that the current set of active unknowns is considered as a subset of variables corresponding to a uniform binary tree with at most m contacts in each leaf node, and the matrix S0 is compressed within an augmented matrix Ŝ0 of size 2d1n1×2d1n1 with n1 = 3 m.

We must then update the indices for the new set of active contacts at time t + Δt

  • Indices for contacts that are no longer active must be set to dummy variables and corresponding entries of Ŝ0 set to the identity.

  • New contacts must be added to the hierarchy and encoded according to their location. Blocks for these contacts and their neighbors (sharing a rigid body) are now nonzero.

  • Since binary encoding of contact locations is computationally inexpensive, we may readily detect and update indices for persistently active contacts whose location has significantly changed.

In order to preserve as much structure as possible, we only change the encoding of persistent contacts if there is an indexing conflict, if their current leaf node has more than m contacts, or if their location has migrated significantly. In Fig. 7 we show a simple one-dimensional example of the evolution of tensor indices from one timestep to the next, tracking active nodes in the resulting binary tree structure and corresponding sparsity of matrix S0. We note that while Nc changes from 5 to 6, the size of the augmented matrix, depth of the binary tree and number of contacts per leaf (m =1) for this example remain the same.

Fig. 7
TT tensor index hierarchy evolution: We depict the evolution of a spatial hierarchy used to index pairs of bodies at collision from a simple one-dimensional example as it evolves from timestep t to t + Δt. Above each configuration, we show the corresponding binary tree and sparse matrix structure for the Schur complement matrix. Persistent collision pairs and corresponding matrix entries are depicted in red, pairs being removed in orange and pairs being introduced in green. In this example, one of five nodes is removed and two new nodes are introduced.
Fig. 7
TT tensor index hierarchy evolution: We depict the evolution of a spatial hierarchy used to index pairs of bodies at collision from a simple one-dimensional example as it evolves from timestep t to t + Δt. Above each configuration, we show the corresponding binary tree and sparse matrix structure for the Schur complement matrix. Persistent collision pairs and corresponding matrix entries are depicted in red, pairs being removed in orange and pairs being introduced in green. In this example, one of five nodes is removed and two new nodes are introduced.
Close modal

If this re-indexing is compatible with the encompasing binary tree, we may use our TT decompositions for S0 and S01 as initial guesses for the corresponding initial matrix and inverse at the next timestep. Otherwise, we must compute a new hierarchical subdivision and compress the TT factorizations from scratch.

4 Numerical Results

We now demonstrate the performance of the TT-based solver when accelerating the solution of the Newton step system, and thus of second-order methods such as PDIP, in the context of dense, multiple rigid-body dynamics. As mentioned in Secs. 1 and 3, we know that given a desired target accuracy ε, if the associated TT ranks rk are bounded or slowly growing as a function of problem size N, factorization costs and storage for the TT solver are sublinear in N, and that applying this matrix to a given right-hand-side is O(NlogN). We wish to study if Newton system matrices are TT compressible in this sense, and to test their performance and scaling in their solution as problem size grows (determined by the number of collisions Nc). By harnessing the ability of the TT to produce economic updates both from one Newton iteration to the next, as well as across timesteps, we demonstrate significant improvement for the solution of the complementarity problem using second-order methods.

The complementarity method for frictional contact, as well as the solution methods for its CCP relaxation have been thoroughly validated and contrasted with experimental data [12,44]. We setup three experiments based on standard phenomena in terramechanics, focusing on how TT-based linear solvers perform in the PDIP iteration. Models and contact dynamics simulations are performed using open-source library Project Chrono [45], PDIP solver comparisons are performed with a serial matlab implementation and all TT methods are based on the matlab TT-Toolbox [46]. All experiments are run on the serial queue of University of Michigan's Flux computing cluster ().

4.1 Tensor Train Compressibility and Information Re-Use.

We first carry out an assessment to determine how the general behavior of TT ranks for the Schur matrix and its inverse vary with target accuracy ε and maximum allowed TT ranks, varying these parameter from 10−2 to 10−4 and r10,100,1000, respectively. This gives us a general idea of the compressibility of these matrices upon re-ordering their nodes according to a spatial hierarchy. Since algorithmic constants for TT compression, inversion and matrix-vector apply all depend on rank, this also informs out choice of ε. We present average results for 1000 PDIP iterations for a sedimentation experiment with M =35,939 rigid bodies in Sec. 4.2; we note these are largely replicated across experiments, and that ranks and compression times grow very slowly with the number of degrees-of-freedom.

In Table 3 we observe that the Schur matrices and their inverses are indeed highly compressible, and that as we increase the maximum allowed rank, individual and average TT ranks converge. This is replicated in all our experiments, regardless of number of particles and number of collisions. In Table 4 we record the combined matrix compression and inversion times; as indicated in Sec. 3.2, performance of these algorithms is bounded by terms of the form rklogNc. Given that rank growth results in a substantial increase in precomputation times, in our experiments we find that a TT preconditioner approach with ε = 10−2 is most effective in reducing overall computation for the PDIP iterations. We note, however, that this parameter selection is generally problem and implementation dependent.

Table 3

Average TT ranks of Schur matrix: Setting target accuracy ε and maximum rank r, we record average TT ranks for the Schur complement Sk and it's inverse Sk1, reported as rS/rS1

r 10r 100r 1000
ε = 10−24.6/4.410.5/6.710.5/6.7
ε = 10−38.7/8.559.3/45.265.2/46.4
ε = 10−49.1/9.068.6/58.1113.3/91.2
r 10r 100r 1000
ε = 10−24.6/4.410.5/6.710.5/6.7
ε = 10−38.7/8.559.3/45.265.2/46.4
ε = 10−49.1/9.068.6/58.1113.3/91.2

Note: This data is obtained from 1000 timesteps of a sedimentation simulation of M =35,939 rigid bodies.

Table 4

Average TT compression times for Schur matrix: Setting target accuracy ε and maximum rank r, we record average compression times (in seconds) for the entire PDIP solver (∼ 50–100 iterations per timestep) for 1000 timesteps of a sedimentation simulation of M =35,939 rigid bodies

r 10r 100r 1000
ε = 10−2102525
ε = 10−31612441317
ε = 10−41627408157
r 10r 100r 1000
ε = 10−2102525
ε = 10−31612441317
ε = 10−41627408157

We then wish to quantify the impact of the strategies described in Sec. 3.3 to re-use information in TT factorizations for the matrix and its inverse. For this purpose, we run two versions of the TT preconditioned PDIP iteration with and without factorization re-use for 100 timesteps for the same sedimentation problem described above. We set target accuracy to ε = 10−2 and bound maximum ranks at r10.

In Fig. 8, we plot precomputation times for the initial Newton steps and for the entire PDIP iteration, in order to measure how effective our information re-use strategies for the TT are in bringing down compression and inversion costs. For the initial PDIP iteration, we see that except for the first timestep (for which both methods have no prior information), information re-use always provides a speedup, between 5 and 15× for most timesteps shown here. The accumulated effect of both re-use strategies can be seen in the second plot on the right; precomputation is improved for all timesteps, resulting in a 10–15× overall speedup. Across all experiments, we observe both an improvement by about an order of magnitude and a drastic reduction in the variance of precomputation times, making the TT approach faster and more robust. We also note that predictably, this speedup is larger for higher target accuracies, as the iterations through all TT cores become more computationally burdensome.

Fig. 8
TT compression speedups due to information re-use: Left: we plot the ratio of compression times without and with information re-use for the initial Newton step for each timestep against number of collisions Nc; Right: we plot the same ratio for total compression times for the PDIP iterations
Fig. 8
TT compression speedups due to information re-use: Left: we plot the ratio of compression times without and with information re-use for the initial Newton step for each timestep against number of collisions Nc; Right: we plot the same ratio for total compression times for the PDIP iterations
Close modal

4.2 Performance Comparison Experiments for Primal Dual Interior Point Iteration.

4.2.1 Sedimentation on Box With Rotating Mixer.

A randomly pertubed cubic lattice of (2n + 1)3 rigid particles of spherical shape of radius 0.1 and friction coefficient μ = 0.25 are dropped and sediment under gravity into a fixed box with a rotating mixer, as shown in Fig. 9. We then run simulations for n =8, 16, 32, with a total of rigid bodies M =4915, 35,939 and 274,627 (including the box and mixer), scaling up the box size to keep particle density roughly constant. We set a timestep size Δt = 0.025 and target accuracy 1 × 10−4 for the PDIP solver, until such time as all objects are deposited in the box and undergoing mixing.

Fig. 9
Sedimentation with rotating mixer: snapshots of simulation for sedimentation of 4913 spheres on a box shaped container with a rotating mixer with constant angular velocity
Fig. 9
Sedimentation with rotating mixer: snapshots of simulation for sedimentation of 4913 spheres on a box shaped container with a rotating mixer with constant angular velocity
Close modal

We compute the Newton step through the solution of the corresponding Schur complement system. In order to test performance of the TT-based preconditioner, we compare precomputation and solution times for a preconditioned biconjugate gradient stabilized (BICGSTAB) method against Incomplete LU and unpreconditioned versions of this iterative solver. We note that while use of the conjugate gradient method may be generally preferrable for these systems, we observe its performance may degrade due to ill-conditioning as iterates approach the feasible set boundary, and so for simplicity of presentation we exclude it from our comparison.

As discussed in Sec. 3.3, the Tensor train approach allows us to produce approximate direct and preconditioned iterative solvers by varying target accuracy and maximum TT rank in the compression and inversion processes. Following preliminary parametric studies, we find that setting target accuracy for TT inversion to 1 × 10−2 and capping maximum TT ranks at r10 provides the best performance in terms of the trade-offs involved in precomputation and TT preconditioner apply costs for our experimental setup.

We note that for practical purposes, the maximum number of iterations for the unpreconditioned solve was set at 1000; this limit was often reached reducing the overall accuracy of the resulting linear system solution and thus the quality of the PDIP iteration. Through further testing, we consistently observe average BICG iteration counts of 3–5 × 103 and a fivefold and sixfold increase in computational cost when removing this constraint. These estimates should be considered whenever comparing either of the preconditioned methods against unpreconditioned BICG.

In Fig. 10 we can observe how performance for each of these solvers scales with the number of collisions Nc. Two factors are contributing to increase problem complexity in this experiment: as objects sediment in the box, the number of collisions tends to increase and the Schur complement matrix becomes less sparse. From this plot, we can readily observe that the TT preconditioned solver generally shows superior scaling, outperforming the ILU preconditioner at about Nc ∼ 20,000, and gaining an order of magnitude speed-up against the ILU preconditioner by the end of experiments with M =35,939, 274,267. We can observe that while the fraction of time spent in precomputation tends to a constant for ILU (indicating similar asymptotic scaling for precomputation and solve times, experimentally O(Nc3), it quickly goes down to zero for the TT.

Fig. 10
Log-log plot solver comparison: for each experiment, we bin according to  log10Nc and plot average PDIP iteration times; Unpreconditioned BICGSTAB in red, ILU-BICGSTAB in green and TT-BICGSTAB in blue. For preconditioned solvers, we display the proportion spent in precomputation in a darker shade, solve times in a lighter one.
Fig. 10
Log-log plot solver comparison: for each experiment, we bin according to  log10Nc and plot average PDIP iteration times; Unpreconditioned BICGSTAB in red, ILU-BICGSTAB in green and TT-BICGSTAB in blue. For preconditioned solvers, we display the proportion spent in precomputation in a darker shade, solve times in a lighter one.
Close modal

In Fig. 11, we take a closer look at average precomputation times for each PDIP iteration for TT and ILU. We note that this involves computing roughly 50–100 factorizations, one per Newton iteration. In all experiments, we confirm that compression and inversion times for the Tensor train approach grow extremely slowly with Nc, staying on a range from 5 to 15 s.

Fig. 11
Log-log plot precomputation comparison: for each experiment, we bin according to  log10Nc and plot average preocmputation times; ILU-BICGSTAB in dark green and TT-BICGSTAB in dark blue
Fig. 11
Log-log plot precomputation comparison: for each experiment, we bin according to  log10Nc and plot average preocmputation times; ILU-BICGSTAB in dark green and TT-BICGSTAB in dark blue
Close modal

Finally, we compare BICGSTAB average iteration counts in Fig. 12. We note that while iteration counts generally increase at the beginning of the experiment, those for the TT grow slower and settle sooner as particles sediment; for M =35,939, 274,627, the number of iterations for the ILU becomes up to eight times larger. For the unpreconditioned case, the maximum iteration count of 1000 is reached for a significant number of linear system solves, limiting the accuracy of the resulting Newton steps.

Fig. 12
Average iteration count comparison: for each experiment, we bin according to  log10Nc and compare average BICGSTAB iteration counts; Unpreconditioned BICGSTAB in red, ILU-BICGSTAB in green and TT-BICGSTAB in blue
Fig. 12
Average iteration count comparison: for each experiment, we bin according to  log10Nc and compare average BICGSTAB iteration counts; Unpreconditioned BICGSTAB in red, ILU-BICGSTAB in green and TT-BICGSTAB in blue
Close modal

4.2.2 Drafting Test With Rectangular Blade.

We follow the validation experiment in Ref. [12]; as shown in Fig. 13, we setup a drafting test involving a rectangular blade of width 0.1 moving through a container filled with spherical rigid particles of radius 0.1 and friction coefficient μ = 0.25. This test may be used to compute the force that the blade experiences as it moves through the granular flow, which eventually reaches a steady-state. As in the sedimentation test above, we setup a perturbed lattice of (2n + 1)3 spherical particles inside the box, and perform experiments for n =8, 16 for a total number of M =4915, 35,939 rigid bodies. The blade moves from one end of the box to the other with a prescribed sinusoidal velocity with period 4 s.

Fig. 13
Blade drafting test: snapshots of simulation for rectangular blade drafting test with 729 spheres inside a closed box container. The blade moves on the x direction with prescribed sinusoidal velocity.
Fig. 13
Blade drafting test: snapshots of simulation for rectangular blade drafting test with 729 spheres inside a closed box container. The blade moves on the x direction with prescribed sinusoidal velocity.
Close modal

4.2.3 Direct Shear Experiment.

Finally, we include a direct shear test commonly used to measure the shear strength properties of granular soil, used in Ref. [44] to validate the DEM CCP approach for frictional contact. In this test, a soil sample is placed inside of a box and subjected to a normal load force (exterted by a cell weighing down on the material). The top half of the shear box is clamped while the lower is displaced in a controlled fashion from left to right, shearing the soil sample. This test can then be utilized to measure the shear stress and other relevant properties as a function of the shear displacement in the box. For our experiments, a perturbed box-shaped lattice with (3n + 1)(2n + 1)2 spherical particles is setup inside the box, experiencing shear from the lower half of the box and a load from a ceiling press ten times denser than the granular material. We perform experiments for n =8, 16, for a total number of M =7228, 53,364 rigid bodies. The movement of the lower half is again controlled prescribing a sinusoidal velocity. Simulation snapshots for 1023 bodies are shown in Fig. 14.
Fig. 14

Direct shear box experiment: snapshots of simulation for direct shear test with 1053 spheres inside a closed box container. The bottom half of the box is displaced in the x direction with a prescribed sinusoidal velocity, and the granular fluid is loaded on the top by a rectangular press.

Fig. 14

Direct shear box experiment: snapshots of simulation for direct shear test with 1053 spheres inside a closed box container. The bottom half of the box is displaced in the x direction with a prescribed sinusoidal velocity, and the granular fluid is loaded on the top by a rectangular press.

Close modal
Fig. 15

Log-log plot solver comparison for blade drafting ((a) and (b)) and direct shear ((c) and (d)) experiments: for each experiment, we bin according to log10Nc and plot average PDIP iteration times; Unpreconditioned BICGSTAB in red, ILU-BICGSTAB in green and TT-BICGSTAB in blue. For preconditioned solvers, we display the proportion spent in precomputation in a darker shade, solve times in a lighter one; (a) blade test Nb = 4915, (b) blade test Nb = 35939, (c) shear test Nb = 7228, and (d) shear test Nb = 53364.

Fig. 15

Log-log plot solver comparison for blade drafting ((a) and (b)) and direct shear ((c) and (d)) experiments: for each experiment, we bin according to log10Nc and plot average PDIP iteration times; Unpreconditioned BICGSTAB in red, ILU-BICGSTAB in green and TT-BICGSTAB in blue. For preconditioned solvers, we display the proportion spent in precomputation in a darker shade, solve times in a lighter one; (a) blade test Nb = 4915, (b) blade test Nb = 35939, (c) shear test Nb = 7228, and (d) shear test Nb = 53364.

Close modal

In Fig. 15, we once again show a comparison of performance for each of the three solvers as it scales with number of collisions Nc. For all four experiments we observe essentially the same scaling for solution and precomputation times for the TT preconditioned solver, with TT precomputation times again staying roughly around 10 s per timestep. The overall speedups attained are slightly smaller (about 5–10×), due likely to the increase in problem complexity and the force and velocity magnitudes involved compared to the sedimentation case. However, it remains the case that the TT provides a significantly more robust and better acceleration to the linear system, with better scaling precomputation times and reduced iteration counts than the ILU sparse preconditioner. As is the case for the sedimentation tests, due to the limit of the maximum number of iterations for the unpreconditioned solve, average iteration counts and timings can be estimated to be about five times higher than presented in these plots for full accuracy.

5 Conclusions

In this work we have presented a robust and highly efficient acceleration technique for the solution of Newton step linear systems in second-order methods based on approximate hierarchical compression and inversion in the Tensor train format. In multiple experiments for common terramechanics phenomena modeled with frictional contact for dense, multiple rigid body systems, we have successfully applied a TT preconditioner to accelerate their solution, providing speed-ups of up to an order of magnitude against state-of-the-art sparse solvers for systems with Nc20000. Across all our experiments, we observe that the Tensor train preconditioner provides lower and more reliable iteration count reductions, as well as practically constant precomputation costs and storage requirements, which are improved significantly by our proposed TT factorization re-use techniques.

As discussed in Sec. 1, the benefits of rapid, problem-independent convergence of second-order optimization solvers are often negated by expensive large sparse matrix solves. The application of sparse and structured linear algebra techniques has been thus far limited by unfavorably scaling precomputation costs, excessive memory storage and communication requirements and the absence of efficient global factorization updates. We have demonstrated that the Tensor train approach can successfully address these issues in DEM complementarity simulations of granular media. Based on its versatility and exploitation of a wide class of hierarchical low rank structure, we expect this to be true for a wide array of large scale optimization problems. We also anticipate that the low precomputation cost and storage requirements provided by the TT will be most impactful in high performance computing implementations; our ongoing work features a distributed memory implementation of the frictional contact CCP and the TT accelerated PDIP solver.

We note that the optimization algorithms and proposed accelerations in this work were tested for a complementarity formulation for perfectly plastic contact. As mentioned in Sec. 2, similar formulations exist for elastic and partially elastic (bouncing) contact, and we expect the algorithms presented here to apply readily to that case and to display similar performance.

Acknowledgment

We acknowledge support from the Automotive Research Center (ARC) in accordance with Cooperative Agreement W56HZV-14-2-0001 with U.S. Army Tank Automotive Research, Development and Engineering Center (TARDEC). Corona and Veerapaneni were also supported by the NSF under grant DMS-1454010. This research was supported in part through computational resources and services provided by Advanced Research Computing Center at the University of Michigan, Ann Arbor. Corona would like to thank Daniel Negrut, Radu Serban, Luning Fang and Milad Rakhsha at the Simulation Based Engineering Laboratory (SBEL) at UW Madison for their assistance with this project and for many helpful discussions.

Funding Data

  • National Science Foundation (Award ID: DMS-1454010; Funder ID: 10.13039/100000001).

  • Tank Automotive Research, Development and Engineering Center (Award ID: W56HZV-14-2-0001; Funder ID: 10.13039/100009922).

References

1.
Cundall
,
P. A.
, and
Strack
,
O. D.
,
1979
, “
A Discrete Numerical Model for Granular Assemblies
,”
Geotechnique
,
29
(
1
), pp.
47
65
.10.1680/geot.1979.29.1.47
2.
Pazouki
,
A.
,
Kwarta
,
M.
,
Williams
,
K.
,
Likos
,
W.
,
Serban
,
R.
,
Jayakumar
,
P.
, and
Negrut
,
D.
,
2017
, “
Compliant Contact Versus Rigid Contact: A Comparison in the Context of Granular Dynamics
,”
Phys. Rev. E
,
96
(
4
), p.
042905
.10.1103/PhysRevE.96.042905
3.
Anitescu
,
M.
,
2006
, “
Optimization-Based Simulation of Nonsmooth Rigid Multibody Dynamics
,”
Math. Program.
,
105
(
1
), pp.
113
143
.10.1007/s10107-005-0590-7
4.
Tasora
,
A.
, and
Anitescu
,
M.
,
2010
, “
A Convex Complementarity Approach for Simulating Large Granular Flows
,”
ASME J. Comput. Nonlinear Dyn.
,
5
(
3
), p.
031004
.10.1115/1.4001371
5.
Anitescu
,
M.
, and
Potra
,
F. A.
,
1997
, “
Formulating Dynamic Multi-Rigid-Body Contact Problems With Friction as Solvable Linear Complementarity Problems
,”
Nonlinear Dyn.
,
14
(
3
), pp.
231
247
.10.1023/A:1008292328909
6.
Stewart
,
D. E.
, and
Trinkle
,
J. C.
,
1996
, “
An Implicit Time-Stepping Scheme for Rigid Body Dynamics With Inelastic Collisions and Coulomb Friction
,”
Int. J. Numer. Methods Eng.
,
39
(
15
), pp.
2673
2691
.10.1002/(SICI)1097-0207(19960815)39:15<2673::AID-NME972>3.0.CO;2-I
7.
Fang
,
L.
,
2015
, “
A Primal-Dual Interior Point Method for Solving Multibody Dynamics Problems With Frictional Contact
,” Ph.D. thesis, University of Wisconsin–Madison, Madison, WI.
8.
Heyn
,
T.
,
Anitescu
,
M.
,
Tasora
,
A.
, and
Negrut
,
D.
,
2013
, “
Using Krylov Subspace and Spectral Methods for Solving Complementarity Problems in Many-Body Contact Dynamics Simulation
,”
Int. J. Numer. Methods Eng.
,
95
(
7
), pp.
541
561
.10.1002/nme.4513
9.
Jan
,
K.
,
2015
,
Simulating Granular Material Using Nonsmooth Time-Stepping and a Matrix-Free Interior Point Method
,
Fraunhofer Verlag
, Stuttgart, Germany.
10.
Mazhar
,
H.
,
Heyn
,
T.
,
Negrut
,
D.
, and
Tasora
,
A.
,
2015
, “
Using Nesterov's Method to Accelerate Multibody Dynamics With Friction and Contact
,”
ACM Trans. Graph. (TOG)
,
34
(
3
), p.
32
.10.1145/2735627
11.
Petra
,
C.
,
Gavrea
,
B.
,
Anitescu
,
M.
, and
Potra
,
F.
,
2009
, “
A Computational Study of the Use of an Optimization-Based Method for Simulating Large Multibody Systems?
,”
Optim. Methods Software
,
24
(
6
), pp.
871
894
.10.1080/10556780902806094
12.
Daniel
,
M.
,
Luning
,
F.
,
Paramsothy
,
J.
, and
Dan
,
N.
,
2017
, “
A Comparison of Numerical Methods for Solving Multibody Dynamics Problems With Frictional Contact Modeled Via Differential Variational Inequalities
,”
Comput. Methods Appl. Mech. Eng.
,
320
, pp.
668
693
.10.1016/j.cma.2017.03.010
13.
Benzi
,
M.
,
2002
, “
Preconditioning Techniques for Large Linear Systems: A Survey
,”
J. Comput. Phys.
,
182
(
2
), pp.
418
477
.10.1006/jcph.2002.7176
14.
Saad
,
Y.
,
2003
,
Iterative Methods for Sparse Linear Systems
, Vol.
82
,
SIAM, Philadelphia, PA
.
15.
Li
,
A.
,
Serban
,
R.
, and
Negrut
,
D.
,
2017
, “
Analysis of a Splitting Approach for the Parallel Solution of Linear Systems on GPU Cards
,”
SIAM J. Sci. Comput.
,
39
(
3
), pp.
C215
C237
.10.1137/15M1039523
16.
Corona
,
E.
,
Rahimian
,
A.
, and
Zorin
,
D.
,
2015
, “
A Tensor-Train Accelerated Solver for Integral Equations in Complex Geometries
,” preprint
arXiv: 1511.06029
.https://arxiv.org/abs/1511.06029
17.
Ambikasaran
,
S.
, and
Darve
,
E.
,
2013
, “
An O(N Log N) Fast Direct Solver for Partial Hierarchically Semi-Separable Matrices
,”
J. Sci. Comput.
,
57
(
3
), pp.
477
501
.10.1007/s10915-013-9714-z
18.
Chandrasekaran
,
S.
,
Dewilde
,
P.
,
Gu
,
M.
,
Lyons
,
W.
, and
Pals
,
T.
,
2007
, “
A Fast Solver for HSS Representations Via Sparse Matrices
,”
SIAM J. Matrix Anal. Appl.
,
29
(
1
), pp.
67
81
.10.1137/050639028
19.
Chandrasekaran
,
S.
,
Gu
,
M.
, and
Pals
,
T.
,
2006
, “
A Fast ULV Decomposition Solver for Hierarchically Semiseparable Representations
,”
SIAM J. Matrix Anal. Appl.
,
28
(
3
), pp.
603
622
.10.1137/S0895479803436652
20.
Gillman
,
A.
,
2011
, “
Fast Direct Solvers for Elliptic Partial Differential Equations
,” Ph.D. thesis, University of Colorado, Boulder, CO.
21.
Kenneth
,
L. H.
, and
Lexing
,
Y.
,
2015
, “
Hierarchical Interpolative Factorization for Elliptic Operators: Differential Equations
,”
Commun. Pure Appl. Math.
,
69
(
8
), pp.
1415
1451
.10.1002/cpa.21582
22.
Xia
,
J.
,
Chandrasekaran
,
S.
,
Gu
,
M.
, and
Li
,
X. S.
,
2010
, “
Superfast Multifrontal Method for Large Structured Linear Systems of Equations
,”
SIAM J. Matrix Anal. Appl.
,
31
(
3
), pp.
1382
1411
.10.1137/09074543X
23.
Xia
,
J.
,
Chandrasekaran
,
S.
,
Gu
,
M.
, and
Li
,
X. S.
,
2010
, “
Fast Algorithms for Hierarchically Semiseparable Matrices
,”
Numer. Linear Algebra Appl.
,
17
(
6
), pp.
953
976
.10.1002/nla.691
24.
Bebendorf
,
M.
,
2008
, “
A Means to Efficiently Solve Elliptic Boundary Value Problems
,”
Hierarchical Matrices
(Lecture Notes in Computational Science and Engineering, Vol.
63
),
Springer-Verlag
,
Berlin
.
25.
Börm
,
S.
,
Grasedyck
,
L.
, and
Hackbusch
,
W.
,
2003
,
Hierarchical Matrices
(Lecture Notes, Vol.
21
), Max-Planck-Institut für Mathematik, Leipzig, Germany.
26.
Börm
,
S.
,
2010
, Efficient Numerical Methods for Non-Local Operators: H2-Matrix Compression, Algorithms and Analysis (EMS Tracts in Mathematics, Vol.
14
),
European Mathematical Society (EMS)
,
Zürich, Switzerland
.
27.
Coulier
,
P.
,
Pouransari
,
H.
, and
Darve
,
E.
,
2017
, “
The Inverse Fast Multipole Method: Using a Fast Approximate Direct Solver as a Preconditioner for Dense Linear Systems
,”
SIAM J. Sci. Comput.
,
39
(
3
), pp.
A761
A796
.10.1137/15M1034477
28.
Pouransari
,
H.
,
Coulier
,
P.
, and
Darve
,
E.
,
2017
, “
Fast Hierarchical Solvers for Sparse Matrices Using Extended Sparsification and Low-Rank Approximation
,”
SIAM J. Sci. Comput.
,
39
(
3
), pp.
A797
A830
.10.1137/15M1046939
29.
Stewart
,
D. E.
,
2000
, “
Rigid-Body Dynamics With Friction and Impact
,”
SIAM Rev.
,
42
(
1
), pp.
3
39
.10.1137/S0036144599360110
30.
Barzilai
,
J.
, and
Borwein
,
J. M.
,
1988
, “
Two-Point Step Size Gradient Methods
,”
IMA J. Numer. Anal.
,
8
(
1
), pp.
141
148
.10.1093/imanum/8.1.141
31.
Andersen
,
M.
,
Dahl
,
J.
,
Liu
,
Z.
, and
Vandenberghe
,
L.
,
2011
, “
Interior-Point Methods for Large-Scale Cone Programming
,”
Optim. Mach. Learn.
,
5583
, pp.
1
26
.http://people.compute.dtu.dk/~mskan/publications/mlbook.pdf
32.
Kučera
,
R.
,
Machalová
,
J.
,
Netuka
,
H.
, and
Ženčák
,
P.
,
2013
, “
An Interior-Point Algorithm for the Minimization Arising From 3D Contact Problems With Friction
,”
Optim. Methods Software
,
28
(
6
), pp.
1195
1217
.10.1080/10556788.2012.684352
33.
Kleinert
,
J.
,
Simeon
,
B.
, and
Obermayr
,
M.
,
2014
, “
An Inexact Interior Point Method for the Large-Scale Simulation of Granular Material
,”
Comput. Methods Appl. Mech. Eng.
,
278
, pp.
567
598
.10.1016/j.cma.2014.06.009
34.
Mangoni
,
D.
,
Tasora
,
A.
, and
Garziera
,
R.
,
2018
, “
A Primal–Dual Predictor–Corrector Interior Point Method for Non-Smooth Contact Dynamics
,”
Comput. Methods Appl. Mech. Eng.
,
330
, pp.
351
367
.10.1016/j.cma.2017.10.030
35.
Nocedal
,
J.
, and
Wright
,
S.
,
2006
,
Nonlinear Equations
,
Springer
, New York.
36.
Oseledets
,
I. V.
, and
Tyrtyshnikov
,
E.
,
2010
, “
TT-Cross Approximation for Multidimensional Arrays
,”
Linear Algebra Its Appl.
,
432
(
1
), pp.
70
88
.10.1016/j.laa.2009.07.024
37.
Kazeev
,
V. A.
, and
Khoromskij
,
B.
,
2012
, “
Low-Rank Explicit QTT Representation of the Laplace Operator and Its Inverse
,”
SIAM J. Matrix Anal. Appl.
,
33
(
3
), pp.
742
758
.10.1137/100820479
38.
Oseledets
,
I. V.
,
2010
, “
Approximation of 2d × 2d Matrices Using Tensor Decomposition
,”
SIAM J. Matrix Anal. Appl.
,
31
(
4
), pp.
2130
2145
.10.1137/090757861
39.
Oseledets
,
I. V.
,
Tyrtyshnikov
,
E.
, and
Zamarashkin
,
N.
,
2011
, “
Tensor-Train Ranks for Matrices and Their Inverses
,”
Comput. Methods Appl. Math.
,
11
(
3
), pp.
394
403
.10.2478/cmam-2011-0022
40.
Bern
,
M.
,
Eppstein
,
D.
, and
Teng
,
S.-H.
,
1999
, “
Parallel Construction of Quadtrees and Quality Triangulations
,”
Int. J. Comput. Geom. Appl.
,
9
(
06
), pp.
517
532
.10.1142/S0218195999000303
41.
Dolgov
,
S.
, and
Savostyanov
,
D. V.
,
2013
, “
Alternating Minimal Energy Methods for Linear Systems in Higher Dimensions—Part I: SPD Systems
,” Preprint
arXiv:1301.6068
.https://arxiv.org/abs/1301.6068
42.
Dolgov
,
S.
, and
Savostyanov
,
D. V.
,
2013
, “
Alternating Minimal Energy Methods for Linear Systems in Higher Dimensions—Part II: Faster Algorithm and Application to Nonsymmetric Systems
,” Preprint
arXiv: 1304.1222
.https://arxiv.org/abs/1304.1222
43.
Oseledets
,
I. V.
, and
Dolgov
,
S. V.
,
2012
, “
Solution of Linear Systems and Matrix Inversion in the TT-Format
,”
SIAM J. Sci. Comput.
,
34
(
5
), pp.
A2718
A2739
.10.1137/110833142
44.
Melanz
,
D.
,
Jayakumar
,
P.
, and
Negrut
,
D.
,
2016
, “
Experimental Validation of a Differential Variational Inequality-Based Approach for Handling Friction and Contact in Vehicle/Granular-Terrain Interaction
,”
J. Terramechanics
,
65
, pp.
1
13
.10.1016/j.jterra.2016.01.004
45.
Tasora
,
A.
,
Serban
,
R.
,
Mazhar
,
H.
,
Pazouki
,
A.
,
Melanz
,
D.
,
Fleischmann
,
J.
,
Taylor
,
M.
,
Sugiyama
,
H.
, and
Negrut
,
D.
,
2015
, “
Chrono: An Open Source Multi-Physics Dynamics Engine
,”
International Conference on High Performance Computing in Science and Engineering
(
HPCSE
),
Solan, Czech Republic
,
May 25–28
, pp.
19
49
.https://www.researchgate.net/publication/303761434_Chrono_An_Open_Source_Multi-physics_Dynamics_Engine
46.
Oseledets
,
I. V.
,
2012
, “TT-Toolbox 2.2,” Skoltech, Moscow, Russia.