Abstract
The purpose of this work is to compare learning algorithms to identify which is the fastest and most accurate for training mechanical neural networks (MNNs). MNNs are a unique class of lattice-based artificial intelligence (AI) architected materials that learn their mechanical behaviors with repeated exposure to external loads. They can learn multiple behaviors simultaneously in situ and re-learn desired behaviors after being damaged or cut into new shapes. MNNs learn by tuning the stiffnesses of their constituent beams similar to how artificial neural networks (ANNs) learn by tuning their weights. In this work, we compare the performance of six algorithms (i.e., genetic algorithm, full pattern search, partial pattern search, interior point, sequential quadratic programming, and Nelder–Mead) applied to MNN learning. A computational model was created to simulate MNN learning using these algorithms with experimentally measured noise included. A total of 3900 runs were simulated. The results were validated using experimentally collected data from a physical MNN. We identify algorithms like Nelder–Mead that are both fast and able to reject noise. Additionally, we provide insights into selecting learning algorithms based on the desired balance between accuracy and speed, as well as the general characteristics that are favorable for training MNNs. These insights will promote more efficient MNN learning and will provide a foundation for future algorithm development.
1 Introduction
Artificial intelligence (AI) has enabled the automation of many complex tasks [1–3] through the extraordinary ability of learning. Although artificial neural networks (ANNs) were developed in the 1950s [4,5] and had rich potential as universal approximators [6], ANNs were not initially adopted broadly. They proved challenging and computationally expensive to train even for the best algorithms of their time. The modern recurrence of ANN-based solutions is largely due to the application of new techniques such as backpropagation [7], which allow ANNs to use more powerful gradient-based learning algorithms to efficiently adjust their weights during the learning process [1,8].
More recently, physical neural networks (PNNs) [9–12] have been developed due to advantages such as high power efficiency [13], simple digital-to-analog conversion [14], and the ability to learn physical properties [15]. Despite these advantages, PNNs are not currently used broadly due largely to the same problems that were initially faced by ANNs when they were first proposed (i.e., PNNs lack effective techniques and algorithms to efficiently train them). In physical systems, gradients are not always easily calculable, making gradient-based algorithms hard to use. When gradient methods are applicable, they often come at the expense of future adaptability [16] or simplicity by requiring masking signals to control unwanted system dynamics [14]. Other techniques avoid using a gradient, such as the No-Prop algorithm [17], which enables learning from in situ oscillating systems [18–20] but does so computationally rather than by changing the system's physical properties. Physics-driven learning (PDL) is another gradient-free approach for physical learning that can quickly train physical systems with AI abilities [15,21] by gradually modifying a system through external perturbations toward desired states. However, PDL requires the ability to externally manipulate the system's outputs [22] to function. Kaveh and Bakhshpoori [23] provide a review of related metaheuristic algorithms.
Most recently, mechanical neural networks (MNNs) were proposed [24] as PNN-architected materials that learn their behaviors and properties by tuning the stiffness of their constituent beams similarly to how an ANN tunes its weights. This framework allows a single architected material to modify its physical properties in situ and learn a wide variety of behaviors simultaneously using simple gradient-free algorithms. Although the work of Lee et al. [24] demonstrates MNN learning, the algorithms used to guide the search for a working combination of beam stiffness values were not sufficiently fast or accurate to be practical.
In this article, six learning algorithms are compared using both simulation and experimentation using the same physical MNN presented by Lee et al. [24] to identify the fastest and most accurate algorithm in general for making MNNs a practical reality. The sensitivity of these six algorithms to system noise is also compared. This study provides insights into the process for selecting learning algorithms based on the desired balance between the accuracy and speed of learning general behaviors using MNNs as well as insights regarding the general characteristics that are favorable for training such MNNs. Thus, the contributions of this article will enable a host of practical MNN applications including (i) aircraft wings that learn to deform with the optimal shapes that best increase fuel efficiency and maneuverability regardless of unanticipated and changing wind conditions; (ii) armor that learns to most effectively dampen or redirect shock waves regardless of the nature of the explosion or impact to best protect the person, vehicle, or structure; (iii) building foundations or bracings that learn to most effectively reduce shaking in the midst of different kinds of earthquakes that originate in different places; and (iv) optical mounts that learn to keep lenses in place within aircraft light-detection-and-ranging systems regardless of the dramatic changes in temperature and altitude that the airplane experiences during use.
1.1 Mechanical Neural Network Learning Approach.
This section describes how MNNs learn desired behaviors. As mentioned previously, MNNs are lattices consisting of beams with adjustable axial stiffness. The adjustable beams used by Lee et al. [24] (shown in Fig. 1(a)) are electromechanical compliant mechanisms that use voice coil actuators, flexure bearings, and strain gauge sensors to adjust their axial stiffness via closed-loop control. The beams are assembled within a regular triangular lattice and join together at nodes, which are composed of rotary flexures (Fig. 1(a)). These flexures allow the lattice to accommodate deformations as the beams extend and contract along their axes. To load the MNN, a pair of voice coil actuators connected to a set of decoupling flexures (Fig. 1(b)) is attached to each of the lattice's input nodes (Fig. 1(c)), allowing the actuators to independently apply forces to the input nodes.

(a) The tunable stiffness beams that constitute the MNN of this study. (b) The actuators and decoupling flexures that enable the input nodes of the MNN to be driven in any in-plane direction. (c) The fabricated MNN of this study shown with blue lines drawn on top of each tunable beam. (d) Two shape-morphing behaviors that the simulated MNN achieved by finding a working combination of axial stiffness values using a learning algorithm.

The 21 beams within the physical MNN in Fig. 1(c) are shown as simplified blue beam lines in the lattice. The MNN has two input nodes on its left side and two output nodes on its right side. Shape-morphing behaviors are achieved when the lattice's output nodes displace to desired target displacements in response to its input nodes being loaded by desired input forces. The lattice learns its behaviors simultaneously by first assigning each beam in the lattice a random stiffness value. The lattice is then loaded with the desired input forces of each behavior, and its resulting output node displacements are then subtracted from the desired target displacements of the corresponding behavior. A mean-squared error (MSE) is then calculated by averaging the square of these difference values for all the behaviors simultaneously. A learning algorithm then determines and assigns a new combination of axial stiffness values to the beams of the lattice, and the process repeats in an effort to minimize the calculated MSE until the desired behaviors are all simultaneously achieved.
A computational tool used this approach to simulate the learning of the two example behaviors as shown in Fig. 1(d). The first behavior was achieved when the lattice's top output node displaced 0.25 mm to the right and when its bottom output node displaced 0.25 mm to the left in response to its input nodes being loaded to the right with the same force magnitude. The second behavior was achieved when the lattice's top output node displaced 0.25 mm to the left and when its bottom output node displaced 0.25 mm to the right in response to its input nodes being loaded up with the same force magnitude. For visual clarity, the lattice's displacements shown in Fig. 1(d) were multiplied 100-fold. Note from Fig. 1(d) that the learning algorithm used (i.e., sequential quadratic (SQ) programming) successfully identified a combination of axial stiffness values, depicted as different shades of blue, which enabled the lattice to simultaneously achieve both behaviors with an impressively small MSE of 1.4 × 10⁻⁴ mm².
This article compares six of the most promising learning algorithms applied to MNNs to identify which can most rapidly identify a combination of beam axial stiffness values that enable their lattices to achieve desired behaviors as accurately as possible. Here, learning accuracy is measured using MSE. The lower the algorithm can make the MNN's final MSE, the more accurately the MNN learns its desired behaviors. Learning speed is measured using the number of iterations that the algorithm requires to arrive at the final MSE. Here, one iteration is achieved each time an MSE is calculated during the learning process. Iterations roughly scale with learning time and are thus a fair way to compare the speed of different algorithms that are applied with different time scales (e.g., in simulation and experimental learning scenarios).
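As a concrete illustration, the learning loop described above can be sketched as follows. The lattice model is abstracted behind a hypothetical `simulate_outputs` function (mapping beam stiffnesses and input forces to output-node displacements), and `propose` stands in for whichever learning algorithm generates the next stiffness combination; both names are placeholders for illustration, not part of the published implementation.

```python
import numpy as np

def mse(stiffnesses, behaviors, simulate_outputs):
    """Mean-squared error over all behaviors simultaneously.

    behaviors: list of (input_forces, target_displacements) pairs.
    simulate_outputs: hypothetical lattice model returning output-node
    displacements for given beam stiffnesses and input loads.
    """
    errors = []
    for forces, targets in behaviors:
        outputs = simulate_outputs(stiffnesses, forces)
        errors.append((np.asarray(outputs) - np.asarray(targets)) ** 2)
    return float(np.mean(errors))

def learn(initial_stiffnesses, behaviors, simulate_outputs, propose, n_iter=100):
    """Generic MNN learning loop: each MSE evaluation counts as one iteration."""
    k = np.array(initial_stiffnesses, dtype=float)
    best_k, best_err = k.copy(), mse(k, behaviors, simulate_outputs)
    for _ in range(n_iter - 1):
        k = propose(best_k)                       # algorithm-specific update
        err = mse(k, behaviors, simulate_outputs)
        if err < best_err:
            best_k, best_err = k.copy(), err      # keep the best combination
    return best_k, best_err
```

Any of the six algorithms of Sec. 2 can be slotted into the role of `propose`, which is why the iteration count provides a fair cross-algorithm speed metric.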
2 Learning Algorithms
This section describes the six learning algorithms that were compared for the MNN studies of this article. The approach of each algorithm is briefly summarized, and the hyperparameters that were used for this study are also provided.
2.1 Genetic Algorithm.
The genetic algorithm (GA) optimizes systems using a process similar to natural evolution [25]. Every GA has a set of candidate configurations called a population. Once the population is generated, the error for each configuration is evaluated, and a subset of the population is selected to create a new generation of candidates favoring members with low error to keep favorable traits in the population. The new population is generated with some combination of copying members (migration), exchanging traits between individuals (cross-over), and random changes (mutations) [26,27].
The GA implementation in this article uses the default functions and hyperparameters for migration, cross-over, and mutation created by MATLAB. The population size is set to 500 members, and the initial population is generated randomly for each attempt with axial stiffness varying from −2 N/mm to 2.3 N/mm. The algorithm converges when the minimum error is stagnant for 50 generations.
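The generate/select/recombine loop can be sketched as follows. This is a deliberately simplified toy (elitism, uniform cross-over, Gaussian mutation); the article itself relies on MATLAB's default GA operators, and the elite count, mutation scale, and generation cap below are illustrative assumptions only.

```python
import numpy as np

def genetic_minimize(f, n_vars, bounds=(-2.0, 2.3), pop_size=500, n_elite=25,
                     mut_scale=0.1, stall_gens=50, max_gens=200, seed=0):
    """Toy GA sketch: random population, elitist selection, uniform
    cross-over between elite parents, and Gaussian mutation."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, n_vars))
    best_ind, best_err, stall = pop[0].copy(), np.inf, 0
    for _ in range(max_gens):
        errs = np.array([f(ind) for ind in pop])
        order = np.argsort(errs)
        if errs[order[0]] < best_err - 1e-9:
            best_ind, best_err, stall = pop[order[0]].copy(), errs[order[0]], 0
        else:
            stall += 1                   # converge when error stagnates
        if stall >= stall_gens:
            break
        elite = pop[order[:n_elite]]
        # children: uniform cross-over of random elite parents, then mutation
        pa = elite[rng.integers(n_elite, size=pop_size - n_elite)]
        pb = elite[rng.integers(n_elite, size=pop_size - n_elite)]
        mask = rng.random(pa.shape) < 0.5
        children = np.where(mask, pa, pb) + rng.normal(scale=mut_scale, size=pa.shape)
        pop = np.clip(np.vstack([elite, children]), lo, hi)
    return best_ind, best_err
```

Note that every member's error evaluation counts toward the iteration total, which is why the GA's accuracy comes at a large cost in iterations.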
2.2 Full Pattern.
The pattern search algorithm refers to a class of search-based optimization algorithms conceived in the study by Hooke and Jeeves [28]. This article refers to the pattern search algorithm as the full pattern (FP) search to distinguish it from the partial pattern (PP) search used in the earlier MNN work by Lee et al. [24].
FP attempts to reduce the number of function evaluations needed to find a minimum [29] by sampling a set of points around a given pattern center, instead of searching in every direction. Points in the search pattern correspond to positive or negative deltas for the independent variables [30]. If one of these points has an error value that is lower than the pattern center, the center is moved to the best point. If none of the new points produces a decrease in error, then the delta is decreased. FP repeats these two steps until the delta reaches a minimum threshold, where the algorithm settles to a minimum. From this initial inception, augmented Lagrangian evaluations [31] and more complicated search patterns [32] have improved the efficiency of FP.
The implementation used in this article follows the more basic structure of FP described earlier, but its functionality is tweaked to include additional polling methods [33]. The implementation for this article uses the general hyperparameters from MATLAB but enforces a minimum pattern size of 5 × 10⁻² N/mm.
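The two-step poll/shrink structure described above can be sketched as a basic Hooke–Jeeves-style search. This is a minimal illustration, not MATLAB's implementation; the shrink factor of 0.5 is an assumed value, while the minimum pattern size matches the 5 × 10⁻² figure above.

```python
import numpy as np

def full_pattern_search(f, x0, delta=1.0, min_delta=5e-2, shrink=0.5):
    """Basic pattern search sketch: poll +/- delta along every coordinate,
    move the pattern centre to the best improving point, otherwise shrink
    delta, until delta falls below the minimum threshold."""
    x = np.array(x0, dtype=float)
    fx = f(x)
    while delta >= min_delta:
        best_x, best_f = x, fx
        for i in range(len(x)):              # full poll of the pattern
            for sign in (+1.0, -1.0):
                cand = x.copy()
                cand[i] += sign * delta
                fc = f(cand)
                if fc < best_f:
                    best_x, best_f = cand, fc
        if best_f < fx:
            x, fx = best_x, best_f           # move the centre
        else:
            delta *= shrink                  # no improvement: refine the mesh
    return x, fx
```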
2.3 Partial Pattern.
PP search is the name given to the algorithm described by Lee et al. [24]. It uses a modified pattern search algorithm. PP has three main features separating it from a simple pattern search. The first modification is in the polling method used. Instead of searching the entire pattern before selecting a new center point, PP evaluates the pattern in a random order and moves to a new center as soon as a point with a lower error is found. The other two modifications work to stabilize the algorithm for noisy error values. PP performs multiple iterations for a given point, which decreases the speed of its readings but increases their precision. PP also periodically reevaluates the error of the pattern center to ensure that pattern comparisons are accurate. As in the FP algorithm, the PP shrinks its pattern once if no lower points exist in the pattern.
In this article, PP starts with a pattern width of 2.15 N/mm, truncating any values that extend beyond the valid stiffness range (−2 N/mm to 2.3 N/mm), and the pattern is shrunk to 90% of its initial value if no better error point exists in the pattern.
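The three PP modifications (random poll order with movement on the first improvement, averaged repeat evaluations, and periodic re-evaluation of the pattern centre) can be sketched as follows. The number of averaged repeats (`n_avg`) is an assumed illustrative value; the pattern width, bounds, and 90% shrink factor follow the figures above.

```python
import numpy as np

def partial_pattern_search(f, x0, delta=2.15, bounds=(-2.0, 2.3),
                           min_delta=5e-2, shrink=0.9, n_avg=3, seed=0):
    """PP sketch: random poll order, move on first improvement, averaged
    (repeated) evaluations to suppress noise, periodic centre re-evaluation."""
    rng = np.random.default_rng(seed)
    avg = lambda x: float(np.mean([f(x) for _ in range(n_avg)]))
    x = np.clip(np.array(x0, dtype=float), *bounds)
    fx = avg(x)
    while delta >= min_delta:
        moved = False
        dirs = [(i, s) for i in range(len(x)) for s in (+1.0, -1.0)]
        rng.shuffle(dirs)                    # poll the pattern in random order
        for i, s in dirs:
            cand = x.copy()
            cand[i] = np.clip(cand[i] + s * delta, *bounds)
            if avg(cand) < fx:
                x = cand                     # move on the first improvement
                moved = True
                break
        if not moved:
            delta *= shrink                  # shrink to 90% on a failed poll
        fx = avg(x)                          # re-evaluate the pattern centre
    return x, fx
```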
2.4 Interior Point.
Interior point (IP) refers to a class of algorithms that optimize constrained systems by implementing logarithmic barrier functions in the Lagrangian. IP methods are used by Jarre et al. [34] in the optimization of truss configurations to respond to a given load. IP improves upon the Lagrange multiplier method as formalized by Fiacco and McCormick [35] for constrained optimization, where the objective function and its domain are subject to a set of mathematical constraints [36]. For Lagrange multipliers, the Lagrangian function acts as a summation of the objective function with each of the constraints scaled by a Lagrange multiplier variable. The variables create a system of equations with an equal number of equations and unknowns, thus allowing for a solution. Using this formulation, any given objective function can be minimized while satisfying any set of mathematical constraints by finding a stationary point for the Lagrangian and ensuring its concavity [37]. The IP algorithm adds the logarithms of slack terms for each inequality constraint to the function's Lagrangian. These barrier functions can decrease the computational cost for computing Jacobians and Hessian matrices, reducing the computational time by speeding up the evaluation of each iteration [38].
For computational efficiency, this article uses MATLAB's implementation of IP. The MATLAB IP adds a merit function with the logarithmic barrier function as well as linearized constraints to increase the computation speed. The matrix used in the Lagrangian is constructed to include additional second-order parameters to ensure that the first-derivative terms converge toward a minimum. Each step of the algorithm consists of a symmetrized matrix inversion [39], eventually converging to a stationary point. The general purpose hyperparameters determined by MATLAB are used in this article.
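In generic form (a simplified statement, not necessarily MATLAB's exact formulation), the barrier subproblem replaces each inequality constraint $c_i(x) \ge 0$ with a slack variable $s_i$ and a logarithmic barrier term:

```latex
\min_{x,\, s > 0} \;\; f(x) \;-\; \mu \sum_{i} \ln s_i
\qquad \text{subject to} \qquad c_i(x) - s_i = 0 ,
```

where $f$ is the objective (here, the MSE) and the barrier parameter $\mu > 0$ is driven toward zero, so that the minimizers of the successive barrier subproblems approach a minimizer of the original constrained problem.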
2.5 Sequential Quadratic.
SQ programming, like IP, is an optimization algorithm that improves upon the traditional method of Lagrange multipliers for constrained optimization problems by operating on second-order Taylor-series approximations instead of solving the full Lagrangian, simplifying computation and yielding stable convergence [40]. The second-order approximation method is effective for large-scale nonlinear optimization, where matrix operations are a rate-limiting step and where linear approximations converge slowly.
This article uses MATLAB's implementation of SQ programming, which uses a positive definite Hessian matrix as the quadratic coefficient; this parameter, similarly to the Lagrange multiplier terms, is updated at every iteration to ensure fast computation until a stationary point is reached [39]. No hyperparameters are changed from the standard implementation suggested in MATLAB.
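In generic terms (again a simplified statement; MATLAB's implementation adds further safeguards), each SQ iteration solves a quadratic subproblem built from a second-order model of the Lagrangian:

```latex
\min_{d} \;\; \nabla f(x_k)^{\mathsf{T}} d \;+\; \tfrac{1}{2}\, d^{\mathsf{T}} H_k\, d
\qquad \text{subject to} \qquad \nabla c_i(x_k)^{\mathsf{T}} d + c_i(x_k) \,\ge\, 0 ,
```

where $H_k$ is the positive definite approximation of the Hessian of the Lagrangian mentioned above, updated at every iteration, and the resulting step $d$ gives $x_{k+1} = x_k + d$.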
2.6 Nelder–Mead.
Developed by Nelder and Mead, the Nelder–Mead (NM) algorithm uses the properties of simplex elements to locate minima for a given objective function [41]. Simplices are an extension of triangles to arbitrary dimension (e.g., lines in one dimension, triangles in two dimensions, and tetrahedra in three dimensions). The method takes the form of a search algorithm that compares the relative values of a simplex's vertices to move and deform the shape until it settles to a minimum with as few function evaluations as possible [42]. The first transformation is reflection (flipping), where the simplex is flipped over the face opposite the vertex with the highest error, the "worst point." If the flipped vertex becomes the new best point, it is extended farther from the flipping face (expansion) to search for an even lower point. If these two transformations do not find a better point, then the point is retracted closer to the face used as a reference for the reflection (contraction). Finally, if the simplex achieves no improvement with the prior operations, all vertices are shrunk inward toward the best vertex. The NM algorithm repeats these transformations to create a converging series that eventually reaches a local minimum for a given constrained problem. Work by Marandi et al. [43] shows the use of the Nelder–Mead algorithm to optimize sensor network data collection.
This article uses the Nelder–Mead simplex algorithm that follows the sequence of transformations as stated earlier to search for a minimum [44]. The NM used in this article has an error tolerance of 0.01 mm² and a minimum error difference of 0.005 N/mm, and the remaining hyperparameters use the generalized values determined by MATLAB.
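The reflect/expand/contract/shrink cycle can be sketched as follows. This is an illustrative minimal version with standard textbook coefficients (reflection 1, expansion 2, contraction and shrink 0.5) and an assumed convergence test on the spread of vertex errors; the article relies on MATLAB's implementation with the tolerances stated above.

```python
import numpy as np

def nelder_mead(f, x0, step=0.5, tol=1e-4, max_iter=500):
    """Minimal Nelder–Mead sketch: reflection, expansion, contraction, shrink."""
    n = len(x0)
    # initial simplex: x0 plus one vertex perturbed along each dimension
    simplex = [np.array(x0, dtype=float)]
    for i in range(n):
        v = np.array(x0, dtype=float)
        v[i] += step
        simplex.append(v)
    fvals = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = np.argsort(fvals)
        simplex = [simplex[i] for i in order]        # best first, worst last
        fvals = [fvals[i] for i in order]
        if fvals[-1] - fvals[0] < tol:               # vertices have converged
            break
        centroid = np.mean(simplex[:-1], axis=0)     # face opposite worst point
        refl = centroid + (centroid - simplex[-1])   # reflect ("flip") worst point
        fr = f(refl)
        if fr < fvals[0]:                            # new best: try expanding
            exp = centroid + 2.0 * (centroid - simplex[-1])
            fe = f(exp)
            simplex[-1], fvals[-1] = (exp, fe) if fe < fr else (refl, fr)
        elif fr < fvals[-2]:                         # modest improvement
            simplex[-1], fvals[-1] = refl, fr
        else:                                        # contract toward centroid
            con = centroid + 0.5 * (simplex[-1] - centroid)
            fc = f(con)
            if fc < fvals[-1]:
                simplex[-1], fvals[-1] = con, fc
            else:                                    # shrink toward best vertex
                simplex = [simplex[0]] + [simplex[0] + 0.5 * (v - simplex[0])
                                          for v in simplex[1:]]
                fvals = [fvals[0]] + [f(v) for v in simplex[1:]]
    i = int(np.argmin(fvals))
    return simplex[i], fvals[i]
```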
3 Simulation Studies
This section provides the results of this article's simulation studies, which compare the learning speed and accuracy produced by the MNN shown in Fig. 1(c) when the six algorithms provided in Sec. 2 are applied during the learning process. The simulation tool introduced by Lee et al. [24] was used for 21 beams arranged in a triangular lattice to simulate the MNN shown in Fig. 1(c). The nodes along the top and bottom of the lattice were held fixed as shown in Fig. 1(d). The beams' length (152.4 mm), range of tunable axial stiffness (−2 N/mm to 2.3 N/mm), allowable axial displacement (±2.5 mm), and off-axis passive stiffness values (provided by Lee et al. [24]) were all set to mimic those of the fabricated MNN in Fig. 1(c). The beams were assumed to be linear, and the principle of force scaling was applied as described in the study by Lee et al. [24].
Moreover, to help the simulation more closely mimic the fabricated MNN shown in Fig. 1(c), the sensor noise thresholds of the fabricated MNN were directly measured and incorporated into the simulation studies of this section to capture the variability of each algorithm due to system noise.
For each simulation, the procedure detailed in the study by Lee et al. [24] for generating different sets of random behaviors is used. Specifically, the MNN is trained to achieve different sets of two random behaviors. For each behavior, two randomly oriented input forces with magnitudes between ±2 N are applied to the input nodes and random target displacements between ±0.35 mm along the x and y axes (defined in Fig. 1(d)) are assigned to each output node. To ensure that the behaviors in a set are distinct, the MSEs between each behavior's loading forces and target displacements are calculated, and only pairs of behaviors with more than 0.6 N² and 0.1 mm² for force and displacement MSE, respectively, are permitted. Once the pairs of two random behaviors are generated, learning is simulated with the same behaviors using each algorithm.
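The rejection-sampling step that enforces distinct behavior pairs can be sketched as follows. As a simplification, force components are drawn uniformly per axis rather than as randomly oriented vectors of bounded magnitude, and the `max_tries` cap is an assumed safeguard; the thresholds match the 0.6 N² and 0.1 mm² figures above.

```python
import numpy as np

def random_behavior(rng, n_inputs=2, n_outputs=2):
    """One behavior: random input forces (within +/-2 N per component) and
    random target displacements (+/-0.35 mm) per output node along x and y."""
    forces = rng.uniform(-2.0, 2.0, size=(n_inputs, 2))       # N
    targets = rng.uniform(-0.35, 0.35, size=(n_outputs, 2))   # mm
    return forces, targets

def distinct_pair(rng, f_thresh=0.6, d_thresh=0.1, max_tries=1000):
    """Draw behavior pairs until both the force MSE and the displacement MSE
    between them exceed the distinctness thresholds."""
    for _ in range(max_tries):
        (f1, t1), (f2, t2) = random_behavior(rng), random_behavior(rng)
        if (np.mean((f1 - f2) ** 2) > f_thresh
                and np.mean((t1 - t2) ** 2) > d_thresh):
            return (f1, t1), (f2, t2)
    raise RuntimeError("no sufficiently distinct behavior pair found")
```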
The first simulation study attempts to train the MNN so that it learns five different sets of two random behavior pairs per algorithm. This is done with five different initial conditions (i.e., different starting combinations of axial beam stiffness values), and the entire process is repeated ten different times resulting in a total of 250 runs per algorithm. The results are provided in Fig. 2.

The average MSE of 250 different simulated runs plotted against the number of iterations using (a) genetic algorithm, (b) full pattern, (c) partial pattern, (d) interior point, (e) sequential quadratic, and (f) Nelder–Mead. The shaded error regions represent one standard deviation. (g) The average MSE plot of all six algorithms plotted together. (h) The final MSE plotted against the final number of iterations for each of the algorithm's 250 runs.

Figures 2(a)–2(f) provide plots of the lowest MSE identified versus the number of corresponding iterations for each of the six algorithms in Sec. 2. The solid line in each plot represents the average MSE of all 250 runs. The shaded regions represent one standard deviation from the average and indicate how much MSE variability exists in the system due in part to system noise, differences in initial conditions, and differences in behaviors learned. Note that the GA plot in Fig. 2(a) shows significant variability and begins with a high MSE that drops rapidly with its first generation and then gradually levels off with subsequent generations until it achieves its final lowest MSE. Both the trend and the spread of the FP plot in Fig. 2(b) decrease steadily and converge moderately fast. The PP plot in Fig. 2(c) has an MSE that gradually decreases in a stepwise fashion with a spread that is largely consistent from iteration to iteration. The IP plot in Fig. 2(d) begins with an MSE plateau that then decreases with a large spread between runs. The SQ plot in Fig. 2(e) has many similar characteristics to the IP plot largely because both algorithms rely on Lagrange multipliers. The NM plot in Fig. 2(f) shows a constant, smooth decrease in both average MSE and spread. Note that, with the exception of PP, the algorithms with defined starting points (i.e., FP, IP, SQ, and NM) have a similar spread in the first iteration.
Figure 2(g) plots the average MSE of all the algorithms on top of each other versus the number of iterations using a log-log scale. For low numbers of iterations, GA, NM, and FP have similar low MSE, whereas PP, IP, and SQ have a similar high MSE. For large numbers of iterations, GA converges to the lowest MSE followed by PP. The trends for the FP and NM plots are similar although their algorithms are substantially different. Although the IP and SQ plots are nearly coincident at the beginning, after the initial iterations, the SQ plot decreases in MSE more rapidly than the IP plot, but the IP plot converges to a lower MSE.
Figure 2(h) provides the final lowest MSE determined for every run performed by each algorithm (i.e., 250 runs per algorithm), plotted against the final number of iterations at which the algorithms converged. From these data, it is clear that the IP and SQ algorithms produce high MSE (i.e., low accuracy) but converge in relatively few iterations (i.e., high speed). The PP algorithm produces moderate MSEs in moderate numbers of iterations. Although the NM and FP algorithms converge in moderate numbers of iterations (NM is slightly faster than FP), they both achieve impressively low MSE. The GA algorithm consistently converges to the lowest MSE but requires orders of magnitude more iterations (i.e., time) compared with the other algorithms. The spread of the data runs plotted in Fig. 2(h) for each algorithm provides an indication of the algorithm's repeatability.
The data for the NM, FP, and GA algorithms are all positioned in narrow but tall clusters, which indicates that these algorithms consistently converge in a similar amount of time but produce less consistent accuracy. Algorithms whose runs produce fewer visible data points on the plot (because many points overlap) performed highly consistently across runs. Note that the PP algorithm, which tends to stabilize the final MSE, has many overlapping data points spread across an area that is relatively short along the y-axis but wide along the x-axis.
The most efficient algorithms will achieve runs with the lowest final MSE in the fewest final number of iterations (i.e., data points closer to the bottom-left corner of Fig. 2(h)). Thus, although GA is the most accurate (i.e., achieves the lowest final MSE), NM and FP achieve sufficiently low MSE in much less time (i.e., fewer iterations). Thus, if accuracy is the only priority, GA is the clear algorithm of choice. But if learning speed is important so that the MNN can learn in a practical amount of time, NM is likely the best choice since it is a bit faster than FP while still achieving a sufficiently low final MSE.
The second simulation study attempts again to train the MNN of Fig. 1(c) so that it learns only one set of two random behaviors with only one initial condition repeated ten times (i.e., a total of ten runs) per algorithm to isolate how system noise affects the spread. Thus, this study excludes effects on spread due to differences in learned behaviors and initial conditions. The results of the second study are provided in Figs. 3(a)–3(f). The solid lines in the plots are the average of all ten runs, and the colored error region represents their standard deviation. Note that the spread of the plots shown in Figs. 3(a) and 3(c) is larger than that of the other plots because, in addition to system noise, it also reflects the inherent variability of the GA and PP algorithms. Note that as iterations progress in the plots shown in Figs. 3(b), 3(d), and 3(e), the spread due to system noise accumulates. Finally, note that the NM algorithm is the most resistant to system noise in that it exhibits very little spread in the plot shown in Fig. 3(f).

The average MSE of ten different simulated runs for the same two random behaviors and the same initial condition plotted against the number of iterations using (a) genetic algorithm, (b) full pattern, (c) partial pattern, (d) interior point, (e) sequential quadratic, and (f) Nelder–Mead. The shaded error regions represent one standard deviation and predominantly represent the spread due to system noise.

The third simulation study attempts to train the MNN of Fig. 1(c) so that it learns 100 different sets of two random behavior pairs using two different initial conditions repeated twice per behavior (i.e., a total of 400 runs). This simulation study was conducted to identify how the observed trends shown in Figs. 2(g) and 2(h) would change when the MNN learns a greater diversity of random behavior pairs. The results are provided in Fig. 4. The plot shown in Fig. 4(a) provides the average lowest MSE of all 400 runs for each of the six algorithms plotted on top of each other against the number of iterations using a log-log scale. Figure 4(b) provides the final lowest MSE determined for all 400 runs performed by each algorithm, plotted against the final number of iterations at which the algorithms converged. Note that compared with Figs. 2(g) and 2(h), Figs. 4(a) and 4(b) show remarkably similar and consistent trends. The most noticeable difference is observed in the performance of the PP algorithm. The data points corresponding to each of the PP algorithm runs in Fig. 4(b) show even more consistency in the final MSE identified with even more variability in the final number of iterations required to converge on a solution compared with the results shown in Fig. 2(h).

(a) The average MSE generated by all 6 algorithms simulated with 2 different runs for 100 different random behavior pairs and 2 different initial conditions plotted on top of each other. (b) The final MSE plotted against the final number of iterations for each algorithm's 400 runs.
4 Experimental Studies
Although the fabricated MNN of Fig. 1(c) is not capable of generating learning data as rapidly as the simulation tool, it was able to generate enough data to compare the learning capabilities of the six algorithms and validate the results of the simulation studies. The experimentally collected results differ from the simulated results in that force scaling [24] was not applied during the learning process and because the experiments include many effects that the simulation tool does not model (e.g., large-deformation nonlinearities, dynamic effects, and thermal effects). Additional photos and details regarding the experimental setup of the MNN used to conduct the studies of this section are provided in Ref. [24].
The physical MNN of Fig. 1(c) was trained to learn two different sets of two random behaviors with one initial condition for each of the six learning algorithms during a six-month study (i.e., a total of two runs per algorithm). The results are provided in Fig. 5. The plot shown in Fig. 5(a) provides the average lowest MSE of the two runs for each of the six algorithms plotted on top of each other against the number of iterations using a log-log scale. Figure 5(b) provides the final lowest MSE determined for each of the two runs performed by each algorithm, plotted against the final number of iterations at which the algorithms converged. Note that although the experimental results do not match the simulated results of Figs. 2(g) and 2(h) as closely as those results match the simulated results of Figs. 4(a) and 4(b), the general trends do match. Although the IP and SQ algorithms were the fastest, they did not learn with sufficient accuracy to use in practical MNN applications. At the other extreme, although the GA and PP algorithms learned with the most impressive accuracy, they required an impractical amount of time to learn (the GA algorithm occupied most of the six-month study). Thus, for practical applications, the NM and FP algorithms were able to learn sufficiently quickly and accurately and are thus most suited for MNN learning in general. Furthermore, note that the NM and FP algorithms seemed to learn faster in the real MNN.

(a) The average MSE generated by all six algorithms simulated for two different random behavior pairs and one initial condition plotted on top of each other. (b) The final MSE plotted against the final number of iterations for each algorithm's two runs.
To illustrate the correlation between the learning trials of the simulated and experimental data, the bar chart shown in Fig. 6 was generated. The simulation values of this chart were calculated by dividing the final average MSE achieved by each algorithm in the simulation plot shown in Fig. 4(a) by the initial average MSE of the corresponding algorithm from the same plot. The experimental values of this chart were similarly calculated by dividing the final average MSE achieved by each algorithm in the experimental plot shown in Fig. 5(a) by the initial average MSE of the corresponding algorithm from the same plot. Note from Fig. 6 that although all of the final-to-initial MSE ratios of the experimental data are consistently larger than the corresponding ratios of the simulation data, the relative performance of the experimental and simulation approaches is similar for each algorithm, thus validating the results of the simulation approach. The fact that the experimental ratios are consistently larger than the corresponding simulation ratios indicates that, on average, the simulated MNN learned with greater accuracy than the fabricated MNN. These ratios would likely become more similar if similar numbers of runs could be performed experimentally as were performed in simulation. Recall that the simulation data shown in Fig. 6 are derived from 400 runs per algorithm, whereas the experimental data shown in Fig. 6 are derived from only two runs per algorithm.
5 Conclusion
In this article, we compared the efficiency (i.e., speed and accuracy) of six different learning algorithms applied to MNNs that attempted to learn behaviors via simulation and experimentation. We found that although fast, Lagrangian methods like SQ programming and IP did not learn with sufficient accuracy (i.e., they converged to unacceptably high MSE). The GA learned with the highest accuracy but required an unreasonably long amount of time to learn. PP search also tended to require too much time and did not perform as accurately as the GA. The most promising algorithms for MNN learning were found to be the FP and NM algorithms since they learned with impressive accuracy and in short enough time to be practical. On average, the NM algorithm is a bit faster than FP and is by far the most resistant algorithm to MNN system noise. Thus, the NM algorithm appears to be the most suited for practical MNN learning applications.
Acknowledgment
We thank program officer Byung “Les” Lee for his generous support.
Funding Data
Air Force Office of Scientific Research (AFOSR) FA9550-18-1-0459 (to J.B.H.).
Air Force Office of Scientific Research (AFOSR) FA9550-22-1-0008 (to J.B.H.).
Conflict of Interest
There are no conflicts of interest.
Data Availability Statement
The datasets generated and supporting the findings of this article are obtainable from the corresponding author upon reasonable request.