We compare the performance of human players against that of the efficient global optimization (EGO) algorithm for an NP-complete powertrain design and control problem. Specifically, we cast this optimization problem as an online competition and collected 2391 game plays by 124 anonymous players during the first month after launch. We found that while only a small portion of human players can outperform the algorithm in the long term, players tend to formulate good heuristics early on that can be used to constrain the solution space. Constraining the search in this way enhances algorithm efficiency, even for different game settings. These findings indicate that human-assisted computational searches are promising for solving comprehensible yet computationally hard optimal design and control problems when human players can outperform the algorithm in the short term.

## Introduction

Optimal design problems can still challenge our computational ability to solve them. A case in point is the design of an electric vehicle combining topology (configuration) design, proportional design, and control design for overall system optimization [1]. This problem is nondeterministic polynomial time (NP) complete, not amenable to an all-in-one solution and with a disjoint feasible domain. Other computationally hard design problems include material synthesis, drug design, and mechanical, electrical, and structural topology design [2–7].

More recently, “human computation” [8] has been reported as a promising alternative to solving tough optimization problems. A seminal experiment was the Foldit [9] game, which gathers large numbers of nonexpert players (a “crowd”) to perform protein structure prediction by minimizing the protein's energy with spatial rearrangement of its structure. Since its launch in 2009, Foldit has attracted 300,000 players and has shown an advantage of human spatial reasoning for pursuing near-optimal solutions [10]. A follow-up study showed that the most popular folding strategy derived from the crowd is comparable in performance to an expert-developed computer algorithm for protein folding [11]. Inspired by this success, Lee et al. tackled the challenge of ribonucleic acid (RNA) synthesis, a combinatorial optimization problem [12], with the eteRNA experiment [13] showing that a human crowd with no knowledge of the underlying science improved their problem solving skills, outperformed existing computer algorithms, and even contributed knowledge to the creation of a more effective algorithm. Le Bras et al. proposed a way to expedite search in combinatorial problems by identifying the optimal values for a subset of variables through human–computer interactions [14]. They demonstrated that a material discovery problem can be cast as an online game solved visually by human players. The Phylo game is another example that utilized human pattern recognition capability in solving an NP-hard optimization problem called multiple sequence alignment, where players are set to identify similar genome sequences of animals by moving colored blocks around [15].

These results indicate the potential of using human intuition (sampling good design solutions [10,12] and recognizing visual patterns [14,15]) and human intelligence (learning and generating design rules [11,12]), for solving challenging optimization problems or enhancing numerical optimization solvers. Gamification, i.e., using games purposefully [16], is one way to engage large numbers of human participants in such tasks. The above studies do not provide analysis on whether investment in gamification is more cost-effective than improving existing computer algorithms. However, it appears that such “citizen science” can be valuable when (a) a group of trained human solvers can be gathered and maintained, and (b) efficient solution of a *large number* of problems of similar mathematical nature is desired. At this point, the implementations of these existing games are case-dependent, and whether their individual success can be replicated in solving other problems remains an open question [17].

These successes and doubts regarding human computation motivated this paper. How much can gamification offer in solving computationally hard design optimization problems?

This paper investigates the potential benefits of using human crowd computation through games designed to address the aforementioned electric vehicle design problem. The basic idea is to “learn” (in computer science terminology) the problem's constrained solution space from successful and failed player attempts, and to search adaptively for a near-optimal solution within this constrained space. This idea is an extension of apprenticeship learning (see Ref. [18] for an example), where the machine apprentice follows or explores similar actions as the expert master.

The rest of the paper is structured as follows: We introduce the motivating problem of vehicle powertrain design and control in Sec. 2 and its gamification in Sec. 3. In Secs. 4 and 5, we look into computational solutions to the game problem without and with information from human players, respectively. Section 6 compares the performance of human players against the computational solution, and discusses the use of human computation as well as some technical details of the experiment. We conclude with Sec. 7.

## Motivating Problem

A growing number of civil [19–22] and military [23–25] vehicle applications are being considered for electrification due to ever restrictive emission reduction and energy security requirements. With different driving conditions (speed and power demands), vehicle optimal powertrain design and control strategies can be quite different [26]. Therefore, efficient design automation is desired to identify optimal solutions (i.e., the powertrain design and the control strategy) given the input vehicle specifications and driving conditions.

Fathy et al. [27] and Peters et al. [28] investigated the combined design and control problem, and showed that ignoring the coupling between design and control yields suboptimal solutions. One approach to handling the coupling is to formulate a single all-in-one optimization problem that includes both design and control [29,30]; another approach is to nest optimal control within the design problem. This latter approach is more commonly used in hybrid electric vehicle configuration studies [1,31–33], where existing optimal control algorithms such as dynamic programming (DP) [21,34], Pontryagin's minimum principle (PMP) [35,36], and equivalent consumption minimization strategy (ECMS) [37] can be applied. The DP method finds the control strategy with globally optimal energy efficiency by discretizing the state space and solving the resulting shortest-path problem; PMP and ECMS have lower computational cost but, in general, find only locally optimal solutions, and they require some model simplifications to guarantee global optimality [36]. The computational cost of DP grows exponentially with the number of state variables [38]. The search for the optimal powertrain design and control policy can therefore become expensive, especially when a large variety of topology design candidates exist.

Optimal control algorithms commonly used for vehicle control require full information of the mechanical and electrical behavior (models), as well as future driving conditions, i.e., probability distributions of speed and power demand. When this information is (partially) unknown, learning mechanisms are required to guide the control strategy. In particular, when repeated trials are allowed, such as during the design phase, apprenticeship (imitation) learning algorithms [18,39] are developed for the controller to follow sample trajectories demonstrated by human experts. This paper is different from existing reinforcement learning studies in two aspects: (1) While existing work either utilizes an existing human expert [39] or learns a strategy (perhaps implicitly through a reward function) natural to human beings [18], here we examine a problem that is hard for individual players and is solved through crowd competition; (2) rather than developing a learning algorithm, here we focus on understanding the efficacy of human computation.

## The EcoRacer Game

We introduce the ecoRacer game [40], which translates the powertrain design and control problem into a crowd competition. The game asks players to drive an electric car through a given track. Unlike traditional racing games that compete on speed, players must spend as little energy as possible while completing the track within 36 s.^{2} To achieve this goal, players must employ good control strategies in combination with the selection of a final drive ratio, which is a design variable. The final drive ratio ranges between 10 and 40. A higher ratio results in a larger output torque at the front wheels but a lower maximum vehicle speed. After each completion of the track, the battery energy consumption is submitted as the player's score, and players are notified of their current ranking among all competitors. The leading scores, along with their final drive ratios, are also shown to the players. These features follow popular existing games, and we hypothesize that they will trigger competition among players, leading to more plays of the game. However, this hypothesis is not tested in this paper. Game features are summarized in Fig. 1.

The control involves only acceleration and braking, so that the game is intuitive and easy to play on mobile platforms and for all audiences, including those with no driving experience. The controls are presented to the player at the beginning of the game. Energy consumption (regeneration) by acceleration (braking) is visualized on a battery bar. The Chipmunk 2D physics engine [41] is used to model the track, car body, motor, wheels, and suspensions, with a supplied motor efficiency map, maximum and minimum torque curves, and other vehicle parameters. The physics engine simulates the scene once every 1/48 s, allowing precise control of the car by the player. On mobile devices with lower computing capability, this high-frequency simulation may lead to slower car movement. If the clock counted in real time, players on these devices would be at a disadvantage. To address this issue, we define “one second” in the game as the time spent on completing 48 simulation calls.
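The step-counted clock can be illustrated with a short sketch. This is not the game's actual code: the class `GameClock` and its methods are hypothetical names, and only the rate of 48 simulation calls per game second and the 36 s limit come from the text.

```python
STEPS_PER_GAME_SECOND = 48  # from the text: one physics step per 1/48 s


class GameClock:
    """Measures game time in completed physics steps (hypothetical helper)."""

    def __init__(self, time_limit_s: float) -> None:
        self.time_limit_s = time_limit_s
        self.steps = 0

    def tick(self) -> None:
        """Call once per completed physics step, however long it took to compute."""
        self.steps += 1

    @property
    def elapsed_s(self) -> float:
        return self.steps / STEPS_PER_GAME_SECOND

    @property
    def time_is_up(self) -> bool:
        return self.elapsed_s >= self.time_limit_s


clock = GameClock(time_limit_s=36.0)
for _ in range(48 * 10):  # 480 steps, whether the device is fast or slow
    clock.tick()
print(clock.elapsed_s)  # 10.0
```

Because the clock advances only when a simulation step completes, a slow device stretches the wall-clock duration of a play but not its game-time duration.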

## Computational Solutions of the EcoRacer Game

In order to benchmark the performance of human players, this section provides two computational solutions to the ecoRacer game. The first considers the game as a nested optimization problem. The outer loop searches for an optimal final drive ratio, while the inner loop solves a DP problem to find the optimal control policy for the given design. Through a nearly exhaustive search of the discrete control and design space, we can obtain a solution close enough to the true global optimum. We should emphasize that this approach requires all models and parameters that constitute the game to be known. In contrast, the second approach treats the game as a black-box function that inputs control and design variables and outputs a score. The black-box function is learned based on existing trials through metamodeling (response surfaces). This latter approach is closer to how we usually solve optimization problems where simulations or experiments are involved; it is also a fair comparison to human players as both the machine and humans learn and improve through playing the game.

### Optimal Design and Planning With Full Knowledge.

Let $P_{\mathrm{batt}}$ be the instantaneous battery power consumption, $\omega_{\mathrm{mot}}(t)$ and $T_{\mathrm{mot}}(t)$ the motor speed and torque at time $t$, $v$ the vehicle speed, and $a$ the vehicle acceleration. Also, denote by $t_f$ the time spent on finishing the race, $x(t_f)$ the distance covered at $t_f$, $x_{\mathrm{final}}$ the total track length, $v_{\mathrm{max}}$ the maximum speed limit, and $a_{\mathrm{max}}$ and $a_{\mathrm{min}}$ the limits on the vehicle acceleration. The objective is to minimize the battery consumption, denoted by $E_{\mathrm{batt}}$, with respect to the final drive ratio $\rho$ and the control $T_{\mathrm{mot}}(t)$, while completing the race within the time limit $t_{\mathrm{max}}$. Note that $t_f$ is not a free variable; it depends on the decision variables $\rho$ and $T_{\mathrm{mot}}(t)$. This minimization problem can be formulated as follows:

where $R_{\mathrm{tire}}$ is the wheel radius, $J_{\mathrm{wheel}}$ is the wheel inertia, $M_{\mathrm{veh}}$ is the vehicle mass, and $F_{\mathrm{res}}$ is the resistive force, including both road resistance and extra load from road inclination.
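One plausible way to write a problem of this kind, using only the symbols defined in the text, is sketched below. It is offered as an assumption for orientation, not as a reproduction of Eq. (1): in particular, the lumped longitudinal dynamics in the last two lines (a single effective wheel inertia, no drivetrain losses) are a standard textbook simplification and may differ from the model used in the game.

```latex
\begin{aligned}
\min_{\rho,\; T_{\mathrm{mot}}(t)} \quad
  & E_{\mathrm{batt}} = \int_{0}^{t_f}
    P_{\mathrm{batt}}\big(\omega_{\mathrm{mot}}(t),\, T_{\mathrm{mot}}(t)\big)\, dt \\
\text{subject to} \quad
  & x(t_f) = x_{\mathrm{final}}, \qquad t_f \le t_{\mathrm{max}}, \\
  & 0 \le v(t) \le v_{\mathrm{max}}, \qquad
    a_{\mathrm{min}} \le a(t) \le a_{\mathrm{max}}, \\
  & \omega_{\mathrm{mot}}(t) = \frac{\rho\, v(t)}{R_{\mathrm{tire}}}, \\
  & \Big( M_{\mathrm{veh}} + \frac{J_{\mathrm{wheel}}}{R_{\mathrm{tire}}^{2}} \Big)\, a(t)
    = \frac{\rho\, T_{\mathrm{mot}}(t)}{R_{\mathrm{tire}}} - F_{\mathrm{res}} .
\end{aligned}
```

Under this reading, $\rho$ enters both through the motor operating point (speed and torque) and through the traction force, which is why the design and control decisions are coupled.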

We solve the energy minimization problem in Eq. (1) using a nested formulation described below.

#### Inner Loop Control Problem.

The track is discretized into $N = 180$ equal steps, each corresponding to 5 m. This setting provides a sufficiently accurate solution while keeping a single call of the DP algorithm computationally affordable. We use the horizontal position and speed of the vehicle as state variables, and the change of speed, $\Delta v$, as the decision variable. At each step $i$, $\Delta v$ can take three values that correspond to the three actions players can take during the game: $\Delta v_{\mathrm{max},i}$ for acceleration, $\Delta v_{\mathrm{min},i}$ for braking, and 0 for no action. The values of $\Delta v_{\mathrm{max},i}$ and $\Delta v_{\mathrm{min},i}$ are determined by the motor torque limits based on the current state. To incorporate the time constraint, we first calculate the time cost along with the energy consumption for each decision at every state, and then treat the violation of the total time limit as a penalty in the DP objective. With these assumptions, the DP formulation of Eq. (1) for a given $\rho$ becomes

where $\lambda \geq 0$ is the penalty weight. A larger $\lambda$ results in a smaller $t_f$. We use the bisection method to find the minimum $\lambda$ that satisfies the time constraint. Note that this requires solving the DP problem multiple times.
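The bisection on the penalty weight can be sketched as follows. This is a minimal illustration under stated assumptions: `finish_time` is a hypothetical stand-in for one full DP solve that returns the finishing time of the resulting policy, it is assumed to be non-increasing in the penalty weight, and the toy function and its numbers are invented for the example.

```python
from typing import Callable


def min_feasible_penalty(
    finish_time: Callable[[float], float],
    t_max: float,
    lam_hi: float = 1.0,
    tol: float = 1e-3,
    max_doublings: int = 60,
) -> float:
    """Return the smallest penalty weight (within tol) whose policy meets t_max.

    finish_time(lam) stands in for one DP solve with penalty weight lam and
    is assumed to be non-increasing in lam.
    """
    if finish_time(0.0) <= t_max:
        return 0.0  # time constraint inactive: no penalty needed

    # Grow the upper end of the bracket until it is feasible.
    for _ in range(max_doublings):
        if finish_time(lam_hi) <= t_max:
            break
        lam_hi *= 2.0
    else:
        raise ValueError("no penalty weight satisfies the time limit")

    lam_lo = 0.0  # known to violate the time limit
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if finish_time(mid) <= t_max:
            lam_hi = mid  # feasible: try a smaller penalty
        else:
            lam_lo = mid  # infeasible: a larger penalty is needed
    return lam_hi


# Toy stand-in for the DP solve (made-up numbers, not the game model).
def toy_finish_time(lam: float) -> float:
    return 30.0 + 14.4 / (1.0 + lam)


lam_star = min_feasible_penalty(toy_finish_time, t_max=36.0)
```

Each call to `finish_time` corresponds to one DP solve, so the tolerance `tol` directly trades accuracy in the penalty weight against the number of DP solves.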

#### Outer Loop Design Problem.

The outer loop design problem searches for an optimal final drive ratio that minimizes battery energy consumption. The problem is discretized since players are only allowed to choose from a discrete set of final drive ratios. Starting with three initial $\rho$ values, we fit a quadratic function to approximate the energy consumption and iteratively minimize this function. Convergence is declared when two subsequent iterations give the same optimal solution. For $\rho > 20$, no $\lambda$ value yields a control solution that satisfies the time constraint, because the high final drive ratio limits the maximum vehicle speed. The optimal design $\rho^{*} = 18$ is identified with 47.8% battery state of charge (SOC) upon finishing the track. Depending on the initial set of final drive ratios selected for the outer loop, the inner loop is called four to seven times, with each call taking 1.7 to 2 hr to converge to a $\lambda$ that satisfies the time constraint for the given final drive ratio.^{3} Figure 2 summarizes the optimization results. As validation, we tested the same strategy using the game engine and obtained a final battery SOC of 43.8%. The difference between the game engine and the DP calculation is due to (1) the discrepancy between the physics engine and the DP model, e.g., vehicle jumping could happen in the game but is not modeled in DP, and (2) the discretization scheme involved in the DP solution. A finer discretization will reduce the difference at the cost of increased computation time.
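The outer loop can be sketched as a successive quadratic fit over a discrete set of final drive ratios. This is an illustration under stated assumptions, not the procedure used to produce the reported results: fitting the parabola through the three lowest-energy designs found so far is one possible choice, `energy` is a hypothetical stand-in for a full inner-loop solve, and the toy curve is invented with its minimum placed at $\rho = 18$.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple


def fit_parabola(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Coefficients (a, b) of y = a*x^2 + b*x + c through three points."""
    (x1, y1), (x2, y2), (x3, y3) = points
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    return a, b


def search_final_drive(
    energy: Callable[[int], float],
    candidates: Iterable[int],
    initial: Sequence[int],
    max_iter: int = 20,
) -> Tuple[int, Dict[int, float]]:
    """Iteratively minimize a quadratic fit of energy over discrete designs."""
    candidates = list(candidates)
    evaluated: Dict[int, float] = {rho: energy(rho) for rho in initial}
    previous = None
    for _ in range(max_iter):
        # Fit through the three lowest-energy designs found so far (assumption).
        best_three = sorted(evaluated.items(), key=lambda kv: kv[1])[:3]
        a, b = fit_parabola(best_three)
        if a <= 0.0:
            break  # the fit has no interior minimum
        vertex = -b / (2.0 * a)
        proposal = min(candidates, key=lambda rho: abs(rho - vertex))
        if proposal == previous:
            break  # two subsequent iterations give the same minimizer
        if proposal not in evaluated:
            evaluated[proposal] = energy(proposal)  # one inner-loop solve
        previous = proposal
    best = min(evaluated, key=evaluated.get)
    return best, evaluated


# Toy energy curve (made-up), restricted to designs that can finish in time.
def toy_energy(rho: int) -> float:
    return 52.2 + 0.05 * (rho - 18) ** 2


best_rho, evaluated = search_final_drive(toy_energy, range(10, 21), (10, 14, 20))
```

Each new entry in `evaluated` costs one inner-loop solve, so the number of entries is the quantity that matters for run time.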

### Optimal Design and Control Through EGO.

We now discuss the EGO algorithm that iteratively learns a good design and control strategy without relying on the settings of the game.

#### Control Parameters.

Each play can be represented by the control signal $c$, along with the following four states: (1) track slope $s$ (“1” for uphill, “−1” for downhill, and “0” for flat ground), (2) remaining distance $d$, (3) remaining time $\bar{t}$, and (4) vehicle speed $v$. We assume that the control strategy can be parameterized by some vector $\mathbf{w}$, so that, given $\mathbf{w}$, the control signal can be determined by the states

$$c = u(s, d, \bar{t}, v; \mathbf{w}),$$

where $u(\cdot)$ is a mapping from the joint space of states and control parameters to the control signal. Denote the space of $\mathbf{w}$ as $W$, which represents a subset of all control strategies that a human player can deploy. The formal determination of $u(\cdot)$ and $W$ using human data will be discussed in Sec. 5. For elaboration on the search algorithm, it suffices to consider $W$ as a bounded vector space and $u(\cdot)$ as a bounded function defined on $W$.

#### The Score.

Given the final drive ratio $\rho$ and control parameters $\mathbf{w}$, the game can be simulated to output the battery charge consumed, denoted by $e$ where $0 \leq e \leq 1$, and the remaining distance from the terminal, denoted by $d_{\mathrm{end}}$. We define the following score as the objective to maximize:

where $\mathbb{1}(\cdot)$ is an indicator function that returns 1 when its argument is *true*, or 0 otherwise. A successful play, where $d_{\mathrm{end}} = 0$, outputs the final SOC, $(1 - e)$, while a failed play outputs the negative remaining distance. This objective favors the completion of the track while still distinguishing among failures by how far they fall short.
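The verbal description of the score can be read as a piecewise function. The sketch below encodes that reading; the function name `score`, the input checks, and the use of an exact zero to detect completion are assumptions made for the example.

```python
def score(e: float, d_end: float) -> float:
    """Score of one simulated play.

    e     -- fraction of battery charge consumed, between 0 and 1
    d_end -- remaining distance to the terminal (0 when the track is finished)
    """
    if not 0.0 <= e <= 1.0:
        raise ValueError("e must lie in [0, 1]")
    if d_end < 0.0:
        raise ValueError("d_end must be non-negative")
    if d_end == 0.0:
        return 1.0 - e  # finished: final state of charge
    return -d_end  # failed: negative remaining distance


finished = score(e=0.568, d_end=0.0)  # about 0.432
failed_near = score(e=1.0, d_end=5.0)  # -5.0
failed_far = score(e=1.0, d_end=120.0)  # -120.0
```

Any finished play scores at least 0, and among failed plays the one that stops closer to the terminal scores higher, which gives the search a gradient-like signal even before the first success.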

#### The EGO Algorithm.

We can now employ the EGO algorithm, a search routine suitable for optimizing a black-box function defined on a continuous and bounded design space [42]. Denote the solution space as $S := W \times D$, where $D = [10, 40]$ is the one-dimensional design space. The algorithm starts by sampling $S$. It then creates a kriging model using the initial samples and their responses. The next sample is chosen to maximize the expected improvement of the objective. Then, the kriging model is updated by incorporating the new sample and its response. The modeling and sampling procedure is repeated until some termination criterion is met. Figure 3 summarizes the EGO procedure. The nested maximum expected improvement problem is solved using a genetic algorithm.
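For a maximization problem, the expected improvement at a candidate point has a standard closed form once the kriging model supplies a predictive mean and standard deviation there. The sketch below shows only this acquisition step; the kriging fit and the genetic algorithm that maximizes the criterion are not shown, and the function and argument names are chosen for the example.

```python
import math


def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """Expected improvement over f_best for a maximization problem.

    mu, sigma -- predictive mean and standard deviation at the candidate
    f_best    -- best score observed so far
    """
    if sigma <= 0.0:
        return max(mu - f_best, 0.0)  # no uncertainty: improvement is known
    z = (mu - f_best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - f_best) * cdf + sigma * pdf


# A point predicted slightly below the incumbent but uncertain can still be
# more attractive than a point known with certainty to be slightly worse.
ei_uncertain = expected_improvement(mu=0.40, sigma=0.10, f_best=0.43)
ei_certain = expected_improvement(mu=0.42, sigma=0.0, f_best=0.43)
```

The first term rewards candidates predicted to beat the incumbent (exploitation) and the second rewards predictive uncertainty (exploration), which is the balance referred to in the text.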

## Augmented EGO Search Using Human Plays

We now discuss parameterization of human plays and how the EGO algorithm can be improved based on these plays.

### Parameterization of Human Plays.

Recall that each play can be represented as a sequence of five-tuples with control signals and corresponding states. To avoid overburdening the server, we collect these five-tuples at each unit distance during the play. We denote the total number of unit distances by $N$ and each recorded play as $\{c_i, s_i, d_i, \bar{t}_i, v_i\}$, for $i = 1, \ldots, N$.^{4}

The vector $\mathbf{w}$ represents the control strategy to be optimized along with the final drive ratio. For optimization purposes, it is ideal to find a low-dimensional $W$ (and thus $S$) in which a $\mathbf{w}$ can be found that reproduces the recorded control signals according to Eq. (4). In practice, we manually tune the definition of $u(\cdot)$ so that replaying the best human play, i.e., the one with the highest recorded score, using the converted control parameters yields a score close to the original one. This manual process leads to the following choice of $u(\cdot)$:

The control parameters $\mathbf{w}$ are chosen by minimizing the discrepancy between the true control signals and those derived from Eq. (4).

We then replay all human plays using the converted control parameters along with their corresponding final drive ratios to update scores.

Note that instead of handcrafting the function $u(\xb7)$, we could also use a neural network to learn the mapping between states and control signals, and later consider network parameters as control parameters. However, an arbitrarily defined network could introduce a larger number of control parameters to achieve a good fit to the data, and thereby create a larger search space for the EGO algorithm. For the same reason, a nonlinear kernel (with infinite dimensions) is not used.

In addition, the EGO algorithm requires bounds to be specified for the space of control parameters. Here we identified that the control parameters of the best human play are bounded in $[-3, 3]$. Therefore, setting $W := [-3, 3]^{9}$ ensures that the best human solution is available to EGO.

### Classification of Human Solutions.

The parameterization step described above leads to a data set describing all human plays and their corresponding scores. To extract knowledge from these data, we create a one-class classifier for plays with positive scores. The classifier, denoted as $\varphi([\rho, \mathbf{w}])$, predicts whether the input solution is likely to succeed ($\varphi > 0$) or fail ($\varphi \leq 0$), and can be used as a constraint during the search. The classifier is built using LIBSVM [43] with a Gaussian kernel. The kernel parameter $\gamma$ is set to 0.1 and the training error parameter $\nu$ to $10^{-6}$. Our rationale for training the classifier with all plays with positive scores, rather than the few with top scores, is that the former represent a broader range of strategies that *finish* the track. The resultant classifier thus represents a more relaxed constraint on the solution space than one derived from only the top plays, offering solution strategies that could be more generally applicable to the ecoRacer game under settings different from the current one. To empirically demonstrate the knowledge learned by the classifier from human plays, we examine two basic control policies that should apply universally to all game settings: (A) If the vehicle has zero speed when going uphill, it should accelerate in order to complete the track; and (B) if the vehicle is approaching the terminal at a high speed, it should recover energy by braking.

To verify that $\varphi > 0$ captures these two rules, we uniformly draw $10^{6}$ samples from $S$. For each sample solution, we calculate its control signals under a variety of states specified in Fig. 4. For instance, for rule A, we fix the vehicle speed to zero and the slope to “uphill” while sweeping the other two states across all their feasible values. We consider each solution–state pair as a test point. The percentage of test points that give the correct control signal according to the two rules can be calculated among all samples or among the ones that satisfy $\varphi > 0$. Results summarized in Fig. 4 show that incorporating $\varphi > 0$ into the search offers a significantly higher chance of sampling reasonable solutions. The set $S$ contains a large number of irrational control strategies, e.g., braking while running out of time, or depleting energy to rush through the track. According to the one million samples, the subspace of $\varphi > 0$ accounts for only 0.02% of the entire $S$. We note that while certain heuristics (e.g., rule A) can be manually coded into the search by the algorithm designer, others cannot. Taking rule B as an example, while human players would decelerate toward the terminal to regenerate energy, the relationship between the timing of braking and the vehicle states (e.g., speed and distance from the terminal) is difficult to formulate explicitly, especially when neither the environment nor the vehicle model is known.
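Once trained, a one-class support vector machine with a Gaussian kernel is evaluated as a weighted sum of kernel values minus an offset, $\varphi(\mathbf{x}) = \sum_i \alpha_i \exp(-\gamma \lVert \mathbf{x} - \mathbf{x}_i \rVert^{2}) - b$. The sketch below evaluates this standard form; training (finding the support vectors, weights, and offset) is left to LIBSVM and is not shown. The two-dimensional support vectors, the weights, and the offset are made-up values, and only $\gamma = 0.1$ comes from the text.

```python
import math
from typing import Sequence


def one_class_decision(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    alphas: Sequence[float],
    offset: float,
    gamma: float = 0.1,
) -> float:
    """Decision value of a trained one-class SVM with a Gaussian kernel.

    Positive values mean x resembles the training plays (likely to succeed).
    """
    total = 0.0
    for sv, alpha in zip(support_vectors, alphas):
        sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, sv))
        total += alpha * math.exp(-gamma * sq_dist)
    return total - offset


# Toy model in two dimensions (the paper's solutions [rho, w] have ten).
toy_svs = [(0.0, 0.0), (1.0, 0.0)]
toy_alphas = [0.5, 0.5]
toy_offset = 0.4

phi_near = one_class_decision((0.5, 0.0), toy_svs, toy_alphas, toy_offset)
phi_far = one_class_decision((10.0, 10.0), toy_svs, toy_alphas, toy_offset)
```

Far from every support vector all kernel values vanish and the decision value tends to the negative offset, which is why most of a large, uniformly sampled space is classified as infeasible.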

## Experiment and Results

We summarize the experiment with human players and compare their performance with that of the EGO algorithm. We then apply the classifier learned from human plays to the same energy optimization problems with different track settings.

### Comparison Between Human Players and the Algorithm.

The ecoRacer game was announced on Facebook and WeChat on October 14th, 2014, and introduced to a sophomore engineering design class at the University of Michigan on November 4th, 2014. The statistics and analysis in this paper are based on data received before November 26th, 2014. The development team was advised not to play the game during this period. A total of 124 unique participants registered and played the game, with 2391 plays recorded. The best play (by user “ikalyoncu” on his or her 150th trial) reached a score of 43.2%, only marginally worse than the DP solution of 43.8%. In addition, this player identified the actual optimal final drive ratio of $\rho^{*} = 18$. Statistics of the game are summarized in Fig. 5. Raw player inputs and corresponding scores can be accessed from^{5}.

In the same figure, we compare human performance, colored by player IDs, against that of EGO in the first 200 plays. The EGO implementation starts with five initial samples uniformly drawn from $S$ in the first iteration. Data from five independent EGO runs are collected because of the probabilistic nature of the initial sampling and of the genetic algorithm employed to solve the maximum expected improvement problem. For both human and EGO plays, we also plot the averaged best scores. For human players, the calculation at each iteration only considers those who reach that iteration. Lastly, since all failed human plays are stored with scores of zero, we assign zeros to failed EGO plays for a fair comparison.
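The averaged best score with dropouts can be computed as sketched below. This is a minimal illustration of the stated rule (at each play count, average the best-so-far score over the players who reached that count); the function name, the data layout, and the toy scores are assumptions for the example.

```python
from typing import Dict, List


def averaged_best_scores(plays_by_player: Dict[str, List[float]]) -> List[float]:
    """Average best-so-far score after k plays, over players with >= k plays.

    plays_by_player maps a player ID to that player's scores in play order.
    Assumes at least one player with at least one play.
    """
    longest = max(len(scores) for scores in plays_by_player.values())
    averages = []
    for k in range(1, longest + 1):
        bests = [
            max(scores[:k])
            for scores in plays_by_player.values()
            if len(scores) >= k  # players who quit earlier are excluded
        ]
        averages.append(sum(bests) / len(bests))
    return averages


# Toy data: player "b" quits after one play.
toy_plays = {"a": [0.0, 0.30, 0.25], "b": [0.10]}
curve = averaged_best_scores(toy_plays)  # approximately [0.05, 0.30, 0.30]
```

Because early quitters drop out of the average, the curve at large play counts describes only the dedicated players, which should be kept in mind when comparing it with the EGO curve.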

Major findings from the experiment are as follows: (1) One can see that most players quit early and thus are outperformed by the computer in the long run. Only three players have higher scores than the computer after 200 plays. In addition, the fact that only 41% of players (51 out of 124) played more than 10 times suggests that the game is either too hard or not interesting enough for the participants. (2) For dedicated players who kept playing, however, their progress along trials is evident, especially when compared with the EGO results: While the EGO algorithm is designed to balance exploration of the space and exploitation of observations, the ten-dimensional solution space as well as the rugged objective function prevented the algorithm from converging to a local solution in 200 iterations. Although the algorithmic parameters (such as the Gaussian parameter and the merit function form) can be tuned to improve EGO's performance on a specific problem, dedicated human players provided an alternative way to search with no parameter tuning. Note that the bimodal distribution in scores toward 200 plays, i.e., high scores and zeros, is due to the nature of the game that high scores are achieved when one can finish the track nearly on time by spending just enough energy. Our interpretation is that human players gradually learned this rule, trying to improve their scores while risking failing the game.

These data, along with feedback from some of the players, provide us the qualitative understanding that human players are capable of learning quickly and creating good solutions, but only a few can optimize their solution by precisely executing near-optimal control strategies.

### Search With and Without Crowdsourced Knowledge.

The comparison between the EGO algorithm and the players alone showed limited advantage of the latter in finding a good solution in the long run, largely because only a few players have the persistence needed to fine-tune their strategy by playing repeatedly. However, as we show in Fig. 6, the method introduced in Sec. 5.2 allows us to turn player data into a constraint in the solution space and to enhance the search for games that were not played by humans.

Specifically, we tested four different tracks to investigate whether the learned knowledge is transferable or scalable. These are the “inverse,” “hill,” “zigzag,” and “long” tracks. The inverse track is a flipped version of the original track. The long track duplicates the original track and the energy capacity five times. For each track, we run the EGO algorithm with and without enforcing the constraint $\varphi > 0$ during the search. In the computation, a penalty of $10^{6}\varphi$ is added to the score if the solution is infeasible ($\varphi < 0$). In each case, the algorithm starts with five initial samples and terminates after a total of 200 samples. Since $\varphi$ is created using a support vector machine, the initial samples are randomly chosen among the support vectors when the constraint is incorporated, or otherwise randomly sampled from $S$.
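The penalty described above can be written in a few lines. The function name `penalized_score` is hypothetical; the weight of $10^{6}$ and the rule that the penalty applies only when $\varphi < 0$ come from the text.

```python
PENALTY_WEIGHT = 1e6  # from the text


def penalized_score(raw_score: float, phi: float) -> float:
    """Score seen by the search when the classifier constraint is enforced."""
    if phi < 0.0:
        # phi is negative here, so adding PENALTY_WEIGHT * phi lowers the score.
        return raw_score + PENALTY_WEIGHT * phi
    return raw_score


inside = penalized_score(0.40, phi=0.2)  # 0.40, unchanged
outside = penalized_score(0.40, phi=-0.001)  # about -999.6
```

Because the penalty scales with the classifier output, solutions just outside the learned region are penalized less than solutions far from it, so the penalized objective still indicates the direction toward feasibility.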

Figure 6 compares performance from the two algorithm settings, using the average values and standard errors from five independent EGO runs for each case. The result shows that while EGO without human plays can identify good solutions in the long run, the algorithm can gain a significant early advantage using heuristics extracted from human plays. Further, knowledge learned from human players is transferable to different track settings and scalable to longer tracks, offering a promising solution to the “curse of dimensionality” [38].

One may naturally then question whether the EGO algorithm could learn from its own plays for future improvements and whether collecting all 2391 human plays is necessary to achieve good heuristics. While this paper does not address these questions theoretically, we used the long track to show that for the same small set of human and EGO plays, the former could lead to better future search; with enough plays, the two will have a similar contribution to future search. Figure 7 compares the EGO performance with four different constraints generated from (1) all human plays, (2) all EGO plays, (3) the first 500 human plays, and (4) the first 500 EGO plays, using the original track. The results show a significantly worse performance with the constraint learned from the first 500 EGO plays, while learning from all EGO plays achieves performance comparable to that from all human plays.

Lastly, learning from too small a set of human plays could diminish search performance. Applying a constraint learned from the first 100 human plays, the averaged score for the long track after 200 iterations is 39, significantly lower than that from the previous experiment (Fig. 6) with no human plays (60). A closer investigation shows that the constraint classifies all of the best solutions discovered from unconstrained EGO runs as infeasible. In other words, a small number of plays with a low success rate could result in a biased constraint that rejects potential solutions, leading to a search algorithm that is worse than one with no prior knowledge. To further verify, we created five classifiers $\hat{\varphi}_{100, 200, \ldots, 500}$ from the first 100 to 500 plays and checked whether they can correctly identify the best solutions (obtained from unconstrained EGO runs) for all tracks as feasible. Results are consistent across all tracks: the best solutions discovered from unconstrained EGO runs are identified as feasible only when 400 or more plays are considered.

To understand this finding better, we calculated the success rate of human players, i.e., the rate of finishing the track with nonzero remaining energy, for the first 100–500 plays. These rates are 10.0%, 19.0%, 29.3%, 42.3%, and 38.0%, respectively. We can then hypothesize that human plays become useful when the players achieve a certain level of success rate, i.e., discover good but not necessarily optimal solutions. However, the rate drops when new players enter with unsuccessful plays. With an increasing number of plays, we observe not only an improvement in the success rate in general, but also in the score. Therefore, determining the number of plays needed for generating a useful EGO constraint based on success rate could be conservative.
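The success rate over the first $n$ plays is a simple cumulative count. The sketch below shows the calculation on made-up scores; the function name and the toy data are assumptions, and a play counts as successful when its stored score is positive, following the convention that failed plays are stored as zeros.

```python
from typing import List, Sequence


def cumulative_success_rates(
    scores: Sequence[float], checkpoints: Sequence[int]
) -> List[float]:
    """Fraction of successful plays among the first n plays, for each n."""
    rates = []
    for n in checkpoints:
        if not 0 < n <= len(scores):
            raise ValueError("checkpoint must lie between 1 and len(scores)")
        successes = sum(1 for s in scores[:n] if s > 0.0)
        rates.append(successes / n)
    return rates


# Toy chronological scores: early failures, a successful streak, then new
# players arriving with failed plays.
toy_scores = [0.0, 0.0, 0.2, 0.3, 0.25, 0.1, 0.0, 0.0, 0.0, 0.0]
rates = cumulative_success_rates(toy_scores, checkpoints=(2, 6, 10))
```

In the toy data the rate rises from 0 to 4/6 and then falls to 4/10, mirroring how an influx of new, unsuccessful plays can lower the cumulative rate even though experienced players keep improving.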

### Discussion.

The experiment and simulation studies presented here help to answer questions regarding the use of human computation for engineering design.

*Why would humans be useful in an optimization task?* Human computation, often in the form of crowdsourcing, has mainly been successful in batch tasks that rely on human intuition rather than computational resources; see Ref. [8] for examples. Nonetheless, human computation becomes especially beneficial when intuitive tasks are actually computationally expensive. As demonstrated by Foldit and eteRNA, some human players have excellent ability at solving spatial optimization problems, as their search efficiency outperforms that of computer algorithms. The ecoRacer game, similarly, is based on the hypothesis that some players are well trained at tuning their control strategy and design through trial and error. The hypothesis is reasonable considering that people around the world spend $3 \times 10^{9}$ hr training themselves in online games every week [17], and that a large portion of these games require players to perfect their control. As shown in Fig. 5, some players learned good strategies much earlier than the algorithm did. However, the underlying optimization problem must be well translated (and camouflaged) by the game so that people can appreciate the problem and enjoy the fun without facing the actual engineering problem. This requirement leads to the next question.

*Is it worth expending the effort of translating any particular problem and even building a game for it, in the hope that some talented people could solve it?* Human computation experiments require significant effort to create the proper game environment and to establish sophisticated competition, collaboration, and reward mechanisms. These efforts are necessary for continuously growing the player population in order to increase the chance of finding talented players. The efforts are not likely to be worthwhile if the underlying problem needs to be solved only once. Given sufficient computational resources, the computer algorithm is likely to outperform human players either by iterating long enough (see Fig. 7 in Ref. [11] for an example) or by solving the problem via brute force, e.g., generating the entire tree of states and costs for DP$^{6}$. Protein folding [10], RNA synthesis [12], and powertrain design for vehicles of particular usage may be problems tough enough to require repeated solution and thus worthy of human computation investment. In addition, we should emphasize that extracting heuristics from human players and applying them to new problem settings, as demonstrated in Refs. [11] and [12] and in this study, could be of significant importance in solving problems facing the curse of dimensionality. As noted in Sec. 6.2, the machine can also learn from its own experience to effectively constrain its search in the long run. Therefore, human computation has an advantage when human players can outperform the algorithm in the short run. The short-term advantage of human search is that it may capture prior knowledge about the problem that a machine does not acquire. Future work should therefore investigate the extraction of such human prior knowledge from a limited number of effective solutions from players.

In summary, while we cannot yet compare the cost effectiveness of human computation versus optimization algorithms, human computation can be valuable for (i) problems comprehensible to people without specialized knowledge, with evidence suggesting that humans find solutions based on their experience that are not discovered by an algorithm; and (ii) problems that must be solved many times with different parameters, so that long-term gains can compensate for the initial game infrastructure investment.

*Given that players are more likely to spend their spare time on playing regular games than solving scientific or engineering puzzles, is human computation a realistic alternative?* For example, while Foldit attracted 300,000 players, there are hundreds of millions of active users of Angry Birds [17]. Given this reality, even if human computation is proven viable for solving computationally expensive design optimization problems, would there be enough players per problem for the strategy to be effective? Our experiment, as well as existing reports on human computation, showed that the problem can be efficiently solved by a small population of core players. In the case of ecoRacer, it was those players who contributed the successful plays. This finding implies that while the crowd attracted by any particular problem will not necessarily be large, these self-motivated participants may suffice to produce valuable data to expedite problem solving.

A related question is how the challenges of translating and camouflaging the problem would be addressed for design problems more complicated than ecoRacer. We believe that these challenges may not relate to the complexity of the design problem, e.g., large problem size, nonconvexity, or lack of models. Existing problems, such as those in Foldit, eteRNA, and this paper, are complicated (NP-complete or NP-hard), yet easily understood by most players. Rather, the challenge is to make the design problem naturally fun to play. Therefore, if the problem does not by itself attract players, a significant amount of effort may be needed to calibrate game mechanisms so that the crowd is incentivized. We have not yet studied whether such mechanisms are effective across problems, regardless of problem complexity.

Some technical issues of the presented ecoRacer experiment are discussed below:

- (1) From a reinforcement learning perspective, the EGO approach can be categorized as “Direct Policy Search” where the entire simulation is treated as a black-box. All comparisons between EGO and human plays are therefore limited to our current algorithm design. Other popular direct methods such as NEAT [44], as well as indirect algorithms such as actor-critic [45] should be tested and their integration with human play data should be investigated.

- (2) Converting user control signals to control parameters $\mathbf{w}$ caused discrepancy in the resultant scores. This is because the bases in the mapping $u(\cdot)$ are chosen so that the score derived from *the best* play's control parameters is close to its actual score. Therefore, the classifier $\varphi$ is not derived directly from the genuine player data. This discrepancy can be reduced by choosing a better set of bases for $u(\cdot)$ that closes the gaps between *each* play's original score and that from the converted control parameters.

- (3) We used a one-class classification that solely extracts knowledge from successful plays. An alternative approach would be to take failed plays into account and derive a binary classifier. It is also possible to create the classifier by utilizing the actual scores, as opposed to the binary “failed” and “successful” labels.

- (4) The motivation for showing the best scores as well as the drive ratios is to give players an incentive and some hints to further improve their solutions. We believe that such a game mechanism is necessary to motivate enough players to participate. However, this mechanism makes the human computation (the game) differ from the EGO algorithm: (i) the best scores allow players to measure how close their solutions are to the best, based on which they may either fine-tune their strategies or choose to explore new ones; (ii) the displayed drive ratio allows newcomers to quickly converge to a good drive ratio without much exploration, providing them with a good initial drive ratio for their search. Even with this information sharing, most players are still outplayed by the computer in the long run. In addition, we observe no convergence to the optimal drive ratio value in the first 500 game plays from the crowd. Therefore, the conclusions in Sec. 6.2 are valid even though the mechanisms of the game and the EGO algorithm are slightly different.
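
The difference between the one-class and binary alternatives in item (3) can be illustrated with a small sketch. The models below are deliberately simple stand-ins (a centroid-centered ball and a nearest-centroid rule) and are not the classifier $\varphi$ used in this study; the class names, the two-dimensional parameter vectors, and the data are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OneClassBall:
    """One-class stand-in: learns only from successful plays.

    Accepts any point inside the smallest ball, centered at the centroid
    of the successful plays, that contains all of them.
    """

    def fit(self, successful: List[Vector]) -> "OneClassBall":
        self.center = centroid(successful)
        self.radius = max(distance(p, self.center) for p in successful)
        return self

    def accepts(self, w: Vector) -> bool:
        return distance(w, self.center) <= self.radius


class NearestCentroid:
    """Binary stand-in: also uses failed plays.

    Accepts a point only if it is closer to the centroid of successful
    plays than to the centroid of failed plays.
    """

    def fit(self, successful: List[Vector],
            failed: List[Vector]) -> "NearestCentroid":
        self.c_success = centroid(successful)
        self.c_fail = centroid(failed)
        return self

    def accepts(self, w: Vector) -> bool:
        return distance(w, self.c_success) < distance(w, self.c_fail)


# Hypothetical control-parameter vectors extracted from recorded plays.
successful = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
failed = [[3.0, 1.0], [4.0, 1.0]]

one_class = OneClassBall().fit(successful)
binary = NearestCentroid().fit(successful, failed)

candidate = [2.4, 1.0]  # lies near the failed plays
print(one_class.accepts(candidate), binary.accepts(candidate))
```

In this toy setting the one-class model admits the candidate because it falls within the ball around the successful plays, whereas the binary model rejects it because it lies closer to the failed plays. Whether failed plays would tighten the search constraint in the same way for the actual game data remains to be tested.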

## Conclusion

We examined the value of incorporating game-based human computation in solving optimization problems in the context of an optimal powertrain design and control problem. The results showed that while only a small portion of human players outperformed the EGO algorithm, useful heuristics can be extracted from the recorded plays and effectively applied to problem settings that were not presented to the players. This indicates the promise of human computation in transferring knowledge learned from human-comprehensible problems to similar ones of larger scale or difficulty, to achieve scalable and effective search. The findings from this paper offer useful insights for future attempts at solving computationally expensive engineering optimization problems through human computation. In future work, it would be interesting to investigate whether useful information can be extracted from the evolution in human plays, and how decomposition strategies involving problem partitioning and coordination, which have been effectively applied to numerical solvers, can be incorporated within a human computation framework. With a larger crowd of participants, we could also investigate the relationship between the crowd size and its performance. Lastly, it would be valuable to explore game mechanisms that encourage players to improve their strategies in more efficient ways.

## Acknowledgment

This work has been supported by the National Science Foundation under Grant No. CMMI-1266184. This support is gratefully acknowledged. We thank anonymous reviewers for insightful critique that helped improve this presentation significantly. We also thank members of the Optimal Design Laboratory at the University of Michigan for suggestions and advice on game design and experimentation, and all our players for making this paper possible.

This time limit is calibrated empirically so that the game is challenging enough to encourage replays, but not so difficult as to drive new players away.

This simulation is performed using a desktop computer with Intel Xeon E5-2620 CPU clocked at 2.10 GHz and 128 GB RAM.

The scores of replays using these recorded data are not the same as the original scores from players. This is because, while the game takes player inputs once every 1/48 s, we only record a subset of these signals at discrete distance values.

While finding a shortest path can be done in polynomial time, the cost of generating the graph could grow exponentially with respect to the track length.
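
As a rough illustration of this growth, assume (hypothetically, since the discretization is not specified here) that the control takes one of $k \ge 2$ discrete values at each of $T$ decision steps along the track, and that the tree of states is enumerated without merging identical states. The number of nodes in the tree is then

$$
N(k, T) = \sum_{t=0}^{T} k^{t} = \frac{k^{T+1} - 1}{k - 1}.
$$

For example, $k = 3$ and $T = 20$ give $N = (3^{21} - 1)/2 = 5{,}230{,}176{,}601 \approx 5.2 \times 10^{9}$ nodes, while doubling the number of steps to $T = 40$ gives $N \approx 1.8 \times 10^{19}$.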