Crowdsourcing is the practice of getting ideas and solving problems using a large number of people on the Internet. It is gaining popularity for activities in the engineering design process ranging from concept generation to design evaluation. The outcomes of crowdsourcing contests depend on the decisions and actions of participants, which in turn depend on the nature of the problem and the contest. For effective use of crowdsourcing within engineering design, it is necessary to understand how the outcomes of crowdsourcing contests are affected by sponsor-related, contest-related, problem-related, and individual-related factors. To address this need, we employ existing game-theoretic models, empirical studies, and field data in a synergistic way using the theory of causal inference. The results suggest that participants' decisions to participate are negatively influenced by higher task complexity and lower reputation of sponsors. However, they are positively influenced by the number of prizes and higher allocation to prizes at higher levels. That is, an amount of money on any following prize generates higher participation than the same amount of money on the first prize. The contributions of the paper are: (a) a causal graph that encodes relationships among factors affecting crowdsourcing contests, derived from game-theoretic models and empirical studies, and (b) a quantification of the causal effects of these factors on the outcomes of GrabCAD, Cambridge, MA contests. The implications of these results on the design of future design crowdsourcing contests are discussed.
During the past decade, crowdsourcing has been widely used as a way to facilitate open innovation where ideas from outside an organization are sought to supplement internal activities. In crowdsourcing, a problem is outsourced to a large group of people, predominantly online communities. In a specific form of crowdsourcing, prizes are announced along with a problem, the crowd submits solutions to the problem, and the best solutions are awarded the prizes. This form of crowdsourcing is called a crowdsourcing contest.
There are many decisions to be made in setting up crowdsourcing contests, for example, selecting among (i) single-stage versus multistage tournaments, (ii) open entry versus restricted entry versus entry fee, (iii) a grand competition for the entire system versus multiple smaller competitions for subsystems, (iv) fixed prize versus auctions, (v) winner-takes-all versus multiple prizes, and (vi) alternate prize amounts and contest durations. These decisions affect the outcomes of contests, which can be measured in terms of the quality of solutions, the number of contributors, the amount of effort invested by participants, and the overall cost of running the contest. For example, a higher prize amount attracts greater participation, and participation by a large number of individuals indicates a greater probability that someone in the participant pool will be able to solve the problem.
Overall outcomes of contests are driven by individuals' decisions, such as whether to participate and how much effort and cost to invest. Individual decisions are, in turn, dependent on decisions made by other participants, and on various factors related to the problem, the design of the contest, the sponsor, and the individual. For instance, individuals are less likely to participate if many other experts are also participating, because the chances of winning are low [3,4]. Individuals are more likely to participate if the problem is simple, or within their area of expertise. Similarly, individuals are more likely to be motivated to participate if the contest is sponsored by a reputable organization. The interdependence between these factors and their effects on the outcomes are shown in Fig. 1.
Understanding the interdependence between these factors is essential for effective design of crowdsourcing contests. Thus, our objective in this paper is to understand how the outcomes of crowdsourcing contests are affected by sponsor, contest, problem, and individual-related factors.
Review of Relevant Research.
Research on crowdsourcing in the engineering design community has focused on demonstrating the benefits of crowdsourcing in design-related tasks (e.g., idea generation and evaluation), and developing ways to improve the effectiveness of crowdsourcing in design. For example, Burnap et al.  present a method to identify experts in a crowd for design evaluation tasks. Green et al.  use crowdsourcing to evaluate creativity of design using university students as subjects. Their results indicate that it is possible to evaluate originality reliably using a crowd of nonexperts. Gerth et al.  present a hypothesis that task factors and individual expertise affect the quality of solution. The hypothesis is tested using a multi-agent simulation for data generation and machine learning techniques for data analysis. Kudrowitz and Wallace  crowdsource a task of evaluating a large set of ideas on Amazon Mechanical Turk , where the ideas are evaluated based on metrics such as creativity, novelty, clarity, and usefulness. While these studies demonstrate the usefulness of crowdsourcing for engineering design tasks, they do not analyze the effects of different factors (including competition) affecting the outcomes of crowdsourcing.
Contests have been extensively studied in economic sciences using game theory [3,4,10] to analyze the effects of contest factors (e.g., prize, number of prizes, and entry fee) on participants' decisions. Considering individual participants as the actors in contests, game-theoretic models theoretically evaluate participants' decisions and their dependence on contest-related factors. For example, models by Taylor , Che and Gale , and Szymanski  are used to analyze the effects of different incentive structures on individuals' decisions, and the resulting outcome quality in innovation contests. Although game theoretic models of innovation contests are helpful in establishing the influence of contest-related factors on the outcomes, and can be adapted for engineering design, they are typically based on a number of assumptions about the problem type, information available to the individuals, and simplifications about the nature of the design process. As an example, in Ref. , the innovation process is modeled as a random draw from a known distribution. Depending on the assumptions, the predictions from the models can be substantially different (even contradictory, in some cases). Further, due to the complexity of interactions between different factors (as shown in Fig. 1), game-theoretic models typically fix all the factors, except for a small subset. Therefore, it is difficult for contest designers to determine which model is an appropriate representation of their specific situation.
Laboratory experiments and field studies are two commonly used methods for empirical research on crowdsourcing to observe participants' behaviors. In laboratory experiments, all aspects of the experiment setting, such as participants' identity and background, task and problem context, and duration, are controlled. The experimenter can also control the information provided to participants. In contrast, field studies are conducted on online platforms, and offer partial or no control over experiment settings. Participation in field studies is open to all subscribers, or to a group of subscribers, of the respective platforms.
Laboratory experiments have been used to assess the validity of predictions made by game theoretic models, where participants (typically, university students) are asked to solve well defined tasks [12–14]. All other factors are controlled in such a way that the correspondence between assumptions in the game-theoretic models and experiments is maintained. The advantage of using laboratory experiments is that they allow causal inferences by varying one factor at a time. However, the limitation (as with game-theoretic models) is that it is difficult to establish whether idealized settings used in experiments are representative of real engineering design tasks.
At the other end of the spectrum, field data from real crowdsourcing platforms are used to analyze the relationships among factors affecting crowdsourcing, and the resulting outcomes. For example, researchers [15–17] have observed the effects of various factors such as the nature of the problem and the associated uncertainty, participants' expertise, resource availability, and competition on the outcome quality and participation. Li et al. show that the length of problem specification and software complexity (measured in terms of inheritance, coupling, and cohesion) affect the software quality on the TopCoder.com platform. These studies, however, are restricted to either software development problems on TopCoder.com or microtasks for problem solving on Amazon Mechanical Turk. Using field data in research has additional methodological challenges including lack of control and lack of observability of information (e.g., effort invested by participants and cost incurred by them), which results in the inability to attribute the outcomes to specific factors. Because of the lack of control, field data, by itself, can only be used for correlation analysis but not for making causal inferences, which are necessary for understanding how outcomes are affected by various factors.
In summary, engineering design literature is focused on identifying ways to utilize crowdsourcing for design-related tasks but lacks the analysis of participants' decisions under competitive settings. On the other hand, literature on game-theoretic models and empirical studies has focused on analyzing decisions in contests but the relevance of assumptions to engineering design situations is not clearly established. Laboratory experiments provide limited insights because of limited representativeness of the tasks and subject pools, whereas analysis of field data is challenged by lack of control. In this paper, we address the lack of understanding about the relevance of game-theoretic models to design crowdsourcing, and investigate how different factors affect participants' decisions in real engineering design crowdsourcing contests.
In order to maintain representativeness of real engineering design crowdsourcing contests, we use field data from crowdsourcing contests. To address the challenge of lack of control, our approach is to use the theory of causal inference, which helps in addressing the trade-off between control and realism. The theory of causal inference has formal ways of handling confounding effects if the factors are explicitly modeled in the causal graph. Using this theory, we can quantify the impact of controlling different factors even from field data under assumed causal dependencies between such factors. Using this theory as a foundation, our approach in this paper consists of three steps: (i) evaluating the relevance of assumptions in game-theoretic models to engineering design to identify predicted relationships among factors, (ii) synthesizing the predictions from game-theoretic models and existing empirical studies (laboratory and field experiments) into a single causal graph, and (iii) applying field data from GrabCAD challenges to the hypothesized causal graph to quantify the impact of different factors.
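One standard way of handling a modeled confounder is back-door adjustment: the effect of a factor is computed within each level of the confounder and then averaged over the confounder's distribution. The following is a minimal sketch of that idea only, not the analysis performed in this paper. The variable names (sponsor reputation as confounder `z`, a high-prize indicator `x`, participant count `y`) and all numbers are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical records: (sponsor_reputation z, high_prize x, participants y).
# The numbers are invented for illustration only; they are not GrabCAD data.
records = [
    (0, 0, 10), (0, 0, 10), (0, 0, 10), (0, 1, 12),
    (1, 0, 20), (1, 1, 22), (1, 1, 22), (1, 1, 22),
]

def mean(values):
    return sum(values) / len(values)

def naive_effect(data):
    """Difference in mean outcome between x=1 and x=0, ignoring z."""
    treated = [y for _, x, y in data if x == 1]
    control = [y for _, x, y in data if x == 0]
    return mean(treated) - mean(control)

def adjusted_effect(data):
    """Back-door adjustment: average the within-stratum effect over P(z)."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for z, x, y in data:
        strata[z][x].append(y)
    effect = 0.0
    for groups in strata.values():
        weight = (len(groups[0]) + len(groups[1])) / len(data)
        effect += weight * (mean(groups[1]) - mean(groups[0]))
    return effect

naive = naive_effect(records)        # mixes the prize effect with reputation
adjusted = adjusted_effect(records)  # holds reputation fixed within strata
print(naive, adjusted)
```

In this toy dataset, high prizes are more common among high-reputation sponsors, so the naive comparison overstates the prize effect relative to the stratified estimate. The adjustment is only valid under the assumed causal structure, here that `z` is the only confounder.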
The outline of the paper is as follows: First, we contextualize the predictions about the effects of contest, problem, and individual factors for the engineering design context. The predictions from game-theoretic models are presented in Sec. 2, and the predictions from existing empirical studies are presented in Sec. 3. Using these predictions, we create a causal graph of the relationships between different factors. The causal graph is presented in Sec. 4. Finally, we employ the causal graph for testing a set of predictions on a dataset acquired from GrabCAD's crowdsourcing platform. The results of the analysis are presented in Sec. 5.
Predictions of Participants' Behaviors From Game-Theoretic Models
Game-theoretic analysis has been extensively performed in the economics and management literature to predict the outcomes of contests and to optimally design them [3,4,13,21]. The models generate theoretical predictions about participants' decisions in crowdsourcing contests. The models are based on the assumption that each participant is a rational decision maker who strives to maximize his/her expected payoff. Since contests involve multiple participants, the models incorporate beliefs about others' decisions, and uncertainty in receiving desired outcomes, in their formulation. Each participant best responds to others' decisions, leading to a Nash equilibrium. Decisions in Nash equilibrium are dependent on contest, problem, and individual factors.
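Formally, each participant i chooses his/her strategy to maximize the expected payoff

$$\max \; E[\pi_i] \qquad (1)$$

where $\pi_i$ is participant i's payoff and E denotes the expectation operator.
For a contest with a single prize Π, the expected payoff is

$$E[\pi_i] = \Pi \, f_i(q_i, q_{-i}) - C_i \qquad (2)$$

where Ci is the cost incurred by participant i, and participant i's probability of winning fi is a function of his/her own solution quality qi, as well as others' quality q–i (which are not known to i).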
Game-theoretic models are formed by using different assumptions about factors such as Π, fi, Ci, and qi. These factors can be related to characteristics of the contest, the problem, and the individuals, as highlighted in Table 1. In Sec. 2.1, we present typical assumptions used in such game-theoretic models of contests, and their relationships with the above factors.
Typical Assumptions in Game-Theoretic Models
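For a contest with M prizes Π1, ..., ΠM, the expected prize term Πfi in Eq. (2) becomes

$$\sum_{m=1}^{M} \Pi_m \, f_{i,m}$$

where fi,m is the probability of winning the mth prize, assuming that a participant can win at most one prize.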
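As a sketch of how the expected payoff combines prizes, winning probabilities, and cost, the function below computes the prize-weighted winning probabilities minus the cost, assuming that a participant can win at most one prize. The function name, argument names, and numeric values are hypothetical; a single-prize contest is the special case of a one-element prize list.

```python
def expected_payoff(prizes, win_probs, cost):
    """Expected payoff: sum_m prize_m * P(win prize m) - cost."""
    if len(prizes) != len(win_probs):
        raise ValueError("one winning probability is needed per prize")
    if sum(win_probs) > 1.0 + 1e-12:
        # At most one prize can be won, so the events are mutually exclusive.
        raise ValueError("winning probabilities cannot sum to more than 1")
    return sum(p * f for p, f in zip(prizes, win_probs)) - cost

# Hypothetical numbers, for illustration only.
single = expected_payoff([1000.0], [0.25], 100.0)
multi = expected_payoff([600.0, 400.0], [0.125, 0.25], 100.0)
print(single, multi)
```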
The cost Ci in Eq. (2) is a generic variable which includes all resources invested, e.g., monetary resources, human effort, computational cost, etc. It is dependent on how these resources are allocated and expended during the design process. The resources can either be used in a single period (referred to as a single-period process), or in a sequential manner (referred to as a multiperiod process). The distinction between single-period and multiperiod processes, illustrated in Fig. 2, is dependent on the number of resource usage steps until the final solution is picked for submission by the participant. In both processes, cost (or effort) is a strategic variable, whereas quality is the outcome.
In a single-period process, a decision maker first decides how many resources cs (money, computational or laboratory experiments) to invest, and then uses all of the investment for design. Participant i receives a quality outcome qi as a result of investing Ci = cs, or effort ei, in one step. A single-period process does not imply that a designer is only carrying out a single task. It only refers to how resources are allocated and used.
In a multiperiod process, a decision maker sequentially decides on resource usage after observing the outcome of each design step. Suppose at each step t, participant i incurs cost cm, and observes an outcome with quality $q_i^t$. Participant i performs ei steps sequentially. The solution picked for submission then has the best quality of all outcome qualities obtained during the multiperiod process ($q_i = \max_{t = 1, \ldots, e_i} q_i^t$). The total cost incurred in this process is a function of the number of steps ei. In its simplest form, if the same cost (cm) is incurred at each step, then the total cost is Ci = cm ei.
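The multiperiod process can be sketched as follows, assuming the simplest case described above: a constant cost per step, a fixed number of steps, and submission of the best outcome observed. The function name and the uniform quality distribution are hypothetical choices made only for illustration.

```python
import random

def multiperiod_process(draw_quality, c_m, e_i):
    """Run e_i sequential design steps; return (submitted quality, total cost)."""
    if e_i < 1:
        raise ValueError("at least one design step is required")
    outcomes = [draw_quality() for _ in range(e_i)]
    # The submitted solution is the best outcome seen; cost is linear in steps.
    return max(outcomes), c_m * e_i

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Assumed quality distribution: uniform on [0, 1).
q_i, C_i = multiperiod_process(rng.random, c_m=2.0, e_i=5)
print(q_i, C_i)
```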
In engineering design, a period may refer to an iteration where a designer creates a physical/virtual prototype and evaluates it. The assumption of a single-period process is typically used when creation of a prototype incurs small cost, and multiple prototypes can be created and evaluated in parallel. However, when each design prototype is expensive, it is necessary to learn the outcome before creating the next prototype. In such scenarios, the assumption of a multiperiod process is more applicable.
Quality of Outcomes.
Game-theoretic models make assumptions about the relationship between resources invested (Ci) and quality received (qi). This relationship can be either deterministic or stochastic. In deterministic quality models, a complete control over the quality from a design process is exercised by defining the quality as an explicit function of Ci. In other words, a desired quality qi can be achieved by investing resources given in terms of cost (qi = q(Ci)). On the other hand, the quality from a design process is stochastically dependent on effort or cost in stochastic quality models. The uncertainty in defining quality may be due to technical uncertainty associated with a design process or uncertain evaluation criteria used by the sponsor. In all these cases, for a given cost Ci, the outcome of the design process remains uncertain. The quality of an outcome is thus modeled as a stochastic variable qi = q(Ci) + ϵ where ϵ is a random variable. The assumption of deterministic quality is applicable to design tasks where the concepts are well understood and the evaluation criteria are well defined on a numerical scale (e.g., topology optimization of a bracket with the objective of maximizing strength-to-weight ratio), whereas stochastic quality is more appropriate in concept design tasks.
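The difference between the two quality models can be illustrated with a short sketch. The logarithmic form of q(Ci) and the normally distributed noise term are assumptions made here for illustration; the models discussed in this section use their own functional forms.

```python
import math
import random

def deterministic_quality(cost):
    """q_i = q(C_i): a hypothetical concave response, for illustration only."""
    return math.log1p(cost)

def stochastic_quality(cost, rng, noise_sd=0.5):
    """q_i = q(C_i) + eps, with eps ~ Normal(0, noise_sd) as an assumed noise model."""
    return deterministic_quality(cost) + rng.gauss(0.0, noise_sd)

rng = random.Random(42)
same_cost = 10.0
det = [deterministic_quality(same_cost) for _ in range(3)]    # identical values
sto = [stochastic_quality(same_cost, rng) for _ in range(3)]  # values differ
print(det, sto)
```

With the same investment, the deterministic model returns the same quality every time, whereas the stochastic model returns a different quality on each call.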
The probability of winning (fi) in Eq. (2) represents the uncertainty about participant i winning the prize. The uncertainty in winning originates from the unknown quality (q–i) of opponents' solutions, and/or from the uncertain quality (qi) of participant i's own solution (see Fig. 3). If the design process or the quality evaluation criteria of a contest are unknown, every participant's solution quality is uncertain. Such uncertainties are modeled in game-theoretic models through the probability of winning. Two commonly used models of the probability of winning (fi) are contest success functions (CSFs), and cumulative distribution functions (CDFs).
Contest success functions are defined as functions of the effort invested by a participant, and the efforts invested by their opponents. A general form of CSFs for N participants is $f_i = h(e_i) / \sum_{j=1}^{N} h(e_j)$, where h() represents the effectiveness of any participant i's effort ei toward winning the contest. When opponents' qualities are unknown, the probability of winning can be modeled based on their effort using CSFs. CSFs predict a higher probability of winning for a participant with higher effort. CDFs, on the other hand, are probability distributions defined over quality. If the uncertainty about an opponent's quality is defined by a probability distribution g(q–i), then the probability that participant i with quality qi wins against his/her opponent with quality q–i is given as $f_i = \Pr(q_{-i} \le q_i) = \int_{-\infty}^{q_i} g(q)\, dq$. CDFs predict a higher probability of winning for higher quality qi.
Contest success functions model how effective a participant's effort is toward winning, while CDFs model how effective his/her quality is toward winning. Uncertainty in quality from a design process may exist in contests modeled by both CSFs and CDFs. This uncertainty is defined using a quality distribution in CDFs but is not explicitly modeled in CSFs.
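The two formulations can be compared in a short sketch. It assumes the common ratio form of a CSF, in which each participant's winning probability is his/her effectiveness h(ei) divided by the sum of all participants' effectiveness, and a uniform quality distribution for the CDF case. The uniform-lottery convention at zero total effort and the extension to several independent opponents are additional assumptions introduced here, not taken from the models above.

```python
def csf_win_probabilities(efforts, h=lambda e: e):
    """Ratio-form CSF: f_i = h(e_i) / sum_j h(e_j)."""
    weights = [h(e) for e in efforts]
    total = sum(weights)
    if total == 0:
        # No one exerts effort: assume (as a convention) a uniform lottery.
        return [1.0 / len(efforts)] * len(efforts)
    return [w / total for w in weights]

def cdf_win_probability(q_i, G, n_opponents=1):
    """P(q_i beats every opponent) for i.i.d. opponent qualities with CDF G."""
    return G(q_i) ** n_opponents

def uniform_cdf(q):
    """Assumed opponent quality distribution: uniform on [0, 1]."""
    return min(max(q, 0.0), 1.0)

probs = csf_win_probabilities([1.0, 3.0])           # effort-based
p_win = cdf_win_probability(0.5, uniform_cdf, n_opponents=2)  # quality-based
print(probs, p_win)
```

The CSF takes efforts as input and needs no quality distribution, whereas the CDF form takes the participant's quality and an explicit distribution over opponents' quality.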
Instantiation of Models and Corresponding Predictions About Crowdsourcing Contests.
In game-theoretic models, each participant's decision to participate and his/her amount of effort are analyzed based on the expected payoff given in Eq. (2). Once the payoff function is defined, the best response strategy for any participant is to maximize his/her expected payoff in response to the strategies of other participants. All models frame the contest as a simultaneous game with complete information (except for the model discussed in Sec. 2.2.1), i.e., participants play the contest concurrently without knowing other participants' strategies (effort, cost, or quality) but they are aware of others' payoff functions (cost structure, different uncertainties). Since other participants' strategies are unknown, they are sometimes estimated based on beliefs about others' quality, effort, and cost. Finally, each participant chooses to participate if his/her expected payoff is non-negative, i.e., Πfi ≥ Ci, and invests the effort (or cost) that corresponds to the Nash equilibrium predicted from a model [3,4].
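As an illustration of the best-response and participation logic, the sketch below uses one specific textbook instance that is assumed here and is not one of the models reviewed in this paper: N symmetric, risk-neutral participants, a single prize, the lottery CSF fi = ei / Σ ej, and cost Ci = ei. For this instance the symmetric equilibrium effort has the known closed form Π(N − 1)/N². The code checks numerically that this effort is a best response when all opponents play it, and that the participation condition Πfi ≥ Ci holds.

```python
def payoff(e_i, others_total, prize):
    """Expected payoff prize * f_i - C_i with f_i = e_i / (e_i + others), C_i = e_i."""
    total = e_i + others_total
    win_prob = e_i / total if total > 0 else 0.0
    return prize * win_prob - e_i

def best_response(others_total, prize, grid_size=100001):
    """Grid search for the effort in [0, prize] that maximizes expected payoff."""
    candidates = [prize * k / (grid_size - 1) for k in range(grid_size)]
    return max(candidates, key=lambda e: payoff(e, others_total, prize))

N, prize = 4, 100.0                      # hypothetical contest parameters
e_star = prize * (N - 1) / N**2          # closed-form symmetric equilibrium effort
br = best_response((N - 1) * e_star, prize)
participates = payoff(e_star, (N - 1) * e_star, prize) >= 0
print(e_star, br, participates)
```

A grid search is used instead of iterating best responses because simple best-response iteration need not converge for this game when N is large.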
Different structural assumptions about the prize Π, the probability of winning fi, and the cost Ci lead to different game-theoretic models. For example, a model by Sheremata  considers a fixed single-prize for the winner, models probability of winning as a function of effort using a CSF, and assumes a single-period cost function dependent on effort. Within this model, the overall effort by all participants, instead of the best quality, is defined as the contest outcome. This model is different from Taylor's model  of fixed single-prize contest with uncertainty in design process. In this model, the cost is incurred over multiple periods but the winning probability is defined in terms of a CDF of quality G(q).
Examples of existing models from the literature are listed in Table 2. For the purpose of relating structural assumptions to outcomes predicted in game-theoretic models, we classify the models based on their formulation of quality outcome (qi) and cost (Ci). qi is modeled either as deterministic or stochastic, while Ci either as single-period or multiperiod function. However, some models do not consider the type of quality as they use CSFs for probability of winning. Because these models can be interpreted as stochastic or deterministic type, we add undefined quality as the third type. Also, no model exists for the combination of deterministic quality and multiperiod cost, because the relation between the two is known in deterministic problems, and the resources are therefore always expended in single period. Given such categorization, we present predictions from sample models in different categories.
Deterministic Quality, Single-Period Cost Function.
A model by Che and Gale  falls under this category. It assumes a single-period design process where qi is a deterministic function of Ci. The model analyzes equilibrium strategies for two participants, symmetric or asymmetric with respect to the cost function. The outcomes of contests are predicted as mixed-strategy equilibrium meaning that a distribution over equilibrium quality is present, given by a CDF. This model predicts results comparing auction-style contests to all forms of single-prize contests. The model predicts that a first-price auction with two symmetric participants offers higher expected payoff for the sponsor than a fixed single-prize contest. This prediction is labeled P1.
In a model of multiple-prize contest of this category, the influence of relative magnitude of each prize is studied by Archak . In this model, the relation between quality and effort is deterministic, participants are asymmetric with different cost functions Ci(qi), and the probability of winning is modeled using a CDF. The quality outcome from a contest is presented as a symmetric equilibrium where all participants submit the same quality. Based on this equilibrium, the model predicts the following: jth prize's influence on the quality of kth winner is of the order . Consequently, every prize positively influences quality, quality of the kth prize is most influenced by the kth prize, and the influence of every added prize on quality outcomes is decreasing. This prediction is labeled prediction P2.
Stochastic Quality, Single-Period Cost Function.
Schottner  compares auction-style contests to fixed-single-prize contests in the presence of uncertain quality outcomes from a design process. Uncertainty in quality outcome is modeled as a sum of a deterministic part and a random part, (qi = f(ei, Ci) + ϵi). A distribution over the random part of quality is used to define the probability of the winning function. The model assumes that the quality (qi) picked for submission by every participant is observable to others before placing an auction bid for price (pi). Quality outcome from contests is presented as a pure-strategy Nash equilibrium. The model predicts that large differences in qualities of two participants due to a large variation in the random part ϵi of quality elicit high bid prices from participants. This results in higher cost of prize for the sponsor. In comparison, the model predicts that a controlled contest with a fixed single-prize may be optimal (prediction P3).
Terwiesch and Xu  model a single-period design process with the uncertainty in quality due to unclear evaluation criteria, called uncertainty in evaluation. They argue that contests (expertise-based) with problems of low uncertainty in evaluation have clear quality evaluation metric, and well-behaved solution space. At the other extreme, contests (ideation) with problems of high uncertainty in evaluation have undefined specifications resulting in subjective evaluations by the sponsor. The authors suggest that the effect of participant expertise (ability) is dominant in expertise-based contests as compared to ideation contests. Participants with lower expertise are less likely to participate in expertise-based contests. In ideation contests, however, higher uncertainty around the quality outcome and lower requirement of expertise attracts participation, independent of expertise. Ideation contests, therefore, draw more participation compared to expertise-based contests (prediction P4).
Stochastic Quality, Multiperiod Cost Function.
Taylor  models the uncertainty in quality in a multiple-period process for fixed single prize contests using a quality distribution G(qi). The probability of winning of any participant is defined using CDF of G(qi). A participant's quality at each period is randomly drawn from G(qi). The overall quality outcome is arrived based on a strategy to stop spending resources, called z-stop strategy, according to which the participants invest resources in multiple steps sequentially until the pure-strategy Nash equilibrium quality is achieved. This equilibrium predicts that the number of participants N spending effort is proportional to . This leads to the hypothesis that the number of participants in a single-prize contest is proportional to the prize amount (Π) (prediction P5).
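A stopping rule of this kind can be sketched as follows, under simplifying assumptions made here for illustration: qualities are independent draws from a uniform distribution, each draw has a constant cost, and the participant stops at the first draw of quality at least z. The threshold value, the cap on the number of draws, and the function name are hypothetical, and the sketch does not compute the equilibrium value of z.

```python
import random

def z_stop(draw_quality, z, cost_per_draw, max_draws=10_000):
    """Draw until a quality of at least z appears; return (quality, draws, cost)."""
    best = float("-inf")
    for n in range(1, max_draws + 1):
        best = max(best, draw_quality())
        if best >= z:
            break
    return best, n, n * cost_per_draw

rng = random.Random(1)  # fixed seed so the sketch is repeatable
z = 0.8                 # hypothetical stopping threshold
runs = [z_stop(rng.random, z, cost_per_draw=1.0) for _ in range(2000)]
avg_draws = sum(n for _, n, _ in runs) / len(runs)
print(avg_draws)
```

For a uniform distribution on [0, 1), the number of draws is geometric with success probability 1 − z, so the average number of draws should be close to 1/(1 − z).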
Fullerton et al. extend this model to auction-style contests. They argue that the bid price pi is a monotonically increasing function of quality qi. Based on this assumption, the model predicts that the sponsor's expected payoff in auction-style contests is strictly greater than his/her expected payoff in fixed-prize contests (prediction P1).
Undefined Quality, Single-Period Cost Function.
Szymanski and Valetti present a model where every participant's effort is defined as a contest outcome. The sponsor's payoff is given as the aggregated effort of all participants. In the model, the probability of winning is defined as a function of effort, given by CSFs. Further, all participants are symmetric in their cost, which is the same as effort (Ci = ei), and all participants are risk-neutral, which leads them to follow a pure-strategy Nash equilibrium where everyone invests the same amount of effort. This Nash equilibrium effort is expended in one step. Sheremata makes similar structural assumptions, and derives the Nash equilibrium for multiple-prize contests. A comparison of single-prize contests with multiple-prize contests, in these models, suggests that symmetric and risk-neutral participants generate higher effort in fixed single-prize contests than in fixed multiple-prize contests (prediction P6).
Szymanski and Valetti extend the model to participants with asymmetric costs, and analyze their effort levels with respect to the addition of a second prize. The model predicts that in the case of asymmetric costs, assuming risk-neutral participants, multiple prizes generate more effort than a single-prize contest with the same total prize amount. Additionally, deviation from risk-neutral behavior is more likely to support higher effort from multiple prizes (prediction P7). The model also predicts that in fixed multiple-prize contests, the equilibrium effort by participants increases with the increase in prize amounts (prediction P8).
Summary of Predictions From the Models.
In summary, the models discussed above predict contest outcomes assuming particular forms of uncertainty in quality and the cost function. Some models neglect uncertainty while others model it explicitly. Most models consider either single-period cost function or multiperiod cost function to establish the relationship between cost and effort. We therefore compare predictions from different game-theoretic models to observe the effects of these factors.
From the comparison of predictions from the models, we observe that the uncertainty in quality affects contest outcomes, whereas the cost structure does not. For example, the desirability of auctions reduces with an increase in uncertainty about quality outcomes, as uncertain quality results in large price bids submitted by participants (predictions P1, P3). Uncertainty in problem specifications and quality evaluation motivates larger participation in contests, i.e., ideation contests are expected to attract higher participation compared to expertise-based contests (prediction P4). In contrast, an auction-style contest, for both single-period and multiperiod cost functions, is expected to provide a higher payoff to the sponsor than a fixed single-prize contest (prediction P1). Effects of prize amounts and number of prizes do not change with cost structure (predictions P5, P8). Higher prize amounts, for both cost structures, induce greater participation.
In design crowdsourcing, the outcomes of interest also include the number of participants which improves the diversity of solutions, and the quality of winning solution. The predictions from game-theoretic models about effort and quality of outcomes can be extended to these two factors, as discussed below.
The quality of the winning solution is the maximum of the quality outcomes from all participants. By assuming that the quality of the winning solution is positively correlated with the outcome quality from each participant, and that quality is deterministic as formulated in prediction P2 by Archak, we propose that the quality of the winning solution increases with prize amounts for single-prize and multiple-prize contests (prediction P9).
The number of participants is the count of participants who expend nonzero effort. To extend the insights about effort as an outcome to the number of participants as an outcome, the decision to participate is regarded as spending nonzero effort. For crowdsourcing contests, we assume that the number of participants is positively correlated with the effort invested by each participant. Higher effort by all participants would also mean higher number of individuals participating. Additionally, assuming that participants of crowdsourcing contests are asymmetric and risk-averse, predictions P7 and P8 can be extended to the number of participants. So, we predict that the number of participants increases with prize amounts for single-prize and multiple-prize contests (prediction P10). Further, the number of participants increases with the number of prizes (prediction P11).
Predictions About Participants' Behaviors From Existing Empirical Studies
Laboratory experiments assess the predictive power of game-theoretic models [12–14]. These experiments are observed to support predictions from the respective game-theoretic models. For example, the laboratory experiment by Fullerton et al. compares the outcomes of fixed-prize contests and auctions. In this experiment, prediction P1, that an auction generates a better expected payoff for the sponsor than a fixed single-prize contest, is found to hold. Another experiment by Sheremata compares participants' effort in single-prize contests to their effort in multiple-prize contests. The results of this experiment show that subjects' behaviors are consistent with the corresponding model's prediction P6. Sha et al. observe the effects of effort on quality and winning probability using a laboratory experiment. The results show that quality from a multiple-period process with uncertainty is an increasing log-linear function of effort, which attains the highest quality at high effort. In this process, participants who win have put in more effort. The probability of winning is therefore observed to increase with effort.
Game-theoretic models and laboratory experiments evaluate the effects of contest-related factors on participants' decisions. However, participants of a crowdsourcing contest are also influenced by problem and sponsor-related factors, which are typically analyzed using field studies [28,29]. Examples of problem-related factors are task size, technical difficulty, variability of required skills, uncertainty in solution, feasibility, etc. Examples of sponsor-related factors are sponsor interaction, feedback to participants' questions, trustworthiness, reputation, popularity, etc. We broadly categorize these factors into task complexity, sponsor-type, and sponsor interaction to study their impact on participants' decisions.
Task complexity in contests can negatively affect individuals' decisions to participate. Simple problems have minimal information content and decoupled functional parameters, whereas complex problems have more content and coupled functional requirements [30,31]. Complex problems demand more resources and more time spent on research than simple problems. Thus, when size and coupling must be evaluated, a simple problem is preferred over a complex one. In field studies on software development and language processing, problems with smaller size and lower complexity are observed to generate better outcomes. Consequently, we propose that higher task complexity decreases participation (prediction P12).
Sponsor type is broadly defined as the reputation and trustworthiness of the sponsor of a contest. It has a significant impact on participation decisions. Participants are motivated by the recognition they can earn from the sponsor. By association, a reputed sponsor provides better visibility to their submissions and improved recognition of their effort. Additionally, participants who are the sponsor's “fans” are more likely to participate. A bigger reputation generally implies a larger fan base, and thus possibly larger participation. Furthermore, the field study by Ref.  suggests that a sponsor's trustworthiness inspires trust in participants, while its fraudulent behavior negatively impacts participation. Therefore, we predict that the reputation and trustworthiness of the sponsor improve participation (prediction P13).
Sponsor interaction is defined as the interaction between the sponsor and participants beyond the initial problem statement. During a contest, the sponsor may communicate with participants on the contest platform through instant messages, comments, or feedback to answer participants' questions. Previous studies on crowdsourcing in electronic commerce [29,34] suggest that sponsor interaction is important for transferring the sponsor's ideas to participants, so that participants can generate more, and more creative, solutions based on constructive feedback. Their field studies show that participants' solution quality is positively correlated with sponsor interaction. A similar trend is expected in design crowdsourcing, where uncertainty in evaluation and quality is present, as pointed out in Sec. 2, and where more information or details can improve solutions. Therefore, we propose that sponsor interaction increases participants' effort and solution quality (prediction P14).
An Integrated Causal Model of Influencing Factors
The predictions from the game-theoretic models and empirical studies are summarized in Table 3. Based on these predictions, we develop a causal model that considers the factors identified in existing game-theoretic models and empirical studies as influencing participants' decisions and contest outcomes. The factors included in the causal model are listed in Table 4. The causal model is depicted as a directed acyclic graph in Fig. 4. This causal graph formalizes and encodes the belief about plausible causal relations between different factors. In the rest of the paper, we perform causal inference analysis on this graph using field data to quantify the strengths of different dependencies. The analysis is based on the assumption that these encoded relationships are correct.
The approach adopted in this paper is exploratory rather than confirmatory . In an exploratory approach, a model is specified based on prior scientific knowledge, tested on sample data, and components of the model are validated or respecified. In contrast, in a confirmatory approach a well-accepted model is tested on sample data. Our explanations for proposed causal relationships include a review of existing game-theoretic and empirical studies, and also arguments for plausible confounding. A complete validation (or falsification) of our causal model however requires new data from further experimentation. Since the current knowledge about design crowdsourcing lacks such a causal model and the true model is unknown, future experiments can be built upon the proposed causal model.
In the causal model, we assume that the contest-, problem-, and sponsor-related factors are interdependent, and that their effects on the contest outcomes may be confounded. Because sponsors design tasks for contests and decide prizes, their characteristics (called sponsor type) may influence contest design. For instance, popularity, reputation, and trustworthiness will likely influence individuals' preferences toward more effort. Sponsors of these types are likely to have acquired them through their ability to provide higher rewards. Conversely, sponsors with lower popularity may not be able to reward as much, or may not be trusted to complete payment. In such cases, the sponsor type will influence the contest design. Further, the sponsor type may influence problem factors. The reputation of a sponsor could have emerged from the sponsor's advancements in a particular technology, in which case higher reputation may be correlated with higher task complexity. Alternatively, a sponsor may acquire a lower reputation due to vague and unclear problem descriptions in contests, in which case the sponsor type will be correlated with the uncertainty in quality and evaluation.
When assessing relations between problem- and contest-related factors, it is expected that task complexity will factor in the design of contest, e.g., higher complexity may demand higher prize amount. Further, the uncertainty in quality is expected to be correlated with the contest design. As we know from Ref. , uncertainty in quality (especially due to unclear evaluation) differentiates ideation contests from expertise-based contests. The contest design can be influenced by whether the task involves ideation or well-defined problem solving, as both require different levels of expertise and knowledge in different domains .
Therefore, we posit a model of causal dependence between all influencing factors which helps to specify confounding relations of these factors to the contest outcomes (see Fig. 4). The causal graph of relations among all factors is assumed to satisfy the rules of graphical models for causal inference . Any node in the graph is directly affected by only its parent nodes, and indirectly affected by its ancestor nodes. All exogenous factors, i.e., factors that do not have any parents, are assumed to be independent, and may be used as control variables.
This causal graph acts as a formal model of our assumptions when testing predictions. A causal relationship between two factors X and Y postulated in the predictions can be mapped to all connected paths between X and Y in the causal graph. For example, the relationship between total prize (X = Π) and quality of winning solution (Y = Q) is related to paths: (1) (X = Π → Preferences → ei → Ci → qi → Y = Q), and (2) (X = Π → Preferences → ei ← Uncertainty in quality → qi → Y = Q). In Sec. 5, we discuss whether and to what extent such relationships can be tested on the field data from GrabCAD challenges.
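As a minimal sketch of this mapping, the following Python snippet enumerates all connected paths between two nodes of a small graph, ignoring edge direction when searching and restoring it when printing. It encodes only the edges that appear in the two example paths above; the node labels (Pi, e_i, C_i, q_i, Q) and the function names are illustrative assumptions, and the full graph of Fig. 4 is not reproduced.

```python
from collections import defaultdict

# Directed (parent, child) edges taken only from the two example paths in the
# text; the complete causal graph contains more nodes and edges than this.
EDGES = [
    ("Pi", "Preferences"),
    ("Preferences", "e_i"),
    ("e_i", "C_i"),
    ("C_i", "q_i"),
    ("q_i", "Q"),
    ("Uncertainty in quality", "e_i"),
    ("Uncertainty in quality", "q_i"),
]


def connected_paths(edges, source, target):
    """Return all simple paths from source to target, ignoring edge direction."""
    neighbors = defaultdict(set)
    for parent, child in edges:
        neighbors[parent].add(child)
        neighbors[child].add(parent)

    paths = []

    def walk(node, visited):
        if node == target:
            paths.append(list(visited))
            return
        for nxt in sorted(neighbors[node]):
            if nxt not in visited:
                visited.append(nxt)
                walk(nxt, visited)
                visited.pop()

    walk(source, [source])
    return paths


def render(path, edges):
    """Format a path with arrows showing the direction of each edge."""
    directed = set(edges)
    parts = [path[0]]
    for a, b in zip(path, path[1:]):
        parts.append("->" if (a, b) in directed else "<-")
        parts.append(b)
    return " ".join(parts)


if __name__ == "__main__":
    for p in connected_paths(EDGES, "Pi", "Q"):
        print(render(p, EDGES))
```

On this reduced edge list the search returns exactly the two paths listed above, including the one that passes through uncertainty in quality against the direction of its edge into e_i.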
Testing Predictions Using Field Data From GrabCAD Challenges
GrabCAD is an open computer-aided design (CAD) sharing community of individuals with diverse specializations. Organizations have sponsored challenges on GrabCAD for interior design, graphic design, industrial design, and product design. A typical GrabCAD challenge starts with posting (i) a detailed description of the motivation and the goal behind the problem, (ii) resources such as designs, drawings, or tutorials on required software for participants, (iii) specific solution requirements such as STEP or IGES files or rendered images, and (iv) judging criteria for choosing the winners. In the problem description, contest details such as the number of prizes, monetary prize amounts, gifts (such as iPads, 3D printers, and cameras) with their monetary values, and the deadline for final submissions are made public. Participants upload their submissions to the challenge web page anytime between the start and the end of that challenge. After the final deadline has passed, all submissions are evaluated by a jury that consists of representatives from the sponsor company. The jury rates solutions based on the judging criteria and picks the highest-rated solutions. Out of these top submissions, a preset number of submissions are awarded prizes.
We analyzed all challenges hosted on GrabCAD platform between 2011 and 2016. From this set, we filtered out challenges that involved promotional tasks with no specific problem statement or prizes, e.g., community challenges with no prizes hosted by GrabCAD to promote crowd's involvement. We removed challenges that have crowd submissions set to private. Such challenges are dropped from our analysis because participants' behaviors in them are either unknown, or most likely caused by intrinsic factors which we cannot observe. Finally, we selected data of 96 challenges which provide measurable information on influencing and outcome-related factors. Scrapy , a Python-based web crawler, was used to capture the accessible data in accordance with regulations of GrabCAD.
The dataset for each challenge includes a description of the sponsor company, the problem description, the number of monetary prizes and gifts, corresponding prize amounts (worth price for gifts), the number of submissions, a unique identity of each submission and corresponding participant, and the identity of the winners. Based on participant identities and their submissions, the number of participants for each challenge is derived after removing repetitive and blank submissions so that each participant is counted only once. The number of prizes is counted as the sum of number of monetary prizes and number of gifts.
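The derivation of the two counts can be sketched as follows. This is an illustration under assumptions: the tuple layout of a submission record and the function name are hypothetical, since the schema of the scraped data is not specified here, and "repetitive" is interpreted as a repeated submission identity.

```python
def summarize_challenge(submissions, n_monetary_prizes, n_gifts):
    """Count unique participants and total prizes for one challenge.

    submissions: iterable of (participant_id, submission_id, content) tuples.
    This record layout is a hypothetical stand-in for the scraped data.
    """
    participants = set()
    seen_submissions = set()
    for participant_id, submission_id, content in submissions:
        if not content or not content.strip():
            continue  # skip blank submissions
        if submission_id in seen_submissions:
            continue  # skip repeated submissions
        seen_submissions.add(submission_id)
        participants.add(participant_id)  # a set counts each participant once
    return {
        "n_participants": len(participants),
        # Number of prizes = monetary prizes + gifts, as defined in the text.
        "n_prizes": n_monetary_prizes + n_gifts,
    }


if __name__ == "__main__":
    example = [
        ("u1", "s1", "bracket.step"),
        ("u1", "s2", "render.png"),    # second entry by the same participant
        ("u2", "s3", "   "),           # blank submission
        ("u3", "s4", "gripper.iges"),
        ("u3", "s4", "gripper.iges"),  # repeated submission
    ]
    print(summarize_challenge(example, n_monetary_prizes=3, n_gifts=2))
```

In this made-up example, participant u2 is excluded because the only submission is blank, so two participants and five prizes are counted.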
Measurable Factors in GrabCAD Challenges.
Some of the factors considered in the causal graph (Fig. 4) are measurable in the GrabCAD dataset while others are not. In this section, we analyze which factors are measurable and how they are measured. For consistency, factors are called measurable if they are directly or indirectly inferred from available information, and can be quantified or categorized given some rules. A list of measurable factors in GrabCAD challenges is given in Table 5.
Contest-related factors: Most of the contest-related factors are measurable. In the contest design, fixed single-prize and multiprize structures are observed in GrabCAD challenges. Auctions are not used for payment in GrabCAD challenges. The number of prizes and the prize amounts are measured.
Individual-related factors: None of the individual-related factors is measurable for participants of GrabCAD challenges. Of the two cost structures assumed in game-theoretic models, single-period and multiperiod costs, neither is observable, and therefore neither is measured for GrabCAD challenges. The resources, both effort and cost, spent by participants are unknown. Although participants' submissions are observed, no consistent measure of quality is available across all challenges. Consequently, participants' beliefs about others' quality and about their own winning probability are also not measured.
Problem-related factors: Some problem-related factors are inferred from problem descriptions in challenges, and measured based on specific rules described in the following.
Uncertainty in quality of outcomes: As discussed in Sec. 2.1.3, this uncertainty arises from two sources: (i) uncertain relationship between resources invested and quality of solution, and (ii) uncertainty in evaluation and problem description, which relates to how a sponsor defines quality measure for submissions to choose the winners. The first source of uncertainty is not measured because cost and quality are not measured, and their relationship varies among participants. The second source of uncertainty is measured from problem descriptions and requirements.
Based on the uncertainty in evaluation, we categorize challenges into ideation and expertise-based challenges. This categorization follows the guidelines suggested by Terwiesch and Xu. The uncertainty in evaluation, which relates to unclear problem requirements and judging criteria, dictates these guidelines. Problems with high uncertainty in evaluation are categorized as ideation contests, while problems with low uncertainty in evaluation are categorized as expertise-based contests. For example, the Alcoa airplane bearing bracket design challenge had an existing bearing bracket that participants were asked to optimize. Submissions in this challenge were evaluated based on the ratio of ultimate strength to weight. Well-defined design requirements put this challenge in the category of low uncertainty in evaluation. Also, a well-behaved solution landscape implies low technical uncertainty. Thus, it is an expertise-based challenge. On the other hand, in the Robot Gripper Test Object challenge, participants were asked to come up with ideas for gripping mechanisms, but the evaluation criteria were not well defined. Here, any implementation of a gripping mechanism can be verified using simulation or prototypes. However, unclear problem requirements result in uncertainty in comparing any two mechanisms. This challenge therefore fits into the category of ideation contests.
Task complexity: We measure task complexity for GrabCAD challenges as the elemental count of subcomponents in the task, i.e., the number of components required to be designed in the challenge. Challenges involving the design of a single component are categorized under the part-design category, while challenges that require the design of multiple components and of the interaction mechanism between these components are categorized under the system-design category. The number of components and the existence of complex interactions among them are intended to represent the size and coupling metrics of complexity commonly used for engineering design tasks. For example, consider the Autodesk robot gripper arm design challenge, where the task was to design a lightweight robotic arm. Since the task here was to design a single component (the arm), it is categorized as a part-design challenge. In contrast, the task in the dirt-bike tire changing tool challenge was to design a tire changing tool. This task involved the design of a tool frame, a lever, and gears. It also required the design of the mechanism of interaction between these parts and a wheel tire. This challenge is therefore classified as a system-design challenge.
Sponsor-related factors: Sponsor interaction and sponsor type are not directly measurable. However, from the information about the sponsor's identity available in the dataset, sponsor type, which relates to reputation and trustworthiness, can be measured indirectly. We measure a sponsor's popularity and reputation using the global traffic rank of that sponsor's website as given on Alexa. A low rank indicates high traffic, while a high rank indicates low traffic. For sponsors with very low web traffic, traffic ranks are not available from Alexa, and a high number (a rank of 20 million) is assigned as their traffic rank. In this measurement, we assume that the web traffic of a sponsor's website is an indirect measure of the sponsor's reputation. Indeed, according to Refs.  and , the number of visitors on the website is higher if the online trust in the website's sponsor is higher. Additionally, the online trust is influenced by the sponsor's reputation for reliable behavior, perceived size, trustworthiness, and credibility.
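The rank assignment described above can be written compactly as below. This is only a sketch: the function name and the use of None to mark a sponsor without a reported rank are assumptions, while the placeholder of 20 million and the two example ranks are the values quoted in this paper.

```python
UNRANKED_PLACEHOLDER = 20_000_000  # rank assigned when no traffic rank is reported


def sponsor_traffic_rank(reported_rank):
    """Return the rank used in the analysis (lower rank means higher traffic).

    reported_rank: an integer rank, or None when no rank is available
    (representing a missing rank as None is an assumption of this sketch).
    """
    if reported_rank is None:
        return UNRANKED_PLACEHOLDER
    return reported_rank


if __name__ == "__main__":
    # Example ranks quoted in the text; the third sponsor is hypothetical.
    sponsors = {"NASA": 640, "Microtechnologies": 4_000_000, "unranked sponsor": None}
    ranked = {name: sponsor_traffic_rank(r) for name, r in sponsors.items()}
    for name in sorted(ranked, key=ranked.get):
        print(name, ranked[name])
```

Assigning a single large placeholder keeps unranked sponsors in the sample, at the cost of treating all of them as equally unknown.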
Testing Hypotheses on GrabCAD Dataset.
Given that a subset of the factors presented in the proposed causal model are measurable, only a subset of the predictions can be tested on GrabCAD data. Specifically, predictions P4, P5, P10, P11, P12, and P13 may be tested, and are hereafter referred to as hypotheses H1, H2, H3, H4, H5, and H6 respectively.
When testing each of the above hypotheses, we make a few assumptions. We assume that causal relationships between the factors are defined according to the proposed causal model, where each arrow represents a causal effect of the corresponding starting node on the target node. We do not test alternate structures of the causal model; rather, we use this hypothesized causal model to quantify the strength of these dependencies using field data. Next, we assume that each causal relationship is governed by a linear model, i.e., the target node is a linear function of the starting node. This assumption is a natural starting point for analysis when the functional form of the dependencies is not known. Also, we assume that any uncertainty around this linear function affecting the target node is exogenous and independent of all factors in the proposed causal model. This assumption does not imply that uncertainty in the target node arising from uncertainty in the starting node is ignored; it is in fact adjusted for using linear regression when testing for causal effects, as described in the following.
Assessment of Confounding in Measurable Factors.
In this section, we check for confounding between measurable factors in the GrabCAD data before evaluating their effects on participation. We identify confounds and include in the causal analysis those not captured in the causal model. From a correlation analysis (see Table 6), we observe that the significant correlations are consistent with the assumed causal model. An additional association within problem-related factors (between uncertainty in quality U and task complexity T), not captured in the causal model, is found to be significant. This association is consistent with a view of design complexity as an uncertainty in relating the current information content to the amount of information required to satisfy the problem.
Shadish et al.  offer experimental designs to identify and account for confounds in measured factors. These designs rely on measuring outcomes before treatment (pretest), after treatment (posttest), and separating groups with and without treatments (control). For the GrabCAD data, contest outcomes are only measured after challenges are over and not at different times, i.e., a pretest is lacking. As a result, our study can be viewed as a “posttest-only design with non-equivalent groups.” This design is subject to selection bias whereby treatment effects may be confounded with population differences since population in control groups is self-selected. For our case, populations are different groups of challenges. The selection bias then can be addressed by identifying differences in challenge characteristics, and whether they are confounded or correlated. Therefore, we evaluate the effects of measurable factors while considering confounding between them. The analysis may still be subject to selection bias due to unobserved external factors, which is a limitation of this study. To evaluate the effects, we employ a statistical approach as discussed in Sec. 5.3.2.
Finding Causal Effects While Adjusting for Confounding Factors.
To quantify the causal effect of a factor X on an outcome Y while adjusting for a confounding factor (or set of factors) Z, we use the linear regression model

Y = α0 + α1X + α2Z + UY    (5)

where α0 is the intercept, α1 and α2 are the causal effects of X and Z, respectively, on Y, and UY is the uncertainty in Y arising from exogenous factors not related to the values of X or Z. By including Z in the linear regression model of Y on X, any effect that Z has on X is incorporated in the model, and that effect is thereby adjusted for. Here, the causal effect α1 is a direct effect if the connecting path(s) from X to Y have no intermediate factors, i.e., X has no child other than Y. Similarly, the causal effect α2 is a direct effect if Z also has a direct link to Y, in addition to the direct link to X. Otherwise, these effects are indirect effects.
As an example, when testing the effect of total prize (X) on the number of participants (Y), the confounding factors (sponsor type, task complexity, and uncertainty in quality), which affect both total prize and number of participants, need to be adjusted. Since all directed paths from these confounding factors to total prize reach through the set of individual prizes (Π1…ΠM), which are parents of total prize, all individual prizes comprise the set Z. If the effect of the parent node set Z on total prize is adjusted, the effects of all ancestor nodes are adjusted as a result. One way to adjust the effects of the set Z is to keep values of all its factors constant. Then, a linear regression between number of participants and total prize would quantify the corresponding linear effect. However, when the values of the Z factors are varying as is the case for the GrabCAD data, the implementation of linear regression according to Eq. (5) is a better approximation of the linear effect of total prize on the number of participants.
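The effect of this adjustment can be illustrated with a small ordinary least squares computation. The sketch below assumes a linear model with intercept, Y = α0 + α1X + α2Z + UY, and uses synthetic, noise-free numbers chosen only for illustration (they are not GrabCAD data); the function name and the normal-equation solver are likewise illustrative rather than the procedure used to produce the reported results.

```python
def ols(columns, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    columns: list of predictor columns (each a list of floats).
    Returns [intercept, coeff_1, coeff_2, ...].
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Synthetic data: Z confounds X and Y. True effects: alpha1 = 3, alpha2 = 5.
Z = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
W = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]  # variation in X not due to Z
X = [2.0 * z + w for z, w in zip(Z, W)]
Y = [10.0 + 3.0 * x + 5.0 * z for x, z in zip(X, Z)]

adjusted = ols([X, Z], Y)  # regression of Y on X and Z
naive = ols([X], Y)        # regression of Y on X only

if __name__ == "__main__":
    print("adjusted effect of X:", round(adjusted[1], 3))
    print("naive effect of X:   ", round(naive[1], 3))
```

With Z included, the coefficient of X recovers the value 3 used to generate the data, whereas the regression that omits Z attributes part of the effect of Z to X and overstates it.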
Results of Hypothesis Testing.
According to the procedure explained above, factors X, Y, and a set of factors Z are identified for each hypothesis to be tested, and a linear regression is carried out using the ordinary least squares method. The results obtained from this method are listed in Table 7. These estimated parameters are indirect effects, as the paths from X to Y include other factors. The table reports indirect effects of X and Z on Y but does not report direct effects of Z on X. Since the actual models of the causal relationships are unknown and can be nonlinear, the results may possess high variability (low R²). However, because we are interested in quantifying the significance of X's effect on Y, we emphasize the p-values of the estimates.
We observe that the effects of X on Y in multiple regressions for hypotheses H1 and H2 are not statistically significant. This may be due to the disparity in sample sizes and the small sample size. For hypothesis H1, 59 challenges in the GrabCAD dataset are of expertise-type while the remaining 27 are of ideation-type (Cohen's d effect size without adjusting for web traffic = 0.37, probability of Type II error in the multiple regression model = 0.37). We are also not able to test the effect of web traffic rank (Z) on uncertainty in quality (X) as posited in the proposed causal model, because the linear regression of Z on X (Z ∼ αX + Uz) provides statistically insignificant results. To test hypothesis H2, 15 single-prize challenges are available from the data. The data from these 15 challenges do not provide statistically significant results, because the sample size is insufficient and the probability of Type II error (= 0.45) in the multiple regression model is large. Also, the effects of the corresponding Z factors on the prize (X) of these 15 single-prize challenges are insignificant.
We observe that the effects of X on Y in multiple regressions for hypotheses H3, H4, H5, and H6 are statistically significant, and the null hypotheses can be rejected. The probability of Type II error in multiple regression models for testing hypotheses H3, H4, and H5 is < 0.01 while it is 0.28 for H6. The effects of ith prize and total prize on number of participants are positive and significant at a statistical significance level of 0.05. The causal effect (the regression coefficient) of the ith prize on the number of participants increases as i increases, which suggests that every dollar allocated to the (i + 1)th prize has more impact on participation than a dollar allocated to the ith prize. The effects of corresponding adjusted factors (Z) on the ith prize (X) are found to be statistically insignificant. However, the effects of ith prize (Z) on total prize (X) are significant for all i, and accordingly adjusted when finding the effect of total prize (X) on the number of participants (Y).
The number of prizes is observed to have a positive impact (coeff = 4.89, p-value = 0.003) on the number of participants. In obtaining this result, the effects of all prizes (Z) on number of prizes (X) are adjusted, but they are not tested as these effects are found to be statistically insignificant.
In the results for hypothesis H5, task complexity is observed to have a significant negative impact on the number of participants (coeff = −54.3, p-value < 0.001). The estimated value of intercept parameter in the linear regression model of binary task complexity variable is 106.7. This suggests that the average number of participants in part-design challenges is 106.7 while it is 52.4 (intercept minus coeff) in system-design challenges.
The web traffic rank of a sponsor is observed to negatively affect the number of participants. This effect (coeff = −0.33, p-value = 0.027) is significant at level α = 0.05. The observed trend suggests that the number of participants decreases by 0.33 with every increase of 100,000 in the web traffic rank (for an increase of 10 million in rank, the number of participants decreases by 33). This drop may be small; however, it is an aggregate result for all challenges assuming linear relationships. It is possible that the number of participants is an exponentially decreasing function of web traffic rank. For instance, the Handrail Clamp Assembly Challenge by NASA (web traffic rank = 640, N = 297), despite having a similar prize structure and problem type as the throttle pedal design challenge by Microtechnologies (web traffic rank ≈ 4 million, N = 87), observed more than three times the participation of the latter. To test for a linear–log relationship, we fit a regression model for hypothesis H6 of the form Y ∼ α log(X) + UY, where X is the web traffic rank and Y is the number of participants. This model provides a moderately better fit to the data (coeff α = −15.2, p-value = 0.003).
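The arithmetic behind this comparison can be checked with a short calculation. The snippet below uses only the linear coefficient and the two example challenges quoted above; the function name is illustrative, and the comparison ignores the other adjusted factors, relying on the two challenges having similar prize structures and problem types.

```python
COEFF_PER_100K = -0.33  # reported change in participants per 100,000 increase in rank


def linear_change(rank_from, rank_to):
    """Change in participants predicted by the linear model for a change in rank."""
    return COEFF_PER_100K * (rank_to - rank_from) / 100_000


# An increase of 10 million in rank: about 33 fewer participants.
drop_10m = linear_change(0, 10_000_000)

# NASA (rank 640, N = 297) versus Microtechnologies (rank of about 4 million, N = 87).
predicted_gap = linear_change(640, 4_000_000)
observed_gap = 87 - 297

if __name__ == "__main__":
    print("change for +10 million rank:", round(drop_10m, 1))
    print("predicted gap, linear model:", round(predicted_gap, 1))
    print("observed gap:", observed_gap)
```

The linear model predicts a gap of roughly 13 participants between the two sponsors, far smaller than the observed gap of 210, which is consistent with the motivation for trying a linear–log form.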
Implications on Designing Crowdsourcing Contests for Engineering Design.
Through the quantification of the causal effects, the results offer insights into what outcomes to expect from future design crowdsourcing contests on GrabCAD, or from contests on other platforms with similar characteristics. Contest designers can use the results from the model to answer specific questions such as the following. If a sponsor firm has a fixed budget (say X dollars), by how much should it increase the first prize, the second prize, and so on? How many prizes should it award? How many participants can a sponsor firm expect in comparison to another, more popular and reputed sponsor firm? By how much could participation decrease if coupling between multiple design components is introduced in a challenge problem? The estimated coefficients in Table 7, for example, show that increasing the first prize by $5000 will likely increase the expected participation by 55, but distributing the same amount equally among the top five prizes will likely increase the expected participation by 150. Dividing X dollars into ten prizes, rather than putting all the money on a single prize, will likely increase the number of participants by about 50. Similarly, the number of participants (N ≈ 53) for a coupled design problem in a contest could be about half of that (N ≈ 107) for a decoupled design problem. Participation in contests by an unknown company will likely be much lower than in contests by a larger, more reputed firm.
Emerging research on crowdsourcing in the engineering design community has studied the feasibility and effectiveness of crowdsourcing for engineering design tasks [5–8]. Following this research interest, we investigate how the design of crowdsourcing contests, where participants are mainly incentivized by monetary rewards, affects their outcomes. Game-theoretic models and empirical studies identify factors that influence participants' decisions. We propose a causal model of the interdependence among these factors and their impact on contest outcomes. In the data analysis, we test a subset of the predictions from the proposed causal model on the field data from GrabCAD challenges. Apart from hypotheses H1 and H2, the results of all tested predictions are observed to be statistically significant. These results suggest that participants' decision to participate is positively affected by a higher number of prizes (more so at higher prizes than the first prize), larger prize amounts, lower complexity, and higher reputation and trustworthiness of the sponsor. Overall, the favorability of multiple prizes and of higher allocation to higher prizes provides support for the game-theoretic assumptions that participants are asymmetric/nonhomogeneous and risk-averse. The results are generalizable to other platforms if the hypothesized relationships among influencing and outcome-related factors, and the features of these factors, are similar to those present on GrabCAD.
The use of field data presents unique challenges. Some aspects of field data, such as effort, cost, or quality, are not observable or measurable. As a result, not all predictions of the proposed causal model can be tested using the GrabCAD challenges data. For the predictions that are found to be significant, the underlying assumptions of the causal inference methodology (that the proposed interdependences are true, that other external factors are absent, and that dependencies are linear) may limit generalization.
Broadly speaking, the limitations of the analysis are as follows: (i) causal inference using a single data source does not prove the validity of causal assumptions , and (ii) the validity of the results is dependent on the hypothesized causal model being true. It is possible that there are other factors influencing the outcomes that are not currently part of the causal graph. These include individual-specific factors such as their intrinsic motivation for participation, their expertise, the relation of the crowdsourcing task to their day-time job, the tools they have access to, and available time for working on problem. These factors cannot be observed from the field data, but can be incorporated in future work through other research methodologies such as controlled experiments, surveys, and interviews.
In future work, the effects of factors that are not measurable in field data can be evaluated using laboratory experiments. Selecting participants with specific backgrounds, testing for specific skills or preferences, and monitoring or managing their design processes are ways to achieve control over a contest. This will enable attribution of contest outcomes to specific factors. For instance, auction-related predictions (P1, P3, P6), which are untestable using the GrabCAD dataset, can be tested by setting up an auction for payment in a controlled contest environment. Insights from laboratory experiments, however, may lack the external validity that field data provide.
This paper contributes to the design research literature through its investigation of game-theoretic models and their underlying assumptions to understand participants' decisions. The mapping of assumptions in game-theoretic models to characteristics of engineering tasks provides a framework that can be used to develop new models of crowdsourcing contests for engineering design tasks. This study identifies that existing game-theoretic formulations largely ignore problem- and sponsor-related factors, which are key influencing factors in contests. Another contribution of this paper is the causal graph, which encodes the interdependence of factors derived from existing game-theoretic models and empirical studies. Future studies can leverage this causal graph to drive further experimentation into participants' decisions in design crowdsourcing contests.
In multiperiod processes, the total number of steps is often used as a proxy for the effort invested.
U.S. National Science Foundation (NSF) CMMI (Grant No. 1400050).