Crowdsourced evaluation is a promising method for evaluating engineering design attributes that require human input. The challenge is to estimate scores correctly from a massive and diverse crowd, particularly when only a small subset of evaluators has the expertise to give correct evaluations. Since averaging evaluations across all evaluators results in an inaccurate crowd estimate, this paper benchmarks a crowd consensus model that aims to identify experts so that their evaluations can be given more weight. Simulation results indicate this crowd consensus model outperforms averaging when it correctly identifies experts in the crowd, under the assumption that only experts evaluate consistently. However, empirical results from a real human crowd indicate this assumption may not hold even on a simple engineering design evaluation task, as clusters of consistently wrong evaluators are shown to exist alongside the cluster of experts. This suggests that both averaging evaluations and a crowd consensus model that relies only on evaluations may be inadequate for engineering design tasks, motivating further research into methods of finding experts within the crowd.
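The core idea of weighting evaluations by inferred expertise can be illustrated with a minimal sketch. The Python snippet below is not the paper's actual consensus model; it is a hypothetical stand-in (all sizes, noise levels, and the iterative weighting rule are assumptions) that simulates a crowd where only a small expert subset evaluates consistently, then compares plain averaging against a consistency-weighted consensus.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 evaluators score 20 designs with known true scores.
n_designs, n_evaluators = 20, 50
true_scores = rng.uniform(0.0, 10.0, n_designs)

# Assumption mirrored from the simulation study: only a small subset
# ("experts") evaluates consistently; everyone else is noisy.
is_expert = rng.random(n_evaluators) < 0.2
noise_sd = np.where(is_expert, 0.3, 3.0)
evals = true_scores[:, None] + rng.normal(0.0, noise_sd,
                                          (n_designs, n_evaluators))

# Baseline: weight every evaluator equally.
avg_estimate = evals.mean(axis=1)

# Consistency-weighted consensus (sketch): iteratively up-weight
# evaluators whose scores track the provisional consensus.
weights = np.full(n_evaluators, 1.0 / n_evaluators)
for _ in range(10):
    consensus = evals @ weights                     # provisional estimate
    resid = ((evals - consensus[:, None]) ** 2).mean(axis=0)
    weights = 1.0 / (resid + 1e-6)                  # consistent -> heavier
    weights /= weights.sum()

def rmse(est):
    return float(np.sqrt(((est - true_scores) ** 2).mean()))

print(f"averaging RMSE: {rmse(avg_estimate):.3f}")
print(f"weighted  RMSE: {rmse(evals @ weights):.3f}")
```

Note that this sketch inherits exactly the weakness the empirical results expose: because it rewards agreement with the provisional consensus rather than correctness, a cluster of evaluators who are consistent with one another but consistently wrong would also receive high weight.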
