Graphical Abstract Figure

Comparing Model Performance on the Empathic Accuracy (EA) Task: Importance of Domain-Specific Assessment

Abstract

Empathic design research aims to gain a deep and accurate understanding of users. A designer's empathic ability can be measured as empathic accuracy (EA): how accurately the designer understands the user's thoughts and feelings during an interview. However, the EA measure currently relies on human rating and is therefore time-consuming, which makes large language models (LLMs) an attractive alternative. Two significant constraints must be considered when implementing LLMs as a solution: the choice of LLM and the impact of domain-specific datasets. Datasets of designer-user interactions are not generally available. We present such a dataset, built from the EA task as employed in user interviews to measure empathic understanding. It consists of over 400 pairs, each matching a user's thought or feeling with a designer's guess of it, together with human ratings of the guess's accuracy. We compared the performance of six state-of-the-art sentence-embedding LLMs with different pooling techniques on the EA task, using the LLMs to extract semantic information before and after fine-tuning. We conclude that directly using LLMs based on their reported performance in general language tasks could result in errors when judging a designer's empathic ability. We also found that fine-tuning LLMs on our dataset improved their performance, but that a model's ability to fit the EA task and the pooling method also determined its performance. The results provide insight for other LLM-based similarity analyses in design.
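To make the role of pooling concrete, the following is a minimal sketch of how a similarity score between a user's statement and a designer's guess can depend on the pooling method. It is illustrative only: the abstract does not name the six models or the pooling techniques compared, so the three pooling functions (mean, first-token, max), the two-dimensional token vectors, and all function names here are assumptions rather than the authors' method.

```python
import math

def mean_pool(token_vectors):
    """Average the token vectors element-wise into one sentence vector."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]

def first_token_pool(token_vectors):
    """Use the first token's vector (e.g. a [CLS]-style token) as the sentence vector."""
    return list(token_vectors[0])

def max_pool(token_vectors):
    """Take the element-wise maximum over all token vectors."""
    return [max(col) for col in zip(*token_vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical token embeddings (made up for illustration, not real model output).
user_tokens = [[1.0, 0.0], [0.0, 1.0]]    # the user's reported thought or feeling
guess_tokens = [[1.0, 0.0], [1.0, 0.0]]   # the designer's guess

poolers = {"mean": mean_pool, "first_token": first_token_pool, "max": max_pool}
scores = {
    name: cosine_similarity(pool(user_tokens), pool(guess_tokens))
    for name, pool in poolers.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

With these toy vectors, first-token pooling rates the pair as identical while mean and max pooling do not, which is the kind of sensitivity that a comparison against human EA ratings would need to account for.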
