This paper addresses the automatic summarization of multiple engineering papers. We propose a summarization approach based on the macro- and microstructure of documents. The macrostructure is a ranked list of topics drawn from the engineering papers. Topics are discovered by extracting frequently appearing word sequences and grouping them into equivalence classes, so the macrostructure symbolically represents the topical links among different papers. The microstructure is the rhetorical structure within a single paper. Identifying it is treated as a classification problem: each sentence in a paper is automatically labeled with one of the predefined rhetorical categories. Unlike existing summarization methods, which first separate documents into nonoverlapping clusters and then summarize each cluster individually, our approach summarizes multiple documents according to the characteristics suggested at the macro- and microstructure levels. In the experimental study, the proposed approach outperformed peer systems in terms of Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and readers’ responsiveness. In an independent manual categorization task using the summaries generated by our approach and by peer systems, the summaries from our approach also yielded better precision and recall.
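
As a rough illustration of the topic-discovery step mentioned in the abstract, the following sketch counts contiguous word sequences and keeps those that appear in several documents. It is a simplification and not the paper's method: the sequence length `n`, the threshold `min_docs`, the function names, and the toy documents are all assumptions made for this example, and the grouping of sequences into equivalence classes is not attempted.

```python
from collections import defaultdict


def frequent_word_sequences(docs, n=2, min_docs=2):
    """Return contiguous word n-grams found in at least `min_docs` documents.

    Hypothetical helper: `n` and `min_docs` are illustrative parameters,
    not values taken from the paper.
    """
    seen_in = defaultdict(set)  # n-gram -> ids of documents containing it
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            seen_in[tuple(words[i:i + n])].add(doc_id)
    return {seq: len(ids) for seq, ids in seen_in.items() if len(ids) >= min_docs}


def rank_topics(doc_freq):
    """Rank sequences by document frequency, breaking ties alphabetically."""
    return sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))


# Toy documents invented for this example.
docs = [
    "tool wear monitoring in high speed milling",
    "a neural model for tool wear prediction",
    "surface roughness in high speed milling",
]

topics = rank_topics(frequent_word_sequences(docs))
for seq, count in topics:
    print(" ".join(seq), count)
```

Counting the number of documents that contain a sequence, rather than its raw occurrences, keeps one long paper from dominating the ranking; that is a design choice of this sketch only.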
