The importance of mining patents to support product design has been recognized, because patents are a major source of information for innovation and contain novel ideas that usually cannot be found in published academic papers. A basic task in patent text mining is patent classification, which is difficult to automate. One potential cause of the difficulty is class imbalance: the positive class of interest is the minority class, while the uninteresting negative class is the majority. To alleviate this imbalance and improve the performance of a Support Vector Machine (SVM) classifier, this study proposes P-SMOTE, a new oversampling technique that focuses on the blank spaces along the positive borderline of an SVM. The proposed technique was first evaluated on Reuters-21578, a standard text classification dataset, and was then applied to a dataset of design patent documents. Compared with an SVM classifier alone, an SVM classifier combined with P-SMOTE achieved better results.
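The abstract does not spell out how P-SMOTE places its synthetic samples. As a rough illustration only, the sketch below combines the standard SMOTE interpolation step (a synthetic point is drawn at a random position on the segment between a minority sample and one of its nearest minority neighbours) with a hypothetical borderline filter: only positive samples whose linear-SVM decision value satisfies |w·x + b| < 1 are used as seeds. The weight vector `w`, the bias `b`, the `margin` parameter, and the function names `decision_value` and `oversample_borderline` are assumptions made for this example; the SVM is assumed to have been trained beforehand, and this is not the published P-SMOTE algorithm.

```python
import random
from typing import List, Sequence

Vector = List[float]


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear SVM decision function f(x) = w . x + b (w, b assumed pre-trained)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def squared_distance(a: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - ci) ** 2 for ai, ci in zip(a, c))


def oversample_borderline(
    positives: List[Vector],
    w: Sequence[float],
    b: float,
    n_new: int,
    k: int = 5,
    margin: float = 1.0,
    seed: int = 0,
) -> List[Vector]:
    """Hypothetical borderline-focused SMOTE sketch (NOT the published P-SMOTE).

    Seeds are positive samples with |f(x)| < margin, i.e. close to the SVM
    decision boundary. Each synthetic point is a random interpolation between
    a seed and one of its k nearest positive neighbours (the SMOTE step).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    seed_idx = [i for i, x in enumerate(positives)
                if abs(decision_value(w, b, x)) < margin]
    if not seed_idx or len(positives) < 2:
        return []  # no borderline seeds, or no neighbour to interpolate with
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.choice(seed_idx)
        x = positives[i]
        # Indices of the other positives, sorted by distance to the seed x.
        others = [j for j in range(len(positives)) if j != i]
        others.sort(key=lambda j: squared_distance(x, positives[j]))
        neighbour = positives[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


# Usage (illustrative data): only (0.5, 0.5) and (0.6, 0.4) lie near the boundary.
pos = [[2.0, 2.0], [0.5, 0.5], [0.6, 0.4], [3.0, 3.0]]
new_points = oversample_borderline(pos, w=[1.0, 1.0], b=-1.0, n_new=3, k=1)
```

With `k=1` in this example, every synthetic point falls on the segment between (0.5, 0.5) and (0.6, 0.4), because those are the only seeds and each is the other's nearest positive neighbour. Restricting seeds to the margin region is one plausible way to add minority samples where the SVM boundary is decided; whether it matches P-SMOTE's treatment of "blank spaces" cannot be confirmed from the abstract.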
ASME 2011 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
August 28–31, 2011
Washington, DC, USA
Conference Sponsors:
- Design Engineering Division and Computers and Information in Engineering Division
ISBN: 978-0-7918-5479-2
PROCEEDINGS PAPER
P-SMOTE: One Oversampling Technique for Class Imbalanced Text Classification
Jingjing Wang, Wen Feng Lu, Han Tong Loh
National University of Singapore, Singapore
Paper No: DETC2011-47313, pp. 1089-1098; 10 pages
Published Online: June 12, 2012
Citation
Wang, J, Lu, WF, & Loh, HT. "P-SMOTE: One Oversampling Technique for Class Imbalanced Text Classification." Proceedings of the ASME 2011 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. Volume 2: 31st Computers and Information in Engineering Conference, Parts A and B. Washington, DC, USA. August 28–31, 2011. pp. 1089-1098. ASME. https://doi.org/10.1115/DETC2011-47313