In this paper, a stepwise information-theoretic feature selector is designed and implemented to reduce the dimension of a data set without losing pertinent information. The effectiveness of the proposed feature selector is demonstrated by selecting features from forty-three variables monitored on a set of heavy-duty diesel engines and then using this reduced feature space to classify faults in those engines. Using a cross-validation technique, the effects of various classification methods (linear regression, quadratic discriminants, probabilistic neural networks, and support vector machines) and feature selection methods (regression subset selection, RV-based selection by simulated annealing, and information-theoretic selection) are compared on the basis of the misclassification rate. The information-theoretic feature selector combined with the probabilistic neural network achieved an average classification accuracy of 90%, the best performance of any combination of classifiers and feature selectors under consideration.
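
To make the selection procedure concrete, the following is a minimal sketch of a greedy, stepwise mutual-information feature selector in the spirit of the approach described above. It is an illustration of the general technique, not the paper's exact algorithm: the mutual_information and stepwise_select helpers, the 10-bin histogram estimate of mutual information, and the mRMR-style redundancy penalty are all assumptions made for this example.

    # Minimal sketch of a greedy (stepwise) mutual-information feature
    # selector. Illustrative only; the binning and the redundancy penalty
    # are assumptions, not the authors' published algorithm.
    import numpy as np

    def mutual_information(x, y, bins=10):
        """Estimate I(X;Y) in nats from samples via a 2-D histogram."""
        joint, _, _ = np.histogram2d(x, y, bins=bins)
        pxy = joint / joint.sum()                    # joint distribution
        px = pxy.sum(axis=1, keepdims=True)          # marginal of X
        py = pxy.sum(axis=0, keepdims=True)          # marginal of Y
        nz = pxy > 0                                 # avoid log(0)
        return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

    def stepwise_select(X, y, n_features):
        """Greedily add the feature with the best relevance-minus-redundancy
        score, where relevance is I(feature; class) and redundancy is the
        mean mutual information with the features already selected."""
        n = X.shape[1]
        selected, remaining = [], list(range(n))
        relevance = [mutual_information(X[:, j], y) for j in range(n)]
        for _ in range(n_features):
            best_j, best_score = None, -np.inf
            for j in remaining:
                redundancy = (np.mean([mutual_information(X[:, j], X[:, k])
                                       for k in selected])
                              if selected else 0.0)
                score = relevance[j] - redundancy
                if score > best_score:
                    best_j, best_score = j, score
            selected.append(best_j)
            remaining.remove(best_j)
        return selected

Under these assumptions, calling stepwise_select(X, y, 5) on a data matrix X with forty-three columns of engine variables and a fault-label vector y would return the indices of five features, each chosen for high relevance to the fault class net of redundancy with the features already picked.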
