IEEE/CAA Journal of Automatica Sinica
Citation: Haoyue Liu, MengChu Zhou and Qing Liu, “An Embedded Feature Selection Method for Imbalanced Data Classification,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 703–715, May 2019. doi: 10.1109/JAS.2019.1911447
[1] F. Wang, T. Xu, T. Tang, M. C. Zhou, and H. Wang, “Bilevel feature extraction-based text mining for fault diagnosis of railway systems,” IEEE Trans. Intelligent Transportation Systems, vol. 18, no. 1, pp. 49–58, Jan. 2017. doi: 10.1109/TITS.2016.2521866
[2] D. Ramyachitra and P. Manikandan, “Imbalanced dataset classification and solutions: a review,” Int. J. Computing and Business Research (IJCBR).
[3] E. Ramentol, Y. Caballero, R. Bello, and F. Herrera, “SMOTE-RSB*: a hybrid preprocessing approach based on oversampling and undersampling for high imbalanced data-sets using SMOTE and rough sets theory,” Knowledge and Information Syst., vol. 33, no. 2, pp. 245–265, Nov. 2012. doi: 10.1007/s10115-011-0465-6
[4] Q. Kang, X. Chen, S. Li, and M. C. Zhou, “A noise-filtered under-sampling scheme for imbalanced classification,” IEEE Trans. Cybernetics, vol. 47, no. 12, pp. 4263–4274, Dec. 2017.
[5] B. Krawczyk, M. Woźniak, and G. Schaefer, “Cost-sensitive decision tree ensembles for effective imbalanced classification,” Applied Soft Computing, vol. 14, pp. 554–562, Jan. 2014. doi: 10.1016/j.asoc.2013.08.014
[6] V. Lopez, S. del Rio, J. M. Benitez, and F. Herrera, “On the use of MapReduce to build linguistic fuzzy rule based classification systems for big data,” in Proc. IEEE Int. Conf. Fuzzy Systems, pp. 1905–1912, Jul. 2014.
[7] Z. L. Cai and W. Zhu, “Feature selection for multi-label classification using neighborhood preservation,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 320–330, Jan. 2018. doi: 10.1109/JAS.2017.7510781
[8] C. Jian, J. Gao, and Y. Ao, “A new sampling method for classifying imbalanced data based on support vector machine ensemble,” Neurocomputing, vol. 193, pp. 115–122, 2016. doi: 10.1016/j.neucom.2016.02.006
[9] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Machine Learning Research, vol. 3, pp. 1157–1182, Mar. 2003.
[10] X. H. Yuan, L. B. Kong, D. C. Feng, and Z. C. Wei, “Automatic feature point detection and tracking of human actions in time-of-flight videos,” IEEE/CAA J. Autom. Sinica, vol. 4, no. 4, pp. 677–685, Oct. 2017. doi: 10.1109/JAS.2017.7510625
[11] J. Wang, L. Qiao, Y. Ye, and Y. Chen, “Fractional envelope analysis for rolling element bearing weak fault feature extraction,” IEEE/CAA J. Autom. Sinica, vol. 4, no. 2, pp. 353–360, 2017. doi: 10.1109/JAS.2016.7510166
[12] N. V. Chawla, N. Japkowicz, and A. Kotcz, “Editorial: special issue on learning from imbalanced data sets,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 1–6, 2004. doi: 10.1145/1007730
[13] A. K. Uysal and S. Gunal, “A novel probabilistic feature selection method for text classification,” Knowledge-Based Systems, vol. 36, pp. 226–235, 2012. doi: 10.1016/j.knosys.2012.06.005
[14] L. Yu and H. Liu, “Feature selection for high-dimensional data: a fast correlation-based filter solution,” in Proc. Int. Conf. Machine Learning, vol. 3, pp. 856–863, 2003.
[15] V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos, J. M. Benítez, and F. Herrera, “A review of microarray datasets and applied feature selection methods,” Information Sciences, vol. 282, pp. 111–135, 2014. doi: 10.1016/j.ins.2014.05.042
[16] G. Chandrashekar and F. Sahin, “A survey on feature selection methods,” Computers & Electrical Engineering, vol. 41, no. 1, pp. 16–28, 2014.
[17] H. Liu and H. Motoda, “Feature selection for knowledge discovery and data mining,” Springer Science & Business Media, vol. 454, 2012.
[18] S. Shilaskar and A. Ghatol, “Feature selection for medical diagnosis: evaluation for cardiovascular diseases,” Expert Syst. with Applications, vol. 40, no. 10, pp. 4146–4153, 2013. doi: 10.1016/j.eswa.2013.01.032
[19] I. A. Gheyas and L. S. Smith, “Feature subset selection in large dimensionality domains,” Pattern Recognition, vol. 43, no. 1, pp. 5–13, 2010. doi: 10.1016/j.patcog.2009.06.009
[20] S. Maldonado and R. Weber, “A wrapper method for feature selection using support vector machines,” Information Sciences, vol. 179, no. 13, pp. 2208–2217, 2009. doi: 10.1016/j.ins.2009.02.014
[21] Y. Zhu, J. Liang, J. Chen, and M. Zhong, “An improved NSGA-III algorithm for feature selection used in intrusion detection,” Knowledge-Based Systems, vol. 116, pp. 74–85, Jan. 2017. doi: 10.1016/j.knosys.2016.10.030
[22] A. Moayedikia, K. L. Ong, Y. L. Boo, W. G. Yeoh, and R. Jensen, “Feature selection for high dimensional imbalanced class data using harmony search,” Engineering Applications of Artificial Intelligence, vol. 57, pp. 38–49, Jan. 2017. doi: 10.1016/j.engappai.2016.10.008
[23] I. Guyon and A. Elisseeff, “An introduction to variable and feature selection,” J. Machine Learning Research, vol. 3, pp. 1157–1182, Mar. 2003.
[24] S. Maldonado and J. Lopez, “Dealing with high-dimensional class-imbalanced datasets: embedded feature selection for SVM classification,” Applied Soft Computing, vol. 67, pp. 94–105, Jun. 2018. doi: 10.1016/j.asoc.2018.02.051
[25] C. Apté, F. Damerau, and M. S. Weiss, “Automated learning of decision rules for text categorization,” ACM Trans. Information Syst., vol. 12, no. 3, pp. 233–251, 1994. doi: 10.1145/183422.183423
[26] G. Forman, “An extensive empirical study of feature selection metrics for text classification,” J. Machine Learning Research, vol. 3, pp. 1289–1305, Mar. 2003.
[27] C. Castillo, D. Donato, A. Gionis, V. Murdock, and F. Silvestri, “Know your neighbors: web spam detection using the web topology,” in Proc. 30th Annu. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 423–430, Jul. 2007.
[28] H. Koh, W. C. Tan, and G. C. Peng, “Credit scoring using data mining techniques,” Singapore Management Review, vol. 26, no. 2, 2004.
[29] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[30] J. R. Quinlan, “Constructing decision trees,” in C4.5: Programs for Machine Learning, Morgan Kaufmann, pp. 17–26, 1993.
[31] X. Chen, M. Wang, and H. Zhang, “The use of classification trees for bioinformatics,” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 1, pp. 55–63, 2011. doi: 10.1002/widm.14
[32] L. Breiman, “Classification and regression trees,” Routledge, 2017.
[33] H. Y. Liu, M. C. Zhou, X. S. Lu, and C. Yao, “Weighted Gini index feature selection method for imbalanced data,” in Proc. 15th IEEE Int. Conf. Networking, Sensing and Control (ICNSC), pp. 1–6, Mar. 2018.
[34] H. Y. Liu and M. C. Zhou, “Decision tree rule-based feature selection for large-scale imbalanced data,” in Proc. 26th IEEE Wireless and Optical Communication Conf. (WOCC), pp. 1–6, Apr. 2017.
[35] T. Q. Chen and T. He, “Xgboost: extreme gradient boosting,” R package version 0.4-2, 2015.
[36] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, 2006. doi: 10.1016/j.patrec.2005.10.010
[37] N. V. Chawla, N. Japkowicz, and A. Kotcz, “Editorial: special issue on learning from imbalanced data sets,” ACM SIGKDD Explorations Newsletter, vol. 6, no. 1, pp. 1–6, 2004.
[38] D. D. Lewis and W. A. Gale, “A sequential algorithm for training text classifiers,” in Proc. 17th Annu. Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Springer-Verlag, pp. 3–12, 1994.
[39] C. J. Van Rijsbergen, Information Retrieval, 2nd ed., Butterworth-Heinemann, Newton, MA, USA, 1979.
[40] M. Friedman, “A comparison of alternative tests of significance for the problem of m rankings,” The Annals of Mathematical Statistics, vol. 11, no. 1, pp. 86–92, 1940.
[41] R. F. Woolson, “Wilcoxon signed-rank test,” Wiley Encyclopedia of Clinical Trials, pp. 1–3, 2007.
[42] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Machine Learning Research, vol. 7, pp. 1–30, Jan. 2006.
[43] S. García and F. Herrera, “An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons,” J. Machine Learning Research, vol. 9, pp. 2677–2694, Dec. 2008.
[44] P. Zhang, S. Shu, and M. C. Zhou, “An online fault detection method based on SVM-Grid for cloud computing systems,” IEEE/CAA J. Autom. Sinica, vol. 5, no. 2, pp. 445–456, Mar. 2018. doi: 10.1109/JAS.2017.7510817
[45] J. Cheng, M. Chen, M. Zhou, S. Gao, C. Liu, and C. Liu, “Overlapping community change point detection in an evolving network,” IEEE Trans. Big Data, doi: 10.1109/TBDATA.2018.2880780, Nov. 2018.
[46] S. Gao, M. Zhou, Y. Wang, J. Cheng, H. Yachi, and J. Wang, “Dendritic neuron model with effective learning algorithms for classification, approximation and prediction,” IEEE Trans. Neural Networks and Learning Syst., doi: 10.1109/TNNLS.2018.2846646, 2018.
[47] Q. Kang, L. Shi, M. C. Zhou, X. Wang, Q. Wu, and Z. Wei, “A distance-based weighted undersampling scheme for support vector machines and its application to imbalanced classification,” IEEE Trans. Neural Networks and Learning Syst., vol. 29, no. 9, pp. 4152–4165, Sep. 2018. doi: 10.1109/TNNLS.2017.2755595