Feature Selection for Multi-label Classification Using Neighborhood Preservation

Zhiling Cai; William Zhu

doi:10.1109/JAS.2017.7510781

Volume 5 Issue 1

Jan. 2018

IEEE/CAA Journal of Automatica Sinica

Zhiling Cai and William Zhu, "Feature Selection for Multi-label Classification Using Neighborhood Preservation," IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 320-330, Jan. 2018. doi: 10.1109/JAS.2017.7510781

Laboratory of Granular Computing and AI, Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China

Funds: the National Natural Science Foundation of China (61379049, 61772120)

Abstract

Multi-label learning deals with data in which each sample is associated with a set of labels simultaneously. Dimensionality reduction is an important but challenging task in multi-label learning. Feature selection is an efficient dimensionality reduction technique that searches for an optimal feature subset preserving the most relevant information. In this paper, we propose an effective feature evaluation criterion for multi-label feature selection, called the neighborhood relationship preserving score. This criterion is inspired by similarity preservation, which is widely used in single-label feature selection. It evaluates each feature subset by measuring its capability to preserve the neighborhood relationship among samples. Unlike similarity preservation, we address the order of sample similarities, which expresses the neighborhood relationship among samples well, rather than just the pairwise sample similarity. Based on this criterion, we also design one ranking algorithm and one greedy algorithm for the feature selection problem. The proposed algorithms are validated on six publicly available data sets from a machine learning repository. Experimental results demonstrate their superiority over the compared state-of-the-art methods.
- Feature selection
- multi-label learning
- neighborhood relationship preserving
- sample similarity
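
The abstract describes scoring features by how well they preserve the order of sample similarities, then ranking features by that score. The sketch below illustrates that general idea only; it is not the paper's neighborhood relationship preserving score, whose exact definition is not given on this page. The function names `order_preservation_score` and `rank_features`, the shared-label count used as label similarity, and the negative absolute difference used as feature similarity are all assumptions made for illustration.

```python
import numpy as np


def order_preservation_score(x, Y):
    """Fraction of triples (i, j, k) whose label-similarity order is kept by feature x.

    Illustrative stand-in, not the paper's criterion. A triple counts when
    sample j is strictly more similar to sample i than sample k is in label
    space; it is preserved when the same strict order holds for feature x.
    """
    x = np.asarray(x, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = x.shape[0]

    # Assumed label similarity: number of labels two samples share.
    label_sim = Y @ Y.T
    # Assumed feature similarity: closer feature values give larger similarity.
    feat_sim = -np.abs(x[:, None] - x[None, :])

    # order[i, j, k] is True when j is strictly more similar to i than k is.
    label_order = label_sim[:, :, None] > label_sim[:, None, :]
    feat_order = feat_sim[:, :, None] > feat_sim[:, None, :]

    # Ignore triples in which j or k is the anchor sample i itself.
    idx = np.arange(n)
    valid = (idx[:, None, None] != idx[None, :, None]) & (
        idx[:, None, None] != idx[None, None, :]
    )

    relevant = label_order & valid
    total = relevant.sum()
    if total == 0:
        return 0.0
    return float((relevant & feat_order).sum() / total)


def rank_features(X, Y):
    """Score each column of X independently and return (ranking, scores)."""
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [order_preservation_score(X[:, f], Y) for f in range(X.shape[1])]
    )
    ranking = np.argsort(-scores, kind="stable")  # best feature first
    return ranking, scores


if __name__ == "__main__":
    # Toy data: feature 0 separates the two label groups, feature 1 does not.
    Y = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
    X = np.array(
        [[0.0, 0.0], [0.1, 5.0], [0.2, 0.1], [5.0, 5.1], [5.1, 0.2], [5.2, 5.2]]
    )
    ranking, scores = rank_features(X, Y)
    print("ranking:", ranking.tolist())
    print("scores:", scores.tolist())
```

Scoring each feature on its own corresponds loosely to a ranking strategy; a greedy variant would instead score candidate subsets and add one feature at a time. The triple-wise comparison uses cubic memory in the number of samples, so this sketch suits only small examples.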
