IEEE/CAA Journal of Automatica Sinica
Citation: Zhao Ren, Kun Qian, Zixing Zhang, Vedhas Pandit, Alice Baird and Björn Schuller, "Deep Scalogram Representations for Acoustic Scene Classification," IEEE/CAA J. Autom. Sinica, vol. 5, no. 3, pp. 662-669, Mar. 2018. doi: 10.1109/JAS.2018.7511066
[1] E. Marchi, D. Tonelli, X. Z. Xu, F. Ringeval, J. Deng, S. Squartini, and B. Schuller, "Pairwise decomposition with deep neural networks and multiscale kernel subspace learning for acoustic scene classification," in Proc. Detection and Classification of Acoustic Scenes and Events, Budapest, Hungary, 2016, pp. 65-69.
[2] W. He, Z. J. Li, and C. L. P. Chen, "A survey of human-centered intelligent robots: Issues and challenges," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 4, pp. 602-609, Oct. 2017. http://www.ieee-jas.org/CN/abstract/abstract280.shtml
[3] F. Eyben, F. Weninger, F. Groß, and B. Schuller, "Recent developments in openSMILE, the Munich open-source multimedia feature extractor," in Proc. 21st ACM Int. Conf. Multimedia, Barcelona, Spain, 2013, pp. 835-838. https://dl.acm.org/citation.cfm?doid=2502081.2502224
[4] L. Li, Y. L. Lin, N. N. Zheng, and F. Y. Wang, "Parallel learning: A perspective and a framework," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 3, pp. 389-395, Jul. 2017. doi: 10.1109/JAS.2017.7510493
[5] F. Y. Wang, N. N. Zheng, D. P. Cao, C. M. Martinez, L. Li, and T. Liu, "Parallel driving in CPSS: A unified approach for transport automation and vehicle intelligence," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 4, pp. 577-587, Oct. 2017. doi: 10.1109/JAS.2017.7510598
[6] S. Amiriparian, M. Gerczuk, S. Ottl, N. Cummins, M. Freitag, S. Pugachevskiy, A. Baird, and B. Schuller, "Snore sound classification using image-based deep spectrum features," in Proc. INTERSPEECH 2017: Conf. Int. Speech Communication Association, Stockholm, Sweden, 2017, pp. 3512-3516.
[7] M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen, "DCASE 2016 acoustic scene classification using convolutional neural networks," in Proc. Detection and Classification of Acoustic Scenes and Events 2016, Budapest, Hungary, 2016, pp. 95-99.
[8] I. Daubechies, "The wavelet transform, time-frequency localization and signal analysis," IEEE Trans. Inf. Theory, vol. 36, no. 5, pp. 961-1005, Sep. 1990. http://ieeexplore.ieee.org/document/57199/
[9] V. N. Varghees and K. I. Ramachandran, "Effective heart sound segmentation and murmur classification using empirical wavelet transform and instantaneous phase for electronic stethoscope," IEEE Sens. J., vol. 17, no. 12, pp. 3861-3872, Jun. 2017. http://ieeexplore.ieee.org/document/7903626
[10] K. Qian, C. Janott, Z. X. Zhang, C. Heiser, and B. Schuller, "Wavelet features for classification of VOTE snore sounds," in Proc. 2016 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Shanghai, China, 2016, pp. 221-225.
[11] K. Qian, C. Janott, J. Deng, C. Heiser, W. Hohenhorst, M. Herzog, N. Cummins, and B. Schuller, "Snore sound recognition: on wavelets and classifiers from deep nets to kernels," in Proc. 39th Ann. Int. Conf. of the IEEE Engineering in Medicine and Biology Society, Seogwipo, South Korea, 2017, pp. 3737-3740.
[12] K. Qian, C. Janott, V. Pandit, Z. X. Zhang, C. Heiser, W. Hohenhorst, M. Herzog, W. Hemmert, and B. Schuller, "Classification of the excitation location of snore sounds in the upper airway by acoustic multifeature analysis," IEEE Trans. Biomed. Eng., vol. 64, no. 8, pp. 1731-1741, Aug. 2017. http://ieeexplore.ieee.org/document/7605472/
[13] K. Qian, Z. Ren, V. Pandit, Z. J. Yang, Z. X. Zhang, and B. Schuller, "Wavelets revisited for the classification of acoustic scenes," in Proc. Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, 2017, pp. 108-112.
[14] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. H. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211-252, Dec. 2015.
[15] J. Schlüter and S. Böck, "Improved musical onset detection with convolutional neural networks," in Proc. 2014 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Florence, Italy, 2014, pp. 6979-6983. http://ieeexplore.ieee.org/document/6854953/
[16] G. Gwardys and D. Grzywczak, "Deep image features in music information retrieval," Int. J. Electron. Telecomm., vol. 60, no. 4, pp. 321-326, Dec. 2014. https://www.deepdyve.com/lp/de-gruyter/deep-image-features-in-music-information-retrieval-k0MzODXMRz
[17] J. Deng, N. Cummins, J. Han, X. Z. Xu, Z. Ren, V. Pandit, Z. X. Zhang, and B. Schuller, "The University of Passau open emotion recognition system for the multimodal emotion challenge," in Proc. 7th Chinese Conf. Pattern Recognition (CCPR), Chengdu, China, 2016, pp. 652-666. doi: 10.1007/978-981-10-3005-5_54
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. 25th Int. Conf. Neural Information Processing Systems, Lake Tahoe, Nevada, USA, 2012, pp. 1097-1105. https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
[19] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in Proc. Int. Conf. Learning Representations, San Diego, CA, USA, 2015.
[20] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345-1359, Oct. 2010.
[21] W. Y. Zhang, H. G. Zhang, J. H. Liu, K. Li, D. S. Yang, and H. Tian, "Weather prediction with multiclass support vector machines in the fault detection of photovoltaic system," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 3, pp. 520-525, Jul. 2017. http://www.ieee-jas.org/EN/abstract/abstract270.shtml
[22] S. Young, G. Evermann, D. Kershaw, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book. Cambridge, UK: Cambridge University Engineering Department, 2002.
[23] D. P. Mandic and J. A. Chambers, Recurrent Neural Networks for Prediction: Learning Algorithms, Architectures and Stability. New York, USA: Wiley Online Library, 2002.
[24] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
[25] S. H. Bae, I. Choi, and N. S. Kim, "Acoustic scene classification using parallel combination of LSTM and CNN," in Proc. Detection and Classification of Acoustic Scenes and Events 2016, Budapest, Hungary, 2016, pp. 11-15.
[26] D. Yu and J. Y. Li, "Recent progresses in deep learning based acoustic models," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 3, pp. 396-409, Jul. 2017. http://www.ieee-jas.org/EN/abstract/abstract260.shtml
[27] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," in Proc. NIPS 2014 Deep Learning and Representation Learning Workshop, Montreal, Canada, 2014.
[28] Z. Ren, V. Pandit, K. Qian, Z. J. Yang, Z. X. Zhang, and B. Schuller, "Deep sequential image features for acoustic scene classification," in Proc. Detection and Classification of Acoustic Scenes and Events, Munich, Germany, 2017, pp. 113-117.
[29] A. Mesaros, T. Heittola, A. Diment, B. Elizalde, A. Shah, E. Vincent, B. Raj, and T. Virtanen, "DCASE 2017 challenge setup: tasks, datasets and baseline system," in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events, Munich, Germany, 2017, pp. 85-92.
[30] S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen, R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, M. Slaney, R. J. Weiss, and K. Wilson, "CNN architectures for large-scale audio classification," in Proc. 2017 IEEE Int. Conf. Acoustics, Speech and Signal Processing, New Orleans, LA, USA, 2017, pp. 131-135.
[31] S. Amiriparian, M. Freitag, N. Cummins, and B. Schuller, "Sequence to sequence autoencoders for unsupervised representation learning from audio," in Proc. Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, 2017, pp. 17-21.
[32] E. Fonseca, R. Gong, D. Bogdanov, O. Slizovskaia, E. Gomez, and X. Serra, "Acoustic scene classification by ensembling gradient boosting machine and convolutional neural networks," in Proc. Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, 2017, pp. 37-41.
[33] A. Vafeiadis, D. Kalatzis, K. Votis, D. Giakoumis, D. Tzovaras, L. M. Chen, and R. Hamzaoui, "Acoustic scene classification: From a hybrid classifier to deep learning," in Proc. Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, 2017, pp. 123-127.
[34] S. Park, S. Mun, Y. Lee, and H. Ko, "Acoustic scene classification based on convolutional neural network using double image features," in Proc. Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, 2017, pp. 98-102.
[35] R. N. Khushaba, S. Kodagoda, S. Lal, and G. Dissanayake, "Driver drowsiness classification using fuzzy wavelet-packet-based feature-extraction algorithm," IEEE Trans. Biomed. Eng., vol. 58, no. 1, pp. 121-131, Jan. 2011. http://ieeexplore.ieee.org/document/5580017/
[36] T. H. Vu and J. C. Wang, "Acoustic scene and event recognition using recurrent neural networks," in Proc. Detection and Classification of Acoustic Scenes and Events 2016, Budapest, Hungary, 2016.
[37] M. Zöhrer and F. Pernkopf, "Gated recurrent networks applied to acoustic scene classification and acoustic event detection," in Proc. Detection and Classification of Acoustic Scenes and Events 2016, Budapest, Hungary, 2016, pp. 115-119.
[38] E. Sejdić, I. Djurović, and J. Jiang, "Time-frequency feature representation using energy concentration: an overview of recent advances," Digit. Signal Process., vol. 19, no. 1, pp. 153-183, Jan. 2009. https://www.sciencedirect.com/science/article/pii/S105120040800002X
[39] I. Daubechies, Ten Lectures on Wavelets. Philadelphia, PA, USA: SIAM, 1992.
[40] S. C. Olhede and A. T. Walden, "Generalized Morse wavelets," IEEE Trans. Signal Process., vol. 50, no. 11, pp. 2661-2670, Nov. 2002.
[41] A. Vedaldi and K. Lenc, "MatConvNet: Convolutional neural networks for MATLAB," in Proc. 23rd ACM Int. Conf. Multimedia, Brisbane, Australia, 2015, pp. 689-692.
[42] R. Jozefowicz, W. Zaremba, and I. Sutskever, "An empirical exploration of recurrent network architectures," in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 2342-2350.
[43] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. Int. Conf. Learning Representations 2015, San Diego, CA, USA, 2015.
[44] Z. C. Yang, D. Y. Yang, C. Dyer, X. D. He, A. J. Smola, and E. H. Hovy, "Hierarchical attention networks for document classification," in Proc. NAACL-HLT 2016, San Diego, CA, USA, 2016, pp. 1480-1489.
[45] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673-2681, Nov. 1997. http://ieeexplore.ieee.org/document/650093/
[46] R. K. Srivastava, K. Greff, and J. Schmidhuber, "Highway networks," arXiv preprint, arXiv:1505.00387, 2015.
[47] T. Scheffer, C. Decomain, and S. Wrobel, "Active hidden Markov models for information extraction," in Proc. 4th Int. Conf. Advances in Intelligent Data Analysis, Porto, Portugal, 2001, pp. 309-318. doi: 10.1007/3-540-44816-0_31
[48] K. Qian, Z. X. Zhang, A. Baird, and B. Schuller, "Active learning for bird sound classification via a kernel-based extreme learning machine," J. Acoust. Soc. Am., vol. 142, no. 4, p. 1796, Oct. 2017.
[49] A. Mesaros, T. Heittola, and T. Virtanen, "TUT database for acoustic scene classification and sound event detection," in Proc. 24th European Signal Processing Conf., Budapest, Hungary, 2016, pp. 1128-1132. http://ieeexplore.ieee.org/document/7760424/
[50] B. Schuller, S. Steidl, A. Batliner, A. Vinciarelli, K. Scherer, F. Ringeval, M. Chetouani, F. Weninger, F. Eyben, E. Marchi, M. Mortillaro, H. Salamin, A. Polychroniou, F. Valente, and S. Kim, "The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism," in Proc. 14th Ann. Conf. Int. Speech Communication Association, Lyon, France, 2013, pp. 148-152.
[51] S. Mun, S. Park, D. K. Han, and H. Ko, "Generative adversarial network based acoustic scene training set augmentation and selection using SVM hyper-plane," in Proc. Detection and Classification of Acoustic Scenes and Events 2017, Munich, Germany, 2017, pp. 93-97.
[52] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. 27th Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2014, pp. 2672-2680.
[53] K. F. Wang, C. Gou, Y. J. Duan, Y. L. Lin, X. H. Zheng, and F. Y. Wang, "Generative adversarial networks: introduction and outlook," IEEE/CAA J. of Autom. Sinica, vol. 4, no. 4, pp. 588-598, Oct. 2017. http://www.ieee-jas.org/CN/abstract/abstract278.shtml