A Cognitive Memory-Augmented Network for Visual Anomaly Detection

Tian Wang; Xing Xu; Fumin Shen; Yang Yang

doi:10.1109/JAS.2021.1004045

Volume 8 Issue 7

Jul. 2021

IEEE/CAA Journal of Automatica Sinica



T. Wang, X. Xu, F. M. Shen, and Y. Yang, "A Cognitive Memory-Augmented Network for Visual Anomaly Detection," IEEE/CAA J. Autom. Sinica, vol. 8, no. 7, pp. 1296-1307, Jul. 2021. doi: 10.1109/JAS.2021.1004045


Tian Wang^1, Xing Xu^1, Fumin Shen^1, Yang Yang^{1,2}

1. Center for Future Multimedia and School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China
2. Institute of Electronic and Information Engineering of UESTC in Guangdong, Dongguan 523808, China

Funds: This work was supported in part by the National Natural Science Foundation of China (61976049, 62072080, U20B2063), the Fundamental Research Funds for the Central Universities (ZYGX2019Z015), the Sichuan Science and Technology Program, China (2018GZDZX0032, 2019ZDZX0008, 2019YFG0003, 2019YFG0533, 2020YFS0057), and Dongguan Songshan Lake Introduction Program of Leading Innovative and Entrepreneurial Talents


Author Bio:
Tian Wang received the B.S. degree from Shaanxi Normal University, China, in 2019. He is currently a Master's student at the Center for Future Media and School of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interests include computer vision and machine learning.

Xing Xu (M’16) received the B.E. and M.E. degrees from the Huazhong University of Science and Technology, Wuhan, China, in 2009 and 2012, respectively, and the Ph.D. degree from Kyushu University, Fukuoka, Japan, in 2015. He is currently an Associate Professor with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His current research interests mainly focus on multimedia information retrieval and computer vision. He is the recipient of six academic awards, including the IEEE Multimedia Prize Paper 2020, the Best Paper Award from Association for Computing Machinery (ACM) Multimedia 2017, and the World’s FIRST 10K Best Paper Award-Platinum Award from the IEEE International Conference on Multimedia and Expo (ICME) 2017.

Fumin Shen (M’15) received the B.S. degree from Shandong University in 2007, and the Ph.D. degree from the Nanjing University of Science and Technology, China, in 2014. He is currently a Professor with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His major research interests include computer vision and machine learning. He was a recipient of the Best Paper Award Honorable Mention from ACM SIGIR 2016 and ACM SIGIR 2017 and the World’s FIRST 10K Best Paper Award-Platinum Award from the IEEE ICME 2017.

Yang Yang (M’16–SM’19) received the B.S. degree from Jilin University, Changchun, China, in 2006, the M.S. degree from Peking University, Beijing, China, in 2009, and the Ph.D. degree from The University of Queensland, Brisbane, Australia, in 2012, all in computer science. He is currently a Professor with the University of Electronic Science and Technology of China. His current research interests include multimedia content analysis, computer vision, and social media analytics.
Corresponding author: Xing Xu, e-mail: xing.xu@uestc.edu.cn
Received Date: 2021-01-03
Revised Date: 2021-02-24
Accepted Date: 2021-03-28

Available Online: 2021-04-19

Abstract

With the rapid development of automated visual analysis, visual analysis systems have become a popular research topic in computer vision. Such systems can assist humans in detecting anomalous events (e.g., fighting or walking alone on the grass). Existing methods for visual anomaly detection are usually based on an autoencoder architecture, i.e., reconstructing the current frame or predicting the future frame, and adopt the reconstruction error as the evaluation metric to identify whether an input is abnormal. A flaw of these methods is that abnormal samples can also be reconstructed well. In this paper, inspired by the human memory ability, we propose a novel deep neural network (DNN) based model termed cognitive memory-augmented network (CMAN) for the visual anomaly detection problem. The proposed CMAN model assumes that the visual analysis system imitates humans by remembering normal samples and then distinguishing abnormal events in the collected videos. Specifically, we introduce a memory module that simulates the memory capacity of humans and a density estimation network that learns the data distribution. The reconstruction errors and the novelty scores are used to distinguish abnormal events from videos. In addition, we develop a two-step scheme to train the proposed model so that the memory module and the density estimation network can cooperate to improve performance. Comprehensive experiments on various popular benchmarks show the superiority and effectiveness of the proposed CMAN model for visual anomaly detection compared with state-of-the-art methods. The implementation code of our CMAN method can be accessed at https://github.com/CMAN-code/CMAN_pytorch.
Keywords: cognitive computing, density estimation, memory, visual analysis systems, visual anomaly detection
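
The scoring idea in the abstract, where reconstruction errors and novelty scores jointly flag abnormal frames, can be illustrated with a minimal Python sketch. The abstract does not say how the two quantities are combined, so the min-max normalization, the weighted sum, the `weight` parameter, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant sequence maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def reconstruction_error(frame: Sequence[float], recon: Sequence[float]) -> float:
    """Mean squared error between a flattened frame and its reconstruction."""
    if len(frame) != len(recon):
        raise ValueError("frame and reconstruction must have the same length")
    return sum((a - b) ** 2 for a, b in zip(frame, recon)) / len(frame)


def anomaly_scores(
    recon_errors: Sequence[float],
    novelty_scores: Sequence[float],
    weight: float = 0.5,
) -> List[float]:
    """Combine per-frame reconstruction errors and novelty scores.

    Assumption: a weighted sum of min-max normalized terms. The paper may
    use a different combination rule.
    """
    if len(recon_errors) != len(novelty_scores):
        raise ValueError("one reconstruction error and one novelty score per frame")
    r = min_max_normalize(recon_errors)
    n = min_max_normalize(novelty_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(r, n)]


# Hypothetical per-frame values: the third frame reconstructs poorly and
# has a high novelty score, so it receives the highest anomaly score.
errors = [0.01, 0.02, 0.30]
novelty = [1.0, 1.2, 4.0]
scores = anomaly_scores(errors, novelty)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Using two signals rather than one addresses the flaw the abstract names: a frame that happens to be reconstructed well can still receive a high score if the density estimator finds its latent representation unlikely.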



Highlights

A Cognitive Memory-Augmented Network is proposed for visual anomaly detection.
A memory module is designed to simulate the memory capacity of humans.
A density estimation module is developed to learn the data distribution.
A two-step scheme is proposed to enable the cooperation of the two modules.
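
As a rough illustration of what a memory module can look like, the sketch below reads from a small bank of stored "normal" prototypes using cosine-similarity attention, a common design in memory-augmented autoencoders. The highlights do not describe the addressing scheme used in CMAN, so the similarity measure, the softmax weighting, and the names `read_memory`, `memory`, and `query` are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_memory(
    query: Sequence[float], memory: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Return (retrieved vector, attention weights) for a latent query.

    The retrieved vector is a convex combination of memory slots, so the
    decoder only ever sees mixtures of stored normal patterns.
    """
    weights = softmax([cosine_similarity(query, slot) for slot in memory])
    dim = len(query)
    retrieved = [
        sum(w * slot[i] for w, slot in zip(weights, memory)) for i in range(dim)
    ]
    return retrieved, weights


# Hypothetical two-slot memory; the query is closest to the first slot.
memory = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
retrieved, weights = read_memory(query, memory)
```

Because the retrieved vector is built only from stored normal patterns, an abnormal input tends to be reconstructed from the nearest normal prototypes, which is intended to enlarge its reconstruction error.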
