IEEE/CAA Journal of Automatica Sinica
Citation: D. Yu, M. Y. Zhang, M. T. Li, F. S. Zha, J. G. Zhang, L. N. Sun, and K. Q. Huang, “Squeezing more past knowledge for online class-incremental continual learning,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 722–736, Mar. 2023. doi: 10.1109/JAS.2023.123090
[1] P. Sun, R. Zhang, Y. Jiang, T. Kong, C. Xu, W. Zhan, M. Tomizuka, L. Li, Z. Yuan, C. Wang, and P. Luo, “Sparse R-CNN: End-to-end object detection with learnable proposals,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2021, pp. 14454–14463.
[2] P. Dai, R. Weng, W. Choi, C. Zhang, Z. He, and W. Ding, “Learning a proposal classifier for multiple object tracking,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2021, pp. 2443–2452.
[3] M. Fan, S. Lai, J. Huang, X. Wei, Z. Chai, J. Luo, and X. Wei, “Rethinking BiSeNet for real-time semantic segmentation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2021, pp. 9716–9725.
[4] L. Liu, D. Dugas, G. Cesari, R. Siegwart, and R. Dubé, “Robot navigation in crowded environments using deep reinforcement learning,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, 2020, pp. 5671–5677.
[5] R. Agarwal, M. Schwarzer, P. S. Castro, A. C. Courville, and M. Bellemare, “Deep reinforcement learning at the edge of the statistical precipice,” Advances Neural Information Processing Systems, vol. 34, pp. 29304–29320, 2021.
[6] M. McCloskey and N. J. Cohen, “Catastrophic interference in connectionist networks: The sequential learning problem,” in Psychology of Learning and Motivation, vol. 24. Elsevier, 1989, pp. 109–165.
[7] R. M. French, “Catastrophic forgetting in connectionist networks,” Trends in Cognitive Sciences, vol. 3, no. 4, pp. 128–135, 1999. doi: 10.1016/S1364-6613(99)01294-2
[8] I. J. Goodfellow, M. Mirza, D. Xiao, A. Courville, and Y. Bengio, “An empirical investigation of catastrophic forgetting in gradient-based neural networks,” arXiv preprint arXiv: 1312.6211, 2013.
[9] M. B. Ring, “Continual learning in reinforcement environments,” Ph.D. dissertation, University of Texas at Austin, Austin, USA, 1994.
[10] M. Zhai, L. Chen, and G. Mori, “Hyper-LifelongGAN: Scalable lifelong learning for image conditioned generation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2021, pp. 2246–2255.
[11] R. Aljundi, M. Rohrbach, and T. Tuytelaars, “Selfless sequential learning,” in Proc. Int. Conf. Learning Representations, 2019.
[12] M. PourKeshavarzi, G. Zhao, and M. Sabokrou, “Looking back on learned experiences for class/task incremental learning,” in Proc. Int. Conf. Learning Representations, 2022.
[13] T.-Y. Wu, G. Swaminathan, Z. Li, A. Ravichandran, N. Vasconcelos, R. Bhotika, and S. Soatto, “Class-incremental learning with strong pretrained models,” arXiv preprint arXiv: 2204.03634, 2022.
[14] D. Shim, Z. Mai, J. Jeong, S. Sanner, H. Kim, and J. Jang, “Online class-incremental continual learning with adversarial Shapley value,” in Proc. AAAI Conf. Artificial Intelligence, 2021, vol. 35, no. 11, pp. 9630–9638.
[15] Y. Gu, X. Yang, K. Wei, and C. Deng, “Not just selection, but exploration: Online class-incremental continual learning via dual view consistency,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2022, pp. 7442–7451.
[16] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars, “A continual learning survey: Defying forgetting in classification tasks,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 44, no. 7, pp. 3366–3385, 2021.
[17] M. F. Carr, S. Jadhav, and L. M. Frank, “Hippocampal replay in the awake state: A potential substrate for memory consolidation and retrieval,” Nature Neuroscience, vol. 14, no. 2, p. 147, 2011. doi: 10.1038/nn.2732
[18] J. L. McClelland, “Complementary learning systems in the brain: A connectionist approach to explicit and implicit cognition and memory,” Annals New York Academy Sciences, vol. 843, p. 153, 1998.
[19] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al., “Overcoming catastrophic forgetting in neural networks,” Proc. National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017. doi: 10.1073/pnas.1611835114
[20] F. Zenke, B. Poole, and S. Ganguli, “Continual learning through synaptic intelligence,” Proc. Machine Learning Research, vol. 70, p. 3987, 2017.
[21] J. Schwarz, W. Czarnecki, J. Luketina, A. Grabska-Barwinska, Y. W. Teh, R. Pascanu, and R. Hadsell, “Progress & compress: A scalable framework for continual learning,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4528–4537.
[22] R. Aljundi, F. Babiloni, M. Elhoseiny, M. Rohrbach, and T. Tuytelaars, “Memory aware synapses: Learning what (not) to forget,” in Proc. European Conf. Computer Vision, 2018, pp. 139–154.
[23] C. V. Nguyen, Y. Li, T. D. Bui, and R. E. Turner, “Variational continual learning,” arXiv preprint arXiv: 1710.10628, 2017.
[24] H. Ahn, S. Cha, D. Lee, and T. Moon, “Uncertainty-based continual learning with adaptive regularization,” Advances Neural Information Processing Systems, vol. 32, 2019.
[25] K. Wei, C. Deng, X. Yang, and M. Li, “Incremental embedding learning via zero-shot translation,” in Proc. AAAI Conf. Artificial Intelligence, 2021, vol. 35, no. 11, pp. 10254–10262.
[26] Z. Li and D. Hoiem, “Learning without forgetting,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 40, no. 12, pp. 2935–2947, 2017.
[27] A. Rannen, R. Aljundi, M. B. Blaschko, and T. Tuytelaars, “Encoder based lifelong learning,” in Proc. IEEE Int. Conf. Computer Vision, 2017, pp. 1320–1328.
[28] Y. Gu, C. Deng, and K. Wei, “Class-incremental instance segmentation via multi-teacher networks,” in Proc. AAAI Conf. Artificial Intelligence, 2021, vol. 35, no. 2, pp. 1478–1486.
[29] G. Yang, E. Fini, D. Xu, P. Rota, M. Ding, M. Nabi, X. Alameda-Pineda, and E. Ricci, “Uncertainty-aware contrastive distillation for incremental semantic segmentation,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 45, pp. 2567–2581, 2022.
[30] E. Fini, V. G. T. da Costa, X. Alameda-Pineda, E. Ricci, K. Alahari, and J. Mairal, “Self-supervised models are continual learners,” arXiv preprint arXiv: 2112.04215, 2021.
[31] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive neural networks,” arXiv preprint arXiv: 1606.04671, 2016.
[32] D. Liu, J. Xu, P. Zhang, and Y. Yan, “Investigation of knowledge transfer approaches to improve the acoustic modeling of Vietnamese ASR system,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 5, pp. 1187–1195, 2019. doi: 10.1109/JAS.2019.1911693
[33] R. Aljundi, P. Chakravarty, and T. Tuytelaars, “Expert gate: Lifelong learning with a network of experts,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 3366–3375.
[34] J. Yoon, E. Yang, J. Lee, and S. J. Hwang, “Lifelong learning with dynamically expandable networks,” arXiv preprint arXiv: 1708.01547, 2017.
[35] A. Douillard, A. Ramé, G. Couairon, and M. Cord, “DyTox: Transformers for continual learning with dynamic token expansion,” arXiv preprint arXiv: 2111.11326, 2021.
[36] A. Mallya and S. Lazebnik, “PackNet: Adding multiple tasks to a single network by iterative pruning,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 7765–7773.
[37] C. Fernando, D. Banarse, C. Blundell, Y. Zwols, D. Ha, A. A. Rusu, A. Pritzel, and D. Wierstra, “PathNet: Evolution channels gradient descent in super neural networks,” arXiv preprint arXiv: 1701.08734, 2017.
[38] J. Serra, D. Suris, M. Miron, and A. Karatzoglou, “Overcoming catastrophic forgetting with hard attention to the task,” in Proc. Int. Conf. Machine Learning, 2018, pp. 4548–4557.
[39] A. Benjamin, D. Rolnick, and K. Kording, “Measuring and regularizing networks in function space,” in Proc. Int. Conf. Learning Representations, 2019.
[40] A. Prabhu, P. H. Torr, and P. K. Dokania, “GDumb: A simple approach that questions our progress in continual learning,” in Proc. European Conf. Computer Vision, Springer, 2020, pp. 524–540.
[41] A. Chaudhry, A. Gordo, P. Dokania, P. Torr, and D. Lopez-Paz, “Using hindsight to anchor past knowledge in continual learning,” in Proc. AAAI Conf. Artificial Intelligence, 2021, vol. 35, no. 8, pp. 6993–7001.
[42] J. Bang, H. Kim, Y. Yoo, J.-W. Ha, and J. Choi, “Rainbow memory: Continual learning with a memory of diverse samples,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2021, pp. 8218–8227.
[43] X. Jin, A. Sadhu, J. Du, and X. Ren, “Gradient-based editing of memory examples for online task-free continual learning,” Advances Neural Information Processing Systems, vol. 34, pp. 29193–29205, 2021.
[44] L. Wang, X. Zhang, K. Yang, L. Yu, C. Li, L. Hong, S. Zhang, Z. Li, Y. Zhong, and J. Zhu, “Memory replay with data compression for continual learning,” arXiv preprint arXiv: 2202.06592, 2022.
[45] J. Yoon, D. Madaan, E. Yang, and S. J. Hwang, “Online coreset selection for rehearsal-based continual learning,” in Proc. Int. Conf. Learning Representations, 2022.
[46] S. Sun, D. Calandriello, H. Hu, A. Li, and M. Titsias, “Information theoretic online memory selection for continual learning,” arXiv preprint arXiv: 2204.04763, 2022.
[47] S.-A. Rebuffi, A. Kolesnikov, G. Sperl, and C. H. Lampert, “iCaRL: Incremental classifier and representation learning,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 2001–2010.
[48] P. Buzzega, M. Boschini, A. Porrello, D. Abati, and S. Calderara, “Dark experience for general continual learning: A strong, simple baseline,” Advances Neural Information Processing Systems, vol. 33, pp. 15920–15930, 2020.
[49] H. Shin, J. K. Lee, J. Kim, and J. Kim, “Continual learning with deep generative replay,” in Proc. Advances Neural Information Processing Systems, 2017, pp. 2990–2999.
[50] M. Riemer, T. Klinger, D. Bouneffouf, and M. Franceschini, “Scalable recollections for continual lifelong learning,” in Proc. AAAI Conf. Artificial Intelligence, 2019, vol. 33, pp. 1352–1359.
[51] G. M. van de Ven, H. T. Siegelmann, and A. S. Tolias, “Brain-inspired replay for continual learning with artificial neural networks,” Nature Communications, vol. 11, no. 1, pp. 1–14, 2020. doi: 10.1038/s41467-019-13993-7
[52] K. Wei, C. Deng, X. Yang, and D. Tao, “Incremental zero-shot learning,” IEEE Trans. Cybernetics, vol. 52, no. 12, pp. 13788–13799, 2021.
[53] Y.-M. Tang, Y.-X. Peng, and W.-S. Zheng, “Learning to imagine: Diversify memory for incremental learning using unlabeled data,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2022, pp. 9549–9558.
[54] D. Lopez-Paz and M. Ranzato, “Gradient episodic memory for continual learning,” in Proc. Advances Neural Information Processing Systems, 2017, pp. 6467–6476.
[55] A. Chaudhry, M. Ranzato, M. Rohrbach, and M. Elhoseiny, “Efficient lifelong learning with A-GEM,” arXiv preprint arXiv: 1812.00420, 2018.
[56] R. Aljundi, M. Lin, B. Goujaud, and Y. Bengio, “Gradient based sample selection for online continual learning,” in Proc. Advances Neural Information Processing Systems, 2019, pp. 11816–11825.
[57] M. Farajtabar, N. Azizan, A. Mott, and A. Li, “Orthogonal gradient descent for continual learning,” in Proc. Int. Conf. Artificial Intelligence and Statistics, 2020, pp. 3762–3773.
[58] E. F. Ohata, G. M. Bezerra, J. V. S. das Chagas, A. V. L. Neto, A. B. Albuquerque, V. H. C. de Albuquerque, and P. Reboucas Filho, “Automatic detection of COVID-19 infection using chest X-ray images through transfer learning,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 1, pp. 239–248, 2020.
[59] A. Muzahid, W. Wan, F. Sohel, L. Wu, and L. Hou, “CurveNet: Curvature-based multitask learning deep networks for 3D object recognition,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1177–1187, 2020.
[60] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” arXiv preprint arXiv: 1503.02531, 2015.
[61] J. H. Cho and B. Hariharan, “On the efficacy of knowledge distillation,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Oct. 2019, pp. 4794–4802.
[62] Y. Bengio, A. Courville, and P. Vincent, “Representation learning: A review and new perspectives,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. doi: 10.1109/TPAMI.2013.50
[63] A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio, “FitNets: Hints for thin deep nets,” arXiv preprint arXiv: 1412.6550, 2014.
[64] B. Heo, J. Kim, S. Yun, H. Park, N. Kwak, and J. Y. Choi, “A comprehensive overhaul of feature distillation,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2019, pp. 1921–1930.
[65] J. Kim, S. Park, and N. Kwak, “Paraphrasing complex network: Network compression via factor transfer,” arXiv preprint arXiv: 1802.04977, 2018.
[66] F. Tung and G. Mori, “Similarity-preserving knowledge distillation,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2019, pp. 1365–1374.
[67] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. doi: 10.1109/5.726791
[68] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” M.S. thesis, University of Toronto, Toronto, Canada, 2009.
[69] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al., “Matching networks for one shot learning,” Advances Neural Information Processing Systems, vol. 29, pp. 3630–3638, 2016.
[70] Y. Le and X. Yang, “Tiny ImageNet visual recognition challenge,” CS 231N, vol. 7, no. 7, p. 3, 2015.
[71] A. Chaudhry, M. Rohrbach, M. Elhoseiny, T. Ajanthan, P. K. Dokania, P. H. Torr, and M. Ranzato, “On tiny episodic memories in continual learning,” arXiv preprint arXiv: 1902.10486, 2019.
[72] W. Park, D. Kim, Y. Lu, and M. Cho, “Relational knowledge distillation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2019, pp. 3967–3976.
[73] A. Cheraghian, S. Rahman, P. Fang, S. K. Roy, L. Petersson, and M. Harandi, “Semantic-aware knowledge distillation for few-shot class-incremental learning,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Jun. 2021, pp. 2534–2543.