IEEE/CAA Journal of Automatica Sinica
Citation: P. L. Huang, J. W. Han, N. Liu, J. Ren, and D. W. Zhang, “Scribble-supervised video object segmentation,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 2, pp. 339–353, Feb. 2022. doi: 10.1109/JAS.2021.1004210
[1] Z. Teng, J. Xing, Q. Wang, C. Lang, S. Feng, and Y. Jin, “Robust object tracking based on temporal and spatial deep networks,” in Proc. IEEE Int. Conf. Computer Vision, 2017, pp. 1144–1153.
[2] Z. Ji, K. Xiong, Y. Pang, and X. Li, “Video summarization with attention-based encoder-decoder networks,” IEEE Trans. Circuits and Systems for Video Technology, vol. 30, no. 6, pp. 1709–1717, 2019.
[3] D. Shao, Y. Xiong, Y. Zhao, Q. Huang, Y. Qiao, and D. Lin, “Find and focus: Retrieve and localize video events with natural language queries,” in Proc. European Conf. Computer Vision (ECCV), 2018, pp. 200–216.
[4] Q. Wang, J. Gao, and X. Li, “Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes,” IEEE Trans. Image Processing, vol. 28, no. 9, pp. 4376–4386, 2019. doi: 10.1109/TIP.2019.2910667
[5] Y. Chen, W. Li, and L. Van Gool, “ROAD: Reality oriented adaptation for semantic segmentation of urban scenes,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 7892–7901.
[6] F. Perazzi, A. Khoreva, R. Benenson, B. Schiele, and A. Sorkine-Hornung, “Learning video object segmentation from static images,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 2663–2672.
[7] S. Caelles, K.-K. Maninis, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool, “One-shot video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 221–230.
[8] C. Ventura, M. Bellver, A. Girbau, A. Salvador, F. Marques, and X. Giro-i-Nieto, “RVOS: End-to-end recurrent network for video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 5277–5286.
[9] J. Johnander, M. Danelljan, E. Brissman, F. S. Khan, and M. Felsberg, “A generative appearance model for end-to-end video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 8953–8962.
[10] P. Voigtlaender, Y. Chai, F. Schroff, H. Adam, B. Leibe, and L.-C. Chen, “FEELVOS: Fast end-to-end embedding learning for video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 9481–9490.
[11] Y. J. Koh and C.-S. Kim, “Primary object segmentation in videos based on region augmentation and reduction,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7417–7425.
[12] T. Brox and J. Malik, “Object segmentation by long term analysis of point trajectories,” in Proc. European Conf. Computer Vision, Springer, 2010, pp. 282–295.
[13] H. Song, W. Wang, S. Zhao, J. Shen, and K.-M. Lam, “Pyramid dilated deeper ConvLSTM for video salient object detection,” in Proc. European Conf. Computer Vision (ECCV), 2018, pp. 715–731.
[14] X. Lu, W. Wang, C. Ma, J. Shen, L. Shao, and F. Porikli, “See more, know more: Unsupervised video object segmentation with co-attention siamese networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 3623–3632.
[15] X. Lu, W. Wang, J. Shen, Y.-W. Tai, D. Crandall, and S. C. Hoi, “Learning video object segmentation from unlabeled videos,” arXiv preprint arXiv:2003.05020, 2020.
[16] M. Tang, A. Djelouah, F. Perazzi, Y. Boykov, and C. Schroers, “Normalized cut loss for weakly-supervised CNN segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 1818–1827.
[17] M. Tang, F. Perazzi, A. Djelouah, I. Ben Ayed, C. Schroers, and Y. Boykov, “On regularized losses for weakly-supervised CNN segmentation,” in Proc. European Conf. Computer Vision (ECCV), 2018, pp. 507–522.
[18] P. Krähenbühl and V. Koltun, “Efficient inference in fully connected CRFs with Gaussian edge potentials,” in Proc. Advances in Neural Information Processing Systems, 2011, pp. 109–117.
[19] K.-K. Maninis, S. Caelles, Y. Chen, J. Pont-Tuset, L. Leal-Taixé, D. Cremers, and L. Van Gool, “Video object segmentation without temporal information,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 41, no. 6, pp. 1515–1530, 2018.
[20] P. Voigtlaender and B. Leibe, “Online adaptation of convolutional neural networks for video object segmentation,” arXiv preprint arXiv:1706.09364, 2017.
[21] Y. Chen, J. Pont-Tuset, A. Montes, and L. Van Gool, “Blazingly fast video object segmentation with pixel-wise metric learning,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 1189–1198.
[22] Y. J. Lee, J. Kim, and K. Grauman, “Key-segments for video object segmentation,” in Proc. Int. Conf. Computer Vision, 2011, pp. 1995–2002.
[23] T. Ma and L. J. Latecki, “Maximum weight cliques with mutex constraints for video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012, pp. 670–677.
[24] X. Lu, C. Ma, B. Ni, X. Yang, I. Reid, and M.-H. Yang, “Deep regression tracking with shrinkage loss,” in Proc. European Conf. Computer Vision (ECCV), 2018, pp. 353–369.
[25] D. Zhang, O. Javed, and M. Shah, “Video object segmentation through spatially accurate and temporally dense extraction of primary object regions,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2013, pp. 628–635.
[26] L. Chen, J. Shen, W. Wang, and B. Ni, “Video object segmentation via dense trajectories,” IEEE Trans. Multimedia, vol. 17, no. 12, pp. 2225–2234, 2015. doi: 10.1109/TMM.2015.2481711
[27] K. Fragkiadaki, G. Zhang, and J. Shi, “Video segmentation by tracing discontinuities in a trajectory embedding,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2012, pp. 1846–1853.
[28] P. Ochs and T. Brox, “Object segmentation in video: A hierarchical variational approach for turning point trajectories into dense regions,” in Proc. Int. Conf. Computer Vision, 2011, pp. 1583–1590.
[29] A. Papazoglou and V. Ferrari, “Fast object segmentation in unconstrained video,” in Proc. IEEE Int. Conf. Computer Vision, 2013, pp. 1777–1784.
[30] W.-D. Jang, C. Lee, and C.-S. Kim, “Primary object segmentation in videos via alternate convex optimization of foreground and background distributions,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 696–704.
[31] A. Faktor and M. Irani, “Video segmentation by non-local consensus voting,” in Proc. British Machine Vision Conference (BMVC), 2014, vol. 2, no. 7, Article No. 8.
[32] Y.-T. Hu, J.-B. Huang, and A. G. Schwing, “Unsupervised video object segmentation using motion saliency-guided spatio-temporal propagation,” in Proc. European Conf. Computer Vision (ECCV), 2018, pp. 786–802.
[33] W. Wang, J. Shen, and F. Porikli, “Saliency-aware geodesic video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2015, pp. 3395–3402.
[34] S. D. Jain, B. Xiong, and K. Grauman, “FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 3664–3673.
[35] J. Cheng, Y.-H. Tsai, S. Wang, and M.-H. Yang, “SegFlow: Joint learning for video object segmentation and optical flow,” in Proc. IEEE Int. Conf. Computer Vision, 2017, pp. 686–695.
[36] S. Li, B. Seybold, A. Vorobyov, A. Fathi, Q. Huang, and C.-C. J. Kuo, “Instance embedding transfer to unsupervised video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 6526–6535.
[37] W. Wang, H. Song, S. Zhao, J. Shen, S. Zhao, S. C. Hoi, and H. Ling, “Learning unsupervised video object segmentation through visual attention,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 3064–3074.
[38] D. Lin, J. Dai, J. Jia, K. He, and J. Sun, “ScribbleSup: Scribble-supervised convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 3159–3167.
[39] B. Wang, G. Qi, S. Tang, T. Zhang, Y. Wei, L. Li, and Y. Zhang, “Boundary perception guidance: A scribble-supervised semantic segmentation approach,” in Proc. 28th Int. Joint Conf. Artificial Intelligence, AAAI Press, 2019, pp. 3663–3669.
[40] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Proc. Advances in Neural Information Processing Systems, 2017, pp. 5998–6008.
[41] X. Wang, R. Girshick, A. Gupta, and K. He, “Non-local neural networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 7794–7803.
[42] P. Yi, Z. Wang, K. Jiang, J. Jiang, and J. Ma, “Progressive fusion video super-resolution network via exploiting non-local spatio-temporal correlations,” in Proc. IEEE Int. Conf. Computer Vision, 2019, pp. 3106–3115.
[43] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, “Dual attention network for scene segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 3146–3154.
[44] H. Zhang, H. Zhang, C. Wang, and J. Xie, “Co-occurrent features in semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 548–557.
[45] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[46] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” arXiv preprint arXiv:1805.08318, 2018.
[47] T. Joy, A. Desmaison, T. Ajanthan, R. Bunel, M. Salzmann, P. Kohli, P. H. Torr, and M. P. Kumar, “Efficient relaxations for dense CRFs with sparse higher-order potentials,” SIAM Journal on Imaging Sciences, vol. 12, no. 1, pp. 287–318, 2019. doi: 10.1137/18M1178104
[48] B. Zhang, J. Xiao, Y. Wei, M. Sun, and K. Huang, “Reliability does matter: An end-to-end weakly supervised semantic segmentation approach,” arXiv preprint arXiv:1911.08039, 2019.
[49] Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222–1239, 2001. doi: 10.1109/34.969114
[50] C. Liu, W. T. Freeman, E. H. Adelson, and Y. Weiss, “Human-assisted motion annotation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[51] V. Karasev, A. Ravichandran, and S. Soatto, “Active frame, location, and detector selection for automated and manual video annotation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2014, pp. 2123–2130.
[52] B. Luo, H. Li, F. Meng, Q. Wu, and C. Huang, “Video object segmentation via global consistency aware query strategy,” IEEE Trans. Multimedia, vol. 19, no. 7, pp. 1482–1493, 2017. doi: 10.1109/TMM.2017.2671447
[53] D. Moltisanti, S. Fidler, and D. Damen, “Action recognition from single timestamp supervision in untrimmed videos,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 9915–9924.
[54] N. Xu, L. Yang, Y. Fan, D. Yue, Y. Liang, J. Yang, and T. Huang, “YouTube-VOS: A large-scale video object segmentation benchmark,” arXiv preprint arXiv:1809.03327, 2018.
[55] J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbeláez, A. Sorkine-Hornung, and L. Van Gool, “The 2017 DAVIS challenge on video object segmentation,” arXiv preprint arXiv:1704.00675, 2017.
[56] Z. Guo and R. W. Hall, “Parallel thinning with two-subiteration algorithms,” Communications of the ACM, vol. 32, no. 3, pp. 359–373, 1989. doi: 10.1145/62065.62074
[57] F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung, “A benchmark dataset and evaluation methodology for video object segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 724–732.
[58] H. Lin, X. Qi, and J. Jia, “AGSS-VOS: Attention guided single-shot video object segmentation,” in Proc. IEEE Int. Conf. Computer Vision, 2019, pp. 3949–3957.
[59] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, “PyTorch: An imperative style, high-performance deep learning library,” in Proc. Advances in Neural Information Processing Systems, 2019, pp. 8024–8035.
[60] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[61] H. Zhao, Y. Zhang, S. Liu, J. Shi, C. C. Loy, D. Lin, and J. Jia, “PSANet: Point-wise spatial attention network for scene parsing,” in Proc. European Conf. Computer Vision (ECCV), 2018, pp. 267–283.
[62] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, “CCNet: Criss-cross attention for semantic segmentation,” in Proc. IEEE Int. Conf. Computer Vision, 2019, pp. 603–612.
[63] L. Yang, Y. Wang, X. Xiong, J. Yang, and A. K. Katsaggelos, “Efficient video object segmentation via network modulation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 6499–6507.
[64] S. W. Oh, J.-Y. Lee, K. Sunkavalli, and S. J. Kim, “Fast video object segmentation by reference-guided mask propagation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 7376–7385.
[65] J. Luiten, P. Voigtlaender, and B. Leibe, “PReMVOS: Proposal-generation, refinement and merging for video object segmentation,” in Proc. Asian Conf. Computer Vision, Springer, 2018, pp. 565–580.
[66] J. Cheng, Y.-H. Tsai, W.-C. Hung, S. Wang, and M.-H. Yang, “Fast and accurate online video object segmentation via tracking parts,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 7415–7424.
[67] Q. Wang, L. Zhang, L. Bertinetto, W. Hu, and P. H. Torr, “Fast online object tracking and segmentation: A unifying approach,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2019, pp. 1328–1338.
[68] X. Luo, M. Zhou, S. Li, L. Hu, and M. Shang, “Non-negativity constrained missing data estimation for high-dimensional and sparse matrices from industrial applications,” IEEE Trans. Cybernetics, vol. 50, no. 5, pp. 1844–1855, 2019.
[69] X. Luo, Y. Yuan, M. Zhou, Z. Liu, and M. Shang, “Non-negative latent factor model based on β-divergence for recommender systems,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 51, no. 8, pp. 4612–4623, 2019.
[70] X. Luo, M. Zhou, S. Li, and M. Shang, “An inherently nonnegative latent factor model for high-dimensional and sparse matrices from industrial applications,” IEEE Trans. Industrial Informatics, vol. 14, no. 5, pp. 2011–2022, 2017.
[71] X. Luo, M. Zhou, Y. Xia, Q. Zhu, A. C. Ammari, and A. Alabdulwahab, “Generating highly accurate predictions for missing QoS data via aggregating nonnegative latent factor models,” IEEE Trans. Neural Networks and Learning Systems, vol. 27, no. 3, pp. 524–537, 2015.
[72] Y. Zhang, Z. Wu, H. Peng, and S. Lin, “A transductive approach for video object segmentation,” arXiv preprint arXiv:2004.07193, 2020.
[73] P. M. Kebria, A. Khosravi, S. M. Salaken, and S. Nahavandi, “Deep imitation learning for autonomous vehicles based on convolutional neural networks,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 82–95, 2019.
[74] Y. Xia, H. Yu, and F.-Y. Wang, “Accurate and robust eye center localization via fully convolutional networks,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 5, pp. 1127–1138, 2019. doi: 10.1109/JAS.2019.1911684
[75] T. D. Pham, K. Wardell, A. Eklund, and G. Salerud, “Classification of short time series in early Parkinson’s disease with deep learning of fuzzy recurrence plots,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 6, pp. 1306–1317, 2019. doi: 10.1109/JAS.2019.1911774
[76] S. Gao, M. Zhou, Y. Wang, J. Cheng, H. Yachi, and J. Wang, “Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction,” IEEE Trans. Neural Networks and Learning Systems, vol. 30, no. 2, pp. 601–614, 2018.
[77] D. Zhang, J. Han, L. Zhao, and D. Meng, “Leveraging prior-knowledge for weakly supervised object detection under a collaborative self-paced curriculum learning framework,” Int. J. Computer Vision, vol. 127, no. 4, pp. 363–380, 2019. doi: 10.1007/s11263-018-1112-4
[78] D. Zhang, J. Han, L. Zhao, and T. Zhao, “From discriminant to complete: Reinforcement searching-agent learning for weakly supervised object detection,” IEEE Trans. Neural Networks and Learning Systems, vol. 31, no. 12, pp. 5549–5560, 2020.
[79] J. Han, Y. Yang, D. Zhang, D. Huang, D. Xu, and F. De La Torre, “Weakly-supervised learning of category-specific 3D object shapes,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 43, no. 4, pp. 1423–1437, 2019.
[80] D. Zhang, J. Han, G. Guo, and L. Zhao, “Learning object detectors with semi-annotated weak labels,” IEEE Trans. Circuits and Systems for Video Technology, vol. 29, no. 12, pp. 3622–3635, 2018.
[81] D. Zhang, G. Guo, D. Huang, and J. Han, “PoseFlow: A deep motion representation for understanding human behaviors in videos,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018, pp. 6762–6770.