Citation: P. Huang and X. Luo, “FDTs: A feature disentangled transformer for interpretable squamous cell carcinoma grading,” IEEE/CAA J. Autom. Sinica, 2024.
[1] A. Dosovitskiy et al., “An image is worth 16×16 words: Transformers for image recognition at scale,” in Proc. Int. Conf. Learning Representations, 2021.
[2] X. Chu, Z. Tian, Y. Wang, et al., “Twins: Revisiting the design of spatial attention in vision transformers,” Advances in Neural Infor. Processing Systems, vol. 34, pp. 9355−9366, 2021.
[3] L. Yuan, Y. Chen, T. Wang, et al., “Tokens-to-token ViT: Training vision transformers from scratch on ImageNet,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2021, pp. 538−547.
[4] W. Yu, M. Luo, P. Zhou, et al., “MetaFormer is actually what you need for vision,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2022, pp. 10819−10829.
[5] Z. Liu, Y. Lin, Y. Cao, et al., “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proc. IEEE/CVF Int. Conf. Computer Vision, 2021, pp. 10012−10022.
[6] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, et al., “MLP-Mixer: An all-MLP architecture for vision,” Advances in Neural Infor. Processing Systems, vol. 34, pp. 24261−24272, 2021.
[7] G. Huang, Z. Liu, L. van der Maaten, et al., “Densely connected convolutional networks,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2017, pp. 4700−4708.
[8] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. Int. Conf. Machine Learning, 2019, pp. 6105−6114.
[9] Z. Liu, H. Mao, C. Wu, et al., “A ConvNet for the 2020s,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2022, pp. 11976−11986.
[10] X. Ding, X. Zhang, N. Ma, et al., “RepVGG: Making VGG-style ConvNets great again,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2021, pp. 13733−13742.
[11] C. Pan, J. Peng, and Z. Zhang, “Depth-guided vision transformer with normalizing flows for monocular 3D object detection,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 3, pp. 673−689, 2024.
[12] Y. Zheng, J. Li, J. Shi, et al., “Kernel attention transformer for histopathology whole slide image analysis and assistant cancer diagnosis,” IEEE Trans. Medical Imaging, vol. 42, no. 9, pp. 2726−2739, 2023.
[13] P. Huang, P. He, S. Tian, et al., “A ViT-AMC network with adaptive model fusion and multi-objective optimization for interpretable laryngeal tumor grading from histopathological images,” IEEE Trans. Medical Imaging, vol. 42, no. 1, pp. 15−28, 2023.
[14] X. Wang, S. Yang, J. Zhang, et al., “Transformer-based unsupervised contrastive learning for histopathological image classification,” Medical Image Analysis, vol. 81, p. 102559, 2022.
[15] Z. Li, Y. Jiang, M. Lu, et al., “Survival prediction via hierarchical multimodal co-attention transformer: A computational histology-radiology solution,” IEEE Trans. Medical Imaging, vol. 42, no. 9, pp. 2678−2689, 2023.
[16] A. Wang, H. Chen, Z. Lin, et al., “RepViT: Revisiting mobile CNN from ViT perspective,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, 2024, pp. 15909−15920.
[17] Y. Hu, Y. Cheng, A. Lu, et al., “LF-ViT: Reducing spatial redundancy in vision transformer for efficient image recognition,” in Proc. AAAI Conf. Artificial Intelligence, 2024, vol. 38, no. 3, pp. 2274−2284.
[18] B. Heo, S. Park, D. Han, et al., “Rotary position embedding for vision transformer,” in Proc. European Conf. Computer Vision, 2025, pp. 289−305.