An Interpretable Temporal Convolutional Framework for Granger Causality Analysis

Aoxiang Dong; Andrew Starr; Yifan Zhao

doi:10.1109/JAS.2025.125396

Volume 13 Issue 3

Mar. 2026

IEEE/CAA Journal of Automatica Sinica

A. Dong, A. Starr, and Y. Zhao, “An interpretable temporal convolutional framework for Granger causality analysis,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 3, pp. 665–679, Mar. 2026. doi: 10.1109/JAS.2025.125396

Abstract

Most existing parametric approaches for detecting linear or nonlinear Granger causality (GC) struggle to estimate appropriate time delays, a critical factor for accurate GC detection. This issue becomes particularly pronounced in nonlinear complex systems, which are often opaque and consist of numerous components or variables. In this paper, we propose a novel temporal convolutional network (TCN)-based end-to-end GC detection approach called the interpretable temporal convolutional framework (ITCF). Unlike conventional deep learning models, which act as a “black box” and make it difficult to analyse the interactions between variables, the proposed ITCF can detect both linear and nonlinear GC and automatically estimate time delays during multivariate time series prediction. Specifically, GC is obtained by employing least absolute shrinkage and selection operator (Lasso) regression during the prediction of multivariate time series using the TCN. Time delays can then be estimated by interpreting the TCN kernels. We propose a convolutional hierarchical group Lasso (cHGL), a hierarchical regularisation approach that effectively utilises the temporal information within each TCN channel for enhanced GC detection. Additionally, to the best of our knowledge, this paper is the first to integrate the Iterative Soft-Thresholding Algorithm into the backpropagation of a TCN to optimise the proposed cHGL. This enables causal channel selection and induces sparsity within each TCN channel to remove redundant temporal information, ultimately creating an end-to-end GC detection framework. The results of four experiments, involving two simulations and two real-world datasets, demonstrate that the proposed ITCF, in comparison with state-of-the-art methods, offers a more reliable estimation of GC relationships in complex systems featuring intricate dynamics, limited data lengths, or numerous variables.
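
The abstract does not give the cHGL formulation, so the following is only a minimal sketch of the general idea it names: a soft-thresholding (proximal) step applied to nested groups of lags in a single channel's kernel. It assumes a generic hierarchical group Lasso in which the groups are the nested tails of the kernel, so that longer lags are pruned first and a channel with no predictive value is zeroed entirely. The function names, the kernel layout (index 0 as the shortest lag), and the parameters `lr` and `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def group_soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_2: shrink a whole group toward zero."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        # The entire group is pruned when its norm is below the threshold.
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w


def hierarchical_prox(kernel, threshold):
    """Nested group shrinkage for one input channel's kernel.

    Assumed layout: kernel[k] is the weight at lag index k, with k = 0 the
    shortest lag. The groups are the nested tails kernel[k:], processed from
    the smallest group (longest lag only) up to the whole kernel.
    """
    out = np.asarray(kernel, dtype=float).copy()
    for k in range(len(out) - 1, -1, -1):
        out[k:] = group_soft_threshold(out[k:], threshold)
    return out


def ista_step(kernel, grad, lr, lam):
    """One proximal-gradient (ISTA) update: gradient step, then shrinkage."""
    return hierarchical_prox(kernel - lr * grad, lr * lam)


if __name__ == "__main__":
    strong = np.array([2.0, 0.1, 0.05])   # one dominant short lag
    weak = np.array([0.05, 0.02, 0.01])   # channel with little influence
    no_grad = np.zeros(3)
    print(ista_step(strong, no_grad, lr=1.0, lam=0.2))
    print(ista_step(weak, no_grad, lr=1.0, lam=0.2))
```

Under these assumptions, a channel whose kernel is shrunk to all zeros would be read as having no Granger-causal influence on the target, and the last non-zero lag index in a surviving kernel would serve as a rough estimate of the time delay.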
