A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 11 Issue 7
Jul. 2024

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
K. Jiang, W. Liu, Y. Wang, L. Dong, and C. Sun, “Discovering latent variables for the tasks with confounders in multi-agent reinforcement learning,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 7, pp. 1591–1604, Jul. 2024. doi: 10.1109/JAS.2024.124281
Discovering Latent Variables for the Tasks With Confounders in Multi-Agent Reinforcement Learning

Funds: This work was supported in part by the National Natural Science Foundation of China (62136008, 62236002, 61921004, 62173251, 62103104), the “Zhishan” Scholars Programs of Southeast University, and the Fundamental Research Funds for the Central Universities (2242023K30034).
Efficient exploration in complex coordination tasks is a challenging problem in multi-agent reinforcement learning (MARL). It is significantly more difficult in tasks with latent variables that agents cannot directly observe. However, most existing latent variable discovery methods lack both a clear representation of the latent variables and an effective way to evaluate their influence on each agent. In this paper, we propose a new MARL algorithm based on the soft actor-critic method for complex continuous control tasks with confounders. The algorithm, called multi-agent soft actor-critic with latent variable (MASAC-LV), uses variational inference to infer a compact latent variable representation space from a large amount of offline experience. In addition, we derive a counterfactual policy whose input contains no latent variables and quantify the difference between the actual policy and the counterfactual policy via a distance function. This quantified difference serves as an intrinsic motivation that gives additional rewards based on how much the latent variable affects each agent. The proposed algorithm is evaluated on two collaboration tasks with confounders, and the experimental results demonstrate the effectiveness of MASAC-LV compared with baseline algorithms.
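
For readers unfamiliar with variational inference, the generic evidence lower bound (ELBO) that such methods typically maximize is sketched below. This is the standard textbook form, not necessarily the exact objective used by MASAC-LV. The symbols are assumptions introduced here for illustration: \tau denotes a batch of offline experience, z the latent variable, q_\phi an approximate posterior (encoder), p_\theta a generative model (decoder), and p(z) a prior.

```latex
\log p_\theta(\tau) \;\ge\;
\mathbb{E}_{q_\phi(z \mid \tau)}\big[\log p_\theta(\tau \mid z)\big]
\;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid \tau)\,\|\,p(z)\big)
```

Maximizing the right-hand side trades reconstruction of the experience (first term) against keeping the inferred latent distribution close to the prior (second term), which is one common way to obtain a compact latent representation.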


    • Provides a clearer representation of the latent variable space in multi-agent reinforcement learning tasks with confounding factors
    • Based on variational inference theory, a latent variable discovery method is proposed to infer the distribution of latent variables from a large amount of offline experience, enhancing the agent's capability to explore complex tasks
    • Infers specific latent variables for each agent based on the environment at different times and states, utilizing them to expand the observation space of each agent
    • Derives counterfactual policies that exclude the latent variable space, and quantifies the distinct impacts of latent variables on each agent by measuring the differences between policies with and without latent variables
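
The last highlight, measuring the difference between policies with and without latent variables and using it as a per-agent bonus, can be illustrated with a minimal sketch. The choices here are assumptions, not details taken from the paper: both policies are modeled as diagonal Gaussians, the distance function is taken to be the KL divergence, and the names `gaussian_kl`, `intrinsic_rewards`, and the scale `beta` are hypothetical.

```python
import numpy as np


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions.

    Assumption: the KL divergence stands in for the unspecified distance function.
    """
    var_p, var_q = std_p ** 2, std_q ** 2
    per_dim = np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return per_dim.sum(axis=-1)


def intrinsic_rewards(actual, counterfactual, beta=0.1):
    """Per-agent bonus proportional to how far the actual policy (conditioned on
    the latent variable) is from the counterfactual policy (no latent variable).

    actual, counterfactual: (mean, std) tuples of arrays shaped (n_agents, action_dim).
    beta: hypothetical scaling coefficient for the bonus.
    """
    return beta * gaussian_kl(actual[0], actual[1], counterfactual[0], counterfactual[1])


# Two agents with 2-D actions. Agent 0's policy ignores the latent variable,
# so its actual and counterfactual policies coincide; agent 1's mean shifts.
actual = (np.array([[0.0, 0.0], [1.0, -1.0]]), np.ones((2, 2)))
counterfactual = (np.zeros((2, 2)), np.ones((2, 2)))
bonus = intrinsic_rewards(actual, counterfactual)
print(bonus)  # agent 0 receives no bonus, agent 1 a positive one
```

In this toy setup the agent whose behavior does not depend on the latent variable gets zero extra reward, while the agent whose policy changes receives a bonus that grows with the size of the change.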


