Hoang Anh Dau, Anthony Bagnall, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi, Chotirat Ann Ratanamahatana and Eamonn Keogh, "The UCR Time Series Archive," IEEE/CAA J. Autom. Sinica, vol. 6, no. 6, pp. 1293-1305, Nov. 2019. doi: 10.1109/JAS.2019.1911747
The UCR Time Series Archive

  • The UCR Time Series Archive, introduced in 2002, has become an important resource in the time series data mining community, with at least one thousand published papers making use of at least one data set from the archive. The original incarnation of the archive had sixteen data sets, but since then it has gone through periodic expansions. The last expansion took place in the summer of 2015, when the archive grew from 45 to 85 data sets. This paper introduces and focuses on the new expansion from 85 to 128 data sets. Beyond expanding this valuable resource, this paper offers pragmatic advice to anyone who may wish to evaluate a new algorithm on the archive. Finally, this paper makes a novel and yet actionable claim: of the hundreds of papers that show an improvement over the standard baseline (1-nearest neighbor classification), a fraction might be misattributing the reasons for their improvement. Moreover, the improvements claimed by these papers might have been achievable with a much simpler modification, requiring just a few lines of code.
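
For readers unfamiliar with the baseline named above, the following is a minimal sketch of 1-nearest neighbor classification on equal-length series. It assumes squared Euclidean distance as the similarity measure, which is a choice made here for illustration and is not specified in the abstract. The function names and the toy data are hypothetical and are not drawn from the archive.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_nn_predict(train_X, train_y, test_X):
    """Give each test series the label of its nearest training series."""
    predictions = []
    for query in test_X:
        # Index of the training series closest to the query.
        nearest = min(
            range(len(train_X)),
            key=lambda i: squared_euclidean(query, train_X[i]),
        )
        predictions.append(train_y[nearest])
    return predictions


def error_rate(y_true, y_pred):
    """Fraction of test series that were labeled incorrectly."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hand-made toy series for illustration only (assumption: not archive data).
train_X = [[0, 0, 1, 0, 0], [0, 1, 2, 1, 0], [2, 2, 2, 2, 2], [3, 2, 3, 2, 3]]
train_y = [0, 0, 1, 1]
test_X = [[0, 0, 1, 1, 0], [2, 3, 2, 2, 2]]
test_y = [0, 1]

pred = one_nn_predict(train_X, train_y, test_X)
print(pred, error_rate(test_y, pred))
```

Any proposed method would be compared against the error rate of such a baseline on the same predefined train/test split; other distance measures can be swapped in by replacing `squared_euclidean`.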


  • 1 Why would someone use the archive and not acknowledge it? Carelessness probably explains the majority of such omissions. In addition, for several years (approximately 2006 to 2011), access to the archive was conditional on informally pledging to test on all data sets to avoid cherry picking (see Section IV). Some authors who made this pledge then went on to test on only a limited subset, and may have chosen not to cite the archive to avoid drawing attention to their failure to live up to their implied pledge.
    2 These works should not be confused with papers that suggest using a wavelet representation to perform dimensionality reduction to allow more efficient indexing of time series.

    • Introduces a significant expansion of the UCR Time Series Archive, the standard benchmark for time series classification for the last two decades.
    • Offers advice and “pitfalls-to-avoid” for researchers working in time series classification.
    • Offers some concrete demonstrations of the dangers of “cherry picking”, a common problem in the literature that makes comparisons between rival methods difficult and can give the illusion of progress.


