Abstract
Time-series analysis is an important domain of machine learning, and a plethora of methods have been developed for the task. This paper proposes a new representation of time series which, in contrast to existing approaches, decomposes a time-series dataset into latent patterns and the membership weights of local segments to those patterns. The decomposition is formalized as a constrained objective function, which is optimized with a tailored stochastic coordinate descent algorithm. The time series are then projected to a new feature representation consisting of the sums of the membership weights, which capture the frequencies of local patterns. Features from various sliding-window sizes are concatenated in order to encapsulate the interaction of patterns of different sizes. The derived representation offers a set of features that boosts classification accuracy. Finally, a large-scale experimental comparison against 11 baselines over 43 real-life datasets indicates that the proposed method achieves state-of-the-art prediction accuracy.
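The pipeline summarized in the abstract (sliding-window segments, a factorization into latent patterns and membership weights, per-pattern sums of the weights, and concatenation across window sizes) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the objective (squared reconstruction error with L2 penalties and non-negative membership weights), the randomized coordinate-descent schedule, the stride-1 windows, and the function names `sliding_segments`, `factorize` and `pattern_frequency_features` are all hypothetical choices made for this example.

```python
import numpy as np


def sliding_segments(series, window):
    """All contiguous segments of length `window` (stride 1) from a 1-D series."""
    n = len(series) - window + 1
    return np.stack([series[i:i + window] for i in range(n)])


def factorize(S, k, epochs=30, reg=1e-3, seed=0):
    """Approximate the segment matrix S (n x w) as W @ P with W >= 0.

    Assumed objective (illustrative only, not the paper's exact formulation):
        ||S - W P||_F^2 + reg * (||W||_F^2 + ||P||_F^2),  subject to W >= 0,
    minimized by randomized coordinate descent with exact per-coordinate updates.
    """
    rng = np.random.default_rng(seed)
    n, w = S.shape
    W = rng.random((n, k))             # membership weights of segments to patterns
    P = rng.standard_normal((k, w))    # latent patterns
    R = S - W @ P                      # residual, kept in sync with every update
    for _ in range(epochs):
        # Membership weights: one scalar coordinate at a time, in random order.
        for i in rng.permutation(n):
            for j in rng.permutation(k):
                e = R[i] + W[i, j] * P[j]  # residual with coordinate (i, j) removed
                new = max(0.0, P[j] @ e / (P[j] @ P[j] + reg))  # projected exact minimizer
                R[i] = e - new * P[j]
                W[i, j] = new
        # Patterns: each row P[j] has a closed-form update given W (block coordinate step).
        for j in rng.permutation(k):
            E = R + np.outer(W[:, j], P[j])
            P[j] = W[:, j] @ E / (W[:, j] @ W[:, j] + reg)
            R = E - np.outer(W[:, j], P[j])
    return W, P


def pattern_frequency_features(dataset, windows, k=4, **kw):
    """Per-series features: summed membership weights per pattern, concatenated over windows."""
    blocks = []
    for window in windows:
        segs = [sliding_segments(x, window) for x in dataset]
        # Remember which series each pooled segment came from.
        owner = np.concatenate([np.full(len(s), idx) for idx, s in enumerate(segs)])
        W, _ = factorize(np.vstack(segs), k, **kw)
        # Sum each series' membership weights per pattern -> (num_series x k).
        blocks.append(np.stack([W[owner == idx].sum(axis=0) for idx in range(len(dataset))]))
    return np.hstack(blocks)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0, 4 * np.pi, 60)
    dataset = [np.sin(t) + 0.1 * rng.standard_normal(60) for _ in range(3)]
    features = pattern_frequency_features(dataset, windows=[8, 16], k=3, epochs=10)
    print(features.shape)  # (3, 6): 3 series, 2 window sizes x 3 patterns
```

Because every coordinate update is the exact minimizer of the assumed convex-in-that-coordinate objective, the objective never increases from one update to the next; the resulting feature matrix could then be passed to any off-the-shelf classifier.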

Acknowledgments
This work was partially co-funded by the Seventh Framework Programme of the European Commission through the project REDUCTION (no. 288254), www.reduction-project.eu.
Additional information
Responsible editors: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.
About this article
Cite this article
Grabocka, J., Schmidt-Thieme, L. Invariant time-series factorization. Data Min Knowl Disc 28, 1455–1479 (2014). https://doi.org/10.1007/s10618-014-0364-z