Abstract. Suppose we are given an event sequence \(X\) of observed events and an equally long binary sequence \(Y\) that indicates whether something of interest happened at that particular point in time. We consider the problem of mining sequential patterns from \(X\) that reliably predict those interesting events. With reliable we mean those patterns that not only predict that an interesting event is likely to follow, but especially those patterns for which we can with high precision tell how long until that event will happen. That is, we are after patterns that have highly skewed distributions of delays between pattern occurrences and predicted events. In particular, we are after the smallest, least redundant set of such patterns that together explain the interesting events well.
We formally define this problem in terms of the Minimum Description Length principle, by which we identify the best patterns as those that describe the data most succinctly. As discovering the optimal explanation of \(Y\) given a set of patterns, as well as the discovery of optimal pattern set are both hard problems that do not allow for straightforward optimization, we propose the heuristic Omen algorithm. Through extensive empirical evaluation we show that Omen works well in practice and beats the state of the art both quantitatively and qualitatitively.
Just Wait For It... Mining Sequential Patterns with Reliable Prediction Delays. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'20), IEEE, 2020. (full paper, 9.8% acceptance rate; overall 19.7%)