Causal Inference by Compression

Abstract. Causal inference is one of the fundamental problems in science. In recent years, several methods have been proposed for discovering causal structure from observational data. These methods, however, focus specifically on numeric data, and are not applicable on discrete nominal or binary data.

In this work, we focus on causal inference for binary data. Simply put, we propose causal inference by compression. To this end we propose an inference framework based on solid information theoretic foundations, i.e. Kolmogorov complexity. However, Kolmogorov complexity is not computable, and hence we propose a practical and computable instantiation based on the Minimum Description Length (MDL) principle.

To apply the framework in practice, we propose Origo, an efficient method for inferring the causal direction from binary data. Origo employs the lossless Pack compressor, works directly on the data and does not require assumptions about neither distributions nor the type of causal relations. Extensive evaluation on synthetic, benchmark, and real-world data shows that ORIGO discovers meaningful causal relations, and outperforms state-of-the-art methods by a wide margin.

Implementation

the Python source code (March 2017) by Kailash Budhathoki and Jilles Vreeken.

Related Publications

Budhathoki, K & Vreeken, J Causal Inference by Compression. In: Proceedings of the IEEE International Conference on Data Mining (ICDM'16), IEEE, 2016. (regular paper, 8.5% acceptance rate; overall 19.6%)