Identifiability of Cause and Effect using Regularized Regression

Abstract. We consider the problem of telling apart cause from effect between two univariate continuous-valued random variables \(X\) and \(Y\). In general, it is impossible to make definite statements about causality without making assumptions on the underlying model; one of the most important aspects of causal inference is hence to determine under which assumptions we are able to do so.

In this paper, we show under which general conditions we can identify cause from effect by simply choosing the direction with the best regression score. We define a general framework of identifiable regression-based scoring functions, and show how to instantiate it in practice using regression splines. Existing methods either give strong guarantees but are hardly applicable in practice, or provide no guarantees but do work well in practice. Our instantiation combines the best of both worlds: it gives guarantees, while empirical evaluation on synthetic and real-world data shows that it performs at least as well as the state of the art.
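The idea of "choosing the direction with the best regression score" can be illustrated with a minimal sketch. This is not the scoring function or the regression-spline instantiation from the paper: it assumes a plain polynomial fit in place of splines and a BIC-style penalty as a stand-in for the regularization term, and all function names (`fit_poly`, `regression_score`, `infer_direction`) are hypothetical. Here, a lower score counts as better.

```python
import math
import random


def normalize(values):
    """Rescale values to [0, 1] so scores in both directions are comparable."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_poly(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] of c0 + c1*x + c2*x^2 + ...
    """
    n = degree + 1
    ata = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    aty = [sum((xi ** i) * yi for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = aty[i] - sum(ata[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = s / ata[i][i]
    return coef


def predict(coef, xi):
    """Evaluate the fitted polynomial at a single point."""
    return sum(c * xi ** i for i, c in enumerate(coef))


def regression_score(x, y, max_degree=3):
    """Assumed score: residual term plus a BIC-style complexity penalty.

    The minimum over polynomial degrees 1..max_degree is returned.
    """
    n = len(x)
    best = math.inf
    for degree in range(1, max_degree + 1):
        coef = fit_poly(x, y, degree)
        rss = sum((yi - predict(coef, xi)) ** 2 for xi, yi in zip(x, y))
        mse = max(rss / n, 1e-12)  # guard against log(0) on exact fits
        score = 0.5 * n * math.log(mse) + 0.5 * (degree + 1) * math.log(n)
        best = min(best, score)
    return best


def infer_direction(x, y):
    """Pick the direction whose regression achieves the lower score."""
    x, y = normalize(x), normalize(y)
    forward = regression_score(x, y)   # regress Y on X
    backward = regression_score(y, x)  # regress X on Y
    if forward < backward:
        return "X->Y"
    if backward < forward:
        return "Y->X"
    return "undecided"


# Toy data (an assumption for illustration): Y is a cubic function of X plus noise.
random.seed(0)
xs = [random.random() for _ in range(500)]
ys = [x ** 3 + random.gauss(0.0, 0.02) for x in xs]
direction = infer_direction(xs, ys)
print(direction)
```

Both variables are rescaled to the same range before scoring so that the residuals of the two regressions are comparable. The sketch only shows the decision rule; it carries none of the identifiability guarantees discussed in the paper.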

Implementation

The R source code (May 2019) by Alexander Marx.

Related Publications

Marx, A. & Vreeken, J. Identifiability of Cause and Effect using Regularized Regression. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'19), ACM, 2019. (oral presentation, 9.2% acceptance rate; overall 14.2%)