Abstract. The maximum entropy principle uniquely identifies the distribution that models our knowledge about the data but is otherwise maximally unbiased. As soon as we include non-trivial observations in our model, however, exact inference quickly becomes intractable. We propose a relaxation that permits efficient inference by dynamically factorizing the joint distribution. In particular, we show that the resulting factors are learnable from data and that the relaxation is consistent with the standard maximum entropy distribution. Through an extensive set of experiments, we show that the relaxation is scalable, closely approximates the vanilla distribution, allows for classification that is as good, and results in a concise set of patterns.