Toward a framework for automatic model building

Statistical Modeling, Causal Inference, and Social Science 2013-03-15

Patrick Caldon writes:

I saw your recent blog post where you discussed, in passing, an iterative chain-of-models approach to AI.

I essentially built such a thing for my PhD thesis – not in a Bayesian context, but in a logic programming context – and proved it had a few properties and showed how you could use it to solve some toy problems. The important bit of my framework was that at various points you also go and get more data: in a statistical context this might be seen as building a little univariate model on a subset of the data, then iteratively extending it into a better model with more data and more independent variables – a generalized forward stepwise regression, if you like (see the sketch below). It wrapped a proper computational framework around E.M. Gold’s identification/learning in the limit, based on a logic my advisor (Eric Martin) had invented.
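Here is a minimal sketch of what that kind of loop might look like in a regression setting – an illustration of the general idea only, not Caldon’s actual framework. The data-acquisition hook get_batch is hypothetical, and BIC stands in for whatever model-comparison criterion one prefers:

```python
import numpy as np

def bic(X, y, cols):
    """BIC of a least-squares fit on the chosen columns (plus intercept)."""
    n = len(y)
    X1 = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = float(np.sum((y - X1 @ beta) ** 2))
    return n * np.log(rss / n) + (len(cols) + 1) * np.log(n)

def grow_model(get_batch, n_features, n_rounds=10):
    """Start with a small model on a subset of the data, then alternate:
    fetch more data, try the single best remaining predictor, and keep
    it only if BIC improves -- a generalized forward stepwise loop."""
    X, y = get_batch()                    # initial subset of the data
    cols = []                             # predictors currently in the model
    for _ in range(n_rounds):
        Xb, yb = get_batch()              # acquire more data before extending
        X, y = np.vstack([X, Xb]), np.concatenate([y, yb])
        candidates = [j for j in range(n_features) if j not in cols]
        if not candidates:
            break
        j_best = min(candidates, key=lambda j: bic(X, y, cols + [j]))
        if bic(X, y, cols + [j_best]) < bic(X, y, cols):
            cols.append(j_best)           # extend the model only if it helps
    return cols
```

Because BIC penalizes each added coefficient, the loop extends the model only when a new predictor pays for itself on the data gathered so far, rather than growing at every round the way raw in-sample fit would.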

What’s not written up in the thesis is a few months of failed struggle trying to shoehorn some simple statistical inference into this framework with decent computational properties! I had a good crack at it with a few different ideas and didn’t really get anywhere, and, worse, I couldn’t say much in the end about why it seemed to be hard. That said, I think it’s straightforward in Gold’s original framework to show something along the following lines: an integer-leaf-valued CART tree is identifiable in the limit iff such a tree describes the collection of data, and my framework should give a straightforward (if probably computationally terrible) way of actually implementing such a thing.
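To make the identification-in-the-limit claim concrete, here is a toy version for the simplest integer-leaf tree, a single split – the dovetailed enumeration and the stump representation are illustrative choices, not the construction from the thesis:

```python
from itertools import count, islice

def stumps():
    """Enumerate depth-1 integer-leaf trees (t, a, b), read as
    'a if x <= t else b', dovetailed so every stump appears at
    some finite position in the enumeration."""
    for n in count(0):
        for t in range(-n, n + 1):
            for a in range(-n, n + 1):
                for b in range(-n, n + 1):
                    yield (t, a, b)

def predict(h, x):
    t, a, b = h
    return a if x <= t else b

def learner(stream):
    """Gold-style identification by enumeration: after each example,
    guess the first enumerated stump consistent with all data so far.
    If the data really is generated by some stump (the 'iff' direction),
    the guesses eventually stop changing; if not, the search can run
    forever -- hence 'computationally terrible'."""
    seen = []
    for x, y in stream:
        seen.append((x, y))
        yield next(h for h in stumps()
                   if all(predict(h, xi) == yi for xi, yi in seen))

# Demo: data generated by the stump (3, 2, -1); the guesses converge to it.
target = (3, 2, -1)
stream = ((x, predict(target, x)) for x in count(-5))
for guess in islice(learner(stream), 12):
    print(guess)
```

The learner’s guess can change finitely many times, but once enough data pins down the split point and both leaf values, the guess stops changing – which is all that identification in the limit asks for.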

I’ve now moved on to different things (indeed, moved on from logic in academia to statistics in finance), but I thought you might find it interesting to see this problem analysed from a different perspective.