Sunday, August 18, 2013

Analogy between Software Development and Stochastic Gradient Descent


When I am developing a software, I feel like I am executing a stochastic gradient descent algorithm myself. You start with a large step size: you define a lot of important classes and everything is very flexible at that time. Then, as the # of lines of your code gets larger and larger, your step size gets smaller: you make more of local changes than global changes (ex: let's change the signature of this function so that I can pass this variable...). But just like problems of stochastic approximation, it is difficult and costly to get global estimate of how good your current solution is, so you have to make decision based on a local observation: current feature request by your boss.

Sometimes I feel like I am stuck in local optimum and write everything from scratch to find better solution, but usually when the new implementation is finally done I realize that it is not much better than the previous one. Similar things happen in stochastic gradient descent as well: I have rarely seen you reach a significantly better solution by running it again, although it is a local method. But you've spent 2x more time by re-executing it!

Also, the step-size schedule is very very important. You need to decay it in the right rate. So in SGD you test it on sub-sample. In software engineering you develop prototypes.

You should've inferred at this point that I am a crappy software engineer. Yes I do suck.

Saturday, July 6, 2013

How to waste time

The reason I wasted 2 sweet afternoon hours of Saturday: 1) cmake 2.8.10 has a bug which removing CMakeCache.txt does not completely remove cached values of previous execution. Attempting -DCMAKE_CXX_COMPILER=icpc even results in an infinte loop! (seriously?)  2) intel compiler does not support override keyword.

At this point, I really want to abandon cmake. I have cumulatively spent at least (literally!) a week on figuring out why my cmake file is not working on a new system I want to deploy my stuff on. But what's an alternative?  bjam? Seriously?

Thursday, July 4, 2013

Thursday, June 20, 2013

Simple Model + Small Data is the only way to go?

A question that bothers me:

1. simple model + small data: OK, if you have good understanding on the generative process.
2. complex model + small data: most likely fail due to overfitting, unless the noise is very very small so that the system is almost deterministic.
3. simple model + big data: calculation of parameter estimate is challenging although sometimes tractable using techniques like SGD, but frequentist hypothesis testing almost always fails because you might be oversimplifying the problem.
4. complex model + big data: calculation of parameter estimate is impossible.

Conclusion: a good statistician should only work with 1. simple model + small data!?

Thursday, June 13, 2013

The Signal and the Noise

At last, I have finished reading the book 'the signal and the noise: why so many predictions fail- but some don't'. It does a very good job in explaining 1) why it is important for us to understand uncertainty - the central theme of statistics -, 2) why statistical analysis is so challenging, and 3) how we can (sometimes) improve the model, in plain English (that is, without statistical jargons). This book shines the most in its careful selection of problems it discusses; baseball, earthquake, stock market, chess, and terrorism are very good examples which shows different aspects of the 'prediction problem'.

I would strongly recommend this book to those who are interested in understanding why people are making such a big fuss about Big Data/machine learning/statistics. As Larry Wasserman pointed out in his blog post, however, its treatment of frequentist statistics is very unfair... and I feel very uncomfortable every time a Bayesian claims that Bayesian statistics is a magic bullet to every problem frequentist statistics has. But this is a pop science book after all... probably it was a necessary sacrifice to deliver the idea to non-academics.