Sunday, August 18, 2013

Analogy between Software Development and Stochastic Gradient Descent

When I am developing software, I feel as if I am executing a stochastic gradient descent algorithm myself. You start with a large step size: you define a lot of important classes, and everything is very flexible at that point. Then, as the number of lines of code grows, your step size shrinks: you make more local changes than global ones (e.g., "let's change the signature of this function so that I can pass this variable..."). But just as in stochastic approximation, it is difficult and costly to get a global estimate of how good your current solution is, so you have to make decisions based on a local observation: the current feature request from your boss.

Sometimes I feel I am stuck in a local optimum and rewrite everything from scratch to find a better solution, but usually, when the new implementation is finally done, I realize it is not much better than the previous one. Similar things happen in stochastic gradient descent: I have rarely seen it reach a significantly better solution when run again, even though it is a local method. And you've spent twice the time re-executing it!

Also, the step-size schedule is very important: you need to decay it at the right rate. In SGD you tune the schedule on a sub-sample; in software engineering you build prototypes.
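The analogy can be sketched in a few lines. This is a minimal, illustrative SGD on a one-dimensional least-squares problem; the decay constants, data, and function names are my own assumptions, not anything from the post.

```python
import random

# Minimal SGD sketch: fit w so that y ~ w * x under squared loss.
# The step size decays as eta_t = eta0 / (1 + t / tau), mirroring the
# "large global steps first, small local tweaks later" analogy.
# All constants here are illustrative.

def sgd(data, eta0=0.5, tau=50.0, epochs=20, seed=0):
    rng = random.Random(seed)
    w, t = 0.0, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            eta = eta0 / (1.0 + t / tau)   # decaying step-size schedule
            grad = 2.0 * (w * x - y) * x   # gradient of (w*x - y)^2
            w -= eta * grad
            t += 1
    return w

# Synthetic data with true slope 3 plus small noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(200)]
data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
w_hat = sgd(data)
print(w_hat)  # close to the true slope 3
```

Decaying the step size too fast freezes the estimate far from the optimum; decaying it too slowly leaves it bouncing around forever, which is exactly the schedule-tuning problem the paragraph above describes.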

You should have inferred by this point that I am a crappy software engineer. Yes, I do suck.

Saturday, July 6, 2013

How to waste time

The reasons I wasted two sweet Saturday afternoon hours: 1) cmake 2.8.10 has a bug in which removing CMakeCache.txt does not completely remove cached values from the previous run. Attempting -DCMAKE_CXX_COMPILER=icpc even results in an infinite loop! (Seriously?) 2) The Intel compiler does not support the override keyword.

At this point, I really want to abandon cmake. I have cumulatively spent at least (literally!) a week figuring out why my cmake file does not work on each new system I want to deploy my stuff on. But what is the alternative? bjam? Seriously?

Thursday, June 20, 2013

Simple Model + Small Data is the only way to go?

A question that bothers me:

1. simple model + small data: OK, if you have a good understanding of the generative process.
2. complex model + small data: most likely fails due to overfitting, unless the noise is so small that the system is almost deterministic.
3. simple model + big data: computing the parameter estimate is challenging, although sometimes tractable with techniques like SGD; but frequentist hypothesis testing almost always rejects the model, because with that much data even a slight oversimplification becomes detectable.
4. complex model + big data: computing the parameter estimate is intractable.

Conclusion: a good statistician should only work with 1. simple model + small data!?
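Case 2 above is easy to demonstrate. The sketch below fits both a simple (degree-1) and a complex (degree-9) polynomial to ten noisy points drawn from a linear generative process, then compares held-out error; every constant is an illustrative assumption of mine, not something from the post.

```python
import numpy as np

# Complex model + small data: the degree-9 fit interpolates the ten
# training points (near-zero training error) but its held-out error
# explodes, i.e. it overfits. The degree-1 fit matches the truth.
rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = 1.5 * x + rng.normal(0.0, 0.3, n)  # linear truth + noise
    return x, y

x_train, y_train = make_data(10)   # small data
x_test, y_test = make_data(200)    # held-out sample

def errors(deg):
    coef = np.polyfit(x_train, y_train, deg)  # least-squares fit
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    return train_mse, test_mse

train1, test1 = errors(1)
train9, test9 = errors(9)
print(f"degree 1: train {train1:.3f}, test {test1:.3f}")
print(f"degree 9: train {train9:.3f}, test {test9:.3f}")
```

The training error of the complex model is necessarily lower (the models are nested), which is exactly why it is so tempting to use; the held-out error tells the real story.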

Thursday, June 13, 2013

The Signal and the Noise

At last, I have finished reading 'The Signal and the Noise: Why So Many Predictions Fail - but Some Don't'. It does a very good job of explaining, in plain English (that is, without statistical jargon), 1) why it is important for us to understand uncertainty - the central theme of statistics, 2) why statistical analysis is so challenging, and 3) how we can (sometimes) improve a model. The book shines most in its careful selection of problems: baseball, earthquakes, the stock market, chess, and terrorism are very good examples that show different aspects of the 'prediction problem'.

I would strongly recommend this book to anyone interested in understanding why people are making such a big fuss about Big Data/machine learning/statistics. As Larry Wasserman pointed out in his blog post, however, its treatment of frequentist statistics is very unfair... and I feel very uncomfortable every time a Bayesian claims that Bayesian statistics is a magic bullet for every problem frequentist statistics has. But this is a pop-science book, after all... probably that was a necessary sacrifice to deliver the idea to non-academics.

Friday, January 13, 2012

William M. Briggs - 'It is Time to Stop Teaching Frequentism to Non-statisticians'

Every time a Bayesian attacks the fallacies of frequentism, I agree with the points. But that does not necessarily imply that Bayesian methods are 'better' than frequentist ones and should replace them. Bayesians have their own problems, and those cannot be solved by blaming frequentists. I really like Bayesian ideas and methods... but I hate it when some Bayesians boast about their results too much - "frequentists are all wrong, while Bayesian methods are perfect" is clearly an overstatement.

Tuesday, July 12, 2011

Defining Google+ circles

I have not been able to start using G+ actively, since it took me quite a while to come up with a nice circle definition, one that is:
1) MECE (Mutually Exclusive, Collectively Exhaustive)
2) small enough to fit into memory - no more than 5 groups!
3) well-balanced in circle sizes

Inspired by sociologist Mark Granovetter's idea of the 'weak tie', I think it is now done quite elegantly:
1) Koreans - those who are not annoyed that I post in Korean
1-A) Korean Strong Tie - those I can discuss international matters with
1-B) Korean Weak Tie
2) International Friends - those who don't read Korean
# 2-A) International Strong Tie
# 2-B) International Weak Tie
3) Family
Currently 2) is rather small relative to 1), so it is not yet subdivided into 2-A) and 2-B).

But I still need yet another rule to decide on which occasions I should use Facebook/Twitter/G+. Man, using social networks is quite a burden!