\sum_{i,j,k} d_{ijk}_stra: Analogy between Software Development and Stochastic Gradient Descent

When I am developing a software, I feel like I am executing a stochastic gradient descent algorithm myself. You start with a large step size: you define a lot of important classes and everything is very flexible at that time. Then, as the # of lines of your code gets larger and larger, your step size gets smaller: you make more of local changes than global changes (ex: let's change the signature of this function so that I can pass this variable...). But just like problems of stochastic approximation, it is difficult and costly to get global estimate of how good your current solution is, so you have to make decision based on a local observation: current feature request by your boss.

Sometimes I feel like I am stuck in local optimum and write everything from scratch to find better solution, but usually when the new implementation is finally done I realize that it is not much better than the previous one. Similar things happen in stochastic gradient descent as well: I have rarely seen you reach a significantly better solution by running it again, although it is a local method. But you've spent 2x more time by re-executing it!

Also, the step-size schedule is very very important. You need to decay it in the right rate. So in SGD you test it on sub-sample. In software engineering you develop prototypes.

You should've inferred at this point that I am a crappy software engineer. Yes I do suck.

\sum_{i,j,k} d_{ijk}_stra

Sunday, August 18, 2013

Analogy between Software Development and Stochastic Gradient Descent

No comments:

Post a Comment