Tuesday, July 13, 2010

Statistical Research and Ethics

In my opinion, Statistics is an attempt to draw a line between "speakable"s and "unspeakable"s. "Is there really a relationship between X and Y?" Usually you cannot say "yes" or "no" for sure, but there is always something left that you can "speak". Statistical Theory is about what is left.

In this respect, statistical analysis is an ethical deed, because speaking of the unspeakable is unethical. This is why I'm infuriated when people blur this "line" between the two instead of making it clearer. I'm sick of people exaggerating what the data they have in hand can show (especially with those "social network" things), and that is why I became a Statistician.

Along the same lines, speaking of "unspeakable"s by merely introducing a prior distribution whose validity you're not sure of is an unethical deed. One should be very careful when interpreting the outcome of a Bayesian analysis. When we have no idea what the prior distribution should be, we also have no idea how the posterior distribution can be interpreted. But it's really tempting to "intentionally" ignore the fact that the choice of prior was ad hoc.
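To make the point about priors concrete, here is a minimal sketch using the conjugate Beta-Binomial model. The data (7 successes in 10 trials) and the three priors are hypothetical, chosen only for illustration; the function name `posterior_mean` is my own.

```python
def posterior_mean(prior_a, prior_b, successes, trials):
    """Posterior mean of a success probability under a Beta(prior_a, prior_b) prior.

    With a binomial likelihood, the posterior is
    Beta(prior_a + successes, prior_b + failures).
    """
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a / (post_a + post_b)


# Hypothetical data: 7 successes in 10 trials.
successes, trials = 7, 10

# Three hypothetical priors that an analyst might pick ad hoc.
priors = {
    "flat Beta(1, 1)": (1.0, 1.0),
    "skeptical Beta(20, 20)": (20.0, 20.0),
    "optimistic Beta(8, 2)": (8.0, 2.0),
}

results = {
    name: posterior_mean(a, b, successes, trials)
    for name, (a, b) in priors.items()
}

for name, mean in results.items():
    print(f"{name}: posterior mean = {mean:.3f}")
```

The same ten observations give posterior means of roughly 0.67, 0.54, and 0.75. With this little data, the answer is driven largely by the prior, so a conclusion that does not report how the prior was chosen says more than the data alone can support.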

Hence it is natural to favor a framework of statistical inference in which this distinction between "speakable" and "unspeakable" is clear. I believe this is why many more people use frequentist methods than Bayesian methods. However, the Neyman-Pearson framework, the dominant statistical procedure used in hypothesis testing, puts so much emphasis on the null hypothesis that in many cases there is little left to say about the alternative hypotheses. For example, a low p-value indicates that the null hypothesis is not a good model for your data, but it does not necessarily mean that there is a good model among those of the alternative hypothesis. This limitation sometimes leads people to make statements about the "unspeakable"s of the alternative hypothesis, which is also unethical.
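A small numerical sketch of the p-value point, under assumptions of my own: a hypothetical coin-flip data set (70 heads in 100 flips), a null hypothesis of p = 0.5, and one specific alternative of p = 0.9 that someone might be tempted to endorse after rejecting the null. Only the Python standard library is used.

```python
from math import comb, log


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))


def log_likelihood(k, n, p):
    """Log-likelihood of p, dropping the binomial coefficient (constant in p)."""
    return k * log(p) + (n - k) * log(1 - p)


# Hypothetical data and hypotheses (assumptions for illustration only).
n, k = 100, 70
p_null, p_alt = 0.5, 0.9

# One-sided p-value under the null: how surprising is seeing 70 or more heads?
p_value_null = upper_tail(k, n, p_null)

# The same question asked of the alternative: how surprising is 70 or fewer?
tail_alt = lower_tail(k, n, p_alt)

print(f"P(X >= {k} | p = {p_null}) = {p_value_null:.2e}")
print(f"P(X <= {k} | p = {p_alt}) = {tail_alt:.2e}")
print(f"log-likelihood at null: {log_likelihood(k, n, p_null):.2f}")
print(f"log-likelihood at alternative: {log_likelihood(k, n, p_alt):.2f}")
```

Both tail probabilities come out very small, and the log-likelihood at p = 0.9 is actually lower than at p = 0.5. Rejecting the null here tells us the fair-coin model fits badly; it does not tell us that this particular alternative fits well.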

Maybe no statistical procedure can be perfect. We conduct statistical analysis because there are uncertainties. If we can speak with 100% certainty, then it is not about statistics. Maybe you're talking about Mathematics (although, as you might know, you cannot always speak with 100% certainty even in Mathematics). But you should also agree that there are "speakable"s in data, because those accumulated "speakable"s have built the 'Science' which aids us every day. So we should investigate how to develop a procedure with which we can easily avoid speaking the "unspeakable" while still leaving much to "speak".

In this respect, I like Liu's theory (and framework) of statistical inference, since it makes statistical statements clear about what is speakable and what is not. So it guards against committing an unethical deed better than other frameworks do.

Shamefully, I'm a novice at Statistics, so I'm not yet able to judge the usefulness or impact of what his group has done. However, I like that they are pursuing research in this direction, since this is what we 'should' do: help people avoid committing "unethical" deeds. I'm not saying we have a solution now. I'm saying we have something to do, and we're doing it.
