Tuesday, July 12, 2011

Defining Google+ circles

I have not been able to start using G+ actively until now, since it took me quite a while to come up with a nice definition of circles that satisfies the following:
1) MECE (Mutually Exclusive, Collectively Exhaustive)
2) Fits into memory - no more than 5 groups!
3) Sizes of circles are well-balanced

Inspired by sociologist Mark Granovetter's idea of 'weak ties', I think I have now done it quite elegantly:
1) Koreans - those who are not annoyed when I post in Korean
1-A) Korean Strong Tie - those I can discuss international matters with
1-B) Korean Weak Tie
2) International Friends - those who don't read Korean
# 2-A) International Strong Tie
# 2-B) International Weak Tie
Currently 2) is rather small relative to 1), so it is not yet subdivided into 2-A) and 2-B).
3) Family

But still, I need yet another rule to decide on which occasions I should use Facebook, Twitter, or G+. Man, using SNS is quite a burden!

How to send search requests to Journal Websites

When I search for papers, I simply use Google Search. Sometimes I use Google Scholar, but oddly enough, the former usually gives me better results. In some areas of study, however, people use journal websites directly, maybe because the terminology they use is quite general and they want to exclude non-academic websites. (Maybe they don't like Google? Actually I don't really know why :D)

However, you do not want to visit every journal website for every single query. If there are ten journals in your area of study, visiting all ten websites would consume a lot of your precious time. So a friend of mine wants to make a web page that can send a search request to any journal website she wants to use.

If the request is made with GET, it is easy to figure out which parameters you need. For example, if you search for "graphs" on Google, the address bar of your browser shows the following URL:

http://www.google.de/search?sourceid=chrome&ie=UTF-8&q=graphs

So you can simply replace the "graphs" part with the word of your choice to send a search request to Google. But when you search for 'graphs' on the APS (American Physical Society) website, the address bar simply shows

http://publish.aps.org/search

because the request is made with POST. There are two ways to find out the parameters used in a POST request, I think. The first is to look at the HTML source of the web page that sends the search request. But HTML source is usually very unreadable, so reading it is not much fun. The other way is usually more efficient: look at the request that was actually sent. You may use fancy network monitoring tools, but Google Chrome actually suffices.

To do this, click on the wrench icon next to the address bar, go to the 'Tools' menu, and turn on the developer tools. A fancy panel then appears at the bottom of the browser. Choose the 'Network' tab, run the search, and select the request to see its details.

From 'Form Data', you can see which parameters were sent in the POST message. In this case, it seems that 'q%5Bclauses%5D%5B%5D%5Bfield%5D' denotes the type of the field, and 'q%5Bclauses%5D%5B%5D%5Bvalue%5D' is the value of the field (here %5B and %5D are the percent-encoded forms of '[' and ']'). Thus, by using the following URL:

http://publish.aps.org/search/query?q%5Bclauses%5D%5B%5D%5Bfield%5D=abstitle&q%5Bclauses%5D%5B%5D%5Bvalue%5D=graphs

you can search for 'graphs' on the APS website. Similarly,

http://jcp.aip.org/search?key=JCPSA6&societykey=AIP&coden=JCPSA6&q=graphs&displayid=AIP&sortby=newestdate&faceted=faceted&sortby=newestdate&CP_Style=false&alias=&searchzone=2

Oh, in this case the search was done with GET...!?

http://pubs.acs.org/action/doSearch?action=search&searchText=graphs&qsSearchArea=searchText&type=within&publication=40001010

Oh... it was also done with GET... So there was only one case that used POST... Why did I start writing this article in the first place... OTL Anyway, this is how to do it.
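
The parameters found above can be turned into a search URL programmatically. The following is only a rough C++ sketch: percent_encode, build_search_url and aps_search_url are names made up for this note, and the APS address and parameter names are simply the ones observed above, which the site may change at any time.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Percent-encode every byte except the RFC 3986 "unreserved" characters
// (letters, digits, '-', '_', '.', '~').
std::string percent_encode(const std::string& text) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// An ordered list of (name, value) pairs for the query string.
using Params = std::vector<std::pair<std::string, std::string>>;

// Append the encoded parameters to the base URL as "?a=b&c=d".
std::string build_search_url(const std::string& base, const Params& params) {
    std::string url = base;
    char separator = '?';
    for (const auto& name_value : params) {
        url += separator;
        url += percent_encode(name_value.first);
        url += '=';
        url += percent_encode(name_value.second);
        separator = '&';
    }
    return url;
}

// Hypothetical helper for the APS case, using the two parameters seen in 'Form Data'.
std::string aps_search_url(const std::string& query) {
    return build_search_url("http://publish.aps.org/search/query",
                            {{"q[clauses][][field]", "abstitle"},
                             {"q[clauses][][value]", query}});
}
```

Calling aps_search_url("graphs") reproduces the APS URL shown above. A space in the query comes out as %20.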

Tuesday, July 5, 2011

Using tuple as a key of unordered_map in boost

Yes, this is a very trivial thing, but (surprisingly) none of the sources on the web gave me a direct solution. Although the solution turns out to be straightforward, it took me a tremendous amount of time to figure out, so I would like to leave a memo for future reference.

  1. When you define hash_value(), it is very important that the function is in the same namespace as the key class, so that it can be found by argument-dependent lookup.
  2. boost::tuple is in namespace boost::tuples. (Why!?!?)
So you have to include the following code:

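#include <cstddef>
#include <boost/functional/hash.hpp>
#include <boost/tuple/tuple.hpp>

// The element types were lost from the original listing; int is only a placeholder.
typedef boost::tuple<int, int, int> param_tuple;

namespace boost {
  namespace tuples {
    // Must live in boost::tuples so that boost::hash finds it for boost::tuple keys.
    inline std::size_t hash_value(param_tuple const& e) {
      std::size_t seed = 0;
      boost::hash_combine( seed, e.get<0>() );
      boost::hash_combine( seed, e.get<1>() );
      boost::hash_combine( seed, e.get<2>() );
      return seed;
    }
  }
}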
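
For comparison, here is a minimal sketch of the same idea using only the standard library (C++11 and later), where the hash is passed to the map as a functor instead of being found through the key's namespace. This is not the Boost approach described above: ParamTuple, ParamTupleHash, make_example_map and the int element types are assumptions made for illustration, and hash_combine here is a hand-written helper that imitates the classic boost::hash_combine mixing formula.

```cpp
#include <cstddef>
#include <functional>
#include <tuple>
#include <unordered_map>

// Placeholder element types, chosen only for illustration.
using ParamTuple = std::tuple<int, int, int>;

// Mix the hash of one value into the running seed.
template <typename T>
void hash_combine(std::size_t& seed, const T& value) {
    seed ^= std::hash<T>()(value) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

// Hash functor handed to std::unordered_map as its third template argument.
struct ParamTupleHash {
    std::size_t operator()(const ParamTuple& t) const {
        std::size_t seed = 0;
        hash_combine(seed, std::get<0>(t));
        hash_combine(seed, std::get<1>(t));
        hash_combine(seed, std::get<2>(t));
        return seed;
    }
};

using ParamMap = std::unordered_map<ParamTuple, double, ParamTupleHash>;

// Build a tiny map keyed by tuples, just to show insertion.
ParamMap make_example_map() {
    ParamMap m;
    m[std::make_tuple(1, 2, 3)] = 0.5;
    m[std::make_tuple(1, 2, 4)] = 1.5;
    return m;
}
```

Lookups then work as with any other key type, for example make_example_map().at(std::make_tuple(1, 2, 3)).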

Thursday, February 10, 2011

Using ROC Curves

As homework for a machine learning course, I'm implementing a spam filter with a Naive Bayes classifier. To evaluate its performance, I wanted to use an ROC curve, but I was unsure how to use it properly. I found the following tutorial extremely useful:

http://www.cs.bris.ac.uk/~flach/ICML04tutorial/

It has a lot of material, maybe a little too much since I'm not that committed to the theory of ROC curves for now, but I should definitely return to it since it is surely of practical importance.

My labmate Nguyen Cao told me about an R package for ROC curves. I haven't taken a serious look at it yet, but it looks pretty nice. Here is the link to the website:

http://rocr.bioinf.mpi-sb.mpg.de/

There are a lot of things to learn...! :D
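
For reference, the basic construction behind an ROC curve fits in a few lines. The sketch below is not taken from the tutorial or from the R package; it assumes that a higher score means "more spam-like" and that both classes are present, and the names RocPoint, roc_curve and auc are made up for this example. It sorts the messages by score, sweeps the threshold from high to low, and records the false positive rate and true positive rate after each distinct score.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct RocPoint {
    double fpr;  // false positive rate: ham wrongly flagged / all ham
    double tpr;  // true positive rate: spam correctly flagged / all spam
};

// scores[i]  : classifier score of message i (assumed: higher = more spam-like)
// is_spam[i] : true label of message i
std::vector<RocPoint> roc_curve(const std::vector<double>& scores,
                                const std::vector<bool>& is_spam) {
    assert(scores.size() == is_spam.size());

    // Indices of the messages, ordered by decreasing score.
    std::vector<std::size_t> order(scores.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });

    const double positives =
        static_cast<double>(std::count(is_spam.begin(), is_spam.end(), true));
    const double negatives = static_cast<double>(is_spam.size()) - positives;
    assert(positives > 0 && negatives > 0);

    std::vector<RocPoint> curve;
    curve.push_back({0.0, 0.0});  // threshold above every score: nothing flagged
    double tp = 0.0, fp = 0.0;
    for (std::size_t k = 0; k < order.size(); ++k) {
        if (is_spam[order[k]]) ++tp; else ++fp;
        // Emit a point only after the last message of a group of tied scores.
        const bool last_of_group =
            (k + 1 == order.size()) || scores[order[k + 1]] != scores[order[k]];
        if (last_of_group) curve.push_back({fp / negatives, tp / positives});
    }
    return curve;
}

// Area under the curve by the trapezoidal rule.
double auc(const std::vector<RocPoint>& curve) {
    double area = 0.0;
    for (std::size_t i = 1; i < curve.size(); ++i) {
        area += (curve[i].fpr - curve[i - 1].fpr) *
                (curve[i].tpr + curve[i - 1].tpr) / 2.0;
    }
    return area;
}
```

An area close to 1 means the scores separate spam from ham well, while an area around 0.5 is what random scores would give.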

Tuesday, February 8, 2011

Installed doxygen + doxymacs

I'm trying to learn to use Doxygen.

http://www.stack.nl/~dimitri/doxygen/

This way I hope I can learn to document my code better :)

The following page helped me install doxymacs, an Emacs plug-in for Doxygen.
(Actually, all I had to do was type apt-get install doxymacs.)
http://emacs-fu.blogspot.com/2009/01/commenting-your-functions.html
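
As an illustration of the kind of comments Doxygen picks up, here is a small C++ sketch. The file name, author line, date and function are hypothetical; the tags themselves (@file, @author, @date, @brief, @param, @return, @p) are standard Doxygen commands.

```cpp
/**
 * @file   stats_util.hpp
 * @author Your Name <my@email.com>
 * @date   2011-02-08
 * @brief  Small statistics helpers (hypothetical example file).
 */

#include <cstddef>
#include <vector>

/**
 * @brief Compute the arithmetic mean of a sample.
 *
 * @param values Sample values; may be empty.
 * @return The mean of @p values, or 0.0 if @p values is empty.
 */
double sample_mean(const std::vector<double>& values) {
    if (values.empty()) return 0.0;
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) sum += values[i];
    return sum / static_cast<double>(values.size());
}
```

Running doxygen over a file like this should produce a page for the file and an entry for sample_mean with its parameter and return value described.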



Then I ran into a problem: my e-mail address was not displayed correctly.
I think this is because I didn't configure my Ubuntu properly, but at least I found that

 (setq user-mail-address "my@email.com")

sets my user-mail-address variable, which then propagates to doxymacs-user-mail-address.

As a statistics Ph.D. student, I felt obliged to know how to document R code (although R is not my favorite language for scientific computation). The solution is to try Roxygen, from the following project page:
http://roxygen.org/
It's good that ESS (Emacs Speaks Statistics) supports Roxygen! Actually I've never done anything ambitious in R (just homework problems), but I will definitely try Roxygen when the appropriate time comes!